Mark Jaquith

GitHub Copilot: A Virtuous Circle

July 11, 2022

Copilot has utterly changed how I write code. I can never go back. Even if Copilot and anything else like it were to disappear, I would still be a better coder for having used it.

I began using GitHub’s Copilot service in beta form one year ago, in July of 2021. Billed as “your AI pair programmer”, Copilot integrates with several popular code editors and offers advanced code completion suggestions and code generation, based on a machine learning model that was trained on a mountain of publicly available code. Initial pricing is $10 USD per month, but it is free for “verified students and maintainers of popular open source projects” (which means I haven’t paid anything to continue using it).

At first, I was only playing around, seeing what silly code I could get it to generate.

In a WordPress plugin, I wrote this code comment:

/**
 * Reverse the post content on Thursdays.
 */

Copilot generated this function, incrementally, as I pressed the tab key:

/**
 * Reverse the post content on Thursdays.
 */
add_filter('the_content', function($content) {
	global $post;
	if (date('D') == 'Thu') {
		$content = strrev($content);
	}

	return $content;
});

It’s not perfect. There was no need for global $post; to be there. But that unnecessary line doesn’t hurt, and Copilot did actually do what I asked — an absurd request that surely no one has ever coded before.

Copilot is not magic. It doesn’t write entire applications for you, and it doesn’t know how to make architectural decisions. What it does do, it does very well.

Pattern completion.
Common manipulations.
Documentation.
Syntax and parameters.
Keeping you moving.

Pattern Completion

There are a few common ways to create an array of US states. You could look for a repository with the list, you could type it yourself, or you could search StackOverflow for the answer.

With Copilot, the solution is obvious, fast, and entirely contained inside your code editor. All you have to do is get the ball rolling. You could write this:

const usStateNames = [

Or you could write:

// Array of US state names.

Copilot will see what you’re going for, and finish the array for you.
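Copilot's exact suggestion varies from session to session, so it isn't reproduced here. For reference, a finished version of that array would look something like the following. It is written out by hand and contains the 50 states in alphabetical order. Whether to include the District of Columbia or the territories is an assumption this sketch makes (it leaves them out).

```javascript
// Hand-written reference list: the 50 US state names, alphabetical.
// Assumption: DC and US territories are deliberately excluded.
const usStateNames = [
  'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
  'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia',
  'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa',
  'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland',
  'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri',
  'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey',
  'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio',
  'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
  'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
  'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming',
]
```

Typing that out is exactly the sort of chore the opening line of the array lets you skip.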

This pattern matching also works for continuing patterns you are crafting in your own code.

const cat = 'cat'
const dog = 'dog'
const bird = 'bird'
const mouse = 'mouse'

announceAnimal(cat)

What comes next? You probably have a guess, and so does Copilot:

announceAnimal(dog)
announceAnimal(bird)
announceAnimal(mouse)

Copilot has no idea what announceAnimal does (neither do I). It just saw the pattern, and predicted where it might be going.

This sort of thing also works on a larger scale, when you write entire functions that follow a similar pattern, like getters and setters. It can even work for more complicated functions, where your code shows obvious signs of a repeated pattern. I am frequently astounded at how well Copilot recognizes larger patterns that are entirely of my own creation.
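To give a sense of the getter and setter case, here is a hypothetical Contact class (the class and its fields are invented for illustration). Once the first pair of methods exists, every later pair is predictable from the field name alone, which is the kind of repetition a completion tool can pick up on.

```javascript
// Hypothetical class: a repetitive getter/setter pattern.
// After getName/setName, the remaining pairs follow the same shape.
class Contact {
  constructor(name = '', email = '', phone = '') {
    this._name = name
    this._email = email
    this._phone = phone
  }

  getName() { return this._name }
  setName(value) { this._name = value }

  getEmail() { return this._email }
  setEmail(value) { this._email = value }

  getPhone() { return this._phone }
  setPhone(value) { this._phone = value }
}

// Usage
const contact = new Contact('Ada')
contact.setEmail('ada@example.com')
```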

The overwhelming feeling I get when using Copilot is that I am pair coding with a competent junior programmer. Every time I pause, they say “yes, I see where you’re going with this”. And then they finish my code.

Common Manipulations

As a coder, you have to perform endless rote tasks like filtering out null values, or splitting a string into an array of words, or writing a regex to replace spaces with dashes. For these things, Copilot acts like a friend who is looking through StackOverflow (or your own previous code) and shouting out the answer. Even when I know a regex or string manipulation by heart, Copilot can generate it faster and with less effort from me.

const person = 'Mark Jaquith'

// Log the name as Last Name, First Name.

↓

console.log(`${person.split(' ').reverse().join(', ')}.`)

You don’t realize how much time you’ve been wasting looking up or recreating these trivial snippets — but it’s a lot.
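For reference, the three chores named at the top of this section each come out to roughly one line in JavaScript. This is a hand-written sketch, not a transcript of Copilot's output; the function names are made up, and details such as treating undefined like null are choices made for this example.

```javascript
// Filter out null (and undefined) values from an array.
const withoutNulls = (items) => items.filter((item) => item != null)

// Split a string into an array of words, ignoring extra whitespace.
// filter(Boolean) drops the empty strings left by leading/trailing spaces.
const toWords = (str) => str.split(/\s+/).filter(Boolean)

// Replace each run of spaces with a single dash.
const spacesToDashes = (str) => str.replace(/ +/g, '-')
```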

Documentation

It can be very powerful to write a line of documentation first, and have Copilot infer the code from that. Other times, you’ll be in the coding flow, and Copilot can do the inverse: generate documentation from surrounding code.

function splitOnChar(char) {
	// Return a {I left my cursor here}
	return (str) => str.split(char)
}

↓

function splitOnChar(char) {
	// Return a function that splits a string on the given character.
	return (str) => str.split(char)
}

That code comment is more or less exactly what I would have written. It’s uncanny how well Copilot can finish my thoughts.
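A quick usage example shows why that comment is accurate: splitOnChar doesn't split anything itself, it returns a new function with the separator baked in. The splitOnComma name below is just for illustration.

```javascript
function splitOnChar(char) {
	// Return a function that splits a string on the given character.
	return (str) => str.split(char)
}

// Build a reusable comma splitter once, then apply it to any string.
const splitOnComma = splitOnChar(',')
const parts = splitOnComma('red,green,blue')
```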

Syntax and Parameters

I’ve been coding PHP since 2003, but my brain cannot remember PHP’s woefully inconsistent parameter ordering. array_map() takes callable, array, but array_filter() takes array, callable. Madness.

Now, instead of having to look it up (as I literally just did to write this), or having to devote brain space to memorizing this mistake of history, I can let Copilot take over. It’s even better than an IDE suggesting types: more often than not, Copilot picks up on the variables I’m using, senses my intent, and suggests the correct variable and a reasonable callback.

Keeping You Moving

Coding inspiration can come in fits and bursts. Sometimes you’ll suddenly visualize the entire flow of a function and type it out as fast as your fingers can move. Other times, you’ll get distracted, become stuck, or start to slow down. Copilot takes a brief moment to start delivering suggestions, so right when you begin to lose momentum, it nudges you forward, much like a real copilot would if the captain froze and stared blankly into the distance while going through the landing checklist.

The Virtuous Circle

Copilot enables a virtuous circle. The better it works, the better I work. And the better I work, the better it works. These concrete and helpful augmentations would be noteworthy if they merely made me a faster coder. But they add up to so much more.

More of my coding time is now spent doing the fun and interesting parts — the parts that give me satisfaction and pride in my work. Moreover, now that I have a feel for how Copilot can be most helpful, I craft my code in ways that will maximize its helpfulness. The better I name variables, the better Copilot understands. The more I break my code up into small functions and methods that do well-defined tasks, the better Copilot understands. The better Copilot understands, the better it helps. The better it helps, the more my work becomes about designing the structure and visualizing the solutions to problems, and less about trudging through a tangle of curly braces or getting lost in a tedious Google search.
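As a rough sketch of the kind of structure described above (the post objects, their fields, and every function name here are hypothetical, loosely modeled on WordPress posts), small functions with descriptive names make the intent of the final line almost read like a sentence, for a human reviewer and for a tool like Copilot alike.

```javascript
// Hypothetical helpers: each does one well-defined thing and says so in its name.
const isPublished = (post) => post.status === 'publish'
const byNewestFirst = (a, b) => b.date - a.date
const toTitle = (post) => post.title

// Composed from the helpers above. filter() returns a new array,
// so sort() does not mutate the caller's posts array.
const recentPublishedTitles = (posts, limit = 3) =>
  posts.filter(isPublished).sort(byNewestFirst).slice(0, limit).map(toTitle)
```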

These improvements to my code aren’t just in service to feeding that virtuous circle and making Copilot more effective. They are fundamentally better coding practices that make my code easier for humans (including myself) to understand and maintain. It’s wild, but my reliance on Copilot to automate some aspects of my coding has legitimately improved the quality of the code I write.

We Don’t Serve Their Kind Here

Copilot has generated some controversy because of how it was trained and because of the copyright implications of the code it generates. The first objection is that it is fraught or sketchy for GitHub (Microsoft) to create a for-profit service that was trained on open source code. I understand the sentiment here, but I think it is misguided. No one would object to a human learning how to code from reading open source code and then going on to write proprietary code for a client or a giant corporation. Even if the code they learned from was licensed under a “copyleft” license like the GPL, there is no expectation that coding abilities learned while studying this code should be restricted in any way. The spirit of open source is explicitly in favor of education and openness; a machine-learning model being trained on this open source code is largely similar to a human doing the same. It’s not reasonable or practical to suggest that for-profit AI services cannot learn from publicly available code, while for-profit humans can.

Copyright Undefined

The second objection has much more merit. When Copilot generates code, there is a small chance that it might generate non-trivial snippets of public code. These non-trivial snippets could introduce copyright and licensing issues. For instance, if the snippet comes from a project with a copyleft license, but the current project is not copyleft, it could be considered infected by that code. Even if the snippet comes from something under a permissive license such as MIT or BSD, it would arrive without its copyright notice, which would put the software in violation of those licenses’ attribution clauses. Indeed, the person accepting this Copilot suggestion would have no idea that this piece of code existed elsewhere.

What’s clear to me is that this scenario happens infrequently. Overwhelmingly, Copilot generates novel code. Yes, I have seen demonstrations of Copilot copying entire functions, like the famous Quake III “fast inverse square root” function, but these demonstrations required the human to prompt Copilot with enough context to make it clear that they wanted Copilot to reproduce that specific and extremely prevalent snippet. And they required the human to accept each line of the code in turn. In real-world use, the chances of more than coincidental or trivial code overlap are slim.

Nevertheless, events with low probabilities by definition do sometimes happen, so I do think copyright is a real problem that GitHub should focus on addressing comprehensively. For now, they have a failsafe setting that will block all code suggestions that match public code.

[Screenshot: the GitHub Copilot settings screen. Under “Suggestions matching public code”: “GitHub Copilot can allow or block suggestions matching public code. See GitHub Copilot FAQ to learn more.”]

In the future, perhaps Copilot could generate comments that contextualize the generated code in cases of material overlap, and allow for fine-grained control of what sorts of publicly-matching snippets are allowed.

For now, expect many larger corporations to institute policies that forbid AI code completion. Their lawyers are just doing their job by cautiously reducing potential liability. I believe GitHub and others in this space can overcome these reasonable concerns with settings and policies that control and monitor AI code completion organization-wide. The difference between telling a computer and telling a human not to copy code without proper attribution and a compatible license is that the computer will actually listen!

What Next

Does the dawn of AI code assistance mean we are entering into a world where neophytes can code as well as someone with years of training? Not even close. That’s like asking if industrial construction tools mean that anyone can make a safe and functional building. But I do think that beginners will be able to get up to speed faster, as AI code completion will mean they spend less time wrestling with syntax quirks and other such nonsense. I’m hopeful that AI code completion might even open up programming to more people, who might otherwise have given up because of the unforgiving learning curve and the massive amount of free time required to become competent.

As ever, the coders of the future will be judged not by how many lines of code they generate in a day, or how many times they resort to consulting StackOverflow. They’ll be judged by how well they listen to and empathize with the humans who use their software, and how robustly and thoughtfully they architect their solutions.

And if they use AI coding assistance, they’ll have a lot more energy to direct towards those goals.