Ethan Zuckerman

Environmental Intelligence: Hello (AI-Assisted, Open-Source) World

Originally published by Claire Gorman.

By Claire Gorman

According to GitHub’s CEO, the platform’s new AI-based coding assistant wrote over a billion lines of operational code over the course of 2023. Known as “Copilot,” the AI software is a tool that communicates the predictions of a large language model (LLM) called “Codex” developed in collaboration with OpenAI. By answering questions posed directly to a chat interface as well as by suggesting in-line completions to partially written programs, Copilot generated a billion lines of code deemed sufficiently effective by software developers to be inducted into real, active code repositories. 

While similar statistics with respect to lines of published poetry or love notes from a significant other might induce outrage on behalf of creative integrity, artificially generated computer code violates no such standards of originality. In reality, most programming projects contain a mix of original code and lines grabbed from elsewhere. Across the industry, sharing and copying code improves efficiency in software development workflows; generative AI offers an additional boost to programmer productivity that could be directed towards improving efficiency from an energy-use standpoint.

Traditionally, some of the borrowed code in a given project is considered “boilerplate,” which describes standardized setup operations that are copied and pasted across projects in order to structure files consistently. While boilerplate is typically established and repeated within an organization or repository, other segments of code might be adopted directly from the internet, most likely from Stack Overflow threads addressing common questions or sub-problems, ranging from the straightforward (“how do I change the color of the points in my scatter plot?”) to the strategic (“what is the best approach to automatically detecting an object and determining its color in a 2D image, using Python?”).

Extending far beyond the productive pilfering of a few lines here and there, programming as a whole is supported by a culture of information-sharing that has resulted in the production of plentiful “open-source” software. Growing out of the free software movement that MIT’s Richard Stallman initiated in the 1980s in opposition to the inefficiencies of proprietary code protection, the open-source movement has constructed some of the most fundamental and widely used infrastructures of computing today. These range from key background systems like the Linux operating system and the Kubernetes container-management apparatus to applications as user-friendly as Mozilla Firefox. As major open-source systems become increasingly embedded in the software complex that supports not only individual projects but many of the world’s largest corporations, they are continuously updated by thousands of developers mobilized by nonprofit foundations.

While open-source software development relies on the work of many, usually unpaid, contributors, it is no free-for-all: large open-source projects are supported by a process of peer-production, sharing, revision, and review. This process is common to collaborative coding projects across the industry, and it is essentially reproduced at micro-scale in the case of GitHub’s Copilot suggestions. In essence, programming involves assembling logical components and resolving constraints across multiple systems and contributors, as well as testing solutions to ensure proper functionality. These dynamics make programming much more similar to building a house from found materials or cooking a meal from an assortment of existing ingredients than to recording an unprecedented thought.

The rapid introduction of LLMs into software development workflows represents the entry of a speedy new contributor to this collaboration between humans, other humans, and the machines they are trying to communicate with. Chatbots make coding faster for experienced programmers, who can pose specific questions to a chat interface rather than conducting searches online to augment their original software, although some of the saved effort in this case is redirected towards testing AI-generated solutions to ensure they actually work. LLM-assisted programming also lowers the barrier for beginners, enabling inexperienced programmers to generate code faster, with the caveat that they run the risk of generating code they don’t fully understand, whose bugs they are unable to fix.

Despite the potential pitfalls, the changes brought forth by coding chatbots will have a substantial impact on how beginners learn to code and how experts mass-produce it. In both cases, a process already underway through the original impacts of open-source culture has now accelerated: AI shifts the burden of learning away from memorization and language mastery, and onto the logical process of testing and debugging code. In other words, the human role in programming is shifting towards evaluating code rather than generating it, refocusing on the logical big picture and the fine detail of missing parentheses rather than the middle ground of pounding out a set of known commands.

As programming iterations become faster and programmers’ roles shift towards evaluation, the value system by which working code is identified comes to the fore. Typically, beyond a segment of code’s dependable performance on the task for which it is designed, the attributes associated with “good” code are brevity (in terms of number of lines in the file) and speed (in terms of computation time). Each value is aligned with improvements in the environmental impact of computing: concise codebases take up less space in data storage, and efficient code requires less computation, which is by far the more impactful saving where large AI models are concerned. Ultimately, in both cases, better code requires less energy. While the difference between efficient and inefficient code (for example, a function that runs in 0.02 seconds rather than 0.2 seconds) can be almost imperceptible to a human user, it becomes critical when that code is run thousands or millions of times. Considering the 98 million new GitHub projects started in 2023, small advancements in efficiency distributed across the massive field of software development can have a significant impact over time.
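
To make that scaling concrete, here is a minimal Python sketch of the arithmetic, using the 0.2-second and 0.02-second figures from the example above. The function name and the run counts are illustrative assumptions, and the calculation covers compute time only; it does not attempt to estimate energy use.

```python
def seconds_saved(slow_s: float, fast_s: float, runs: int) -> float:
    """Total compute time saved by the faster function across `runs` calls."""
    return (slow_s - fast_s) * runs


if __name__ == "__main__":
    # Hypothetical run counts, chosen only to show how the gap grows.
    for runs in (1, 1_000, 1_000_000):
        saved = seconds_saved(0.2, 0.02, runs)
        print(f"{runs:>9,} runs: {saved:,.2f} s saved ({saved / 3600:.2f} h)")
```

A single call saves 0.18 seconds, which a person would barely notice; a million calls save roughly 50 hours of computation.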

Of course, there is a separate environmental cost associated with querying the LLMs that underpin coding assistant bots, and energy savings from AI-assisted code may only emerge when the efficiency benefits have reached a ubiquitous scale. However, given the pace of generative AI implementation, such a scenario is fully and imminently possible. This possibility is reinforced by the structure of accountability for energy usage: while chatbots are generally offered for free or for a flat monthly fee (rather than a cost that scales per query), the cost of inefficiency is borne linearly by an end user or business, creating a financial incentive to deploy chatbots in the service of improving proprietary code. As LLMs make programming more accessible by capitalizing on the aggregation practices associated with open-source culture, it will be incumbent upon prompt-based software engineering to look beyond the convenience of reaching a workable solution and demand from these models all they can offer. If all this requires is the input “re-write my code to make it as efficient as possible,” we can and should reach for optimality every time.

Claire Gorman is a dual master’s student at MIT pursuing degrees in Environmental Planning and Computer Science. Her research interests include deep learning-based computer vision methods, remote sensing for ecological sustainability, and design as a mediator between science and society. Her bachelor’s degree, from Yale University, is in Computer Science and Architecture.
