Ethan Zuckerman

Few-shot Learning and AI beyond Code

Originally published by Claire Gorman. As deep learning and artificial intelligence have improved dramatically, few-shot learning offers new forms of accessibility to AI models.

By Claire Gorman

The history of artificial intelligence (AI) research is traditionally narrated in terms of seasons: periods of interest and investment in AI are known as “summers,” while the droughts of enthusiasm in between them are known as “AI winters.” The present AI summer in which we find ourselves was sparked by the publication of a model known as AlexNet in 2012. AlexNet is a neural network whose performance on an image classification task blew away previous approaches; its central innovation was the use of “depth,” that is, more layers, in the model architecture. The success of this approach, now called “deep learning,” spurred a wave of depth-driven advancements in computer vision and other areas of machine learning, revealing a trend in which bigger models (with more layers and therefore more parameters) were found to perform better.

While algorithmic innovations and training data availability also play key roles in AI advancement, model size plays a primary role because it directly informs the complexity that can be captured in the model’s internal data representation. That representation captures meaning and relationships across the information the model has ingested, and it is the generalization of these relationships that allows the model to make predictions based on new inputs it hasn’t seen before. Larger models, which have more trainable parameters, are able to capture more nuanced representations of their training data, enabling them to achieve more complex prediction behavior.

Because they are big enough to internalize very high-level conceptual relationships, Large Language Models (LLMs) exhibit an exciting capability known as “few-shot learning.” Few-shot learning is possible for very sophisticated models whose internal representations are complex enough to learn new tasks based on a small handful of examples, without being re-trained. Related to other practices like fine-tuning pre-trained models with additional examples that help them specialize, or meta-learning algorithms that help models learn new tasks faster, few-shot learning is one recent advance in a larger research effort to access a wider range of model capabilities by decoupling learning from time-consuming training procedures.

For environmental professionals and the broader environmental public, few-shot learning with LLMs represents a quantum leap. Unlike previous techniques for adapting pre-trained models to new tasks, which still required significant programming skills, chat-based LLM interfaces have made it possible to achieve new machine learning results manually, using plain-text instructions. This paradigm shift makes deep learning accessible to a new and extensive public audience without formal skills in traditional machine learning or data science. It may also materially change the landscape of model-building for everyone, shifting a significant portion of machine learning efforts towards few-shot learning on large existing models rather than training a new small model for each specialized task.

From this observation follows the proposition that few-shot learning enables the production of both leaner and greener machine learning applications. Relative to an equivalent set of smaller models each trained from scratch, these many slight adaptations of larger models will be leaner because they require less training data, and greener because they require less training time and computing power (therefore, less energy). To demonstrate the possibilities opened by this advancement, we can assess an example task from natural language processing.

Sentiment analysis—the determination of feeling or intent within a passage of text—is applicable within the environmental policy and planning field due to the rampant bias and politicization of climate change communication. It is also socially relevant to a broader environmental public who may be interested in quantifying the attitudes of large and small communities towards timely topics ranging from local elections to extreme weather events.

Especially on social media, where tensions run high and complex linguistic phenomena such as sarcasm appear frequently, sentiment analysis has historically been a challenging but useful means of evaluating public opinion at scale. Two scientific papers on this topic can illustrate that precedent: “Challenges of Evaluating Sentiment Analysis Tools on Social Media” (Maynard and Bontcheva 2016) and “Climate Change Sentiment Analysis Using Lexicon, Machine Learning and Hybrid Approaches” (Sham and Mohamed 2022). The second paper applies a set of traditional (statistical and machine learning) sentiment analysis approaches to measure the attitudes present in a dataset of climate change-related tweets provided by the first paper. Sham and Mohamed’s objective of classifying text into “positive,” “negative,” and “neutral” tones would seem straightforward, but understanding their paper requires substantial familiarity with natural language processing methods like featurization and lemmatization. These prerequisites make it appropriate for their peers in research but inaccessible to a public audience.
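To give a flavor of the lexicon family of approaches that Sham and Mohamed compare, here is a minimal sketch of a lexicon-based classifier. The word lists are invented for illustration and are not taken from their paper; real lexicon methods use large curated vocabularies with weighted scores.

```python
# Toy lexicon-based sentiment classifier: count matches against
# hand-built word lists and compare the totals. The lexicons below
# are invented placeholders, not the ones used in the cited papers.

POSITIVE = {"hope", "progress", "great", "support", "solution"}
NEGATIVE = {"hoax", "disaster", "crisis", "failure", "threat"}

def classify(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this toy version hints at why the traditional route demands expertise: handling negation, sarcasm, and misspellings requires exactly the featurization and lemmatization machinery that puts such papers out of reach for a general audience.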

Two years after this paper was published, few-shot learning with ChatGPT is able to yield equivalent or better results on a toy example of the exact same task, with no additional training (nor featurization or lemmatization of the text) at all. To engage the GPT-3.5 model’s few-shot learning ability, I prompted the chat interface with three examples from the tweet dataset published by Maynard and Bontcheva, including both the tweet text and the correct sentiment classification. Based on only these few examples provided in plain text, the model was able to continue classifying new examples from the same dataset very successfully, achieving 90% accuracy on the 10 examples I tested. Of course, a test set this small is not robust enough to provide scientifically credible results, but the object lesson remains: having written no code and applied no technical natural language processing methods, I was able to mimic the study effectively through few-shot learning alone.
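The workflow just described — a handful of labeled examples followed by a new input — can be sketched as simple prompt assembly. The example tweets below are invented placeholders, not items from the Maynard and Bontcheva dataset; the resulting string would be pasted into (or sent to) a chat interface, with no model-side training involved.

```python
# Assemble a few-shot sentiment prompt: labeled examples first,
# then the unlabeled tweet for the model to complete.
# Example tweets are invented for illustration.

EXAMPLES = [
    ("Renewable energy jobs are booming in our state!", "positive"),
    ("Another record flood and still no action from lawmakers.", "negative"),
    ("The next climate report is scheduled for Monday.", "neutral"),
]

def build_prompt(new_tweet: str) -> str:
    lines = ["Classify each tweet as positive, negative, or neutral.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Tweet: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Tweet: {new_tweet}")
    lines.append("Sentiment:")  # the model fills in the label
    return "\n".join(lines)
```

The entire “program” here is plain text: the labeled examples do the work that training code used to do, which is precisely what makes the technique accessible to non-programmers.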

While the few-shot learning approach does sacrifice interpretability—that is, relative to the statistical approaches Sham and Mohamed use, we know much less about what the model is doing behind the scenes to classify the text—it offers a fantastic jump in accessibility. What was until very recently a data processing task only technically feasible for trained experts is now possible for any interested individual with a dataset.

As in the context of AI-assisted coding and other tasks that apply complex foundation models, the human role here has an opportunity to evolve away from domain expertise and towards critical evaluation of results. That is, by using few-shot learning, the labor associated with getting machine learning results is far diminished, leaving more space for the consideration of what those results mean and how to use them. Perhaps with powerful machine learning tools now in the hands of the public, it will be possible to foreground local knowledge and the intuition of lived experience in these processes as the barriers of technical expertise subside.

Claire Gorman

Claire Gorman is a dual Masters student at MIT pursuing degrees in Environmental Planning and Computer Science. Her research interests include deep learning-based computer vision methods, remote sensing for ecological sustainability, and design as a mediator between science and society. Her bachelor’s degree is in Computer Science and Architecture, from Yale University.