Table of Contents
Softmax Attention and Nearest Neighbor Search
Implications and Further Research
Conclusion
Acknowledgment

The Math Behind In-Context Learning

Feb 26, 2025, 12:03 AM

In-context learning (ICL), a key capability of modern large language models (LLMs), lets transformers adapt their behavior based on examples supplied in the input prompt. Few-shot prompting, in which several task examples are included in the prompt, demonstrates the desired behavior without any weight updates. But how do transformers achieve this adaptation? This article explores potential mechanisms behind ICL.

Softmax Attention and Nearest Neighbor Search

The core question of ICL is: given example pairs (x, y) in the prompt, can the attention mechanism learn an algorithm that maps a new query x to its output y?

The softmax attention formula is:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Introducing an inverse temperature parameter, c, modifies the attention allocation:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(c \, QK^\top\right)V$$

Setting $c = 1/\sqrt{d_k}$ recovers the standard scaling above.

As c approaches infinity, attention becomes a one-hot vector, focusing solely on the most similar token – effectively a nearest neighbor search. With finite c, attention resembles Gaussian kernel smoothing. This suggests ICL might implement a nearest neighbor algorithm on input-output pairs.
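A small numerical sketch makes this concrete. The snippet below (a minimal NumPy illustration of my own, not code from the original article) applies softmax attention with inverse temperature c to a toy set of keys and shows the attention weights collapsing toward a one-hot vector as c grows:

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_weights(query, keys, c=1.0):
    """Softmax attention weights over keys, with inverse temperature c."""
    scores = keys @ query        # dot-product similarity to each key
    return softmax(c * scores)

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))                 # 5 stored examples, dim 4
query = keys[2] + 0.1 * rng.normal(size=4)     # query close to the 3rd key

for c in (1.0, 10.0, 100.0):
    w = attention_weights(query, keys, c)
    print(f"c={c:6.1f}  weights={np.round(w, 3)}")
# As c grows, the weights collapse onto the most similar key (a one-hot
# nearest neighbor lookup); at moderate c they spread over nearby keys,
# like Gaussian kernel smoothing.
```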

Implications and Further Research

Understanding how transformers learn algorithms (like nearest neighbor) opens doors for AutoML. Hollmann et al. demonstrated training a transformer on synthetic datasets to learn the entire AutoML pipeline, predicting optimal models and hyperparameters from new data in a single pass.

Anthropic's 2022 research suggests "induction heads" as a mechanism. These pairs of attention heads copy and complete patterns; for example, given "...A, B...A", they predict "B" based on prior context.
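As a rough caricature of that rule (a toy sketch, not Anthropic's actual learned mechanism), an induction head behaves like "find the previous occurrence of the current token, copy what followed it":

```python
def induction_predict(tokens):
    """Toy induction-head rule: find the most recent earlier occurrence
    of the final token and predict the token that followed it."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan backwards
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier occurrence to copy from

print(induction_predict(["A", "B", "C", "A"]))  # -> B
```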

Recent studies (Garg et al. 2022, von Oswald et al. 2023) link transformers' ICL to gradient descent. Linear attention, which omits the softmax operation,

$$\text{LinAttention}(Q, K, V) = \left(QK^\top\right)V$$

resembles preconditioned gradient descent (PGD):

$$w_{t+1} = w_t - \eta\, P\, \nabla_w L(w_t),$$

where $P$ is a preconditioning matrix, $\eta$ is the step size, and $L$ is a least-squares loss over the in-context examples.

In this view, one layer of linear attention can implement one step of PGD.
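The correspondence can be checked numerically for in-context linear regression. The sketch below is my own toy construction in the spirit of von Oswald et al., not their exact parameterization: one gradient step from w = 0 on the squared loss gives exactly the same prediction as a linear attention readout with the x_i as keys, the y_i as values, and the query input as the attention query.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 3
X = rng.normal(size=(n, d))     # in-context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true                  # in-context targets y_i
x_query = rng.normal(size=d)    # query input
eta = 0.1                       # learning rate

# One gradient descent step from w0 = 0 on L(w) = 0.5 * sum_i (w.x_i - y_i)^2:
# grad L(w0) = -sum_i y_i * x_i, so w1 = eta * sum_i y_i * x_i.
w1 = eta * (y @ X)
pred_gd = w1 @ x_query

# Linear attention readout (no softmax): values y_i weighted by the
# unnormalized key-query dot products x_i . x_query.
pred_attn = eta * (y @ (X @ x_query))

print(pred_gd, pred_attn)  # identical up to floating point
# Replacing w1 with eta * P @ (y @ X) for a preconditioning matrix P
# corresponds to linearly transforming the queries/keys, which gives
# the PGD interpretation.
```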

Conclusion

Attention mechanisms can implement learning algorithms, enabling ICL by learning from demonstration pairs. While the interplay of multiple attention layers and MLPs is complex, research sheds light on ICL's mechanics. This article offers a high-level overview of these insights.

Further Reading:

  • In-context Learning and Induction Heads
  • What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
  • Transformers Learn In-Context by Gradient Descent
  • Transformers Learn to Implement Preconditioned Gradient Descent for In-Context Learning

Acknowledgment

This article is inspired by Fall 2024 graduate coursework at the University of Michigan. Any errors are solely the author's.
