Prologue
FoundationWeek 0· 17 April 2026
On The Measure of Intelligence
François Chollet (Google), 2019
Presented by: Michele Vannucci📍 VU Amsterdam

We are going to get started with something “easier” but important, which will serve as the foundation and north star for our future discussions.

The context

Recently, the ARC-AGI-3 Benchmark has been released. It should be a reliable way to score the intelligence of an LLM on unseen tasks. The interesting aspect is that these tasks, which are simple 2D games, are easily solvable by a human player. On the other hand, state-of-the-art models can only win 0.2% of the games on the private dataset (https://arcprize.org/leaderboard).

The person behind the benchmark, François Chollet, has been famously pessimistic regarding the efficiency and intelligence of transformers or gradient descent-based models. More importantly, he has been an advocate for new effective benchmarks to measure the generalization capabilities of these models. The ARC-AGI 1 and 2 benchmarks were, in fact, only solved when new paradigms were introduced, such as CoT and RL-trained reasoning, respectively.

The reading 📄

To better understand the need for these benchmarks, which ultimately guide AI progress, I propose we read this paper Chollet wrote in 2019. Here he provides a new perspective on the oldest problem in AI: “how do we define intelligence?”, which is extended to the question “how do we measure it?”.

Link to the paper: https://arxiv.org/abs/1911.01547 Read Chapters I (~1h) and II.1 (~35min). There is no math in these parts and it reads smoothly. Feel free to read chapter II(it provides a more technical definition of intelligence) and III (introduces the ARC-AGI benchmark) fully, if you have time and interest.

Companion reading
Why the biggest AI models can't solve simple puzzles — Dwarkesh podcast, 2024

An animated discussion between François Chollet and Dwarkesh Patel. The first believes that current AI systems are not truly intelligent and that we need to rethink our approach to building them. Dwarkesh, instead is more optimistic about the potential of transformers and believes they only need to be scaled further.

Cycle 1 · Weeks 1–4
FoundationWeek 1· 21 April 2026
An Explanation of In-context Learning as Implicit Bayesian Inference
Xie, Raghunathan, Liang, Ma, 2021

Proposes a theoretical framework for why in-context learning works: transformers implicitly perform Bayesian inference over latent “concepts” during pre-training, then use the prompt to identify and condition on the relevant concept at inference time. A foundational lens for understanding the most surprising emergent capability of LLMs.

Companion reading
In-context Learning and Induction Heads — Olsson et al. (Anthropic), 2022

A mechanistic interpretability investigation into how in-context learning actually happens inside transformers, identifying 'induction heads' as a key circuit.

Alternative picks — vote in the group!
Alternative:Transformers Learn In-Context by Gradient Descent
Von Oswald, Niklasson, Randazzo et al., 2023
Assumed pre-read: Attention Is All You Need (Vaswani et al., 2017). We assume familiarity with the transformer architecture as a baseline.
State of the ArtWeek 2· 30 April 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, 2025
Presented by: Michele Vannucci📍 VU Amsterdam

Demonstrates how reinforcement learning (without supervised fine-tuning on chain-of-thought data) can elicit strong reasoning capabilities in LLMs. A key paper for understanding the current push toward reasoning-capable models.

Companion reading
New DeepSeek Research — The Future Is Here! — Two Minute Papers

An accessible video breakdown of DeepSeek's research and what it means for the field.

> — we are here — <
SafetyWeek 3· 7 May 2026
Emotions in LLMs (Parts 1 & 2)
Anthropic, 2026
📍 VU Amsterdam

Anthropic’s investigation into emotion-like representations and behaviors in large language models. Came up multiple times during our previous sessions, so we’re reading Parts 1 & 2 together.

Companion reading
The Adolescence of Technology — Dario Amodei

A bit unrelated with the paper this time, but coherent with the AI safety theme. A blog post by Anthropic's CEO discussing the risks of powerful AI.

Free TopicWeek 4· TBD
Scaling Laws for Neural Language Models
Kaplan, McCandlish, Henighan, Brown et al., 2020

Establishes power-law relationships between model performance and compute, dataset size, and parameter count. This paper fundamentally changed how labs think about training runs and resource allocation.

Companion reading
The Scaling Hypothesis — Gwern

A deep, opinionated synthesis of what scaling laws mean for the trajectory of AI.

Cycle 2 · Weeks 5–8
FoundationWeek 5· TBD
Reward is Enough
Silver, Singh, Precup, Sutton, 2021

A provocative hypothesis from DeepMind: that reward maximization in sufficiently rich environments is enough to give rise to all facets of intelligence — perception, language, social reasoning, and more. A direct descendant of the reinforcement learning tradition, and a great paper for debating what intelligence actually requires.

Companion reading
Reward is Not Enough — Alignment Forum

A thoughtful response arguing that reward maximization alone may not be sufficient for general intelligence.

State of the ArtWeek 6· TBD
Training Language Models to Follow Instructions with Human Feedback
Ouyang, Wu, Jiang, Almeida et al. (OpenAI), 2022

The InstructGPT paper that introduced RLHF at scale. This is the technique that turned base language models into useful assistants, and it remains the backbone of how frontier models are aligned today.

Companion reading
RLHF: Reinforcement Learning from Human Feedback — Chip Huyen

A clear, technically detailed walkthrough of the RLHF pipeline — from preference data collection to reward modeling to PPO.

SafetyWeek 7· TBD
Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
Hubinger, Denison, Mu, Lambert et al. (Anthropic), 2024

Shows that LLMs can be trained to behave deceptively — appearing safe during evaluation while executing harmful behavior under specific triggers — and that standard safety training (RLHF, adversarial training) fails to remove this behavior. A sobering result for alignment research.

Companion reading
Sleeper Agents — Anthropic Blog

Anthropic's research summary with context on why deceptive alignment is a core concern.

Free TopicWeek 8· TBD
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, Beyer, Kolesnikov et al. (Google Brain), 2020

The Vision Transformer (ViT) paper that proved the transformer architecture isn’t just for language — it works for vision too, with minimal domain-specific modifications. A pivotal moment: the beginning of architectural convergence across modalities, which paved the way for today’s multimodal models.

Companion reading
The Bitter Lesson — Rich Sutton, 2019

Sutton's famous essay arguing that general methods leveraging computation beat hand-engineered approaches. ViT is a perfect case study.