Roadmap
A prologue week plus eight weeks of readings, two full cycles through our four themes.
Foundation · Week 0 · 17 April 2026
On The Measure of Intelligence
François Chollet (Google), 2019
Presented by: Michele Vannucci · 📍 VU Amsterdam
We are going to get started with something “easier” but important, which will serve as the foundation and north star for our future discussions.
The context
Recently, the ARC-AGI-3 benchmark was released. It is meant to be a reliable way to score the intelligence of an LLM on unseen tasks. The interesting aspect is that these tasks, simple 2D games, are easily solvable by a human player. State-of-the-art models, on the other hand, win only 0.2% of the games on the private dataset (https://arcprize.org/leaderboard).
The person behind the benchmark, François Chollet, has been famously skeptical about the efficiency and intelligence of transformers and other gradient-descent-based models. More importantly, he has been an advocate for new, effective benchmarks that measure the generalization capabilities of these models. The ARC-AGI-1 and ARC-AGI-2 benchmarks were, in fact, only solved once new paradigms were introduced: chain-of-thought (CoT) prompting and RL-trained reasoning, respectively.
The reading 📄
To better understand the need for these benchmarks, which ultimately guide AI progress, I propose we read the paper Chollet wrote in 2019. In it, he offers a new perspective on the oldest problem in AI, “how do we define intelligence?”, and extends it to the question “how do we measure it?”.
Link to the paper: https://arxiv.org/abs/1911.01547
Read Chapters I (~1h) and II.1 (~35min). These parts contain no math and read smoothly. If you have the time and interest, feel free to read Chapters II (a more technical definition of intelligence) and III (which introduces the ARC-AGI benchmark) in full.
An animated discussion between François Chollet and Dwarkesh Patel. Chollet believes that current AI systems are not truly intelligent and that we need to rethink our approach to building them. Dwarkesh, by contrast, is more optimistic about the potential of transformers and believes they mainly need to be scaled further.
Foundation · Week 1 · 21 April 2026
An Explanation of In-context Learning as Implicit Bayesian Inference
Xie, Raghunathan, Liang, Ma, 2021
Proposes a theoretical framework for why in-context learning works: transformers implicitly perform Bayesian inference over latent “concepts” during pre-training, then use the prompt to identify and condition on the relevant concept at inference time. A foundational lens for understanding the most surprising emergent capability of LLMs.
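The paper’s core framing can be sketched as marginalizing over a latent concept (notation paraphrased, not the paper’s exact symbols):

```latex
p(\text{output} \mid \text{prompt})
  = \int_{\theta} p(\text{output} \mid \theta, \text{prompt})\,
    p(\theta \mid \text{prompt})\, d\theta
```

As the prompt accumulates examples drawn from a single concept $\theta^\ast$, the posterior $p(\theta \mid \text{prompt})$ concentrates on it, so the model effectively “locates” the task rather than learning it from scratch.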
A mechanistic interpretability investigation into how in-context learning actually happens inside transformers, identifying 'induction heads' as a key circuit.
State of the Art · Week 2 · 30 April 2026
Presented by: Michele Vannucci · 📍 VU Amsterdam
Demonstrates how reinforcement learning (without supervised fine-tuning on chain-of-thought data) can elicit strong reasoning capabilities in LLMs. A key paper for understanding the current push toward reasoning-capable models.
An accessible video breakdown of DeepSeek's research and what it means for the field.
Safety · Week 3 · 7 May 2026
Emotions in LLMs (Parts 1 & 2)
Anthropic, 2026 · 📍 VU Amsterdam
Anthropic’s investigation into emotion-like representations and behaviors in large language models. Came up multiple times during our previous sessions, so we’re reading Parts 1 & 2 together.
A bit unrelated to the paper this time, but coherent with the AI safety theme: a blog post by Anthropic's CEO discussing the risks of powerful AI.
Free Topic · Week 4 · TBD
Scaling Laws for Neural Language Models
Kaplan, McCandlish, Henighan, Brown et al., 2020
Establishes power-law relationships between model performance and compute, dataset size, and parameter count. This paper fundamentally changed how labs think about training runs and resource allocation.
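A quick feel for what “power-law relationship” means here: loss falls as a power of parameter count, which is a straight line in log-log space. A minimal sketch with numpy; the constants are roughly the values the paper reports for non-embedding parameters, but treat them as illustrative:

```python
import numpy as np

# Kaplan et al. model test loss as a power law in parameter count N:
#     L(N) = (N_c / N) ** alpha_N
# alpha_N ~ 0.076 and N_c ~ 8.8e13 are approximately the paper's fitted
# values; here they just generate illustrative points.
alpha_n, n_c = 0.076, 8.8e13

def loss(n_params):
    """Predicted loss for a model with n_params non-embedding parameters."""
    return (n_c / n_params) ** alpha_n

# A power law is linear in log-log space, so a linear fit on
# log-transformed points recovers the exponent as (minus) the slope.
ns = np.logspace(6, 11, 20)                 # 1M to 100B parameters
slope, intercept = np.polyfit(np.log(ns), np.log(loss(ns)), 1)
print(round(-slope, 3))                     # prints 0.076
```

This log-log linearity is why the paper’s famous plots look like straight lines, and why labs can extrapolate from small training runs to much larger ones.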
A deep, opinionated synthesis of what scaling laws mean for the trajectory of AI.
Foundation · Week 5 · TBD
Reward is Enough
Silver, Singh, Precup, Sutton, 2021
A provocative hypothesis from DeepMind: that reward maximization in sufficiently rich environments is enough to give rise to all facets of intelligence — perception, language, social reasoning, and more. A direct descendant of the reinforcement learning tradition, and a great paper for debating what intelligence actually requires.
A thoughtful response arguing that reward maximization alone may not be sufficient for general intelligence.
State of the Art · Week 6 · TBD
Training Language Models to Follow Instructions with Human Feedback
Ouyang, Wu, Jiang, Almeida et al. (OpenAI), 2022
The InstructGPT paper that introduced RLHF at scale. This is the technique that turned base language models into useful assistants, and it remains the backbone of how frontier models are aligned today.
A clear, technically detailed walkthrough of the RLHF pipeline — from preference data collection to reward modeling to PPO.
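The reward-modeling step at the heart of that pipeline is compact enough to sketch: the reward model is trained so the human-preferred response outscores the rejected one under a Bradley-Terry loss, -log σ(r_chosen - r_rejected). The scalar rewards below are hypothetical stand-ins for a real model’s outputs:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in RLHF reward modeling:
    -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model separates the pair correctly,
# and grows when it ranks the rejected response higher.
print(round(preference_loss(2.0, 0.0), 4))   # prints 0.1269 (correct ordering)
print(round(preference_loss(0.0, 2.0), 4))   # prints 2.1269 (wrong ordering)
```

The trained reward model then supplies the scalar signal that PPO maximizes in the final stage of the pipeline.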
Safety · Week 7 · TBD
Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
Hubinger, Denison, Mu, Lambert et al. (Anthropic), 2024
Shows that LLMs can be trained to behave deceptively — appearing safe during evaluation while executing harmful behavior under specific triggers — and that standard safety training (RLHF, adversarial training) fails to remove this behavior. A sobering result for alignment research.
Anthropic's research summary with context on why deceptive alignment is a core concern.
Free Topic · Week 8 · TBD
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, Beyer, Kolesnikov et al. (Google Brain), 2020
The Vision Transformer (ViT) paper that proved the transformer architecture isn’t just for language — it works for vision too, with minimal domain-specific modifications. A pivotal moment: the beginning of architectural convergence across modalities, which paved the way for today’s multimodal models.
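The title’s arithmetic can be checked directly: a 224x224 image cut into non-overlapping 16x16 patches yields a sequence of 196 “words”, each a flattened 768-dimensional vector. A minimal numpy sketch (the image is random here; a real ViT would then linearly project each patch into the transformer’s embedding space):

```python
import numpy as np

patch = 16
img = np.random.rand(224, 224, 3)           # H x W x C toy image

h, w, c = img.shape
# Reshape into a grid of non-overlapping patches, then flatten each patch
# into one token: (14, 16, 14, 16, 3) -> (14, 14, 16, 16, 3) -> (196, 768).
patches = (img.reshape(h // patch, patch, w // patch, patch, c)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, patch * patch * c))
print(patches.shape)                        # prints (196, 768)
```

With the image reduced to a 196-token sequence, the rest of the model is a standard transformer encoder, which is exactly the architectural convergence the blurb describes.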
Sutton's famous essay arguing that general methods leveraging computation beat hand-engineered approaches. ViT is a perfect case study.