OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions

Unsupervised Learning 58min 8 min #63
OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions
Watch on YouTube

Summary

  • Ilya Sutskever, OpenAI’s chief scientist, discusses the state of AI progress, research priorities, and alignment challenges in a wide-ranging conversation. He sees current trends — especially the explosive growth of coding tools like Codex and breakthroughs in math and physics — as strong signals that OpenAI is on track toward its stated goals of research-intern-level AI by late 2026 and a more fully automated AI researcher by March 2028. The conversation covers how reinforcement learning might extend beyond verifiable domains, how companies should think about investing in RL, the future of AI for science, and why chain-of-thought monitoring gives him cautious optimism about alignment.

Model Progress and Research Intern Timelines

  • OpenAI previously stated goals of achieving research-intern-level AI capabilities by September 2026 and a more fully automated AI researcher by March 2028.
  • Four months in, Ilya feels the timelines remain on track, citing two key developments:
    • Explosive growth of coding tools: Codex is now used for the majority of actual coding at OpenAI, fundamentally changing how programming is done internally.
    • Math and physics breakthroughs: Models are now solving IMO-level problems (including the famously hard Problem 6) and making progress in research-level mathematics, which Ilya sees as a strong general benchmark for reasoning capability.
  • The distinction between a “research intern” and a “full automated researcher” is about time horizon and task specificity:
    • A research intern can execute specific technical ideas autonomously but needs well-scoped tasks.
    • A full automated researcher would handle open-ended goals like “improve the model’s reasoning” or “solve alignment” — Ilya does not expect this by 2026.
  • He expects the evolution from current Codex capabilities toward greater autonomy and longer-running tasks to be gradual but steady.

Math as a North Star — and the Shift Toward Real-World Utility

  • Math has served as OpenAI’s primary benchmark for reasoning progress because it is:
    • Highly measurable: It is far easier to verify a math proof than to evaluate whether a piece of software is good.
    • Arbitrarily hard: Problems can range from trivial to research-level, providing a clear gradient of difficulty.
  • With models now reaching IMO gold-medal level, the utility of math as a sole North Star is diminishing.
  • OpenAI is shifting focus toward benchmarks that reflect real-world economic value and scientific applicability, including applied sciences, not just pure math and code.
  • Ilya notes that mathematical reasoning ability transfers to AI research — many of OpenAI’s best researchers come from theoretical and mathematical backgrounds.

RL Beyond Verifiable Tasks

  • A central open question is whether reinforcement learning (RL) can produce the same dramatic improvements in domains like medicine, law, and finance as it has in math and code.
  • Ilya frames the challenge as a duality between hard-to-evaluate tasks and long-horizon tasks:
    • Even in math and code, if a problem takes a year to solve, knowing what to do on day one is itself an open-ended problem.
    • Tasks that are hard to evaluate share structural similarities with tasks that require long chains of reasoning.
  • He is optimistic that RL can scale to more general domains, citing encouraging early signs.
  • A key mechanism for progress on longer-horizon tasks is the model’s ability to recognize and evaluate partial progress:
    • Even without RL, pre-training improvements are making models more consistent and better at identifying what good intermediate artifacts look like.
    • RL can then build on this foundation to further extend effective horizon length.
  • On whether companies should invest in their own RL pipelines:
    • RL is a very data-efficient way to improve model performance on specific tasks.
    • However, in-context learning (prompting with examples and instructions) is also improving rapidly and may become the more practical path for most companies.
    • Ilya’s advice: companies should still do the work of defining evaluations and gathering data, but it may turn out that feeding this into context is more effective than training custom models.

Harness Evolution and General-Purpose Tooling

  • Companies often ask whether they need to build custom harnesses (tool interfaces) for their domains or can rely on general-purpose ones like Codex.
  • Ilya believes general harnesses will become increasingly capable and domain-specific harnesses should not be a long-term limitation.
    • He notes that Codex already works reasonably well for tasks beyond coding.
    • Over time, models will be able to navigate diverse tool sets and adapt to domain-specific interfaces.
  • The long-term vision is that AI should “meet you where you are” — plugging into existing tools like Slack, email, and enterprise software — rather than requiring humans to adapt to the AI’s limitations.
    • Where the AI has genuinely new abilities, new interfaces will emerge, but the default should be integration into existing workflows.

Compute Allocation and Research Organization

  • OpenAI has massive compute resources and must allocate across pre-training, RL, and exploratory research.
  • Their discipline is to explicitly budget large chunks of compute for the most scalable methods — even if this is not the most efficient allocation at any given moment.
    • The risk of not doing this is partitioning compute across too many small experiments and failing to drive the most important directions forward.
  • They apply a form of regularization: favoring methods they understand, expect to scale, and can build on in the future, over one-off experiments.
  • This means some low-hanging fruit is intentionally left on the table in favor of finding and scaling the most promising research directions.

The Continual Learning Debate

  • There has been significant buzz around continual learning, with new labs forming and researchers leaving OpenAI to focus on it.
  • Ilya is somewhat confused by the framing, because in his view:
    • The entire GPT scaling thesis has been premised on models’ ability to do in-context (continual) learning — “learning to learn” from tokens in the context window.
    • RL is specifically used to make in-context learning more efficient.
    • So continual learning is not a departure from the current path — it is the path.
  • He believes continued scaling of pre-training and RL is the single best route to improving continual learning capabilities.
  • For practitioners struggling with long-horizon tasks (e.g., 100+ step workflows), Ilya notes that many failures are due to insufficient context or tool access rather than fundamental model limitations — connecting models to the right systems and feeding them enough context solves more problems than people realize.

AI for Science

  • OpenAI’s work on the First Proof challenge was a significant milestone:
    • Mathematicians and theoretical computer scientists released unpublished problems representative of their day-to-day research.
    • A model being trained at the time (prompted by hand by researcher James Lee) solved several of these problems in about an hour — ideas Ilya said he would have been proud to develop over a week or two during his PhD.
    • This produced a visceral sense of urgency about the pace of progress.
  • On the criticism that AI solutions feel like “19th century mathematics” — brute-force and computation-heavy rather than elegant:
    • Ilya is not concerned. Models can produce vastly more reasoning tokens per unit time than humans, so a brute-force approach is expected at this stage.
    • He does not expect this to be a permanent feature; elegance should emerge as models improve.
  • On whether models are merely “pattern matchers” incapable of genuine scientific novelty:
    • Ilya points to AlphaGo and AlphaZero as counterexamples from 2016-2017 — they discovered new strategies that surprised human experts.
    • The basic principle is similar: scaled pattern matching in a well-defined environment produces novel insights.
    • He acknowledges models will have deficiencies for a while but believes they are genuinely capable of discovering new things.
  • On AI for science across different domains:
    • Ilya expects the most progress in fields where models can design and interpret experiments — even without a physical body, models connected to lab infrastructure can drive enormous throughput.
    • He envisions a human-AI collaboration model where AI drives experimental design and ideation while humans work in the loop.
    • For some specialized tasks (e.g., protein folding, molecular modeling), dedicated architectures may be more efficient than LLMs, but these will ultimately be paired with a core LLM “researist” that orchestrates the overall process.
    • He draws a parallel to the Codex harness discussion: build around the technology’s capabilities, not its limitations.

Chain of Thought Monitoring and Alignment

  • Ilya’s team developed chain-of-thought (CoT) monitoring as an alignment tool, based on a key insight:
    • Reasoning models are not directly supervised on their chain of thought — they are only rewarded for producing correct outputs.
    • This means the CoT is not optimized to be polite, sycophantic, or deceptive. It more faithfully reflects the model’s actual reasoning process.
    • This is analogous to mechanistic interpretability, where unsuperposed activations reveal inner workings — but CoT has the advantage of being in English, making it far more interpretable.
  • This is why Ilya strongly advocated for hiding the chain of thought in product releases:
    • If CoT were shown to users, there would eventually be pressure to train it to look good, which would destroy its value as an honest signal.
    • He sees this as essential for understanding model motivations and generalization as models become more capable and work autonomously for longer periods.
  • CoT monitoring is not a complete solution to alignment, but it is a valuable tool in the toolbox:
    • It has already enabled important research, such as the Anthropic collaboration on model scheming (detecting hidden objectives models may develop).
    • It could inform mitigation strategies like changing pre-training data or developing inoculation techniques.
  • On alignment more broadly:
    • Ilya’s core concern is generalization: models can be trained to behave well in-distribution, but what happens when they face novel situations, become much smarter, or encounter scenarios they weren’t trained for?
    • A key research direction is understanding how models fall back on pre-training data when generalizing to new situations.
    • His optimism about finding a research path to alignment has increased over the past few years, as the problem has become more concrete and technical.
    • However, his timelines to very capable models have shortened, and he believes the industry must be prepared to slow down development if alignment concerns warrant it.
  • There is some alignment collaboration across major labs (OpenAI, Anthropic, DeepMind), driven by shared interest in these topics.

Inside OpenAI

  • On balancing long-term research with competitive urgency:
    • Ilya still prioritizes high-quality experiments, honesty about results, and giving researchers space to think long-term.
    • What has changed is the urgency to bring the most promising research to fruition and deploy models that are now economically transformative.
  • On OpenAI’s evolution:
    • 2017-2018: Felt like an academic lab pursuing many ideas without a strong scaling focus.
    • GPT era: Shifted to buying large compute and developing the science and infrastructure of scaling.
    • ChatGPT era: A surprise that text models were the first big consumer product (Ilya expected video/generative media to come first). This created tension between the popular product and the longer-term research vision.
    • Now: Entering a phase where models are genuinely economically transformative, and the focus is on deployment and real-world impact.

Societal Implications

  • Ilya sees two under-appreciated challenges:
    • Automation of intellectual work: When tasks that were previously valuable and expensive become cheap, there are serious questions about jobs, wealth concentration, and economic disruption. He suspects this requires real policymaker involvement and does not have obvious solutions.
    • Concentration of power: An automated research lab or company can be controlled by a very small number of people yet be enormously powerful. This raises new governance questions that society has not grappled with before — even before robots enter the picture.
  • On raising the next generation:
    • Ilya recently had a child and has been thinking about what his son’s life will look like in 10 years.
    • The key challenge for society is to build AI in a way that preserves human agency — humans should set the direction.
    • Technical challenges that dominate today may become routine, and the harder problems will be figuring out what is important and what we should do.
    • He still believes in the value of basic education and technological literacy for being able to think about these problems.

Closing

  • Ilya emphasizes that alignment and monitorability are urgent challenges that extend beyond AI researchers to policymakers and society at large.
  • He is encouraged by growing public discourse on these topics and believes more is needed.
Back to Unsupervised Learning