Ex-OpenAI Researcher On Why He Left, His Honest AGI Timeline, & The Limits of Scaling RL — Unsupervised Learning

Jerry Torque, former VP of Research at OpenAI, spent seven years at the lab and was central to some of the most consequential advances in modern AI — including the development of reasoning models like o1 and o3, and the scaling of reinforcement learning (RL) that underpins today’s most capable systems. He recently left OpenAI to pursue research directions he felt were harder to explore inside a large, increasingly product-driven lab. In this conversation, he offers a candid, wide-ranging assessment of where AI stands today, what the current scaling paradigms can and cannot do, what problems like continual learning will actually require, how the competitive landscape between labs is evolving, and what he sees as the most important open questions going forward.

Scaling Pre-Training and Reinforcement Learning

Scaling works — and it works predictably. Both pre-training and RL deliver real, measurable returns: pre-training builds richer world models and better language understanding, while RL makes models highly skilled at whatever task they’re trained on. The core principle is simple: “you get what you train for.”
The bottleneck is generalization, not capability. Models are excellent at what they’re explicitly trained on, but performance degrades quickly outside those domains. Tasks not covered by RL training are handled poorly, and knowledge absent from the pre-training corpus is largely inaccessible. This is the central remaining challenge.
Scaling is data-constrained, not compute-constrained. Each new model generation improves primarily because labs add more targeted data — specifically data aimed at the previous model’s weaknesses. This iterative loop is powerful but slow.
The open question is whether there’s a more data-efficient path. Jerry frames it as an economic question: can research produce methods that generalize better from less data, or are we stuck in a regime where every new skill requires dedicated data collection and RL training?

What RL Can and Cannot Do Today

Easy-to-verify domains work well. Coding and math competitions are natural fits because correctness is cheap to check. These domains have produced the most dramatic and visible RL successes.
Hard-to-verify domains are fundamentally difficult. Writing a good book, starting a successful company, or making a novel surgical decision all involve feedback that is delayed, noisy, or ambiguous. RL on these tasks is possible in principle but extremely challenging in practice.
Jerry’s rule of thumb: if you can tell whether a job was done well, you can do RL on it. The harder it is to evaluate quality, the harder it is to train.
Even in domains with clear feedback, expert-level innovation is hard. Surgeons sometimes succeed by going against established rules — doing something that’s never been done before. Models could eventually do this too, but it would require many more attempts and much more time than current systems can afford.
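The rule of thumb above can be made concrete with a toy reward function. This is a minimal sketch, not how any lab actually computes RL rewards: the task (integer addition), the names `verifiable_reward`, `TESTS`, `correct_solution`, and `buggy_solution` are all hypothetical and chosen only for illustration. The point is that when correctness is cheap to check, the reward signal is just a pass rate.

```python
from typing import Callable, List, Tuple

# Each test case is ((arg_a, arg_b), expected_result). Hypothetical data.
TestCase = Tuple[Tuple[int, int], int]

def verifiable_reward(candidate: Callable[[int, int], int],
                      test_cases: List[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes.

    A crash counts as a failed case, so the reward is always in [0, 1].
    """
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply earns no credit for this case
    return passed / len(test_cases)

# Hypothetical task: implement integer addition.
TESTS: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0), ((10, 5), 15)]

def correct_solution(a: int, b: int) -> int:
    return a + b

def buggy_solution(a: int, b: int) -> int:
    return abs(a) + abs(b)  # wrong whenever an argument is negative

print(verifiable_reward(correct_solution, TESTS))  # 1.0
print(verifiable_reward(buggy_solution, TESTS))    # 0.75
```

No comparably cheap checker exists for "was this a good book?", which is why the hard-to-verify domains above are so much harder to train on.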

Generalization as a Property of the Model

Generalization is not automatic — it’s an architectural property. Nearest-neighbor classification can technically solve any ML problem but generalizes poorly because its internal representations are too simple. Transformers, through large-scale pre-training, learn rich, useful world representations that generalize surprisingly well — almost “for free.”
But there’s almost certainly a better architecture. The question isn’t whether better generalization is possible, but what that model looks like and how to find it.
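The nearest-neighbor point can be seen in a toy setting. The sketch below is an assumption-laden illustration, not a claim about transformers: the data (points on the line y = 2x + 1), the helper names, and the choice of a least-squares line as the "better" model are all hypothetical. Both models see the same training data, and only the one whose structure matches the problem handles an input far outside the training range.

```python
def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor: return the label of the closest training input."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def fit_line(train):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical training set: x = 0..5 with labels y = 2x + 1.
train = [(float(x), 2.0 * x + 1.0) for x in range(6)]
slope, intercept = fit_line(train)

x_seen, x_far = 3.0, 20.0  # one input inside the training range, one far outside

# Inside the training range both models are right.
print(nearest_neighbor_predict(train, x_seen), slope * x_seen + intercept)  # 7.0 7.0

# Far outside it, 1-NN repeats the label of x = 5; the line extrapolates.
print(nearest_neighbor_predict(train, x_far), slope * x_far + intercept)    # 11.0 41.0
```

The linear model wins here only because its built-in assumption happens to match the data, which is the sense in which generalization depends on the model rather than on the data alone.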

AGI Timelines and What’s Still Missing

Jerry has become more cautious about AGI timelines, not because models aren’t impressive, but because of a specific limitation: current models lack the ability to update their internal knowledge and beliefs in response to failure.
The “hopelessness” problem. When a model hits a wall, it can’t work itself through the difficulty. A user can paste error messages or offer encouragement, but there’s no robust mechanism for the model to genuinely learn from the failure in real time. Intelligence, in Jerry’s view, “always finds a way” — it probes and persists until it solves the problem. Current models don’t do this.
Models are powerful force multipliers but not autonomous problem-solvers. They can dramatically accelerate programming work, but they still make mistakes that require human intervention, and when they fail, the user often has to step in and do it themselves.

Continual Learning

Continual learning, the ability to keep learning without collapsing, is unsolved. Training deep learning models is fundamentally fragile. Without careful effort, models “explode” into weird failure modes. This fragility is at odds with how human learning works, which is robust and self-correcting.
Why hasn’t it been solved? Jerry’s hypothesis: it likely requires research at scale, and only a handful of well-funded labs can do that work. Those labs have been busy with other priorities. It’s not that the ideas don’t exist — it’s that the right experiment hasn’t been run yet with sufficient resources.
Continual learning is probably necessary for AGI. Jerry now believes a static model — one that can’t continue to learn after deployment — can never truly be AGI. The ability to continuously update and adapt is a core requirement.
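One commonly discussed facet of this fragility is that naive sequential training overwrites what a model learned earlier. The sketch below is a deliberately extreme toy and an assumption on our part, not something described in the conversation: a one-parameter model is trained with plain SGD on a hypothetical task A (y = 2x), then on a conflicting task B (y = -x), and its error on task A is measured before and after. All names and numbers are illustrative.

```python
def sgd_fit(weight, data, lr=0.05, epochs=200):
    """Fit y = weight * x with per-example SGD on squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (weight * x - y) * x  # d/dw of (w*x - y)^2
            weight -= lr * grad
    return weight

def mse(weight, data):
    """Mean squared error of the one-parameter model on a dataset."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

# Two hypothetical tasks that demand different weights.
task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]

w = sgd_fit(0.0, task_a)
error_a_before = mse(w, task_a)   # close to zero: task A has been learned

w = sgd_fit(w, task_b)            # keep training, but only on task B
error_a_after = mse(w, task_a)    # large: task A has been overwritten

print(error_a_before < 1e-6, error_a_after > 1.0)  # True True
```

Real models have far more capacity than one weight, so the failure is subtler in practice, but the toy shows why "just keep training on new data" is not by itself a continual learning method.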

Convergence Among AI Labs

Labs are converging on very similar approaches. The economic incentives are strong: customers can switch easily, competition is fierce, and the pressure to produce better models at lower cost drives everyone toward the same playbook — more data, more compute, iterative improvement.
This creates a prisoner’s dilemma around exploration. Trying something radically different risks losing market share to competitors who are optimizing the known path. The safe strategy is exploitation (making the current approach more efficient) rather than exploration (trying something that might be 10x better or might fail).
But first-mover advantage is real and compounding. OpenAI’s early bet on large-scale pre-training made it one of the most successful companies in history. Its early lead in RL research still gives it an edge today. Ideas diffuse, but the lead — the accumulated expertise, infrastructure, and talent — can persist for a long time, similar to how early movers in semiconductor manufacturing maintained durable advantages.
The Darwinian reality: some companies will adapt and survive, others will die. Newcomers will emerge. Not every country or company ends up with a semiconductor industry, and the same will likely be true for frontier AI.

Why Jerry Left OpenAI

The decision was gradual, not sudden. OpenAI was deeply important to him — many friends, shared history, years of his life. He tried hard to make it work internally.
The core reason: declining enthusiasm. For a researcher, losing excitement about the work is a signal that it’s time to move on. “It is basically impossible as a researcher to do your best work if you are not 100% excited.”
He wants to chase another paradigm shift. Introducing reasoning models to the world was a “tectonic shift.” He wants to find the next missing piece in how models are trained and make it mainstream.
He knows what the important problems are. The question isn’t identifying them but figuring out how to solve them differently from how everyone else has tried.

OpenAI’s Evolution

Every year felt like a different company. OpenAI went from a 30–40 person lab with no pre-training to the pre-training company, to the RL company, to a balanced mix of research and product. The executive team shifted significantly. The scale went from tiny to one of the largest companies in the world.
Pivotal decisions:
- Releasing ChatGPT: internally, no one expected it to go viral, but the release created momentum that defined the company.
- Committing massive resources to GPT-4 involved major trade-offs but proved critical.
- Betting on reasoning models before there was any product-market fit was a first-principles call. o1 was impressive but not practically useful. It took o3 and investments in tool use to reach real product-market fit.
Research and product are mostly separate. One team focuses on product metrics; the rest focus on making models more intelligent. The tension isn’t between research and product — it’s between focus and opportunity.
The focus problem is OpenAI’s biggest risk. There’s so much opportunity in AI that it’s tempting to do everything. But companies are bad at doing multiple hard things simultaneously. OpenAI lost focus on coding for a while, which cost it market share, and is now working hard to regain its lead.

The Coding Market and Competition

Anthropic’s coding success is about focus. Its founders always believed coding was critical to AGI, and they concentrated on it relentlessly. The result is Claude Code and coding agents that are deeply embedded in how Anthropic itself operates.
Two possible futures for specialization:
- Data-driven world: Different labs get better at different things by shifting their data mix. This leads to natural specialization and trade-offs — better at coding means worse at something else.
- Research-driven world: A single breakthrough can improve all domains at once. The lab with the best research leapfrogs everyone everywhere.
Coding as the next layer of abstraction. Coding agents are a higher-level programming language with different semantics. The trend is clear: fewer people will type code directly. The challenge is ensuring software reliability when humans aren’t writing or even reading the code.
The best skill for working with AI coding agents is being a good manager of junior engineers — understanding the trade-offs deeply while giving the agents autonomy to make their own choices.
Application companies face a structural disadvantage unless they eventually train their own models. The natural path for a successful AI application company is: start with someone else’s model, then post-train on your own data, then pre-train your own models, then build your own data centers.
Can smaller companies compete? It depends on whether data or research is the primary driver of progress. If data matters, differentiation is possible. If research matters, a smaller company could still produce a breakthrough that improves everything. But it’s hard — and the risk is that by the time you improve in one domain, the next generation of frontier models has leapfrogged you.

What Makes a Great AI Researcher

Great researchers need both systems/engineering skills and theoretical understanding. Being strong in only one is a severe limitation. Being competent in both makes you roughly 10x more productive.
Independence of thought is essential. Research requires working on things that don’t yet work, which by default people don’t believe in. Groupthink kills research — “100 researchers that think the same thing are essentially one researcher.”
Courage is the missing ingredient. Experiments are expensive (a single one can cost as much as a Hollywood movie), and there’s no guarantee of success. Great researchers need the conviction to pursue ideas that most people don’t believe in yet, and the willingness to stack every advantage they can (compute, talent, data) behind those bets.
Hiring should be about fit, not universal appeal. Different researchers want different things. The best approach is to define the team’s shared values and goals, then find people who align with them. Aligned teams move faster and are more attractive to future hires.

Quickfire

What Jerry changed his mind on: He no longer believes a static model can be AGI. Continual learning — the ability to keep updating and adapting — is a necessary element.
Robotics timeline: ~2–3 years for a “ChatGPT moment.” Progress is better than most people realize, though it needs more time and investment to fully play out.
Biology timeline: Longer than robotics. It’s a harder problem requiring more fundamental investment and more intelligence to manipulate successfully.
Underestimated impact: Widespread work automation will be a reality within decades. Society is not taking this seriously enough. The job market will look drastically different, and the transition will likely be painful.
Parenting in the age of AI: Jerry is deliberately not pushing his young daughters toward academic specialization or competition. He values critical thinking but believes the future job market will look so different that hyper-specialization now is a bad bet. He wants them to have a happy childhood and figure out the rest later.
Existential risk: Jerry is not worried about extinction-level scenarios. The incentives are aligned — no one wants humanity to go extinct, and it’s bad business to kill everyone. His greater concern is a dystopian future where entertainment becomes so compelling that people prefer virtual worlds to reality — a human problem, not an AI problem.
