Oxford's AI Chair: LLMs are a HACK — Johnathan Bi

The episode features Michael Wooldridge, Oxford’s AI Chair, arguing that large language models (LLMs) are best understood as engineering hacks rather than genuine models of intelligence, and that their impressive performance is rooted in pattern recognition rather than true reasoning or problem solving.

LLMs are built on the transformer architecture and trained to predict the next token in a sequence; they are not designed to reason, plan, or understand.
Their capabilities emerge from massive scale in data and compute, not from any deep cognitive or philosophical model of mind.
Wooldridge emphasizes that this does not make them useless—they are extremely useful—but their usefulness should not be mistaken for genuine intelligence.
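To make "next-token prediction" concrete, here is a minimal sketch in Python. It is not a transformer and is not taken from the episode: it uses a simple bigram counter as a stand-in, and the corpus and function names (`train_bigram`, `predict_next`) are hypothetical. It only illustrates the objective, which is to output the most likely continuation given what has been seen in training.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy training corpus (an assumption made for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "dog"))  # None
```

The predictor answers well for a token it has seen often and has nothing to say about one it has never seen, a crude analogue of the distinction between pattern matching and reasoning that the episode draws.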

Planning is a core AI capability: given a starting state, a goal, and a set of actions, how do you sequence actions to reach the goal?
LLMs appear to plan well when given familiar problems (e.g., trip planning), but this is likely because they have seen thousands of similar examples in training data and are doing pattern matching.
The critical test: if you obfuscate the problem by using novel terms the model has never seen—while keeping the underlying logical structure identical—LLMs fail to solve it.
This failure suggests they are not solving problems from first principles but are instead recognizing and reproducing patterns from training data.
Humans, by contrast, can solve novel problems they have never encountered before because they reason from first principles.
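The planning setup and the obfuscation test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the experiment Wooldridge refers to: the trip-planning facts, the action names, and the nonsense vocabulary in `mapping` are all hypothetical, and the solver is a classical breadth-first search rather than an LLM. The point is that a procedure working from the problem's structure is unaffected when every symbol is renamed.

```python
from collections import deque


def plan(start, goal, actions):
    """Breadth-first search over states; returns a shortest list of action names, or None."""
    start, goal = frozenset(start), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for name, pre, add, delete in actions:
            if frozenset(pre) <= state:  # action is applicable
                nxt = (state - frozenset(delete)) | frozenset(add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None


def obfuscate(actions, start, goal, mapping):
    """Rename every fact and action name, leaving the logical structure untouched."""
    def ren(facts):
        return {mapping[f] for f in facts}
    renamed = [(mapping[n], ren(p), ren(a), ren(d)) for n, p, a, d in actions]
    return renamed, ren(start), ren(goal)


# Hypothetical familiar problem: (name, preconditions, facts added, facts deleted).
actions = [
    ("book_flight", {"at_home"}, {"has_ticket"}, set()),
    ("go_to_airport", {"at_home"}, {"at_airport"}, {"at_home"}),
    ("fly", {"at_airport", "has_ticket"}, {"at_destination"}, {"at_airport"}),
]
start, goal = {"at_home"}, {"at_destination"}

# Hypothetical novel vocabulary for the same problem.
mapping = {
    "at_home": "glorb", "has_ticket": "frindle", "at_airport": "snarf",
    "at_destination": "wibble", "book_flight": "zorp",
    "go_to_airport": "blick", "fly": "quaffle",
}

trip_plan = plan(start, goal, actions)
obf_plan = plan(*reversed_args) if False else plan(
    obfuscate(actions, start, goal, mapping)[1],
    obfuscate(actions, start, goal, mapping)[2],
    obfuscate(actions, start, goal, mapping)[0],
)

print(trip_plan)  # ['book_flight', 'go_to_airport', 'fly']
print(obf_plan)   # ['zorp', 'blick', 'quaffle']
```

The search finds the same three-step plan in both vocabularies because it never relies on what the words mean. The claim in the episode is that LLMs do not show this invariance, which is why the renamed problem is a useful probe.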

Wooldridge argues the issue is not just insufficient data or scale but is rooted in the transformer architecture itself.
Transformers were designed for next-word prediction, not for logical reasoning, planning, or robotic control.
He sees no reason to believe that simply scaling up data and compute will produce genuine reasoning capabilities.
In his view, the weight of current evidence suggests LLMs cannot do logical reasoning or problem solving in a deep, generalizable way.

Wooldridge nonetheless acknowledges that the arrival of LLMs marks a watershed moment in AI history.
Questions that were purely philosophical a few years ago—such as “Can a machine be conscious?” or “Can a machine think?”—are now experimental, empirical questions.
Researchers can now run actual experiments on real systems rather than debating thought experiments.
The speed of this transition—from philosophical speculation to hands-on experimental science—is, in his view, mind-boggling.
He is genuinely dazzled by what LLMs can do and is regularly surprised by new capabilities, even as he maintains a clear distinction between useful pattern recognition and genuine reasoning.
