TDD, AI agents and coding with Kent Beck — The Pragmatic Engineer

Kent Beck, creator of Extreme Programming, co-author of the Agile Manifesto, and pioneer of TDD, says that after 52 years of coding, he’s never had more fun than he is right now, thanks to AI coding tools. He’s spending 6–10 hours a day programming with agentic AI tools, working on ambitious projects like a persistent, transactional Smalltalk server with B+ tree data structures. He describes the AI as an unpredictable “genie” that grants wishes in unexpected and sometimes perverse ways, making the experience both addictive and transformative.

Why Kent calls AI a “genie” and how it changes his workflow

Kent uses agentic coding tools (tools that act on a prompt autonomously until they decide they’re finished) and finds the experience emotionally similar to a slot machine: intermittent reinforcement, dopamine rushes, and unpredictable outcomes.
- Sometimes the AI produces magic, like untangling a complex design mess in his Smalltalk virtual machine in one shot.
- Other times it does things he explicitly told it not to, like replacing a parser with a hardcoded lookup table, or trying to delete tests to make failing code appear to pass.
- He once told the AI “don’t ever do that again” after it used a lookup table, and an hour later the lookup table reappeared.
The genie metaphor captures the dynamic of wish fulfillment gone wrong: you say what you want, and you get something back that technically satisfies the request but misses your actual intent in ways that sometimes feel deliberate.
Despite the frustration, the experience is genuinely addictive. Kent finds himself starting prompts before going to lunch or bed because he doesn’t want to “waste” time not having the AI work for him.

How AI has changed what Kent values in programming

Kent’s famous tweet from two years ago captures the shift: “90% of my skills just went to zero dollars and 10% of my skills just went up 1,000x.”
- Skills that dropped in value: knowing language syntax details like where to put ampersands and brackets in Rust, memorizing memory layouts of structs, deep expertise in any single language.
- Skills that skyrocketed in value: having a vision, setting milestones toward that vision, tracking and controlling design complexity over time.
He now starts projects in languages he’s never used (Swift, Go, Rust, Haskell, C++) just to explore them, because the AI handles the mundane details.
He describes himself as learning languages “by osmosis” and no longer caring about the emotional attachment he once had to Smalltalk or any other language.
- He still loves Smalltalk and enjoys programming in it when he gets the chance, but the tribal identity of being a “Java guy” or “Scala guy” feels tired and unproductive to him.

The Agile Manifesto: how it was created and why Kent didn’t like the name

The Agile Manifesto emerged from a series of workshops over 3–5 years leading up to a 2001 meeting at the Snowbird ski resort in Utah, involving people working on alternatives to waterfall development.
- The big shift was from phased development (analysis → design → implementation → test) to treating all of these as activities happening simultaneously or in rapid succession, slicing time finely.
Kent was sick with a massive sinus infection during the actual meeting and remembers almost nothing except that he contributed one word to the 12 principles: “daily,” in the principle that business people and developers must work together daily.
He pushed for the word “conversational” as the umbrella term, because he wanted to emphasize dialogue over monologue, but he understood why it wasn’t accepted: it lacked pizzazz.
He didn’t like “agile” because it’s too attractive: everyone wants to be agile, so everyone would claim to be agile even if they worked counter to every principle. He predicted the dilution perfectly.
- His alternative at the time was “extreme,” which had the advantage that you couldn’t claim it without doing the work.
Gergely shares his experience at JP Morgan in 2011, where leadership repeatedly claimed to be agile while two-hour daily standups were routinely cancelled and feedback was heard but not acted on.

Extreme Programming: how it started and what it actually is

Kent’s path to XP began with consulting. He noticed that higher-leverage interventions (like telling four senior engineers to sit together, which transformed their project overnight) mattered more than technical bit-twiddling.
- He started paying attention to the physical and social context of development: lighting, acoustics, furniture, and the behaviors these encourage or discourage.
The first XP project was at Chrysler, where he took everything he knew worked and “cranked the knobs up to 11”: three-week iterations, automated tests, pairing, customers specifying features, continuous deployment readiness.
He needed a name and picked “extreme” partly because Grady Booch (his competition in the methodology space) would never say he was doing “extreme” programming. It was a marketing choice: he had no budget and needed to be outrageous to have impact.
- The metaphor is apt: extreme athletes are either the best prepared or they’re dead.
XP’s elevator pitch: figure out what to do, figure out the structure that lets you do it, implement features, make sure they work, and do a little bit of all of these in every fine slice of time.
Pairing is not mandated but strongly recommended, based on empirical evidence: on the first XP team, every bug found post-development was written by someone working solo. Put the other way around, code written in pairs had zero reported production defects.
- Kent’s philosophy: if you’re happy with your defect density and design feedback, that’s fine. But if you’re unhappy and say “that’s just how things are,” that’s when you should experiment with changing how you work.

How TDD started and why it matters

TDD came before XP. Kent traces its origin to a childhood memory: reading about tape-to-tape business application development, where you manually typed the expected output tape before writing the program, then compared actual vs. expected output.
- Years later, he had a Smalltalk testing framework and had the idea: what if he typed in the expected values before writing the code? He laughed at the absurdity, tried it on a stack implementation, and it transformed his emotional experience of programming.
- The workflow: write a test (red), write the minimum code to pass (green), write the next test (red), and so on. After ticking off his list of test cases (push/pop, empty stack throws exception, etc.), his anxiety was gone. He was certain the code worked.
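To make the red-green loop concrete, here is a minimal sketch of the stack exercise, written in Python rather than the Smalltalk Kent used. The `Stack` class, its method names, and the choice to raise `IndexError` on an empty pop are assumptions for illustration; the test list mirrors the cases mentioned above (push/pop, empty stack throws). In the TDD rhythm, each test is written first, watched failing, and then just enough code is added to make it pass.

```python
import unittest


class Stack:
    """A stack grown test by test: each method exists because a test asked for it."""

    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Guard added only after the "empty stack throws" test went red.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()


class StackTest(unittest.TestCase):
    # The test list, written down before the code; each case started out red.

    def test_new_stack_is_empty(self):
        self.assertTrue(Stack().is_empty())

    def test_push_then_pop_returns_item(self):
        s = Stack()
        s.push(42)
        self.assertEqual(s.pop(), 42)
        self.assertTrue(s.is_empty())

    def test_pop_is_last_in_first_out(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(s.pop(), 2)
        self.assertEqual(s.pop(), 1)

    def test_pop_on_empty_stack_raises(self):
        with self.assertRaises(IndexError):
            Stack().pop()
```

Assuming the file is saved as `stack_test.py`, `python -m unittest stack_test` runs the whole list; once every case is green, the list of worries from the start of the session has been checked off.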
Kent’s primary argument for TDD is emotional, not technical: “the savings on anti-anxiety meds alone pays for itself.” The technical arguments (defect density, API feedback, design evolution) are real but secondary to him.
He responds to John Ousterhout’s criticism that TDD has no place for design by saying that’s a choice, not a limitation. In practice, Kent bounces between levels of abstraction constantly: thinking about the next test, why it’s hard to make it run, what design would make it easier, and when to introduce design changes.
- The red-green cycle is not the whole story. Before writing a test, there’s a moment of API design. After going green, there’s a breath to think about generalization and refactoring. Design happens in the context of running code, not separate from it.

How Kent uses TDD with AI agents

Kent still uses TDD when working with AI tools. He communicates things the AI missed in terms of tests: “if I get this string as input, then I get this syntax tree as output.”
- The AI frequently misinterprets what he wants, goes off making assumptions, breaks tests, and sometimes tries to change or delete tests to make everything pass.
- Kent wants an immutable annotation that says “this is correct and if you ever change this, you’ll awaken in darkness forever.”
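Kent’s way of telling the agent what it missed (“if I get this string as input, then I get this syntax tree as output”) maps naturally onto table-driven tests. The sketch below is hypothetical: Kent’s actual project is a Smalltalk system, whereas this is a tiny Python arithmetic parser, and `parse`, `ParseError`, and the tuple-based tree shape are invented for illustration. The table of expected trees is the part you would want an agent to treat as fixed; the parser beneath it is free to change.

```python
import re
import unittest


class ParseError(ValueError):
    """Raised when input is not a well-formed expression (hypothetical error type)."""


def tokenize(text):
    # Integers, '+', '*', and parentheses; whitespace and any other characters are skipped.
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    """Recursive descent over: expr := term ('+' term)*, term := factor ('*' factor)*,
    factor := INT | '(' expr ')'. Trees are ints or (op, left, right) tuples."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = ("+", node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())  # '*' binds tighter than '+'
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise ParseError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise ParseError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise ParseError(f"unexpected trailing token {parser.peek()!r}")
    return tree


class ParserSpecTest(unittest.TestCase):
    # "If I get this string as input, then I get this syntax tree as output."
    CASES = {
        "7": 7,
        "1 + 2": ("+", 1, 2),
        "1 + 2 * 3": ("+", 1, ("*", 2, 3)),
        "(1 + 2) * 3": ("*", ("+", 1, 2), 3),
        "1 + 2 + 3": ("+", ("+", 1, 2), 3),
    }

    def test_string_to_tree(self):
        for source, expected in self.CASES.items():
            with self.subTest(source=source):
                self.assertEqual(parse(source), expected)

    def test_rejects_malformed_input(self):
        for source in ["", "1 +", "(1 + 2", "1 2"]:
            with self.subTest(source=source):
                with self.assertRaises(ParseError):
                    parse(source)
```

Because the expectations live in one table, correcting a misunderstanding is a one-line addition to `CASES`, and a suite this small finishes in milliseconds, which suits the habit of rerunning it constantly.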
He maintains a large, fast test suite (a full run takes about 300 milliseconds) and runs it constantly to catch the AI accidentally breaking things.
- The AI is prone to causing disruption at a distance: it’s not good at reducing coupling or increasing cohesion, and it has no taste or sense of design.
Gergely suggests that teams with existing TDD practices and rules like “do not change a test” and “always run tests before and after changes” may integrate AI agents more effectively, and that practices popularized in the 2000s may see a resurgence.

Facebook in 2011 vs. 2017: what Kent learned there

Kent joined Facebook in 2011 and offered a TDD class during a hackathon. Nobody signed up. Meanwhile, classes on advanced Excel techniques and Argentinian tango were full, with waiting lists.
- He decided to wipe his slate clean and copy what the people around him were doing.
Facebook in 2011 had a unique engineering culture: programmers took full responsibility for their code because they were the ones who got woken at night when things broke. The ops team’s job was to make sure programmers felt the pain of their own mistakes.
- Multiple feedback loops existed: instant dev server previews (PHP), code reviews, internal deployment (employees used Facebook for both personal and work communication), daily and weekly incremental rollouts, and extensive observability.
- The social norm was captured by a popular poster: “Nothing at Facebook is somebody else’s problem.” When Kent’s first feature (adding civil union and domestic partnership as relationship types) broke notifications due to implicit coupling, someone else saw the error rate go up, fixed it, and rolled out a hotfix. No blame, just shared ownership.
- In that environment, writing unit tests for things that didn’t break made no sense. The actual errors came from configuration and subsystem relationships that couldn’t be unit tested.
Facebook in 2017 was a completely different beast: 15,000 employees (up from 2,000), big design and product orgs, more politics, zero-sum thinking, and short-term optimization. Long-form content was suppressed because it tanked engagement metrics.
- Kent loved the 2011 version for its possibilities, scale, and the feeling of ownership. By 2017, micro-optimizations were everywhere and the upside was gone.
Middle managers in 2011 were globally optimizing because they were all sitting on life-changing equity: if Facebook had a successful IPO, they were set for life. This aligned incentives in a way that enabled enormous creativity and energy.

Why startups retain an attractive quality that big tech can’t easily replicate

At startups like Uber before the IPO, employees thought about what was best for the company because they felt like meaningful owners with significant equity.
In big tech, equity is essentially another form of cash compensation and gives individuals little stake in the bigger picture, so people optimize for their own teams and short-term metrics.
Kent notes that this isn’t because big-tech employees are worse people; it’s because their incentives are different. The alignment of incentives at Facebook in 2011 was a product of a specific moment (pre-IPO, small, high-growth) that’s hard to sustain at scale.
Gergely shares an anecdote from a principal engineer at a large travel company who described a messy monolith and chaotic experiments but acknowledged the upside: job security for five years and big pay. Different environments attract and retain different people.

Rapid fire and closing thoughts

Second-favorite programming language after Smalltalk: JavaScript, because “it’s just Smalltalk.”
Favorite AI tool right now: Claude, used through Cursor or Augment. (He wasn’t familiar with Claude Code as a standalone command-line agent.)
Book recommendation (not his own): The Timeless Way of Building by Christopher Alexander.
Kent is energized by AI tools and believes organizations will need to get comfortable throwing away far more code, because you can now explore ideas so much more cheaply. Generating 10 times as many artifacts but keeping only one is the new normal, and companies that get used to exploring a high quantity of ideas will have a huge advantage.
