Gwern — Anonymous writer who predicted AI trajectory on $12K/year salary

Dwarkesh Podcast 1h36 10 min #76

Summary

  • Gwern Branwen is an anonymous independent researcher and writer whose longform essays on AI, statistics, and transhumanism have quietly shaped the thinking of many people building AGI. He was one of the first people outside OpenAI to see the scaling hypothesis coming — the idea that simply making neural networks bigger and feeding them more data would unlock capabilities no one predicted. This conversation covers his intellectual journey, his theory of intelligence, his unusual life, and his thoughts on what to do in the last years before AGI.

Anonymity

  • The most underrated benefit of anonymity is that people can’t project an identity onto you or write you off before engaging with your ideas — they have to actually read you.
  • It also prevents retaliation, which matters when you write about controversial topics like darknet markets or behavioral genetics.
  • Gwern’s anonymity is preserved in this interview through an avatar and voice synthesis.

How companies will be automated

  • Automation will proceed bottom-up: replace workers first, keep human executives at the top for as long as possible.
  • The human CEO’s comparative advantage is long-term vision and taste — the “Steve Jobs thing” of choosing which proposals are good and which are bad, and pushing good ones further.
  • AI-only firms would make myopic choices; human-led firms with AI workers should outcompete them.
  • The last thing Gwern expects to automate is that final act of judgment — choosing and curating among options his AI minions present.
  • The unit of selection in AI corporations will likely be packages or teams of models that work well together, not individual models, because you can’t easily train the interaction between models in a differentiable way.

The deep history of the Singularity idea

  • Samuel Butler in 1863 described machines becoming autonomous and threatening humanity — the earliest clear Singularity scenario.
  • Even Isaac Newton struggled to explain why progress was happening, concluding that civilization must be periodically destroyed and we’re just rediscovering old knowledge.
  • Lucretius made the same argument 1,700 years before Newton — that Roman innovations couldn’t be real progress, so the world must have been recently destroyed.

A grand theory of intelligence

  • Intelligence is search over Turing machines — there is no special “intelligence fluid” or general master algorithm, just a vast number of special cases learned and recombined.
  • Variation in intelligence is just variation in compute available to search over more and longer Turing machines.
  • This explains why there’s no “IQ gland” — the brain is an ensemble of small specialized solutions, and you can always extract a small model from a large one that does a specific task equally well.
  • Human-level intelligence is rare to evolve because hardwiring solutions by genes is cheaper and more reliable than an expensive, glitchy general-purpose learning process — intelligence only pays off in niches where environments are complex and changing.
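
The "search over programs" framing above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-instruction language, the `run` and `search` helpers, and the `budget` parameter are hypothetical stand-ins for Turing machines and compute, not anything described in the episode. The sketch enumerates programs shortest-first and stops when the budget of candidate evaluations runs out, so a larger budget reaches longer programs.

```python
from itertools import product

# Toy instruction set: a stand-in for Turing machines, not a universal model.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "sqr": lambda x: x * x,
}

def run(program, x):
    """Apply each instruction of the program to x, in order."""
    for op in program:
        x = OPS[op](x)
    return x

def search(examples, budget):
    """Enumerate programs shortest-first; give up after `budget` candidates."""
    tried = 0
    length = 1
    while True:
        for program in product(OPS, repeat=length):
            if tried >= budget:
                return None, tried
            tried += 1
            if all(run(program, x) == y for x, y in examples):
                return program, tried
        length += 1

# Target behaviour: f(x) = (2x + 1)^2, which needs a three-step program.
examples = [(0, 1), (1, 9), (2, 25)]
small, _ = search(examples, budget=10)   # budget runs out before length 3
large, _ = search(examples, budget=100)  # enough budget to reach length 3
```

With the small budget the search exhausts its allowance among the one- and two-step programs and returns nothing; with the larger budget it reaches the three-step program that fits all the examples. The same search procedure gives different results purely because of the compute available to it, which is the point the bullets make about variation in intelligence.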

How Gwern saw scaling early

  • In the mid-2000s, Gwern read Moravec and Kurzweil but was skeptical — their “build it and they will come” view seemed like magical thinking, since algorithms require deep insight.
  • He gradually changed his mind by watching a trickle of results: models getting bigger, datasets getting bigger, GPUs multiplying, and CNNs being applied to everything.
  • GPT-1’s unsupervised sentiment neuron was interesting; GPT-2’s prompting and summarization were a “holy shit” moment; GPT-3’s few-shot learning chart was the definitive proof that the scaling world was real.
  • While Gwern was convinced by GPT-3, most people on Twitter were saying scaling “worked so badly” because GPT-3 wasn’t state-of-the-art on benchmarks — this anger motivated him to write up the scaling hypothesis.
  • Two biases prevented others from seeing scaling: (1) they missed key results like the 2017 Baidu scaling laws paper and BigGAN scaling to 300 million images; (2) they believed algorithms mattered more than compute, partly because research papers systematically falsify the origins of ideas — they tell nice stories about insight rather than the trial-and-error and serendipity that actually drove progress.
  • Many correct ideas from earlier decades (like ResNets in 1988) were forgotten because there wasn’t enough compute to make them work at a meaningful scale.

AGI timelines

  • From 2005–2010, Gwern thought AGI was past 2050. After AlexNet and DanNet, his timelines dropped roughly 2 years per year.
  • He briefly thought people had over-updated on AlphaGo, and RL efforts did fizzle after Dota, but then GPT arrived and erased all doubt.
  • The current recipe is: learn from generative models first, then do a little RL on top — not brute-force end-to-end RL from rewards.

What to do in the remaining years before AGI

  • Gwern uses a three-part rubric for how to spend his time: (1) things he wants to do regardless of AI because he enjoys them; (2) things where he’s only doing the human part (laying out proposals for future AGIs to execute); (3) writing down ephemeral things like preferences, desires, and judgments that an AI could not replace — “the AI cannot eat ice cream for you.”
  • He estimates an AI could write a Gwern-quality essay within 2–3 years, especially if it had his full corpus to draw from.
  • Anthropic’s 2028 AGI timeline is his planning baseline. Even if wrong, writing down descriptions of projects costs little.

Influencing the Shoggoth with writing

  • Writing now is a way of voting on the future of AI using the only currency it acknowledges: tokens it has to predict. If you don’t write, you abdicate your role in shaping it.
  • For most people not at frontier labs, influence over the future rounds to zero — writing is one of the few ways to have any.
  • Writing also creates a kind of immortality: Kevin Roose discovered that LLMs now mistreat him because of his interactions with Sydney, which “revealed” him as a privacy-invading liar. Your writing shapes how AI treats you.
  • Future superhuman historians will be able to recover any stable, long-term characteristics from your writing. What will be lost is everything you could forget ordinarily — how you felt at a particular time, what you thought of a movie — unless it was written down.

The unresolved tension between human and artificial intelligence

  • Gwern constantly oscillates on whether human intelligence and neural network intelligence are two sides of the same coin, or one is inferior, or both are awesome in different ways.
  • He argues both ways on different days — whether language models are more sample-efficient than humans, or less.
  • He refuses to believe there are two totally unrelated kinds of intelligence (biological vs. artificial) — that would be as absurd as humans reaching Mars at the exact same moment aliens land there for the first time.

Rabbit holes as a life philosophy

  • What Gwern maximizes is falling into rabbit holes — obsessive deep dives into new topics. Even bad experiences (like buying catnip for a catnip-immune cat) become excuses for rabbit holes.
  • He can only do 2–3 rabbit holes at a time; otherwise the obsession isn’t real.
  • A rabbit hole ends when you hit a natural terminus — data doesn’t exist, or nobody knows the answer.
  • He’s been this way since childhood (dinosaur phase, construction equipment phase, Alcatraz phase, ancient Japanese literature phase) — he never stopped having obsessions, they just replaced each other.
  • The longest rabbit hole that didn’t pay off was his work on Neon Genesis Evangelion, which he never satisfactorily resolved before burning out.

Hearing impairment

  • Gwern has been hearing impaired since birth. He went to a special ed school and had to use conspicuous hearing aid equipment in class, which humiliated him.
  • Being a second behind in conversation made socializing terrible and reinforced introversion.
  • He developed a fear of water and rain because he was drilled to never get his hearing aids wet.
  • He speaks with a “deaf accent” — multiple people on his San Francisco trip asked where he was really from.
  • His hearing impairment made him a bookworm, which was foundational to becoming Gwern. He still mispronounces words he learned only from reading.

Wikipedia as training ground

  • Before gwern.net, Gwern was a Wikipedia editor — that was his training in writing, synthesis, and completing projects.
  • He started editing in late middle school or early high school, often skipping lunch to alternate between Neopets and Wikipedia.
  • He was the only constructive editor at his school; other kids were vandals.
  • Wikipedia has become hostile to the kind of obsessive, detailed, rabbit-hole research projects he did — deletionism, no original research rules, and arbitrary deletion of content drove him away.

Gwern.net and the Silk Road break

  • He started blogging after graduating, as Wikipedia became less hospitable (the Seigenthaler incident was a turning point toward deletionism).
  • His first big hit was a Silk Road essay: after Adrian Chen’s Gawker article about buying LSD, Gwern ordered Adderall off Silk Road and documented the entire process with screenshots. It got hundreds of thousands of hits and remains his biggest traffic spike.

Counterfactual careers

  • Plausible alternatives: AI researcher or management at a big AI company. He dropped out of computer science because Java was excruciatingly boring, and his writing topics (darknet markets, behavioral genetics) made him unhirable.
  • Agency in AI is easier than expected in some ways — current LLMs can do coherent tasks that would have seemed miraculous 10 years ago. But nobody is actually training for agency; it’s an accidental byproduct of training on internet scrapes. Proper agent training (like Gato) is something nobody wants to do — everyone wants to minimize RL.

Literature and Borges

  • Without the internet, Gwern would have tried academia or become a librarian like Jorge Luis Borges, whom he deeply admires.
  • Borges’s “Borges and I” — about not identifying with the public version of himself — resonates with Gwern now in a way it didn’t when he was younger.
  • Ted Chiang’s “Story of Your Life” (the basis for Arrival) initially read to Gwern as a stupid ESP story. Only later did he understand it was about an equally valid alien way of experiencing time — seeing everything as a predetermined story rather than events marching toward an unknown future.
  • Gene Wolfe’s “Suzanne Delage” took Gwern 14 years to understand: it’s a subtle retelling of Dracula where the narrator has been brainwashed to forget that Dracula invaded his town and stole the woman. Every part of the story is told by what’s not said.
  • Gwern is cynical about fiction — 99% of sci-fi he read was useless. The valuable ~20 works all share a characteristic: taking non-human intelligence seriously.

Gwern’s intelligence and writing process

  • People overestimate his intelligence — they mistake having written and remembered many things for being able to produce it all spontaneously. He’s “cheating” by drawing on years of prior thought.
  • His process is like diffusion, not autoregressive daily writing: he iterates on essays for years (some from 2009 to 2024), adding examples and connections gradually.
  • “Evolution as Backstop for RL” was built by noticing a recurring pattern — a stupid, inefficient learning mechanism that can’t be removed because it keeps the smarter one honest — across corporations, neural networks, pain, and other domains. He spiraled around the idea, adding examples over time.
  • Some essays come as a single eureka flash after disparate observations have been bothering him for years — like “The Melancholy of Subculture Society,” which poured out in one sitting.
  • The sacrifices: no career, no travel, no digital nomad lifestyle, no social life. He sits reading papers every day. He can’t be Tyler Cowen, who is robust to travel and socializing — Gwern collapses after a day of talking to people.

A day in the life

  • Morning: clean up previous day’s work on the website, fix formatting, review collation.
  • Day: read Twitter and RSS feeds, get distracted by comments or questions, do some writing.
  • Evening: work on a real project.
  • Gym: not because he enjoys it, but because it’s the most opposite activity from sitting at a computer. His theory is that the remedy for burnout is doing something as different as possible.
  • Most daily work is aesthetic polishing; the real output comes from sudden eruptions provoked by something he reads or an argument online.
  • Arguing with people online is his most unusual and successful work habit — anger at people being wrong on the internet is a plentiful source of motivation to write.
  • The pitfall of isolated work: you can become arbitrarily wrong, and the emotional toll of shouting into a void can spiral into bitterness and crankdom. Spite is great motivation but poisonous if you hold onto it.

Finances

  • Gwern lives on ~$900–1,000/month from Patreon plus savings from early Bitcoin. That comes to at most $12,000/year.
  • He lives in the middle of nowhere, cooks his own food, uses a free gym, has no health insurance, and once propped up his collapsing bedroom floor with scrap wood.
  • He’s lucky to have had no health emergencies. He doesn’t recommend this lifestyle as a model — every writer has to figure out their own way.
  • He’d move to San Francisco for $50–100K/year to write full-time.

Diversity of AI minds

  • Excluding capability, AI models are already more cognitively diverse than humans — GANs, VAEs, diffusion models, and LLMs all think in wildly different ways with different artifacts and errors.
  • Within LLMs specifically, diversity has collapsed because everyone trains on the same data and rides each other’s coattails — like comparing identical twins.
  • GANs are “scared” (they hide hands off-screen due to adversarial loss); diffusion models attempt hands but produce monstrous results.

GLP drugs and environmental poisoning

  • GLP weight-loss drugs surprise Gwern: their broad effects on health and addiction suggest something important about human willpower and dysfunction.
  • It’s too early to say whether they break the Algernon argument (that evolution should have found any simple beneficial intervention), because the obesity crisis is only ~30 years old — not enough time for genetic selection.
  • Gwern gives near-100% credence to the idea that something in our modern environment is harming us at a scale comparable to lead poisoning in Rome — there are too many novel chemicals and environmental changes for everything to be benign.
  • Whatever it is, it probably isn’t harming intelligence (which is stable over time) — obesity is a better candidate given its sharp rise.

Psychedelics

  • Gwern is skeptical of Bay Area psychedelic experimentation despite his history with nootropics. The key difference: psychedelics can have acute and permanent effects on perception and psychiatric state, while nootropics are relatively manageable.
  • Psychedelics have a “self-recommending problem” — they make you want to take more of them, similar to how avid meditators are compelled to tell everyone to meditate.
  • The standard failure case for nootropics is wasting money; for psychedelics, it’s permanently changing yourself in ways you don’t understand.

Parasocial relationships

  • Gwern would like to fill the role of a mentor or old wizard — exhorting people to read, write, and think, and to aspire to make the internet better.
  • He fears he actually functions as either a guru (whose every word is taken as gospel) or a trickster devil (a covert neo-Nazi/eugenicist/communist trying to bring down Western society), depending on whether you like or hate him.

Open rabbit holes for 2050

  • Why do we sleep or dream?
  • Why do humans age?
  • Why does sexual reproduction exist?
  • Why do humans differ so much from each other and day to day?
  • Why did humans take so long to develop technological civilization?
  • Where are all the aliens?
  • Why didn’t China have the Industrial Revolution instead?
  • How should we have predicted the deep learning revolution?
  • Why are our brains so oversized compared to artificial neural networks?