Former Meta CTO: The Path to Powering the AI Revolution

Unsupervised Learning 45min 6 min #33

Summary

  • Mike Schroepfer, former CTO of Meta (9 years) and founder of Gigascale Capital (a VC firm investing in tech-driven climate solutions), discusses the deep intersection of AI and energy, arguing that cheap, abundant energy is the key bottleneck for both AI progress and global prosperity. He explains how AI demand is accelerating clean energy deployment, reflects on open sourcing Llama and PyTorch, and shares views on VR, chip design, and the future of the CTO role.

AI and Energy: The Central Bottleneck

  • AI’s explosive growth is creating massive new electricity demand, which Schroepfer sees as a positive catalyst for clean energy deployment rather than a contradiction.
    • Hyperscalers have the technical savvy, capital, and urgency to build quickly, pulling forward investment in new energy technologies.
    • Even without AI, the US needs to grow its grid roughly fivefold by 2050 to electrify vehicles, manufacturing, steel, cement, and concrete.
  • The fundamental constraint on global human welfare is energy cost.
    • Billions lack air conditioning or clean water—both are solvable engineering problems limited only by energy cost.
    • If energy costs drop by 10% annually for 20–30 years, it unlocks AI compute at scale, universal comfort, and new forms of manufacturing (e.g., synthetic fuels).
  • Schroepfer frames the question personally: if every person on Earth had a full-time AI reasoning agent (24/7), how much power would that require—a kilowatt, megawatt, or terawatt per person—and what does that imply for total global energy needs?
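The two numerical ideas in this section can be checked with a few lines of arithmetic. The sketch below compounds a 10% annual cost decline over 20 and 30 years, then scales a per-person power draw to the whole planet. The 8 billion population figure and the function names are assumptions made for illustration, not figures from the episode.

```python
# Illustrative arithmetic only; population and function names are assumptions.

def cost_after_decline(initial_cost: float, annual_decline: float, years: int) -> float:
    """Cost remaining after compounding a fixed fractional decline each year."""
    return initial_cost * (1.0 - annual_decline) ** years


def total_power_tw(people: float, watts_per_person: float) -> float:
    """Total continuous power, in terawatts, for a given per-person draw."""
    return people * watts_per_person / 1e12


if __name__ == "__main__":
    for years in (20, 30):
        remaining = cost_after_decline(1.0, 0.10, years)
        print(f"{years} years at -10%/yr: {remaining:.1%} of today's cost")

    population = 8e9  # assumed round figure for world population
    for label, watts in (("1 kW", 1e3), ("1 MW", 1e6)):
        print(f"{label} per person: {total_power_tw(population, watts):,.0f} TW")
```

Under these assumptions, a 10% annual decline leaves energy at roughly 12% of today's cost after 20 years and roughly 4% after 30. One kilowatt per person works out to about 8 TW of continuous power worldwide.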

Clean Energy Technologies He’s Most Bullish On

  • Solar: 80% of new US grid capacity in 2024 was utility-scale solar, because it is simply the cheapest way to add electrons. But it only produces power ~25% of the time (nothing at night, less in winter and in cloudy regions), so it cannot power 24/7 data centers on its own.
  • Fusion: Schroepfer is highly optimistic.
    • One supertanker of fuel could power the entire US for a year; a pickup truck could fuel a major power plant for a year.
    • It’s incredibly power-dense, produces no long-lived waste, and requires only ~40 acres and ~2 years to build a plant.
    • Multiple private fusion companies are making progress; hyperscalers should be rooting for them because success would unlock vast new energy supply.
  • Offshore compute platforms (e.g., a company called Panthalassa): A ~200-meter-tall floating platform that harnesses wave energy while using seawater for immersion cooling, potentially the cheapest inference platform on the planet.
  • Next-generation geothermal: Companies like Zanskar use AI to identify optimal drilling spots where hot water or steam naturally emerges, reducing exploration risk.
  • Batteries: Lithium-ion batteries are 97–98% cheaper than at introduction (1991) and still declining >10% per year, enabling solar to serve more hours of the day.
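As a consistency check on the battery figures, the sketch below converts a cumulative price drop into the constant annual decline it implies. The 33-year span (1991 to 2024) is an assumption, since the summary does not state the end year, and the function name is illustrative.

```python
# Illustrative arithmetic; the 33-year span (1991 to 2024) is an assumption.

def implied_annual_decline(total_decline: float, years: int) -> float:
    """Constant yearly fractional decline that compounds to total_decline."""
    remaining = 1.0 - total_decline
    return 1.0 - remaining ** (1.0 / years)


if __name__ == "__main__":
    for total in (0.97, 0.98):
        rate = implied_annual_decline(total, 33)
        print(f"{total:.0%} cheaper over 33 years is about {rate:.1%} per year")
```

Under that assumption, a 97–98% cumulative drop corresponds to roughly 10–11% per year, which is in line with the ">10% per year" figure quoted above.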

Short-Term Messiness vs. Long-Term Planning

  • Over the next ~5 years, some natural gas turbines will be deployed to meet urgent data center demand because they can deliver ~1 GW at 70–90% capacity factor within 1–2 years.
    • Schroepfer doesn’t love the “AGI will solve climate change” excuse for short-term fossil fuel use—it kicks the can down the road.
    • But he acknowledges the rational calculus: get power now, buy offsets, plan cleaner replacements later.
  • The right approach is to “walk and chew gum”: deploy gas short-term while simultaneously investing in solar, batteries, next-gen geothermal, and fusion for the long term.
    • Every hyperscaler should be placing bets across multiple next-gen energy technologies now.
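The appeal of gas in the short term comes down to capacity factor. The sketch below compares annual energy from 1 GW of gas at an 80% capacity factor (the midpoint of the 70–90% range above) with the solar nameplate needed to match it at the ~25% figure quoted earlier. It deliberately ignores the storage needed to shift solar output to night hours, and the function names are illustrative.

```python
# Illustrative arithmetic; ignores storage, transmission, and curtailment.

HOURS_PER_YEAR = 8760


def annual_energy_gwh(nameplate_gw: float, capacity_factor: float) -> float:
    """Energy delivered in a year by a plant of the given nameplate size."""
    return nameplate_gw * capacity_factor * HOURS_PER_YEAR


def nameplate_to_match(target_gwh: float, capacity_factor: float) -> float:
    """Nameplate capacity (GW) needed to deliver target_gwh in a year."""
    return target_gwh / (capacity_factor * HOURS_PER_YEAR)


if __name__ == "__main__":
    gas_gwh = annual_energy_gwh(1.0, 0.80)
    solar_gw = nameplate_to_match(gas_gwh, 0.25)
    print(f"1 GW gas at 80%: {gas_gwh:,.0f} GWh/year")
    print(f"Solar nameplate to match at 25%: {solar_gw:.1f} GW")
```

With these inputs, matching the annual energy of 1 GW of gas takes about 3.2 GW of solar nameplate, before accounting for batteries to cover the hours when solar produces nothing.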

Open Source, PyTorch, and Llama

  • Schroepfer helped launch Facebook AI Research (FAIR) in 2013, the year after AlexNet won ImageNet (2012) by a landslide, signaling the deep learning revolution.
  • FAIR produced PyTorch (now the dominant AI framework) and Faiss (a nearest-neighbor search library used widely in production systems).
  • His philosophy on technical stacks: the further down you go (chips, OS, frameworks), the more you want commoditization and open collaboration—not every company should build its own foundation model, just as most don’t build their own chips or operating systems.
  • Open sourcing Llama was a deliberate strategic choice:
    • Meta’s thesis was always about access, not ownership—being able to use the best technology, whether built in-house or elsewhere.
    • Open weights accelerate global innovation, let others optimize inference, and give Meta zero-cost access to the best models.
    • This was controversial at the time but is now widely seen as the right call.

Developer Tools and the Shifting Stack

  • AI development has moved from low-level optimization (assembly → C → Python) to system design problems:
    • How to collect datasets, manage pre-training, post-training (RLHF, RL), and inference across clusters of 25,000+ nodes where hardware failures are constant.
    • It’s become more like physics—requiring large shared infrastructure rather than individual workstations.
  • The current gap is in tooling around the full lifecycle: data pipeline management, checkpointing, fault tolerance, and orchestration—not in model architectures themselves (Transformers are largely settled for now).
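To see why checkpointing and fault tolerance dominate at this scale, consider how node count shrinks the time between failures. The sketch below assumes independent node failures and uses Young's classic approximation for the checkpoint interval, which is a swapped-in textbook technique and not something described in the episode. The per-node MTBF and checkpoint cost are hypothetical inputs.

```python
# A minimal sketch under stated assumptions; all inputs are hypothetical.
import math


def cluster_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Mean time between failures for the whole cluster, assuming
    independent nodes with a constant failure rate."""
    return node_mtbf_hours / num_nodes


def young_checkpoint_interval(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    mtbf = cluster_mtbf_hours(node_mtbf_hours=50_000, num_nodes=25_000)
    interval = young_checkpoint_interval(checkpoint_cost_hours=5 / 60, mtbf_hours=mtbf)
    print(f"Cluster MTBF: {mtbf:.1f} hours")
    print(f"Suggested checkpoint interval: {interval * 60:.0f} minutes")
```

With a hypothetical 50,000-hour MTBF per node, a 25,000-node cluster sees a failure about every two hours, and a 5-minute checkpoint cost suggests saving state roughly every 35 minutes.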

Hardware, Chips, and Nvidia’s Moat

  • Nvidia has an enormous R&D advantage and compounding investment moat; beating them on general-purpose GPU compute is extremely hard.
  • The only viable path for custom chips is specialization: hardwiring a specific algorithm (e.g., Transformer inference) into silicon for ~10x better performance per watt.
    • The risk is guessing wrong—if the algorithm shifts before your chip ships, it’s worthless.
  • Schroepfer oversaw Meta’s transition from leasing data centers and buying servers to designing nearly everything in-house (data centers, servers, networking, storage).
    • The lesson: understand your supply chain deeply, outsource what’s good enough, and own what’s strategically critical.
  • Data center capacity planning is brutally hard—you must commit to steel and concrete 18–24 months ahead with no certainty about future demand.
    • Schroepfer’s rule: under-predicting is worse than over-predicting—missing capacity means lost product opportunities; excess capacity is just a financial problem.
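The asymmetry in that rule can be made concrete with the standard newsvendor model from operations research. This is an illustrative framing, not a method attributed to Schroepfer, and the cost ratios and demand scenarios below are hypothetical.

```python
# A minimal newsvendor sketch; costs and demand scenarios are hypothetical.
import math


def critical_ratio(under_cost: float, over_cost: float) -> float:
    """Fraction of demand scenarios worth covering, given the per-unit cost
    of having too little capacity versus too much."""
    return under_cost / (under_cost + over_cost)


def capacity_for_quantile(demand_scenarios: list[float], quantile: float) -> float:
    """Smallest scenario value that covers at least `quantile` of scenarios."""
    ordered = sorted(demand_scenarios)
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index]


if __name__ == "__main__":
    scenarios = [100, 110, 120, 130, 140, 150, 160, 170]  # hypothetical MW
    symmetric = capacity_for_quantile(scenarios, critical_ratio(1, 1))
    asymmetric = capacity_for_quantile(scenarios, critical_ratio(3, 1))
    print(f"Equal costs: build for {symmetric} MW")
    print(f"Shortfall costs 3x excess: build for {asymmetric} MW")
```

When a shortfall is assumed to cost three times as much as excess, the target moves from the median scenario (130 MW) up to the 75th percentile (150 MW), which is the quantitative version of preferring to over-build.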

VR, Smart Glasses, and AI Integration

  • The long-term VR vision is “The Matrix operator”: a generative AI that instantly creates any immersive 3D world on demand, eliminating the content creation bottleneck.
    • This is years away but technically within reach.
  • More near-term: contextual AI via smart glasses that acts as a live translator, tour guide, memory aid, and social companion—always present, not something you go to a screen to use.
    • The form factor is still being figured out, but Meta’s Ray-Ban glasses and others are early experiments.

Key Open Questions in AI (Next 2–3 Years)

  • Reasoning models (post-training RL on top of LLMs) have been the big recent breakthrough—but how far can this be pushed?
    • It works well in easily verifiable domains (math, coding) but struggles where ground truth is ambiguous (video generation, open-ended tasks).
  • Memory: Gemini’s 1-million-token context window is impressive, but true associative long-term memory (like humans have) is still missing.
  • Scaling laws for plain pre-training are showing diminishing returns; the field is searching for the next paradigm shift.

AI Applications in Climate

  • Exploration: Using AI to find optimal locations for geothermal wells, copper deposits, or hydrogen reserves—reducing costly physical drilling.
  • Weather prediction and risk modeling: Better insurance and disaster preparedness.
  • Materials discovery: Searching vast multi-dimensional spaces for catalysts, carbon capture sorbents, or pharmaceutical targets; AI can down-select candidates from millions to hundreds for testing.
    • Schroepfer is bullish long-term but notes that in many cases, AI addresses <10% of the total effort to bring a material to market (scaling manufacturing, finding customers, etc.).
    • The most impactful AI applications are end-to-end, directly tied to a company’s bottom line (e.g., faster geothermal exploration → more assets → more revenue).
  • Consumer-facing example: A company backed by Gigascale uses AI + thermal camera phone scans to diagnose home energy inefficiency and recommend upgrades with 7-month payback periods—something consumers will do for $500 but wouldn’t pay a human auditor for.

Personal AI Use and Parenting

  • Schroepfer uses AI daily for summarization and research (e.g., Deep Research for investment theses on geothermal, biochar) and for explaining complex papers in simple terms.
    • He treats AI as a “really fast tutor” but always double-checks facts—accuracy is still imperfect.
  • For his kids, he emphasizes fundamentals: critical thinking, basic math, writing, reading, and the confidence that you can learn anything by applying your mind.
    • Skills that served him best: debate (comfort with public speaking, persuasion) and computer science fundamentals (how computers work, problem decomposition, and the intellectual confidence to learn anything).

The CTO of the Future

  • AI is to coding what JavaScript was to C or C was to assembly—a higher level of abstraction.
    • “Write me code that sorts this array” is faster than calling a library function, which is faster than writing it in assembly.
  • But the core skill of a CTO doesn’t change: identifying the most important problem, organizing smart people to solve it, and maintaining relentless prioritization.
    • The danger is brilliant people crushing 1% problems while ignoring the 99% that matters.
  • Team sizes will shrink over time as AI tools amplify individual productivity—companies will launch faster with fewer people.
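The abstraction ladder in the sorting example can be illustrated with two of its rungs. In this sketch, a hand-written insertion sort stands in for the lower level, where the programmer spells out every comparison, and a one-line library call stands in for the higher level. A natural-language prompt to an AI tool would sit one rung above both. The function names are illustrative.

```python
# Two rungs of the abstraction ladder; function names are illustrative.

def insertion_sort(values: list[int]) -> list[int]:
    """Lower level: the programmer writes out each comparison and shift."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements right until the slot for `current` is found.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def library_sort(values: list[int]) -> list[int]:
    """Higher level: state the intent and let the library do the work."""
    return sorted(values)


if __name__ == "__main__":
    data = [5, 2, 9, 1]
    print(insertion_sort(data))
    print(library_sort(data))
```

Both functions return the same result. The difference is how much of the mechanism the author has to write and maintain.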

Quickfire

  • Model progress this year vs. last: More.
  • Go-to test for new models: Summarize a very complicated research paper.
  • Weirdest prediction: Most people will have an AI friend—a non-judgmental, always-on-your-side companion, more valuable than group-chat experiments with mixed humans and AIs.
  • Regrets from Meta/FAIR: Few—proud of PyTorch, Llama, and the open-weights bet. Wishes they’d moved slightly faster on the “scale up” side of the scale-vs-algorithms debate.
  • Where to find him: gigascalecapital.com, LinkedIn, X, and Threads.