Mark Zuckerberg discusses the release of Llama-3, Meta’s open-source AI model, and the broader trajectory of AI development at Meta, including infrastructure strategy, open-source philosophy, and long-term bets on the metaverse and artificial general intelligence.
Llama-3 Release and Capabilities
Meta is rolling out Llama-3 in three versions: 8B and 70B parameter models released now, and a 405B dense model still in training.
The 8B model is nearly as powerful as the largest Llama-2 model (70B), representing a major efficiency leap.
The 70B model scores around 82 on MMLU, with leading results in math and reasoning.
The 405B model is already at ~85 MMLU during training and is expected to lead on multiple benchmarks upon release later in the year.
Llama-3 is open source for the developer community and now powers Meta AI, which Zuckerberg calls “the most intelligent, freely-available AI assistant.”
Meta AI is being integrated more prominently across Facebook, Messenger, Instagram, and WhatsApp, with search-box access at the top of apps.
New features include real-time image generation that updates as you type a prompt, and image animation.
Meta AI now integrates Google and Bing for real-time knowledge retrieval.
The 70B model was trained on ~15 trillion tokens—more than compute-optimal—because Meta prioritized inference performance for its massive user base; the model was still learning at the end of training and had not yet saturated.
Future Llama releases will add multimodality, more languages, and larger context windows.
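To put the "more than compute-optimal" token count in perspective, the short calculation below compares the 70B model's reported training budget with a common rule of thumb. The figure of roughly 20 training tokens per parameter is an assumption taken from the widely cited Chinchilla heuristic, not something stated in the interview, so the resulting multiple is only a rough sketch.

```python
# Rough tokens-per-parameter comparison for the Llama-3 70B training run.
# ASSUMPTION: ~20 training tokens per parameter is used as the compute-optimal
# rule of thumb (Chinchilla heuristic). That number is NOT from the interview.

PARAMS = 70e9                   # 70B parameters (from the summary)
TOKENS_TRAINED = 15e12          # ~15 trillion tokens (from the summary)
OPTIMAL_TOKENS_PER_PARAM = 20   # assumed heuristic, see note above


def tokens_per_param(tokens, params):
    """Training tokens seen per model parameter."""
    return tokens / params


def overtraining_factor(tokens, params, optimal_ratio=OPTIMAL_TOKENS_PER_PARAM):
    """How many times larger the token budget is than the assumed optimum."""
    return tokens / (params * optimal_ratio)


ratio = tokens_per_param(TOKENS_TRAINED, PARAMS)
factor = overtraining_factor(TOKENS_TRAINED, PARAMS)

print(f"tokens per parameter: {ratio:.0f}")
print(f"multiple of assumed compute-optimal budget: {factor:.1f}x")
```

Under that assumed heuristic, the run uses roughly ten times the compute-optimal token budget, which fits the stated goal of trading extra training compute for a smaller model that is cheaper to serve.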
Infrastructure and GPU Strategy
Meta began stockpiling H100 GPUs in 2022, driven by the need to train recommendation models for Reels and unconnected content (content from accounts that users don’t follow), which expanded the candidate corpus from thousands to hundreds of millions.
Zuckerberg ordered enough GPUs for Reels and then doubled the order, following the principle of always having capacity for something not yet visible on the horizon.
At the time, he expected the extra capacity would be for content-related AI, not general-purpose LLMs, but the decision proved prescient.
Meta operates two GPU clusters of ~22,000–24,000 GPUs each for training large models.
Meta’s inference-to-training compute ratio is much higher than that of most companies because it serves billions of users.
By end of 2024, Meta expects to have ~350,000 GPUs in its fleet.
Custom silicon is already handling inference for ranking and recommendations (Reels, News Feed, ads), freeing NVIDIA GPUs for training. Eventually, Meta plans to use its own silicon for training large models too, but not for Llama-4.
Coding, Reasoning, and the Path to AGI
Initially, Meta did not prioritize coding in Llama-2 because it didn’t seem relevant to social app use cases.
Over the past 18 months, it became clear that training on coding improves reasoning and rigor across all domains, even non-coding ones.
For Llama-3, heavy emphasis was placed on coding data to improve general reasoning.
Multi-step reasoning is critical even for social interactions—e.g., a business AI must think holistically about a customer’s goals, not just respond to a single message.
Zuckerberg concluded that Meta must solve general intelligence to avoid having an inferior product compared to competitors.
He does not think of AGI as a single threshold but as a progressive accumulation of capabilities: multimodality (images, video, 3D for the metaverse), emotional understanding (which he considers a distinct modality), memory (beyond just context windows, toward personalized memory stores), and agent-like behavior.
Agent capabilities are being built in stages: Llama-2 had hand-engineered tool use; Llama-3 internalizes much of that; Llama-4 will aim to internalize even more, with hand-engineered solutions serving as stepping stones to inform what gets trained into the next model.
Scaling, Bottlenecks, and Physical Constraints
Zuckerberg is optimistic but measured about continued scaling; he does not believe a runaway intelligence explosion is likely due to physical constraints.
The main near-term bottleneck is energy: no one has yet built a single data center that draws a gigawatt of power.
A gigawatt-scale data center would consume roughly as much power as a nuclear power plant produces.
Energy permitting and transmission line construction are heavily regulated and take many years.
Current large data centers are typically 50–150 MW; 300 MW–1 GW facilities are coming but not imminent.
GPU supply constraints have eased, but capital deployment is now limited more by energy availability than by money.
Distributed training across sites is possible but raises open questions about how future training will be structured.
Synthetic data generation (inference used to create training data) may become a larger part of training, but Zuckerberg believes there are fundamental limits to how far a given model architecture can improve from synthetic data alone. Each new generation (e.g., Llama-3 70B vs. Llama-2 70B) represents a step function that the community cannot replicate just by scaling up older architectures.
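The power figures in this section can be translated into a rough accelerator count. The sketch below is a back-of-the-envelope estimate only: the 700 W per accelerator (the published maximum TDP of an H100 SXM module) and the 1.5x overhead multiplier for host servers, networking, and cooling are assumptions introduced here, not numbers from the interview.

```python
# Back-of-the-envelope: how many accelerators a given site power budget supports.
# ASSUMPTIONS (not from the interview):
#   - 700 W per accelerator (published max TDP of an H100 SXM module)
#   - a hypothetical 1.5x multiplier for host servers, networking, and cooling

WATTS_PER_GPU = 700.0
OVERHEAD = 1.5  # hypothetical facility overhead multiplier


def gpus_supported(site_watts, watts_per_gpu=WATTS_PER_GPU, overhead=OVERHEAD):
    """Whole number of accelerators that fit within a site power budget."""
    return int(site_watts // (watts_per_gpu * overhead))


def fleet_power_mw(num_gpus, watts_per_gpu=WATTS_PER_GPU, overhead=OVERHEAD):
    """Estimated power draw, in megawatts, of a fleet of accelerators."""
    return num_gpus * watts_per_gpu * overhead / 1e6


# Site sizes mentioned in the summary, in megawatts.
for label, megawatts in [("50 MW", 50), ("150 MW", 150), ("300 MW", 300), ("1 GW", 1000)]:
    print(f"{label:>6}: ~{gpus_supported(megawatts * 1e6):,} accelerators")

# The ~350,000-GPU fleet mentioned earlier, under the same assumptions.
print(f"350,000 accelerators: ~{fleet_power_mw(350_000):.1f} MW")
```

With these assumed values, a 1 GW site would support on the order of a million accelerators, while a 350,000-accelerator fleet would draw several hundred megawatts if it were housed at a single site. Changing either assumption shifts the results proportionally.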
Open Source Philosophy and Risks
Zuckerberg is strongly pro-open-source but has not committed to open-sourcing every future model.
If a model reaches a qualitative capability shift where open-sourcing it would be irresponsible, Meta will not release it.
The concern is not specific behaviors per se, but behaviors that cannot be mitigated.
He acknowledges that open weights allow bad actors to strip out safety fine-tuning, and that this is a real issue.
However, he argues that concentration of AI power in a few closed models is at least as dangerous as broad availability:
A single institution with vastly more powerful AI than everyone else poses existential security risks (e.g., hacking any system, dominating economically or militarily).
Open source allows the ecosystem to harden collectively, analogous to how open-source software improved security across banks, hospitals, and governments.
A world where strong AI is widely deployed and progressively hardened is healthier than one where it is concentrated.
He worries more about an untrustworthy actor (adversarial government or company) having a monopoly on super-strong AI than about the technology being widespread.
On bioweapons specifically: sufficiently advanced AI could help bad actors, and mitigations like withholding certain knowledge from models have limits; the best counter is having equally strong defensive AI.
Meta’s current safety focus is on concrete, present-day harms (violence, fraud, misinformation) rather than speculative existential risks, because those are the harms models are actually being used for today.
Deception in models is a concern, but currently indistinguishable from hallucination; Meta monitors for it but sees no evidence of intentional deception yet.
Nation-state election interference is an area where adversaries are genuinely sophisticated and improving each year. Meta believes it is currently winning this arms race because its AI systems are growing in sophistication faster than the adversaries’ systems are.
Meta’s Open Source History and Economic Logic
Meta has a long history of open-sourcing infrastructure: PyTorch, React, and the Open Compute Project (server, switch, and data center designs).
Open Compute standardized the industry’s hardware designs, increased volumes, reduced supply chain costs, and saved Meta billions.
The economic case for open-sourcing Llama:
If the community finds ways to run models 10% more efficiently, Meta saves billions on its tens-of-billions AI investment.
Open source prevents a handful of closed-model companies from becoming gatekeepers that dictate what developers can build—Zuckerberg explicitly compares this to Apple and Google’s control over mobile platforms, which he finds unacceptable.
Meta does not open-source its end-user products (e.g., Instagram’s code), but open-sources foundational infrastructure.
Meta’s Llama license is permissive but includes a clause requiring the largest companies (e.g., AWS, Microsoft Azure) to negotiate a revenue-sharing deal if they resell the model as a hosted service.
Llama-2 is already available on all major clouds under such arrangements.
Zuckerberg does not believe training will be fully commoditized, but he also doesn’t think the base model will be the primary product—differentiation will come from application-specific work built on top.
The Metaverse and Historical Perspective
Zuckerberg’s interest in the metaverse is rooted in his lifelong focus on how people communicate and express themselves (he studied computer science and psychology).
The core value of the metaverse is enabling people to feel present with each other regardless of physical location—transforming socializing, work, medicine, and industry.
He is less interested in using the metaverse to revisit historical periods (since we lack records for accurate reconstruction) and more interested in the social presence it enables.
On classical history: he was struck by Caesar Augustus’s novel conception of peace—not as a temporary respite between wars, but as a positive-sum economic transformation.
This illustrates how the bounds of what people can conceive as rational or possible are often narrower than reality allows.
He sees a parallel in how investors and others cannot understand why Meta would open-source something so valuable—they assume it must be temporary before going proprietary, when in fact open source can be a sustainable, value-creating model.
He also notes that Augustus was already one of the most important people in Roman politics by age 19, and reflects on how youth enables bold ideas before institutional commitments create inertia (the innovator’s dilemma at a personal level).
Zuckerberg’s Personal Drive
He describes himself as constitutionally incapable of not building new things—even when the market punishes Meta for investing heavily (as with the metaverse or AI capex).
This drive extends beyond tech: he designed buildings for his family’s ranch on Kauai and is working to build a world-class cattle operation there.
He frames major bets as expressions of conviction and values rather than purely analytical calculations, though he does build business cases showing the investments can pay off.
Custom Silicon and Final Notes
Meta’s custom silicon program is methodical: it was first deployed for inference on ranking and recommendation workloads, freeing NVIDIA GPUs for training, and will eventually be used for training large models as well.
On Google+: Zuckerberg notes it lacked a dedicated CEO and was just a division within Google, illustrating that for large companies, the scarcest resource is not capital but focus—the CEO and management team’s capacity to oversee and direct priorities.