Aidan Gomez: Scaling Limits Emerging, AI Use-cases with PMF & Life After Transformers — Unsupervised Learning

Aidan Gomez, co-author of the original Transformer paper and CEO of Cohere, discusses enterprise AI adoption, model architecture evolution, and the future of foundation models. He argues that the best AI application companies will also build their own models, that reasoning has been a step-change unlock for enterprise automation, and that the “scale is all you need” hypothesis is breaking, pushing the field toward more creative algorithmic approaches. He also makes the case that transformers won’t be the final dominant architecture and that the next major breakthroughs will come in specialized domains like biology and materials science.

Enterprise AI Adoption

Enterprises face a unique integration challenge with AI agents because agents need access to the same systems humans use—email, chat, CRM, ERP, HR software—which raises privacy and customization issues no other enterprise software category has faced at this scale.
- Each company runs a different software stack, so every deployment requires custom integration work to bring all that context together into the model.
- Cohere’s agent platform, North, is designed to be highly customizable to each company’s specific tapestry of tools and data sources.
Aidan believes the winning model for enterprise AI will fall between pure consulting and out-of-the-box products—some degree of support and customization will remain necessary because the stakes of mistakes (e.g., salary data, patient data) are too high for fully self-serve setup.
- He expects partial automation of setup but not full removal of humans from the process.

Enterprise Use Cases with Product-Market Fit

Healthcare (vertical-specific): Passive listening during doctor-patient interactions to auto-populate notes and forms, reducing the time physicians spend on documentation.
Customer support: The technology is ready and demand is universal across verticals—telco, healthcare, financial services—making it one of the fastest-moving enterprise use cases.
Research augmentation: Agents that can do a month’s worth of research in an hour or two, returning robust reports with citations to source documents so humans can audit them. This is especially valuable in time-sensitive domains like wealth management, where a client might ask a manager to hedge against a geopolitical event days away.
- Aidan sees deep research as ready for prime time and predicts it will be integrated into every enterprise on the planet.

Reasoning as a Step-Change Unlock

Reasoning models are essential because the input space for language models ranges from trivial (“what’s 1+1”) to extraordinarily complex (“prove Fermat’s Last Theorem”), and models need to spend different amounts of energy on different problems.
Before reasoning, many enterprise tasks were simply impossible for models—they almost always failed. With reasoning, they almost never fail.
- The key mechanism is reflection: the model can understand why its first attempt failed and find an alternative path to the same outcome.
Aidan describes moments where reasoning traces show the model having its own “epiphany”—checking non-obvious places after failing at the obvious ones—which he finds beautiful and jaw-dropping.

Custom Models vs. General Models

General models trained on web data are extraordinary, and synthetic data can close many gaps, but custom models remain important for domains with data not on the web—manufacturing data, customer transactions, detailed personal health records.
- Cohere partners with organizations that have proprietary data to build custom models that only they can access.
- He doesn’t see every team in an organization having its own fine-tuned model; a handful at most, focused on data types the base model hasn’t seen.

Data Labeling and Synthetic Data

Human evaluation remains the gold standard—you can’t fully remove humans from the loop because models are built for people, and people are best positioned to judge usefulness.
Human data generation is too expensive at scale (e.g., you can’t hire 100,000 doctors to teach a model medicine), so Cohere uses a hybrid approach:
- A small pool of human experts (e.g., 100 doctors) provides high-quality lessons.
- That trusted data is used to generate roughly a thousand times as much synthetic lookalike data.
- In verifiable domains like code and math, results can be checked to filter synthetic data; in other domains it’s harder but still viable.
The overwhelming majority of data Cohere generates for new models is now synthetic.
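The hybrid approach above can be sketched in a few lines of Python. This is a minimal illustration, not Cohere's pipeline: `make_variant` is a hypothetical stand-in for a model that produces lookalikes of an expert-written seed, and the arithmetic task is chosen only because its answers are trivially verifiable, as with the code and math domains mentioned above. All names and the 20% error rate are assumptions.

```python
import random

def make_variant(seed, rng):
    """Hypothetical generator: fills a seed template with new numbers.

    It is deliberately imperfect (about 20% of answers are wrong) to
    mimic a model that sometimes produces bad synthetic examples.
    """
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    answer = a + b if rng.random() > 0.2 else a + b + 1
    return {
        "question": seed["template"].format(a=a, b=b),
        "a": a,
        "b": b,
        "answer": answer,
    }

def verify(example):
    """Checker for a verifiable domain: recompute the answer exactly."""
    return example["a"] + example["b"] == example["answer"]

def expand(seeds, per_seed, rng):
    """Generate per_seed candidates for each seed and keep only verified ones."""
    kept = []
    for seed in seeds:
        for _ in range(per_seed):
            candidate = make_variant(seed, rng)
            if verify(candidate):
                kept.append(candidate)
    return kept

if __name__ == "__main__":
    rng = random.Random(0)
    # One expert-written seed stands in for the small trusted pool.
    seeds = [{"template": "What is {a} + {b}?"}]
    synthetic = expand(seeds, per_seed=1000, rng=rng)
    print(f"kept {len(synthetic)} of 1000 candidates")
```

In domains without an exact checker, `verify` would have to be replaced by something weaker, such as human spot checks or a model-based judge, which is why the summary describes those domains as harder but still viable.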

Cohere’s Strategic Position

Cohere is vertically integrated—building both models and applications—which gives them more levers to pull. The next version of their generative model, Command, is optimized for the specific use cases their North platform needs (ERP, CRM, etc.).
- Aidan argues that application-only companies using someone else’s model weights face a fundamental barrier to product quality because they can’t change the underlying technology for their customer’s needs.
Cohere is not locked into one hyperscaler ecosystem, can deploy anywhere including VPC, and releases model weights non-commercially, making them what Aidan calls the best enterprise partner.
He believes the best AI application companies will have deep model-building knowledge, even if they’re not training from scratch—they find ways to approximate that level of control at the model layer.

The Scaling Hypothesis Is Breaking

The “scale is all you need” hypothesis is running into steep diminishing returns on capital and compute. The field must become smarter and more creative to unlock the next step up.
- The old strategy of just throwing more money at compute was “boring and dumb”; the new era of test-time compute and algorithmic breakthroughs is more exciting.
Aidan was a loyalist to the scaling hypothesis because evidence kept supporting it (bigger models could suddenly do math, etc.), but the field has now been “beaten over the head” that scaling alone won’t get us there.
- He’s skeptical that test-time compute is just “another scaling vector” in the same sense, since much of what’s called scaling now is actually about data diversity and domain-specific demonstrations.

Future Architecture Beyond Transformers

Aidan has publicly questioned why transformers are still the dominant architecture and named a meeting room in Cohere’s New York office “SSM” (State Space Models) expecting transformers to be replaced.
- So far, the good bits of alternative architectures like SSMs have been absorbed into transformers, reducing the need to switch.
- He’s intrigued by discrete diffusion models as a cool UX but isn’t convinced they’re fundamentally better language models than transformers.
- He’d have estimated near-zero chance in 2018 that transformers would still be dominant seven years later—their longevity has genuinely surprised him, and he hopes a new architecture emerges in the next 5–10 years.

Hardware and Compute Trends

Test-time compute makes inference 3–10x more expensive and training still requires enormous compute, but compute is becoming more abundant and cheaper per FLOP.
Multiple options now exist for training compute (not just one type of chip), allowing effective supercomputers to be built from heterogeneous hardware—a positive trend for the industry.

Specialized Foundation Models

Beyond general language models, new generations of foundation models will emerge for biology, chemistry, materials science, and other domains.
- Aidan sits on the advisory board for KA, a cancer data and compute sharing alliance, and sees enormous potential if massive capital were applied to siloed but existing cancer data.
- The challenge in domains like cancer research is less about token scarcity and more about data being siloed across hundreds of institutions that refuse to share—a human/political problem, not a data generation problem.
- In robotics and bio-foundation models, companies are spinning up labs to generate more data, but validation can take 5–10 years.

Market Structure and Consolidation

Foundation model companies are finding distinct lanes: OpenAI pushing consumer, Gemini competing there, Anthropic best at code, Cohere focused on enterprise back-office. He expects a couple of handfuls of foundation model companies to cover most needs.
For enterprise applications, Aidan predicts a “scattershot phase” where individual teams buy their own narrow applications, followed by consolidation as companies realize the maintenance burden of integrating disparate apps across all their data sources.
- The long game is building one platform plugged into everything—which is what Cohere is doing with North.

International and Language Strategy

Cohere takes an international approach, partnering with companies like Fujitsu in Japan and LG in Korea to deeply invest in Japanese and Korean language capabilities.
- Through the Aya project, thousands of native speakers of over 100 languages contributed data that was open-sourced to improve all language models, not just Cohere’s.
- Aidan believes the technology won’t be useful for huge swaths of the global population if it doesn’t speak their language and understand their culture.

AI Research Culture

At Google Brain, the formula was complete research freedom, abundant compute, great infrastructure, and brilliant colleagues—which produced breakthroughs but wasn’t optimized for product delivery.
At Cohere, the focus is narrower and more targeted (driving automation, making models good at using enterprise software), but they preserve incredible people, abundant compute, world-changing ambition, and a warm culture.

Learning from Experience

A key missing capability in current models is learning from experience. Today’s models are frozen after training—they don’t remember past interactions or learn from feedback.
- Aidan envisions a future where a model acts like an intern: it messes up the first time, the user teaches it, and it never makes those mistakes again—becoming “me 2.0.”
- This would likely be implemented through a database of interaction history that’s always available as context to the model, though the capability doesn’t fully exist yet.
- He’s excited by the deeper connection this creates: users become invested because the model grows with them.
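The idea of a database of interaction history that is always available as context can be illustrated with a small Python sketch. It is an assumption-laden toy, not a description of any Cohere product: the class `InteractionMemory`, its method names, and the prompt layout are all hypothetical, and a real system would need retrieval, storage, and privacy controls that are omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class InteractionMemory:
    """Hypothetical store of lessons a user has taught the model."""
    corrections: list[str] = field(default_factory=list)

    def record_feedback(self, lesson: str) -> None:
        # Store each lesson once so repeated feedback does not bloat the context.
        if lesson not in self.corrections:
            self.corrections.append(lesson)

    def build_prompt(self, request: str) -> str:
        # Prepend every stored lesson to the new request so a frozen model
        # still "sees" what it was taught in earlier interactions.
        lines = ["Lessons from earlier feedback:"]
        lines += [f"- {c}" for c in self.corrections] or ["- (none yet)"]
        lines.append(f"Request: {request}")
        return "\n".join(lines)

if __name__ == "__main__":
    memory = InteractionMemory()
    memory.record_feedback("Write dates in ISO 8601 format.")
    print(memory.build_prompt("Draft the weekly status report."))
```

The model weights stay frozen in this sketch; only the context changes, which matches the summary's description of learning through stored history rather than retraining.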

Societal Impact and Risks

Aidan is not worried about existential “Terminator” scenarios and thinks policymakers should focus on near- and mid-term risks instead.
His real concerns:
- Bad actors, especially at the state level, gaining access to powerful capabilities—he wants liberal democracies to establish an advantage first.
- Whether we have the infrastructure to retrain and move people into new careers if particular jobs are impacted.
He rejects the mass unemployment narrative: humanity is supply-constrained, not demand-constrained, and AI will augment people to deliver more of what the world needs.
On a personal level, his father is a cancer survivor, and he’s hopeful about AI-driven improvements in treatment and cost reduction. He’s excited about the future but realistic—it’ll be a much better world, not a utopia.
