Mercor CEO: Evals Will Replace Knowledge Work, AI x Hiring Today & the Future of Data Labeling — Unsupervised Learning

Brendan Foody, co-founder and CEO of Mercor, discusses how AI is transforming talent evaluation, data labeling, and the future of knowledge work. Mercor builds infrastructure for AI-native labor markets, already used by top AI labs to label data, screen candidates, and evaluate both human and AI performance. The company recently raised $100 million and sits at the intersection of recruiting, evaluation, and foundation model improvement.

AI in Talent Evaluation

AI is approaching superhuman performance in evaluating candidates through text-based signals like resumes, interview transcripts, and written assessments.
- Earlier models (pre-2023) struggled with hallucinations and inconsistency, but reasoning models have dramatically improved context handling and judgment depth.
- Multimodal evaluation (e.g., detecting passion or authenticity in video) remains harder due to limited focus from labs and challenges with reinforcement learning (RL) on video data.
Humans still outperform models on nuanced “vibe checks”—assessing cultural fit, passion, or genuineness—but even these are being partially captured via eval development.
AI enables far deeper preparation for interviews (e.g., analyzing a candidate’s podcast, papers, or blog posts), making assessments harder to game and more personalized.

How AI and Humans Collaborate in Hiring

The hiring process splits into two functions: assessment (predicting performance) and selling (convincing candidates to join).
- AI will soon dominate assessment, making human-only evaluation obsolete for predictive accuracy.
- Humans will remain critical in the selling phase—building relationships, explaining roles, and fostering excitement.
AI allows recruiters to focus only on top-tier candidates, eliminating time wasted on unqualified applicants.
To prevent gaming, assessments must be dynamic—either changing problems frequently or probing deep into specific background details.

Explainability and Trust in AI Predictions

Explainability matters for two reasons:
1. Building customer trust through transparent reasoning chains.
2. Ensuring models select candidates for valid, job-relevant reasons.
Long-term, the economy may shift toward simple APIs that output a confidence interval for job performance, reducing human intermediation.

Challenges in Evaluating Complex Roles

Stack-ranking is easy when many people do the same job (e.g., 20 account executives), but extremely hard for unique roles (e.g., founders) with high variability and confounding factors.
A major bottleneck is ensuring all relevant context reaches the model—humans often omit informal signals (e.g., word-of-mouth feedback, interpersonal dynamics).
Future solutions may involve AI conducting exit interviews or analyzing recorded meetings to capture tacit knowledge.

The Evolving Data Labeling Market

The data labeling landscape has shifted dramatically since ChatGPT’s release:
- Early models needed massive volumes of simple SFT/RLHF data, enabling crowdsourcing.
- Now, as models improve, only highly skilled experts can create data that meaningfully improves them—especially in open-ended domains like medicine or strategy.
Mercor’s platform excels because it sources exceptional global talent quickly, aligning with this shift toward quality over volume.
Legacy crowdsourcing players face churn; new entrants focused on elite talent will capture market share.

Will Humans Always Be Needed in Data Labeling?

As long as humans can do things models cannot, there will be demand for human-generated evals to teach models those tasks.
Domains like math or coding may soon require little human data due to verifiability, but most knowledge work (e.g., evaluating founders, consultants) remains open-ended and hard to verify.
This implies an orders-of-magnitude growth in the human eval market over time.

Expanding Beyond Coding

Mercor started by connecting undervalued global coders with U.S. companies—a natural fit given coding’s clear evaluation criteria.
Expanding into fuzzier domains (e.g., doctors, consultants) requires domain experts to co-design assessments and evals.
- Example: Evaluating doctors involves identifying proxies like diagnostic reasoning patterns or learning agility.
Researchers increasingly rely on subject-matter experts to interpret model failures and build accurate error taxonomies.

Company Vision and Strategic Sequencing

Mercor’s long-term vision is a global unified labor market where every candidate applies once and every company hires from the same pool.
Their strategy follows a wedge approach:
1. Start with high-demand, short-cycle data labeling (proven product-market fit).
2. Expand into high-volume contractor hiring (competing with Accenture/Deloitte).
3. Move into full-time hiring across all roles.
Even in Year 1, they hired contractors for non-data tasks—showing continuity toward end-to-end labor automation.

Key Inflection Point: The xAI Meeting

In August 2023, a customer introduced Mercor to xAI’s co-founders at Tesla’s office.
- Within two days, Mercor met the entire xAI founding team (except Elon).
- This revealed explosive, unmet demand for elite global engineering talent—just as the market shifted toward needing high-quality human data.
Though xAI wasn’t ready for human data yet, this signaled the coming wave. Six months later, Mercor scaled rapidly with frontier labs.

Using Mercor’s Own Product

Mercor uses its AI interviewer for all non-executive roles.
- For executives, humans still conduct first interviews (for relationship-building, not vetting).
In one case, replacing human case studies with AI pre-screening increased on-site conversion rates due to better standardization and objectivity.
They also use marketplace experts to build internal evals, mirroring customer workflows.

Multimodal and Future Capabilities

Video and multimodal analysis are key frontiers.
- RL could help models search vast video token spaces for signals like excitement or cheating.
- Success depends on creating targeted data that trains models to attend to relevant cues.

The Future of Knowledge Work

Foody believes a huge portion of knowledge work will shift toward creating evals—defining what good looks like in domains models can’t yet handle.
- This won’t necessarily look like today’s annotation tools; it may involve dynamic conversations with AI about problem-solving approaches.
People should optimize for fast learning rates and deep familiarity with AI’s strengths/weaknesses in their domain.
Working extensively with AI builds crucial judgment about when to use it—and when not to.

Skills and Career Advice

Focus on elastic-demand domains where AI amplifies need (e.g., software engineering) rather than fixed-demand roles (e.g., accounting).
- Example: Making engineers 10x more productive likely increases total demand for software work.
Avoid over-optimizing for today’s “hot” skills; instead, develop adaptability and AI collaboration fluency.

Why End-to-End Over Copilot?

Mercor chose to automate hiring end-to-end rather than build copilots for recruiters.
- Copilots assume the job stays the same—but many roles will be restructured or eliminated.
- End-to-end systems learn from feedback loops and improve predictions autonomously.
This model was validated by the data labeling market’s structure, which allowed full automation early.

Fundraising and Growth Strategy

Mercor raised its seed round primarily to justify founders dropping out of college.
Series A and B were preemptive, keeping dilution low (~5%) to fund:
- Supply-side growth (referral incentives, consumer-style product features).
- Post-training data and eval development to improve prediction models.
Their ML team’s main bottleneck is creating more evals and RL environments—a challenge aligned with their core business.

Foundation Model Landscape

Foody sees OpenAI as a product company, not an API provider—most API capabilities will commoditize.
The market is large enough for multiple labs to dominate verticals (e.g., one building an AI hedge fund).
Customization at the application layer will drive value, especially for specialized use cases requiring sophisticated labeling.

Biggest Unknown: What Will Humans Do?

The central question shaping Mercor’s mission: What roles will humans play in 5–10 years?
- Many jobs will be automated, but new opportunities will emerge around defining, training, and overseeing AI systems.
Policymakers are overly focused on China competition and existential safety, neglecting near-term workforce disruption.
- Regulators must proactively manage public expectations and guide education/training for the next generation of jobs.

Quickfire Takes

Underhyped: Evals—they’re critical and still underestimated.
Overhyped: Legacy SFT/RLHF data—companies are overspending on outdated data types.
Changed mind: AI software engineers will arrive sooner than expected—likely late 2025 or early 2026.
- Impact depends on role elasticity: software engineering will evolve, not disappear.
Most excited about: OpenAI’s coding agents and stealth startups building custom vertical agents.
If starting today: Would pick a high-impact knowledge work vertical (e.g., finance) and build custom agents to automate it—focusing on societal value over pure profit.

Summary

AI in Talent Evaluation

How AI and Humans Collaborate in Hiring

Explainability and Trust in AI Predictions

Challenges in Evaluating Complex Roles

The Evolving Data Labeling Market

Will Humans Always Be Needed in Data Labeling?

Expanding Beyond Coding

Company Vision and Strategic Sequencing

Key Inflection Point: The xAI Meeting

Using Mercor’s Own Product

Multimodal and Future Capabilities

The Future of Knowledge Work

Skills and Career Advice

Why End-to-End Over Copilot?

Fundraising and Growth Strategy

Foundation Model Landscape

Biggest Unknown: What Will Humans Do?

Quickfire Takes