The AI Startup That Sky Rocketed to $750M

Unsupervised Learning · 1h 15 min · #11

Summary

  • Pinecone is a vector database company that has become a core tool for building AI applications, raising over $130 million and reaching a $750 million valuation. CEO and founder Edo Liberty joined the podcast to discuss the vector database landscape, the explosion of AI application development, and where the infrastructure and application layers are heading.

The ChatGPT moment and explosive growth

  • Before ChatGPT, vector databases were a “well-known secret” used internally by big tech companies (Google, Amazon) for search, recommendations, and ranking, but the broader market didn’t understand the category. Investors confused Pinecone with an MLOps platform.
  • When ChatGPT launched, the technology itself didn’t change much for practitioners, but it brought massive capital, energy, and non-AI-engineer users into the space.
  • Usage spiked so dramatically that Pinecone exhausted machine capacity on GCP and AWS, spending millions a month on its free tier. At peak, they had 10,000 signups per day.
  • This forced a complete redesign of their architecture, resulting in a “serverless” solution that is roughly two orders of magnitude more efficient.
  • A key inflection point was the open-source project AutoGPT, a minimal precursor to modern agents, which brought in a wave of non-traditional users (e.g., “the dentist who remembers Python from college”).

Where Pinecone shines

  • Small scale (millions of vectors): Customers can operate for under $100/month. Pinecone doesn’t differentiate much here.
  • Large scale (hundreds of millions to billions of vectors): This is Pinecone’s sweet spot. SaaS companies like Notion and Gong serve thousands of customers, each with their own data, creating massive multi-tenant workloads. Pinecone’s serverless architecture can bring the cost per paying customer down to roughly $0.50–$1.00 per year.
  • Dominant use cases: Q&A, semantic search, chatbots, support bots, legal discovery, medical history analysis, and RAG (retrieval-augmented generation). Image, video, and anomaly detection applications are growing, but text and search remain the “meat and potatoes.”
  • Multimodal outlook: Edo is skeptical that multimodal AI will reach mainstream developers in the next year or two, citing a persistent gap between what research labs can do and what average companies can actually deploy.
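The multi-tenant pattern described above, where each SaaS customer's data is kept separate inside one large index, can be sketched in a few lines. This is a minimal, hypothetical illustration: the `MultiTenantIndex` class and its `upsert`, `query`, and `delete_namespace` methods are invented for this example (they are not Pinecone's client API), and search is brute-force cosine similarity rather than an approximate index.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MultiTenantIndex:
    """Toy vector index that keeps each tenant's vectors in its own namespace.

    Hypothetical class for illustration: brute-force search, no persistence.
    """

    def __init__(self):
        # namespace -> {vector_id: vector}
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, vector_id, vector):
        self._namespaces[namespace][vector_id] = list(vector)

    def delete_namespace(self, namespace):
        # Removes all of one tenant's data in a single call.
        self._namespaces.pop(namespace, None)

    def query(self, namespace, vector, top_k=3):
        # Only the caller's namespace is scanned, so tenants never see each other's data.
        items = self._namespaces.get(namespace, {})
        scored = [(vid, cosine(vector, v)) for vid, v in items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = MultiTenantIndex()
index.upsert("acme", "doc-1", [1.0, 0.0])
index.upsert("acme", "doc-2", [0.0, 1.0])
index.upsert("globex", "doc-9", [1.0, 0.1])
print(index.query("acme", [0.9, 0.1], top_k=1))
```

Scoping every query to one namespace is what lets a single index serve many tenants without mixing their data; a production system would replace the linear scan with an approximate nearest-neighbor structure.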

Hallucinations and trust

  • Hallucinations remain one of the biggest barriers to deploying AI in production. LLMs are designed to generate language, so they will confidently produce nonsense when they lack knowledge.
  • Measuring hallucinations is itself hard: a model that always says “I don’t know” never hallucinates but is useless. The real challenge is measuring usefulness, correctness, and faithfulness to data.
  • Progress is being made on the RAG and knowledge-layer side: making data available to models in secure, governed ways (e.g., GDPR-compliant deletion). Edo notes that loading a large chunk of the internet into Pinecone and running a relatively simple RAG setup can already outperform GPT-4 on specific tasks.
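The measurement problem can be made concrete with a toy scorer. This is a hedged sketch: the metric names (`coverage`, `accuracy`, `hallucination_rate`) and the exact-string-match check are assumptions standing in for the much harder job of judging correctness and faithfulness, but they show why a model that always abstains scores zero hallucinations and zero usefulness at the same time.

```python
from dataclasses import dataclass

ABSTAIN = "I don't know"


@dataclass
class Scores:
    coverage: float            # share of questions the model attempted
    accuracy: float            # share of all questions answered correctly (usefulness)
    hallucination_rate: float  # share of all questions answered incorrectly


def score(predictions, references):
    """Compare answers to references, treating ABSTAIN as declining to answer.

    Exact string match is a crude stand-in for judging correctness.
    """
    total = len(references)
    answered = correct = 0
    for prediction, reference in zip(predictions, references):
        if prediction == ABSTAIN:
            continue
        answered += 1
        if prediction == reference:
            correct += 1
    wrong = answered - correct
    return Scores(answered / total, correct / total, wrong / total)


references = ["Paris", "4", "Rust"]
always_abstains = [ABSTAIN] * len(references)
guesses = ["Paris", "5", "Rust"]

print(score(always_abstains, references))  # never wrong, but never useful either
print(score(guesses, references))
```

Reporting hallucination rate alongside accuracy keeps the always-abstain strategy from looking good on a single number.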

The vector database landscape

  • There’s a land grab to store vectors, with startups and incumbents (including a company “that rhymes with MongoDB”) rushing to add vector support.
  • Edo argues that simply adding a float array data type to a traditional database doesn’t make it a vector database. In a true vector database, the numeric array becomes the primary lookup key: it determines how data is organized on blob storage and how segments are searched. Bolting this onto a non-AI database yields poor performance and serious issues.
  • He draws an analogy to the human brain: like the brain, a vector database is a unique, highly optimized stack rather than a general-purpose system with an extra feature.
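To see why the vector itself must drive storage layout, here is a toy segmented index in which each record is placed in the segment whose centroid is closest to its vector, and a query scans only the nearest segments. This borrows the inverted-file (IVF) partitioning idea used by many approximate nearest-neighbor libraries; it is an illustrative assumption, not a description of Pinecone's internals, and `SegmentedIndex` and `n_probe` are hypothetical names.

```python
import math
import random


class SegmentedIndex:
    """Toy IVF-style index: each vector is stored in the segment of its nearest centroid.

    Hypothetical sketch for illustration; not Pinecone's actual design.
    """

    def __init__(self, centroids):
        self.centroids = centroids
        # One list of (vector_id, vector) pairs per centroid.
        self.segments = [[] for _ in centroids]

    def _nearest_segments(self, vector, n):
        # Rank segments by Euclidean distance from the vector to each centroid.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: math.dist(vector, self.centroids[i]),
        )
        return order[:n]

    def insert(self, vector_id, vector):
        # The vector itself decides where the record is stored.
        segment = self._nearest_segments(vector, 1)[0]
        self.segments[segment].append((vector_id, vector))

    def query(self, vector, top_k=3, n_probe=1):
        # Scan only the n_probe closest segments instead of every stored vector.
        candidates = []
        for segment in self._nearest_segments(vector, n_probe):
            candidates.extend(self.segments[segment])
        candidates.sort(key=lambda item: math.dist(vector, item[1]))
        return [vector_id for vector_id, _ in candidates[:top_k]]


random.seed(0)
index = SegmentedIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
for i in range(100):
    # Even ids cluster near (0, 0), odd ids near (10, 10).
    base = 0.0 if i % 2 == 0 else 10.0
    index.insert(f"v{i}", (base + random.random(), base + random.random()))

print(index.query((10.2, 10.3), top_k=3, n_probe=1))
```

Raising `n_probe` trades speed for recall: scanning more segments makes it less likely that a true nearest neighbor sitting just across a segment boundary is missed.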

The RAG stack and cost economics

  • Edo’s recommended stack: Smaller, cheaper, open-source models (not defaulting to OpenAI); Anyscale for bulk data transformations and movement; partners like Cohere, AI21, and Hugging Face for embedding, ranking, and summarization. He sees no clear leader yet in evaluation tooling.
  • Retrieval vs. generation intelligence: The market will find a stable tradeoff between cost/compute and output quality. Running a 100-billion-parameter model for every API call is economically unsustainable. Smaller models with good retrieval can match or approach larger model performance at a fraction of the cost.
  • Context windows vs. RAG: Even as context windows grow, stuffing documents into the prompt is slow, expensive, and often counterproductive. Retrieving the most relevant 3,000–10,000 tokens from a vector database can perform as well as putting 100,000 tokens in the prompt, at roughly 10% of the cost. Edo calls it “the least surprising thing in the world” that model companies allow ever-larger contexts, since that is their pricing model.
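The retrieval-versus-context tradeoff can be sketched as a budgeting problem: keep only the best-scoring retrieved chunks that fit a fixed token budget, then compare prompt cost against sending far more text. This is a hedged illustration: `pack_context` and `prompt_cost` are hypothetical helpers, greedy packing is one simple strategy among many, the per-token price is made up, and the chunk scores and sizes are invented; only the 10,000 versus 100,000 token figures come from the discussion above.

```python
def pack_context(chunks, budget_tokens):
    """Greedily keep the highest-scoring retrieved chunks that fit in a token budget.

    chunks: list of (score, token_count, text) tuples, e.g. results of a vector search.
    """
    selected, used = [], 0
    for _score, tokens, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if used + tokens > budget_tokens:
            continue  # too big for the remaining budget; a smaller chunk may still fit
        selected.append(text)
        used += tokens
    return selected, used


def prompt_cost(tokens, price_per_million_tokens):
    """Input-token cost of a prompt under simple per-token pricing."""
    return tokens / 1_000_000 * price_per_million_tokens


PRICE = 10.0  # hypothetical $ per million input tokens, not any vendor's real price

chunks = [
    (0.92, 4_000, "pricing section"),
    (0.85, 5_000, "API limits"),
    (0.40, 3_000, "changelog"),
    (0.30, 800, "footer"),
]
selected, used = pack_context(chunks, budget_tokens=10_000)
print(selected, used)  # ['pricing section', 'API limits', 'footer'] 9800

full = prompt_cost(100_000, PRICE)
rag = prompt_cost(10_000, PRICE)
print(f"full context ${full:.2f} vs retrieved ${rag:.2f} ({rag / full:.0%} of the cost)")
```

Under per-token pricing, cost scales roughly linearly with input length, so a prompt ten times smaller costs about a tenth as much, in line with the roughly 10% figure Edo cites.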

Helping customers vs. empowering them

  • For Pinecone’s first three years, Edo was “religiously” against professional services. The company either fully automated something or didn’t do it at all, even turning down requests from large customers.
  • This paid off: thousands of customers became successful without ever talking to Pinecone, and the sales team couldn’t even get meetings because customers were too busy building.
  • Now that the company has more capacity, they do consult and help, especially with cost estimations, which Edo calls the most common failure mode. Companies frequently overestimate costs by orders of magnitude (e.g., projecting $50,000/month when the actual cost is ~$500/month), causing them to abandon projects that should be built.
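Edo's point about overestimated costs is easiest to see with back-of-the-envelope arithmetic. The sketch below contrasts two hypothetical ways of estimating a monthly bill: sizing dedicated capacity for peak load and running it all month, versus paying only for stored data and queries served. Every price and workload number here is an assumption made up for illustration (none of it is Pinecone's pricing), and both helper functions are hypothetical.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def provisioned_monthly_cost(peak_nodes, price_per_node_hour):
    """Capacity sized for peak load and left running around the clock."""
    return peak_nodes * price_per_node_hour * HOURS_PER_MONTH


def usage_monthly_cost(stored_gb, monthly_queries, price_per_gb_month, price_per_million_queries):
    """Pay only for the data actually stored and the queries actually served."""
    storage = stored_gb * price_per_gb_month
    queries = monthly_queries / 1_000_000 * price_per_million_queries
    return storage + queries


# All inputs below are hypothetical, chosen only to illustrate the arithmetic.
provisioned = provisioned_monthly_cost(peak_nodes=20, price_per_node_hour=3.0)
usage = usage_monthly_cost(
    stored_gb=50,
    monthly_queries=30_000_000,
    price_per_gb_month=0.40,
    price_per_million_queries=16.0,
)
print(f"provisioned for peak: ${provisioned:,.0f}/month")
print(f"usage-based:          ${usage:,.0f}/month")
print(f"ratio: {provisioned / usage:.0f}x")
```

With these made-up inputs the two estimates differ by nearly two orders of magnitude, the same kind of gap as the $50,000 versus $500 example, so it is worth checking which cost model an estimate assumes before abandoning a project.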

The serverless transition

  • Moving to serverless was painful: revenue growth flattened even as workloads grew faster, because the product became so much cheaper. Some customers went from paying $100K/month to $2,000/month.
  • Investors were unhappy, but Edo argues the transition is easier done earlier than later. The goal is to fit into the cost structure of the tens or hundreds of thousands of workloads that will eventually use vector databases.

Where new startups should focus

  • Infrastructure: Winner-take-all dynamics limit opportunities for new entrants.
  • Applications and solutions: This is where Edo sees the most energy and opportunity. Every digital-native incumbent faces roughly 20 AI startups trying to displace it, while incumbents try to reassert themselves as AI-native. Enterprises are simultaneously buying these tools and learning to use them natively. This “conveyor belt of innovation,” running from tiny startups to enterprises, is “teeming with innovation and great ideas.”
  • If he weren’t building Pinecone, Edo would build something around human communication data (email, Slack, meeting transcripts, Jira tickets): messy, rich, knowledge-dense content.

Looking ahead

  • Hardware: The current GPU-centric model is not sustainable. Edo expects a shift toward a more heterogeneous mix of CPUs, GPUs, and specialized servers optimized for either training or inference.
  • Data pipelines: Tools from 5–10 years ago can’t handle current data volumes, operational complexity, or cost requirements. Something has to change.
  • Governance and control: Companies need moderating systems with visibility and control over their AI stacks, which today run “open loop” for most organizations.
  • Agents: Edo thinks agents already work at roughly human-assistant levels of reliability. The mistakes are sometimes more embarrassing than human errors, but the probability of task completion is approaching human levels.

Overhyped / underhyped

  • Overhyped: Foundation models. Edo believes we know what they can and cannot do, and there hasn’t been significant qualitative progress for quite some time.
  • Underhyped: Coding assistance. He finds it “exceedingly useful” and one of the most exciting use cases of the technology.

Biggest surprise in building Pinecone

  • The biggest surprise was a complete rewrite of the entire database in Rust. Edo expected it to take six months and set the company far behind; his CTO promised it would take a month. It took two to three months and produced a dramatically better result, a rare case of a rewrite that actually delivered on its promise.

Reflections on Amazon and AWS

  • At AWS scale, a product needs to generate hundreds of millions of dollars per year to matter, while startups operate with different risk appetites and innovation horizons. Pinecone is already working on technology that won’t hit the market for another 1.5–2 years and may be as misunderstood today as vector databases were five years ago.