Snowflake sits in a powerful position in the AI ecosystem: it already hosts vast amounts of structured, semi-structured, and unstructured data for thousands of enterprise customers, and its Cortex platform now lets those customers run large language models directly on that data without moving it. Baris Gultekin, Snowflake’s head of AI, explains how the company is building AI products, including its own LLM (Arctic), a managed inference service (Cortex), a search engine for RAG (Cortex Search), and a text-to-SQL product (Cortex Analyst). He also shares what he’s learning from customers about what actually works in production, from BI and data extraction to governance, evaluation, and the emerging role of agents.
Arctic LLM: Why Snowflake Built Its Own Model
Snowflake built Arctic in roughly three to four months starting around December 2023, with a relatively small team that included researchers from DeepSpeed and vLLM.
The motivation was enterprise-specific needs: customers wanted AI-powered BI experiences, and existing models struggled with SQL complexity.
Arctic is optimized for SQL generation and instruction-following, not general-purpose tasks like poetry, though it handles those too.
The team developed an efficient architecture that trained at roughly one-eighth the cost of comparable models, with strong inference efficiency.
Arctic Embed, Snowflake’s embedding model, is described as a quarter the size of OpenAI’s embedding model while scoring higher on benchmarks, reflecting a broader focus on smaller, more efficient models driven by both cost and latency concerns.
Cortex: Snowflake’s Managed AI Inference Layer
Cortex is Snowflake’s managed service for running LLMs—including Arctic, Mistral, Meta’s Llama, and others—directly inside Snowflake’s data cloud.
Three major use cases dominate:
BI and text-to-SQL: Natural language questions over structured data.
Chatbots over unstructured data: RAG-style applications over documents and PDFs.
Batch text analytics: Processing large volumes of semi-structured text like sales call logs, support tickets, and employee surveys—extracting themes, categories, and example quotes at scale.
Snowflake emphasizes making these capabilities simple through task-specific functions that minimize prompt engineering and data pipeline work for customers.
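The batch text analytics pattern above can be sketched in a few lines. This is a minimal illustration, not the Cortex API: `keyword_classifier` is a hypothetical stand-in for an LLM classification call, and the theme names and keywords are invented for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical themes and keywords; a real pipeline would ask an LLM instead.
THEME_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "performance": ["slow", "timeout", "latency"],
}


def keyword_classifier(text: str) -> str:
    """Stand-in for an LLM classification call (an assumption, not a Cortex function)."""
    lowered = text.lower()
    for theme, words in THEME_KEYWORDS.items():
        if any(word in lowered for word in words):
            return theme
    return "other"


def summarize_themes(
    records: List[str],
    classify: Callable[[str], str] = keyword_classifier,
    max_quotes: int = 2,
) -> Dict[str, dict]:
    """Group records by theme and keep a count plus a few example quotes."""
    grouped = defaultdict(list)
    for text in records:
        grouped[classify(text)].append(text)
    return {
        theme: {"count": len(texts), "quotes": texts[:max_quotes]}
        for theme, texts in grouped.items()
    }


tickets = [
    "I was charged twice on my invoice.",
    "Dashboard queries are slow since Monday.",
    "Please refund the duplicate charge.",
    "How do I add a teammate?",
]
report = summarize_themes(tickets)
```

Because the classifier is passed in as a function, the same aggregation step would work unchanged if the keyword matcher were replaced by a model call.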
Cortex Analyst: The Hard Reality of Text-to-SQL
Text-to-SQL is deceptively hard in production. Benchmarks like Spider are simplistic compared to real-world environments where customers may have tens of thousands of tables and hundreds of thousands of columns with ambiguous names.
Cortex Analyst uses three to four LLMs working together, with systems for:
Knowing when to ask for clarifications.
Refusing to answer questions it can’t handle.
Self-healing: generating SQL, validating it, and checking correctness.
A feedback loop where customers create “verified queries” that the system returns when possible, increasing confidence.
Answer quality is in the 90–95% range, which is still not enough for high-stakes use cases like reporting revenue to a CFO, so human-in-the-loop systems are being added.
Common failure points include hallucinated column names, incorrect joins, and queries that won’t execute.
Snowflake has deliberately chosen to prioritize precision over recall: the system would rather decline to answer than give a wrong result, especially for business users who can’t self-correct like analysts can.
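The verified-query lookup, self-healing loop, and abstention behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how Cortex Analyst is implemented: SQLite stands in for the warehouse, `fake_generator` is a hypothetical placeholder for an LLM, and validation only checks that the database can plan the query, not that the result is semantically correct.

```python
import sqlite3
from typing import Callable, Dict, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, otherwise the error message."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def answer(
    question: str,
    conn: sqlite3.Connection,
    generate: Callable[[str, Optional[str]], str],
    verified: Dict[str, str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Prefer a verified query; otherwise generate, validate, and retry.

    Returns None (abstains) when no candidate passes validation.
    """
    key = question.strip().lower()
    if key in verified:
        return verified[key]
    feedback = None
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        feedback = validate_sql(conn, sql)  # error text is fed back to the generator
        if feedback is None:
            return sql
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")


def fake_generator(question: str, feedback: Optional[str]) -> str:
    """Hypothetical LLM stand-in: hallucinates a column first, then repairs it."""
    if feedback is None:
        return "SELECT SUM(revenue) FROM orders"
    return "SELECT SUM(amount) FROM orders"


verified = {"how many orders are there?": "SELECT COUNT(*) FROM orders"}
healed = answer("What is total revenue?", conn, fake_generator, verified)
from_verified = answer("How many orders are there?", conn, fake_generator, verified)
abstained = answer("Anything", conn, lambda q, f: "SELECT nope FROM orders", verified)
```

Returning `None` rather than a best guess mirrors the precision-over-recall choice: a caller can turn the abstention into a clarifying question or a handoff to an analyst.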
Cortex Search and RAG Applications
Cortex Search is a hybrid search system combining vector search with lexical/keyword search to reduce hallucinations.
It supports three types of applications:
External, user-facing RAG applications (where hallucination and abstention matter most).
Internal productivity tools (lower risk, most common today).
Enterprise search—a resurgent use case where companies want to upgrade their internal search stacks.
Access controls are built deeply into the search layer: users can only retrieve documents they already have permission to access, leveraging Snowflake’s existing granular governance infrastructure.
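A toy version of hybrid retrieval with permission filtering is sketched below. The source does not say how Cortex Search fuses its lexical and vector results, so reciprocal rank fusion is used here purely as an assumption; the documents, vectors, and role names are invented for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical corpus: text for lexical matching, a toy embedding, and allowed roles.
DOCS = {
    "d1": {"text": "quarterly revenue report for finance", "vec": [0.9, 0.1], "roles": {"finance"}},
    "d2": {"text": "employee onboarding handbook", "vec": [0.1, 0.9], "roles": {"finance", "hr"}},
    "d3": {"text": "revenue forecast and planning notes", "vec": [0.8, 0.3], "roles": {"hr"}},
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def lexical_rank(query: str, ids: List[str]) -> List[str]:
    """Rank by number of query terms that appear in the document text."""
    terms = set(query.lower().split())

    def score(doc_id: str) -> int:
        return len(terms & set(DOCS[doc_id]["text"].split()))

    return sorted(ids, key=lambda d: (-score(d), d))


def vector_rank(query_vec: Sequence[float], ids: List[str]) -> List[str]:
    """Rank by cosine similarity between the query and document embeddings."""
    return sorted(ids, key=lambda d: (-cosine(query_vec, DOCS[d]["vec"]), d))


def hybrid_search(query: str, query_vec: Sequence[float], role: str, k: int = 60) -> List[str]:
    """Filter by role first, then fuse the two rankings with reciprocal rank fusion."""
    allowed = [d for d, doc in DOCS.items() if role in doc["roles"]]
    fused = {d: 0.0 for d in allowed}
    for ranking in (lexical_rank(query, allowed), vector_rank(query_vec, allowed)):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(allowed, key=lambda d: (-fused[d], d))


finance_hits = hybrid_search("revenue report", [1.0, 0.0], role="finance")
hr_hits = hybrid_search("revenue report", [1.0, 0.0], role="hr")
```

Filtering before ranking means a document the user cannot access never enters the candidate set, so it cannot leak into a prompt downstream.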
Governance and Security: Snowflake’s Core Advantage
Because Snowflake already hosts customer data with granular, mature access controls, it can run AI directly next to the data—eliminating the need to move data to external AI services.
All models run inside Snowflake by default; external models are configurable but not the default.
Governance examples: an HR chatbot should return different answers depending on who asks (e.g., a manager vs. an individual contributor), with zero room for hallucination or data leakage.
Snowflake’s governance is built from the ground up, with granular access controls on databases, tables, and columns—customers who spent years setting this up can now layer AI on top without re-architecting.
Cortex Guard, built on Meta’s Llama Guard, provides additional safety guardrails that resonate with enterprises concerned about brand alignment and policy compliance.
Evaluation and Observability: The TruEra Acquisition
Evaluation is a major gap in the industry: teams build systems but lack frameworks to measure quality, compare models, or run evaluations at scale.
TruEra, which Snowflake acquired, enables evaluation at scale using LLM-as-judge techniques, helping customers move from proof-of-concept to production with confidence.
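The shape of an LLM-as-judge evaluation loop can be sketched as follows. This is not TruEra's API: `overlap_judge` is a hypothetical stand-in that scores groundedness by word overlap, where a real setup would prompt a judge model, and the dataset and systems are invented for the example.

```python
from statistics import mean
from typing import Callable, Dict, List


def overlap_judge(question: str, context: str, answer: str) -> float:
    """Stand-in for an LLM judge: fraction of answer words found in the context."""
    answer_words = answer.lower().split()
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return sum(word in context_words for word in answer_words) / len(answer_words)


def evaluate(
    system: Callable[[str, str], str],
    dataset: List[Dict[str, str]],
    judge: Callable[[str, str, str], float] = overlap_judge,
) -> float:
    """Run the system on every example and average the judge's scores."""
    scores = [
        judge(ex["question"], ex["context"], system(ex["question"], ex["context"]))
        for ex in dataset
    ]
    return mean(scores)


dataset = [
    {"question": "What is the refund window?", "context": "refunds are accepted within 30 days"},
    {"question": "Who approves travel?", "context": "travel is approved by the manager"},
]


def grounded_system(question: str, context: str) -> str:
    """Hypothetical system that answers strictly from the retrieved context."""
    return context


def ungrounded_system(question: str, context: str) -> str:
    """Hypothetical system that ignores the context."""
    return "probably ninety days maybe"


scores = {
    "grounded": evaluate(grounded_system, dataset),
    "ungrounded": evaluate(ungrounded_system, dataset),
}
```

Keeping the judge and the system as separate callables makes it straightforward to compare several models on one fixed dataset, which is the comparison step the text says many teams lack.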
When to Fine-Tune, Use Off-the-Shelf, or Train from Scratch
Snowflake recommends starting with large off-the-shelf models plus RAG for proofs of concept.
Once in production, teams can optimize by fine-tuning smaller models for latency and cost advantages.
Custom pre-trained models make sense for a small number of customers, typically in regulated industries (e.g., healthcare) where control over training data and domain-specific language is critical.
Fine-tuning is supported within Snowflake; full pre-training is offered for a small number of interested customers.
Model selection today is often driven by brand recognition rather than capability—customers gravitate toward well-known models even when less famous ones are equally or more capable.
Supporting New Models and the vLLM Team
Snowflake integrated the founders of vLLM (from UC Berkeley) into the company, giving it deep inference optimization expertise.
New models like Meta’s Llama 3.1 405B are incorporated relatively quickly, though large or architecturally novel models require stack upgrades.
The vLLM team has added optimizations for multi-node inference and fine-tuning, which are upstreamed to the open-source project.
The 405B model is capable but large and slow; it’s being used for fine-tuning, distillation, and synthetic data generation—making it valuable as a teacher model even when not deployed directly for all use cases.
Cost and Enterprise Adoption Patterns
Cost has not been a major blocker for production deployment yet, because most current use cases are internal and volumes aren’t massive.
Costs are coming down rapidly through model distillation and efficiency improvements.
Enterprises are still largely in use case discovery mode rather than scaling proven use cases where cost is prohibitive.
The Future of Arctic and Snowflake’s AI Strategy
Snowflake does not aim to build a general-purpose frontier model to compete with GPT-5 class systems.
Arctic will focus on Snowflake customer needs: SQL generation and RAG quality.
The broader strategy is to be the platform where customers can talk to both structured and unstructured data using natural language, with agents increasingly plugging into the system.
Internal Use of AI at Snowflake
Snowflake uses LLMs internally for:
Sales conversation analysis (summarizing wins and losses).
Employee assistants for querying internal documents.
Documentation chatbots.
Product development, from SQL engine optimization to AI capabilities in the Marketplace.
Snowflake vs. Databricks: Different Angles on AI
Snowflake differentiates on ease of integration: a single product where everything works together, which is harder to build but more delightful to use.
Cortex launched with SQL-integrated AI, reflecting Snowflake’s data-warehouse-first audience.
Both companies invest in governance, but Snowflake emphasizes its long-standing, granular access controls.
Cortex Analyst and Cortex Search are positioned as high-quality, end-to-end solutions for text-to-SQL and RAG respectively.
Opportunities for Startups in AI Infrastructure
Despite Snowflake’s comprehensive platform, Baris sees ample room for startups across the AI infrastructure stack.
The market is massive and growing; innovation is happening at every layer.
Startups should focus on nailing a few deep, specific use cases rather than trying to be a platform from day one—examples include Devin (AI engineer) and startups building AI salespeople or SDRs.
What’s Overhyped and Underhyped
Underhyped: Evaluation. The industry lacks mature measurement frameworks, and this remains a critical gap.
Overhyped (in the short term): Agents. The hype cycle has already peaked and dipped; the technology is promising but still early, and real production agentic systems are just beginning to emerge.
The Biggest Surprise in Building Snowflake AI
The text-to-SQL problem was known to be hard, but the process of defining exactly which problem to solve—and what “good” looks like—was itself the core challenge, echoing classic machine learning principles about problem definition.
Open Source vs. Closed Models
Open source models from Meta, Mistral, and others have been hugely influential in proving the ecosystem isn’t limited to two players.
Flexibility to take open source models and build on them has been welcomed by the ecosystem.
What Baris Would Build and Who He’s Watching
Given his background in assistants (he started Google Now and worked on Google Assistant), he would build in the assistant/agent space.
He’s excited about Mistral—a small, fast-moving team building capable models and creating awareness.
He believes the next two years will see the application space mature, with customers shifting from wanting building blocks to wanting end-to-end solutions in areas like customer service, sales, and beyond.
Where to Learn More
The Snowflake AI website and videos from Snowflake’s recent Summit event on YouTube are the best resources for learning more about Snowflake’s AI efforts.