How Kubernetes is Built with Kat Cosgrove

The Pragmatic Engineer 1h8 7 min #37

Summary

  • Kubernetes is the second-largest open source project in the world (after Linux) and has become the de facto standard for managing containerized applications at scale. This episode features Kat Cosgrove, a maintainer and sub-project lead on the Kubernetes release team, who explains what Kubernetes is, why it won, how it’s built and governed, and how people can get involved as users or contributors.

What Kubernetes is and why it exists

  • Kubernetes automates the management and scaling of applications that run as swarms of containers.
    • It automatically scales resources (networking, storage, compute) based on demand, up to limits you define, without manual intervention.
    • This is especially valuable for keeping costs down while maintaining high availability.
  • It exists because the rise of microservices architecture made manual cluster management extremely difficult and error-prone.
    • Before Kubernetes, teams managed container clusters by hand — it was possible but painful.
    • Kubernetes is essentially an abstraction layer over work that used to be done manually, similar to how most computing advances since the 1950s have automated previously manual processes.
  • Containers are not virtual machines.
    • Containers virtualize the operating system and applications; VMs virtualize hardware and are much larger and heavier.
    • Containers are lightweight, easily shareable (via Dockerfiles and registries), and more configurable, which is why microservices became practical.
    • Kubernetes can run inside a VM, but containers themselves are a lighter form of virtualization.
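
The "scale based on demand, up to limits you define" behavior described above can be pictured with a small sketch. It follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)), which the episode does not name explicitly. The function and parameter names are illustrative assumptions, and the sketch leaves out details such as tolerance and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional scaling rule, clamped to operator-defined limits.

    Illustrative only: names are assumptions, not a Kubernetes API.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the observed metric is from target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Never go below the floor or above the ceiling the operator configured.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Load above target: scale up.
    print(desired_replicas(4, 90.0, 60.0, min_replicas=2, max_replicas=10))
    # Load far above target: capped at the configured maximum.
    print(desired_replicas(4, 300.0, 60.0, min_replicas=2, max_replicas=10))
    # Load far below target: held at the configured minimum.
    print(desired_replicas(4, 10.0, 60.0, min_replicas=2, max_replicas=10))
```

The clamp is what keeps costs bounded: demand can push the replica count up, but never past the maximum you set.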

Origins: from Google’s Borg to open source

  • Kubernetes originated from Borg, an internal Google tool for managing clusters of microservices (the name is a Star Trek reference).
    • Borg is still used internally at Google today.
    • Google engineers recognized the broader need and decided to open-source the concept.
  • Kubernetes was open-sourced almost 11 years ago (mid-2014), with the first commit by Joe Beda, and was later donated to the Cloud Native Computing Foundation (CNCF), a directed fund of the Linux Foundation.
    • The name “Kubernetes” means “helmsman” in Greek; its logo has seven spokes (a nod to Seven of Nine from Star Trek, referencing Borg).
  • Before Kubernetes, alternatives like Docker Swarm and Mesosphere existed, but the market clearly needed something better.
  • Google’s motivation for open-sourcing it wasn’t purely altruistic — it gave them enormous influence over the cloud-native ecosystem.
    • They still have employees on the CNCF governing board, the Linux Foundation governing board, and the Kubernetes steering committee.
    • However, governance rules cap any single company at two steering committee representatives, preventing any one vendor from taking control.

Why Kubernetes won

  • Initial hype came from the Google brand name and its association with Docker, which was already popular.
  • Sustained popularity is largely attributed to Kubernetes’ exceptional documentation.
    • Every user-facing change must be documented before it can be included in a release — this is enforced strictly.
    • The project uses Kubernetes Enhancement Proposals (KEPs), inspired by Python’s PEPs, which require documentation as a completion criterion.
    • Kat, who has a background in docs leadership, is blunt: developers hate writing docs, but the project prioritizes usability and sustainability over developer convenience.
  • Managed services like Google Kubernetes Engine (GKE) lowered the barrier to entry with high-quality tutorials and sandboxes.
  • Open-source advantages compound the documentation benefit: a small army of contributors continuously improves docs, and the transparency means no knowledge is locked as “tribal knowledge.”

The scale of the project

  • Kubernetes has a tiered contributor structure:
    • Users consume the tool.
    • Contributors write code or docs, or do project management work, but have limited permissions. More than 1,000 people contribute each month.
    • Maintainers hold titles (SIG chairs, technical leads, sub-project leads) and have governance authority — roughly 150–200 people.
      • There are a couple dozen Special Interest Groups (SIGs), each with 2–3 chairs and 2–4 technical leads.
  • All SIG leaders function as project managers, even if the title isn’t used explicitly.
  • The contributor ladder is open: anyone can show up, open an issue or PR, and work their way up to maintainer status over time.

How the release team works

  • Kubernetes releases on a 12–16 week cycle (typically ~14 weeks).
  • SIG Release has two halves: the release team (the sub-project Kat leads) and release engineering (which cuts the actual releases).
  • The release team consists of several sub-teams and a release lead:
    • Communications: Manages feature blogs, coordinates with CNCF on release webinars, handles press embargoes and media interviews for the release lead.
    • Release Docs: Ensures every KEP with user-facing changes has documentation by the deadline — if docs aren’t ready, the PR gets reverted.
    • Enhancements: Tracks all KEPs through their requirements (code complete, tested, production readiness review, SIG lead opt-in).
    • Release Signal: Monitors CI signal boards, chases down bugs, and gives the go/no-go decision on whether a release can be cut.
  • Each cycle requires 20–30 people: 5 leads are selected, and the rest are “shadows” chosen through an open application process.
    • This rotating structure is unusual for open source and exists because the project is too large for a small fixed team without catastrophic burnout.
    • Even with rotation, the release team is described as a “burnout factory” due to the intensity of people management and conflicting priorities.
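
The tracking work that the Enhancements and Release Docs sub-teams do can be sketched as a checklist per KEP. This is a minimal illustration built only from the requirements listed above, not the release team's actual tooling; the `TrackedKep` class, its field names, and the `KEP-0000` identifier are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackedKep:
    """Hypothetical record for one KEP tracked in a release cycle."""
    number: str  # illustrative identifier, not a real KEP
    sig_lead_opted_in: bool = False
    production_readiness_reviewed: bool = False
    code_complete: bool = False
    tested: bool = False
    user_facing: bool = False
    docs_ready: bool = False

    def missing(self) -> list[str]:
        """Return the requirements this KEP has not met yet."""
        checks = {
            "SIG lead opt-in": self.sig_lead_opted_in,
            "production readiness review": self.production_readiness_reviewed,
            "code complete": self.code_complete,
            "tested": self.tested,
        }
        # Docs are only required when the change is user-facing.
        if self.user_facing:
            checks["docs"] = self.docs_ready
        return [name for name, done in checks.items() if not done]

    def ships(self) -> bool:
        """A KEP stays in the release only when nothing is missing."""
        return not self.missing()


if __name__ == "__main__":
    kep = TrackedKep("KEP-0000", True, True, True, True, user_facing=True)
    print(kep.missing())  # docs are the only outstanding requirement
    print(kep.ships())
```

Modeling docs as a hard requirement for user-facing changes mirrors the rule that an undocumented change gets reverted rather than shipped.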

Anti-burnout policies

  • Kubernetes has unusually strong anti-burnout policies for an open source project:
    • After leading a release, you must take a cycle off before participating again.
    • The SIG Release charter mandates that the release will be delayed before the team is asked to work nights or weekends.
    • Stepping back is fully supported and doesn’t affect your standing in the project.
    • Maintainers are expected to actively mentor their own replacements — part of the job is finding someone to take over.
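  • These policies exist because of the problem illustrated by the XKCD “Dependency” comic, in which critical infrastructure rests on a single overlooked maintainer: Kubernetes is critical infrastructure, and the project cannot afford to collapse because a single maintainer burns out.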

How to propose a new feature (the KEP process)

  • To propose a new feature, you open a Kubernetes Enhancement Proposal (KEP) — an issue in the Kubernetes Enhancements repo using a template.
  • Discussion happens publicly on the GitHub issue or in the relevant SIG’s Slack channel (e.g., SIG Storage for storage features).
    • The SIG evaluates whether the feature is useful, viable, and whether contributors are willing to work on it.
  • If approved, the SIG lead opts in by adding a label, and the KEP is tracked for a specific release.
  • The KEP must hit a series of deadlines across the release cycle (typically about 14 weeks): code freeze, docs freeze, test freeze.
  • Features progress through stages:
    • Alpha: Off by default, enabled via a feature gate. Can be iterative (alpha 1, alpha 2, etc.).
    • Beta: On by default, can be disabled via its feature gate. Requires minimal architectural changes going forward.
    • GA (General Availability): On by default, cannot be disabled. The feature gate is removed.
  • There are dozens of feature gates at any given time, all exhaustively documented with their stage, default state, and introduction version.
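
The stage defaults above can be sketched as a small resolver. The comma-separated `Name=true` override format mirrors the `--feature-gates` flag accepted by Kubernetes components; everything else here (the `Stage` enum, the function names, and the example gate names) is a hypothetical illustration and not the project's Go implementation.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    GA = "ga"


# Default state per stage, as described in the list above.
DEFAULT_ENABLED = {Stage.ALPHA: False, Stage.BETA: True, Stage.GA: True}


def parse_overrides(flag_value: str) -> dict[str, bool]:
    """Parse a 'Name=true,Other=false' string into a dict of overrides."""
    overrides: dict[str, bool] = {}
    for pair in (p.strip() for p in flag_value.split(",")):
        if not pair:
            continue  # tolerate empty input and trailing commas
        name, sep, value = pair.partition("=")
        value = value.strip().lower()
        if not sep or value not in ("true", "false"):
            raise ValueError(f"malformed override: {pair!r}")
        overrides[name.strip()] = value == "true"
    return overrides


def is_enabled(name: str, stage: Stage, overrides: dict[str, bool]) -> bool:
    """Resolve whether a feature is on, given its stage and any overrides."""
    if stage is Stage.GA:
        # GA features cannot be disabled; this sketch ignores any override.
        return True
    return overrides.get(name, DEFAULT_ENABLED[stage])


if __name__ == "__main__":
    # Hypothetical gate names, used only for illustration.
    flags = parse_overrides("ExampleAlphaFeature=true,ExampleBetaFeature=false")
    print(is_enabled("ExampleAlphaFeature", Stage.ALPHA, flags))
    print(is_enabled("ExampleBetaFeature", Stage.BETA, flags))
    print(is_enabled("ExampleGaFeature", Stage.GA, flags))
```

Keeping the default in one table per stage makes a promotion from alpha to beta a one-line change, which is roughly why the stage, default state, and introduction version are worth documenting for every gate.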

Funding and sustainability

  • Most contributors and maintainers are not paid by the CNCF or Linux Foundation.
    • Many do it as a hobby, for career advancement, or for the community.
    • Cloud vendors (AWS, Google, Microsoft, Red Hat) pay some contributors to ensure their use cases are met or to gain influence.
    • The CNCF covers infrastructure costs (testing, CI) but not salaries.
  • Students are a significant portion of applicants to the release team — hundreds apply each cycle, and a handful are accepted.
    • Having “Kubernetes release team” on a resume is a genuine career boost, especially early career.
  • Contributing also provides access to a large professional network of experienced engineers across the industry.

Kat’s views on GenAI tools

  • Kat’s personal opinion is that the majority of GenAI tools are scams designed to extract money from VCs and customers.
  • Within SIG Release, GenAI is not useful because the work is primarily people management.
  • SIG Docs has an explicit policy against using GenAI tools for documentation.
    • They caught someone submitting a blog written with ChatGPT and pulled it.
    • They received a drive-by PR from a paid product offering a free license in exchange for logo placement — the tool didn’t actually work (it edited generated files humans shouldn’t touch and failed at style guide compliance).
  • Where Kat would like to see AI help: automating toil like applying correct GitHub labels to issues and PRs.
  • She’s fine with using LLMs to explain complex topics or understand code, but not to generate documentation or blog posts.

When to use (and not use) Kubernetes

  • Use Kubernetes when you anticipate needing to scale rapidly — migrating to it later is painful.
  • The real question isn’t whether Kubernetes is overkill, but whether you can afford the people to manage it.
    • A mismanaged Kubernetes cluster is dangerous; it is not secure by default.
    • Don’t roll your own cluster — use a managed service (GKE, AKS, EKS) and hire an experienced SRE.
  • Kubernetes is not needed for simple blogs or basic web applications.
  • It makes more sense when you want cost control through auto-scaling rather than paying a cloud provider’s premium for on-demand scaling.

Getting started

  • As a user: Start with Kubernetes’ own documentation and quick-start guides (including Minikube), plus GKE’s tutorials with an interactive sandbox.
  • As a contributor: Reach out to Kat on Kubernetes Slack (@Kat Cosgrove). Good entry points:
    • Documentation updates in an area of interest (storage, networking, etc.).
    • The Kubernetes release team (competitive — hundreds of applications for ~25 spots — but open to anyone, even first-time contributors).
  • The project actively encourages people to borrow its organizational practices — documentation requirements, structured release processes, anti-burnout policies — and adapt them to projects of any size.

Rapid fire

  • Favorite language: Python — versatile enough to prototype anything, even though Kubernetes itself is written in Go.
  • Book recommendation: A Fire Upon the Deep by Vernor Vinge — strange, ambitious hard science fiction and one of Kat’s favorites.