Building Pi, and what makes self-modifying software so fascinating — The Pragmatic Engineer

Pi is a minimalist, self-modifying coding agent built by Mario Zechner in Austria, designed as a reaction against bloated AI coding tools. It has become the engine behind OpenClaw, a popular personal AI assistant, and is notable for letting users ask Pi to modify its own code through simple extension points, with no forking required.

How Pi came to be

Mario was an enthusiastic Claude Code user until summer 2025, when the Claude Code team’s rapid iteration broke his workflows: releases injected hidden system reminders, modified system prompts without visibility, and changed tool definitions from one version to the next.
He values simple, stable tools where the deterministic parts are reliable even if the LLM itself is stochastic.
Alternatives like Amp and Devin were too expensive for individual tinkerers, and the open-source OpenCode modified context behind his back, for example by pruning tool results and by running LSP diagnostics after every edit, which confuses the model mid-task.
So he built Pi himself: a minimal core with abstractions over LLM provider APIs, a generalized agent loop with tool calling and streaming, and a bespoke terminal UI.
The key design insight: many hook points let you extend Pi by loading a TypeScript module into the same Node process, enabling custom tools, compaction implementations, full TUI overhauls, and more.
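To make the hook-point idea concrete, here is a minimal sketch of an extension host. It is not Pi's actual API: every name in it (`AgentHost`, `ExtensionApi`, `registerTool`, `setCompaction`, `loadExtension`) is hypothetical, and the extension is a plain function rather than a TypeScript module loaded from disk, so that the example stays self-contained.

```typescript
// Hypothetical sketch of an extension host; none of these names come from Pi.
type Tool = {
  name: string;
  description: string;
  run: (args: Record<string, string>) => string;
};

type Message = { role: "user" | "assistant" | "tool"; text: string };

// A compaction hook receives the message history and returns a shorter one.
type CompactionHook = (history: Message[]) => Message[];

interface ExtensionApi {
  registerTool(tool: Tool): void;
  setCompaction(hook: CompactionHook): void;
}

class AgentHost implements ExtensionApi {
  private tools = new Map<string, Tool>();
  // Default compaction: leave the history unchanged.
  private compaction: CompactionHook = (history) => history;

  registerTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  setCompaction(hook: CompactionHook): void {
    this.compaction = hook;
  }

  // An extension is code that runs in-process and receives the host API.
  loadExtension(extension: (api: ExtensionApi) => void): void {
    extension(this);
  }

  callTool(name: string, args: Record<string, string>): string {
    const tool = this.tools.get(name);
    if (!tool) {
      return `error: unknown tool "${name}"`;
    }
    return tool.run(args);
  }

  compact(history: Message[]): Message[] {
    return this.compaction(history);
  }
}

// Example extension: adds one tool and replaces the compaction strategy.
function wordCountExtension(api: ExtensionApi): void {
  api.registerTool({
    name: "word_count",
    description: "Counts whitespace-separated words in the given text.",
    run: (args) =>
      String((args["text"] || "").split(/\s+/).filter(Boolean).length),
  });
  // Keep only the two most recent messages when compacting.
  api.setCompaction((history) => history.slice(-2));
}

const host = new AgentHost();
host.loadExtension(wordCountExtension);
console.log(host.callTool("word_count", { text: "slow the F down" })); // "4"
```

Because the extension receives the same API object the host itself uses, an agent that can write such a function can add tools to its own toolbox, which is the property the self-modification described in the next section relies on.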

Self-modifying software

Pi doesn’t ship with MCP support, plan mode, or many features users expect — people just ask Pi to build those features into itself.
This works because Pi can write code that extends itself through its extension system. The mechanism is trivial to implement, but it is a significant unlock.
Armen built custom debugging tools into a game project by having Pi set up the codebase so the agent could validate its own changes (screenshots, simulation dumps) while he stayed in the loop.
The broader vision: software that modifies itself on behalf of the user’s wishes, enabled by agents with enough rope to change their own tooling. Mario sees this extending beyond coding tools into knowledge work more generally.

What Armen learned from 30+ engineering teams

AI tool adoption in companies often spikes during vacation periods (Thanksgiving, summer, Christmas) when engineers have 2–3 weeks to explore — that’s when it “clicks.”
After Christmas 2025, adoption exploded in more than half the companies he talked to, with predictable consequences: code quality dropped, PRs got larger and more frequent, and engineers struggled to review them.
A key insight: agents produce code that no engineer would commit, because engineers think of their future selves and feel pain from complexity, and agents don’t. The result is worse than human-made complexity because agents pile on recovery paths and extra failure states instead of letting the code fail properly.
Automation bias is real: engineers see a few good outputs, trust the agent, and then miss the garbage it produces moments later.
Agents don’t learn like humans do. Humans feel pain from bad interfaces and complexity, which incentivizes fixing the root cause. Agents just keep adding. Senior engineers are valuable precisely because they’ve been burned before and say “no” to unnecessary complexity.
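The remark in this section about recovery paths versus failing properly can be illustrated with a small contrast. This is an invented example, not code from the episode: the function names and the port-parsing scenario are assumptions chosen only to show the two styles side by side.

```typescript
// Illustrative only: two ways to read a required numeric setting.

// Style A: layered fallbacks. Each branch is a recovery path that hides
// the original problem and adds another state to reason about.
function readPortWithFallbacks(raw: string | undefined): number {
  const parsed = Number(raw);
  if (raw === undefined || Number.isNaN(parsed)) {
    return 8080; // silent default: a misconfiguration now looks like success
  }
  if (parsed < 1 || parsed > 65535) {
    return 8080; // a second recovery path for out-of-range values
  }
  return parsed;
}

// Style B: fail loudly at the boundary, so the root cause gets fixed.
function readPortStrict(raw: string | undefined): number {
  const parsed = Number(raw);
  const valid =
    raw !== undefined &&
    Number.isInteger(parsed) &&
    parsed >= 1 &&
    parsed <= 65535;
  if (!valid) {
    throw new Error(`invalid port: ${String(raw)}`);
  }
  return parsed;
}

console.log(readPortWithFallbacks("not-a-port")); // 8080, the typo goes unnoticed
console.log(readPortStrict("3000")); // 3000
```

With the first style, a typo in configuration ships to production as port 8080. With the second, it stops the program at startup, which is the kind of pain that pushes a human to fix the root cause.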

Non-engineers writing code

Product managers, marketing teams, and sales teams now submit PRs or build features directly using agents. At one company, a sales demo showed a feature that did not exist in the product, and nobody noticed.
This is empowering — a PM can try a feature without wasting an engineer’s time — but the integration problem is hard, and guardrails are missing.
Peter Steinberger proposed the “prompt request” idea: instead of reviewing PRs, just share the prompt so he can run it himself in his own style. Armen disagrees — he values seeing terrible implementations because they reveal what someone actually wanted to build, saving him time.
The deeper issue: responsibility doesn’t scale with automation. In a factory, commoditization means nobody cares about a bad shirt. In engineering, postmortems and accountability depend on humans in the chain who can explain their decisions. Machines can’t be responsible yet.

Complexity as the enemy

Mario’s biggest enemy is complexity, and it’s also his agent’s biggest enemy. With a 200K token context window and a 600K-line codebase, the agent can only see a fraction of the code.
Even with perfect information retrieval, the agent’s own output becomes its worst enemy: it generates so much code that it can’t ingest all the context it needs for the next task.
Models are trained on internet code, which is mostly mediocre-to-garbage. The mean converges toward cargo culting and trend-chasing, not the handful of excellently engineered projects like Linux.
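A rough calculation shows how small that fraction is for the 200K-token window and 600K-line codebase mentioned above. The tokens-per-line figure is an assumption made for illustration and does not come from the source; real values vary by language and tokenizer.

```typescript
// Assumption (not from the source): an average line of code costs ~10 tokens.
const ASSUMED_TOKENS_PER_LINE = 10;

// Fraction of a codebase that fits into the context window at once.
function visibleFraction(
  contextWindowTokens: number,
  codebaseLines: number,
  tokensPerLine: number = ASSUMED_TOKENS_PER_LINE,
): number {
  const codebaseTokens = codebaseLines * tokensPerLine;
  // The window can never hold more than the whole codebase.
  return Math.min(1, contextWindowTokens / codebaseTokens);
}

const fraction = visibleFraction(200_000, 600_000);
console.log(`${(fraction * 100).toFixed(1)}% of the codebase fits in context`);
// prints "3.3% of the codebase fits in context"
```

Under that assumption the agent sees about one thirtieth of the code, before the system prompt, tool output, and conversation history take their share of the window.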

“Slow the F down”

Agents can produce 10x more code per day, but even at half the error rate, that’s still 5x more bugs. Codebase deterioration accelerates.
Humans can review ~1.5K lines of code per day meaningfully. Agents produce 3-10K lines. There’s no way to review that volume, and armies of agents (“dark factory”) make it worse.
The best possible spec is the software itself; any spec with blanks gets filled from training data — which is garbage-to-mediocre.
The better approach: use agents to automate the annoying parts, freeing humans to think about what to build and then polish it with agents — rather than building armies of agents and hoping for the best.
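The arithmetic behind the bug and review claims in this section is simple enough to write down. The sketch below uses only the figures quoted above; the function names are made up for the example.

```typescript
// Relative bug volume = relative code volume x relative error rate.
function relativeBugVolume(
  codeMultiplier: number,
  errorRateMultiplier: number,
): number {
  return codeMultiplier * errorRateMultiplier;
}

// Days of careful human review needed to cover one day of agent output.
function reviewDaysPerAgentDay(
  agentLinesPerDay: number,
  reviewLinesPerDay: number,
): number {
  return agentLinesPerDay / reviewLinesPerDay;
}

const bugs = relativeBugVolume(10, 0.5);
const lowEnd = reviewDaysPerAgentDay(3_000, 1_500);
const highEnd = reviewDaysPerAgentDay(10_000, 1_500);

console.log(`${bugs}x the bugs`); // prints "5x the bugs"
console.log(`${lowEnd.toFixed(1)} to ${highEnd.toFixed(1)} review days per agent day`);
// prints "2.0 to 6.7 review days per agent day"
```

Even at the low end, a single agent generates two days of review work per day for one reviewer, so the backlog grows unless output is throttled or left unreviewed.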

Friction and back pressure

Armen noticed that companies are removing all friction to let agents run autonomously, but deliberate friction (code review gates, tier-based approval requirements, checklists) exists to make engineers think before acting.
Without this back pressure, there’s no signal that something feels wrong in the codebase. The “agentic regret” (changes you wouldn’t have made yourself) accumulates.
Mario keeps Pi’s quality high by refactoring mercilessly, which forces him to understand the codebase structurally. This is the opposite of the industry wisdom of burning as many tokens as possible.
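As one possible shape for the deliberate friction described in this section, the sketch below maps a change to a review tier. The policy is hypothetical: the tier names, the thresholds (the 1,500-line limit simply echoes the daily review capacity quoted earlier), and the `ChangeRequest` fields are all assumptions for illustration, not something the speakers prescribed.

```typescript
// Hypothetical policy: thresholds and tier names are illustrative only.
type Tier = "low" | "medium" | "high";

interface ChangeRequest {
  linesChanged: number;
  touchesCriticalPath: boolean; // e.g. auth, billing, migrations
  authoredByAgent: boolean;
}

interface Gate {
  tier: Tier;
  requiredHumanApprovals: number;
  checklistRequired: boolean;
}

function gateFor(change: ChangeRequest): Gate {
  // Large or critical changes get the most friction.
  if (change.touchesCriticalPath || change.linesChanged > 1500) {
    return { tier: "high", requiredHumanApprovals: 2, checklistRequired: true };
  }
  // Agent-authored or mid-sized changes still require a checklist.
  if (change.authoredByAgent || change.linesChanged > 300) {
    return { tier: "medium", requiredHumanApprovals: 1, checklistRequired: true };
  }
  return { tier: "low", requiredHumanApprovals: 1, checklistRequired: false };
}

const gate = gateFor({
  linesChanged: 120,
  touchesCriticalPath: false,
  authoredByAgent: true,
});
console.log(gate.tier); // "medium"
```

The point of such a gate is not the exact numbers but that a human has to stop and think before a change of a given size or risk goes through.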

Open source under AI pressure

OpenClaw (built on Pi) generates enormous volumes of autonomous PRs and issues. Mario auto-closes all PRs from unknown accounts and asks for a human-written issue first — agents don’t see the auto-close comment, making it a perfect filter.
The real problem: there’s no back pressure mechanism. In open source, PRs used to require significant human investment, which naturally limited volume. Now they’re trivial to generate.
Armen argues the fundamentals of open source haven’t changed — the same small percentage of projects survive long-term. We just have more projects that die after two days.
GitHub itself is under immense pressure from millions of OpenClaw instances hammering its infrastructure.
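The auto-close rule described at the start of this section can be sketched as a pure decision function. This is a guess at the logic, not Mario's actual automation: the `PullRequest` shape, the idea of a known-accounts set, and the comment text are all hypothetical, and nothing here calls the GitHub API.

```typescript
// Hypothetical triage rule; field names are not taken from GitHub's API.
interface PullRequest {
  number: number;
  author: string;
}

type Decision =
  | { action: "keep" }
  | { action: "close"; comment: string };

const CLOSE_COMMENT =
  "Closed automatically. Please open an issue written by a human first.";

// Keep PRs from accounts the maintainer already knows; close everything else
// with a comment that asks for a human-written issue.
function triage(pr: PullRequest, knownAccounts: Set<string>): Decision {
  if (knownAccounts.has(pr.author)) {
    return { action: "keep" };
  }
  return { action: "close", comment: CLOSE_COMMENT };
}

const known = new Set<string>(["trusted-contributor"]);
console.log(triage({ number: 1, author: "trusted-contributor" }, known).action); // "keep"
console.log(triage({ number: 2, author: "drive-by-bot" }, known).action); // "close"
```

The filter works as back pressure because following up requires someone to read the comment and write the issue, a step an unattended agent does not take.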

MCP vs. CLI

MCP (Model Context Protocol) was originally designed for consumer chat apps to connect external services (email, OneDrive), then got adopted by developer tools.
Problems with MCP:
- The spec is complex, and many big companies just auto-convert entire OpenAPI specs into thousands of tools — which doesn’t work well.
- It’s inherently non-composable: combining outputs from two MCP servers requires the model to do data transformation through context, whereas CLI pipes let the model see only the final result.
- It fills context quickly and takes away the creative problem-solving agents show when working with raw tool output (e.g., an agent seeing a 20MB file and deciding to grep it instead of reading it all).
CLI/code execution is more flexible: agents are excellent at running code, and code generation is the paradigm most likely to dominate because of abundant training data and the ease of controlling computers through code.
MCP has found a real niche in enterprise settings where code execution isn’t acceptable and OAuth + structured APIs are needed. It’s not going away, but it needs better composability.
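The composability argument can be illustrated by counting how much text reaches the model in each style. The two data sources below are made-up stand-ins, not real MCP servers or CLIs, and character count is used as a crude proxy for context tokens.

```typescript
// Hypothetical stand-ins for two data sources; not real MCP servers or CLIs.
function listOrders(): string[] {
  // 1,000 order lines spread across 50 customers.
  return Array.from({ length: 1000 }, (_, i) => `order-${i},customer-${i % 50}`);
}

function listVipCustomers(): string[] {
  return ["customer-7", "customer-42"];
}

// The join itself: count the orders that belong to VIP customers.
function countVipOrders(orders: string[], vips: string[]): number {
  const vipSet = new Set(vips);
  return orders.filter((line) => {
    const customer = line.slice(line.indexOf(",") + 1);
    return vipSet.has(customer);
  }).length;
}

type Outcome = { contextChars: number; answer: number };

// Style 1: each tool result passes through the model's context, and the
// model has to do the join by reading both raw outputs.
function viaContext(): Outcome {
  const orders = listOrders();
  const vips = listVipCustomers();
  const contextChars = orders.join("\n").length + vips.join("\n").length;
  return { contextChars, answer: countVipOrders(orders, vips) };
}

// Style 2: compose in code, like a shell pipe. Only the final result is seen.
function viaCode(): Outcome {
  const answer = countVipOrders(listOrders(), listVipCustomers());
  return { contextChars: String(answer).length, answer };
}

console.log(viaCode()); // { contextChars: 2, answer: 40 }
console.log(viaContext().contextChars > viaCode().contextChars); // true
```

Both styles reach the same answer, but in the second one the intermediate data never enters the context window, which is what piping gives a CLI-based agent for free.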

Predictions and staying grounded

Mario believes self-mutable software will expand beyond tech tools into broader knowledge work applications.
Armen thinks the real conversation in 2027 will be about dependency on a small number of AI labs — engineering teams already report they couldn’t maintain their codebases without AI, and when that becomes public and expensive, it will dominate discourse.
Both try to stay grounded by consuming information with a delay: if something is still important three weeks after it trended, it’s probably worth paying attention to. The honeymoon period with new tools typically lasts 1-2 months before the complexity costs become visible.
Both believe the current hype will self-correct — the dark factory and “software is dead” narratives are part of the hype machine, and sustainable practices will emerge over time.
