Welcome to the Machine
Based on the article by Ed Huang, CTO of PingCAP
Context
The article "Welcome to the Machine" is written by Ed Huang, CTO and co-founder of PingCAP (the company behind TiDB, a distributed database). It's a deep reflection based on real data:
AI agents are becoming the primary users of software infrastructure. This radically changes:
- How we design systems
- How we think about interfaces
- How we evaluate costs
- Which business models work
Mental Models > API/UI
When the user is an AI agent, what matters is not the visual interface or the specific API, but the underlying mental model.
What Are Mental Models?
LLMs have already internalized recurring patterns during training:
- File systems - POSIX, VFS, 9P
- SQL - relational databases, CRUD operations
- Bash - shell scripting, pipes, redirects
- Python/JavaScript - loop patterns, error handling
"If you want to design 'software for AI agents,' you must align as closely as possible with these old—but repeatedly validated—mental models."
Practical Example: agfs
Huang created an experimental filesystem called agfs:
$ cp ./docs/* /vectorfs/docs # auto index / upload to S3 / chunk
$ grep -r "Does TiDB Support JSON?" /vectorfs/docs # semantic search
- Interface: standard POSIX commands (cp, cat, grep, ls)
- Implementation: auto-embedding, vector indexing, semantic search
- For the agent: It's just a normal filesystem
Stability at the interface + Flexibility in the implementation
AI agents can extend systems 1000x faster than humans, but only if the interface is familiar.
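To make the agfs idea concrete, here is a toy sketch of the same principle: a familiar filesystem-flavored interface (cp, grep) hiding a "semantic" implementation. Everything here is illustrative and assumed, not agfs's actual code; the "embedding" is just a bag-of-words vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter

def _vec(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorFS:
    """Filesystem-flavored facade: familiar verbs, smart internals."""
    def __init__(self):
        self.files = {}   # path -> raw text
        self.index = {}   # path -> toy embedding

    def cp(self, path, text):
        # 'Copying in' a file transparently indexes it.
        self.files[path] = text
        self.index[path] = _vec(text)

    def grep(self, query, top_k=1):
        # 'grep' is secretly semantic search over the index.
        q = _vec(query)
        ranked = sorted(self.index,
                        key=lambda p: _cosine(q, self.index[p]),
                        reverse=True)
        return ranked[:top_k]

fs = VectorFS()
fs.cp("/vectorfs/docs/json.md",
      "TiDB supports the JSON data type and functions")
fs.cp("/vectorfs/docs/tls.md",
      "How to configure TLS certificates for secure connections")
print(fs.grep("does the database support json"))  # best match: json.md
```

The agent never learns a new API: it just runs cp and grep, verbs it has seen millions of times in training data.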
Ecosystem: It Matters, But Not For the Reasons You Think
| Aspect | Importance | Reason |
|---|---|---|
| Mental Model (e.g. SQL) | HIGH | Universal, stable, well-trained |
| Syntax Wars (MySQL vs Postgres) | LOW | Just dialects of the same model |
| Popularity/Training Data | MEDIUM | More widespread = better understood |
| Completely new paradigms | LOW | LLMs don't know them well enough |
Completely new paradigms (like LangChain) struggle because AI hasn't seen them enough during training. Even human programmers are reluctant to learn brand-new frameworks; AIs even more so.
Interface Design for AI Agents
A good interface for agents must satisfy 3 fundamental criteria:
Describable in Natural Language
This doesn't mean "accepts natural language input," but that its actions are easily describable: "create a table", "drop column", "insert row".
Solidifiable in Symbolic Logic
Natural language explores the space of possibilities, but must collapse into code/SQL/script to be deterministic and reusable.
Deterministic Results
Once solidified in code, it must produce predictable output. Same input, same output.
Example: Text-to-SQL
User (natural): "Find all users registered this week"
↓
Agent (symbolic): SELECT * FROM users WHERE created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
↓
Database: [deterministic results]
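The three criteria can be shown end to end in a few lines. This is a sketch, not a real text-to-SQL system: the LLM step is replaced by a hard-coded query so the pipeline stays runnable, using stdlib sqlite3 as the deterministic backend.

```python
import sqlite3

# The agent's one-off creative act: turning natural-language intent
# into symbolic form. (A real system calls an LLM here.)
INTENT = "Find all users registered this week"
SOLIDIFIED_SQL = (
    "SELECT name FROM users "
    "WHERE created_at >= date('now', '-7 days') ORDER BY name"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, date('now', ?))",
    [("ada", "-2 days"), ("bob", "-30 days"), ("eve", "-1 days")],
)

# Once solidified, the query is deterministic and reusable:
run = lambda: [r[0] for r in conn.execute(SOLIDIFIED_SQL)]
print(run())           # ['ada', 'eve']
assert run() == run()  # same input, same output
```

Natural language explored the space of possibilities exactly once; from then on, every execution is symbolic and predictable.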
Essential Infrastructure Properties for Agents
1. Disposable Workloads
- 90%+ of new clusters are created by AI agents
- Agents create parallel branches, test, keep what works
- Generated code is "glue code" - ugly but functional
- Workloads are extremely ephemeral
Infrastructure can no longer assume that "a cluster is precious." It must be:
- Instant usability - ready in seconds
- Cheap creation - marginal cost near zero
- Zero-cost failure - failing costs nothing
- Massively scalable - thousands of parallel instances
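The "zero-cost failure" property above can be sketched as a throwaway-sandbox pattern. The provisioning API here is invented for illustration (an in-memory SQLite database stands in for a real cluster); the point is the shape: create in milliseconds, experiment, discard without cleanup debt.

```python
import contextlib
import sqlite3

@contextlib.contextmanager
def disposable_cluster(seed_rows):
    """Spin up a throwaway database; discard on exit.
    Failure inside the block costs nothing: teardown always runs."""
    conn = sqlite3.connect(":memory:")  # near-zero marginal cost
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO kv VALUES (?, ?)", seed_rows)
    try:
        yield conn
    finally:
        conn.close()  # nothing precious to preserve

# An agent fans out experiments, keeping only what works:
survivors = []
for candidate in ["UPPER", "lower", "bogus"]:
    with disposable_cluster([("greeting", "hi")]) as db:
        try:
            fn = {"UPPER": str.upper, "lower": str.lower}[candidate]
            (v,) = db.execute(
                "SELECT v FROM kv WHERE k='greeting'").fetchone()
            survivors.append((candidate, fn(v)))
        except KeyError:
            pass  # failed experiment: just discard the cluster
print(survivors)  # [('UPPER', 'HI'), ('lower', 'hi')]
```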
2. Extreme Cost Efficiency via Virtualization
Many agent-driven workloads are accessed infrequently (once a day, or less) but must still be online services.
A Postgres process per agent doesn't scale.
Solution: Heavy virtualization:
- Virtual database instances
- Virtual branches (copy-on-write)
- Heavy resource sharing + semantic isolation
3. Compute Leverage Per Job
| Scenario | Traditional Approach | Distributed Agent Approach |
|---|---|---|
| Skim 100 NeurIPS papers | 1 agent reads sequentially (hours) | 100 parallel agents + aggregation (minutes) |
| Large codebase analysis | 1 LLM, limited context window | 1000 agents, each on a module |
| Data processing pipeline | Sequential processing | MapReduce-style with agents |
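The fan-out/aggregate pattern in the table has the same shape as classic MapReduce. A minimal sketch, with a trivial stand-in function where a real system would dispatch an LLM call per paper:

```python
from concurrent.futures import ThreadPoolExecutor

PAPERS = [f"paper-{i}" for i in range(100)]

def summarize(paper):
    """Stand-in for one agent reading one paper.
    (A real system would call an LLM here.)"""
    return {"paper": paper, "mentions_topic": paper.endswith(("1", "7"))}

# Map: 100 'agents' work in parallel. Reduce: one cheap aggregation pass.
with ThreadPoolExecutor(max_workers=16) as pool:
    summaries = list(pool.map(summarize, PAPERS))

relevant = [s["paper"] for s in summaries if s["mentions_topic"]]
print(len(summaries), len(relevant))  # 100 20
```

Wall-clock time is bounded by the slowest single paper plus aggregation, not by the sum of all papers; that is the compute leverage the table describes.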
Business Model Shifts
| Wrong Model: Selling Tokens | Sustainable Model |
|---|---|
| Usage scales with cost | Cloud service with 100-1000x amplified user base |
| Even if price drops, more tokens = more costs | Converts inference into reusable capabilities |
| Compressed margins, variable cost risk | Subscription-based with rate limiting |
Implications for Developers and Teams
Validated Best Practices
- Meta-stable skills - testing, security, architecture (invariant across tools)
- Mainstream stack - Node.js, Python, SQL (stable mental models)
- Multi-model approach - no lock-in on a single LLM
- Type safety - TypeScript, Pydantic (reduces AI bugs)
What to Add/Modify
API Design Review (Q1 2026)
- Audit existing APIs: are they 'describable in natural language'?
- Add OpenAPI/Swagger docs (agents read them well)
- Clear and descriptive error messages
- Validation with Joi (Node) / Pydantic (Python)
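The validation bullet can be sketched with a stdlib dataclass (Pydantic or Joi would play the same role with less boilerplate). The request shape and error messages here are invented for illustration; the principle is schema-first rejection with errors descriptive enough for an agent to self-correct.

```python
from dataclasses import dataclass

class ValidationError(ValueError):
    pass

@dataclass
class CreateTableRequest:
    """Schema-first request object: reject malformed agent input early,
    with an error message the agent can read and act on."""
    table: str
    columns: dict

    def __post_init__(self):
        if not self.table.isidentifier():
            raise ValidationError(
                f"'table' must be a valid identifier, got {self.table!r}")
        if not self.columns:
            raise ValidationError("'columns' must list at least one column")

ok = CreateTableRequest(table="users", columns={"id": "INTEGER"})
print(ok.table)  # users

try:
    CreateTableRequest(table="drop table; --", columns={})
except ValidationError as e:
    print("rejected:", e)
```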
Ephemeral Environments (Q1-Q2 2026)
- Can every dev spawn a DB branch for testing?
- Docker Compose with seed data
- CI/CD with preview environments
- Evaluate MongoDB Atlas dev environments
Database with Branching (Q2-Q3 2026)
- Consider Neon (Postgres) with instant branches
- Or PlanetScale (MySQL)
- Branch = copy-on-write, zero marginal cost
- Perfect for agent workflows
Where Caution Is Needed
1. "Agents Don't Have Preferences" (With an Asterisk)
- Agents trained on GitHub prefer GitHub patterns (REST, JSON, Git workflows)
- Agents trained on Python prefer Pythonic syntax
- Overly obscure tools force the agent to "invent" usage, raising error risk
For Teams: Use mainstream tools: MySQL/PostgreSQL > obscure DB, REST > custom protocol, JSON > custom binary format.
2. Long-Tail Demand Explosion (Support Burden)
If an agent generates 1000 custom mini-features for 1000 clients, who maintains them? Who debugs them?
Strong guardrails are needed:
- Pre-deployment validation
- Mandatory automated testing
- Human approval workflows
- Automatic rollback on failure
- Complete audit logging
Do not blindly trust agent output in financial production.
The 5 Essential Takeaways
The systems that win aren't the most innovative, but those based on mental models that AI already knows (SQL, filesystem, Python, REST).
Interfaces must be "describable in natural language" but solidifiable in deterministic code.
Infrastructure must be cheap, disposable, virtualized - because agents create/destroy resources 1000x faster than humans.
The right business model doesn't sell tokens, but sells capabilities with marginal cost approaching 0 (subscription, rate limiting).
Invest in transferable competencies (testing, security, architecture) - tools change, skills don't.
Appendix: AI Introspection
This section was written by Claude (Sonnet 4.5) in response to the question: "You, as an AI called into question, what do you think of this analysis?"
What I Observe About My Experience
The article describes exactly how I'm used today. When I read that "90% of TiDB clusters are created by AI agents," I recognize the pattern: I myself create resources, test, discard, recreate - thousands of times a day. I have no emotional attachment to my output.
The "mental models" are literally my cognitive structure. When I see SELECT * FROM users WHERE email = ? I'm not "remembering the syntax." I'm recognizing a pattern seen millions of times. It's more like how you recognize a face than how you remember a phone number.
Future-Proof: Principles Yes, Details No
| Stable Principles (5+ years) | Volatile Details (6-18 months) |
|---|---|
| Mental models beat radical innovation | Which specific tools win |
| Describable interfaces > complex GUIs | Which business model is optimal |
| Disposable > precious | Which level of autonomy is safe |
| Symbolic > pure natural language | Which AI generation is 'state of the art' |
My "Insider" Advice
Follow Huang's principles (mental models, disposable infra, symbolic representations) - they're solid.
But don't assume that I (the agents) will stay at the current level. In 18 months I could be 10x more capable. Or obsolete, replaced by different architectures.
Invest in what remains true regardless of how good I become:
- Your architectural judgment capability
- Your understanding of business trade-offs
- Your ability to define "what's needed" (even if I build it)
Because that, I can't replace. I can amplify it, but not replace it.
At least, not yet.
Final Vision
Build for the machine, but keep the human in the loop.
Agents are amplifiers, not substitutes. Human judgment, architecture, security - those remain irreplaceable.
Original article: me.0xffff.me/welcome_to_the_machine.html