Scaling PostgreSQL: Lessons from 800 Million Users
PostgreSQL scales. Seriously. The OpenAI case proves it at planetary scale.
The Context
OpenAI runs ChatGPT — 800 million active users — with PostgreSQL as its primary database. Not a custom distributed database. Not a NoSQL system designed for scale. PostgreSQL.
A single primary PostgreSQL node with over 50 read replicas on Azure handles millions of queries per second for one of the most used services on the planet. No sharding. No exotic databases.
This should give pause to anyone considering abandoning PostgreSQL because it "doesn't scale."
The Architecture
OpenAI's approach is deceptively simple:
| Component | Detail |
|---|---|
| Primary | Single Azure PostgreSQL Flexible Server node |
| Read Replicas | 50+ geographically distributed replicas |
| Connection Pooling | PgBouncer in front of every instance |
| Sharding | None |
| Average Latency | < 5ms (after optimizations) |
Before and After PgBouncer
Latency dropped from ~50ms to under 5ms simply by introducing connection pooling with PgBouncer. Not an architectural rewrite. Not a database change. A proxy layer.
If you don't have PgBouncer (or equivalent) in front of your production PostgreSQL, you're leaving free performance on the table. It's the single change with the best cost-to-benefit ratio in any PostgreSQL deployment.
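The pattern is straightforward to reproduce. Here is a minimal sketch of a transaction-pooling PgBouncer configuration; the host, database name, and pool sizes are illustrative placeholders, not OpenAI's actual settings:

```ini
; pgbouncer.ini — minimal sketch with illustrative values
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; server connection is released after each transaction
default_pool_size = 20       ; server connections per (user, database) pair
max_client_conn = 5000       ; thousands of clients share a few server connections
```

Transaction pooling is what delivers the big win: thousands of short-lived application connections multiplex onto a small, warm set of server connections, so PostgreSQL never pays the per-connection backend cost on the hot path.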
The Real Bottleneck: Writes
Reads scale horizontally with replicas. Add nodes, distribute load. Problem solved.
Writes don't. Everything goes through the primary. That's physics, not a bug.
OpenAI's Write Strategies
01 — Eliminate unnecessary writes
Before optimizing, they removed: application writes that weren't needed, logs that could go elsewhere, updates that could be lazy.
The best optimization is not doing the work.
02 — Lazy writes with controlled backfill
Instead of writing synchronously, they implemented deferred writes with rate-controlled backfill. This eliminates the write traffic spikes that cause the worst problems.
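The idea can be sketched in a few lines: enqueue writes instead of executing them synchronously, then drain the queue in batches at a bounded rate. This is an illustrative sketch, not OpenAI's implementation; the class and parameter names are made up.

```python
import time
from collections import deque

class LazyWriteBuffer:
    """Deferred writes with rate-controlled backfill (illustrative sketch).

    Writes are enqueued instead of hitting the primary synchronously; a
    drain loop flushes them in batches at a bounded rate, smoothing the
    traffic spikes that hurt the primary most.
    """

    def __init__(self, flush_fn, max_writes_per_sec=100, batch_size=10):
        self.queue = deque()
        self.flush_fn = flush_fn                    # e.g. executes a batched UPSERT
        self.batch_size = batch_size
        self.interval = batch_size / max_writes_per_sec
        self._last_flush = 0.0

    def write(self, row):
        self.queue.append(row)                      # O(1), no database round-trip

    def drain_once(self, now=None):
        """Flush at most one batch, respecting the configured rate. Returns rows flushed."""
        now = time.monotonic() if now is None else now
        if now - self._last_flush < self.interval or not self.queue:
            return 0
        batch = [self.queue.popleft()
                 for _ in range(min(self.batch_size, len(self.queue)))]
        self.flush_fn(batch)
        self._last_flush = now
        return len(batch)
```

A background task calls `drain_once` in a loop; under a write spike the queue grows, but the primary only ever sees the steady configured rate.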
03 — Offload write-heavy workloads
Workloads with a natural sharding key generating extreme write volumes were migrated to dedicated systems. PostgreSQL for the core, specialized systems for the exception.
04 — Multi-level rate limiting
Rate limiting at the application, connection, and query digest levels. Not a single control point, but defense in depth against resource exhaustion.
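A layered limiter can be sketched with stacked token buckets, one per level; a query runs only if every level admits it. The levels and limits below are illustrative, not OpenAI's actual values.

```python
import time

class TokenBucket:
    """Classic token bucket: refills continuously, each call consumes one token."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class LayeredLimiter:
    """Defense in depth: a query must pass every level to run.

    Note the sketch consumes tokens from earlier levels even when a later
    level rejects; a production limiter might check-then-commit instead.
    """
    def __init__(self):
        self.levels = {
            "app":    TokenBucket(rate=1000, capacity=1000),  # whole application
            "conn":   TokenBucket(rate=50,   capacity=50),    # per connection
            "digest": TokenBucket(rate=5,    capacity=5),     # per query shape
        }

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        return all(b.allow(now) for b in self.levels.values())
```

The per-digest level is the interesting one: it stops a single hot query shape from exhausting the primary even when the application-wide budget still has headroom.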
Architectural Considerations
Why Not Sharding?
Sharding introduces enormous operational complexity: cross-shard queries, distributed transactions, rebalancing, routing layers. OpenAI demonstrated that with the right optimizations, a single primary can handle loads most engineers would consider impossible.
| Sharding | Single Primary + Replicas |
|---|---|
| High operational complexity | Operationally simple |
| Expensive cross-shard queries | All queries on one node |
| Fragile distributed transactions | Native ACID transactions |
| Non-trivial rebalancing | Scale-up + read replicas |
Sharding makes sense when writes exceed the capacity of a single node. But that moment arrives much later than you think — and OpenAI is proof.
MVCC: The Price of Concurrency
PostgreSQL uses Multi-Version Concurrency Control. Every update creates a new row version. This means:
- Table bloat: dead rows occupying space
- Index bloat: indexes pointing to obsolete rows
- Complex autovacuum: the garbage collector needs careful tuning
- Growing WAL traffic: every replica streams the full WAL, so each added replica costs more network bandwidth
Autovacuum tuning is one of the most underestimated activities in PostgreSQL management. The defaults are conservative. At scale, they need aggressive revision.
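In practice that means per-table overrides for the hottest tables plus monitoring of dead-tuple accumulation. The table name and values below are illustrative; the right numbers depend on your workload.

```sql
-- Illustrative per-table autovacuum overrides for a hot, frequently updated table.
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- vacuum after ~1% dead rows (default 0.2)
  autovacuum_analyze_scale_factor = 0.005, -- keep planner stats fresh (default 0.1)
  autovacuum_vacuum_cost_limit = 2000      -- let vacuum work faster (inherits 200 by default)
);

-- Verify the tuning is keeping up with the write rate.
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

If `n_dead_tup` keeps climbing between autovacuum runs, the settings are still too conservative for that table.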
Read Replicas: Not All Created Equal
A crucial pattern adopted by OpenAI: traffic segregation by priority.
Not all queries are created equal. A user waiting for a real-time response has different priority than an analytics job.
| Traffic Type | Replica Pool | Characteristics |
|---|---|---|
| High-priority | Dedicated replicas | Minimal latency, no interference |
| Low-priority | Shared replicas | Analytics, batch jobs, reports |
| Long-running | Isolated replicas | Queries > 1s moved here |
This prevents a 30-second analytics query from blocking real-time reads. Simple, but too many teams put everything on the same replicas.
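The routing logic itself is trivial; the discipline is what matters. A sketch of the pattern, with made-up pool names and thresholds:

```python
def pick_replica_pool(priority, est_runtime_s, pools=None):
    """Route a read to a replica pool by priority and expected runtime.

    Sketch of the traffic-segregation pattern; pool names and the 1s
    threshold are illustrative, not OpenAI's actual topology.
    """
    pools = pools or {
        "high":     ["replica-hi-1", "replica-hi-2"],  # latency-sensitive user reads
        "low":      ["replica-lo-1"],                  # analytics, batch jobs, reports
        "isolated": ["replica-long-1"],                # anything expected to run > 1s
    }
    if est_runtime_s > 1.0:
        return pools["isolated"]   # long-running queries never touch the hot replicas
    return pools["high"] if priority == "high" else pools["low"]
```

The key property: no configuration of the low-priority workload can ever degrade the high-priority pool, because they share no replicas.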
Schema Migration: The Minefield
Production migrations on high-traffic databases are where things break. OpenAI's approach:
- Lightweight operations only: no full table rewrites in production
- 5-second DDL timeout: if it doesn't complete in 5s, abort
- Indexes always CONCURRENTLY: never block reads for an index build
- Slow queries moved to replicas: queries > 1s get migrated to avoid blocking migrations
Never run ALTER TABLE ... ADD COLUMN ... DEFAULT ... on a production table with millions of rows when it forces a full table rewrite (before PostgreSQL 11 any DEFAULT did; since 11, only volatile defaults such as random() still do). Always ADD COLUMN nullable, then backfill separately.
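The safe sequence looks roughly like this; the table and column names are illustrative, and note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block:

```sql
-- Sketch of a lock-safe migration on a hot table.
SET lock_timeout = '5s';                  -- abort DDL that can't get its lock quickly

-- Step 1: metadata-only change, no table rewrite.
ALTER TABLE users ADD COLUMN plan text;   -- nullable, no DEFAULT

-- Step 2: backfill in small, rate-controlled batches from the application, e.g.
-- UPDATE users SET plan = 'free' WHERE id BETWEEN $1 AND $2;

-- Step 3: build the index without blocking reads or writes.
CREATE INDEX CONCURRENTLY idx_users_plan ON users (plan);
```

The lock_timeout mirrors the 5-second DDL rule above: a migration that cannot acquire its lock promptly fails fast instead of queueing behind long-running queries and blocking everything after it.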
Incidents and Lessons
The Redis Cascade Failure
A Redis outage caused a cascade collapse of the entire system. PostgreSQL wasn't the culprit, but the unmanaged dependency was.
Lesson: every external dependency is a single point of failure if you don't have circuit breakers and fallbacks.
The WALSender Bug
A bug where high CPU triggered a spin-loop in WALSender, preventing WAL transmission to replicas. The lag persisted even after CPU normalized.
Lesson: replication lag monitoring isn't optional. And you need alerting on anomalies, not just thresholds.
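The baseline check is a standard primary-side query against pg_stat_replication (PostgreSQL 10+); what to alert on is up to you, but byte-level and time-level lag per replica is the minimum:

```sql
-- On the primary: per-replica lag as seen by the WAL sender.
SELECT application_name, state,
       write_lag, flush_lag, replay_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_bytes_behind
FROM pg_stat_replication;
```

A lag that stays flat while traffic changes, as in the WALSender incident, is exactly the kind of anomaly a pure threshold alert misses.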
What PostgreSQL Is Missing (According to OpenAI)
These are real feature requests OpenAI has brought to the PostgreSQL community:
- Index disabling: ability to mark an index as invalid without dropping it, monitor the impact, then decide
- Latency percentiles: native P95/P99 in pg_stat_statements, not just averages
- DDL history tracking: a schema change history accessible via query
- Wait event semantics: sessions "active" with wait event "ClientRead" for hours — confusing semantics
- Heuristic defaults: auto-tuned parameters based on detected CPU/RAM/disk
These aren't complaints. They're concrete feedback from someone operating PostgreSQL at a scale few reach. If you work on the PostgreSQL ecosystem, these are the areas where contribution has the highest impact.
My Takeaways
PostgreSQL Is Enough. Almost Always.
The OpenAI case demolishes the argument that "PostgreSQL doesn't scale." If it scales for 800 million ChatGPT users, it scales for your project. The problem is never PostgreSQL — it's how you use it.
"The best technology is boring technology. PostgreSQL is gloriously boring. And that's why it works."
Complexity Is a Choice, Not a Requirement
OpenAI could have chosen a custom distributed database. They could have sharded from day one. Instead they chose the simplest approach that could work and optimized from there.
That's engineering. Not the newest technology, but the simplest solution that solves the problem.
Connection Pooling Is Not Optional
If there's one thing to take away from this article: PgBouncer in production. Always. The difference between 50ms and 5ms isn't an optimization — it's a category change.
Monitoring Decides Everything
OpenAI had a single Sev0 incident attributable to PostgreSQL in nine months. Not because PostgreSQL is magic, but because they invested in observability: replication lag, query performance, connection states, WAL volume.
You can't optimize what you don't measure.
Sharding Is the Last Resort
Too many architectures start with sharding "just in case." OpenAI demonstrates that the cost of sharding's operational complexity almost always outweighs the benefit, up to scales that 99.99% of projects will never reach.
Operational Takeaways
For those scaling PostgreSQL today:
| Priority | Action | Impact |
|---|---|---|
| 1 | Introduce PgBouncer | 10x latency reduction |
| 2 | Segregate traffic on dedicated replicas | Eliminate workload interference |
| 3 | Aggressive autovacuum tuning | Prevent bloat and degradation |
| 4 | Timeouts on DDL and queries | Prevent lock chains |
| 5 | Monitor replication lag and WAL | Early warning on degradation |
| 6 | Lazy writes for spike control | Smooth write traffic |
| 7 | Shard only when everything else isn't enough | Complexity as last resort |