Technical · 7 min read

Scaling PostgreSQL: Lessons from 800 Million Users

Giuseppe Albrizio

PostgreSQL scales. Seriously. The OpenAI case proves it at planetary scale.


The Context

OpenAI runs ChatGPT — 800 million active users — with PostgreSQL as its primary database. Not a custom distributed database. Not a NoSQL system designed for scale. PostgreSQL.

A single primary node plus over 50 read replicas on Azure handles millions of queries per second for one of the most used services on the planet.

Without sharding. Without exotic databases.

This should give pause to anyone considering abandoning PostgreSQL because it "doesn't scale."


The Architecture

OpenAI's approach is deceptively simple:

Component           Detail
------------------  --------------------------------------------
Primary             Single Azure PostgreSQL Flexible Server node
Read Replicas       50+ geographically distributed replicas
Connection Pooling  PgBouncer in front of every instance
Sharding            None
Average Latency     < 5 ms (after optimizations)

Before and After PgBouncer

Latency dropped from ~50ms to under 5ms simply by introducing connection pooling with PgBouncer. Not an architectural rewrite. Not a database change. A proxy layer.

If you don't have PgBouncer (or equivalent) in front of your production PostgreSQL, you're leaving free performance on the table. It's the single change with the best cost-to-benefit ratio in any PostgreSQL deployment.
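As a starting point, a minimal pgbouncer.ini sketch — the host, database name, and pool sizes below are illustrative, not OpenAI's configuration; the parameter names are standard PgBouncer settings:

```ini
; Minimal PgBouncer sketch. Host, dbname, and pool sizes are
; illustrative -- tune against your own workload.
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling gives the largest reduction in backend
; connections, but breaks session-level features (advisory locks,
; session-scoped SET, prepared statements on older versions)
pool_mode = transaction
default_pool_size = 20
max_client_conn = 5000
```

The key decision is `pool_mode`: transaction pooling is what turns thousands of client connections into a few dozen real PostgreSQL backends.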


The Real Bottleneck: Writes

Reads scale horizontally with replicas. Add nodes, distribute load. Problem solved.

Writes don't. Everything goes through the primary. That's physics, not a bug.

OpenAI's Write Strategies

01 — Eliminate unnecessary writes

Before optimizing, they removed. Application writes that weren't needed, logs that could go elsewhere, updates that could be lazy.

The best optimization is not doing the work.

02 — Lazy writes with controlled backfill

Instead of writing synchronously, they implemented deferred writes with rate-controlled backfill. This eliminates the write traffic spikes that cause the worst problems.
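The pattern can be sketched as a buffer whose synchronous path only enqueues, with a backfill loop that drains at a capped rate. The class and its parameters are illustrative, not OpenAI's implementation:

```python
import time
from collections import deque

class LazyWriter:
    """Buffer writes in memory and flush them at a capped rate,
    smoothing spikes into a steady backfill stream. A sketch of the
    deferred-write pattern, not a production implementation."""

    def __init__(self, flush, max_per_second, clock=time.monotonic):
        self.flush = flush                  # callable performing the real write
        self.interval = 1.0 / max_per_second
        self.queue = deque()
        self.clock = clock
        self.next_slot = clock()            # earliest time the next flush may run

    def write(self, row):
        # Synchronous path just enqueues -- the caller never waits on the DB.
        self.queue.append(row)

    def backfill_step(self):
        """Flush at most one queued write if the rate budget allows."""
        now = self.clock()
        if self.queue and now >= self.next_slot:
            self.flush(self.queue.popleft())
            self.next_slot = now + self.interval
            return True
        return False
```

A background loop calling `backfill_step()` turns a burst of 10,000 buffered writes into a flat, rate-controlled trickle toward the primary.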

03 — Offload write-heavy workloads

Workloads with a natural sharding key generating extreme write volumes were migrated to dedicated systems. PostgreSQL for the core, specialized systems for the exception.

04 — Multi-level rate limiting

Rate limiting at the application, connection, and query digest levels. Not a single control point, but defense in depth against resource exhaustion.
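The layering can be sketched with stacked token buckets, one per level; the bucket sizes and the `admit` helper are illustrative, not OpenAI's mechanism:

```python
import time

class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second, capped at `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def admit(app_bucket, conn_bucket, digest_buckets, digest):
    """Admit a query only if every layer has budget: the application as
    a whole, this connection, and this specific query digest. (The
    short-circuit means an outer token can be spent on a query an inner
    layer then rejects -- an acceptable simplification for a sketch.)"""
    return (app_bucket.allow()
            and conn_bucket.allow()
            and digest_buckets[digest].allow())
```

The per-digest layer is the interesting one: a single pathological query shape gets throttled without slowing the rest of the application.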


Architectural Considerations

Why Not Sharding?

Sharding introduces enormous operational complexity: cross-shard queries, distributed transactions, rebalancing, routing layers. OpenAI demonstrated that with the right optimizations, a single primary can handle loads most engineers would consider impossible.

Sharding                          Single Primary + Replicas
--------------------------------  -------------------------
High operational complexity       Operationally simple
Expensive cross-shard queries     All queries on one node
Fragile distributed transactions  Native ACID transactions
Non-trivial rebalancing           Scale-up + read replicas

Sharding makes sense when writes exceed the capacity of a single node. But that moment arrives much later than you think — and OpenAI is proof.

MVCC: The Price of Concurrency

PostgreSQL uses Multi-Version Concurrency Control. Every update creates a new row version. This means:

  • Table bloat: dead rows occupying space
  • Index bloat: indexes pointing to obsolete rows
  • Complex autovacuum: the garbage collector needs careful tuning
  • Growing WAL: more replicas = more WAL = more network bandwidth

Autovacuum tuning is one of the most underestimated activities in PostgreSQL management. The defaults are conservative. At scale, they need aggressive revision.
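Concretely, per-table storage parameters can override the conservative global defaults for hot tables. The table name and values below are illustrative; the parameters are standard PostgreSQL autovacuum settings, to be tuned against observed dead-tuple counts in pg_stat_user_tables:

```sql
-- Per-table autovacuum tuning for a hot, frequently updated table.
-- Table name and values are illustrative.
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows
  autovacuum_vacuum_threshold    = 1000,  -- ...plus this absolute floor
  autovacuum_vacuum_cost_delay   = 2      -- ms; lower = more aggressive
);
```

The default `autovacuum_vacuum_scale_factor` of 0.2 means a 100-million-row table accumulates 20 million dead rows before vacuum even starts — exactly the kind of default that needs revision at scale.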


Read Replicas: Not All Created Equal

A crucial pattern adopted by OpenAI: traffic segregation by priority.

Not all queries are created equal. A user waiting for a real-time response has different priority than an analytics job.

Type           Replicas            Characteristics
-------------  ------------------  --------------------------------
High-priority  Dedicated replicas  Minimal latency, no interference
Low-priority   Shared replicas     Analytics, batch jobs, reports
Long-running   Isolated replicas   Queries > 1 s moved here

This prevents a 30-second analytics query from blocking real-time reads. Simple, but too many teams put everything on the same replicas.
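The routing decision can be sketched in a few lines, mirroring the segregation table above. Pool names and replica identifiers are placeholders, not OpenAI's topology:

```python
import itertools

# Placeholder replica pools, one per priority class.
POOLS = {
    "high": ["replica-high-1", "replica-high-2"],  # latency-critical reads
    "low":  ["replica-low-1", "replica-low-2"],    # analytics, batch, reports
    "long": ["replica-long-1"],                    # anything expected > 1 s
}

_rr = itertools.count()  # trivial shared round-robin cursor

def pick_replica(priority, expected_seconds):
    """Route a read to the pool for its priority class. Any query
    expected to exceed 1 s is isolated regardless of declared priority,
    so it can never sit in front of real-time reads."""
    if expected_seconds > 1.0:
        pool = POOLS["long"]
    else:
        pool = POOLS.get(priority, POOLS["low"])
    return pool[next(_rr) % len(pool)]
```

The point is the policy, not the round-robin: the expected-runtime check overrides the caller's declared priority, which is what actually protects the high-priority pool.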


Schema Migration: The Minefield

Production migrations on high-traffic databases are where things break. OpenAI's approach:

  • Lightweight operations only: no full table rewrites in production
  • 5-second DDL timeout: if it doesn't complete in 5s, abort
  • Indexes always CONCURRENTLY: never block reads for an index build
  • Slow queries moved to replicas: queries > 1s get migrated to avoid blocking migrations

Never run ALTER TABLE ... ADD COLUMN ... DEFAULT ... with a table rewrite on a database with millions of rows in production. Always ADD COLUMN nullable, then backfill separately.
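The rules above translate into DDL along these lines — identifiers are illustrative, and `lock_timeout` is the standard PostgreSQL setting that bounds how long DDL may wait on a lock:

```sql
-- Bound lock waits: DDL that can't acquire its lock in 5 s aborts
-- instead of queueing behind (and blocking) other sessions.
SET lock_timeout = '5s';

-- 1. Lightweight, nullable column add: metadata-only, no rewrite.
ALTER TABLE orders ADD COLUMN notes text;

-- 2. Index build without blocking reads or writes.
CREATE INDEX CONCURRENTLY idx_orders_notes ON orders (notes);

-- 3. Backfill separately, in small batches, rate-controlled by the
--    application (repeat until no rows remain):
UPDATE orders SET notes = ''
WHERE id IN (SELECT id FROM orders WHERE notes IS NULL LIMIT 1000);
```

Note that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so migration tooling has to issue it as a standalone statement.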


Incidents and Lessons

The Redis Cascade Failure

A Redis outage caused a cascade collapse of the entire system. PostgreSQL wasn't the culprit, but the unmanaged dependency was.

Lesson: every external dependency is a single point of failure if you don't have circuit breakers and fallbacks.
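That lesson can be sketched as a minimal circuit breaker wrapping the dependency call; the thresholds and names are illustrative:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker for an external dependency such as a
    cache. After `threshold` consecutive failures the circuit opens and
    calls fail fast to the fallback until `cooldown` seconds elapse."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()       # open: don't even touch the dependency
            self.opened_at = None       # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

The fallback for a cache outage is usually "go to the database with a degraded rate limit" — slower, but not a cascade.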

The WALSender Bug

A bug where high CPU triggered a spin-loop in WALSender, preventing WAL transmission to replicas. The lag persisted even after CPU normalized.

Lesson: replication lag monitoring isn't optional. And you need alerting on anomalies, not just thresholds.
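Anomaly-based alerting can be sketched as a rolling z-score over lag samples (in PostgreSQL 10+, byte lag per replica can be read from pg_stat_replication via `pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)`). The window size and z-limit below are illustrative:

```python
import statistics
from collections import deque

class LagAnomalyDetector:
    """Flag replication-lag *anomalies*, not just fixed thresholds: a
    lag that is normal at peak traffic may be alarming at 3 a.m. A
    sample is anomalous when it sits more than `z_limit` standard
    deviations above the recent rolling mean (one-sided: only lag
    increases are alarming)."""

    def __init__(self, window=60, z_limit=3.0):
        self.samples = deque(maxlen=window)
        self.z_limit = z_limit

    def observe(self, lag_bytes):
        anomalous = False
        if len(self.samples) >= 10:          # need some history first
            mean = statistics.fmean(self.samples)
            std = statistics.pstdev(self.samples)
            if std > 0 and (lag_bytes - mean) / std > self.z_limit:
                anomalous = True
        self.samples.append(lag_bytes)
        return anomalous
```

A detector like this would have caught the WALSender incident: lag that stays elevated after CPU normalizes is exactly the anomaly a static threshold tuned for peak load can miss.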


What PostgreSQL Is Missing (According to OpenAI)

These are real feature requests OpenAI has brought to the PostgreSQL community:

  1. Index disabling: ability to mark an index as invalid without dropping it, monitor the impact, then decide
  2. Latency percentiles: native P95/P99 in pg_stat_statements, not just averages
  3. DDL history tracking: a schema change history accessible via query
  4. Wait event semantics: sessions "active" with wait event "ClientRead" for hours — confusing semantics
  5. Heuristic defaults: auto-tuned parameters based on detected CPU/RAM/disk

These aren't complaints. They're concrete feedback from someone operating PostgreSQL at a scale few reach. If you work on the PostgreSQL ecosystem, these are the areas where contribution has the highest impact.


My Takeaways

PostgreSQL Is Enough. Almost Always.

The OpenAI case demolishes the argument that "PostgreSQL doesn't scale." If it scales for 800 million ChatGPT users, it scales for your project. The problem is never PostgreSQL — it's how you use it.

"The best technology is boring technology. PostgreSQL is gloriously boring. And that's why it works."

Complexity Is a Choice, Not a Requirement

OpenAI could have chosen a custom distributed database. They could have sharded from day one. Instead they chose the simplest approach that could work and optimized from there.

That's engineering. Not the newest technology, but the simplest solution that solves the problem.

Connection Pooling Is Not Optional

If there's one thing to take away from this article: PgBouncer in production. Always. The difference between 50ms and 5ms isn't an optimization — it's a category change.

Monitoring Decides Everything

OpenAI had a single Sev0 incident attributable to PostgreSQL in nine months. Not because PostgreSQL is magic, but because they invested in observability: replication lag, query performance, connection states, WAL volume.

You can't optimize what you don't measure.

Sharding Is the Last Resort

Too many architectures start with sharding "just in case." OpenAI demonstrates that the cost of sharding's operational complexity almost always outweighs the benefit, up to scales that 99.99% of projects will never reach.


Operational Takeaways

For those scaling PostgreSQL today:

Priority  Action                                        Impact
--------  --------------------------------------------  -------------------------------
1         Introduce PgBouncer                           10x latency reduction
2         Segregate traffic on dedicated replicas       Eliminate workload interference
3         Aggressive autovacuum tuning                  Prevent bloat and degradation
4         Timeouts on DDL and queries                   Prevent lock chains
5         Monitor replication lag and WAL               Early warning on degradation
6         Lazy writes for spike control                 Smooth write traffic
7         Shard only when everything else isn't enough  Complexity as last resort
