Helping the Planner Help You: Extended Statistics in PostgreSQL

The presentation will take place in Ballroom H on Thursday, March 5, 2026, from 14:30 to 15:30.

The PostgreSQL query planner is often viewed as a "black box," but its decisions rely heavily on cardinality estimation. By default, the planner assumes that columns are statistically independent, an assumption that breaks down in many real schemas and leads to badly wrong row estimates, poor plan choices, and slow queries.

This session takes a deep dive into the mathematics behind PostgreSQL's cardinality estimation.

We will cover:

The Independence Assumption: The standard probability formulas PostgreSQL uses and exactly where they fail with correlated data.
The Math of Extended Stats: How ANALYZE builds extended statistics (functional dependencies, n-distinct counts, and MCV lists) and how the planner uses them to derive row estimates.
Performance Impact: Benchmark results showing which queries benefit from extended statistics.
Join Statistics: An introduction to a proof-of-concept for statistics on joins, analyzing key implementation decisions and their performance impact on autovacuum.
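
To make the independence assumption concrete, here is a minimal Python sketch; it is an illustration, not PostgreSQL code. The table, the column names (city, state), and the helper functions are all hypothetical. It multiplies per-column selectivities the way an independence-based estimator would, compares the result to the true row count on perfectly correlated data, and then applies a functional-dependency correction of the form P(a,b) = P(a) * [f + (1 - f) * P(b)], which is our understanding of how PostgreSQL combines selectivities when a dependency of degree f is known.

```python
def selectivity(rows, col, val):
    """Fraction of rows where rows[col] == val (a single-column estimate)."""
    return sum(1 for r in rows if r[col] == val) / len(rows)

def estimate_independent(rows, preds):
    """Row estimate under the independence assumption:
    multiply the per-column selectivities, then scale by the table size."""
    sel = 1.0
    for col, val in preds.items():
        sel *= selectivity(rows, col, val)
    return sel * len(rows)

def actual_count(rows, preds):
    """True number of rows matching every predicate."""
    return sum(1 for r in rows if all(r[c] == v for c, v in preds.items()))

def estimate_with_dependency(p_a, p_b, f, n_rows):
    """Estimate using a functional dependency a => b of degree f (0..1).
    Assumed form: P(a, b) = P(a) * [f + (1 - f) * P(b)]."""
    return p_a * (f + (1.0 - f) * p_b) * n_rows

# Hypothetical table: 100 cities spread over 10 states.
# Each city belongs to exactly one state, so city fully determines state.
rows = [{"city": i % 100, "state": (i % 100) // 10} for i in range(10_000)]
preds = {"city": 42, "state": 4}  # city 42 really is in state 4

est = estimate_independent(rows, preds)
act = actual_count(rows, preds)
corrected = estimate_with_dependency(
    selectivity(rows, "city", 42),
    selectivity(rows, "state", 4),
    f=1.0,  # perfect dependency city => state in this synthetic data
    n_rows=len(rows),
)

print(f"independence estimate: {est:.0f} rows")
print(f"actual:                {act} rows")
print(f"with dependency f=1:   {corrected:.0f} rows")
```

On this synthetic data the independence estimate is low by a factor of ten, because the state predicate is redundant once the city is fixed; with f = 1 the correction ignores the redundant predicate and recovers the true count. Real data rarely has f exactly 1, which is where MCV lists and n-distinct counts come in.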