The presentation will take place in Ballroom H on Thursday, March 5, 2026, from 14:30 to 15:30.
The PostgreSQL query planner is often viewed as a "black box," but its decisions rely heavily on cardinality estimation: predicting how many rows each step of a plan will produce. By default, this estimation assumes that column values are independent of one another. That assumption breaks down in many real schemas, leading to severe row misestimates, poor plan choices, and slow queries.
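To make the failure concrete, here is a small Python sketch (an illustration, not PostgreSQL code) of the independence assumption on a hypothetical `city`/`zip` table in which every zip code belongs to exactly one city. Multiplying the per-column selectivities, which is what the planner does by default for ANDed equality clauses such as `city = ... AND zip = ...`, underestimates the true row count by 10x here. The table, column names, and the `selectivity` helper are all hypothetical.

```python
def selectivity(rows, col, value):
    """Fraction of rows whose column `col` equals `value`."""
    return sum(1 for r in rows if r[col] == value) / len(rows)

# Hypothetical table: 10 cities x 10 zip codes x 10 rows = 1,000 rows.
# Each zip code appears in exactly one city, so the columns are correlated.
rows = [
    {"city": f"city{c}", "zip": f"{c:02d}{z:02d}"}
    for c in range(10)
    for z in range(10)
    for _ in range(10)
]

p_city = selectivity(rows, "city", "city3")  # 100 / 1000 = 0.1
p_zip = selectivity(rows, "zip", "0305")     # 10 / 1000 = 0.01

# Independence assumption: multiply the per-column selectivities.
independent_estimate = p_city * p_zip * len(rows)

# Ground truth: count rows matching both predicates.
actual_rows = sum(1 for r in rows if r["city"] == "city3" and r["zip"] == "0305")

print(round(independent_estimate), actual_rows)  # 1 10
```

The estimate of about 1 row versus the actual 10 is the kind of gap that compounds as more correlated predicates are ANDed together.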
This session takes a deep dive into the mathematics behind PostgreSQL's cardinality estimation.
We will cover:
- The Independence Assumption: The standard probability formulas PostgreSQL uses and exactly where they fail with correlated data.
- The Math of Extended Stats: How ANALYZE computes the statistics defined by CREATE STATISTICS objects (functional dependencies, n-distinct counts, and multivariate MCV lists) and how the planner applies them to derive row estimates.
- Performance Impact: Benchmark results showing which kinds of queries benefit from extended statistics.
- Join Statistics: An introduction to a proof-of-concept for statistics on joins, analyzing its key implementation decisions and their impact on autovacuum performance.
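As a simplified illustration of how a functional-dependency statistic changes the estimate, the Python sketch below (hypothetical helper names, not PostgreSQL's C implementation) estimates a dependency degree f from a sample and then combines two equality selectivities as P(a ∧ b) = P(a) · (f + (1 − f) · P(b)). As we understand it, this is the combination formula PostgreSQL applies for a dependency a ⇒ b. The degree computation is a single-column simplification: it counts the fraction of rows that fall in groups of `a` whose rows all share one value of `b`. With f = 1 the estimate collapses to P(a). With f = 0 it falls back to the independence product.

```python
from collections import defaultdict

def selectivity(rows, col, value):
    """Fraction of rows whose column `col` equals `value`."""
    return sum(1 for r in rows if r[col] == value) / len(rows)

def dependency_degree(rows, a, b):
    """Simplified degree of the dependency a => b, in [0.0, 1.0].

    A group of rows sharing one value of `a` supports the dependency when
    all of its rows have a single value of `b`; the degree is the fraction
    of rows that belong to supporting groups.
    """
    b_values = defaultdict(set)   # distinct b values seen per a value
    group_size = defaultdict(int) # number of rows per a value
    for r in rows:
        b_values[r[a]].add(r[b])
        group_size[r[a]] += 1
    supporting = sum(group_size[k] for k, vals in b_values.items() if len(vals) == 1)
    return supporting / len(rows)

def dependent_selectivity(p_a, p_b, f):
    """P(a AND b) = P(a) * (f + (1 - f) * P(b)) for a dependency a => b of degree f."""
    return p_a * (f + (1 - f) * p_b)

# Hypothetical city/zip table: 1,000 rows, each zip code in exactly one city.
rows = [
    {"city": f"city{c}", "zip": f"{c:02d}{z:02d}"}
    for c in range(10)
    for z in range(10)
    for _ in range(10)
]

f = dependency_degree(rows, "zip", "city")       # 1.0: zip determines city
f_rev = dependency_degree(rows, "city", "zip")   # 0.0: a city has many zips
p_zip = selectivity(rows, "zip", "0305")         # 0.01
p_city = selectivity(rows, "city", "city3")      # 0.1

estimate = dependent_selectivity(p_zip, p_city, f) * len(rows)
print(f, f_rev, round(estimate))  # 1.0 0.0 10
```

Note the direction matters: zip ⇒ city holds perfectly, while city ⇒ zip does not hold at all, so the statistic must be applied with the determining column as `a`.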



