The PostgreSQL Query Planner Robert Haas PostgreSQL East 2010.

Why Does My Query Need a Plan? ● SQL is a declarative language. ● In other words, a SQL query is not a program. ● No control flow statements (e.g. for, while) and no way to control order of operations. ● SQL describes results, not process.

Why Didn't The Planner Do It My Way? ● Maybe your way is actually slower, or ● Maybe you gave the planner bad information, or ● Maybe the query planner really did goof. ● Related question: – How do I force the planner to use my index?

Query Planning ● Make queries run fast. – Minimize disk I/O. – Prefer sequential I/O to random I/O. – Minimize CPU processing. ● Don't use too much memory in the process. ● Deliver correct results.

Query Planner Decisions ● Access strategy for each table. – Sequential Scan, Index Scan, Bitmap Index Scan. ● Join strategy. – Join order. – Join strategy: nested loop, merge join, hash join. – Inner vs. outer. ● Aggregation strategy. – Plain, sorted, hashed.

Table Access Strategies ● Sequential Scan (Seq Scan) – Read every row in the table. ● Index Scan or Bitmap Index Scan – Read only part of the table by using the index to skip uninteresting parts. – Index scan reads index and table in alternation. – Bitmap index scan reads index first, populating bitmap, and then reads table in sequential order.

Sequential Scan ● Always works – no need to create indices in advance. ● Doesn't require reading the index, which has both I/O and CPU cost. ● Best way to access very small tables. ● Usually the best way to access all or nearly all of the rows in a table.

Index Scan ● Potentially huge performance gain when reading only a small fraction of rows in a large table. ● Only table access method that can return rows in sorted order – very useful in combination with LIMIT. ● Random I/O against base table!

Bitmap Index Scan ● Scans all index rows before examining base table, populating a TID bitmap. ● Table I/O is sequential, with skips; results in physical order. ● Can efficiently combine data from multiple indices – TID bitmap can handle boolean AND and OR operations. ● Handles LIMIT poorly.
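As a rough illustration of the TID bitmap idea, the Python sketch below uses plain sets of (page, offset) pairs in place of a real bitmap. The index contents and the helper names `bitmap_scan` and `heap_fetch_order` are made up for this example and are not PostgreSQL internals.

```python
# Hypothetical index contents: each index maps a key to the TIDs
# (page number, offset within page) of the matching table rows.
index_on_a = {1: {(0, 1), (0, 3), (2, 2)}, 2: {(1, 1)}}
index_on_b = {1: {(0, 3), (1, 1), (2, 2)}, 2: {(0, 1)}}

def bitmap_scan(index, key):
    """Read the index first, collecting all matching TIDs into a 'bitmap'."""
    return set(index.get(key, ()))

def heap_fetch_order(bitmap):
    """Visit the table in physical order: by page, then by offset."""
    return sorted(bitmap)

# a = 1 AND b = 1: intersect the two bitmaps before touching the table.
both = bitmap_scan(index_on_a, 1) & bitmap_scan(index_on_b, 1)
# a = 1 OR b = 1: union the two bitmaps.
either = bitmap_scan(index_on_a, 1) | bitmap_scan(index_on_b, 1)

print(heap_fetch_order(both))
print(heap_fetch_order(either))
```

Because the whole bitmap has to be built before the first table row is fetched, this shape also hints at why a small LIMIT does not help much here.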

Join Planning ● Fixing the join order and join strategy is the “hard part” of query planning. ● The number of possibilities grows exponentially with the number of tables. ● When search space is small, planner does a nearly exhaustive search. ● When search space is too large, planner uses heuristics or GEQO to limit planning time and memory usage.
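To get a feel for how quickly the search space grows, the following Python sketch counts join orders for n tables using standard combinatorics: n! left-deep orders, and (2n-2)!/(n-1)! orders when bushy join trees are allowed. It ignores the choice of join strategy and any restrictions from outer joins, and it says nothing about how the planner actually enumerates candidates.

```python
from math import factorial

def left_deep_orders(n):
    """Number of left-deep join orders for n tables: n!."""
    return factorial(n)

def bushy_orders(n):
    """Number of ordered binary join trees over n tables: (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```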

Join Strategies ● Nested loop. ● Nested loop with inner index-scan. ● Merge join. ● Hash join. ● Each join strategy takes an “outer” relation and an “inner” relation and produces a result relation.

Nested Loop Pseudocode
for (each outer tuple)
    for (each inner tuple)
        if (join condition is met)
            emit result row;
● Outer or inner loop could be scanning output of some other join, or a base table. Base table scan could be using an index. ● Cost is roughly proportional to product of table sizes – bad if BOTH are large.
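The pseudocode translates almost directly into Python. This is only a sketch: rows are plain dicts, the sample tables `foo` and `bar` are invented, and the inner input must be something that can be iterated repeatedly (a list here).

```python
def nested_loop_join(outer, inner, condition):
    """Yield (outer_row, inner_row) for every pair meeting the join condition."""
    for o in outer:
        # The inner input is rescanned once per outer row.
        for i in inner:
            if condition(o, i):
                yield (o, i)

# Invented sample data.
foo = [{"x": 1}, {"x": 2}, {"x": 3}]
bar = [{"x": 2}, {"x": 3}, {"x": 3}]

rows = list(nested_loop_join(foo, bar, lambda f, b: f["x"] == b["x"]))
print(rows)
```

The condition is evaluated len(foo) * len(bar) times, which is the "product of table sizes" cost mentioned above.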

Nested Loop Example #1
SELECT * FROM foo, bar WHERE foo.x = bar.x
Nested Loop
  Join Filter: (foo.x = bar.x)
  -> Seq Scan on bar
  -> Materialize
        -> Seq Scan on foo
This might be very slow!

Nested Loop Example #2
SELECT * FROM foo, bar WHERE foo.x = bar.x
Nested Loop
  -> Seq Scan on foo
  -> Index Scan using bar_pkey on bar
        Index Cond: (bar.x = foo.x)
Nested loop with inner index-scan! Much better... though probably still not the best plan.

Merge Join ● Only handles equality joins – something like a.x = b.x. ● Put both input relations into sorted order (using sort or index scan) and scan through the two in parallel, matching up equal values. ● Normally visits each input tuple only once, but may need to “rescan” portions of the inner input if there are duplicate values in the outer input. – Take OUTER={1 2 2 3} and INNER={2 2 3 4}
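A small Python sketch of the merge step, run on the OUTER and INNER values from the slide, shows where the rescan comes from. It assumes both inputs are already sorted lists of join keys; the function name `merge_join` and the mark-and-rescan bookkeeping are a simplification for illustration, not the executor's actual code.

```python
def merge_join(outer, inner):
    """Join two sorted lists of keys on equality, returning (outer, inner) pairs."""
    result = []
    i = 0  # current "mark" in the inner input
    for o in outer:
        # Advance the mark past inner values smaller than the outer value.
        while i < len(inner) and inner[i] < o:
            i += 1
        # Emit every inner value equal to the outer value, starting at the mark.
        j = i
        while j < len(inner) and inner[j] == o:
            result.append((o, inner[j]))
            j += 1
        # The mark stays put, so a duplicate outer value rescans the same inner run.
    return result

pairs = merge_join([1, 2, 2, 3], [2, 2, 3, 4])
print(pairs)
```

Each of the two outer 2s matches both inner 2s, so the inner run of 2s is read twice; the 1 and the 4 match nothing.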

Merge Join Example
SELECT * FROM foo, bar WHERE foo.x = bar.x
Merge Join
  Merge Cond: (foo.x = bar.x)
  -> Sort
        Sort Key: foo.x
        -> Seq Scan on foo
  -> Materialize
        -> Sort
              Sort Key: bar.x
              -> Seq Scan on bar

Hash Join ● Like merge join, only handles equality joins. ● Hash each row from the inner relation to create a hash table. Then, hash each row from the outer relation and probe the hash table for matches. ● Very fast – but requires enough memory to store inner tuples. Can get around this using multiple “batches”. ● Not guaranteed to retain input ordering.
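The build and probe phases can be sketched in a few lines of Python. This version keeps the whole hash table in memory and does not model batches; the sample rows and the `key` callback are assumptions made for the example.

```python
from collections import defaultdict

def hash_join(outer, inner, key):
    """Build a hash table on the inner input, then probe it with each outer row."""
    table = defaultdict(list)
    for row in inner:                 # build phase
        table[key(row)].append(row)
    for row in outer:                 # probe phase
        for match in table.get(key(row), ()):
            yield (row, match)

# Invented sample data: (x, payload) tuples joined on x.
foo = [(1, "a"), (2, "b"), (3, "c")]
bar = [(2, "x"), (3, "y"), (3, "z")]

joined = list(hash_join(foo, bar, key=lambda r: r[0]))
print(joined)
```

Each input is read once, but the inner input has to fit in the hash table, which is the memory requirement mentioned above.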

Hash Join Example
SELECT * FROM foo, bar WHERE foo.x = bar.x
Hash Join
  Hash Cond: (foo.x = bar.x)
  -> Seq Scan on foo
  -> Hash
        -> Seq Scan on bar

Join Removal ● Upcoming 9.0 feature. ● Consider this query: ● SELECT p.id, p.name FROM projects p LEFT JOIN person pm ON p.project_manager_id = pm.id; ● If there is a unique index on person (id), then the join need not be performed at all. ● Common scenario when using views.

Join Reordering ● SELECT * FROM foo JOIN bar ON foo.x = bar.x JOIN baz ON foo.y = baz.y ● SELECT * FROM foo JOIN baz ON foo.y = baz.y JOIN bar ON foo.x = bar.x ● SELECT * FROM foo JOIN (bar JOIN baz ON true) ON foo.x = bar.x AND foo.y = baz.y

EXPLAIN Estimates
Hash Join (cost=8.28..404.52 rows=9000 width=118)
  Hash Cond: (foo.x = bar.x)
  -> Hash Join (cost=3.02..275.52 rows=9000 width=12)
        Hash Cond: (foo.y = baz.y)
        -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8)
        -> Hash (cost=1.90..1.90 rows=90 width=4)
              -> Seq Scan on baz (cost=0.00..1.90 rows=90 width=4)
  -> Hash (cost=4.00..4.00 rows=100 width=106)
        -> Seq Scan on bar (cost=0.00..4.00 rows=100 width=106)

EXPLAIN ANALYZE
Hash Join (cost=8.28..404.52 rows=9000 width=118) (actual time=0.743..51.582 rows=9000 loops=1)
  Hash Cond: (foo.x = bar.x)
  -> Hash Join (cost=3.02..275.52 rows=9000 width=12) (actual time=0.368..30.964 rows=9000 loops=1)
        Hash Cond: (foo.y = baz.y)
        -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8) (actual time=0.021..9.908 rows=10000 loops=1)
        -> Hash (cost=1.90..1.90 rows=90 width=4) (actual time=0.280..0.280 rows=90 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 4kB
              -> Seq Scan on baz (cost=0.00..1.90 rows=90 width=4) (actual time=0.010..0.138 rows=90 loops=1)
  -> Hash (cost=4.00..4.00 rows=100 width=106) (actual time=0.354..0.354 rows=100 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 14kB
        -> Seq Scan on bar (cost=0.00..4.00 rows=100 width=106) (actual time=0.007..0.167 rows=100 loops=1)
Total runtime: 59.376 ms
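One practical use of EXPLAIN ANALYZE output is to compare the estimated row count of each node with the actual one. The Python sketch below does that for a single plan line in the text format shown above; the regular expression and the helper `row_estimate_ratio` are hypothetical conveniences written for this example, not part of any PostgreSQL tool.

```python
import re

# Matches the "(cost=... rows=N width=W) (actual time=... rows=M loops=L)"
# part of one plan line in the text format shown above.
NODE = re.compile(
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<est_rows>\d+) width=\d+\)"
    r" \(actual time=\d+\.\d+\.\.\d+\.\d+ rows=(?P<actual_rows>\d+) loops=\d+\)"
)

def row_estimate_ratio(line):
    """Return actual rows / estimated rows for a plan line, or None if not a node line."""
    m = NODE.search(line)
    if m is None:
        return None
    est = int(m["est_rows"])
    actual = int(m["actual_rows"])
    return actual / est if est else None

line = ("Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8) "
        "(actual time=0.021..9.908 rows=10000 loops=1)")
print(row_estimate_ratio(line))
```

A ratio far from 1 on some node is a hint that the statistics behind that estimate deserve a closer look.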

Not The Same Thing! ● SELECT * FROM (foo JOIN bar ON foo.x = bar.x) LEFT JOIN baz ON foo.y = baz.y ● SELECT * FROM (foo LEFT JOIN baz ON foo.y = baz.y) JOIN bar ON foo.x = bar.x

Review of Join Planning ● Join Order ● Join Strategy – Nested loop – Nested loop with inner index-scan – Merge join – Hash join – Join removal ● Inner vs. outer

Aggregates and DISTINCT ● Plain aggregate. – e.g. SELECT count(*) FROM foo; ● Sorted aggregate. – Sort the data (or use pre-sorted data); when you see a new value, aggregate the prior group. ● Hashed aggregate. – Insert each input row into a hash table based on the grouping columns; at the end, aggregate all the groups.
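The difference between the sorted and hashed strategies can be sketched in Python with a count per group. The function names and the sample input are invented for illustration; both functions compute the same counts by different routes.

```python
from itertools import groupby

def sorted_aggregate(rows, key):
    """Sort the input, then finish each group as soon as a new key value appears."""
    out = []
    for k, group in groupby(sorted(rows, key=key), key=key):
        out.append((k, sum(1 for _ in group)))
    return out

def hashed_aggregate(rows, key):
    """Keep one running count per group in a hash table; results come at the end."""
    counts = {}
    for row in rows:
        k = key(row)
        counts[k] = counts.get(k, 0) + 1
    return counts

rows = ["b", "a", "b", "c", "a", "b"]
print(sorted_aggregate(rows, key=lambda r: r))
print(hashed_aggregate(rows, key=lambda r: r))
```

The sorted version only holds one group at a time but pays for the sort (unless the input is already sorted); the hashed version skips the sort but keeps every group in memory until the end.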

Statistics ● All of the decisions discussed earlier in this talk are made using statistics. – Seq scan vs. index scan vs. bitmap index scan – Nested loop vs. merge join vs. hash join ● ANALYZE (manual or via autovacuum) gathers this information. ● You must have good statistics or you will get bad plans!

Confusing The Planner SELECT * FROM foo WHERE a = 1 AND b = 1 ● If 20% of the rows have a = 1 and 10% of the rows have b = 1, the planner will assume that 20% * 10% = 2% of the rows meet both criteria. ● SELECT * FROM foo WHERE (a + 0) = a ● Planner doesn't have a clue, so will assume 0.5% of rows will match.
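The independence assumption is easy to work through in Python. The table size of 10,000 rows and the perfectly correlated case (every row with b = 1 also has a = 1) are assumptions chosen for illustration; exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

def combined_selectivity(*selectivities):
    """Multiply per-condition selectivities, i.e. assume the conditions are independent."""
    result = Fraction(1)
    for s in selectivities:
        result *= s
    return result

total_rows = 10_000                      # hypothetical table size
sel_a = Fraction(20, 100)                # 20% of rows have a = 1
sel_b = Fraction(10, 100)                # 10% of rows have b = 1

estimated_rows = int(combined_selectivity(sel_a, sel_b) * total_rows)

# Assumed worst case: every row with b = 1 also has a = 1,
# so the true match rate is simply sel_b.
correlated_rows = int(sel_b * total_rows)

print(estimated_rows, correlated_rows, correlated_rows // estimated_rows)
```

Under that assumed correlation the estimate is off by a factor of five, which is the kind of error that can tip the planner toward the wrong scan or join strategy.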

What Could Go Wrong? ● If the planner underestimates the row count, it may choose an index scan instead of a sequential scan, or a nested loop instead of a hash or merge join. ● If the planner overestimates the row count, it may choose a sequential scan instead of an index scan, or a merge or hash join instead of a nested loop. ● Small values for LIMIT tilt the planner toward fast-start plans and magnify the effect of bad estimates.

Query Planner Parameters ● seq_page_cost (1.0), random_page_cost (4.0) – Reduce these costs to account for caching effects. If database is fully cached, try 0.005. ● default_statistics_target (10 or 100) – Level of detail for statistics gathering. Can also be overridden on a per-column basis. ● enable_hashjoin, enable_sort, etc. – Just for testing. ● work_mem – Amount of memory per sort or hash. ● from_collapse_limit, join_collapse_limit, geqo_threshold – Sometimes need to be raised, but be careful!

Things That Are Slow ● DISTINCT. ● PL/pgsql loops. FOR x IN SELECT... LOOP SELECT... END LOOP ● Repeated calls to SQL or PL/pgsql functions. SELECT id, some_function(id) FROM table;

Upcoming Features ● Join removal (right now just for LEFT joins). ● Machine-readable EXPLAIN output. ● Hash statistics. ● Better model for Materialize costs. ● Improved use of indices to handle MIN(x), MAX(x), and x IS NOT NULL.

Questions? Any questions?
