Query Execution in Main Memory DBMS


1 Query Execution in Main Memory DBMS

2 TPC-H Q1
> Scan
> Select 99%
> Group into 4 groups
> Aggregate 8 numbers
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= date ' ' - interval '90' day (3)
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

3 TPC-H Q1 100s 0.25s

4 TPC-H Q1 100s 96s 0.25s

5 Chain of Work
> MonetDB (1999)
> MonetDB X100 (2005)
> Hyper (2011)
How much faster?

6 Properties of RAM
> Volatile
> Expensive (100x HDD)
> Random access => how random? 64 bytes at a time
> Memory pages: physically addressed
> Address translation: complicated and expensive, but cacheable in a designated address cache
> Fast: how fast? 1600 MHz * 4 channels * 8 bytes ~ 50 GB/s
> 100x faster than disk
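The bandwidth estimate can be written out with units. This is only a worked version of the slide's arithmetic, assuming the 1600 MHz figure means 1600 million transfers per second and that each channel moves 8 bytes per transfer; it is a theoretical peak, not a measured number.

```latex
\[
1600 \times 10^{6}\ \frac{\text{transfers}}{\text{s}}
\times 4\ \text{channels}
\times 8\ \frac{\text{bytes}}{\text{transfer}}
= 51.2 \times 10^{9}\ \frac{\text{bytes}}{\text{s}}
\approx 50\ \text{GB/s}
\]
```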

7 Columnar Layout
> Reads less data
> No tuple header overhead
> Better cache utilization
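A minimal sketch of why a columnar layout reads less data, assuming a simplified lineitem with four 8-byte attributes. The names `LineitemRow`, `LineitemColumns`, and the helper functions are hypothetical and not taken from any of the systems discussed; a real row store would also carry a per-tuple header, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: every tuple carries all four attributes.
struct LineitemRow {
    int64_t quantity;
    int64_t extendedprice;
    int64_t discount;
    int64_t tax;
};

// Hypothetical column layout: one contiguous array per attribute.
struct LineitemColumns {
    std::vector<int64_t> quantity;
    std::vector<int64_t> extendedprice;
    std::vector<int64_t> discount;
    std::vector<int64_t> tax;
};

// Row store: summing one attribute still pulls whole tuples through the cache.
int64_t sum_quantity_rows(const std::vector<LineitemRow>& rows) {
    int64_t sum = 0;
    for (const LineitemRow& r : rows) sum += r.quantity;
    return sum;
}

// Column store: only the quantity array is touched.
int64_t sum_quantity_columns(const LineitemColumns& cols) {
    assert(cols.quantity.size() == cols.extendedprice.size());
    int64_t sum = 0;
    for (int64_t q : cols.quantity) sum += q;
    return sum;
}

// Bytes each variant has to scan to sum one attribute over n tuples.
std::size_t bytes_scanned_rows(std::size_t n) { return n * sizeof(LineitemRow); }
std::size_t bytes_scanned_columns(std::size_t n) { return n * sizeof(int64_t); }
```

With these assumed types, the row scan moves four times as many bytes as the column scan for the same sum, and the gap grows with the number of attributes the query does not use.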

8 Properties of CPU

9 Function calls in Postgres

10 Function calls are bad 5165ms 1104ms 227ms

11 TPC-H Q1 Profile

12 Solution
Elementary columnar operations
WHERE A < 5 AND B = 2
    int v[len]        // bitmap
    sel_lt(A, 5, v)
    sel_eq(B, 2, v)
> Operators are connected by materializing intermediate results as temporary tables.
> Significantly reduces the number of function calls.
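The two primitives above might look roughly like this. It is a sketch under assumptions: the slide does not say how `sel_eq` combines with the existing bitmap, so here `sel_lt` initializes it and `sel_eq` ANDs into it. The `Bitmap` alias and the wrapper `where_a_lt5_and_b_eq2` are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<uint8_t>;  // one byte per row: 1 = row qualifies

// v[i] = (col[i] < val): initializes the bitmap for the whole column.
void sel_lt(const std::vector<int>& col, int val, Bitmap& v) {
    v.resize(col.size());
    for (std::size_t i = 0; i < col.size(); ++i) v[i] = col[i] < val;
}

// v[i] &= (col[i] == val): ANDs a second predicate into the bitmap (assumed semantics).
void sel_eq(const std::vector<int>& col, int val, Bitmap& v) {
    assert(v.size() == col.size());
    for (std::size_t i = 0; i < col.size(); ++i) v[i] &= (col[i] == val);
}

// WHERE A < 5 AND B = 2 as two full-column passes: one function call per
// operator instead of one call per tuple.
Bitmap where_a_lt5_and_b_eq2(const std::vector<int>& A, const std::vector<int>& B) {
    Bitmap v;
    sel_lt(A, 5, v);
    sel_eq(B, 2, v);
    return v;
}
```

Each primitive is a tight loop over plain arrays, so the per-tuple interpretation overhead disappears; the price is that `v` is materialized for the entire column between the two calls.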

13 In MonetDB
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= date ' ' - interval '90' day (3)
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

14 Branching is bad Select X from table if X > 5

15 Predication
Branching form: if input[i] > 5: output[j++] = input[i]
Predicated form: output[j] = input[i]; j += (input[i] > 5)
> Transforms a control dependency into a data dependency
> Pro: does not cause pipeline bubbles
> Con: writes additional data
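A small sketch contrasting the branching selection with a predicated one. The function names are hypothetical, and whether the compiler actually emits a branch for either loop depends on the compiler and flags, so treat this as an illustration of the idea rather than a guarantee about the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branching: the store and the increment of j sit behind a conditional jump.
std::size_t select_gt_branching(const std::vector<int>& input, int bound,
                                std::vector<int>& output) {
    output.resize(input.size());
    std::size_t j = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] > bound) output[j++] = input[i];
    }
    return j;  // number of selected values
}

// Predicated: always store, then advance j by the comparison result (0 or 1).
// The control dependency becomes a data dependency on j.
std::size_t select_gt_predicated(const std::vector<int>& input, int bound,
                                 std::vector<int>& output) {
    output.resize(input.size());
    std::size_t j = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        assert(j <= i);        // j never runs ahead of i, so output[j] is in bounds
        output[j] = input[i];  // the extra write: happens even for rejected values
        j += (input[i] > bound);
    }
    return j;  // number of selected values
}
```

Both functions return the count of qualifying values and leave them in the first `j` slots of `output`; the predicated version pays for its branch-free loop with one store per input element.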

16 Is this it? No!

17 Vectorized Execution
> MonetDB falls short because of the cost of materializing intermediate results
> Instead of operating on a column at a time, operate on a vector at a time

18 Example
For op Pos.SELECT:
> Without vectorized execution: would read the entire sym column and generate the entire position bitmap
> With vectorized execution: would read ~1k entries at a time and run them through the pipeline
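The difference can be sketched with two primitives run either over the whole column or over ~1k-entry vectors. Everything here is an assumption made for illustration: the integer-coded `sym` column, the `price` column, the sum as the downstream operator, and all function names are hypothetical and not taken from the slide's example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kVectorSize = 1024;  // ~1k entries per vector

// Primitive 1: mark the entries of one run of values that equal val.
void select_eq(const int* col, std::size_t n, int val, uint8_t* bitmap) {
    for (std::size_t i = 0; i < n; ++i) bitmap[i] = (col[i] == val);
}

// Primitive 2: sum the values whose bitmap entry is set.
int64_t sum_selected(const int* values, const uint8_t* bitmap, std::size_t n) {
    int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += bitmap[i] ? values[i] : 0;
    return sum;
}

// Column at a time: the bitmap for the entire column is materialized
// before the next operator starts.
int64_t sum_where_eq_column_at_a_time(const std::vector<int>& sym,
                                      const std::vector<int>& price, int val) {
    assert(sym.size() == price.size());
    std::vector<uint8_t> bitmap(sym.size());
    select_eq(sym.data(), sym.size(), val, bitmap.data());
    return sum_selected(price.data(), bitmap.data(), sym.size());
}

// Vector at a time: the same primitives, but each 1024-entry slice runs
// through the whole pipeline while its small bitmap is still in cache.
int64_t sum_where_eq_vectorized(const std::vector<int>& sym,
                                const std::vector<int>& price, int val) {
    assert(sym.size() == price.size());
    uint8_t bitmap[kVectorSize];
    int64_t sum = 0;
    for (std::size_t start = 0; start < sym.size(); start += kVectorSize) {
        const std::size_t n = std::min(kVectorSize, sym.size() - start);
        select_eq(sym.data() + start, n, val, bitmap);
        sum += sum_selected(price.data() + start, bitmap, n);
    }
    return sum;
}
```

Both variants make few function calls relative to the number of tuples, but the vectorized one keeps its intermediate result at a fixed 1 KB instead of one byte per row of the table.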

19 Now

20 The Gap
SELECT X FROM table WHERE X > 5 AND X < 10;
In C++:
    for (i = 0; i < size; i++)
        if (x[i] > 5 && x[i] < 10)
            output[j++] = x[i]
In X100:
    for (j = 0; j < size; j += 1024)
        sel_col_lt_init(&x[j], b, 10)
        sel_col_gt_and(&x[j], b, 5)
        ret = gather(output, b, ret)

    void sel_col_gt_and(col, bitmap, val)
        for (i = 0; i < 1024; i++)
            bitmap[i] = bitmap[i] && (col[i] > val)

21 Query Compilation

22 LLVM

23 Why LLVM

24 Voila 96s 0.41s 0.25s

25 Almost Done 

26 Paper Stack MonetDB () MonetDB X100 ( Hyper ( generation.pdf)

