Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 C-Store: A Column-oriented DBMS By New England Database Group.

Similar presentations


Presentation on theme: "1 C-Store: A Column-oriented DBMS By New England Database Group."— Presentation transcript:

1 1 C-Store: A Column-oriented DBMS By New England Database Group

2 2 M.I.T Current DBMS Gold Standard Current DBMS Gold Standard  Store fields in one record contiguously on disk  Use B-tree indexing  Use small (e.g. 4K) disk blocks  Align fields on byte or word boundaries  Conventional (row-oriented) query optimizer and executor (technology from 1979)  Aries-style transactions  Store fields in one record contiguously on disk  Use B-tree indexing  Use small (e.g. 4K) disk blocks  Align fields on byte or word boundaries  Conventional (row-oriented) query optimizer and executor (technology from 1979)  Aries-style transactions

3 3 M.I.T Terminology -- “Row Store” Record 2 Record 4 Record 1 Record 3 E.g. DB2, Oracle, Sybase, SQLServer, …

4 4 M.I.T Row Stores are Write Optimized Row Stores are Write Optimized  Can insert and delete a record in one physical write  Good for on-line transaction processing (OLTP)  But not for read mostly applications  Data warehouses  CRM  Can insert and delete a record in one physical write  Good for on-line transaction processing (OLTP)  But not for read mostly applications  Data warehouses  CRM

5 5 M.I.T Elephants Have Extended Row Stores  With Bitmap indices  Better sequential read  Integration of “datacube” products  Materialized views  With Bitmap indices  Better sequential read  Integration of “datacube” products  Materialized views But there may be a better idea…….

6 6 M.I.T Column Stores

7 7 M.I.T At 100K Feet….  Ad-hoc queries read 2 columns out of 20  In a very large warehouse, Fact table is rarely clustered correctly  Column store reads 10% of what a row store reads  Ad-hoc queries read 2 columns out of 20  In a very large warehouse, Fact table is rarely clustered correctly  Column store reads 10% of what a row store reads

8 8 M.I.T C-Store (Column Store) Project  Brandeis/Brown/MIT/UMass-Boston project  Usual suspects participating  Enough coded to get performance numbers for some queries  Complete status later  Brandeis/Brown/MIT/UMass-Boston project  Usual suspects participating  Enough coded to get performance numbers for some queries  Complete status later

9 9 M.I.T We Build on Previous Pioneering Work….  Sybase IQ (early ’90s)  Monet (see CIDR ’05 for the most recent description)  Sybase IQ (early ’90s)  Monet (see CIDR ’05 for the most recent description)

10 10 M.I.T C-Store Technical Ideas  Code the columns to save space  No alignment  Big disk blocks  Only materialized views (perhaps many)  Focus on Sorting not indexing  Automatic physical DBMS design  Code the columns to save space  No alignment  Big disk blocks  Only materialized views (perhaps many)  Focus on Sorting not indexing  Automatic physical DBMS design

11 11 M.I.T C-store (Column Store) Technical Ideas  Optimize for grid computing  Innovative redundancy  Xacts – but no need for Mohan  Data ordered on anything, Not just time  Column optimizer and executor  Optimize for grid computing  Innovative redundancy  Xacts – but no need for Mohan  Data ordered on anything, Not just time  Column optimizer and executor

12 12 M.I.T How to Evaluate This Paper….  None of the ideas in isolation merit publication  Judge the complete system by its (hopefully intelligent) choice of  Small collection of inter-related powerful ideas  That together put performance in a new sandbox  None of the ideas in isolation merit publication  Judge the complete system by its (hopefully intelligent) choice of  Small collection of inter-related powerful ideas  That together put performance in a new sandbox

13 13 M.I.T Code the Columns  Work hard to shrink space  Use extra space for multiple orders  Fundamentally easier than in a row store  E.g. RLE works well  Work hard to shrink space  Use extra space for multiple orders  Fundamentally easier than in a row store  E.g. RLE works well

14 14 M.I.T No Alignment  Densepack columns  E.g. a 5 bit field takes 5 bits  Current CPU speed going up faster than disk bandwidth  Faster to shift data in CPU than to waste disk bandwidth  Densepack columns  E.g. a 5 bit field takes 5 bits  Current CPU speed going up faster than disk bandwidth  Faster to shift data in CPU than to waste disk bandwidth

15 15 M.I.T Big Disk Blocks  Tunable  Big (minimum size is 64K)  Tunable  Big (minimum size is 64K)

16 16 M.I.T Only Materialized Views  Projection (materialized view) is some number of columns from a fact table  Plus columns in a dimension table – with a 1-n join between Fact and Dimension table  Stored in order of a storage key(s)  Several may be stored!!!!!  With a permutation, if necessary, to map between them  Projection (materialized view) is some number of columns from a fact table  Plus columns in a dimension table – with a 1-n join between Fact and Dimension table  Stored in order of a storage key(s)  Several may be stored!!!!!  With a permutation, if necessary, to map between them

17 17 M.I.T Only Materialized Views  Table (as the user specified it and sees it) is not stored!  No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)  Table (as the user specified it and sees it) is not stored!  No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)

18 18 M.I.T Example User view: EMP (name, age, salary, dept) Dept (dname, floor) Possible set of MVs: MV-1 (name, dept, floor) in floor order MV-2 (salary, age) in age order MV-3 (dname, salary, name) in salary order

19 19 M.I.T Different Indexing Few valuesMany values Sequential RLE encoded Conventional B-tree at the value level Delta encoded Conventional B-tree at the block level Non sequentialBitmap per value Conventional Gzip Conventional B-tree at the block level

20 20 M.I.T Automatic Physical DBMS Design  Not enough 4-star wizards to go around  Accept a “training set” of queries and a space budget  Choose the MVs auto-magically  Re-optimize periodically based on a log of the interactions  Not enough 4-star wizards to go around  Accept a “training set” of queries and a space budget  Choose the MVs auto-magically  Re-optimize periodically based on a log of the interactions

21 21 M.I.T Optimize for Grid Computing  I.e. shared-nothing  Dewitt (Gamma) was right  Horizontal partitioning and intra-query parallelism as in Gamma  I.e. shared-nothing  Dewitt (Gamma) was right  Horizontal partitioning and intra-query parallelism as in Gamma

22 22 M.I.T Innovative Redundancy  Hardly any warehouse is recovered by a redo from the log  Takes too long!  Store enough MVs at enough places to ensure K-safety  Rebuild dead objects from elsewhere in the network  K-safety is a DBMS-design problem!  Hardly any warehouse is recovered by a redo from the log  Takes too long!  Store enough MVs at enough places to ensure K-safety  Rebuild dead objects from elsewhere in the network  K-safety is a DBMS-design problem!

23 23 M.I.T XACTS – No Mohan  Undo from a log (that does not need to be persistent)  Redo by rebuild from elsewhere in the network  Undo from a log (that does not need to be persistent)  Redo by rebuild from elsewhere in the network

24 24 M.I.T XACTS – No Mohan  Snapshot isolation (run queries as of a tunable time in the recent past)  To solve read-write conflicts  Distributed Xacts  Without a prepare message (no 2 phase commit)  Snapshot isolation (run queries as of a tunable time in the recent past)  To solve read-write conflicts  Distributed Xacts  Without a prepare message (no 2 phase commit)

25 25 M.I.T Storage (sort) Key(s) is not Necessarily Time  That would be too limiting  So how to do fast updates to densepack column storage that is not in entry sequence?  That would be too limiting  So how to do fast updates to densepack column storage that is not in entry sequence?

26 26 M.I.T Solution – a Hybrid Store Read-optimized Column store Write-optimized Column store Tuple mover (Much like Monet) (What we have been talking about so far) (Batch rebuilder)

27 27 M.I.T Column Executor  Column operations – not row operations  Columns remain coded – if possible  Late materialization of columns  Column operations – not row operations  Columns remain coded – if possible  Late materialization of columns

28 28 M.I.T Column Optimizer  Chooses MVs on which to run the query  Most important task  Build in snowflake schemas  Which are simple to optimize without exhaustive search  Looking at extensions  Chooses MVs on which to run the query  Most important task  Build in snowflake schemas  Which are simple to optimize without exhaustive search  Looking at extensions

29 29 M.I.T Current Performance  100X popular row store in 40% of the space  10X popular column store in 70% of the space  7X popular row store in 1/6 th of the space  Code available with BSD license  100X popular row store in 40% of the space  10X popular column store in 70% of the space  7X popular row store in 1/6 th of the space  Code available with BSD license

30 30 M.I.T Structure Going Forward  Vertica  Very well financed start-up to commercialize C-store  Doing the heavy lifting  University Research  Funded by Vertica  Vertica  Very well financed start-up to commercialize C-store  Doing the heavy lifting  University Research  Funded by Vertica

31 31 M.I.T Vertica  Complete alpha system in December ‘05  Everything, including DBMS designer  With current performance!  Looking for early customers to work with (see me if you are interested)  Complete alpha system in December ‘05  Everything, including DBMS designer  With current performance!  Looking for early customers to work with (see me if you are interested)

32 32 M.I.T University Research  Extension of algorithms to non-snowflake schemas  Study of L2 cache performance  Study of coding strategies  Study of executor options  Study of recovery tactics  Non-cursor interface  Study of optimizer primitives  Extension of algorithms to non-snowflake schemas  Study of L2 cache performance  Study of coding strategies  Study of executor options  Study of recovery tactics  Non-cursor interface  Study of optimizer primitives


Download ppt "1 C-Store: A Column-oriented DBMS By New England Database Group."

Similar presentations


Ads by Google