Cloud Computing Lecture Column Store – alternative organization for big relational data.


1 Cloud Computing Lecture Column Store – alternative organization for big relational data

2 C-store
- C-store is read-optimized, for OLAP-type applications
- A traditional DBMS is write-optimized (optimized for online transactions)
  - Based on records (rows)

3 C-Store
- What are the major cost-sensitive factors in query processing?
  - Size of the database
  - Indexed or not
  - Joins
- Current hardware configuration and what a DBMS can do with it
  - Cheap storage – allows distributed, redundant data stores
  - Fast CPUs – compression/decompression
  - Limited disk bandwidth – reduce I/O

4 C-store
- Supporting OLAP (online analytical processing) operations
  - Optimized read operations
  - Balanced write performance
  - Address the conflict between writes and reads
    - Fast write – append records
    - Fast read – indexed, compressed
- Think: if data is organized in columns, what are the unique challenges (different from the row organization)?

5 C-store’s features
- Column-based store saves space
  - Compression is possible
  - Index size is smaller
- Multiple projections
  - Allow multiple indices
  - Parallel processing on the same attributes
  - Materialized join results
- Separation of writeable store and read-optimized store
  - Both writes and reads are optimized
  - Transactions are not blocked by write locks

6 Data model
- Same as relational data model
  - Tables, rows, columns
  - Primary keys and foreign keys
  - Projections
    - From a single table
    - From multiple joined tables
- Example
  - Normal relational model:
    EMP(name, age, dept, salary)
    DEPT(dname, floor)
  - Possible C-store model:
    EMP1(name, age)
    EMP2(dept, age, DEPT.floor)
    EMP3(name, salary)
    DEPT1(dname, floor)
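The example schemas above can be sketched in Python as follows. This is a minimal illustration, not C-store's implementation: the sample rows, the `project` helper, and the choice of sort keys are all assumptions made for the sketch; the slide gives only the schemas.

```python
# Hypothetical sample rows; the slide gives only the schemas.
EMP = [
    {"name": "Ann", "age": 30, "dept": "Sales", "salary": 50},
    {"name": "Bob", "age": 25, "dept": "Eng", "salary": 70},
    {"name": "Cat", "age": 25, "dept": "Sales", "salary": 60},
]
DEPT = [{"dname": "Sales", "floor": 1}, {"dname": "Eng", "floor": 2}]


def project(rows, columns, sort_key):
    """Return a projection as a dict of column lists, ordered by sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {c: [r[c] for r in ordered] for c in columns}


# EMP1(name, age): a projection of a single table (sort key is an assumption).
emp1 = project(EMP, ["name", "age"], sort_key="age")

# EMP2(dept, age, DEPT.floor): a projection over the join of EMP and DEPT.
floor_of = {d["dname"]: d["floor"] for d in DEPT}
joined = [dict(r, **{"DEPT.floor": floor_of[r["dept"]]}) for r in EMP]
emp2 = project(joined, ["dept", "age", "DEPT.floor"], sort_key="dept")
```

Each projection stores its columns separately and in its own order, so the same logical column (here `age`) may exist in several orderings.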

7 Physical projection organization
- Sort key
  - Each projection has one
  - Rows are ordered by sort key
  - Partitioned into segments by key range
- Linking columns in the same projection
  - Storage key – (segment id, key), where the key is the offset in the segment
- Linking projections
  - To reconstruct a table
  - Join index
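A rough sketch of how a join index can link two projections of the same table. The sample rows, the hidden `id` used to build the index, and the fixed `SEGMENT_SIZE` are assumptions for illustration (the slide partitions segments by key range, which this sketch simplifies to fixed-size chunks).

```python
SEGMENT_SIZE = 2  # hypothetical; stands in for partitioning by key range

# Hypothetical rows; "id" exists only to build the join index in this sketch.
ROWS = [
    {"id": 0, "name": "Ann", "age": 30, "salary": 50},
    {"id": 1, "name": "Bob", "age": 25, "salary": 70},
    {"id": 2, "name": "Cat", "age": 27, "salary": 60},
]


def segment(rows, columns, sort_key):
    """Order rows by sort_key, then cut each column into segments."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    segments = [
        {c: [r[c] for r in ordered[i:i + SEGMENT_SIZE]] for c in columns}
        for i in range(0, len(ordered), SEGMENT_SIZE)
    ]
    return ordered, segments


order1, p1 = segment(ROWS, ["name", "age"], "age")        # projection 1
order2, p2 = segment(ROWS, ["name", "salary"], "salary")  # projection 2

# Join index: for each row of projection 1 (in its own order), the
# (segment id, offset) of the matching row in projection 2.
where_in_p2 = {r["id"]: divmod(i, SEGMENT_SIZE) for i, r in enumerate(order2)}
join_index = [where_in_p2[r["id"]] for r in order1]


def reconstruct():
    """Rebuild (name, age, salary) rows by following the join index."""
    out = []
    for pos, (seg_id, offset) in enumerate(join_index):
        s1, o1 = divmod(pos, SEGMENT_SIZE)  # storage key within projection 1
        out.append((p1[s1]["name"][o1], p1[s1]["age"][o1],
                    p2[seg_id]["salary"][offset]))
    return out
```

Within one projection, the same (segment id, offset) pair addresses every column, which is why no per-column index is needed to line columns up.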

8 Conceptual organization
[Figure: Projection 1 and Projection 2. Labels: column; sort key column; segment (by sort key range); join index with (seg id, offset) entries.]

9 Architectural consideration between writes and reads
- Read often: needs indices to speed up reads
- Write often: index-unfriendly, because indices must be updated frequently
- Use a “read store” and a “write store”

10 Read store: Column encoding
- Use compression schemes and indices
  - Self-order (key), few distinct values
    - (value, position, # items)
    - Indexed by clustered B-tree
  - Foreign-order (non-key), few distinct values
    - (value, bitmap index)
    - B-tree index: position → values
  - Self-order, many distinct values
    - Delta from the previous value
    - B-tree index
  - Foreign-order, many distinct values
    - Unencoded
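The first three encodings can be sketched as below. This is a simplified illustration: the function names are hypothetical, positions are 0-based by assumption, and the B-tree indices mentioned on the slide are left out.

```python
def rle_encode(column):
    """Self-order, few distinct values: (value, first position, # items)."""
    runs = []
    for pos, v in enumerate(column):
        if runs and runs[-1][0] == v:
            value, start, count = runs[-1]
            runs[-1] = (value, start, count + 1)
        else:
            runs.append((v, pos, 1))
    return runs


def bitmap_encode(column):
    """Foreign-order, few distinct values: one bitmap per distinct value."""
    bitmaps = {}
    for pos, v in enumerate(column):
        bitmaps.setdefault(v, [0] * len(column))[pos] = 1
    return bitmaps


def delta_encode(column):
    """Self-order, many distinct values: store the delta from the previous value."""
    if not column:
        return []
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]
```

For example, `rle_encode([1, 1, 1, 2, 2, 5])` yields three triples for six values, and `delta_encode([100, 103, 104, 110])` keeps the first value and then only small differences.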

11 Write Store
- Same structure as the read store (RS), but explicitly uses (segment, key) to identify records
  - Easier to maintain the mapping
  - Only concerns the inserted records
- Tuple mover
  - Copies batches of records to RS
- Delete record
  - Mark it on RS
  - Purged by tuple mover

12 Tuple mover
- Moves records from the write store (WS) to the read store (RS)
- Happens between read-only transactions
- Uses a merge-out process
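A toy sketch of the write store, delete marks, and a merge-based tuple mover. The `Store` class and its method names are hypothetical, and records are plain (key, value) tuples; the real system works per column and per segment.

```python
import heapq


class Store:
    """Toy model: sorted read store (RS) plus append-only write store (WS)."""

    def __init__(self):
        self.rs = []          # read store, kept sorted by key
        self.ws = []          # write store, in arrival order
        self.deleted = set()  # keys marked as deleted, not yet purged

    def insert(self, key, value):
        self.ws.append((key, value))  # fast write: append only

    def delete(self, key):
        self.deleted.add(key)  # mark now, purge later

    def move_tuples(self):
        """Merge WS into RS, dropping records that were marked deleted."""
        by_key = lambda rec: rec[0]
        merged = heapq.merge(self.rs, sorted(self.ws, key=by_key), key=by_key)
        self.rs = [rec for rec in merged if rec[0] not in self.deleted]
        self.ws = []
        self.deleted.clear()
```

A short usage example: inserts land in WS, and RS only changes when the mover runs.

```python
s = Store()
s.insert(3, "c")
s.insert(1, "a")
s.move_tuples()   # RS now holds keys 1 and 3 in sorted order
s.insert(2, "b")
s.delete(3)
s.move_tuples()   # key 3 is purged, key 2 is merged in
```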

13 How to solve read/write conflict
- Situation: one transaction updates the record X, while another transaction reads X.
- Use snapshot isolation
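One common way to picture snapshot isolation is with versioned records: a reader fixes a point in time and sees only versions that existed then. The sketch below assumes an epoch counter and per-version insert/delete epochs; the class and field names are hypothetical.

```python
class VersionedStore:
    """Toy versioned store: each write creates a new version of a key."""

    def __init__(self):
        self.epoch = 0
        self.versions = []  # [key, value, inserted_at, deleted_at]

    def write(self, key, value):
        """An update retires the live version and appends a new one."""
        self.epoch += 1
        for version in self.versions:
            if version[0] == key and version[3] is None:
                version[3] = self.epoch
        self.versions.append([key, value, self.epoch, None])

    def snapshot(self):
        """A reader remembers the epoch at which it started."""
        return self.epoch

    def read(self, key, as_of):
        """Return the version of key that was live at epoch as_of."""
        for k, value, inserted_at, deleted_at in self.versions:
            live = inserted_at <= as_of and (deleted_at is None or deleted_at > as_of)
            if k == key and live:
                return value
        return None
```

A reader that took its snapshot before an update to X keeps seeing the old value, so it never has to wait for the writer.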

14 Benefits in query processing
- Selection – has more indices to use
- Projection – some “projections” already defined
- Join – some projections are materialized joins
- Aggregations – work on required columns only
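The aggregation point can be illustrated by counting how many values each layout touches for an average over one attribute. The sample data and function names are assumptions for the sketch; "values touched" is only a stand-in for I/O.

```python
# Hypothetical EMP(name, age, dept, salary) rows.
ROWS = [("Ann", 30, "Sales", 50), ("Bob", 25, "Eng", 70), ("Cat", 27, "Sales", 60)]

# The same data laid out column by column.
names, ages, depts, salaries = (list(col) for col in zip(*ROWS))
COLUMNS = {"name": names, "age": ages, "dept": depts, "salary": salaries}


def avg_salary_row_store(rows):
    """Row layout: every attribute of every record is read."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)
        total += row[3]
    return total / len(rows), touched


def avg_salary_column_store(columns):
    """Column layout: only the salary column is read."""
    salary = columns["salary"]
    return sum(salary) / len(salary), len(salary)
```

Both functions return the same average, but the column layout touches one value per row instead of four.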

15 Evaluation
- Use TPC-H – decision support queries
- Storage

16 Query performance

17 Row store uses materialized views

18 Summary: the performance gain
- Column representation – avoids reads of unused attributes
- Storing overlapping projections – multiple orderings of a column, more choices for query optimization
- Compression of data – more orderings of a column in the same amount of space
- Query operators operate on compressed representation
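As a small illustration of the last point, aggregates can be computed directly on run-length triples of the form (value, position, # items) without expanding them. The function names and the sample runs are assumptions for this sketch.

```python
def rle_sum(runs):
    """Sum a run-length-encoded column: one multiply per run, not per row."""
    return sum(value * count for value, _position, count in runs)


def rle_count_where(runs, predicate):
    """Count rows whose value satisfies predicate, testing each run once."""
    return sum(count for value, _position, count in runs if predicate(value))


# Hypothetical age column [25, 25, 30] stored as two runs.
age_runs = [(25, 0, 2), (30, 2, 1)]
total_age = rle_sum(age_runs)
under_30 = rle_count_where(age_runs, lambda age: age < 30)
```

The work is proportional to the number of runs rather than the number of rows, which is where the benefit comes from on sorted columns with few distinct values.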

