C-Store: Class Overview Spring, 2009 Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Feb 27, 2009.


1 C-Store: Class Overview Spring, 2009 Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Feb 27, 2009

2 C-Store: A Column-Oriented DBMS
Instructor: Jianlin Feng (冯剑琳)
- Office: Lab Center B111
- Teaching: Friday (2-3 and 4-5), D202.
- Teaching Style: Try to present the basic ideas in a clear and unified manner; be your guide if you like
- Email: fengjl9@gmail.com

3 C-Store: Class Motivation
We are doing Software!!!
- A database management system (DBMS) is computer software that manages databases.
- 3 Turing Award Winners since 1966
- Oracle, DB2, SQL Server
Wanna be a Software Architect?
- Not a Naïve Coder
- Learning from top software developers
- Learning from open source code
- Understanding System Design and Implementation Better

4 C-Store's Father: Michael Stonebraker
A former Professor at Berkeley, an Adjunct Professor at M.I.T.
ACM Software System Award, 1988
- INGRES, developed by undergraduates
- POSTGRES, Mariposa, C-Store
ACM SIGMOD Innovation Award, 1994
National Academy of Engineering, 1998

5

6 C-Store: The Home Page
http://db.lcs.mit.edu/projects/cstore/
C-Store: A Column-Oriented DBMS
- download: Source code download
- overview: Project description
- papers: Publications
- people: Who are we?
The C-Store project is a collaboration between MIT, Yale, Brandeis University, Brown University, and UMass Boston.
Commercialized C-Store: Vertica

7 Course Work: Assignments and Course Project
Reading papers
- Each student will be individually responsible for writing up a short summary of every paper.
Reading source code
Team work
- 5 students per team
- A related project of your choice,
- or one specified by the instructor
- Giving a presentation

8 An example summary
LRVM (Satyanarayanan, et al.)
Good points:
1) Providing an abstraction of a greatly needed behavior (transactions) makes system code implementation much easier: this stuff is useful.
2) Returns to the UNIX mentality of small and simple building blocks.
3) Performance analysis (Rmem/Pmem) is very applicable to the stated domain (fs metadata).
Bad points:
1) It would have been nice if they had explicitly stated that set-range can be called multiple times within a transaction; they only comment on it in 5.2 when discussing optimizations (for overlapping region specification).
2) It's unclear why the throughputs are almost equivalent for sequential access even though their CPU utilization is much different. This seems to contradict their scalability concern, as it would seem both systems are IO bound as opposed to CPU bound; given the rate of CPU improvement, IO would seem to be the greater concern. Of course, it's still good that the very simple RVM performs better.

9 The Starting Point C-Store: A Column Oriented DBMS Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. VLDB, pages 553-564, 2005.

10 C-Store: the Column Store Project
Row Store or Column Store?
[Figure: a relation (table) drawn as Records 1-3 (rows) and Columns 1-3]

11 Example of a Relation

12 The History: Relational Model
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
Physical Data Independence
- Row Store vs. Column Store on the same Conceptual Model: Relation
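Physical data independence can be illustrated with a small sketch: the same relation is laid out row-wise and column-wise, and both layouts reconstruct the same logical rows. The relation, its attribute names, and the helper functions below are hypothetical and chosen only for illustration; they are not taken from the C-Store code.

```python
# Hypothetical relation; attribute names and values are made up for illustration.
schema = ("id", "name", "balance")
relation = [
    (1, "Ann", 120),
    (2, "Bob", 75),
    (3, "Cho", 300),
]

# Row store: the fields of one record are adjacent in storage.
row_store = [field for record in relation for field in record]

# Column store: the values of one attribute are adjacent in storage.
column_store = {
    name: [record[i] for record in relation]
    for i, name in enumerate(schema)
}


def rows_from_row_store(flat, width):
    """Rebuild logical rows by cutting the flat array every `width` fields."""
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]


def rows_from_column_store(columns, attribute_order):
    """Rebuild logical rows by stitching the columns together by position."""
    return list(zip(*(columns[name] for name in attribute_order)))


# Both physical layouts expose the same logical relation.
assert rows_from_row_store(row_store, len(schema)) == relation
assert rows_from_column_store(column_store, schema) == relation
```

The point of the sketch is that a query written against the relation does not need to know which layout is underneath; only the cost of answering it changes.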

13 Row Store: Why?
OLTP (On-Line Transaction Processing)
- ATM, POS in supermarkets
Characteristics of OLTP applications:
- Transactions that involve small numbers of records (or tuples)
- Frequent updates (including queries)
- Many users
- Fast response times
OLTP Needs Write-Optimized Row Store.
- Insert and delete a record in one physical write.
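The "one physical write" argument can be sketched with a deliberately naive cost model: inserting a record into a row store touches one contiguous location, while inserting it into a column store touches one location per attribute. The classes and the `writes` counter are hypothetical, and the model ignores buffering and batching that real systems use.

```python
class RowStore:
    """Toy row store: one record is one contiguous unit."""

    def __init__(self):
        self.records = []
        self.writes = 0  # modeled physical writes (an assumption of this sketch)

    def insert(self, record):
        self.records.append(tuple(record))
        self.writes += 1  # the whole record lands in one place


class ColumnStore:
    """Toy column store: each attribute lives in its own array."""

    def __init__(self, schema):
        self.columns = {name: [] for name in schema}
        self.writes = 0

    def insert(self, record):
        for name, value in zip(self.columns, record):
            self.columns[name].append(value)
            self.writes += 1  # one write per column touched


rows = RowStore()
cols = ColumnStore(("id", "name", "balance"))
rows.insert((1, "Ann", 120))
cols.insert((1, "Ann", 120))
print(rows.writes, cols.writes)
```

Under this model the gap grows with the number of attributes, which is why update-heavy OLTP workloads favor the row layout.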

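14 Row Store: Columns Stored Together
[Figure: slotted page i. Each record is addressed by a record id: Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). The slot directory at the end of the page holds the slot array (entries N ... 2 1), the number of slots N, and a pointer to the start of free space.]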
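The slotted-page layout in the figure can be sketched as follows. This is a minimal toy, assuming a 4096-byte page and keeping the slot directory as a Python list instead of packing it into the tail of the page; the class and method names are hypothetical.

```python
class SlottedPage:
    """Toy slotted page: records are packed from the front of the page and
    the slot directory maps slot numbers to (offset, length) entries."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0  # pointer to the start of free space
        self.slots = []      # slot directory; None marks a deleted record

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        # Rid = (page id, slot number); slots are numbered from 1 as in the figure.
        return (self.page_id, len(self.slots))

    def _entry(self, rid):
        page_id, slot_no = rid
        if page_id != self.page_id or not 1 <= slot_no <= len(self.slots):
            raise KeyError(rid)
        return slot_no - 1

    def read(self, rid):
        entry = self.slots[self._entry(rid)]
        if entry is None:
            raise KeyError(rid)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Only the directory entry is cleared, so other rids stay valid.
        self.slots[self._entry(rid)] = None


page = SlottedPage(page_id=7)
rid_a = page.insert(b"Ann,120")
rid_b = page.insert(b"Bob,75")
page.delete(rid_a)
print(rid_b, page.read(rid_b))
```

Because a record is found through its slot entry, a record can be deleted (or moved within the page) without changing the rids of its neighbors.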

15 Current DBMS Gold Standard
- Store columns in one record contiguously on disk
- Use B-tree indexing
- Use small (e.g. 4K) disk blocks
- Align fields on byte or word boundaries
- Conventional (row-oriented) query optimizer and executor (technology from 1979)
- Aries-style transactions

16 From OLTP to OLAP and Data Warehouse
OLAP (On-Line Analytical Processing, Codd, 1993)
- Flexible Reporting for Business Intelligence
Characteristics of OLAP applications:
- Transactions that involve large numbers of records
- Frequent ad-hoc queries and infrequent updates
- A few decision-making users
- Fast response times
Data warehouses are designed to facilitate reporting and analysis.
- Read-Mostly

17 A Flavor of OLAP: Data Cube (Jim Gray, 1996)

18 Data Cube vs. Star Schema

19 Data Warehouse Architecture

20 Other Read-Mostly Applications
CRM (Customer Relationship Management)
- Siebel (Oracle)
Catalog Search in Electronic Commerce
- Amazon.com
- Shopping.com

21 Column Store: Why?
The Intuition: Only read relevant columns
- Say, ad-hoc queries read 2 columns out of 20
Column Store is not a new idea
- Sybase IQ (early '90s, bitmap index)
- Addamark (i.e., SenSage, for Event Log data warehouse)
- MonetDB (Hyper-Pipelining Query Execution, CIDR'05)
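The "2 columns out of 20" intuition can be turned into back-of-the-envelope arithmetic. The sketch below assumes a full scan, fixed-width 4-byte values, one million rows, and no compression or indexes; all of these numbers and the function name are assumptions made for illustration.

```python
def bytes_scanned(num_rows, total_columns, columns_needed,
                  bytes_per_value, column_store):
    """Bytes read by a full scan under a fixed-width, uncompressed model.

    A row store must read every column of every row; a column store
    reads only the columns the query refers to.
    """
    columns_read = columns_needed if column_store else total_columns
    return num_rows * columns_read * bytes_per_value


rows = 1_000_000
row_io = bytes_scanned(rows, 20, 2, 4, column_store=False)
col_io = bytes_scanned(rows, 20, 2, 4, column_store=True)
print(row_io, col_io, row_io / col_io)  # 80000000 8000000 10.0
```

Under these assumptions the column store reads one tenth of the data, and the ratio is simply total columns divided by columns needed.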

22 C-Store Technical Ideas
- Logical Data Model: Relational Model
- Column Store
- Only Materialized Views on Each Relation (perhaps many)
- Active Data Compression
- Column-Oriented Query Executor and Optimizer
- Shared Nothing Architecture
- Replication-Based Concurrency Control and Recovery

23 How to Evaluate The C-Store Paper
None of the ideas in isolation merits publication
Judge the complete system by its (hopefully intelligent) choice of
- A small collection of inter-related powerful ideas
- That together put performance in a new sandbox

24 Architecture of C-Store (Vertica) On a Single Node

25 C-Store code base version 0.2
http://db.lcs.mit.edu/projects/cstore/cstore0.2.tar.gz
Runs on Linux x86 computers
- Tested on RedHat Linux
This code compiles on old versions of BerkeleyDB and gcc.
- BerkeleyDB.4.2
LZO version 1 (http://www.oberhumer.com/opensource/lzo/)

26 References
- Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, pages 553-564, 2005.
- VERTICA DATABASE TECHNICAL OVERVIEW WHITE PAPER. http://www.vertica.com/php/pdfgateway?file=VerticaArchitectureWhitePaper.pdf
- http://www.sensage.com/English/Products/Event_Data_Warehouse.html

