Lecture 6: Indexing Part 2, Column Stores


Indexes Recap

              Heap File   Bitmap   Hash File    B+Tree
Insert        O(1)                 O(1)         O(log_B n)
Delete        O(P)                 O(1)         O(log_B n)
Range Scan    O(P)                 -- / O(P)    O(log_B n + R)
Lookup        O(P)        O(C)     O(1)         O(log_B n)

n: number of tuples
P: number of pages in file
B: branching factor of B-Tree (keys / node)
R: number of pages in range
C: cardinality (#) of unique values on key
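To get a feel for the Lookup row, the sketch below plugs sample sizes into the asymptotic bounds and treats each bound as a rough count of page reads. The function names and the table sizes are assumptions made up for illustration; real costs depend on constants that the O-notation hides.

```python
def btree_height(n, b):
    """Return the smallest h with b**h >= n, i.e. ceil(log_b n), using integers only."""
    height, reach = 0, 1
    while reach < n:
        reach *= b
        height += 1
    return height


def lookup_page_reads(n, p, b, c):
    """Rough page reads for one equality lookup, following the Lookup row."""
    return {
        "heap file": p,                 # O(P): scan every page
        "bitmap": c,                    # O(C): grows with the number of distinct key values
        "hash file": 1,                 # O(1): hash straight to the bucket
        "b+tree": btree_height(n, b),   # O(log_B n): one page per tree level
    }


# Hypothetical table: 1M tuples on 10,000 pages, 100 keys per node, 50 distinct values.
for structure, reads in lookup_page_reads(n=1_000_000, p=10_000, b=100, c=50).items():
    print(f"{structure:10s} ~{reads} page reads")
```

With a branching factor of 100, three levels are enough to reach a million tuples, which is why the B+Tree stays close to the hash file for point lookups while also supporting range scans.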

B+ Tree Indexes
- Balanced, wide tree
- Fast value lookups and range scans
- Each node is a disk page (except root)
- Leaves point to tuple pages

Secondary Indices Example
- Index record points to a bucket that contains pointers to all the actual records with that particular search-key value
- Secondary indices have to be dense
[Figure: secondary index on balance field of account]

B+ Tree Insertion
- Locate the leaf for the new key and pointer
- Insert into the leaf node
- If overfull, split the node
- Recursively update parents to keep the tree balanced and (non-root) nodes >= half full

B+ Tree Insertion
- Insert “Clearview”

B+ Tree Insertion
[Figure: B+-Tree before and after insertion of “Clearview”]
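The leaf-level part of insertion (insert, then split if overfull) can be sketched as below. This is a minimal illustration, not the lecture's implementation: `insert_into_leaf` is a hypothetical helper, the leaf is modeled as a sorted Python list, and the starting leaf contents are an assumption rather than a transcription of the slide's figure. Propagating the separator key into the parent is left out.

```python
import bisect


def insert_into_leaf(keys, key, capacity):
    """Insert key into a sorted leaf and split it if it overflows.

    Returns (left, right, separator). When no split is needed, right and
    separator are None. On a split, separator is the first key of the new
    right leaf, which the caller would insert into the parent node.
    """
    keys = list(keys)              # work on a copy so the caller's leaf is untouched
    bisect.insort(keys, key)       # keep the leaf sorted
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2     # left leaf keeps the extra key on odd counts
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


# Assumed starting leaf that is already full (capacity of 2 keys).
left, right, separator = insert_into_leaf(["Brighton", "Downtown"], "Clearview", capacity=2)
print(left, right, separator)
```

Both halves stay at least half full after the split, which is the invariant the recursive parent update has to preserve at every level.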

B+ Tree Deletion
- Find the leaf holding the key and pointer
- Delete from the leaf
- If the leaf is underfull (< ½ entries used), rebalance with neighbors
- Recursively update parents to keep balance and reflect new leaf contents
  - The root may be deleted if it is left with one entry

B+ Tree Deletion Example
- Deleting “Downtown” causes merging of under-full leaves
- A leaf node can become empty only for n=3!
[Figure: before and after deleting “Downtown”]
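The choice between merging and redistributing after a deletion can be sketched as follows. This is an illustrative helper under stated assumptions, not the lecture's code: `fix_underfull` is a hypothetical name, leaves are sorted lists, `left` and `right` are adjacent siblings, and "half full" is taken as ceil(capacity / 2) keys. Updating the separator key in the parent is omitted.

```python
def fix_underfull(left, right, capacity):
    """Rebalance two adjacent leaves after a deletion left one of them underfull.

    Returns (action, new_left, new_right) where action is "ok", "merge",
    or "redistribute". After a merge, new_right is an empty list.
    """
    min_keys = (capacity + 1) // 2           # ceil(capacity / 2)
    if len(left) >= min_keys and len(right) >= min_keys:
        return "ok", left, right
    combined = left + right                  # still sorted: left keys < right keys
    if len(combined) <= capacity:
        return "merge", combined, []         # everything fits in one leaf
    mid = (len(combined) + 1) // 2           # share keys as evenly as possible
    return "redistribute", combined[:mid], combined[mid:]


# Leaves of capacity 4; the right leaf dropped to one key after a deletion.
print(fix_underfull([1, 2, 3, 4], [8], capacity=4))   # sibling can spare a key
print(fix_underfull([1, 2], [8], capacity=4))         # sibling cannot spare a key
```

A merge removes a pointer from the parent, which may leave the parent underfull in turn; that is what makes the parent update recursive.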

Study Break: B+ Tree
See tree on board.
- Insert 9 into the tree
- Insert 3 into the original tree
- Delete 8 from the starting tree with left leaf redistribution
- Delete 8 with right redistribution

Column Store Performance
- How much do these optimizations matter?
- Wanted to compare against the best you could do with a commercial system

Emulating a Column Store
Two approaches:
1. Vertical partitioning: for an n-column table, store n two-column tables, with the ith table containing a tuple-id and attribute i
   - Sort on tuple-id
   - Merge joins for query results
2. Index-only plans
   - Create a secondary index on each column
   - Never follow pointers to base table
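The vertical-partitioning approach can be sketched in a few lines. The code below is only an illustration: the helper names and the sample rows are hypothetical, tables are plain Python lists, and tuple-ids are assumed unique and assigned in row order so that each column table is already sorted.

```python
def vertical_partition(rows, columns):
    """Split rows into one (tuple_id, value) table per column, sorted by tuple_id."""
    return {
        col: [(tid, row[i]) for tid, row in enumerate(rows)]
        for i, col in enumerate(columns)
    }


def merge_join(left, right):
    """Join two (tuple_id, value) tables sorted on tuple_id in one linear pass."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        ltid, rtid = left[i][0], right[j][0]
        if ltid == rtid:
            out.append((ltid, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif ltid < rtid:
            i += 1      # left tuple has no partner (e.g. filtered out on the right)
        else:
            j += 1
    return out


# Hypothetical two-column table.
rows = [("Toronto", 10), ("Boston", 20), ("Montreal", 30)]
parts = vertical_partition(rows, ["store_name", "revenue"])

# Apply a predicate to one column table, then stitch the answer back together.
big = [(tid, rev) for tid, rev in parts["revenue"] if rev >= 20]
print(merge_join(parts["store_name"], big))
```

Every extra column a query touches costs one more join on tuple-id, which is why the join implementation matters so much for this emulation.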

Bottom Line
- SSBM (Star Schema Benchmark, O’Neil et al., ICDE 08)
- Data warehousing benchmark based on TPC-H
- Scale 100 (60 M row table), 17 columns
- Average across 12 queries
- Row store is a commercial DB, tuned by a professional DBA, vs. C-Store
[Figure: Time (s) comparison. Commercial System Does Not Benefit From Vertical Partitioning]

Problems with Vertical Partitioning
① Tuple headers
   - Total table is 4 GB
   - Each column table is ~1.0 GB
   - Factor of 4 overhead from tuple headers and tuple-ids
② Merge joins
   - Answering queries requires joins
   - Row-store doesn’t know that column-tables are sorted
   - Sort hurts performance
Would need to fix these, plus add direct operation on compressed data, to approach C-Store performance

Problems with Index-Only Plans
Consider the query:
    SELECT store_name, SUM(revenue)
    FROM Facts, Stores
    WHERE Facts.store_id = Stores.store_id
      AND Stores.country = 'Canada'
    GROUP BY store_name
- The two WHERE clauses result in a list of tuple IDs that pass all predicates
- Need to go pick up values from the store_name and revenue columns
- But indexes map from value → tuple ID!
- Column stores can efficiently go from tuple ID → value in each column
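The direction mismatch can be made concrete with a toy model. In the sketch below, a secondary index is modeled as a dictionary from value to tuple IDs and a column as a plain list indexed by tuple ID. Both are simplifying assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(column):
    """Secondary index: value -> list of tuple IDs holding that value."""
    index = defaultdict(list)
    for tid, value in enumerate(column):
        index[value].append(tid)
    return index


def value_from_index(index, tid):
    """Go from tuple ID to value using only the index: every entry may be searched."""
    for value, tids in index.items():
        if tid in tids:
            return value
    raise KeyError(tid)


def value_from_column(column, tid):
    """Go from tuple ID to value in a column store: a direct positional read."""
    return column[tid]


country = ["Canada", "USA", "Canada"]    # hypothetical column, position = tuple ID
index = build_index(country)

print(index["Canada"])                   # value -> tuple IDs is what the index is good at
print(value_from_index(index, 1))        # tuple ID -> value needs a search
print(value_from_column(country, 1))     # same answer with a single array access
```

The index answers the predicate quickly, but fetching store_name and revenue for the surviving tuple IDs runs the index in its slow direction.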

Recommendations for Row-Store Designers
Might be possible to get C-Store-like performance:
① Need to store tuple headers elsewhere (not require that they be read from disk w/ tuples)
② Need to provide an efficient merge join implementation that understands sorted columns
③ Need to support direct operation on compressed data
   - Requires “late materialization” design

Study Break: Column Stores
Given the schema:
    grades (a_cid int, student_id int, grade char(2), grade_num int)
- Estimate how much data we would read if we select avg(grade_num) from 1M records in a column store.
  - What about a row store?
- If we have 5k students, how much data do we need to access to count the number of students who have earned an A where a_cid=339? Do the same exercise with a row store.

Column Stores Solution
Each int is taken as 8 bytes and char(2) as 2 bytes.
- Column store, avg(grade_num): 8 bytes * 1M tuples = 8 MB
- Row store: (3*8 + 2) bytes * 1M = 26 MB
- Count # of tuples from two columns: (8 bytes (a_cid) + 2 bytes (grade)) * 1M = 10 MB
- Row store: 26 MB again
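The arithmetic can be checked with a short script. It assumes the widths used in the solution (8-byte ints, 2-byte char(2)), counts 1 MB as 10^6 bytes, and ignores tuple headers and compression; the helper names are hypothetical.

```python
# Column widths in bytes for grades(a_cid, student_id, grade, grade_num).
WIDTHS = {"a_cid": 8, "student_id": 8, "grade": 2, "grade_num": 8}
ROWS = 1_000_000
MB = 1_000_000


def column_store_mb(columns):
    """A column store reads only the columns the query touches."""
    return sum(WIDTHS[c] for c in columns) * ROWS / MB


def row_store_mb():
    """A row store reads whole tuples regardless of the columns used."""
    return sum(WIDTHS.values()) * ROWS / MB


print(column_store_mb(["grade_num"]))        # avg(grade_num)
print(column_store_mb(["a_cid", "grade"]))   # count of A grades where a_cid=339
print(row_store_mb())                        # either query on a row store
```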