Effect of Node Size on the Performance of Cache-Conscious B+ Trees. Written by R. Hankins and J. Patel. Presented by Ori Calvo.


Introduction. Who cares about cache improvement? Traditional databases are designed to reduce I/O accesses. But memory chips are cheap and large, so why not store the entire database in memory? Reducing main-memory accesses is the next challenge.

Objectives. Introduce cache-conscious B+Trees. Provide a model to analyze the effect of node size. Examine "real-life" results against our model's conclusions.

B+Tree Refresher. A B+Tree of order d has between d and 2d keys in each node; the root has between 1 and 2d keys. Every node must be at least half full. With height h: 2*(d+1)^(h-1) <= N <= (2d+1)^h. The fill percentage is usually ln 2 ~ 69%.

B+Tree Refresher (Cont.). Good search performance. Good incremental (insert/delete) performance. Better cache behavior than a T-Tree. What is the optimal node size?

Improving B+Tree. Question: assuming node size = cache line size, how can we make the B+Tree algorithm utilize the cache better? Hint: locality!

Pointer Elimination. With node size = cache line size, only half of a node is used for storing keys. Get rid of the pointers and store more keys: instead of pointers to child nodes, use offsets.

Introducing the CSB+Tree. A balanced search tree. Each node contains m keys, where d <= m <= 2d and d is the order of the tree. All child nodes are placed into a node group, and nodes within a node group are stored contiguously. Each node holds: pFirstChild (pointer to the first child), nKeys (number of keys), arrKeys[2d] (array of keys).

CSB+Tree. (Diagram: a three-level tree in which each node is laid out as P | N | K1 | K2 -- a child-group pointer, a key count, and the keys -- and the children of each node are stored contiguously as one node group.)

CSB+Tree vs. B+Tree. Assuming node size = 64B: a B+Tree node holds 7 keys + 8 pointers + 1 counter, while a CSB+Tree node holds 1 pointer + 1 counter + 14 keys. Results: a cache line can satisfy almost one more level of comparisons, and the larger fan-out means less space.

CSS Tree. Can we do more elimination?

Shaking our foundations. Should node size be equal to cache line size? What about instruction count? How can we measure the effect of node size on overall performance?

Building an Execution Time Model. We need to take into account: instructions executed; data cache misses; instruction cache misses (only 0.5%); mis-predicted branches. We model these during an equality search. The model should be independent of implementation and platform details, but...

Execution Time Model. T = I*cpi + M*miss_latency + B*pred_penalty

Variable      | Description                                              | Value     | Depends upon
cpi           | Processor clock cycles per instruction executed          | 0.63 (P3) | Platform
miss_latency  | Processor clock cycles per L2 cache miss                 | 78 (P3)   | Platform
pred_penalty  | Processor clock cycles to correct a mis-predicted branch | 15 (P3)   | Platform
I             | Instructions executed                                    |           | Implementation
M             | Data cache misses                                        |           | Implementation
B             | Mis-predicted branches                                   |           | Implementation

CPI = 0.63? It can be extracted from a processor's design manual, but modern processors are very complex, and some instructions require more time to retire than others. On the Pentium 3, CPI ranges between 0.33 and 14.

Other PSVs -- where do they come from? miss_latency: same problems as CPI. pred_penalty: the manual provides tight upper and lower bounds.

PSV Experiment

for (i = 0; i < Queries; i++) {
    address = origin + random_offset;   /* random access to force a cache miss */
    val = *address;
    for (j = 0; j < Instructions; j++) {
        /* computation involving "val" */
    }
}

PSV Results

Calculate I. I depends upon the actual implementation of the CSB+Tree. Two main components: I_search (searching inside a node) and I_trav (node traversal). Analyzing the code leads to the following conclusions: I_search ~ 5, I_trav ~ 30.

Calculate I_search

BinarySearch:
    middle = (p1 + p2) / 2     ; compute the midpoint
    comp *middle, key          ; compare the middle key with the search key
    jle  less                  ; branch on the result
    p1 = middle                ; continue in one half
    jump BinarySearch
less:
    p2 = middle                ; continue in the other half
    jump BinarySearch          ; ~5 instructions per iteration => I_search ~ 5

Calculate I_trav

Node *Find(Node *pNode, int key)
{
    int *pKeysBegin = pNode->Keys;                                   // (1)
    int *pKeysEnd   = pNode->Keys + pNode->nKeys;                    // (3)
    int *pFoundKey, foundKey;
    pFoundKey = BinarySearch(pKeysBegin, pKeysEnd, key);             // (8)
    if (pFoundKey < pKeysEnd) { foundKey = *pFoundKey; }             // (3,1)
    else                      { foundKey = INFINITE; }               // (1)
    int offset = (int)(pFoundKey - pKeysBegin);                      // (2)
    Node *pChild = NULL;
    if (key <= foundKey) { pChild = pNode->pChilds + offset; }       // (4,1)
    else                 { pChild = pNode->pChilds + offset + 1; }   // (3)
    return pChild;
}                                                                    // total: 23-25

Calculate I (Finishing). h = height of the tree; f = fill percentage; e = max number of keys in a node.

Calculate M. M_node = cache misses while searching inside a node, where L is the number of cache lines inside a node.

Calculate M (Cont.). Cache misses per tree traversal are bounded by TreeHeight * M_node. What about q traversals?

Calculate M for q traversals. Let's assume there are no cache conflicts and no capacity misses. On the first traversal there are M_node cache misses per node access. On subsequent traversals, nodes near the root have a high probability of being found in the cache, while leaf nodes have a substantially lower probability.

Calculate M for q traversals (Cont.). Suppose q is the number of queries and b is the number of blocks. Then we can compute UB, the expected number of unique blocks that are visited.

Calculate M for q traversals (Finishing). Assuming M_node cache misses are incurred per node access on a tree traversal, M is the sum of UB at each level of the tree.

Calculate B. h = height of the tree; f = fill percentage; e = max number of keys in a node.

Mid-year evaluation. We built a simple model: T = I*cpi + M*miss_latency + B*pred_penalty. Now we want to use it.

Our model's prediction. We want to look at the performance behavior that our model predicts on a Pentium 3. The following parameters are used: 10,000,000 items; number of queries = ; fill percentage = 67%; cache line size = 32 bytes.

Effect of node size on cache miss count

Effect of node size on instruction count

Effect of node size on execution time

Numbers. Best cache utilization occurs at small node sizes. For larger node sizes there are fewer instructions executed; the minimum is reached at 1632 bytes. The optimal node size is 1632 bytes, performing 26% faster than a node size of 32 bytes.

Our Model's Conclusions. Conventional wisdom suggests node size = cache line size. We show that using a larger node size can result in better search performance.

Experimental Setup. Pentium 3; 768MB of main memory; 16KB of L1 data cache; 512KB of L2 data/instruction cache, 4-way set-associative, 32-byte cache lines; Linux. 10,000,000 entries in the database; the database is queried 10,000 times.

Effect of node size on cache miss count

Effect of node size on instruction count

Effect of node size on execution time

Final Conclusions. We investigated the performance of the CSB+Tree and introduced first-order analytical models. We showed that cache misses and instruction count must be balanced. A node size of 512 bytes performs well; larger node sizes suffer from poor insert performance.