Indexing Data Relationships Michael J. Franklin University of California, Berkeley & RightOrder Inc.

Presentation transcript:


2 Overview
Data relationships can be complex.
  Hierarchical views: XML, LDAP, …
  Semistructured data and dynamic schemas
Approach:
  Encode paths as tagged strings.
    "Raw" paths encode structure.
    "Refined" paths accelerate lookups.
  Index the strings in a highly compact structure.
  Live on top of, next to, or inside the DBMS.
Benefits:
  Performance, scalability, and adaptivity
  Leverages mature DBMS technology

3 Raw paths w/ Designators
Invoice as a tree:
  Invoice
    Buyer
      Name: ABC Corp.
      Address: 123 ABC Way
    Seller
      Name: Goods Inc.
      Address: 17 Main St.
    Itemlist
      Item: widget
      Item: thingy
      Item: jobber
Raw-path keys, one per leaf value, each prefixed by the designators of its path:
  ABC Corp.
  123 ABC Way
  17 Main St.
  Goods Inc.
  widget
  thingy
  jobber
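
The raw-path encoding can be sketched as a walk over the invoice tree that emits one string per leaf. This is only an illustration: the single-letter designators in `DESIGNATORS` and the space between the path and the value are assumptions made here, since the transcript does not preserve the symbols used on the slide.

```python
# Hypothetical designators: one letter per tag (not taken from the slide).
DESIGNATORS = {
    "Invoice": "I", "Buyer": "B", "Seller": "S", "Itemlist": "L",
    "Name": "N", "Address": "A", "Item": "T",
}


def raw_paths(tag, node, prefix=""):
    """Yield one designator-prefixed key for every leaf value under node."""
    if isinstance(node, list):            # repeated element, e.g. several Items
        for child in node:
            yield from raw_paths(tag, child, prefix)
        return
    path = prefix + DESIGNATORS[tag]      # extend the path by this tag
    if isinstance(node, dict):            # interior element: recurse into children
        for child_tag, child in node.items():
            yield from raw_paths(child_tag, child, path)
    else:                                 # leaf: emit "<designators> <value>"
        yield f"{path} {node}"


invoice = {
    "Buyer": {"Name": "ABC Corp.", "Address": "123 ABC Way"},
    "Seller": {"Name": "Goods Inc.", "Address": "17 Main St."},
    "Itemlist": {"Item": ["widget", "thingy", "jobber"]},
}
keys = list(raw_paths("Invoice", invoice))
```

Because every key starts with the designators of its path, keys that share a path prefix sort together, which is what lets a string index answer path queries with prefix lookups.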

4 Refined paths
Optimize specific access paths. Each query template gets its own designator, followed by the values that fill it in:
  "Find invoices where X sold to Y"
    Key pattern: X Y
    Example keys: ABC Corp. Goods Inc.; XYZ Corp. Acme Inc.
  "Find invoices where X bought Y and Z"
    Key pattern: X Y Z
    Example keys: ABC Corp. jobber widget; XYZ Corp. drill hammer
  "Find invoices where a buyer bought X, Y and Z"
    Key pattern: X Y Z
    Example keys: jobber thingy widget; drill hammer nail
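
A minimal sketch of how refined-path keys might be generated and queried, using a plain dictionary in place of the index. The designator names (`SOLD_TO`, `BOUGHT_2`, `BOUGHT_3`), the separator, and the choice to sort item names so that each combination has one canonical key are all assumptions for illustration; the slide's example keys happen to list items alphabetically, but the deck does not state a rule.

```python
from collections import defaultdict
from itertools import combinations

SEP = "\x1f"                     # hypothetical field separator
index = defaultdict(list)        # refined key -> invoice ids (stand-in for the index)


def refined_key(designator, fixed=(), items=()):
    """Designator, then fixed-position values, then item names in sorted order."""
    return SEP.join([designator, *fixed, *sorted(items)])


def add_invoice(invoice_id, buyer, seller, items):
    # "Find invoices where X sold to Y"
    index[refined_key("SOLD_TO", fixed=(seller, buyer))].append(invoice_id)
    # "Find invoices where X bought Y and Z": one key per pair of items
    for pair in combinations(sorted(items), 2):
        index[refined_key("BOUGHT_2", fixed=(buyer,), items=pair)].append(invoice_id)
    # "Find invoices where a buyer bought X, Y and Z": one key per triple
    for triple in combinations(sorted(items), 3):
        index[refined_key("BOUGHT_3", items=triple)].append(invoice_id)


add_invoice("inv-1", buyer="ABC Corp.", seller="Goods Inc.",
            items=["widget", "thingy", "jobber"])

# The query builds the same key; item order in the query does not matter.
hits = index.get(refined_key("BOUGHT_2", fixed=("ABC Corp.",),
                             items=("widget", "jobber")), [])
```

The trade-off is the usual one for precomputed access paths: each refined path adds keys at insert time so that the matching query becomes a single key lookup.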

5 Index Fabric
An index structure for long strings.
  Provides fast lookups
  Handles long strings
  Ideal substrate for designated keys
Based on Patricia tries.
  Highly compressed string representation
  Index cost is independent of string length
  But the trie needs to be balanced.

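6 Patricia tries
Indexes the first point of difference between keys.
[Figure: Patricia trie over the keys corn, cow, grass, greenbeans, and greentea. Each node is labeled with the character position it tests (0, 2, 5); each edge is labeled with the branching character (c, g, r, w, a, e, b, t).]
D. R. Morrison. "PATRICIA – Practical Algorithm To Retrieve Information Coded in Alphanumeric." J. ACM, 15 (1968).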
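
To make "indexes the first point of difference" concrete, here is a small character-level Patricia trie in Python. It is a sketch under simplifying assumptions, not the Index Fabric implementation: nodes branch on whole characters rather than bits, leaves hold the full key, and the `END` terminator is an assumption added so that one key may be a prefix of another.

```python
END = "\0"  # assumed terminator so that no key is a proper prefix of another


class Node:
    def __init__(self, pos):
        self.pos = pos        # character position this node tests
        self.children = {}    # branching character -> Node, or a key (str) at a leaf


def char_at(key, pos):
    return key[pos] if pos < len(key) else END


def any_key(t):
    """Return some key stored below t (all of them agree before t.pos)."""
    while isinstance(t, Node):
        t = next(iter(t.children.values()))
    return t


def first_difference(a, b):
    """Position of the first differing character, or None if a == b."""
    for i in range(max(len(a), len(b)) + 1):
        if char_at(a, i) != char_at(b, i):
            return i
    return None


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return key
    # Phase 1: follow branching characters to reach some stored key.
    t = root
    while isinstance(t, Node):
        child = t.children.get(char_at(key, t.pos))
        t = child if child is not None else any_key(t)
    d = first_difference(key, t)
    if d is None:
        return root                       # key already present
    # Phase 2: walk down again and stop where position d must be tested.
    parent, edge, t = None, None, root
    while isinstance(t, Node) and t.pos < d:
        parent, edge = t, char_at(key, t.pos)
        t = t.children[edge]
    if isinstance(t, Node) and t.pos == d:
        t.children[char_at(key, d)] = key  # new branch on an existing node
        return root
    split = Node(d)                        # new node testing position d
    split.children[char_at(key, d)] = key
    split.children[char_at(any_key(t), d)] = t
    if parent is None:
        return split
    parent.children[edge] = split
    return root


def lookup(root, key):
    """Follow one branch per node, then verify against the stored key."""
    t = root
    while isinstance(t, Node):
        t = t.children.get(char_at(key, t.pos))
    return t == key


root = None
for k in ["corn", "cow", "grass", "greenbeans", "greentea"]:
    root = insert(root, k)
```

With the five keys from the slide, the root tests position 0, both children test position 2, and the node separating greenbeans from greentea tests position 5. A lookup inspects only those positions, so the number of nodes visited does not grow with the length of the key; the final comparison against the stored key is what rules out false matches.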

7 Multiple Hierarchical Views
Can store multiple permutations of relationships.
  Find animals and the plants they eat
  Find plants and the animals that eat them
Represent each permutation as a new set of keys.
Store the data once using "permutation records".
[Figure: example keys pairing cow and corn in both orders.]

8 Example
[Figure: trie over designated keys that include cat, cow, corn, and wheat; cow and corn each appear twice.]

9 Example
[Figure: the same trie, with cow and corn each appearing once.]

10 Balancing Patricia tries
[Figure: the Patricia trie over corn, cow, grass, greenbeans, and greentea.]

11 Balancing Patricia tries
Step 1: divide the trie into blocks.
[Figure: the same trie, partitioned into blocks.]

12 Balancing Patricia tries
Step 2: build another layer.
[Figure: Layer 1, a small trie with node labels 0 and 2 and edges g and e, drawn above the blocks of Layer 0.]

13-15 Balancing Patricia tries
Search for "cash".
[Figure: the search path for "cash" through Layer 1 and Layer 0, animated over three slides.]

16 Balancing Patricia tries
[Figure: a search passes through Layer 3, Layer 2, Layer 1, and Layer 0; Layer 0 points to the data.]
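
The layered search can be illustrated with a deliberately simplified model. Note the substitution: each block below is a sorted list of keys searched with `bisect`, not a Patricia trie block as in the Index Fabric, and the fixed `BLOCK_SIZE` of 2 is an arbitrary assumption. The sketch only shows the shape of the idea: each upper layer indexes the blocks of the layer beneath it, and a search reads one block per layer.

```python
from bisect import bisect_right

BLOCK_SIZE = 2  # assumed tiny block capacity, to force several layers


def chunk(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_layers(sorted_keys):
    """layers[0] holds the keys; each higher layer holds the first key of each lower block."""
    layer = chunk(sorted_keys, BLOCK_SIZE)
    layers = [layer]
    while len(layer) > 1:
        firsts = [block[0] for block in layer]
        layer = chunk(firsts, BLOCK_SIZE)
        layers.append(layer)
    return layers


def search(layers, key):
    """Descend from the top layer to layer 0; return (found, blocks_read)."""
    block_no = 0
    blocks_read = 0
    for depth in range(len(layers) - 1, 0, -1):
        block = layers[depth][block_no]
        blocks_read += 1
        slot = max(bisect_right(block, key) - 1, 0)   # last entry <= key
        block_no = block_no * BLOCK_SIZE + slot       # block to read one layer down
    blocks_read += 1
    return key in layers[0][block_no], blocks_read


layers = build_layers(sorted(["corn", "cow", "grass", "greenbeans", "greentea"]))
found_cash, reads_cash = search(layers, "cash")
found_tea, reads_tea = search(layers, "greentea")
```

With five keys and two entries per block this yields three layers, and both the unsuccessful search for "cash" and the successful search for "greentea" read exactly one block per layer.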

17 Performance
The number of layers is small.
  Fixed (small) space per key, hence a high branching factor per block, hence a bushy, shallow tree
Example: 8 KB blocks
  32-bit pointers + 2 bytes for keys/structure per entry
  = on the order of 1,000 pointers per block
  = 3 layers for 1 billion pointers to data
Upper layers are tiny (10 megabytes) and are kept in RAM.
Only layer 0 is on disk, so there is usually one index I/O per key lookup.
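
The arithmetic behind the example can be checked with a few lines of Python. This is a back-of-the-envelope model, not a figure from the talk: it assumes every entry costs exactly 6 bytes (a 4-byte pointer plus 2 bytes of key/structure), that blocks are completely full, and that each upper layer stores one entry per block of the layer below.

```python
BLOCK_BYTES = 8 * 1024        # 8 KB blocks
ENTRY_BYTES = 4 + 2           # 32-bit pointer + 2 bytes for keys/structure
FANOUT = BLOCK_BYTES // ENTRY_BYTES


def layer_blocks(data_pointers, fanout=FANOUT):
    """Blocks needed per layer, from layer 0 up to a single-block top layer."""
    blocks = []
    entries = data_pointers
    while True:
        n = -(-entries // fanout)   # ceiling division
        blocks.append(n)
        if n == 1:
            return blocks
        entries = n                 # next layer needs one entry per block below


blocks = layer_blocks(10**9)
upper_bytes = sum(blocks[1:]) * BLOCK_BYTES   # everything above layer 0
```

Under these assumptions a block holds 1,365 entries, one billion data pointers need three layers, and the layers above layer 0 occupy a few megabytes, which is consistent with the slide's point that they fit in RAM and only the layer 0 block has to be read from disk.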

18 Find publications by co-authors
[Chart: 10,000 queries, comparing RDBMS edge mapping, RDBMS STORED, Index Fabric raw paths, and Index Fabric refined paths. Ratios annotated on the chart: 2.5 : 1, 5 : 1, 25 : 1.]

19 Find publications by co-authors
[Chart: 10,000 queries, comparing RDBMS edge mapping, RDBMS STORED, Index Fabric raw paths, and Index Fabric refined paths. Ratios annotated on the chart: 2.1 : 1, 4 : 1, 20 : 1.]

20 Conclusion
Index arbitrary relationships.
  Encode them as designated strings
  Relationships and structures can be complex
  Index many data access paths
  No need for a DTD or predefined schema
Index Fabric
  Special data structure for long keys
  High-performance key lookups
  Supports designator encoding

21 For more information