A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.

Presentation transcript:

A Summary of XISS and Index Fabric Ho Wai Shing

Contents: Definition of Terms; XISS (Li and Moon, VLDB 2001): Numbering Scheme, Indices Stored, Join Algorithms; Index Fabric (Cooper et al., VLDB 2001): Patricia, Balanced Trie, Raw Path Index.

Definition of Terms: Absolute Path Expression (APE): a path that starts from the root, in which each step traverses the child axis or the attribute axis, with no wildcards, e.g., /, /A/B.

Definition of Terms: Regular Path Expression (RPE): a path that may or may not start from the root, may traverse different axes (restricted in this discussion to child, descendant-or-self, and attribute, since they are the most commonly used), and may contain wildcards, e.g., //, /A//C, /A/_/B.

XISS: XISS = XML Indexing and Storage System, by Li and Moon, published in VLDB 2001 under the title “Indexing and Querying XML Data for Regular Path Expressions”. It decomposes XML documents and stores them in indices, and it can answer regular path expressions.

XISS - General Idea: an RPE is solved by decomposing it into five basic kinds of subexpression: element retrieval; attribute retrieval; a step involving an element and an attribute; a step involving two elements; and the Kleene closure of another subexpression.

XISS - General Idea: each kind of subexpression is solved by its own method: element index lookup, attribute index lookup, EA-join, EE-join, and KC-join, respectively.

XISS - General Idea: the result lists from the subexpressions are joined to produce the final result. For this decompose-and-join approach to be efficient, the ancestor-descendant relationship between two nodes must be cheap to determine, so XISS uses a numbering scheme based on an extended preorder traversal.

XISS - Numbering Scheme: every node is numbered with a tuple (order, size). The order is assigned by an extended preorder traversal; the size can be thought of as the size of the subtree rooted at that node.

XISS - Numbering Scheme: the rules for number assignment are: (1) preorder: if x precedes y in the preorder traversal, then x.order < y.order; (2) siblings do not overlap: if x and y are siblings, then either x.order + x.size < y.order or y.order + y.size < x.order; (3) an ancestor contains its descendants: if x is an ancestor of y, then x.order < y.order <= x.order + x.size.
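Rule (3) is what makes the ancestor-descendant test a constant-time comparison. A minimal sketch of that test follows; the `Node` type and the sample numbers are illustrative assumptions, not taken from the paper.

```python
from typing import NamedTuple

class Node(NamedTuple):
    """Hypothetical record holding the (order, size) pair of one XML node."""
    order: int
    size: int

def is_ancestor(x: Node, y: Node) -> bool:
    """Rule (3): x is an ancestor of y iff x.order < y.order <= x.order + x.size."""
    return x.order < y.order <= x.order + x.size

# Illustrative numbers only: a root with two non-overlapping children.
root = Node(order=1, size=100)
child = Node(order=10, size=30)
sibling = Node(order=41, size=10)   # child.order + child.size = 40 < 41
```

No tree traversal is needed: two integer comparisons decide the relationship, which is why the join algorithms can work on flat record lists.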

XISS - Numbering Scheme: the actual assignment uses heuristics to reserve some “space” between orders and to make sizes larger than strictly necessary, leaving room for future node insertions. Attributes are placed before their sibling elements.
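The slides do not spell out the reservation heuristic, so the sketch below simply assumes a fixed `gap` left before every node and at the end of every subtree. The nested-tuple tree format and the function name `number_tree` are likewise assumptions made for illustration; attributes are not modelled.

```python
def number_tree(tree, gap=10):
    """Assign (order, size) by a preorder walk, leaving `gap` unused numbers
    before each node and at the end of each subtree (an assumed heuristic).

    `tree` is a nested tuple (name, [children]).
    Returns a list of (name, order, size) in preorder.
    """
    out = []
    counter = 0

    def visit(node):
        nonlocal counter
        name, children = node
        counter += gap              # spare order values before this node
        order = counter
        slot = len(out)
        out.append(None)            # placeholder keeps preorder position
        for child in children:
            visit(child)
        counter += gap              # spare room at the end of the subtree
        out[slot] = (name, order, counter - order)

    visit(tree)
    return out

# Hypothetical document: A has children B and C, and C has child D.
numbered = number_tree(("A", [("B", []), ("C", [("D", [])])]))
```

With these numbers a new child can be inserted under B or after D without renumbering any existing node, as long as it fits in the reserved gap.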

XISS - Index Organization: there are five indices: Name Index, Element Index, Attribute Index, Structure Index, and Value Table.

XISS - Name Index: maps an element or attribute name to a name identifier (nid). The nid represents that element or attribute in further query evaluation, which reduces the time spent on string comparison in later index lookups. The index is stored in a B+-tree.

XISS - Name Index (figure): name → B+-tree → nid.

XISS - Value Table: stores all the string values of the XML document (table columns: vid, value).

XISS - Element Index: input: nid; output: a list of element records. It is implemented as a B+-tree whose leaves point to lists of document IDs (did); each entry in a did list points to a list of all elements with that name in that document.

XISS - Element Index (figure): nid → B+-tree → did list → element list; the fields of an element record include Depth and ParentID.

XISS - Attribute Index: very similar to the Element Index, except that each record always has a value identifier (vid).

XISS - Structure Index: input: did; output: an array containing all the elements and attributes in the document. It is implemented as a B+-tree.

XISS - Structure Index (figure): did → B+-tree → record array; each record holds nid, Parent order, Child order, Sibling order, and Attribute order.

XISS - Indices: when to use which index? First use the Name Index to find the nid of the element or attribute being queried; then search the Element or Attribute Index for its records; look up the Value Table if values are needed; and use the Structure Index to rebuild or traverse the XML document tree.
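The first two lookup steps can be sketched as follows. Plain dictionaries stand in for the B+-trees, and the names, identifiers, and (order, size) records are made-up sample data, so this shows only the order of the lookups, not the on-disk layout.

```python
# Plain dicts stand in for the B+-trees described in the slides.
name_index = {"book": 1, "title": 2}      # name -> nid (sample data)
element_index = {                          # nid -> did -> [(order, size), ...]
    1: {7: [(1, 50)]},
    2: {7: [(2, 10), (20, 10)]},
}

def find_elements(name):
    """Return (did, record) pairs for every element with the given name."""
    nid = name_index.get(name)             # step 1: Name Index lookup
    if nid is None:
        return []
    by_doc = element_index.get(nid, {})    # step 2: Element Index lookup
    return [(did, rec) for did in sorted(by_doc) for rec in by_doc[did]]
```

Only the first step compares strings; everything after it works on integer identifiers.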

XISS - Join Algorithms: after getting the record list for each subexpression, we need to find which records answer the original query. For example, to evaluate /A/B we retrieve a record list of all A elements and another of all B elements, and must then determine which B's are children of an A.

XISS - Join Algorithms: three join algorithms are proposed: EA-join merges an element record list and an attribute record list (solves a step involving an element and an attribute); EE-join merges two element record lists (solves A/B or A//B); KC-join self-merges an element record list (solves (E)*).

XISS - EA-join: solves a step involving an element and an attribute. Input: an element record list and an attribute record list, both sorted by did and then by order. It finds the attribute records whose parents are in the element record list.

XISS - EA-join: a two-stage sort-merge: group by did first, then merge by order. Output criterion: E is the parent of A. A single scan of both lists is enough.
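A single-scan merge of this kind might look like the sketch below. It assumes each attribute record carries the order of its parent element (a hypothetical `parent_order` field) and relies on attributes being numbered directly after their parent, so that a list sorted by (did, order) is also sorted by (did, parent_order). The record types are illustrative, not the paper's layout.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    did: int
    order: int
    size: int

class Attr(NamedTuple):
    did: int
    order: int
    parent_order: int   # assumed field: order of the owning element

def ea_join(elems, attrs):
    """Pair each attribute with its parent element in one pass over both lists.

    Both lists must be sorted by (did, order).
    """
    result = []
    i = 0
    for a in attrs:
        key = (a.did, a.parent_order)
        # Advance the element cursor; it never moves backwards.
        while i < len(elems) and (elems[i].did, elems[i].order) < key:
            i += 1
        if i < len(elems) and (elems[i].did, elems[i].order) == key:
            result.append((elems[i], a))
    return result

# Sample data: the third attribute belongs to an element not in the list.
elems = [Elem(1, 10, 50), Elem(1, 70, 20), Elem(2, 10, 30)]
attrs = [Attr(1, 11, 10), Attr(1, 12, 10), Attr(1, 41, 40), Attr(2, 11, 10)]
pairs = ea_join(elems, attrs)
```

Because the element cursor `i` only moves forward, each list is read once, matching the single-scan claim.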

XISS - EE-join: solves E/_*/E, e.g., E/E, E//E, E/_/E. Input: two element record lists, E and F. Output: pairs (e, f) where e is an ancestor of f. It also uses a two-stage sort-merge, but it may need to scan the lists multiple times in special cases, e.g., when the document contains /A/A/B/B.
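The rescanning in the /A/A/B/B case can be seen in the following sketch of an ancestor-descendant merge. The `Elem` record and the sample numbers are assumptions; the sketch handles only the ancestor criterion stated above, and a parent-child variant would need an extra depth check.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    did: int
    order: int
    size: int

def ee_join(ancestors, descendants):
    """Return (e, f) pairs where e is an ancestor of f.

    Both lists must be sorted by (did, order).
    """
    result = []
    start = 0
    for e in ancestors:
        # Skip records that cannot be descendants of e or of any later ancestor.
        while (start < len(descendants) and
               (descendants[start].did, descendants[start].order) <= (e.did, e.order)):
            start += 1
        # Scan forward from `start`; nested ancestors rescan the same records.
        k = start
        while (k < len(descendants) and descendants[k].did == e.did and
               descendants[k].order <= e.order + e.size):
            result.append((e, descendants[k]))
            k += 1
    return result

# Sample numbering for a document shaped like /A/A/B/B, plus two unrelated records.
a1, a2 = Elem(1, 1, 100), Elem(1, 10, 50)
b1, b2 = Elem(1, 20, 20), Elem(1, 30, 5)
outside, other_doc = Elem(1, 200, 0), Elem(2, 5, 0)
pairs = ee_join([a1, a2], [b1, b2, outside, other_doc])
```

Here both B records are read once for the outer A and again for the inner A, which is the multiple-scan behaviour the slide mentions.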

XISS - KC-join: solves the Kleene closure of a subexpression. Input: a list of element records that satisfy the base case. EE-join is applied recursively to the list, stopping when the result list no longer grows.
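One reading of this fixpoint loop is sketched below for chains of parent-child steps. A simple quadratic pairing stands in for the sort-merge EE-join, and the `Elem` record with a `depth` field is an assumption, so this illustrates only the stop-when-nothing-new idea.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    order: int
    size: int
    depth: int

def parent_child_pairs(elems):
    """Stand-in for a parent-child EE-join of the list with itself."""
    return {(p, c) for p in elems for c in elems
            if p.order < c.order <= p.order + p.size and c.depth == p.depth + 1}

def kc_join(elems):
    """Extend chains one step at a time until no new pair appears."""
    base = parent_child_pairs(elems)
    closure = set(base)
    frontier = set(base)
    while frontier:
        extended = {(a, c) for (a, b) in frontier for (b2, c) in base if b == b2}
        frontier = extended - closure      # keep only pairs not seen before
        closure |= frontier
    return closure

# Sample chain e1/e2/e3 of same-named elements in one document.
e1, e2, e3 = Elem(1, 100, 1), Elem(10, 50, 2), Elem(20, 10, 3)
result = kc_join([e1, e2, e3])
```

The loop ends after the round in which `frontier` comes back empty, that is, when the result has stopped growing.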

Index Fabric: by Cooper et al., published in VLDB 2001 under the title “A fast index for semistructured data”. It has two subtypes, the raw path index and the refined path index, and it uses the Patricia technique to compress the index.

Index Fabric - General Idea: it is a balanced, disk-based indexing structure built on Patricia. Each data node is associated with a key string, which is stored in the trie index for retrieval. The layered approach to building the index bounds the number of disk pages accessed per query.

Index Fabric - General Idea: the raw path index answers absolute path queries; the refined path index answers any predefined query. The difference between them is how the key is generated.

Patricia: Patricia = Practical Algorithm To Retrieve Information Coded in Alphanumeric, by Morrison, in JACM 1968. It is a space-efficient method to store and retrieve strings. It is binary, uses bit comparisons, and keeps a “skip” in each internal node.

Patricia: an example Patricia trie (figure).

Patricia: it is essentially a trie from which internal nodes with a single child have been removed. A search branches at each internal node according to the value of the bit at the skip position, retrieves the string at the leaf it reaches, and compares that string with the query string.
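The search procedure can be sketched on a small in-memory binary trie. The tuple-based node layout, the absolute bit position stored in each internal node, and the zero-byte terminator that keeps keys prefix-free are all assumptions of this sketch rather than details from Morrison's paper.

```python
def to_bits(s: str) -> str:
    """Encode an ASCII string as a bit string ending in a zero byte."""
    return "".join(f"{b:08b}" for b in s.encode("ascii")) + "0" * 8

def _build(keys):
    """keys: non-empty list of (bits, original_string) with distinct bits."""
    if len(keys) == 1:
        return ("leaf", keys[0][1])
    pos = 0
    # Skip every bit position on which all remaining keys agree.
    while len({bits[pos] for bits, _ in keys}) == 1:
        pos += 1
    zeros = [k for k in keys if k[0][pos] == "0"]
    ones = [k for k in keys if k[0][pos] == "1"]
    return ("node", pos, _build(zeros), _build(ones))

def build_trie(words):
    """Build a trie from a non-empty collection of ASCII strings."""
    return _build([(to_bits(w), w) for w in set(words)])

def search(trie, query):
    """Follow only the tested bits, then verify against the string at the leaf."""
    bits = to_bits(query)
    node = trie
    while node[0] == "node":
        _, pos, zero, one = node
        bit = bits[pos] if pos < len(bits) else "0"
        node = one if bit == "1" else zero
    return node[1] if node[1] == query else None

trie = build_trie(["invoice", "index", "buyer", "name"])
```

The final comparison is essential: because untested bits are skipped, a query that is not in the trie still ends at some leaf, and only the full-string check rejects it.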

Index Fabric - Balanced Trie: the number of disk pages accessed per query is bounded by the number of layers in the layered index. The idea is similar to that of a B-tree: the Patricia trie is decomposed into blocks, and an upper-layer trie is used to navigate among the blocks.

Index Fabric - Balanced Trie: example (figure) showing Layer 0 and Layer 1.

Index Fabric - Balanced Trie: there are three types of links in the balanced trie: a far link crosses layers and is the result of branching; a near link stays within the same block and is the result of branching; a direct link crosses layers where the root nodes are the same. Each query accesses one block in each layer.

Index Fabric - Balanced Trie: traversals in the upper layers skip nodes of the original trie, which speeds up the search, and the number of pages accessed is bounded.

Index Fabric - Raw Path: each data node is associated with a key, where key = path (encoded in designators) + value. Designators are special characters, each representing a name. APE queries are translated into key prefixes and submitted to the index trie.

Index Fabric - Raw Path: example: a name element with value HKU, nested under invoice and buyer, is translated to the key IBNHKU, where I, B, and N are the designators. The query /invoice/buyer/name[“HKU”] is translated to the query string IBNHKU.
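The key construction and the prefix query can be sketched as below. The designator table extends the slide's I, B, N with a hypothetical A for address, the stored keys other than IBNHKU are invented sample data, and a sorted list searched with `bisect` stands in for the Patricia-based index.

```python
import bisect

# Designator table; "address" -> "A" is a hypothetical addition.
DESIGNATORS = {"invoice": "I", "buyer": "B", "name": "N", "address": "A"}

def encode_key(path, value=""):
    """Key = one designator per path step, followed by the value."""
    steps = [s for s in path.split("/") if s]
    return "".join(DESIGNATORS[s] for s in steps) + value

def prefix_lookup(sorted_keys, prefix):
    """Return every stored key that starts with `prefix`."""
    out = []
    lo = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[lo:]:
        if not key.startswith(prefix):
            break
        out.append(key)
    return out

# Sample index contents (only IBNHKU comes from the slides).
keys = sorted([
    encode_key("/invoice/buyer/name", "HKU"),
    encode_key("/invoice/buyer/name", "CUHK"),
    encode_key("/invoice/buyer/address", "HK"),
])
```

A query with a value, such as /invoice/buyer/name[“HKU”], becomes a lookup of the full key, while the bare path /invoice/buyer/name becomes the prefix IBN and returns every name under a buyer.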

Index Fabric - Refined Path: special designators can be assigned to special queries, which may be regular path expressions. For example, if we define P as the path //buyer/name, then the key PHKU means that the document contains a buyer/name with value HKU. Any predefined RPE can therefore be answered very quickly.

Comparison: XISS can solve general RPEs, and it solves an APE by dividing it into steps. Index Fabric solves an RPE either by compile-time expansion of the RPE or by using a predefined Refined Path Index, and it solves an APE with a single index lookup.