Algorithmics and Applications of Tree and Graph Searching Dennis Shasha, Jason T. L. Wang, and Rosalba Giugno Presenters: Jerod Watson & Christan Grant.

Outline: Introduction; Searching in Trees (Approximate Containment Queries, Path-Only Searches, Extension to Trees); Searching in Graphs (Keygraph Searching in Graph DBs, GraphGrep, Subgraph Matching); Conclusion

Introduction: Modern search engines answer keyword-based queries with impressive speed. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching.

XQuery

AQUA Query

A query is expressed as a tree pattern, termed a “query tree”. The database can be represented as a single tree or as a set of trees, and each tree may be ordered or unordered. Queries are often concerned with the parent-child, ancestor-descendant, or path relationships among nodes, and they can be expressed as containment mappings.
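To make “containment mapping” concrete, here is a minimal sketch for unordered trees using only parent-child edges: a query node maps to a data node with the same label, and its children must map to distinct children of that data node. The tree encoding (a `(children, labels)` pair of dicts) and the names `maps_to` and `contains` are illustrative assumptions, not the authors' code; the backtracking search is exponential in the worst case.

```python
def maps_to(q, d, query, data):
    """Can query node q map to data node d, preserving labels and parent-child edges?"""
    q_children, q_labels = query
    d_children, d_labels = data
    if q_labels[q] != d_labels[d]:
        return False
    kids_q = q_children.get(q, [])
    kids_d = d_children.get(d, [])

    def assign(i, used):
        # Try to map query child i to an unused data child (unordered trees).
        if i == len(kids_q):
            return True
        for dc in kids_d:
            if dc not in used and maps_to(kids_q[i], dc, query, data) and assign(i + 1, used | {dc}):
                return True
        return False

    return assign(0, frozenset())


def contains(query, q_root, data):
    """True if the query tree can be mapped into the data tree at some node."""
    return any(maps_to(q_root, d, query, data) for d in data[1])


data = ({1: [2, 3], 3: [4]}, {1: "a", 2: "b", 3: "c", 4: "d"})
print(contains(({10: [11, 12]}, {10: "a", 11: "c", 12: "b"}), 10, data))  # True
```

Because the trees are treated as unordered, the query's children may appear in any order under the matching data node.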

A query tree may contain fixed-length don't cares (FLDCs), e.g., “?”, and variable-length don't cares (VLDCs), e.g., “*”. This class of queries is referred to as approximate containment (AC) queries.

Path-Only Searches: Many AC queries are concerned only with paths, e.g., “Find the descendants of Mary, who is a child of John.” XISS is an indexing and querying system designed to support regular path expressions.
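Path indexes like XISS rely on a node-numbering scheme that answers ancestor-descendant questions without traversing the tree. The sketch below uses a simple preorder `(order, size)` numbering in that spirit, where `size` is the number of descendants; XISS's real scheme also reserves space for future insertions, which is omitted here. The function names and the family tree are hypothetical, and names are assumed to be unique node ids.

```python
def number_tree(tree, root):
    """Assign (order, size) to every node: preorder position and number of descendants."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        order = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        labels[node] = (order, counter - order - 1)

    visit(root)
    return labels


def is_ancestor(labels, x, y):
    """x is a proper ancestor of y iff y's order falls inside x's descendant interval."""
    ox, sx = labels[x]
    oy, _ = labels[y]
    return ox < oy <= ox + sx


def descendants_of_child(tree, labels, parent, child):
    """'Find the descendants of <child>, who is a child of <parent>.'"""
    if child not in tree.get(parent, []):
        return []
    found = [n for n in labels if is_ancestor(labels, child, n)]
    return sorted(found, key=lambda n: labels[n][0])


family = {"John": ["Mary", "Bob"], "Mary": ["Ann", "Tom"], "Ann": ["Sue"]}
nums = number_tree(family, "John")
print(descendants_of_child(family, nums, "John", "Mary"))  # ['Ann', 'Sue', 'Tom']
```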

Extension to Trees, the Pathfix algorithm: Phase 1 encodes each root-to-leaf path of every data tree in the database into a suffix array. Phase 2 compares the query tree Q with each data tree D in the database, allowing a difference of DIFF.
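A minimal sketch of Phase 1, plus an exact-match lookup: every root-to-leaf label path is split into suffixes, the suffixes are sorted into a suffix array, and a query path is located by binary search on the sorted suffixes. This assumes DIFF = 0 (exact path matching only); the tree encoding and the function names are illustrative assumptions rather than Pathfix's actual data structures.

```python
from bisect import bisect_left


def root_to_leaf_paths(children, labels, root):
    """Return every root-to-leaf path as a tuple of labels."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + (labels[node],)
        kids = children.get(node, [])
        if not kids:
            paths.append(prefix)
        for kid in kids:
            walk(kid, prefix)

    walk(root, ())
    return paths


def build_suffix_array(db):
    """db maps tree_id -> list of label paths; returns sorted (suffix, tree_id, path_index)."""
    entries = []
    for tree_id, paths in db.items():
        for p_idx, path in enumerate(paths):
            for start in range(len(path)):
                entries.append((path[start:], tree_id, p_idx))
    entries.sort()
    return entries


def trees_containing(entries, query_path):
    """Tree ids having some root-to-leaf path that contains query_path contiguously."""
    keys = [e[0] for e in entries]
    q = tuple(query_path)
    i = bisect_left(keys, q)  # suffixes starting with q form a contiguous run from here
    hits = set()
    while i < len(keys) and keys[i][:len(q)] == q:
        hits.add(entries[i][1])
        i += 1
    return hits


db = {
    "T1": root_to_leaf_paths({1: [2, 3]}, {1: "A", 2: "B", 3: "C"}, 1),
    "T2": root_to_leaf_paths({1: [2], 2: [3]}, {1: "A", 2: "C", 3: "B"}, 1),
}
sa = build_suffix_array(db)
print(sorted(trees_containing(sa, ("A", "C"))))  # ['T1', 'T2']
```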

Handling Don't Cares: Partition the query into connected subtrees that contain no don't cares. Match each of these don't-care-free subtrees against the data trees in the database. For matched subtrees that belong to the same data tree, determine whether they combine to match the query according to the matching semantics of the don't cares.
Filtering
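The partitioning step can be sketched as a depth-first walk that starts a new part whenever a non-wildcard node sits below a don't care (or at the root). The `?`/`*` labels follow the slides; the tree encoding and the function name are hypothetical, and the later combination step, which depends on the don't-care semantics, is not shown.

```python
WILDCARDS = {"?", "*"}


def dont_care_free_parts(children, labels, root):
    """Split a query tree into connected parts that contain no don't-care nodes."""
    parts = []

    def walk(node, current):
        if labels[node] in WILDCARDS:
            current = None  # a don't care breaks the current part
        else:
            if current is None:  # start a new part at the root or below a don't care
                current = []
                parts.append(current)
            current.append(node)
        for child in children.get(node, []):
            walk(child, current)

    walk(root, None)
    return parts


# Query: a(*(b(c)), d)
print(dont_care_free_parts({1: [2, 3], 2: [4], 4: [5]},
                           {1: "a", 2: "*", 3: "d", 4: "b", 5: "c"}, 1))  # [[1, 3], [4, 5]]
```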

Implementation: ATreeGrep

Graphs

Graphs: an abstract data type of elements (nodes or vertices) interconnected by edges. A tree is a special case of a graph; in a general graph there is no constraint on the number of paths possible between nodes, there is no root, and the graph may contain cycles.
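The tree-versus-graph distinction can be checked directly: an undirected graph is a tree exactly when it is connected and has one fewer edge than it has nodes. The function below is a small illustrative sketch over an adjacency-list representation, not something from the source.

```python
def is_tree(nodes, edges):
    """An undirected graph is a tree iff it is connected and |E| = |V| - 1."""
    if len(edges) != len(nodes) - 1:
        return False
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # iterative depth-first search from an arbitrary node
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


print(is_tree({1, 2, 3}, [(1, 2), (2, 3)]))          # True
print(is_tree({1, 2, 3}, [(1, 2), (2, 3), (3, 1)]))  # False: contains a cycle
```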

Keygraph Searching: searching for a particular graph or arrangement of elements inside one large graph (e.g., the Internet), or searching for a particular graph or structure among many graphs (e.g., a database of chemical structures). Indexing is used to reduce the complexity of the search.

Keygraph Searching, three basic steps: 1. Reduce the search space by filtering. 2. Decompose the query into simpler structures. 3. Match.

Keygraph Searching (survey): A* algorithm, GraphDB, Daylight, Lore

A*: Seminal work by Nilsson (1980). A route-finding algorithm that keeps track of the nodes it has visited and the distance it has traveled, and uses a heuristic estimate of the remaining distance to choose which path to extend next. Applications: protein databases (discovery and search), image databases, Chinese character databases, CAD circuit data, and software source code.

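A* Pseudocode (g(p) is the cost of path p; h(x) is the heuristic estimate of the distance from x to the goal):
function A*(start, goal)
    var closed := the empty set
    var q := priority queue ordered by g(p) + h(last node of p), containing path(start)
    while q is not empty
        var p := remove_first(q)    // the path with the lowest g(p) + h
        var x := the last node of p
        if x in closed: continue
        if x = goal: return p
        add x to closed
        foreach y in successors(x)
            enqueue(q, p extended by y)
    return failure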
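A runnable version of the pseudocode, as a sketch: `heapq` serves as the priority queue keyed on g + h, and a counter breaks ties so that paths are never compared. It assumes `neighbors(x)` yields `(y, cost)` pairs and that `h` is consistent (never overestimates and obeys the triangle inequality), which is what makes the closed-set version return a shortest path. The graph and heuristic values are made up for illustration.

```python
import heapq
from itertools import count


def a_star(start, goal, neighbors, h):
    """Return (path, cost) of a cheapest path from start to goal, or None."""
    tie = count()
    queue = [(h(start), next(tie), 0, [start])]  # (f = g + h, tiebreak, g, path)
    closed = set()
    while queue:
        _, _, g, path = heapq.heappop(queue)
        x = path[-1]
        if x in closed:
            continue
        if x == goal:
            return path, g
        closed.add(x)
        for y, cost in neighbors(x):
            if y not in closed:
                g2 = g + cost
                heapq.heappush(queue, (g2 + h(y), next(tie), g2, path + [y]))
    return None


graph = {"A": [("B", 1), ("C", 5)], "B": [("A", 1), ("C", 2)],
         "C": [("A", 5), ("B", 2), ("D", 1)], "D": [("C", 1)]}
estimate = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star("A", "D", lambda x: graph[x], lambda x: estimate[x]))  # (['A', 'B', 'C', 'D'], 4)
```

With h(x) = 0 everywhere the same code degenerates to Dijkstra's algorithm.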

GraphDB: specifies a data model and a query model. 1. Queries are in the form of regular expressions. 2. Nodes are classes representing data objects. 3. Edges are classes used to store paths in the database. 4. Path classes and indexing data structures are used to index the database. GraphDB provides graph search operations to find the shortest path between two nodes and the subgraph within a given range of a starting node.
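The “subgraph from a starting node and range” operation can be sketched with a distance-bounded Dijkstra search followed by an induced-subgraph step. GraphDB's actual interface is not described here, so `within_range`, `induced_subgraph`, and the weighted adjacency-list format are hypothetical.

```python
import heapq


def within_range(adj, start, radius):
    """Return {node: distance} for every node within `radius` of start (Dijkstra, bounded)."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def induced_subgraph(adj, nodes):
    """Keep only the edges whose endpoints are both in `nodes`."""
    return {u: [(v, w) for v, w in adj.get(u, []) if v in nodes] for u in nodes}


adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)]}
print(within_range(adj, "a", 2))  # {'a': 0, 'b': 1, 'c': 2}
```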

GraphDB

Daylight: "Provide the best known computer algorithms for chemical information processing to those who need them." Daylight uses fingerprinting to index the database and prune the search.
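Fingerprint screening works because every path in a query structure is also a path in any structure that contains it, so a molecule whose fingerprint lacks one of the query's bits can be pruned without a full match. The sketch below is a generic hashed path fingerprint under that assumption; it is not Daylight's actual fingerprint algorithm, and the bit width, path length, and `-` separator are arbitrary choices. `zlib.crc32` is used instead of `hash()` so bits are stable across runs.

```python
import zlib


def path_strings(adj, labels, max_len):
    """Label strings of all simple paths with up to max_len edges."""
    out = set()

    def walk(node, visited, text, edges):
        out.add(text)
        if edges == max_len:
            return
        for nxt in adj.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, text + "-" + labels[nxt], edges + 1)

    for start in labels:
        walk(start, {start}, labels[start], 0)
    return out


def fingerprint(paths, n_bits=64):
    """Hash each path string to one bit of an n_bits-wide bitstring."""
    bits = 0
    for p in paths:
        bits |= 1 << (zlib.crc32(p.encode()) % n_bits)
    return bits


def may_contain(mol_fp, query_fp):
    """Screen: a molecule can only contain the query if it has all of the query's bits."""
    return (query_fp & ~mol_fp) == 0


mol = path_strings({1: [2], 2: [1, 3], 3: [2]}, {1: "C", 2: "C", 3: "O"}, 2)
query = path_strings({1: [2], 2: [1]}, {1: "C", 2: "O"}, 2)
print(may_contain(fingerprint(mol), fingerprint(query)))  # True
```

Hash collisions can let a non-matching molecule through, so survivors still need an exact subgraph match; the screen never rejects a true match.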

ChemDB (Contains 6.5 million unique structures or subgraphs)

Lore: a database management system for XML. Data is modeled as a rooted, labeled graph and indexed in four ways to speed up regular path expressions: Vindex, Tindex, Lindex, and Pindex (DataGuide).

Lore indexes: 1) Vindex (value index): for each edge label l, indexes all nodes that have an incoming l-labeled edge and an atomic value satisfying some condition. 2) Tindex (text index): indexes nodes with incoming l-labeled edges whose string values contain specific words. 3) Lindex (link index): indexes nodes with outgoing l-labeled edges. 4) Pindex (DataGuide): indexes all nodes reachable from the root through a given labeled path. The DataGuide is used by all queries that start at the root; other queries traverse paths using indexes 1 to 3, pruning whatever does not match.
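A DataGuide-style path index can be built by subset construction: start from the set {root}, follow each label to the set of nodes it reaches, and repeat for every new set. Each distinct set is expanded once, so cycles in the data graph do not cause infinite loops, and looking up a label path is a walk through the resulting transitions. This is a sketch of the idea only; Lore's storage format and API are not shown, and the graph below is made up.

```python
from collections import deque


def build_dataguide(edges, root):
    """edges: {node: [(label, child), ...]}. Returns (start_state, transitions)."""
    start = frozenset([root])
    transitions = {}  # node set -> {label: node set}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in transitions:
            continue
        moves = {}
        for node in state:
            for label, child in edges.get(node, []):
                moves.setdefault(label, set()).add(child)
        transitions[state] = {lab: frozenset(kids) for lab, kids in moves.items()}
        queue.extend(s for s in transitions[state].values() if s not in transitions)
    return start, transitions


def lookup(guide, label_path):
    """All nodes reachable from the root along exactly this label path."""
    state, transitions = guide
    for label in label_path:
        state = transitions[state].get(label)
        if state is None:
            return frozenset()
    return state


edges = {0: [("book", 1), ("book", 2)], 1: [("title", 3)],
         2: [("title", 4), ("author", 5)], 5: [("wrote", 2)]}  # 5 -> 2 closes a cycle
guide = build_dataguide(edges, 0)
print(sorted(lookup(guide, ["book", "title"])))  # [3, 4]
```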

T-index (1999): a data structure that indexes the nodes of a semistructured database that are reachable by regular path expressions. The T-index may be more efficient than the P-index because it relaxes some constraints. Reportedly, for a graph of size 1,500, the T-index is 13% of the size of the database.

GraphGrep: uses variable-length paths (cyclic or acyclic) to index the database, which provides efficient filtering. Nodes have ids (numbers) and labels (letters).

GraphGrep Index Construction: 1. Choose a maximum indexing path length l_p. 2. Create the “path representation” of each graph. 3. Create the graph's fingerprint.
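The index construction can be sketched as follows: enumerate every simple path of up to `lp` nodes (counting nodes, which matches the slides' ABC example for l_p = 3), group the id-paths by their label string to form the path representation, and count the id-paths per label string to form the fingerprint. This is a simplification under stated assumptions: GraphGrep hashes label paths into a fixed-size table, whereas a `Counter` keyed by the label string is used here, and single-letter labels are assumed.

```python
from collections import Counter


def path_representation(adj, labels, lp):
    """Map each label path (up to lp nodes) to the list of id-paths that realize it."""
    rep = {}

    def walk(path):
        key = "".join(labels[n] for n in path)  # assumes single-letter labels
        rep.setdefault(key, []).append(tuple(path))
        if len(path) == lp:
            return
        for nxt in adj.get(path[-1], []):
            if nxt not in path:  # simple paths only
                walk(path + [nxt])

    for node in labels:
        walk([node])
    return rep


def fingerprint(rep):
    """Number of occurrences of each label path."""
    return Counter({key: len(ids) for key, ids in rep.items()})


adj = {1: [2], 2: [1, 3], 3: [2]}  # undirected chain A - B - C
labels = {1: "A", 2: "B", 3: "C"}
rep = path_representation(adj, labels, 3)
print(rep["ABC"], fingerprint(rep)["AB"])  # [(1, 2, 3)] 1
```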

GraphGrep, Filtering the Database: 1. The query graph is parsed and its fingerprint is built. 2. The fingerprints are compared: (a) if any entry in a graph's fingerprint is less than the corresponding entry in the query's fingerprint, the graph is discarded; (b) each remaining graph may contain one or more subgraphs that match the query.
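The filtering rule is an entry-wise comparison: a graph survives only if, for every label path, it has at least as many occurrences as the query. A minimal sketch, assuming fingerprints are `Counter` objects keyed by label path (missing keys count as 0); the function names and the tiny database are illustrative.

```python
from collections import Counter


def passes_filter(graph_fp, query_fp):
    """Keep a graph only if no fingerprint entry is smaller than the query's."""
    return all(graph_fp[key] >= n for key, n in query_fp.items())


def filter_database(db_fps, query_fp):
    """Return the ids of candidate graphs that survive filtering."""
    return [gid for gid, fp in db_fps.items() if passes_filter(fp, query_fp)]


db_fps = {"g1": Counter({"A": 2, "AB": 1}), "g2": Counter({"A": 1})}
print(filter_database(db_fps, Counter({"A": 1, "AB": 1})))  # ['g1']
```

The scan touches each fingerprint once, so it is linear in the number of graphs; survivors are only candidates and still go through subgraph matching.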

GraphGrep, Filtering the Database: filtering takes time linear in the size of the database, but it reportedly discards 99% of the database.

GraphGrep, Finding Subgraphs Matching the Query: the query graph is traversed depth first, and the branches of the resulting traversal tree are decomposed into sequences of overlapping label-paths (patterns).

GraphGrep Overlaps: 1. The last node in a pattern coincides with the first node of the next pattern (e.g., with l_p = 3, ABCB is decomposed into ABC and CB). 2. If a node has branches, it is included in the first pattern of every branch. 3. The first node in a cycle is visited twice.
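Overlap rules 1 and 2 can be sketched for a query whose depth-first traversal is a tree: each branch runs from a branching node (or the root) to the next branching node or leaf, and is cut into patterns of at most `lp` nodes that share their boundary node. Rule 3 (cycles) is not handled here, and the `children` encoding and function names are hypothetical.

```python
def split_path(seq, lp):
    """Cut a node sequence into patterns of at most lp nodes (lp >= 2);
    the last node of each pattern is the first node of the next."""
    if len(seq) <= lp:
        return [seq]
    return [seq[i:i + lp] for i in range(0, len(seq) - 1, lp - 1)]


def decompose_tree(children, root, lp):
    """Decompose a DFS tree into overlapping patterns, branch by branch."""
    patterns = []

    def walk(start):
        for child in children.get(start, []):
            seq, node = [start, child], child  # the branching node opens every branch
            while len(children.get(node, [])) == 1:
                node = children[node][0]
                seq.append(node)
            patterns.extend(split_path(seq, lp))
            walk(node)  # continue below the next branching node (if any)

    walk(root)
    return patterns


labels = {1: "A", 2: "B", 3: "C", 4: "B"}
pats = decompose_tree({1: [2], 2: [3], 3: [4]}, 1, 3)
print(["".join(labels[n] for n in p) for p in pats])  # ['ABC', 'CB']
```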

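GraphGrep Matching Example: 1. For each query pattern, select the set of id-paths in the candidate graph that match it. 2. Combine the lists under the constraint that overlapping positions have equal node ids. 3. Remove combinations that have equal node ids in non-overlapping positions.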
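Steps 2 and 3 amount to a backtracking join: each query pattern is a sequence of query-node ids, each candidate is a data id-path, overlapping query nodes must receive the same data id, and two different query nodes may not receive the same data id. The sketch below is an illustration of that join under these assumptions, not GraphGrep's implementation; the ids are made up.

```python
def combine(patterns, candidates):
    """patterns[i]: query-node ids; candidates[i]: data id-paths matching pattern i.
    Returns every consistent, injective query-node -> data-node assignment."""
    results = []

    def extend(i, assign):
        if i == len(patterns):
            results.append(dict(assign))
            return
        for path in candidates[i]:
            new, ok = dict(assign), True
            for q, d in zip(patterns[i], path):
                if q in new:
                    ok = new[q] == d  # overlapping position: ids must agree
                elif d in new.values():
                    ok = False  # same data node in a non-overlapping position
                else:
                    new[q] = d
                if not ok:
                    break
            if ok:
                extend(i + 1, new)

    extend(0, {})
    return results


patterns = [[1, 2, 3], [3, 4]]  # e.g. ABC and CB sharing query node 3
candidates = [[(10, 11, 12), (20, 21, 22)], [(12, 13), (12, 11), (22, 23)]]
print(combine(patterns, candidates))
# [{1: 10, 2: 11, 3: 12, 4: 13}, {1: 20, 2: 21, 3: 22, 4: 23}]
```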

GraphGrep Techniques for Queries with Wildcards: consider the parts of the query graph that lie between wildcards (as in Pathfix) and match each part separately. The Cartesian product of the matches for each component gives the candidate answers; an entry in the Cartesian product is valid if the data graph contains a path of the length required by the wildcards between the nodes where the components meet.
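A minimal sketch of the Cartesian-product step for two components separated by a fixed-length wildcard. It assumes the wildcard requires exactly `gap` edges between the last node of the first component's match and the first node of the second's, and it checks this with walks rather than simple paths (a simplification). `itertools.product` is standard; the other names and the data are hypothetical.

```python
from itertools import product


def reachable_in_exactly(adj, src, steps):
    """Nodes reachable from src by a walk of exactly `steps` edges."""
    frontier = {src}
    for _ in range(steps):
        frontier = {v for u in frontier for v in adj.get(u, ())}
    return frontier


def combine_components(adj, matches_a, matches_b, gap):
    """Keep pairs (a, b) from the Cartesian product that the wildcard can connect."""
    results = []
    for a, b in product(matches_a, matches_b):
        if set(a) & set(b):
            continue  # the two components may not reuse a node
        if b[0] in reachable_in_exactly(adj, a[-1], gap):
            results.append((a, b))
    return results


adj = {1: [2], 2: [3], 3: [4], 4: [5]}
print(combine_components(adj, [(1, 2)], [(4, 5), (3, 4)], 2))  # [((1, 2), (4, 5))]
```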

GraphGrep experimental setup: 1 GHz Pentium III; NCI databases (1,000 to 16,000 nodes); an average of 20 nodes per graph in the database (maximum 270 nodes); queries run with l_p = 4 and 10.

GraphGrep results: running time is linear in the size of the database, and different values of l_p affect the running time.

Conclusions / Questions: Searching in trees introduces ATreeGrep; searching in graphs introduces GraphGrep.

Thanks to: God, the class, Wikipedia, and various other Googled sources.