Cooperative Query Answering for Semistructured Data
By Michael Barg and Raymond K. Wong
Speakers: Chuan Lin & Xi Zhang



Outline
– Motivations
– Overview
– Basic Concepts
– Cooperative Query Processing
– Experiment

Motivations
XML data:
– same semantic content
– very different structures

Example: same semantics, different structures
User Query: insurance claims related to smoking for women
Court Transcript (element labels from the figure): insurance claim plaintiff woman smoking
Insurance Record (element labels from the figure): insurance claim insurer woman smoking

Motivations
No exact query result
User Query: phone number of Bob, who is the new sales manager
Data (element labels from the figure): personnel; sales manager, Joe, phone number; assistant sales manager, Bob, phone number; salesman

Overview
Goal:
– Return approximate answers for XML queries
– approximate: semantically and structurally similar
Solution:
– Return a set of results
– ranked by an overall score
score: indicates how well the subgraph containing the result satisfies the query criteria.

Basic Concepts: Query Tree
Query: /restaurant[.//Soho]/phone_number
Result Term: phone_number
For each edge:
– head: the end which is closer to the nearest result term
– tail: the other end
– In case of a tie, the head is the end closer to the root
Query Tree (figure labels): restaurant, soho, phone_number; r, h, t, h, t
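The head/tail rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `head_tail`, the representation of the query tree as `(parent, child)` pairs, and the use of a breadth-first search to measure closeness to a result term are not from the slides.

```python
from collections import deque


def head_tail(edges, result_terms):
    """Label each query-tree edge as (head, tail).

    edges: list of (parent, child) pairs, parent being the end closer to the root.
    result_terms: nodes whose values the query returns.
    Assumption: "closer" means fewer edges in the query tree.
    """
    # Build an undirected adjacency list so distances can be measured both ways.
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)
        adjacency.setdefault(child, []).append(parent)

    # Multi-source BFS: distance from every node to its nearest result term.
    dist = {term: 0 for term in result_terms}
    queue = deque(result_terms)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)

    labelled = []
    for parent, child in edges:
        if dist[child] < dist[parent]:
            labelled.append((child, parent))   # child is strictly closer
        else:
            labelled.append((parent, child))   # parent is closer, or tie goes to the root side
    return labelled


# Query tree for /restaurant[.//Soho]/phone_number
query_edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
labels = head_tail(query_edges, ["phone_number"])
```

For this query the edge to soho gets restaurant as its head, while the edge to phone_number gets phone_number as its head, since phone_number is itself the result term.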

Basic Concepts: Converging Order
– Order of edges considered in query processing
– Converge on a result term

Basic Concepts: Similarity
Semantically similar topologies
[Figure: five topologies, (a) to (e), relating restaurant and soho; additional node labels: address, eating_places, shopping_center]

Basic Concepts: Similarity (cont.)
Deviation Proximity (DP)
– Measures how far one structure deviates from a desired structure
– Given:
  r_a: data node with value a
  r_b: data node with value b
  Q(a,b): query tree edge
– DP: the distance from the actual position of r_b to the nearest position, r_b', which satisfies the topological relationship specified by Q(a,b)
Topological relationship: parent-child, ancestor-descendant

Deviation Proximity
Q(restaurant, soho) requires parent-child relationship
[Figure: topologies over restaurant, address, soho, eating_places and shopping_center, with positions marked (soho) and DP(restaurant, soho) for each]

Deviation Proximity
Q(restaurant, soho) requires ancestor-descendant relationship
[Figure: topologies over restaurant, address, soho, eating_places and shopping_center, with positions marked (soho) and DP(restaurant, soho) for each]
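A rough sketch of one way to compute DP on a document tree follows. The slides' figure values did not survive, so the distance measure here is an assumed reading, not the authors' definition: DP is taken as the number of tree edges between r_b and the nearest position that would satisfy the relationship, where a node outside r_a's subtree is measured to a hypothetical new child of r_a. The names `deviation_proximity`, `tree_distance` and `path_to_root`, and the parent-map tree representation, are hypothetical.

```python
def path_to_root(parent, node):
    """Return [node, parent(node), ..., root] for a tree stored as a child -> parent map."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    """Number of edges on the unique tree path between u and v."""
    up_u = path_to_root(parent, u)
    steps_from_v = {n: i for i, n in enumerate(path_to_root(parent, v))}
    for i, n in enumerate(up_u):
        if n in steps_from_v:            # first common ancestor
            return i + steps_from_v[n]
    raise ValueError("nodes are not in the same tree")


def deviation_proximity(parent, r_a, r_b, relation):
    """Assumed DP: edges from r_b to the nearest position satisfying `relation`.

    relation is "parent-child" or "ancestor-descendant".
    """
    up_b = path_to_root(parent, r_b)
    if r_a in up_b and r_a != r_b:
        levels_below = up_b.index(r_a)   # 1 means r_b is a direct child of r_a
        if relation == "parent-child":
            return levels_below - 1      # move up until directly under r_a
        return 0                         # already a descendant
    # r_b lies outside r_a's subtree: walk to r_a, then one edge down (assumption).
    return tree_distance(parent, r_a, r_b) + 1


nested = {"restaurant": None, "address": "restaurant", "soho": "address"}
direct = {"restaurant": None, "soho": "restaurant"}
inverted = {"soho": None, "restaurant": "soho"}
siblings = {"shopping_center": None, "soho": "shopping_center",
            "restaurant": "shopping_center"}
```

Under this reading, the nested topology (restaurant, address, soho) deviates by 1 from a parent-child edge but by 0 from an ancestor-descendant edge, which matches the intuition that the looser relationship tolerates more structures.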

Cooperative Query Processing
Input: a Query Tree Q_T, an XML Document Tree D_T
Output: ordered list of scored results
Cooperative Query Processing:
– Structural proximity calculation
– Progressive Score

Cooperative Query Processing (cont.)
Progressively matching edges in Q_T with D_T
– Consider edges in converging order
– For each edge Q_T(a,b), where a is head and b is tail, get a list of (r_a, score) pairs:
  r_a is a node in D_T with value a
  score is the progressive score of r_a w.r.t. the nearest r_b
  use graph encoding to calculate structural proximity of r_a and r_b

Structural Proximity Calculation
Encodings and Compressed Arrays
– Compact
– Preserve relationship to a larger graph
– Facilitate distance calculations
Proximity Searching

Encodings and Compressed Arrays
Basic Concepts:
– Common Node
– Terminal Node
– Annotated Node
Path representation
– Representing Single Path
– Representing Multiple Paths
– Representing Multiple Elements
Compressed Arrays
– Each encoding is a path/multi-path for a node/a set of nodes

Encodings and Compressed Arrays

Representing Single Path
[Figure: single-path encoding example; surviving labels: y y 2]

Representing Multiple Paths
[Figure: multi-path encoding example; surviving labels: 1.3 B.B C.3 C.C.2 y 3]

Representing Multiple Elements
[Figure: multi-element encoding example; surviving labels: 1 A.A.1.1 y y 2.3 B.B C.3 C.C.2 y 3]

Compressed Arrays

Drawback of Encoding
[Figure: encoding example; surviving labels: 1 A.A.1 B.B.1 D.2 E. ?.2 C.C.1 F.2 G]

Proximity Searching
Multi-Element Comparison
– Input:
  A compressed array, caN, containing the multi-element encoding of the Near Set.
  A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
– Output: dist, the shortest path from EF to the closest element in the Near Set
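The comparison above can be illustrated with plain root-to-node paths standing in for the compressed arrays, whose exact layout is not given in this transcript. The sketch assumes a tree with a single shared root and unique node identifiers; the names `path_distance` and `min_dist` are hypothetical. In a tree, the distance between two nodes is the number of steps left on each root path once their common prefix is removed.

```python
def path_distance(path_a, path_b):
    """Edges between the end nodes of two root-to-node paths in the same tree."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    # Steps from each end node up to the deepest shared node.
    return (len(path_a) - common) + (len(path_b) - common)


def min_dist(near_paths, find_paths):
    """Shortest distance from the Find Set element to the closest Near Set element.

    near_paths: one root path per Near Set element (stand-in for caN).
    find_paths: every root path to the Find Set element EF (stand-in for caF).
    """
    return min(path_distance(n, f) for n in near_paths for f in find_paths)


near_set = [["r", "a", "x"], ["r", "b", "y", "z"]]
find_element = [["r", "b", "w"]]
dist = min_dist(near_set, find_element)
```

Here `w` is 4 edges from `x` (via the root) and 3 edges from `z` (via `b`), so `dist` is 3. Taking the minimum over several paths to EF is an assumption about how the multi-path case would be handled.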

Proximity Searching
[Figure: search steps with MinDist = 5, MinDist = 4, MinDist = 2]

Progressive Score
Accumulative Deviation Proximity (DP)
– Calculated from structural proximity
Boolean operator at Query Tree branches (node a with branches b and c):
  prog(a) = prog(b) + prog(c)
  prog(a) = min(prog(b), prog(c))
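The two formulas can be combined into a small recursive sketch. The slide does not say which Boolean operator goes with which formula; the code below assumes the sum applies to AND branches (every branch must match, so deviations accumulate) and the minimum to OR branches (the best branch suffices). The function name `prog` follows the slide, while the nested-tuple representation is hypothetical.

```python
def prog(node):
    """Progressive score of a query-tree node.

    A leaf is a number: its accumulated deviation proximity.
    A branch is a tuple (operator, children) with operator "and" or "or".
    Assumption: "and" sums the branch scores, "or" takes the smallest one.
    """
    if isinstance(node, (int, float)):
        return node
    operator, children = node
    scores = [prog(child) for child in children]
    if operator == "and":
        return sum(scores)
    if operator == "or":
        return min(scores)
    raise ValueError(f"unknown operator: {operator}")


# a = b AND c, where c = d OR e
query = ("and", [1, ("or", [3, 0])])
score = prog(query)
```

With leaf deviations 1, 3 and 0, the OR branch contributes 0 and the AND node totals 1, so a lower score indicates a closer match under these assumptions.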

Experiment
Query: //restaurant/soho
XML: [figure]
Query Result: [figure]

Thank you!

Questions & Answers