Approximate XML Query Answers. Presenter: Hongyu Guo. Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis.


Presentation transcript:

1 Approximate XML Query Answers
Presenter: Hongyu Guo
Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis

2 Outline of this talk
Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

3 Outline
→ Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

4 Motivations
XML is the de facto standard for data exchange
Need to explore large XML data sets and get fast feedback from complex XML queries
Conflict between fast 'on-line' response and query execution cost
-- Need fast feedback

5 XML Query Challenges
Queries involve complex traversals of the XML data hierarchy
Complex queries over massive tree-structured data are very expensive
Approaches: optimize the query or optimize the data structure
When exact results are not required, we can instead return approximate query answers

6 Approximate Query Answers
Obtain an approximation to the true result
Already employed successfully in relational systems
Use the approximate result to get timely feedback
[Figure: a query evaluated over the XML data (result R) and over the synopsis (result R')]

7 Outline
Motivation
→ TreeSketch Approach
Experimental Results
Contributions and Limitations
-- A technique used to return fast, approximate results

8 Data and Query Model
[Figure: example XML document tree. Tag legend: a = author, n = name, b = book, p = paper, y = year, k = keyword, t = title]
-- Some background: the XML document

9 Data and Query Process --Twig Query, Query Tree, and Nested Result Tree

10 Basic Query Scenario
[Figure: a twig query evaluated over the true XML data and over the synopsis, the latter yielding an approximate nesting tree]
Key idea: return fast, accurate feedback

11 Approximate Query Answers
How to construct concise XML synopses that capture the statistical traits of the true data
How to produce approximate query answers over the synopsis efficiently
-- Two key problems

12 TreeSketch Construction
Step 1: Given an XML tree T, build a graph synopsis in which each node represents a set of elements with the same tag (this synopsis can be large)
Step 2: Compress the synopsis by merging nodes with similar sub-structures (i.e., clustering of the XML elements)
Step 3: Repeat Step 2 until the predefined space budget is met
Step 4: Return the TreeSketch synopsis
[Figure: sequence of synopses, from the perfect synopsis down to the space budget]
-- Construction algorithm
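The four steps above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' algorithm: the space budget is approximated by a maximum number of synopsis nodes, only nodes with the same tag are merge candidates, and the merge cost is a hypothetical squared difference between outgoing edge counts. All names (`merge`, `distance`, `compress`) are invented for this example.

```python
from collections import defaultdict
from itertools import combinations

def merge(sizes, tags, edges, a, b):
    """Merge node b into node a; edge means are re-derived from element totals."""
    new_sizes = {n: s for n, s in sizes.items() if n != b}
    new_sizes[a] = sizes[a] + sizes[b]
    new_tags = {n: t for n, t in tags.items() if n != b}
    totals = defaultdict(float)  # total child elements carried by each edge
    for (u, v), mean in edges.items():
        u2 = a if u == b else u
        v2 = a if v == b else v
        totals[(u2, v2)] += mean * sizes[u]
    new_edges = {(u, v): t / new_sizes[u] for (u, v), t in totals.items()}
    return new_sizes, new_tags, new_edges

def distance(edges, a, b):
    """Assumed merge cost: squared difference of outgoing edge-count vectors."""
    targets = {v for (u, v) in edges if u in (a, b)}
    return sum((edges.get((a, v), 0.0) - edges.get((b, v), 0.0)) ** 2
               for v in targets)

def compress(sizes, tags, edges, budget):
    """Steps 2-4: merge the closest same-tag pair until at most `budget` nodes remain."""
    while len(sizes) > budget:
        pairs = [(a, b) for a, b in combinations(sorted(sizes), 2)
                 if tags[a] == tags[b]]
        if not pairs:  # nothing left that may be merged
            break
        a, b = min(pairs, key=lambda p: distance(edges, *p))
        sizes, tags, edges = merge(sizes, tags, edges, a, b)
    return sizes, tags, edges

# Hypothetical starting synopsis: node -> element count, node -> tag,
# (parent, child) -> mean number of children per parent element.
sizes = {"P": 1, "S": 2, "F1": 2, "F2": 2, "C": 4, "E": 2}
tags = {"P": "P", "S": "S", "F1": "F", "F2": "F", "C": "C", "E": "E"}
edges = {("P", "S"): 2.0, ("S", "F1"): 1.0, ("P", "F2"): 2.0,
         ("F1", "C"): 1.0, ("F2", "C"): 1.0, ("F2", "E"): 1.0}

small_sizes, small_tags, small_edges = compress(sizes, tags, edges, budget=5)
print(small_sizes)
print(small_edges)
```

Converting each edge mean back to an element total before re-averaging keeps the merged counts consistent with the underlying document; a real implementation would measure the budget in bytes rather than in nodes.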

13 More Discussions
Graph synopsis construction
  Use a node to represent a set of elements with the same tag
  Queries can be answered with zero error
  The size can become very large; it can easily be on the order of the original document size
TreeSketch synopsis construction
  Compress the synopsis by merging nodes
  Bottom-up merging (clustering) algorithm
Key technique for compression → clustering
  Based on structure
  Model accuracy depends on the quality of the clustering
  Tight clusters → accurate synopsis, but large model
  Loose clusters → less accuracy, but small model
-- of the construction procedure

14 Construction Example
[Figure: XML document with elements r, p1, s2, s3, f4, f5, f6, f7, e8, c9, e10, c11, c12, c13, and its graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), E(2), C(4)]
Synopsis node → set of elements with the same tag
Synopsis edge → document edge(s)
-- Count same-tag elements
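As a rough sketch of how such a graph synopsis could be built, the code below groups elements and counts them per group. The figure lists two separate F nodes, so elements are evidently split by more than the tag alone; grouping by the root-to-element tag path is an assumption made here for illustration, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def build_path_synopsis(xml_text):
    """Group elements by root-to-element tag path (an assumed grouping rule).

    Returns (node_counts, edge_totals):
      node_counts[path]              -> number of elements in that group
      edge_totals[(parent, child)]   -> number of document edges between groups
    """
    root = ET.fromstring(xml_text)
    node_counts = Counter()
    edge_totals = Counter()
    stack = [(root, (root.tag,))]
    while stack:
        elem, path = stack.pop()
        node_counts[path] += 1
        for child in elem:
            child_path = path + (child.tag,)
            edge_totals[(path, child_path)] += 1
            stack.append((child, child_path))
    return node_counts, edge_totals

# Hypothetical document, not the one shown on the slide.
doc = "<r><p><s><f><c/></f></s><s><f><c/><c/></f></s></p></r>"
node_counts, edge_totals = build_path_synopsis(doc)
for path, n in sorted(node_counts.items()):
    print("/".join(path), n)
```

Each key of `node_counts` plays the role of a synopsis node such as S(2), and each key of `edge_totals` plays the role of a synopsis edge that stands for one or more document edges.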

15 Construction Example
Calculate the number of children for each edge
count[r, p]: mean number of children in p per element of r
[Figure: graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), E(2), C(4), annotated with per-edge child counts]
-- Calculate number of children per element
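The per-edge count is a simple average: the number of child elements that fall in the target node, divided by the number of elements in the source node. A minimal sketch follows; the node sizes echo the slide, but the edge wiring and edge totals are assumed for illustration because the figure's edge labels did not survive.

```python
def mean_child_counts(node_sizes, edge_totals):
    """count[u, v] = (child elements in v under elements of u) / (elements in u)."""
    return {(u, v): total / node_sizes[u]
            for (u, v), total in edge_totals.items()}

# Assumed example: node -> number of elements, edge -> number of document edges.
node_sizes = {"P": 1, "S": 2, "F": 2, "C": 4}
edge_totals = {("P", "S"): 2, ("S", "F"): 2, ("F", "C"): 4}

counts = mean_child_counts(node_sizes, edge_totals)
print(counts)
```

Under these assumed totals, the single P element has two S children on average, and each F element has two C children on average.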

16 Merging Nodes
[Figure: graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), E(2), C(4), and the more concise TreeSketch synopsis with nodes R(1), P(1), S(2), F(4), E(2), C(4)]
-- Less space budget

17 Compute Approximate Answers
Travel down the tree
Match a pattern in the structure and return a sub-tree
TreeSketch: fast response
  Concise synopsis
  Keeps statistical information
    Node: number of same-tag elements
    Edge: number of children per element
-- More like the traditional way

18 Compute Approximate Answers
[Figure: TreeSketch with nodes R(1), P(1), S(2), F(2), F(2), E(2), C(4); twig query with nodes q0 to q3 and edges //section, .//equation, .//caption; approximate nesting tree rooted at R with estimated counts S: 1x2, E: 1x1 = 1, C: 1x1 + 1x1 = 2]
Approximate results with structure
1) Take advantage of the concise structure
2) and of the statistical data
-- Example
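The products and sums in the example suggest how counts are estimated: multiply edge counts along a synopsis path and add up alternative paths. The sketch below applies that idea to a chain of descendant steps and returns only the estimated number of matches, not the full nesting tree produced in the paper. It assumes an acyclic synopsis; the function name and the sample synopsis are hypothetical.

```python
from collections import defaultdict

def estimate_bindings(sizes, tags, edges, path):
    """Estimate how many element tuples match the chain //path[0]//path[1]//...

    sizes: node -> element count; tags: node -> tag;
    edges: (parent, child) -> mean children per parent element.
    Assumes the synopsis graph has no cycles.
    """
    children = defaultdict(list)
    for (u, v), mean in edges.items():
        children[u].append((v, mean))

    def reach(u, tag):
        # Expected number of descendants with `tag` per element of u,
        # split by the synopsis node they fall into.
        out = defaultdict(float)
        for v, mean in children[u]:
            if tags[v] == tag:
                out[v] += mean
            for w, cnt in reach(v, tag).items():
                out[w] += mean * cnt  # multiply counts along the path
        return out

    def expand(u, rest):
        # Expected number of matches for the remaining steps below one element of u.
        if not rest:
            return 1.0
        return sum(cnt * expand(v, rest[1:])
                   for v, cnt in reach(u, rest[0]).items())

    starts = [n for n in sizes if tags[n] == path[0]]
    return sum(sizes[n] * expand(n, path[1:]) for n in starts)

# Hypothetical synopsis in which each node's name is also its tag.
sizes = {"R": 1, "P": 1, "S": 2, "F": 4, "E": 2, "C": 4}
tags = {n: n for n in sizes}
edges = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F"): 1.0,
         ("P", "F"): 2.0, ("F", "E"): 0.5, ("F", "C"): 1.0}

print(estimate_bindings(sizes, tags, edges, ["S", "F", "C"]))
print(estimate_bindings(sizes, tags, edges, ["P", "C"]))
```

The whole computation touches only the synopsis, so its cost depends on the synopsis size rather than on the size of the document.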

19 Outline
Motivation
TreeSketch Approach
→ Experimental Results
Contributions and Limitations

20 Experimental Setup
Focus on
  the quality of the approximate answers generated
  the efficiency of the construction process
Data set
  Data sets: XMark, DBLP, IMDB, SwissProt
  Workload: 1000 random twig queries

21 Evaluation Methods
Error → distance between R' and R
Popular metric: tree-edit distance
  Min-cost sequence of operations that transforms R' into R
  Argument: it does not capture structural similarity
New evaluation metric: ESD (Element Simulation Distance)
  Calculate the number of children for each edge in the tree to capture the complete structure of the tree
  Models how well the structures of two trees match each other
  "Degree" of simulation between two trees
Average ESD is used for evaluation

22 Experimental Results
-- Approximate answers, compared with twig-XSketches

23 Experimental Results
-- Relative errors < 5%, i.e., 95% accuracy

24 Outline
Motivation
TreeSketch Approach
Experimental Results
→ Contributions and Limitations
-- Strengths and weaknesses

25 TreeSketch Approach
Proposes an effective XML-summarization mechanism
  Captures the complete tree structure of large XML data
  Experimental results: produces fast and accurate approximate query answers
Authors' claim: the first work to address the timely problem of producing approximate tree-structured answers for complex XML queries
Comparison with related work: 2 options
  Either compute the exact answer to a path query: expensive
  Or use an approach such as twig-XSketch, which does not capture the complete tree structure of the underlying XML database
-- In this paper

26 Limitations
Difficult to optimize some predefined parameters, such as the space budget
  which is directly related to the accuracy of the approximate query answers
  too large → hurts efficiency; too small → hurts the quality of the answers; the right value depends on the query, the data set, and the computing resources
An incremental model construction process is desirable
  XML data typically grows incrementally, so the synopsis model needs to be constructed incrementally
More experiments or some real applications are needed to justify the scalability of this technique
-- Nice research; next steps for further investigation

27 Thank You / Merci