INEX – a broadly accepted data set for XML database processing? Pavel Loupal, Michal Valenta.

Presentation Content
1. INEX initiative
2. INEX data set
3. Utilization framework
4. Example: approximate XML tree embedding

INEX Initiative 1/3
2001: reference dataset for information retrieval
Duisburg-Essen University: Norbert Fuhr, Saadia Malik
Queen Mary University of London: Mounia Lalmas
2003: 69 participants (mainly universities)
2 workshops (2002, 2003): open discussion of the current state of the project

INEX Initiative 2/3
Stage 1: data collection (provided by IEEE)
Stage 2: evaluation of the reference queries
    30 Content Only (CO)
    36 Content and Structure (CAS)
Stage 3: manual relevance assessment of the query results (continues on the next slide)

INEX Initiative 3/3
Stage 3, our join point with INEX:
    assessment of queries 83 and 84, 1000 documents each
    2-dimensional scale (exhaustivity, specificity)
    relevance assessed on XML elements (parent-child dependencies)
    finished in February 2004
Stage 4 (current):
    study of researchers' behaviour
    heterogeneous resources / distributed systems
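To make the two-dimensional scale and the parent-child dependency concrete, here is a hypothetical Java sketch (Java 16+ for records). It assumes both exhaustivity (E) and specificity (S) are graded 0 to 3, assumes the dependency means a child element may not be more exhaustive than its parent, and adds a check modelled on the "strict" quantisation used in INEX evaluation (only E = 3, S = 3 counts as relevant). The class, the record, and the element paths are illustrative; this is not the official INEX assessment tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the 0..3 grades and the parent/child rule are assumptions,
// not taken from the official INEX tooling.
public class AssessmentCheck {

    /** One assessed element: its path, exhaustivity E, specificity S, and assessed children. */
    public record Assessed(String path, int exhaustivity, int specificity, List<Assessed> children) {}

    /** Paths of elements whose exhaustivity exceeds their parent's (assumed rule: E(child) <= E(parent)). */
    public static List<String> violations(Assessed parent) {
        List<String> out = new ArrayList<>();
        for (Assessed child : parent.children()) {
            if (child.exhaustivity() > parent.exhaustivity()) {
                out.add(child.path());
            }
            out.addAll(violations(child)); // check the rest of the subtree
        }
        return out;
    }

    /** Strict quantisation (assumed): only E = 3 and S = 3 counts as relevant. */
    public static boolean strictlyRelevant(Assessed e) {
        return e.exhaustivity() == 3 && e.specificity() == 3;
    }

    public static void main(String[] args) {
        Assessed p = new Assessed("article/sec/p", 3, 3, List.of());
        Assessed sec = new Assessed("article/sec", 2, 1, List.of(p));
        Assessed article = new Assessed("article", 2, 1, List.of(sec));
        System.out.println(violations(article)); // prints [article/sec/p]
        System.out.println(strictlyRelevant(p)); // prints true
    }
}
```

Here the paragraph is judged more exhaustive than its enclosing section, so the assumed rule flags it.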

INEX Initiative – Assessment [figure not preserved]

INEX Data Set Structure 1/3
Current version 1.4: 536 MB
6 IEEE Transactions, 12 journals
Articles stored as XML text only (without pictures)
Organized as a file-system hierarchy
On average, each article has 1,532 nodes and 45 kB, with an average depth of 6.9
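Per-article statistics such as the node count and average depth above can be recomputed for a single document with the JDK's DOM parser. This is a sketch under stated assumptions: it counts element nodes only and takes "average depth" as the mean depth of those elements, with the root at depth 1. The slide does not say which definition produced 1,532 and 6.9, so figures from this code on real INEX files may differ. The tag names in the example are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch: counts element nodes and their mean depth (root element = depth 1).
// Text nodes are NOT counted; that choice is an assumption.
public class NodeStats {

    public record Stats(int elements, double averageDepth) {}

    private int count;
    private long depthSum;

    private void visit(Node node, int depth) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // text, comments, etc. are skipped
        }
        count++;
        depthSum += depth;
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            visit(kids.item(i), depth + 1);
        }
    }

    public static Stats of(String xml) {
        try {
            var doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeStats s = new NodeStats();
            s.visit(doc.getDocumentElement(), 1);
            return new Stats(s.count, (double) s.depthSum / s.count);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><head><author>John Smith</author></head>"
                + "<sec><p>Introduction</p></sec></article>";
        System.out.println(of(xml)); // prints Stats[elements=5, averageDepth=2.2]
    }
}
```

In the example, the depths are 1 (article), 2 (head, sec) and 3 (author, p), so the mean is 11 / 5 = 2.2.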

INEX Data Set Structure 2/3
/inex-1.4
    /dtd
        xmlarticle.dtd
    /xml
        /an
            /1995
                a1019.xml
                a1032.xml
                a1034.xml
            /...
            /2002
        /...
        /ts
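Because the collection is a plain directory tree (one directory per journal, one per year), standard file APIs are enough to inventory it. The sketch below walks the tree with java.nio.file.Files.walk and counts .xml files per journal directory (an, ts, ...). The default root path inex-1.4/xml is an assumption about where the data set is unpacked, and any non-article .xml files in the tree would be counted as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Sketch: counts *.xml files per journal directory below an assumed root such as inex-1.4/xml.
public class ArticleFiles {

    public static Map<String, Long> countPerJournal(Path xmlRoot) {
        Map<String, Long> counts = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(xmlRoot)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".xml"))
                 .forEach(p -> {
                     // first path component below the root is the journal directory, e.g. "an"
                     String journal = xmlRoot.relativize(p).getName(0).toString();
                     counts.merge(journal, 1L, Long::sum);
                 });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : "inex-1.4/xml"); // assumed location
        countPerJournal(root).forEach((journal, n) -> System.out.println(journal + ": " + n));
    }
}
```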

INEX Data Set Structure 3/3
Simplified article skeleton:
<article>
    IEEE Transactions on...
    Construction of...
    John Smith
    University of...
    Introduction
    <sec> ... </sec>
</article>

Data Set Utilization – Framework 1/2
Native XML storage: Apache Xindice
Key features:
    inner structure: collections and documents
    standard API (XML:DB or XML-RPC)
    XPath expressions over collections and documents
    metadata
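Xindice evaluates XPath over whole collections through the XML:DB API (its XPathQueryService module), which requires a running server. As a self-contained stand-in, the sketch below evaluates an XPath expression over a single in-memory document using the JDK's javax.xml.xpath package. It illustrates the kind of structure-plus-content query involved; it is not the Xindice API, and the element names article and author are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch using the JDK's javax.xml.xpath as a stand-in for Xindice's collection-wide XPath queries.
public class XPathDemo {

    /** Evaluates an XPath expression over one document and returns the text of each matching node. */
    public static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<articles>"
                + "<article><author>John Smith</author></article>"
                + "<article><author>Mark Knopfler</author></article>"
                + "</articles>";
        // structure (article/author) combined with content (contains 'Smith')
        System.out.println(select(xml, "//article/author[contains(., 'Smith')]")); // prints [John Smith]
    }
}
```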

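Data Set Utilization – Framework 2/2
Web interface built with JavaServer Pages (JSP)
Usage of the XML:DB Java API:
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.*;  // Collection, Database
import org.xmldb.api.modules.XMLResource;
DatabaseManager.registerDatabase((Database) Class.forName("org.apache.xindice.client.xmldb.DatabaseImpl").newInstance());  // register the Xindice driver first
String url = "xmldb:xindice://localhost:8080/inex/mu/2001";
Collection col = DatabaseManager.getCollection(url);
XMLResource doc = (XMLResource) col.getResource("a1019.xml");  // one article as an XML resource
System.out.println(doc.getContent());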

Approximate Tree Embedding 1/4
Aim: approximately embed one XML tree (the query) into another (the data)
Algorithm history:
    Kilpeläinen: the problem is NP-complete
    Schlieder: polynomial in practical cases
    Vana: further improvements
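For reference, the exact, unordered notion that approximate embedding relaxes can be stated as below. This is the standard tree-inclusion definition as we understand it from Kilpeläinen's work, not a formula taken from the slides. Roughly speaking, Schlieder's approximate variant additionally allows query nodes to be renamed or skipped at a cost and ranks the resulting embeddings by total cost. The injectivity condition (1) on unordered trees is what makes the exact problem NP-complete.

```latex
\begin{aligned}
&f : V(Q) \to V(D) \text{ is an embedding of query tree } Q \text{ into data tree } D \text{ iff, for all } u, v \in V(Q): \\
&\quad (1)\; f(u) = f(v) \iff u = v \\
&\quad (2)\; \mathrm{label}(f(u)) = \mathrm{label}(u) \\
&\quad (3)\; u \text{ is an ancestor of } v \iff f(u) \text{ is an ancestor of } f(v)
\end{aligned}
```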

Approximate Tree Embedding 2/4 [figure not preserved]

Approximate Tree Embedding 3/4
Query: <article> Smith </article>
Data: <articles> … John Smith … Mark Knopfler … </articles>
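The query asks for an article with "Smith" somewhere below it. The Java sketch below checks a much simpler, exact relation than the approximate embedding discussed in the talk. Each query node must map to a data node with the same tag (or, for a text leaf, to data text containing that word), and a query child may map to any descendant of its parent's image. It deliberately drops injectivity and assigns no costs, so it is neither Schlieder's nor Vana's algorithm. The record Node, the helpers elem and text, the whole-word text rule, and the inner tag author are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: exact, NON-injective tree embedding (not Schlieder's approximate algorithm).
public class TreeEmbedding {

    /** A query or data node: an element tag, or a text leaf when isText is true. */
    public record Node(String label, boolean isText, List<Node> children) {}

    public static Node elem(String tag, Node... kids) {
        return new Node(tag, false, List.of(kids));
    }

    public static Node text(String value) {
        return new Node(value, true, List.of());
    }

    /** Tags must be equal; a query text leaf matches data text that contains it as a whole word. */
    static boolean labelMatches(Node q, Node d) {
        if (q.isText() != d.isText()) {
            return false;
        }
        if (!q.isText()) {
            return q.label().equals(d.label());
        }
        return Arrays.asList(d.label().trim().split("\\s+")).contains(q.label().trim());
    }

    /** q maps onto d itself, and every child of q maps onto a child of d or one of its descendants. */
    static boolean embedsAt(Node q, Node d) {
        if (!labelMatches(q, d)) {
            return false;
        }
        for (Node qc : q.children()) {
            boolean found = false;
            for (Node dc : d.children()) {
                if (embedsSomewhere(qc, dc)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** q maps onto d or onto some descendant of d. */
    public static boolean embedsSomewhere(Node q, Node d) {
        if (embedsAt(q, d)) {
            return true;
        }
        for (Node dc : d.children()) {
            if (embedsSomewhere(q, dc)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node query = elem("article", text("Smith"));
        Node data = elem("articles",
                elem("article", elem("author", text("John Smith"))),
                elem("article", elem("author", text("Mark Knopfler"))));
        System.out.println(embedsSomewhere(query, data)); // prints true
    }
}
```

Dropping injectivity keeps this variant simple. The non-injective problem can be decided in polynomial time, although this naive version does not memoize (query node, data node) pairs and may repeat work. The injective, unordered version is the NP-complete case mentioned above.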

Approximate Tree Embedding 4/4 [figure not preserved]

Conclusion
INEX initiative overview
INEX data set + our testing framework = a suitable environment for testing algorithms and approaches
Further discussion