XIRQL: A Query Language for Information Retrieval in XML Documents


XIRQL: A Query Language for Information Retrieval in XML Documents
Norbert Fuhr, Universität Duisburg

Outline of Talk
- XML retrieval
- XIRQL: XML IR Query Language
- XIRQL vs. XQuery
- User Interface
- INEX: Initiative for the Evaluation of XML Retrieval
- Summary

I. XML documents
<book class="H.3.3">
  <author>John Smith</author>
  <title>XML Retrieval</title>
  <chapter>
    <heading>Introduction</heading>
    This text explains all about XML and IR.
  </chapter>
  <chapter>
    <heading>XML Query Language XQL</heading>
    <section>
      <heading>Examples</heading>
    </section>
    <section>
      <heading>Syntax</heading>
      Now we describe the XQL syntax.
    </section>
  </chapter>
</book>
Elements: start tag, end tag, content, attribute

Tree view
document class="H.3.3"
  author: John Smith
  title: XML Retrieval
  chapter
    heading: Introduction
    This...
  chapter
    heading: XML Query Language XQL
    section
      heading: Examples
    section
      heading: Syntax
      We describe syntax of XQL

XML query languages
- Data-centric view: XML as exchange format for structured data
- Document-centric view: XML as format for representing the logical structure of documents
- W3C WG proposal for an XML query language: XQuery, which focuses on the data-centric view
- Here: Information Retrieval for the document-centric view
- Starting point: XPath (XQL)

XPath
Path condition: parent/child node
chapter/heading

XPath
Path condition: ancestor-descendant
chapter//heading

XPath
Filter wrt. structure:
//chapter[heading]

XPath
Filter wrt. content:
/document[@class="H.3.3" and author="John Smith"]
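The four XPath conditions above can be tried out with a small sketch. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, so the content filter is written as two chained predicates instead of a Boolean expression. The `<collection>` wrapper element and the variable names are assumptions made for this illustration; the document content follows the example from the slides, with `document` as the root name used in the tree view.

```python
import xml.etree.ElementTree as ET

# Example document from the slides, wrapped in a hypothetical <collection>
# element so that the root-level filter can be written as a relative path.
XML = """
<collection>
  <document class="H.3.3">
    <author>John Smith</author>
    <title>XML Retrieval</title>
    <chapter>
      <heading>Introduction</heading>
      This text explains all about XML and IR.
    </chapter>
    <chapter>
      <heading>XML Query Language XQL</heading>
      <section><heading>Examples</heading></section>
      <section><heading>Syntax</heading>Now we describe the XQL syntax.</section>
    </chapter>
  </document>
</collection>
"""

root = ET.fromstring(XML)
doc = root.find("document")

# Path condition, parent/child: headings that are direct children of a chapter.
child_headings = [h.text for h in doc.findall("chapter/heading")]

# Path condition, ancestor-descendant: headings anywhere below a chapter.
descendant_headings = [h.text for h in doc.findall("chapter//heading")]

# Filter wrt. structure: chapters that have a heading child.
chapters_with_heading = doc.findall(".//chapter[heading]")

# Filter wrt. content: attribute value and element text must both match.
matching_docs = root.findall("document[@class='H.3.3'][author='John Smith']")

print(child_headings)
print(descendant_headings)
print(len(chapters_with_heading), len(matching_docs))
```

Every result is either a complete element or nothing at all, which is the Boolean retrieval behaviour that the following slide lists as a limitation.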

XPath properties
- Conditions wrt. logical structure
- Conditions wrt. content
- Results are arbitrary (complete) elements of the original documents
- Boolean retrieval (poor retrieval quality)
- Relevance-oriented search (irrespective of structure) not supported
- Few data types only

II. XIRQL: XML IR Query Language
Extend XPath by:
- Probabilistic retrieval with weighted document indexing
- Relevance-oriented search (irrespective of structure)
- (Extensible) data types with vague predicates
- Structural relativism

II.1 Probabilistic Retrieval with XIRQL
Problem: weighting of different forms of occurrence of terms
/document[.//heading $cw$ "XML" $or$ .//section//* $cw$ "XML"]

Weighting of term occurrences in documents
a) Weighting wrt. single query conditions
P(.//heading $cw$ "XML", d) = 0.5
P(.//section//* $cw$ "XML", d) = 0.7
- Possible overlapping of query conditions
- Dependent probabilistic events
- Only probability intervals for answers
- No linear ranking of documents
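Why dependent events yield only probability intervals can be illustrated with the standard Fréchet bounds. This is a minimal sketch under the assumption that nothing is known about the dependence between the two query conditions; the function names are hypothetical and the slides do not say how XIRQL itself would compute such intervals.

```python
def conjunction_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Fréchet bounds for P(a AND b) when the dependence of a and b is unknown."""
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)


def disjunction_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Fréchet bounds for P(a OR b) when the dependence of a and b is unknown."""
    return max(p_a, p_b), min(1.0, p_a + p_b)


# Weights of the two query conditions from the slide.
p_heading, p_section = 0.5, 0.7

and_low, and_high = conjunction_bounds(p_heading, p_section)
or_low, or_high = disjunction_bounds(p_heading, p_section)

print(f"P(heading AND section) lies in [{and_low:.2f}, {and_high:.2f}]")
print(f"P(heading OR section)  lies in [{or_low:.2f}, {or_high:.2f}]")
```

Two documents whose intervals overlap cannot be put into a definite order, which is why this weighting scheme gives no linear ranking.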

Weighting of term occurrences in documents
b) Weighting wrt. document parts
- Term weighting depends on context of term occurrence
- All occurrences within same context refer to same probabilistic event
- Only identical and independent events
- Point probabilities for answers
- Linear ranking of documents

Index nodes as units for term weighting
(Figure: tree view of the example document with index nodes numbered 1 to 5.)
Application of known indexing functions (e.g. tf*idf)

Probabilistic events and event expressions
- Problem: combination of term weights consistent with probability theory
- Basic event: term occurrence in an index node
- Basic events are independent (different terms, same term in different index nodes)
- Event expressions describe combination of basic events in a document wrt. a query

Event expressions
//section[.//* $cw$ "XQL" $and$ .//* $cw$ "syntax"]
(Figure: tree view of the example document with index nodes numbered 1 to 5.)
Event expression: [5,XQL] ∧ [5,syntax]

Event expressions
/document/chapter[.//* $cw$ "XQL" $and$ .//* $cw$ "syntax"]
(Figure: tree view of the example document with index nodes numbered 1 to 5.)
Event expression: ([3,XQL] ∨ [5,XQL]) ∧ [5,syntax]

Evaluation of event expressions (as in probabilistic Datalog)
- Transform event expression into disjunctive normal form: e = C_1 ∨ … ∨ C_n
- C_i: conjunction of event atoms
- Event atom: positive or negated basic event
- Application of inclusion/exclusion formula:
  $P(e) = \sum_{i=1}^{n} (-1)^{i-1} \sum_{1 \le j_1 < \dots < j_i \le n} P(C_{j_1} \wedge \dots \wedge C_{j_i})$

II.2 Relevance-oriented search
(Queries irrespective of document structure)
- Restrict possible answers (not all elements suitable)
- Retrieval strategy: return most specific element satisfying the query
- But: combination with weighted indexing?
- Solution: index nodes as roots of possible answers
- Augmentation as concept for computing the tradeoff between indexing weights and specificity of answers

Index nodes for relevance-oriented search
(Figure: tree view of the example document with index nodes numbered 1 to 5.)

Augmentation …by disjunction
Example query: syntax ∧ example
- chapter: 0.3 XQL
- section1 (in chapter): 0.5 example
- section2 (in chapter): 0.8 XQL, 0.7 syntax
- Weights propagated to chapter: 0.5 example, 0.7 syntax
- Resulting weight of chapter: 0.7*0.5

Augmentation …by disjunction
Example query: XQL
- chapter: 0.3 XQL
- section1 (in chapter): 0.5 example
- section2 (in chapter): 0.8 XQL, 0.7 syntax
- Resulting weights: chapter 0.86, section2 0.8

Augmentation …with augmentation weight
Example query: XQL
- Augmentation weight: 0.6
- chapter: 0.3 XQL
- section1 (in chapter): 0.5 example
- section2 (in chapter): 0.8 XQL, 0.7 syntax
- Weights propagated to chapter: 0.30 example, 0.42 syntax
- Resulting weights: chapter 0.64, section2 0.8
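The numbers on the augmentation slides can be reproduced with a short sketch. It assumes that the weights of independent events are combined by probabilistic disjunction and that weights propagated from a child index node are multiplied by the augmentation weight first; the function names are hypothetical.

```python
from math import prod


def or_combine(weights: list[float]) -> float:
    """Probabilistic disjunction of independent events: 1 - prod(1 - w)."""
    return 1.0 - prod(1.0 - w for w in weights)


def propagated_weight(own: float, child_weights: list[float], augmentation: float = 1.0) -> float:
    """Weight of a term in a parent node, with child weights damped by the augmentation weight."""
    return or_combine([own] + [augmentation * w for w in child_weights])


# Query "XQL": the chapter itself has weight 0.3, section2 has weight 0.8.
plain = propagated_weight(0.3, [0.8])                      # pure disjunction
damped = propagated_weight(0.3, [0.8], augmentation=0.6)   # augmentation weight 0.6

print(f"chapter by disjunction:      {plain:.2f}")
print(f"chapter with augmentation:   {damped:.2f}")
print(f"section2 (most specific):    {0.8:.2f}")
```

With pure disjunction the chapter outranks the more specific section2; with the augmentation weight the order is reversed, which matches the retrieval strategy of returning the most specific element.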

II.3 XIRQL: Data types with vague predicates
- XML markup allows for detailed markup of text elements
- Exploit markup for more precise searches
- Consider also vagueness and imprecision of IR
- Data types with vague queries
- Query: "Search for an artist named Ulbrich, living in the Rhine-Main area of Germany about 100 years ago"
- Answer: Ernst Olbrich, Darmstadt, 1899
- (Extensible) data types for document-centric view (person names, dates, geographic locations, classifications/images, audio, ...)

Extensible type hierarchy
- Extensible type hierarchy with vague predicates for each data type
- text: substring match
- Western language: single word search, truncation, word distance
- English text: stemming, noun phrases
- Data types of XML documents defined in extended DTD (XML schema)
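What a vague predicate might look like can be sketched for the artist example. The slides do not specify how XIRQL computes these weights, so everything here is an assumption for illustration: name similarity is taken from `difflib.SequenceMatcher`, the date predicate is a simple linear falloff, and the query year 1902 and the tolerance of 20 years are invented values.

```python
from difflib import SequenceMatcher


def vague_name_match(query: str, value: str) -> float:
    """Similarity in [0, 1] between two person names (hypothetical predicate)."""
    return SequenceMatcher(None, query.lower(), value.lower()).ratio()


def vague_year_match(query_year: int, value_year: int, tolerance: int = 20) -> float:
    """Weight that falls off linearly with the distance in years (hypothetical predicate)."""
    return max(0.0, 1.0 - abs(query_year - value_year) / tolerance)


# Query: artist named "Ulbrich", active around 1902; document: "Olbrich", 1899.
name_weight = vague_name_match("Ulbrich", "Olbrich")
year_weight = vague_year_match(1902, 1899)

# Treating both conditions as independent events, the conjunction is a product.
combined = name_weight * year_weight

print(f"name: {name_weight:.2f}, year: {year_weight:.2f}, combined: {combined:.2f}")
```

A Boolean equality test would reject this document on both conditions, whereas the vague predicates still assign it a high weight.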

II.4 Structural Relativism
- Drop distinction attribute/element: ~author searches for attribute or element
- Generalize to data types: #personname searches for attributes/elements of specific data type
- Exploit ontology over element names: region, country, continent
- Edit distance on paths: author="Smith" vs. author/name vs. author/name/lastname
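The edit distance on paths can be illustrated with a Levenshtein distance computed over path steps instead of characters. This is a sketch under the assumption of unit costs for inserting, deleting or replacing a step; the slides do not state which cost model XIRQL uses, and the function name is hypothetical.

```python
def path_edit_distance(path_a: str, path_b: str) -> int:
    """Levenshtein distance between two paths, counted in path steps."""
    a_steps = path_a.strip("/").split("/")
    b_steps = path_b.strip("/").split("/")

    # previous[j] holds the distance between the steps seen so far and b_steps[:j].
    previous = list(range(len(b_steps) + 1))
    for i, step_a in enumerate(a_steps, start=1):
        current = [i]
        for j, step_b in enumerate(b_steps, start=1):
            substitution = 0 if step_a == step_b else 1
            current.append(min(
                previous[j] + 1,                 # delete a step from path_a
                current[j - 1] + 1,              # insert a step into path_a
                previous[j - 1] + substitution,  # keep or replace a step
            ))
        previous = current
    return previous[-1]


for candidate in ("author", "author/name", "author/name/lastname"):
    print(candidate, path_edit_distance("author", candidate))
```

A query condition on author could then also be matched against author/name or author/name/lastname, with a weight that decreases as the distance grows.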

III. XIRQL vs. XQuery
XQuery (proposed as standard XML query language by W3C WG):
- No IR support (weighting, vague predicates, relevance-oriented search, structural relativism)
- Aggregation operators (sum, count, min, max, avg)
- Restructuring of results

XIRQL as IR extension of XQuery subset
XQuery structure:
  FOR PathExpression
  WHERE AdditionalSelectionCriteria
  RETURN ResultConstruction
XIRQL subset:
  FOR $X IN PathExpression
  RETURN $X

IV. User Interface
- Query formulation
- Result visualization

Query Formulation: Layout-oriented

Query Formulation: Structure-oriented

Visualization of Results: Textbars

Visualization of Results: Treemaps

V. INEX: Initiative for the Evaluation of XML Retrieval
- Initially 50 groups from 20 countries (finally 27 active)
- Documents: 7 years of IEEE-CS journals (12107 articles, 494 MB)
- Queries: 30 content-only, 30 content+structural conditions
- Results due: September 15, 2002
- Relevance judgements due: November 20
- Final Workshop: December 9-11, 2002

Example query
Title: Nonmonotonic Reasoning
Description: Retrieve all articles from the years 1999-2000 that deal with nonmonotonic reasoning. Do not retrieve articles that are calendar/calls for papers.
Condition: /article[./bdy/sec  “nonmonotonic reasoning”  ./hdr/yr[.= 2000  . = 1999]  .//.  “belief revision”   .//tig/atl  “calendar”]

VI. Summary
- Data-centric vs. document-centric view on XML (database vs. IR view)
- IR methods for XML must support uncertainty and vagueness…

XIRQL: XML query language implementing
- Combination of structural conditions with probabilistic weighting
- Relevance-oriented search by augmentation
- Extensible data types with vague predicates
- Structural relativism
HyREX: Open source XML retrieval engine: http://ls6-www.cs.uni-dortmund.de/ir/projects/hyrex