DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT
MAYURI UMRANIKAR


CONTENTS
- Introduction
- Retrieval Environment
  - The Vector Space Model
  - INEX Environment
  - Flexible Retrieval System
- Method Used for Retrieval
  - Document Tree: Construction
  - Ranking of Elements
  - Output
- Experiments
- Conclusions

INTRODUCTION
- Extensible Markup Language (XML) is the preferred format for representing documents; as the number of XML documents grows, the issue of element retrieval arises.
- The focus is on retrieving relevant elements rather than entire documents.
- INEX: INitiative for the Evaluation of XML Retrieval
- Flexible mechanisms
- Different approaches
- Term weighting

RETRIEVAL ENVIRONMENT
- Two factors: the issues that arise when the focus moves from documents to components, and Salton's Vector Space Model.
- Vector Space Model: a term's weight reflects the number of times it occurs in the document.
- Fox's Extended Vector Space Model: incorporates objective identifiers.
- A document vector consists of subvectors.
- Subvectors contain text that is independently indexed, weighted, searched, and retrieved.
- Term weighting: applied within the subjective subvectors.
- SMART experimental retrieval system
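To make the extended vector idea concrete, here is a minimal Python sketch, not the SMART implementation. It assumes an extended vector can be pictured as a dictionary of named subvectors: subjective subvectors hold term frequencies, and objective subvectors hold identifiers. The function names and the subvector names `body` and `author` are illustrative assumptions.

```python
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector: term -> number of occurrences."""
    return Counter(text.lower().split())

def extended_vector(subjective, objective):
    """Build an extended vector as a dict of named subvectors (illustrative layout).

    subjective: subvector name -> free text, indexed as term frequencies
    objective:  subvector name -> list of identifiers, kept as a set for filtering
    """
    vec = {name: tf_vector(text) for name, text in subjective.items()}
    vec.update({name: set(ids) for name, ids in objective.items()})
    return vec

# Hypothetical document with one subjective and one objective subvector.
doc = extended_vector(
    subjective={"body": "xml retrieval ranks xml elements"},
    objective={"author": ["Salton"]},
)
```

Each subvector can then be weighted and searched on its own, which is the property the slide highlights.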

INEX ENVIRONMENT
- Content Only (CO) queries: ignore document structure; like typical queries, they specify only the content of the search.
- Content and Structure (CAS) queries: refer explicitly to structure; exhaustive and specific.
- CO query results go directly to the user; CAS queries need additional filtering and a search of the body portion.
- CAS returns a rank-ordered list of elements.
- INEX-EVAL: uses measures of recall and precision (figure: exhaustivity and specificity are mapped to a single relevance value); results are ranked.

FLEXIBLE RETRIEVAL SYSTEM
- SMART format: documents and topics are translated and indexed as extended vectors.
- Subjective vectors: contain content-bearing terms.
- Objective vectors: serve as filters on the results returned by CAS queries.
- Extended vector: a subjective vector whose body subvector holds the terms of a paragraph.
- Lnu-ltu weighting
- Dynamic flexible retrieval: tree representation; rank-ordered list by Lnu weights.

METHOD FOR FLEXIBLE RETRIEVAL
- Input: given a query Q and the paragraph index, retrieve a rank-ordered list of paragraphs (terminal nodes); the n top-ranked paragraphs are selected as input.
- The set of paragraphs is used to identify documents; their elements are generated and returned as output.
- Document tree: needs information about the document structure.
  - Terminal nodes
  - Pre-order traversal
  - Terminal nodes are found in the paragraph index.
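The pre-order traversal that locates terminal nodes can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: it uses Python's standard `xml.etree.ElementTree`, the sample XML and its tag names are invented, and paths are written in an XPath-like form with positional indexes.

```python
import xml.etree.ElementTree as ET

def preorder(elem, path):
    """Yield (path, element) pairs in pre-order: parent first, then children."""
    yield path, elem
    counts = {}
    for child in elem:
        # Number siblings per tag so every path is unique, e.g. sec[1], sec[2].
        counts[child.tag] = counts.get(child.tag, 0) + 1
        yield from preorder(child, f"{path}/{child.tag}[{counts[child.tag]}]")

def terminal_nodes(root):
    """Return the paths of all leaf (terminal) elements in document order."""
    return [p for p, e in preorder(root, "/" + root.tag) if len(e) == 0]

# Hypothetical document: two sections holding three paragraphs in total.
xml_text = "<article><sec><p>a</p><p>b</p></sec><sec><p>c</p></sec></article>"
root = ET.fromstring(xml_text)
leaves = terminal_nodes(root)
# leaves == ["/article/sec[1]/p[1]", "/article/sec[1]/p[2]", "/article/sec[2]/p[1]"]
```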

SIMPLE XML DOCUMENT AND ITS SCHEMA

CONSTRUCTION OF DOCUMENT TREE
- For query Q, the n top-ranked paragraphs are used to build the trees.
- Leaf elements (terminal nodes) are paragraph nodes.
- Each leaf is represented by a term-frequency weighted vector.
- Step 1: gather all leaf nodes; this completes the terminal nodes.
- Step 2: merge the children's vectors to form each parent's vector.
- The document schema determines the merging.
- Parent: holds the unique terms of its children in a term-frequency weighted parent vector (it has the content of its children).
- The process is carried out recursively.
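The bottom-up merge can be sketched in a few lines of Python. This is a minimal illustration, assuming the tree is given as a parent-to-children mapping and each leaf already has a term-frequency vector; the node identifiers and the function name are hypothetical.

```python
from collections import Counter

def build_element_vectors(tree, leaf_vectors):
    """Merge term-frequency vectors bottom-up.

    tree:         parent id -> list of child ids (taken from the document schema)
    leaf_vectors: leaf id -> Counter of term frequencies
    Returns a dict with one term-frequency vector per element, leaves included.
    """
    vectors = dict(leaf_vectors)

    def vector_of(node):
        if node not in vectors:
            merged = Counter()
            for child in tree[node]:
                # Adding counts keeps each unique term once, with summed frequency.
                merged.update(vector_of(child))
            vectors[node] = merged
        return vectors[node]

    for node in tree:
        vector_of(node)
    return vectors

# Hypothetical tree: an article with two sections and three paragraphs.
tree = {"article": ["sec1", "sec2"], "sec1": ["p1", "p2"], "sec2": ["p3"]}
leaves = {
    "p1": Counter({"xml": 2, "tree": 1}),
    "p2": Counter({"xml": 1}),
    "p3": Counter({"rank": 1}),
}
vectors = build_element_vectors(tree, leaves)
```

Because a parent's vector is built only from its children's vectors, no element above the paragraph level has to be stored in an index beforehand.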

RANKING OF ELEMENTS
- The set of elements of the document tree has been generated.
- Problem in structured retrieval: produce a rank-ordered list of elements.
- Method used: all-element index (a separate representation for each element of each document, with weighting information).
- Lnu weights: suit elements of variable length and do not require global frequency information.
- Normalization and length: failing to normalize results in biased values.
- Pivot: the document length at which the probability of relevance equals the probability of retrieval.
- Slope: the amount of tilting.
- Pivoted normalization: reduces the difference between the two probabilities.
- Lnu term weights:

$$\frac{\dfrac{1+\log(\mathit{term\_freq})}{1+\log(\mathit{avg\_term\_freq})}}{(1-\mathit{slope})+\mathit{slope}\cdot\dfrac{\mathit{no\_unique\_terms}}{\mathit{pivot}}}$$
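A direct transcription of the Lnu formula into Python, as a sketch only. It assumes natural logarithms and takes the average term frequency as the mean over the element's unique terms; the slope and pivot values in the example are arbitrary placeholders, not tuned settings from the experiments.

```python
import math

def lnu_weights(tf, slope, pivot):
    """Lnu weights for one element.

    tf: term -> raw term frequency within the element (all values >= 1)
    Uses only information local to the element, so no global frequencies are needed.
    """
    if not tf:
        return {}
    n_unique = len(tf)
    avg_tf = sum(tf.values()) / n_unique
    # Pivoted unique normalization: the denominator of the Lnu formula.
    norm = (1 - slope) + slope * (n_unique / pivot)
    return {
        term: ((1 + math.log(f)) / (1 + math.log(avg_tf))) / norm
        for term, f in tf.items()
    }

# Placeholder parameters; with pivot equal to the number of unique terms
# the normalization factor is exactly 1.
weights = lnu_weights({"xml": 3, "tree": 1}, slope=0.25, pivot=2)
```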

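- ltu weighting: N is the collection size and n_k is the number of elements containing the term.

$$\frac{\bigl(1+\log(\mathit{term\_freq})\bigr)\cdot\log(N/n_k)}{(1-\mathit{slope})+\mathit{slope}\cdot\dfrac{\mathit{no\_unique\_terms}}{\mathit{pivot}}}$$

- N and n_k are element dependent and should be known through indexing.
- As we move up the tree, N is obtained by counting the elements of each type.
- n_k is obtained from the inverted file entries in the paragraph index, together with the (given) mapping between identifiers and XPaths.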
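The query-side weighting can be sketched the same way. This sketch assumes the standard ltu form, in which the log term-frequency factor is multiplied by the inverse-frequency factor log(N/nk), and it assumes natural logarithms. The function name, the choice to skip terms that occur in no element, and the sample counts are illustrative assumptions.

```python
import math

def ltu_weights(query_tf, n_elements, elem_freq, slope, pivot):
    """ltu weights for a query.

    query_tf:   term -> raw frequency in the query
    n_elements: N, the number of elements of the type being ranked
    elem_freq:  term -> nk, the number of those elements containing the term
    """
    n_unique = len(query_tf)
    norm = (1 - slope) + slope * (n_unique / pivot)
    weights = {}
    for term, f in query_tf.items():
        nk = elem_freq.get(term, 0)
        if nk == 0:
            continue  # assumption: a term found in no element carries no weight
        weights[term] = (1 + math.log(f)) * math.log(n_elements / nk) / norm
    return weights

# Hypothetical counts: "rare" appears in 1 of 100 elements, "xml" in 50.
q = ltu_weights({"xml": 1, "rare": 1}, n_elements=100,
                elem_freq={"xml": 50, "rare": 1}, slope=0.25, pivot=2)
```

Since N and nk differ for each element type, they would have to be recomputed at each level of the tree before this function is called.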

OUTPUT OF FLEXIBLE RETRIEVAL
- Select another leaf node, gather its siblings, construct the document tree, calculate the Lnu term weights, apply the ltu-weighted query, and produce another rank-ordered list.
- After the n top-ranked paragraphs are exhausted and the last list is produced, merge the lists.
- Result: a single set of elements, rank-ordered by correlation with Q.
- Comparison: flexible retrieval and the all-element index give identical results when the set of n paragraphs input to flexible retrieval includes all paragraphs and the same values are used for Lnu-ltu weighting.
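A minimal sketch of the final scoring and merging step. It assumes that the correlation between an element and the query is the inner product of their weighted vectors, and that an element appearing in more than one per-tree list is kept once with its highest score; both choices, like the element identifiers, are illustrative assumptions.

```python
def inner_product(elem_weights, query_weights):
    """Correlation of an element with the query: sum over shared terms."""
    return sum(w * query_weights[t] for t, w in elem_weights.items()
               if t in query_weights)

def merge_ranked_lists(ranked_lists):
    """Merge per-tree lists of (element id, score) into one rank-ordered list."""
    best = {}
    for ranked in ranked_lists:
        for elem_id, score in ranked:
            if elem_id not in best or score > best[elem_id]:
                best[elem_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-tree results for two documents.
merged = merge_ranked_lists([
    [("d1/sec1", 0.9), ("d1/p1", 0.4)],
    [("d2/p3", 0.7), ("d1/sec1", 0.9)],
])
# merged == [("d1/sec1", 0.9), ("d2/p3", 0.7), ("d1/p1", 0.4)]
```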

ALGORITHM

EXPERIMENTS
- Paragraph indexing: the result is a set of extended vectors, each representing a paragraph.
- CO: the subvector represents the subjective portion; the body subvector is the important one (the content of an element, not its type, is contained in the body).
- Tree representation

FACTORS OF INTEREST
- Slope and pivot for Lnu-ltu weighting
  - needed for effective structured retrieval
  - can be determined empirically and applied from one collection to another; generic
- n: the number of paragraphs input; sets an upper bound on the number of trees per query.
- The actual number of trees depends on how many paragraphs belong to the same group or the same document.

EXPERIMENTS DONE
- All-element and dynamic (flexible) retrieval experiments and results: body-only retrieval.
- The correlation between element and query vectors is computed over body elements only.
- Table 1

RESULTS Tables

- The results are equivalent.
- Flexible retrieval is more efficient in file space.
- The time required for indexing is half.
- Dynamic retrieval costs more on a per-query basis, depending on n; the total number of trees required cannot be specified exactly.
- Another factor: the value of n_k.

DISCUSSIONS AND CONCLUSIONS
- Flexible retrieval dynamically produces a rank-ordered list of elements from a single indexing at the level of the basic indexing node (the paragraph).
- Basic functions: SMART; extended vector model.
- Results: demonstrate the flexible capabilities.
- Attempt to incorporate other subvectors, internal nodes, and weights.
- INEX: exhaustivity and specificity; the results are exhaustive; research on specificity is ongoing; the results reflect this.
- It is a better way of retrieval than all-element indexing.

THANK YOU!!!