Conceptual structures in modern information retrieval Claudio Carpineto Fondazione Ugo Bordoni



Overview
- Keyword-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions

Vector-based IR (diagram): documents are represented as vectors of weighted keywords and the query as a vector of weighted keywords; matching the two produces the retrieved documents.
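
The matching step in the vector-based model can be illustrated with a minimal sketch. It assumes cosine similarity as the matching function and uses made-up keyword weights; the slide names neither, so the names `cosine` and `retrieve` and all the data below are illustrative assumptions only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {keyword: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(query, documents):
    """Return (doc_id, score) pairs with a nonzero match, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in documents.items()]
    matched = [pair for pair in scored if pair[1] > 0.0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)

# Hypothetical weighted-keyword vectors (weights chosen by hand).
documents = {
    "d1": {"bank": 0.8, "account": 0.6},
    "d2": {"bank": 0.5, "river": 0.9},
    "d3": {"river": 0.7, "waters": 0.7},
}
query = {"bank": 1.0, "account": 0.5}

ranking = retrieve(query, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Document d3 shares no keyword with the query and is therefore not retrieved, which is the behaviour the later slides describe as the vocabulary problem.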

Term weighting
- tf.idf and the vector space model (Salton): very popular in the 70's and 80's
- BM25 (Robertson): the state of the art in the 90's
- Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
- A new weighting framework based on deviation from randomness + information gain (FUB + UG)
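
To make the first two weighting schemes concrete, here is a rough sketch of tf.idf and of one commonly used simplified form of BM25. The parameter values `k1=1.2` and `b=0.75` are conventional defaults assumed for the example, not values taken from the slides, and the three token lists are invented toy documents.

```python
import math
from collections import Counter

def doc_freq(term, docs):
    """Number of documents that contain the term."""
    return sum(1 for d in docs if term in d)

def tfidf_score(query, doc, docs):
    """Sum of tf * log(N / df) over the query terms."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = doc_freq(t, docs)
        if df:
            score += tf[t] * math.log(len(docs) / df)
    return score

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Simplified BM25: smoothed idf, saturating tf, length normalization."""
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        df = doc_freq(t, docs)
        idf = math.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))
        denom = tf[t] + k1 * (1.0 - b + b * len(doc) / avgdl)
        score += idf * tf[t] * (k1 + 1.0) / denom
    return score

# Invented toy collection: each document is a list of tokens.
docs = [
    ["finance", "credit", "kbs"],
    ["bank", "river", "river", "waters"],
    ["finance", "finance", "bank", "account"],
]
query = ["finance"]

tfidf = [tfidf_score(query, d, docs) for d in docs]
bm25 = [bm25_score(query, d, docs) for d in docs]
print(tfidf)
print(bm25)
```

In this sketch, doubling the term frequency doubles the tf.idf score, while the BM25 score grows by a smaller factor because its term-frequency component saturates.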

Inherent limitations of keyword-based IR
- Vocabulary problem
- Relations are ignored

Early approaches to conceptual IR
- n-grams (Salton 1975, Maarek 1989)
- parse tree (Dillon 1983, Metzler 1989)
- case relations (Fillmore 1968, Somers 1987)
- conceptual graphs (Dick 1991)

Why early conceptual IR was not successful
- No best representation scheme
- Manual coding too costly
- Automated coding too hard
- Training required both for the indexer and the user
- Effectiveness not clearly demonstrated
- Retrieval task often not appropriate

Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions

Evolution of topical IR
- Very short queries
- Heterogeneous collections
- Unreliable sources
- Interactive sessions

Model of modern topical IR (diagram; components shown: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use)

Performance of retrieval feedback versus query difficulty

Ranking based on interdocument similarity
Cluster hypothesis (van Rijsbergen 1978)
Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
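
As an illustration of the second approach, the sketch below follows one common formulation of the generalized vector space model (GVSM): each term is represented by the documents it occurs in, and queries and documents are compared in that document space. The seven toy documents reuse the term sets labeled D1 to D7 on the order-theoretical ranking slide; the binary weights and the function names are assumptions made for the example.

```python
import math

# Toy collection: term sets copied from the nodes labeled D1 to D7.
DOCS = {
    "D1": ["nns", "finance", "bank", "account"],
    "D2": ["nns", "bank", "river"],
    "D3": ["nns", "bank", "account"],
    "D4": ["finance", "credit", "kbs"],
    "D5": ["credit", "kbs"],
    "D6": ["kbs", "waters"],
    "D7": ["nns", "finance", "credit", "kbs"],
}
DOC_IDS = list(DOCS)

def to_doc_space(terms):
    """Sum the occurrence vectors of the terms (one coordinate per document)."""
    vec = [0.0] * len(DOC_IDS)
    for term in terms:
        for i, doc_id in enumerate(DOC_IDS):
            if term in DOCS[doc_id]:
                vec[i] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def keyword_overlap(query, doc_id):
    """Plain keyword matching: number of query terms present in the document."""
    return len(set(query) & set(DOCS[doc_id]))

def gvsm_score(query, doc_id):
    """Compare query and document after mapping both to document space."""
    return cosine(to_doc_space(query), to_doc_space(DOCS[doc_id]))

query = ["finance"]
overlap = {d: keyword_overlap(query, d) for d in DOC_IDS}
gvsm = {d: gvsm_score(query, d) for d in DOC_IDS}
for d in DOC_IDS:
    print(d, overlap[d], round(gvsm[d], 3))
```

D5 does not contain the query term, so plain keyword matching gives it a zero score, yet it receives a positive GVSM score because its terms co-occur with FINANCE in D4 and D7. This is the kind of non-matching document that interdocument similarity is meant to reach.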

Order-theoretical ranking (figure: lattice of term sets; query and document nodes listed with the number attached to each in the figure)
- 0: FINANCE (Query)
- 1: NNS FINANCE CREDIT KBS (D7)
- 1: NNS FINANCE BANK ACCOUNT (D1)
- 2: NNS BANK ACCOUNT (D3)
- 2: FINANCE CREDIT KBS (D4)
- 3: CREDIT KBS (D5)
- 3: NNS BANK RIVER (D2)
- 4: KBS WATERS (D6)

Performance of order-theoretical ranking
- Better than hierarchic clustering and comparable to best matching on the whole collection
- Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
- Order-theoretical ranking does not scale up well but it is synergistic with best matching document ranking

Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions

Question Answering Task: Closed-class questions in unrestricted domains with no guarantee of answer and result possibly scattered over multiple documents

Question Answering
Approach:
1. Recognize type of queries
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
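
The four steps above can be sketched as a toy pipeline. Everything here is an illustrative assumption rather than the approach of any particular system: the question-type rules, the regular expressions used as entity recognizers, the stopword list, the word-overlap retrieval function, and the three sample passages.

```python
import re

STOPWORDS = {"who", "when", "what", "was", "the", "do", "in", "at", "to", "a"}

# Hypothetical entity recognizers, one per answer type.
ENTITY_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

def question_type(question):
    """Step 1: map the question word to an expected answer type."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("who"):
        return "PERSON"
    return "OTHER"

def content_words(text):
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def retrieve(question, passages):
    """Step 2: rank passages by the number of shared content words."""
    q = content_words(question)
    scored = [(len(q & content_words(p)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked if score > 0]

def nearest_entity(pattern, passage, q_words):
    """Step 3: pick the entity closest (in characters) to a question word."""
    anchors = [m.start() for m in re.finditer(r"[A-Za-z0-9]+", passage)
               if m.group().lower() in q_words]
    best = None
    for m in pattern.finditer(passage):
        if not anchors:
            break
        dist = min(abs(m.start() - a) for a in anchors)
        if best is None or dist < best[0]:
            best = (dist, m.group())
    return best[1] if best else None

def answer(question, passages):
    ranked = retrieve(question, passages)
    pattern = ENTITY_PATTERNS.get(question_type(question))
    if pattern is not None:
        for passage in ranked:
            entity = nearest_entity(pattern, passage, content_words(question))
            if entity:
                return entity
    # Step 4: fall back to the best-matching passage (or None if nothing matched).
    return ranked[0] if ranked else None

passages = [
    "Gerard Salton developed the SMART retrieval system at Cornell.",
    "The first TREC conference was held in 1992.",
    "Rivers carry water to the sea.",
]
print(answer("Who developed the SMART system?", passages))
print(answer("When was the first TREC conference held?", passages))
print(answer("What do rivers carry?", passages))
```

The third question has no recognizer for its type, so the sketch returns the best-matching passage, which corresponds to the fallback in step 4.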

Web Information Retrieval

Current tasks:
- named-entity finding task
- topic distillation task
Approach:
1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
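
One simple way to combine the results of several methods is to normalize each score list and then interpolate linearly. The sketch below assumes min-max normalization and fixed weights of 0.7 and 0.3; both are illustrative choices, and the two score lists (a content-based run and a link-based run) are invented.

```python
def min_max(scores):
    """Rescale a {doc: score} run to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(runs, weights):
    """Weighted sum of normalized scores; documents missing from a run get 0."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical runs on incomparable scales.
content_run = {"a": 12.0, "b": 9.0, "c": 3.0}
link_run = {"b": 0.9, "c": 0.5, "d": 0.1}

fused = interpolate([content_run, link_run], [0.7, 0.3])
for doc, score in fused:
    print(doc, round(score, 3))
```

Normalization matters here because the raw content scores are an order of magnitude larger than the link scores; without it, the second method would have almost no influence on the final ranking.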

XML document retrieval
Goal: Use document structure to improve precision and recall of unstructured queries
“concerts this weekend at Sofia under 20 euros”
Approaches:
- Automatic inference of query structure
- Semi-automatic query annotation
- Hybrid query languages
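
To give a feel for what automatic inference of query structure could look like, here is a deliberately naive rule-based sketch applied to the example query from the slide. The field names (`price_max_eur`, `place`, `date`, `keywords`) and the three patterns are hypothetical; a real system would map query fragments onto the elements of an actual XML schema.

```python
import re

def infer_structure(query):
    """Split a keyword query into hypothetical field constraints plus leftover keywords."""
    structured = {}
    rest = query

    price = re.search(r"\bunder (\d+) euros\b", rest)
    if price:
        structured["price_max_eur"] = int(price.group(1))
        rest = rest.replace(price.group(0), " ")

    place = re.search(r"\bat ([A-Z][a-z]+)\b", rest)
    if place:
        structured["place"] = place.group(1)
        rest = rest.replace(place.group(0), " ")

    date = re.search(r"\b(this weekend|today|tomorrow)\b", rest)
    if date:
        structured["date"] = date.group(1)
        rest = rest.replace(date.group(0), " ")

    # Whatever is left is treated as ordinary keywords.
    structured["keywords"] = rest.split()
    return structured

result = infer_structure("concerts this weekend at Sofia under 20 euros")
print(result)
```

The structured constraints could then be matched against the corresponding document elements, while the remaining keywords are handled by ordinary text retrieval.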

Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions

Recommender systems “Related keyword” feature versus Context-dependent query reformulation

Combining text retrieval and text mining with concept lattices
Goal: Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface

Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval and it is at the heart of many non-topical retrieval tasks
Towards conceptual search
- Understand term meaning
- Adapt to the user
- Can translate between applications
- Explainable
- Capable of filtering and summarization