1  Andisheh Keikha, Ryerson University  Ebrahim Bagheri, Ryerson University  May 7th, 2014


2  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions

3  Simple search  Query: keywords  Find documents that contain those keywords  Rank them with respect to the query  Result: ranked documents
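The simple search pipeline above can be sketched in a few lines. Everything here (the toy corpus, the raw term-frequency scoring) is an illustrative assumption, not the scoring used by any real engine:

```python
# Minimal keyword search: score each document by the raw frequency
# of the query terms it contains, then return matches best first.
def simple_search(query, documents):
    terms = query.lower().split()
    scores = {}
    for doc_id, text in documents.items():
        words = text.lower().split()
        scores[doc_id] = sum(words.count(t) for t in terms)
    # Drop documents containing no query term at all.
    return [d for d, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

docs = {
    "d1": "gain weight by building muscle mass",
    "d2": "weight loss through diet",
    "d3": "stock market gains this quarter",
}
print(simple_search("gain weight", docs))  # d1 (both terms) before d2 (one term)
```

Note that d3 is not returned at all: exact matching misses "gains", which is part of the motivation for the query-processing steps that follow.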

4  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions

5  Query length  Correlated with performance in the search task  A query is a small collection of keywords  It is hard to find relevant documents based on only two or three words  Solution  Query reformulation  Query expansion

6  Query Expansion  Selection of new terms: from relevant documents, from WordNet (synonyms, hyponyms, …), …  Disambiguation

7  Query Expansion  Selection of new terms  Weighting those terms
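A minimal sketch of both steps, assuming a hand-made synonym map as a stand-in for a WordNet lookup (the terms mirror the deck's "gain weight" example, and the 0.5 expansion weight is an arbitrary illustrative choice):

```python
# Query expansion sketch: original terms keep full weight; expansion
# terms get a smaller weight so they broaden the query without
# dominating it. SYNONYMS is a hand-made stand-in for a WordNet lookup.
SYNONYMS = {
    "gain": ["increase", "build"],
    "weight": ["mass", "muscle"],
}

def expand_query(query, expansion_weight=0.5):
    weighted = {}
    for term in query.lower().split():
        weighted[term] = 1.0  # original query term, full weight
        for syn in SYNONYMS.get(term, []):
            # setdefault: never overwrite an already-assigned weight
            weighted.setdefault(syn, expansion_weight)
    return weighted

print(expand_query("gain weight"))
```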

8  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions

9  Probabilistic Methods  What is the probability that this document is relevant to this query?  That is, the probability of the event that the document is judged relevant to the query, given the document description
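One classical instance of this idea is the Robertson/Sparck Jones relevance weight (refs. 4 and 5), which scores a term by contrasting its frequency in known relevant documents with its frequency in the whole collection. The counts below are made up for illustration:

```python
import math

# Robertson/Sparck Jones relevance weight with the usual +0.5 smoothing.
# N: documents in the collection, n: documents containing the term,
# R: known relevant documents, r: relevant documents containing the term.
def rsj_weight(N, n, R, r):
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# A term in 8 of 10 relevant documents but only 50 of 10,000 overall
# gets a strongly positive weight; a term that is common everywhere
# but rare among the relevant documents gets a negative one.
print(rsj_weight(N=10_000, n=50, R=10, r=8))     # strongly positive
print(rsj_weight(N=10_000, n=5_000, R=10, r=1))  # negative
```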

10  Language Models  What is the probability of generating query Q, given document d, with language model M_d?  Term probabilities come from the maximum likelihood estimate
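A query-likelihood sketch under these definitions, using Jelinek-Mercer smoothing to mix the document's maximum likelihood estimate with a collection-wide one (the lambda value, toy corpus, and whitespace tokenization are all illustrative assumptions):

```python
import math

# Query likelihood P(Q|d) with Jelinek-Mercer smoothing: mix the
# document's maximum likelihood term estimate with a collection-wide
# estimate so terms absent from d do not zero out the whole score.
def query_log_likelihood(query, doc, collection, lam=0.5):
    doc_words = doc.lower().split()
    coll_words = collection.lower().split()
    score = 0.0
    for t in query.lower().split():
        p_doc = doc_words.count(t) / len(doc_words)     # ML estimate from d
        p_coll = coll_words.count(t) / len(coll_words)  # ML estimate from collection
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

docs = ["gain weight by building muscle mass",
        "weight loss through diet and exercise"]
collection = " ".join(docs)
ranked = sorted(docs, reverse=True,
                key=lambda d: query_log_likelihood("gain weight", d, collection))
print(ranked[0])  # the document that generates the query with higher probability
```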

11  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions

12

13  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions

14  Searching on Google

15  Searching on Google  I want all of these searches to show the same results, since they have the same meaning, and the user's intent covers all of them when searching for any one.

16  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions  Query Expansion  Query Expansion (Tasks to Decide)  Document Ranking

17  How?  New Semantic Query Expansion Method  New Semantic Document Ranking Method

18  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions  Query Expansion  Query Expansion (Tasks to Decide)  Document Ranking

19  Example: “Gain Weight”  Desirable keywords in expanded query: “gain, weight, muscle, mass, fat”  [diagram: “gain weight” connected to “muscle”, “mass”, and “fat”; what are these relations?]

20  Digging into DBpedia and Wikipedia
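As a hedged sketch of what this digging might look like: the SPARQL below could be sent to the public DBpedia endpoint to collect entities linked to a query phrase's resource, in either direction. The resource name (Weight_gain), the use of dbo:wikiPageWikiLink, and the LIMIT are illustrative choices; the deck's later slides leave the property-selection question open. No network call is made here, only the query string is built:

```python
# Build (but do not send) a SPARQL query for the public DBpedia
# endpoint that collects entities linked to a query phrase's resource.
def build_related_entities_query(resource):
    uri = f"<http://dbpedia.org/resource/{resource}>"
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?related WHERE {\n"
        f"  {{ {uri} ?p ?related . }}\n"      # outgoing links from the resource
        "  UNION\n"
        f"  {{ ?related dbo:wikiPageWikiLink {uri} . }}\n"  # incoming wiki links
        "  FILTER (isIRI(?related))\n"
        "} LIMIT 50"
    )

print(build_related_entities_query("Weight_gain"))
```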

21  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions  Query Expansion  Query Expansion (Tasks to Decide)  Document Ranking

22  How do we map query phrases onto Wikipedia components?  Which properties and their related entities should be selected?  Can those properties be selected automatically for each phrase, or should they be fixed for the whole algorithm?  If automatic, what is the process?

23  Are DBpedia and Wikipedia enough to decide, or should we use other ontologies?  How should we weight the extracted entities (terms, senses) in order to select the expanded query among them?

24  Search Process  Query Processing  Document Ranking  Search Result Clustering and Diversification  What is the Goal  Contributions  Query Expansion  Query Expansion (Tasks to Decide)  Document Ranking

25  Are the documents annotated?  Yes: rank documents using the entities extracted in the query expansion phase.  No: rank the documents based on the semantics of the expanded query rather than its terms or phrases. Define probabilities over senses rather than over terms in the query and documents.

26  Are the documents annotated?  Yes: rank documents using the entities extracted in the query expansion phase.  No: rank the documents based on the semantics of the expanded query rather than its terms or phrases. Define probabilities over senses rather than over terms in the query and documents. Documents are not annotated, so how?

27  Semantic similarity between two non-annotated documents (the expanded query and the document)  There are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases, or words).  An application to information retrieval has not yet been seen.

28  Semantic similarity between two non-annotated documents (the expanded query and the document)  There are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases, or words).  An application to information retrieval has not yet been seen. Find the aspects of the different algorithms that are most beneficial in the information retrieval domain (two large documents)

29  Semantic similarity between two non-annotated documents (the expanded query and the document)  There are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases, or words).  An application to information retrieval has not yet been seen. It seems more reasonable to apply the algorithm to DBpedia (instead of WordNet), in the entity domain (instead of the sense domain)
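The topic-specific PageRank idea transfers to the entity domain roughly as follows: restart the random walk at the query's seed entities, so the stationary scores measure relatedness to the query rather than global importance. The tiny graph below is invented for illustration; in the proposal it would be a neighborhood of DBpedia entities:

```python
# Topic-specific (personalized) PageRank on a tiny entity graph: the
# random jump returns to the query's seed entities instead of all
# nodes, so scores measure relatedness to the query.
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    base = {n: (1 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(base)
    for _ in range(iters):
        new = {n: (1 - damping) * base[n] for n in graph}
        for n, out in graph.items():
            for m in out:  # spread n's current rank over its out-links
                new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

graph = {
    "gain_weight": ["muscle", "fat"],
    "muscle": ["mass", "gain_weight"],
    "fat": ["gain_weight"],
    "mass": ["muscle"],
    "stock_market": ["gain_weight"],  # nothing links back to this outlier
}
pr = personalized_pagerank(graph, seeds={"gain_weight"})
print(sorted(pr, key=pr.get, reverse=True))  # entities near the seed come first
```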

30  Applying search result clustering and diversification based on the different senses of the query.

31  References  1. B. Selvaretnam and M. Belkhatir. "Natural language technology and query expansion: issues, state-of-the-art and perspectives." Journal of Intelligent Information Systems, 38(3), 2011.  2. C. Carpineto and G. Romano. "A Survey of Automatic Query Expansion in Information Retrieval." ACM Computing Surveys, 44(1), 2012.  3. D. Hiemstra. "A linguistically motivated probabilistic model of information retrieval." In Research and Advanced Technology for Digital Libraries. Springer, Berlin/Heidelberg.  4. K. Sparck Jones, S. Walker, and S. E. Robertson. "A probabilistic model of information retrieval: development and comparative experiments, Part 1." Information Processing & Management, 36(6), 2000.  5. K. Sparck Jones, S. Walker, and S. E. Robertson. "A probabilistic model of information retrieval: development and comparative experiments, Part 2." Information Processing & Management, 36(6), 2000.  6. A. Di Marco and R. Navigli. "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction." Computational Linguistics, 39(3), 2013.  7. M. T. Pilehvar, D. Jurgens, and R. Navigli. "Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity." In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013).