Compare&Contrast: Using the Web to Discover Comparable Cases for News Stories Presenter: Aravind Krishna Kalavagattu.


What problem does the paper address? Why is it interesting?
Discover comparable cases for news stories
 – Documents about similar situations but involving distinct entities
Useful for case-based reasoning
Comparing similar situations can give hints for problem solving
 – Gain insights from familiar examples
Interesting: similar situations but different entities, and at web scale!

Solution proposed
News Story Modeler
 – Models the document as appropriate vectors
 – Separates named entities and non-named entities
 – TF based on frequency and position in the document
     Score(sentence i) = 1 - i/numsentences
     IDF from a standard archive
 – Main entity = named entity with highest TF
Comparable Entity Discovery
 – Pages are retrieved using good non-entity terms & phrases as the query
     Ex: "open source -IBM"
 – A Word Context Vector is defined in terms of co-occurrence with the main entity in a sentence
 – Comparable entities are extracted from the relevant pages (using similarity)
Page Filtering to remove noise
 – Directory pages
     Count capitalized words!
 – Irrelevant pages
     Similarity measure to Google summary!
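
The News Story Modeler steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the slide does not say how frequency and position are combined, so the sketch simply sums the sentence weight over every occurrence of a term; sentence index i is assumed to start at 0; and the named-entity list is passed in by hand (the paper obtains it from an external service). All function names are hypothetical.

```python
import re
from collections import defaultdict


def sentence_score(i, num_sentences):
    """Position weight from the slide: 1 - i/numsentences (i assumed 0-based)."""
    return 1.0 - i / num_sentences


def position_weighted_tf(sentences):
    """Assumed combination: add the sentence weight once per term occurrence."""
    tf = defaultdict(float)
    n = len(sentences)
    for i, sentence in enumerate(sentences):
        weight = sentence_score(i, n)
        for term in re.findall(r"[A-Za-z]+", sentence):
            tf[term] += weight
    return dict(tf)


def main_entity(tf, named_entities):
    """Main entity = named entity with the highest TF."""
    return max(named_entities, key=lambda entity: tf.get(entity, 0.0))


def build_query(non_entity_terms, entity):
    """Query from non-entity terms, excluding the main entity with a minus sign."""
    return " ".join(non_entity_terms) + " -" + entity


sentences = [
    "IBM backs open source",
    "Linux gains from IBM support",
    "Analysts expect growth",
]
tf = position_weighted_tf(sentences)
entity = main_entity(tf, ["IBM", "Linux"])  # hand-picked entity list (assumption)
print(build_query(["open", "source"], entity))
```

Terms in early sentences get a larger weight, so an entity named in the lead sentence tends to win the main-entity selection even if it appears only a few times.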

Criticism of the solution
The experimental section is very weak and preliminary!
 – Evaluated on a mere 40 manually collected news articles
     So where is the web scale that they claim as their improvement/contribution over previous work?
 – User study on 5 users
The assumption that the article is centered on a single main entity is not always valid
Removing the main entity completely from the search query loses relevant results
 – Example: "Google acquires YouTube", "Google acquires Blogger"
Lacks a significant theoretical contribution
 – Weight assignments are not justified!
     Learning weights?

Related concepts from the course
TF-IDF with word position taken into account
 – Score(sentence i) = 1 - i/numsentences
Information Extraction
 – Named entity discovery
     They use ClearForest Semantic Web Services (SWS)
     Coreference resolution by SWS
Similarity of term vectors & entities
 – Like Jaccard!
Ranking the cases
Word Context Vector
 – Like in SemTag (which uses 10-word boundaries); here, co-occurrence within a sentence is used
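
To make the last two ideas concrete, here is a minimal sketch of a sentence-level word context vector compared with Jaccard similarity. It is a simplification under assumptions: contexts are plain word sets rather than weighted vectors, entities are single tokens, and the slides only say the paper's similarity is "like Jaccard", so the exact measure may differ. The function names are hypothetical.

```python
import re


def context_vector(sentences, entity):
    """Words that co-occur with `entity` in the same sentence (as a set)."""
    target = entity.lower()
    context = set()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if target in words:
            context |= words - {target}
    return context


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


sentences = [
    "Google acquires YouTube",
    "Yahoo acquires Flickr",
    "Markets were calm",
]
google_ctx = context_vector(sentences, "Google")
yahoo_ctx = context_vector(sentences, "Yahoo")
print(jaccard(google_ctx, yahoo_ctx))
```

Here the two entities share the context word "acquires", which is what would make Yahoo a candidate comparable entity for a story whose main entity is Google.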