Compare&Contrast: Using the Web to Discover Comparable Cases for News Stories
Presenter: Aravind Krishna Kalavagattu

What is the problem the paper is attempting to solve? Why is it interesting?
Discover comparable cases for news stories
  – Documents about similar situations but involving distinct entities
Useful for case-based reasoning
  – Comparing similar situations can give hints for problem solving
  – Gain insights from familiar examples
Interesting: similar situation, but different entities... and on a web scale!

Solution proposed
News Story Modeler
  – Models the document as appropriate vectors
  – Separates named entities and non-named entities
  – TF based on frequency and position in the document
      Score(sentence i) = 1 - i/numsentences
      IDF from a standard archive
  – Main entity = named entity with highest TF
Comparable Entity Discovery
  – Pages are retrieved using good non-entity terms & phrases as the query
      Ex: "open source -IBM"
  – A Word Context Vector is defined in terms of co-occurrence with the main entity in a sentence
  – Comparable entities are extracted from the relevant pages (using similarity)
Page Filtering to remove noise
  – Directory pages
      Count capitalized words!
  – Irrelevant pages
      Similarity measure against the Google summary!
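
A minimal sketch of the News Story Modeler step, under stated assumptions: the slide gives only Score(sentence i) = 1 - i/numsentences, so here i is taken as 0-based and each occurrence of a term adds the score of its sentence to that term's TF. IDF is left out. The function names, the toy sentences, and the hard-coded set of named entities are all hypothetical; the paper obtains named entities from an extraction service, not from a fixed set.

```python
from collections import defaultdict


def sentence_score(i, num_sentences):
    """Position score from the slide: 1 - i/numsentences (i assumed 0-based)."""
    return 1.0 - i / num_sentences


def position_weighted_tf(sentences):
    """Assumed combination rule: every occurrence of a term contributes
    the position score of the sentence it appears in."""
    tf = defaultdict(float)
    n = len(sentences)
    for i, terms in enumerate(sentences):
        for term in terms:
            tf[term] += sentence_score(i, n)
    return dict(tf)


def main_entity(tf, named_entities):
    """Main entity = named entity with the highest TF."""
    candidates = {t: w for t, w in tf.items() if t in named_entities}
    return max(candidates, key=candidates.get)


def build_query(terms, entity):
    """Query from non-entity terms, excluding the main entity with '-'."""
    return " ".join(terms) + " -" + entity


# Toy, pre-tokenized article (made up for illustration).
sentences = [
    ["IBM", "open", "source"],
    ["IBM", "Linux"],
    ["Sun", "open", "source"],
    ["Sun", "Linux"],
]
named_entities = {"IBM", "Sun"}  # hypothetical output of an entity extractor

tf = position_weighted_tf(sentences)
entity = main_entity(tf, named_entities)
query = build_query(["open", "source"], entity)
print(entity, query)
```

Because early sentences score higher, an entity named in the lead outweighs one mentioned equally often near the end of the article.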

Criticism of the solution
Experimental section is very weak and preliminary!
  – Over a mere 40 manually collected news articles
      So where is the web scale, which they claim as their improvement/contribution over previous work?
  – User study on 5 users
Assumption that the article is centered on a single main entity is not always valid
And removing the main entity completely from the search query loses relevant results
  – Example: "Google acquires YouTube", "Google acquires Blogger"
Lacks a significant theoretical contribution
  – Weight assignments are not justified! Learning weights?

Related concepts from the course
TF-IDF with word position taken into account
  – Score(sentence i) = 1 - i/numsentences
Information Extraction
  – Named entity discovery
      They use ClearForest Semantic Web Services (SWS)
      Coreference resolution by SWS
Similarity of term vectors & entities
  – Like Jaccard!
Ranking the cases
Word Context Vector
  – Like in SemTag (using 10-word boundaries); here, co-occurrence within a sentence is used
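
To make the last two concepts concrete, here is a rough sketch of a sentence-level word context and a Jaccard comparison between two entities. It is an illustration only: the slides say the similarity is "like Jaccard", so plain set-based Jaccard is used as a stand-in, and the paper's actual measure and weights may differ. The function names and the toy sentences are hypothetical.

```python
def context_words(entity, sentences):
    """Words that co-occur with the entity inside the same sentence
    (a set-valued simplification of a word context vector)."""
    ctx = set()
    for terms in sentences:
        if entity in terms:
            ctx.update(t for t in terms if t != entity)
    return ctx


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|, defined as 0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy, pre-tokenized sentences (made up for illustration).
sentences = [
    ["Google", "acquires", "video", "site"],
    ["Yahoo", "acquires", "photo", "site"],
    ["Google", "search", "engine"],
]

google_ctx = context_words("Google", sentences)
yahoo_ctx = context_words("Yahoo", sentences)
similarity = jaccard(google_ctx, yahoo_ctx)
print(round(similarity, 3))
```

Using the sentence as the co-occurrence window, rather than a fixed number of words on either side, keeps the context tied to a single statement about the entity.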