Compare&Contrast: Using the Web to Discover Comparable Cases for News Stories Presenter: Aravind Krishna Kalavagattu.


1 Compare&Contrast: Using the Web to Discover Comparable Cases for News Stories Presenter: Aravind Krishna Kalavagattu

2 What is the problem the paper is attempting? Why is it interesting?
Discover comparable cases for news stories
  – Documents about similar situations but involving distinct entities
Useful for case-based reasoning
Comparing similar situations can give hints for problem solving
  – Gain insights from familiar examples
Interesting: similar situation, but different entities… and on a web scale!

3 Solution proposed
News Story Modeler
  – Models the document as appropriate vectors
  – Separates named entities and non-named entities
  – TF based on frequency and position in the document
      Score(sentence i) = 1 - i/numsentences
      IDF from a standard archive
  – Main entity = named entity with highest TF
Comparable Entity Discovery
  – Pages are retrieved using good non-entity terms & phrases as the query
      Ex: "open source -IBM"
  – A Word Context Vector is defined in terms of co-occurrence with the main entity in a sentence
  – Comparable entities are extracted from the relevant pages (using similarity)
Page Filtering to remove noise
  – Directory pages: count capitalized words!
  – Irrelevant pages: similarity measure to the Google summary!
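The News Story Modeler steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the slide does not say how frequency and position are combined, so the sketch simply sums the sentence score over every occurrence of a term, treats i as 0-based, takes the named entities as a given set of single-token names (the paper uses an external extractor), and omits the IDF step. The helper names position_weighted_tf and build_query are hypothetical.

```python
import re
from collections import defaultdict


def sentence_score(i, num_sentences):
    """Position weight from the slide: 1 - i/numsentences (i assumed 0-based)."""
    return 1.0 - i / num_sentences


def position_weighted_tf(sentences, vocabulary):
    """Assumed combination: add the sentence score for each occurrence of a term."""
    tf = defaultdict(float)
    n = len(sentences)
    for i, sentence in enumerate(sentences):
        for token in re.findall(r"[A-Za-z]+", sentence):
            if token in vocabulary:
                tf[token] += sentence_score(i, n)
    return dict(tf)


def build_query(terms, main_entity):
    """Quote multi-word phrases and exclude the main entity, as in 'open source -IBM'."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " ".join(quoted) + f" -{main_entity}"


# Toy document; the sentences and entity set are made up for illustration.
sentences = [
    "IBM backs open source software",
    "Novell also supports open source",
    "IBM expects growth",
    "Analysts agree",
]
named_entities = {"IBM", "Novell"}

entity_tf = position_weighted_tf(sentences, named_entities)
main_entity = max(entity_tf, key=entity_tf.get)  # named entity with highest TF
query = build_query(["open source"], main_entity)
```

In this toy document the entity that appears early and often ends up as the main entity, and the query keeps the non-entity phrase while excluding that entity.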

4 Criticism of the solution
Experimental section is very weak and preliminary!
  – Over a mere 40 manually collected news articles
      So where is the web scale, which they claim as their improvement/contribution over previous work?
  – User study on 5 users
Assumption that the article is centered on a single main entity is not always valid
And removing the main entity completely from the search query loses relevant results
  – Example: "Google acquires Youtube", "Google acquires Blogger"
Lacks a significant theoretical contribution
  – Weight assignments are not justified! Learning weights?

5 Related concepts from the course
TF-IDF with word position taken into account
  – Score(sentence i) = 1 - i/numsentences
Information Extraction
  – Named entity discovery
      They use ClearForest Semantic Web Services (SWS)
      Coreference resolution by SWS
Similarity of term vectors & entities
  – Like Jaccard!
Ranking the cases
Word Context Vector
  – Like in SemTag (using 10-word boundaries); here co-occurrence within a sentence is used
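A word context vector built from sentence-level co-occurrence, compared with a Jaccard-style measure, might look like the sketch below. The slides only say the similarity is "like Jaccard", so plain set Jaccard over context words is an assumption here, as are the single-token entity names, the toy sentences, and the helper names context_vector and jaccard.

```python
import re
from collections import Counter


def context_vector(sentences, entity):
    """Count the words that share a sentence with `entity` (entity itself excluded)."""
    target = entity.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def jaccard(a, b):
    """Set Jaccard over the context words: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Toy sentences, made up for illustration.
sentences = [
    "IBM backs open source software",
    "Novell supports open source software",
    "Analysts praise IBM",
]

ibm = context_vector(sentences, "IBM")
novell = context_vector(sentences, "Novell")
similarity = jaccard(ibm, novell)  # 3 shared context words out of 7 distinct
```

Two entities that appear alongside the same words score higher, which is the intuition behind ranking candidate comparable entities by context similarity.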
