UIC at TREC 2006: Blog Track. Wei Zhang, Clement Yu, Department of Computer Science, University of Illinois at Chicago.

1 UIC at TREC 2006: Blog Track
Wei Zhang, Clement Yu
Department of Computer Science, University of Illinois at Chicago

2 Summary
– Overview of opinion retrieval
  – Relevant document retrieval
  – Opinion relevant document retrieval
– Opinion system
  – Subjective/objective training data
  – Feature extraction
  – Subjectivity classifier
  – Opinion document ranking

3 Opinion Document Retrieval
[Diagram: Document Space containing the Relevant Documents for a Query, the Opinion Documents, and the Opinion Relevant Documents]

4 Opinion Document Retrieval
– Relevant documents: an IR approach
– Opinion relevant documents: a classification approach

5 Relevant Document Retrieval
– Uses the UIC IR system from the TREC 2005 Robust Track, without word sense disambiguation (WSD) and without adding synonyms/hyponyms
– Phrase recognition
  – Proper names, dictionary phrases
  – Simple phrases, complex phrases
– Query expansion
  – Pseudo-relevance feedback, Wikipedia, Web
– Document-query similarity
  – Phrase similarity and term similarity
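
The slide names two components of document-query similarity but does not give their formulas. The sketch below is a hypothetical simplification, not the UIC system: phrase similarity counts query phrases found verbatim in a document, term similarity counts distinct query terms present, and documents are ordered by phrase similarity first, with term similarity only breaking ties. All function names are illustrative.

```python
from typing import List, Tuple


def phrase_similarity(doc_text: str, query_phrases: List[str]) -> int:
    """Hypothetical phrase similarity: number of query phrases found verbatim."""
    text = doc_text.lower()
    return sum(1 for p in query_phrases if p.lower() in text)


def term_similarity(doc_text: str, query_terms: List[str]) -> int:
    """Hypothetical term similarity: number of distinct query terms present."""
    words = set(doc_text.lower().split())
    return sum(1 for t in set(query_terms) if t.lower() in words)


def rank_documents(docs: List[str], query_phrases: List[str],
                   query_terms: List[str]) -> List[Tuple[int, int, int]]:
    """Return (doc_index, phrase_sim, term_sim), best first."""
    scored = [
        (i, phrase_similarity(d, query_phrases), term_similarity(d, query_terms))
        for i, d in enumerate(docs)
    ]
    # Tuples compare lexicographically: phrase similarity dominates and
    # term similarity only breaks ties.
    return sorted(scored, key=lambda x: (x[1], x[2]), reverse=True)


if __name__ == "__main__":
    docs = [
        "penguins march across the ice",
        "march of the penguins is a documentary",
        "a review of the penguins film",
    ]
    ranking = rank_documents(docs, ["march of the penguins"], ["march", "penguins"])
    print([i for i, _, _ in ranking])  # → [1, 0, 2]
```

Ordering by the tuple means a single matched phrase outranks any number of extra matched terms. That is one plausible reading of "phrase similarity and term similarity", not a confirmed detail of the authors' system.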

6 Opinion Relevant Document Retrieval
[Diagram: Retrieved documents; a document; its opinion sentences; the resulting opinion relevant document]
Example opinion sentences from a retrieved document:
– "For a documentary, it carried just about no information."
– "… another bad thing about march of the penguins - I totally agree...."
– "… "march of the penguins," which was excellent yet really pretty disturbing …"

7 The Opinions
– Opinions are query dependent (e.g., opinions about food are expressed differently from opinions about automobiles)
– They should therefore be learned and tested per query
– They should be analyzed at the sentence level

8 Opinion System Overview
[Diagram: query; Rateitall.com → subjective sentences; Wikipedia.org → objective sentences; feature extraction → SVM classifier; retrieved documents → opinion documents → opinion-query connection → opinion relevant documents → re-rank → final answers]

9 The Objective Sentences
– Wikipedia.org pages as the primary source
  – Every sentence is treated as objective
  – Multiple pages are used when the query has multiple phrases
– Web pages as a secondary source
  – Retrieved with a web search engine
  – Search restriction: -comment -review -"I think" (excludes pages containing these words)

10 The Subjective Sentences
– Rateitall.com pages as the primary source
  – Every comment sentence is treated as subjective
– Web pages as a secondary source
  – Retrieved with a web search engine
  – Search restriction: +comment +review +"I think" (requires these words)
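
The two slides above describe weakly labeled training data: every Wikipedia sentence counts as objective and every Rateitall.com comment sentence counts as subjective. The web queries use the listed restrictions. Below is a minimal sketch of that labeling step. It does no fetching. The function names and the naive regex sentence splitter are assumptions, and how a particular search engine interprets the `+`/`-` operators is not specified on the slides.

```python
import re
from typing import List, Tuple

# Restrictions copied from the slides; engine-specific operator semantics
# are an assumption.
OBJECTIVE_RESTRICTION = '-comment -review -"I think"'
SUBJECTIVE_RESTRICTION = '+comment +review +"I think"'


def build_web_queries(query: str) -> Tuple[str, str]:
    """Return (objective_query, subjective_query) strings for a web search engine."""
    return (f"{query} {OBJECTIVE_RESTRICTION}", f"{query} {SUBJECTIVE_RESTRICTION}")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(objective_pages: List[str],
                    subjective_comments: List[str]) -> List[Tuple[str, int]]:
    """Label sentences from objective sources 0 and from subjective sources 1."""
    data = [(s, 0) for page in objective_pages for s in split_sentences(page)]
    data += [(s, 1) for c in subjective_comments for s in split_sentences(c)]
    return data


if __name__ == "__main__":
    print(build_web_queries("march of the penguins")[0])
    wiki = ["March of the Penguins is a documentary film. It follows emperor penguins."]
    comments = ["I loved it! The narration was great."]
    for sentence, label in label_sentences(wiki, comments):
        print(label, sentence)
```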

11 The Featured Terms
– Use unigrams and bigrams
– Chi-square test
  – Tests the hypothesis that a term t is distributed unevenly between the objective and subjective text sets; terms that pass are kept as featured terms
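
For each candidate unigram or bigram t, the chi-square test can be computed from a 2×2 contingency table. Let A be the number of subjective sentences containing t, B the number of objective sentences containing t, C the number of subjective sentences without t, D the number of objective sentences without t, and N = A + B + C + D. Then the standard statistic is χ²(t) = N(AD − BC)² / ((A+B)(C+D)(A+C)(B+D)). The sketch below assumes whitespace tokenization and top-k selection. The slide gives neither the tokenizer nor the cutoff.

```python
from collections import Counter
from typing import List, Set, Tuple


def ngrams(sentence: str) -> Set[str]:
    """Unigrams and bigrams of a lowercased, whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    grams = set(tokens)
    grams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return grams


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square of a 2x2 table: a = subjective with t, b = objective with t,
    c = subjective without t, d = objective without t."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(data: List[Tuple[str, int]], k: int) -> List[str]:
    """Return the k terms with the highest chi-square score.
    data holds (sentence, label) pairs, label 1 = subjective, 0 = objective."""
    n_subj = sum(1 for _, y in data if y == 1)
    n_obj = len(data) - n_subj
    in_subj: Counter = Counter()
    in_obj: Counter = Counter()
    for sentence, y in data:
        (in_subj if y == 1 else in_obj).update(ngrams(sentence))
    scores = {}
    for t in set(in_subj) | set(in_obj):
        a, b = in_subj[t], in_obj[t]
        scores[t] = chi_square(a, b, n_subj - a, n_obj - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    data = [
        ("i think it was great", 1),
        ("i think it was awful", 1),
        ("it was released in 2005", 0),
        ("it was filmed in antarctica", 0),
    ]
    print(select_features(data, 3))  # → ['i', 'i think', 'in']
```

The statistic is symmetric, so a term like "in", which here occurs only in objective sentences, scores as highly as "i think". Both are useful features for a classifier that separates the two classes.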

12 The Sentence Classifier
– A Support Vector Machine (SVM) sentence classifier
[Diagram: objective sentences and subjective sentences → featured term vector representation (using the featured terms) → SVM training → SVM classifier]
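
The slide does not say which SVM package or kernel was used. The sketch below swaps in a plain linear SVM trained with Pegasos-style subgradient descent on the hinge loss, so it runs with the standard library alone. Sentences become binary featured-term vectors plus a constant bias feature. The regularization weight `lam`, the epoch count, and the function names are assumptions.

```python
from typing import List, Sequence


def featurize(sentence: str, features: Sequence[str]) -> List[float]:
    """Binary featured-term vector plus a constant 1.0 acting as a bias feature."""
    tokens = sentence.lower().split()
    grams = set(tokens) | {" ".join(p) for p in zip(tokens, tokens[1:])}
    return [1.0 if f in grams else 0.0 for f in features] + [1.0]


def train_linear_svm(xs: List[List[float]], ys: List[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Pegasos-style subgradient descent on the regularized hinge loss.
    ys must be +1 (subjective) or -1 (objective). Examples are visited in a
    fixed cyclic order instead of random sampling, so runs are deterministic."""
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink w toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient step, only if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def classify(w: List[float], x: List[float]) -> float:
    """Signed SVM score: > 0 means subjective, <= 0 means objective."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    features = ["i think", "great", "released"]
    xs = [featurize("i think it was great", features),
          featurize("it was released in 2005", features)]
    w = train_linear_svm(xs, [1, -1])
    print(classify(w, featurize("great movie", features)) > 0)
```

A production system would more likely use an established SVM library. The point here is only the data flow from featured terms to vectors to a trained classifier.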

13 Find the Opinion Documents
– An opinion document is a retrieved document that contains at least one opinion sentence
  – Split the document into sentences
  – Classify each sentence with the SVM classifier
[Diagram: Document → Sentence 1, Sentence 2, …, Sentence n → SVM classifier → Label 1: objective, Label 2: subjective, …, Label n: objective]
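
The rule above (split the document into sentences, classify each one, and flag the document if any sentence is subjective) can be sketched as follows. The sentence splitter is a naive assumption. `toy_score` is a hypothetical keyword stand-in for the trained SVM and is used only to make the example runnable.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_document(text: str,
                   score: Callable[[str], float]) -> List[Tuple[str, str]]:
    """Label each sentence 'subjective' if its classifier score is positive."""
    return [(s, "subjective" if score(s) > 0 else "objective")
            for s in split_sentences(text)]


def is_opinion_document(text: str, score: Callable[[str], float]) -> bool:
    """True if at least one sentence of the document is labeled subjective."""
    return any(label == "subjective" for _, label in label_document(text, score))


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for the SVM: +1 if an opinion cue appears, else -1."""
    cues = ("i think", "i totally agree", "excellent", "bad")
    return 1.0 if any(c in sentence.lower() for c in cues) else -1.0


if __name__ == "__main__":
    doc = "The film was released in 2005. I think it was excellent."
    print(label_document(doc, toy_score))
    print(is_opinion_document(doc, toy_score))  # → True
```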

14 Find the Opinion Relevant Documents
– An opinion relevant document is a retrieved document that contains at least one opinion "relevant" sentence
  – Query terms occur in or near an opinion sentence
[Diagram: a text window in the document around an opinion sentence, checked for the query terms]
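
The slide does not define the size or unit of the text window. The sketch below assumes a hypothetical window of `window` sentences on each side of an opinion sentence. It uses simple case-insensitive substring matching for query terms. An opinion sentence is treated as relevant if any query term appears in its window, and a document is opinion relevant if it has at least one such sentence.

```python
from typing import List, Sequence


def opinion_relevant_indices(sentences: Sequence[str], is_opinion: Sequence[bool],
                             query_terms: Sequence[str], window: int = 1) -> List[int]:
    """Indices of opinion sentences with a query term in the sentence itself or
    within `window` sentences before or after it (hypothetical window definition).
    Matching is case-insensitive substring matching, a simplification."""
    terms = [t.lower() for t in query_terms]
    result = []
    for i, opinion in enumerate(is_opinion):
        if not opinion:
            continue
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        text = " ".join(sentences[lo:hi]).lower()
        if any(t in text for t in terms):
            result.append(i)
    return result


def is_opinion_relevant_document(sentences: Sequence[str], is_opinion: Sequence[bool],
                                 query_terms: Sequence[str], window: int = 1) -> bool:
    """True if the document has at least one opinion relevant sentence."""
    return bool(opinion_relevant_indices(sentences, is_opinion, query_terms, window))


if __name__ == "__main__":
    sentences = ["March of the Penguins opened this week.", "It was excellent.",
                 "Unrelated chatter here.", "I think pizza is great."]
    flags = [False, True, False, True]
    print(opinion_relevant_indices(sentences, flags, ["penguins"], window=1))  # → [1]
```

With `window=0` only query terms inside the opinion sentence itself count. In the example, the opinion "It was excellent." refers to the film only through its neighboring sentence, which is why some window is useful.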

15 Rank the Opinion Relevant Documents: Strategy 1
– Use the document retrieval ranking
– Remove documents that do not contain an opinion relevant sentence
$$\mathrm{Score}(D, Q) = \mathrm{Sim}(D, Q) \cdot I(D, Q)$$
– Sim(D, Q): query-document similarity
– I(D, Q) = 1 if D contains an opinion relevant sentence, 0 otherwise
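
Strategy 1 keeps the original retrieval order and drops every document whose indicator I(D, Q) is 0. Here is a minimal sketch, assuming the retrieval run arrives as a list of hypothetical (document ID, Sim(D, Q)) pairs already sorted by similarity. The indicator values are assumed to come from an opinion relevance check like the one on the previous slide.

```python
from typing import Dict, List, Tuple


def strategy1(ranked: List[Tuple[str, float]],
              indicator: Dict[str, int]) -> List[Tuple[str, float]]:
    """Keep the retrieval ranking order; remove documents with I(D, Q) = 0.
    Documents missing from `indicator` are treated as I(D, Q) = 0."""
    return [(doc, sim) for doc, sim in ranked if indicator.get(doc, 0) == 1]


if __name__ == "__main__":
    ranked = [("d1", 0.9), ("d2", 0.8), ("d3", 0.5)]
    print(strategy1(ranked, {"d1": 0, "d2": 1, "d3": 1}))  # → [('d2', 0.8), ('d3', 0.5)]
```

Filtering on the indicator directly, rather than on Sim(D, Q) · I(D, Q) > 0, avoids accidentally dropping a qualifying document whose similarity happens to be zero.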

16 Rank the Opinion Relevant Documents: Strategy 2
– Calculate a document opinion score using:
  – OS(D): the set of opinion sentences in document D
  – Score_classification(s): the score of opinion sentence s from the SVM classifier
  – Relevant(s, Q): 1 if s is an opinion relevant sentence, 0 otherwise
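
The transcript lists the ingredients of Strategy 2 but not the formula that combines them. The sketch below assumes one natural form: sum Score_classification(s) · Relevant(s, Q) over s in OS(D), then rank documents by that score alone. Both the summation and the omission of Sim(D, Q) are assumptions. Each opinion sentence is represented as a hypothetical (classifier score, relevant flag) pair.

```python
from typing import Dict, List, Tuple

# One opinion sentence: (SVM classification score, opinion-relevant flag).
Sentence = Tuple[float, bool]


def opinion_score(opinion_sentences: List[Sentence]) -> float:
    """Assumed form: sum over s in OS(D) of Score_classification(s) * Relevant(s, Q)."""
    return sum((score * (1 if relevant else 0) for score, relevant in opinion_sentences), 0.0)


def strategy2(docs: Dict[str, List[Sentence]]) -> List[Tuple[str, float]]:
    """Rank documents by opinion score, highest first; ties keep input order."""
    scored = [(doc, opinion_score(sents)) for doc, sents in docs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": [(0.5, True), (1.2, False)],  # only one opinion relevant sentence
        "d2": [(0.8, True), (0.4, True)],
        "d3": [],
    }
    print([d for d, _ in strategy2(docs)])  # → ['d2', 'd1', 'd3']
```

Under this assumed form, a document with several moderately confident opinion relevant sentences can outrank one with a single very confident sentence that is not tied to the query.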

17 Blog Track Results

Run ID  Strategy  MAP    GMAP   R-Prec  P@10
UICSR   1         .1636  .0921  .2522   .4380
UICST   2         .1885  .1083  .2771   .5120

18 Thanks! Questions?

