Identifying Comparative Sentences in Text Documents


1 Identifying Comparative Sentences in Text Documents
Nitin Jindal and Bing Liu, University of Illinois, SIGIR 2006

2 Introduction
Comparisons are one of the most convincing ways of evaluation. Much of this information is available on the Web in customer reviews, forum discussions, and blogs. It is useful for product manufacturers and for potential customers making purchasing decisions.

3 Comparisons vs. Opinions
Comparisons can be either objective or subjective. Comparative sentences use different language constructs from typical opinion sentences, and they may contain some indicators.
Subjective: Car X is much better than Car Y.
Objective: Car X is two feet longer than Car Y.

4 Related Work
Linguistics: based on grammars (syntax and semantics) and logic (gradability); this work is aimed more at human consumption than at automatic identification.
Opinion tasks: opinion extraction and classification, which is quite different from comparison identification.

5 Comparatives (Linguistic)
Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property. John is taller than he was => John is tall to degree d

6 Comparatives (Linguistic)
Two broad types:
Metalinguistic Comparatives: compare properties of one entity. Ronaldo is angrier than upset.
Propositional Comparatives: compare between two propositions. Three subcategories:

7 Comparatives (Propositional)
Nominal Comparatives (two sets of entities): Paul ate more grapes than bananas.
Adjectival Comparatives (than, as good as): Ford is cheaper than Volvo.
Adverbial Comparatives (occur after a verb phrase): Tom ate more quickly than Jane.

8 Superlatives
Adjectival Superlatives: John is the tallest person.
Adverbial Superlatives: Jill did her homework most frequently.
Equality (conjunctions like and, or, …): John and Sue both like sushi.

9 POS involved
NN: Noun
NNP: Proper noun
VBZ: Verb, present tense, 3rd person singular
JJ: Adjective
RB: Adverb
JJR: Adjective, comparative
JJS: Adjective, superlative
RBR: Adverb, comparative
RBS: Adverb, superlative

10 Limitations of linguistic classification
Non-comparatives with comparative words: many non-comparative sentences contain comparative words.
In the context of speed, faster means better.
John has to try his best to win this game.
Limited coverage: many comparative sentences contain no comparative words.
In market capital, Intel is way ahead of AMD.
Nokia, Samsung, both cell phones perform badly on heat dissipation index.
The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.

11 Enhancements
First limitation: use machine learning methods to distinguish comparatives from non-comparatives.
Second limitation: extend coverage to
User preferences: I prefer Intel to AMD = Intel is better than AMD.
Implicit comparatives: Camera X has 2 MP, whereas camera Y has 5 MP.

12 Types of Comparatives
Non-Equal Gradable: greater or less than type, including user preferences.
Equative (Gradable): equal to type.
Superlative (Gradable): greater or less than all others type.
Non-Gradable: A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn't.

13 Tasks Identifying comparative sentences from a given text data set.
Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)

14 Class Sequential Rules with Multiple Minimum Supports
In a class sequential rule, the pattern is on the left and the class is on the right.
Keywords used to select patterns: POS tags (JJR, RBR, JJS, RBS) + words (favor, prefer, win, beat, but, …) + phrases (number one, up against).
Using keywords alone gives P = 32%, R = 94%.
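The keyword filter described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, the input is assumed to be already POS-tagged in word/TAG form, and only the indicator words and phrases named on the slide are included (the full keyword list is not given here).

```python
# Hypothetical keyword filter: flag a sentence as a candidate comparative if it
# contains a comparative/superlative POS tag or an indicator word or phrase.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
# Assumption: only the examples named on the slide, not the full list.
KEYWORD_WORDS = {"favor", "prefer", "win", "beat", "but"}
KEYWORD_PHRASES = {("number", "one"), ("up", "against")}


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lower-cased (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(tagged):
    """Return True if any tag, word, or two-word phrase is a keyword."""
    words = [w for w, _ in tagged]
    if any(tag in COMPARATIVE_TAGS for _, tag in tagged):
        return True
    if any(w in KEYWORD_WORDS for w in words):
        return True
    bigrams = set(zip(words, words[1:]))
    return bool(bigrams & KEYWORD_PHRASES)


if __name__ == "__main__":
    s = "this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN"
    print(has_keyword(parse_tagged(s)))
```

A filter like this favors recall over precision, which is consistent with the figures on the slide: almost every comparative contains a keyword, but so do many non-comparatives.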

15 Support and Confidence
Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:
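Support and confidence for a class sequential rule X -> y can be computed as in the following Python sketch. It assumes the usual definitions: a sequence covers the rule if X is a subsequence of it, and satisfies the rule if its class is also y; support is the satisfied fraction of all sequences, and confidence is satisfied over covered. The toy database, the rule, and the function names are hypothetical and are not taken from the slide.

```python
def is_subsequence(pattern, sequence):
    """True if each itemset of pattern is contained, in order, in some
    itemset of sequence (greedy left-to-right matching)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support_and_confidence(pattern, label, database):
    """database is a list of (sequence, class_label) pairs."""
    covered = [y for seq, y in database if is_subsequence(pattern, seq)]
    satisfied = sum(1 for y in covered if y == label)
    support = satisfied / len(database)
    confidence = satisfied / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy database: five sequences of itemsets with class labels.
TOY_DB = [
    ([{1}, {3}, {5}, {7, 8, 9}], "c1"),
    ([{1}, {3}, {6}, {7, 8}], "c1"),
    ([{1, 6}, {9}], "c2"),
    ([{3}, {5, 6}], "c2"),
    ([{1}, {3}, {4}, {7, 8}], "c2"),
]

if __name__ == "__main__":
    sup, conf = support_and_confidence([{1}, {3}, {7, 8}], "c1", TOY_DB)
    print(sup, conf)
```

In this toy data the rule is covered by three sequences and satisfied by two, so it would pass thresholds of 20% support and 40% confidence.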

16 Building the Sequence DB
this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD
{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative
Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.
In addition, 13 manual rules cover conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
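The step from the tagged sentence to the sequence can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keyword keeps its word and tag, every other token is replaced by its POS tag, and only tokens within a window around the keyword are kept. A window of three tokens on each side reproduces the sequence shown on the slide, but the window size and the function name are assumptions here, not details stated on the slide.

```python
def build_sequence(tagged_sentence, keyword, radius=3):
    """Turn 'word/TAG ...' into a list of items around the first occurrence
    of keyword: the keyword becomes word+TAG, other tokens become TAG only.
    Raises StopIteration if the keyword does not occur."""
    tokens = []
    for tok in tagged_sentence.split():
        word, _, tag = tok.rpartition("/")
        tokens.append((word, tag))
    centre = next(i for i, (w, _) in enumerate(tokens) if w == keyword)
    lo = max(0, centre - radius)
    hi = min(len(tokens), centre + radius + 1)
    return [
        w + t if i == centre else t
        for i, (w, t) in enumerate(tokens[lo:hi], start=lo)
    ]


SENTENCE = (
    "this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
    "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD"
)

if __name__ == "__main__":
    print(build_sequence(SENTENCE, "more"))
    # ['NN', 'VBZ', 'RB', 'moreJJR', 'NN', 'IN', 'NN']
```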

17 Classification Learning
Machine learning methods: Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
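One way to read this feature set is as a binary vector per sentence, with one position per rule pattern that is set to 1 when the pattern occurs as a subsequence of the sentence's sequence. The Python sketch below illustrates that reading only; the three patterns are hypothetical, and the slide does not say which learner consumes the vectors.

```python
def contains(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order
    (not necessarily adjacent)."""
    remaining = iter(sequence)
    # 'item in remaining' advances the iterator past the first match,
    # so later items must be found further to the right.
    return all(item in remaining for item in pattern)


def to_features(sequence, patterns):
    """Binary feature vector: 1 where the pattern matches the sequence."""
    return [1 if contains(p, sequence) else 0 for p in patterns]


# Hypothetical patterns standing in for mined CSRs and manual rules.
PATTERNS = [
    ["NN", "VBZ", "moreJJR"],
    ["JJ", "NN"],
    ["IN", "NN"],
]

if __name__ == "__main__":
    seq = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
    print(to_features(seq, PATTERNS))  # [1, 0, 1]
```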

18 Data Preparation
Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.
Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.
News articles on topics such as automobiles, iPods, and soccer vs. football.

19 Number of Sentences in Data Sets

20 Experimental Results (1)

21 Experimental Results (2)
Reviews: low recall, high precision. Sentences are short, so patterns are hard to find.
Articles and forums: high recall, low precision. Sentences are long, so patterns match too easily or too many patterns are found.
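For reference, precision (P) and recall (R) follow the standard definitions: P is the fraction of sentences flagged as comparative that really are comparative, and R is the fraction of true comparatives that were flagged. A small Python sketch, with hypothetical labels rather than the experiment's data:

```python
def precision_recall(predicted, actual):
    """predicted and actual are equal-length lists of booleans
    (True = comparative sentence)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical classifier that flags too many sentences.
    predicted = [True, True, True, True, False, False]
    actual = [True, False, False, False, True, False]
    print(precision_recall(predicted, actual))  # (0.25, 0.5)
```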

22 Conclusion and Future Work
Identifying comparative sentences. Analyzing different types of comparative sentences. Studying how to automatically classify subjective and objective comparisons.

