Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus


1 Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
Saif Mohammad †, Cody Dunne ‡, and Bonnie Dorr †∗
† Institute for Advanced Computer Studies and CLIP lab
‡ Human-Computer Interaction Lab
Department of Computer Science, University of Maryland
∗ Human Language Technology Center of Excellence

2 Evaluative sentences
Sony’s new digital camera is fabulous.
The characters in the movie are flawed.
Creative solutions are valued.
Singapore has an immaculate transportation system.
Our waters have never been more contaminated.
Generating Semantic Orientation Lexicons. Mohammad, Dunne, Dorr.

4 Semantic orientation
Positive semantic orientation (SO), or polarity
◦ Term is often used to convey favorable sentiment or evaluation of the target.
◦ E.g.: excellent, happy, honest, …
Negative semantic orientation
◦ Term is often used to convey unfavorable sentiment or evaluation of the target.
◦ E.g.: poor, sad, dishonest, …

5 Applications
Automatic product recommendation systems (Tatemura, 2000; Terveen et al., 1997)
Question answering (Somasundaran et al., 2007; Lita et al., 2005)
Summarizing multiple viewpoints and opinions (Seki et al., 2004; Mohammad et al., 2008a)
Identifying flames (Spertus, 1997)
Appropriate ad placement (Jin et al., 2007)

6 Manually created lexicons
General Inquirer (GI) (Stone et al., 1966)
◦ http://www.wjh.harvard.edu/inquirer
◦ has labels for only about 3,600 entries
Pittsburgh subjectivity lexicon (PSL) (Wilson et al., 2005)
◦ http://www.cs.pitt.edu/mpqa
◦ draws from the General Inquirer and other sources
◦ has labels for only about 8,000 words

7 Automatically created lexicons
Hatzivassiloglou and McKeown (1997)
◦ A supervised algorithm to determine the semantic orientation of adjectives
Turney and Littman lexicon (TLL) (2003)
◦ Exploits the tendency of words to co-occur with a seed set
◦ Needs very large corpora (100 billion words)
Esuli and Sebastiani (2006): SentiWordNet (SWN)
◦ Attaches labels to WordNet synsets
◦ Uses supervised classifiers
◦ Needs significant manual annotation

8 Semantic oppositeness scale
[Figure: a scale running from antonymous (e.g., big–small) to not antonymous (e.g., big–large).]
Many antonym pairs have opposite semantic orientation (one positive, one negative):
◦ good–bad; beautiful–ugly; honest–dishonest

9 Detecting word-pair antonymy: Mohammad, Dorr, Hirst (2008)
Use affix patterns to identify seed pairs of strong antonyms.
Use a Roget-like thesaurus to identify near-synonyms of seed words.
Mark pairs of words near-synonymous to seed pairs as contrasting.
The degree of antonymy is proportional to their tendency to co-occur.
Created a list of more than 3 million strongly antonymous word pairs.

10 Our approach
Identify a seed set of positive and negative words:
◦ From edicts of marking theory
Identify their synonyms:
◦ Use a Roget-like thesaurus
Mark as negative:
◦ words synonymous with a negative seed
Mark as positive:
◦ words synonymous with a positive seed

11 Step 1: Identify seed words
From marking theory:
◦ Overtly marked words tend to be negative.
  E.g., undo, unhappy, dishonest, immobile
◦ Their unmarked counterparts tend to be positive.
  E.g., do, happy, honest, mobile
Exceptions exist:
◦ impartial–partial, unbiased–biased, unstuck–stuck

12 Affix patterns
word1   word2   # of word pairs   example pairs
X       disX    382               honest–dishonest
X       imX     196               possible–impossible
X       inX     691               consistent–inconsistent
X       malX    28                adroit–maladroit
X       misX    146               fortune–misfortune
X       nonX    73                sense–nonsense
X       unX     844               happy–unhappy
X       Xless   208               gut–gutless
lX      illX    25                legal–illegal
rX      irX     48                responsible–irresponsible
Xless   Xful    51                harmless–harmful
Total           2692
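
The eleven affix patterns in the table can be applied mechanically to a word list. The following is a minimal Python sketch, not the authors' implementation: the function name `seed_pairs` and the small vocabulary are hypothetical, and reading word1 as the positive seed and word2 as the negative seed is an assumption based on the column order of the table.

```python
# Illustrative sketch only; `seed_pairs` and the sample vocabulary are
# hypothetical and not taken from the original slides.
PREFIXES = ["dis", "im", "in", "mal", "mis", "non", "un"]  # X -> prefix + X


def seed_pairs(vocabulary):
    """Return (word1, word2) pairs that match the eleven affix patterns."""
    vocab = set(vocabulary)
    pairs = set()
    for w in vocab:
        # X -> disX, imX, inX, malX, misX, nonX, unX
        for prefix in PREFIXES:
            if prefix + w in vocab:
                pairs.add((w, prefix + w))
        # X -> Xless
        if w + "less" in vocab:
            pairs.add((w, w + "less"))
        # lX -> illX (legal -> illegal)
        if w.startswith("l") and "il" + w in vocab:
            pairs.add((w, "il" + w))
        # rX -> irX (responsible -> irresponsible)
        if w.startswith("r") and "ir" + w in vocab:
            pairs.add((w, "ir" + w))
        # Xless -> Xful (harmless -> harmful)
        if w.endswith("less") and w[:-4] + "ful" in vocab:
            pairs.add((w, w[:-4] + "ful"))
    return pairs


vocab = ["honest", "dishonest", "possible", "impossible", "legal", "illegal",
         "responsible", "irresponsible", "harmless", "harmful",
         "gut", "gutless", "happy", "unhappy", "table"]
for word1, word2 in sorted(seed_pairs(vocab)):
    print(word1, word2)
```

Pure string matching like this will also pick up the exceptions noted on the previous slide (for example partial–impartial), which is one reason the later voting step matters.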

13 Step 2: Identify synonyms of seed words
Take synonyms from a Roget-like thesaurus
◦ We used the Macquarie Thesaurus
◦ Has 98,000 word-types

14 Thesaurus categories
All words classified into ~1000 categories
ability absence accept accompanied action affect affirm agree allow approach ask assemble attack attitude awareness
be beautiful beings belief better big blood body breath calm care for careful cause certain change
choice clean clear collect colors comfort concern conflict connect continue control convex correct count courtesy
…

15 Example category entry
369 HONESTY
adj. paragraph: honest, above board, authentic, bona fide, legit, …
noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …
adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …

16 Step 2: Identify synonyms of seed words
369 HONESTY
adj. paragraph: honest, above board, authentic, bona fide, legit, …
noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …
adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Words in each paragraph are near-synonyms.

17 Step 3: Mark as positive synonyms of positive seeds
369 HONESTY
Seed pair: honest (positive) – dishonest (negative)
◦ adj. paragraph, all marked +: honest, above board, authentic, bona fide, legit, …
Seed pair: reliable (positive) – unreliable (negative)
◦ adj. paragraph, all marked +: reliable, sound, steadfast, trustworthy, trusty, …
Other paragraphs shown:
◦ noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …
◦ noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …

18 Step 4: Mark as negative synonyms of negative seeds
370 DISHONESTY
Seed pair: honest (positive) – dishonest (negative)
◦ adj. paragraph, all marked -: crooked, dishonest, knavish, shady, unjust, …
Other paragraphs shown:
◦ noun paragraph: crookedness, dishonesty, fraudulence, improbity, trickery, …
◦ …

19 Majority voting
All words in a paragraph are assigned identical orientation.
If multiple seeds in the same paragraph:
◦ simple voting determines orientation.
369 HONESTY
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Seed pairs (positive – negative) and the vote each casts in this paragraph:
◦ honesty – dishonesty: honesty votes +
◦ corruptness – incorruptness: incorruptness votes -
◦ probity – improbity: probity votes +
◦ sincerity – insincerity: sincerity votes +

20 Majority voting
All words in a paragraph have identical orientation.
If multiple seeds in the same paragraph:
◦ simple voting determines orientation.
369 HONESTY
noun paragraph, all marked +: honesty, incorruptness, integrity, probity, sincerity, …
Positive orientation has majority, so all words in the paragraph are marked positive.
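
The paragraph-level voting described on these two slides could be sketched as follows. This is a hedged illustration, not the authors' code: `label_paragraph` is a hypothetical name, and leaving a paragraph unlabeled on a tie or when it contains no seeds is an assumption, since the slides do not say how those cases are handled.

```python
from collections import Counter


def label_paragraph(paragraph_words, positive_seeds, negative_seeds):
    """Assign one orientation to every word in a thesaurus paragraph.

    Each seed word found in the paragraph casts one vote; the majority
    label is given to all words. Ties and seedless paragraphs return an
    empty dict (an assumption, not stated in the slides).
    """
    votes = Counter()
    for word in paragraph_words:
        if word in positive_seeds:
            votes["positive"] += 1
        if word in negative_seeds:
            votes["negative"] += 1
    if votes["positive"] > votes["negative"]:
        label = "positive"
    elif votes["negative"] > votes["positive"]:
        label = "negative"
    else:
        return {}
    return {word: label for word in paragraph_words}


# The noun paragraph of category 369 HONESTY from the slides.
paragraph = ["honesty", "incorruptness", "integrity", "probity", "sincerity"]
positive_seeds = {"honesty", "corruptness", "probity", "sincerity"}
negative_seeds = {"dishonesty", "incorruptness", "improbity", "insincerity"}

labels = label_paragraph(paragraph, positive_seeds, negative_seeds)
print(labels["incorruptness"])  # positive: outvoted 3 to 1 by the other seeds
```

The example shows why voting helps: incorruptness is a wrongly labeled seed (an exception to the marking rule), and the other seeds in the same paragraph override it.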

21 Sense and word lexicons
Macquarie Semantic Orientation Lexicon (MSOL)
◦ Assigns orientation to word–category combinations
◦ Categories are coarse word senses
Most natural language text is not sense disambiguated
We create word lexicons from MSOL and SentiWordNet
◦ By choosing for each word the orientation most common amongst its senses
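
Collapsing a sense lexicon into a word lexicon, as described above, might look like the sketch below. The function name `word_lexicon`, the input format (a dict keyed by word and category), and the sample entries are all hypothetical; skipping words whose senses are evenly split is an assumption, as the slides do not describe tie handling.

```python
from collections import Counter


def word_lexicon(sense_lexicon):
    """Collapse {(word, category): label} into {word: label}.

    Each word receives the orientation most common among its senses.
    Words with an even split are left out (an assumption).
    """
    counts = {}
    for (word, _category), label in sense_lexicon.items():
        counts.setdefault(word, Counter())[label] += 1
    lexicon = {}
    for word, tally in counts.items():
        if tally["positive"] > tally["negative"]:
            lexicon[word] = "positive"
        elif tally["negative"] > tally["positive"]:
            lexicon[word] = "negative"
    return lexicon


# Made-up sense entries for illustration; not taken from MSOL.
senses = {
    ("sound", "HONESTY"): "positive",
    ("sound", "CATEGORY_A"): "positive",
    ("sound", "CATEGORY_B"): "negative",
    ("shady", "DISHONESTY"): "negative",
}
print(word_lexicon(senses))
```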

22 Size of lexicons
SentiWordNet (SWN)
◦ 56,200 entries (85.1% positive and 14.9% negative)
Affix seeds lexicon (ASL)
◦ 5,031 entries (47.3% positive and 52.7% negative)
MSOL(ASL)
◦ 51,157 entries (66.8% positive and 33.2% negative)
◦ 3,643 multi-word expressions
MSOL(ASL and GI)
◦ Uses both affix pairs and GI entries as seeds
◦ 76,400 entries (39.9% positive and 60.1% negative)
◦ Available for download: http://www.umiacs.umd.edu/~saif/WebPages/ResearchInterests.html#SemanticOrientation

23 Intrinsic evaluation: The percentage of GI entries that match those of the automatically generated lexicons.
[Figure; axis: F-score.]

24 Extrinsic evaluation
Gold standard of phrases manually annotated with semantic orientation:
◦ MPQA corpus (version 1.1)
◦ positive phrases (1726) and negative phrases (4485)
A simple algorithm to determine the polarity of a phrase:
◦ If the target phrase has a negative word, then the phrase is marked negative.
◦ If the target phrase has no negative word and has at least one positive word, then it is marked positive.
◦ Otherwise, the classifier refrains from assigning a tag.
Even better accuracies can be obtained with supervised classifiers and more sophisticated context features (Choi and Cardie, 2008).
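
The three rules of the simple phrase-polarity algorithm translate directly into code. The sketch below assumes a word lexicon stored as a dict and plain whitespace tokenization; both are simplifications introduced here (multi-word lexicon entries, for instance, are not handled), and `phrase_polarity` is a hypothetical name.

```python
def phrase_polarity(phrase, lexicon):
    """Tag a phrase as "negative", "positive", or None (no tag).

    `lexicon` maps lowercase words to "positive" or "negative".
    Tokenization by whitespace is an assumption of this sketch.
    """
    labels = [lexicon.get(word) for word in phrase.lower().split()]
    if "negative" in labels:
        return "negative"   # any negative word makes the phrase negative
    if "positive" in labels:
        return "positive"   # no negative word, at least one positive word
    return None             # the classifier refrains from assigning a tag


# A toy lexicon for illustration only.
toy_lexicon = {"fabulous": "positive", "valued": "positive",
               "contaminated": "negative"}
print(phrase_polarity("never been more contaminated", toy_lexicon))  # negative
print(phrase_polarity("is fabulous", toy_lexicon))                   # positive
print(phrase_polarity("transportation system", toy_lexicon))         # None
```

Because negative words take precedence, a phrase containing both a positive and a negative word is tagged negative under these rules.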

25 Extrinsic evaluation: Performance of phrase polarity tagging. No semantic-orientation labeled data used.
[Figure; axis: F-score.]

26 Extrinsic evaluation: Performance of phrase polarity tagging. Using GI labels.
[Figure; axis: F-score.]

27 Orientation of thesaurus categories
[Figure: graph of thesaurus categories. Red: negative; Blue: positive; Size of node: intensity; Edge: oppositeness.]

28 Pollyanna Hypothesis
People use positive expressions more frequently than negative expressions. (Boucher and Osgood, 1969; Kelly, 2000)
[Figure; axis: percentage of entries. Shown: 5031 entries.]

29 Pollyanna Hypothesis
People use positive expressions more frequently than negative expressions. (Boucher and Osgood, 1969; Kelly, 2000)
[Figure; axis: percentage of entries. Shown: 5031 entries and 51157 entries.]

30 Summary
Created a high-coverage semantic orientation lexicon:
◦ using only affix rules and a Roget-like thesaurus.
◦ no manually annotated semantic orientation labels required.
The lexicon:
◦ has about twenty times the number of entries in GI.
◦ has entries for both single words and common multi-word expressions.
◦ is more useful in phrase-polarity annotation than SentiWordNet, GI, or the Turney–Littman lexicon.

31 Future work
Creating even better semantic orientation lexicons by combining:
◦ our approach (affix rules and thesaurus)
◦ with the Turney–Littman 2003 method (co-occurrence statistics).
Create orientation lexicons for resource-poor languages:
◦ use a bilingual dictionary
◦ use an English thesaurus
◦ use affix rules from both (multiple) languages.

33 Automatic approaches: sentiment analysis
Those that rely on a lexical-semantic resource
◦ Most use WordNet
◦ Strapparava and Valitutti, 2004; Hu and Liu, 2004; Kamps et al., 2004; Takamura et al., 2005; Esuli and Sebastiani, 2006; Andreevskaia and Bergler, 2006; Kanayama and Nasukawa, 2006
Those that rely only on text corpora
◦ Hatzivassiloglou and McKeown, 1997; Turney and Littman, 2003; Yu and Hatzivassiloglou, 2003; Grefenstette et al., 2004
