
Opinion Holders in Opinion Text from Online Newspapers Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen.


1 Opinion Holders in Opinion Text from Online Newspapers. Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng. Reporter: Chia-Ying Lee. Advisor: Prof. Hsin-Hsi Chen. Information and Communications University, South Korea. 2007 Conference on Granular Computing.

2 Introduction The web contains a wealth of opinions on various topics. “Opinion holder identification” helps analyze how people think about social issues. The task is challenging for online news articles: newspapers carry various opinions from very different holders, and there may be many possible candidates between an anaphor and its proper antecedent. The authors propose an anaphor-resolution-based opinion holder identification method that exploits lexical and syntactic information. Corpus: NTCIR-6, English.

3 Related Work “Automatic extraction of opinion propositions and their holders”, S. Bethard et al.: a semantic-parser-based system. “Identifying Opinion Holders for Question Answering in Opinion Texts”, S. Kim and E. Hovy: a Maximum Entropy ranking model using syntactic features. “Anaphora resolution by antecedent identification followed by anaphoricity determination”, R. Iida et al.: an anaphor resolution technique that maximizes the use of lexical and structural information.

4 Method An anaphor-resolution-based opinion holder identification method exploiting lexical and syntactic information. 1. System Architecture: the system consists of subjectivity classification and opinion holder identification. 2. Resolving Coreference Problem. 3. Identifying Actual Opinion Holder. 4. Selecting Training Features.

5 System Architecture 1. Determine the subjectivity of a sentence: opinionated or non-opinionated. 2. Identify opinion holders, including anaphor resolution: anaphoric holder or non-anaphoric holder. 3. Identify the actual opinion holder by selecting the most probable holder among the candidates. Each sentence is associated with a triplet.

6 Resolving Coreference Problem (1/2) Model: SVM light (MIT). Named Entity Recognizer: MALLET (UMASS). Noun phrase pattern: (t|T)he(ADJ ?)(Noun+). Several opinions are carried by “the author”, an implicit holder with no explicit mention in the text. Opinions are classified into three classes: anaphoric, non-anaphoric, or the author.
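The noun phrase pattern on slide 6 reads as "the" or "The", then an optional adjective, then one or more nouns. The slides do not show how the pattern was applied, so the following is only a minimal sketch. It assumes the input is already part-of-speech tagged with Penn Treebank style tags (JJ* for adjectives, NN* for nouns); the function name `find_definite_nps` is hypothetical.

```python
def find_definite_nps(tagged):
    """Return phrases matching: the/The, an optional adjective, one or more nouns.

    `tagged` is a list of (word, tag) pairs. Penn Treebank tags are an
    assumption of this sketch, not something stated on the slide.
    """
    phrases = []
    i = 0
    n = len(tagged)
    while i < n:
        word, _ = tagged[i]
        if word in ("the", "The"):
            j = i + 1
            # Optional single adjective (JJ, JJR, JJS).
            if j < n and tagged[j][1].startswith("JJ"):
                j += 1
            # One or more nouns (NN, NNS, NNP, NNPS).
            k = j
            while k < n and tagged[k][1].startswith("NN"):
                k += 1
            if k > j:
                phrases.append(" ".join(w for w, _ in tagged[i:k]))
                i = k
                continue
        i += 1
    return phrases


sentence = [
    ("The", "DT"), ("ministry", "NN"), ("speaker", "NN"), ("said", "VBD"),
    ("the", "DT"), ("new", "JJ"), ("policy", "NN"), ("failed", "VBD"),
]
print(find_definite_nps(sentence))
```

Phrases found this way could be added to the candidate list alongside the named entities produced by the recognizer.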

7 Resolving Coreference Problem (2/2) [Figure panels: Anaphoric; Non-anaphoric; The author]

8 Identifying Actual Opinion Holder Candidate lists: for anaphoric holders, entities or opinion holders from the previous sentences; for non-anaphoric holders, entities from the current sentence. Model: a decision rule based on a probabilistic model. Notation: the candidate list is $\{h_1, h_2, \ldots, h_C\}$; $e$ is the context; $f_k$ is a feature function taking values in $\{0, 1\}$; $\lambda_k$ is the weight of each $f_k$; $S_{all}$ is the set of all opinionated sentences; $S_O$ is the set of opinions that contain opinion holders.
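The decision rule itself did not survive in the transcript of slide 8. Binary feature functions $f_k$ with weights $\lambda_k$ are the usual ingredients of a log-linear (maximum entropy) model, so the sketch below assumes that form: $P(h \mid e) \propto \exp\big(\sum_k \lambda_k f_k(h, e)\big)$, with the holder chosen as the candidate of highest probability. This is an assumption and may differ from the authors' exact model. The two example features and their weights are hypothetical.

```python
import math


def rank_candidates(candidates, context, features, weights):
    """Return (best_candidate, probabilities) under an assumed log-linear model.

    P(h | e) is proportional to exp(sum_k weights[k] * features[k](h, e)).
    `candidates` must be non-empty.
    """
    scores = [
        sum(w * f(h, context) for f, w in zip(features, weights))
        for h in candidates
    ]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [x / total for x in exps]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs


# Hypothetical binary features, for illustration only.
def is_person(h, e):
    return 1 if h in e["persons"] else 0


def is_subject(h, e):
    return 1 if h == e["subject"] else 0


context = {"persons": {"Ms. Barbara"}, "subject": "Ms. Barbara"}
candidates = ["Ms. Barbara", "the ministry", "Seoul"]
best, probs = rank_candidates(candidates, context, [is_person, is_subject], [0.5, 1.5])
print(best, probs)
```

Under this form the normalization does not change which candidate wins; it only matters if the probabilities are compared against a threshold or reported.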

9 Selecting Training Features Feature function in the learning model.

10 Experiments (1/3) Under the lenient standard, a match requires that one or more words overlap; under the strict standard, an exact match is required. About 38% of all holders in the gold standard are anaphoric. The system accomplishes the following tasks: 1. Anaphoricity classification (anaphoric holder / non-anaphoric holder / the author). 2. Non-anaphoric holder resolution: ranking the candidates based on the features (Table 2). 3. Anaphoric holder resolution: ranking the candidates based on the features (Table 3).
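The two evaluation standards on slide 10 can be sketched as follows. The slide does not say how words are tokenized or whether case matters, so lowercasing and splitting on word characters are assumptions of this sketch, and the function names are hypothetical.

```python
import re


def _words(text):
    # Lowercased word tokens; punctuation is dropped (an assumption).
    return re.findall(r"\w+", text.lower())


def strict_match(predicted, gold):
    """Strict standard: the predicted holder matches the gold holder exactly."""
    return _words(predicted) == _words(gold)


def lenient_match(predicted, gold):
    """Lenient standard: at least one word is shared with the gold holder."""
    return bool(set(_words(predicted)) & set(_words(gold)))


gold = "The ministry speaker, Ms. Barbara"
print(strict_match("Ms. Barbara", gold))
print(lenient_match("Ms. Barbara", gold))
```

A prediction such as “Ms. Barbara” for the gold holder “The ministry speaker, Ms. Barbara” counts as correct under the lenient standard but not under the strict one.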

11 Experiments (2/3) Non-anaphoric: lexical clues (N3, N4, N5, N6) are dominant, whereas syntactic clues (N1, N2) are less effective. Anaphoric: combining all features dramatically increases performance. The NER could not generate proper candidates such as “The ministry speaker, Ms. Barbara”; it catches only “Ms. Barbara”.

12 Experiments (3/3) Various features, such as particular phrases and a mixture of syntactic and lexical clues, were used in anaphoricity classification. Using syntactic features (A2, A5, A7, A8, A10, A11) alone is less effective than using all features. The author is not clearly revealed by structural clues alone.

13 Discussion & Conclusion The system tackles the task with a novel approach focused on coreference resolution. Most errors are related to the named-entity hypothesis: the system cannot find general nouns such as “31 smokers”. A more complicated problem arises when opinions are expressed by both anaphoric and non-anaphoric holders. Apposition is also a source of problems: “Jackson” for “The singer, Jackson” is not acceptable, since “Jackson” is too ambiguous.

14 Thank you!

