Learning Extraction Patterns for Subjective Expressions


1 Learning Extraction Patterns for Subjective Expressions
Ellen Riloff Janyce Wiebe University of Utah University of Pittsburgh 7/2003 EMNLP03

2 Subjectivity Subjective language includes opinions, rants, allegations, accusations, suspicions, and speculation Distinguishing factual from subjective information could benefit many applications: information extraction question answering summarization Informally speaking, subjective language includes … Ideally, IE systems should be able to distinguish between factual information (which should be extracted) and non-factual information (which should be discarded or labeled as uncertain). Question answering systems should distinguish between factual and speculative answers. Multi-perspective QA aims to present multiple answers to the user reflecting differences of opinion. Good summaries of news events, for example, should summarize the various views about the event. 7/2003 EMNLP03

3 Goals Sentence-level subjectivity classification
(Wiebe et al. 2001) found that 44% of sentences in news articles are subjective Learning subjectivity clues from unannotated text corpora Learning linguistically rich patterns Sentence-level: Our goal is to classify sentences as subjective or objective, not documents as a whole. Even in news articles, i.e., not reviews or editorials, there are many subjective sentences. Typically, news stories present views (e.g., critics say… supporters say). Learn subjectivity clues: We want to learn words and phrases associated with subjectivity. An application system such as a QA or IE system would benefit from knowing which specific words and phrases in a sentence are subjective. From unannotated text: there is great variety in the words and phrases that have subjective uses. Many subjective terms occur infrequently --- consider strongly negative words such as preposterous, and metaphorical or idiomatic clues such as swept off one’s feet. Also, the types of clues correlated with subjectivity may vary from corpus to corpus. We want to enrich what we know about subjectivity clues by processing large amounts of unannotated text. Linguistically rich patterns: a goal is to learn not just single words or fixed n-grams, but more linguistically rich patterns and phrases. As the title of our paper suggests, we seek extraction patterns for subjective expressions. 7/2003 EMNLP03

4 Previous Work: Subjectivity Analysis
Document-level subjectivity classification (e.g., Turney 2002; Pang et al. 2002; Spertus 1997) and above (Tong 2001) Genre classification (e.g., Karlgren and Cutting 1994; Kessler et al. 1997; Wiebe et al. 2001) Supervised sentence-level classification (Wiebe et al. 1999) Learning adjectives, adjectival phrases, verbs, nouns, and N-grams (e.g., Turney 2002; Hatzivassiloglou & McKeown 1997; Wiebe et al. 2001) This is at the time of writing this paper. Much previous work in NLP has focused on document-level classification (and a level above, i.e., Tong’s work tracking sentiments over time). Previous work on sentence-level classification was supervised. There has been work on automatically learning various types of subjective features, but not learning syntactic extraction patterns such as the ones in our work. (some previous work used fixed n-grams; some used manually developed patterns) 7/2003 EMNLP03

5 Recent Related Work Yu and Hatzivassiloglou (EMNLP03). Unsupervised sentence level classification. Complementary approach and features. Dave et al. (WWW03): reviews classified as positive or negative. Agrawal et al. (WWW03): newsgroup authors partitioned into camps based on quotation links Gordon et al. (ACL03): manually developed grammars for some types of subjective language Among recent work, Yu and Hatzivassiloglou’s work is most similar to ours. They also perform unsupervised sentence-level classification. Their approach and features are complementary to ours. The other papers listed here address different problems. 7/2003 EMNLP03

6 Extraction Patterns Extraction patterns are lexico-syntactic patterns to identify relevant information Typically they represent role relationships surrounding noun and verb phrases hijacking of <x>: hijacked vehicle <x> was hijacked: hijacked vehicle <x> hijacked: hijacker Information extraction systems typically use lexico syntactic patterns to identify relevant information. Typically patterns represent role relationships surrounding nouns and verb phrases. For example, an IE system designed to extract information about hijackings might use the patterns hijacking of x and x was hijacked to extract the hijacked vehicle, and the pattern x hijacked to extract the hijacker. 7/2003 EMNLP03
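To make the role idea concrete, here is a hypothetical sketch. Real IE systems match shallow-parse structures, not raw strings; the regexes and the `extract` helper below are illustrative assumptions only, not the actual pattern-matching machinery.

```python
import re

# Hypothetical regex stand-ins for the hijacking patterns above.
# (A real system would match parser output; crude regexes like
# "(\w+) hijacked" can misfire on passives, e.g. "was hijacked".)
PATTERNS = [
    (re.compile(r"hijacking of (?:the |a )?(\w+)"), "hijacked vehicle"),
    (re.compile(r"(?:the |a )?(\w+) was hijacked"), "hijacked vehicle"),
    (re.compile(r"(\w+) hijacked"), "hijacker"),
]

def extract(sentence):
    """Return (role, filler) pairs for every pattern that matches."""
    hits = []
    for pattern, role in PATTERNS:
        m = pattern.search(sentence)
        if m:
            hits.append((role, m.group(1)))
    return hits
```

For example, `extract("The hijacking of the plane shocked everyone.")` fills the hijacked-vehicle role with "plane", while `extract("Rebels hijacked the bus.")` fills the hijacker role with "Rebels".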

7 Our Method Subjective expressions represented as extraction patterns
get to know <dobj> <subj> appear to be <subj> was satisfied <subj> complained Supervised extraction pattern learning Training data generated automatically Entire process bootstrapped In our method, subjective expressions are represented as extraction patterns. Here are examples. Patterns often have higher precision than their individual constituent words have. These are syntactic patterns, so are more general than fixed n-grams. We also find that the extraction pattern representation can reveal that slight variations of the same verb or noun phrase might remove or add subjective connotations. For example, “The comedian bombed last night” is subjective, but the same subjective meaning isn’t possible with the passive. We used a supervised extraction pattern learning algorithm. We use Riloff’s AutoSlog-TS algorithm. This is supervised in that it requires a set of relevant texts and a set of irrelevant texts as its input training data. However, the overall process is unsupervised, in that the training data is generated automatically by high-precision (but low recall) subjective and objective classifiers applied to large amounts of unannotated texts. In addition, the learned patterns can be used to recognize new subjective sentences, and the entire process is bootstrapped. 7/2003 EMNLP03

8 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Here is a picture of the overall process subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

9 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Start with a large collection of unannotated texts and a lexicon of subjective language (components in yellow). subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

10 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier The subjective classifier is a high-precision, low recall classifier that identifies sentences it can label as subjective with confidence, and leaves the rest unclassified The objective classifier is a high-precision, low recall classifier that identifies sentences it can label as objective with confidence, and leaves the rest unclassified subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

11 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier The extraction pattern learner takes subjective and objective sentences as input, and identifies extraction patterns for subjective expressions. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

12 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Here is one of two bootstrapping loops in the process. The subjective patterns can be used to identify additional subjective sentences, which can be added to the training data for the extraction pattern learner, which in turn can identify more subjective patterns, and so on. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

13 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier In the other bootstrapping loop, the subjective patterns are fed back into the original high-precision subjective classifier, which identifies more subjective sentences, and so on. The results in the paper are for one complete cycle through this diagram, not repeated cycles. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

14 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Now we’ll look at various pieces, starting here. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

15 Unannotated Text Collection
English language versions of FBIS news articles from a variety of countries. Size: 302,160 sentences 7/2003 EMNLP03

16 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Now we’ll look at this part subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

17 (e.g., entries from Levin 1993) Automatically identified
From previous work Manually identified (e.g., entries from Levin 1993) Automatically identified (e.g., nouns from Riloff et al. 2003) Known subjective vocabulary Our subjective vocabulary consists of lexical items that have been shown in previous work (by us and others) to be good subjectivity clues. Most are single words, some are fixed n-grams, but none involve syntactic generalizations as in the extraction patterns. Some were manually identified, and some were automatically identified.

18 (e.g., entries from Levin 1993) Automatically identified
From previous work Manually identified (e.g., entries from Levin 1993) Automatically identified (e.g., nouns from Riloff et al. 2003) Known subjective vocabulary Strongly subjective: most instances subjective Weakly subjective: objective instances also common The clues are divided into those that are strongly subjective and those that are weakly subjective, based on a combination of manual review and empirical results on a small tuning set of manually annotated data.

19 (e.g., entries from Levin 1993) Automatically identified
From previous work Manually identified (e.g., entries from Levin 1993) Automatically identified (e.g., nouns from Riloff et al. 2003) Known subjective vocabulary Any data used is separate from data in this paper Strongly subjective: most instances subjective Weakly subjective: objective instances also common Any data used to learn or classify the clues is separate from all the data used in this paper (training and testing).

20 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective sentences Subjective >1 strongly subjective Classifier clue Known subjective vocabulary unlabeled sentences 91.3% Precision 31.9% Recall Test set: 2197 sentences 59% subjective objective sentences Objective Classifier Turning to the high-precision subjective classifier: it is simple. It classifies a sentence as subjective if there are 2 or more of the strongly subjective clues in the sentence. Evaluating this on a manually annotated test set that is not part of the training set, the precision is 91.3% and the recall is 31.9%. This test set has about 2200 sentences, 59% of which are subjective.
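As a sketch, the rule just described fits in a few lines. Representing a sentence as a list of matched clue strings is a simplifying assumption of this sketch; the real classifier matches clues in text.

```python
def classify_subjective(sentence, strong_clues):
    """High-precision rule from the slide: label a sentence subjective
    iff it contains 2 or more strongly subjective clue instances;
    otherwise leave it unlabeled (None).
    `sentence` is a list of matched clue strings -- a simplification."""
    hits = sum(1 for clue in sentence if clue in strong_clues)
    return "subjective" if hits >= 2 else None
```

Leaving the remaining sentences unlabeled, rather than forcing a decision, is what keeps the precision high at the cost of recall.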

21 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective sentences Subjective >1 strongly subjective Classifier clue Known subjective vocabulary unlabeled sentences Objective 0 strongly subjective clue & Classifier 0 or 1 weakly subjective clue in previous, current, next sentence objective sentences The high-precision objective classifier takes a different approach. Rather than looking for the presence of lexical items, it looks for their absence. It classifies a sentence as objective if there are no strongly subjective clues and at most 1 weakly subjective clue in the combination of the previous, current, and next sentence. The performance is lower than that of the subjective classifier. 82.6% Precision 16.4% Recall
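The mirror-image objective rule can be sketched the same way, again under the simplifying assumption that each sentence is a list of matched clue strings:

```python
def classify_objective(prev, cur, nxt, strong, weak):
    """Rule from the slide: label the current sentence objective iff
    there are no strongly subjective clues and at most 1 weakly
    subjective clue instance in the previous, current, and next
    sentence combined; otherwise leave it unlabeled (None)."""
    window = prev + cur + nxt
    n_strong = sum(1 for clue in window if clue in strong)
    n_weak = sum(1 for clue in window if clue in weak)
    return "objective" if n_strong == 0 and n_weak <= 1 else None
```

Note that the three-sentence window means a clue in a neighboring sentence can block the objective label even when the current sentence itself contains no clues.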

22 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Here is the overall picture again subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

23 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier We now turn to the extraction pattern learner subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

24 17,000 17,000 Subjective Classifier subjective sentences
“relevant texts” 17,000 objective sentences Extraction Pattern AutoSlog-TS Learner Riloff 1996 Objective Classifier The extraction pattern learner, as I mentioned earlier, is very similar to Riloff’s AutoSlog-TS algorithm, and uses the same syntactic templates and parser “irrelevant texts” subjective patterns

25 Step 1: Apply Syntactic Templates
<subj>active-verb dobj <subj> dealt blow <subj> verb infinitive <subj> appear to be <subj> aux noun <subj> has position Active-verb <dobj> endorsed <dobj> Verb infinitive <dobj> get to know <dobj> Noun prep <np> opinion on <np> Infinitive prep <np> to resort to <np> A set of syntactic templates represents the space of possible extraction patterns. Syntactic templates are applied to the training corpus exhaustively; extraction patterns are generated for every possible instantiation of the templates that appears in the corpus. Here are examples of the syntactic templates; the others are in the paper. On the right are examples of patterns learned in our experiments that are instantiations of the templates on the left. 7/2003 EMNLP03

26 Step 1: Apply Syntactic Templates
<subj>active-verb dobj <subj> dealt blow <subj> verb infinitive <subj> appear to be <subj> aux noun <subj> has position Active-verb <dobj> endorsed <dobj> Verb infinitive <dobj> get to know <dobj> Noun prep <np> opinion on <np> Infinitive prep <np> to resort to <np> Now we will look at the first line in more detail 7/2003 EMNLP03

27 Step 1: Apply Syntactic Templates
<subj>active-verb dobj <subj> dealt blow Matches any sentence with verb phrase with head=dealt direct object with head=blow. “The experience certainly dealt a stiff blow to his pride.” The system looks for syntactic constructions produced by a shallow parser (sundance), not exact word sequences 7/2003 EMNLP03

28 Step 2: Select Patterns Apply all learned patterns to training data
Rank patterns: Prec(pattern) = p(subjective | pattern) = # in subjective sentences / total # Choose patterns with: Frequency > F Prec > P on the training data for some F and P All selection of patterns is performed using the training data only. (There is no tuning on the test set.) 7/2003 EMNLP03
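The ranking-and-thresholding step above can be sketched as follows. The `counts` dictionary is a stand-in assumption for the real corpus statistics (occurrences of each pattern in subjective sentences vs. in total):

```python
def select_patterns(counts, min_freq, min_prec):
    """counts: pattern -> (n_in_subjective_sentences, n_total) on the
    training data.  Keep patterns with Frequency > F and
    Prec = n_subj / n_total > P, as on the slide."""
    kept = {}
    for pattern, (n_subj, n_total) in counts.items():
        prec = n_subj / n_total
        if n_total > min_freq and prec > min_prec:
            kept[pattern] = prec
    return kept
```

With F = 9 and P = 0.59, for instance, a pattern seen 10 times, always in subjective sentences, is kept with precision 1.0, while an equally precise pattern seen only twice is dropped by the frequency threshold.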

29 Examples from Training Data
%SUBJ <subj> was asked 100% <subj> asked 63% <subj> is talk talk of <np> 90% <subj> will talk 71% was expected from <np> <subj> was expected 42% <subj> is fact fact is <dobj> Here are more examples of patterns learned by our system. The rightmost column shows the percentage of instances that occur in subjective sentences in the training data. This is to show some interesting examples of behaviors that a corpus-based approach is good at finding which a human would probably not expect. The paper lists frequency information as well; I can summarize this to say higher precision things are less frequent. Our goal in this work is to apply this process to massive amounts of unannotated data to find an extremely large number of high-precision, low recall clues. MAYBE SAY THIS IN THE CONCLUSIONS? Looking at the first two, we were surprised to see that the passive form of asked is much more likely to be subjective than the active form of ask.

30 Examples from Training Data
%SUBJ <subj> was asked 100% <subj> asked 63% <subj> is talk talk of <np> 90% <subj> will talk 71% was expected from <np> <subj> was expected 42% <subj> is fact fact is <dobj> Talk as a noun (e.g., “Fred is the talk of the town”) is highly correlated with subjective sentences, while talk as a verb is found in a mix of subjective and objective sentences.

31 Examples from Training Data
%SUBJ <subj> was asked 100% <subj> asked 63% <subj> is talk talk of <np> 90% <subj> will talk 71% was expected from <np> <subj> was expected 42% <subj> is fact fact is <dobj> Another thing we often observe, not surprisingly, is longer expressions being more clearly subjective than shorter subexpressions.

32 Examples from Training Data
%SUBJ <subj> was asked 100% <subj> asked 63% <subj> is talk talk of <np> 90% <subj> will talk 71% was expected from <np> <subj> was expected 42% <subj> is fact fact is <dobj> We really like these, which are expressions correlated with the noun fact. These are highly subjective!

33 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Now we will evaluate the subjective patterns produced by the extraction pattern learner the first time through the loop. For evaluation, we use manually annotated test data subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

34 Test Data Manual annotation to support project investigating multiple perspective QA (ARDA AQUAINT NRRC) 0.77 ave pair-wise kappa 0.89 ave pair-wise kappa with borderline sentences removed (11% of the corpus) Wilson & Wiebe, SIGDIAL 2003, describes the annotation scheme and agreement study In 2002 a detailed annotation scheme was developed for a US government sponsored project, named the ARDA AQUAINT NRRC program. We use this data in this work only for evaluation. And, although the scheme is more fine-grained, we use a binary classification of sentences, defined in terms of the lower-level annotations, to support the current work. In an agreement study with 3 annotators (not nlp researchers) the agreement for the sentence-level classifications is very good. 77 average pairwise kappa and percentage agreement is 90%. When we removed borderline cases, i.e., sentences where at least one annotator classified the sentence as subjective but the maximum strength rating was low, then the kappa and percentage agreement are very high. The annotators agree strongly about which are the clear cases of subjective and objective sentences. The other cases are only 11% of the corpus. For details about the annotation scheme and agreement study, please our sigdial paper. 7/2003 EMNLP03

35 Example (writer,FM) (writer,FM,FM) (writer,FM) The Foreign Ministry said Thursday that it was “surprised, to put it mildly” (writer,FM,FM,SD) by the U.S. State Department’s criticism of Russia’s human rights There isn’t time to go through the annotation scheme – I just want to point out that individual expressions of subjectivity are identified, of different types, and nested sources are identified. These are the basic annotation objects; a number of attributes of these objects are also annotated. (writer,FM) (writer,FM) record and objected in particular to the “odious” section on Chechnya. 7/2003 EMNLP03

36 Our annotation tool is built in Gate; everything is downloadable from our website.
7/2003 EMNLP03

37 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Now we return to evaluating the subjective patterns on a manually annotated test set. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

38 Evaluation of Learned Patterns
Test data: 3947 sentences 54% subjective Train Test F > 9 P: 100% P: 85% Recall: 41% F > 1 P > 59% P: 71% Recall: 92% This test set has over 3900 sentences, 54% of which are subjective We evaluated a number of sets of patterns. The paper gives a graph evaluating these sets. I’m showing here two endpoints of the graph. Precision is the proportion of pattern instances that are in subjective sentences. Recall is the proportion of subjective sentences that contain at least one pattern instance. The first line evaluates all the patterns that have frequency at least 10 and 100% precision on the training set. These have precision of 85% and recall of 41% on the manually annotated test data. The second line evaluates all the patterns that have frequency at least 2 and at least 60% precision on the training set (so the second set includes the first). This set has 71% precision on the test set. And they have high recall – most of the sentences in the manually annotated test set contain an instance of the patterns. If I’m asked: the ones chosen using a lower frequency threshold on the training set (the F > 1 versus F > 9) give rise to higher recall on the test set because although there are fewer instances of each pattern, there are many more patterns that get selected. 7/2003 EMNLP03
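The precision and recall definitions just given can be sketched like this. Treating a pattern "instance" as a substring match is a simplifying assumption of the sketch (the real system matches shallow-parse output):

```python
def evaluate(sentences, patterns):
    """sentences: list of (text, is_subjective) pairs.
    Precision = pattern instances in subjective sentences / all instances.
    Recall = subjective sentences with >= 1 instance / all subjective
    sentences."""
    inst_total = inst_subj = covered = n_subj = 0
    for text, is_subj in sentences:
        hits = sum(text.count(p) for p in patterns)  # instance count
        inst_total += hits
        if is_subj:
            n_subj += 1
            inst_subj += hits
            if hits:
                covered += 1
    precision = inst_subj / inst_total if inst_total else 0.0
    recall = covered / n_subj if n_subj else 0.0
    return precision, recall
```

Note the asymmetry: precision is measured over pattern instances, while recall is measured over subjective sentences, matching the definitions above.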

39 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier We now turn to this bootstrapping cycle. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

40 Pattern-Based Subjective
17000 Subjective Classifier subjective sentences unlabeled sentences Extraction Pattern Learner 17000 Objective Classifier objective sentences subjective patterns new subjective sentences unlabeled sentences This is a sentence classifier that identifies new subjective sentences. The inputs are unlabeled sentences and the subjective patterns initially learned by the extraction pattern learner. Pattern-Based Subjective Classifier

41 Pattern-Based Subjective Classifier
17000 Subjective Classifier subjective sentences unlabeled sentences Extraction Pattern Learner 17000 Objective Classifier objective sentences subjective patterns 9500 new subjective sentences unlabeled sentences This is a simple classifier that identifies a sentence as subjective if there is at least one instance of a pattern that has frequency at least 5 and precision of 100% on the training data. This classifier found 9500 new subjective sentences. Pattern-Based Subjective Classifier > 0 instances of patterns with F >4 P = 1 on training data

42 Pattern-Based Subjective
17000 7500 Subjective Classifier subjective sentences unlabeled sentences Extraction Pattern Learner 9500 new subjective sentences 17000 Objective Classifier objective sentences unlabeled sentences The extraction pattern learner was then invoked again. The inputs are the original objective sentences, the 9500 new subjective sentences, and 7500 of the original subjective sentences (so that the numbers of input objective and subjective sentences are the same). Pattern-Based Subjective Classifier

43 Pattern-Based Subjective
17000 7500 Subjective Classifier subjective sentences unlabeled sentences Extraction Pattern Learner 9500 new subjective sentences 17000 Objective Classifier objective sentences new subjective patterns unlabeled sentences That produced new subjective patterns. It found significant new knowledge: 4248 patterns with precision of at least 60% on the training data, and 308 patterns with precision of 100%. Pattern-Based Subjective Classifier 4248 patterns P > .59 on training data 308 patterns P = 1.0 on training data

44 Pattern-Based Subjective
17000 7500 Subjective Classifier subjective sentences unlabeled sentences Extraction Pattern Learner 9500 new subjective sentences 17000 Objective Classifier objective sentences new subjective patterns unlabeled sentences As well, the same type of evaluation as before was performed for the original set of patterns augmented with the new patterns. The recall went up more than the precision went down in all cases. At the one extreme, the recall increased by 2 percentage points and the precision decreased by only 0.5; at the other extreme, the recall increased by 4% while the precision decreased by 2%. Pattern-Based Subjective Classifier Evaluate new + old patterns on test set: Recall +2–4% Prec -0.5–2%

45 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Now we turn to the other bootstrapping cycle. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

46 unlabeled sentences subjective patterns Subjective Classifier
subjective sentences Known subjective vocabulary Extraction Pattern Learner The subjective patterns learned by the extraction pattern learner are fed back into the original subjective classifier to help it find additional subjective sentences.

47 subjective patterns F > 9, P = 1.0 on training data
unlabeled sentences subjective patterns F > 9, P = 1.0 on training data Subjective Classifier New subjective Sentences: 1 old clue + 1 new >1 new old + new subjective sentences Extraction Pattern Learner The subjective classifier was modified as follows. All of the sentences originally identified to be subjective still are. The classifier identifies new sentences: either the sentence has 1 old clue and 1 new clue, or more than one of the new clues, i.e., instances of the subjective patterns. The patterns we added to the classifier are those that occur at least 10 times and have precision of 1.0 on the training data, so this is a tougher test. Known subjective vocabulary
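A sketch of the augmented rule, once more with sentences as lists of matched clue/pattern strings (a hypothetical simplification of the real matching):

```python
def augmented_subjective(sentence, old_strong, new_patterns):
    """Augmented rule from the slide.  Original rule: >= 2 old strongly
    subjective clues.  New cases: 1 old clue plus 1 new pattern
    instance, or more than 1 new pattern instance."""
    n_old = sum(1 for c in sentence if c in old_strong)
    n_new = sum(1 for c in sentence if c in new_patterns)
    if n_old >= 2:                  # original high-precision rule
        return True
    if n_old >= 1 and n_new >= 1:   # 1 old clue + 1 new pattern
        return True
    return n_new >= 2               # > 1 new pattern
```

The learned patterns thus act as a second vocabulary that can combine with, or substitute for, the original strongly subjective clues.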

48 subjective patterns F > 9, P = 1.0 on training data
unlabeled sentences subjective patterns F > 9, P = 1.0 on training data Subjective Classifier New subjective Sentences: 1 old clue + 1 new >1 new old + new subjective sentences Extraction Pattern Learner Now we will evaluate the old and new subjective sentences when the classifier is applied to a manually annotated test set. Known subjective vocabulary

49 Evaluation on Test Data
Original subjective classifier 32.9% recall % precision Augmented subjective classifier 40.1% recall % precision The recall increased much more than the precision decreased. This is promising. And there are many variations to experiment with in the future…. 7/2003 EMNLP03

50 Future Work 7/2003 EMNLP03

51 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier Obviously, do more rounds of bootstrapping !!!! Another is to work on the objective classifier… subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

52 Pattern-Based Objective
Improve original high-precision classifier Identify new objective sentences during bootstrapping Known subjective vocabulary objective sentences Extraction Pattern Learner Objective Classifier objective sentences unlabeled sentences Pattern-Based Objective Classifier

53 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection subjective patterns Subjective Classifier subjective sentences Known subjective vocabulary unlabeled sentences objective sentences Extraction Pattern Learner Objective Classifier For both the subjective and objective classifiers…. subjective sentences subjective patterns unlabeled sentences Pattern-Based Subjective Classifier

54 Unannotated Text Collection
unlabeled sentences Unannotated Text Collection Subjective Classifier Iteration 0 Iteration 1+ subjective sentences Known subjective vocabulary Use our current classifiers on the first iteration, and then a stronger supervised learning algorithm on subsequent iterations that could perform appropriate feature weighting and feature selection. The idea is to use largely corpus-independent clues in the original high-precision classifier to create a training set, and then a smarter supervised learner to tune to the particular corpus. (Why use 3 learners? AutoSlog-TS is finding linguistic patterns, not just classifying. We are interested in the linguistic expressions for themselves. But we plan to experiment with different configurations.) objective sentences Subjective Classifier Iteration 0 Iteration 1+

55 Build up subjective lexicon as the process is applied to new corpora.
Human review of high precision patterns Tough act to follow: linguistic subjectivity Rush Limbaugh: opinionated source police: “lightning rod topic” Known subjective vocabulary After the bootstrapping process terminates for a particular corpus, the patterns with high precision in the training set could be reviewed by a human, to determine which to add to the known subjective vocabulary. Thus we could build up the known subjectivity vocabulary as the process is applied to additional corpora. The learned patterns are ones that are highly correlated with subjective sentences in that corpus. Thus, they do not all represent linguistic subjectivity. It might be very useful for a human to classify them in some way: to identify expressions that are linguistically subjective, and to distinguish them from other things that are highly correlated with subjectivity, such as sources who are particularly opinionated in that corpus, or controversial, “lightning rod” topics. Those categories might be useful to know as well. A richer representation, including theta roles and various other characteristics, could be used in a repository of manually and automatically developed knowledge. Richer Representation with deeper knowledge (theta roles, polarity, tone, ambiguity,…)

56 Conclusions High-precision subjectivity classification can be used to generate large amounts of labeled training data Extraction pattern learning techniques can learn linguistically rich subjective patterns Bootstrapping process results in higher recall with little loss in precision Q: why look for high precision but low recall features? Why not look for features with the best F-measure? Ans: the idea is to process lots of text and build up a knowledge store with a great number of high-precision low recall patterns. If you find enough of them, the set as a whole will not be low recall. We must not exclude infrequent clues: we know from previous work that many clues of subjectivity are low frequency. And, for applications such as QA, we don’t just want a binary classification of a sentence but also want to know which specific expressions are responsible. Why different thresholds in the different classifiers? For the original classifier, we were being tough – we were going for high precision. For the pattern-based classifier, we felt we wouldn’t get enough sentences. So, we did various tests. There are many configurations and thresholds to experiment with. We just did one configuration without experimenting with thresholds and such. This was a proof of concept – there are many areas in which it could be improved. 7/2003 EMNLP03

57 Annotation Scheme The annotation scheme was developed as part of a U.S. government-sponsored project (ARDA AQUAINT NRRC) to investigate multiple-perspective question answering. Annotators labeled private-state expressions. Each private state can have low, medium, or high strength. Our gold standard considers a sentence to be subjective if it contains at least one private-state expression of medium or higher strength.
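The gold-standard rule above can be sketched as a small function; the list-of-strengths input is a hypothetical stand-in for the actual annotation format:

```python
# Gold-standard labeling sketch: a sentence is subjective if it contains
# at least one private-state expression of medium or higher strength.
# The input representation is a hypothetical stand-in.

STRENGTH_ORDER = {"low": 0, "medium": 1, "high": 2}

def gold_label(private_state_strengths):
    """private_state_strengths: the strengths of the private-state
    expressions annotated within one sentence."""
    if any(STRENGTH_ORDER[s] >= STRENGTH_ORDER["medium"]
           for s in private_state_strengths):
        return "subjective"
    return "objective"

print(gold_label(["low", "high"]))  # one strong expression suffices
print(gold_label(["low"]))
print(gold_label([]))
```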

58 Two Ways of Expressing Private States
Explicit mentions of private states and speech events: The United States fears a spill-over from the anti-terrorist campaign.
Expressive subjective elements: The part of the US human rights report about China is full of absurdities and fabrications.

59 Nested Sources
“The US fears a spill-over,” said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.
Nested sources, from outermost to innermost: (writer), (writer, Xirao-Nima), (writer, Xirao-Nima, US)
“The report is full of absurdities,” he continued.
Nested sources: (writer), (writer, Xirao-Nima)

60 OnlyFactive
“The US fears a spill-over,” said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.
(writer) OnlyFactive=yes
(writer, Xirao-Nima) OnlyFactive=yes
(writer, Xirao-Nima, US) OnlyFactive=no

61 Example
The Foreign Ministry said Thursday that it was “surprised, to put it mildly” by the U.S. State Department’s criticism of Russia’s human rights record and objected in particular to the “odious” section on Chechnya.
Nested sources annotated over this sentence: (writer, FM), (writer, FM, FM), and (writer, FM, FM, SD); for example, “surprised, to put it mildly” is attributed through the Foreign Ministry (FM), and the criticism it reacts to is attributed to the State Department (SD).
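One way to hold these annotations in code is a record pairing the annotated span with its nested source chain and the OnlyFactive attribute. This representation is hypothetical, not the project's actual annotation format:

```python
# Hypothetical representation of a private-state/speech-event annotation:
# the annotated span, its nested source chain, and the OnlyFactive flag.
from dataclasses import dataclass

@dataclass
class Annotation:
    text: str             # the annotated span
    source: tuple         # nested source chain, outermost (writer) first
    only_factive: bool    # True if the span presents only facts

ann = Annotation(text="fears a spill-over",
                 source=("writer", "Xirao-Nima", "US"),
                 only_factive=False)

# The chain length gives the depth of attribution wrapping the span,
# and the last element is the immediate source of the private state.
print(len(ann.source))   # 3
print(ann.source[-1])    # US
```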

62 Unannotated Text Collection
Diagram: the Unannotated Text Collection supplies unlabeled sentences to a high-precision Subjective Classifier (seeded by the known subjective vocabulary) and to an Objective Classifier; the resulting subjective and objective sentences feed the Extraction Pattern Learner, and the learned subjective patterns drive a Pattern-Based Subjective Classifier over further unlabeled sentences.
Here is a picture of the overall process, which we will keep returning to. We start with a large collection of unannotated texts and a lexicon of subjective language. The subjective classifier is a high-precision, low-recall classifier that identifies sentences it can label as subjective with confidence (and leaves the rest unclassified); the objective classifier likewise identifies sentences it can label as objective with confidence. The idea is to start with a known subjective vocabulary of items that are not specific to a particular category, and then use an automatic method to tune to that corpus. There are implementation issues, but this is also a proof of concept: it is the simplest, most straightforward configuration, and there are many potential improvements.
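The flow in the diagram can be written out roughly as a single bootstrapping pass. The three components below are keyword-based placeholders introduced for illustration, not the high-precision classifiers or the pattern learner from the actual system:

```python
# Rough sketch of the bootstrapping pass from the diagram. The
# classifiers and the pattern learner are toy stand-ins.

def bootstrap(unlabeled, subjective_clf, objective_clf, learn_patterns):
    # High-precision classifiers label only the sentences they are sure of.
    subj = [s for s in unlabeled if subjective_clf(s)]
    obj = [s for s in unlabeled if objective_clf(s)]
    rest = [s for s in unlabeled if s not in subj and s not in obj]

    # Learn extraction patterns from the confidently labeled sentences...
    patterns = learn_patterns(subj, obj)

    # ...and apply them to label more of the remaining sentences.
    newly_subj = [s for s in rest if any(p in s for p in patterns)]
    return subj + newly_subj, obj, patterns

subjective_clf = lambda s: "absurdities" in s
objective_clf = lambda s: "Thursday" in s
learn_patterns = lambda subj, obj: ["fears"]   # stand-in for pattern learning

subj, obj, pats = bootstrap(
    ["full of absurdities", "met on Thursday", "the US fears a spill-over"],
    subjective_clf, objective_clf, learn_patterns)
print(subj)
```

In the full process, the newly labeled subjective sentences would be fed back as additional training data, which is what raises recall while keeping precision high.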


