Opinion Integration and Summarization Yue Lu University of Illinois at Urbana-Champaign.

1 Opinion Integration and Summarization Yue Lu University of Illinois at Urbana-Champaign

2 Opinions needed in all kinds of decision processes
Business intelligence: “What do people complain about iPhone?”
Health informatics: “How do people like the new drug?”
Political science: “How is the new policy received?”
Yue Lu, http://sifaka.cs.uiuc.edu/yuelu2/

3 Online opinions cover all kinds of topics
Topics: people, events, products, services, …
Sources: blogs, microblogs, forums, reviews, …
Scale: 65M msgs/day, 53M blogs, 1307M posts, 115M users, 10M groups, 45M reviews, …

4 After collecting opinions using Google: how could I read them all?

5 Online opinions are complicated
Dimensions: aspect, sentiment, quality (high quality vs. low quality)

6 Vision: Opinion Integration & Summarization
Input: online opinions on topic = t (Sentence 1, Sentence 2, …, Sentence 100, …, Sentence 900, …)
Components: Opinion Integration, Sentiment Analysis, Quality Prediction
Output: integrated summary
Aspect    Opinion Sentences  Sentiment  Quality
Aspect 1  Sentence 512       positive   high
          Sentence 823       negative   medium
Aspect 2  Sentence 21        neutral    low
          Sentence 153       positive   high
…
Major challenge: develop general techniques that work for arbitrary topics

7 Existing work cannot scale to different topics
Review summarization
– Unsupervised feature extraction + opinion polarity identification: [Hu&Liu 04], [Popescu&Etzioni 05], …
– Supervised aspect extraction: [Zhuang et al] …
Hidden aspect discovery: [Hofmann99] [Chen&Dumais00] [Blei et al03] [Zhai et al04] [Li&McCallum06] [Titov&McDonald08] …
Sentiment classification
– Binary classification: [Pang&Lee02] [Kim&Hovy04] [Cui et al06] …
– Rating classification: [Pang&Lee05] [Snyder&Barzilay07] …
Opinion quality prediction: [Zhang&Varadarajan`06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10] …
Limitation: these approaches rely heavily on domain-specific hand-labeled training data and hand-written heuristics/rules
How to develop general techniques that work for arbitrary topics?

8 New idea: exploit naturally available resources
Input: opinion sentences on topic = t (Sentence 1, Sentence 2, …, Sentence 100, …, Sentence 900, …)
Resources: Structured Ontology, Overall Sentiment Ratings, Expert Articles, Social Networks
Papers: [COLING'10] [WWW'09] [KDD'10] [WWW'11] [WWW'10] [WWW'08]

9 Intuition: scalable to different topics
Scale of the available resources: 3.5 M things, 45M reviews, 22 M topics, 500 M users, >3 M users, >3 K products/y, 3.5 M articles
Opportunities: provide domain-specific guidance; alleviate heavy dependence on human labor
Challenges: supervised machine learning cannot be applied directly; new methods are needed

10 My Work
Overview as on slide 6: online opinions on topic = t → Opinion Integration, Sentiment Analysis, Quality Prediction → integrated summary (aspect, opinion sentences, sentiment, quality)
Papers: [WWW’08] [COLING'10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

11 Roadmap
Sentiment Analysis: [WWW’11] “Automatic Construction of a Context-Aware Sentiment Lexicon: an Optimization Approach”
Components: Opinion Integration, Sentiment Analysis, Quality Prediction → integrated summary (aspects, opinion sentences, sentiment, quality)
Papers: [WWW’08] [COLING'10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

12 A well-known challenge: sentiments are domain dependent
Example: “unpredictable” in Domain = Movie vs. Domain = Laptop
Existing work
Linguistic heuristics: [Hatzivassiloglou&McKeown `97], [Kanayama&Nasukawa `06], …
Morphology, synonymy: [Neviarouskaya et al `09], [Mohammad et al `09], …
Seed sentiment words: [Turney&Littman `03], …
Document-level sentiment rating: [Choi&Cardie `09], …

13 Sentiments are also aspect dependent
Example (Domain = Laptop): “large” for Aspect = Screen vs. Aspect = Battery

14 New problem: constructing aspect-dependent sentiment lexicon
Input: Laptop collection + “Aspects”
SCREEN: screen, LCD, display, …
BATTERY: battery, power, charger, …
PRICE: price, cost, money, …
…
Output: “Aspect-Adj”: sentiment_score
SCREEN-large: +1
SCREEN-great: +1
BATTERY-large: -1
…
A challenging problem due to increased sparseness

15 Our idea: exploit multiple resources
General sentiment lexicon: excellent, awesome, …; bad, terrible, …
Overall sentiment ratings of reviews (Screen: text… Battery: text…)
Dictionary: synonyms (large ~ big, …), antonyms (large / tiny, …)
Language heuristics: 1. “and” clue 2. “but” clue 3. “negation” clue
Entries to score: SCREEN-large, SCREEN-great, BATTERY-large → ?
Challenges: 1. signals in different formats 2. contradictory signals

16 A Novel Optimization Framework
S: aspect-dependent sentiment lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ ), subject to constraints
The objective function is designed to encode signals from multiple resources

17 1. Sentiment prior
G: general-purpose sentiment lexicon (SCREEN-great: 1, SCREEN-bad: -1, BATTERY-great: 1, …)
S: aspect-dependent sentiment lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ )

18 2. Overall sentiment rating
O: review overall ratings (R1: 1, R2: 1, R3: -1, R4: 0, …)
X: review-word matrix (R1, SCREEN-bright: 0.2; R1, BATTERY-large: 0.3; R1, SCREEN-great: 0.5; R2, SCREEN-awesome: 0.4; …)
S: aspect-dependent sentiment lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
Predicted ratings = X * S (R1: 0.8, R2: 0.5, R3: -0.7, R4: 0.1, …), which should be close to O
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ )
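
The rating signal on this slide can be illustrated with a minimal sketch. It assumes that a review's predicted rating is the weighted sum of the lexicon scores of its entries (the X * S product), and that the mismatch with the observed rating is measured by an absolute difference; the slide does not show the actual loss. The values in `S` are hypothetical, and only the two rows of X shown on the slide are used, so the results need not match the predicted ratings listed there.

```python
# Hypothetical lexicon scores; the slide gives no numeric values for S.
S = {
    "SCREEN-bright": 1.0,
    "BATTERY-large": -1.0,
    "SCREEN-great": 1.0,
    "SCREEN-awesome": 1.0,
}

# Review-word weights copied from the slide (partial rows only).
X = {
    "R1": {"SCREEN-bright": 0.2, "BATTERY-large": 0.3, "SCREEN-great": 0.5},
    "R2": {"SCREEN-awesome": 0.4},
}

# Observed overall ratings from the slide.
O = {"R1": 1.0, "R2": 1.0}


def predicted_rating(row, lexicon):
    """Weighted sum of lexicon scores for one review (one row of X * S)."""
    return sum(weight * lexicon.get(entry, 0.0) for entry, weight in row.items())


def rating_mismatch(X, O, lexicon):
    """Sum of absolute differences between observed and predicted ratings.

    The absolute-difference form is an assumption made for this sketch.
    """
    return sum(abs(O[r] - predicted_rating(X[r], lexicon)) for r in X)


if __name__ == "__main__":
    for review in X:
        print(review, round(predicted_rating(X[review], S), 3))
    print("mismatch:", round(rating_mismatch(X, O, S), 3))
```

A lexicon that fits the ratings well gives a small mismatch, which is the role the λrating term plays in the objective.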

19 3. Similar sentiments
A: similar-sentiment matrix (from synonyms and “and” clues): (SCREEN-large, SCREEN-big): 1; (SCREEN-bad, SCREEN-terrible): 1; (BATTERY-small, BATTERY-tiny): 1; …
S: aspect-dependent sentiment lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ )

20 4. Opposite sentiment
B: opposite-sentiment matrix (from antonyms and “but” clues): (SCREEN-large, SCREEN-small): 1; (SCREEN-excellent, BATTERY-big): 1; (BATTERY-small, BATTERY-big): 1; …
For opposite entries: sign is different, absolute value is similar
Separate the representation of S_j:
- Sign: only one of S_j+, S_j- is active
- Abs value: value of the active one
S: aspect-dependent sentiment lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ ), subject to constraints

21 A Novel Optimization Framework
S = argmin_S ( λprior … + λsim … + λoppo … + λrating … + δ ), subject to constraints
Signals: 1. general sentiment lexicon (λprior) 2. overall rating (λrating) 3. synonyms, “and” clues (λsim) 4. antonyms, “but” clues (λoppo)
Weights are set as the degree to which we trust each signal
Transformed to a linear program, solved efficiently using GAMS/CPLEX
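
As background on the linear-programming step, a standard way to linearize an absolute-value term is sketched here. It assumes, purely for illustration, a term of the form |G_j - S_j|; the transcript does not show the actual terms of the objective, and the auxiliary variable t_j is hypothetical.

```latex
\min_{S_j}\; \lvert G_j - S_j \rvert
\quad\Longleftrightarrow\quad
\min_{S_j,\, t_j}\; t_j
\quad \text{subject to} \quad
t_j \ge G_j - S_j, \qquad t_j \ge S_j - G_j .
```

At the optimum t_j equals the absolute difference, so a weighted sum of such terms with linear constraints remains a linear program.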

22 Evaluation: Data Sets
                          Hotel Data   Printer Data
Source                    TripAdvisor  Customer Survey
# doc                     4792         3511
# aspects                 7            25
AVG length                270          24
# judged doc              750          3511
# judged lexicon entry    705          NA
# judged doc-aspect pair  2145         4634
Evaluation (1): Lexicon Quality
Evaluation (2): Doc-Aspect Sentiment: aggregate the sentiment of lexicon entries to doc level

23 Evaluation (1): Lexicon Quality (Hotel Data)
OPT > Global > Dictionary
Method  Precision  Recall  F-Score  Description
Random  0.4932     0.2784  0.3559   Guess 1, 0, -1 uniformly
MPQA    0.9631     0.3702  0.5348   General dictionary only
INQ     0.8757     0.4397  0.5855   General dictionary only
Global  0.7073     0.5929  0.6451   Overall ratings only [Lu et al. WWW09]
OPT     0.8125     0.6823  0.7417   Our method with equal weights, i.e. (λprior:λrating:λsim:λoppo = 1:1:1:1)
F-Score gain of OPT: 15% over Global, 27% over INQ, 39% over MPQA
Interesting sample results using OPT:
Hotel Data: ROOM-private, FOOD-excelent
Printer Data: INK-fast, SUPPORT-fast
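
The F-Score column and the percentage annotations on this slide can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of F-Score as the harmonic mean of precision and recall, and reads the percentages as relative F-Score gains of OPT; the function names are illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# (precision, recall) pairs copied from the slide.
results = {
    "Random": (0.4932, 0.2784),
    "MPQA": (0.9631, 0.3702),
    "INQ": (0.8757, 0.4397),
    "Global": (0.7073, 0.5929),
    "OPT": (0.8125, 0.6823),
}

if __name__ == "__main__":
    scores = {name: f_score(p, r) for name, (p, r) in results.items()}
    for name, score in scores.items():
        print(f"{name:7s} F = {score:.4f}")
    for baseline in ("Global", "INQ", "MPQA"):
        gain = relative_gain_percent(scores["OPT"], scores[baseline])
        print(f"OPT vs {baseline}: {gain:.0f}%")
```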

24 Tuning weights further improves performance
λprior  λsim  λoppo  λrating  F-Score
OPT default: equal weights
1       1     1      1        0.7417
Dropping one term
0       1     1      1        0.6549
1       0     1      1        0.7309
1       1     0      1        0.7408
1       1     1      0        0.6453
More weight on important terms
2       1     1      2        0.7431
3       1     1      3        0.7544
6       1     1      6        0.7510
8       1     1      8        0.7506

25 Evaluation (2): Doc-Aspect Sentiment: OPT > Global > Dictionary
Printer Data
Method  Precision  Recall  F-Score  MSE
Random  0.4844     0.2629  0.3408   0.7142
MPQA    0.7579     0.1597  0.2639   0.5740
INQ     0.7879     0.3502  0.4849   0.5365
Global  0.7645     0.5448  0.6362   0.5091
OPT     0.8222     0.5276  0.6428   0.4680
OPT vs. Global / INQ / MPQA: F-Score +1% / +33% / +144%, MSE -8% / -13% / -18%
Hotel Data
Method  Precision  Recall  F-Score  MSE
Random  0.4368     0.3689  0.3999   0.5670
MPQA    0.8128     0.5289  0.6408   0.4700
INQ     0.7800     0.6294  0.6966   0.4561
Global  0.6975     0.7730  0.7333   0.4426
OPT     0.7283     0.7756  0.7512   0.4160
OPT vs. Global / INQ / MPQA: F-Score +2% / +8% / +17%, MSE -6% / -9% / -11%

26 Roadmap
Quality Prediction: [WWW’10] “Exploiting Social Context for Review Quality Prediction”
Components: Opinion Integration, Sentiment Analysis, Quality Prediction → integrated summary (aspects, opinion sentences, sentiment, quality)
Papers: [WWW’08] [COLING'10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

27 Existing Work of Quality Prediction
Treated as a supervised learning problem: [Zhang&Varadarajan`06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10]
Features: textual features, meta-data features
Data: labeled reviews (√ very helpful, × not helpful) and unlabeled reviews (?)

28 Base model: Linear Regression
Quality(review i) = Weights × FeatureVector(review i), using textual features
w = argmin_w { … } over the labeled reviews, with a closed-form solution for w
Labels are expensive to obtain!
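
For reference, the textbook least-squares form of such a base model can be written as follows. The notation is an assumption rather than the slide's own: x_i is the textual feature vector of labeled review i, y_i its quality label, n_l the number of labeled reviews, X the matrix whose rows are the x_i, and X^T X is assumed to be invertible.

```latex
\hat{w} \;=\; \arg\min_{w} \sum_{i=1}^{n_\ell} \left( w^{\top} x_i - y_i \right)^{2}
\;=\; \left( X^{\top} X \right)^{-1} X^{\top} y
```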

29 We also observe…
Social context = reviewer identity + social network
Intuitions: the quality of a review is related to its reviewer; the quality of a reviewer is related to the reviewer's social network
How to use them to help prediction?
Our idea: social context can help!

30 Our approach: add social context as graph-based regularizers
w = argmin_w { Loss function + β × Graph Regularizer }
Loss function: the baseline; β: trade-off parameter; graph regularizer: designed to “favor” our intuitions
Data: labeled and unlabeled reviews
Advantages: semi-supervised (makes use of unlabeled data); applicable to reviews without social context
How to design the regularizers?

31 Hypothesis 1: Reviewer Consistency
Reviewers are consistent: reviews written by the same reviewer have similar quality, Quality(review a) ~ Quality(review b)

32 Regularizer for Reviewer Consistency
Reviewer Regularizer = ∑ [ Quality(review 1) - Quality(review 2) ]^2, summed over pairs of reviews connected in the Same-Author Graph (A)
Closed-form solution for w, expressed with the graph Laplacian and the review-feature matrix
Related work: [Zhou et al. 03] [Zhu et al. 03] [Belkin et al 06]
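
The link between the pairwise penalty and the graph Laplacian can be made concrete with a small sketch. It assumes the unnormalized Laplacian L = D - A of the same-author graph and checks the standard identity that the sum of squared quality differences over connected pairs equals q^T L q. The author labels and quality scores are hypothetical.

```python
def same_author_adjacency(authors):
    """A[i][j] = 1 if reviews i and j (i != j) share an author, else 0."""
    n = len(authors)
    return [
        [1 if i != j and authors[i] == authors[j] else 0 for j in range(n)]
        for i in range(n)
    ]


def laplacian(A):
    """Unnormalized graph Laplacian L = D - A, with D the degree matrix."""
    n = len(A)
    return [
        [(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
        for i in range(n)
    ]


def pairwise_penalty(A, q):
    """Sum over connected pairs (i < j) of (q_i - q_j)^2."""
    n = len(q)
    return sum(
        A[i][j] * (q[i] - q[j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )


def quadratic_form(L, q):
    """q^T L q."""
    n = len(q)
    return sum(q[i] * L[i][j] * q[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    authors = ["u1", "u1", "u2", "u2"]   # hypothetical reviewers of 4 reviews
    quality = [0.9, 0.7, 0.2, 0.4]       # hypothetical predicted qualities
    A = same_author_adjacency(authors)
    L = laplacian(A)
    print(pairwise_penalty(A, quality), quadratic_form(L, quality))
```

Because the penalty is a quadratic form in the predicted qualities, adding it to a squared loss keeps the problem quadratic, which is consistent with the closed-form solution mentioned on the slide.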

33 Hypothesis 2: Trust Consistency
Quality(reviewer) - Quality(reviewer they trust) ≤ 0
“I trust people with quality at least as good as mine!”

34 Regularizer for Trust Consistency
Trust Regularizer = ∑ max[0, Quality(reviewer) - Quality(reviewer they trust)]^2, summed over the edges of the Trust Graph
No closed-form solution, but the problem is still convex → gradient descent
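
A minimal sketch of this one-sided penalty and a gradient step follows. For simplicity it treats reviewer quality scores as free variables, whereas the talk optimizes the regression weights w; the reviewer names, scores, trust edges, and learning rate are all hypothetical.

```python
def trust_penalty(quality, trust_edges):
    """Sum of max(0, q[truster] - q[trusted])^2 over trust edges."""
    return sum(
        max(0.0, quality[truster] - quality[trusted]) ** 2
        for truster, trusted in trust_edges
    )


def trust_gradient(quality, trust_edges):
    """Gradient of the penalty with respect to each quality score."""
    grad = {name: 0.0 for name in quality}
    for truster, trusted in trust_edges:
        gap = quality[truster] - quality[trusted]
        if gap > 0:  # only violated edges contribute
            grad[truster] += 2 * gap
            grad[trusted] -= 2 * gap
    return grad


def gradient_step(quality, trust_edges, learning_rate=0.1):
    """One plain gradient-descent update of the quality scores."""
    grad = trust_gradient(quality, trust_edges)
    return {name: quality[name] - learning_rate * grad[name] for name in quality}


if __name__ == "__main__":
    quality = {"alice": 0.8, "bob": 0.5, "carol": 0.9}   # hypothetical scores
    edges = [("alice", "bob"), ("bob", "carol")]          # (truster, trusted)
    print("before:", trust_penalty(quality, edges))
    quality = gradient_step(quality, edges)
    print("after: ", trust_penalty(quality, edges))
```

Only edges where the truster currently scores higher than the trusted reviewer are penalized, which is why the term is convex but has no simple closed-form minimizer.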

35 Hypotheses 3 & 4
Hypothesis 3: Co-citation Consistency
Hypothesis 4: Link Consistency
Graphs: Trust Graph, Co-citation Graph, Link Graph

36 Mathematical Formulations
1. Reviewer Consistency
2. Trust Consistency
3. Co-citation Consistency
4. Link Consistency
Solution: closed form or gradient descent, depending on the regularizer

37 Evaluation: Data Sets from Ciao UK
Statistics                     Cellphone  Beauty  Digital Camera
# Reviews                      1943       4849    3697
Reviews/Reviewer ratio         2.21       2.84    1.06
Trust Graph Density            0.0075     0.014   0.0006
Summary                        Cellphone  Beauty  Digital Camera
Social Context                 rich       rich    sparse
Gold-std Quality Distribution  balanced   skewed  balanced

38 Our methods are most effective with limited labeled data
[Figure: % of MSE difference relative to the baseline vs. percentage of labeled data (10%, 25%, 50%, 100%), Cellphone data; series: Reg:Link, Reg:Reviewer, Reg:Cocitation, Reg:Trust]

39 Our methods are most effective with rich social context
[Figure: % of MSE difference relative to the baseline for Cellphone, Beauty, and Digital Camera; series: Reg:Link, Reg:Reviewer, Reg:Cocitation, Reg:Trust; annotation: Reviews/Reviewer ratio = 1.06]

40 Summary of this talk
Components: Opinion Integration, Sentiment Analysis, Quality Prediction → integrated summary (aspects, opinion sentences, sentiment, quality)

41 Summary of this talk
1. Sentiment Analysis: construct aspect-dependent sentiment lexicon
2. Quality Prediction: exploit social context
Papers: [WWW’08] [COLING'10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

42 Future Directions
Data: 65M msgs/day, 53M blogs, 1307M posts, 115M users, 10M groups, 45M reviews
Integrative analysis
Efficient algorithms for real-time interaction
Task-support applications

43 Summary of my other work: Text Information Management
Areas: Text Mining, Information Retrieval, Bioinformatics
Opinion Integration and Summarization: [COLING 10] [WWW 09] [WWW 08] [WWW 10] [WWW 11] [KDD 10]
[IRJ 10] “Investigation of Topic Models”
[IRJ 09] “Bio literature IR”
[TREC 07]
[NAR 07] “An open system for microarray clustering”
[NAR 10] “Bio literature mining”

44 Thank you! & Questions?

45 Backup Slides

46 References
[WWW'11] Yue Lu, Malu Castellanos, Umeshwar Dayal, ChengXiang Zhai. "Automatic Construction of a Context-Aware Sentiment Lexicon: An Optimization Approach", to appear at WWW’11.
[COLING'10] Yue Lu, Huizhong Duan, Hongning Wang and ChengXiang Zhai. "Exploiting Structured Ontology to Organize Scattered Online Opinions", In Proceedings of the 23rd International Conference on Computational Linguistics, Pages: 734-742.
[KDD’10] Hongning Wang, Yue Lu, and ChengXiang Zhai. "Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach", In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages: 783-792.
[WWW'10] Yue Lu, Panayiotis Tsaparas, Alexandros Ntoulas, and Livia Polanyi. "Exploiting Social Context for Review Quality Prediction", In Proceedings of the 19th International World Wide Web Conference, Pages: 691-700.
[WWW'09] Yue Lu, ChengXiang Zhai and Neel Sundaresan. "Rated Aspect Summarization of Short Comments", In Proceedings of the 18th International World Wide Web Conference, Pages: 131-140.
[WWW'08] Yue Lu and ChengXiang Zhai. "Opinion Integration Through Semi-supervised Topic Modeling", In Proceedings of the 17th International World Wide Web Conference, Pages: 121-130.

47 Other Publications
Areas: Bioinformatics, Biomedical IR, Topic models
[IRJ’10] Yue Lu, Qiaozhu Mei, ChengXiang Zhai. "Investigating Task Performance of Probabilistic Topic Models - An Empirical Study of PLSA and LDA", Information Retrieval.
[NAR’10] X. He, Y. Li, R. Khetani, B. Sanders, Yue Lu, X. Ling, C.-X. Zhai, B. Schatz. "BSQA: Integrated Text Mining Using Entity Relation Semantics Extracted from Biological Literature of Insects", Nucleic Acids Research.
[IRJ’09] Yue Lu, Hui Fang and ChengXiang Zhai. "An Empirical Study of Gene Synonym Query Expansion in Biomedical Information Retrieval", Information Retrieval, Volume 12, Issue 1 (2009), Pages: 51-68.
[TREC'07] Yue Lu, Jing Jiang, Xu Ling, Xin He, ChengXiang Zhai. "Language Models for Genomics Information Retrieval: UIUC at TREC 2007 Genomics Track", In Proceedings of the 16th Text REtrieval Conference.
[NAR’07] Yue Lu, Xin He and Sheng Zhong. "Cross-species microarray analysis with the OSCAR system suggests an INSR->Pax6->NQO1 neuro-protective pathway in ageing and Alzheimer's disease", Nucleic Acids Research 105-114.

48 Generating Candidate Lexicon Entries
Input: The LCD is great but battery is so large.
Parsed: [The/DT LCD/NN is/VBZ great/JJ] but/CC [battery/NN is/VBZ so/RB large/JJ] ./.
Aspect tagged: [The/DT (LCD/NN):SCREEN is/VBZ great/JJ] but/CC [(battery/NN):BATTERY is/VBZ so/RB large/JJ] ./.
Candidates: SCREEN-great, BATTERY-large
Candidate lexicon entries: SCREEN-large, SCREEN-great, BATTERY-large, … → ?
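
The candidate-generation steps on this slide can be sketched roughly as follows. The sketch starts from an already POS-tagged sentence, maps nouns to aspects with the keyword lists from slide 14, and pairs each aspect with the adjectives in the same clause. Splitting clauses at coordinating conjunctions (tag CC) is a simplifying assumption of this sketch, not necessarily how the original system parses sentences, and the function names are illustrative.

```python
# Aspect keyword lists as given on slide 14 (lower-cased).
ASPECT_KEYWORDS = {
    "SCREEN": {"screen", "lcd", "display"},
    "BATTERY": {"battery", "power", "charger"},
    "PRICE": {"price", "cost", "money"},
}
WORD_TO_ASPECT = {
    word: aspect for aspect, words in ASPECT_KEYWORDS.items() for word in words
}


def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def candidate_entries(tagged_tokens):
    """Pair each aspect noun with the adjectives of the same clause."""
    entries = []
    aspects, adjectives = [], []
    # A sentinel conjunction closes the final clause.
    for word, tag in list(tagged_tokens) + [("", "CC")]:
        if tag == "CC":
            entries += [f"{a}-{adj}" for a in aspects for adj in adjectives]
            aspects, adjectives = [], []
        elif tag.startswith("NN") and word.lower() in WORD_TO_ASPECT:
            aspects.append(WORD_TO_ASPECT[word.lower()])
        elif tag.startswith("JJ"):
            adjectives.append(word.lower())
    return entries


if __name__ == "__main__":
    tagged = parse_tagged(
        "The/DT LCD/NN is/VBZ great/JJ but/CC "
        "battery/NN is/VBZ so/RB large/JJ ./."
    )
    print(candidate_entries(tagged))
```

On the slide's example sentence this yields the two candidates shown there, SCREEN-great and BATTERY-large.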

49 Hypotheses Testing (1): Reviewer Consistency
[Figure: density of the difference in review quality, Qg(review 1) - Qg(review 2) for reviews from the same reviewer vs. Qg(review 1) - Qg(review 3) for reviews from different reviewers; Cellphone data]
Hypothesis 1: Reviewer Consistency is supported by data

50 Hypotheses Testing (2-4): Social Network-based Consistencies
[Figure: density of the difference in reviewer quality between reviewers A and B, for four cases: B is not linked to A; B trusts A; B is co-cited with A; B is linked to A; Cellphone data]
Hypotheses 2-4: Social Network-based Consistencies are supported by data

