Opinion Integration and Summarization Yue Lu University of Illinois at Urbana-Champaign.

Presentation transcript:

Opinion Integration and Summarization Yue Lu University of Illinois at Urbana-Champaign

Opinions needed in all kinds of decision processes
Business intelligence: “What do people complain about iPhone?”
Health informatics: “How do people like the new drug?”
Political science: “How is the new policy received?”
Yue Lu, http://sifaka.cs.uiuc.edu/yuelu2/

Online opinions cover all kinds of topics
Topics: people, events, products, services, …
Sources: blogs, microblogs, forums, reviews, …
Figure labels: 65M msgs/day; 53M blogs, 1307M posts; 115M users, 10M groups; 45M reviews; …

After collecting opinions using Google: how could I read them all?

Online opinions are complicated
Dimensions: aspect, sentiment, quality (high quality vs. low quality)

Vision: Opinion Integration & Summarization
Input: online opinions on Topic = t (Sentence 1, Sentence 2, …, Sentence 100, …, Sentence 900, …)
Components: Opinion Integration, Sentiment Analysis, Quality Prediction
Output: Integrated Summary

Aspect     Opinion Sentences   Sentiment   Quality
Aspect 1   Sentence 512        positive    high
           Sentence 823        negative    medium
Aspect 2   Sentence 21         neutral     low
           Sentence 153        positive    high
…          …                   …           …

Major challenge: develop general techniques that work for arbitrary topics

Existing work cannot scale to different topics
Review summarization
– Unsupervised feature extraction + opinion polarity identification: [Hu&Liu 04], [Popescu&Etzioni 05], …
– Supervised aspect extraction: [Zhuang et al] …
Hidden aspect discovery: [Hofmann99] [Chen&Dumais00] [Blei et al03] [Zhai et al04] [Li&McCallum06] [Titov&McDonald08] …
Sentiment classification
– Binary classification: [Pang&Lee02] [Kim&Hovy04] [Cui et al06] …
– Rating classification: [Pang&Lee05] [Snyder&Barzilay07] …
Opinion quality prediction: [Zhang&Varadarajan `06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10] …
Limitation: these approaches rely heavily on domain-specific hand-labeled training data and hand-written heuristics/rules.
How to develop general techniques that work for arbitrary topics?

New idea: exploit naturally available resources
Input: opinion sentences on Topic = t (Sentence 1, Sentence 2, …, Sentence 100, …, Sentence 900, …)
Resources: Structured Ontology, Overall Sentiment Ratings, Expert Articles, Social Networks
Papers: [COLING’10] [WWW’09] [KDD’10] [WWW’11] [WWW’10] [WWW’08]

Intuition: scalable to different topics
Figure labels: 3.5 M things; 45M reviews; 22 M topics; 500 M users; >3 M users; >3 K products/y; 3.5 M articles
Opportunities: provide domain-specific guidance; alleviate heavy dependence on human labor
Challenges: supervised machine learning cannot be applied directly; new methods are needed

My Work
Same pipeline as in the vision (Opinion Integration, Sentiment Analysis, Quality Prediction), annotated with papers: [WWW’08] [COLING’10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

Roadmap
Sentiment Analysis: [WWW’11] “Automatic Construction of a Context-Aware Sentiment Lexicon: an Optimization Approach”
Papers: [WWW’08] [COLING’10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

A well-known challenge: sentiments are domain dependent
Example: “unpredictable” in Domain = Movie vs. Domain = Laptop
Existing work
Linguistic heuristics: [Hatzivassiloglou&McKeown `97], [Kanayama&Nasukawa `06], …
Morphology, synonymy: [Neviarouskaya et al `09], [Mohammad et al `09], …
Seed sentiment words: [Turney&Littman `03], …
Document-level sentiment rating: [Choi&Cardie `09], …

Sentiments are also aspect dependent
Example (Domain = Laptop): “large” for Aspect = Screen vs. Aspect = Battery

New problem: constructing aspect-dependent sentiment lexicon
Input: Laptop Collection + “Aspects”
SCREEN: screen, LCD, display, …
BATTERY: battery, power, charger, …
PRICE: price, cost, money, …
…
Output: “Aspect-Adj”: sentiment_score
SCREEN-large: +1
SCREEN-great: +1
BATTERY-large: -1
…
A challenging problem, due to increased sparseness

Our idea: exploit multiple resources
General Sentiment Lexicon: excellent, awesome, …; bad, terrible, …
Dictionary: Synonyms (large ~ big, …); Antonyms (large, tiny, …)
Language Heuristics: 1. “and” clue 2. “but” clue 3. “negation” clue
Overall Sentiment Ratings (Screen: text… Battery: text… …)
Target entries: SCREEN-large, SCREEN-great, BATTERY-large: ?
Challenges: 1. signals in different formats 2. contradictory signals

A Novel Optimization Framework
S = argmin over S of [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ], subject to constraints
S: Aspect-Dependent Sentiment Lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
Objective function designed to encode signals from multiple resources

1. sentiment prior
Term weighted by λprior in S = argmin [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ]
G: General-purpose Sentiment Lexicon (SCREEN-great: 1, SCREEN-bad: -1, BATTERY-great: 1, …)
S: Aspect-Dependent Sentiment Lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)

2. overall sentiment rating
Term weighted by λrating in S = argmin [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ]
O: Review Overall Ratings (R1: 1, R2: 1, R3: -1, R4: 0, …)
X: Review Word Matrix (R1, SCREEN-bright: 0.2; R1, BATTERY-large: 0.3; R1, SCREEN-great: 0.5; R2, SCREEN-awesome: 0.4; …)
S: Aspect-Dependent Sentiment Lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)
Predicted Ratings = X * S (R1: 0.8, R2: 0.5, R3: -0.7, R4: 0.1, …), with O ~ Predicted Ratings
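The rating signal on this slide (Predicted Ratings = X * S, compared against O) can be illustrated with a small numeric sketch. The R1 and R2 weights and ratings are taken from the slide; the lexicon scores in `S`, the zero entries in `X`, and the absolute-difference mismatch measure are assumptions made only for illustration, so the predicted values here are not expected to match the ones shown on the slide.

```python
import numpy as np

# Lexicon entries (columns of X); names follow the slide.
entries = ["SCREEN-bright", "BATTERY-large", "SCREEN-great", "SCREEN-awesome"]

# Review-word matrix X: the weights shown on the slide for R1 and R2.
# Entries not shown on the slide are assumed to be 0.
X = np.array([
    [0.2, 0.3, 0.5, 0.0],  # R1
    [0.0, 0.0, 0.0, 0.4],  # R2
])

# Hypothetical lexicon scores S, one per entry (not taken from the slides).
S = np.array([1.0, -1.0, 1.0, 1.0])

# Overall ratings O for R1 and R2, as on the slide.
O = np.array([1.0, 1.0])

# Predicted overall rating of each review: weighted sum of its entry scores.
predicted = X @ S

# One possible mismatch measure between predicted and observed ratings (assumed).
rating_error = np.abs(predicted - O).sum()

print(predicted, rating_error)
```

A lexicon S that fits the data well makes this mismatch small, which is the role the λrating term plays in the objective.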

3. similar sentiments
Term weighted by λsim in S = argmin [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ]
A: Similar-Sentiment Matrix (from synonyms and “and” clues): (SCREEN-large, SCREEN-big): 1; (SCREEN-bad, SCREEN-terrible): 1; (BATTERY-small, BATTERY-tiny): 1; …
S: Aspect-Dependent Sentiment Lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)

4. opposite sentiment
Term weighted by λoppo in S = argmin [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ], subject to constraints
B: Opposite-Sentiment Matrix (from antonyms and “but” clues): (SCREEN-large, SCREEN-small): 1; (SCREEN-excellent, BATTERY-big): 1; (BATTERY-small, BATTERY-big): 1; …
For an opposite pair: sign is different, abs value is similar
Separate the representation of Sj:
- Sign: only one of Sj+, Sj- is active
- Abs Value: value of the active one
S: Aspect-Dependent Sentiment Lexicon (SCREEN-large: S1, SCREEN-great: S2, BATTERY-large: S3, …)

A Novel Optimization Framework
S = argmin over S of [ λprior (…) + λsim (…) + λoppo (…) + λrating (…) + δ (…) ], subject to constraints
λprior: general sentiment lexicon
λsim: synonyms, “and” clues
λoppo: antonyms, “but” clues
λrating: overall rating
Weights are set as the degree to which we trust each signal
Transformed to linear programming and solved efficiently using GAMS/CPLEX
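To make the idea of combining the four signals concrete, here is a minimal sketch. It is not the formulation from the slides: the slides transform the problem into a linear program solved with GAMS/CPLEX, while this sketch swaps in squared penalties and solves them with ordinary least squares, and it ignores the sign/absolute-value constraints and the δ term. The function name `solve_lexicon`, the toy data, and the square-root weighting are all assumptions for illustration.

```python
import numpy as np

def solve_lexicon(G, X, O, sim_pairs, opp_pairs, weights):
    """Least-squares stand-in for the weighted multi-signal objective.

    G: prior scores per entry (np.nan where the general lexicon is silent)
    X, O: review-word matrix and overall ratings
    sim_pairs / opp_pairs: index pairs with similar / opposite sentiment
    weights: (lam_prior, lam_sim, lam_oppo, lam_rating)
    """
    lam_prior, lam_sim, lam_oppo, lam_rating = weights
    n = len(G)
    rows, targets = [], []

    def add(row, target, lam):
        # Scaling by sqrt(lam) makes the squared residual weigh lam.
        rows.append(np.sqrt(lam) * row)
        targets.append(np.sqrt(lam) * target)

    for j, g in enumerate(G):          # prior: S_j close to G_j
        if not np.isnan(g):
            add(np.eye(n)[j], g, lam_prior)
    for j, k in sim_pairs:             # similar: S_j - S_k close to 0
        add(np.eye(n)[j] - np.eye(n)[k], 0.0, lam_sim)
    for j, k in opp_pairs:             # opposite: S_j + S_k close to 0
        add(np.eye(n)[j] + np.eye(n)[k], 0.0, lam_oppo)
    for x_row, o in zip(X, O):         # rating: X S close to O
        add(np.asarray(x_row, float), o, lam_rating)

    S, *_ = np.linalg.lstsq(np.vstack(rows), np.array(targets), rcond=None)
    return np.clip(S, -1.0, 1.0)       # keep scores in [-1, 1]

entries = ["SCREEN-large", "SCREEN-big", "BATTERY-large", "BATTERY-small", "SCREEN-great"]
G = np.array([np.nan, np.nan, np.nan, np.nan, 1.0])   # only "great" is in the general lexicon
X = np.array([[0.5, 0, 0, 0, 0.5],                    # toy review-word weights
              [0, 0, 1.0, 0, 0],
              [0, 0, 0, 0.5, 0.5]])
O = np.array([1.0, -1.0, 1.0])                        # toy overall ratings
S = solve_lexicon(G, X, O, sim_pairs=[(0, 1)], opp_pairs=[(2, 3)], weights=(1, 1, 1, 1))
print(dict(zip(entries, S.round(2))))
```

On this toy input the signals agree, so "large" comes out positive for SCREEN and negative for BATTERY, and SCREEN-big inherits its score only through the similarity pair. When signals contradict each other, the weights decide which one dominates.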

Evaluation: Data Sets

                           Hotel Data    Printer Data
Source                     TripAdvisor   Customer Survey
# doc
# aspects                  7             25
AVG length                 270           24
# judged doc
# judged lexicon entry     705           NA
# judged doc-aspect pair

Evaluation (1): Lexicon Quality
Evaluation (2): Doc-Aspect Sentiment (aggregate the sentiment of lexicon entries to doc level)

Evaluation (1): Lexicon Quality: OPT > Global > Dictionary
Table (Hotel Data): Method, Precision, Recall, F-Score
Random: guess 1, 0, -1 uniformly
MPQA, INQ: general dictionary only
Global: overall ratings only [Lu et. al. WWW09]
OPT: our method with equal weights, i.e. (λprior:λrating:λsim:λoppo = 1:1:1:1)
Improvement callouts: 15%, 27%, 39%
Interesting sample results using OPT:
Hotel Data: ROOM-private, FOOD-excelent
Printer Data: INK-fast, SUPPORT-fast

Tuning weights further improves performance
Table columns: λprior, λsim, λoppo, λrating, F-Score
Row groups: OPT default (equal weights); dropping one term; more weights on important terms

Evaluation (2): Doc-Aspect Sentiment: OPT > Global > Dictionary
Tables (Hotel Data and Printer Data): Method (Random, MPQA, INQ, Global, OPT), Precision, Recall, F-Score, MSE
Improvement callouts: 2%, 1%, 6%, 8%, 17%, 33%, 144%, 13%, 18%, 9%, 11%

Roadmap
Quality Prediction: [WWW’10] “Exploiting Social Context for Review Quality Prediction”
Papers: [WWW’08] [COLING’10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

Existing Work of Quality Prediction
As a supervised learning problem: [Zhang&Varadarajan `06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10]
Features: textual features, meta-data features
Labeled reviews: Very Helpful (√) or Not Helpful (×); unlabeled reviews: ?

Base model: Linear Regression
Quality(review i) = Weights × FeatureVector(review i), using textual features
w = argmin over w of { loss on labeled reviews }
Closed-form: w = (…)
Labels are expensive to obtain!
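The closed-form expression itself did not survive in the transcript. As a sketch of what a closed-form linear regression fit looks like, the block below uses the standard ordinary least squares normal equations, w = (XᵀX)⁻¹Xᵀy, which is an assumption about the slide rather than a copy of it. The feature matrix and quality labels are toy values.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system is more stable than forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data: three labeled reviews, two textual features each (hypothetical).
X_labeled = [[1.0, 0.0],
             [0.0, 1.0],
             [1.0, 1.0]]
y_labeled = [1.0, 2.0, 3.0]        # gold-standard quality scores (hypothetical)

w_ols = fit_linear_regression(X_labeled, y_labeled)
predicted_quality = np.asarray(X_labeled) @ w_ols   # Quality = Weights x FeatureVector
print(w_ols, predicted_quality)
```

The fit only uses labeled reviews, which is why the slide stresses that labels are expensive to obtain.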

We also observe…
Social Context = Reviewer Identity + Social Network
Intuitions: the quality of a review is related to its social context
How to use them to help prediction?
Our idea: social context can help!

Our approach: add social context as graph-based regularizers
w = argmin over w of { Loss function + β × Graph Regularizer }
Loss function: baseline, on labeled data
Graph Regularizer: designed to “favor” our intuitions, on labeled and unlabeled data
β: trade-off parameter
Advantages:
Semi-supervised: makes use of unlabeled data
Applicable to reviews without social context
How to design the regularizers?

Hypothesis 1: Reviewer Consistency
Reviews written by the same reviewer have similar quality: Quality( ) ~ Quality( )
Reviewers are consistent!

Regularizer for Reviewer Consistency
Reviewer Regularizer = ∑ [ Quality(i) - Quality(j) ]^2, summed over pairs of reviews (i, j) connected in the Same-Author Graph (A)
Closed-form solution! w = (…), expressed with the Graph Laplacian and the Review-Feature Matrix
Related work: [Zhou et al. 03] [Zhu et al. 03] [Belkin et al 06]
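The slide states that this regularizer admits a closed-form solution involving the graph Laplacian and the review-feature matrix, but the expression is missing from the transcript. The sketch below derives one under stated assumptions: squared loss on labeled reviews, the regularizer written as fᵀLf with f = Xw and L = D - A, and no normalization of either term. Setting the gradient to zero gives w = (X_lᵀX_l + β XᵀLX)⁻¹ X_lᵀ y. The function names and toy data are hypothetical, and the original paper's exact scaling may differ.

```python
import numpy as np

def same_author_laplacian(authors):
    """Graph Laplacian L = D - A of the same-author graph over reviews."""
    authors = np.asarray(authors)
    A = (authors[:, None] == authors[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)                 # no self-loops
    return np.diag(A.sum(axis=1)) - A

def fit_reviewer_regularized(X, y, labeled, authors, beta):
    """Minimize ||X_l w - y||^2 + beta * (Xw)^T L (Xw) in closed form."""
    X = np.asarray(X, dtype=float)
    L = same_author_laplacian(authors)
    Xl = X[labeled]                          # labeled rows only enter the loss
    lhs = Xl.T @ Xl + beta * X.T @ L @ X     # all rows enter the regularizer
    rhs = Xl.T @ np.asarray(y, dtype=float)
    return np.linalg.solve(lhs, rhs)

def reviewer_regularizer(X, w, authors):
    """Sum of squared quality differences, each same-author pair counted once."""
    f = np.asarray(X, dtype=float) @ w
    return f @ same_author_laplacian(authors) @ f

# Toy data: four reviews, the first three labeled (hypothetical values).
X_all = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
authors = ["a", "a", "b", "b"]
labeled = [0, 1, 2]
y = [1.0, 2.0, 3.0]

w_plain = fit_reviewer_regularized(X_all, y, labeled, authors, beta=0.0)
w_reg = fit_reviewer_regularized(X_all, y, labeled, authors, beta=1.0)
print(reviewer_regularizer(X_all, w_plain, authors),
      reviewer_regularizer(X_all, w_reg, authors))
```

With beta = 0 this reduces to plain least squares on the labeled reviews. With beta > 0 the unlabeled fourth review also influences w, which is the semi-supervised effect the approach relies on.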

Hypothesis 2: Trust Consistency
If reviewer u trusts reviewer v: Quality(u) - Quality(v) ≤ 0
“I trust people with quality at least as good as mine!”

Regularizer for Trust Consistency
Trust Regularizer = ∑ max[0, Quality(u) - Quality(v)]^2, summed over edges (u trusts v) of the Trust Graph
No closed-form solution…
Still convex, so it can be minimized with gradient descent
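Since the trust regularizer has no closed form but is convex, it can be minimized iteratively. The following is a minimal gradient descent sketch under assumptions that are not confirmed by the slides: squared loss on labeled reviews, a reviewer's quality defined as the mean predicted quality of that reviewer's reviews, a fixed step size, and a fixed number of steps. All names and data are hypothetical.

```python
import numpy as np

def fit_trust_regularized(X, y, labeled, authors, trust_edges, beta,
                          lr=0.01, steps=5000):
    """Gradient descent on ||X_l w - y||^2 + beta * sum max(0, Q(u) - Q(v))^2.

    trust_edges: pairs (u, v) meaning "u trusts v".
    Q(r) is assumed to be the mean predicted quality of reviewer r's reviews.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    authors = np.asarray(authors)
    # Mean feature vector per reviewer, so that Q(r) = mean_feat[r] @ w.
    mean_feat = {a: X[authors == a].mean(axis=0) for a in set(authors.tolist())}
    diffs = np.array([mean_feat[u] - mean_feat[v] for u, v in trust_edges])
    Xl = X[labeled]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * Xl.T @ (Xl @ w - y)             # gradient of the loss
        violation = np.maximum(0.0, diffs @ w)       # only violated edges count
        grad += 2.0 * beta * diffs.T @ violation     # gradient of the penalty
        w -= lr * grad
    return w

# Toy data (hypothetical): reviewer "b" trusts reviewer "a".
X_all = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
authors = ["a", "a", "b", "b"]
labeled = [0, 1, 2]
y = [1.0, 2.0, 3.0]
trust_edges = [("b", "a")]

w_base = fit_trust_regularized(X_all, y, labeled, authors, trust_edges, beta=0.0)
w_trust = fit_trust_regularized(X_all, y, labeled, authors, trust_edges, beta=1.0)
print(w_base, w_trust)
```

The `max(0, ...)` makes the penalty one-sided: an edge contributes nothing once the trusted reviewer's quality is at least that of the truster, which matches the inequality in Hypothesis 2.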

Hypotheses 3 & 4
Hypothesis 3: Co-citation Consistency (Co-citation Graph)
Hypothesis 4: Link Consistency (Link Graph)
Graphs shown: Trust Graph, Co-citation Graph, Link Graph

Mathematical Formulations
1. Reviewer Consistency: (…)
2. Trust Consistency: (…)
3. Co-citation Consistency: (…)
4. Link Consistency: (…)
Solution methods: closed form; gradient descent

Evaluation: Data Sets from Ciao UK
Statistics table (columns: Cellphone, Beauty, Digital Camera): # Reviews, Reviews/Reviewer ratio, Trust Graph Density
Summary table (columns: Cellphone, Beauty, Digital Camera):
Social Context: rich, sparse
Gold-std Quality Distribution: balanced, skewed, balanced

Our methods are most effective with limited labeled data
Figure (Cellphone): % of MSE difference relative to the baseline vs. percentage of labeled data (10%, 25%, 50%, 100%); methods: Reg:Link, Reg:Reviewer, Reg:Cocitation, Reg:Trust

Our methods are most effective with rich social context
Figure: % of MSE difference relative to the baseline on Cellphone, Beauty, Digital Camera; methods: Reg:Link, Reg:Reviewer, Reg:Cocitation, Reg:Trust; annotation: Reviews/Reviewer ratio = 1.06

Summary of this talk
1. Sentiment Analysis: construct aspect-dependent sentiment lexicon
2. Quality Prediction: exploit social context
Papers: [WWW’08] [COLING’10] [WWW’10] [WWW’09] [KDD’10] [WWW’11]

Future Directions
Integrative analysis
Efficient algorithms for real-time interaction
Task-support applications
Figure labels: 65M msgs/day; 53M blogs, 1307M posts; 115M users, 10M groups; 45M reviews

Summary of my other work: Text Information Management
Areas: Text Mining, Opinion Integration and Summarization, Bioinformatics, Information Retrieval
Opinion Integration and Summarization: [COLING 10] [WWW 09] [WWW 08] [WWW 10] [WWW 11] [KDD 10]
[IRJ 10] “Investigation of Topic Models”
[NAR 07] “An open system for microarray clustering”
[NAR 10] “Bio literature mining”
[IRJ 09] “Bio literature IR”
[TREC 07]

Thank you! & Questions?

Backup Slides

References
[WWW'11] Yue Lu, Malu Castellanos, Umeshwar Dayal, ChengXiang Zhai. "Automatic Construction of a Context-Aware Sentiment Lexicon: An Optimization Approach", To Appear at WWW’11
[COLING'10] Yue Lu, Huizhong Duan, Hongning Wang and ChengXiang Zhai. "Exploiting Structured Ontology to Organize Scattered Online Opinions", In Proceedings of the 23rd International Conference on Computational Linguistics Pages:
[KDD’10] Hongning Wang, Yue Lu, and ChengXiang Zhai. "Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach", In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Pages:
[WWW'10] Yue Lu, Panayiotis Tsaparas, Alexandros Ntoulas, and Livia Polanyi. "Exploiting Social Context for Review Quality Prediction", In Proceedings of the 19th International World Wide Web Conference Pages:
[WWW'09] Yue Lu, ChengXiang Zhai and Neel Sundaresan. "Rated Aspect Summarization of Short Comments", In Proceedings of the 18th International World Wide Web Conference Pages:
[WWW'08] Yue Lu and ChengXiang Zhai. "Opinion Integration Through Semi-supervised Topic Modeling", In Proceedings of the 17th International World Wide Web Conference Pages:

Other Publications
[IRJ’10] Yue Lu, Qiaozhu Mei, ChengXiang Zhai. "Investigating Task Performance of Probabilistic Topic Models - An Empirical Study of PLSA and LDA", Information Retrieval.
[NAR’10] X. He, Y. Li, R. Khetani, B. Sanders, Yue Lu, X. Ling, C.-X. Zhai, B. Schatz. "BSQA: Integrated Text Mining Using Entity Relation Semantics Extracted from Biological Literature of Insects", Nucleic Acids Research.
[IRJ’09] Yue Lu, Hui Fang and ChengXiang Zhai. "An Empirical Study of Gene Synonym Query Expansion in Biomedical Information Retrieval", Information Retrieval Volume 12, Issue 1 (2009), Pages:
[TREC'07] Yue Lu, Jing Jiang, Xu Ling, Xin He, ChengXiang Zhai. "Language Models for Genomics Information Retrieval: UIUC at TREC 2007 Genomics Track", In Proceedings of the 16th Text REtrieval Conference.
[NAR’07] Yue Lu, Xin He and Sheng Zhong. "Cross-species microarray analysis with the OSCAR system suggests an INSR->Pax6->NQO1 neuro-protective pathway in ageing and Alzheimer's disease", Nucleic Acids Research
Categories: Bioinformatics, Biomedical IR, Topic models

Generating Candidate Lexicon Entries
Input: The LCD is great but battery is so large.
Parsed: [The/DT LCD/NN is/VBZ great] but/CC [battery/NN is/VBZ so/RB large/JJ]./.
Aspect Tagged: [The/DT (LCD/NN):SCREEN is/VBZ great/JJ] but/CC [(battery/NN):BATTERY is/VBZ so/RB large/JJ]./.
Candidates: SCREEN-great, BATTERY-large
Candidate lexicon entries to score: SCREEN-large, SCREEN-great, BATTERY-large, …: ?
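The steps on this slide (tag aspect words, then pair each aspect with the adjectives in the same clause) can be sketched as follows. This is a simplified illustration, not the pipeline used in the talk: it takes already POS-tagged tokens as input, and it approximates the bracketed clauses from the parser by splitting at coordinating conjunctions (tag CC). The keyword lists are the ones shown on the earlier "Aspects" slide; the function name is hypothetical.

```python
# Aspect keyword lists, as on the "Aspects" slide (truncated there with "…").
ASPECT_KEYWORDS = {
    "SCREEN": {"screen", "lcd", "display"},
    "BATTERY": {"battery", "power", "charger"},
    "PRICE": {"price", "cost", "money"},
}

def candidate_entries(tagged_tokens):
    """Pair each aspect noun with the adjectives found in the same clause.

    tagged_tokens: list of (word, POS tag) pairs using Penn Treebank tags.
    Clause boundaries are approximated by coordinating conjunctions (CC),
    which is an assumption standing in for a real parser.
    """
    keyword_to_aspect = {kw: aspect
                         for aspect, kws in ASPECT_KEYWORDS.items()
                         for kw in kws}
    candidates = []
    clause_aspects, clause_adjectives = [], []

    def flush_clause():
        # Emit every aspect-adjective combination seen in the finished clause.
        for aspect in clause_aspects:
            for adjective in clause_adjectives:
                candidates.append(f"{aspect}-{adjective}")
        clause_aspects.clear()
        clause_adjectives.clear()

    for word, tag in tagged_tokens:
        if tag == "CC":
            flush_clause()
        elif tag.startswith("NN") and word.lower() in keyword_to_aspect:
            clause_aspects.append(keyword_to_aspect[word.lower()])
        elif tag.startswith("JJ"):
            clause_adjectives.append(word.lower())
    flush_clause()
    return candidates

# The example sentence from the slide, with its POS tags.
sentence = [("The", "DT"), ("LCD", "NN"), ("is", "VBZ"), ("great", "JJ"),
            ("but", "CC"), ("battery", "NN"), ("is", "VBZ"), ("so", "RB"),
            ("large", "JJ"), (".", ".")]
candidates = candidate_entries(sentence)
print(candidates)  # ['SCREEN-great', 'BATTERY-large']
```

Running this over a whole collection and taking the union of the results yields the candidate entries whose scores the optimization framework then has to determine.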

Hypotheses Testing (1): Reviewer Consistency
Figure (Cellphone): density of the difference in review quality, Qg(review 1) - Qg(review 2) for reviews from the same reviewer vs. Qg(review 1) - Qg(review 3) for reviews from different reviewers
Hypothesis 1: Reviewer Consistency is supported by data

Hypotheses Testing (2-4): Social Network-based Consistencies
Figure (Cellphone): density of the difference in reviewer quality, Qg(B) - Qg(A), for four cases: B is not linked to A; B trusts A; B is co-cited with A; B is linked to A
Hypotheses 2-4: Social Network-based Consistencies are supported by data