Opinion Detection by Transfer Learning 11-742 Information Retrieval Lab Grace Hui Yang Advised by Prof. Yiming Yang.

Presentation transcript:

Opinion Detection by Transfer Learning Information Retrieval Lab Grace Hui Yang Advised by Prof. Yiming Yang

Outline Introduction The Problem Transfer Learning by Constructing Informative Prior Datasets Evaluation Method Experimental Results Conclusion

Introduction TREC 2006 Blog Track –Opinion Detection Task Number: 851 "March of the Penguins" Description: Provide opinion of the film documentary "March of the Penguins". Narrative: Relevant documents should include opinions concerning the film documentary "March of the Penguins". Articles or comments about penguins outside the context of this film documentary are not relevant.

Opinion Detection Literature Review Researchers in the Natural Language Processing (NLP) community –Turney (2002): groups online words whose pointwise mutual information is close to "excellent" and "poor" –Riloff & Wiebe (2003): use a high-precision classifier to get high-quality opinions and non-opinions, then extract syntactic patterns; repeat this process to bootstrap –Pang et al. (2002): treat opinion and sentiment detection as a text classification problem; Naive Bayes, Maximum Entropy, SVM + unigram presence features (82.9%) –Pang & Lee (2005): use minimum cuts to cluster sentences based on their subjectivity and sentiment orientation. Researchers from the data mining community –Morinaga et al. (2002): use word polarity and syntactic pattern matching rules to extract opinions, and PCA to create correspondence between product names and keywords

Existing System Query Expansion Document Retrieval Binary Text Classification by Bayesian Logistic Regression

No Available Training Data Transfer Learning –Transfer knowledge across similar tasks in different domains –Generalize knowledge from limited training data –Discover underlying general structures across domains

Transfer Learning Literature Review Baxter (1997) and Thrun (1996): both used hierarchical Bayesian learning Lawrence and Platt (2004), Yu et al. (2005): also use hierarchical Bayesian models to learn hyperparameters of a Gaussian process Ando and Zhang (2005): proposed a framework for Gaussian logistic regression for text classification Raina et al. (2006): continued this approach and built informative priors for Gaussian logistic regression

Transfer Learning The approach presented in this project is inspired by the work of Raina, Ng & Koller (2006) on text classification Transferring common knowledge (word dependence) across similar tasks by constructing an informative prior in a Bayesian logistic regression framework

Logistic Regression Framework Logistic regression assumes a sigmoid-like data distribution To avoid overfitting, a multivariate Gaussian prior is placed on θ Maximum a posteriori (MAP) estimation
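
The equations on this slide did not survive extraction. As a hedged reminder of the standard formulation (the notation below is an assumption, not taken from the slide): with feature vector x, label y in {0, 1}, weights θ and prior covariance Σ, the model and its MAP estimate can be written as

```latex
P(y = 1 \mid x; \theta) = \frac{1}{1 + \exp(-\theta^{\top} x)},
\qquad
\theta \sim \mathcal{N}(0, \Sigma)

\theta_{\mathrm{MAP}}
  = \arg\max_{\theta} \;
    \sum_{n=1}^{N} \log P(y_n \mid x_n; \theta)
    \; - \; \tfrac{1}{2}\, \theta^{\top} \Sigma^{-1} \theta
```

With Σ = σ²I the penalty reduces to ordinary L2 regularization, which is why the choice of Σ is where the transferred knowledge enters.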

Non-diagonal Covariance Zero-mean, equal-variance prior –Cannot capture relationships among words Zero-mean, non-diagonal covariance prior –Models word dependency in the covariance matrix's off-diagonal entries

Pairwise Covariance Covariance definition: Cov(θ_i, θ_j) = E[(θ_i − E[θ_i])(θ_j − E[θ_j])]. Given zero mean, Cov(θ_i, θ_j) = E[θ_i θ_j]

Get Covariance by MCMC Markov Chain Monte Carlo (MCMC): sample V (V=4) small vocabularies of size S (S=5), each containing the two words w_i and w_j corresponding to θ_i and θ_j. From each vocabulary, sample T (T=4) training sets of size Z (Z=3) and train an ordinary logistic regression model on each labeled set

Get Covariance by MCMC Subtract a bootstrap estimate of the covariance that is due to random variation in the training sets
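
The sampling procedure on the two slides above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the tiny gradient-descent trainer, and the exact form of the bootstrap correction (the within-vocabulary sample covariance, modeled on Raina, Ng & Koller 2006) are all assumptions. It is plain Monte Carlo resampling and requires i != j.

```python
import numpy as np

def fit_logreg(X, y, steps=200, lr=0.5, l2=1e-3):
    """Ordinary logistic regression by gradient descent (hypothetical helper)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        # small L2 term only keeps the weights bounded on tiny samples
        theta -= lr * (X.T @ (p - y) / len(y) + l2 * theta)
    return theta

def pair_covariance(X, y, i, j, V=4, S=5, T=4, Z=3, seed=0):
    """Estimate Cov(theta_i, theta_j) from a labeled document-term matrix X."""
    rng = np.random.default_rng(seed)
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    raw, boot = [], []
    for _ in range(V):
        # small vocabulary of size S that always contains words i and j
        vocab = [i, j] + list(rng.choice(others, size=S - 2, replace=False))
        thetas = []
        for _ in range(T):
            # training set of Z examples drawn from the labeled data
            rows = rng.choice(len(y), size=Z, replace=False)
            theta = fit_logreg(X[rows][:, vocab], y[rows])
            thetas.append((theta[0], theta[1]))
        thetas = np.array(thetas)                      # shape (T, 2)
        raw.append(np.mean(thetas[:, 0] * thetas[:, 1]))
        centered = thetas - thetas.mean(axis=0)
        # assumed correction: covariance caused only by training-set resampling
        boot.append(np.mean(centered[:, 0] * centered[:, 1]))
    return float(np.mean(raw) - np.mean(boot))
```

The default values V=4, S=5, T=4, Z=3 mirror the numbers on the slide; they look like illustrative values, and larger settings would be needed for stable estimates.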

Learning a Covariance Matrix Learning a single covariance for pairs of regression coefficients is NOT all we need Two Challenges: (1) Valid Covariance Matrix –A valid covariance matrix needs to be positive semi-definite (PSD) –Hermitian matrix (square, self-adjoint) with nonnegative eigenvalues –Project the matrix onto the PSD cone

Learning a Covariance Matrix (2) Pairwise calculations increase the complexity quadratically with vocabulary size –Represent the word dependence as a linear combination of underlying features –Learn the coefficients by least squares
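
The least-squares step can be illustrated with a short sketch. It assumes each word pair has a feature vector (a row of `F`) and a sampled covariance estimate (an entry of `e`); both names, and the helper functions, are hypothetical.

```python
import numpy as np

def fit_pair_weights(F, e):
    """Least-squares weights psi such that F @ psi approximates e.

    F: (n_pairs, n_features) features describing each word pair
    e: (n_pairs,) covariance estimates for the sampled pairs
    """
    psi, *_ = np.linalg.lstsq(F, e, rcond=None)
    return psi

def predict_covariance(F_new, psi):
    """Predicted covariance for word pairs that were never sampled directly."""
    return F_new @ psi
```

Only a subset of pairs needs a sampled estimate; the fitted weights then fill in the remaining entries of the matrix, which avoids the quadratic number of sampling runs.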

Learning a Covariance Matrix By Joint Minimization λ is the trade-off coefficient between the two objectives. –As λ → 0, only care about the PSD cone –As λ → 1, only care about the word pair relationship –Set to 0.6
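
The slide's objective did not survive extraction. One form that is consistent with the two limits listed above, modeled on the objective in Raina, Ng & Koller (2006), is sketched below. The symbols are assumptions: e_ij is the sampled covariance for a pair in the sampled set G, f_ij is that pair's feature vector, and ψ holds the feature weights.

```latex
\min_{\Sigma \succeq 0,\; \psi} \quad
  \lambda \sum_{(i,j) \in G} \left( e_{ij} - \psi^{\top} f_{ij} \right)^{2}
  \; + \; (1 - \lambda) \sum_{i,j} \left( \Sigma_{ij} - \psi^{\top} f_{ij} \right)^{2}
```

With λ near 1 only the fit to the sampled word pairs matters; with λ near 0 only the agreement between the feature-based predictions and a PSD matrix Σ matters.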

Solve the Joint Minimization Convex problem, converges to the global minimum Fix Σ, minimize over ψ –Use a Quadratic Program (QP) solver Fix ψ, minimize over Σ –A special case of semi-definite programming (SDP) –Eigen decomposition, keeping the nonnegative eigenvalues
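
The Σ step (eigen decomposition, keeping the nonnegative eigenvalues) is the standard Frobenius-norm projection onto the PSD cone. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def project_to_psd(M):
    """Nearest PSD matrix to M in Frobenius norm."""
    sym = (M + M.T) / 2.0                   # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigh is meant for symmetric input
    clipped = np.clip(eigvals, 0.0, None)   # drop the negative eigenvalues
    return (eigvecs * clipped) @ eigvecs.T  # V diag(clipped) V^T
```

For example, `[[1, 2], [2, 1]]` has eigenvalues 3 and -1, so its projection keeps only the component along the eigenvector for 3.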

Feature Design Model word dependency –WordNet synsets –and? People do not always use the same general syntactic patterns to express opinions –"blah blah is good", –"awesome blah blah!"

Target-Opinion Word Pair Different opinion targets take different customary expressions –A person is knowledgeable –A computer processor is fast –A computer processor is knowledgeable (ill-formed) –A person is fast (ill-formed) –A computer processor is running like a horse (word polarity test fails)

Target-Opinion Word Pair From training corpus, extract from a positive example –subject and object (excludes pronouns) “Melvin, pig” –subject and BE-predicate “lens, clear”, “base, heavy” –modifier and subject “good, coffee”, “interesting, movie”

Word Synonym Bridge vocabulary gap from training to testing –“This movie is good" in training corpus –"The film is really good" in the testing corpus

Feature Vector Log co-occurrence Target-Opinion Synonym

Datasets Training Corpus –Movie reviews [Pang & Lee from Cornell] 10,000 sentences (5,000 opinions, 5,000 non-opinions) –Product reviews [Hu & Liu from UIC] 4,000+ sentences (2,034 opinions, 2,173 non-opinions). Digital camera, cell phone, DVD player, Jukebox, …

Datasets Test Corpus – TREC 2006 Blog corpus –3,201,002 articles (TREC reports 3,215,171) –December 2005 to February 2006 –Technorati, Bloglines, Blogpulse … For each topic, 5,000 passages are retrieved –Using Lemur as search engine –132,399 passages in total –2,648 passages per topic –Each passage 1-10 sentences (less than 100 words)

Evaluation Method Precision at 11-point recall levels Mean average precision (MAP) Answers are provided by TREC qrels, –Document ids of documents containing an opinion Note that our system is developed for opinion detection at sentence level –A passage's score is the average score of all the sentences in the retrieved passage –Extract unique document ids to compare with TREC qrels
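
The evaluation steps can be sketched as below. This is a rough illustration under assumptions, not the TREC evaluation tool: the function names are hypothetical, and ranking each document by its best-scoring passage is an assumed way of reducing passages to unique document ids.

```python
def rank_documents(passages):
    """passages: list of (doc_id, [sentence scores]). Returns unique doc ids, best first."""
    best = {}
    for doc_id, sentence_scores in passages:
        # passage score = average of its sentence-level opinion scores
        score = sum(sentence_scores) / len(sentence_scores)
        best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best, key=best.get, reverse=True)

def average_precision(ranked_ids, relevant):
    """Mean of precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def eleven_point_precision(ranked_ids, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return [0.0] * 11
    points, hits = [], 0                     # (recall, precision) after each rank
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    levels = [k / 10 for k in range(11)]
    # interpolated precision: best precision at any recall >= the level
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]
```

MAP is then the mean of `average_precision` over all topics, with `relevant` taken from the qrels for each topic.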

Experimental Results Effects of Using Non-diagonal Prior Covariance –Baseline: Using movie reviews to train the Gaussian logistic regression model with prior ~N(0, σ²) –Feature Selection: Using common word features in movie reviews and product reviews to train the Gaussian logistic regression model with prior ~N(0, σ²) –Informative Prior: Using movie reviews to calculate the prior covariance, then training the Gaussian logistic regression model with the informative prior ~N(0, Σ)

32% improvement

Experimental Results Effects of Feature Design –Baseline: Using movie reviews to train the Gaussian logistic regression model with prior ~N(0, σ²), bi-gram model –Transfer Learning Using Synonyms: Using informative prior ~N(0, Σ) –Transfer Learning Using Target-Opinion pairs: informative prior ~N(0, Σ) –Transfer Learning Using Both: informative prior ~N(0, Σ)

A good feature

Experimental Results Effects of External Dataset Selection Negative Effect of Transfer Learning

Why Does the Negative Effect Occur? Movie reviews cover more general topics Product reviews share only 23% of the topics

Conclusion Applying Transfer Learning in Opinion Detection Transfer learning by informative prior improves on brute-force transfer learning by 32% Discovering a good feature for opinion detection –Target-Opinion pair Need to be careful when choosing external datasets to help

Thank You!