1 Co-Training for Cross-Lingual Sentiment Classification Xiaojun Wan ( 萬小軍 ) Associate Professor, Peking University ACL 2009.

2 Research Gap
Opinion mining has drawn much attention recently
–Sentiment classification (POS, NEG, NEU)
–Subjectivity analysis (subjective, objective)
Annotated corpora are most important for training
However, most of them are English data
Corpora for other languages, including Chinese, are rare

3 Related Work
Pilot studies on cross-lingual subjectivity classification
Mihalcea et al., ACL 2007
–Bilingual lexicon and manually translated parallel corpus
Banea et al., EMNLP 2008
–English annotation tool + MT
–Build Romanian annotation tool
–Not much loss compared to human translation
–Suggesting MT is a viable way

4 Problem Definition
Perform cross-lingual sentiment classification
–Either positive or negative
Source: English
Target: Chinese
Leverage
–8000 labeled English product reviews
–1000 unlabeled Chinese product reviews
–Machine translation (MT)
Derive
–Sentiment classification tools for Chinese product reviews

5 Framework
–Training Phase
–Classification Phase

6 Training Phase (1) Machine Translation

7 Two Views
–Chinese View
–English View
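
The two views can be pictured as each review carrying both an English and a Chinese version of its text. The following is a minimal sketch, assuming the labeled English reviews are machine-translated into Chinese and the unlabeled Chinese reviews into English. The `Review` class, `build_two_views`, and the translator callables are hypothetical names for illustration; no specific MT system is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    en: str                      # English view of the review
    cn: str                      # Chinese view of the review
    label: Optional[int] = None  # +1 positive, -1 negative, None if unlabeled


def build_two_views(
    labeled_en: List[Tuple[str, int]],
    unlabeled_cn: List[str],
    en_to_cn: Callable[[str], str],  # hypothetical MT callable (assumption)
    cn_to_en: Callable[[str], str],  # hypothetical MT callable (assumption)
) -> Tuple[List[Review], List[Review]]:
    """Give every review both views by translating the side it lacks."""
    labeled = [Review(en=text, cn=en_to_cn(text), label=y) for text, y in labeled_en]
    unlabeled = [Review(en=cn_to_en(text), cn=text) for text in unlabeled_cn]
    return labeled, unlabeled
```

Passing the translators in as callables keeps the sketch independent of any particular MT service.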

8 Training Phase (2) The Co-Training Approach English View

9 Label the unlabeled data (English)
English classifier (SVM) labels the unlabeled data
E_en = the top p positive and top n negative most confident reviews

10 Label the unlabeled data (Chinese)
Chinese classifier (SVM) labels the unlabeled data
E_cn = the top p positive and top n negative most confident reviews

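11 Remove from Unlabeled Data, Finish One Iteration
E_en = the top p positive and top n negative most confident reviews (English classifier)
E_cn = the top p positive and top n negative most confident reviews (Chinese classifier)
E_en ∪ E_cn is removed from the unlabeled data
Train again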

12 Setting
#Iterations = 40
p = n = 5
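
The training loop on slides 8 to 12 can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: a simple token-vote scorer stands in for the SVM so the example stays dependency-free, the newly labeled reviews are added to the labeled set before retraining, a review is only selected if its score has the matching sign, and when the two views disagree on a review the English label is kept. None of these details are stated on the slides. The function names are hypothetical.

```python
from collections import Counter


def train(examples):
    """Stand-in for the SVM (assumption): each token votes with its labels."""
    weights = Counter()
    for tokens, label in examples:
        for tok in set(tokens):
            weights[tok] += label
    return weights


def score(weights, tokens):
    """Signed confidence: > 0 leans positive, < 0 leans negative."""
    toks = set(tokens)
    return sum(weights[t] for t in toks) / max(len(toks), 1)


def co_train(labeled, unlabeled, iterations=40, p=5, n=5):
    """labeled: (en_tokens, cn_tokens, label); unlabeled: (en_tokens, cn_tokens)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        w_en = train([(en, y) for en, _, y in labeled])
        w_cn = train([(cn, y) for _, cn, y in labeled])
        picked = {}  # pool index -> predicted label, i.e. E_en ∪ E_cn
        for view, w in ((0, w_en), (1, w_cn)):
            scores = [score(w, ex[view]) for ex in pool]
            order = sorted(range(len(pool)), key=scores.__getitem__)
            for i in order[::-1][:p]:      # top p most confident positives
                if scores[i] > 0:
                    picked.setdefault(i, +1)  # first view wins on conflict (assumption)
            for i in order[:n]:            # top n most confident negatives
                if scores[i] < 0:
                    picked.setdefault(i, -1)
        if not picked:
            break
        labeled += [(pool[i][0], pool[i][1], y) for i, y in picked.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in picked]
    # Retrain both view classifiers on the final labeled set.
    w_en = train([(en, y) for en, _, y in labeled])
    w_cn = train([(cn, y) for _, cn, y in labeled])
    return w_en, w_cn, labeled
```

The defaults mirror the slide's setting (40 iterations, p = n = 5); with those values at most 40 × 2 × (5 + 5) = 800 reviews can be moved out of the unlabeled pool.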

13 Classification Phase
Chinese Classifier + English Classifier → average, a score in [-1, 1]
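
The averaging step can be sketched as below. It assumes each classifier already emits a score in [-1, 1] and that the averaged score is thresholded at 0, with ties going to negative. The threshold and the tie rule are assumptions, since the slide only shows the average. `combine_views` is a hypothetical name.

```python
def combine_views(score_en: float, score_cn: float):
    """Average the English and Chinese classifier scores and pick a polarity."""
    avg = (score_en + score_cn) / 2.0
    # Threshold at 0 is an assumption; the slide only states the average.
    polarity = "positive" if avg > 0 else "negative"
    return avg, polarity


avg, polarity = combine_views(0.5, -0.25)
```

Here the English classifier leans positive more strongly than the Chinese classifier leans negative, so the averaged score stays above zero.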

14 Experiment Setting (Training)
8000 Amazon product reviews
–4000 positive, 4000 negative
–Books, DVDs, electronics
1000 unlabeled Chinese product reviews
–mp3 players, mobile phones, DC (digital cameras)

15 Experiment Setting (Testing)
886 Chinese product reviews from
–451 positive, 435 negative
–Different from unlabeled training data (outside testing)

16 Baseline
SVM
–Use only labeled data
TSVM (Transductive SVM)
–Joachims, 1999
–Use both labeled and unlabeled data

17 SVM Baselines SVM(EN) SVM(CN)

18 SVM Baselines SVM(ENCN1)

19 SVM Baselines SVM(ENCN2) average

20 TSVM Baselines TSVM(EN) TSVM(CN)

21 TSVM Baselines TSVM(ENCN1)

22 TSVM Baselines TSVM(ENCN2) average

23 Result: Method Comparison (1)

24 Result: Method Comparison (2) Performance on Each Side SVM(EN) TSVM(EN) CoTrain(EN)

25 Result: Method Comparison (3)
Method        Accuracy
SVM(EN)       0.738
TSVM(EN)      0.769
CoTrain(EN)   0.790
SVM(CN)       0.771
TSVM(CN)      0.767
CoTrain(CN)   0.775
CoTrain makes better use of unlabeled Chinese reviews than TSVM

26 Result: Iteration Number
Co-training outperforms TSVM(ENCN2) after 20 iterations

27 Result: Balance of (p, n)
Unbalanced examples (p ≠ n) hurt performance badly

28 Conclusion & Comment
Co-Training approach for cross-lingual sentiment classification
Future Work
–Translated and natural text have different feature distributions
–Domain adaptation algorithms (e.g. structural correspondence learning) for linking them

29 Comment Leverage word (phrase) alignment in translated text