A Weakly-Supervised Approach to Argumentative Zoning of Scientific Documents. Yufan Guo, Anna Korhonen, Thierry Poibeau. Review by: Pranjal Singh. Paper Presentation: CS671. 1

Overview. The paper addresses the annotation of documents according to broad classes such as objective, methodology, and results. Specifically, it investigates the performance of weakly-supervised learning for Argumentative Zoning (AZ) of scientific abstracts. Weakly-supervised learning is far less expensive than a fully supervised approach, since it requires only a small amount of labelled data. 2

What is Argumentative Zoning? AZ is an approach to information structure that provides an analysis of the rhetorical progression of the scientific argument in a document (Teufel and Moens, 2002). Given the utility of this analysis in several domains (such as summarization and other computational-linguistics tasks), a weakly-supervised approach is much more practical than full manual annotation. 3

Architecture: Data. Guo et al. (2010) provide a corpus of 1,000 biomedical abstracts (consisting of 7,985 sentences) annotated according to three schemes of information structure (section names, AZ, and Core Scientific Concepts). The paper uses the Argumentative Zoning scheme (Mizuta et al., 2006). According to Cohen's kappa (Cohen, 1960), the inter-annotator agreement on this scheme is relatively high. 4

Architecture: Methods. The paper compares supervised classifiers (SVM and CRF) with four weakly-supervised classifiers: two based on semi-supervised learning (transductive SVM and semi-supervised CRF) and two based on active learning (active SVM, alone and in combination with self-training). 5

Zones in AZ 6

An example of an annotated abstract 7

Methodology: Feature Extraction. Sentences were first represented by a set of features, including: Location (abstracts were divided into ten parts, capturing where zones typically occur), Words, Bi-grams, Verbs, Verb Class, POS, Grammatical Relations, Subject and Object, and Voice of Verbs. These were extracted using tools such as a tokenizer (detects sentence boundaries), the C&C tools (POS tagging, lemmatization, and parsing), and an unsupervised spectral clustering method (to acquire verb classes). The lemma output was used to create the Word, Bi-gram, and Verb features. 8
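A minimal Python sketch of this feature-extraction step, assuming simple tokenized and lemmatized input. The Location feature bins a sentence into one of ten zones of the abstract, and word and bi-gram features are built from the tokens. The paper's actual pipeline (C&C tools, spectral verb clustering) is not reproduced here.

```python
def extract_features(sentence_tokens, sent_index, n_sentences):
    """Turn one tokenized, lemmatized sentence into a feature dict."""
    feats = {}
    # Location: which of ten equal parts of the abstract the sentence falls in
    feats["zone"] = int(10 * sent_index / n_sentences)
    # Word and bi-gram features (built from the lemma output in the paper)
    feats.update({f"word={w}": 1 for w in sentence_tokens})
    feats.update({f"bigram={a}_{b}": 1
                  for a, b in zip(sentence_tokens, sentence_tokens[1:])})
    return feats

# Example: third sentence of a ten-sentence abstract
print(extract_features(["we", "propose", "a", "method"], 2, 10))
```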

SVM (Support Vector Machine). SVMs are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. 9
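As a hedged illustration of such a classifier, the following scikit-learn sketch trains a linear SVM on toy sentence data; the sentences and labels are invented, and the paper's features and implementation differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = ["we propose a new method", "results show an improvement"]
labels = ["METHOD", "RESULT"]

# Binary SVMs are combined one-vs-rest to handle multi-class labels
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["our method improves results"]))
```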

CRF (Conditional Random Field). A form of discriminative modelling that has been used successfully in various domains, such as part-of-speech tagging and other natural language processing tasks. It processes evidence bottom-up, combines multiple features of the data, and builds the probability P(sequence | data). 10
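A minimal sketch of a linear-chain CRF over the sentences of one abstract, using the sklearn-crfsuite library (an assumption: the paper does not name its CRF implementation, and the features here are toy values).

```python
import sklearn_crfsuite

# One document = one sequence of sentences; each sentence = a feature dict
X = [[{"zone": 0, "word=we": 1}, {"zone": 5, "word=results": 1}]]
y = [["OBJECTIVE", "RESULT"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)          # learns P(label sequence | data) discriminatively
print(crf.predict(X))  # decodes the most likely label sequence
```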

Results. Only one method (ASVM) identifies six out of the seven possible categories; the other methods identify five. RESULTS is the category with the most data and OBJECTIVE the one with the least. The Location feature is found to be the most important feature for ASSVM, while Voice, Verb Class, and GR contribute to general performance. The least helpful features are Word, Bi-gram, and Verb, because they suffer from data sparsity. 11

Results 12

References.
Yufan Guo, Anna Korhonen, and Thierry Poibeau. A Weakly-Supervised Approach to Argumentative Zoning of Scientific Documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 273–283, 2011.
Simone Teufel and Marc Moens. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status. Computational Linguistics, 28(4):409–445, December 2002. 13

Thank You!!! 14

Future Work. The approach to active learning could be improved in various ways, for example by experimenting with more complex query strategies such as the margin sampling algorithm of Scheffer et al. (2001) and the query-by-committee algorithm of Seung et al. (1992); a sketch of margin sampling follows below. Searching for other optimal features could also substantially improve results. It would also be very interesting to evaluate the usefulness of weakly-supervised identification of information structure for NLP tasks such as summarization and information extraction, and for practical tasks such as the manual review of scientific papers for research purposes. 15
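A small sketch of the margin-sampling query strategy mentioned above (Scheffer et al., 2001): query the unlabelled instances whose top two class probabilities are closest. The probability matrix here is invented for illustration.

```python
import numpy as np

def margin_sampling(probs, k):
    """probs: (n_instances, n_classes) class probabilities for the unlabelled pool."""
    part = np.sort(probs, axis=1)
    margins = part[:, -1] - part[:, -2]   # best minus second-best probability
    return np.argsort(margins)[:k]        # indices of the k most ambiguous instances

probs = np.array([[0.9, 0.05, 0.05], [0.4, 0.35, 0.25]])
print(margin_sampling(probs, 1))  # -> [1], the low-margin instance
```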

Linear SVM: Maximum Margin (slide adapted from Andrew W. Moore, © 2001, 2003). The decision function f(x, w, b) = sign(w · x − b) maps each input x to an estimated label (+1 or −1). The maximum-margin linear classifier is the linear classifier with the maximum margin; this is the simplest kind of SVM (called an LSVM). Support vectors are those data points that the margin pushes up against. 16
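A toy numeric check of the decision rule f(x, w, b) = sign(w · x − b) from the slide; the weight vector and inputs are invented.

```python
import numpy as np

w, b = np.array([2.0, -1.0]), 0.5
for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(x, "->", np.sign(w @ x - b))   # +1 and -1 sides of the hyperplane
```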

CRF (Conditional Random Field). Each attribute of the data we are trying to model fits into a feature function that associates the attribute with a possible label: the function takes a positive value if the attribute appears in the data, and zero if it does not. Each feature function carries a weight that gives the strength of that feature function for the proposed label. High positive weights indicate a good association between the feature and the proposed label; high negative weights indicate a negative association; weights close to zero indicate that the feature has little or no impact on the identity of the label. A minimal illustration follows below. 17
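A minimal illustration, with an invented weight, of how such binary feature functions and their weights combine into a score for a proposed label.

```python
def f_word_propose_method(tokens, label):
    # Fires (value 1) when the attribute "propose" co-occurs with label METHOD
    return 1 if "propose" in tokens and label == "METHOD" else 0

weights = {f_word_propose_method: 2.3}   # high positive weight: good association

def score(tokens, label):
    return sum(w * f(tokens, label) for f, w in weights.items())

print(score(["we", "propose", "a", "model"], "METHOD"))  # 2.3
print(score(["we", "propose", "a", "model"], "RESULT"))  # 0.0
```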

Discussion. Almost all of the methods performed as they have in other domains and in work by other authors. However, TSVM did not perform better than SVM with the same amount of labelled data; this could be due to the higher-dimensional data in this work compared with other studies. SSCRF also did not perform as expected on the data, possibly because of the small number of labelled and unlabelled instances. 18

Methodology: Machine Learning Methods. SVM and CRF were used as supervised methods on the data obtained after feature extraction. Four weakly-supervised methods were used: Active SVM, which starts with a small amount of labelled data and iteratively chooses a proportion of the unlabelled data on which the SVM is least confident, to be labelled and used in the next round of learning (see the sketch below); Active SVM with self-training; Transductive SVM, which takes advantage of both labelled and unlabelled data; and semi-supervised CRF. 19
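A hedged sketch of the active-SVM loop just described: train on a small labelled seed, then repeatedly query the pool instances on which the current model is least confident. The names (oracle, X_pool) are stand-ins, not the paper's code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def active_svm(X_seed, y_seed, X_pool, oracle, rounds=5, batch=10):
    X, y = list(X_seed), list(y_seed)
    pool = list(range(len(X_pool)))
    for _ in range(rounds):
        clf = LinearSVC().fit(X, y)
        df = clf.decision_function([X_pool[i] for i in pool])
        # Confidence: distance from the hyperplane (binary case) or the
        # largest one-vs-rest score (multi-class case)
        conf = np.abs(df) if df.ndim == 1 else df.max(axis=1)
        for i in [pool[j] for j in np.argsort(conf)[:batch]]:
            X.append(X_pool[i])      # query the oracle (human annotator)
            y.append(oracle(i))
            pool.remove(i)
    return LinearSVC().fit(X, y)    # final model on seed + queried data
```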

Results. With 10% of the training data, ASSVM performs best, with 81% accuracy and a macro F-score of 0.76; ASVM follows with 80% accuracy and an F-score of 0.75. Both outperform the supervised SVM. TSVM is the worst-performing SVM-based method, with 76% accuracy and an F-score of 0.73, below the supervised SVM; it nevertheless outperforms both CRF-based methods. 20