
A CORPUS-BASED STUDY OF REFERENTIAL CHOICE: Multiplicity of factors and machine learning techniques
Andrej A. Kibrik, Grigorij B. Dobrov, Mariya V. Khudyakova, Natalia V. Loukachevitch, and Aleksandr Pechenyj

2 Referential choice in discourse
- When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including:
  - Full noun phrase (NP): a proper name (e.g. Pushkin), or a common noun (with or without modifiers) = definite description (e.g. the poet)
  - Reduced NP, in particular a third person pronoun (e.g. he)

3 Example
- Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because of a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin & Turner in Dallas. He said Tandy has done …
- How is this choice made?
- Why does the speaker/writer use a certain referential option in the given context?
[Slide callouts label the full NP and pronoun options and mark the antecedent and anaphors in the passage]

4 Why is this important?
- Reference is among the most basic cognitive operations performed by language users
- Reference constitutes the lion's share of all information in natural communication
- Consider text manipulation according to the method of Biber et al. 1999:

5 Referential expressions marked in green
- Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because of a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin & Turner in Dallas. He said Tandy has done … [green marking lost in the transcript]

6 Referential expressions removed
[The same passage with its referential expressions removed; the deletions are not recoverable from the transcript]

7 Referential expressions kept
[The same passage reduced to its referential expressions only; not recoverable from the transcript]

8

9 Plan of talk
- I. Referential choice as a multi-factorial process
- II. The RefRhet corpus
- III. Machine learning-based approach
- IV. The probabilistic character of referential choice

10 I. MULTI-FACTORIAL CHARACTER OF REFERENTIAL CHOICE
- Multiple factors of referential choice
- Properties of the discourse context:
  - Distance to antecedent, along the linear discourse structure (Givón)
  - Distance to antecedent, along the hierarchical discourse structure (Fox)
  - Antecedent role (Centering theory)
- Properties of the referent:
  - Referent animacy (Dahl)
  - Protagonisthood (Grimes)

11 What shall we do with that?
- Many authors have emphasized one of these factors in particular studies
- But none of those factors can explain everything: sometimes factor A is more relevant, sometimes factor B, etc.
- One must recognize the inherently multi-factorial character of referential choice
- Factors must somehow be integrated
- Previous attempts at such integration:
  - The calculative model (Kibrik 1996, 1999)
  - The neural networks study (Gruening and Kibrik 2005)

12 The calculative approach
- Each value of each factor is assigned a numerical value
- For each referent occurrence, all factor values are easily identifiable, so all the corresponding numerical values are readily available
- At every point in discourse, all factors' contributions are summed, giving rise to an integral characterization: the referent's activation score
- Activation score can be understood:
  - In a more cognitive way, as the referent's status with respect to the speaker's working memory
  - In a more superficial way, as a conventional integral characterization of the referent vis-à-vis referential choice
- Activation score predetermines referential choice:
  - Low → full NP
  - Medium → full or reduced NP
  - High → reduced NP
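
The summation-and-thresholds logic of the calculative approach can be sketched in a few lines of Python. This is a minimal illustration only: the factor names, weights, and threshold values below are invented for the example, not the published values from Kibrik 1996, 1999.

```python
# A minimal sketch of the calculative approach. Factor names, weights,
# and thresholds are illustrative assumptions, not the published values.

ILLUSTRATIVE_WEIGHTS = {
    "antecedent_is_subject": 0.4,
    "referent_is_protagonist": 0.3,
    "referent_is_animate": 0.2,
}
DISTANCE_PENALTY = 0.1  # assumed cost per clause of linear distance


def activation_score(properties: dict, linear_distance: int) -> float:
    """Sum all factors' contributions into one integral characterization."""
    score = sum(weight for name, weight in ILLUSTRATIVE_WEIGHTS.items()
                if properties.get(name))
    return score - DISTANCE_PENALTY * linear_distance


def referential_choice(score: float) -> str:
    """Activation score predetermines referential choice via thresholds."""
    if score < 0.3:                    # low activation
        return "full NP"
    if score < 0.6:                    # medium activation
        return "full or reduced NP"
    return "reduced NP (pronoun)"      # high activation


score = activation_score({"antecedent_is_subject": True,
                          "referent_is_animate": True}, linear_distance=1)
print(f"{score:.1f} -> {referential_choice(score)}")  # 0.5 -> full or reduced NP
```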

13 Multi-factorial model of referential choice (Kibrik 1999)
[Diagram: various properties of the referent or discourse context → relevant factors → referent's activation score → referential choice]

14 The neural networks approach
- Neural networks:
  - Machine learning algorithm
  - Automatic selection of factors' weights
  - Automatic reduction of the number of factors («pruning»)
- However:
  - Small data set
  - Single method of machine learning
  - Interaction between factors remains covert
- Hence a new study:
  - Large corpus
  - Implementation of several machine learning methods
  - Statistical model of referential choice

15 II. THE RefRhet CORPUS
- English
- Business prose
- Initial material: the RST Discourse Treebank
  - Annotated for hierarchical discourse structure
  - 385 articles from the Wall Street Journal
- The added component: referential annotation
- The RefRhet corpus:
  - About … referential expressions
  - 157 texts are annotated twice
  - 193 texts are annotated once
- Why this particular corpus?

16 Example of a hierarchical graph, with rhetorical distances
[Diagram: a hierarchical discourse graph annotated with rhetorical distance (RhD) and linear distance (LinD), e.g. RhD = 1 where LinD = 4]

17 Scheme of referential annotation
- The MMAX2 program (Krasavina and Chiarcos 2007)
- All markables are annotated, including:
  - Referential expressions
  - Their antecedents
- Coreference relations are annotated
- Features of referents and context that can potentially be factors of referential choice are annotated

18

19 Work on referential annotation
- O. Krasavina
- A. Antonova
- D. Zalmanov
- A. Linnik
- M. Khudyakova
- Students of the Department of Theoretical and Applied Linguistics, MSU

20 Current state of the RefRhet referential annotation
- 2/3 completed
- Further results are based on the following data:
  - 247 texts
  - 110 thousand words
  - … markables
  - 4291 reliable «anaphor – antecedent» pairs
- Distribution of forms: proper names — 43%, definite descriptions — 26% (full NPs — 69%); pronouns — 31%

21 Factors of referential choice (2010)
- Properties of the referent:
  - Animacy
  - Protagonisthood
- Properties of the antecedent:
  - Type of syntactic phrase (phrase_type)
  - Grammatical role (gramm_role)
  - Form of referential expression (np_form, def_np_form)
  - Whether it belongs to direct speech or not (dir_speech)

22 Factors of referential choice (2010), continued
- Properties of the anaphor:
  - First vs. non-first mention in discourse (referentiality)
  - Type of syntactic phrase (phrase_type)
  - Grammatical role (gramm_role)
  - Whether it belongs to direct speech or not (dir_speech)
- Distance between the anaphor and the antecedent:
  - Distance in words
  - Distance in markables
  - Linear distance in clauses
  - Hierarchical distance in elementary discourse units

23 Factors 2011
- Gender and number (agreement): masculine, feminine, neuter, plural
- Antecedent length, in words
- Number of markables from the anaphor back to the nearest full-NP antecedent
- Ordinal number of the referent's mention in the referential chain
- Distance in sentences
- Distance in paragraphs
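
To make the factor inventory concrete, here is a sketch of how one «anaphor – antecedent» pair might be encoded for the classifiers discussed in the next section. The feature names echo the slides (gramm_role, np_form, dir_speech, the distance measures); the concrete values and the exact encoding are assumptions for illustration, not the actual RefRhet annotation format.

```python
# Hypothetical encoding of one «anaphor – antecedent» pair. Feature
# names follow the slides; the concrete values are invented.

example_pair = {
    # properties of the referent
    "animacy": "animate",
    "protagonisthood": True,
    # properties of the antecedent
    "antecedent_gramm_role": "subject",
    "antecedent_np_form": "proper_name",
    "antecedent_dir_speech": False,
    "antecedent_length_words": 2,      # a 2011 factor
    # properties of the anaphor
    "anaphor_gramm_role": "subject",
    "first_mention": False,            # referentiality
    "anaphor_dir_speech": False,
    # distances between the anaphor and the antecedent
    "dist_words": 7,
    "dist_markables": 3,
    "dist_clauses_linear": 2,
    "dist_edus_hierarchical": 1,       # hierarchical (rhetorical) distance
    "dist_sentences": 1,               # a 2011 factor
    "dist_paragraphs": 0,              # a 2011 factor
}
label = "pronoun"  # dependent variable: np_form of the anaphor
```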

24 III. MACHINE LEARNING: TECHNIQUES AND RESULTS
- Independent variables: all potential activation factors implemented in the corpus annotation
- Dependent variable: form of referential expression (np_form)
- Binary prediction: full NP vs. pronoun
- Three-way prediction: definite description vs. proper name vs. pronoun
- Accuracy maximization: ratio of correct predictions to the overall number of instances

25 Machine learning methods (Weka, a data mining system)
- Easily interpretable methods:
  - Logical algorithms: decision trees (C4.5), decision rules (JRip)
  - Logistic regression
- Quality control: cross-validation
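
A minimal sketch of this setup, assuming scikit-learn as a stand-in for Weka: DecisionTreeClassifier approximates C4.5, LogisticRegression the regression model, and 10-fold cross-validation provides the quality control. The data below are random placeholders, so the printed accuracies are meaningless.

```python
# Sketch of the evaluation setup with scikit-learn standing in for Weka.
# Random placeholder data; the printed accuracies mean nothing.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 16))            # 16 encoded factors per pair
y = rng.integers(0, 2, size=500)     # binary: 0 = full NP, 1 = pronoun

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("decision tree", DecisionTreeClassifier(max_depth=5))]:
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.3f}")
```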

26 Examples of decision rules generated by the JRip algorithm
- (Antecedent's grammatical role = subject) & (Hierarchical distance ≤ 1.5) & (Distance in words ≤ 7) => pronoun
- (Animate) & (Distance in markables ≥ 2) & (Distance in words ≤ 11) => pronoun
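
Restated as executable logic, the two rules amount to a simple disjunction over the hypothetical feature names sketched earlier; a pair matching neither rule would fall through to the majority class (full NP).

```python
# The two JRip rules above as executable logic, over the hypothetical
# feature encoding sketched earlier.

def jrip_predicts_pronoun(pair: dict) -> bool:
    rule1 = (pair["antecedent_gramm_role"] == "subject"
             and pair["dist_edus_hierarchical"] <= 1.5
             and pair["dist_words"] <= 7)
    rule2 = (pair["animacy"] == "animate"
             and pair["dist_markables"] >= 2
             and pair["dist_words"] <= 11)
    return rule1 or rule2

print(jrip_predicts_pronoun({
    "antecedent_gramm_role": "subject", "dist_edus_hierarchical": 1,
    "dist_words": 5, "animacy": "inanimate", "dist_markables": 1,
}))  # True: rule 1 fires
```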

27 2010 results with single machine-learning algorithms
- Accuracy, binary prediction:
  - logistic regression – 85.6%
  - logical algorithms – up to 84.5%
- Accuracy, three-way prediction:
  - logistic regression – 76%
  - logical algorithms – up to 74.3%

28 Composition of classifiers: boosting
- Base algorithm: C4.5 decision trees
- Iterative process
- Each additional classifier is applied to the objects that were not properly classified by the already constructed composition
- At each iteration the weight of each wrongly classified object increases, so that the new classifier focuses on such objects
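
A sketch of the boosting scheme, assuming scikit-learn's AdaBoostClassifier (version ≥ 1.2) as a stand-in for the Weka implementation and random toy data in place of the RefRhet features:

```python
# Boosting sketch: AdaBoost over shallow decision trees; scikit-learn
# and toy data stand in for Weka and the RefRhet corpus.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = rng.random((500, 16)), rng.integers(0, 2, size=500)

boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # base algorithm
    n_estimators=50,  # one classifier added per iteration
)
# Internally, each iteration increases the weights of wrongly classified
# objects, so the next classifier focuses on exactly those hard cases.
boosted.fit(X, y)
```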

29 Composition of classifiers: bagging
- Base algorithm: C4.5 decision trees
- Bagging randomly selects a subset of the training samples to train the base algorithm
- A set of algorithms is built on different, potentially intersecting, training subsamples
- The classification decision is made through a voting procedure in which all the constructed classifiers take part
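
The bagging scheme in the same sketch style: each tree is trained on a random, potentially intersecting subsample, and prediction aggregates the trees' votes. Again, scikit-learn and toy data are assumptions standing in for Weka and RefRhet.

```python
# Bagging sketch: 50 trees, each trained on a random 80% subsample;
# classification is decided by voting among all constructed classifiers.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = rng.random((500, 16)), rng.integers(0, 2, size=500)

bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base algorithm
    n_estimators=50,
    max_samples=0.8,  # random, potentially overlapping subsamples
)
bagged.fit(X, y)      # predict() aggregates the trees' votes
```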

30 Binary prediction: full noun phrase vs. pronoun

Algorithm             | Accuracy 2010 | Accuracy 2011
Logistic regression   | 85.6%         | 87.0%
Decision trees (C4.5) | 84.3%         | 86.3%
Decision rules (JRip) | 84.5%         | 86.2%
Boosting              | –             | 89.9%
Bagging               | –             | 87.6%

31 Three-way prediction: description vs. proper name vs. pronoun

Algorithm             | Accuracy 2010 | Accuracy 2011
Logistic regression   | 76.0%         | 77.4%
Decision trees (C4.5) | 74.3%         | 76.7%
Decision rules (JRip) | 72.5%         | 75.4%
Boosting              | –             | 80.7%
Bagging               | –             | 79.5%

32 Comparison of single- and multi-factor accuracy

Feature                    | Binary prediction | Three-way prediction
The largest class          | 69%               | 43%
Distance in words          | 76%               | 55%
Hierarchical distance      | 74.8%             | 53.5%
Anaphor's grammatical role | 70%               | 45.2%
Anaphor in direct speech   | 70%               | 43.8%
Animacy                    | 71.5%             | 47.3%
Combination of factors     | 89.9%             | 80.7%

33 Significance of factors in the three-way prediction

Factors                                     | Accuracy
All factors, including the newly added ones | 80.7%
without the anaphor's grammatical role      | 79.3%
without the antecedent's grammatical role   | 80.2%
without grammatical role (both)             | 79.2%
without the antecedent's referential form   | 77.0%
without protagonisthood                     | 80.0%
without animacy                             | 80.68%

34 Significance of factors in the three-way task of referential choice (continued)

Factors: distances (6)                                                     | Accuracy
All factors, including the newly added ones                                | 80.7%
without all distances                                                      | 73.5%
  except for rhetorical distance only                                      | 74.9%
  except for the distance in words only                                    | 79.0%
  except for the distances in words and paragraphs                         | 79.5%
  except for the distances in words and sentences                          | 79.7%
  except for rhetorical distance and the distances in words and sentences  | …
  except for the distances in words, markables, and paragraphs             | 80.47%

35 IV. REFERENTIAL CHOICE IS A PROBABILISTIC PROCESS
- According to Kibrik 1999:
  - Potential referential expressions: full NP only (19%), full NP / ?pronoun (21%), pronoun or full NP (28%), pronoun / ?full NP (23%), pronoun only (9%)
  - Actual referential expressions: full NP (49%), pronoun (51%)

36 Probabilistic character of referential choice in the RefRhet study
- Prediction of referential choice cannot be fully deterministic
- There is a class of instances in which referential choice is random
- It is important to tune the model so that it can process such instances in a special manner; we are beginning to explore this problem
- Logistic regression generates an estimate of probability for each referential option
- This probability estimate can be interpreted as the activation score of the earlier model
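
A sketch of this idea, under the same scikit-learn and toy-data assumptions as above: predict_proba yields a probability for each referential option, which can be read as the activation score, and pairs where neither option dominates are exactly the near-random instances mentioned in the slide. The 0.1 margin below is an arbitrary illustrative cut-off.

```python
# Reading logistic-regression probabilities as activation scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.random((500, 16)), rng.integers(0, 2, size=500)

model = LogisticRegression(max_iter=1000).fit(X, y)
for p_full_np, p_pronoun in model.predict_proba(X[:5]):
    if abs(p_pronoun - 0.5) < 0.1:   # neither option dominates
        verdict = "either (near-random instance)"
    else:
        verdict = "pronoun" if p_pronoun > 0.5 else "full NP"
    print(f"P(pronoun) = {p_pronoun:.2f} -> {verdict}")
```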

37 Probabilistic multi-factorial model of referential choice
[Diagram: various properties of the referent or discourse context → relevant factors → activation score = probability of using a certain referential expression → referential choice]

38 Conclusions about the RefRhet study
- Quantity: a large corpus of referential expressions
- Quality: a high level of prediction accuracy has already been attained, and we keep working!
- Theoretical significance: the following fundamental properties of referential choice are addressed:
  - Multi-factorial character of referential choice
  - Contribution of individual factors, assessed automatically and statistically in a variety of ways
  - Probabilistic character of referential choice
- This approach can be applied to a wide range of linguistic and other behavioral choices

39 Thank you in the CML languages
- спасибо (Russian)
- благодаря (Bulgarian)
- хвала (Serbian)
- mulţumesc (Romanian)
- ευχαριστώ (Greek)

40 5th International Conference on Cognitive Science
- See …
- Abstract submission: between October 1 and November 15