© 2010 IBM Corporation Learning to Predict Readability using Diverse Linguistic Features Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty

Similar presentations
Statistical modelling of MT output corpora for Information Extraction.

1 CS 388: Natural Language Processing: N-Gram Language Models Raymond J. Mooney University of Texas at Austin.
Copyright ©2011 Commonwealth of Pennsylvania 3.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
A Metric for Software Readability by Raymond P.L. Buse and Westley R. Weimer Presenters: John and Suman.
MT Evaluation: Human Measures and Assessment Methods : Machine Translation Alon Lavie February 23, 2011.
NYU ANLP-00 1 Automatic Discovery of Scenario-Level Patterns for Information Extraction Roman Yangarber Ralph Grishman Pasi Tapanainen Silja Huttunen.
Linear Model Incorporating Feature Ranking for Chinese Documents Readability Gang Sun, Zhiwei Jiang, Qing Gu and Daoxu Chen State Key Laboratory for Novel.
Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions Eric Breck and Claire Cardie Department of Computer.
Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.
Predicting Text Quality for Scientific Articles AAAI/SIGART-11 Doctoral Consortium Annie Louis : Louis A. and Nenkova A Automatically.
Keyword extraction for metadata annotation of Learning Objects Lothar Lemnitzer, Paola Monachesi RANLP, Borovets 2007.
Approaches to automatic summarization Lecture 5. Types of summaries Extracts – Sentences from the original document are displayed together to form a summary.
1 Language Model (LM) LING 570 Fei Xia Week 4: 10/21/2009.
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
An investigation of query expansion terms Gheorghe Muresan Rutgers University, School of Communication, Information and Library Science 4 Huntington St.,
Scalable Text Mining with Sparse Generative Models
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.
CORRELATIONAL RESEARCH METHOD. The researcher wanted to determine if there is a significant relationship between the nursing personnel characteristics.
Defect prediction using social network analysis on issue repositories Reporter: Dandan Wang Date: 04/18/2011.
Handwritten Character Recognition using Hidden Markov Models Quantifying the marginal benefit of exploiting correlations between adjacent characters and.
Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.
Inferential statistics Hypothesis testing. Questions statistics can help us answer Is the mean score (or variance) for a given population different from.
What is Readability? • A characteristic of text documents. • “the sum total of all those elements within a given piece of printed material that affect.
Learning to Predict Readability using Diverse Linguistic Features Rohit J. Kate 1 Xiaoqiang Luo 2 Siddharth Patwardhan 2 Martin Franz 2 Radu Florian 2.
Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
A Comparison of Features for Automatic Readability Assessment Lijun Feng 1 Matt Huenerfauth 1 Martin Jansche 2 Noémie Elhadad 3 1 City University of New.
David L. Chen Fast Online Lexicon Learning for Grounded Language Acquisition The 50th Annual Meeting of the Association for Computational Linguistics (ACL)
2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.
Overview of Text Complexity Text complexity is defined by: 1. Quantitative measures 2. Qualitative measures – levels of meaning, structure, language conventionality.
Experimental Evaluation of Learning Algorithms Part 1.
Literacy in the Content Areas - Outcomes Reflect on Call for Change follow up tasks. Identify text features. Identify the readability statistics for a.
Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.
1 Sentence Extraction-based Presentation Summarization Techniques and Evaluation Metrics Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui.
Individual Differences in Human-Computer Interaction HMI Yun Hwan Kang.
Presenter: Jinhua Du ( 杜金华 ) Xi’an University of Technology 西安理工大学 NLP&CC, Chongqing, Nov , 2013 Discriminative Latent Variable Based Classifier.
1 Collaborative Filtering & Content-Based Recommending CS 290N. T. Yang Slides based on R. Mooney at UT Austin.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Multiple Instance Learning for Sparse Positive Bags Razvan C. Bunescu Machine Learning Group Department of Computer Sciences University of Texas at Austin.
Experimentation in Computer Science (Part 2). Experimentation in Software Engineering --- Outline  Empirical Strategies  Measurement  Experiment Process.
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning a Compositional Semantic Parser.
Hello, Who is Calling? Can Words Reveal the Social Nature of Conversations?
2003 (c) University of Pennsylvania 1 Better MT Using Parallel Dependency Trees Yuan Ding University of Pennsylvania.
CS Statistical Machine learning Lecture 12 Yuan (Alan) Qi Purdue CS Oct
Pastra and Saggion, EACL 2003 Colouring Summaries BLEU Katerina Pastra and Horacio Saggion Department of Computer Science, Natural Language Processing.
1 ICASSP Paper Survey Presenter: Chen Yi-Ting. 2 Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.
Maximum Entropy techniques for exploiting syntactic, semantic and collocational dependencies in Language Modeling Sanjeev Khudanpur, Jun Wu Center for.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
The University of Illinois System in the CoNLL-2013 Shared Task Alla RozovskayaKai-Wei ChangMark SammonsDan Roth Cognitive Computation Group University.
Arnar Thor Jensson Koji Iwano Sadaoki Furui Tokyo Institute of Technology Development of a Speech Recognition System For Icelandic Using Machine Translated.
Statistical Machine Translation Part II: Word Alignments and EM
Sentiment analysis algorithms and applications: A survey
Automated Essay Scoring The IntelliMetric® Way
Authorship Attribution Using Probabilistic Context-Free Grammars
Erasmus University Rotterdam
Aspect-based sentiment analysis
Anastassia Loukina, Klaus Zechner, James Bruno, Beata Beigman Klebanov
Learning to Sportscast: A Test of Grounded Language Acquisition
Measuring Complexity of Web Pages Using Gate
Presentation transcript:

© 2010 IBM Corporation Learning to Predict Readability using Diverse Linguistic Features
Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty
Presented by: Young-Suk Lee
The University of Texas at Austin / IBM T. J. Watson Research Center

© 2010 IBM Corporation 2 Outline
• Problem definition and motivations
• Data
• System and Features
• Experimental Results

© 2010 IBM Corporation 3 Readability
• DARPA machine reading program (MRP)
• “Readability is defined as a subjective judgment of how easily a reader can extract the information the writer or the speaker intended to convey.”
• Task: given a general document, assign a readability score (1 to 5)

© 2010 IBM Corporation 4 Sample Passage: High Readability
• Industrial agriculture has grown increasingly paradoxical, replacing natural processes with synthetic practices and treating farms as factories. Consequently, food has become a marketing entity rather than a necessity to sustain life. …

© 2010 IBM Corporation 5 Sample Passage: Low Readability
• The word of the prince of believers may Allah God him Talk of gold this at present Reflections on the word of the prince of believers may Allah pleased with him, Prince of Believers May Allah be pleased with him: …

© 2010 IBM Corporation 6 Readability: Motivations
• Remove less readable documents from web search
• Filter out less readable documents before extracting knowledge
• Select reading materials

© 2010 IBM Corporation 7 Contrast With Other Work
• Predicting readability: conveying message
  – vs. reading difficulty (grade 1 to 12)
• Document sources: multiple genres
  – vs. single domain, genre or reader group

© 2010 IBM Corporation 8 Outline
• Problem definition and motivations
• Data
• System and Features
• Experimental Results

© 2010 IBM Corporation 9 Data
• 390 training documents
• Each document:
  – 8 expert ratings: [1,…,5]
  – 6–10 “novice” ratings: [1,…,5]
• Ratings differ by genre
  – Nwire and wiki documents: high
  – MT documents: low
[Table: #Docs, Expert Rating, and Novice Rating per genre (nwire, wiki, weblog, q-trans, news-grp, ccap, mt)]

© 2010 IBM Corporation 10 Data
[Chart: distribution of ratings by genre; ng: newsgroup, speech: closed caption; MT docs at the low end]

© 2010 IBM Corporation 11 Outline
• Problem definition and motivations
• Data
• System and Features
• Experimental Results

© 2010 IBM Corporation 12 System Overview
Training Docs → Preprocessing → feature scores (LM score, parser score, …) → Regression (WEKA)
Test Doc → same preprocessing and features → trained model → Sys. Rating
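To make the flow concrete, here is a minimal Python sketch of the pipeline, with scikit-learn standing in for WEKA; extract_features and its helpers (lm_score, parser_score), training_docs, test_doc, and mean_expert_rating are hypothetical placeholders for the components described on the next slides.

from sklearn.linear_model import LinearRegression

def extract_features(doc_text):
    # Stand-ins for the real extractors (slides 13-16): language-model
    # scores, parser-based scores, lexical statistics, ...
    return [lm_score(doc_text), parser_score(doc_text)]  # hypothetical helpers

# Fit a regressor on per-document feature vectors and gold ratings.
X_train = [extract_features(d) for d in training_docs]    # assumed corpus
y_train = [mean_expert_rating(d) for d in training_docs]  # assumed gold ratings
model = LinearRegression().fit(X_train, y_train)

# Rate an unseen document.
sys_rating = model.predict([extract_features(test_doc)])[0]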

© 2010 IBM Corporation 13 Syntactical Features
• Using the Sundance [Riloff & Phillips 04] and English Slot Grammar (ESG) parsers
  – Ratio of sentences without verbs
  – Avg. # clauses per sentence
  – Avg. # NPs, # VPs, # PPs, # phrases per sentence
  – Failure rate of the ESG parser
  – …
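As a rough sketch, two of these features can be computed as follows in Python, using spaCy as a stand-in parser (the paper used Sundance and ESG, so this is an approximation, not their implementation):

import spacy

nlp = spacy.load("en_core_web_sm")

def syntactic_features(text):
    doc = nlp(text)
    sents = list(doc.sents)
    n = max(len(sents), 1)
    # Ratio of sentences that contain no verb at all.
    no_verb = sum(1 for s in sents
                  if not any(t.pos_ in ("VERB", "AUX") for t in s))
    # Average number of noun phrases per sentence.
    avg_nps = len(list(doc.noun_chunks)) / n
    return {"no_verb_ratio": no_verb / n, "avg_nps_per_sent": avg_nps}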

© 2010 IBM Corporation 14 Language Model (LM) Features
• Normalized document probability
  – by a 5-gram generic LM
• Genre-specific LMs
  – Data readily available for those genres
  – Certain genres are strong predictors of readability
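The slide does not spell out the normalization; a natural reading is a length-normalized (per-word) document probability, which makes documents of different lengths comparable:

$$NP(D) = P_{\mathrm{LM}}(D)^{1/|D|}$$

where $|D|$ is the number of words in document $D$ and $P_{\mathrm{LM}}(D)$ is the probability assigned by the 5-gram LM.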

© 2010 IBM Corporation 15 Genre-based Language Model Features
• Perplexity of the genre-specific LM $M_j$ on a document $D = w_1 \ldots w_N$, where $h_i$ denotes the history words preceding word $w_i$:

$$PP_j(D) = \Big[ \prod_{i=1}^{N} P_{M_j}(w_i \mid h_i) \Big]^{-1/N}$$

• Genre posterior perplexity (relative probability compared to all $G$ genres):

$$P(M_j \mid D) = \frac{P(D \mid M_j)\, P(M_j)}{\sum_{g=1}^{G} P(D \mid M_g)\, P(M_g)}$$
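A minimal sketch of computing genre posteriors from per-genre document log-likelihoods, assuming a uniform genre prior (log_liks is a hypothetical list with log_liks[j] = log P(D | M_j)):

import math

def genre_posteriors(log_liks):
    # Softmax over the genre log-likelihoods, computed stably by
    # subtracting the maximum before exponentiating.
    m = max(log_liks)
    exps = [math.exp(l - m) for l in log_liks]
    z = sum(exps)
    return [e / z for e in exps]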

© 2010 IBM Corporation 16 Lexical Features
• Fraction of known words, using a dictionary and a gazetteer of names
• Out-of-vocabulary (OOV) rates using genre-based corpora
• Ratio of function words (“the”, “of”, etc.)
• Ratio of pronouns
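A minimal sketch of two of these features; FUNCTION_WORDS is abbreviated here, and vocab stands for a hypothetical vocabulary set built from a genre corpus:

FUNCTION_WORDS = {"the", "of", "a", "an", "and", "or", "to", "in", "it", "that"}

def lexical_features(tokens, vocab):
    n = max(len(tokens), 1)
    oov = sum(1 for t in tokens if t.lower() not in vocab)        # OOV count
    func = sum(1 for t in tokens if t.lower() in FUNCTION_WORDS)  # function words
    return {"oov_rate": oov / n, "function_word_ratio": func / n}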

© 2010 IBM Corporation 17 Experiments: Evaluation Metric
• Pearson correlation coefficient
  – Mean expert judge rating as the gold standard
• To compare with novice judges:
  – A sampling distribution representing the performance of novice judges was generated
  – The distribution mean and upper critical value were computed
• Correlation between system and mean expert ratings
  – If above the upper critical value: the system is statistically significantly better than the novice judges
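One plausible construction of this test in Python; the paper's exact sampling scheme may differ, and novice_ratings (a docs-by-judges array) and expert_mean (gold mean expert ratings) are assumed inputs:

import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def novice_correlation_samples(novice_ratings, expert_mean, n_samples=10000):
    rng = np.random.default_rng(0)
    n_docs, n_judges = novice_ratings.shape
    corrs = np.empty(n_samples)
    for i in range(n_samples):
        # Simulate one novice judge: sample one novice rating per document.
        cols = rng.integers(0, n_judges, size=n_docs)
        sampled = novice_ratings[np.arange(n_docs), cols]
        corrs[i] = pearson(sampled, expert_mean)
    # The mean of corrs estimates the novice distribution mean; a high
    # percentile (e.g. the 95th) gives an upper critical value to beat.
    return corrs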

© 2010 IBM Corporation 18 Outline
• Problem definition and motivations
• Data
• System and Features
• Experimental Results

© 2010 IBM Corporation 19 Experiments: Methodology
• Compared regression algorithms
• Feature ablation experiments
• Results: 13-fold cross-validation
  – Balanced genre representation (sketched below)
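A minimal sketch of this setup, assuming scikit-learn: stratifying the 13 folds by genre label keeps each fold's genre mix close to the full data's (X, y, genres are hypothetical arrays of features, gold ratings, and genre labels):

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=13, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, genres):
    # Fit the regressor on X[train_idx], y[train_idx] and collect
    # predictions on X[test_idx] for a pooled correlation score.
    ...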

© 2010 IBM Corporation 20 Results: Regression Algorithms
[Chart: correlation per regression algorithm, against the novice distribution mean and upper critical value]
Choice of regression algorithm is not critical.
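A sketch of such a comparison, with scikit-learn regressors standing in for WEKA's (X and y as in the cross-validation sketch above):

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

for name, model in [("linear regression", LinearRegression()), ("SVR", SVR())]:
    # 13-fold cross-validated predictions, scored by Pearson correlation.
    preds = cross_val_predict(model, X, y, cv=13)
    print(name, np.corrcoef(preds, y)[0, 1])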

© 2010 IBM Corporation 21 Results: Feature Sets
[Chart: correlation per feature set, against the novice distribution mean and upper critical value]
Each feature set contributes; the LM-based feature set is the most useful.

© 2010 IBM Corporation 22 Results: Genre-based Feature Sets
[Chart: correlation with genre-independent vs. genre-specific features, against the novice distribution mean and upper critical value]
Genre-independent features alone are better than the novice mean; genre-specific features significantly improve performance.

© 2010 IBM Corporation 23 Results: Individual Feature Sets
[Chart: correlation per individual feature set, with the system using all features shown for comparison]
Posterior perplexities are the best feature set, but no single feature set is indispensable.

© 2010 IBM Corporation 24 Official Evaluation
• Conducted by SAIC on behalf of DARPA
• Three teams participated
• Evaluation task: predict the readability of 150 test documents using the 390 documents for training

© 2010 IBM Corporation 25 Official Evaluation Results
[Chart: correlation for each participating system, against the novice mean and upper critical value]
Our system performed favorably and scored better than the upper critical value: significantly better than the novice humans at p < 0.0001.

© 2010 IBM Corporation 26 Conclusions
• Readability system
  – Regression over syntactical, lexical and language model features
• All features contribute, but LM features are the most useful
• The system is statistically significantly better than novice human judges

© 2010 IBM Corporation 27 Questions? Thank You!