1 © 2010 IBM Corporation
Learning to Predict Readability using Diverse Linguistic Features
Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty
The University of Texas at Austin / IBM T. J. Watson Research Center
Presented by: Young-Suk Lee

2 Outline
- Problem definition and motivations
- Data
- System and Features
- Experimental Results

3 Readability
- DARPA Machine Reading Program (MRP)
- "Readability is defined as a subjective judgment of how easily a reader can extract the information the writer or the speaker intended to convey."
- Task: given a general document, assign a readability score (1 to 5)

4 Sample Passage: High Readability
- Industrial agriculture has grown increasingly paradoxical, replacing natural processes with synthetic practices and treating farms as factories. Consequently, food has become a marketing entity rather than a necessity to sustain life. …

5 Sample Passage: Low Readability
- The word of the prince of believers may Allah God him Talk of gold this at present Reflections on the word of the prince of believers may Allah pleased with him, Prince of Believers May Allah be pleased with him: …

6 Readability: Motivations
- Remove less readable documents from web search results
- Filter out less readable documents before extracting knowledge
- Select reading materials

7 Contrast With Other Work
- Predicting readability as how well the message is conveyed, vs. reading difficulty (grades 1 to 12)
- Document sources: multiple genres, vs. a single domain, genre, or reader group

8 Outline
- Problem definition and motivations
- Data
- System and Features
- Experimental Results

9 Data
- 390 training documents
- Each document: 8 expert ratings (1 to 5); 6-10 "novice" ratings (1 to 5)
- Ratings differ by genre: nwire and wiki documents high; MT documents low

Genre      #Docs  Expert Rating  Novice Rating
nwire      56     4.93           4.23
wiki       56     4.83           4.13
weblog     55     4.46           3.75
q-trans    56     4.47           3.83
news-grp   55     4.26           3.34
ccap       56     4.13           3.53
mt         56     2.38           1.92

10 Data
[Chart: rating distributions by genre; legend: ng = newsgroup, speech = closed caption; MT docs at the low end]

11 Outline
- Problem definition and motivations
- Data
- System and Features
- Experimental Results

12 System Overview
Training Docs → Preprocessing → features (LM score, parser score, …) → Regression (WEKA)
Test Doc → trained model → System Rating
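The pipeline on this slide can be sketched end-to-end: extract a feature value per document, fit a regressor on the training documents, then score a test document. This is a minimal illustration in plain Python with made-up feature values and ratings; the actual system uses WEKA's regression algorithms over many features.

```python
# Minimal sketch of the pipeline: per-document feature values -> fit a
# regression model on training docs -> score a test doc.
# Feature values and ratings below are illustrative, not from the paper.

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Toy training data: (LM-based score, expert readability rating)
train_scores = [0.9, 0.8, 0.3, 0.7, 0.2]
train_ratings = [4.9, 4.5, 2.4, 4.2, 1.9]

a, b = fit_linear(train_scores, train_ratings)

def predict(score):
    return a * score + b

# A test document with a high LM score gets a high predicted rating.
print(round(predict(0.85), 2))
```

The real system combines many features, so a multivariate regressor is needed; the single-feature closed form above only shows the train-then-score flow.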

13 Syntactic Features
- Using the Sundance [Riloff & Phillips 04] and English Slot Grammar (ESG) parsers:
  – Ratio of sentences without verbs
  – Avg. # clauses per sentence
  – Avg. # NPs, VPs, PPs, phrases per sentence
  – Failure rate of the ESG parser
  – ..
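One of these ratios can be illustrated directly. The sketch below computes the fraction of verbless sentences from POS-tagged input; the tagged sentences are hypothetical stand-ins, since the paper derives these counts from Sundance / ESG parser output instead.

```python
# Hedged sketch: the "ratio of sentences without verbs" feature,
# computed from hypothetical (word, POS-tag) pairs rather than real
# parser output.

def verbless_ratio(tagged_sentences):
    """Fraction of sentences that contain no verb tag (VB*)."""
    no_verb = sum(1 for sent in tagged_sentences
                  if not any(tag.startswith("VB") for _, tag in sent))
    return no_verb / len(tagged_sentences)

sents = [
    [("The", "DT"), ("farm", "NN"), ("grew", "VBD")],  # has a verb
    [("Breaking", "VBG"), ("news", "NN")],             # has a verb
    [("No", "DT"), ("verb", "NN"), ("here", "RB")],    # verbless
]
print(verbless_ratio(sents))  # 1 of 3 sentences lacks a verb
```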

14 Language Model (LM) Features
- Normalized document probability, computed by a 5-gram generic LM
- Genre-specific LMs:
  – Data readily available for those genres
  – Certain genres are strong predictors of readability
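The intuition behind normalizing the document probability is that a raw probability shrinks with document length, so it is normalized by the word count, i.e. the geometric mean of per-word probabilities. A minimal sketch, assuming we already have per-word log-probabilities (the values below are made up; a real system would take them from a 5-gram LM):

```python
import math

# Sketch of the normalized-document-probability idea:
# NP(d) = P(d)^(1/|d|), the geometric mean of per-word probabilities,
# so documents of different lengths are comparable.

def normalized_doc_prob(word_logprobs):
    """Geometric mean of per-word probabilities (natural-log inputs)."""
    avg_logprob = sum(word_logprobs) / len(word_logprobs)
    return math.exp(avg_logprob)

fluent = [-2.1, -1.8, -2.4, -2.0]   # plausible text: higher word probs
garbled = [-6.5, -7.2, -5.9, -6.8]  # MT-like text: lower word probs
print(normalized_doc_prob(fluent) > normalized_doc_prob(garbled))
```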

15 Genre-based Language Model Features
- Perplexity of a genre-specific LM M_j over the document's words w_i given their histories h_i (standard form):
  $\mathrm{PPL}_j(d) = P_{M_j}(w_1 \ldots w_n)^{-1/n}$
- Genre posterior perplexity (relative probability compared to all G genres):
  $\Pr(M_j \mid w_i, h_i) = \dfrac{P_{M_j}(w_i \mid h_i)}{\sum_{k=1}^{G} P_{M_k}(w_i \mid h_i)}$
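These two quantities can be sketched numerically. The code below assumes we already have each genre model's log-probability for the document; the genre names and numbers are illustrative, and the log-sum-exp style normalization is a standard numerical detail, not from the slides.

```python
import math

# Sketch of genre-based LM features: standard perplexity, and each
# genre's probability relative to all G genres. Inputs are illustrative
# log-probabilities, assumed to come from genre-specific LMs.

def perplexity(word_logprobs):
    """Inverse geometric mean of per-word probabilities."""
    return math.exp(-sum(word_logprobs) / len(word_logprobs))

def genre_posteriors(doc_logprobs_by_genre):
    """Relative probability of each genre model, normalized over genres."""
    # Subtract the max log-prob for numerical stability.
    m = max(doc_logprobs_by_genre.values())
    unnorm = {g: math.exp(lp - m) for g, lp in doc_logprobs_by_genre.items()}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

post = genre_posteriors({"nwire": -120.0, "weblog": -125.0, "mt": -140.0})
print(max(post, key=post.get))  # the genre whose LM fits the document best
```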

16 Lexical Features
- Fraction of known words, using a dictionary and a gazetteer of names
- Out-of-vocabulary (OOV) rates, using genre-based corpora
- Ratio of function words ("the", "of", etc.)
- Ratio of pronouns
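Two of these ratios are simple enough to sketch. The dictionary and function-word list below are tiny stand-ins; the paper uses a full dictionary plus a gazetteer of names.

```python
# Hedged sketch of two lexical ratios: fraction of known words and
# fraction of function words. Word lists are toy stand-ins.

FUNCTION_WORDS = {"the", "of", "a", "an", "in", "to", "and"}
KNOWN_WORDS = FUNCTION_WORDS | {"farm", "food", "life", "grew"}

def lexical_ratios(tokens):
    tokens = [t.lower() for t in tokens]
    known = sum(1 for t in tokens if t in KNOWN_WORDS)
    func = sum(1 for t in tokens if t in FUNCTION_WORDS)
    n = len(tokens)
    return known / n, func / n

# "zqxv" plays the role of an unknown (OOV-like) token.
known_ratio, func_ratio = lexical_ratios(
    ["The", "farm", "grew", "zqxv", "food"])
print(known_ratio, func_ratio)  # 4/5 known words, 1/5 function words
```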

17 Experiments: Evaluation Metric
- Pearson correlation coefficient, with the mean expert-judge rating as the gold standard
- To compare with novice judges:
  – A sampling distribution representing the performance of novice judges was generated
  – Its mean and upper critical value were computed
- Correlation between system and mean expert ratings:
  – If above the upper critical value, the system is statistically significantly better than the novice judges
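The metric itself is the standard Pearson correlation between the system's ratings and the mean expert ratings. A self-contained sketch with illustrative ratings:

```python
import math

# Sketch of the evaluation metric: Pearson correlation between system
# ratings and mean expert ratings. The rating values are illustrative.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system_ratings = [4.8, 4.1, 2.5, 3.9]
expert_means = [4.9, 4.3, 2.4, 4.2]
print(round(pearson(system_ratings, expert_means), 3))
```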

18 Outline
- Problem definition and motivations
- Data
- System and Features
- Experimental Results

19 Experiments: Methodology
- Compared regression algorithms
- Feature ablation experiments
- Results: 13-fold cross-validation with balanced genre representation
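"Balanced genre representation" in the folds can be sketched by distributing each genre's documents round-robin across the folds, so every fold sees every genre. The fold-assignment scheme and document IDs below are illustrative, not the paper's exact procedure.

```python
# Hedged sketch: genre-balanced cross-validation folds via round-robin
# assignment of each genre's documents. Document IDs are made up.

def balanced_folds(docs_by_genre, k):
    folds = [[] for _ in range(k)]
    for genre, docs in docs_by_genre.items():
        for i, doc in enumerate(docs):
            folds[i % k].append(doc)
    return folds

folds = balanced_folds(
    {"nwire": ["n1", "n2", "n3"], "mt": ["m1", "m2", "m3"]}, 3)
print(folds)  # each fold holds one nwire and one mt document
```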

20 Results: Regression Algorithms
[Chart: correlation of each regression algorithm, with the novice distribution mean and upper critical value marked]
Choice of regression algorithm is not critical.

21 Results: Feature Sets
[Chart: correlation per feature set, with the novice distribution mean and upper critical value marked]
Each feature set contributes; the LM-based feature set is the most useful.

22 Results: Genre-based Feature Sets
[Chart: correlation of genre-independent vs. genre-specific features, with the novice distribution mean and upper critical value marked]
Genre-independent features: better than the novice mean; genre-specific features significantly improve performance.

23 Results: Individual Feature Sets
[Chart: correlation per individual feature set and for the system using all features, with the novice distribution mean and upper critical value marked]
Posterior perplexities are the best single feature set, but no single feature set is indispensable.

24 Official Evaluation
- Conducted by SAIC on behalf of DARPA
- Three teams participated
- Evaluation task: predict the readability of 150 test documents using the 390 documents for training

25 Official Evaluation Results
[Chart: correlation vs. the novice mean and upper critical value; significantly better than human at p < 0.0001]
Our system performed favorably and scored better than the upper critical value.

26 Conclusions
- Readability system: regression over syntactic, lexical, and language-model features
- All features contribute, but LM features are the most useful
- The system is statistically significantly better than novice human judges

27 Questions? Thank You!
