Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.

1 Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011

2 Domain-Specific Resources
Jin Zhao and Min-Yen Kan, WING, NUS, 13/05/2011

3 Domain-Specific Resources
– Modular arithmetic page from Wikipedia
– Modular arithmetic page from Interactivate.com
Domain-specific resources target varying audiences.

4 Challenge for a Domain-Specific Search Engine
How to measure readability for domain-specific resources?

5 Literature Review
Heuristic-based Readability Measures
– Weighted sum of text feature values
– Examples:
  • Flesch-Kincaid Reading Ease (FKRE) [Flesch48]
  • Dale-Chall readability formula [Dale&Chall48]
Quick and indicative, but these measures often oversimplify.
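
As an illustration of a heuristic measure built as a weighted sum of text features, the sketch below computes the Flesch Reading Ease score in its commonly stated form: 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The function names and the vowel-group syllable counter are assumptions made for illustration; the counter is only a rough approximation and is not a component described in the talk.

```python
import re

def count_syllables(word):
    """Rough estimate (an assumption): number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Weighted sum of two text features: sentence length and word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = "The cat sat on the mat."
hard = "Modular arithmetic generalizes congruence relations."
print(flesch_reading_ease(easy), flesch_reading_ease(hard))
```

On this scale a higher score means easier text. The formula sees only surface features, so it has no notion of how difficult a concept such as "congruence" is within its domain.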

6 Literature Review
Natural Language Processing and Machine Learning Approaches
– Extract deep text features and use supervised learning methods to generate models for readability measurement
– Text features
  • Unigram [Collins-Thompson04], parse tree height [Schwarm05], discourse relations [Pitler08]
– Supervised learning techniques
  • Support Vector Machine (SVM) [Schwarm05], k-Nearest Neighbor (KNN) [Heilman07]
More accurate, but an annotated corpus is required and the models are ignorant of domain-specific concepts.

7 Literature Review
Domain-Specific Readability Measures
– Derive information about domain-specific concepts from expert knowledge sources
– Examples:
  • Open Access and Collaborative Consumer Health Vocabulary [Kim07]
  • Medical Subject Headings ontology [Yan06]
– Handle domain-specific concepts, but expert knowledge sources are still expensive and not always available
Key qualities of a good readability measure: effective, portable and domain-aware.

8 Intuitions
– A domain-specific resource is less readable if it contains more difficult concepts
– A domain-specific concept is more difficult if it appears in less readable resources
Use an iterative computation algorithm to estimate these two scores from each other
Example:
– Pythagorean theorem vs. ring theory

9 Iterative Computation (IC) Algorithm
Graph Construction
– Construct a graph representing resources, concepts and occurrence information
Score Computation
– Initialize and iteratively compute the readability score of domain-specific resources and the difficulty score of domain-specific concepts
– Two versions: heuristic and probabilistic
Required Input
– A collection of domain-specific resources
– A list of domain-specific concepts

10 Graph Construction
Resource 1: …Pythagorean theorem can be written as a² + b² = c², where c represents the length of the hypotenuse…
Resource 2: …The sine function (sin) can be defined as the ratio of the side opposite the angle to the hypotenuse…
Concept List: …, right triangle, Pythagorean theorem, hypotenuse, sine function, cosine function, …
[Figure: graph with resource nodes (Resource 1, Resource 2) and concept nodes (Pythagorean theorem, hypotenuse, sine function, right triangle, cosine function)]
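
The graph construction step can be sketched as follows, assuming that an edge links a resource to every listed concept whose name occurs in the resource's text. The transcript does not say how occurrences are detected, so the case-insensitive substring match and the name `build_graph` are illustrative assumptions.

```python
def build_graph(resources, concepts):
    """Return (resource, concept) edges for every concept mentioned in a resource.

    Matching is a case-insensitive substring test (an assumption).
    """
    edges = []
    for name, text in resources.items():
        lowered = text.lower()
        for concept in concepts:
            if concept.lower() in lowered:
                edges.append((name, concept))
    return edges

# Text fragments and concept list taken from the slide.
resources = {
    "Resource 1": "Pythagorean theorem can be written as a^2 + b^2 = c^2, "
                  "where c represents the length of the hypotenuse",
    "Resource 2": "The sine function (sin) can be defined as the ratio of "
                  "the side opposite the angle to the hypotenuse",
}
concepts = ["right triangle", "Pythagorean theorem", "hypotenuse",
            "sine function", "cosine function"]

edges = build_graph(resources, concepts)
for resource, concept in edges:
    print(resource, "--", concept)
```

With these two fragments, "hypotenuse" is shared by both resources, while "right triangle" and "cosine function" receive no edges. Plain substring matching is fragile: a text that mentions only "cosine function" would also match "sine function", so a real system would likely need token-level matching.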

11 Score Computation (Heuristic)
Initialization
– Resource node: FKRE score
– Concept node: average score of its adjacent nodes
  Resource nodes: w = 1.00, x = 3.00, y = 2.00, z = 4.00
  Concept nodes: a = 2.00, b = 2.50, c = 3.00
Iterative Computation
– Each node: its score from the previous iteration + the average of the previous-iteration scores of its adjacent nodes
Iteration 1
  Resource nodes: w = 3.00, x = 5.25, y = 4.75, z = 7.00
  Concept nodes: a = 4.00, b = 5.00, c = 6.00

12 Score Computation (Heuristic)
Iteration 2
  Resource nodes: w = 7.00, x = 9.75, y = 10.25, z = 13.00
  Concept nodes: a = 8.13, b = 10.00, c = 11.88
Iteration 3
  Resource nodes: w = 15.13, x = 18.82, y = 21.19, z = 24.88
  Concept nodes: a = 16.51, b = 20.00, c = 23.51
Termination Condition
– The rank order of the resource nodes stabilizes
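
The worked example on these two slides can be reproduced with a short sketch. The edge list is an assumption: the transcript does not preserve the graph drawing, but the edges w–a, x–a, x–b, y–b, y–c, z–c are consistent with every score shown. Function names are illustrative.

```python
# Edges inferred from the slide's numbers (an assumption, see above).
EDGES = [("w", "a"), ("x", "a"), ("x", "b"), ("y", "b"), ("y", "c"), ("z", "c")]
# Initial resource scores as given on the slide.
FKRE = {"w": 1.0, "x": 3.0, "y": 2.0, "z": 4.0}

def build_adjacency(edges):
    adjacency = {}
    for resource, concept in edges:
        adjacency.setdefault(resource, []).append(concept)
        adjacency.setdefault(concept, []).append(resource)
    return adjacency

def mean(values):
    return sum(values) / len(values)

def initialize(adjacency, resource_scores):
    """Resources keep their given score; concepts average their adjacent resources."""
    scores = dict(resource_scores)
    for node, neighbours in adjacency.items():
        if node not in resource_scores:
            scores[node] = mean([resource_scores[n] for n in neighbours])
    return scores

def iterate(adjacency, scores):
    """New score = previous score + mean of the neighbours' previous scores."""
    return {node: scores[node] + mean([scores[n] for n in adjacency[node]])
            for node in adjacency}

def rank_order(scores, nodes):
    return sorted(nodes, key=lambda n: scores[n])

adjacency = build_adjacency(EDGES)
scores = initialize(adjacency, FKRE)
history = [scores]
for _ in range(3):
    scores = iterate(adjacency, scores)
    history.append(scores)

for step, snapshot in enumerate(history):
    rounded = {n: round(v, 2) for n, v in sorted(snapshot.items())}
    print(step, rank_order(snapshot, FKRE), rounded)
```

The exact values at iteration 3 are 18.8125 for x, 16.5 for a and 23.5 for c; the slide's 18.82, 16.51 and 23.51 appear to come from carrying rounded intermediate values. In this example the resource ranking is w, y, x, z at initialization and iteration 1, and w, x, y, z at iterations 2 and 3.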

13 Score Computation (Heuristic)
Single-valued score for each node
– Unable to handle concepts of varying difficulties
Simple averaging in score computation
– Difficult to incorporate sophisticated computational mechanisms

14 Score Computation (Probabilistic)
Initialization
– Resource node: sentence sampling
– Concept node: resource sampling
[Figure: resource nodes w, x, y, z and concept nodes a, b, c at initialization]

15 Score Computation (Probabilistic)
Iterative Computation
– Modified Naïve Bayes classification
[Equations: Original, Modified, Direct Adaptation; applied to resource nodes and concept nodes]

16 Evaluation
Key qualities of a good readability measure
– Effectiveness
– Portability
– Domain-awareness

17 Effectiveness
Corpus of Math Webpages
Metrics:
– Pairwise accuracy
– Spearman’s rho
Baselines:
– Heuristic
  • FKRE
– Supervised learning
  • NB, SVM, MaxEnt
  • Binary concept features only
Results:
  Method   Pairwise  Spearman  Iterations
  FKRE     .72       .48       -
  NB       .72       .52       -
  SVM      .80       .70       -
  MaxEnt   .82       .67       -
  HIC      .87       .75       18
  PIC      .85       .73       7
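
The two metrics can be sketched as below. The exact definitions used in the evaluation are not given in the transcript, so this assumes the usual ones: pairwise accuracy is the fraction of resource pairs with different gold levels that the predicted scores order the same way, and Spearman's rho is the Pearson correlation of average ranks. The gold levels in the example are hypothetical; the predicted scores are the iteration 1 values from the heuristic example.

```python
from itertools import combinations

def pairwise_accuracy(gold, predicted):
    """Fraction of pairs with different gold levels that are ordered correctly."""
    correct = total = 0
    for i, j in combinations(range(len(gold)), 2):
        if gold[i] == gold[j]:
            continue  # tied gold levels carry no ordering information
        total += 1
        if (gold[i] - gold[j]) * (predicted[i] - predicted[j]) > 0:
            correct += 1
    return correct / total

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the average ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

gold = [1, 2, 3, 4]                 # hypothetical gold readability levels
predicted = [3.0, 5.25, 4.75, 7.0]  # iteration 1 scores for w, x, y, z
print(pairwise_accuracy(gold, predicted), spearman_rho(gold, predicted))
```

Both metrics depend only on the ordering of the scores, which fits an algorithm whose raw scores grow with every iteration.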

18 Portability
Different selection strategies
– Resource selection at random
– Concept selection at random
– Resource selection by quality
– Concept selection by TF.IDF
Performance measurement at 5 levels
– 20%, 40%, 60%, 80% and 100% of the original resource collection / concept list
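
The random selection strategy at the five levels can be sketched as follows. The collection, the function name and the fixed seed are hypothetical, and the sketch draws an independent sample at each level; the transcript does not say whether the smaller subsets were nested inside the larger ones.

```python
import random

def select_at_random(items, percent, seed=0):
    """Return a random subset holding `percent` percent of `items` (at least one)."""
    rng = random.Random(seed)
    size = max(1, len(items) * percent // 100)
    return rng.sample(items, size)

resources = [f"resource_{i}" for i in range(50)]  # hypothetical collection
subsets = {p: select_at_random(resources, p) for p in (20, 40, 60, 80, 100)}
for percent, subset in subsets.items():
    print(percent, len(subset))
```

The same function applies unchanged to a concept list. Selection by quality or by TF.IDF would replace the random draw with a sort on the corresponding score, which is not specified here.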

19 Portability
[Figures: Resource Selection Strategies; Concept Selection Strategies]

20 Portability
  Method   Pairwise  Spearman
  FKRE     .63       .28
  NB       .73       .53
  SVM      .82       .70
  MaxEnt   .76       .60
  HIC      .74       .49
  PIC      .75       .55

21 Domain-awareness
Handling of domain-specific concepts
– Simple yet effective
– Concepts of multiple difficulty levels?
  • Converge to a single value even in PIC
  • Splitting? (K-Means, GMM, etc.)
  • Other computational mechanisms?

22 Conclusion
Iterative Computation
– Estimate the readability of domain-specific resources and the difficulty of domain-specific concepts in an iterative manner
– Effective, portable and domain-aware
Future Work
– Handling of concepts of multiple difficulty levels

