Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.

Jin Zhao and Min-Yen Kan, WING, NUS, 13/05/2011
Domain-Specific Resources

Domain-Specific Resources
[Screenshots: the modular arithmetic page from Wikipedia and the modular arithmetic page from Interactivate.com]
Domain-specific resources target varying audiences.

Challenge for a Domain-Specific Search Engine
How can we measure the readability of domain-specific resources?

Literature Review
Heuristic-based Readability Measures
– Weighted sum of text feature values
– Examples:
  • Flesch Kincaid Reading Ease (FKRE) [Flesch48]
  • Dale-Chall readability formula [Dale&Chall48]
Quick and indicative, but they often oversimplify
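
The FKRE measure above is a weighted sum of two text features: average sentence length and average syllables per word. Below is a minimal Python sketch of the Flesch Reading Ease score, 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The vowel-group syllable counter and the punctuation-based sentence splitter are crude simplifications assumed for illustration. They are not the tools used in this work.

```python
import re


def count_syllables(word):
    """Rough syllable count: vowel groups, minus a silent final 'e'.
    This heuristic is an assumption; real tools use dictionaries or better rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


simple = flesch_reading_ease("The cat sat on the mat.")
harder = flesch_reading_ease("Modular arithmetic generalizes congruence relations.")
```

The short sentence of one-syllable words scores about 116, well above the polysyllabic one. Because the formula sees only surface features, it cannot tell that "congruence" is a domain concept. The domain-specific approaches below target exactly this gap.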

Literature Review
Natural Language Processing and Machine Learning Approaches
– Extract deep text features and use supervised learning methods to build readability models
– Text features
  • Unigrams [Collins-Thompson04], parse tree height [Schwarm05], discourse relations [Pitler08]
– Supervised learning techniques
  • Support Vector Machine (SVM) [Schwarm05], k-Nearest Neighbor (kNN) [Heilman07]
More accurate, but they require an annotated corpus and are blind to domain-specific concepts

Literature Review
Domain-Specific Readability Measures
– Derive information about domain-specific concepts from expert knowledge sources
– Examples:
  • Open Access and Collaborative Consumer Health Vocabulary [Kim07]
  • Medical Subject Headings ontology [Yan06]
– Handle domain-specific concepts, but expert knowledge sources are still expensive and not always available
Key qualities of a good readability measure: effective, portable and domain-aware.

Intuitions
– A domain-specific resource is less readable if it contains more difficult concepts.
– A domain-specific concept is more difficult if it appears in less readable resources.
Use an iterative computation algorithm to estimate these two scores from each other.
Example:
– Pythagorean theorem vs. ring theory

Iterative Computation (IC) Algorithm
Graph Construction
– Construct a graph representing resources, concepts and occurrence information
Score Computation
– Initialize, then iteratively compute, the readability scores of domain-specific resources and the difficulty scores of domain-specific concepts
– Two versions: heuristic and probabilistic
Required Input
– A collection of domain-specific resources
– A list of domain-specific concepts

Graph Construction
Resource 1: …Pythagorean theorem can be written as a² + b² = c², where c represents the length of the hypotenuse…
Resource 2: …The sine function (sin) can be defined as the ratio of the side opposite the angle to the hypotenuse…
Concept list: … right triangle, Pythagorean theorem, hypotenuse, sine function, cosine function …
[Figure: graph with resource nodes Resource 1 and Resource 2 and concept nodes right triangle, Pythagorean theorem, hypotenuse, sine function and cosine function]
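
Below is a minimal sketch of the graph-construction step. It assumes resources are plain-text strings and that a resource is linked to a concept whenever the concept occurs in its text. The slides do not say how occurrences are detected, so the case-insensitive whole-word matching used here is an assumption. The function name build_graph is hypothetical. The example reuses the two snippets and the concept list above.

```python
import re


def build_graph(resources, concepts):
    """Build the bipartite resource-concept occurrence graph.

    resources: {resource_id: text}; concepts: list of concept strings.
    An edge exists when the concept occurs in the resource text
    (case-insensitive, whole-word match: an assumption of this sketch).
    Concepts found nowhere are kept as isolated nodes.
    """
    res_to_con = {rid: set() for rid in resources}
    con_to_res = {c: set() for c in concepts}
    for rid, text in resources.items():
        lowered = text.lower()
        for concept in concepts:
            # \b stops "sine function" from matching inside "cosine function"
            pattern = r"\b" + re.escape(concept.lower()) + r"\b"
            if re.search(pattern, lowered):
                res_to_con[rid].add(concept)
                con_to_res[concept].add(rid)
    return res_to_con, con_to_res


resources = {
    "Resource 1": "Pythagorean theorem can be written as a^2 + b^2 = c^2, "
                  "where c represents the length of the hypotenuse",
    "Resource 2": "The sine function (sin) can be defined as the ratio of the "
                  "side opposite the angle to the hypotenuse",
}
concepts = ["right triangle", "Pythagorean theorem", "hypotenuse",
            "sine function", "cosine function"]
r2c, c2r = build_graph(resources, concepts)
```

In this toy input, "hypotenuse" links both resources. That shared concept node is what lets scores flow between them in the iterations that follow.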

Score Computation (Heuristic)
[Figure: resource nodes w, x, y, z and concept nodes a, b, c at initialization and after iteration 1]
Initialization
– Resource node: its FKRE score
– Concept node: average score of its adjacent nodes
Iterative Computation
– Each node: original score + average of the original scores of its adjacent nodes
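
The heuristic initialization and a single iteration can be sketched as follows, under several assumptions. "Original score" is read as the score at the start of the iteration, so all nodes update synchronously. A node with no neighbours adds 0. The FKRE values 80 and 40 are made up. The names initialize and iterate_once are hypothetical. Scores stay on the FKRE scale, so in this sketch a higher concept score means an easier concept.

```python
def mean(values):
    """Arithmetic mean; 0.0 for a node with no neighbours (an assumption)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def initialize(res_to_con, con_to_res, fkre):
    """Resource nodes start at their FKRE score; each concept node starts at
    the average score of the resource nodes adjacent to it."""
    res_score = dict(fkre)
    con_score = {c: mean(res_score[r] for r in rs) for c, rs in con_to_res.items()}
    return res_score, con_score


def iterate_once(res_to_con, con_to_res, res_score, con_score):
    """New score = current score + mean of the neighbours' current scores.
    Both updates read the pre-iteration scores (synchronous update)."""
    new_res = {r: s + mean(con_score[c] for c in res_to_con[r])
               for r, s in res_score.items()}
    new_con = {c: s + mean(res_score[r] for r in con_to_res[c])
               for c, s in con_score.items()}
    return new_res, new_con


# Toy graph: resource w mentions concepts a and b; resource x mentions only b.
res_to_con = {"w": {"a", "b"}, "x": {"b"}}
con_to_res = {"a": {"w"}, "b": {"w", "x"}}
res, con = initialize(res_to_con, con_to_res, {"w": 80.0, "x": 40.0})
res1, con1 = iterate_once(res_to_con, con_to_res, res, con)
```

After initialization, a = 80 and b = 60. After one iteration, w = 80 + (80 + 60) / 2 = 150, x = 40 + 60 = 100, a = 160 and b = 120.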

Score Computation (Heuristic)
[Figure: resource nodes w, x, y, z and concept nodes a, b, c after iterations 2 and 3]
Termination Condition
– The rank order of the resource nodes stabilizes
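
Below is a sketch of the full heuristic loop with this termination test. It assumes "stabilizes" means the resource ranking is identical before and after one iteration. It also adds a max_iter safety cap, which is hypothetical. The toy graph and its initial scores are invented: the ranking changes after iteration 1 and stays the same after iteration 2. The raw scores keep growing under this update, so only the ranking is meaningful.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def rank(scores):
    """Resource IDs ordered by descending score (ties broken by ID)."""
    return sorted(scores, key=lambda n: (-scores[n], n))


def run_heuristic_ic(res_to_con, con_to_res, res_score, con_score, max_iter=100):
    """Repeat the synchronous heuristic update until the resource ranking is
    unchanged by an iteration; returns final scores and iteration count."""
    order = rank(res_score)
    for iteration in range(1, max_iter + 1):
        new_res = {r: s + mean(con_score[c] for c in res_to_con[r])
                   for r, s in res_score.items()}
        new_con = {c: s + mean(res_score[r] for r in con_to_res[c])
                   for c, s in con_score.items()}
        res_score, con_score = new_res, new_con
        new_order = rank(res_score)
        if new_order == order:
            return res_score, con_score, iteration
        order = new_order
    return res_score, con_score, max_iter


res_to_con = {"w": {"a"}, "x": {"b"}, "y": {"a"}}
con_to_res = {"a": {"w", "y"}, "b": {"x"}}
res0 = {"w": 50.0, "x": 55.0, "y": 90.0}   # made-up FKRE scores
con0 = {"a": 70.0, "b": 55.0}              # averages of adjacent resources
res, con, n_iter = run_heuristic_ic(res_to_con, con_to_res, res0, con0)
```

The initial order is y, x, w. After iteration 1 it becomes y, w, x, because w shares concept a with the very readable y. Iteration 2 keeps that order, so the loop stops.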

Score Computation (Heuristic)
Single-valued score for each node
– Unable to handle concepts of varying difficulties
Simple averaging in score computation
– Difficult to incorporate sophisticated computational mechanisms

Score Computation (Probabilistic)
[Figure: resource nodes w, x, y, z and concept nodes a, b, c at initialization]
Initialization
– Resource node: sentence sampling
– Concept node: resource sampling
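
The slide names the two sampling steps but gives no details. The sketch below shows one possible reading, offered purely as an assumption. Each node holds a probability distribution over a few readability levels. A resource's initial distribution comes from binning the FKRE scores of sampled sentences. A concept's initial distribution is the average of the distributions of sampled adjacent resources. The bin edges, the number of levels, the add-one smoothing, the sample sizes and the function names are all hypothetical.

```python
import random

CUTS = (30.0, 60.0)          # hypothetical FKRE bin edges
N_LEVELS = len(CUTS) + 1     # levels: 0 = hardest ... 2 = easiest


def level_of(score):
    """Map an FKRE score to a level by counting the bin edges it reaches."""
    return sum(score >= cut for cut in CUTS)


def init_resource(sentence_scores, k, rng):
    """Sample up to k sentence scores from one resource, bin them, and return
    an add-one-smoothed distribution over levels (all choices hypothetical)."""
    sample = rng.sample(sentence_scores, min(k, len(sentence_scores)))
    counts = [1.0] * N_LEVELS
    for score in sample:
        counts[level_of(score)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def init_concept(adjacent_distributions, k, rng):
    """Sample up to k adjacent resources and average their level distributions."""
    sample = rng.sample(adjacent_distributions, min(k, len(adjacent_distributions)))
    return [sum(d[i] for d in sample) / len(sample) for i in range(N_LEVELS)]


rng = random.Random(0)
res_dist = init_resource([10.0, 45.0, 80.0, 90.0], k=4, rng=rng)
con_dist = init_concept([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], k=2, rng=rng)
```

Unlike the single heuristic score, a distribution can in principle represent a concept that appears in both easy and hard resources. The Domain-awareness slide shows that this benefit is not fully realized, because scores still converge to a single value.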

Score Computation (Probabilistic)
Iterative Computation
– Modified Naïve Bayes classification
[Formulas: the original, modified and direct-adaptation forms of the Naïve Bayes computation; figure: resource nodes and concept nodes]
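
The modified and direct-adaptation formulas do not survive in this transcript. For reference only, the sketch below shows the textbook Naive Bayes rule, presumably what the slide labels "Original". Readability levels act as classes and a resource's concepts act as features. The level names and the likelihood table are invented for illustration, and the function name nb_posterior is hypothetical.

```python
import math


def nb_posterior(concepts, prior, likelihood):
    """Textbook Naive Bayes: P(level | resource) is proportional to
    P(level) * product over the resource's concepts c of P(c | level).
    Computed in log space for stability, then normalised."""
    log_scores = {
        level: math.log(p) + sum(math.log(likelihood[level][c]) for c in concepts)
        for level, p in prior.items()
    }
    top = max(log_scores.values())
    unnorm = {level: math.exp(s - top) for level, s in log_scores.items()}
    z = sum(unnorm.values())
    return {level: v / z for level, v in unnorm.items()}


prior = {"easy": 0.5, "hard": 0.5}
likelihood = {  # hypothetical P(concept | level) values
    "easy": {"Pythagorean theorem": 0.6, "ring theory": 0.1},
    "hard": {"Pythagorean theorem": 0.2, "ring theory": 0.7},
}
posterior = nb_posterior(["ring theory"], prior, likelihood)
```

A resource that mentions only "ring theory" gets P(hard) = 0.35 / (0.35 + 0.05) = 0.875. This mirrors the Pythagorean theorem vs. ring theory intuition from the earlier slide.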

Evaluation
Key qualities of a good readability measure
– Effectiveness
– Portability
– Domain-awareness

Effectiveness
Corpus of math webpages
Metrics:
– Pairwise accuracy
– Spearman's rho
Baselines:
– Heuristic
  • FKRE
– Supervised learning
  • NB, SVM, Maxent
  • Binary concept features only
[Table: pairwise accuracy, Spearman's rho and number of iterations for FKRE, NB, SVM, Maxent, HIC (heuristic IC) and PIC (probabilistic IC)]
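
The two metrics can be sketched as follows. The sketch assumes gold readability levels and predicted scores in which higher means more readable. Pairwise accuracy counts only pairs with different gold levels and treats a predicted tie as an error. Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties. These conventions are assumptions, not necessarily the evaluation setup used in this work.

```python
from itertools import combinations


def pairwise_accuracy(gold, pred):
    """Fraction of pairs with different gold levels that the predicted scores
    order the same way (predicted ties count as errors: an assumption)."""
    correct = total = 0
    for i, j in combinations(range(len(gold)), 2):
        if gold[i] == gold[j]:
            continue
        total += 1
        if (gold[i] - gold[j]) * (pred[i] - pred[j]) > 0:
            correct += 1
    if total == 0:
        raise ValueError("no pairs with different gold levels")
    return correct / total


def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


gold = [1, 2, 3, 4]
pred = [10.0, 20.0, 40.0, 30.0]   # swaps the two most readable resources
acc = pairwise_accuracy(gold, pred)
rho = spearman_rho(gold, pred)
```

Swapping one adjacent pair costs one of the six pairs, giving 5/6 accuracy, and lowers rho to 1 − 6·2 / (4·15) = 0.8.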

Portability
Different selection strategies
– Resource selection at random
– Concept selection at random
– Resource selection by quality
– Concept selection by TF.IDF
Performance measurement at 5 levels
– 20%, 40%, 60%, 80% and 100% of the original resource collection / concept list
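
Below is a sketch of TF.IDF-based concept selection at the five levels. It assumes that a concept's score is its total occurrence count across resources times log(N / df), where N is the number of resources and df is the number of resources that mention the concept. The slides only name the strategy, so this aggregation, the toy counts and the function names are hypothetical.

```python
import math

LEVELS = (0.2, 0.4, 0.6, 0.8, 1.0)  # fractions of the original concept list


def concept_tfidf(concept_counts, n_resources):
    """concept_counts maps concept -> {resource_id: occurrences}.
    Score = total occurrences * log(N / df), an assumed way to collapse
    per-resource TF.IDF into one score per concept."""
    scores = {}
    for concept, per_resource in concept_counts.items():
        df = len(per_resource)
        idf = math.log(n_resources / df) if df else 0.0
        scores[concept] = sum(per_resource.values()) * idf
    return scores


def select_top(scores, fraction):
    """Keep the highest-scoring fraction of concepts (at least one; ties by name)."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


counts = {  # toy occurrence counts over three resources r1..r3
    "ring theory": {"r1": 3},
    "hypotenuse": {"r1": 1, "r2": 1, "r3": 1},
    "sine function": {"r2": 2},
    "right triangle": {"r1": 2, "r2": 1},
    "modulus": {"r3": 1},
}
scores = concept_tfidf(counts, n_resources=3)
subsets = {f: select_top(scores, f) for f in LEVELS}
```

A concept that appears in every resource, such as "hypotenuse" here, gets an IDF of 0, so it is dropped first. Random selection would simply replace the sort with a shuffle.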

Portability
[Charts: resource selection strategies; concept selection strategies]

Portability
[Chart: pairwise accuracy and Spearman's rho for FKRE, NB, SVM, Maxent, HIC and PIC]

Domain-awareness
Handling of domain-specific concepts
– Simple yet effective
– Concepts of multiple difficulty levels?
  • Scores converge to a single value, even in PIC
  • Splitting? (K-Means, GMM, etc.)
  • Other computational mechanisms?
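
Splitting is posed here only as an open question. Below is a hypothetical sketch of one option. It runs 2-means (Lloyd's algorithm in one dimension) on the readability scores of the resources that mention a concept, and treats the two clusters as separate "hard" and "easy" senses of that concept. The function name split_concept and the scores are invented. A GMM would replace these hard cluster assignments with soft ones.

```python
def split_concept(scores, max_iter=100):
    """1-D 2-means initialised at the minimum and maximum score.
    Scores are FKRE-like (higher = more readable).
    Returns (hard_group, easy_group)."""
    lo, hi = float(min(scores)), float(max(scores))
    hard, easy = list(scores), []
    for _ in range(max_iter):
        # Assign each score to the nearer centroid; ties go to the hard group.
        hard = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        easy = [s for s in scores if abs(s - lo) > abs(s - hi)]
        new_lo = sum(hard) / len(hard)
        new_hi = sum(easy) / len(easy) if easy else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids unchanged: converged
        lo, hi = new_lo, new_hi
    return hard, easy


# Made-up scores of resources mentioning one concept: three hard, three easy.
hard, easy = split_concept([20, 25, 30, 80, 85, 90])
```

Each resulting sense could then become its own concept node in the graph, which avoids averaging two very different difficulty levels into one score.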

Conclusion
Iterative Computation
– Estimates the readability of domain-specific resources and the difficulty of domain-specific concepts in an iterative manner
– Effective, portable and domain-aware
Future Work
– Handling of concepts of multiple difficulty levels