Coh-Metrix: An Automated Measure of Text Cohesion Danielle S. McNamara, Yasuhiro Ozuru, Max Louwerse, Art Graesser
Coh-Metrix Investigators Co-PIs and Senior Researchers: Max Louwerse, Art Graesser, Zhiqiang Cai, Randy Floyd, Xiangen Hu, Vasili Rus Postdocs & Staff: Rachel Best, David Dufty, Christian Hempelman, Tenaha O’Reilly, Yasuhiro Ozuru Many students
Coh-Metrix Coh-Metrix v1.2 Analyzes texts on many different dimensions of cohesion and language –Input text on a web site –Outputs 12 primary measures and over 200 additional measures Graesser, McNamara, Louwerse, & Cai, 2004
Prior Research Increasing text cohesion improves memory for text content. –Increasing argument overlap between sentences, e.g. "Most plastics are good insulators. So are clothes you wear, like sweaters and coats." becomes "Most plastics are good insulators. Other good insulators are the clothes you wear, like sweaters and coats." –Adding connectives, e.g. "For example, most plastics are good insulators." Other connectives: because, consequently, so that, in addition, however. –Adding headers and topic sentences
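One way to see how "adding connectives" could be quantified is to count connectives relative to text length. The following is a minimal sketch, not the Coh-Metrix implementation: the connective list is just the one on the slide, the per-1000-words scaling is an assumption, and the function name `connective_incidence` is hypothetical.

```python
import re

# Connectives taken from the slide; a real tool would use a much longer list.
CONNECTIVES = ["for example", "because", "consequently",
               "so that", "in addition", "however"]

def connective_incidence(text, connectives=CONNECTIVES, per=1000):
    """Return connective occurrences per `per` words (assumed scaling)."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    if n_words == 0:
        return 0.0
    # Whole-word (or whole-phrase) matches only, so "so that" is one hit.
    hits = sum(len(re.findall(r"\b" + re.escape(c) + r"\b", lowered))
               for c in connectives)
    return hits / n_words * per

plain = "Most plastics are good insulators."
revised = "For example, most plastics are good insulators."
print(connective_incidence(plain), connective_incidence(revised))
```

The revised sentence scores higher only because it contains one listed connective; the sketch says nothing about whether the connective is used appropriately.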
Prior Research Increasing text cohesion improves memory for text content. Text cohesion is particularly crucial for low-knowledge readers. Decreasing text cohesion helps high-knowledge readers process the text more actively and understand it at a deeper level. –McNamara, Kintsch, Songer, & Kintsch (1996, C&I) –McNamara & Kintsch (1996, DP) –McNamara (2001, CJEP)
Cohesion and Coherence Research points to the need to consider text difficulty in terms of text cohesion and coherence. –Cohesion is a property of the text. –Coherence is a property of the reader’s mental representation. We need automated measures of cohesion and coherence.
Current Method: Readability Measures E.g., Flesch-Kincaid Grade Level. Based on the work of Rudolph Flesch in the 1940s. Scores range from 0-12 to predict grade appropriateness. Measure based on surface characteristics: –sentence length –word length
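To make the "surface characteristics" point concrete, here is a rough sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sentence splitter and the vowel-group syllable counter are crude assumptions for illustration, and the helper names are hypothetical; published tools count syllables more carefully.

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, at least one per word."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # Treat a trailing 'e' as silent ("like"), except in "-le" endings ("table").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    """Grade level from average sentence length and average syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(flesch_kincaid_grade("The cat sat. The dog ran."))
```

Nothing in the formula looks at how sentences relate to one another, which is why two passages that differ in cohesion can receive similar grade levels.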
Goals of Coh-Metrix Tool Analyze texts on many different dimensions of cohesion and language –Input text on a web site –Outputs over 200 measures Focus primarily on deeper levels of meaning and cohesion, unlike standard readability formulas Tailor texts to students (K12, college) with different world knowledge and abilities
Example passage pair: "Any disorder that stops the heart from supplying blood to the body is a threat to life. Heart disease is such a disorder." / "Any disorder that stops the blood supply is a threat to life. Heart disease is very common" [Figure axes: argument overlap; F-K (easy to hard)]
Cohesion and Readability Scores for 19 pairs of passages examined in 12 published studies [Figure axes: argument overlap; F-K (easy to hard)]
List of Cohesion Publications: Beck et al. (1984); Beck et al. (1991); Britton and Gulgoz (1989); Cataldo & Oakhill (2000); Kintsch (1990); Lehman & Schraw (2002); Linderholm et al. (2000); Loxterman et al. (1994); McNamara (2001); McNamara et al. (1996); Vidal-Abarca et al. (2000); Voss & Silfies (1996)
What variables showed a greater than 50% difference in favor of the cohesive text?
Linderholm et al. 2000, Mademoiselle Germaine (Easy Text): no differences.
McNamara et al., Mammal Text, Exp. 1: causal particle to verb ratio, .2 vs .5; causal connectives, 4.7 vs 10.6; LSA sentence to sentence, .17 vs .35; noun overlap, .11 vs .22; clarification connectives, 0 vs 1.22.
Lehman & Schraw 2002, The Quest for the Northwest Passage: causal particle to verb ratio, .11 vs .43; causal connectives, 3.4 vs 11.0; pronoun incidence, 8.5 vs 25.7.
Overall Results The 20 variables showing the largest differences were co-reference measures. Argument overlap measures showed the largest differences in comparison to noun and stem overlap measures. –Argument overlap includes pronouns, e.g. "They skied all day. They were tired." –Regardless of whether overlap was counted at distances of 1, 2, or 3 sentences –Adjacent overlap showed the largest difference
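The difference between counting and not counting pronouns, and the idea of overlap at distances of 1, 2, or 3 sentences, can be sketched as follows. This is an illustrative approximation only: real argument and noun overlap require part-of-speech tagging, whereas here every word outside a small, made-up function-word list counts as a candidate. The word lists and the name `overlap_at_distance` are assumptions.

```python
import re

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}
# Small illustrative stop list; not a linguistic resource.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to",
                  "in", "that", "from", "and", "all", "such", "very", "any", "so"}

def candidate_words(sentence, include_pronouns):
    """Lowercased words of one sentence, minus function words (and pronouns)."""
    words = set(re.findall(r"[a-z']+", sentence.lower())) - FUNCTION_WORDS
    return words if include_pronouns else words - PRONOUNS

def overlap_at_distance(text, distance=1, include_pronouns=True):
    """Share of sentence pairs `distance` apart that have a word in common."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    sents = [candidate_words(s, include_pronouns)
             for s in re.split(r"[.!?]+", text) if s.strip()]
    pairs = list(zip(sents, sents[distance:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)

skiing = "They skied all day. They were tired."
print(overlap_at_distance(skiing, include_pronouns=True))
print(overlap_at_distance(skiing, include_pronouns=False))
```

With pronouns included, the two skiing sentences overlap through "they"; with pronouns excluded, they share nothing, which mirrors why argument overlap can register cohesion that noun overlap misses.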
Other Significant Variables Type-Token Ratio for Nouns (L>H) Higher level constituents per sentence (H>L) Ratio of causal particles and causal verbs (p L) Causal connectives (p L) Celex, log Freq, min in sentence (p H) Average Words per Sentence (p L) LSA, sentence to sentence (p L)
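For readers unfamiliar with the type-token ratio, a minimal sketch follows. The slide's measure is restricted to nouns, which would need a part-of-speech tagger; this version is an assumption-laden simplification that takes whatever token list the caller supplies, and the function name is hypothetical.

```python
import re

def type_token_ratio(tokens):
    """Number of distinct words (types) divided by total words (tokens)."""
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Assumption: all words are used here; a noun-only version would filter first.
words = re.findall(r"[A-Za-z']+", "The cat saw the dog")
print(type_token_ratio(words))  # 4 types / 5 tokens = 0.8
```

A lower ratio means more repeated words, which is consistent with a text that re-uses the same referents from sentence to sentence.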
Indicates that the high-cohesion texts did not add new information
Number of Words: Descriptive Statistics (columns: N, Minimum, Maximum, Mean, Std. Deviation)
ANNOUNCING THE RELEASE OF Coh-Metrix 1.1
Current Goals Examine cohesion measures by grade level for TASA and complete textbooks. Conduct empirical studies to further examine the effects of text cohesion for adults. Conduct experiments to establish the effects of cohesion for young children. –e.g., currently conducting comprehension and eye-tracking studies with 3rd-5th grade children.
What will Coh-Metrix achieve? Enhance education by giving educators better tools for choosing textbooks Help publishers more appropriately tailor books to target age groups Help writers improve the cohesion of their writing Help researchers better understand the hidden properties of text