Perspectives on Word Difficulty Using Item Response Theory Rick Chan Frey, PhD University of California, Berkeley

The Amazonification of Knowledge  How does Amazon know what books you'll enjoy reading?  Amazon has no theoretical framework to distinguish between hundreds of potential recommendations based on genre or style.  Instead, it analyzes the data from millions of sales to customers with similar reading patterns.

What percentage of the area of the square is covered by the circle?

How to Know Something Without Knowing Anything About It  Think back to your 5th grade math: calculating the area of a square and of a circle, then calculating a ratio.  If you know the right formulas, it's easy, and a mathematician could even explain why it works.

How to Know Something Without Knowing Anything About It  But pretend for a moment you didn't know the formulas, didn't know how to calculate ratios.  Is there another way to solve the problem?

Simulate the Answer with Scratch  Draw a 4" x 4" square  Draw a circle 2" in diameter in a different color  Have the computer draw sprites at random positions  Count what colors they land on
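The same experiment can be sketched outside Scratch. The Python below is a minimal illustration, not the presenter's Scratch project: it assumes the circle sits at the center of the square, and the function name, point count, and seed are made up for the example. It drops random points on the square and reports the share that land on the circle, which can be compared with the formula answer (pi x 1² / 16, about 0.196).

```python
import random


def estimate_circle_fraction(n_points=100_000, side=4.0, diameter=2.0, seed=0):
    """Estimate the fraction of a square covered by a circle by random sampling.

    Assumption for this sketch: the circle is centered in the square.
    """
    rng = random.Random(seed)
    radius = diameter / 2
    center = side / 2
    hits = 0
    for _ in range(n_points):
        # Pick a random spot on the square (the "random sprite").
        x = rng.uniform(0, side)
        y = rng.uniform(0, side)
        # Check whether it landed on the circle's color.
        if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2:
            hits += 1
    return hits / n_points


if __name__ == "__main__":
    print(f"Estimated coverage: {estimate_circle_fraction():.3f}")
```

More points give an estimate closer to the formula answer, without the program ever "knowing" the formula.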

What is IRT?  First and foremost, IRT (as with classical test theory) is about assessment, not instruction.  Adaptive testing - some questions are rated easier or harder than others, so the test has a better ability to differentiate student ability than a generic test.  Most developed statistical model for interactively measuring item difficulty and student ability.

Item - Response - Theory  Item – questions on a survey, math problems, multiple choice questions on SAT  Response – easily quantifiable, usually right/wrong, can have partial credit.  Theory – hypothesizing a construct (e.g. intelligence, love of art, etc.) that can be measured.

Understand by example - Are you a tech junkie?  Do you own a computer?  Do you own a smart phone?  Do you own two or more tablets?  Have you ever tried Google Glass?

[Figure: participants and items placed on a single scale running from Luddite to Tech junkie. Participants: Bill Gates, grandma, average teenager. Items: own a computer, own a smart phone, own 2+ tablets, have tried Google Glass.]
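One common way to put people and items on the same scale is the one-parameter (Rasch) model, where the chance of a "yes" depends only on the gap between a person's level and an item's difficulty. The slides do not say which IRT model was used, so treat the sketch below as an assumption. Every number in it is invented for illustration and is not taken from the presentation.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical positions on the Luddite-to-tech-junkie scale (made-up values).
people = {"grandma": -2.0, "average teenager": 0.0, "Bill Gates": 2.0}
items = {
    "own a computer": -2.5,
    "own a smart phone": -1.0,
    "own 2+ tablets": 1.0,
    "have tried Google Glass": 2.5,
}

if __name__ == "__main__":
    for person, ability in people.items():
        for item, difficulty in items.items():
            p = rasch_probability(ability, difficulty)
            print(f"{person:17s} {item:24s} {p:.2f}")
```

When a person's level equals an item's difficulty, the model gives a 50% chance of a "yes"; easier items and more able people push that chance up.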

What could IRT offer reading researchers?  Quantitative measure of word difficulty  Improved accuracy in determining student reading ability from assessments  Reliability measures (item and test level)

How do researchers estimate word difficulty?  Ehri (2005) Earliest and simplest  Beck, McKeown & Kucan (2002) Word Tiers  Hiebert, Stewart & Uzicanin (2010) Word Features and Word Recognition  Fountas & Pinnell – Guided Reading  MetaMetrics – Lexile Scoring System

What are the basics we know?  Shorter words easier than longer words  Fewer syllables are easier  Higher frequency words are easier  Highly imageable words are easier  A quantitative measure of word difficulty could be highly useful

Testing the Idea  One school, two 1st grade classes (n=75)  Data from two years of DIBELS Oral Reading Fluency (ORF) assessments  First 40 words of four different ORF assessments  Simple IRT analysis marking incorrectly read words as incorrect responses (no partial credit)
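The right/wrong coding described above can be pictured as a table of 0s and 1s, one row per student and one column per word. The sketch below is a rough first approximation only, not the estimation procedure used in the study: it turns each word's proportion correct into a log-odds "difficulty", whereas a full IRT fit would estimate student ability and word difficulty together. The response data, the function name, and the 0.5 smoothing are all assumptions made for the example.

```python
import math


def crude_item_difficulty(responses):
    """Log-odds of an incorrect response for each item (higher = harder).

    responses: list of rows (students), each a list of 0/1 scores per word.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    difficulties = []
    for j in range(n_items):
        correct = sum(row[j] for row in responses)
        # Smoothing (an assumption of this sketch) keeps the log finite
        # when every student reads a word correctly or incorrectly.
        p_correct = (correct + 0.5) / (n_students + 1.0)
        difficulties.append(math.log((1.0 - p_correct) / p_correct))
    return difficulties


if __name__ == "__main__":
    # Hypothetical data: 4 students x 3 words, 1 = read correctly.
    responses = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [1, 0, 0],
    ]
    for j, d in enumerate(crude_item_difficulty(responses)):
        print(f"word {j + 1}: difficulty {d:+.2f}")
```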

Findings  Quantitative measures of word difficulty and student reading ability  Strong correlations (r between .54 and .72) for basics of word difficulty  Potential measure for text reliability  Identifies words that defy expectations  Provides potential model for analyzing the impact of context on word difficulty

Findings – Word Difficulty

Text        Word     Letters  Difficulty
ORF 1B-36   waiting  7         3.66
ORF 1A-6    outside  7         3.41
ORF 1B-3    mind     4         2.93
ORF 2A-21   spot     4         1.81
ORF 2A-29   got      3         0.28
ORF 1B-32   fish     4        -1.65
ORF 1B-1    I        1        -3.12
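To show how a correlation between a word trait and difficulty can be computed, the sketch below calculates Pearson's r between letter count and difficulty for the seven words listed on this slide. These seven rows are only a hand-picked sample from the slide, not the full data set, so the result should not be expected to match the correlations reported in the findings.

```python
import math

# (word, letters, difficulty) copied from the slide's seven example rows.
WORDS = [
    ("waiting", 7, 3.66),
    ("outside", 7, 3.41),
    ("mind", 4, 2.93),
    ("spot", 4, 1.81),
    ("got", 3, 0.28),
    ("fish", 4, -1.65),
    ("I", 1, -3.12),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    letters = [row[1] for row in WORDS]
    difficulty = [row[2] for row in WORDS]
    print(f"r(letters, difficulty) = {pearson_r(letters, difficulty):.2f}")
```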

Findings – Strange Cases  "First we picked a spot far from the big waves." Can you guess the word with the highest item difficulty score?  3 letters, fairly easy phonetically, but easily confused with "for"  ¼ of the hardest 20 words had 3-4 letters and one syllable and were phonetically regular  Yellow, castle, and anymore were all easier than their traits would indicate

Findings – Context Effect  A six-letter, two-syllable word with an unusual spelling pattern was missed by only 2 students (difficulty score -1.01)  "We built a giant sand castle at the beach."  Built scored 2.85 in difficulty, and giant was one of the top 5 hardest words at 3.34  Fish tank and rocky road, no such luck

Findings – Context Possibilities  Compare instances of reading same words in different contexts (starting a sentence, in subordinate clause, etc.)  Compare instances of words with suffixes (lick, licks, licked, licking)  Compare the effects of background knowledge on word difficulty scores

Findings – Text Reliability  Use of Cronbach's alpha to measure reliability of a given text for assessing student reading ability  Each of the four passages scored .97 or higher  Few words across the four passages had poor fit
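For readers unfamiliar with Cronbach's alpha, the sketch below computes it from a table of right/wrong scores using the standard formula: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. It assumes each word in a passage is treated as one item, which the slides do not state explicitly, and the score table is invented for illustration.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a score table.

    scores: list of rows (students), each a list of item scores (words).
    Requires at least two items and some variation in total scores.
    """
    k = len(scores[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 5 students x 4 words, 1 = read correctly.
    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Values near 1 mean the words in a passage rank students consistently; values near 0 mean the words disagree with one another about who the stronger readers are.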

Conclusions  IRT provides researchers with a quantitative method for assessing word difficulty that can be used in a wide variety of research designs.  IRT offers useful information for text designers attempting to design and redesign increasingly complicated texts that comply with Common Core standards.  IRT offers a window into the brave new world of big data, suggesting new ideas about literacy development we don't necessarily understand but would be wise to consider.

For a copy of the presentation or questions contact: Rick Chan Frey, PhD rick@mustardseedbooks.org

