- What is reliability?
- How can the reliability of a test be identified?

A reliable test is:
- stable over time
- consistent in terms of content sampling
- free from bias

A test-taker who is re-examined with the same test on different occasions, with different sets of equivalent items, or under different examining conditions should obtain the same, or nearly the same, score.
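The idea of getting the same score on re-examination is often formalised with classical test theory. These slides do not name that model, so the following is an assumed framing: each observed score X is a true score T plus random error E, and the reliability coefficient is the share of observed-score variance that is true-score variance (assuming T and E are uncorrelated).

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

A coefficient of 1 would mean that re-examination always reproduces the same score; the closer it is to 0, the more of the score variation is measurement error.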

[Figure: validity versus reliability, in four panels: good validity / poor reliability; poor validity / good reliability; poor validity / poor reliability; good validity / good reliability.]

Reliability is:
- a type of validity evidence
- a posteriori validity evidence
- “scoring validity”
- a measure of the stability of test scores
- a prerequisite for measurement validity

Types of scoring validity (methods of estimating reliability):
- Test-retest reliability
- Internal consistency
- Marker reliability
- Parallel forms reliability

uusing one test twice for the same test-takers tthe period between the two tests is long enough for test-takers to forget the test but not too long bbetween the two tests, no lesson is given Test-retest reliability

uuse two different but equivalent forms of the test to the same test-takers tthe two tests can be applied in close succession Parallel forms reliability

AA variation of parallel forms reliability; UUsing parallel statistic on one test and dividing the test into two halves for statistics or estimating the correlation of each items in the test with another; FFocuses on the consistency with each other of a test’s internal elements.

SSplit half reliability AAverage inter-item correlation AAverage item-total correlation

SSplit half reliability measure Item 01 Item 02 Item 03 Item 04 Item 05 Item 06 Item 01 Item 03 Item 04 Item 02 Item 05 Item 06.87 Item 05 Item 02 Item 04

AAverage inter-total correlation measure Item 01 Item 02 Item 03 Item 04 Item 05 Item 06 i1i1 i2i2 i3i3 i4i4 i5i5 i6i6 i2i2 i3i3 i4i4 i5i5 i6i6 i1i1 1.00.89.91.88.84.88 1.00.92.93.86.91 1.00.95.92.95 1.00.85.87 1.00.851.00.90

AAverage item-total correlation measure Item 01 Item 02 Item 03 Item 04 Item 05 Item 06 i1i1 i2i2 i3i3 i4i4 i5i5 i6i6 i2i2 i3i3 i4i4 i5i5 i6i6 i1i1 1.00.89.91.88.84.88 Total.84 1.00.92.93.86.91.88 1.00.95.92.95.86 1.00.85.87.87 1.00.85.83 1.00.82 1.00.85

EExcel correlation KKuder-Richarson 20 or 21 CCronbach’s alpha

AAdvantages : SSaving time and expenses; HHigher value compared with the test-retest and parallel forms DDisadvantages : LLack of temporal stability of the scores as they result from a single administration of the test; NNot easy to determine the level of difficulty of the items; TThe items in one half may not be equivalent to the items in the other half.

eenvironmental factors; cconstruct, content, theory-based validity; ddefine the level of difficulty/ease of the items ddefine the level of difficulty in reading texts and their questions

rrelate chiefly to tests in which samples of writing or speaking are produced tthe consistency of the marker(s)

- Intra-rater reliability: each marker needs to be consistent within himself/herself
- Inter-rater reliability: markers need to be consistent with each other
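Both kinds of marker consistency can be quantified by comparing two sets of marks for the same scripts: two markers for inter-rater reliability, or the same marker on two occasions for intra-rater reliability. A sketch using exact agreement and Cohen's kappa (a chance-corrected agreement index that the slides do not name), with hypothetical bands from 1 to 5 for eight writing scripts:

```python
from collections import Counter


def exact_agreement(ratings_a, ratings_b):
    """Proportion of scripts given exactly the same band in both markings."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(ratings_a)
    observed = exact_agreement(ratings_a, ratings_b)
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: for each band, the product of both markers' proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical bands given to the same eight scripts by two markers.
marker_1 = [3, 4, 2, 5, 3, 4, 1, 3]
marker_2 = [3, 4, 3, 5, 3, 3, 1, 3]

print(f"exact agreement: {exact_agreement(marker_1, marker_2):.2f}")
print(f"Cohen's kappa: {cohens_kappa(marker_1, marker_2):.2f}")
```

For intra-rater reliability, the same two functions would take one marker's first and second markings of the same scripts.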

- Have explicit, agreed criteria for carrying out the marking task
- Analytic scales
- Holistic scales
- Standardization
- Moderation of scores (Multi-faceted Rasch, MFR)

Thank you for listening

