RELIABILITY Prepared by Marina Gvozdeva, Elena Onoprienko, Yulia Polshina, Nadezhda Shablikova.



Outline: 1. Defining reliability 2. How to measure reliability 3. Reliability coefficient 4. Observed score and true score 5. SEM 6. Item analyses

Tests as measuring tools ‘A test is something (as a series of questions or exercises) for measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group’ (Merriam Webster Dictionary Online, 2013)

Tests as measuring tools ‘…a language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual’s use of those abilities in real world contexts.’ (McNamara, 2000:11)

A reliable test A perfectly reliable test is ‘one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered.’ (Hughes, 1989:31)

An unreliable test A completely unreliable test is one ‘which would give sets of results unconnected with each other.’ (Hughes, 1989: 32)

Strategies to estimate reliability We can use statistics to estimate how reliable a test is: test-retest reliability; equivalent (parallel) forms reliability; internal consistency reliability.

Test-retest reliability ‘calculating a reliability estimate by administering a test on two occasions and calculating the correlation between the two sets of scores’ (Brown, 2002)
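
To make the definition above concrete, here is a minimal sketch of a test-retest estimate, assuming the correlation in question is the Pearson product-moment correlation. The function name `pearson_r` and the score lists are hypothetical and are not taken from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five candidates on two occasions
first_sitting = [52, 61, 47, 70, 58]
second_sitting = [55, 60, 45, 72, 57]

test_retest = pearson_r(first_sitting, second_sitting)
print(round(test_retest, 2))
```

The same calculation would serve for equivalent forms reliability, with the second list holding scores on the alternative form rather than on a second sitting.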

Equivalent (parallel/alternative) forms reliability ‘calculating a reliability estimate by administering two forms of a test and calculating the correlation between the two sets of scores’ (Brown, 2002)

Internal consistency reliability ‘calculating a reliability estimate based on a single form of a test administered on a single occasion using internal consistency equations’ (Brown, 2002)

Internal consistency reliability: reliability is calculated from a single administration of the test. Two commonly reported reliability coefficients are split-half and Cronbach’s alpha; both are calculated automatically by many statistical software packages.
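
The slides do not give the equation for Cronbach’s alpha, so the sketch below assumes the standard textbook form, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response matrix are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per test taker, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so each tuple holds every test taker's score on one item
    items = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical right/wrong (1/0) responses: five test takers, four items
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Sample variances are used for both the items and the totals; using population variances throughout would give the same alpha, because the ratio is unchanged.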

Split-half reliability: the test is split in half (e.g. odd- and even-numbered items), creating two “equivalent forms”; the two “forms” are correlated with each other; the correlation coefficient is then adjusted to reflect the entire test length.
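
The three steps above can be sketched as follows. The slides do not name the adjustment for test length; this sketch assumes the Spearman-Brown formula, r_full = 2r / (1 + r), which is the usual choice. All function names and the response data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(half_r):
    """Step a half-test correlation up to the full test length (assumed adjustment)."""
    return 2 * half_r / (1 + half_r)

def split_half_reliability(item_scores):
    """Return (half-test correlation, length-adjusted estimate) for an odd/even split."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    half_r = pearson_r(odd_totals, even_totals)
    return half_r, spearman_brown(half_r)

# Hypothetical right/wrong (1/0) responses: five test takers, six items
responses = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]

half_r, full_r = split_half_reliability(responses)
print(round(half_r, 2), round(full_r, 2))
```

For a positive half-test correlation the adjusted value is always the larger of the two, because each half is only half as long as the test whose reliability is being estimated.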

Reliability coefficient: range: -1.0 (inverse relationship) to 0.0 (totally unreliable test) to 1.0 (perfectly reliable test); reliability coefficients are estimates of the proportion of systematic variance in the test scores; a lower reliability coefficient means greater measurement error in the test score.

How high should reliability be? (Pope n.d.)

Standard error of measurement (SEM): This allows us to take the score that the test taker got on the test (the observed score) and estimate what their true level of ability might be. Of course, we cannot know the true score exactly, so the ‘true score’ that we estimate must be a range of numbers around the observed score.

Standard error of measurement (SEM): Maria’s scores: true score = observed score ± error.

Standard error of measurement (SEM): We would expect the student to score near the centre of the distribution most of the time.

Standard error of measurement (SEM): The standard error of measurement (SEM) is the standard deviation of all those scores averaged across persons and test administrations. (Brown, 2002)

Standard error of measurement (SEM): $SEM = S_x \sqrt{1 - r_{xx'}}$, where $S_x$ is the standard deviation of raw scores and $r_{xx'}$ is the reliability coefficient.
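
The formula above translates directly into code. This is a minimal sketch: the function name and the example values (a standard deviation of 10 and a reliability of 0.91) are hypothetical, and the function assumes the reliability lies between 0 and 1.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation of raw scores * sqrt(1 - reliability coefficient)."""
    if not 0 <= reliability <= 1:
        # Assumption of this sketch: negative coefficients are rejected
        raise ValueError("reliability is assumed to lie between 0 and 1")
    return sd * sqrt(1 - reliability)

# Hypothetical test: raw-score standard deviation 10, reliability 0.91
sem = standard_error_of_measurement(10, 0.91)
print(round(sem, 2))
```

The two extremes are a useful check: a reliability of 1.0 gives an SEM of 0 (no measurement error), and a reliability of 0.0 gives an SEM equal to the standard deviation of the raw scores.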

Standard error of measurement (SEM): ±1 SEM = 68% confidence; ±2 SEM = 95% confidence; ±3 SEM = 99.7% confidence.

Standard error of measurement (SEM): Observed score = 50, SEM = 3. 68%: from 47 to 53; 95%: from 44 to 56.
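
The worked example above can be reproduced with a few lines of code. This sketch assumes that measurement error is roughly normally distributed, which is where the 68%, 95% and 99.7% figures come from; the function name `score_band` is hypothetical.

```python
def score_band(observed, sem, n_sem):
    """Range of plausible true scores: observed score plus or minus n_sem SEMs."""
    margin = n_sem * sem
    return observed - margin, observed + margin

observed_score = 50
sem = 3

# Approximate confidence levels for 1, 2 and 3 SEMs under a normal error model
for n_sem, confidence in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = score_band(observed_score, sem, n_sem)
    print(f"{confidence}: from {low} to {high}")
# prints:
# 68%: from 47 to 53
# 95%: from 44 to 56
# 99.7%: from 41 to 59
```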