Reliability REVIEW Inferential Infer sample findings to entire population Chi Square (2 nominal variables) t-test (1 nominal variable for 2 groups, 1 continuous)

Presentation transcript:

Reliability
REVIEW: Inferential statistics
Infer sample findings to the entire population
Chi-square (2 nominal variables)
t-test (1 nominal variable for 2 groups, 1 continuous variable)
ANOVA (1 nominal variable for 3+ groups, 1 continuous variable)

Variance
Standard deviation
Correlation
  Are two variables related?
  What happens to Y when X changes?
  Linear relationship between two variables
  Quantifies the RELIABILITY & VALIDITY of a test or measurement

Reliability (0-1; .80+ goal)
All scores: observed = true + error
$r_{xx'} = S_t^2 / S_o^2$: proportion of observed score variance that is true score variance
Interclass reliability coefficients (correlate 2 trials)
  Test/retest: affected by time, fatigue, practice effect
  Equivalent forms
  Split-halves: reduces test length by 50%
Index of Reliability: Tells you what? Related to the coefficient of determination (C of D) how?
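As a minimal sketch of the test/retest idea on the slide above, the snippet below correlates two trials of the same test and then takes the square root of that coefficient as the index of reliability (the theoretical correlation between observed and true scores). The score lists and the helper name `pearson_r` are hypothetical and only serve the illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for the same 6 people on two trials (test/retest)
trial_1 = [12, 15, 9, 20, 17, 11]
trial_2 = [13, 14, 10, 19, 18, 10]

r_xx = pearson_r(trial_1, trial_2)       # interclass reliability coefficient
index_of_reliability = math.sqrt(r_xx)   # valid here because r_xx is positive

print(f"reliability coefficient = {r_xx:.3f}")
print(f"index of reliability    = {index_of_reliability:.3f}")
print(f"meets the .80 goal: {r_xx >= 0.80}")
```

Squaring the index of reliability returns the reliability coefficient, which is one way to think about the "C of D" question: the reliability coefficient plays the role of a coefficient of determination between observed and true scores.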

Standard Error of Measurement (SEM)
RELIABILITY MEASURE
Reflects the degree to which a person's observed score fluctuates as a result of measurement errors
$SEM = S\sqrt{1 - r_{xx'}}$
S = standard deviation of the test
$r_{xx'}$ = reliability of the test

EXAMPLE: Test standard deviation = 100, r = .84
$SEM = 100\sqrt{1 - .84} = 100\sqrt{.16} = 100(.4) = 40$

SEM is the standard deviation of the measurement errors around an observed score
EXAMPLE: Test score = 500, SEM = 40
68% of all scores should fall between 460 and 540 (500 ± 40)
95% of all scores range between: ?
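The arithmetic in the SEM example can be checked with a short sketch. The function names are hypothetical, and the 95% band is taken as roughly ±2 SEM, which is an assumption of this sketch (a stricter normal-curve multiplier would be 1.96).

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - r), with S the test SD and r its reliability."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, n_sems=1.0):
    """Return (low, high) limits n_sems standard errors around a score."""
    return observed - n_sems * sem, observed + n_sems * sem

# Values from the slides: SD = 100, r = .84, observed score = 500
sem = standard_error_of_measurement(sd=100, reliability=0.84)
low68, high68 = score_band(500, sem, n_sems=1)
# Assumption: about 95% of scores fall within +/- 2 SEM
low95, high95 = score_band(500, sem, n_sems=2)

print(f"SEM = {sem:.1f}")
print(f"68% band: {low68:.0f} to {high68:.0f}")
print(f"95% band (approx.): {low95:.0f} to {high95:.0f}")
```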

Factors Affecting Test Reliability
1) Fatigue ↓
2) Practice ↑
3) Subject variability: homogeneous ↓, heterogeneous ↑
4) Time between testing: more time = ↓
5) Circumstances surrounding the testing periods: change = ↓
6) Test difficulty: too hard/easy = ↓
7) Precision of measurement: precise = ↑
8) Environmental conditions: change = ↓
SO WHAT? A test must first be reliable to be valid

Validity Types
THIS SLIDE IS HUGE!!!!
Content-Related Validity (a.k.a. face validity)
  Should represent knowledge to be learned
  Criterion for content validity rests w/ interpreter
  Use “experts” to establish
Criterion-Related Validity
  Test has a statistical relationship w/ trait measured
  Alternative measures validated w/ criterion measure
  Concurrent: criterion/alternate measured at the same time
  Predictive: criterion measured in the future
Construct-Related Validity
  Validates theoretical measures that are unobservable

Standard Error of Estimate (reflects accuracy of estimating a score on the criterion measure)
VALIDITY MEASURE
Also called: Standard Error, Standard Error of Prediction

Standard Errors
SE of Measurement
SE of Estimate
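To contrast the two standard errors, here is a hedged sketch that puts them side by side. The SE of measurement follows the formula used in the SEM example earlier in the deck. The SE of estimate formula, $S_y\sqrt{1 - r_{xy}^2}$, is the common textbook form and is an assumption here, since the transcript does not preserve it. The criterion SD and validity coefficient are hypothetical.

```python
import math

def se_of_measurement(sd_test, r_xx):
    """Reliability measure: spread of measurement errors around a score."""
    return sd_test * math.sqrt(1.0 - r_xx)

def se_of_estimate(sd_criterion, r_xy):
    """Validity measure (assumed textbook form): spread of prediction
    errors when estimating the criterion score from the test score."""
    return sd_criterion * math.sqrt(1.0 - r_xy ** 2)

# Values from the SEM slide: SD = 100, reliability = .84
sem_value = se_of_measurement(sd_test=100, r_xx=0.84)

# Hypothetical values: criterion SD = 10, validity coefficient = .80
see_value = se_of_estimate(sd_criterion=10, r_xy=0.80)

print(f"SE of measurement = {sem_value:.1f}")
print(f"SE of estimate    = {see_value:.1f}")
```

Note the difference in the radicand: the reliability coefficient enters unsquared, while the validity coefficient is squared.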

Methods of Obtaining a Criterion Measure
Actual participation
  Play the game over multiple trials
  Perform the criterion: known valid criterion (e.g., treadmill performance)
Expert judges
Tournament participation
  Round robin (to identify best player/team)
Known valid test (may be too long/time consuming)

Interpreting the “r” you obtain THIS IS HUGE!!!!

Table 6-8. Correlation Matrix for Development of a Golf Skill Test (from Green et al., 1987)
Variables (rows and columns): Playing golf, Long putt, Chip shot, Pitch shot, Middle distance shot, Drive. [Correlation values not preserved in the transcript.]
What are these? Concurrent validity coefficients

Interpret these correlations
Variables (rows and columns): Actual golf score (criterion), Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2. [Correlation values not preserved in the transcript.]
What are these? Concurrent validity coefficients (correlations of each test with the criterion)

Interpret these correlations
Variables (rows and columns): Actual golf score, Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2. [Correlation values not preserved in the transcript.]
What are these? Reliability coefficients (correlations between Trial 1 and Trial 2 of the same test)

Interpret these correlations
Variables (rows and columns): Actual golf score, Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2. [Correlation values not preserved in the transcript.]
What is this? Objectivity coefficient (correlation between Observer 1 and Observer 2)
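The three readings of the golf correlation matrix can be sketched in code. The reading assumed here is: test vs. criterion gives a concurrent validity coefficient, trial 1 vs. trial 2 of the same test gives a reliability coefficient, and observer 1 vs. observer 2 gives the objectivity coefficient. All scores below are made up for six golfers; they are not the values from the original slides, which the transcript does not preserve.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one list per variable, one entry per golfer
scores = {
    "golf":     [72, 80, 95, 88, 76, 101],   # criterion (strokes, lower is better)
    "putt_t1":  [8, 7, 4, 5, 8, 3],
    "putt_t2":  [9, 7, 5, 5, 7, 3],
    "drive_t1": [250, 230, 190, 210, 240, 180],
    "drive_t2": [245, 235, 195, 205, 238, 185],
    "obs_1":    [9, 8, 5, 6, 8, 4],
    "obs_2":    [9, 7, 5, 6, 9, 4],
}
names = list(scores)

# Full correlation matrix as a nested dict
matrix = {a: {b: pearson_r(scores[a], scores[b]) for b in names} for a in names}

# Concurrent validity: every test correlated with the criterion
validity = {n: matrix["golf"][n] for n in names if n != "golf"}
# Reliability: trial 1 vs. trial 2 of the same test
reliability = {
    "putting": matrix["putt_t1"]["putt_t2"],
    "driving": matrix["drive_t1"]["drive_t2"],
}
# Objectivity: agreement between the two observers
objectivity = matrix["obs_1"]["obs_2"]

for name, r in validity.items():
    print(f"validity    {name:9s} r = {r:.2f}")
for name, r in reliability.items():
    print(f"reliability {name:9s} r = {r:.2f}")
print(f"objectivity           r = {objectivity:.2f}")
```

Because a lower golf score is better, the validity coefficients in this made-up data come out negative; the strength of the relationship is read from the absolute value.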

Concurrent Validity This square represents variance in performance in a skill (e.g., golf)

Concurrent Validity The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)

Concurrent Validity The orange color represents ERROR or unexplained variance in the criterion (e.g., golf)

Concurrent Validity
[Figure: four skills test batteries labeled A, B, C, D]
Consider the concurrent validity of the 4 possible skills test batteries above

Concurrent Validity (batteries A, B, C, D)
Which test battery would you be LEAST likely to use? Why?
D: it has the MOST error and requires 4 tests to be administered

Concurrent Validity (batteries A, B, C, D)
Which test battery would you be MOST likely to use? Why?
C: it has the LEAST error, but it requires 3 tests to be administered

Concurrent Validity (batteries A, B, C, D)
Which test battery would you use if you are limited in time?
A or B: requires 1 or 2 tests to be administered, but you lose some validity
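The trade-off between validity and testing time can be expressed as a small selection rule: among the batteries that fit the time available, pick the one with the least unexplained variance. The sketch below is illustrative only. The number of tests per battery follows the slides (A = 1, B = 2, C = 3, D = 4), but the error proportions are invented to match the slides' ordering (C least error, D most) and are not taken from the original figure.

```python
# battery name -> (number of tests, proportion of criterion variance left unexplained)
# The error proportions are hypothetical placeholders.
batteries = {
    "A": (1, 0.45),
    "B": (2, 0.35),
    "C": (3, 0.15),
    "D": (4, 0.55),
}

def best_battery(batteries, max_tests):
    """Return the battery with the least error among those that
    need no more than max_tests tests, or None if none qualifies."""
    feasible = {
        name: error
        for name, (n_tests, error) in batteries.items()
        if n_tests <= max_tests
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)

no_time_limit = best_battery(batteries, max_tests=4)
two_tests_max = best_battery(batteries, max_tests=2)
one_test_max = best_battery(batteries, max_tests=1)

print("No time limit:", no_time_limit)
print("At most 2 tests:", two_tests_max)
print("At most 1 test:", one_test_max)
```

With these placeholder values the rule reproduces the slides' answers: C when time is not a constraint, and A or B (depending on how many tests fit) when it is.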