Technical Adequacy of Tests Dr. Julie Esparza Brown SPED 512: Diagnostic Assessment.


Two Essential Concepts Reliability: Test consistency Validity: Test measures what it says it does

Psychometric Properties An instrument with sound psychometric properties is one that is both reliable and valid.

Reliability Reliability is the degree to which a test or measurement tool measures something consistently.

1. Reliable but not valid: shots hit the same part of the target each time (consistent), but not the center (the goal).
2. Valid but not reliable: shots are evenly distributed around the center, but scattered and inconsistent.
3. Neither reliable nor valid: shots are not tightly clustered, and the pattern is not around the true center.
4. Both reliable and valid: shots are close together and clustered around where they were aimed.

Reliability
Test-retest reliability
o When you use it: when you want to know whether a test is reliable over time.
o How you do it: correlate the scores from the test given at Time 1 with the same test given at Time 2.
o Example of what you can say when you're done: "The Bonzo test of identity formation for adolescents is reliable over time."
Parallel forms reliability
o When you use it: when you want to know whether several different forms of a test are reliable or equivalent.
o How you do it: correlate the scores from one form of the test with the scores from a second form of the same content (but not the exact same test).

Reliability
Internal consistency reliability
o When you use it: when you want to know whether the items on a test assess one, and only one, dimension.
o How you do it: correlate each individual item score with the total score.
o Example of what you can say when you're done: "All of the items on the SMART Test of Creativity assess the same construct."
Interrater reliability
o When you use it: when you want to know whether there is consistency in the rating of some outcome.
o How you do it: examine the percentage of agreement between raters.
o Example of what you can say when you're done: "The interrater reliability for the best-dressed football player judging was .91, indicating a high degree of agreement between judges."

Interpreting Reliability Coefficients We want two things:
o Reliability coefficients that are positive (or direct), not negative (or indirect)
o Reliability coefficients that are as large as possible (between .00 and +1.00)
Reliability is a function of how much error contributes to the observed score: the lower the error, the higher the reliability.

To Increase Test Reliability… Ensure instructions are standardized across all settings where the test is administered. Increase the number of items: the larger the sample of items, the more likely it is representative and reliable; this is especially true for achievement tests. Delete unclear items. Minimize the effects of external events.

Validity The property of an assessment tool that indicates the tool does what it says it does. A valid test measures what it is supposed to measure.

Validity
Content validity
o When you use it: when you want to know whether a sample of items truly reflects an entire universe of items in a certain topic.
o How you do it: ask an expert to judge whether the test items reflect the universe of items in the topic being measured.
o Example of what you can say when you're done: "My weekly quiz in my class fairly assesses the chapter's content."
Criterion validity (concurrent or predictive)
o When you use it: when you want to know whether test scores are systematically related to other criteria that indicate the test taker is competent in a certain area.
o How you do it: correlate the scores from the test with some other measure that is already valid and assesses the same set of abilities.
o Example of what you can say when you're done: "The EATS test (of culinary skills) has been shown to correlate with being a fine chef two years after culinary school (an example of predictive validity)."

Validity
Construct validity
o When you use it: when you want to know whether a test measures some underlying psychological construct.
o How you do it: correlate the set of test scores with some theorized outcome that reflects the construct for which the test is being designed.
o Example of what you can say when you're done: "It's true: men who participate in body-contact and physically dangerous sports score higher on the TEST(osterone) test of aggression."

Quantifying Validity The maximum level of validity is equal to the square root of the reliability coefficient. For example, if the reliability coefficient of a test is .87, the validity coefficient can be no larger than .93 (the square root of .87).
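The square-root relationship can be checked directly with the slide's own numbers:

```python
import math

reliability = 0.87                     # reliability coefficient from the example
max_validity = math.sqrt(reliability)  # validity can be no larger than this
print(round(max_validity, 2))          # → 0.93
```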

Reliability and Validity You can have a test that is reliable but not valid. But, you can’t have a valid test without it first being reliable. If a test does what it is supposed to, then it has to do it consistently to work!

Test Scores A raw score or obtained score on a test is the number of points obtained by an examinee. A true score is the part of an examinee's observed score uninfluenced by random events. The error of measurement or error score is the difference between an obtained score and its theoretical true-score counterpart.

Error Score The error score is that part of the obtained score which is unsystematic, random, and due to chance.

Standard Error of Measurement The SEM is the standard deviation of the errors of measurement associated with the test scores for a specified group of test takers. It is a measure of the variability of the errors of measurement, and it is used to help us estimate true scores from knowledge of obtained scores.
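The slide does not give a formula, but the standard one computes SEM from the test's standard deviation and reliability coefficient: SEM = SD × √(1 − r). A sketch with hypothetical values:

```python
import math

sd = 15            # hypothetical standard deviation of the test scores
reliability = 0.91 # hypothetical reliability coefficient

# Standard formula: SEM = SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 1))  # → 4.5
```

Note how the SEM shrinks as reliability rises: a perfectly reliable test (r = 1.00) would have an SEM of zero.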

Score Bands Score bands are sometimes called confidence intervals or confidence bands because they allow us to make probabilistic statements of confidence about an unknown value. Score bands have lower and upper limits on the score scale and provide an estimate that is a range or band of possible test scores.

Score Bands An example of a score band or confidence interval is: "I am 95 percent confident that the examinee's obtained score will be between 46 and 54 (given a true score of 50 and an SEM of two)." 68 percent confidence intervals are the most commonly used.
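The band in the example can be computed as score ± z × SEM, using the slide's numbers (z = 1.96 for 95 percent confidence):

```python
score = 50  # score from the slide's example
sem = 2     # SEM from the slide's example
z_95 = 1.96 # z value for a 95% confidence band

lower = score - z_95 * sem
upper = score + z_95 * sem
print(f"95% band: {lower:.1f} to {upper:.1f}")  # → 46.1 to 53.9
```

Rounded to whole points, this is the 46-to-54 band stated on the slide.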

Confidence Intervals In a normal distribution:
o The area between one SD below and one SD above the mean is 68% of the total area under the curve
o The area between two (actually 1.96) SDs below and two SDs above the mean is 95%
o The area between 2.58 SDs below and 2.58 SDs above the mean is 99%
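These three areas can be verified with the error function in Python's standard library: the proportion of a normal distribution lying within ±z SDs of the mean is erf(z/√2).

```python
import math

def coverage(z):
    """Proportion of a normal distribution within ±z SDs of the mean."""
    return math.erf(z / math.sqrt(2))

print(round(coverage(1.0), 3))   # → 0.683
print(round(coverage(1.96), 3))  # → 0.95
print(round(coverage(2.58), 3))  # → 0.99
```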

Confidence Intervals If we add and subtract one SEM from a person's test score, we will have a 68 percent confidence band for estimating the true score.

Standard Scores Test developers calculate the statistical average based on the performance of students tested during the norming process of test development. That average score is assigned a value. Different performance levels are calculated based on the differences between student scores and the statistical average and are expressed as standard deviations.

Standard Scores These standard deviations are used to determine which scores fall within the above-average, average, and below-average ranges. Standard scores and standard deviations differ from test to test. Many commonly used tests, such as the Wechsler Intelligence Scales, have an average score of 100 and a standard deviation of 15.
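With a mean of 100 and an SD of 15, any Wechsler-style standard score converts to standard-deviation units (a z score) as (score − mean) / SD. A small sketch with hypothetical scores:

```python
def to_z(standard_score, mean=100, sd=15):
    """Convert a standard score to standard-deviation units (z score)."""
    return (standard_score - mean) / sd

print(to_z(115))  # → 1.0 (one SD above average)
print(to_z(85))   # → -1.0 (one SD below average)
```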

Standard Scores Standardized test scores enable us to compare a student's performance on different types of tests. Although all test scores should be considered estimates, some are more precise than others. Standard scores and percentiles, for example, define a student's performance with more precision than do t-scores, z-scores, or stanines.

Standard Deviation Standard deviation measures how widely data points are spread. If all data values are equal to one another, then the standard deviation is zero. Under a normal distribution, ± one standard deviation encompasses 68% of the measurements and ± two standard deviations encompasses about 95% of the measurements.

Standard Deviation If a high proportion of data points lie near the mean (average) value, then the standard deviation is small.
o An experiment that yields data with a low standard deviation is said to have high precision.
If a high proportion of data points lie far from the mean value, then the standard deviation is large.
o An experiment that yields data with a high standard deviation is said to have low precision.
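The standard deviation itself is easy to compute with Python's standard library; the scores below are hypothetical:

```python
from statistics import mean, pstdev

# Hypothetical test scores for eight examinees
scores = [88, 92, 75, 81, 95, 70, 84, 79]

print(round(mean(scores), 1))    # → 83.0
print(round(pstdev(scores), 1))  # → 7.9  (population standard deviation)
```

`pstdev` treats the list as the whole population; use `stdev` instead when the scores are a sample drawn from a larger group.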

Test Scores Observed score: what someone actually gets on a test. True score: the true, 100% accurate reflection of what someone actually knows. An observed score is usually close to the true score, but they are rarely the same. The difference between the two is the amount of error that is introduced.

Observed Score If someone scores 89 on a test but their true score is 80, the 9-point difference (the error score) is due to error, the reason individual test scores vary from being 100% true. Possible sources of the error:
o The room is too warm
o The person didn't have time to study
o The person has a fever
o ????

Observed Score We need to reduce the errors as much as possible. The less error, the more reliable the score.

Reporting Test Scores Score bands (confidence bands) are the best way to report test scores. A score band provides reasonable limits for estimating the true score; it is an adequate approximation when the test reliability is reasonably high and the obtained score does not deviate extremely from the mean of the reference group. You can say, "It is fairly likely your daughter's true ability lies between 110 and 120."

Test Selection In selecting a published test:
o Read the test manual to determine whether it reports the reliability, SEM, and norms (including confidence bands)
o Confirm this information is reported for a reference group similar to your examinee
o Be sure the manual explains clearly how the information was gathered and how the confidence bands in the manual were calculated

Reliability and Validity If the tools you use to collect data are neither reliable nor valid, then the results will be inconclusive.