
Measurement. MANA 5341, Dr. George Benson, benson@uta.edu

Testing: Basic Concepts
- The normal curve: many people taking a test, or one person taking the test many times
- Central tendency (mean)
- Variability (standard deviation)

The Normal Curve (figure, not to scale)
Rounded cumulative percentiles at each standard deviation (z score):

Z score      -3     -2     -1      0     +1     +2     +3
Percentile   .1%    2.3%   15.9%   50%   84.1%  97.7%  99.9%

Variability
- How did an individual score compare to others?
- How do we compare scores across different tests?

            Test 1           Test 2
            Bob     Jim      Sue     Linda
Raw Score   49      47       49      47
Mean            48               46
Std. Dev        2.5              0.80

Z Score or “Standard” Score

Z score = (Score - Mean) / Std. Dev

            Test 1           Test 2
            Bob     Jim      Sue     Linda
Raw Score   49      47       49      47
Mean            48               46
Std. Dev        2.5              0.80
Z score     .4      -.4      3.75    1.25
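A minimal sketch in Python of the computation above, using the names and values from the table:

def z_score(raw, mean, std_dev):
    # z = (raw score - mean) / standard deviation
    return (raw - mean) / std_dev

# Test 1: mean 48, SD 2.5.  Test 2: mean 46, SD 0.80.
print(round(z_score(49, 48, 2.5), 2))  # Bob:   0.4
print(round(z_score(47, 48, 2.5), 2))  # Jim:  -0.4
print(round(z_score(49, 46, 0.8), 2))  # Sue:   3.75
print(round(z_score(47, 46, 0.8), 2))  # Linda: 1.25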

The Normal Curve (figure, not to scale): Jim, Bob, Linda, and Sue plotted on the curve at their z scores.

Converting Z Scores to Percentiles
- Look up the z score in a “standard normal table”; the table value is the proportion of the area under the normal curve below that score.
- Move the decimal two places and you have a percentage.
- Example: Linda has a z score of 1.25. The standard normal table gives .8944, a percentile score of 89.44%: Linda scored better than 89.44% of test takers.

Z score   Percentile
 3.0      99.9%
 2.0      97.7%
 1.0      84.1%
 0.0      50.0%
-1.0      15.9%
-2.0       2.3%
-3.0        .1%
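The table lookup can also be computed directly; a minimal sketch using Python's standard library:

from statistics import NormalDist

def percentile(z):
    # Area under the standard normal curve below z, as a percentage
    # (decimal moved two places).
    return NormalDist().cdf(z) * 100

print(round(percentile(1.25), 2))  # Linda: 89.44
print(round(percentile(2.0), 1))   # 97.7
print(round(percentile(-1.0), 1))  # 15.9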

Proportion Under the Normal Curve (figure, not to scale): the same four test takers, showing the proportion of the curve falling below each score.

Correlation
- How strongly are two variables related?
- Correlation coefficient (r) ranges from -1.00 to 1.00.
- Shared variation = r². If two variables are correlated at r = .6, they share .6² = .36, or 36% of the total variance.
- Illustrated using scatter plots.
- Used to test consistency and accuracy of a measure.
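A short sketch computing r and the shared variance r²; the paired scores are hypothetical, for illustration only:

from statistics import correlation  # Pearson r; Python 3.10+

x = [49, 47, 52, 45, 50, 48]        # hypothetical test scores
y = [3.8, 3.1, 4.2, 2.9, 3.6, 3.4]  # hypothetical ratings for the same people

r = correlation(x, y)
print(round(r, 2))       # strength and direction of the relationship
print(round(r ** 2, 2))  # proportion of variance the two variables share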

Correlation Scatterplots (figure)

Reliability: Basic Concepts
- Observed score = true score + error
- Error is anything that impacts test scores that is not the characteristic being measured.
- Reliability measures error: the lower the error, the better the measure.
- Things that can be observed are easier to measure than things that are inferred.

Reliability
- Consistency of the measure: if the same person takes the test again, will he/she earn the same score?
- Potential contaminations:
  - Test taker's physical or mental state
  - Environmental factors
  - Test forms
  - Multiple raters

Reliability of Measures (most to least reliable)
- Visual acuity (high)
- Hearing
- Dexterity
- Mathematical ability
- Verbal ability
- Intelligence
- Clerical skills
- Mechanical aptitudes
- Sociability
- Cooperativeness
- Tolerance
- Emotional stability (low)

Reliability Test Methods
Methods of calculating correlations between test items, administrations, or scoring:
- Test-retest
- Alternate or parallel form
- Internal consistency
- Inter-rater

Summary of Types of Reliability

                        Compare scores within T1          Compare scores across T1 and T2
Objective measures      Internal consistency or           Test-retest
(test items)            alternate form
Subjective ratings      Interrater: compare different     Intrarater: compare the same
                        raters                            rater at different times

Reliability Test Methods
- Test-retest
  - Pearson product-moment correlation (the “correlation coefficient”)
  - Memory and learning are potential contaminations
  - Good for single-item measures
- Alternate or parallel form
  - The correlation coefficient becomes the “coefficient of equivalence”
  - Tests form “equivalency”

Reliability Test Methods
- Internal consistency: extent to which measures are similar
  - Single measure: use “split-half” reliability, with the Spearman-Brown correction since only half the items are used
  - Multiple measures: use Cronbach's alpha, the average correlation of the measures with one another
- Inter-rater: used to test the reliability of subjective measures
  - Consistency across two raters' level of agreement; 80% agreement is the target
  - Cohen's kappa or Kendall's tau
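A minimal sketch of Cronbach's alpha from its standard variance formula; the item scores below are hypothetical:

from statistics import variance

def cronbach_alpha(items):
    # items: one list of scores per test item, aligned by respondent.
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Hypothetical 3-item measure answered by 5 respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # ~0.89 for this toy data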

How high should a reliability coefficient be?

Test-retest (immediate)    r = .90
Split-half reliability     r = .85
Test-retest (long-term)    r = .80
Cronbach's alpha           r = .75

Standard Error of Measure (SEM)
- An estimate of the error for an individual test score.
- Uses variability AND reliability to establish a confidence interval around a score.
- SEM = SD * √(1 - reliability)
- A 95% confidence interval (CI) means that if one person took the test 100 times, about 95 of the scores would fall within the upper and lower bounds.
- A score falling outside the CI has only a 5% chance of being due to measurement error alone, so the difference is considered “significant”.

The Normal Curve (figure repeated from earlier, not to scale): standard deviations (z scores) from -3 to +3 with the same rounded percentiles, .1% through 99.9%.

Standard Error of Measure (SEM)
Assume a mathematical ability test has a reliability of .9 and a standard deviation of 10.

SEM = 10 * √(1 - .9) = 3.16

If an applicant scores a 50, the SEM is the degree to which the score would vary if she were retested on another day. Plus or minus 2 SEM gives a ~95% confidence interval:
50 + 2(3.16) = 56.32
50 - 2(3.16) = 43.68
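The same worked example as a minimal Python sketch:

import math

def sem(std_dev, reliability):
    # SEM = SD * sqrt(1 - reliability)
    return std_dev * math.sqrt(1 - reliability)

s = sem(10, 0.9)
print(round(s, 2))           # 3.16
print(round(50 - 2 * s, 2))  # lower bound of the ~95% CI: 43.68
print(round(50 + 2 * s, 2))  # upper bound of the ~95% CI: 56.32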

Standard Error of Measure
- The difference between two scores should not be considered significant unless the difference is at least twice the standard error.
- If an applicant scores 2 points above a passing score and the SEM is 3.16, there is a good chance of making a bad selection choice.
- If two applicants score within 2 points of one another and the SEM is 3.16, the difference may well be due to chance.

Standard Error of Measure
The higher the reliability, the lower the SEM:

Std. Dev.   r     SEM
10          .96   2
10          .84   4
10          .75   5
10          .51   7
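A short check that reproduces the table (standard deviation fixed at 10):

import math

for r in (0.96, 0.84, 0.75, 0.51):
    print(r, round(10 * math.sqrt(1 - r), 2))  # 2.0, 4.0, 5.0, 7.0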

Confidence Intervals

Scores: Jim 40, Mary 50, Jen 60. Each band is the score ± 2 SEM.

SEM   Jim (40)    Mary (50)   Jen (60)
2     36 to 44    46 to 54    56 to 64
4     32 to 48    42 to 58    52 to 68

Do the applicants differ when SEM = 2? Do the applicants differ when SEM = 4? (See the sketch below.)
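A sketch answering the two questions, treating applicants as distinguishable only when their ±2 SEM bands do not overlap:

def band(score, sem):
    # ±2 SEM confidence band around an observed score.
    return (score - 2 * sem, score + 2 * sem)

def differ(a, b, sem):
    lo_a, hi_a = band(a, sem)
    lo_b, hi_b = band(b, sem)
    return hi_a < lo_b or hi_b < lo_a  # True when the bands do not overlap

for sem in (2, 4):
    print(sem, differ(40, 50, sem), differ(50, 60, sem))
# SEM = 2: bands 36-44, 46-54, 56-64 do not overlap -> the applicants differ
# SEM = 4: bands 32-48, 42-58, 52-68 overlap -> the differences may be chance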

Validation: Basic Concepts
- Validity is the degree to which research supports inferences made from selection test scores.
- Accuracy of the measure: are you measuring what you intend to measure? OR does the test measure a characteristic related to job performance?

Methods to Test Validity
- Criterion: the test predicts job performance (criterion/criteria vs. predictors)
- Content: the test is representative of the job
- Construct: a test of an abstract trait is accurate and predictive of job performance

Types of Validity (diagram)
- Criterion-related: selection tests are linked to job performance.
- Content-related: KSAs are linked to job duties.

Conducting a Validation Study
1. Conduct a job analysis
2. Identify critical KSAs
3. Choose (or develop) predictors of the KSAs
4. Choose job performance criteria
5. Administer to incumbents or applicants
6. Correlate the two administrations (predictor scores with criterion scores)

Criterion-Related Validity (diagram)
Overlap between test performance and job performance = validity; job performance the test fails to capture = deficiency; test variance unrelated to job performance = contamination.

Tests of Criterion-Related Validity
- Predictive validity (“Future Employee” or “Follow-up” method): test applicants at Time 1; measure the performance of those hired 6-12 months later, at Time 2.
- Concurrent validity (“Present Employee” method): test existing employees AND measure their performance at the same time (Time 1).
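A sketch of the follow-up method with hypothetical data: applicant test scores at Time 1 correlated with later performance ratings of those hired (a real study would need far more observations; see the assumptions below):

from statistics import correlation  # Pearson r; Python 3.10+

# Hypothetical data: eight applicants' test scores at Time 1, and
# supervisor ratings for the same people 6-12 months after hire.
time1_test = [72, 65, 88, 59, 77, 81, 63, 70]
time2_perf = [3.4, 3.0, 4.1, 2.8, 3.6, 3.9, 3.1, 3.3]

validity = correlation(time1_test, time2_perf)  # the validity coefficient
print(round(validity, 2))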

Assumptions
- The time between measures is appropriate.
- The job (and measures of performance) is stable over time.
- The sample of applicants or incumbents is representative.
- Samples are large enough (hundreds of observations).

Content-Related Validity (diagram)
Overlap between test content and job content = validity; job content the test fails to cover = deficiency; test content unrelated to the job = contamination.

Reliability vs. Validity

Validation in Practice
- Small business applications
  - Validation studies require large samples and expertise.
  - Use off-the-shelf tests for common jobs.
  - Increase the number of observations by broadening categories of jobs or looking at KSAs across jobs.
- Utility analysis
  - Estimates the economic value of selection test decisions.
  - Requires putting a dollar value on productivity.
  - Seldom used in practice.

Reliability vs. Validity

                    Validity coefficients   Reliability coefficients
Reject below        .11                     .70
Very useful above   .21                     .90
In practice         rarely exceed .40       rarely approach 1.00

Why the difference?

Principles of Assessment
Uniform Guidelines on Employee Selection Procedures: http://www.dol.gov/dol/allcfr/ESA/Title_41/Part_60-3/toc.htm
- Don't rely on a single method.
- Use only fair and unbiased instruments.
- Use only reliable and valid instruments.
- Use only tools designed for a specific group.
- Use instruments with understandable instructions.
- Ensure test administration staff are properly trained.
- Ensure test conditions are suitable for all test takers.
- Provide reasonable accommodation.
- Maintain confidentiality of results.
- Ensure proper interpretation of results.

Selection Tests and Litigation (listed from more to less likely to be challenged in court)
- Unstructured interviews
- Cognitive ability tests
- Physical ability tests
- Structured interviews
- Work sample tests
- Assessment centers

Where to get validated measures
- Buros’ Mental Measurements Yearbook
- Pro-Ed Consumer's Guide to Tests in Print
- http://www.shrm.org/assessment/index.asp