Chapter 6: Selecting Measurement Instruments

Chapter 6: Selecting Measurement Instruments
Objectives
- State the relation between a variable and a construct, and distinguish among categories of variables (e.g., categorical and quantitative; dependent and independent) and the scales used to measure them (nominal, ordinal, interval, and ratio).
- Define measurement, and describe ways to interpret measurement data.

Selecting Measurement Instruments: Objectives
- Describe the types of measuring instruments used to collect data in qualitative and quantitative studies (e.g., cognitive, affective, and projective tests).
- Define validity, and differentiate among content, criterion-related, construct, and consequential validity.

Selecting Measurement Instruments: Objectives
- Explain how to measure reliability, and differentiate among stability, equivalence, equivalence and stability, internal consistency, and scorer/rater reliability.
- Identify useful sources of information about specific tests, and provide strategies for test selection.
- Provide guidelines for test construction and test administration.

Data & Constructs
Data are the pieces of information you collect and use to examine your topic. You must determine what type of data to collect.
A construct is an abstraction that cannot be observed directly but is invented to explain behavior (e.g., intelligence, motivation, ability).

Constructs & Variables
Constructs must be operationally defined to be observable and measurable. Variables are operationally defined constructs: placeholders that can assume any one of a range of values. Variables may be measured by instruments.

Measurement Scales
A measurement scale is a system for organizing data. Knowing your measurement scale is necessary to determine the type of analysis you will conduct (see the sketch after the four scale types below).

Measurement Scales
Nominal variables describe categorical data (e.g., gender, political party affiliation, school attended, marital status). Nominal variables are qualitative. Quantitative variables fall on a continuum and include ordinal, interval, and ratio variables.

Measurement Scales
Ordinal variables describe rank order with unequal units (e.g., order of finish, ranking of schools or groups as levels).
Interval variables describe equal intervals between values (e.g., achievement, attitude, test scores).

Measurement Scales
Ratio variables have all of the characteristics of the other levels and also include a true zero point (e.g., total number of correct items on a test, time, distance, weight).
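
To make the distinction concrete, here is a minimal Python sketch (the variables and data are invented for illustration) showing how the scale of measurement constrains which summary statistics are meaningful:

    from statistics import mean, median, mode

    # Invented data illustrating each scale of measurement.
    party = ["Dem", "Rep", "Ind", "Dem", "Dem"]    # nominal: categories only
    finish_order = [1, 2, 3, 5, 4]                 # ordinal: ranks, unequal units
    test_scores = [72, 85, 85, 90, 98]             # interval: equal units, no true zero
    weights_kg = [51.2, 60.0, 68.4, 72.1, 80.3]    # ratio: equal units, true zero

    print(mode(party))           # nominal data support only counts and the mode
    print(median(finish_order))  # ordinal data also support the median
    print(mean(test_scores))     # interval data also support means and SDs
    print(max(weights_kg) / min(weights_kg))  # only ratio data support ratios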

Independent & Dependent Variables
Dependent variables are those believed to depend on or be caused by another variable; they are also called criterion variables.
Independent variables are the hypothesized cause of the dependent variable, and an independent variable must have at least two levels. Independent variables are also called experimental, manipulated, or treatment variables.

Characteristics of Instruments
There are three major ways for researchers to collect data:
- Administer a standardized instrument (e.g., an achievement test).
- Administer a self-developed instrument (e.g., a survey you might develop).
- Record naturally occurring events or use already available data (e.g., recording off-task behavior of a student in a classroom).

Instruments
Using a standardized instrument takes less time than developing one, and results from different studies that use the same instrument can be compared. At times researchers may need to develop their own instruments, but designing an instrument effectively requires expertise and time.

Instruments
A test is a formal, systematic procedure for gathering information about people's cognitive characteristics (e.g., thinking, ability) or affective characteristics (e.g., feelings, attitudes).

Instruments
A standardized test is administered, scored, and interpreted the same way across administrations (e.g., the ACT, the SAT, or the Stanford Achievement Test).

Instruments
Assessment refers to the process of collecting, synthesizing, and interpreting information, including data from tests as well as from observations. Assessment may be formal or informal, numerical or textual.
Measurement is the process of quantifying or scoring assessment information; it occurs after data collection.

Instruments
Qualitative researchers often use interviews and observations. Quantitative researchers often use paper-and-pencil (or electronic) methods:
- Selection methods: the respondent selects from possible answers (e.g., multiple-choice tests).
- Supply methods: the respondent has to provide an answer (e.g., essay items).

Instruments
Performance assessments emphasize student process and require the creation of a product (e.g., completing a project).

Interpreting Instrument Data
Raw score: the number or point value of items answered correctly (e.g., 18/20 items correct).
Norm-referenced scoring: a student's performance is compared with the performance of others (e.g., grading on a curve).

Interpreting Instrument Data
Criterion-referenced scoring: a student's performance is compared with a preset standard (e.g., class tests).
Self-referenced scoring: how an individual student's scores change over time is measured (e.g., speeded math-facts tests).
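
The following sketch (with invented class scores) contrasts these ways of interpreting the same raw score:

    # Invented data: ten class scores out of 20 possible points.
    scores = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20]
    student_score = 18

    # Raw score: items correct out of items possible.
    print(f"raw score: {student_score}/20")

    # Norm-referenced: compare the student with the group (percentile rank,
    # here the percentage of scores falling below the student's score).
    below = sum(s < student_score for s in scores)
    print(f"percentile rank: {100 * below / len(scores):.0f}")

    # Criterion-referenced: compare the student with a preset standard.
    print("meets standard" if student_score / 20 >= 0.80 else "below standard")

    # Self-referenced: compare the student with an earlier attempt.
    earlier_score = 14
    print(f"gain: {student_score - earlier_score} points")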

Types of Instruments
Cognitive tests measure intellectual processes (e.g., thinking, memorizing, calculating, analyzing). Standardized tests measure an individual's current proficiency in given areas of knowledge or skill and are often given as a test battery (e.g., the Iowa Tests of Basic Skills, the CTBS).

Types of Instruments
Diagnostic tests provide scores that facilitate identification of strengths and weaknesses (e.g., tests given for diagnosing reading disabilities). Aptitude tests measure potential, predicting future performance rather than what has already been learned (e.g., the Wechsler scales).

Affective Instruments
Affective tests measure affective characteristics (e.g., attitude, emotion, interest, personality). Attitude scales measure what a person believes or feels. Likert scales measure agreement on a scale: Strongly agree, Agree, Undecided, Disagree, Strongly disagree.

Affective Instruments
Semantic differential scales ask the individual to indicate attitude by marking a position on a scale between two bipolar adjectives:
Fair 3 2 1 0 -1 -2 -3 Unfair
Rating scales may ask a participant to check the most appropriate description (e.g., 5 = always, 4 = almost always, 3 = sometimes, ...). Thurstone and Guttman scales are also used to measure attitudes.
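
As a simple illustration of how such scales are scored, here is a minimal sketch; the items, their point values, and the reverse-keyed (negatively worded) items are all invented for illustration:

    # Responses on a 5-point Likert scale (Strongly agree = 5 ... Strongly disagree = 1).
    # Items and the reverse key are hypothetical.
    responses = {"item1": 5, "item2": 4, "item3": 1, "item4": 2}
    reverse_keyed = {"item3", "item4"}  # negatively worded items

    def likert_total(resp, reverse, points=5):
        # Reverse-code negatively worded items so a high total
        # consistently means a favorable attitude.
        return sum((points + 1 - v) if item in reverse else v
                   for item, v in resp.items())

    print(likert_total(responses, reverse_keyed))  # 5 + 4 + 5 + 4 = 18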

Additional Inventories
Interest inventories assess personal likes and dislikes (e.g., occupational interest inventories). Values tests assess the relative strength of a person's values (e.g., the Study of Values instrument).

Additional Inventories
Personality inventories present participants with statements describing behaviors characteristic of given personality traits, and the participant responds to each statement (e.g., the MMPI). Projective tests were developed to eliminate some of the concerns with self-report measures: the tests are ambiguous so that, presumably, respondents will project their true feelings (e.g., the Rorschach).

Criteria for Good Instruments
Validity refers to the degree to which a test measures what it is supposed to measure. Validity is the most important test characteristic.

Criteria for Good Instruments
There are several established forms of validity:
- Content validity
- Criterion-related validity (concurrent and predictive)
- Construct validity
- Consequential validity

Content Validity
Content validity addresses whether the test measures the intended content area. It is an initial screening type of validity, sometimes referred to as face validity, and is established by expert judgment (content validation).

Content Validity
Content validity is concerned with both:
- Item validity: Are the individual test items relevant to the intended content?
- Sampling validity: Do the items, taken together, adequately sample the whole content area being tested?
One example of a lack of content validity is a math test with heavy reading requirements: it measures not only math but also reading ability, and is therefore not a valid math test.

Criterion-Related Validity
Criterion-related validity is determined by relating performance on a test to performance on an alternative test or other measure. The strength of the relationship, expressed as a correlation coefficient, indicates the degree of validity.

Criterion-Related Validity
Two types of criterion-related validity:
- Concurrent: scores on a test are correlated with scores on an alternative measure given at the same time (e.g., two measures of reading achievement).
- Predictive: the degree to which a test can predict how well a person will do in a future situation (e.g., the GRE, with the predictor being the GRE score and the criterion being success in graduate school).
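
The validity coefficient is simply a Pearson correlation between test scores and criterion scores. A minimal sketch, with invented predictor and criterion data:

    from math import sqrt

    def pearson_r(x, y):
        # Pearson correlation between two paired lists of scores.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    # Invented data: an admissions test (predictor) and first-year GPA
    # (criterion) for the same ten students.
    test = [152, 148, 160, 155, 149, 158, 162, 150, 157, 153]
    gpa = [3.1, 2.8, 3.7, 3.4, 2.9, 3.5, 3.8, 3.0, 3.6, 3.2]
    print(f"validity coefficient: r = {pearson_r(test, gpa):.2f}")

The same computation, applied to two administrations of one test or to two equivalent forms, yields the reliability coefficients discussed later in this chapter.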

Construct Validity
Construct validity is the most important form of validity because it asks what the test is actually measuring. Construct validity is very challenging to establish.

Construct Validity
Construct validity requires both confirmatory and disconfirmatory evidence: scores on a test should relate to scores on similar measures and should NOT relate to scores on measures of different constructs. For example, scores on a math test should correlate more highly with scores on another math test than with scores on a reading test.
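
A sketch of that comparison, using invented scores for ten students on three tests (statistics.correlation requires Python 3.10+):

    from statistics import correlation  # requires Python 3.10+

    # Invented scores for ten students on three tests.
    math_a = [55, 60, 62, 48, 70, 66, 59, 73, 51, 64]
    math_b = [52, 63, 60, 50, 68, 64, 61, 75, 49, 66]
    reading = [63, 55, 70, 60, 58, 66, 68, 62, 65, 59]

    # Confirmatory evidence: the two math tests should correlate highly.
    print(f"math A vs math B:  r = {correlation(math_a, math_b):.2f}")   # high
    # Disconfirmatory evidence: math and reading should correlate weakly.
    print(f"math A vs reading: r = {correlation(math_a, reading):.2f}")  # near zero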

Consequential Validity
Consequential validity refers to the extent to which an instrument creates harmful effects for the user; some tests may harm the test taker. For example, a measure of anxiety may make a person more anxious.

Validity
Some factors that threaten validity include:
- Unclear directions
- Confusing or unclear items
- Vocabulary or required reading level too difficult for test takers
- Subjective scoring
- Cheating
- Errors in administration

Self-Report Instruments
There are some concerns with data derived from self-report instruments. One concern is response set, the tendency for a participant to respond in a certain way regardless of item content (e.g., social desirability). Bias may also play a role in self-report instruments (e.g., cultural norms).

Reliability
Reliability refers to the consistency with which an instrument measures a construct. It is expressed as a reliability coefficient based on a correlation, and reliability coefficients should be reported for all measures. Reliability affects validity: an unreliable test cannot be valid. There are several forms of reliability.

Reliability
Test-retest (stability) reliability measures the stability of scores over time. To assess it, the same test is given to the same group twice and the two sets of scores are correlated. The correlation is referred to as the coefficient of stability.

Reliability
Alternate-forms (equivalence) reliability measures the relationship between two versions of a test that are intended to be equivalent. To assess it, both forms are given to the same group and the scores on the two forms are correlated. The correlation is referred to as the coefficient of equivalence.

Reliability
Equivalence-and-stability reliability is the relationship between equivalent forms of a test given at two different times. To assess it, one form is given, an equivalent form is given after a time interval, and the scores are correlated. The correlation is referred to as the coefficient of stability and equivalence.

Reliability
Internal consistency reliability represents the extent to which items in a test are consistent with one another.
- Split-half: the test is divided into halves and the scores on the two halves are correlated (the half-test correlation is usually adjusted with the Spearman-Brown formula to estimate full-test reliability).
- Coefficient alpha and the Kuder-Richardson formulas measure the relationship among all items and the total score of a test.
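
A minimal sketch of coefficient alpha with invented responses; for dichotomous (0/1) items like these, alpha reduces to the Kuder-Richardson KR-20 value:

    from statistics import pvariance

    def cronbach_alpha(items):
        # items: one list of scores per item, each of length n_persons.
        k = len(items)
        totals = [sum(person) for person in zip(*items)]  # per-person totals
        item_var = sum(pvariance(item) for item in items)
        return k / (k - 1) * (1 - item_var / pvariance(totals))

    # Invented data: five people answering four items scored 0/1.
    items = [
        [1, 1, 0, 1, 1],  # item 1, persons A..E
        [1, 1, 0, 1, 0],  # item 2
        [1, 0, 0, 1, 1],  # item 3
        [1, 1, 1, 1, 0],  # item 4
    ]
    print(f"alpha = {cronbach_alpha(items):.2f}")  # about 0.55 here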

Reliability
Scorer and rater reliability reflects the extent to which independent scorers, or a single scorer over time, agree on a score.
- Interjudge (inter-rater) reliability: consistency between two or more independent scorers.
- Intrajudge (intra-rater) reliability: consistency of one scorer over time.
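
One common index of inter-rater agreement (not named in this chapter) is Cohen's kappa, which corrects simple percent agreement for the agreement expected by chance. A minimal sketch with invented ratings:

    from collections import Counter

    def cohen_kappa(r1, r2):
        # Agreement of two raters on categorical codes, corrected for chance.
        n = len(r1)
        p_obs = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
        c1, c2 = Counter(r1), Counter(r2)
        p_exp = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
        return (p_obs - p_exp) / (1 - p_exp)

    # Invented data: two raters scoring ten essays pass (P) or fail (F).
    rater1 = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "P"]
    rater2 = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "F"]
    print(f"kappa = {cohen_kappa(rater1, rater2):.2f}")  # about 0.57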

Reliability
The standard error of measurement (SEM) is an estimate of how often one can expect errors of a given size in an individual's test score:
SEM = SD × √(1 − r)
where SD is the standard deviation of the test scores and r is the reliability coefficient.
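
A minimal sketch (the SD, reliability, and obtained score are invented) computing the SEM and the resulting band around an obtained score; by the usual interpretation, an individual's true score falls within one SEM of the obtained score about 68% of the time:

    from math import sqrt

    def sem(sd, r):
        # Standard error of measurement: SEM = SD * sqrt(1 - r).
        return sd * sqrt(1 - r)

    # Invented values: a test with SD = 10 and reliability r = 0.91.
    s = sem(10, 0.91)  # 3.0
    obtained = 75
    print(f"SEM = {s:.1f}")
    print(f"68% band: {obtained - s:.1f} to {obtained + s:.1f}")  # 72.0 to 78.0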

Selecting a Test
Once you have defined the purpose for your study:
- Determine the type of test that you need.
- Identify and locate appropriate tests.
- Determine which test to use after a comparative analysis.

Selecting a Test
Several sources provide information about and reviews of available tests; these are a good place to start when selecting a test:
- MMY: The Mental Measurements Yearbook, the most comprehensive source of test information
- Pro-Ed publications
- The ETS Test Collection database
- Professional journals
- Test publishers and distributors

Selecting a Test
When comparing the tests you have located and deciding which to use, attend to each of the following:
- First, examine validity.
- Next, consider reliability.
- Consider ease of test use.
- Ensure participants have not been previously exposed to the test.
- Ensure sensitive information is not unnecessarily included.