
Introduction to Measurement

Goals of Workshop Reviewing assessment concepts Reviewing instruments used in the norming process Getting an overview of the secondary and elementary normative samples Learning how to use the manuals in interpreting students' scores.

ASSESSMENT The process of collecting data for the purpose of making decisions about students. It's a process that typically involves multiple sources and methods. Assessment is in service of a goal or purpose. The data we collect will be used to support some type of decision (e.g., monitoring, intervention, placement).

Major Types of Assessment in Schools More frequently used: –Achievement: how well is the child doing in the curriculum? –Aptitude: what are this child's intellectual and other capabilities? –Behavior: Is the child's behavior affecting learning? Less frequently used: –Teacher competence: Is the teacher actually imparting knowledge? –Classroom environment: Are classroom conditions conducive to learning? –Other concerns: home, community, ...

Types of Tests Norm-referenced –Comparison of performance to a specified population/set of individuals Individually-referenced –Comparisons to self Criterion-referenced –Comparison of performance to mastery of a content area; what does the student know? The data in the manual will allow you to look at norms and at individual growth.

MAJOR CONCEPTS Nomothetic and Idiographic Samples Norms Standardized Administration Reliability Validity

Nomothetic Relating to the abstract, the universal, the general. Nomothetic assessment focuses on the group as a unit. Refers to finding principles that are applicable on a broad level. For example, boys report higher math self-concepts than girls; girls report more depressive symptoms than boys.

Idiographic Relating to the concrete, the individual, the unique Idiographic assessment focuses on the individual student What type of phonemic awareness skills does Joe possess?

Populations and Samples I A population consists of all the representatives of a particular domain that you are interested in. The domain could be people, behavior, or curriculum (e.g., reading, math, spelling, ...).

Populations and Samples II A sample is a subgroup that you actually draw from the population of interest Ideally, you want your sample to represent your population –people polled or examined, test content, manifestations of behavior

Random Samples A sample in which each member of the population has an equal and independent chance of being selected. Random samples are important because the idea is to have a sample that represents the population fairly; an unbiased sample. A sample can then be used to represent the population.

Probability Samples I Sampling in which elements are drawn according to some known probability structure. Random samples are subcases of probability samples. Probability samples are typically used in conjunction with subgroups (e.g., ethnicity, socioeconomic status, gender).

Probability Samples II Probability samples using subgroups are also referred to as stratified samples. Standardization samples are typically probability or stratified samples. Standardization samples need to represent the population because the sample's results will be used to create norms against which all members of the population will be compared.
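The stratified-sampling idea above can be sketched in Python with only the standard library. The population, the gender stratum, and the 10% sampling fraction below are all hypothetical choices for illustration, not part of any actual norming study:

```python
import random

def stratified_sample(population, strata_key, fraction, seed=0):
    """Draw the same fraction from each subgroup (stratum) of the population,
    so subgroup proportions in the sample mirror the population."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(strata_key(person), []).append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical norming population: (id, gender) pairs, half boys and half girls.
population = [(i, "girl" if i % 2 else "boy") for i in range(1000)]
sample = stratified_sample(population, strata_key=lambda p: p[1], fraction=0.1)
# Each stratum contributes 10%, so the sample is 50 boys and 50 girls.
```

Because each stratum is sampled at the same rate, a subgroup can never be over- or under-represented the way it could be in a purely random draw.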

Norms I Norms are examples of how the average individual performs. Many of the tests and rating scales that are used to compare children in the US are norm-referenced. –An individual child's performance is compared to the norms established using a representative sample.

Norms II For the score on a normed instrument to be valid, the person being assessed must belong to the population for which the test was normed. If we wish to apply the test to another group of people, we need to establish norms for the new group.

Norms III To create new norms, we need to do a number of things: –Get a representative sample of the new population –Administer the instrument to the sample in a standardized fashion –Examine the reliability and validity of the instrument with that new sample –Determine how we are going to report scores and create the appropriate tables

Standardized Administration All measurement has error. Standardized administration is one way to reduce error due to examiner/clinician effects. For example, consider the same question asked with different facial expressions and tones: Please define a noun for me :-) DEFINE a noun if you can? :-(

Distributions Any group of scores can be arranged in a distribution from lowest to highest: 10, 3, 31, 100, 17, 4 becomes 3, 4, 10, 17, 31, 100

Normal Curve Many distributions of human traits form a normal curve Most cases cluster near the middle, with fewer individuals at the extremes; the curve is symmetrical We know how the population is distributed based on the normal curve

Ways of reporting scores Mean, standard deviation Distribution of scores –68.26% within ±1 SD; 95.44% within ±2 SD; 99.72% within ±3 SD Stanines (1, 2, 3, 4, 5, 6, 7, 8, 9) Standard scores - linear transformations of scores, but easier to interpret Percentile ranks* Box and Whisker Plots*
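The linear transformation behind standard scores can be sketched in a few lines of Python. The score set below and the mean-100/SD-15 target scale (the deviation-IQ convention) are illustrative choices, not taken from the workshop materials:

```python
from statistics import mean, pstdev

def z_score(x, scores):
    """Distance of x from the mean, in standard-deviation units."""
    return (x - mean(scores)) / pstdev(scores)

def standard_score(x, scores, new_mean=100, new_sd=15):
    """Linear transformation of a z score onto a friendlier scale
    (here the deviation-IQ convention: mean 100, SD 15)."""
    return new_mean + new_sd * z_score(x, scores)

scores = [85, 90, 95, 100, 105, 110, 115]
print(z_score(100, scores))          # 0.0 -- the mean of this set is 100
print(standard_score(110, scores))   # 115.0 -- one SD above the mean
```

The transformation changes only the scale, not the ordering of scores, which is why standard scores are easier to interpret without being any more informative than the raw distribution.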

Percentiles A way of reporting where a person falls on a distribution. The percentile rank of a score tells you what percentage of people obtained a score equal to or lower than that score. So if we have a score at the 23rd %tile and another at the 69th %tile, which score is higher?
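The "equal to or lower" definition of a percentile rank can be computed directly. The scores reused here are the small distribution from the earlier slide:

```python
def percentile_rank(score, scores):
    """Percentage of scores equal to or lower than the given score,
    matching the definition on the slide."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

scores = [3, 4, 10, 17, 31, 100]
print(percentile_rank(17, scores))  # 4 of the 6 scores are <= 17, so about 66.7
```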

Percentiles 2 Is a high percentile always better than a low percentile? It depends on what you are measuring. For example…. Box and whisker plots are visual displays, or graphic representations, of the shape of a distribution using percentiles.

Correlation We need to understand the correlation coefficient to understand the manual The correlation coefficient, r, quantifies the relationship between two sets of scores. A correlation coefficient can range from -1 to +1. –Zero means the two sets of scores are not linearly related. –A value of +1 means the two sets of scores are perfectly related (a perfect correlation)

Correlation 2 Correlations can be positive or negative. A positive correlation tells us that as one set of scores increases, the second set of scores also increases. A negative correlation tells us that as one set of scores increases, the other set decreases. Think of some examples of variables with negative r's. The absolute value of a correlation indicates the strength of the relationship. Thus .55 is equal in strength to -.55.
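A minimal Pearson r from first principles, assuming the standard formula (covariance scaled by both spreads). The study-time data below are invented to show one positive and one negative relationship:

```python
from statistics import mean
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the
    two spreads, so the result always falls between -1 and +1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

hours_studied = [1, 2, 3, 4, 5]
test_score    = [52, 60, 63, 71, 79]   # rises with study time: positive r
tv_hours      = [9, 8, 6, 4, 2]        # falls with study time: negative r
print(round(pearson_r(hours_studied, test_score), 2))  # 0.99
print(round(pearson_r(hours_studied, tv_hours), 2))    # -0.99
```

Both relationships are equally strong; only the sign differs, which is exactly the point about absolute value made above.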

How would you describe the correlations shown by these charts?

Correlation 4 .25, .70, -.40, .55, -.87, .58, .05 Order these from strongest to weakest: -.87, .70, .58, .55, -.40, .25, .05 We will meet 3 different types of correlation coefficients today: Reliability coefficients - Definitions? Validity coefficients Pattern coefficients
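Ordering correlations by strength means ordering by absolute value, which a one-liner makes concrete using the slide's own numbers:

```python
rs = [0.25, 0.70, -0.40, 0.55, -0.87, 0.58, 0.05]
# Strength is the absolute value, so sort on abs() in descending order.
by_strength = sorted(rs, key=abs, reverse=True)
print(by_strength)  # [-0.87, 0.7, 0.58, 0.55, -0.4, 0.25, 0.05]
```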

Reliability Reliability addresses the stability, consistency, or reproducibility of scores. –Internal consistency –Split-half, Cronbach's alpha –Test-retest –Parallel forms –Inter-rater

Reliability 2 Internal Consistency –How do the items on a scale relate to one another? Are respondents responding to them in the same way? Test-retest –How do respondents' scores at Time 1 relate to their scores at Time 2?
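Cronbach's alpha, the internal-consistency index named above, can be sketched with the standard k/(k-1) formula. The 3-item, 5-respondent data set here is hypothetical, chosen so the items move together and alpha comes out high:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of the total scores). item_scores is a list of items, each a list of
    the respondents' scores on that item."""
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]
    sum_item_var = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))

# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.89 -- items hang together well
```

When items measure the same thing, the variance of the totals grows faster than the sum of the item variances, pushing alpha toward 1.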

Reliability 3 Parallel forms –Begin by creating at least two versions of the exam. How does respondents' performance on one version compare to their performance on another version? Inter-rater –Connected to ratings of behavior. How do one rater's scores compare to another's?

Validity Validity addresses the accuracy or truthfulness of scores. Are they measuring what we want them to? –Content –Criterion - Concurrent –Criterion - Predictive –Construct –Face

Content Validity Is the assessment tool representative of the domain (behavior, curriculum) being measured? An assessment tool is scrutinized for its (a) completeness or representativeness, (b) appropriateness, (c) format, and (d) bias –E.g., MSPAS

Criterion-related Validity What is the correlation between our instrument, scale, or test and another variable that measures the same thing, or measures something that is very close to ours? In concurrent validity, we compare scores on the instrument we are validating to scores on another variable that are obtained at the same time. In predictive validity, we compare scores on the instrument we are validating to scores on another variable that are obtained at some future time.

Structural Validity Used when an instrument has multiple scales. Asks the question, "Which items go together best?" For example, how would you group these items from the Self-Description Questionnaire? 3. I am hopeless in English classes. 5. Overall, I am no good. 7. I look forward to mathematics class. 15. I feel that my life is not very useful. 24. I get good marks in English. 28. I hate mathematics.

Structural Validity 2 We expect the English items (3, 24), Math items (7, 28), and global items (5, 15) to group together. The items that group together make up a new composite variable we call a factor. We want each item to correlate highly with the factor it clusters on, and less well with other factors. Typically, we accept item-factor coefficients of about .30 and higher.
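A sketch of that assignment rule: give each item to the factor it loads on most strongly, provided the loading clears the ~.30 rule of thumb. All loadings below are invented for illustration and are not actual SDQ results:

```python
# Hypothetical item-factor coefficients (loadings) for the six SDQ items above.
loadings = {
    3:  {"Verbal": 0.62, "Math": 0.08, "Global": 0.21},
    24: {"Verbal": 0.58, "Math": 0.12, "Global": 0.18},
    7:  {"Verbal": 0.10, "Math": 0.66, "Global": 0.15},
    28: {"Verbal": 0.05, "Math": 0.71, "Global": 0.22},
    5:  {"Verbal": 0.19, "Math": 0.11, "Global": 0.55},
    15: {"Verbal": 0.14, "Math": 0.09, "Global": 0.60},
}

def assign_factor(item_loadings, threshold=0.30):
    """Assign an item to its strongest factor, or to no factor (None)
    if even the strongest loading misses the ~.30 cutoff."""
    factor, value = max(item_loadings.items(), key=lambda kv: abs(kv[1]))
    return factor if abs(value) >= threshold else None

assignments = {item: assign_factor(l) for item, l in loadings.items()}
print(assignments)  # 3, 24 -> Verbal; 7, 28 -> Math; 5, 15 -> Global
```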

What can we say about the structural validity of the SDQ given these scores? (Table of item-factor coefficients with columns Item #, Verbal, Math, Global; values not reproduced in this transcript.)

Construct Validity Overarching construct: Is the instrument measuring what it is supposed to? –Dependent on reliability, content, and criterion-related validity. We also sometimes look at other types of validity evidence –Convergent validity: r with a similar construct –Discriminant validity: r with an unrelated construct –Structural validity: What is the structure of the scores on this instrument?

Statistical Significance When we examine group differences in science, we want to make objective rather than subjective decisions. We use statistics to tell us whether the difference we are observing occurs by chance. In psychology, we typically set our alpha or error rate at 5% (i.e., .05), and we conclude that if a difference would occur by chance less than 5% of the time, that difference is statistically significant.

Statistical Significance 2 When our statistical test yields p < .05, we call the difference statistically significant. Statistical significance is affected by a number of variables, including sample size. The larger the sample, the easier it is to achieve statistical significance. We also look at the magnitude of the difference (or effect size). A difference may be statistically significant, but have a small effect size. .10 to .30 = small effect; .40 to .60 = medium effect; >.60 = large effect.
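One common effect-size measure for group differences is Cohen's d: the mean difference divided by the pooled standard deviation. Note that d has its own conventional bands (roughly .2 small, .5 medium, .8 large), which differ from the cutoffs on this slide. The classroom reading scores below are hypothetical:

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(group1, group2):
    """Cohen's d: mean difference over the pooled standard deviation.
    Unlike a p value, d does not shrink or grow with sample size."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(((n1 - 1) * stdev(group1) ** 2 +
                      (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical reading scores for two classrooms.
a = [78, 82, 85, 88, 90, 92]
b = [70, 74, 77, 80, 83, 85]
print(round(cohens_d(a, b), 2))  # 1.41 -- a large difference by any convention
```

This is why the slide pairs significance with effect size: a huge sample can make a trivial difference "significant," while d reports how big the difference actually is.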