Psy 427, Cal State Northridge. Andrew Ainsworth, PhD.

Similar presentations
Assessing Student Performance
Item Analysis.
Test Development.
Taking Stock Of Measurement. Basics Of Measurement Measurement: Assignment of number to objects or events according to specific rules. Conceptual variables:
© McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity.
Part II Sigma Freud & Descriptive Statistics
Item Response Theory in Health Measurement
Item Analysis: A Crash Course Lou Ann Cooper, PhD Master Educator Fellowship Program January 10, 2008.
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
CH. 9 MEASUREMENT: SCALING, RELIABILITY, VALIDITY
Item Analysis What makes a question good??? Answer options?
LECTURE 4 ATTITUDE SCALING.
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Quantitative Research
PSY 307 – Statistics for the Behavioral Sciences
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Multivariate Methods EPSY 5245 Michael C. Rodriguez.
Measures of Central Tendency
Measurement and Data Quality
Reliability, Validity, & Scaling
Understanding Research Results
Assessment Report Department of Psychology School of Science & Mathematics D. Abwender, Chair J. Witnauer, Assessment Coordinator Spring, 2013.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
MEASUREMENT OF VARIABLES: OPERATIONAL DEFINITION AND SCALES
Bivariate Relationships Analyzing two variables at a time, usually the Independent & Dependent Variables Like one variable at a time, this can be done.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
McMillan Educational Research: Fundamentals for the Consumer, 6e © 2012 Pearson Education, Inc. All rights reserved. Educational Research: Fundamentals.
Technical Adequacy Session One Part Three.
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
User Study Evaluation Human-Computer Interaction.
Experimental Research Methods in Language Learning Chapter 11 Correlational Analysis.
Statistical analysis Prepared and gathered by Alireza Yousefy(Ph.D)
Counseling Research: Quantitative, Qualitative, and Mixed Methods, 1e © 2010 Pearson Education, Inc. All rights reserved. Basic Statistical Concepts Sang.
Lab 5: Item Analyses. Quick Notes Load the files for Lab 5 from course website –
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
Learning Objective Chapter 9 The Concept of Measurement and Attitude Scales Copyright © 2000 South-Western College Publishing Co. CHAPTER nine The Concept.
MEASUREMENT: SCALE DEVELOPMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 13 Data Collection in Quantitative Research.
Writing Multiple Choice Questions. Types Norm-referenced –Students are ranked according to the ability being measured by the test with the average passing.
Assessment Procedures for Counselors and Helping Professionals, 7e © 2010 Pearson Education, Inc. All rights reserved. Class Final Exam & Project Review.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Appraisal and Its Application to Counseling COUN 550 Saint Joseph College Ability, Intelligence, Aptitude and Achievement Testing For Class #12 Copyright.
Chapter 10 Copyright © Allyn & Bacon 2008 This multimedia product and its contents are protected under copyright law. The following are prohibited by law:
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 7 Measuring of data Reliability of measuring instruments The reliability* of instrument is the consistency with which it measures the target attribute.
Tests and Measurements
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Chapter 6 - Standardized Measurement and Assessment
Response Processes Psych DeShon. Response Elicitation Completion Completion Requires production Requires production Allows for creative responses.
Educational Research: Data analysis and interpretation – 1 Descriptive statistics EDU 8603 Educational Research Richard M. Jacobs, OSA, Ph.D.
Dan Thompson Oklahoma State University Center for Health Science Evaluating Assessments: Utilizing ExamSoft’s item-analysis to better understand student.
Assessment Assessment is the collection, recording and analysis of data about students as they work over a period of time. This should include, teacher,
Educational Research Chapter 8. Tools of Research Scales and instruments – measure complex characteristics such as intelligence and achievement Scales.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
5. Evaluation of measuring tools: reliability Psychometrics. 2011/12. Group A (English)
 Good for:  Knowledge level content  Evaluating student understanding of popular misconceptions  Concepts with two logical responses.
Statistical Significance
Ch. 5 Measurement Concepts.
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Assessment Theory and Models Part II
Classroom Analytics.
Test Development Test conceptualization Test construction Test tryout
Item writing.
Using statistics to evaluate your test Gerard Seinhorst
Correlation and the Pearson r
ECC511 Research Methods in Education Dr. Jill Taggart, 2014
Tests are given for 4 primary reasons.
Presentation transcript:

Psy 427, Cal State Northridge. Andrew Ainsworth, PhD.

• Dichotomous format
  ▪ Two alternatives (e.g., True/False)
  ▪ Examples: MMPI-2, MMPI-A
• Polytomous (polychotomous) format
  ▪ More than two alternatives (e.g., multiple choice)
  ▪ Examples: Psy 427 midterm, SAT, GRE

• Distractors
  ▪ The incorrect choices on a polychotomous item
  ▪ Conventional advice: use three or four
  ▪ BUT one study (Sidick, Barrett, & Doverspike, 1994) found equivalent validity and reliability for a test with two distractors (three options per item) and one with four distractors (five options per item)
  ▪ SO the best practice may be two to four distractors (further study is needed)

• Should you guess? It depends…
• Correction for guessing: corrected score = R - W / (n - 1), where
  ▪ R is the number of items answered correctly
  ▪ W is the number answered incorrectly
  ▪ n is the number of choices per polytomous item
• If there is no correction for guessing, guess away.
• If there is a correction for guessing, it is better to leave some items blank (unless you can beat the odds).
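As a worked illustration of the correction (the numbers here are hypothetical, not from the slides): on a five-choice test where a student answers 40 items correctly, 8 incorrectly, and leaves 2 blank,

    \text{corrected score} = R - \frac{W}{n-1} = 40 - \frac{8}{5-1} = 40 - 2 = 38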

• Likert scales
  ▪ On a rating scale of 1-5, 1-6, 1-7, etc., where (for a 6-point version)
    1 = strongly disagree
    2 = moderately disagree
    3 = mildly disagree
    4 = mildly agree
    5 = moderately agree
    6 = strongly agree
  ▪ "rate the following statements…"

• Likert scales
  ▪ Even vs. odd number of choices
    ▪ An even number prevents "fence-sitting"
    ▪ An odd number allows people to be neutral
  ▪ Likert items are VERY popular measurement items in psychology.
  ▪ Technically ordinal, but often treated as continuous when there are 5 or more choices
  ▪ With that assumption we can calculate means, factor analyze, etc.

• Category format
  ▪ Like Likert, but with MANY more categories (e.g., a 10-point scale)
  ▪ Best if used with anchors
  ▪ Research supports scales ranging from 7 points up to 21 points

• Visual analogue scale: a line anchored at its ends by labels such as "No Headache" and "Worst Headache"; the respondent marks a point along the line
• Also used in research
  ▪ dials, knobs
  ▪ time sampling

• Checklists and Q-sorts: both are used in qualitative as well as quantitative research
• Checklists
  ▪ Present a list of words (adjectives)
  ▪ Have the person choose whether to endorse each item
  ▪ Can be used to determine perceptions of concepts

• Adjective checklists
  ▪ In psychometrics, any list of adjectives that can be marked as applicable or not applicable
    ▪ to oneself
    ▪ to one's ideal self
    ▪ to another person, OR
    ▪ to some other entity or concept.

• Checklists
  ▪ When written with initial capitals as the Adjective Check List (ACL), the term denotes more specifically a measure consisting of a list of 300 adjectives, from "absent-minded" to "zany"
  ▪ Selected by the US psychologist Harrison G. Gough (born 1921) and later introduced as a commercial test
  ▪ The test yields 24 scores, including measures of personal adjustment, self-confidence, self-control, lability, counselling readiness, some response styles, and 15 personality needs such as achievement, dominance, and endurance.

• Q-sorts
  ▪ Introduced by William Stephenson in 1935
    ▪ PhD in physics (1926); PhD in psychology (1929)
    ▪ Student of Charles Spearman
  ▪ Goal: to get a quantitative description of a person's perceptions of a concept
  ▪ Process: give the subject a stack of numbered "cards" and have them sort the cards into piles
  ▪ The piles represent graded degrees of description (from most descriptive to least descriptive).

• Q-sorts
  ▪ A means of self-evaluation of the client's current status
  ▪ A Q-sort consists of a number of cards, often 40 or 50 and sometimes as many as 100, each containing a single trait, belief, or behavior.
  ▪ The task is to sort these cards into one of five columns ranging from 'very much like me' to 'not at all like me.'
  ▪ There is typically a specific number of cards allowed for each column, forcing the client to balance the cards across the columns.
  ▪ Examples: California Q-sort, Attachment Q-sort


[Figure: Attachment Q-sort distribution (number of items designated per pile)]

• Item analysis: methods used to evaluate test items. What are good items?
• Techniques
  ▪ Item difficulty (or easiness)
  ▪ Discriminability
    ▪ Extreme group method
    ▪ Item/total correlation
  ▪ Item characteristic curves
  ▪ Item response theory
  ▪ Criterion-referenced testing

• Item difficulty: the proportion of people who get a particular item correct, or who endorse the item when there is no "correct" response (e.g., the MMPI)
• Often thought of as the item's easiness, because it is based on the number correct/endorsed

• The difficulty can be given as a proportion, or it can be standardized into a Z-value
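The formulas on this slide are not reproduced in the transcript. A standard way of writing them, assuming the common convention that the z-value is the point on the normal curve above which the passing proportion of examinees falls, is

    p_i = \frac{\text{number answering item } i \text{ correctly}}{\text{number of examinees}}, \qquad z_i = \Phi^{-1}(1 - p_i)

so an easy item (large p) gets a negative z and a hard item a positive z; note that some texts reverse the sign convention.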

• For example, a test item with a difficulty of .84
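A minimal numeric check of that example, assuming the same sign convention (Python's standard-library statistics.NormalDist supplies the inverse normal):

    # Convert a proportion-correct difficulty of .84 to a z-value.
    from statistics import NormalDist

    p = 0.84                          # proportion answering the item correctly
    z = NormalDist().inv_cdf(1 - p)   # point on the standard normal above which 84% fall
    print(round(z, 2))                # -> -0.99, roughly 1 SD below the mean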

A sample test item: If you are taking a criterion-referenced test in a social psychology course and you need to score a 92 in order to get an A, the criterion is
  a) Social Psychology *
  b) Scoring a 92
  c) Getting an A
  d) Not enough information


A second sample item: The correlation between X and Y is .54. X has an SD of 1.2 and Y has an SD of 5.4. What is the regression coefficient (b) when Y is predicted from X?
  a) .12
  b) 2.43*
  c) .375
  d) .45
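For reference, the keyed answer follows directly from the standard bivariate regression slope formula:

    b = r\,\frac{s_Y}{s_X} = .54 \times \frac{5.4}{1.2} = .54 \times 4.5 = 2.43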


A third sample item: For the following set of data [ ], the mean is
  a) 4
  b) 5*
  c) 4.5
  d) 6


• Optimum item difficulty: mathematically, halfway between chance and 100%.
• Steps (assuming a 5-choice test, so chance = .20)
  1. Find the distance between 100% and chance, then halve it: 1.00 - .20 = .80, and .80 / 2 = .40
  2. Add this value to chance alone: .20 + .40 = .60
• Alternately: (1.00 + chance) / 2 = optimum difficulty
• A good test will have difficulty values between .30 and .70
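The same computation written generally, for an item with k equally likely choices (chance = 1/k):

    p_{\text{opt}} = \frac{1 + 1/k}{2}, \qquad k = 5:\; p_{\text{opt}} = \frac{1 + .20}{2} = .60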

• Item discriminability can be defined in two ways:
  1. How well does each item distinguish (discriminate) between individuals who score high and low on the test as a whole (i.e., on the trait of interest)?
  2. Or simply, how well is each item related to the trait (e.g., loadings in a factor analysis)?
• 1 and 2 are really the same: the more an item is related to the trait, the better it can distinguish high- from low-scoring individuals.

• Extreme group method (a sketch of the computation follows below)
  ▪ First
    ▪ Identify two "extreme" groups, e.g., the top third vs. the bottom third of total scores
  ▪ Second
    ▪ Compute the "difficulty" (proportion correct) for the top group
    ▪ Compute the "difficulty" for the bottom group
    ▪ Take the difference between the top-group difficulty and the bottom-group difficulty
    ▪ The result is the discriminability index
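A minimal sketch of the extreme-group computation in Python (the response matrix is made up for illustration; numpy is assumed):

    # Extreme-group discriminability index: difficulty in the top third minus
    # difficulty in the bottom third of total scorers (hypothetical 0/1 data).
    import numpy as np

    responses = np.array([            # 9 examinees x 3 items
        [1, 1, 1], [1, 1, 0], [1, 0, 1],
        [1, 1, 0], [0, 1, 0], [1, 0, 0],
        [0, 0, 1], [0, 1, 0], [0, 0, 0],
    ])

    total = responses.sum(axis=1)                 # total score per examinee
    order = np.argsort(total)                     # examinees from low to high scorers
    n_extreme = len(total) // 3                   # top third vs. bottom third
    bottom, top = order[:n_extreme], order[-n_extreme:]

    p_top = responses[top].mean(axis=0)           # "difficulty" within the top group
    p_bottom = responses[bottom].mean(axis=0)     # "difficulty" within the bottom group
    print(p_top - p_bottom)                       # discriminability index, one value per item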


• Item/total correlation
  ▪ Let the total test score "stand in" for the trait of interest; a roughly estimated "factor" of sorts
  ▪ Correlate each item with the total test score; items with higher item/total correlations are more discriminating
  ▪ These correlations are like rough factor loadings
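A sketch of the same idea in code (hypothetical 0/1 data again). Correlating each item with the total of the remaining items, the "corrected" item-total correlation, avoids inflating the correlation by counting the item in its own total; that correction is a common refinement rather than something stated on the slide:

    # Corrected item-total correlations for a small hypothetical 0/1 response matrix.
    import numpy as np

    responses = np.array([
        [1, 1, 1], [1, 1, 0], [1, 0, 1],
        [1, 1, 0], [0, 1, 0], [1, 0, 0],
        [0, 0, 1], [0, 1, 0], [0, 0, 0],
    ])

    for item in range(responses.shape[1]):
        rest_total = responses.sum(axis=1) - responses[:, item]   # total score without this item
        r = np.corrcoef(responses[:, item], rest_total)[0, 1]     # rough "factor loading" for the item
        print(f"item {item}: corrected item-total r = {r:.2f}")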

• Point-biserial method
  ▪ Used when you have dichotomously scored items (e.g., the MMPI) or items with a correct answer
  ▪ Correlate each item's score (correct/incorrect) with the total test score.
  ▪ One dichotomous variable (correct/incorrect) correlated with one continuous variable (total score) is a point-biserial correlation
  ▪ Measures discriminability

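The point-biserial formula on the original slide is not reproduced in the transcript; a standard way of writing it for a 0/1 item and total score Y is

    r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y}\,\sqrt{p\,q}

where \bar{Y}_1 and \bar{Y}_0 are the mean total scores of those who pass and fail the item, s_Y is the standard deviation of the total scores, and p and q are the proportions passing and failing.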

• The discrimination index can also be standardized into a Z-value.


• Using difficulty and discrimination together
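The graphic on this slide is not reproduced. As one illustration of how the two indices are combined in practice, the sketch below keeps items whose difficulty falls in the .30 to .70 range mentioned earlier and whose discrimination index exceeds a conventional cutoff of .30 (the cutoff and the item values are assumptions for the example, not from the lecture):

    # Screening items on difficulty and discrimination together (illustrative values).
    items = {                     # item name: (difficulty, discrimination index)
        "item1": (0.55, 0.45),
        "item2": (0.92, 0.10),    # too easy and poorly discriminating
        "item3": (0.40, 0.05),    # acceptable difficulty but poor discrimination
        "item4": (0.65, 0.35),
    }

    keep = [name for name, (p, d) in items.items()
            if 0.30 <= p <= 0.70 and d >= 0.30]
    print(keep)                   # -> ['item1', 'item4']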

• Item characteristic curve: a graph of the proportion of people getting an item correct, plotted against total scores on the test.
• Ideally, lower total scores should go along with lower proportions of people getting a particular item correct.
• Ideally, higher total scores should go along with higher proportions of people getting a particular item correct.
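The ICC slides that followed in the original deck are graphs only. As a sketch of how such an empirical curve can be tabulated (simulated data; grouping examinees by total score is one common convention):

    # Empirical item characteristic curve: proportion passing one item at each
    # total-score level (simulated 0/1 data for illustration only).
    import numpy as np

    rng = np.random.default_rng(0)
    ability = rng.normal(size=200)                     # 200 simulated examinees
    difficulty = np.linspace(-1.0, 1.0, 10)            # 10 items, easy to hard
    p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty)))
    items = (rng.random((200, 10)) < p_correct).astype(int)

    total = items.sum(axis=1)                          # total test score
    target = items[:, 0]                               # the item whose curve we want
    for score in np.unique(total):
        group = total == score
        print(f"total = {score:2d}: proportion correct = {target[group].mean():.2f}")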


• Item response theory
  ▪ Views item response curves at different levels of difficulty
  ▪ Looks at the standard error at different ranges of the trait you are trying to measure
  ▪ More on this in the next topic
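As a pointer to that next topic (these are standard IRT results, not formulas shown on this slide): under the two-parameter logistic model, the item response curve, the item information, and the resulting standard error at trait level \theta are

    P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}, \qquad
    I_i(\theta) = a_i^2\,P_i(\theta)\,[1 - P_i(\theta)], \qquad
    SE(\hat{\theta}) = \frac{1}{\sqrt{\sum_i I_i(\theta)}}

so the test measures most precisely (smallest standard error) in the trait range where its items provide the most information.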

• Criterion-referenced tests
  ▪ Instead of comparing a score on a test or scale to other respondents' scores, we compare each individual to what they "should have scored."
  ▪ Requires a set objective, so that it can be assessed whether the objective has been met
  ▪ E.g., in intro stats, students should learn how to run an independent-samples t-test; a criterion-referenced test could be used to assess this, and mastery must be demonstrated before moving on to another objective.

• Criterion-referenced tests
  ▪ To evaluate CRT items (see the sketch below):
    ▪ Give the test to two groups, one that has been exposed to the material and one that has not
    ▪ Plot the test scores as a frequency polygon
    ▪ The antimode (the least frequent value between the two peaks) represents the cut score separating those who were exposed to the material from those who were not
    ▪ Scores above the cut score are assumed to reflect mastery of the material, and vice versa
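A minimal sketch of locating that cut score from a combined, bimodal score distribution (the scores are simulated stand-ins for the exposed and unexposed groups):

    # Find the antimode of a bimodal score distribution to use as a CRT cut score
    # (simulated data for illustration only).
    import numpy as np

    rng = np.random.default_rng(1)
    unexposed = rng.normal(8, 2, 100).round().astype(int)   # group never taught the material
    exposed = rng.normal(17, 2, 100).round().astype(int)    # group taught the material
    scores = np.concatenate([unexposed, exposed])

    values = np.arange(scores.min(), scores.max() + 1)
    counts = np.array([(scores == v).sum() for v in values])  # the frequency polygon, as counts

    mid = len(values) // 2
    low_peak = values[np.argmax(counts[:mid])]                # mode of the lower cluster
    high_peak = values[mid + np.argmax(counts[mid:])]         # mode of the upper cluster

    between = (values > low_peak) & (values < high_peak)
    cut_score = values[between][np.argmin(counts[between])]   # the antimode = cut score
    print("cut score:", cut_score)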


• Criterion-referenced tests
  ▪ Often used with mastery-style learning
    ▪ Once students show they have "mastered" the material, they move on to the next "module" of material
    ▪ If they do not pass the cut score for mastery, they receive more instruction until they can master the material