Evaluation Metrics II February 12, 2010. Today’s Class Evaluation Metrics I Last Week’s Probing Question Evaluation Metrics II Next Friday’s Probing Question.

Presentation transcript:

Evaluation Metrics II February 12, 2010

Today’s Class Evaluation Metrics I Last Week’s Probing Question Evaluation Metrics II Next Friday’s Probing Question Assignments

Preparation for Future Learning Can a student learn a new skill or concept better, based on their previous experience?

Preparation for Future Learning What might be some ways to measure that the learning on the new task is “better”? – Better performance on new task after learning – Faster learning on new task (“Accelerated future learning”)

Advantages/Disadvantages of PFL

Gets at not just skill, but sophisticated conceptual understanding that can be utilized in new contexts High vulnerability to second learning task – If the task is too easy or too hard, you won’t learn anything – Requires really understanding your domain Most people aren’t good at learning fast – Requires running longer, more complex study OR – Picking relatively easy second learning tasks

Comments? Questions?

Last Week’s Probing Question Should state/national/international assessments of learning (like the MCAS) have Preparation for Future Learning items? Why or why not? First, who is in favor? Who is against?

Last Week’s Probing Question Should state/national/international assessments of learning (like the MCAS) have Preparation for Future Learning items? Why or why not? Reasons in favor? Reasons against?

“Robust Learning” The “Robust Learning” movement argues that we should test “robust learning”, which is learning that – is retained – can transfer – prepares students for future learning (VanLehn, 2005; Corbett et al, in preparation)

“Robust Learning” Other researchers believe that these are distinct ways that learning can be “robust”, and that there is no single “robust learning” construct – E.g. you can remember something forever but be unable to transfer it – E.g. you can understand something flexibly and be prepared for future learning, but only for a couple of weeks before you forget it What do you think?

An empirical question… Ongoing research into this

Today’s Class Evaluation Metrics I Last Week’s Probing Question Evaluation Metrics II Next Friday’s Probing Question Assignments

More Evaluation Metrics Motivation Attitudes Affect Behavior

Motivation & Attitudes What kind of constructs might you want to measure? And what could you make conclusions about, by measuring them?

Motivation & Attitudes Grit Self-Handicapping Self-Efficacy Goal Orientation Intrinsic Motivation Extrinsic Motivation Disliking Domain Disliking Computers Disliking Your Software Theory of Intelligence Perception of Usefulness Self-Concept Cognitive Interest Situational Interest Vocational Interest

Currently Fashionable Grit Self-Handicapping Self-Efficacy Goal Orientation Intrinsic Motivation Extrinsic Motivation Disliking Domain Disliking Computers Disliking Your Software Theory of Intelligence Perception of Usefulness Self-Concept Cognitive Interest Situational Interest Vocational Interest

Fashionable in 1980s-1990s Grit Self-Handicapping Self-Efficacy Goal Orientation Intrinsic Motivation Extrinsic Motivation Disliking Domain Disliking Computers Disliking Your Software Theory of Intelligence Perception of Usefulness Self-Concept Cognitive Interest Situational Interest Vocational Interest

Never Fashionable Grit Self-Handicapping Self-Efficacy Goal Orientation Intrinsic Motivation Extrinsic Motivation Disliking Domain Disliking Computers Disliking Your Software Theory of Intelligence Perception of Usefulness Self-Concept Cognitive Interest Situational Interest Vocational Interest

Usually measured using questionnaires

Using questionnaires Making your own items Using someone else’s items

Making your own items Definitely not trivial Really easy to design items that are biased, or uninterpretable for students The chapters you read have some suggestions about how to do this right

Mind you, nobody does this anymore

What’s wrong with the following items?

What’s wrong with these items? (real item!) “Do you think women and children should be given the first available flu shots?”

What’s wrong with these items? “Do you prefer the Democratic health plan, or do you prefer for children to die of easily treatable diseases?”

What’s wrong with these items? “Do you prefer the Democratic health plan, or do you prefer lower health care costs?”

What’s wrong with these items? (real item!) “When you think of Kai Tak airport what are the 3 big mistakes you think of?”

What’s wrong with these items? (real item!) “Do you think that the software agent is genuinely concerned about your well-being?”

What’s wrong with these items? (real item!) “Have you ever cheated on a test?”

What’s wrong with these items? “Do Science ASSISTments improve your meta-cognitive understanding of control of variables strategy?”

What’s wrong with these items? “How much do you like DrScheme?” 1 2 3 4 5

Ways to mess up items What are some other ways that you could mess up your items?

Comments? Questions?

The One-Coin-Toss Sampling Technique Let’s say that you want to ask a question with two answers, where one of the answers is socially stigmatized Example: “Have you ever cheated on a test?”

The One-Coin-Toss Sampling Technique Let’s say that you want to ask a question with two answers, where one of the answers is socially stigmatized Example: “Have you ever cheated on a test?” and others that are much more amusing, but which discussing in class might get me fired at my first-year review…

The One-Coin-Toss Sampling Technique You ask the participant to flip a coin where you can’t see it If it is heads, they give the stigmatized answer, no matter what the truth is If it is tails, they answer honestly

The One-Coin-Toss Sampling Technique I know that no one carries change anymore, so I’ve brought some, courtesy of my mom Take a coin

The One-Coin-Toss Sampling Technique Flip your coin where no one can see, and remember the result If it’s heads, say “YES” If it’s tails, tell me, have you ever cheated on a test?

Math Number of participants reporting cheating on a test (R), out of N participants: Actual rate of cheating: (R – N/2) / (N/2)
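The arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `estimate_true_rate` and the clamping to the 0–1 range are assumptions added here. It takes R as the count of “YES” answers among N participants and assumes a fair coin, so about N/2 participants were forced to say “YES” and about N/2 answered honestly.

```python
def estimate_true_rate(yes_count: int, n: int) -> float:
    """Estimate the honest YES rate from one-coin-toss responses.

    yes_count: number of participants who said YES (R on the slide)
    n: total number of participants (N on the slide)
    Assumes a fair coin: heads forces a YES, tails means an honest answer.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= yes_count <= n:
        raise ValueError("yes_count must be between 0 and n")

    forced_yes = n / 2      # expected number of heads, all of whom say YES
    honest_group = n / 2    # expected number of tails, who answer honestly
    estimate = (yes_count - forced_yes) / honest_group

    # Sampling noise can push the raw estimate outside [0, 1]; clamping is
    # an illustrative choice, not something stated on the slide.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # 65 YES answers out of 100: (65 - 50) / 50
    print(estimate_true_rate(65, 100))
```

With 65 “YES” answers from 100 participants, roughly 50 are forced, leaving about 15 honest “YES” answers among the roughly 50 honest responders.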

Statistical tests… There is added noise, so you need about double the sample to get significance
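One way to get a feel for the added noise is to compare the variance of the coin-toss estimate with the variance of simply asking everyone directly. The sketch below assumes a simple binomial model with a fair coin and fully honest answers in both conditions; that model and the name `variance_inflation` are assumptions made for illustration, not part of the lecture. Under this model the ratio works out to (1 + p) / p for a true rate p, which is never below 2 and grows as the stigmatized answer becomes rarer, so “about double” is closer to a best case. The model also ignores the reason for using the technique in the first place, namely that direct answers may not be honest.

```python
def variance_inflation(p: float) -> float:
    """Variance of the coin-toss estimate divided by that of a direct question.

    p: assumed true rate of the stigmatized answer (0 < p < 1).
    Both variances share a 1/N factor, which cancels in the ratio.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")

    q = 0.5 + 0.5 * p              # chance that any one participant says YES
    var_coin = 4 * q * (1 - q)     # the estimate is 2*(R/N) - 1, so variance scales by 4
    var_direct = p * (1 - p)       # ordinary binomial proportion
    return var_coin / var_direct


if __name__ == "__main__":
    for rate in (0.1, 0.25, 0.5, 0.9):
        print(rate, variance_inflation(rate))
```

Read the ratio as a rough multiplier on the sample size needed for the same precision under these assumptions.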

Comments? Questions?

“Lie Scale” Items Items for which no one answering carefully and honestly would give a certain answer Used to test whether subject is answering carefully and honestly

“Lie Scale” Items “I never worry what other people think of me” TRUE/FALSE “I have never told a lie in my life” TRUE/FALSE

“Lie Scale” Items These items have been very successful on tests with adults, particularly personality exams My experience administering them with middle school students is that I get significantly over 50% lying – May be due to administration out of context, an issue we’ll talk about later
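Checking lie scale items once responses are collected can be sketched as follows. The item identifiers, the answer key, and the `max_failures` threshold are all hypothetical names and values chosen for illustration; the slides do not say how many failed items should count as careless or dishonest responding.

```python
# Hypothetical answer key: item id -> the answer that a careful, honest
# respondent would not be expected to give.
LIE_SCALE_KEY = {
    "never_worry_what_others_think": True,
    "never_told_a_lie": True,
}


def failed_lie_items(answers, key=LIE_SCALE_KEY):
    """Return the lie scale items on which the respondent gave the flagged answer."""
    return sorted(item for item, flagged in key.items()
                  if answers.get(item) == flagged)


def is_suspect(answers, max_failures=0, key=LIE_SCALE_KEY):
    """Flag a respondent who fails more than max_failures lie scale items.

    max_failures=0 is an illustrative default, not a recommended cutoff.
    """
    return len(failed_lie_items(answers, key)) > max_failures


if __name__ == "__main__":
    respondent = {"never_worry_what_others_think": False, "never_told_a_lie": True}
    print(failed_lie_items(respondent), is_suspect(respondent))
```

Items missing from a respondent’s answers are treated as not failed here, which is another choice a real study would need to make deliberately.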

Comments? Questions?

If you make your own items… Step 1: pre-test them with members of the target population for understandability By having them explain to you what the item means

One volunteer please

Please explain the meaning of Overall, how would you rate the quality of your loved one’s dying? (Circle one number) Terrible ... Almost Perfect (yes, this is from a real questionnaire – Quality of Death and Dying Questionnaire for Family Members, University of Washington Medical School)

If you make your own items… Step 2: if you really want to know that the items are testing what you think they are testing It is recommended to create several items, administer them together (with other items) And see if they correlate, using Cronbach’s α A lot of work!
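For readers who have not computed it before, Cronbach’s α for k items is k/(k – 1) × (1 – sum of item variances / variance of total scores). The sketch below uses only the Python standard library; the function name `cronbach_alpha` and the toy data are illustrative assumptions, and a real analysis would normally rely on an established statistics package.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: list of rows, one per respondent; each row holds that
    respondent's numeric scores on the same k items, in the same order.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all k items")

    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data: three items answered identically by each respondent,
    # so the items are perfectly consistent with each other.
    toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
    print(cronbach_alpha(toy))
```

Values near 1 suggest the items move together; how high is high enough depends on the field and the use of the scale.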

Using someone else’s items Advantages? Disadvantages?

Advantage Someone else has done the hard work of pre-testing the items and finding out what they correlate to

Disadvantage Many times, the items do not match exactly to what you need “I think that the tutor software is fun” (But you’re not studying tutor software!)

It has been argued… That it is usually safe to change the subject of a question, or to change grammatical tense “I think that Mily’s World is fun” But it is usually not safe to make further changes, without re-testing

Disadvantage Many times, items come in huge inventories that are too time-consuming to administer as a whole – The MMPI-2 clinical psychology exam has 567 questions Taking the items out of context may change how they are read and responded to – Particularly for lie scale items Often validation focuses on validity of entire scale, not of individual items

Solutions Use items designed to be given singly – For instance, individually-assigned items tested for correlation to scales – Not common, but not unheard of either Use entire sub-scale of questionnaire Find item(s) reported to be particularly central to the scale of interest in validation paper Use single item and hope for the best – Particularly when you can’t give large numbers of items

Comments? Questions?

If you are paying attention Raise your hand in the next 5 seconds!

Behavior & Affect As discussed on Jan. 20…

Behavior & Affect Measured in learning sciences with – observational methods (Jan. 20) – text replays (Jan. 20) – EDM models (Mar. 3) – Experience sampling method aka popup questions

Experience sampling method (Csikszentmihalyi & Larson, 1987) A participant does their normal task At regular (or semi-random) intervals the individual is interrupted – Classically with a beep, although these days with computerized administration pop-up questions are just as common And asked one or more questions
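Scheduling interruptions at regular or semi-random intervals can be sketched as below. The function name `probe_times` and its parameters (`session_minutes`, `mean_gap`, `jitter`) are hypothetical, and drawing each gap uniformly around a mean is just one simple way to get semi-random spacing; the cited work does not prescribe this scheme.

```python
import random


def probe_times(session_minutes, mean_gap, jitter, seed=None):
    """Return the minutes (from session start) at which to interrupt.

    session_minutes: length of the session
    mean_gap: average minutes between probes
    jitter: each gap is drawn uniformly from mean_gap +/- jitter;
            jitter=0 gives perfectly regular intervals
    seed: optional seed so a schedule can be reproduced
    """
    if mean_gap <= 0 or jitter < 0 or jitter >= mean_gap:
        raise ValueError("need mean_gap > 0 and 0 <= jitter < mean_gap")

    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += mean_gap + rng.uniform(-jitter, jitter)
        if t >= session_minutes:
            break
        times.append(t)
    return times


if __name__ == "__main__":
    # A 45-minute class period with a probe roughly every 10 minutes
    print(probe_times(45, mean_gap=10, jitter=3, seed=1))
```

Semi-random spacing makes the interruptions harder for participants to anticipate, at the cost of uneven coverage within any one session.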

Experience sampling method Are you currently zoned-out? (Schooler et al, 2004) What are you doing right now? – Socializing, Seatwork, Listening to Teacher, … (Csikszentmihalyi & Larson, 1984) Are you bored? (Larson & Richards, 1991)

Advantages/Disadvantages? Field observations versus experience sampling method

Comments? Questions?

Today’s Class Evaluation Metrics I Last Week’s Probing Question Evaluation Metrics II Next Friday’s Probing Question Assignments

Probing Question Let’s say you wanted to do a large-scale research study on boredom Under what conditions would it be preferable to use – Questionnaire items – Experience sampling method – Quantitative field observations

Today’s Class Evaluation Metrics I Last Week’s Probing Question Evaluation Metrics II Next Friday’s Probing Question Assignments

Assignment #4 Any questions?