2 Outcomes Assessment
a.k.a. How Do I Know If I'm Doing What I Think I'm Doing?
1st: Identify what you are trying to do. This may include general outcomes and specific outcomes. For example:
- Increase the number of women entering the fields of math and engineering (general)
- Improve high school girls' attitudes about math and engineering (specific)
2nd: Identify ways to accurately assess whether these outcomes are occurring.
3rd: Establish a procedure for program evaluation.
3 Identify What You Are Trying To Do
Some examples:
- Change attitudes about math and engineering
- Increase girls' sense of self-efficacy in math and engineering
- Improve motivation to engage in math and engineering
- Increase skills in math and engineering
- Increase the number of girls who go on to major in math and engineering from your high school
- Increase the number of women who graduate from college with math and engineering majors
Some of these are assessments of attitude, some are assessments of skills, and some are assessments of behavior.
Because long-term outcome assessment is often difficult, we'd like to be able to assess attitudes that should theoretically predict those long-term changes in behavior. It is especially good if we have some empirical knowledge about such a relationship: for example, we know that a sense of self-efficacy in reading is related to the development of future reading skills. We don't know how much empirical evidence we have for math/engineering, so long-term follow-up would still be really useful.
For now, I'm going to talk in more detail about how to accurately assess attitudes and motivation, which is typically (and most easily) done with questionnaires.
4 Critical Issues for Assessment Tools
Reliability
- Consistency of test scores
- The extent to which performance is not affected by measurement error
Validity
- The extent to which a test actually measures what it is supposed to measure
(Reliability: use scale example.)
Sometimes these are scales that have already been shown to be reliable and valid; it is great to use these when you can. Sometimes you must make up your own scale, and then it will be important to evaluate whether it is reliable and valid. Also, a scale that is reliable and valid for one purpose may not be for another purpose, so it is good to always check in your own data.
5 Types of Reliability
Test-Retest
- Correlation of two tests taken on separate occasions by the same individual
- Limits: Practice effects, recall of former responses
Alternate Form
- Correlation of scores obtained on two parallel forms
- Limits: May have practice effects; alternate forms often not available
The first two you probably won't use, but you should know about them.
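As a concrete illustration, test-retest reliability is just the Pearson correlation between the two administrations. The sketch below assumes you have each girl's total scale score from both occasions; the scores themselves are hypothetical.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical total scale scores for five girls, two administrations
time1 = [12, 15, 11, 18, 14]
time2 = [13, 16, 10, 19, 15]

retest_r = pearson(time1, time2)  # values near 1.0 indicate stable scores
```

The same function works for alternate-form reliability: correlate scores from the two parallel forms instead of the two occasions.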
6 Types of Reliability
Split-half
- Correlation between two halves of a test
- Limits: Shortens the test, which affects reliability; difficult with tests that measure different things in the same test (heterogeneous tests)
Kuder-Richardson and Coefficient Alpha
- Inter-item consistency: Average correlation of each item with every other item
- Limits: Not useful for heterogeneous tests
These are better, because they require only one administration. If you plan to publish the results of your program evaluation, you should be sure to check your measure using one of these techniques.
- K-R for yes-no responses
- Alpha for continuous scale responses
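Coefficient alpha can be computed directly from the item-by-respondent score matrix. The function below is a minimal sketch of the standard formula; the three questions and five sets of responses are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of scores per question,
    each list ordered by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Hypothetical 5-point ratings: three questions, five respondents
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 1, 5, 4],
    [4, 4, 2, 5, 3],
]
alpha = cronbach_alpha(items)
```

Applied to yes-no items scored 0/1, this same formula reduces to Kuder-Richardson's KR-20, so one function covers both cases on the slide.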
7 Types of Validity
Content Validity
- Definition: Checking to make sure that you've picked questions that cover the areas you want to cover, thoroughly and well.
- Difficulties: "Adequate sampling of the item universe." It is important to ensure that all major aspects are covered by the test items, and in the correct proportions.
- Specific Procedures: Content validity is built into the test from the outset through the choice of appropriate items.
8 Types of Validity
Concurrent and Predictive Validity
- Definition: The relationship between a test and some criterion; the practical validity of a test for a specific purpose.
- Examples:
  - Do high school girls who score high on this test go on to succeed in college as engineering majors? (Predictive)
  - Do successful women engineering majors score high on this test? (Concurrent)
- Difficulties: Criterion contamination; trainers must not know examinees' test scores
- Specific Procedures: Infinite, based on the purpose of the test
9 Types of Validity
Construct Validity
- Definition: The extent to which the test may be said to measure a theoretical construct or trait. Any data throwing light on the nature of the trait and the conditions affecting its development and manifestations represent appropriate evidence for this validation.
- Example: I have designed a program to lower girls' math phobia. The girls who complete my program should have lower scores on the Math Phobia Measure compared to their scores before the program, and compared to the scores of girls who have not completed the program.
10 Optimizing Reliability & Validity
Here are some tips for making sure your test will be reliable and valid for your purpose (circumstances that affect reliability and validity):
- The more questions the better (the number of test items)
- Ask questions several times in slightly different ways (homogeneity)
- Get as many people as you can in your program (N)
- Get different kinds of people in your program (sample heterogeneity)
- (Linear relationship between the test and the criterion)
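The first tip ("the more questions the better") can be quantified with the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened. This sketch assumes the added items behave like the existing ones; the .70 starting reliability is just an illustrative figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by
    `length_factor`, assuming the new items are comparable."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))

# Doubling a scale whose current reliability is .70:
doubled = spearman_brown(0.70, 2)  # rises to roughly .82
```

The formula also runs in reverse (length_factor < 1), which is why the split-half method on the previous slide underestimates reliability: each half is a shortened test.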
11 Selecting and Creating Measures
1. Define the construct(s) that you want to measure clearly.
2. Identify existing measures, particularly those with established reliability and validity.
3. Determine whether those measures will work for your purpose, and identify any areas where you may need to create a new measure or add new questions.
4. Create additional questions/measures.
5. Identify criteria that your measure should correlate with or predict, and develop procedures for assessing those criteria.
12 Measuring Outcomes
Pre and Post Tests
- Involves giving the measure before the intervention/training and then again after it, in order to measure change as a result of the intervention.
- It is important to identify what you are trying to change with your intervention (the constructs) in order to use measures that will pick up that change.
- Be sure to avoid criterion contamination.
- Limitations: If your group is preselected for the program, the variability will be restricted.
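One common way to summarize pre/post change is a paired t statistic on each girl's difference score. This is a minimal sketch with hypothetical scores; a real analysis would also look up the p-value for n - 1 degrees of freedom (or use a statistics package).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """t statistic for the mean pre-to-post change (paired samples)."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical attitude scores for five girls, before and after the program
pre = [10, 12, 9, 14, 11]
post = [13, 14, 12, 15, 13]

t = paired_t(pre, post)  # large t: change unlikely to be measurement noise
```

Pairing matters here: because each girl is compared with herself, stable individual differences drop out of the difference scores, which is what makes the pre/post design sensitive.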
13 Measuring Outcome
Follow-up Procedures
These may involve re-administering your pre/post measure again after some interval following the end of the program, or assessing any other criterion that should theoretically be predicted by your intervention, such as:
- choosing to take math/engineering courses
- choosing to major in math/engineering
- choosing a career in math/engineering
14 Measuring Outcome
Control Groups
One critical problem faced by anyone who conducts an intervention is whether any observed changes are related to the intervention or to some other factor (e.g., time, preselection, etc.).
The only way to be sure that your intervention is causing the desired changes is to use a control group. The control group must be the same as the treatment group in every way (usually ensured by random assignment to groups), except that the control group does not receive the intervention. Any differences between these groups can then be attributed to the intervention.
How do you know whether the girls who chose to attend your program would not have gone on to major in math/engineering anyway?
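With a control group, the program's effect can be estimated as the difference between the two groups' mean pre-to-post gains: change that appears in both groups (time, retesting) cancels out. The scores below are hypothetical.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average pre-to-post change for one group."""
    return mean(b - a for a, b in zip(pre, post))

# Hypothetical scores: three girls per group, before and after the program
treatment_gain = mean_gain([10, 12, 9], [14, 15, 13])
control_gain = mean_gain([11, 10, 12], [12, 10, 13])

effect = treatment_gain - control_gain  # gain attributable to the program
```

With random assignment this difference-in-gains is an unbiased estimate of the intervention's effect; with the self-selected groups described on the next slide, it is only suggestive.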
15 Measuring Outcome
Alternatives to randomly assigned control groups:
- Matched controls
- Comparison groups
- Comparison across programs
Remember, you'll need to use the same assessment and follow-up procedures for both groups.
16 Comparing Across Programs
In order to compare successfully across programs, you will also need to assess:
- Program characteristics
- Participant characteristics
So you will need to also ask yourselves:
- What are the important aspects of the programs that I should know about?
- What are the important characteristics of the girls that I should know about?
Probably the most likely procedure most of you will use is comparison across programs. And this is a big part of why we all came together for this workshop. So I want to talk a bit about how to do this.
17 An Ideal Outcome Assessment
1. All participants fill out initial questionnaires.
2. Participants are randomly assigned to conditions.
3. The treatment group receives the intervention; the control group receives no intervention.
4. All participants fill out post-questionnaires.
5. All participants are followed through college and to their first job.
18 A More Realistic Outcome Assessment?
1. Girls involved in each program fill out pre-tests and client characteristics.
2. Girls participate in programs.
3. Girls fill out post-questionnaires.
4. Each program reports data and program characteristics.
5. Programs conducting follow-ups report follow-up data.