2 Reliability
We all have friends; some are reliable and some are not.
With your partner, discuss what a reliable friend is. List three qualities you would use.
3 Reliability
In layman's terms, reliability means being able to depend on the results of a test being accurate.
If you took the test again, would you get the same score?
Many factors affect reliability.
4 Error in measurement
There are two types of error in measurement:
Systematic (bias)
Random
5 Bias
Generally, bias refers to raising a person's score because they were advantaged in some way.
The group that was not advantaged is affected negatively by the bias.
For example, if boys score better on multiple-choice questions than girls, the boys were advantaged and the girls were disadvantaged.
6 Random error
Random error is very different.
It is hard to predict whom it affects and by what magnitude.
Reliable tests try to eliminate most types of error.
7 Reliability Coefficient
We can measure how reliable a test is with the reliability coefficient.
A test free from error has a perfect coefficient of 1.0.
A test filled with error has a coefficient of 0.
Since every test has some error, a reliability of about .85 or above is generally considered good.
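In practice the reliability coefficient is usually a correlation between two sets of scores from the same students (for example, two administrations of the same test). A minimal sketch in Python, using made-up scores for five students (the numbers are illustrative, not real data):

```python
from math import sqrt

def reliability_coefficient(scores_1, scores_2):
    """Pearson correlation between two sets of scores for the same students."""
    n = len(scores_1)
    m1 = sum(scores_1) / n
    m2 = sum(scores_2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(scores_1, scores_2))
    var1 = sum((a - m1) ** 2 for a in scores_1)
    var2 = sum((b - m2) ** 2 for b in scores_2)
    return cov / sqrt(var1 * var2)

# Hypothetical scores from a first and second administration.
first = [88, 72, 95, 60, 81]
second = [90, 70, 93, 64, 80]
r = reliability_coefficient(first, second)
print(round(r, 2))  # 0.98 -- students kept roughly the same rank order
```

A coefficient this high means the two administrations order the students almost identically, which is the "would you get the same score again?" idea from the earlier slide made numeric.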
8 Types of reliability
Item reliability
Stability
Inter-rater reliability (interobserver agreement)
9 Item reliability
Item reliability affects the prediction of a student's understanding in several ways.
Imagine a study trying to predict how the population of a country or state will vote in the next election.
The prediction is only as good as the sample it selects; if it selects from only one area, it will not be representative of the population.
The same concept applies to developing a test.
Test developers cannot possibly include every item they would need; the more representative the items are of the total knowledge, the more reliable the test.
10 Item reliability
Your goal is for a student's performance on the sample items to be the same as if he or she took all of the items (if that were possible).
The goal of the test is to generalize from the student's performance to what they know of the entire realm of knowledge in that area.
When we overestimate their ability, our test is unreliable.
11 Item reliability
There are two main approaches to determining item reliability:
Alternate form reliability
Internal consistency
12 Item reliability
Alternate form reliability: two forms of a test are developed, each from the same knowledge base but each with different questions.
You then administer the test to a large sample; half take one form, half the other.
They should have similar scores.
Scores from the two forms are correlated, yielding the reliability coefficient.
13 Item reliability
Internal consistency: there are many ways to test internal consistency.
One popular way is to develop a test that can be split into two halves of similar difficulty.
Administer the test and see how the students did.
If the test is split into a first half and a second half, score each student on both halves and compare the two half-scores.
You can also do this for specific items.
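One common wrinkle with the split-half approach: correlating two half-length tests understates the reliability of the full test, so the half-test correlation is usually stepped up with the Spearman-Brown formula, r_full = 2r / (1 + r). A sketch with made-up half-test scores for five students:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical scores on the two halves for five students.
first_half = [10, 7, 9, 5, 8]
second_half = [9, 6, 10, 4, 8]
r_half = pearson(first_half, second_half)
r_full = spearman_brown(r_half)
print(round(r_half, 2), round(r_full, 2))  # 0.94 0.97
```

Note that the corrected estimate is higher than the raw half-to-half correlation, which is the point of the correction.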
14 Stability
In many cases, we expect our tests to produce information that, when the person is tested later, will yield the same results.
A child tested for colorblindness should still test as colorblind later in life, since the condition is not curable; if not, the test was unreliable because it is unstable.
15 Stability
A test should produce similar results over time.
If you give a set of students a test, wait a while, and then readminister it, it should produce similar results.
The more similar the results, the more stable, and therefore the more reliable, the test.
16 Stability
Stability is not affected by interventions.
If you test a child and the test shows he is weak in a certain area, and you then provide an intervention and the child does better on the next test, that is not considered a weakness in stability.
17 Inter-rater reliability
Inter-observer/inter-rater reliability: the concept is simple and easy to understand.
It is analogous to a piece of music, a book, or a movie: two people see, read, or watch the same thing and have different opinions.
Watch the next clip. What do you think?
21 Inter-rater reliability
Inter-rater reliability needs to be developed in several places and can be measured in several ways.
Different raters/observers need to be trained on what to watch for, and need clear criteria for what counts as a positive instance of the behavior being observed.
If you are looking for out-of-seat behavior, does standing, squirming, leaning over, or being two feet from the desk count?
22 Inter-rater reliability
Inter-rater reliability can be measured in several ways: by comparing two people's total scores from the same observation, or by doing an item-by-item analysis and comparing the differences.
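Two common ways to put a number on this are percent agreement (the item-by-item comparison above) and Cohen's kappa, which corrects agreement for what two raters would match on by chance. A sketch with toy data: two hypothetical observers coding the same ten intervals for out-of-seat behavior (1 = out of seat, 0 = in seat):

```python
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

def percent_agreement(a, b):
    """Fraction of intervals on which the two observers agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    p_a1 = sum(a) / len(a)   # how often rater A codes a 1
    p_b1 = sum(b) / len(b)   # how often rater B codes a 1
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)  # expected chance agreement
    return (p_o - p_e) / (1 - p_e)

print(percent_agreement(rater_a, rater_b))      # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.62
```

Kappa comes out lower than raw agreement because some of that 80% agreement would have happened by chance; this is why training and clear criteria matter so much.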
25 Standard Error of Measurement
Imagine you gave a kindergarten student a test of his letter-sound recognition.
You developed 100 different ten-item tests.
After giving the child about ten of these tests, the scores would be about the same.
On some tests he would know the sounds, on some he would not, but the average would be accurate.
The SEM tries to estimate what that error between tests would be if you gave him only one test; remember, it could be a test he scored well on, or one he scored poorly on.
It is a concept similar to standard deviation, but related specifically to error.
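The SEM ties together the two ideas above: it is computed from a test's standard deviation and its reliability coefficient, SEM = SD × √(1 − r). A minimal sketch with illustrative numbers (an SD of 15 and a reliability of .89 are assumptions for the example, not values from the slides):

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

# Illustrative: a test with standard deviation 15 and reliability .89.
sem = standard_error_of_measurement(15, 0.89)
print(round(sem, 1))  # 5.0
```

Notice the two limiting cases: a perfectly reliable test (r = 1.0) has an SEM of 0, and a test that is all error (r = 0) has an SEM equal to its full standard deviation.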
26 Estimate of True Scores
This is more of a conceptual idea than a statistical unit.
Imagine you take a fifty-question test and you do not know the answers to ten questions.
You guess on them and, being a very lucky person, you get 8 right. Those eight answers are not really part of your true score.
If you are unlucky, you get a lower score.
27 Confidence Intervals
Given that true scores are difficult to obtain, the concept of confidence intervals was created.
When combined with the SEM, it conveys how accurate a score is.
The confidence level tells us how certain we are that the score falls within the range.
28 Confidence Intervals
If a child has a score of 90 ± 5 (SEM), we are saying the child's score is somewhere between 85 and 95.
If we say that a child has a score of 90 ± 5 (SEM) with a 95% confidence level, we are saying there is only a 5% chance that the child's score falls outside the range of 85 to 95.
The lower the confidence level, the smaller the range: at an 80% confidence level, the child's score might be reported as somewhere between 88 and 92.
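In standard practice the band is built by multiplying the SEM by a z-value that matches the confidence level (about 1.96 for 95%, about 1.28 for 80%); the slide's numbers are illustrative rather than computed this way. A sketch of the usual computation:

```python
def confidence_interval(score, sem, z):
    """Band of z standard errors of measurement around an observed score."""
    return (round(score - z * sem, 1), round(score + z * sem, 1))

# Observed score 90 with an SEM of 5 (the slide's example values).
ci_95 = confidence_interval(90, 5, 1.96)  # ~95% confidence
ci_80 = confidence_interval(90, 5, 1.28)  # ~80% confidence, narrower band
print(ci_95)  # (80.2, 99.8)
print(ci_80)  # (83.6, 96.4)
```

The direction of the slide's point holds either way: accepting less confidence buys a narrower range around the observed score.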
29 Validity
Validity refers to the degree to which evidence and theory support the interpretation of test scores for the proposed uses of the tests.
Often tests are interpreted for uses they were not designed for.
Therefore, validity is a fundamental consideration.
30 Validity
The fundamental question you need to ask is: does the testing process lead to correct inferences about a specific person?
31 Validity
Examples of invalid uses:
You give an IQ test in English to a non-English-speaking person.
You give a test that measures cultural items a person was not exposed to.
You use a test designed for national standards that does not align with local standards (e.g., social studies).
32 Validity
Content validity: is the content of the measure representative of the domain of content it is supposed to assess?
Experts look at the content and compare it to what they feel it should contain.
33 Validity
Appropriateness of included items: should the questions be here?
Do they represent what the test is trying to measure (different from content validity)?
Are the questions from too high a grade level, like middle school material on an elementary test?
Is the presentation of the items appropriate? Are the questions worded properly?
34 Validity
Content not included: is there important content missing that should be there?
How are the items measured?
Are they multiple choice, or open-ended items where you must show your work?
35 Validity
Criterion-referenced validity: refers to a test's ability to describe a test taker's ability in two ways.
Present: concurrent criterion-referenced validity.
Future: predictive criterion-referenced validity.
36 Validity
Concurrent criterion-referenced validity: is the test/assessment a good indicator of what students currently know, based on the criterion of the knowledge base?
If a child takes an achievement test, is it a valid measure of how well he did in fourth grade?
37 Validity
Predictive criterion-referenced validity: does the test predict what it says it will predict?
A reading readiness test: if a student scores high, does he learn to read easily?
If a child scores poorly, does he struggle to learn to read?
38 Validity
Construct validity refers to the extent to which a procedure or test measures a theoretical trait or characteristic.
In other words, construct validity refers to whether a scale measures, or correlates with, the theorized psychological construct (such as intelligence) that it purports to measure.