
1 General look at testing: taking a step backwards

2 Issues of importance within testing
– Categories of tests: different classification methods (content vs. non-content)
– Uses and users of tests
– Assumptions and questions behind the use of tests
– Creating a test from scratch

3 Categories of tests / content
– Mental abilities: intelligence, memory, spatial ability, creativity; e.g., the Wechsler Adult Intelligence Scale (WAIS) and the Scholastic Assessment Test (SAT)
– Achievement tests: individually administered vs. group administered; batteries (series of tests, e.g., reading & mathematics) vs. single subject; certification (government or individual)
– Vocational, standardisation and diagnostic testing

4 Categories of tests / content (continued)
– Personality: objective measures, e.g., the Minnesota Multiphasic Personality Inventory (MMPI), vs. projective measures, e.g., the Rorschach Inkblot Test
– Interests & attitudes: vocational interests, attitudes, values
– Neuropsychological: e.g., the Luria-Nebraska Neuropsychological Battery (LNNB)

5 Categories of tests / non-content
– Paper-and-pencil vs. performance: the respondent selects between predefined answers, vs. the examinee performs some action and is judged on it
– Speed vs. power: the former is interested purely in speed; the latter tests the limits of knowledge or ability, with no time limit imposed. Usually both are tested at the same time.

6 Categories of tests / non-content (continued)
– Individual vs. group testing
– Maximum vs. typical performance: ability tests usually want to know about best possible performance; a personality test asks how typically extroverted you are
– Norm-referenced vs. criterion-referenced performance: how well you did relative to other test-takers, vs. how well you did relative to predefined criteria

7 Users of tests
Time professional psychologists spend in assessment:
– Psychologists working in a mental health setting spend 15-18% of their time on assessment (Corrigan et al., 1998)
– Over 80% of neuropsychologists spend 5 or more hours per week (Camara et al., 2000)
– Educational psychologists spend half of their working week (Hutton et al., 2000)
– Two-thirds of counseling psychologists use objective measures regularly (Watkins et al., 1998)

8 Other uses of tests
– Education: to measure performance or to predict future success
– Personnel: to select the appropriate person, or to select the task to which a person is most suited
– Research: the test often serves as the operational definition of the dependent variable (DV)

9 Basic assumptions
– Humans must possess recognisable traits which we consider to be important
– Individuals must potentially differ on these traits
– These traits must be quantifiable
– Traits must be stable across time
– Traits must bear a relationship to actual behaviour

10 Issues to be concerned about
– How the test was developed
– Reliability
– Validity

11 Constructing a reliable test
– This is a much more extensive process than the average user realises
– Most personality constructs have already been established, and tests to measure them are readily available; a proliferation of new tests would therefore seem pointless from a theoretical point of view

12 Writing test items
The question format was covered before; in addition:
– All aspects of the construct need to be dealt with; for anxiety, all the different facets of the construct should be considered
– The test needs to be long enough to be reliable: start with around 30 items and reduce to 20
– Each item should assess only one trait
– Items should be culturally neutral
– Items should not be the same item rephrased (as mentioned during factor analysis, FA)

13 Establishing item suitability
– There should not be too many items which are either very easy or very hard: more than 10% of items with mean scores below .2 or above .8 is questionable
– Items should have an acceptable standard deviation; if it is too low, the item is not tapping into individual differences (both checks are sketched below)
– If the test covers different constructs, it is important that an equal number of items refers to each construct
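A minimal sketch of this screening step, assuming items scored 0/1 in a NumPy matrix. The .2/.8 difficulty cutoffs and the 10% rule follow the slide; the data, variable names and the low-SD cutoff of 0.1 are illustrative assumptions.

```python
import numpy as np

# Hypothetical response matrix: rows = respondents, columns = items scored 0/1.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 30))

difficulty = responses.mean(axis=0)   # proportion of respondents passing each item
spread = responses.std(axis=0)        # item standard deviation

too_extreme = (difficulty < 0.2) | (difficulty > 0.8)  # very easy or very hard
too_flat = spread < 0.1               # arbitrary placeholder cutoff for "too low" SD

print("Items flagged as too easy/hard:", np.where(too_extreme)[0])
print("Items flagged for low SD:", np.where(too_flat)[0])
print(f"Share of extreme items: {too_extreme.mean():.0%}")  # questionable above 10%
```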

14 Establishing item suitability (continued)
– Criterion keying: choosing items based on their ability to differentiate between groups
  – Atheoretical
  – The groups must be well defined
  – Interpret liberally, since there will be overlap in the response distributions
– By factor analysis (FA): items that have a low loading (<.3) would be removed (a sketch of this filter follows)
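As an illustration of the factor-analytic route, a sketch using scikit-learn's FactorAnalysis. Only the <.3 cutoff comes from the slide; the simulated data, the choice of two factors, and the variable names are assumptions, and note that sklearn's components_ are unstandardised loadings (dedicated FA packages give rotated, standardised ones).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# Hypothetical responses with a two-construct structure: 250 respondents x 20 items.
latent = rng.normal(size=(250, 2))
weights = rng.uniform(0.4, 0.9, size=(2, 20))
X = latent @ weights + rng.normal(scale=0.8, size=(250, 20))

fa = FactorAnalysis(n_components=2).fit(X)

# fa.components_ has shape (n_factors, n_items); take each item's largest absolute loading.
max_loading = np.abs(fa.components_).max(axis=0)
weak = np.where(max_loading < 0.3)[0]
print("Items with no loading above .3 (candidates for removal):", weak)
```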

15 Establishing item suitability: classical item analysis
– The correlation of each item's score with the score on the whole test (excluding that item) is calculated
– Removing an item with a low item-total correlation improves reliability
– But since reliability is also a product of the number of items, there is a balance: a point comes where removing a poor item decreases reliability, because reliability depends on both the average inter-item correlation and the number of items in the test
– Each time an item is removed, the correlation of every remaining item with the total score must be recalculated, since these correlations change as items are removed (one pass of the procedure is sketched below)
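A sketch of one pass of this analysis, assuming a NumPy matrix of item scores. The corrected item-total correlation excludes the item from the total, as the slide describes; the data and names are illustrative.

```python
import numpy as np

def corrected_item_total(X):
    """Correlate each item with the total score of all *other* items."""
    total = X.sum(axis=1)
    r = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rest = total - X[:, j]              # whole-test score excluding item j
        r[j] = np.corrcoef(X[:, j], rest)[0, 1]
    return r

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 20))              # hypothetical item scores

r = corrected_item_total(X)
worst = int(np.argmin(r))
print(f"Weakest item: {worst} (r = {r[worst]:.2f})")

# After dropping an item, every remaining correlation must be recomputed,
# because the whole-test total has changed.
r_after = corrected_item_total(np.delete(X, worst, axis=1))
```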

16 Revisiting reliability and validity
– Each scale should assess one psychological construct
– Measurement error means that, for any one item, the psychological construct accounts for only a low percentage of the variation in respondents' answers; other factors cause most of the variation (age, religious beliefs, sociability, peer-group pressure)
– Use several items and this random variation should cancel out, so that the measured variance is due to the underlying construct (the simulation below illustrates this)
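The cancellation argument can be seen in a small simulation: each simulated item is the construct plus independent noise, and the mean of several items tracks the construct far better than any single item. All numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 1000, 10
construct = rng.normal(size=n)                  # the underlying trait
noise = rng.normal(scale=2.0, size=(n, k))      # item-specific error dominates
items = construct[:, None] + noise              # each item = construct + noise

r_single = np.corrcoef(construct, items[:, 0])[0, 1]
r_scale = np.corrcoef(construct, items.mean(axis=1))[0, 1]
print(f"single item vs. construct:  r = {r_single:.2f}")  # low: mostly error
print(f"10-item mean vs. construct: r = {r_scale:.2f}")   # much higher: errors cancel
```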

17 Reliability of a test
– Does not mean temporal stability (that is test-retest reliability, measured through parallel forms)
– Is a measure of the extent to which a scale measures one construct only: split-half reliability, Cronbach's alpha
– Is influenced by the average correlation between the items and by the number of items in the test
– Can be artificially boosted by asking the 'same' question twice
– A test should not be used if alpha is below .7 (a sketch of the alpha computation follows)
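Cronbach's alpha itself is simple to compute: with k items, item variances s_i^2 and total-score variance s_t^2, alpha = k/(k-1) * (1 - sum(s_i^2)/s_t^2). A direct translation of that formula; the response matrix and item count are assumed for illustration.

```python
import numpy as np

def cronbach_alpha(X):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
construct = rng.normal(size=(300, 1))
X = construct + rng.normal(size=(300, 12))   # 12 hypothetical items, one construct

print(f"alpha = {cronbach_alpha(X):.2f}")    # the slide's rule of thumb: usable above .7
```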

18 Test validity
– Face validity
– Content validity
– Construct validity
– Predictive validity

