1 Standardized Tests

2 Problems with Individually Administered Tests
- Time required to administer test
- Expense
- Need for trained examiners
- Unsuited for administration to large numbers of people

3 Group Intelligence Tests
- Robert M. Yerkes
- Army Alpha & Army Beta tests for WWI recruits (1917)
- These tests initiated mass testing
- Within a few years of the war’s end, mass testing moved to the schools
- 8,000 students took the SAT when it was first administered in 1926
- Nearly 3 million take it annually now

4 Items from Army Beta Test

5 Group Tests of Intelligence: The Cognitive Abilities Test (COGAT)
- Latest revision is form 6 (2001)
- Includes a kindergarten level, 2 levels for grades 1 & 2, and 8 levels (A to H) for grades 3 to 12
- Each level is printed in a separate booklet

6 Levels A to H
- Contain the same nine subtests, grouped into three batteries:
  - Verbal
  - Quantitative
  - Nonverbal
- Each subtest preceded by practice exercises with detailed explanations
- Provides three separate scores: a verbal, quantitative & nonverbal score
- Scores have mean of 100, standard deviation of 16
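A scale with a mean of 100 and a standard deviation of 16 can be illustrated with a simple linear transformation of a z-score. The sketch below is only an illustration of that arithmetic: the norm-group mean and standard deviation are hypothetical numbers, and the published test derives its scores from age-based norm tables rather than from this formula.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=16.0):
    """Convert a raw score to a standard score on a mean-100, SD-16 scale.

    norm_mean and norm_sd describe the norm group and are hypothetical here.
    """
    z = (raw - norm_mean) / norm_sd      # distance from the norm mean in SD units
    return scale_mean + scale_sd * z     # rescale to the reporting scale


# Hypothetical norm group: mean raw score 50, SD 10.
print(standard_score(60, norm_mean=50, norm_sd=10))  # one SD above the mean
print(standard_score(50, norm_mean=50, norm_sd=10))  # exactly at the mean
```

A raw score one standard deviation above the norm mean lands at 116 on this scale, and a raw score at the norm mean lands at 100.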

8 Reliability & Validity
- Reliabilities in the .90s for each of the scores
- Good validity: correlates with other tests & school grades
- Correlates with scores in social studies, math, first grade reading, musical ability, even social status

9 Nonverbal Group Tests: Raven’s Standard Progressive Matrices
- Developed in UK by J.C. Raven (1938)
- Can be administered to individuals or groups aged 5 to elderly adult
- Consists of 60 matrices of increasing difficulty, each containing a logical pattern or design with a missing part

11 Reliability & Validity
- Internal consistency studies using either the split-half method corrected for length or KR20 estimates result in values ranging from .60 to .98, with a median of .90
- Test-retest correlations range from a low of .46 for an eleven-year interval to a high of .97 for a two-day interval. The median test-retest value is approximately .82.
- Test-retest coefficients for several age groups: .88 (13 yrs. plus), .93 (under 30 yrs.), .88 (30-39 yrs.), .87 (40-49 yrs.), .83 (50 yrs. and over)
- Concurrent validity coefficients between the SPM and the Stanford-Binet and Wechsler scales range between .54 and .88, with the majority in the .70s and .80s
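The two internal-consistency estimates named above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited studies: "corrected for length" is taken to mean the Spearman-Brown correction, KR-20 is computed with the population variance of total scores (conventions differ on this point), and the response matrix is invented for the example.

```python
from statistics import pvariance


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 or 1.

    responses: list of rows, one per respondent, one 0/1 entry per item.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)           # population variance of total scores
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Invented data: 4 respondents, 3 items of increasing difficulty.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(spearman_brown(0.5), 3))
print(round(kr20(example), 3))
```

The Spearman-Brown step is needed because a split-half correlation describes a test half as long as the real one, and shorter tests are less reliable.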

12 Benefits of Using SPM
- Can be used without any verbal instructions with young children, culturally deprived, language-handicapped, brain-injured individuals
- Minimizes the effects of language & culture
- Differences between African Americans & Caucasians are less (7 or 8 points) with RPM than with SB or Wechsler scales

13 Goodenough-Harris Drawing Test
- Individual instructed to draw a picture of a whole man & do the best job possible
- Respondents given credit for each item included in drawings
- Each detail given 1 point (to a total of 70)
- Raw scores converted to standard scores with a mean of 100, s.d. of 15, using age norms

15 Reliability & Validity
- Reliabilities (split-half, test-retest, inter-scorer) range from high .60s to low .90s
- Scores level off at ages 14 or 15, so can only be used with younger children
- Reasonable validity; correlation with standard IQ tests in one study was .81

16 Tests of Aptitude & Achievement
- Used in making decisions about admission to universities at the undergraduate level, graduate level, and to business & professional schools
- Referred to as “high stakes” tests because of the impact they have on people’s lives

17 The Scholastic Assessment Test
- Until 1995, known as the Scholastic Aptitude Test
- Has been in use since 1926
- Most widely used of university entrance tests
- Given to nearly 3 million students each year
- Newest form was introduced in March 2005, for entry into university in fall of 2006
- There is a Reasoning Test (general aptitude test) and Subject Tests in various subjects

19 Reasoning Test (formerly SAT-I)
- “The SAT Reasoning Test is a measure of the critical thinking skills you'll need for academic success in college. The SAT assesses how well you analyze and solve problems, skills you learned in school that you'll need in college.”
- Three sections:
  - Critical reading
  - Mathematics
  - Writing

20
- Each section of the SAT is scored on a scale of 200-800, and the writing section generates two subscores.
- Administered seven times a year in the U.S., Puerto Rico, and U.S. Territories, and six times a year in other countries.

21 Critical Reading Section
- Reading comprehension, sentence completions, and paragraph-length critical reading
- E.g., Hoping to _______ the dispute, negotiators proposed a compromise that they felt would be _______ to both labor and management.
  (A) enforce .. useful
  (B) end .. divisive
  (C) overcome .. unattractive
  (D) extend .. satisfactory
  (E) resolve .. acceptable

22 Mathematics Section
- Content: Number and operations; algebra and functions; geometry; statistics, probability, and data analysis
- Item types: Five-choice multiple-choice questions and student-produced responses

23 Writing Section
- Multiple choice questions (35 min.) and student-written essay (25 min.)
- E.g., The following sentences test your ability to recognize grammar and usage errors. Each sentence contains either a single error or no error at all. No sentence contains more than one error. The error, if there is one, is underlined and lettered. If the sentence contains an error, select the one underlined part that must be changed to make the sentence correct. If the sentence is correct, select choice E. In choosing answers, follow the requirements of standard written English.
- Example: The other delegates (A) and him (B) immediately (C) accepted the resolution drafted (D) by the neutral states. No error (E)

24 Subject Tests (formerly SAT-II)
- Subject Tests are designed to measure students' knowledge and skills in particular subject areas, as well as their ability to apply that knowledge.
- Students take the Subject Tests to demonstrate to universities their mastery of specific subjects like English, history, mathematics, science, and language.

25 Reliability & Validity
- Studies of old SAT show high internal consistency (> .90), test-retest reliability (> .85 over 10 months)
- Predictive validity of test, using university grades as the criterion, is quite high

26 May 4, 2005, On Education: “SAT Essay Test Rewards Length and Ignores Errors,” by Michael Winerip
http://www.nytimes.com/2005/05/04/education/04education.html?ei=5090&en=94808505ef7bed5a&ex=1272859200&partner=rssuserland&emc=rss&pagewanted=print&position=

27 Graduate Record Exam (GRE)
- One of the most commonly used tests for graduate-school entrance
- Used in combination with undergraduate grades, letters of recommendation in selecting students for graduate school
- General Test produces three scores:
  - Verbal (GRE-V)
  - Quantitative (GRE-Q)
  - Analytic (GRE-A)
- Subject Tests in biology, chemistry, literature, psychology, etc.
- All scores have a mean of 500, standard deviation of 100

28 GRE Structure
GRE (General)
- GRE-V
  - Antonyms
  - Analogies
  - Sentence Completions
  - Reading Comprehension
- GRE-Q
  - Arithmetic
  - Algebra
  - Geometry
  - Data analysis
- GRE-A
  - Present your perspective
  - Analyze an argument

29 Sample Questions
- See http://www.gre.org/

30 Reliability & Validity
- Stability (test-retest) & split-half reliability are good
- Predictive validity “far from convincing” (Kaplan & Saccuzzo, 2005, p. 330)
- Correlations between GRE and grade point average are low (.22 to .33 in one study, accounting for 5 to 10% of variance)
- High false negative rates
- When combined with undergraduate grades, correlated .63 with graduate grade point average
- See http://www.fairtest.org/facts/gre.htm
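The step from a correlation to "percent of variance accounted for" is simply squaring the coefficient. The short sketch below applies that to the three correlations quoted on this slide. The function name is made up for illustration. Note that .33 squared comes to about 11%, slightly above the rounded "10%" figure.

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by a predictor with correlation r."""
    return r ** 2


# Correlations quoted on the slide: GRE alone (.22 and .33), GRE plus undergraduate grades (.63).
for r in (0.22, 0.33, 0.63):
    print(f"r = {r:.2f} -> {variance_explained(r):.1%} of variance in grade point average")
```

Even the combined predictor (r = .63) leaves well over half of the variance in graduate grades unexplained.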

31 High Stakes Tests in the Schools
- Several states in the US, Great Britain, New Zealand have implemented national testing programs
- Bill Clinton’s proposal in 1997 to implement nationwide testing aroused considerable debate
- In 1999 National Academy of Sciences published report entitled “High Stakes: Testing for Tracking, Promotion & Graduation”
- Generally supported testing, but expressed concern that test results are commonly misinterpreted & misunderstanding of test results can damage individuals

32 Testing in Canada
- A number of provinces, including Alberta & Ontario, administer standardized ability tests to all students in their jurisdictions
- In Ontario, these tests are coordinated by the Education Quality & Accountability Office (EQAO)
- Budget for EQAO: approximately $50 million annually

33 The Ontario Secondary School Literacy Test (OSSLT)
- Given every fall to assess the reading and writing abilities of Grade 10 students
- Students must pass the OSSLT in order to obtain an Ontario Secondary School diploma
- Students who don’t pass can retake the test an unlimited number of times
- Their school transcript will only list whether or not they passed the OSSLT, not how many times they attempted the test.

34 OSSLT (continued)
- Reading: Students are given examples of different types of reading selections. They are then tested on their comprehension of what they have read.
- Writing: Students are required to write four different types of work
  - A summary
  - An opinion piece
  - An information paragraph
  - A news report

35 EQAO changes to standardized testing make them less disruptive but do not address the fundamental validity of the tests

September 23, 2004 (Toronto) - “The changes to standardized testing in Ontario’s schools announced by the Education Quality and Accountability Office (EQAO) today do not address the fundamental question posed by educators and parents as to whether the testing is in fact valid,” said Rhonda Kimberley-Young, president of the Ontario Secondary School Teachers’ Federation.

“These changes will mean that these intrusive tests will not disrupt the learning of students to the same degree as they have until now, but simply making the tests shorter and changing how the results are reported does not mean that the testing is in any way a valid measure of student achievement.

“Teachers and educational workers believe the Ontario government should now take the next logical step and immediately conduct a validity study of the standardized testing taking place in Ontario schools.

“The EQAO and the testing it is conducting is a multi million dollar expense. At a time when financial resources for schools and students are stretched, OSSTF believes these education dollars would be far better spent on meeting the educational needs of students,” concluded Kimberley-Young.

36 OSSTF Position on Grade 10
- The EQAO Grade 10 literacy test is not a fair measure.
- The test is not administered consistently across the province. It is impossible to standardize preparation and administration conditions in a standardized test.
- According to Alfie Kohn, who crusades against standardized tests in the United States, socioeconomic status accounts for "an overwhelming proportion of the variance in test scores".
- Time is taken away from the regular curriculum in preparing for the test. Student anxiety affects learning in other areas.

37 OSSTF Criticism (cont’d)
- The EQAO Grade 10 literacy test is not a valid measure of student reading and writing.
- The test is very heavily weighted to writing.
- Students need over 60% in BOTH reading and writing to pass.
- No marked tests will be returned. Students who fail receive limited, vague feedback.
- There are very few funds or opportunities to provide help to students who perform poorly or fail.
- Instructions for questions are unclear. On a question which asked for one paragraph, students who wrote more than one paragraph failed the question because they did not follow the instructions exactly.
- EQAO is secretive and will reveal neither the marking criteria nor what constitutes a pass.

38 OSSTF Criticisms (cont’d)
- Cost of administering the tests
- The cost of last year’s literacy test was $15 million at the same time as there were textbook shortages, and cuts to library, music, guidance, educational assistants and support staff.

40 Canadian Teachers Federation
High stakes testing:
- Encourages “teaching to the test”
- Creates a situation in which students struggling with the material or who have special needs are seen as a liability because their low score influences averages
- Squeezes “non-tested” subjects out of the curriculum
- Is frequently biased against certain groups of students
- Perpetuates the idea that a good education equals high test scores
- Transfers control over curriculum to the body that controls the exam

41 Not long ago, a widely respected middle-school teacher in Wisconsin, famous for helping students design their own innovative learning projects, stood up at a community meeting and announced that he "used to be" a good teacher. The auditorium fell silent at his use of the past tense. These days, he explained, he just handed out textbooks and quizzed his students on what they had memorized. The reason was very simple. He and his colleagues were increasingly being held accountable for raising test scores. The kind of wide-ranging and enthusiastic exploration of ideas that once characterized his classroom could no longer survive when the emphasis was on preparing students to take a standardized examination.

42 Benefits of Standardized Tests
- Allow for identification of children with problems, so that remediation can take place
- Allow for identification of schools that may need extra resources
- Increase accountability of schools to parents, Boards of Education, government

43 What do you think about standardized tests?

