# Accurate Assessment in the Common Core Era

Eric Bright, 8th Grade Math, Charleston Middle School. Assessing student content mastery.

Have you ever gotten a grade you didn’t deserve? Why? What final grade should the student with the following grades get? F, D, C, F, F, B, D, F, C, F, F, B, F, A, F, A Open discussion of what the purpose actually is.

What about now? Homework: F, Homework: D, Quiz: C, Test: B, Project: A, Final Exam: A. What does a grade represent?

Purpose of Assessment Bottom line: Does the student get it?
Purpose: To determine to what degree a student has mastered content standards with a high degree of validity and reliability.
- Validity – the assessment measures what it is supposed to measure
- Reliability – the assessment produces consistent results across evaluators

Formative vs. Summative
Formative – Assessment FOR learning; assessment that informs teaching and learning strategies for the teacher and/or student
- May be formal or informal (observations, effort, participation, exit slips, etc.)
- May be more qualitative
- Includes meaningful feedback to students

Purpose: To improve student learning.

Formative assessment should occur all the time, but we cannot include it as part of a student’s grade because it doesn’t reflect mastery.

Formative vs. Summative
Formative Checklist – The assessment should… Tie directly to standards Focus on student learning needs Identify students’ current learning progress Give results that you can act on Be a regular part of instruction Quick and easy to give and grade If you don’t use the data, stop gathering it!

Formative vs. Summative
Summative – An assessment that summarizes the student’s mastery of a standard Usually formal (test, quiz, multiple choice, short response, word problems, projects, performance, portfolios, etc.) May be more quantitative Purpose: To give a picture of how well a student has mastered a standard at a specific time. Summative grades should not occur until we can reasonably expect a student to master content.

Formative vs. Summative
Summative Checklist – The assessment should… Tie directly to standards Include multiple levels of learning (Bloom’s: remember, understand, apply, analyze, evaluate, create) Summarize students’ overall learning progress Give results that all stakeholders can understand Make sure you have offered students examples of what meeting the standards looks like prior to a summative assessment.

SPECIFIC ASSESSMENT TYPES
Extra Credit Class Participation Homework Completion Homework Accuracy Problem Solving Pop Quiz Quiz Pre-Test Post-Test Performance Assessment with Rubric

Extra Credit Does the activity address a standard?
Yes – Should be either formative or summative. This is regular credit. No – Should not be summative since it is not the grade level standard. This is formative at best. Does the activity go deeper into or below the grade level standard? Deeper – Should not be summative since it is not the grade level standard. It is enrichment and formative. Below – Should not be summative since it is not the grade level standard. It is remediation and formative. Extra credit is grade inflation, often from teachers who don’t want to deal with the consequences of having many F’s. That is an issue for another talk, but the key question there is why do so many students have F’s?

Class Participation Definition: A grade based solely on the frequency of participation in class. What does a participation grade measure? Content mastery? No, those with high content mastery may not participate and those without content mastery may participate frequently. This is not summative. Willingness to participate? Yes, which is critical to success in the real world, but does not reflect content mastery. Therefore, participation grades should be formative. Innovate: Keep track of participation with tally marks on a seating chart. Log it weekly in the grade book, but count it as worth 0%.

Homework Completion Definition: A grade based solely on the amount of work completed and not the accuracy of that work. What does a completion grade measure? Content mastery? No, due to the lack of accuracy assessment. Therefore it is not summative. Effort? Yes, which is critical to a student’s academic success, but is still not summative. This is formative. Innovate: Keep completion grades in the grade book, but count them as worth 0%.

The “No Homework” Policy
Examine the research on homework.
- HW has little to no effect on elementary students and begins having positive effects at the middle school level
- Positive correlation between HW frequency and student achievement
- Positive correlation between HW completion and student achievement
- Positive correlation between HW that promotes self-regulation and student achievement
- Negative correlation between the relative amount of time spent on math HW versus other subjects and student achievement
- Negative correlation between drill/practice HW and student achievement

Conclusion: HW is important, but we need to rethink how we use it.

Homework Accuracy What is the purpose of homework?
To practice skills? Then it is formative, not summative. Consequence of not doing: Nothing. Natural consequences show up on summative assessment. Innovate: Write the homework completion rate on the top of each major summative assessment so students see the relationship between their practice and achievement. Innovate: Give a homework quiz at the end of every class period (or every other day) with two to four questions from the homework. Use this quiz as a formative grade.

Homework Accuracy What is the purpose of homework?
To get teacher feedback? Then it is formative, not summative. Consequence of not doing: Redo the homework so feedback can be given. Innovate: Don’t write a grade on this or else the students just toss it. Give feedback qualitatively instead of quantitatively.

Homework Accuracy What is the purpose of homework?
To learn content through discovery? Then it is formative, not summative. Consequence of not doing: Redo the homework. Innovate: Give fewer homework problems, but have what is assigned take more thought with higher levels of Bloom’s taxonomy. Be less helpful, which forces the students to think for themselves.

Homework Accuracy What is the purpose of homework?
To learn time management and organization? Then it is formative, not summative. Consequence of not doing: Create a homework completion plan with parents to train students in self-regulation skills. Innovate: Focus more on the time spent on task rather than the amount of homework completed. Ask parents to track or report that data to show growth.

Homework Accuracy Is it possible for homework to be summative?
Yes, but for it to be summative, students must have had the chance to master the skills. They need time to correct/revise homework before it is graded. Unfortunately, the homework loses its value as a formative assessment with this method. Show homework check paper.

Problem Solving Activity
Definition: An extended response situation requiring multiple steps to solve, use of multiple skills, and justification of reasoning and process. What is the purpose of the problem solving activity? To be exposed to new applications of content? This is formative. To demonstrate mastery of content through application? This is summative.

Pop Quiz Summative assessment with no advance notice.
Are there “pop” football games? We’re not out to trap the students. Give students fair notice so that they can not only learn the material, but also develop and apply good study skills. (This assumes we teach self-regulation skills.) Innovate: Instead of pop quizzes, recent research is showing the benefits of practice quizzes. These formative assessments function like a pop quiz, but are not for a grade. They merely provide feedback on the learning process prior to the summative assessment. Give a quick four question quiz at the beginning of class as a warm-up activity. “Grade” it and go over it together instead of going over homework.

Quiz Definition: A shorter assessment designed to assess mastery of a small set of skills. What is the purpose of the quiz? Establish reliability of mastery through multiple data entries? This is summative. Provide feedback to students about particular deficit skills before a culminating summative assessment? This is formative. Could it be a mixture of both?

Pre-Tests Definition: A test given before a unit of study to ascertain content already mastered. Why give a pre-test? All reasons are formative.
- Differentiation – Students who have already mastered content can move deeper into that content.
- Identify student leaders – Students with content mastery can be used to promote mathematical discourse.
- Show growth – Establishes a baseline of where students are to compare with the post-test at the end of the unit.

Post-Test Definition: A test designed to show mastery over a whole unit of study. How does each type of test show mastery? Multiple choice Short response (written or symbolic) Extended response PBA or Project

Performance Assessment with Rubric
Projects or Extended Response Items

The rubric must address mastery of standards to be summative.

Sample bad rubric:

| 3D Shape Children’s Story Book | Weight | 5 pts | 4 pts | 3 pts | 2 pts |
|---|---|---|---|---|---|
| PRESENTATION: Posture, eye contact, grammar, pacing, clearness of speech | 1 X | Excellent | Good | Fair | Poor |
| REQUIREMENTS: Has all shapes with theme and story that flows. | 2 X | | | | |
| NEATNESS AND CONSTRUCTION: Book well built and illustrations neat and colored. | 3 X | | | | |
| CREATIVITY: Well thought out and original theme | 3 X | | | | |
| PROJECT: Overall looks great with well thought out theme. | 1 X | | | | |

The rubric does not match the standards and in fact has very little to do with math at all. Also note that the minimum grade a student could get is 20/50.
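The 20/50 floor follows directly from the rubric's multipliers. A minimal Python sketch, assuming the weights 1, 2, 3, 3, 1 and the 2-to-5 point scale read off the sample rubric above (the names `WEIGHTS` and `score_bounds` are illustrative, not from the presentation):

```python
# Weights (the "N X" multipliers) as read off the sample bad rubric.
WEIGHTS = {
    "presentation": 1,
    "requirements": 2,
    "neatness_and_construction": 3,
    "creativity": 3,
    "project": 1,
}
LOWEST_LEVEL, HIGHEST_LEVEL = 2, 5  # the rubric has no column below 2 pts


def score_bounds(weights, lowest, highest):
    """Return (minimum, maximum) total points the rubric can award."""
    total_weight = sum(weights.values())
    return total_weight * lowest, total_weight * highest


minimum, maximum = score_bounds(WEIGHTS, LOWEST_LEVEL, HIGHEST_LEVEL)
print(f"{minimum}/{maximum} = {minimum / maximum:.0%}")  # 20/50 = 40%
```

A student who turns in anything at all starts at 40%, before any math has been assessed.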

Performance Assessment with Rubric
A better rubric part 1:

| 3D Shape Children’s Story Book | 2 pts | 1 pt | 0 pts |
|---|---|---|---|
| DEFINITION OF CYLINDER: Student accurately defines in his own words | Mastery level understanding | Good understanding but copied some of the definition | Does not show understanding |
| DEFINITION OF CONE: Student accurately defines in her own words | | | |
| DEFINITION OF SPHERE: Student accurately defines in his own words | | | |
| VOLUME OF CYLINDER: Student accurately gives the formula | Yes | | No |
| VOLUME OF CONE: Student accurately gives the formula | | | |
| VOLUME OF SPHERE: Student accurately gives the formula | | | |

Performance Assessment with Rubric
A better rubric part 2:

| 3D Shape Children’s Story Book | Weight | 2 pts | 1 pt | 0 pts |
|---|---|---|---|---|
| FINDING VOLUME OF CYLINDER: Student accurately finds volume and justifies solution with work | X 2 | Mastery level understanding | Good understanding but computation errors | Does not show understanding |
| Finding the volume makes sense in the context of story/problem | | Yes | Partially | No |
| FINDING VOLUME OF CONE: Student accurately finds volume and justifies solution with work | X 2 | | | |
| FINDING VOLUME OF SPHERE: Student accurately finds volume and justifies solution with work | X 2 | | | |
| MATHEMATICAL PRECISION: Student maintains precision by using π ≈ 3.14 and rounding final solutions to two decimal places | X 2 | 6 or 7 problems solved with precision | 4 or 5 problems solved with precision | < 4 problems solved with precision |

Performance Assessment with Rubric
A better rubric part 3:

| 3D Shape Children’s Story Book | 2 pts | 1 pt | 0 pts |
|---|---|---|---|
| FINDING RADIUS OF CYLINDER OR CONE: Student accurately finds radius and justifies solution with work | Mastery level understanding | Good understanding but computation errors | Does not show understanding |
| Finding the radius makes sense in the context of the story | Yes | Partially | No |
| FINDING HEIGHT OF CYLINDER OR CONE: Student accurately finds height and justifies solution with work | | | |
| Finding the height makes sense in the context of the story | | | |
| FINDING RADIUS OF SPHERE: Student accurately finds radius and justifies solution with work | | | |
| FINDING VOLUME OF COMBINATION: Student accurately finds volume and justifies solution with work | | | |
| Finding the volume makes sense in the context of the story | | | |

FINAL GRADE: ____ /50 pts X 2 = __________%

ASSESSMENT POLICIES Late Work Policy Cheating Giving Zeroes
Group Grades Partial Credit Test Retakes Common Assessments

Late Work Innovate: Avoid late work in the first place:
- Give students advance notice of any out-of-class summative assessments.
- Have benchmark due dates for those assessments.
- Reduce the number of out-of-class summative assessments.

If an assessment is turned in late: Be flexible depending on the circumstances, but have a written policy in place such as:
- Minus 10% to grade, but only accepted up to a week late.
- Note late work as a formative assessment and track it with individual students and parents.
- Grade assessment normally, but only accept work up to a week late.

Cheating What constitutes cheating?
Formative: Explaining what to do or just giving an answer without explaining why we do it. Summative: Explaining what to do or just giving an answer. If students cheat, what should be the consequences? Formative: Perhaps nothing except notifying parents. Natural consequences will occur on summative assessments. Summative: Take a different version of the assessment.

The “No Zero” Policy Zero is an outlier and therefore we should only give 50% instead. False. If the student really knows 0% of the content, the zero is the best reflection of their content mastery. It is an outlier because the outcomes (grades) are not equally likely. 60% passing is a low standard. Giving a lower bound of 50% skews grades much higher. Think of a student with scores of 60, 10, 80, 10.
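The skew is easy to see with the four scores named above. A short Python sketch, assuming the scores are equally weighted and that the policy raises every score below 50 up to 50 (the function name `average_with_floor` is illustrative):

```python
scores = [60, 10, 80, 10]  # the example scores from the slide
FLOOR = 50                 # the "no zero" lower bound under discussion


def average_with_floor(scores, floor):
    """Average after raising every score below `floor` up to `floor`."""
    floored = [max(score, floor) for score in scores]
    return sum(floored) / len(floored)


print(sum(scores) / len(scores))          # 40.0
print(average_with_floor(scores, FLOOR))  # 60.0
```

With a 60% passing line, the floor alone moves this student from failing to passing without any change in demonstrated mastery.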

The “No Zero” Policy Any zero should be redone until the student passes. False and impossible. There is a time component to mastery. We expect mastery by a certain time. If students had the whole year to master content, they could save all assessments for the last day. Giving one chance for a retake assessment is reasonable since students learn at different rates, but beyond that either means bad teaching in the first place or that a student truly has not mastered the content.

The “No Zero” Policy The “No Zero” policy came about because teachers gave zeros for formative assessments and then counted them toward a student’s grade (summative). This is not an inappropriate use of the zero. It is an inappropriate use of formative assessment.

Zeros Innovate: Use the zero, but use it correctly!
Why do we give a zero for summative grades? Incomplete? Then we don’t know how well a student has mastered that standard, so force the student to complete the assignment. If they refuse, our best guess is that they do not understand the topic, and the zero stands. Total lack of mastery? Then the zero is the most accurate representation of student mastery.

Zeros Innovate: Use the zero, but use it correctly!
Why do we give a zero for summative grades? Cheating? This does not accurately assess what a student has learned. A better consequence is to retake a similar assessment. Late? (When does late become incomplete? One day?) This does not accurately assess what a student has learned. Why is it late? If cheating, see that consequence. If effort, note that as a formative assessment.

Group Grades Definition: Giving the same grade (or slightly modified grades) to each student in a group. What does a group grade measure? Content mastery? It can, but one student may have achieved mastery while getting a poor grade due to someone else’s lack of mastery. This is not summative for each student. Innovate: Have students discuss ideas in a group, but… Don’t let them write anything down until they are on their own. Have them throw away their group work before filling out the summative assessment. Use group work only as a formative assessment or discovery task.

Partial Credit Consider the following work on an algebra assessment:
Was the mistake an algebraic mistake? This was a computation error, not an algebra error. Innovate: If we are assessing the algebra standard, the student appears to understand inverse operations. Perhaps 3/5 points.

Test Retakes Should students be able to retake tests? What is the purpose of the retake? Purpose: Assess mastery. Yes, a retake might show new mastery of content. If a student retakes a test, should the new grade be averaged with the previous grade or should the new grade replace the previous one? Averaging acknowledges the struggle, but does not necessarily show the student’s current level of mastery. How many times can a student retake a test? It is impractical to allow multiple retakes. Since a summative assessment is tied to a time frame, giving one chance to retake an assessment reinforces that timeliness and also student responsibility.

Common Assessments Definition: Identical assessments that are given by different teachers who teach the same course.

Purpose of Common Assessments:
- Establish inter-grader reliability for assessments.
- Count as a Type II assessment for teacher evaluations.
  - Type I – MAP, PARCC, Universal Screener
  - Type II – District, grade level, or course-wide assessment adopted and approved by the school district
  - Type III – Teacher created
- Give a springboard for discussing student mastery for the purposes of lesson revision.

Possible common assessments include: weekly or mid-chapter quizzes, unit or chapter tests, quarter or semester exams.

We want to make sure that an A in one class is the same as an A in another class. This also assumes a common grading scale. CHS allows teachers to choose a grading scale. They have a common multiple choice final exam in Algebra of 50 questions, but one teacher’s grading scale sets one cutoff for an A while another teacher’s sets a different cutoff. This doesn’t work.

Total Points vs. Weighted Categories “Standards-Based” Grading

Grading on the Curve Changing student grades methodically for a better grade distribution. This does not accurately assess student mastery of content if we have a clear picture of what mastery is. If we don’t know what mastery looks like, that must be satisfied before we can assess. Rather than making your grade distribution match the normal curve, ask yourself why the grades are distributed the way they are. This is a formative exercise.

- Was it too soon to expect mastery? Eliminate the summative grade and give the assessment later. Use this as a formative assessment.
- Was the material poorly taught? Eliminate the summative grade and re-teach.
- Was the assessment too difficult? Eliminate the summative grade and give a better written assessment.
- Was the assessment accurate? Give students options for remediation, but move on in the curriculum.

Are there too many high grades?
- Was the assessment too easy? Eliminate the summative grade and give a better written assessment.
- Was the assessment accurate? Celebrate your students!

Points vs. Percents When talking about how to record grades of the same weight, this is irrelevant except for rounding differences.

For example, these are the same grade:
- 60%, 80%, 100%, 80% yields an average of 80%
- 3/5, 4/5, 5/5, 4/5 yields 16/20 = 80%

These have a slight rounding error due to “not nice” denominators:
- 65%, 71%, 59%, 71% yields 67%
- 11/17, 12/17, 10/17, 12/17 yields 66%

Moral: Choose the denominator wisely.
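The one-point gap in the second example comes from rounding each score before averaging. A small Python sketch of both routes, using the 17-point assignments from the slide (variable names are illustrative):

```python
earned = [11, 12, 10, 12]  # points earned on four assignments
POSSIBLE = 17              # points possible on each (a "not nice" denominator)

# Percent route: round each score to a whole percent, then average.
percents = [round(100 * e / POSSIBLE) for e in earned]
percent_average = sum(percents) / len(percents)

# Points route: total earned over total possible.
points_grade = 100 * sum(earned) / (POSSIBLE * len(earned))

print(percents)                # [65, 71, 59, 71]
print(percent_average)         # 66.5
print(round(points_grade, 1))  # 66.2
```

The slide's 67% assumes 66.5 is rounded half up. Python's built-in `round` rounds halves to the nearest even number, so `round(66.5)` would give 66, which is why the sketch prints the unrounded average.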

Points vs. Percents When in reference to weighted categories versus straight points, the differences are aesthetic because every grading system is weighted. Consider a typical “points” system: Homework worth 5 points each Quizzes worth 50 points each Tests worth 100 points each Is a test worth 20 times as much as homework?

Points vs. Percents If there are 2 tests and 2 quizzes per quarter, but homework every day, that gives us:
- 225 points of homework (43%)
- 100 points from quizzes (19%)
- 200 points from tests (38%)
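These implied weights can be checked with a few lines of Python. This is a sketch under one assumption: the count of 45 homework assignments is inferred from the slide's 225 homework points at 5 points each, since the slide says only "every day".

```python
# Point values from the "points" system described on the previous slide.
POINTS_EACH = {"homework": 5, "quiz": 50, "test": 100}

# Counts per quarter. The 45 homework assignments are an assumption
# inferred from 225 homework points / 5 points each.
COUNT = {"homework": 45, "quiz": 2, "test": 2}

totals = {kind: POINTS_EACH[kind] * COUNT[kind] for kind in POINTS_EACH}
grand_total = sum(totals.values())

for kind, points in totals.items():
    print(f"{kind}: {points} points ({points / grand_total:.0%})")
# homework: 225 points (43%)
# quiz: 100 points (19%)
# test: 200 points (38%)
```

Even though a single test is worth 20 times a single homework, the homework category ends up carrying the most weight.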

Points vs. Percents Think long-term. During the whole quarter say you typically have: 8 Homework summative assessments (This is purely for demonstration purposes! HW should be a formative assessment!) 4 Problem solving summative assessments 4 Quizzes 2 Unit tests 1 Project 1 Quarter Exam

Points vs. Percents Now consider this weighted system:
Homework worth 10% (This is purely for demonstration purposes! HW should be worth 0%!) Problem Solving worth 10% Quizzes worth 20% Unit Tests worth 30% Project worth 10% Quarter Exam worth 20%

Points vs. Percents It is the same as this points system:
Homework worth 100 points each Problem Solving worth 200 points each Quizzes worth 400 points each Unit Tests worth 1200 points each Project worth 800 points Quarter Exam worth 1600 points Hint: Making everything out of 100 makes it easier for the students. So instead of saying a quiz is out of 400 points, tell students their grade counts four times.
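The point values above can be derived from the category weights and the number of assessments in each category. A minimal Python sketch, where the scale of 80 points per percent is an assumption chosen so that one homework comes out to 100 points (names are illustrative):

```python
# Category weights (percent of the final grade) and assessments per quarter,
# taken from the slides above.
WEIGHT = {"homework": 10, "problem_solving": 10, "quiz": 20,
          "unit_test": 30, "project": 10, "quarter_exam": 20}
COUNT = {"homework": 8, "problem_solving": 4, "quiz": 4,
         "unit_test": 2, "project": 1, "quarter_exam": 1}

POINTS_PER_PERCENT = 80  # assumption: makes one homework worth 100 points


def points_each(weight, count, scale=POINTS_PER_PERCENT):
    """Points per assessment so each category's total matches its weight."""
    return {kind: weight[kind] * scale // count[kind] for kind in weight}


print(points_each(WEIGHT, COUNT))
# {'homework': 100, 'problem_solving': 200, 'quiz': 400,
#  'unit_test': 1200, 'project': 800, 'quarter_exam': 1600}
```

The categories then total 8,000 points, and each category's share of that total equals its percentage weight.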

Points vs. Percents The difference is how you get to the end grade:
In this case, both grades end with a final grade of 83% because we made sure the weight of the points matched the weight of the categories.

Assignments, in order: HW1, PS1, HW2, PS2, QZ1, HW3, QZ2, HW4, TST1, HW5, PS3, HW6, PS4, QZ3, HW7, QZ4, HW8, TST2, Proj, Q Ex

- Points for assignment: 100, 200, 400, 1200, 800, 1600
- Joe Bob (%): 80%, 75%, 70%, 85%, 60%, 100%, 90%, 93%, 50%, 65%, 88%, 96%
- Joe Bob (pts): 80, 150, 70, 170, 240, 340, 90, 1116, 60, 140, 50, 260, 1056, 680, 1536
- Weighted Grade: 78%, 69%, 71%, 77%, 84%, 83%, 82%, 81%
- Points Grade: 74%, 79%

Standards-based grading usually doesn’t actually mean standards-based grading. It usually means grading with a rubric or something similar with:
- 4 – Exceeds standard
- 3 – Meets standard
- 2 – Meets standard with assistance
- 1 – Does not meet standard

Example Report Card:
- 8th Grade Pre-Algebra: 3
- Number System: 4
- Expressions and Equations: 2
- Functions: 3
- Geometry: 2
- Statistics and Probability: 4

Note: You can have this same breakdown of grades with a regular percent grading system by simply making your grade categories follow the Domain names and weighting summative assessments appropriately via points.

Fact: The scores 4, 3, 2, and 1 are not equally likely.
- 4 may represent 90% – 100% accuracy
- 3 may represent 80% – 90% accuracy
- 2 may represent 60% – 80% accuracy
- 1 may represent 0% – 60% accuracy

Even using objective benchmarks, they are still not equally likely.

Problem: You can’t average the scores, but we need to.
- Geometry scores of 0%, 90%, 90%, 90% average to 67.5%
- SB scores of 1, 4, 4, 4 average to 3.25 (meets)

Which one signals to the parent there is a problem? To get an overall Geometry score we need to average to account for subcategories within Geometry (area, volume, Pythagorean Theorem, etc.).

4 should not mean “above grade level” because that is formative, not summative for the grade level standards.
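The two averages can be reproduced side by side. A Python sketch, assuming the example bands above and assuming each band includes its lower bound, so that 90% counts as a 4 (the function name `to_sb` is illustrative):

```python
def to_sb(percent):
    """Map a percent to the 1-4 scale using the example bands.

    Assumption: each band includes its lower bound (90% counts as a 4).
    """
    if percent >= 90:
        return 4
    if percent >= 80:
        return 3
    if percent >= 60:
        return 2
    return 1


percent_scores = [0, 90, 90, 90]                # Geometry scores as percents
sb_scores = [to_sb(p) for p in percent_scores]  # [1, 4, 4, 4]

print(sum(percent_scores) / len(percent_scores))  # 67.5
print(sum(sb_scores) / len(sb_scores))            # 3.25
```

The 1-to-4 scale compresses everything from 0% to 60% into a single score, so the missing assessment barely moves the average.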

Potential Solution: Power Law Average The Power Law is basically a predictor of how the student would score on the next assessment based on previous performance. So scores of 1,2,3,4 might yield a 4 while scores of 4,3,2,1 might yield a 1. Problem: Power Law only works for individual skills Most assessments cover a multitude of skills. Getting a Geometry score of 4 on the first assessment does not mean anything about the Geometry score on the next assessment if the first assessment covered 2D geometry while the second assessment covered 3D.
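The slides do not give a formula, so the following is only a sketch of one common formulation: fit a curve of the form y = a·x^b to the scores in the order they were earned, then read off the predicted value for the next attempt. The log-log least-squares fit, the clamping to the 1-to-4 scale, and the name `power_law_score` are all assumptions made for illustration.

```python
import math


def power_law_score(scores, low=1.0, high=4.0):
    """Predict the next score by fitting y = a * x**b to past scores.

    The fit is ordinary least squares on (log attempt number, log score),
    so every score must be positive. The prediction is clamped to the
    low..high range of the reporting scale.
    """
    n = len(scores)
    xs = [math.log(i) for i in range(1, n + 1)]
    ys = [math.log(s) for s in scores]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    prediction = math.exp(intercept + slope * math.log(n + 1))
    return max(low, min(high, prediction))


print(power_law_score([1, 2, 3, 4]))            # 4.0 (rising trend hits the cap)
print(round(power_law_score([4, 3, 2, 1]), 1))  # falling trend lands near 1
```

The same four scores give opposite results depending on their order, which is the point of the method and also why it only makes sense when every score measures the same skill.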

Problem: How do you deal with assessments that incorporate multiple standards or skills? You would need multiple grades for the same assessment(s). Problem: There is more information reported (usually) with SB grading, but it is still not useful. Does a 2 in Geometry mean I need help with transformations, volume, or the Pythagorean theorem?

Bottom Line: SB grading is just as flawed as a traditional grading system. Solution: Use good assessment practices in whatever grading system you use and many of the problems that the SB Grading movement is trying to tackle will be resolved.

My Grading System Weighted Categories that are assessed based on standards mastery
- Homework Completion (0%) – Daily
- Homework Accuracy (0%) – Each specific skill or set of skills
- Mastery Task (0%) – Each specific skill or set of skills
- Problem Solving (10%) – 4 to 8 per quarter
- Weekly Quiz (30%) – 4 to 6 per quarter
- Unit Pre-Test (0%) – 3 per quarter
- Unit Post-Test (40%) – 3 per quarter
- Quarter Project (10%) – 1 per quarter
- Quarter Exam (10%) – 1 per quarter
- Enrichment (0%) – As needed based on Pre-Tests
- Remediation (0%) – As needed for progress monitoring

But this is not perfect! We’re working to change it! Explain each one, summative vs. formative, valid since everything ties directly to standards, reliable since even partial credit is defined on common assessments.