Automatically Grading Programming Assignments with Web-CAT Stephen H. Edwards Virginia Tech Dept. of Computer Science

Slides:



Advertisements
Similar presentations
Configuration management
Advertisements

Questioning Techniques
Verification and Validation
Just what you need to know and do NOW!
Intro to CIT 594
A Brief Introduction to Test- Driven Development Shawn M. Jones.
Evaluating Requirements. Outline Brief Review Stakeholder Review Requirements Analysis Summary Activity 1.
Alternate Software Development Methodologies
Chapter 10 Schedule Your Schedule. Copyright 2004 by Pearson Education, Inc. Identifying And Scheduling Tasks The schedule from the Software Development.
EVALUATING WRITING What, Why, and How? Workshopping explanation and guidelines Rubrics: for students and instructors Students Responding to Instructor.
API Design CPSC 315 – Programming Studio Fall 2008 Follows Kernighan and Pike, The Practice of Programming and Joshua Bloch’s Library-Centric Software.
Computer Science 162 Section 1 CS162 Teaching Staff.
Software Testing. “Software and Cathedrals are much the same: First we build them, then we pray!!!” -Sam Redwine, Jr.
High Level: Generic Test Process (from chapter 6 of your text and earlier lesson) Test Planning & Preparation Test Execution Goals met? Analysis & Follow-up.
Introduction to Software Testing
Effective Questioning in the classroom
OCTOBER ED DIRECTOR PROFESSIONAL DEVELOPMENT 10/1/14 POWERFUL & PURPOSEFUL FEEDBACK.
©Ian Sommerville 2000Software Engineering, 6th edition. Chapter 19Slide 1 Verification and Validation l Assuring that a software system meets a user's.
INFO 637Lecture #31 Software Engineering Process II Launching & Strategy INFO 637 Glenn Booker.
Thinking Actively in a Social Context T A S C.
Managing Software Quality
Is PeerMark a useful tool for formative assessment of literature review? A trial in the School of Veterinary Science Duret, D & Durrani,
Testing. Definition From the dictionary- the means by which the presence, quality, or genuineness of anything is determined; a means of trial. For software.
Introduction to RUP Spring Sharif Univ. of Tech.2 Outlines What is RUP? RUP Phases –Inception –Elaboration –Construction –Transition.
1 Shawlands Academy Higher Computing Software Development Unit.
University of Palestine software engineering department Testing of Software Systems Fundamentals of testing instructor: Tasneem Darwish.
TESTING.
IT Systems Analysis & Design
Classroom Assessments Checklists, Rating Scales, and Rubrics
Software Engineering Management Lecture 1 The Software Process.
1 The Software Development Process  Systems analysis  Systems design  Implementation  Testing  Documentation  Evaluation  Maintenance.
Software Development Software Testing. Testing Definitions There are many tests going under various names. The following is a general list to get a feel.
Introduction Algorithms and Conventions The design and analysis of algorithms is the core subject matter of Computer Science. Given a problem, we want.
Dr. Tom WayCSC Testing and Test-Driven Development CSC 4700 Software Engineering Based on Sommerville slides.
Introduction to Software Testing. Types of Software Testing Unit Testing Strategies – Equivalence Class Testing – Boundary Value Testing – Output Testing.
From Quality Control to Quality Assurance…and Beyond Alan Page Microsoft.
OCTOBER ED DIRECTOR PROFESSIONAL DEVELOPMENT 10/1/14 POWERFUL & PURPOSEFUL FEEDBACK.
Peer Assessment Slides Use the following slides to provide a platform for ‘assessment for learning’ in your classroom. This PowerPoint has was produced.
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
The Software Development Process
University of Limerick1 Computer Applications CS 4815 Robocode.
Software Engineering 2004 Jyrki Nummenmaa 1 BACKGROUND There is no way to generally test programs exhaustively (that is, going through all execution.
Chapter 1: Fundamental of Testing Systems Testing & Evaluation (MNN1063)
CSCE 1030 Computer Science 1 First Day. Course Dr. Ryan Garlick Office: Research Park F201 B –Inside the Computer Science department.
Topic 4 - Database Design Unit 1 – Database Analysis and Design Advanced Higher Information Systems St Kentigern’s Academy.
Smart Home Technologies
Grades And what they mean. What Grades are NOT: Punishment or Reward Compliments or Insults Kindness or Meanness Indication of how well you are liked.
(1) Test Driven Development Philip Johnson Collaborative Software Development Laboratory Information and Computer Sciences University of Hawaii Honolulu.
Company LOGO Revised and Presented by Rob Coffman, CGMP and Patty Barron, CGMP Welcome To the 2015 Chapter Presidents’ Training Minneapolis – April 28,
Evaluating Requirements
Educational Methods The bag of tricks Direct Instruction/Lecture ä Advantages ä Teacher controlled ä Many objectives can be mastered in a short amount.
1 The Software Development Process ► Systems analysis ► Systems design ► Implementation ► Testing ► Documentation ► Evaluation ► Maintenance.
Test Driven Development Introduction Issued date: 8/29/2007 Author: Nguyen Phuc Hai.
Marking and Feedback CPD Student approach to marking.
CS223: Software Engineering Lecture 18: The XP. Recap Introduction to Agile Methodology Customer centric approach Issues of Agile methodology Where to.
Web-CAT: Automatic grading using student-written tests Web-CAT: Grade it your way Decide when and how students can submit, including.
Lecture 15 Chapter 8 Managing IT Project Delivery.
A2 Agreement Trial ICT November Agenda  GCE ICT – Entries and Transitional Arrangements  Outcomes and Issues  10.45Coffee  11.15Principal.
Investigate Plan Design Create Evaluate (Test it to objective evaluation at each stage of the design cycle) state – describe - explain the problem some.
Software Testing and Quality Assurance Practical Considerations (1) 1.
Software Development. The Software Life Cycle Encompasses all activities from initial analysis until obsolescence Analysis of problem or request Analysis.
Chapter 25 – Configuration Management 1Chapter 25 Configuration management.
Laurea Triennale in Informatica – Corso di Ingegneria del Software I – A.A. 2006/2007 Andrea Polini XVII. Verification and Validation.
1 © Agitar Software, 2007 Automated Unit Testing with AgitarOne Presented by Eamon McCormick Senior Solutions Consultant, Agitar Software Inc. Presented.
Software Development.
Accounting: Graded Unit 2 F8KF 35
Week 1 Gates Introduction to Information Technology cosc 010 Week 1 Gates
Testing and Test-Driven Development CSC 4700 Software Engineering
Music Technology What’s in the course?
Presentation transcript:

Automatically Grading Programming Assignments with Web-CAT Stephen H. Edwards Virginia Tech Dept. of Computer Science

Stephen EdwardsVirginia Tech My goals today are to … Explain how requiring students to formulate and test hypotheses about their own code can improve their understanding and performance Describe our experiences with an alternate grading approach supported by a new tool: Web-CAT Describe some of the flexibility in Web-CAT for supporting other approaches Convince you software testing can be an important—and practical—addition to classroom practices Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students hold onto ineffective techniques Too often, intro students believe that if their code: compiles, the errors are mostly gone runs correctly when I try it once, it is correct runs on the instructor-provided sample input, it is correct has a problem, it can be fixed by trial and error

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT What is reflection-in-action? For an expert, when the current technique is failing … Step back and reflect: “I must be missing something” Re-examine the situation, your solution, and your implicit assumptions about the problem Leads to guesses (hypotheses) about why the solution isn’t working or why something else will be better “[Carry] out an experiment which serves to generate both a new understanding of the phenomenon and a change in the situation”

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Practicing software testing will help students frame and carry out experiments The problem: too much focus on synthesis and analysis too early in teaching CS Need to be able to read and comprehend source code Envision how a change in the code will result in a change in the behavior Need explicit, continually reinforced practice in hypothesizing about program behavior and then experimentally verifying their hypotheses

Stephen EdwardsVirginia Tech Student comments suggest their current testing practices are often weak I run them through some simple tests to ensure that it is operating as expected. But for the most part I have always relied on supplied test data I don’t think about test cases until I am confident my program is 100% working. Of course, it almost never is … I usually write the whole thing up and then start doing rapid-fire tests of everything I can think of. Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT A comprehensive strategy is necessary for a culture shift in what students do Students cannot test their own code Want a culture shift in student behavior A single upper-division course would have little impact on practices in other classes So: Systematically incorporate testing practices across many courses CS1 CS2 OO Design OO Design Data Struct Data Struct Testing Practices Testing Practices

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Expect students to apply their testing skills all the time in programming assignments Expect students to test their own work Empower students by engaging them in the process of assessing their own programs Require students to demonstrate the correctness of their own work through testing Do this consistently across many courses

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT What tools and techniques should I teach? We want to start with skills that are directly applicable to authentic student-oriented tasks Don’t want to add bureaucratic busywork to assignments Without tool support, this is a lost cause! It is imperative to give students skills they value … But most textbooks only give a “conceptual” intro to idealized industrial practices, not techniques students can use in their own assignments

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Test-driven development is very accessible for students Also called “test-first coding” Focuses on thorough unit testing at the level of individual methods/functions “Write a little test, write a little code” Tests come first, and describe what is expected, then followed by code, which must be revised until all tests pass Encourages lots of small (even tiny) iterations See for on-line references

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students can apply TDD in assignments and get immediate, useful benefits Conceptually, easy for students to understand and relate to Increases confidence in code Increases understanding of requirements Preempts “big bang” integration

Stephen EdwardsVirginia Tech The problem is devising an effective assessment strategy Need to assess student performance at testing Need to give productive feedback Need to provide rapid turnaround Cannot afford huge increase in resources required Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Conventional automated assessment does not encourage good testing habits Student uploads program Program is compiled Executed against test data Scored based on output Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech The conventional approach provides useful benefits that do lead to a cultural change Fast, precise feedback to students Chance(s) to improve based on feedback Good assessment of behavior Systematic use resulted in culture change Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech But the conventional approach may discourage desired behavior and skills Focus is on output correctness, first and foremost “Get it working first, work on commenting, structure, etc. later” Students not encouraged or rewarded for testing on their own Students often do less testing Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Proper grading and feedback can provide positive incentive for desirable behavior Decide what behavior to foster Choose a corresponding scoring/reward system Design feedback approach Use students’ adaptive nature to drive cultural change Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Proper grading and feedback is critical to reinforcing desired behavior Assess test validity: correctness of student’s tests Assess test completeness: the “thoroughness” of student’s tests Assess program correctness: behavior of student’s solution Multiply scores as percentages

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students improve their code quality when using Web-CAT Newly written “untested” code Commerical-quality code

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students start earlier and finish earlier when they use Web-CAT

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT An evaluation of submitted code indicates students program more effectively Bold  p =.05 significance WithoutWith TDD Recorded grades90.2%96.1% TA assessment98.1%98.2% Automated grader assessment76.8%94.0% Faults on master test suite36.7%24.9% Projected Defects/KSLOC 7038 (45% less!) How early was first submission?2.2 days4.2 days

Stephen EdwardsVirginia Tech After using TDD and Web-CAT, students clearly perceive practical benefits AgreeDisagree More helpful at detecting errors than Curator4.3 Provides excellent support for TDD4.1 Increases my confidence in correctness3.9 Increases my confidence when making changes3.8 Makes me test my solution more thoroughly3.8 Makes me more systematic in devising tests3.8 Would like to use, even if not required3.8 Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Student reactions are very positive toward TDD I am very excited about using TDD. I agree that TDD can be beneficial and I’m glad we are being required to experiment with it in this course. If it increases the effectiveness of my programming and decreases the time I spend debugging, then I am all for it. [Previously,] I had to quit my detailed testing and stick to making the program appear to work with the sample data given every time a deadline drew near. With [TDD], the tests are such an integral part of the project that no time- conserving measure will save me. Automatically Grading Programming Assignments with Web-CAT

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT We use Web-CAT to automatically process student submissions and check their work Web application written in 100% pure Java Deployed as a servlet Built on Apple’s WebObjects Uses a large-grained plug-in architecture internally, providing for easily extensible data model, UI, and processing features

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Web-CAT’s strengths are targeted at broader use Security: mini-plug-ins for different authentication schemes, global user permissions, and per-course role- based permissions Portability: 100% pure Java servlet for Web-CAT engine Extensibility: Completely language-neutral, process- agnostic approach to grading, via site-wide or instructor- specific grading plug-ins Manual grading: HTML “web printouts” of student submissions can be directly marked up by course staff to provide feedback

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Grading plug-ins are the key to process flexibility and extensibility in Web-CAT Processing for an assignment consists of a “tool chain” or pipeline of one or more grading plug-ins The instructor has complete control over which plug-ins appear in the pipeline, in what order, and with what parameters A simple and flexible, yet powerful way for plug-ins to communicate with Web-CAT, with each other We have a number of existing plug-ins for Java, C++, Scheme, Prolog, Pascal, Standard ML, … Instructors can write and upload their own plug-ins Plug-ins can be written in any language executable on the server (we usually use Perl)

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT The most well-known plug-in is for grading Java assignments that include student tests ANT-based build of arbitrary Java projects PMD and Checkstyle static analysis ANT-based execution of student-written JUnit tests Carefully designed Java security policy Clover test coverage instrumentation ANT-based execution of optional instructor reference tests Unified HTML web printout Highly configurable (PMD rules, Checkstyle rules, supplemental jar files, supplemental data files, java security policy, point deductions, and lots more)

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Web-CAT supports a variety of languages, and its Java plug-in is aimed at software testing ANT-based build of arbitrary Java projects PMD and Checkstyle static analysis ANT-based execution of student-written JUnit tests Carefully designed Java security policy Clover test coverage instrumentation ANT-based execution of optional instructor reference tests Unified HTML web printout Highly configurable (PMD rules, Checkstyle rules, supplemental jar files, supplemental data files, java security policy, point deductions, and lots more)

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Web-CAT provides timely, constructive feedback on how to improve performance Indicates where code can be improved Indicates which parts were not tested well enough Provides as many “revise/ resubmit” cycles as possible

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT The most important step in writing testable assignments is … Learning to write tests yourself Writing an instructor’s solution with tests that thoroughly cover all the expected behavior Practice what you are teaching/preaching

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students get frustrated without feedback, so reference tests must provide some If students only get a score, but no other feedback for how to improve, they get easily frustrated We augment our reference tests to provide “hints” for failed tests, cross-referenced to the program assignment Requirements in assignment spec mul : this command takes two arguments from the evaluation stack and multiplies them 11. Feedback to student on failed test Your testing does not fully cover (11) More detailed alternate feedback (11) mul command failed, expected 4 but received 8

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Students will try to get Web-CAT to do their work for them Students appreciate the feedback, but will avoid thinking at (nearly) all costs Too much feedback encourages students to use Web- CAT for testing instead of writing their own tests— they use it as a development tool instead of simply to check their work This limits the learning benefits, which come in large part from students writing their own tests Lesson: balance providing suggestive feedback without “giving away” the answers: lead the student to think about the problem

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT We have also tried to influence student work habits to improve their success Encourage early submission by providing extra incentives or using late penalties Score bonuses and/or penalties are easy Another useful approach: Generous limit on the total number of submissions (60) Hints disappear one day before the due date Project closes for one day to encourage students to step away and reflect on “the last bug” Project opens again for one day with hints re- enabled, but with a cap on how much the score can improve

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Lessons for writing program assignments intended for automatic grading Requires greater clarity and specificity Requires you to explicitly decide what you wish to test, and what you wish to leave open to student interpretation Requires you to unambiguously specify the behaviors you intend to test Requires preparing a reference solution before the project is due, more upfront work for professors or TAs Grading is much easier as many things are taken care by Web-CAT; course staff can focus on assessing design

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Areas to look out for in writing “testable” assignments How do you write tests for the following: Main programs Code that reads/write to/from stdin/stdout or files Code with graphical output Code with a graphical user interface

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Testing main programs The key: think in object-oriented terms There should be a principal class that does all the work, and a really short main program The problem is then simply how to test the principal class (i.e., test all of its methods) Make sure you specify your assignments so that such principal classes provide enough accessors to inspect or extract what you need to test

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Testing input and output behavior The key: specify assignments so that input and output use streams given as parameters, and are not hard-coded to specific sources destinations Then use string-based streams to write test cases; show students how In Java, we use BufferedReaders and PrintWriters for all I/O In C++, we use istreams and ostreams for all I/O

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Testing programs with graphical output The key: if graphics are only for output, you can ignore them in testing Ensure there are enough methods to extract the key data in test cases We use this approach for testing Karel the Robot programs, which use graphic animation so students can observe behavior

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Testing programs with graphical UIs This is a harder problem—maybe too distracting for many students, depending on their level The key question: what is the goal in writing the tests? Is it the GUI you want to test, some internal behavior, or both? Three basic approaches: Specify a well-defined boundary between the GUI and the core, and only test the core code Switch in an alternative implementation of the UI classes during testing Test by simulating GUI events

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Conclusion: including software testing helps promote learning and performance If you require students to write their own tests … Our experience indicates students are more likely to complete assignments on time, produce one third less bugs, and achieve higher grades on assignments It is definitely more work for the instructor But it definitely improves the quality of programming assignment writeups and student submissions

Stephen EdwardsVirginia Tech Automatically Grading Programming Assignments with Web-CAT Visit our SourceForge project! Info about using our automated grader, getting trial accounts, etc. Movies of making submissions, setting up assignments, and more Custom Eclipse plug-ins for C++- style TDD Links to our own Eclipse feature site