Published by Samuel Salsbury. Modified over 3 years ago.

1
Improving the Automatic Evaluation of Problem Solutions in Programming Contests
Pedro Ribeiro and Pedro Guerreiro
Portugal

2
Plovdiv – IOI 2009. P. Ribeiro, P. Guerreiro: Improving the Automatic Evaluation of Problem Solutions in Programming Contests
Presentation Overview
Automatic Evaluation: Past and Present
  – The case of IOI
A possible path for improving evaluation
  – Developing only a function (not a complete program)
  – Abstract Input/Output
  – Repeat the same function call (+ clock precision)
  – No hints on expected complexity
  – Examine runtime behaviour as tests increase in size
Some preliminary results
Conclusions

3
Programming Contests
All programming contests need an efficient and fair way of distinguishing submitted solutions: (automatic) evaluation
What do we evaluate?
  – Correctness: does the program produce correct answers for all instances of the problem?
  – Efficiency: does it do so fast enough? Does it have the necessary time and memory complexity?

4
Programming Contests
Classic way of evaluating
  – Set of pre-defined tests (inputs)
  – Run the program with the tests and check the output
IOI has been doing this in almost the same way since the beginning, with two major advances:
  – Manual evaluation -> Automatic evaluation
  – Individual tests -> Grouped tests
Although IOI has 3 different types of tasks, the core of the event is still batch tasks

5
IOI Types of Tasks
IOI Year   Batch Tasks   Reactive Tasks   Output Only
2009       7             1                0
2008       6             0                0
2007       5             1                0
2006       4             0                2
2005       5             1                0
2004       5             0                1

6
Programming Contests
Correctness: almost a black art
  – "Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence" (Dijkstra)
Efficiency:
  – Typically judges create a set of model solutions of different complexities
  – Tests are designed so that the model solutions achieve the planned number of points
  – Considerable amount of tuning (environment)
  – Considerable amount of manpower needed
  – More difficult to introduce new languages

7
Ideas: Single function
Solve the problem by writing a specific function (as opposed to a complete program)
Motivation:
  – Concentrate on the core algorithm (fewer distractions)
  – Can be used at earlier stages of learning
  – Opportunities for new ways of testing (more control over submitted code)
It is already done in other types of contests:
  – TopCoder
  – Teaching environments (Ribeiro and Guerreiro, 2008)

8
Ideas: I/O Abstraction
The input and output should be abstract and not specific to a language
How to do it:
  – Input already in memory, passed as function arguments (simple form, no complex data structures)
  – Output as the function return value(s)
Motivation:
  – Fewer information-processing details
  – Less complicated problem statements
  – We can measure the time spent in the solution (not in I/O)
  – More balanced performance between languages
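The I/O abstraction above can be sketched roughly as follows. This is only an illustration, not the authors' system: the toy task (largest value in a list) and the names `solve` and `grade` are hypothetical. The point is that the submitted function never reads or prints anything, while the grader owns all I/O and simply calls it.

```python
from typing import Callable, List, Sequence, Tuple


def solve(values: List[int]) -> int:
    """Hypothetical contestant submission for a toy task.

    Input arrives already in memory as an argument and the answer is
    the return value; there is no parsing and no printing.
    """
    return max(values)


def grade(
    solution: Callable[[List[int]], int],
    cases: Sequence[Tuple[Sequence[int], int]],
) -> int:
    """Hypothetical grader: returns how many test cases the solution passes.

    Each case is (input values, expected answer). The solution gets a
    fresh list so that it cannot alter the stored test data.
    """
    passed = 0
    for values, expected in cases:
        if solution(list(values)) == expected:
            passed += 1
    return passed
```

With this shape, the same test data can be reused for any language that can expose a function with an equivalent signature.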

9
Idea: Repeat function calls
In the past we used smaller input sizes
With the increased speed of computers, we currently use huge input sizes
  – Clock resolution is poor: small instances -> instant
  – Need to distinguish small asymptotic complexities
  – Historic fact: smallest time limit used at IOI: IOI 2007, problem Training: 0.3 seconds
Future?
  – Always more speed -> bigger input size

10
Idea: Repeat function calls
Problems completely detached from reality:
  – Ex: IOI 2007 Sails, a ship with 100,000 masts

12
Idea: Repeat function calls
Real world: how can we measure the thickness of a sheet of paper if our standard ruler is not accurate enough?
  – If a stack of 100 sheets measures 1 cm, then each sheet is ~0.1 mm
We can use the same idea on functions!
  – A single run with a small instance may be instantaneous
  – But running it multiple times takes more than 0.00s!

13
Idea: Repeat function calls
Run the same function several times and compute the average time
Pros
  – Input size can be smaller and related to the problem
  – We can concentrate on the quality of test cases and rely less on randomization to produce big test cases that are impossible to verify manually
Cons
  – We must be careful with memory persistence between successive function calls
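A minimal sketch of the repeat-and-average idea, under stated assumptions: the function name `average_call_time` and its parameters are hypothetical, and stopping once the accumulated time reaches a budget follows the "until aggregated time reached 1s" rule mentioned with the preliminary results. Each call receives a fresh copy of the input, made outside the timed region, which is one simple way to guard against state persisting between calls. Timing every call individually adds some timer overhead, so this is an illustration rather than a calibrated benchmark.

```python
import time
from typing import Callable, List, Tuple


def average_call_time(
    func: Callable[[List[int]], object],
    data: List[int],
    min_total_seconds: float = 1.0,
) -> Tuple[float, int]:
    """Call func repeatedly until the accumulated time reaches the budget.

    Returns (average seconds per call, number of calls). The function is
    always called at least once.
    """
    total = 0.0
    calls = 0
    while calls == 0 or total < min_total_seconds:
        fresh = list(data)  # copy outside the timed region
        start = time.perf_counter()
        func(fresh)
        total += time.perf_counter() - start
        calls += 1
    return total / calls, calls
```

For example, `average_call_time(sorted, [3, 1, 2], 0.05)` returns a small positive average together with the number of calls that were needed to fill the 0.05 s budget.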

14
Idea: No hints on complexity
When we give limits for the input:
  – We simplify implementation details and avoid the need for dynamic memory allocation
  – But we disclose the complexity required for the problem
Trained students can identify precisely the complexity needed
This has a great impact on the problem-solving aspect:
  – Different mindset: "I know which complexity I'm looking for and I settle for a solution that achieves it"
  vs
  – Scientific approach, as with a real-world open problem
  – Ex: is there a polynomial solution for a problem?

15
Idea: No hints on complexity
Give limits for implementation purposes, but make it clear that they are not related to the efficiency sought
More scientific and open-ended approach
Need to think about how to really solve the problem (and not how to produce a program that passes the test cases)
Do not overemphasize the runtime of a particular language
  – ("let me make a test with maximum limits and see if it runs in X seconds on this machine with this language")

16
Idea: Runtime behaviour as tests increase
Typically we measure efficiency by creating a set of tests such that different model solutions achieve different numbers of points
But not passing does not imply that the required complexity was not achieved (other factors may be at play)
  – It only tells us whether the test case is solved within the constraints
A lot of manpower is needed for model solutions and fine tuning (compiler version, computer speed, language used, etc.)

17
Idea: Runtime behaviour as tests increase
How can we improve on that?
Pen and paper is not an option for large-scale evaluation
  – Need for automatic processes
We have different tests and different time measures, so why don't we use all this information?
Plot the runtime as the data increases and do some curve fitting
  – It is impossible to determine the complexity of all programs, but even a trivial (imperfect) curve can show more information than just knowing which test cases are passed

18
Some Preliminary Results
As a proof of concept, a simple problem:
  – Input: sequence of integers
  – Output: subsequence of consecutive integers with maximum sum
Only ask for a function, with I/O already given
Small input limit (only 100)
Measure time by running multiple times (until the aggregated time reaches 1s)
Use random data for N = 1, 4, 8, 12, …, 64
[Figure: example sequence, garbled in extraction: 1-2310-43-641]

19
Some Preliminary Results
Implemented 3 model solutions:
  – A: O(N^3) – Iterate over all possible intervals in O(N^2), and iterate through each interval to compute its sum in O(N)
  – B: O(N^2) – Iterate over all possible intervals in O(N^2), with O(1) checking of each sum using accumulated sums
  – C: O(N) – Iterate through the sequence and keep a partial sum; whenever the partial sum is negative, it cannot contribute to the best answer, so reset it to zero and continue
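The three model solutions described above could look roughly like the following. This is a sketch, not the authors' code: the function names are hypothetical, the functions return only the maximum sum rather than the subsequence itself, and they assume the empty subsequence (sum 0) is allowed, which matches the "reset to zero" rule of the linear solution.

```python
from itertools import accumulate
from typing import List


def max_sum_cubic(values: List[int]) -> int:
    """O(N^3): try every interval and add up its elements from scratch."""
    n = len(values)
    best = 0  # assumption: the empty subsequence is allowed
    for i in range(n):
        for j in range(i, n):
            best = max(best, sum(values[i:j + 1]))
    return best


def max_sum_quadratic(values: List[int]) -> int:
    """O(N^2): try every interval, but get each sum in O(1) from prefix sums."""
    prefix = [0] + list(accumulate(values))  # prefix[k] = sum of first k values
    n = len(values)
    best = 0
    for i in range(n):
        for j in range(i, n):
            best = max(best, prefix[j + 1] - prefix[i])
    return best


def max_sum_linear(values: List[int]) -> int:
    """O(N): keep a running sum and reset it to zero when it turns negative."""
    best = 0
    current = 0
    for v in values:
        current += v
        if current < 0:
            current = 0  # a negative partial sum cannot help any later interval
        best = max(best, current)
    return best
```

All three should agree on every input, so the slower ones can double as reference implementations when checking the faster one.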

20
Some Preliminary Results
Plot Time(N) / Time(1)
Simple correlation measure with another function
Solution   log N    N        N log N  N^2      N^3      N^4      2^N      N!
A          0.6848   0.9264   0.9515   0.9912   0.9993   0.9869   0.6033   0.5722
B          0.7524   0.9666   0.9835   0.9998   0.9848   0.9564   0.5469   0.5183
C          0.8624   0.9952   0.9927   0.9586   0.8985   0.8417   0.4136   0.3906
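One way such a correlation table could be computed is sketched below. The slides only say "simple correlation measure", so the use of the Pearson coefficient here is an assumption, and the names `pearson`, `CANDIDATES` and `rank_candidates` are hypothetical. The measured times are normalised by the first measurement, mirroring the Time(N) / Time(1) plot, and then compared against each candidate growth function.

```python
import math
from typing import Callable, Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


# Candidate growth functions, matching the column headers of the table.
# Note: N! overflows floating-point arithmetic for large N; this sketch is
# only meant for the small sizes used in the experiment.
CANDIDATES: Dict[str, Callable[[int], float]] = {
    "log N": lambda n: math.log(n),
    "N": lambda n: float(n),
    "N log N": lambda n: n * math.log(n),
    "N^2": lambda n: float(n ** 2),
    "N^3": lambda n: float(n ** 3),
    "N^4": lambda n: float(n ** 4),
    "2^N": lambda n: 2.0 ** n,
    "N!": lambda n: float(math.factorial(n)),
}


def rank_candidates(sizes: List[int], times: List[float]) -> List[Tuple[str, float]]:
    """Rank candidate functions by correlation with Time(N) / Time(first size).

    Assumes times[0] > 0 and that sizes are positive integers.
    """
    ratios = [t / times[0] for t in times]
    scores = {
        name: pearson([f(n) for n in sizes], ratios)
        for name, f in CANDIDATES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list is the best-correlated candidate; as the slides stress, this indicates an apparent growth pattern and does not prove a complexity.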

21
Some Preliminary Results
A more detailed mathematical analysis is out of scope
  – We could use other statistical measures
We know that it is impossible to automatically compute and prove complexities, but this simple approach gives meaningful results
  – The runtime is reasonably consistent and correlated with a certain function, and therefore appears to grow following a pattern that we were able to identify
Ex: Linear -> appears to take twice the time when the data doubles

22
Some Preliminary Results
What could this do?
  – More information from the same test cases
  – Possibility of giving students automatic feedback on runtime behaviour
  – Possibility of identifying runtime behaviours for which no model solutions were created (less manpower!)
  – Independent of language-specific details
Ex: Archery problem, IOI 2009, Day 1
  – There were solutions with O(N^2 R), O(N^3), O(N^2 log N), O(N^2), O(N log N), …
  – No need to code them all in all languages and then tune!

23
Conclusion
20 years of IOI: computers are much faster, but the style of evaluation is still the same
Setting up test cases is time-consuming and requires manpower
Need to think of ways to improve evaluation
Our proposal, geared to more informal contests or teaching environments, can offer:
  – No distraction with I/O
  – No large data sets
  – More natural problem statements
  – No hint on complexity (open-ended approach)
  – No need for implementing many model solutions
  – New languages can be added without changing tests
Still more work is needed to obtain a robust system, but we feel these ideas (or some of them) can be used in practice
Future: can evaluation be improved in other ways?

24
The End
And that's all! :-)
Questions?
Pedro Ribeiro (pribeiro@dcc.fc.up.pt)
Pedro Guerreiro (pjguerreiro@ualg.pt)
