Datastructures Koen Lindström Claessen (guest lecture by Björn Bringert)

Slides:



Advertisements
Similar presentations
Why not just use Arrays? Java ArrayLists.
Advertisements

AITI Lecture 19 Linked List Adapted from MIT Course 1.00 Spring 2003 Lecture 26 and Tutorial Note 9 (Teachers: Please do not erase the above note)
CS252: Systems Programming Ninghui Li Program Interview Questions.
Introduction to Computation and Problem Solving Class 28: Binary Search Trees Prof. Steven R. Lerman and Dr. V. Judson Harward.
Recursive Data Types Koen Lindström Claessen. Modelling Arithmetic Expressions Imagine a program to help school-children learn arithmetic, which presents.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
HST 952 Computing for Biomedical Scientists Lecture 9.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Sorting Algorithms. Motivation Example: Phone Book Searching Example: Phone Book Searching If the phone book was in random order, we would probably never.
Sorting Chapter Sorting Consider list x 1, x 2, x 3, … x n We seek to arrange the elements of the list in order –Ascending or descending Some O(n.
Searching. 2 Searching an array of integers If an array is not sorted, there is no better algorithm than linear search for finding an element in it static.
Test Data Generators Lecture 5. Why Distinguish Instructions? Functions always give the same result for the same arguments Instructions can behave differently.
Modelling & Datatypes Koen Lindström Claessen. Software Software = Programs + Data.
C++ Programming: Program Design Including Data Structures, Third Edition Chapter 17: Linked Lists.
This material in not in your text (except as exercises) Sequence Comparisons –Problems in molecular biology involve finding the minimum number of edit.
CS 106 Introduction to Computer Science I 02 / 28 / 2007 Instructor: Michael Eckmann.
Chair of Software Engineering Einführung in die Programmierung Introduction to Programming Prof. Dr. Bertrand Meyer Exercise Session 10.
Searching. 2 Searching an array of integers If an array is not sorted, there is no better algorithm than linear search for finding an element in it static.
Advanced Programming Andrew Black and Tim Sheard Lecture 4 Intro to Haskell.
CS 106 Introduction to Computer Science I 10 / 15 / 2007 Instructor: Michael Eckmann.
CS 106 Introduction to Computer Science I 10 / 16 / 2006 Instructor: Michael Eckmann.
CSE 143 Lecture 7 Sets and Maps reading: ; 13.2 slides created by Marty Stepp
Data Structures Using C++ 2E
Maps A map is an object that maps keys to values Each key can map to at most one value, and a map cannot contain duplicate keys KeyValue Map Examples Dictionaries:
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Erlang/QuickCheck Thomas Arts, IT University John Hughes, Chalmers University Gothenburg.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
ML Datatypes.1 Standard ML Data types. ML Datatypes.2 Concrete Datatypes  The datatype declaration creates new types  These are concrete data types,
(c) University of Washington14-1 CSC 143 Java Collections.
COP3530 Data Structures600 Stack Stack is one the most useful ADTs. Like list, it is a collection of data items. Supports “LIFO” (Last In First Out) discipline.
LECTURE 37: ORDERED DICTIONARY CSC 212 – Data Structures.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
Chapter 13 Recursion. Learning Objectives Recursive void Functions – Tracing recursive calls – Infinite recursion, overflows Recursive Functions that.
Priority Queues and Binary Heaps Chapter Trees Some animals are more equal than others A queue is a FIFO data structure the first element.
CS 61B Data Structures and Programming Methodology July 28, 2008 David Sun.
1 Joe Meehean.  Problem arrange comparable items in list into sorted order  Most sorting algorithms involve comparing item values  We assume items.
Chapter 21 Priority Queue: Binary Heap Saurav Karmakar.
1 Heaps and Priority Queues Starring: Min Heap Co-Starring: Max Heap.
Sorting. Pseudocode of Insertion Sort Insertion Sort To sort array A[0..n-1], sort A[0..n-2] recursively and then insert A[n-1] in its proper place among.
Built-in Data Structures in Python An Introduction.
Symbol Tables and Search Trees CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.
Testing and Debugging (Depuração em Haskell) Complementa as seções anteriores Original autorizado por: John Hughes Adaptado.
CS261 Data Structures Ordered Bag Dynamic Array Implementation.
1 Heaps and Priority Queues v2 Starring: Min Heap Co-Starring: Max Heap.
Data Structures & Algorithms
UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.
Function Definition by Cases and Recursion Lecture 2, Programmeringsteknik del A.
Recursion on Lists Lecture 5, Programmeringsteknik del A.
Chapter 5 Linked List by Before you learn Linked List 3 rd level of Data Structures Intermediate Level of Understanding for C++ Please.
HEAPS. Review: what are the requirements of the abstract data type: priority queue? Quick removal of item with highest priority (highest or lowest key.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
Searching CSE 103 Lecture 20 Wednesday, October 16, 2002 prepared by Doug Hogan.
Chapter 17: Linked Lists. Objectives In this chapter, you will: – Learn about linked lists – Learn the basic properties of linked lists – Explore insertion.
Haskell Chapter 5, Part II. Topics  Review/More Higher Order Functions  Lambda functions  Folds.
Course: Programming II - Abstract Data Types HeapsSlide Number 1 The ADT Heap So far we have seen the following sorting types : 1) Linked List sort by.
Recursion. Objectives At the conclusion of this lesson, students should be able to Explain what recursion is Design and write functions that use recursion.
INTRO2CS Tirgul 8 1. Searching and Sorting  Tips for debugging  Binary search  Sorting algorithms:  Bogo sort  Bubble sort  Quick sort and maybe.
Searching/Sorting. Searching Searching is the problem of Looking up a specific item within a collection of items. Searching is the problem of Looking.
LINKED LISTS.
Test Data Generators. Why Distinguish Instructions? Functions always give the same result for the same arguments Instructions can behave differently on.
Chapter 13 Recursion Copyright © 2016 Pearson, Inc. All rights reserved.
Sort & Search Algorithms
Koen Lindström Claessen
Koen Lindström Claessen
Containers and Lists CIS 40 – Introduction to Programming in Python
Test Data Generators.
Koen Lindström Claessen
Search by Hashing.
PROGRAMMING IN HASKELL
Presentation transcript:

Datastructures Koen Lindström Claessen (guest lecture by Björn Bringert)

Data Structures Datatype –A model of something that we want to represent in our program Data structure –A particular way of storing data –How? Depending on what we want to do with the data Today: Two examples –Queues –Tables

Using QuickCheck to Develop Fast Queue Operations What we’re going to do: Explain what a queue is, and give slow implementations of the queue operations, to act as a specification. Explain the idea behind the fast implementation. Formulate properties that say the fast implementation is ”correct”. Test them with QuickCheck.

What is a Queue? Leave from the front Join at the back Examples Files to print Processes to run Tasks to perform

What is a Queue? A queue contains a sequence of values. We can add elements at the back, and remove elements from the front. We’ll implement the following operations: empty :: Q a add :: a -> Q a -> Q a remove :: Q a -> Q a front :: Q a -> a isEmpty :: Q a -> Bool -- an empty queue -- add an element at the back -- remove an element from the front -- inspect the front element -- check if the queue is empty

First Try data Q a = Q [a] deriving (Eq, Show) empty = Q [] add x (Q xs) = Q (xs++[x]) remove (Q (x:xs)) = Q xs front (Q (x:xs)) = x isEmpty (Q xs) = null xs

Works, but slow add x (Q xs) = Q (xs++[x]) [] ++ ys = ys (x:xs) ++ ys = x : (xs++ys) Add 1, add 2, add 3, add 4, add 5… Time is the square of the number of additions As many recursive calls as there are elements in xs

A Module Implement the result in a module Use as specification Allows the re-use –By other programmers –Of the same names

SlowQueue Module module SlowQueue where data Q a = Q [a] deriving (Eq, Show) empty = Q [] add x (Q xs) = Q (xs++[x]) remove (Q (x:xs)) = Q xs front (Q (x:xs)) = x isEmpty (Q xs) = null xs

New Idea: Store the Front and Back Separately bcdefghiaj Old Fast to remove Slow to add bcde ihgf a j New Fast to add Fast to remove Periodically move the back to the front.

Smart Datatype data Q a = Q [a] [a] deriving (Eq, Show) The front and the back part of the queue.

Smart Operations empty = Q [] [] add x (Q front back) = Q front (x:back) remove (Q (x:front) back) = fixQ (Q front back) front (Q (x:front) back) = x isEmpty (Q front back) = null front && null back Flip the queue when we serve the last person in the front

Flipping fixQ (Q [] back) = Q (reverse back) [] fixQ q = q This takes one function call per element in the back—each element is inserted into the back (one call), flipped (one call), and removed from the front (one call)

How can we test the smart functions? By using the original implementation as a reference The behaviour should be ”the same” –Check results First version is an abstract model that is ”obviously correct”

Comparing the Implementations They operate on different types of queues To compare, must convert between them –Can we convert a slow Q to a Q? Where should we split the front from the back??? –Can we convert a Q to a slow Q? Retrieve the simple ”model” contents from the implementation contents (Q front back) = Q (front++reverse back)

Accessing modules import qualified SlowQueue as Slow contents :: Q a -> Slow.Q a contents (Q front back) = Slow.Q (front ++ reverse back) Qualified name

The Properties prop_Empty = contents empty == Slow.empty prop_Add x q = contents (add x q) == Slow.add x (contents q) prop_Remove q = contents (remove q) == Slow.remove (contents q) prop_Front q = front q == Slow.front (contents q) prop_IsEmpty q = isEmpty q == Slow.isEmpty (contents q) The behaviour is the same, except for type conversion

Generating Qs instance Arbitrary a => Arbitrary (Q a) where arbitrary = do front <- arbitrary back <- arbitrary return (Q front back)

A Bug! Queues> quickCheck prop_Remove 1 Program error: pattern match failure: instQueue_v2925_v2 984 (Q_Q [] (_IF (null []) [] (Arbitrary_arbitrary (in stArbitrary_v2758 instArbitrary_v2752) 1 (_SEL (,) (inst Monad_v2748_v2921 (RandomGen_split instRandomGen_v2516 ( _SEL (,) (StdGen_StdGen ,StdGen_StdG en (_SEL StdGen_StdGen (StdGen_StdGen (_IF ((instOrd_v28 Ord_< (instNum_v30 Num_- (instNum_v30 Num_* Num_fromInt instNum_v ) ((instNum_v30 Num_ ) ((in…

Verbose Checking Queues> verboseCheck prop_Remove 0: Q [0] [1] 1: Q [] [] Program error: pattern match failure: instQueue_v2925_v2 984 (Q_Q [] []) We should not try to remove from an empty queue!

Preconditions A condition that must hold before a function is called prop_remove q = not (isEmpty q) ==> retrieve (remove q) == remove (retrieve q) prop_front q = not (isEmpty q) ==> front q == front (retrieve q) Useful to be precise about these

Another Bug! Queues> verboseCheck prop_Remove 0: Q [] [1] Program error: pattern match failure: instQueue_v2925_v2 984 (Q_Q [] [1]) But this ought not to happen!

An Invariant Q values ought never to have an empty front, and a non-empty back! Formulate an invariant invariant (Q front back) = not (null front && not (null back))

Testing the Invariant prop_Invariant :: Q Int -> Bool prop_Invariant q = invariant q Of course, it fails… Queues> quickCheck prop_invariant Falsifiable, after 4 tests: QI [] [-1]

Fixing the Generator instance Arbitrary a => Arbitrary (Q a) where arbitrary = do front <- arbitrary back <- arbitrary return (Q front (if null front then [] else back)) Now prop_Invariant passes the tests

Testing the Invariant We’ve written down the invariant We’ve seen to it that we only generate valid QIs as test data We must ensure that the queue functions only build valid Q values! –It is at this stage that the invariant is most useful

Invariant Properties prop_Empty_Inv = invariant empty prop_Add_Inv x q = invariant (add x q) prop_Remove_Inv q = not (isEmpty q) ==> invariant (remove q)

A Bug in the Q operations! Queues> quickCheck prop_Add_Inv Falsifiable, after 2 tests: 0 Q [] [] Queues> add 0 (Q [] []) Q [] [0] The invariant is False!

Fixing add add x (Q front back) = fixQ (Q front (x:back)) We must flip the queue when the first element is inserted into an empty queue Previous bugs were in our understanding (our properties)—this one is in our implementation code

Summary Data structures store data Obeying an invariant... that functions and operations –can make use of (to search faster) –have to respect (to not break the invariant) Writing down and testing invariants and properties is a good way of finding errors

Example Problem: Tables A table holds a collection of keys and associated values. For example, a phone book is a table whose keys are names, and whose values are telephone numbers. Problem: Given a table and a key, find the associated value. John Hughes Hans Svensson Koen Claessen Mary Sheeran

Table Lookup Using Lists Since a table may contain any kind of keys and values, define a parameterised type: type Table a b = [(a, b)] lookup :: Eq a => a -> Table a b -> Maybe b E.g. [(”x”,1), (”y”,2)] :: Table String Int lookup ”y” … Just 2 lookup ”z”... Nothing

Finding Keys Fast Finding keys by searching from the beginning is slow! A better method: look somewhere in the middle, and then look backwards or forwards depending on what you find. (This assumes the table is sorted). Aaboen A Nilsson Hans Östvall Eva Claessen?

Representing Tables Aaboen A Nilsson Hans Östvall Eva We must be able to break up a table fast, into: A smaller table of entries before the middle one, the middle entry, a table of entries after it. data Table a b = Join (Table a b) a b (Table a b)

Quiz What’s wrong with this (recursive) type? data Table a b = Join (Table a b) a b (Table a b)

Quiz What’s wrong with this (recursive) type? No base case! data Table a b = Join (Table a b) a b (Table a b) | Empty Add a base case.

Looking Up a Key To look up a key in a table: If the table is empty, then the key is not found. Compare the key with the key of the middle element. If they are equal, return the associated value. If the key is less than the key in the middle, look in the first half of the table. If the key is greater than the key in the middle, look in the second half of the table.

Quiz Define lookupT :: Ord a => a -> Table a b -> Maybe b Recall data Table a b = Join (Table a b) a b (Table a b) | Empty

Quiz Define lookupT :: Ord a => a -> Table a b -> Maybe b lookupT key Empty = Nothing lookupT key (Join left k v right) | key == k= Just v | key < k= lookupT key left | key > k= lookupT key right Recursive type means a recursive function!

Inserting a New Key We also need function to build tables. We define insertT :: Ord a => a -> b -> Table a b -> Table a b to insert a new key and value into a table. We must be careful to insert the new entry in the right place, so that the keys remain in order. Idea: Compare the new key against the middle one. Insert into the first or second half as appropriate.

Defining Insert insertT key val Empty = Join Empty key val Empty insertT key val (Join left k v right) | key <= k= Join (insertT key val left) k v right | key > k= Join left k v (insertT key val right) Many forget to join up the new right half with the old left half again.

Efficiency On average, how many comparisons does it take to find a key in a table of 1000 entries, using a list and using the new method? Using a list: 500 Using the new method: 10

Testing How should we test the Table operations? –By comparison with the list operations –By relationships between them prop_LookupT k t = lookupT k t == lookup k (contents t) prop_InsertT k v t = insert (k,v) (contents t) == contents (insertT k v t) prop_Lookup_insert k' k v t = lookupT k' (insertT k v t) == if k==k' then Just v else lookupT k' t Table a b -> [(a,b)]

Generating Random Tables Recursive types need recursive generators instance (Arbitrary a, Arbitrary b) => Arbitrary (Table a b) where We can generate arbitrary Tables......provided we can generate keys and values

Generating Random Tables Recursive types need recursive generators instance (Arbitrary a, Arbitrary b) => Arbitrary (Table a b) where arbitrary = oneof [return Empty, do k <- arbitrary v <- arbitrary left <- arbitrary right <- arbitrary return (Join left k v right)] Quiz: What is wrong with this generator?

Controlling the Size of Tables Generate tables with at most n elements table s = frequency [(1, return Empty), (s, do k <- arbitrary v <- arbitrary (l,r) <- two (table (s `div` 2)) return (Join l k v r))] instance (Arbitrary a, Arbitrary b) => Arbitrary (Table a b) where arbitrary = sized table

Controlling the Size of Tables Generate tables with at most n elements table n = frequency [(1, return Empty), (n, do k <- arbitrary v <- arbitrary (l,r) <- two (table ((n-1) `div` 2)) return (Join l k v r))] instance (Arbitrary a, Arbitrary b) => Arbitrary (Table a b) where arbitrary = sized table Size increases during testing (normally up to about 40)

Testing Table Properties Main> quickCheck prop_LookupT Falsifiable, after 10 tests: 0 Join Empty 2 (-2) (Join Empty 0 0 Empty) Main> contents (Join Empty 2 (-2) …) [(2,-2),(0,0)] prop_LookupT k t = lookupT k t == lookup k (contents t) What’s wrong?

Tables must be Ordered! Tables should satisfy an important invariant. prop_InvTable :: Table Integer Integer -> Bool prop_InvTable t = ordered ks where ks = [k | (k,v) <- contents t] Main> quickCheck prop_InvTable Falsifiable, after 4 tests: Join Empty 3 3 (Join Empty 0 3 Empty)

How to Generate Ordered Tables? Generate a random list, –Take the first (key,value) to be at the root –Take all the smaller keys to go in the left subtree –Take all the larger keys to go in the right subtree

Converting a List to a Table -- table kvs converts a list of key-value pairs into a Table -- satisfying the ordering invariant table :: Ord key => [(key,val)] -> Table key val table [] = Empty table ((k,v):kvs) = Join (table [(k',v') | (k',v') <- kvs, k' <= k]) k v (table [(k',v') | (k',v') k])

Generating Ordered Tables instance (Ord a, Arbitrary a, Arbitrary b) => Arbitrary (Table a b) where arbitrary = do xys <- arbitrary return (table xys) Keys must have an ordering List of keys and values

Testing the Properties Now the invariant holds, but the properties don’t! Main> quickCheck prop_InvTable OK, passed 100 tests. Main> quickCheck prop_LookupT Falsifiable, after 7 tests: Join (Join Empty (-1) (-2) Empty) (-1) (-1) Empty

More Testing Main> quickCheck prop_InsertT Falsifiable, after 8 tests: 0 Join Empty 0 (-1) Empty Main> quickCheck prop_lookup_insert Falsifiable, after 84 tests: 1 2 Join Empty 1 1 Empty What’s wrong? prop_InsertT k v t = insert (k,v) (contents t) == contents (insertT k v t) prop_Lookup_Insert k' k v t = lookupT k' (insertT k v t) == if k==k' then Just v else lookupT k' t

The Bug insert key val Empty = Join Empty key val Empty insert key val (Join left k v right) = | key <= k= Join (insert key val left) k v right | key > k= Join left k v (insert key val right) Inserts duplicate keys!

The Fix insertT key val Empty = Join Empty key val Empty insertT key val (Join left k v right) = | key < k= Join (insertT key val left) k v right | key==k = Join left k val right | key > k= Join left k v (insertT key val right) prop_InvTable :: Table Integer Integer -> Bool prop_InvTable t = ordered ks && ks == nub ks where ks = [k | (k,v) <- contents t] (and fix the table generator)

Testing Again Main> quickCheck prop_Lookup_Insert OK, passed 100 tests. Main> quickCheck prop_InsertT Falsifiable, after 6 tests: -2 2 Join Empty (-2) 1 Empty

Testing Again Main> quickCheck prop_lookup_insert OK, passed 100 tests. Main> quickCheck prop_InsertT Falsifiable, after 6 tests: -2 2 Join Empty (-2) 1 Empty Main> insertT (-2) 2 (Join Empty (-2) 1 Empty) Join Empty (-2) 2 Empty

Testing Again Main> quickCheck prop_lookup_insert OK, passed 100 tests. Main> quickCheck prop_insertT Falsifiable, after 6 tests: -2 2 Join Empty (-2) 1 Empty Main> insertT (-2) 2 (Join Empty (-2) 1 Empty) Join Empty (-2) 2 Empty Main> insert (-2,2) [(-2,1)] [(-2,1),(-2,2)]

Testing Again Main> quickCheck prop_lookup_insert OK, passed 100 tests. Main> quickCheck prop_insertT Falsifiable, after 6 tests: -2 2 Join Empty (-2) 1 Empty Main> insertT (-2) 2 (Join Empty (-2) 1 Empty) Join Empty (-2) 2 Empty Main> insert (-2,2) [(-2,1)] [(-2,1),(-2,2)] insert doesn’t remove the old key-value pair when keys clash—the wrong model!

Summary Recursive data-types can store data in different ways Clever choices of datatypes and algorithms can improve performance dramatically Careful thought about invariants is needed to get such algorithms right! Formulating properties and invariants, and testing them, reveals bugs early