Presentation is loading. Please wait.

Presentation is loading. Please wait.

Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See

Similar presentations


Presentation on theme: "Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See"— Presentation transcript:

1 Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information. Sets and Dictionaries

2 Introduction Let's try an experiment

3 Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) Let's try an experiment

4 Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment

5 Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong?

6 Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong? And what does the error message mean?

7 Sets and DictionariesIntroduction How are sets stored in a computer's memory?

8 Sets and DictionariesIntroduction How are sets stored in a computer's memory? Could use a list

9 Sets and DictionariesIntroduction def set_create(): return [] How are sets stored in a computer's memory? Could use a list

10 Sets and DictionariesIntroduction def set_create(): return [] def set_in(set_list, item): for thing in set_list: if thing == item: return True return False How are sets stored in a computer's memory? Could use a list

11 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item)

12 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this?

13 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps

14 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2

15 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better

16 Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better But the solution puts some constraints on programs

17 Sets and DictionariesIntroduction Start simple: how do we store a set of integers?

18 Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent")

19 Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") True Fals e True 0 1 2 == {0, 2}

20 Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") But what if the range of values is large, or can change over time? True Fals e True 0 1 2 == {0, 2}

21 Sets and DictionariesIntroduction Use a fixed-size hash table of length L

22 Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L

23 Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator

24 Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator 0 1 2 3 4 1625 101 3378 == {3378, 1625, 101}

25 Sets and DictionariesIntroduction Time to insert or look up is constant (!)

26 Sets and DictionariesIntroduction Time to insert or look up is constant (!) But what do we do when there's a collision?

27 Sets and DictionariesIntroduction Time to insert or look up is constant(!) But what do we do when there's a collision? 0 1 2 3 4 1625 101 3378 + 206

28 Sets and DictionariesIntroduction Option #1: store it in the next empty slot 0 1 2 3 4 1625 101 3378 206

29 Sets and DictionariesIntroduction Option #2: chain values together 0 1 2 3 4 1625 101 3378 206

30 Sets and DictionariesIntroduction Either works well until the table is about 3/4 full

31 Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly

32 Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

33 Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

34 Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

35 Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 0 1 2 3 4 1625 101 3378 5 6 7 8 => Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

36 Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 0 1 2 3 4 1625 101 3378 5 6 7 8 => 206 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

37 Sets and DictionariesIntroduction How do we store strings?

38 Sets and DictionariesIntroduction How do we store strings? Use a hash function to generate an integer index based on the characters in the string

39 Sets and DictionariesIntroduction "zebra "

40 Sets and DictionariesIntroduction "zebra " z e b r a ==

41 Sets and DictionariesIntroduction "zebra " z e b r a == 122 101 98 114 97 ==

42 Sets and DictionariesIntroduction "zebra " z e b r a == 122 101 98 114 97 == 532

43 Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5

44 Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5 If we can define a hash function for something, we can store it in a set

45 Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5 If we can define a hash function for something, we can store it in a set So long as nothing changes behind our back

46 Sets and DictionariesIntroduction 0 1 2 3 4 z e b r a This is what the previous example really looks like in memory

47 Sets and DictionariesIntroduction 0 1 2 3 4 z e b r a This is what the previous example really looks like in memory Let's take a look at what happens if we use a list

48 Sets and DictionariesIntroduction ['z','e','b','r','a ']

49 Sets and DictionariesIntroduction ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a'

50 Sets and DictionariesIntroduction 0 1 2 3 4 ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a' 523 % 5

51 Sets and DictionariesIntroduction 0 1 2 3 4 ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a' 523 % 5

52 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a' This is what's actually in memory

53 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a' This is what's actually in memory What happens if we change the values in the list?

54 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a'

55 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a'

56 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5

57 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place!

58 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S

59 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False

60 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S

61 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True

62 Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True (or blows up)

63 Sets and DictionariesIntroduction This problem arises with any mutable structure

64 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes

65 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive

66 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer

67 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong

68 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets

69 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value)

70 Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value) Slightly restrictive, but never disastrous

71 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name?

72 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them

73 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin'

74 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr')

75 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes...

76 Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes... Code has to be littered with joins and splits

77 Sets and DictionariesIntroduction Option #2 (in Python): use a tuple

78 Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list

79 Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list Contents cannot be changed after tuple is created

80 Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin')

81 Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') Use '()' instead of '[]'

82 Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles

83 Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment

84 Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin) >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment >>> names = set() >>> names.add(full_name) >>> names set([('Charles', 'Darwin')])

85 Sets and DictionariesIntroduction This episode has been about the science of computer science

86 Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables

87 Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance

88 Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go...

89 Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go......but sometimes you need a little theory to make sense of practice

90 July 2010 created by Greg Wilson Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information.


Download ppt "Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See"

Similar presentations


Ads by Google