Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See

Slides:



Advertisements
Similar presentations
ThinkPython Ch. 10 CS104 Students o CS104 n Prof. Norman.
Advertisements

Hashing.
Input and Output Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Container Types in Python
CHAPTER 4 AND 5 Section06: Sequences. General Description "Normal" variables x = 19  The name "x" is associated with a single value Sequence variables:
Dictionaries: Keeping track of pairs
Tuples Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Control Flow Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Lists Introduction to Computing Science and Programming I.
CSE 326: Data Structures Lecture #11 B-Trees Alon Halevy Spring Quarter 2001.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
JaySummet IPRE Python Review 2. 2 Outline Compound Data Types: Strings, Tuples, Lists & Dictionaries Immutable types: Strings Tuples Accessing.
CMPT 120 Lists and Strings Summer 2012 Instructor: Hassan Khosravi.
Guide to Programming with Python
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
October 4, 2005ICP: Chapter 4: For Loops, Strings, and Tuples 1 Introduction to Computer Programming Chapter 4: For Loops, Strings, and Tuples Michael.
Lists in Python.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 9 More About Strings.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
Python Lists and Such CS 4320, SPRING List Functions len(s) is the length of list s s + t is the concatenation of lists s and t s.append(x) adds.
Fall Week 4 CSCI-141 Scott C. Johnson.  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles.
I Power Int 2 Computing Software Development High Level Language Constructs.
Collecting Things Together - Lists 1. We’ve seen that Python can store things in memory and retrieve, using names. Sometime we want to store a bunch of.
5 BASIC CONCEPTS OF ANY PROGRAMMING LANGUAGE Let’s get started …
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Roles of Variables with Examples in Python ® © 2014 Project Lead The Way, Inc.Computer Science and Software Engineering.
Built-in Data Structures in Python An Introduction.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
Functions CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
Recap form last time How to do for loops map, filter, reduce Next up: dictionaries.
Data Collections: Lists CSC 161: The Art of Programming Prof. Henry Kautz 11/2/2009.
Functions CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
Hashing Hashing is another method for sorting and searching data.
I Power Higher Computing Software Development High Level Language Constructs.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
COMPE 111 Introduction to Computer Engineering Programming in Python Atılım University
Introduction to Computing Using Python Dictionaries: Keeping track of pairs  Class dict  Class tuple.
Chapter 5 Linked List by Before you learn Linked List 3 rd level of Data Structures Intermediate Level of Understanding for C++ Please.
Operators Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Variables and Strings. Variables  When we are writing programs, we will frequently have to remember a value for later use  We will want to give this.
Invasion Percolation: Randomness Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Basics Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Strings Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Dictionaries Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Functions CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
String and Lists Dr. José M. Reyes Álamo. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Equality, references and mutability COMPSCI 105 SS 2015 Principles of Computer Science.
String and Lists Dr. José M. Reyes Álamo.
Python Aliasing Copyright © Software Carpentry 2010
Introduction to Computer Science / Procedural – 67130
CSCI 210 Data Structures and Algorithms
Hashing CSE 2011 Winter July 2018.
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
String and Lists Dr. José M. Reyes Álamo.
Data Types Every variable has a given data type. The most common data types are: String - Text made up of numbers, letters and characters. Integer - Whole.
Introduction to Computer Science
Presentation transcript:

Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information. Sets and Dictionaries

Introduction Let's try an experiment

Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) Let's try an experiment

Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment

Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong?

Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong? And what does the error message mean?

Sets and DictionariesIntroduction How are sets stored in a computer's memory?

Sets and DictionariesIntroduction How are sets stored in a computer's memory? Could use a list

Sets and DictionariesIntroduction def set_create(): return [] How are sets stored in a computer's memory? Could use a list

Sets and DictionariesIntroduction def set_create(): return [] def set_in(set_list, item): for thing in set_list: if thing == item: return True return False How are sets stored in a computer's memory? Could use a list

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item)

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this?

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better

Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better But the solution puts some constraints on programs

Sets and DictionariesIntroduction Start simple: how do we store a set of integers?

Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent")

Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") True Fals e True == {0, 2}

Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") But what if the range of values is large, or can change over time? True Fals e True == {0, 2}

Sets and DictionariesIntroduction Use a fixed-size hash table of length L

Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L

Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator

Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator == {3378, 1625, 101}

Sets and DictionariesIntroduction Time to insert or look up is constant (!)

Sets and DictionariesIntroduction Time to insert or look up is constant (!) But what do we do when there's a collision?

Sets and DictionariesIntroduction Time to insert or look up is constant(!) But what do we do when there's a collision?

Sets and DictionariesIntroduction Option #1: store it in the next empty slot

Sets and DictionariesIntroduction Option #2: chain values together

Sets and DictionariesIntroduction Either works well until the table is about 3/4 full

Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly

Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

Sets and DictionariesIntroduction => Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

Sets and DictionariesIntroduction => 206 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table

Sets and DictionariesIntroduction How do we store strings?

Sets and DictionariesIntroduction How do we store strings? Use a hash function to generate an integer index based on the characters in the string

Sets and DictionariesIntroduction "zebra "

Sets and DictionariesIntroduction "zebra " z e b r a ==

Sets and DictionariesIntroduction "zebra " z e b r a == ==

Sets and DictionariesIntroduction "zebra " z e b r a == == 532

Sets and DictionariesIntroduction "zebra " z e b r a == == 532% 5

Sets and DictionariesIntroduction "zebra " z e b r a == == 532% 5 If we can define a hash function for something, we can store it in a set

Sets and DictionariesIntroduction "zebra " z e b r a == == 532% 5 If we can define a hash function for something, we can store it in a set So long as nothing changes behind our back

Sets and DictionariesIntroduction z e b r a This is what the previous example really looks like in memory

Sets and DictionariesIntroduction z e b r a This is what the previous example really looks like in memory Let's take a look at what happens if we use a list

Sets and DictionariesIntroduction ['z','e','b','r','a ']

Sets and DictionariesIntroduction ['z','e','b','r','a '] 'z' 'e' 'r' == 'b' 'a'

Sets and DictionariesIntroduction ['z','e','b','r','a '] 'z' 'e' 'r' == 'b' 'a' 523 % 5

Sets and DictionariesIntroduction ['z','e','b','r','a '] 'z' 'e' 'r' == 'b' 'a' 523 % 5

Sets and DictionariesIntroduction 'z' 'e' 'r' 'b' 'a' This is what's actually in memory

Sets and DictionariesIntroduction 'z' 'e' 'r' 'b' 'a' This is what's actually in memory What happens if we change the values in the list?

Sets and DictionariesIntroduction 'z' 'e' 'r' 'b' 'a'

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a'

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place!

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True

Sets and DictionariesIntroduction 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True (or blows up)

Sets and DictionariesIntroduction This problem arises with any mutable structure

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value)

Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value) Slightly restrictive, but never disastrous

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name?

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin'

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr')

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes...

Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes... Code has to be littered with joins and splits

Sets and DictionariesIntroduction Option #2 (in Python): use a tuple

Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list

Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list Contents cannot be changed after tuple is created

Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin')

Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') Use '()' instead of '[]'

Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles

Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment

Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin) >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment >>> names = set() >>> names.add(full_name) >>> names set([('Charles', 'Darwin')])

Sets and DictionariesIntroduction This episode has been about the science of computer science

Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables

Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance

Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go...

Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go......but sometimes you need a little theory to make sense of practice

July 2010 created by Greg Wilson Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information.