Maps and Hashing Eric Roberts CS 106B February 15, 2013.


Simplifying the Map Abstraction

Although templates offer considerable flexibility when you are designing a collection class, they also complicate both the interface and the implementation, making them harder to follow. To make sure that you understand the details of the various strategies for implementing maps, Chapter 15 simplifies the interface so that both the keys and values are always strings. The resulting class is called StringMap. Although the book includes a more expansive set of methods, this lecture looks only at put, get, and containsKey. Once you understand how to implement the StringMap class using each of the possible representations, you can add the remaining methods and use the C++ template mechanism to generalize the key and value types.


The stringmap.h Interface

/*
 * File: stringmap.h
 *
 * This interface exports the StringMap class, which maintains a collection
 * of key/value pairs, both of which are of type string. These slides
 * further simplify the interface by including only the methods get, put,
 * and containsKey.
 */

#ifndef _stringmap_h
#define _stringmap_h

#include <string>

class StringMap {

public:

/*
 * Constructor: StringMap
 * Usage: StringMap map;
 *
 * Initializes a new empty StringMap.
 */
   StringMap();

/*
 * Destructor: ~StringMap
 *
 * Frees any heap storage associated with this map.
 */
   ~StringMap();

/*
 * Method: get
 * Usage: string value = map.get(key);
 *
 * Returns the value for key or the empty string, if key is unbound.
 */
   std::string get(const std::string & key) const;

/*
 * Method: put
 * Usage: map.put(key, value);
 *
 * Associates key with value in this map.
 */
   void put(const std::string & key, const std::string & value);

The stringmap.h Interface (continued)

/*
 * Method: containsKey
 * Usage: if (map.containsKey(key))...
 *
 * Returns true if there is an entry for key in this map.
 */
   bool containsKey(const std::string & key) const;

/* Private section goes here. */

};

/*
 * Function: hashCode
 * Usage: int hash = hashCode(key);
 *
 * Returns a hash code for the specified key, which is always a
 * nonnegative integer. This function is overloaded to support
 * all of the primitive types and the C++ string type.
 */
int hashCode(const std::string & key);

#endif

An Illustrative Mapping Application

Suppose that you want to write a program that displays the name of a state given its two-letter postal abbreviation. This program is an ideal application for the StringMap class because what you need is a map between two-letter codes and state names. Each two-letter code uniquely identifies a particular state and therefore serves as a key for the StringMap; the state names are the corresponding values.

To implement this program in C++, you need to perform the following steps, which are illustrated on the following slide:

1. Create a StringMap containing the key/value pairs.
2. Read in the two-letter abbreviation to translate.
3. Call get on the StringMap to find the state name.
4. Print out the name of the state.

The PostalLookup Program

#include <iostream>
#include <string>
#include "simpio.h"      /* Stanford library header that declares getLine */
#include "stringmap.h"
using namespace std;

void initStateMap(StringMap & map);

int main() {
   StringMap stateMap;
   initStateMap(stateMap);
   while (true) {
      cout << "Enter two-letter state abbreviation: ";
      string code = getLine();
      if (code == "") break;                 /* A blank line ends the program */
      if (stateMap.containsKey(code)) {
         cout << code << " = " << stateMap.get(code) << endl;
      } else {
         cout << code << " = ???" << endl;   /* Unknown abbreviation */
      }
   }
   return 0;
}

/* Fills the map with the abbreviation/name pairs shown on the slide. */
void initStateMap(StringMap & map) {
   map.put("AL", "Alabama");
   map.put("AK", "Alaska");
   map.put("AZ", "Arizona");
   map.put("FL", "Florida");
   map.put("GA", "Georgia");
   map.put("HI", "Hawaii");
   map.put("WI", "Wisconsin");
   map.put("WY", "Wyoming");
}

Sample run:

Enter two-letter state abbreviation: HI
HI = Hawaii
Enter two-letter state abbreviation: WI
WI = Wisconsin
Enter two-letter state abbreviation: VE
VE = ???
Enter two-letter state abbreviation:

Implementation Strategies for Maps

There are several strategies you might choose to implement the map operations get and put. Those strategies include:

1. Linear search. Keep track of all the key/value pairs in an array. In this model, both the get and put operations run in O(N) time.
2. Binary search. If you keep the array sorted by the two-character code, you can use binary search to find the key. Using this strategy improves the performance of get to O(log N), although put still takes O(N) time because inserting a new key may require shifting the array elements that follow it.
3. Table lookup in a grid. In this specific example, you can store the state names in a 26 x 26 Grid in which the first and second indices correspond to the two letters in the code. Because you can now find any code in a single step, this strategy is O(1), although this performance comes at a cost in memory space.
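The third strategy can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a plain std::array instead of the Stanford Grid class, it accepts only uppercase two-letter codes in an ASCII-like character set, and the names StateGrid and demoStateGrid are hypothetical rather than taken from the lecture.

```cpp
#include <array>
#include <cassert>
#include <string>

/* Hypothetical illustration: a 26 x 26 table indexed by the two letters. */
class StateGrid {

public:

/* Stores name under the two-letter code; malformed codes are ignored. */
   void put(const std::string & code, const std::string & name) {
      if (isValid(code)) table[code[0] - 'A'][code[1] - 'A'] = name;
   }

/* Returns the stored name, or "" if the code is malformed or unbound. */
   std::string get(const std::string & code) const {
      return isValid(code) ? table[code[0] - 'A'][code[1] - 'A'] : "";
   }

private:

/* 676 slots, most of which stay empty: the memory cost of O(1) lookup. */
   std::array<std::array<std::string, 26>, 26> table;

/* Accepts exactly two uppercase letters (assumes 'A'..'Z' are contiguous). */
   static bool isValid(const std::string & code) {
      return code.length() == 2
          && code[0] >= 'A' && code[0] <= 'Z'
          && code[1] >= 'A' && code[1] <= 'Z';
   }
};

/* Usage example: one lookup that hits and one that misses. */
void demoStateGrid() {
   StateGrid grid;
   grid.put("HI", "Hawaii");
   assert(grid.get("HI") == "Hawaii");
   assert(grid.get("VE") == "");
}
```

Each operation computes two array indices directly from the key, so no searching is involved. The limitation is that the trick works only because every key has exactly two letters.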

The Idea of Hashing

The third strategy on the preceding slide shows that one can make the get and put operations run very quickly, even to the point that the cost of finding a key is independent of the number of keys in the table. This O(1) performance is possible only if you know where to look for a particular key. To get a sense of how you might achieve this goal in practice, it helps to think about how you find a word in a dictionary. Most dictionaries have thumb tabs that indicate where each letter appears. Words starting with A are in the A section, and so on. The most common implementations of maps use a strategy called hashing, which is conceptually similar to the thumb tabs in a dictionary. The critical idea is that you can improve performance enormously if you use the key to figure out where to look.

Hash Codes

The rest of today’s lecture focuses on the implementation of the StringMap class that uses the hashing strategy. The implementation requires the existence of a free function called hashCode that transforms a key into a nonnegative integer. The hash code tells the implementation where it should look for a particular key, thereby reducing the search time dramatically.

The important things to remember about hash codes are:

1. Every string has a hash code, even if you don’t know what it is.
2. The hash code for any particular string is always the same.
3. If two strings are equal (i.e., they contain the same characters), they have the same hash code.

The hashCode Function for Strings

const int HASH_SEED = 5381;
const int HASH_MULTIPLIER = 33;
const int HASH_MASK = unsigned(-1) >> 1;

/*
 * Function: hashCode
 * Usage: int code = hashCode(key);
 *
 * This function takes a string key and uses it to derive a hash code,
 * which is a nonnegative integer related to the key by a deterministic
 * function that distributes keys well across the space of integers.
 * The specific algorithm used here is called djb2 after the initials
 * of its inventor, Daniel J. Bernstein, Professor of Mathematics at
 * the University of Illinois at Chicago.
 */
int hashCode(const string & str) {
   unsigned hash = HASH_SEED;
   int nchars = str.length();
   for (int i = 0; i < nchars; i++) {
      hash = HASH_MULTIPLIER * hash + str[i];
   }
   return (hash & HASH_MASK);   /* Clear the sign bit so the result is nonnegative */
}

The Bucket Hashing Strategy

One common strategy for implementing a map is to use the hash code for each key to select an index into an array that will contain all the keys with that hash code. Each element of that array is conventionally called a bucket. In practice, the array of buckets is smaller than the number of hash codes, making it necessary to convert the hash code into a bucket index, typically by executing a statement like

   int index = hashCode(key) % nBuckets;

The value in each element of the bucket array cannot be a single key/value pair given the chance that different keys fall into the same bucket. Such situations are called collisions. To take account of the possibility of collisions, each element of the bucket array is a linked list of the key/value pairs that fall into that bucket, as illustrated on the next slide.

Simulating Bucket Hashing

[Diagram: an array of seven buckets, numbered 0 to 6, each holding a linked list of the key/value pairs that hash to it.]

Key/value pairs shown in the diagram: AK Alaska, AL Alabama, AR Arkansas, AZ Arizona, CA California, CO Colorado, CT Connecticut, DE Delaware, FL Florida, GA Georgia, HI Hawaii, IA Iowa, ID Idaho, IL Illinois, IN Indiana, KS Kansas, KY Kentucky, LA Louisiana, MA Massachusetts, MD Maryland, ME Maine, MI Michigan, MN Minnesota, MO Missouri, MS Mississippi, MT Montana, NC North Carolina, ND North Dakota, NE Nebraska, NH New Hampshire, NJ New Jersey, NM New Mexico, NV Nevada, NY New York, OH Ohio, OK Oklahoma, OR Oregon, PA Pennsylvania, RI Rhode Island, SC South Carolina, SD South Dakota, TN Tennessee, TX Texas, UT Utah, VA Virginia, VT Vermont, WA Washington, WI Wisconsin, WV West Virginia, WY Wyoming.

stateMap.put("AK", "Alaska")
   hashCode("AK") % 7 → 0
   The key "AK" therefore goes in bucket 0.

stateMap.put("AL", "Alabama")
   hashCode("AL") % 7 → 1
   The key "AL" therefore goes in bucket 1.

stateMap.put("AR", "Arkansas")
   hashCode("AR") % 7 → 0
   The key "AR" also goes in bucket 0.

Linked lists are usually written left to right.

Suppose you call stateMap.get("NV")
   hashCode("NV") % 7 → 6
   The key "NV" must therefore be in bucket 6 and can be located by searching the chain.
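The bucket assignments in the simulation can be reproduced with a short sketch. It repeats the djb2 hashCode function from the lecture so that it compiles on its own; the helpers bucketFor and chainLengths are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int HASH_SEED = 5381;
const int HASH_MULTIPLIER = 33;
const int HASH_MASK = unsigned(-1) >> 1;

/* The djb2 hash function from the lecture. */
int hashCode(const std::string & str) {
   unsigned hash = HASH_SEED;
   for (char ch : str) {
      hash = HASH_MULTIPLIER * hash + ch;
   }
   return hash & HASH_MASK;
}

/* Converts a key's hash code into an index between 0 and nBuckets - 1. */
int bucketFor(const std::string & key, int nBuckets) {
   assert(nBuckets > 0);
   return hashCode(key) % nBuckets;
}

/* Returns how many of the given keys land in each bucket. */
std::vector<int> chainLengths(const std::vector<std::string> & keys,
                              int nBuckets) {
   std::vector<int> lengths(nBuckets, 0);
   for (const std::string & key : keys) {
      lengths[bucketFor(key, nBuckets)]++;
   }
   return lengths;
}
```

With seven buckets, hashCode("AK") is 5862129, so bucketFor("AK", 7) returns 0. bucketFor("AR", 7) also returns 0, which is the collision shown in the simulation, while bucketFor("AL", 7) returns 1 and bucketFor("NV", 7) returns 6. Calling chainLengths on all 50 abbreviations shows how long the chains become when the bucket array is this small.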

Achieving O(1) Performance

The simulation on the previous slide uses only seven buckets to emphasize what happens when collisions occur: the smaller the number of buckets, the more likely collisions become. In practice, the implementation of StringMap would use a much larger value for nBuckets to minimize the opportunity for collisions. If the number of buckets is considerably larger than the number of keys, most of the bucket chains will either be empty or contain exactly one key/value pair. The ratio of the number of keys to the number of buckets is called the load factor of the map. Because a map achieves O(1) performance only if the load factor is small, the library implementation of HashMap increases the number of buckets when the table becomes too full. This process is called rehashing.
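A minimal sketch of rehashing follows. The lecture does not say when the library's HashMap rehashes or by how much it grows, so the 0.7 threshold, the doubling policy, the initial count of seven buckets, and the class name RehashingMap are all assumptions made for illustration. The sketch also stores the buckets in a std::vector rather than a raw dynamic array.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int HASH_SEED = 5381;
const int HASH_MULTIPLIER = 33;
const int HASH_MASK = unsigned(-1) >> 1;

/* The djb2 hash function from the lecture. */
int hashCode(const std::string & str) {
   unsigned hash = HASH_SEED;
   for (char ch : str) hash = HASH_MULTIPLIER * hash + ch;
   return hash & HASH_MASK;
}

struct Cell {
   std::string key;
   std::string value;
   Cell *link;
};

/* Assumed threshold: grow before the load factor would exceed 0.7. */
const double MAX_LOAD_FACTOR = 0.7;

/* Hypothetical stand-in for StringMap whose put method rehashes. */
class RehashingMap {

public:

   explicit RehashingMap(int nBuckets = 7) : count(0) {
      assert(nBuckets > 0);
      buckets.assign(nBuckets, nullptr);
   }

   RehashingMap(const RehashingMap &) = delete;
   RehashingMap & operator=(const RehashingMap &) = delete;

   ~RehashingMap() {
      for (Cell *cp : buckets) {
         while (cp != nullptr) {
            Cell *oldCell = cp;
            cp = cp->link;
            delete oldCell;
         }
      }
   }

   void put(const std::string & key, const std::string & value) {
      Cell *cp = findCell(key);
      if (cp == nullptr) {
         /* Grow first if adding one more key would overfill the table. */
         if (double(count + 1) / buckets.size() > MAX_LOAD_FACTOR) rehash();
         int bucket = hashCode(key) % buckets.size();
         cp = new Cell{key, value, buckets[bucket]};
         buckets[bucket] = cp;
         count++;
      }
      cp->value = value;
   }

   std::string get(const std::string & key) const {
      Cell *cp = findCell(key);
      return (cp == nullptr) ? "" : cp->value;
   }

   int bucketCount() const { return buckets.size(); }
   double loadFactor() const { return double(count) / buckets.size(); }

private:

   std::vector<Cell *> buckets;   /* Heads of the bucket chains */
   int count;                     /* Number of entries in the map */

   Cell *findCell(const std::string & key) const {
      Cell *cp = buckets[hashCode(key) % buckets.size()];
      while (cp != nullptr && key != cp->key) cp = cp->link;
      return cp;
   }

/* Doubles the bucket array and relinks every existing cell into it. */
   void rehash() {
      std::vector<Cell *> oldBuckets = buckets;
      buckets.assign(oldBuckets.size() * 2, nullptr);
      for (Cell *cp : oldBuckets) {
         while (cp != nullptr) {
            Cell *next = cp->link;
            int bucket = hashCode(cp->key) % buckets.size();
            cp->link = buckets[bucket];
            buckets[bucket] = cp;
            cp = next;
         }
      }
   }
};
```

Rehashing moves the existing cells rather than copying them, so each rehash costs O(N). Because the table doubles each time, rehashes become rarer as the map grows, and the average cost of put stays constant.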

Private Section of the StringMap Class

/* Private section */

private:

/* Type definition for cells in the bucket chain */

   struct Cell {
      std::string key;
      std::string value;
      Cell *link;
   };

/* Instance variables */

   Cell **buckets;   /* Dynamic array of pointers to cells */
   int nBuckets;     /* The number of buckets in the array */
   int count;        /* The number of entries in the map   */

/* Private method prototypes */

   Cell *findCell(int bucket, const std::string & key) const;


The stringmap.cpp Implementation

/*
 * File: stringmap.cpp
 *
 * This file implements the stringmap.h interface using a hash table
 * as the underlying representation.
 */

#include <string>
#include "stringmap.h"
using namespace std;

/*
 * Implementation notes: StringMap constructor and destructor
 *
 * The constructor allocates the array of buckets and initializes each
 * bucket to the empty list. The destructor frees the allocated cells
 * and then the bucket array itself. INITIAL_BUCKET_COUNT is a constant
 * whose definition does not appear on these slides.
 */
StringMap::StringMap() {
   nBuckets = INITIAL_BUCKET_COUNT;
   buckets = new Cell*[nBuckets];
   for (int i = 0; i < nBuckets; i++) {
      buckets[i] = NULL;
   }
}

StringMap::~StringMap() {
   for (int i = 0; i < nBuckets; i++) {
      Cell *cp = buckets[i];
      while (cp != NULL) {
         Cell *oldCell = cp;
         cp = cp->link;
         delete oldCell;
      }
   }
   delete[] buckets;   /* Release the array allocated by the constructor */
}

/*
 * Implementation notes: get
 *
 * This method calls findCell to search the linked list for the matching
 * key. If no key is found, get returns the empty string.
 */
string StringMap::get(const string & key) const {
   Cell *cp = findCell(hashCode(key) % nBuckets, key);
   return (cp == NULL) ? "" : cp->value;
}


The stringmap.cpp Implementation (continued)

/*
 * Implementation notes: put
 *
 * The put method calls findCell to search the linked list for the
 * matching key. If a cell already exists, put simply resets the
 * value field. If no matching key is found, put adds a new cell
 * to the beginning of the list for that chain.
 */
void StringMap::put(const string & key, const string & value) {
   int bucket = hashCode(key) % nBuckets;
   Cell *cp = findCell(bucket, key);
   if (cp == NULL) {
      cp = new Cell;
      cp->key = key;
      cp->link = buckets[bucket];
      buckets[bucket] = cp;
   }
   cp->value = value;
}

/*
 * Implementation notes: containsKey
 *
 * This method simply checks whether the result of findCell is NULL.
 */
bool StringMap::containsKey(const string & key) const {
   return findCell(hashCode(key) % nBuckets, key) != NULL;
}

/*
 * Private method: findCell
 * Usage: Cell *cp = findCell(bucket, key);
 *
 * Finds a cell in the chain for the specified bucket that matches key.
 * If a match is found, the return value is a pointer to the cell
 * containing the matching key. If no match is found, findCell
 * returns NULL.
 */
StringMap::Cell *StringMap::findCell(int bucket, const string & key) const {
   Cell *cp = buckets[bucket];
   while (cp != NULL && key != cp->key) {
      cp = cp->link;
   }
   return cp;
}

The End