
Hashing O(1) data access (almost) -access, insertion, deletion, updating in constant time (on average) but at a price… references: Weiss, Goodrich & Tamassia,


1 Hashing
O(1) data access (almost)
- access, insertion, deletion, updating in constant time (on average)
- but at a price…
references: Weiss; Goodrich & Tamassia; Main
“associative memory”

2 Access to data
- O(n): linked list, array
- O(log n): sorted array, search tree
- O(1): array by index
Index access is O(1) because the data location is found by computation, not search.

3 Computed access to array data
e.g. array of objects arr, starting at memory location 8423400, with 4 bytes per element
address of arr[12]: 8423400 + 4 * 12 = 8423448
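The address arithmetic on this slide can be written out as a one-line function. This is only a sketch of the calculation: Java does not expose raw memory addresses, and the class and method names (ArrayAddress, addressOf) are hypothetical.

```java
final class ArrayAddress {
    // Address of element `index` in an array that starts at `base`,
    // assuming every element occupies `elementSize` bytes.
    static long addressOf(long base, int elementSize, int index) {
        return base + (long) elementSize * index;
    }
}
```

Calling ArrayAddress.addressOf(8423400L, 4, 12) reproduces the slide's value, 8423448. The cost of the computation does not depend on the array length, which is the point of the slide.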

4 Access by Hashing
- Hashing applies the same concept at the software level: access operations do not search for data keys; they compute data indexes
    index = f(data.key)
- performance is “almost” O(1)

5 Access example
- student number is key: s01324092
- i = f(key) = 01324092 % 10000 = 4092
- data for student s01324092 is at location 4092 in the data array
    arr[4092].key = “s01324092”
- problems
    - wasted storage: array must have 10 000 elements
    - competition for space: s01324092 and s02894092 map to the same index
    - iterated operations are more difficult

6 Access example
key: “s01324092”
i = f(key) = 01324092 % 10000 = 4092
[Figure: f(“s01324092”) = 4092 selects the array cell that holds key “s01324092”]

7 Hashing terminology
- student number is key: s01324092
- i = f(key) = 01324092 % 10000 = 4092 (f is the hash function)
- data for student s01324092 is at location 4092 in the data array (the data array is the hash table)
    arr[4092].key = “s01324092”
- problems
    - wasted storage: array must have 10 000 elements
    - competition for space: s01324092 and s02894092 (a collision)

8 Hashing Fact-of-Life
Collisions are unavoidable
Solution strategy:
- minimize the number of collisions
- resolve the collisions that do occur

9 Hash functions
for a hash table of size n, map key -> {0, …, n-1}
typical function: key -> integer % n
e.g.
    // student number key
    int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }
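To make the collision from the earlier access example concrete, the slide's hash function can be wrapped in a small class and applied to both student numbers. The class name StudentHash and the helper collide are illustrative assumptions, not part of the original slides.

```java
final class StudentHash {
    // Same function as on the slide: drop the leading 's' and
    // reduce the numeric part modulo the table size n.
    static int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

    // Hypothetical helper: true when two keys compete for the same index.
    static boolean collide(String a, String b, int n) {
        return hash(a, n) == hash(b, n);
    }
}
```

With n = 10000, both "s01324092" and "s02894092" hash to 4092, so collide returns true for that pair.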

10 Hash function goals
- simple as possible (speed)
- distribute keys uniformly over indices (minimize collisions)
two steps:
1. transform key to integer if necessary (hashCode())
2. restrict integer to range of data array (hash())
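The two steps might be combined as in the sketch below. It assumes Math.floorMod is used for step 2, because hashCode() may return a negative integer and the plain % operator would then give a negative index. The names TwoStepHash and indexFor are hypothetical.

```java
final class TwoStepHash {
    static int indexFor(Object key, int n) {
        int code = key.hashCode();      // step 1: transform key to an integer
        return Math.floorMod(code, n);  // step 2: restrict to the range 0..n-1
    }
}
```

For example, an Integer key of -7 in a table of size 5 lands at index 3 rather than at the invalid index -2.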

11 Java’s hashCode() method
public int hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
Returns: a hash code value for this object.

12 Other hashing methods
- hashCode can be overridden for any class
- hashCode usually should be overridden
    - fit actual data
    - improve performance
    - remove dependence on location in memory

13 Design model
- equals() is based on key field
    - match implies same record
- hashCode function also based on key field
    - key is used for access
BUT the hash function is also based on table size
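A minimal sketch of this design model, assuming a hypothetical Student record whose key field is the student number: equals() and hashCode() both use only the key, so two objects with the same key count as the same record and always hash alike. The table size is deliberately left out of hashCode(); it is applied later by the table's own hash function.

```java
final class Student {
    private final String stuNo; // key field
    private final String name;  // non-key data, ignored by equals and hashCode

    Student(String stuNo, String name) {
        this.stuNo = stuNo;
        this.name = name;
    }

    String name() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;
        return stuNo.equals(((Student) o).stuNo); // key match implies same record
    }

    @Override
    public int hashCode() {
        return stuNo.hashCode(); // based on the same key field as equals
    }
}
```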

14 Resizing table
- if the table is resized, all data must be re-entered into the new array, not just copied
e.g.:
    int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }
    hash("s01324092", 10000) => 4092
    hash("s01324092", 6667) => 4026
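Re-entering the data might look like the sketch below. It assumes a table that stores only the key strings, resolves collisions by stepping to the next cell, and is given a new size at least as large as the number of stored keys. The names Rehash and resize are hypothetical.

```java
final class Rehash {
    static int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

    // Builds a table of size newSize and re-inserts every key, because each
    // key's index depends on the table size. Assumes newSize is at least
    // the number of keys stored in `old`.
    static String[] resize(String[] old, int newSize) {
        String[] fresh = new String[newSize];
        for (String key : old) {
            if (key == null) continue;             // skip empty cells
            int i = hash(key, newSize);            // recompute with the new size
            while (fresh[i] != null) {
                i = (i + 1) % newSize;             // next cell on collision
            }
            fresh[i] = key;
        }
        return fresh;
    }
}
```

A key stored at index 4092 in the table of size 10000 ends up at index 4026 after resize(old, 6667), which is why a plain array copy would leave it unfindable.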

15 Resolving collisions
When a collision occurs on insertion:
- internal: store new element at another location in the table
- external: store new element outside the table

16 Linear probing
- sequential search for the next available location to store data when a collision occurs
e.g. hash(key) -> index = 4
if table[4] is occupied, try table[5], then table[6], …, until an empty location is found
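A minimal sketch of linear-probing insertion, assuming integer keys, key % n as the hash function, and wrap-around at the end of the array. The class name LinearProbe is hypothetical.

```java
final class LinearProbe {
    // Stores key in the first free cell at or after its home index.
    // Returns the index used, or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int i = Math.floorMod(key, n);       // home index
        for (int probes = 0; probes < n; probes++) {
            if (table[i] == null) {          // empty location found
                table[i] = key;
                return i;
            }
            i = (i + 1) % n;                 // try the next cell, wrapping to 0
        }
        return -1;
    }
}
```

Inserting 89, 18, 49, 58, 9 into a table of size 10 places them at indices 9, 8, 0, 1 and 2 respectively: 49, 58 and 9 all collide and are pushed into the cluster that grows from index 8.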

17 Linear probing hash table after each insertion (Weiss) Data Structures & Problem Solving using JAVA/2E Mark Allen Weiss © 2002 Addison Wesley

18 The Deletion Problem (Weiss, 2002)
[Figure: find(58) succeeds; after delete(89) empties a cell on the probe path, find(58) fails]

19 Lazy deletion
Each cell holds a value and a state: a (active) or d (deleted).
[Figure: table holding 49, 58, 9, 18, 89 through the sequence find(58), delete(89), find(58), insert(99), find(58); delete(89) changes the state of its cell from a to d, and insert(99) reuses that cell]
- insert criterion: value == -1 OR state == d
- continue search criterion: value != -1
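The lazy-deletion rules can be sketched as follows. This version assumes integer keys, key % n as the hash function, and an explicit three-valued state per cell (EMPTY, ACTIVE, DELETED) in place of the slide's value == -1 test for an empty cell. The class name LazyTable is hypothetical.

```java
import java.util.Arrays;

final class LazyTable {
    private enum State { EMPTY, ACTIVE, DELETED }

    private final int[] value;
    private final State[] state;

    LazyTable(int n) {
        value = new int[n];
        state = new State[n];
        Arrays.fill(state, State.EMPTY);
    }

    // Search continues past DELETED cells and stops only at an EMPTY one.
    int find(int key) {
        int n = value.length;
        int i = Math.floorMod(key, n);
        for (int probes = 0; probes < n && state[i] != State.EMPTY; probes++) {
            if (state[i] == State.ACTIVE && value[i] == key) return i;
            i = (i + 1) % n;
        }
        return -1;
    }

    // Insert may reuse the first cell that is EMPTY or DELETED.
    boolean insert(int key) {
        if (find(key) != -1) return false;   // already present
        int n = value.length;
        int i = Math.floorMod(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (state[i] != State.ACTIVE) {
                value[i] = key;
                state[i] = State.ACTIVE;
                return true;
            }
            i = (i + 1) % n;
        }
        return false;                        // no reusable cell
    }

    // Delete only marks the cell, so later searches still pass through it.
    boolean delete(int key) {
        int i = find(key);
        if (i == -1) return false;
        state[i] = State.DELETED;
        return true;
    }
}
```

Replaying the slide's sequence on a table of size 10 (after inserting 89, 18, 49, 58, 9): find(58) still returns index 1 after delete(89), and insert(99) reuses the deleted cell at index 9.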

20 Linear probing performance
Ideal performance depends on the fraction of the table that is full
- k items in table of size n
- probability of insertion collision: k/n = p
- average probes to free space: n/(n-k) or 1/(1-p)
e.g. table half full: 2 probes
BUT…

21 Linear probing performance
Linear probing for insertion produces primary clustering:
- average probes to free space: (1 + (1-p)^-2)/2
e.g. table half full: 2.5 probes
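The two insertion estimates (the ideal case without clustering and the linear-probing case with primary clustering) can be compared numerically. The sketch below simply evaluates the two formulas for a given fill fraction p; the class and method names are hypothetical.

```java
final class ProbeEstimates {
    // Ideal case, no clustering: 1 / (1 - p) probes on average.
    static double idealInsertProbes(double p) {
        return 1.0 / (1.0 - p);
    }

    // Linear probing with primary clustering: (1 + 1/(1 - p)^2) / 2.
    static double linearInsertProbes(double p) {
        double free = 1.0 - p;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }
}
```

At p = 0.5 the two functions give 2 and 2.5 probes, matching the slides; the gap widens quickly as p approaches 1.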

22 Illustration of primary clustering in linear probing (b) versus no clustering (a) and the less significant secondary clustering in quadratic probing (c). Long lines represent occupied cells, and the load factor is 0.7. Data Structures & Problem Solving using JAVA/2E Mark Allen Weiss © 2002 Addison Wesley

23 Linear probing performance

24 Clustering
- primary clustering from linear probing
- solution: alternate probing actions
- e.g. quadratic probing
constraint: minimize computation of probe
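One common form of quadratic probing examines the cells home + 1, home + 4, home + 9, … (all modulo n). The sketch below assumes that form, integer keys and key % n as the hash function. Unlike linear probing, this probe sequence is not guaranteed to visit every cell, so the method can return -1 even when the table has free space. The class name QuadraticProbe is hypothetical.

```java
final class QuadraticProbe {
    // Probes home, home + 1, home + 4, home + 9, ... (mod n).
    // Returns the index used, or -1 if no free cell was found
    // within n probes.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int home = Math.floorMod(key, n);
        for (int k = 0; k < n; k++) {
            int i = (int) ((home + (long) k * k) % n);
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        return -1;
    }
}
```

Inserting 89, 18, 49, 58, 9 into a table of size 10 places them at 9, 8, 0, 2 and 3: the colliding keys are spread out instead of extending a single cluster.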

25 Clustering
- primary clustering
    linear probing
- secondary clustering
    different probes from different indices
    quadratic probing
- even better: different probes for different keys at same index
    secondary hashing
[Figure: value/state table after linear probing]

26 Probing comparison
[Figure: three value/state tables comparing linear, non-linear, and secondary-hash probing]

27 Secondary hashing
- Hash function determines initial index
- Secondary hash function determines step size for probe after collision

28 Table class – Main p.571
    public class Table {
        private int manyItems;          // number of entries currently stored
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;  // true once a cell has ever held an entry
        …

29 constructor
    public Table(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("Capacity must be positive");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

30 search for an object by key
    public boolean containsKey(Object key) {
        return findIndex(key) != -1;
    }

    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        // stop at a never-used cell or after examining every cell once
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

31 wrap around indexing
    private int nextIndex(int i) {
        if (i + 1 == data.length)
            return 0;
        else
            return i + 1;
    }

32 get an object
    public Object get(Object key) {
        int index = findIndex(key);
        if (index == -1)
            return null;
        else
            return data[index];
    }

33 insert a key and object
    public Object put(Object key, Object element) {
        int index = findIndex(key);
        Object answer;
        if (index != -1) {                      // replace object for key
            answer = data[index];
            data[index] = element;
            return answer;
        } else if (manyItems < data.length) {   // new key and object
            index = hash(key);
            while (keys[index] != null)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
            return null;
        } else {                                // table is full
            throw new IllegalStateException("Table is full.");
        }
    }

34 remove a key and object
    public Object remove(Object key) {
        int index = findIndex(key);
        Object answer = null;
        if (index != -1) {
            answer = data[index];
            keys[index] = null;     // hasBeenUsed[index] stays true so searches continue past this cell
            data[index] = null;
            manyItems--;
        }
        return answer;
    }

35 Changing probe strategy: double hash
    private int findIndex(Object key) {
        int count = 0;
        int i = hash1(key);     // initial index
        int p = hash2(key);     // step size for this key
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i, p);
        }
        return -1;
    }

    private int nextIndex(int i, int p) {
        return (i + p) % data.length;
    }

36 Picking good hash strategies
- division hash functions
    prime table size (n) is required
    index is hashCode % n
    stepSize is 1 + hashCode % (n-2)
    (Knuth: best if (n-2) is also prime)
- mid-square
    hashCode^2, take ‘middle’ digits
- multiplicative
    hashCode * r (0 < r < 1), take fraction digits
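The division functions can be sketched as a pair of static methods plus an insertion loop that uses them. The sketch assumes a prime table size n (the test values use n = 13, so that n-2 = 11 is also prime) and uses Math.floorMod so that negative hash codes still give valid indices. The class name DoubleHash and its method names are hypothetical.

```java
final class DoubleHash {
    // Initial index: hashCode % n.
    static int hash1(Object key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Step size: 1 + hashCode % (n - 2), always in the range 1..n-2.
    static int hash2(Object key, int n) {
        return 1 + Math.floorMod(key.hashCode(), n - 2);
    }

    // Inserts key by stepping hash2 cells at a time from hash1.
    // Returns the index used, or -1 if the table is full.
    // Assumes table.length is prime, so every step size visits all cells.
    static int insert(Object[] table, Object key) {
        int n = table.length;
        int i = hash1(key, n);
        int step = hash2(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
            i = (i + step) % n;
        }
        return -1;
    }
}
```

With n = 13, the Integer keys 13, 26 and 39 all start at index 0 but have step sizes 3, 5 and 7, so they end up at indices 0, 5 and 7 instead of forming a cluster.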

37 External Hashing (Chaining)
- array of linked lists of objects
- for map, objects contain map entry pairs
[Figure: key -> hash function -> index into the array (0, 1, 2, 3, …); each array cell heads a list of (key, data) pairs]

38 External Hashing (Chaining)
- less sensitive to load factor
- more memory access (list)
- easier to manage
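A minimal sketch of a chained table, assuming an array of java.util.LinkedList buckets whose entries hold (key, value) pairs. The class name ChainedTable and its method names are hypothetical, and the fixed-size array (no resizing) is a simplification.

```java
import java.util.Iterator;
import java.util.LinkedList;

final class ChainedTable<K, V> {
    // One map entry pair stored in a bucket list.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int n) {
        buckets = new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    // Replaces the value for an existing key, or appends a new entry.
    V put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucketFor(key).add(new Entry<>(key, value));
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Deletion is a plain list removal; no lazy-deletion marker is needed.
    V remove(K key) {
        Iterator<Entry<K, V>> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (e.key.equals(key)) {
                it.remove();
                return e.value;
            }
        }
        return null;
    }
}
```

Colliding keys simply share a list, so removing one never breaks the search path to another, which is part of what makes chaining easier to manage.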

39 Comparison of Hashing Performance

40 Analysis of performance
Linear probing:
- ½(1 + 1/(1-α)) comparisons for successful search, where α is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
- e.g., 1365 entries in table of 1709: α = .80, expect 3 comparisons

41 Analysis of performance
Double hashing:
- -ln(1-α)/α comparisons for successful search, where α is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
- e.g., 1365 entries in table of 1709: α = .80, expect 2 comparisons

42 Analysis of performance
Chained hashing:
- 1 + α/2 comparisons for successful search, where α is the load factor
- assumptions: uniform hashing
- e.g., 1365 entries in table of 1709: α = .80, expect 1.4 comparisons
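The three successful-search estimates can be evaluated side by side for any load factor. The sketch below only transcribes the formulas from the slides; the class and method names are hypothetical.

```java
final class SearchCost {
    // Linear probing: (1 + 1/(1 - a)) / 2
    static double linearProbing(double a) {
        return 0.5 * (1.0 + 1.0 / (1.0 - a));
    }

    // Double hashing: -ln(1 - a) / a
    static double doubleHashing(double a) {
        return -Math.log(1.0 - a) / a;
    }

    // Chained hashing: 1 + a/2
    static double chaining(double a) {
        return 1.0 + a / 2.0;
    }
}
```

At α = 0.80 these give about 3.0, 2.01 and 1.4 comparisons, in line with the rounded figures on the slides.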

43 Hash table summary
- hash table: array
- computed access into array based on key
- n to 1 relation of keys to indexes
- collisions
- collision resolution
    open-address hashing (linear probing)
    double hashing
    chained hashing

44 JAVA Collections
Interfaces
- Collection
    - List
    - Queue
    - Set
        - SortedSet
- Map
    - SortedMap
Implementations
- array (resizable)
- linked list
- balanced search tree
- hash table
- hash table plus linked list

45 hashed implementations
- HashSet implements Set
- HashMap implements Map
constructors: capacity, load factor (both affect performance)
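Both classes have a constructor that takes an initial capacity and a load factor. The sketch below shows those constructors in use; the capacity of 10000, the load factor of 0.75f and the sample entries are illustrative values only, and the class name HashedCollections is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class HashedCollections {
    // HashMap(int initialCapacity, float loadFactor)
    static Map<String, String> studentMap() {
        Map<String, String> map = new HashMap<>(10000, 0.75f);
        map.put("s01324092", "first student");
        map.put("s02894092", "second student");
        return map;
    }

    // HashSet(int initialCapacity, float loadFactor)
    static Set<String> studentKeys() {
        Set<String> set = new HashSet<>(10000, 0.75f);
        set.add("s01324092");
        set.add("s02894092");
        set.add("s01324092"); // duplicate key, ignored by the set
        return set;
    }
}
```

A larger capacity or a lower load factor trades memory for fewer collisions, and the table is rehashed automatically once the number of entries exceeds capacity times load factor.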

