
9-1 9 Set ADTs Set concepts. Set applications. A set ADT: requirements, contract. Implementations of sets: using arrays, linked lists, boolean arrays.


1 9-1 9 Set ADTs Set concepts. Set applications. A set ADT: requirements, contract. Implementations of sets: using arrays, linked lists, boolean arrays. Sets in the Java class library. © 2001, D.A. Watt and D.F. Brown

2 9-2 Set concepts (1) A set is a collection of distinct members (values or objects), whose order is insignificant. Notation for sets: {a, b, …, z}. The empty set is { }.  Set notation is used here, but not supported by Java.

3 9-3 Set concepts (2) Examples of sets:
evens = {0, 2, 4, 6, 8} (a set of integers)
punct = {‘.’, ‘!’, ‘?’, ‘:’, ‘;’, ‘,’} (a set of characters)
EU = {AT, BE, DE, DK, ES, FI, FR, GR, IE, IT, LU, NL, PT, SE, UK} (a set of countries)
NAFTA = {CA, MX, US} (a set of countries)
NATO = {BE, CA, CZ, DE, DK, ES, FR, GR, HU, IS, IT, LU, NL, NO, PL, PT, TR, UK, US} (a set of countries)

4 9-4 Set concepts (3) The cardinality of a set s is the number of members of s. This is written #s. E.g.: #EU = 15; #{red, white, red} = 2 (duplicate members aren’t counted). An empty set has cardinality zero. We can test whether x is a member of set s (i.e., s contains x). This is the membership test, written x ∈ s. E.g.: UK ∈ EU; SE ∈ EU; SE ∉ NATO (SE is not a member of NATO).

5 9-5 Set concepts (4) Two sets are equal if they contain exactly the same members. E.g.: NAFTA = {US, CA, MX} (order of members doesn’t matter); NAFTA ≠ {CA, US} (these two sets are unequal). Set s₁ subsumes (is a superset of) set s₂ if every member of s₂ is also a member of s₁. This is written s₁ ⊇ s₂. E.g.: NATO ⊇ {CA, US}; NATO ⊉ EU (NATO does not subsume EU).

6 9-6 Set concepts (5) The union of sets s₁ and s₂ is a set containing just those values that are members of s₁ or s₂ or both. This is written s₁ ∪ s₂. E.g.: {DK, NO, SE} ∪ {FI, IS} = {DK, FI, IS, NO, SE}; {DK, NO, SE} ∪ {IS, NO} = {DK, IS, NO, SE}.

7 9-7 Set concepts (6) The intersection of sets s₁ and s₂ is a set containing just those values that are members of both s₁ and s₂. This is written s₁ ∩ s₂. E.g.: NAFTA ∩ NATO = {CA, US}; NAFTA ∩ EU = { }. Two sets are disjoint if they have no common member, i.e., if their intersection is empty. E.g.: NAFTA and EU are disjoint; NATO and EU are not disjoint.

8 9-8 Set concepts (7) The difference of sets s₁ and s₂ is a set containing just those values that are members of s₁ but not of s₂. This is written s₁ – s₂. E.g.: NATO – EU = {CA, CZ, HU, IS, NO, PL, TR, US}; EU – NATO = {AT, FI, IE, SE}.
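The set operations defined on the preceding slides can be tried out in Java. The following is a minimal sketch, assuming the standard java.util.Set and java.util.TreeSet classes; the helper class SetOps and its method names are illustrative and do not come from the slides. Each helper copies its first argument, so neither operand is modified.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class (not from the slides) mirroring the set concepts.
final class SetOps {
    // Union: members of s1 or s2 or both.
    static <T extends Comparable<T>> Set<T> union(Set<T> s1, Set<T> s2) {
        Set<T> result = new TreeSet<>(s1);
        result.addAll(s2);
        return result;
    }

    // Intersection: members of both s1 and s2.
    static <T extends Comparable<T>> Set<T> intersection(Set<T> s1, Set<T> s2) {
        Set<T> result = new TreeSet<>(s1);
        result.retainAll(s2);
        return result;
    }

    // Difference: members of s1 but not of s2.
    static <T extends Comparable<T>> Set<T> difference(Set<T> s1, Set<T> s2) {
        Set<T> result = new TreeSet<>(s1);
        result.removeAll(s2);
        return result;
    }

    // s1 subsumes s2 if every member of s2 is also a member of s1.
    static <T> boolean subsumes(Set<T> s1, Set<T> s2) {
        return s1.containsAll(s2);
    }

    // Two sets are disjoint if their intersection is empty.
    static <T extends Comparable<T>> boolean disjoint(Set<T> s1, Set<T> s2) {
        return intersection(s1, s2).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> eu = new TreeSet<>(Arrays.asList("AT", "BE", "DE", "DK", "ES",
                "FI", "FR", "GR", "IE", "IT", "LU", "NL", "PT", "SE", "UK"));
        Set<String> nafta = new TreeSet<>(Arrays.asList("CA", "MX", "US"));
        Set<String> nato = new TreeSet<>(Arrays.asList("BE", "CA", "CZ", "DE", "DK",
                "ES", "FR", "GR", "HU", "IS", "IT", "LU", "NL", "NO", "PL", "PT",
                "TR", "UK", "US"));
        System.out.println(intersection(nafta, nato)); // [CA, US]
        System.out.println(difference(eu, nato));      // [AT, FI, IE, SE]
        System.out.println(disjoint(nafta, eu));       // true
        System.out.println(subsumes(nato, eu));        // false
    }
}
```

TreeSet is used here only so that the printed members appear in a predictable (sorted) order; the set concepts themselves attach no significance to order.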

9 9-9 Set applications Spelling checker:  A spelling checker’s dictionary is a set of words.  The spelling checker highlights any words in the document that are not in the dictionary.  The spelling checker might allow the user to add words to the dictionary. Relational database system:  A relation is essentially a set of tuples.  Each tuple is distinct.  The tuples are in no particular order.

10 9-10 Example 1: prime numbers A prime number is an integer greater than 1 that is divisible only by itself and 1. E.g.: 2, 7, 11, 13 are prime numbers. Eratosthenes’ sieve algorithm:
To compute the set of prime numbers less than m (where m > 0):
1. Set sieve = {2, 3, …, m–1}.
2. For i = 2, 3, …, while i² ≤ m, repeat:
  2.1. If i is a member of sieve:
    2.1.1. Remove all multiples of i from sieve.
3. Terminate with answer sieve.
Step 1 can be refined to:
1.1. Set sieve = { }.
1.2. For i = 2, …, m–1, repeat:
  1.2.1. Add i to sieve.
Step 2.1.1 can be refined to:
2.1.1. For mult = 2i, 3i, …, while mult < m, repeat:
  2.1.1.1. Remove mult from sieve.
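The sieve steps above can be sketched in Java as follows. This is an illustration only: it assumes java.util.TreeSet as the set representation, and the class name Sieve and method name primesBelow are hypothetical. The comments map each statement to the corresponding numbered step.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class (not from the slides) following the sieve algorithm's steps.
final class Sieve {
    // Return the set of prime numbers less than m.
    static Set<Integer> primesBelow(int m) {
        Set<Integer> sieve = new TreeSet<>();              // step 1.1
        for (int i = 2; i < m; i++) {                      // step 1.2
            sieve.add(i);                                  // step 1.2.1
        }
        for (int i = 2; (long) i * i <= m; i++) {          // step 2
            if (sieve.contains(i)) {                       // step 2.1
                for (int mult = 2 * i; mult < m; mult += i) {  // step 2.1.1
                    sieve.remove(mult);                    // step 2.1.1.1
                }
            }
        }
        return sieve;                                      // step 3
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(20)); // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```

Because the members are small integers and the sieve starts out dense, a boolean-array representation would also be a natural fit for this application.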

11 9-11 Set ADT: requirements Requirements: 1)It must be possible to make a set empty. 2)It must be possible to test whether a set is empty. 3)It must be possible to obtain the cardinality of a set. 4)It must be possible to perform a membership test. 5)It must be possible to add or remove a member of a set. 6)It must be possible to test whether two sets are equal. 7)It must be possible to test whether one set subsumes another. 8)It must be possible to compute the union, intersection, or difference of two sets. 9)It must be possible to traverse a set.

12 9-12 Set ADT: contract (1) Possible contract:
public interface Set {
    // Each Set object is a set whose members are objects.
    //////////// Accessors ////////////
    public boolean isEmpty ();
    // Return true if and only if this set is empty.
    public int size ();
    // Return the cardinality of this set.
    public boolean contains (Object obj);
    // Return true if and only if obj is a member of this set.

13 9-13 Set ADT: contract (2) Possible contract (continued):
    public boolean equals (Set that);
    // Return true if and only if this set is equal to that.
    public boolean containsAll (Set that);
    // Return true if and only if this set subsumes that.

14 9-14 Set ADT: contract (3) Possible contract (continued):
    //////////// Transformers ////////////
    public void clear ();
    // Make this set empty.
    public void add (Object obj);
    // Add obj as a member of this set.
    public void remove (Object obj);
    // Remove obj from this set.
    public void addAll (Set that);
    // Make this set the union of itself and that.

15 9-15 Set ADT: contract (4) Possible contract (continued):
    public void removeAll (Set that);
    // Make this set the difference of itself and that.
    public void retainAll (Set that);
    // Make this set the intersection of itself and that.
    //////////// Iterator ////////////
    public Iterator iterator();
    // Return an iterator that will visit all members of this set, in no
    // particular order.
}

16 9-16 Implementation of sets using arrays (1) Represent a bounded set (cardinality ≤ maxcard) by:
a variable card, containing the current cardinality;
an array members of length maxcard, containing the set members in members[0…card–1].
Keep the array sorted, and avoid storing duplicates.
Invariant: members[0…card–1] hold the members in ascending order, from the least member at index 0 to the greatest member at index card–1; members[card…maxcard–1] are unoccupied.
Empty set: card = 0, and members[0…maxcard–1] are all unoccupied.
Illustration (maxcard = 6): card = 3, members[0…2] hold CA, MX, US, and members[3…5] are unoccupied.

17 9-17 Implementation using arrays (2) Summary of algorithms and time complexities:
Operation     Algorithm                        Time complexity
contains      binary search                    O(log n)
add           binary search + insertion        O(n)
remove        binary search + deletion         O(n)
equals        pairwise comparison              O(n₂)
containsAll   variant of pairwise comparison   O(n₂)
addAll        array merge                      O(n₁+n₂)
removeAll     variant of array merge           O(n₁+n₂)
retainAll     variant of array merge           O(n₁+n₂)
Here n is the cardinality of the set; n₁ and n₂ are the cardinalities of the two sets involved.
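A minimal sketch of the sorted-array representation is shown below. It assumes String members for simplicity; the class name ArraySet, the toArray helper, and the choice to throw IllegalStateException when the array is full are all assumptions made for this illustration, not part of the slides. It covers only contains, add, and remove, using binary search followed by a shift of the array components.

```java
import java.util.Arrays;

// Hypothetical bounded set of strings held in a sorted array (illustration only).
final class ArraySet {
    private final String[] members; // members[0..card-1] are sorted, no duplicates
    private int card = 0;           // current cardinality

    ArraySet(int maxcard) {
        members = new String[maxcard];
    }

    int size() {
        return card;
    }

    // Binary search over the occupied part of the array: O(log n).
    boolean contains(String x) {
        return Arrays.binarySearch(members, 0, card, x) >= 0;
    }

    // Binary search + insertion: O(n) because later members are shifted right.
    void add(String x) {
        int pos = Arrays.binarySearch(members, 0, card, x);
        if (pos >= 0) return;                       // already a member
        if (card == members.length) {
            throw new IllegalStateException("set is full"); // assumed policy
        }
        int ins = -(pos + 1);                       // insertion point
        System.arraycopy(members, ins, members, ins + 1, card - ins);
        members[ins] = x;
        card++;
    }

    // Binary search + deletion: O(n) because later members are shifted left.
    void remove(String x) {
        int pos = Arrays.binarySearch(members, 0, card, x);
        if (pos < 0) return;                        // not a member
        System.arraycopy(members, pos + 1, members, pos, card - pos - 1);
        members[--card] = null;                     // mark the freed slot unoccupied
    }

    // Copy of the occupied part, in ascending order.
    String[] toArray() {
        return Arrays.copyOf(members, card);
    }
}
```

For example, adding MX, US, and CA in that order to a new ArraySet(6) leaves the array holding CA, MX, US with card = 3, matching the invariant that the occupied components stay sorted.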

18 9-18 Implementation of sets using SLLs (1) Represent an (unbounded) set by:
a variable card, containing the current cardinality;
an SLL, containing one member per node.
Keep the SLL sorted, and avoid storing duplicates.
Invariant: the nodes hold the members in ascending order, from the least member in the first node to the greatest member in the last node.
Empty set: an SLL with no nodes.
Illustration: the SLL CA → MX → US represents the set {CA, US, MX}.

19 9-19 Implementation using SLLs (2) Summary of algorithms and time complexities:
Operation     Algorithm                        Time complexity
contains      SLL linear search                O(n)
add           SLL linear search + insertion    O(n)
remove        SLL linear search + deletion     O(n)
equals        pairwise comparison              O(n₂)
containsAll   variant of pairwise comparison   O(n₂)
addAll        SLL merge                        O(n₁+n₂)
removeAll     variant of SLL merge             O(n₁+n₂)
retainAll     variant of SLL merge             O(n₁+n₂)
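The sorted-SLL representation can be sketched along the same lines. The class below is an illustration under stated assumptions: String members, and hypothetical names LinkedSet, Node, and succ that are not taken from the slides. Because the list is kept sorted, a linear search can stop as soon as it passes the position where the sought member would be.

```java
// Hypothetical unbounded set of strings held in a sorted singly-linked list.
final class LinkedSet {
    private static final class Node {
        final String member;
        Node succ;                       // link to the next node, or null
        Node(String member, Node succ) {
            this.member = member;
            this.succ = succ;
        }
    }

    private Node first = null;           // node holding the least member
    private int card = 0;                // current cardinality

    int size() {
        return card;
    }

    // Linear search, stopping early once a greater member is reached: O(n).
    boolean contains(String x) {
        for (Node curr = first; curr != null; curr = curr.succ) {
            int comp = x.compareTo(curr.member);
            if (comp == 0) return true;
            if (comp < 0) return false;
        }
        return false;
    }

    // Linear search + insertion: O(n).
    void add(String x) {
        Node pred = null, curr = first;
        while (curr != null && x.compareTo(curr.member) > 0) {
            pred = curr;
            curr = curr.succ;
        }
        if (curr != null && x.compareTo(curr.member) == 0) return; // duplicate
        Node ins = new Node(x, curr);
        if (pred == null) first = ins; else pred.succ = ins;
        card++;
    }

    // Linear search + deletion: O(n).
    void remove(String x) {
        Node pred = null, curr = first;
        while (curr != null && x.compareTo(curr.member) > 0) {
            pred = curr;
            curr = curr.succ;
        }
        if (curr == null || x.compareTo(curr.member) != 0) return; // not a member
        if (pred == null) first = curr.succ; else pred.succ = curr.succ;
        card--;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{");
        for (Node curr = first; curr != null; curr = curr.succ) {
            sb.append(curr.member);
            if (curr.succ != null) sb.append(", ");
        }
        return sb.append("}").toString();
    }
}
```

Unlike the array version, no components are shifted on insertion or deletion, but locating the position still takes a linear search, so add and remove remain O(n).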

20 9-20 Implementation of small-integer sets using boolean arrays (1) If the members are known to be small integers, in the range 0…m–1, represent the set by:
a boolean array b of length m, such that b[i] is true if and only if i is a member of the set.
Invariant: each of b[0…m–1] is a boolean.
Empty set: b[0…m–1] are all false.
Illustration (m = 10): b[2], b[3], b[5], and b[7] are true and all other components are false; this represents the set {2, 3, 5, 7}.

21 9-21 Implementation using boolean arrays (2) Summary of algorithms and time complexities:
Operation     Algorithm                          Time complexity
contains      test array component               O(1)
add           set array component to true        O(1)
remove        set array component to false       O(1)
equals        pairwise equality test             O(m)
containsAll   pairwise implication test          O(m)
addAll        pairwise disjunction               O(m)
removeAll     pairwise negation + conjunction    O(m)
retainAll     pairwise conjunction               O(m)
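The pairwise boolean operations in this table translate almost directly into code. The following sketch assumes that both operands were created with the same range m; the class name SmallIntSet and the decision to reject operands of a different length are assumptions made for this illustration.

```java
// Hypothetical set of integers in the range 0..m-1, held in a boolean array.
final class SmallIntSet {
    private final boolean[] b;  // b[i] is true if and only if i is a member

    SmallIntSet(int m) {
        b = new boolean[m];     // all components start as false: the empty set
    }

    boolean contains(int i) {   // test array component: O(1)
        return 0 <= i && i < b.length && b[i];
    }

    void add(int i) {           // set array component to true: O(1)
        b[i] = true;            // an out-of-range i throws ArrayIndexOutOfBoundsException
    }

    void remove(int i) {        // set array component to false: O(1)
        if (0 <= i && i < b.length) b[i] = false;
    }

    int size() {                // counting the true components takes O(m)
        int card = 0;
        for (boolean present : b) if (present) card++;
        return card;
    }

    boolean containsAll(SmallIntSet that) {   // pairwise implication test
        check(that);
        for (int i = 0; i < b.length; i++) {
            if (that.b[i] && !b[i]) return false;
        }
        return true;
    }

    void addAll(SmallIntSet that) {           // pairwise disjunction
        check(that);
        for (int i = 0; i < b.length; i++) b[i] = b[i] || that.b[i];
    }

    void removeAll(SmallIntSet that) {        // pairwise negation + conjunction
        check(that);
        for (int i = 0; i < b.length; i++) b[i] = b[i] && !that.b[i];
    }

    void retainAll(SmallIntSet that) {        // pairwise conjunction
        check(that);
        for (int i = 0; i < b.length; i++) b[i] = b[i] && that.b[i];
    }

    // Assumed policy: both sets must cover the same range 0..m-1.
    private void check(SmallIntSet that) {
        if (that.b.length != b.length) {
            throw new IllegalArgumentException("sets have different ranges");
        }
    }
}
```

Every whole-set operation visits all m components regardless of how many members are present, which is why this representation pays off only when the sets are dense.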

22 9-22 Summary of set implementations (1)
Operation     Array representation   SLL representation   Boolean array representation
contains      O(log n)               O(n)                 O(1)
add           O(n)                   O(n)                 O(1)
remove        O(n)                   O(n)                 O(1)
equals        O(n₂)                  O(n₂)                O(m)
containsAll   O(n₂)                  O(n₂)                O(m)
addAll        O(n₁+n₂)               O(n₁+n₂)             O(m)
removeAll     O(n₁+n₂)               O(n₁+n₂)             O(m)
retainAll     O(n₁+n₂)               O(n₁+n₂)             O(m)

23 9-23 Summary of set implementations (2) The array representation is suitable only for small or static sets.  A static set is one in which members are never/infrequently added or removed. The SLL representation is suitable only for small sets. The boolean-array representation is suitable only for dense sets of small integers.  A dense set is one where most potential members are actually present. For general applications, we need a more efficient set representation: search tree (see 10) or hash table (see 12).

24 9-24 Sets in the Java class library The java.util.Set interface is similar to the Set interface above. The java.util.TreeSet class implements the java.util.Set interface, representing each set by a search tree (see 10). The java.util.HashSet class implements the java.util.Set interface, representing each set by a hash table (see 12).
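A short sketch of the two library classes in use follows. It relies only on standard java.util calls; the class name LibrarySetsDemo and the helper sortedView are hypothetical. It illustrates that both classes discard duplicates, that a TreeSet iterates in ascending order, and that set equality depends on the members rather than on the representation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class (not from the slides).
final class LibrarySetsDemo {
    // Copy any set into a TreeSet and render it; a TreeSet iterates in ascending order.
    static String sortedView(Set<String> s) {
        return new TreeSet<>(s).toString();
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("US", "CA", "MX", "CA");
        Set<String> tree = new TreeSet<>(codes);  // search-tree representation
        Set<String> hash = new HashSet<>(codes);  // hash-table representation
        System.out.println(tree);                 // [CA, MX, US]
        System.out.println(hash.size());          // 3 (the duplicate CA is not stored)
        System.out.println(tree.equals(hash));    // true: same members
        System.out.println(sortedView(hash));     // [CA, MX, US]
    }
}
```

The iteration order of a HashSet is unspecified, so code that needs the members in order should use a TreeSet or sort them explicitly.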

25 9-25 Example 2: information retrieval (1) Consider a very simple information retrieval system. A query is a set of key words. Each document in the document base is viewed as a set of words. The order of words in a document is of no significance. In response to a query, the system identifies each document that contains all or some of the key words.

26 9-26 Example 2 (2) Outline of implementation:
// Requires: import java.io.*; import java.util.*;
public static final int NONE = 0, SOME = 1, ALL = 2;
public static int score (String name, Set keywords) throws IOException {
    // Return a score reflecting whether the document named name
    // contains all, some, or none of the words in keywords.
    Set docwords = readAllWords(name);
    if (docwords.containsAll(keywords)) return ALL;
    else if (disjoint(docwords, keywords)) return NONE;
    else return SOME;
}

27 9-27 Example 2 (3) Outline of implementation (continued):
private static boolean disjoint (Set docwords, Set keywords) {
    // Return true if and only if the sets docwords and keywords
    // have no common words.
    Iterator iter = keywords.iterator();
    while (iter.hasNext()) {
        String keyword = (String) iter.next();
        if (docwords.contains(keyword)) return false;
    }
    return true;
}

28 9-28 Example 2 (4) Outline of implementation (continued):
private static Set readAllWords (String name) throws IOException {
    // Return the set of all words occurring in the document name.
    BufferedReader doc = new BufferedReader(
        new InputStreamReader(
            new FileInputStream(name)));
    Set words = new TreeSet();  // or: new HashSet()
    for (;;) {
        String word = readWord(doc);
        if (word == null) break;  // end of document
        words.add(word.toLowerCase());
    }
    doc.close();
    return words;
}

29 9-29 Example 2 (5) Outline of implementation (continued):
private static String readWord (BufferedReader doc) throws IOException {
    // Read and return the next word from doc, skipping any preceding
    // white space or punctuation. Return null if no word remains to be
    // read.
    …
}

