Data Structure – Final Review


1 Data Structure – Final Review
27-Apr-2009 SUNY Buffalo

2 About this review
I've been asked to review several data structures covered in class. This review may not be complete: it is unrealistic to cover all the material in ~40 minutes, and the exam may ask about topics that were covered in class but not in this review. If you have questions, ask your instructor ASAP.
I used a different book than the one assigned in this class; the material here is mostly from "Data Structures with C". I have years of hands-on experience with data structures and algorithms, so if you wonder how they are used in the "real world", ask me.
27-Apr-2009 Data Structure Review

3 Review Topics
Tree ADT: heap, AVL tree, Red-Black tree, and 2-3 tree (B-tree).
Dictionary (map) ADT: hash tables and hash functions.
Graph ADT: Breadth-First Search (BFS) and Depth-First Search (DFS).
For each topic, be prepared to answer: What is it? How is it represented? What operations does it support? How does each operation work? (Practice your drawing; do as many examples as you can!) How long does each operation take, in the best case, average case, and worst case?

4 Review: Trees
Terminology: size, height, depth (level), link (edge), path; root, parent, children, sibling, leaves, ancestor, descendant, etc.
Representation: node structure. Storage: array or linked nodes.
Types: binary tree (binary heap); binary search tree, BST (AVL and Red-Black); B-tree (2-3 tree).
Operations: insert(), delete(), search(), sort(), etc.
Binary tree walks: pre-order (Root, L, R), in-order (L, Root, R), post-order (L, R, Root), level-order.
Time complexity: insertion O(log n), searching O(log n), deletion O(log n), sorting O(n log n).
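The three depth-first tree walks above can be sketched directly from their definitions. This is a minimal illustration; the `Node` class and function names are my own, not from the slides.

```python
# Minimal binary tree node plus the three depth-first tree walks.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(n, out):           # Root, L, R
    if n:
        out.append(n.key)
        preorder(n.left, out)
        preorder(n.right, out)

def inorder(n, out):            # L, Root, R
    if n:
        inorder(n.left, out)
        out.append(n.key)
        inorder(n.right, out)

def postorder(n, out):          # L, R, Root
    if n:
        postorder(n.left, out)
        postorder(n.right, out)
        out.append(n.key)
```

For a BST, the in-order walk visits the keys in sorted order, which is what makes O(n log n) tree sorting work.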

5 Binary Tree: Importance of Balance
A binary tree is useful for implementing many operations: for example, search(), successor(), predecessor(), minimum(), maximum(), insert(), and delete() can all be achieved in O(h) time, where h is the height of the tree. On a balanced tree h = O(lg n), so these operations take O(lg n) time on average.
But insert() and delete() alter the shape of the tree and can leave it unbalanced. In the worst case h = O(n): no better than a linked list! So we want to correct any imbalance in at most O(lg n) time, adding no complexity overhead.

6 Review: Balanced Trees
To keep a binary tree balanced, add a requirement, called the heap property, to the binary tree. The binary heap is commonly used to implement the Priority Queue ADT. (Aside: "heap" can also mean the memory region used for dynamic allocation.)
To keep a BST balanced, add a constraint on the heights of its subtrees. The most popular such data structures are AVL and Red-Black trees.

7 Review: Binary Heap
A binary heap extends the binary tree data structure and has the following properties:
Each node has a key greater (max-heap) or less (min-heap) than or equal to the keys of its children.
The tree is a complete binary tree: every level, except possibly the last, is completely filled, and all nodes are as far left as possible. The longest path is therefore ceiling(lg n) for n nodes.
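Because the tree is complete, a heap is usually stored in a flat array with no pointers. A sketch of the index arithmetic, using a 0-based layout (the slides' A[i] notation; the 0-based convention here is one common choice):

```python
# Complete binary tree stored in an array, 0-based indexing.
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def is_max_heap(a):
    """Check the max-heap property: every node's key >= its children's."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))
```

With 1-based indexing (as in many textbooks) the formulas become parent(i) = i // 2, left(i) = 2i, right(i) = 2i + 1.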

8 Heap: Maintaining the Heap Property
heapifyUp() and heapifyDown() are the key operations for maintaining the heap property in O(lg n) time.
How does heapifyDown() work? Given a node i in a max-heap: if A[i] < A[left(i)] or A[i] < A[right(i)], swap A[i] with the larger of A[left(i)] and A[right(i)], then recurse on that subtree.
How does heapifyUp() work? In a max-heap: if A[i] > A[parent(i)], swap A[i] with A[parent(i)], then recurse on parent(i).
What about the other operations and their running times? delete(), insert(), buildHeap(), heapSort().
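The two operations above can be sketched for a max-heap in a 0-based array; iterative loops replace the recursion described on the slide, which is an implementation choice, not part of the slide's pseudocode.

```python
def heapify_down(a, i):
    """Max-heap sift-down: swap a[i] with its larger child until
    the heap property holds below i (O(lg n))."""
    n = len(a)
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return                      # heap property restored
        a[i], a[largest] = a[largest], a[i]
        i = largest                     # "recurse" on that subtree

def heapify_up(a, i):
    """Max-heap sift-up: swap a[i] with its parent while it is larger."""
    while i > 0 and a[i] > a[(i - 1) // 2]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2
```

insert() appends at the end and calls heapify_up; delete-max swaps the last element into the root and calls heapify_down; buildHeap calls heapify_down on each internal node, bottom-up.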

9 Review: AVL Tree
An AVL tree (Adelson-Velsky and Landis, 1962) extends the BST data structure with the following property: for every node, the heights of its left and right subtrees differ by at most one.
Observe that the smallest AVL tree of height 1 has 1 node, and the smallest AVL tree of height 2 has 2 nodes. In general, the minimum size T_h of an AVL tree of height h satisfies T_h = T_{h-1} + T_{h-2} + 1: a root plus a minimal subtree of height h-1 and a minimal subtree of height h-2.

10 AVL: Maintaining the AVL Property
Tree rotation is the key operation for maintaining the AVL property in O(lg n) time. If a node is not balanced, the difference between its children's heights is 2, and there are 4 possible cases with a height difference of 2.
[Slide figure: the four unbalanced configurations, labelled (1)-(4), of nodes x and y with subtrees A, B, C.]

11 AVL: Maintaining the AVL Property (2)
Case 1: rightRotate(y):
x = y.getLeftChild();
y.setLeftChild(x.getRightChild());
x.setRightChild(y);
Case 2: leftRotate(x):
y = x.getRightChild();
x.setRightChild(y.getLeftChild());
y.setLeftChild(x);
Case 3: leftRotate(y); then rightRotate(x);
Case 4: rightRotate(x); then leftRotate(y);
[Slide figure: rightRotate(y) / leftRotate(x) on nodes x, y with subtrees A, B, C.]
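The rotations above can be sketched with plain left/right fields instead of the get/set accessors used on the slide; returning the new subtree root (so the caller can relink it) is an assumption of this sketch, not something the slide specifies.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def right_rotate(y):
    """Rotate y's left child x up; returns x, the new subtree root."""
    x = y.left
    y.left = x.right        # x's right subtree (B) moves under y
    x.right = y
    return x

def left_rotate(x):
    """Mirror image: rotate x's right child y up; returns y."""
    y = x.right
    x.right = y.left
    y.left = x
    return y
```

Both rotations preserve the in-order (BST) ordering of the keys and run in O(1) time, since they only swap a constant number of pointers.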

12 AVL: Insert/Delete
Insertion is similar to a regular BST insert:
Search for the position: keep going left (or right) in the tree until a null child is reached, and insert the new node there. An inserted node is always a leaf.
Rebalance the tree: walk from the inserted node up to the root looking for a node that violates the AVL property, and use rotation to fix it. Only the first unbalanced node needs to be fixed.
Deletion is similar to a regular BST delete:
Search for the node, then remove it: 0 children: replace it with null; 1 child: replace it with the only child; 2 children: replace it with the right-most node in the left subtree.
Rebalance the tree: walk from the deleted node up to the root, fixing every node that violates the AVL property with a rotation. This may require working all the way back to the root.

13 Review: Red-Black Trees
A red-black tree extends the BST data structure with the following properties:
Every node is either red or black. The root is always black. Every leaf (NULL pointer) is black (so every "real" node has 2 children). Both children of every red node are black (there can't be 2 consecutive reds on a path). Every simple path from a node to a descendant leaf contains the same number of black nodes.
An RB tree has height h ≤ 2 lg(n+1), so every operation is guaranteed to run in h = O(lg n) time.

14 RB Trees: Maintaining the RB Tree Property
Tree rotation is the key operation for maintaining the RB tree property in O(lg n) time:
Rotation preserves in-order key ordering.
Rotation takes O(1) time (it just swaps pointers).
[Slide figure: rightRotate(y) / leftRotate(x) on nodes x, y with subtrees A, B, C.]

15 RB Trees: Insert/Delete
Insertion is similar to a BST insert: do the BST insert and colour the new node red, then rebalance. If the parent is black, done; otherwise handle the cases: the parent's sibling is red; the parent's sibling is black and the new node is a right child; the parent's sibling is black and the new node is a left child. Repeat, moving up the tree, until there are no violations.
Deletion is similar to a BST delete: do the BST delete, then rebalance. If the node is red, colour its replacement black and we are done; otherwise handle the cases: the sibling's children are both black; the sibling's left child is red and its right child is black; the sibling is black and its right child is red. Repeat, moving up the tree, until there are no violations.

16 Review: 2-3 B-Trees
A B-tree of order m extends the tree data structure with the following properties: the root is either a leaf or has between 2 and m children; each internal node has between ceiling(m/2) and m children, and hence between ceiling(m/2)-1 and m-1 keys; a leaf node has between 1 and m-1 keys; the tree is perfectly balanced.
So a 2-3 tree is a B-tree of order 3: a node can have 2 or 3 children, which means a node holds 1 or 2 keys. (A red-black tree corresponds to a B-tree of order 4, the 2-3-4 tree.)
[Slide figure: a node with keys <x, y> and three children, holding keys ≤ x; > x and ≤ y; and > y.]

17 2-3 Tree: Insert/Delete
Insertion is similar to insert in a BST: search for the item; if found, done. Otherwise:
Stopped at a 2-node? Upgrade the 2-node to a 3-node.
Stopped at a 3-node? Replace the 3-node by two 2-nodes and push the middle value up to the parent node. Repeat recursively until you upgrade a 2-node or create a new root. (When is a new root created?)
Deletion is similar to delete in a BST, and starts at a leaf: swap the value to be deleted with its immediate successor in the tree, then delete the value from its node. If the node still holds a value, done (we've changed a 3-node into a 2-node); otherwise, borrow a value from a sibling or the parent.

18 Review: Hash Tables
Given n elements, each with a key and satellite data, we need to support insert(T, x), delete(T, x), and search(T, x), but we don't care about sorting the elements.
Suppose no two elements have the same key and the range of keys is 0…m-1, where m is not too large. Set up an array T[0…m-1] in which T[i] = x if x ∈ T and i = h(key(x)), and T[i] = NULL otherwise. h() is called the hash function (or hashing) and T is called a direct-address table.
Hash tables support insert, delete, and search in O(1) expected time.
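A direct-address table, as described above, can be sketched in a few lines; the class and method names here are illustrative, and the identity function plays the role of h().

```python
# Direct-address table: keys are small integers 0..m-1 used directly
# as array indices, so insert, delete, and search are all O(1).
class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m     # T[0..m-1], all initially NULL

    def insert(self, key, data):
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]      # None means "not present"
```

A real hash table replaces the direct index with h(key) so the key range can be much larger than m, which is what introduces collisions (next slide).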

19 Hash: Resolving Collisions
A collision happens when two keys hash to the same memory location. Two ways to resolve collisions:
Open addressing: to insert, if the slot is full, try another slot, and another, until an open slot is found (probing); to search, follow the same sequence of probes as would be used when inserting the element.
Chaining: keep a linked list of elements in each slot; to insert, upon collision just add the new element to the list; to search, search the linked list.
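Chaining can be sketched as follows; Python lists stand in for the linked lists, and the class name and table size are illustrative choices, not from the slides.

```python
# Hash table with chaining: each slot holds a list of (key, value)
# pairs, and colliding keys simply share a slot's list.
class ChainedHash:
    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _chain(self, key):
        return self.table[hash(key) % self.m]   # the slot's list

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)         # overwrite existing key
                return
        chain.append(key and (key, value) or (key, value))

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None                             # not found
```

With a good hash function and load factor n/m = O(1), the expected chain length is constant, giving the O(1) expected-time operations claimed on the previous slide.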

20 Hash: Choosing a Hash Function
Choosing a good hash function is crucial; a bad hash function puts all elements in the same slot. A good hash function should distribute keys uniformly over the slots and should not depend on patterns in the data.
There are three common hash functions:
Division method: h(k) = k mod m, with m a prime number.
Multiplication method: h(k) = floor(m * (k*A mod 1)), with constant 0 < A < 1.
Universal method: h(k) = ((a*k + b) mod p) mod m, with p prime and a, b chosen at random.
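The three methods above can be sketched directly; the particular constants (m = 701, A = (√5 − 1)/2, the prime p, and the a, b values) are illustrative choices, not prescribed by the slides.

```python
import math

def h_division(k, m=701):
    """Division method: k mod m, with m a prime."""
    return k % m

def h_multiplication(k, m=1024):
    """Multiplication method: floor(m * (k*A mod 1))."""
    A = (math.sqrt(5) - 1) / 2      # Knuth's suggested constant
    return math.floor(m * ((k * A) % 1))

def h_universal(k, a=3, b=42, p=104729, m=701):
    """One member of the universal family ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m
```

In the universal method, a and b are drawn at random (once, when the table is created) so that no fixed input pattern can force all keys into one slot.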

21 Review: Graphs
A graph G = (V, E), where V = set of vertices and E = set of edges. Dense graph: |E| ≈ |V|². Sparse graph: |E| ≈ |V|.
Undirected graph: edge (u,v) = edge (v,u); no self-loops.
Directed graph: edge (u,v) goes from vertex u to vertex v, written u → v.
A weighted graph associates weights with either the edges or the vertices.

22 Graphs: Adjacency Matrix
Assume V = {1, 2, …, n}. An adjacency matrix represents the graph as an n × n matrix A with A[i, j] = 1 if edge (i, j) ∈ E (or the weight of the edge), and A[i, j] = 0 if edge (i, j) ∉ E.
[Slide figure: a 4-vertex example graph with edges a, b, c, d and its 4 × 4 adjacency matrix A.]

23 Graphs: Adjacency List
An adjacency list represents the graph as an array of linked lists: for each vertex v ∈ V, store a list of the vertices adjacent to v. Example, for a 4-vertex graph: Adj[1] = {2, 3}, Adj[2] = {3}, Adj[3] = {}, Adj[4] = {3}.
Variation: can also keep a list of the edges coming into each vertex.
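The example adjacency list above translates directly into code; a Python dict of lists stands in for the array of linked lists, and the helper names are illustrative.

```python
# The slide's example graph as an adjacency list:
# vertex -> list of out-neighbours.
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}

def out_degree(g, v):
    """Number of edges leaving v: the length of its list."""
    return len(g[v])

def edges(g):
    """Recover the edge set (u, v) from the adjacency lists."""
    return [(u, v) for u in g for v in g[u]]
```

Iterating over every list visits each directed edge exactly once, which is why whole-graph scans over an adjacency list cost Θ(V + E).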

24 Graphs: Storage
An adjacency matrix takes O(V²) storage: usually too much for large graphs, but it can be very efficient for small graphs.
An adjacency list takes Θ(V + E) storage. The degree of a vertex v is its number of incident edges. For directed graphs, the number of items in the adjacency lists is Σ out-degree(v) = |E|; for undirected graphs it is Σ degree(v) = 2|E| (the handshaking lemma). Either way, this is Θ(V + E) storage.
Most large interesting graphs are sparse. E.g., planar graphs, in which no edges cross, have |E| = O(|V|) by Euler's formula. So the adjacency list is often the more appropriate representation.

25 Review: Graph Searching
Given: a graph G = (V, E), directed or undirected. Goal: systematically explore every vertex and every edge.
General idea: build a tree on the graph. Pick a vertex as the root, then choose certain edges to produce a tree. Note: this might build a forest if the graph is not connected.

26 Breadth-First Search
General idea: expand the frontier of explored vertices across the breadth of the frontier. Pick a source vertex to be the root, find ("discover") its children, then their children, etc.
Associate "colours" with the vertices:
White vertices have not been discovered. All vertices start out white.
Grey vertices are discovered but not fully explored. They may be adjacent to white vertices.
Black vertices are discovered and fully explored. They are adjacent only to black and grey vertices.
Explore vertices by scanning the FIFO queue of grey vertices.
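The colouring scheme above can be sketched as follows; the FIFO queue holds exactly the grey vertices, and recording distances anticipates the shortest-path use on the next slide. Names here are illustrative.

```python
from collections import deque

def bfs(adj, s):
    """BFS from source s over an adjacency-list dict; returns the
    shortest-path distance (edge count) to every vertex, or None
    for vertices unreachable from s."""
    colour = {v: "white" for v in adj}      # all start undiscovered
    dist = {v: None for v in adj}
    colour[s], dist[s] = "grey", 0
    q = deque([s])                          # the grey vertices
    while q:
        u = q.popleft()
        for v in adj[u]:
            if colour[v] == "white":        # discover v
                colour[v] = "grey"
                dist[v] = dist[u] + 1
                q.append(v)
        colour[u] = "black"                 # u is fully explored
    return dist
```

Each vertex is enqueued at most once and each adjacency list is scanned once, giving the O(V + E) running time.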

27 BFS and Shortest Paths
BFS can be thought of as Dijkstra's shortest-path algorithm for the special case where every edge has the same weight. BFS calculates the shortest-path distance from the source node: δ(s, v) = the minimum number of edges from s to v, or ∞ if v is not reachable from s. (The proof should be in the book.)
BFS builds a breadth-first tree, in which paths to the root represent shortest paths in G. Thus BFS can be used to calculate the shortest path from one vertex to another in O(V + E) time.

28 Depth-First Search
General idea: explore "deeper" in the graph whenever possible. Edges are explored out of the most recently discovered vertex v that still has unexplored edges; when all of v's edges have been explored, backtrack to the vertex from which v was discovered.
Like BFS, associate "colours" with the vertices: vertices are initially white, coloured grey when discovered, and coloured black when finished. Explore vertices by scanning the stack of grey vertices.
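The stack-of-grey-vertices description above can be sketched iteratively; an explicit stack replaces the recursion often used for DFS, and the names are illustrative.

```python
def dfs(adj, s):
    """Iterative DFS from s over an adjacency-list dict; returns the
    vertices reachable from s in discovery order."""
    colour = {v: "white" for v in adj}
    order = [s]                     # discovery order
    stack = [s]                     # the grey vertices
    colour[s] = "grey"
    while stack:
        u = stack[-1]               # most recently discovered grey vertex
        nxt = next((v for v in adj[u] if colour[v] == "white"), None)
        if nxt is None:
            colour[u] = "black"     # all edges explored: backtrack
            stack.pop()
        else:
            colour[nxt] = "grey"    # discover nxt and go deeper
            order.append(nxt)
            stack.append(nxt)
    return order
```

Note the contrast with BFS: popping from the top of the stack (LIFO) goes deeper, while taking from the front of the queue (FIFO) sweeps the frontier breadth-first.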

29 DFS and Cycles
An undirected graph is acyclic iff a DFS yields no back edges:
If acyclic, there are no back edges (because a back edge implies a cycle).
If there are no back edges, the graph is acyclic: no back edges implies only tree edges (why?), and only tree edges implies we have a tree or a forest, which by definition is acyclic.
Thus DFS can be run to find whether a graph has a cycle. We can actually determine whether a cycle exists in O(V) time: in an undirected acyclic forest, |E| ≤ |V| - 1. So count the edges: if we ever see |V| distinct edges, we must have seen a back edge along the way.

30 Remarks (1)
Clearly, data structures and algorithms are closely related: selecting the most efficient data structure and algorithm will almost always be the best way to proceed. However, consideration of many factors is required to produce a good implementation: the obvious solution isn't always the best, and sometimes it makes sense to have multiple data structures, each with different properties, representing a single object.
Factors to be considered: the memory footprint implied by a given representation; the cost of operations in that representation; the cost of converting to another representation; the amount of computation expected using a given representation.

31 Remarks (2)
When it comes to implementing an algorithm, the main point is that constant factors matter. Mapping algorithms and data structures onto the characteristics of the architecture is VERY important! Often a program must be restructured, not functionally but behaviourally, to get better performance; however, restructuring code can be a bit more involved than just performing optimisations.
So, the bottom line is to think about the trade-offs that could change the quality of an implementation. Direct, obvious algorithm translations don't always give good performance; the best performance comes from considering the many aspects of execution, e.g., memory access, processor characteristics, and language overheads.

32 Good Luck!

