ITEC 2620M Introduction to Data Structures

Instructor: Prof. Z. Yang
Course Website:
Office: DB 3049

Graphs

Key Points
Graph algorithms
- Definitions, representations, analysis
- Shortest paths
- Minimum-cost spanning tree

Basic Definitions
A graph G = (V, E) consists of a set of vertices V and a set of edges E. Each edge in E connects a pair of vertices in V.
Graphs can be directed or undirected. In a directed graph each edge is an ordered pair drawn as an arrow, and the first vertex of the pair is the source.
Graphs may be weighted: each edge carries a weight (cost).
A vertex vi is adjacent to another vertex vj if they are connected by an edge in E. These vertices are neighbors.
A path is a sequence of vertices in which each vertex is adjacent to its predecessor and successor.
The length of a path is the number of edges in it.
The cost of a path is the sum of the edge weights in the path.
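
As a minimal sketch of the path, length, and cost definitions, the Python below stores a small weighted directed graph as a dictionary of dictionaries. That representation and the function names is_path, path_length, and path_cost are assumptions for illustration, not taken from the course.

```python
# Illustrative representation (an assumption, not from the slides):
# graph[u][v] is the weight of the directed edge u -> v.
Graph = dict[str, dict[str, int]]


def is_path(graph: Graph, path: list[str]) -> bool:
    """True if each vertex in `path` has an edge to the next vertex."""
    return all(v in graph.get(u, {}) for u, v in zip(path, path[1:]))


def path_length(path: list[str]) -> int:
    """Length of a path = number of edges = number of vertices - 1."""
    return max(len(path) - 1, 0)


def path_cost(graph: Graph, path: list[str]) -> int:
    """Cost of a path = sum of the weights of the edges along it."""
    if not is_path(graph, path):
        raise ValueError("not a path in this graph")
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


g: Graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 5},
    "C": {"B": 2},
    "D": {},
}
print(path_length(["A", "C", "B", "D"]))   # 3
print(path_cost(g, ["A", "C", "B", "D"]))  # 1 + 2 + 5 = 8
```

Note that length and cost can rank paths differently: A, B, D is shorter (2 edges) than A, C, B, D (3 edges) but costs more (9 versus 8).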

Basic Definitions (Cont’d)
A cycle is a path of length greater than one that begins and ends at the same vertex.
A simple cycle is a cycle that does not visit any vertex (except the start/finish) more than once. In an undirected graph a simple cycle has length at least three.
Two vertices are connected if there is a path between them.
A subset of vertices S is a connected component of G if there is a path from each vertex vi to every other distinct vertex vj in S, and S is maximal (no vertex outside S is connected to a vertex in S).
The degree of a vertex is the number of edges incident to it, i.e. the number of vertices it is adjacent to (assuming no self-loops or parallel edges).
A graph is acyclic if it has no cycles (e.g. a tree).
A directed acyclic graph is called a DAG. (A directed graph in general is called a digraph.)
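
A possible sketch of degree and connected components for an undirected graph follows. It assumes each vertex maps to the set of its neighbors, with no self-loops or parallel edges. Components are found here with breadth-first search, which is one standard technique among several. The names degree and connected_components are illustrative.

```python
from collections import deque


def degree(adj: dict[int, set[int]], v: int) -> int:
    """Number of edges incident to v (= number of neighbors here)."""
    return len(adj[v])


def connected_components(adj: dict[int, set[int]]) -> list[set[int]]:
    """Return the maximal sets of mutually reachable vertices."""
    seen: set[int] = set()
    components: list[set[int]] = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Undirected graph: a triangle 0-1-2, an edge 3-4, and an isolated vertex 5.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set()}
print(degree(adj, 0))              # 2
print(connected_components(adj))   # [{0, 1, 2}, {3, 4}, {5}]
```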

Representations
The adjacency matrix of graph G = (V, E) for vertices numbered 0 to n-1 is an n x n matrix M where M[i][j] is 1 if there is an edge from vi to vj, and 0 otherwise.
The adjacency list of graph G = (V, E) for vertices numbered 0 to n-1 consists of an array of n linked lists. The ith linked list includes the node j if there is an edge from vi to vj.
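
The two representations could be built from an edge list roughly as follows. This is a sketch. Python lists stand in for the linked lists, and the example graph is made up.

```python
def adjacency_matrix(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """n x n matrix M with M[i][j] = 1 iff there is an edge from v_i to v_j."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = 1
    return m


def adjacency_list(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """Array of n lists; list i holds j iff there is an edge from v_i to v_j.
    Python lists stand in for the linked lists described on the slide."""
    lists: list[list[int]] = [[] for _ in range(n)]
    for i, j in edges:
        lists[i].append(j)
    return lists


# Directed example graph on vertices 0..3 (made up for illustration).
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(adjacency_matrix(4, edges))
# [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
print(adjacency_list(4, edges))
# [[1, 2], [3], [3], []]
```

For an undirected graph, each edge would be recorded in both directions, which makes the matrix symmetric.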

Comparisons and Analysis
Space
- The adjacency matrix uses O(|V|²) space, no matter how many edges the graph has.
- The adjacency list uses O(|V| + |E|) space (note: plus pointer overhead), so it is better for sparse graphs (graphs with few edges).
Access time (n = |V|, d = degree of vi)
- Is there an edge connecting vi to vj? Adjacency matrix: O(1). Adjacency list: O(d).
- Visit all edges incident to vi: adjacency matrix: O(n). Adjacency list: O(d).
The primary operation of the algorithm and the density of the graph determine which data structure is more efficient:
- complete (dense) graphs should use an adjacency matrix;
- traversals of sparse graphs should use an adjacency list.
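
The access-time differences show up in a sketch of the two basic operations. The matrix and list below describe the same small directed graph, written out by hand. The function names are hypothetical, and storage_cells is a rough cell count that illustrates the space bounds.

```python
# The same directed graph (edges 0->1, 0->2, 1->3, 2->3) in both forms.
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
lists = [[1, 2], [3], [3], []]


def has_edge_matrix(m: list[list[int]], i: int, j: int) -> bool:
    return m[i][j] != 0          # O(1): one direct index


def has_edge_list(adj: list[list[int]], i: int, j: int) -> bool:
    return j in adj[i]           # O(d): scans list i, d = degree of v_i


def neighbors_matrix(m: list[list[int]], i: int) -> list[int]:
    return [j for j, bit in enumerate(m[i]) if bit]  # O(n): whole row


def neighbors_list(adj: list[list[int]], i: int) -> list[int]:
    return list(adj[i])          # O(d): only the actual edges


def storage_cells(n: int, num_edges: int) -> tuple[int, int]:
    """n*n matrix entries vs n list heads + one node per directed edge."""
    return n * n, n + num_edges


print(storage_cells(1000, 3000))  # (1000000, 4000) for a sparse graph
```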

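Spanning Tree and Shortest Paths
Minimum-Cost Spanning Tree
- Assume a weighted, undirected, connected graph.
- Use Prim’s algorithm (a greedy algorithm): starting from any vertex, repeatedly pick the least-cost edge from a visited vertex to an unvisited vertex, and visit that vertex.
Shortest Paths
- Use Dijkstra’s algorithm (a greedy algorithm), which assumes non-negative edge weights: repeatedly visit the unvisited vertex with the least current cost (distance from the source), then update the current costs of its neighbors.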
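
Here is a compact sketch of both greedy algorithms. It assumes an undirected weighted graph stored as a dictionary of dictionaries. Both use a binary heap (Python's heapq module) to find the least-cost choice. This "lazy" variant leaves outdated heap entries in place and skips them when they are popped. That is one common way to implement the selection step, not necessarily the version in the course text.

```python
import heapq

# Undirected weighted graph: graph[u][v] == graph[v][u] == weight of edge u-v.
Graph = dict[str, dict[str, int]]


def prim_mst(graph: Graph, start: str) -> list[tuple[str, str, int]]:
    """Grow a tree from `start`, always taking the least-cost edge
    from a visited vertex to an unvisited one."""
    visited = {start}
    tree: list[tuple[str, str, int]] = []
    # Heap of candidate edges (weight, visited end, other end).
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # outdated entry: v is already in the tree
        visited.add(v)
        tree.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in visited:
                heapq.heappush(frontier, (wx, v, x))
    return tree


def dijkstra(graph: Graph, source: str) -> dict[str, int]:
    """Least path cost from `source` to every reachable vertex
    (edge weights must be non-negative)."""
    dist = {source: 0}
    visited: set[str] = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # outdated entry with a larger cost
        visited.add(u)  # u now has its final, least cost
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


g: Graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(prim_mst(g, "A"))  # [('A', 'C', 1), ('C', 'B', 2), ('B', 'D', 5)]
print(dijkstra(g, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 8}
```

If the graph is not connected, prim_mst only spans the start vertex's component, which is why the slide assumes a connected graph.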

HASHING

Key Points
- Hash tables
- Hash functions
- Collision resolution and clustering
- Deletions

Indices vs. Keys
Each key/record is associated with an array slot.
We could map each key to a slot, e.g. a tenant's last name to an apartment number. We could then search either the array (unsorted: O(n) sequential search) or a look-up table (sorted: O(log n) binary search).
However, what if the look-up were actually a calculated function? That would eliminate the look-up altogether.

Hash Functions
A hash function h() converts a key (integer, string, float, etc.) into a table index.
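
A few simple hash functions for integer and string keys are sketched below. The table size 11, the multiplier 31, and the function names are assumptions chosen for illustration. The example contrasts summing character codes, which sends every anagram to the same slot, with a polynomial hash that depends on character order.

```python
def hash_int(key: int, m: int) -> int:
    """Division method: table index = key mod m (m = table size)."""
    return key % m


def hash_string_sum(key: str, m: int) -> int:
    """Sum of character codes mod m; anagrams always collide."""
    return sum(ord(ch) for ch in key) % m


def hash_string(key: str, m: int) -> int:
    """Polynomial hash via Horner's rule, reduced mod m at each step.
    The multiplier 31 is a conventional choice, assumed here."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % m
    return h


M = 11  # a prime table size, as is common
print(hash_int(1234, M))                                   # 2
print(hash_string_sum("ab", M), hash_string_sum("ba", M))  # 8 8
print(hash_string("ab", M), hash_string("ba", M))          # 3 0
```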

Hash Tables
Records are stored in slots specified by a hash function.
Look-up/store:
- Convert the key into a table index with the hash function h(): h(key) = index.
- Find the record/empty slot starting at index = h(key) (use the collision resolution policy if necessary).
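
The look-up/store steps might look like the following minimal sketch. It uses an open-addressing table with integer keys. The resolution policy is assumed here to be linear probing (try the next slot, wrapping around), and the class and method names are hypothetical.

```python
from __future__ import annotations

from typing import Iterator


class HashTable:
    """Open-addressing hash table mapping int keys to string values.
    Resolution policy (an assumption here): linear probing."""

    def __init__(self, size: int = 11) -> None:
        self.size = size
        self.slots: list[tuple[int, str] | None] = [None] * size

    def _h(self, key: int) -> int:
        return key % self.size  # h(key) = home index

    def _probe(self, key: int) -> Iterator[int]:
        """Slot indices to try, starting at h(key) and wrapping around."""
        home = self._h(key)
        for i in range(self.size):
            yield (home + i) % self.size

    def store(self, key: int, value: str) -> None:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:  # empty slot or same key
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def lookup(self, key: int) -> str | None:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:
                return None  # reached an empty slot: key is absent
            if slot[0] == key:
                return slot[1]
        return None  # probed every slot without finding the key


t = HashTable(11)
t.store(3, "a")
t.store(14, "b")   # h(14) = 3 is taken, so it goes to slot 4
t.store(25, "c")   # h(25) = 3, then 4, so it goes to slot 5
print(t.lookup(25))  # c
print(t.lookup(36))  # None (probes 3, 4, 5, then finds empty slot 6)
```

Because lookup stops at the first empty slot, deleting a record by simply emptying its slot could break later searches. This is the issue behind the Deletions key point.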

Comments
The hash function should distribute keys evenly across the table.
- This is not easy, since the distribution of the input data is not known in advance.
The hash table should be about half full.
- Note the time-space tradeoff: more space gives less time (and the table already uses twice as much space as a sorted array).
- If the table is half full, there is a 50% chance of at least one collision, a 25% chance of at least two collisions, etc., so an access takes 2 probes on average (approaching n as the table fills).
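
The figure of 2 accesses on average is the sum of a geometric series. The derivation below reads the slide's percentages as "at least one" and "at least two" collisions. It assumes each probe independently hits an occupied slot with probability equal to the load factor α. This is an idealization that ignores clustering, not an exact model of any particular probing scheme.

```latex
% Assumption: each probed slot is occupied independently with probability
% \alpha = (records stored) / (table size), the load factor.
\begin{align*}
P(\text{at least } i \text{ collisions}) &= \alpha^{i} \\
E[\text{accesses}] &= 1 + \sum_{i=1}^{\infty} P(\text{at least } i \text{ collisions})
                    = \sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1-\alpha} \\
\alpha = \tfrac{1}{2}: \quad E[\text{accesses}] &= 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots = 2
\end{align*}
```

As α approaches 1, the series grows without bound. In a real table of n slots, the count is capped at n accesses.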

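How to do better
What to do with collisions?
- Linear probing (“classic hashing”): if there is a collision, search the slots sequentially (wrapping around at the end of the table).
- This causes clustering. Once a run of occupied slots forms, any key that hashes anywhere into the run is placed at its end. The slot just past a cluster is therefore more likely to be filled next, and the cluster keeps growing.
- To eliminate clustering, we would like each remaining slot to have equal probability of being chosen. We can't simply use random numbers, because the probe sequence must be reproducible: a search must follow the same path as the insertion.
- Pseudo-random probing (see text).
- The goal of pseudo-random probing is divergence: probe sequences should not all follow the same path.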
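
The difference can be sketched by generating probe sequences. The table size 11, the seed, and the function names are assumptions for illustration. The pseudo-random variant follows the usual idea of a fixed random permutation of offsets, and it may differ in detail from the version in the text.

```python
import random

M = 11  # table size (assumed)


def linear_probe_sequence(home: int, m: int = M) -> list[int]:
    """Linear probing: the i-th probe is i slots past the home position."""
    return [(home + i) % m for i in range(m)]


# Pseudo-random probing: the offsets are a random permutation of 1..M-1,
# generated ONCE with a fixed seed and then reused by every insert and
# search, so the probe sequence for a given home slot is reproducible.
_rng = random.Random(2620)  # arbitrary seed
PERM = list(range(1, M))
_rng.shuffle(PERM)


def pseudo_random_probe_sequence(home: int) -> list[int]:
    """The i-th probe (i >= 1) is PERM[i-1] slots past the home position."""
    return [home] + [(home + off) % M for off in PERM]


# Linear probing: the path from slot 3 runs straight into the path from
# slot 4, so keys with nearby homes compete for the same slots.
print(linear_probe_sequence(3)[:4])  # [3, 4, 5, 6]
print(linear_probe_sequence(4)[:4])  # [4, 5, 6, 7]
# Pseudo-random probing: the two paths jump to unrelated slots.
print(pseudo_random_probe_sequence(3)[:4])
print(pseudo_random_probe_sequence(4)[:4])
```

Each pseudo-random sequence still visits every slot exactly once. The permutation is fixed, not redrawn per probe, which keeps searches consistent with insertions.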

Quadratic Probing
A simple divergence method:
- Linear probing: the ith probe is i slots away from the home position.
- Quadratic probing: the ith probe is i² slots away from the home position.
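
The contrast between linear and quadratic probing is sketched below for a table of size 11, an assumed value. The function names are hypothetical. The last line shows a caveat. With a prime table size M, the offsets i² mod M take only (M + 1) / 2 distinct values, so quadratic probing reaches only about half the slots from a given home position. This is one reason such tables are kept at most about half full.

```python
def linear_probe_sequence(home: int, m: int) -> list[int]:
    """Linear probing: the i-th probe is i slots past the home position."""
    return [(home + i) % m for i in range(m)]


def quadratic_probe_sequence(home: int, m: int) -> list[int]:
    """Quadratic probing: the i-th probe is i*i slots past the home position."""
    return [(home + i * i) % m for i in range(m)]


M = 11
# Keys with home positions 3 and 4: under linear probing the second probe
# from home 3 lands on slot 4 and the two paths then coincide; under
# quadratic probing they diverge.
print(linear_probe_sequence(3, M)[:4], linear_probe_sequence(4, M)[:4])
# [3, 4, 5, 6] [4, 5, 6, 7]
print(quadratic_probe_sequence(3, M)[:4], quadratic_probe_sequence(4, M)[:4])
# [3, 4, 7, 1] [4, 5, 8, 2]
print(len(set(quadratic_probe_sequence(3, M))))  # 6 of the 11 slots
```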

Secondary Clustering
If multiple keys hash to the same index/home position, quadratic probing (like pseudo-random probing) still follows the same probe path each time. This is secondary clustering.
To avoid it, use a second hash function to determine the probe sequence (double hashing).
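
Double hashing is one common way to use a second hash function, and a sketch follows. The choice h2(key) = 1 + key mod (M - 1), the table size, and the function names are assumptions (a standard textbook pattern), not necessarily the course's definitions. Keys 3 and 14 share a home position. Quadratic probing sends them down identical paths, but double hashing gives them different step sizes.

```python
M = 11  # prime table size (assumed)


def h1(key: int) -> int:
    """Primary hash: the home position."""
    return key % M


def h2(key: int) -> int:
    """Secondary hash: the step size, always in 1..M-1. With M prime,
    every step size is coprime to M, so all M slots get probed."""
    return 1 + key % (M - 1)


def quadratic_sequence(key: int) -> list[int]:
    """Quadratic probing depends only on the home position."""
    return [(h1(key) + i * i) % M for i in range(M)]


def double_hash_sequence(key: int) -> list[int]:
    """Double hashing: the i-th probe is i * h2(key) slots past h1(key)."""
    return [(h1(key) + i * h2(key)) % M for i in range(M)]


# Keys 3 and 14 share home position 3.
print(quadratic_sequence(3) == quadratic_sequence(14))  # True: same path
print(double_hash_sequence(3)[:4])   # [3, 7, 0, 4]   (step 4)
print(double_hash_sequence(14)[:4])  # [3, 8, 2, 7]   (step 5)
```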
