Presentation on theme: "Introduction to graph theory and applications Introduction and definitions Typical network modelling applications Classic graph theory problems and proofs."— Presentation transcript:
Introduction to graph theory and applications Introduction and definitions Typical network modelling applications Classic graph theory problems and proofs The Seven Bridges of Königsburg The four colour map colouring theorem The three cottage problem Data structures used for storing graphs Incidence and adjancancy lists and matrixes
Definitions Graph theory concerns the study of networks based on a mathematical abstraction of the form of a graph. A graph is made up of vertices (singular: vertex) and edges. An edge connects exactly 2 vertices. These are the same vertex in the special case where the edge is called a loop or loopback. A vertex can have any number of edges connected to it. Edges might be directed, and drawn as an arrow to indicate that network flow or traffic or the connection represented by the edge works in one direction only. In an undirected graph the edges are bidirectional. Edges may be associated with a numeric cost. The meaning of edge cost will depend upon the graph application.
Vertices and bidirectional edges
More definitions A path is a route through a graph visiting vertices and edges in turn. A cycle is a path that ends at the starting vertex. A Hamiltonian path or cycle visits all vertices exactly once. A Eulerian path or cycle visits all edges exactly once. A graph is simple if no edge is a loop, and no 2 edges have the same endpoints. A planar graph can be drawn on a plane surface, e.g. a piece of paper, so that no edges cross over other edges.
Even more definitions... A directed acyclic graph is a directed graph without cycles, i.e. you can't get back to the vertex where you started by following edges in their defined direction. This is a way to map a forest of trees so that subtrees can be shared between trees. Sources are vertices all of whose edges lead out from them and sinks are vertices all of whose edges lead into them. A tree is a graph which is connected, acyclic and simple.
Typical Graph Applications 1 Modelling a road network with vertexes as towns and edge costs as distances. Modelling a water supply network. A cost might relate to current or a function of capacity and length. As water flows in only 1 direction, from higher to lower pressure connections or downhill, such a network is inherently an acyclic directed graph. Modelling the recent contacts of someone who has become ill with a notifiable illness, e.g. Sars or Meningitis. Edge costs might be a function of the probability that the contact resulted in an infection.
Typical Graph Applications 2 Dynamically modelling the status of a set of routes by which traffic might be directed over the Internet. Modelling the connections between a number of potential witnesses or suspects who were reported or came forward as having been within the vicinity of a serious crime within an hour of of when it occurred. Minimising the cost and time taken for air travel when direct flights don't exist between starting and ending airports. Using a directed graph to map the links between pages within a website and to analyse ease of navigation between different parts of the site.
Classic Graph Theory Problems Graph theory started from a mathematical curiosity. "The Seven Bridges of Königsberg is a problem inspired by an actual place and situation. The city of Kaliningrad, Russia (at the time, Königsberg, Germany) is set on the Pregolya River, and included two large islands which were connected to each other and the mainland by seven bridges. The question is whether it is possible to walk with a route that crosses each bridge exactly once, and return to the starting point. In 1736, Leonhard Euler proved that it was not possible." Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_Koenigsberg
Seven Bridges of Königsberg 2 "In proving the result, Euler formulated the problem in terms of graph theory, by abstracting the case of Königsberg -- first, by eliminating all features except the landmasses and the bridges connecting them; second, by replacing each landmass with a dot, called a vertex or node, and each bridge with a line, called an edge or link. The resulting mathematical structure is called a graph."
Seven Bridges of Königsberg 3 "The shape of a graph may be distorted in any way without changing the graph itself, so long as the links between nodes are unchanged. It does not matter whether the links are straight or curved, or whether one node is to the left of another. Euler realized that the problem could be solved in terms of the degrees of the nodes. The degree of a node is the number of edges touching it; in the Königsberg bridge graph, three nodes have degree 3 and one has degree 5. Euler proved that a circuit of the desired form is possible if and only if there are no nodes of odd degree. Such a walk is called an Eulerian circuit or an Euler tour. Since the graph corresponding to Königsberg has four nodes of odd degree, it cannot have an Eulerian circuit."
Seven Bridges of Königsberg 4 "The problem can be modified to ask for a path that traverses all bridges but does not have the same starting and ending point. Such a walk is called an Eulerian trail or Euler walk. Such a path exists if and only if the graph has exactly two nodes of odd degree, those nodes being the starting and ending points. (So this too was impossible for the seven bridges of Königsberg.)"
The four colour theorem This theorem states that given any set of areas on a planar surface, e.g. representing political regions on a map, it is possible to colour every area so that no 2 regions sharing a border need use the same colour if 4 colours are used. Sharing a border means a linear border between 2 areas of more than zero length; e.g. many regions can meet at a point but this doesn't count as a border. Also an area within the set must be single and contiguous.
The four colour theorem 2 Political geographic entities such as countries sometimes break this requirement by having enclaves, e.g. the USA includes Alaska which is not part of the largest geographically contiguous region of the USA. Old maps of English and Welsh counties split the county of Flint into 2 seperate areas. This theorem comes from a question (conjecture) asked by a student Francis Guthrie of his maths lecturer Augustus De Morgan in 1852. It was finally proved in 1976.
Graph theory presentation of the theorem "To formally state the theorem, it is easiest to rephrase it in graph theory. It then states that the vertices of every planar graph can be colored with at most four colors so that no two adjacent vertices receive the same color. Or "every planar graph is four-colorable" for short. Here, every region of the map is replaced by a vertex of the graph, and two vertices are connected by an edge if and only if the two regions share a border segment (not just a corner)" Source: http://en.wikipedia.org/wiki/Four-color_theorem
Three cottage problem Source: http://en.wikipedia.org/wiki/Three_cottage_problem "The three cottage problem is a well-known mathematical puzzle. It can be stated like this: Suppose there are three cottages on a plane (or sphere) and each needs to be connected to the gas, water, and electric companies. Is there a way to do so without any of the lines crossing each other?"
Three cottage problem 2 "The problem is part of the mathematical field of topological graph theory which studies the embedding of graphs on surfaces. In more formal graph theoretic terms the problem asks whether the complete bipartite graph K3,3 is planar. Kazimierz Kuratowski proved in 1930 that K3,3 is nonplanar, and thus that the three cottage problem has no solution. But K3,3 is toroidal, that is it can be embedded on the torus. In terms of the three cottage problem this means the problem can be solved by punching a hole through the plane (or the sphere). This changes the topological properties of the surface and using the hole we can connect the three cottages without crossing lines."
Three cottage problem 3 From this description note that the term: "complete bipartite graph" means a graph with 2 sets of vertices where every vertex in one set is connected by an edge to every vertex in the other set. The designation K3,3 for this graph indicates that there are 3 vertices in each of the 2 sets, representing utilities and cottages respectively. K5 is diagrammed on the right.
Three cottage problem 4 Kuratowski's proof is based on a process involving simplification of complex graphs into simpler graphs by removing vertices and edges in certain situations and finding instances of one of the 2 simplest non planar graphs as subsets of the more complex graph. The other simplest non-planar graph is a complete graph known as K5, because it has a single set of 5 vertices with each vertex connected to every other vertices.
Data structures used for storing graphs Before choosing one of various options or designing a customised approach, programmers need to consider application data, and the means by which this will be attached to vertices and edges. Records will need to be attached either to vertices, or to edges, or one type of record might need attachment to vertices and another to edges. The choice and design of data structure(s) will influence and be influenced by program performance and memory usage requirements.
Incidence list The connectivity of an edge is regular in that each represents a connection between exactly 2 vertices. This lends itself to fixed- size record designs, e.g. an array of records known as an incidence list. The coding in the next slide uses pointers to the vertex records, as a vertex may appear in any number of edge records. Alternatively, if the vertices are stored in an array which is populated before the edge records are created, and unchanged during the runtime of the application, the edge records could store the integer array indices. Which approach is optimal depends upon how vertex records are stored and how this collection of data is searched and updated.
Incidence list structure coding
Adjacency list The connectivity of a vertex is irregular, in the sense each can be connected using a variable number of edges. Vertex information can be stored in a fixed width record containing the head pointer of a linked list. Each linked list item will represent an edge. The latter will contains references (e.g. array indices, keys or pointers) to edge record storage so this isn't duplicated. If this isn't done within the edge records themselves, the linked list nodes must also reference other vertices connected via the edges. This approach is known as an adjacency list.
Adjacency list structure coding
Adjacency list coding comments The above structure is better suited to a directed graph than an undirected graph. A link from a webpage to another page (i.e. a directed edge as in the above example ) does not require a link in the opposite direction. Using this kind of approach for an undirected graph would either lead to duplication of information, in the sense that each edge would have to be stored and updated twice, or added complexity of path access and traversal. Edge costs, if required, could be stored as an additional field within the url_list structure, for which each node in the list is an edge. A reason for having edge costs in this example might be if not all links between pages are equally easy for a user to find.
Incidence matrix This structure is a 2 dimensional array where the rows index the edges, and columns index the vertices. In the simplest implementation the array element is boolean, with a 1 indicating a connection and a 0 indicating no connection. The advantage of this structure is that it enables data to be rapidly indexed, either for vertices or for edges. The disadvantage is that memory proportional to V x E is required, where V is the number of vertices, and E the number of edges. List or array based structures require memory proportional to V + E.
Adjacency matrix This is a V x V 2D array where V is the number of vertices. In the simplest case, if the data present at index [i][j] is a 1 this indicates a (possibly directed) edge from i to j, while a 0 indicates there being no such edge. If the edges have costs, this could be a floating point number present at index [i][j], with a sentinel value, e.g. 0 or -1, indicating lack of an edge from vertex i to vertex j. This data structure makes it easier to search for subgraphs of particular patterns or identities, e.g. K3,3 or K5.
Conclusions Graph theory enables us to study and model networks and solve some difficult problems inherently capable of being modelled using networks. Various terms e.g. vertex and edge, are associated with graph theory which gives these terms special meanings. These meanings need to be understood and remembered in order to apply graph theoretic approaches to solving problems. When solving a problem by developing a graph-based program, careful attention must be given at the design stage to the structuring of data to help make solving the problem tractable, to enable linkages to be traced efficiently and to avoid duplication of data.