# The Structure and Function of Complex Networks Part I Jim Vallandingham M. E. J. Newman.


The Structure and Function of Complex Networks Part I Jim Vallandingham M. E. J. Newman

Introduction Paper is a review of – Network types – Common network properties – Network models Examine large networks – Millions / Billions of nodes Statistical methods are an attempt to find something to play the part of the eye in current network analysis

Organization I. Definitions II. Types of Networks III. Properties of Networks IV. Random Graphs V. Extensions to Random Graphs VI. Markov Graphs

Definitions Network | Graph: – Composed of items : vertices / nodes – Connections between vertices : edges Directed edge: – One that runs in only one direction Degree: – Number of edges connected to a vertex – Directed graph has an in-degree and out-degree for each vertex

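Definitions

Undirected Graph

| Vertex | Degree |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| 6 | 1 |

Directed Graph

| Vertex | In-Degree | Out-Degree |
|---|---|---|
| 1 | 0 | 2 |
| 2 | 2 | 0 |
| 3 | 2 | 2 |
| 4 | 1 | 1 |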
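A minimal Python sketch of how such degree tables can be computed from edge lists. The edge lists `EDGES` and `ARCS` are hypothetical: the slide figures are not part of the transcript, so they were chosen only so that the resulting degrees match the values tabulated on this slide.

```python
from collections import Counter

# Hypothetical undirected edges, chosen to reproduce the degrees on the slide.
EDGES = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (4, 5), (5, 6)]
# Hypothetical directed arcs (source, target), chosen the same way.
ARCS = [(1, 2), (1, 3), (3, 2), (3, 4), (4, 3)]


def degrees(edges):
    """Degree of each vertex in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def in_out_degrees(arcs):
    """Map each vertex to (in-degree, out-degree) for a list of (source, target) arcs."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in arcs:
        outdeg[src] += 1
        indeg[dst] += 1
    vertices = set(indeg) | set(outdeg)
    return {v: (indeg[v], outdeg[v]) for v in sorted(vertices)}
```

In an undirected graph the degrees sum to twice the number of edges; in a directed graph the in-degrees and out-degrees each sum to the number of arcs.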

Definitions Component: – Set of vertices connected together by edges Geodesic Path: – The shortest path through the network from one vertex to another. – Can be multiple geodesic paths between two vertices Diameter: – Length of the longest geodesic path – In terms of edges
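These three definitions can be sketched with a breadth-first search. The adjacency dictionary `ADJ` is a made-up example, and the diameter here is taken only over pairs that are actually connected, which is an assumption of this sketch for graphs with several components.

```python
from collections import deque

# Hypothetical graph: a path a-b-c-d, a separate pair e-f, and an isolated vertex g.
ADJ = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "e": ["f"], "f": ["e"],
    "g": [],
}


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distance in edges from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def components(adj):
    """List of components, each a set of vertices connected together by edges."""
    seen, comps = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            comps.append(comp)
    return comps


def diameter(adj):
    """Length of the longest geodesic path among connected vertex pairs."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)
```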

Definitions Three components in a network

Types of Networks A. Social Networks B. Information Networks C. Technological Networks D. Biological Networks

Social Networks Definition: – Set of people or groups of people with some interaction pattern between them Early Work: – Southern Women Study Social circles of small southern town in 1936 – Social networks of factory workers in 1930s Current Work: – Business communities – Sexual partner studies

Social Networks Internet Relay Chat (IRC) communications between individuals

Social Networks Dating relationships between students in a high school

Social Networks Small-World experiments – Looked at the distribution of path lengths in network – Participants were asked to pass letter around in an attempt to reach a specific individual – Shown that there is usually short path between any two vertices in a network – Later became the basis of the 6 degrees of separation concept.

Social Networks Problems with traditional social networks – Based on questionnaires Labor intensive process which limits the size of network Source of bias which skews results – Friend might mean different thing to different people Presents need for other methods for probing social networks

Social Networks Collaboration Networks – Affiliation networks in which vertices collaborate in groups of some sort – Edges are created between pairs of nodes that have a common group membership – Classic Example : IMDB – Internet Movie Database Vertices are actors Edges indicate two actors have been in the same film together
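A minimal sketch of how a collaboration network can be derived from group memberships, assuming the data arrive as a mapping from group (film) to members (actors). The film and actor names in the usage note are placeholders, not real data.

```python
from itertools import combinations


def collaboration_edges(groups):
    """Return the set of undirected edges joining every pair of vertices
    that share at least one group membership."""
    edges = set()
    for members in groups.values():
        # sorted() gives each pair a canonical orientation, so (a, b) and (b, a)
        # are stored only once.
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return edges
```

For example, `collaboration_edges({"film_1": ["A", "B", "C"], "film_2": ["C", "D"]})` links A, B and C to each other and C to D, but leaves A and D unconnected.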

Social Networks

Other social network data sources – Phone Calls – Email – Instant Messaging Produce Millions of pieces of data a day – Demonstrate the need for new analytical methods

Information Networks Also known as knowledge networks Definition: – Representation of how information moves through a population or group Classic Example: – Network of citations between academic papers Directed edges Mostly acyclic – Papers can only cite other papers already written and not future papers. (not always true)

Information Networks Citation Network for Inferring network mechanisms: The Drosophila melanogaster protein interaction network

Information Networks The World Wide Web – Network of information containing pages Vertices are the pages themselves Edge is created when one page links to another – No constraints as seen in the citation network Cycles Multiple edges between vertices – Power-law in-degree and out-degree distributions

Information Networks Graph of Relationships between Facebook pages. Example of an Information Network with Social Network aspects.

Information Networks Preference Networks – Includes two kinds of vertices Individuals Objects of their preference – Example: books or films – Edges connect vertices of different types – Edges can be weighted – Example of Bipartite Information Network

Technological Networks Definition: – Man-made networks designed for the transportation of a resource or commodity Examples – Power grid – Airline routes – The Internet Physical network of machines

Technological Networks Bandwidth transfer in Europe between countries

Biological Networks A wide variety of biological systems can be represented as networks Metabolic Pathways – Vertices are metabolic substrates and products – A directed edge joins a substrate to a product if a known reaction exists that produces the product from the substrate Protein Interactions – Mechanistic physical interactions between proteins

Biological Networks

Portion of yeast protein interactions

Biological Networks Gene Regulatory Networks – Expression of protein coded by particular genes – Controlled by other proteins Act as inducers and inhibitors – Vertices represent proteins – Edges represent dependencies between proteins – One of the first networked dynamical systems for which large-scale modeling attempts were made

Biological Networks Food Webs – Vertices represent species – Directed edge indicates predatory relationship Could be the other way in terms of carbon movement Neural Networks – Actual biological neuron pathways

Biological Networks Reef fish food web

Biological Networks Rat hippocampal neurons

Properties of Networks Look at features that are common to many types of networks May or may not encode important or relevant information for any one graph Might be suggestive of the mechanisms in how real networks are formed Most involve how real networks are different than random graphs

Properties of Networks Small-World Effect Transitivity Degree Distribution Network Resilience Mixing Patterns Degree Correlation Community Structure Network Navigation


Properties: Small-World Effect Most pairs of vertices are connected by a relatively short path through the network Distance between any two vertices in a graph is usually much smaller than the total number of vertices Deals with the geodesic distance property – Uses Mean Geodesic Distance: $\ell = \frac{1}{\frac{1}{2} n (n+1)} \sum_{i \ge j} d_{ij}$, where $d_{ij}$ is the geodesic distance from vertex $i$ to vertex $j$

Properties: Small-World Effect $\ell$ can be measured in O(mn) time, where m is the number of edges and n is the number of vertices – $\ell$ is usually much smaller than n Can be problematic if there are multiple components in the graph – Vertex pairs in different components have no connecting path, so their geodesic distance, and thus the average geodesic distance, is infinite – One alternative is to average only over vertex pairs that have a connecting path
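A sketch of the O(mn) measurement: one breadth-first search per vertex, each costing O(m). As assumptions of this sketch, the average is taken over pairs of distinct vertices (some conventions also count each vertex's zero distance to itself) and unreachable pairs are skipped, so a graph with several components still gives a finite value.

```python
from collections import deque


def mean_geodesic_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct, connected vertices.

    adj maps each vertex to an iterable of its neighbours (undirected graph).
    """
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable vertices other than the source
    return total / pairs if pairs else float("nan")
```

Counting each pair in both directions doubles numerator and denominator alike, so the mean is unchanged.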

Properties: Small-World Effect This property implies that the spread of something (x) through real networks occurs fast – Rumor – Information Mathematically obvious – If the number of vertices within distance r grows exponentially with r – The value of $\ell$ will increase as log n – Small-world can refer to networks in which the value of $\ell$ scales logarithmically or slower with network size
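A short worked version of this argument, as a sketch. It assumes, for illustration only, that the number of vertices within distance $r$ of a typical vertex grows like $z^r$ for some constant $z > 1$ (roughly the number of neighbors per vertex); the symbol $z$ is introduced here. The whole network is reached once $z^r$ is comparable to $n$, which gives the typical distance:

$$
z^{\ell} \approx n \quad\Longrightarrow\quad \ell \approx \frac{\log n}{\log z}
$$

With $n = 10^6$ and $z = 10$, for instance, this gives $\ell \approx 6$.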

Properties: Small-World Effect Biological example: protein-protein interactions in the yeast S. cerevisiae Vertices: 1870 Edges: 2240 $\ell$: 6.80


Properties: Transitivity Probability that if vertex A is connected to vertex B, and vertex B is connected to vertex C, then vertex A will also be connected to vertex C In social network terms: the friend of your friend is likely also to be your friend Also known as clustering – This is confusing as it has another meaning – Quantified using the Clustering Coefficient

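Properties: Transitivity C: Clustering coefficient $C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}}$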

Properties: Transitivity [Figure: example network illustrating the fraction of transitive triples]

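Properties: Transitivity Can also be defined locally for each vertex: $C_i = \frac{\text{number of triangles connected to vertex } i}{\text{number of triples centered on vertex } i}$ With this value the definition of C becomes: $C = \frac{1}{n} \sum_i C_i$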

Properties: Transitivity Alternative method for clustering coefficient [Figure: example network with vertices 1 to 5] $C_1 = 1/1 = 1$, $C_2 = 1$, $C_3 = 1/6$, $C_4 = 0$, $C_5 = 0$ $C = \frac{1}{5}\left(1 + 1 + \frac{1}{6}\right) = \frac{13}{30}$

Properties: Transitivity Two definitions labeled $C^{(1)}$ and $C^{(2)}$ in text Effectively reverses the order of the operations: – Taking the ratio of triangles to triples – Averaging over vertices $C^{(2)}$ calculates the mean of the ratio $C^{(1)}$ calculates the ratio of the means $C^{(2)}$ tends to weigh contributions of low-degree vertices more heavily – The two can give significantly different results
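A sketch comparing the two definitions on a small graph. The graph `EXAMPLE` is an assumption: a triangle on vertices 1, 2, 3 with vertices 4 and 5 attached to vertex 3, chosen because it reproduces the local values 1, 1, 1/6, 0, 0 listed in the five-vertex example. Vertices with fewer than two neighbors are given a local coefficient of 0 here.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example graph: triangle 1-2-3, plus edges 3-4 and 3-5.
EXAMPLE = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3}, 5: {3}}


def local_counts(adj, v):
    """Return (triangles through v, connected triples centred on v)."""
    nbrs = adj[v]
    triples = len(nbrs) * (len(nbrs) - 1) // 2
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return triangles, triples


def clustering_ratio_of_totals(adj):
    """C(1): total triangle count over total triple count.

    Summing triangles over vertices counts each triangle three times,
    which supplies the factor of 3 in 3 * triangles / triples.
    """
    counts = [local_counts(adj, v) for v in adj]
    triangles = sum(t for t, _ in counts)
    triples = sum(p for _, p in counts)
    return Fraction(triangles, triples) if triples else Fraction(0)


def clustering_mean_of_local(adj):
    """C(2): mean over vertices of the local ratio C_i."""
    local = []
    for v in adj:
        triangles, triples = local_counts(adj, v)
        local.append(Fraction(triangles, triples) if triples else Fraction(0))
    return sum(local, Fraction(0)) / len(local)
```

On this graph the ratio of totals is 3/8 while the mean of the local values is 13/30: the two low-degree vertices with $C_i = 1$ pull the mean upward.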

Properties: Transitivity $C_i$ is also used often in the sociological literature – Called network density Both $C^{(1)}$ and $C^{(2)}$ are usually significantly higher in real networks than in random graphs


Properties: Degree Distributions Degree of a vertex is the number of edges connected to that vertex $p_k$ is the probability that a vertex chosen at random has degree k Examine by creating a histogram of $p_k$ – Called the degree distribution for that network

Properties: Degree Distributions

Real-world networks are usually highly right-skewed – Long right tail of values above the mean Measuring the tail is difficult – Small sample size in that section – Usually noisy

Properties: Degree Distributions Histograms depicting the Noise and lack of measurements indicative of the tail section of the degree distribution

Properties: Degree Distributions Many real world graph degree distributions follow power laws in their tails – $p_k \sim k^{-\alpha}$ for some constant $\alpha$ Others have exponential tails – $p_k \sim e^{-k/\kappa}$ Knowing this makes power-law and exponential distributions easy to find experimentally – Plot on logarithmic scales: power laws appear as straight lines – Semi-logarithmic scales: exponentials appear as straight lines
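A minimal sketch of the histogram and of the straight-line check on logarithmic scales. The least-squares fit is a deliberately naive stand-in for reading the slope off a log-log plot; on real data the noisy tail makes such a fit unreliable, so it is an illustration rather than a recommended estimator of $\alpha$.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """p_k: fraction of vertices having each degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(pk):
    """Least-squares slope of log p_k against log k, using k >= 1 and p_k > 0.

    For an exact power law p_k ~ k^(-alpha) the slope equals -alpha.
    """
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k >= 1 and p > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

Replacing `math.log(k)` with `k` itself in the first coordinate gives the semi-logarithmic version, whose slope for an exponential tail is $-1/\kappa$.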

Properties: Degree Distributions [Figure: two panels, Power law and Exponential]

Properties: Degree Distributions Networks with power-law degree distributions are sometimes called scale-free networks Include networks of: – World wide web – Metabolic pathways – Telephone calls


Properties: Network Resilience How resilient is a network to the removal of its vertices – How the geodesic distance is affected by node deletion Two main removal processes discussed 1. Random removal of vertices 2. Targeted removal – Usually remove the vertices with highest degrees
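A sketch of the two removal processes. As a simplification, it measures damage by the size of the largest remaining component rather than by geodesic distance, and targeted removal ranks vertices by their degree in the intact graph; both are assumptions of this example.

```python
import random


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest component once the vertices in `removed` are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def targeted_removal(adj, count):
    """The `count` vertices of highest degree."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:count])


def random_removal(adj, count, rng):
    """`count` vertices chosen uniformly at random using the given random.Random."""
    return set(rng.sample(sorted(adj), count))
```

On a star graph, for example, removing the single highest-degree vertex leaves only isolated vertices, while removing one random vertex usually leaves the rest connected.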

Properties: Network Resilience Two recent studies done on the resilience of the Internet and World Wide Web – One study found that these networks are resilient to random deletions but vulnerable to targeted attacks – The other study found the opposite: the WWW is resilient to targeted attack as well, since deletion of all vertices with degree greater than 5 would be needed to destroy its connectivity – The difference is attributed to the high skew of the degree distribution: only a very small fraction of nodes have degree greater than 5

Properties: Network Resilience Biological Example: – Metabolic network of yeast Diameter: total of all path lengths divided by total number of paths [Figure: diameter under Targeted and Random vertex removal]


Properties: Mixing Patterns What types of vertices associate with other types of vertices Examples: – Food web: Many links between herbivores and carnivores Few links between carnivores and plants – Internet: Many links between end-users and ISPs Few between end-users and backbone

Properties: Mixing Patterns Quantified by assortativity coefficient Other ways to look at assortative mixing – By scalar characteristics Age, income – Vector characteristics Location : 2D vector


Properties: Degree Correlations Special case of assortative mixing – Based on a particular scalar vertex property: degree Do high-degree vertices prefer other high-degree vertices? Do high-degree vertices associate more with low-degree vertices?

Properties: Degree Correlations Several different ways to quantify: – Two-dimensional histogram – One-parameter curve based on the degree – A single number Positive for assortatively mixed networks Negative for disassortative networks Social networks tend to be assortative All other networks discussed are disassortative
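One common choice for the single number is the Pearson correlation coefficient between the degrees found at the two ends of each edge. The sketch below assumes that definition and uses full degrees (not degree minus one, which gives the same correlation).

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of an edge.

    edges is a list of undirected (u, v) pairs. Positive values indicate
    assortative mixing by degree, negative values disassortative mixing.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Each undirected edge is counted in both orientations so that the
    # two "ends" are treated symmetrically.
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    m = len(xs)
    mean = sum(xs) / m
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return cov / var if var else float("nan")
```

A star graph, where the hub connects only to degree-one vertices, is perfectly disassortative under this measure; a graph in which every edge joins vertices of equal degree is perfectly assortative.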

Properties: Degree Correlations [Figure: Yeast protein interactions; annotations: Degree Increasing, Highest degree correlation]


Properties: Community Structure Structure and formation of groups in the network Social Networks: – People tend to divide into sub-sections based on common interests, occupations, etc. Cluster Analysis – Extracting community structure from a network – Assigns connection strength to vertex pairs of interest – Finished process of cluster analysis can be represented by a tree or dendrogram

Properties: Community Structure Groups in protein interactions


Properties: Network Navigation Finding paths in networks Use some domain knowledge about the network – Example: small-world experiments – people knew who to give the letter to so as to reach the destination quickly If it were possible to construct artificial networks that were easy to navigate in the same way social networks seem to be, then they could be used for databases or P2P networks

Other Properties Largest Component Size – The Giant component Betweenness Centrality: – Number of geodesic paths between other vertices that run through a particular vertex Recurrent Motifs: – Small sub-graphs that repeat in the network

Random Graphs Poisson Random Graphs Configuration Model Extensions to Random Graphs Markov Graphs

Poisson Random Graphs Developed by – Solomonoff and Rapoport (1951) – Erdős and Rényi (1959) Used as a straw man when discussing graph theory Most of the interesting work is in how real world graphs are not like random graphs

Poisson Random Graphs Building Random Graphs: Very simple process – Take some number n of vertices – Connect each pair with a probability p
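The two steps above can be sketched directly. The function name and the `seed` parameter are choices made for this example; the quadratic loop over all pairs is fine for small n but is not how large graphs would be generated in practice.

```python
import random
from itertools import combinations


def poisson_random_graph(n, p, seed=None):
    """Edge list of a random graph on vertices 0..n-1 in which each of the
    n(n-1)/2 possible pairs is connected independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]
```

The expected number of edges is p·n(n−1)/2, so the mean degree is p(n−1).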

Poisson Random Graphs Many properties of the random graph are exactly solvable in the limit of large graph size. Probability of a vertex having degree k (Degree Distribution): $p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k} \simeq \frac{z^k e^{-z}}{k!}$, where $z = p(n-1)$ is the mean degree Hence the name Poisson Exact in large graph limit
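A numerical check of the large-graph limit, as a sketch. Each vertex has n − 1 possible neighbors, each present with probability p, so its degree is binomially distributed; holding the mean degree z = p(n − 1) fixed while n grows, this approaches a Poisson distribution with mean z. The function names are chosen for this example.

```python
import math


def binomial_pk(n, p, k):
    """Exact probability that a vertex has degree k in a graph of n vertices."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(z, k):
    """Poisson probability of degree k for mean degree z."""
    return z**k * math.exp(-z) / math.factorial(k)


def max_gap(n, z, kmax=15):
    """Largest absolute difference between the two distributions for k <= kmax."""
    p = z / (n - 1)
    return max(abs(binomial_pk(n, p, k) - poisson_pk(z, k)) for k in range(kmax + 1))
```

Evaluating `max_gap` for increasing n at fixed z shows the gap shrinking toward zero.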

Poisson Random Graphs Expected structure varies with p. Most important property: phase transition – From low-density, low-p state Containing few edges and all components are small – To high-density, high-p state Extensive fraction of all vertices are joined together in single giant component Giant component is main significant feature of random graphs discussed in this paper

Poisson Random Graphs Two properties in random graphs: – Giant component size Calculating the expected size of the giant component, as a fraction S of the graph: $S = 1 - e^{-zS}$ – Mean size of the non-giant components: $\langle s \rangle = \frac{1}{1 - z + zS}$
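A sketch of how these two quantities can be evaluated numerically, assuming the standard large-graph results for the Poisson random graph with mean degree z: the giant component occupies a fraction S of the graph satisfying S = 1 − exp(−zS), and the mean size of the component containing a randomly chosen vertex outside the giant component is 1 / (1 − z + zS). Fixed-point iteration is one simple way to solve the first equation; it converges slowly very close to z = 1.

```python
import math


def giant_component_fraction(z, iterations=200):
    """Solve S = 1 - exp(-z * S) by fixed-point iteration, starting from S = 1.

    For z <= 1 the iteration decays to 0 (no giant component); for z > 1 it
    converges to the non-zero solution.
    """
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-z * s)
    return s


def mean_small_component_size(z):
    """Mean size of the non-giant component containing a random vertex."""
    s = giant_component_fraction(z)
    return 1.0 / (1.0 - z + z * s)
```

The change of behavior at z = 1 is the phase transition: below it S is zero, above it S grows toward 1.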

Poisson Random Graphs Models – Small-world effect Typical distance through network $\ell \simeq \log n / \log z$ Does not model – Clustering coefficient Lower than real world – Degree distribution Poisson instead of power-law / exponential – Mixing patterns Random mixing only – Community structure None – Navigation Impossible using local algorithms

Poisson Random Graphs [Figure: Linear graph and Logarithmic graph panels; series: Scale-free, random]

Poisson Random Graphs Still, it forms the basis of our basic intuition about how networks behave Giant component & phase transition are ideas that underlie much of graph theory Many future models started with this random graph as a springboard


Configuration model Trying to make random graphs more realistic Configuration model incorporates idea of non-Poisson degree distribution Building configuration model: – $p_k$: degree distribution: the fraction of vertices having degree k – Degree sequence: a set of n values of the degrees $k_i$ of vertices i = 1 … n Visualized as giving each vertex $k_i$ spokes sticking out of it – Choose pairs of spokes at random and connect them
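The spoke-matching construction can be sketched as follows. Vertices are numbered from 0 here for convenience, and the function names are choices made for this example. Shuffling the list of spokes and pairing neighbors is one way to choose pairs uniformly at random; the result may contain self-loops and repeated edges, which this sketch keeps.

```python
import random


def configuration_model(degree_sequence, seed=None):
    """Random multigraph whose vertex i has degree degree_sequence[i]."""
    if sum(degree_sequence) % 2:
        raise ValueError("degree sequence must have an even sum")
    rng = random.Random(seed)
    # One list entry per spoke: vertex i appears degree_sequence[i] times.
    spokes = [v for v, k in enumerate(degree_sequence) for _ in range(k)]
    rng.shuffle(spokes)
    # Pair consecutive spokes in the shuffled order to form edges.
    return list(zip(spokes[::2], spokes[1::2]))


def degrees_of(edges, n):
    """Degree of each vertex 0..n-1; a self-loop adds 2 to its vertex."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Whatever the random pairing, the degrees of the generated graph reproduce the input sequence exactly.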

Configuration model Two important points on the configuration model 1. $p_k$ is the distribution of degrees of vertices But not the degree of the vertex reached by following a randomly chosen edge Since k edges arrive at a vertex of degree k, we are k times as likely to arrive at that vertex as at a vertex of degree 1. Thus the degree distribution of a vertex reached along a random edge is proportional to $k p_k$

Configuration model 2. Chance of finding a loop in a small component of the graph goes as $n^{-1}$ – Probability that there is more than one path between any pair of vertices is $O(n^{-1})$ – Not true of most real world networks
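The first of these two points can be illustrated with a short sketch: the degree of a vertex reached by following a random edge is distributed in proportion to k times the fraction of vertices with degree k, normalized by the mean degree. The function name is chosen for this example.

```python
def edge_end_degree_distribution(pk):
    """Distribution of the degree of the vertex at the end of a random edge.

    pk maps degree k to the fraction of vertices with that degree.
    The result is k * p_k divided by the mean degree.
    """
    mean_degree = sum(k * p for k, p in pk.items())
    return {k: k * p / mean_degree for k, p in pk.items()}
```

If half the vertices have degree 1 and half have degree 3, a random edge leads to a degree-3 vertex three quarters of the time, so vertices reached along edges have a higher typical degree than vertices picked at random.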

Configuration model Example : power-law degree distribution

Configuration model Gets rid of Poisson degree distribution Still no clustering (transitivity) Explanation : – Configuration model graphs are suitable for modeling the global network – Clustering is a characteristic of the local network


Extension to Random Graph: Directed Graphs Directed Graphs: Each vertex has – An in-degree : j – An out-degree: k Control both in creation of the random graph

Extension to Random Graph: Directed Graphs Use of extended random graph to model directed network: WWW

Extension to Random Graph: Bipartite Graphs Have two types of nodes Edges run only between two different types Work well for modeling some real world networks Fail to capture the complexity of others

Extension to Random Graph: Bipartite Graphs

Indication of shortcomings of modeled bipartite graphs The theoretical predictions for the last two data sets account for only half of the actual clustering present.


Generalized random graph models have a serious shortcoming: – Fail to show transitivity Look for completely different model – Add clustering to generated systems

Markov Graphs Looks at properties (edge configurations) of a graph Use properties to construct conditional tie variables ($X_{ij}$) – Signify a relationship between nodes i and j – $X_{ij} = 1$ if there is an observed relational tie – $X_{ij} = 0$ otherwise These tie variables are not independent – Need some way to reflect dependency – Markovian dependence structure: ties are conditionally dependent when they share a node.

Markov Graphs Social Network Example: – Work ties among lawyers Vertices: Lawyers in a law firm Edges: Collaboration (work ties) among them – How is work flow structured? Discernible form of local structuring? – Social ties are not independent of each other; the dependence is expressed through any persons directly involved in the ties in question

Markov Graphs Network Ties Among Lawyers

Markov Graphs Significant Graph Features when considering Markovian Relational ties

Markov Graphs Results indicate improved local clustering (transitivity) representation.

Markov Graphs Problem : – Tend to condense Form regions of complete cliques – Subsets of vertices in which each vertex is connected to every other vertex in that subset – Networks in the real world do not share this clumpy transitivity

Markov Graphs Clumping effect indicative of Markov Graph representation

Summary Types of Real World Networks A. Social Networks B. Information Networks C. Technological Networks D. Biological Networks

Summary Properties of networks – Small-World Effect – Transitivity – Degree Distribution – Network Resilience – Mixing Patterns – Degree Correlation – Community Structure – Network Navigation

Summary Random Graphs and extensions – Model only some of the properties found in real networks – Motivates the exploration of other models that can represent these properties
