3 Directed and undirected networks
[Figure: an undirected network and a directed network on nodes A-F, with labels for vertex/node, edge, and directed edge.]
4 Node degree
- Undirected network
  - Degree, $k$: number of neighbors of a node
  - Average degree: the mean of $k$ over all nodes
- Directed network
  - In-degree, $k_{in}$: number of incoming edges
  - Out-degree, $k_{out}$: number of outgoing edges
[Figure: directed network on nodes A-F. The in-degree of F is 4; the out-degree of E is 1.]
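5 Average degree
- Consider an undirected network with N nodes and L edges
- Let $k_i$ denote the degree of node i
- Average degree is $\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i$
- Average degree is equivalently defined as $\langle k \rangle = \frac{2L}{N}$, since each edge contributes to the degree of two nodes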
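The degree definitions can be checked on a small edge list. This is a minimal sketch; the function names and the toy network are made up for illustration. It computes node degrees, in- and out-degrees, and the average degree as 2L/N.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def in_out_degrees(edges):
    """Return ({node: in-degree}, {node: out-degree}) for directed edges u -> v."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return dict(indeg), dict(outdeg)


def average_degree(edges, n_nodes):
    """Average degree of an undirected network: 2L / N."""
    return 2 * len(edges) / n_nodes


# Hypothetical toy network with N = 4 nodes and L = 4 edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
deg = degrees(edges)
mean_from_degrees = sum(deg.values()) / len(deg)  # (1/N) * sum of k_i
mean_from_edges = average_degree(edges, len(deg))  # 2L / N
```

Both ways of computing the mean agree because every edge adds one to the degree of each of its two endpoints.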
6 Degree distribution
- P(k) gives the probability that a randomly selected node has k edges
- Different networks can have different degree distributions
- A fundamental property that can be used to characterize a network
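An empirical P(k) is the fraction of nodes with each degree. A minimal sketch, assuming the degrees are already available as a dictionary (the name `degree_distribution` and the example values are illustrative):

```python
from collections import Counter


def degree_distribution(degree_by_node):
    """Return {k: fraction of nodes with degree k}, sorted by k."""
    counts = Counter(degree_by_node.values())
    n = len(degree_by_node)
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical degrees for a four-node network.
example_degrees = {"A": 2, "B": 2, "C": 3, "D": 1}
p_k = degree_distribution(example_degrees)
```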
7 Different degree distributions
- Poisson distribution
  - The mean is a good representation of the degree $k_i$ of all nodes
  - Exhibited in Erdős-Rényi networks
- Power law distribution
  - Also called scale-free
  - There is no "typical" node that captures the degree of nodes
8 Poisson distribution
- A discrete distribution
- The Poisson is parameterized by $\lambda$, which can be easily estimated by maximum likelihood
- $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$
[Figure: P(X=k) plotted against k.]
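The maximum likelihood estimate can be derived in a few lines. The sketch below assumes the standard Poisson form $P(X=k) = \lambda^k e^{-\lambda}/k!$ and N observed degrees $k_1, \dots, k_N$ treated as independent draws.

```latex
\begin{aligned}
\ell(\lambda) &= \sum_{i=1}^{N} \log P(X = k_i)
               = \sum_{i=1}^{N} \left( k_i \log \lambda - \lambda - \log k_i! \right) \\
\frac{d\ell}{d\lambda} &= \frac{1}{\lambda} \sum_{i=1}^{N} k_i - N = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{1}{N} \sum_{i=1}^{N} k_i
\end{aligned}
```

Under these assumptions the estimate is simply the sample mean of the degrees, that is, the average degree of the network.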
9 Power law distribution
- Often used to capture the degree distribution of biological and other real networks
- $P(k) \propto k^{-\gamma}$
- Typical value of $\gamma$ is between 2 and 3
- An MLE exists for $\gamma$ but is more complicated
- See Power-Law Distributions in Empirical Data. Clauset, Shalizi and Newman, 2009 for details
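One way to estimate the exponent is the closed-form approximation to the discrete MLE given by Clauset, Shalizi and Newman: $\hat{\gamma} \approx 1 + n \left[ \sum_i \ln \frac{k_i}{k_{min} - 1/2} \right]^{-1}$, taken over the n degrees with $k_i \ge k_{min}$. The sketch below uses that approximation, not the exact MLE. The function name and the `k_min` default are assumptions for illustration, and the approximation is known to be less accurate for small `k_min`.

```python
import math


def powerlaw_exponent_mle(degrees, k_min=1):
    """Approximate MLE of the power-law exponent for integer degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    # Each term is positive because k >= k_min > k_min - 0.5.
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sequence with a heavy tail.
example_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
gamma_hat = powerlaw_exponent_mle(example_degrees, k_min=1)
```

Choosing `k_min` is a separate problem that this sketch does not address; the cited paper describes a procedure for it.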
10 Erdős-Rényi random graphs
- Dates back to 1960 and is due to two mathematicians, Paul Erdős and Alfréd Rényi
- Provides a probabilistic model to generate a graph
- Starts with N nodes and connects each pair of nodes with probability p
- Node degrees follow a Poisson distribution (approximately, for large N)
- The tail falls off exponentially, suggesting that nodes with degrees far from the mean are very rare
11 Generating a graph using the ER model
- Input
  - p: probability of an edge
  - N: number of nodes in the network
- Output: an ER network of N nodes with p*N(N-1)/2 edges on average
- For each possible edge, add it with probability p
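The procedure above can be sketched directly: visit each of the N(N-1)/2 node pairs once and keep it with probability p. The function name and the `seed` parameter are illustrative additions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return the edge list of an undirected ER graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each unordered pair (u, v) with u < v is an independent coin flip.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


# Hypothetical parameters: 100 nodes, edge probability 0.05.
er_edges = erdos_renyi(100, 0.05, seed=42)
expected_edges = 0.05 * 100 * 99 / 2  # p * N * (N - 1) / 2
```

The number of edges in any one sample will fluctuate around `expected_edges`.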
12 Scale-free networks
- Degree distribution is captured by a power law distribution
- Such networks are ubiquitous in nature
- Scale-free networks can be generated by the preferential attachment model of Barabási and Albert
- A "rich get richer" model
13 Generating a scale-free network with the preferential attachment model
- Input
  - N: number of nodes
  - m: number of existing nodes that each new node connects to
- Output: a scale-free network
- At each iteration
  - Add a node with m connections
  - Select an existing node i as one of the m neighbors with probability $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
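A minimal sketch of preferential attachment follows. Two details are assumptions not given on the slide: the process starts from a fully connected seed graph on m + 1 nodes, and the m targets of a new node are forced to be distinct by redrawing duplicates. Drawing uniformly from a list that holds every edge endpoint selects node i with probability proportional to its degree $k_i$.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Return the edge list of a preferential-attachment graph on nodes 0..n-1."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed seed graph: a clique on the first m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Node i appears in this list once per incident edge, i.e. k_i times.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional draw
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


# Hypothetical parameters: 50 nodes, each new node brings m = 2 edges.
ba_edges = preferential_attachment(50, 2, seed=7)
```

Redrawing duplicates means the second and later targets are not sampled with exactly $k_i / \sum_j k_j$; for small m relative to the network size the difference is minor.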
15 Path lengths
- The shortest path length between two nodes A and B: the smallest number of edges that need to be traversed to get from A to B
- Mean path length is the average of all shortest path lengths
- Diameter of a graph is the longest of all shortest paths in the network
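For an unweighted network, shortest path lengths can be found with breadth-first search. The sketch below assumes the network is given as an adjacency dictionary (node to set of neighbors) and ignores unreachable pairs; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return {node: shortest path length from source} for reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_stats(adj):
    """Return (mean shortest path length, diameter) over reachable node pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    if pairs == 0:
        raise ValueError("network has no connected node pairs")
    return total / pairs, diameter


# Hypothetical path graph A - B - C - D.
path_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
mean_length, diameter = path_length_stats(path_graph)
```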
16 Scale-free networks are ultra-small
- Average path length scales as log log N
- In a random network (Erdős-Rényi network) the average path length scales as log N
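17 Clustering coefficient
- Measure of transitivity in the network
- If A is connected to B, and B is connected to C, how often is A connected to C?
- Clustering coefficient $C_i$ for each node i is $C_i = \frac{2 n_i}{k_i (k_i - 1)}$
  - $n_i$ is the number of edges among the neighbors of i
  - The ratio of the number of edges connecting i's neighbors to the maximum possible, $k_i (k_i - 1)/2$
- Average clustering coefficient gives a measure of the tendency of nodes to form clusters
[Figure: nodes A, B and C, with A connected to B and B connected to C; the A-C edge is marked "?".]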
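The ratio can be computed by counting edges among a node's neighbors, using $C_i = 2 n_i / (k_i (k_i - 1))$. A minimal sketch, assuming an adjacency dictionary of neighbor sets and the common convention (an assumption here) that nodes with fewer than two neighbors get a coefficient of 0:

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """C_i = 2 * n_i / (k_i * (k_i - 1)) for node i."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs to connect
    # n_i: number of neighbor pairs that are themselves connected.
    n_i = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * n_i / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical network: triangle A-B-C with a pendant node D attached to C.
triangle_plus_tail = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
c_of_c = clustering_coefficient(triangle_plus_tail, "C")
c_mean = average_clustering(triangle_plus_tail)
```

Node C has three neighbors and only one of the three possible neighbor pairs (A-B) is connected, so its coefficient is 1/3.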
23 Relationship between clustering coefficient and degree
- Define C(k) as the average clustering coefficient of all nodes with degree k
- In some networks, $C(k) \sim k^{-1}$
- If this is true, the networks are said to have a hierarchical organization
- Smaller node sets are linked together to form larger modules
24 Hierarchical network
- A hierarchical network generated by replicating the current set of nodes
- Scale-free distribution of degrees
- Inverse relationship between C(k) and degree
Barabási & Oltvai, 2004
25 Hierarchical organization is also seen among nodes
- Regulators are hierarchically organized with different roles per level
  - Top: master regulators influence many genes
  - Middle: bottlenecks directly targeting most genes
  - Bottom: essential regulators
[Figure: Hierarchical structure of the S. cerevisiae regulatory network.]
Yu & Gerstein 2006, Jothi et al. 2009
26 Given a network, how can we test what degree distribution it follows?
- Compute the empirical degree distribution
- The degree distribution can be Poisson or power law
- Estimate the parameters of each distribution from the data
- Pick the distribution that fits the data better
27 Properties of scale-free networks
- Degree distribution is best captured by a power law distribution
- Average clustering coefficient is higher than expected from a random network
- Average path length is smaller than expected from a random network
28 Centrality measures in networks
- A measure of how important a network node is
- Four types of centrality measures defined for each node
  - Degree centrality: the degree of a node
  - Betweenness centrality: the number of shortest paths between pairs of other nodes that pass through the node of interest
  - Closeness centrality: based on the sum of distances from all other nodes (commonly its reciprocal, so that closer nodes score higher)
  - Eigenvector centrality: given by the eigenvector of the adjacency matrix associated with the largest eigenvalue
29 Eigenvector centrality
- Based on the idea that neighbors with a high score should contribute more to the importance of a node
- Given by $x_v = \frac{1}{\lambda} \sum_{u \in N(v)} x_u$, where $\lambda$ is the largest eigenvalue of the adjacency matrix and $N(v)$ is the set of neighbors of v
- The centrality measures are given by the entries of the first eigenvector
- Google's PageRank algorithm makes use of a type of eigenvector centrality
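The leading eigenvector can be approximated by power iteration: repeatedly replace each node's score by a sum over its neighbors and renormalize. The sketch below is one possible implementation, not the method used on the slide. It iterates with A + I rather than A (each node also keeps its own score), which leaves the eigenvectors unchanged but avoids oscillation on bipartite networks. It assumes a connected, undirected network stored as neighbor sets.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of the adjacency matrix (unit length)."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        converged = max(abs(new[v] - x[v]) for v in adj) < tol
        x = new
        if converged:
            break
    return x


# Hypothetical star network: hub C connected to three leaves.
star = {"C": {"L1", "L2", "L3"}, "L1": {"C"}, "L2": {"C"}, "L3": {"C"}}
centrality = eigenvector_centrality(star)
```

In the star network the hub receives the highest score, and all leaves receive the same lower score.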
30 Degree centrality of a node is correlated with the functional importance of a node
- Yeast protein-protein interaction network
- Red nodes on deletion cause the organism to die
- Red nodes are also among the most degree central
31 Network motifs
- Degree distributions capture important global properties of the network
- Can we say something about more local properties of the network?
- Network motifs are defined as small recurring subnetworks that occur much more often than in a randomized network
- A subgraph is called a network motif of a network if its occurrence in randomized networks is significantly less than in the original network
- Some motifs are associated with specific network dynamics
Milo et al., Science 2002
32 Network motifs of size three in a directed network
33 Finding network motifs
- Enumerating motifs
- Subgraph enumeration
- Calculating the number of occurrences in randomized networks
Milo et al. 2002
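The two steps can be illustrated for a single motif, the feed-forward loop (a -> b, b -> c, a -> c). This is a simplified sketch, not the full procedure of Milo et al.: it counts occurrences of one pattern without requiring that the three nodes have no other edges among them, and it builds randomized networks with edge swaps that preserve every node's in- and out-degree. Function names, the number of swaps, and the number of randomized networks are assumptions.

```python
import random
import statistics


def count_feed_forward_loops(edges):
    """Count triples (a, b, c) with edges a->b, b->c and a->c."""
    succ = {}
    for u, v in set(edges):
        succ.setdefault(u, set()).add(v)
    count = 0
    for a, targets in succ.items():
        for b in targets:
            if b == a:
                continue
            for c in succ.get(b, ()):
                if c != a and c != b and c in targets:
                    count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization: swap targets of random edge pairs."""
    edges = list(set(edges))
    if len(edges) < 2:
        return edges
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, mean count in randomized networks, z-score or None)."""
    rng = random.Random(seed)
    observed = count_feed_forward_loops(edges)
    null = [
        count_feed_forward_loops(randomize(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else None
    return observed, mean, z
```

A large positive z-score would suggest that the pattern is overrepresented relative to randomized networks with the same degree sequence.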
34 Network motifs found in many complex networks
- The occurrence of the feed-forward loop in both networks suggests a fundamental similarity in the design of these networks
35 Common structural motifs seen in the yeast regulatory network
- Auto-regulation
- Multi-component
- Feed-forward loop
- Single input
- Multi-input
- Regulatory chain
- Feed-forward loops are involved in speeding up the response of the target gene
Lee et al. 2002, Mangan & Alon, 2003
36 Modularity in networks
- Modularity "refers to a group of physically or functionally linked nodes that work together to achieve a distinct function" -- Barabási & Oltvai
- A similar idea is captured by the "community structure" in networks
- Two questions
  - Given a network, is it modular?
  - Given a network, what are the modules in the network?
38 Assessing the modularity of a network
- Modularity of a network can be assessed in two ways:
- Recall the average clustering coefficient
  - A modular network is one that has a significantly higher clustering coefficient than a network with an equivalent number of nodes and degree distribution
- If we know an existing grouping of nodes, we can compute modularity (Q) as the difference between the within-group (community) connections and the expected connections within a group
  - Q defined as in: Finding and evaluating community structure in networks,
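One common way to write this quantity for an undirected, unweighted network is $Q = \sum_c \left[ \frac{L_c}{L} - \left( \frac{d_c}{2L} \right)^2 \right]$, where $L_c$ is the number of edges inside community c, $d_c$ is the total degree of its nodes, and L is the total number of edges. The sketch below assumes that form; the function name and example network are illustrative.

```python
def modularity(edges, community):
    """Q for an undirected edge list and a {node: community label} mapping."""
    n_edges = len(edges)
    within = {}      # L_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(
        within.get(c, 0) / n_edges - (d / (2 * n_edges)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single bridge edge C-D.
two_triangles = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "E"), ("D", "F"), ("E", "F"),
    ("C", "D"),
]
grouping = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
q_split = modularity(two_triangles, grouping)
q_single = modularity(two_triangles, {node: 0 for node in grouping})
```

Placing every node in one community gives Q = 0, while splitting the network at the bridge gives a positive Q.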
39 Finding modules in a graph
- Given a graph, find the densely connected subgraphs
- Graph clustering algorithms are applicable here
  - Hierarchical clustering using the edge weight as a distance
    - How to define weight?
  - Markov clustering algorithm
  - Girvan-Newman algorithm
40 Girvan-Newman algorithm
- Initialize
  - Compute betweenness for all edges
- Repeat until the convergence criterion is met
  - Remove the edge with the highest betweenness
  - Recompute betweenness of affected edges
- Convergence criteria can be
  - No more edges
  - Desired modularity
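The loop can be sketched in plain Python. Several choices here are simplifications rather than part of the slide: edge betweenness is recomputed for every edge after each removal (not only the affected ones), it is computed with a Brandes-style breadth-first accumulation, and the stopping rule is a hypothetical `n_communities` target used in place of a modularity threshold. The input is assumed to be an undirected network of neighbor sets without self-loops.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (each node pair counted twice)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((u, w))] += share
                delta[u] += share
    return bet


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for s in adj:
        if s in seen:
            continue
        component, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        components.append(component)
    return components


def girvan_newman(adj, n_communities):
    """Remove highest-betweenness edges until n_communities components exist."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    components = connected_components(adj)
    while len(components) < n_communities and any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
        components = connected_components(adj)
    return components
```

On a network made of two triangles joined by one bridge edge, the bridge carries all shortest paths between the triangles, so it is removed first and the two triangles are returned as communities.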
41 Zachary's karate club study
- Node grouping based on betweenness
- Each node is an individual and edges represent social interactions among individuals. The shapes and colors represent different groups.
42 Summary of network analysis
- Given a network, its topology can be characterized using different measures
  - Degree distribution
  - Average path length
  - Clustering coefficient
  - Centrality measures
    - Allow us to assess the importance of different nodes
  - Network motifs
    - Overrepresentation of subgraphs of specific types
  - Network modularity