Complete Network Analysis

Exploratory Analysis: Social networks capture the relations between people. These relations form a system that can be thought of as a social space. The advantage of the space analogy is that it captures the “topography” of social networks: classes, clusters, distance, “centrality”, etc. The disadvantage is that “spaces” and “fields” are notoriously difficult to study, because key features are simultaneously active. Current calls for “relational” sociology make this point clearly (see Martin 2003; Abbott 2001). “Field serves as some sort of representation for those overarching social regularities that may also be visualized … as quasi-organisms, systems or structures” (J. L. Martin, AJS 2003). Examples of fields range from abstract notions of status spaces to concrete examples such as the French academic system.

Exploratory Analysis: Sociologists often use spatial analogies, built with techniques such as MDS or correspondence analysis, based on patterns of actor attributes (see Bourdieu, “Social Space and Symbolic Space”). Social network analysis lets you explore the relational space directly, by mapping the relations themselves. The first step in this exploration is often visualizing the network.

Exploratory Analysis: Network visualization Network visualization helps build intuition, but you have to keep the drawing algorithm in mind:
- Tree-based layouts: Most effective for very sparse, regular graphs. Very useful when relations are strongly directed, such as organization charts or internet connections.
- Spring-embedder layouts: Most effective with graphs that have a strong community structure (clustering, etc.). Provide a very clear correspondence between social distance and plotted distance.
[Figure: two images of the same network]


Exploratory Analysis: Network visualization Network visualization helps build intuition, but you have to keep the drawing algorithm in mind.
- Hierarchy & tree models: Use optimization routines to add meaning to the “Y-axis” of the plot. This makes it easy to see who is most central, because they sit at the top of the figure. Usually include some routine for minimizing line crossings.
- Spring-embedder layouts: Work on an analogy to a physical system: connected pairs have ‘springs’ that pull them together, while unconnected nodes have springs that push them apart. The resulting image reflects the balance of these two forces. This usually creates a correspondence between physical closeness and network distance.

Exploratory Analysis: Network visualization

Exploratory Analysis: Network visualization Using colors to code attributes makes it simpler to compare attributes to relations. Here we can assess the effectiveness of two different clustering routines on a school friendship network.

Exploratory Analysis: Network visualization Using colors to code attributes makes it simpler to compare attributes to relations. Here color & size are used to express node characteristics. Trade among OECD Countries in 1981

Exploratory Analysis: Network visualization Using colors to code attributes makes it simpler to compare attributes to relations. Here clusters are abstracted from the nodes and then colored based on functional elements. (Source: Nature, vol. 433, 24 February 2005.)

Exploratory Analysis: Network visualization As networks increase in size, the effectiveness of a point-and-line display diminishes, because you simply run out of plotting dimensions. I’ve found that you can still get some insight by using the ‘overlap’ that results from a space-based layout as information. Here you see the clustering evident in movie co-starring for about 8000 actors.

Exploratory Analysis: Network visualization As networks increase in size, the effectiveness of a point-and-line display diminishes, because you simply run out of plotting dimensions. I’ve found that you can still get some insight by using the ‘overlap’ that results from a space-based layout as information. This figure contains over 29,000 social science authors. The two dense regions reflect different topics.

Where does sociology fit?

Exploratory Analysis: Network visualization Adding time to social networks is also complicated, as you run out of space to put time in most network figures. One solution is to animate the network. Here we see streaming interaction in a classroom, where the teacher (yellow square) has trouble maintaining order. The SONIA software program (McFarland and Bender-deMoll) will produce these figures.

Exploratory Analysis: Network visualization Adding time to social networks is also complicated, as you run out of space to put time in most network figures. One solution is to animate the network. When the network is very sparse, sometimes it makes more sense to build “layers” in a flipbook style:

Exploratory Analysis: Network visualization Data on drug users in Colorado Springs, over 5 years Drug Relations, Colorado Springs, Year 1

Exploratory Analysis: Network visualization Data on drug users in Colorado Springs, over 5 years Drug Relations, Colorado Springs, Year 2 Current year in red, past relations in gray

Exploratory Analysis: Network visualization Data on drug users in Colorado Springs, over 5 years. In general, adding time to networks changes many of our notions of “structure”, which we’ll go through in detail later.
- 616 actors are eventually part of the largest component.
- Diameter is 13 steps.
- There are a number of major bi-components, the largest having 214 members.
Drug Relations, Colorado Springs, Year 5. Current year in red, past relations in gray.

Exploratory Analysis: Network visualization Visualization is a tool, but networks are complex and our visualization tools can sometimes confound. The strong advantage is that you get a complete overview of multiple features at once. The difficulty comes with trying to map a complex multi-dimensional object in low-dimensional space. Here we use a hierarchy to trace diffusion from 10 seed nodes, but display in two formats.

Network Connections: “Goods” flow through networks

Network Connections We often care about networks because of how “goods” travel through the network. In addition to the simple pairwise probability that one actor passes information on to another (pij), two factors affect flow through a network:
- Topology: the shape, or form, of the network. Example: one actor cannot pass information to another unless they are either directly or indirectly connected.
- Time: the timing of contact matters. Example: an actor cannot pass information he has not yet received.

Network Connections: Topology Two features of the network’s topology are known to be important: connectivity and centrality. Connectivity refers to how actors in one part of the network are connected to actors in another part of the network.
- Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another.
- Distance: Given they can be reached, how many steps are they from each other?
- Number of paths: How many different paths connect each pair?

Network Connections: Topology Without full network data, you can’t distinguish actors with limited flow potential from those more deeply embedded in a setting. [Figure: three example actors labeled a, b, and c]

Network Connections: Connectivity Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them. Paths can be directed, leading to a distinction between strong and weak components. [Figure: example graphs on nodes a through f]

Network Connections: Connectivity Basic elements in connectivity
- A path is a sequence of nodes and edges starting with one node and ending with another, tracing the indirect connection between the two. On a path, you never go backwards or revisit the same node twice. Example: a → b → c → d
- A walk is any sequence of nodes and edges, and may go backwards. Example: a → b → c → b → c → d
- A cycle is a path that starts and ends with the same node. Example: a → b → c → a
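The three definitions can be checked mechanically. The sketch below is only an illustration: the function name `classify` and the tie list are assumptions, chosen so that the three example sequences from the slide exist in the graph, and ties are treated as undirected.

```python
def classify(sequence, edges):
    """Label a node sequence as a 'path', 'cycle', 'walk', or 'not a walk'."""
    edge_set = {frozenset(e) for e in edges}          # undirected ties
    steps = list(zip(sequence, sequence[1:]))
    if not steps or any(frozenset(s) not in edge_set for s in steps):
        return "not a walk"                           # some step is not a tie
    closed = sequence[0] == sequence[-1]
    if closed and len(sequence) > 3 and len(set(sequence)) == len(sequence) - 1:
        return "cycle"                                # only the start node repeats
    if len(set(sequence)) == len(sequence):
        return "path"                                 # no node is revisited
    return "walk"

# Hypothetical ties, chosen so the slide's example sequences can be traced.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "a")]
print(classify(list("abcd"), edges))      # a -> b -> c -> d
print(classify(list("abcbcd"), edges))    # a -> b -> c -> b -> c -> d
print(classify(list("abca"), edges))      # a -> b -> c -> a
```

Every path is also a walk; the function simply reports the most specific label that applies.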

Network Connections: Connectivity Reachability If you can trace a sequence of relations from one actor to another, then the two are reachable. If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component. Intuitively, a component is the set of people who are all connected by a chain of relations.

Network Connections: Connectivity This example contains many components.

Network Connections: Connectivity Because relations can be directed or undirected, components come in two flavors. For a graph with any directed edges, there are two types of components:
- Strong components consist of the set(s) of all nodes that are mutually reachable.
- Weak components consist of the set(s) of all nodes that are connected when the direction of the ties is ignored.
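A minimal sketch of the strong/weak distinction, assuming a small hypothetical directed graph. It uses plain breadth-first search rather than the faster algorithms that network packages implement: a node's strong component is the set it can reach intersected with the set that can reach it, and its weak component is what it reaches once direction is ignored.

```python
from collections import deque

def reachable(start, neighbors):
    """Return every node reachable from start by following neighbors."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def components(nodes, arcs):
    """Return (weak, strong) components as sets of frozensets."""
    out, inn, both = {}, {}, {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)       # follow ties forward
        inn.setdefault(v, set()).add(u)       # follow ties backward
        both.setdefault(u, set()).add(v)      # ignore direction
        both.setdefault(v, set()).add(u)
    weak = {frozenset(reachable(n, both)) for n in nodes}
    strong = {frozenset(reachable(n, out) & reachable(n, inn)) for n in nodes}
    return weak, strong

# Hypothetical example: a directed triangle a -> b -> c -> a, a pendant d, an isolate e.
nodes = ["a", "b", "c", "d", "e"]
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
weak, strong = components(nodes, arcs)
print(sorted(sorted(c) for c in weak))
print(sorted(sorted(c) for c in strong))
```

Here d belongs to the weak component of the triangle but forms its own strong component, because nothing leads back from d.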

Network Connections: Connectivity There are only 2 strong components with more than 1 person in this network. Components are the minimum requirement for social groups. As we will see later, they are necessary but not sufficient. All of the major network analysis software packages identify strong and weak components.

Network Connections: Distance Geodesic distance is measured by the smallest (weighted) number of relations separating a pair. In the example, actor “a” is 1 step from 4 others, 2 steps from 5, 3 steps from 4, 4 steps from 3, and 5 steps from 1.

Network Connections: Distance When the graph is directed, distance is also directed (distance to vs. distance from), following the direction of the tie. [Figure: directed graph on nodes a through m, with distance matrices whose rows and columns are labeled a through m]

Network Connections: Distance Reachability in Colorado Springs (sexual contact only), high-risk actors over 4 years:
- 695 people represented
- Longest path is 17 steps
- Average distance is about 5 steps
- Average person is within 3 steps of 75 other people
- 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths
(Node size = log of degree)

Network Connections: Distance Calculating distance in global networks: Powers of the adjacency matrix. Calculate reachability through matrix multiplication (see p. 162 of W&F). [Figure: example graph on nodes a through f, with the matrices X, X², and X³ and the resulting distance matrix]
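As a rough sketch of the matrix-power idea: entry (i, j) of the k-th power of the adjacency matrix counts the walks of length k from i to j, so the distance from i to j is the smallest k at which that entry becomes positive. The 4-node line graph below is a hypothetical example, not the graph from the slide, and the function names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distances_by_powers(x):
    """Distance matrix from successive powers of adjacency matrix x."""
    n = len(x)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in x]              # power holds X^k, starting at k = 1
    for k in range(1, n):                      # no geodesic is longer than n - 1
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = k             # first power with a walk i -> j
        power = matmul(power, x)
    return dist                                # None marks unreachable pairs

# Hypothetical line graph 0 - 1 - 2 - 3.
x = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
for row in distances_by_powers(x):
    print(row)
```

Each multiplication costs on the order of n³ operations in this naive form, which is why the approach does not scale to large networks.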

Network Connections: Distance Calculating distance in global networks: Breadth-First Search In large networks, matrix multiplication is just too slow. A breadth-first search algorithm works by walking through the graph, reaching all nodes from a particular start node. Distance is calculated directly in most SNA software packages.
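A minimal breadth-first search sketch, assuming the network is stored as a dictionary of neighbor lists (the example graph and the name `bfs_distances` are illustrative). Nodes are visited in order of their distance from the start node, so the first time a node is reached gives its geodesic distance.

```python
from collections import deque

def bfs_distances(adj, start):
    """Geodesic distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:                # first visit = shortest route
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist                                # unreachable nodes are absent

# Hypothetical undirected network; f is an isolate.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": [],
}
print(bfs_distances(adj, "a"))
```

Each search touches every node and tie at most once, so running it from every node is far cheaper than repeated matrix multiplication on sparse networks.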

Network Connections: Distance As a graph statistic, the distribution of distance can tell you a good deal about how close people are to each other (we’ll see this more fully when we get to closeness centrality). The diameter of a graph is the longest geodesic, giving the maximum distance. We often use l, the mean distance between every pair, to characterize the entire graph. For example, all else equal, we would expect rumors to travel faster through settings where the average distance is small.
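Both summary statistics follow directly from the pairwise distances. The sketch below assumes an undirected network stored as neighbor lists and, as a simplifying assumption, skips pairs that cannot reach each other; the line graph is a hypothetical example.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, start):
    """Geodesic distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter_and_mean_distance(adj):
    """Return (diameter, mean distance l) over all connected pairs."""
    all_dist = {n: bfs_distances(adj, n) for n in adj}
    pair_d = [all_dist[u][v] for u, v in combinations(adj, 2)
              if v in all_dist[u]]             # assumption: ignore unreachable pairs
    return max(pair_d), sum(pair_d) / len(pair_d)

# Hypothetical line graph 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
diameter, mean_distance = diameter_and_mean_distance(line)
print(diameter, mean_distance)
```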

Network Connections: Distance

Network Connections: Distance Travers and Milgram’s work on the small world is responsible for the standard belief that “everyone is connected by a chain of about 6 steps.” Two questions: Given what we know about networks, what is the longest path (defined by handshakes) that separates any two people? Is 6 steps a long distance or a short distance?

Network Connections: Distance What if everyone maximized structural holes? Associates do not know each other. This results in an exponential growth curve: you reach the entire planet quickly.
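The arithmetic behind the exponential curve can be sketched as follows, under the idealized assumption that every person has exactly k associates and that no two associates overlap. Each newly reached person then passes the chain on to k - 1 new people, so step d adds k(k - 1)^(d - 1) people. The values of k and the population size below are hypothetical inputs, not figures from the slides.

```python
def new_contacts(k, step):
    """People first reached at a given step when no associates overlap."""
    return k * (k - 1) ** (step - 1)

def steps_to_reach(k, population):
    """Smallest number of steps whose cumulative reach covers population."""
    if k < 2:
        raise ValueError("growth needs at least 2 associates per person")
    reached, step = 0, 0
    while reached < population:
        step += 1
        reached += new_contacts(k, step)
    return step

# Hypothetical inputs: 100 associates each, a population of 6 billion.
print(steps_to_reach(100, 6_000_000_000))
```

Real networks fall well short of this bound because associates usually do know each other, which is the point the later clustering slides take up.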

Network Connections: Distance What if people know each other randomly? Random graph theory shows that we could reach people quite quickly if ties were random.

Network Connections: Distance Random Reachability: By number of close friends [Figure: percent contacted (20% to 100%) on the vertical axis, horizontal axis running from 1 to 15, with curves for degree = 2, 3, and 4]

Network Connections: Distance Distance-Reach Distribution for a large Jr. High School (Add Health data) Random graph Observed

Network Connections: Distance Milgram’s test: Send a packet from sets of randomly selected people to a stockbroker in Boston. Experimental setup: arbitrarily select people from 3 pools:
a) People in Boston
b) Random in Nebraska
c) Stockholders in Nebraska

Network Connections: Distance Milgram’s Findings: Distance to target person, by sending group.

Network Connections: Distance Most chains found their way through a small number of intermediaries. Understanding why this is true has been called the “Small-World Problem,” which has since been generalized to a much more formal understanding of tie patterns in large networks (see below). For purposes of flow through graphs, distance is a primary concern so long as pij < 1. Most measures of position in a network account for some aspect of distance.

Network Connections: Connectivity We can extend our conception of component to increase the structural cohesion of the definition. Multiple connectivity: Two paths with the same start and end point, but that have no other nodes in common are called node independent. In every component, the paths linking actors i and j must pass through a set of nodes, S, that if removed would disconnect the graph. The number of nodes in the smallest S is equal to the number of independent paths connecting i and j.

Network Connections: Connectivity Simple component: Every path from 1 to 8 must go through 4. S(1,8) = {4}, and N(1,8) = 1. That is, the graph is a component. [Figure: graph on nodes 1 through 8]

Network Connections: Connectivity Multiple connectivity: In this graph, there are multiple paths connecting nodes 1 and 8, but only 2 of them are independent. N(1,8) = 2. [Figure: graph on nodes 1 through 8, with the paths from 1 to 8 drawn out]

Network Connections: Connectivity A bicomponent is the set of all nodes connected by at least 2 node-independent paths.

Network Connections: Connectivity Bicomponents can overlap by at most 1 person. These overlapping nodes are cutpoints in the graph: if such a node is removed, the graph is disconnected. [Figure: graph on nodes 1 through 8, with labels “4 is a cutpoint” and “1 is a cutpoint”]
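A brute-force sketch of the cutpoint idea, using a hypothetical “bow-tie” graph rather than the slide's figure: a node is a cutpoint if deleting it increases the number of components. Network packages use a faster single-pass depth-first algorithm; this version just applies the definition directly.

```python
def count_components(adj, removed=frozenset()):
    """Number of components after deleting the nodes in removed."""
    seen, count = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        count += 1                             # found an unvisited component
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return count

def cutpoints(adj):
    """Nodes whose removal disconnects part of the graph."""
    base = count_components(adj)
    return {n for n in adj if count_components(adj, {n}) > base}

# Hypothetical bow-tie: triangles {1, 2, 3} and {3, 4, 5} share node 3.
bowtie = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(cutpoints(bowtie))
```

The two triangles are the bicomponents here, and they overlap in exactly one node, the cutpoint.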

Network Connections: Connectivity
- White, D. R. and F. Harary. “The Cohesiveness of Blocks in Social Networks: Node Connectivity and Conditional Density.” Sociological Methodology 31:
- Moody, James and Douglas R. White. “Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups.” American Sociological Review 68:
- White, Douglas R., Jason Owen-Smith, James Moody, and Walter W. Powell (2004). “Networks, Fields, and Organizations: Scale, Topology and Cohesive Embeddings.” Computational and Mathematical Organization Theory 10:95-117.
- Moody, James. “The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999.” American Sociological Review 69:

Network Connections: Connectivity Analytically, most work on connectivity has focused on summaries of completely local properties (degree distributions or clustering). We turn the argument around and ask: what features of a network are essential for holding the whole structure together? Def. 1: “A collectivity is cohesive to the extent that the social relations of its members hold it together.” What network pattern embodies all the elements of this intuitive definition?

Network Connections: Connectivity This definition contains 5 essential elements:
- Focuses on what holds the group together
- Expressed as a group-level property
- The conception is continuous
- Rests on observable social relations
- Applies to groups of any size

Network Connections: Connectivity 1) Actors must be connected: a collection of isolates is not cohesive. [Figure: two panels, “Not cohesive” and “Minimally cohesive: a single path connects everyone”]

Network Connections: Connectivity 1) Reachability is an essential element of relational cohesion. As more paths re-link actors in the group, the ability to ‘hold together’ increases. The important feature is not the density of relations, but the pattern. Cohesion increases as # of paths connecting people increases

Network Connections: Connectivity Consider the minimally cohesive group (D = .25): Moving a line keeps density constant, but changes reachability.

Network Connections: Connectivity What if density increases, but through a single person? [Figure: two graphs, D = .25 and D = .39] Removal of 1 person destroys the group.

Network Connections: Connectivity Cohesion increases as the number of independent paths in the network increases. Ties through a single person are minimally cohesive. [Figure: two graphs with D = .39, one labeled “Minimal cohesion” and one “More cohesive”]

Network Connections: Connectivity Substantive differences between networks connected through a single actor and those connected through many:
Minimally cohesive              | Strongly cohesive
Power is centralized            | Power is decentralized
Information is concentrated     | Information is distributed
Expect actor inequality         | Actor equality
Vulnerable to unilateral action | Robust to unilateral action
Segmented structure             | Even structure
Def 2. “A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.”

Network Connections: Connectivity Def 2. “A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.” [Figure: example graphs with node connectivity 1, 2, and 3]

Network Connections: Connectivity Formalize the argument: If there is a path between every pair of nodes in a graph, the graph is connected, and called a component. In every component, the paths linking actors i and j must pass through a set of nodes, S, that if removed would disconnect the graph. The number of nodes in the smallest S is equal to the number of independent paths connecting i and j.

Network Connections: Connectivity The relation between cut-set size and number of paths (recall our discussion of bicomponents) leads to the two versions of our final definition:
- Def 3a “A group’s structural cohesion is equal to the minimum number of actors who, if removed from the group, would disconnect the group.”
- Def 3b “A group’s structural cohesion is equal to the minimum number of independent paths linking each pair of actors in the group.”
These two definitions are equivalent.
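The equivalence can be checked numerically for a single pair of non-adjacent actors. The sketch below is one possible approach, not the procedure used in the cited papers: it counts node-independent paths with a unit-capacity maximum flow in which each intermediate node is split into an “in” and an “out” copy, and it finds the smallest separating set by brute force. The 7-node graph and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def independent_paths(adj, s, t):
    """Count node-independent s-t paths as a unit-capacity max flow."""
    cap = {}

    def add_arc(u, v):
        cap.setdefault(u, {})[v] = 1
        cap.setdefault(v, {}).setdefault(u, 0)   # residual (reverse) arc

    def enter(n):                                # arcs arrive at the "in" copy
        return n if n in (s, t) else (n, "in")

    def leave(n):                                # arcs depart from the "out" copy
        return n if n in (s, t) else (n, "out")

    for n in adj:
        if n not in (s, t):
            add_arc((n, "in"), (n, "out"))       # capacity 1: node used once
    for u in adj:
        for v in adj[u]:
            add_arc(leave(u), enter(v))

    flow = 0
    while True:
        parent = {s: None}                       # breadth-first augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:             # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def still_connected(adj, s, t, removed):
    """True if t can be reached from s without using removed nodes."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return t in seen

def smallest_cut_set(adj, s, t):
    """Size of the smallest node set whose removal separates s from t."""
    others = [n for n in adj if n not in (s, t)]
    for size in range(len(others) + 1):
        for removed in combinations(others, size):
            if not still_connected(adj, s, t, set(removed)):
                return size
    return None                                  # s and t are adjacent

# Hypothetical graph: two routes from 1 to 6, and node 7 hanging off node 6.
g = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 6],
     5: [3, 6], 6: [4, 5, 7], 7: [6]}
print(independent_paths(g, 1, 6), smallest_cut_set(g, 1, 6))
print(independent_paths(g, 1, 7), smallest_cut_set(g, 1, 7))
```

The brute-force search is exponential in the number of nodes and is only there to make the comparison explicit on a toy graph.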

Network Connections: Connectivity Some graph-theoretic properties of k-components:
1) Every member of a k-component must have at least k ties. If a person has fewer than k ties, then there would be fewer than k paths connecting them to the rest of the network.
2) A graph where every person has k ties is not necessarily a k-component. That is, (1) does not work in reverse. Structures can have high degree, but low connectivity.
3) Two k-components can overlap by at most k-1 members. If the k-components overlap by more than k-1 members, then there would be at least k paths connecting the two components, and they would be a single k-component.
4) A clique is n-1 connected.
5) k-components can be nested, such that a k+l component is contained within a k-component.

Network Connections: Connectivity Nested connectivity sets: An operationalization of embeddedness. [Figure: example graph on nodes 1 through 23]

Network Connections: Connectivity “Embeddedness” refers to the fact that economic action and outcomes, like all social action and outcomes, are affected by actors’ dyadic (pairwise) relations and by the structure of the overall network of relations. As a shorthand, I will refer to these as the relational and the structural aspects of embeddedness. The structural aspect is especially crucial to keep in mind because it is easy to slip into “dyadic atomization,” a type of reductionism. (Granovetter 1992:33, italics in original)

Network Connections: Connectivity Nested connectivity sets for the example graph G:
- G
  - {7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
    - {7, 8, 11, 14}
  - {1, 2, 3, 4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 23}
    - {1, 2, 3, 4, 5, 6, 7}
    - {17, 18, 19, 20, 21, 22, 23}

Network Connections: Connectivity Empirical Examples:
a) Embeddedness and School Attachment
b) Political similarity among Large American Firms

Network Connections: Connectivity School Attachment

Network Connections: Connectivity Business Political Action

Network Connections: Connectivity Theoretical Implications: Resource and Risk Flow. Structural cohesion increases the probability of diffusion in a network, particularly if flow depends on individual behavior (as opposed to edge capacity).

Network Connections: Connectivity Structural cohesion also provides a new way of thinking about STD cores. [Figure: Project 90, sex-only network (n=695); 3-component (n=58)]

Network Connections: Connectivity Connected Bicomponents, IV Drug Sharing. Largest BC: 247; k > 4: 318; Max k: 12. Structural cohesion simultaneously gives us a positional and subgroup analysis.

Network Connections: Connectivity Development of STD cores in low-degree networks: rapid transition without stars.

Network Connections: Connectivity

Network Connections: Distance, cohesion & Diffusion Probability of transfer by distance and number of paths, assuming a constant pij of 0.6. [Figure: probability of transfer (vertical axis) by path distance (horizontal axis, 2 to 6), with curves for 1, 2, 5, and 10 paths]
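One common way to compute curves of this kind, offered here as an assumption about the calculation rather than a statement of how the figure was produced: if every tie transmits independently with probability p, a single path of length d delivers the good with probability p^d, and k independent paths of that length deliver it with probability 1 - (1 - p^d)^k.

```python
def transfer_probability(p, distance, paths):
    """Chance that at least one of several independent paths transmits.

    Assumes each tie transmits independently with probability p and that
    all paths have the same length and share no intermediate nodes.
    """
    single_path = p ** distance                # every tie on the path must fire
    return 1 - (1 - single_path) ** paths      # at least one path succeeds

p = 0.6  # constant pairwise transmission probability, as on the slide
for paths in (1, 2, 5, 10):
    row = [round(transfer_probability(p, d, paths), 3) for d in range(2, 7)]
    print(paths, "paths:", row)
```

The probability falls quickly with distance on a single path and is propped up by additional independent paths, which is the link between cohesion and diffusion.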

Network Connections: Distance, cohesion & Diffusion Clustering and diffusion. [Figure: two example networks. First: arcs: 11, largest component: 12, clustering: 0. Second: arcs: 11, largest component: 8, clustering: 0.205.] Clustering turns network paths back on already identified nodes. This has been well known since at least Rapoport, and is a key feature of the “Biased Network” models in sociology.

Network Connections: Distance, cohesion & Diffusion

Network Connections: Distance, cohesion & Diffusion Define a general measure of the “diffusion susceptibility” of a graph as the ratio of the area under the observed curve to the area under the random curve. As this ratio falls below 1.0, you get effectively slower median transmission.

Network Connections: Distance, cohesion & Diffusion

Network Connections: Centrality Distance & Connectivity measures “locate” a node based on particular features of the path structure, but there are many other ways of locating nodes in networks. Centrality refers to (one dimension of) location, identifying where an actor resides in a network. For example, we can compare actors at the edge of the network to actors at the center. In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders. As a terminology point, some authors distinguish centrality from prestige based on the directionality of the tie. Since the formulas are the same in every other respect, I stick with “centrality” for simplicity.

Network Connections: Centrality Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated, but substantively we often have reason to believe that people at the center are very important. The standard centrality measures capture a wide range of “importance” in a network:
- Degree
- Closeness
- Betweenness
- Eigenvector / Power measures
After discussing these, I will describe measures that combine features of each of them.

Network Connections: Centrality The most intuitive notion of centrality focuses on degree. Degree is the number of direct contacts a person has. The idea is that the actor with the most ties is the most important:

Network Connections: Centrality In a simple random graph, G(n,p), degree will have an approximately Poisson distribution, and the nodes with high degree are likely to be at the intuitive center. Deviations from a Poisson distribution suggest non-random processes, which is at the heart of current “scale-free” work on networks (see below).

Network Connections: Centrality Degree centrality, however, can be deceiving, because it is a purely local measure.

Network Connections: Centrality If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality. The simple option is the variance of the individual centrality scores. Alternatively, use Freeman’s general formula for centralization (which ranges from 0 to 1). UCINET, SPAN, PAJEK and most other network software will calculate these measures.
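Both dispersion measures can be sketched for degree centrality. The code assumes the usual form of Freeman's degree centralization for an undirected graph with no self-ties, namely the sum of (largest degree minus each degree) divided by (n - 1)(n - 2), the value that sum takes in a star; the variance is taken as the population variance. The star and ring graphs are hypothetical examples.

```python
def degree_centralization(adj):
    """Return (Freeman degree centralization, variance of degree)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    top = max(degrees)
    # Denominator is the largest possible sum of differences (a star graph).
    freeman = sum(top - d for d in degrees) / ((n - 1) * (n - 2))
    mean = sum(degrees) / n
    variance = sum((d - mean) ** 2 for d in degrees) / n   # population variance
    return freeman, variance

# Hypothetical 8-node star (node 0 is the hub) and 8-node ring.
star = {0: list(range(1, 8)), **{i: [0] for i in range(1, 8)}}
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(degree_centralization(star))
print(degree_centralization(ring))
```

The star is maximally centralized and the ring, where everyone has the same degree, scores zero on both measures.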

Network Connections: Centrality Degree Centralization Scores [Figure: four example graphs. Freeman: 0.0, Variance: 0.0; Freeman: 1.0, Variance: 3.9; Freeman: .02, Variance: .17; Freeman: .07, Variance: .20]

Network Connections: Centrality A second measure of centrality is closeness centrality. An actor is considered important if he/she is relatively close to all other actors. Closeness is based on the inverse of the distance of each actor to every other actor in the network. Closeness Centrality: Normalized Closeness Centrality
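A minimal sketch of closeness, assuming the standard definitions for a connected, undirected graph: closeness is 1 divided by the sum of an actor's distances to all others, and normalized closeness multiplies that by n - 1 so that an actor adjacent to everyone scores 1. The 5-node star and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Geodesic distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(adj, node, normalized=False):
    """Inverse of the summed distance from node to all other nodes.

    Assumes the graph is connected; otherwise some distances are missing.
    """
    total = sum(bfs_distances(adj, node).values())
    score = 1 / total
    return score * (len(adj) - 1) if normalized else score

# Hypothetical 5-node star with hub "h".
star = {"h": ["a", "b", "c", "d"], "a": ["h"], "b": ["h"], "c": ["h"], "d": ["h"]}
print(closeness(star, "h"), closeness(star, "h", normalized=True))
print(closeness(star, "a"), closeness(star, "a", normalized=True))
```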

Network Connections: Centrality Closeness Centrality in the examples [Figure: two example graphs, each with columns for distance, closeness, and normalized closeness]

Network Connections: Centrality Closeness Centrality in the examples [Figure: example graph with columns for distance, closeness, and normalized closeness]

Network Connections: Centrality Betweenness Centrality: Model based on communication flow: A person who lies on communication paths can control communication flow, and is thus important. Betweenness centrality counts the number of shortest paths between i and k that actor j resides on. [Figure: example graph on nodes a through h]

Network Connections: Centrality Betweenness Centrality: Where gjk = the number of geodesics connecting jk, and gjk(ni) = the number that actor i is on. Usually normalized by:
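With the symbols defined above, the standard betweenness score for actor i is the sum over pairs j < k of gjk(ni) / gjk, and for an undirected graph with n actors it is usually normalized by (n - 1)(n - 2) / 2; this is assumed here to be the formula the slide displayed. The sketch below computes it directly from shortest-path counts. It is a straightforward illustration, not the faster algorithm that network packages use, and the example graphs are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, start):
    """Distances from start and the number of geodesics to each node."""
    dist, sigma = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:         # u lies on a geodesic to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj, normalized=True):
    """Share of geodesics between other pairs that pass through each node."""
    info = {n: bfs_counts(adj, n) for n in adj}
    score = {n: 0.0 for n in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue                           # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    if normalized:
        n = len(adj)
        scale = (n - 1) * (n - 2) / 2
        score = {node: value / scale for node, value in score.items()}
    return score

# Hypothetical examples: a 5-node star and a 4-node ring.
star = {"h": ["a", "b", "c", "d"], "a": ["h"], "b": ["h"], "c": ["h"], "d": ["h"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(betweenness(star))
print(betweenness(ring, normalized=False))
```

In the ring, each opposite pair is joined by two geodesics, so each intermediate actor receives half credit for that pair.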

Network Connections: Centrality Betweenness Centrality: Centralization: 1.0 Centralization: .59 Centralization: 0 Centralization: .31

Network Connections: Centrality Betweenness Centrality: Centralization: .183

Network Connections: Centrality Information Centrality: It is quite likely that information can flow through paths other than the geodesic. The Information Centrality score uses all paths in the network, and weights them based on their length.

Network Connections: Centrality Graph Theoretic Center (Barycenter or Jordan Center). Identify the points with the smallest maximum distance to all other points. Value = longest distance to any other node. The graph theoretic center is ‘3’, but you might also consider a continuous measure, such as the inverse of the maximum geodesic.
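A short sketch of the “smallest maximum distance” rule, assuming a connected undirected graph: each node's value is its longest geodesic to any other node, and the center is the set of nodes with the smallest such value. The 5-node line graph is a hypothetical example, not the graph on the slide.

```python
from collections import deque

def bfs_distances(adj, start):
    """Geodesic distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def graph_center(adj):
    """Return (center nodes, each node's longest distance to any other)."""
    longest = {n: max(bfs_distances(adj, n).values()) for n in adj}
    smallest = min(longest.values())
    center = {n for n, value in longest.items() if value == smallest}
    return center, longest

# Hypothetical line graph 0 - 1 - 2 - 3 - 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
center, longest = graph_center(line)
print(center, longest)
```

The continuous variant mentioned above would simply report 1 / longest[n] for each node.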

Network Connections: Centrality Information Centrality:

Network Connections: Centrality Comparing across these 3 centrality values: Generally, the 3 centrality types will be positively correlated. When they are not (or only weakly) correlated, it probably tells you something interesting about the network.
- High degree, low closeness: Embedded in a cluster that is far from the rest of the network.
- High degree, low betweenness: Ego's connections are redundant; communication bypasses him/her.
- High closeness, low degree: Key player tied to important/active alters.
- High closeness, low betweenness: Probably multiple paths in the network; ego is near many people, but so are many others.
- High betweenness, low degree: Ego's few ties are crucial for network flow.
- High betweenness, low closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.

Network Connections: Centrality Bonacich Power Centrality: An actor’s centrality (prestige) is a function of the prestige of those they are connected to. Thus, actors who are tied to very central actors should have higher prestige/centrality than those who are not.
- a is a scaling vector, which is set to normalize the score.
- b reflects the extent to which you weight the centrality of people ego is tied to.
- R is the adjacency matrix (can be valued).
- I is the identity matrix (1s down the diagonal).
- 1 is a matrix of all ones.

Network Connections: Centrality Bonacich Power Centrality: The magnitude of b reflects the radius of power. Small values of b weight local structure, larger values weight global structure. If b is positive, then ego has higher centrality when tied to people who are central. If b is negative, then ego has higher centrality when tied to people who are not central. As b approaches zero, you get degree centrality.
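Using the symbols defined for this measure, the usual statement of Bonacich power is C(a, b) = a (I - bR)^-1 R 1. The sketch below assumes that form, treats a as a single scaling constant (left at 1, so scores are not normalized), and solves the linear system (I - bR) c = R1 with a small Gauss-Jordan routine instead of inverting the matrix. The 3-node line graph and the function names are hypothetical.

```python
def solve(a, y):
    """Solve the linear system a x = y by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]                # partial pivoting
        for row_i in range(n):
            if row_i != col:
                factor = m[row_i][col] / m[col][col]
                m[row_i] = [x - factor * p for x, p in zip(m[row_i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def bonacich_power(r, b, a=1.0):
    """Bonacich power scores a * (I - bR)^-1 * R * 1 for adjacency matrix r."""
    n = len(r)
    row_sums = [sum(row) for row in r]                     # R1, i.e. degree
    system = [[(1.0 if i == j else 0.0) - b * r[i][j] for j in range(n)]
              for i in range(n)]                           # I - bR
    return [a * x for x in solve(system, row_sums)]

# Hypothetical line graph 0 - 1 - 2.
r = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(bonacich_power(r, 0.0))     # b = 0 reduces to degree
print(bonacich_power(r, 0.5))     # positive b rewards ties to central actors
print(bonacich_power(r, -0.5))    # negative b rewards ties to weak actors
```

The scores are only meaningful when the magnitude of b is below the reciprocal of the largest eigenvalue of R, which holds for the values used here.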

Network Connections: Centrality Bonacich Power Centrality: b = 0.23

Network Connections: Centrality Bonacich Power Centrality: b=-.35 b=.35

Network Connections: Centrality Bonacich Power Centrality: b=.23 b=-.23

Network Connections: Centrality In recent work, Borgatti (2003; 2005) discusses centrality in terms of two key dimensions:
          | Radial                                   | Medial
Frequency | Degree centrality, Bon. power centrality | Betweenness
Distance  | Closeness centrality                     | (empty: but would be an interruption measure based on distance)

Network Connections: Centrality Substantively, the key question for centrality is knowing what is flowing through the network (Borgatti 2003; 2005). The key features are:
- Whether the actor retains the good to pass to others (information, diseases) or passes the good and then loses it (physical objects).
- Whether the key factor for spread is distance (disease with low pij) or multiple sources (information).
The off-the-shelf measures do not always match the social process of interest, so researchers need to be mindful of this.

Network Connections: Centrality Actors that appear very different when seen individually, are comparable in the global network. Graph is 27% centralized (Node size proportional to betweenness centrality )

Network Connections: Centrality Centrality example: Add Health. Node size proportional to betweenness centrality. Graph is 45% centralized.

Network Topology: Centrality and Centralization Measures research:
- Rothenberg, et al. “Choosing a Centrality Measure: Epidemiologic Correlates in the Colorado Springs Study of Social Networks.” Social Networks: Special Edition on Social Networks and Infectious Disease: HIV/AIDS 17: Found that the HIV-positive actors were not central to the overall network.
- Bell, D. C., J. S. Atkinson, and J. W. Carlson. “Centrality Measures for Disease Transmission Networks.” Social Networks 21:1-21. Using a data-based simulation on 22 people, found that simple degree measures were adequate, relative to complexity.
- Poulin, R., M.-C. Boily, and B. R. Masse. “Dynamical Systems to Define Centrality in Social Networks.” Social Networks 22: Method that allows one to compare across non-connected portions of a network, applied to a network of 40 people with AIDS.

Network Connections: Network Dynamics Two factors that affect network flows: Topology, the shape or form of the network (simple example: one actor cannot pass information to another unless they are either directly or indirectly connected). Time, since the timing of contacts matters (simple example: an actor cannot pass information he has not yet received).

Network Connections: Network Dynamics The Cocktail Party Problem Imagine a typical ‘mixer’ party, where one of the guests knows a bit of gossip that everyone would like to know. Assuming that people tell this gossip to the people they meet at the party: How many people would eventually hear the gossip? How long would it take to spread through the group?

Network Connections: Network Dynamics The Cocktail Party Problem Some specifics to narrow down the problem. 30 people invited, party lasts an hour. At any given moment in time, you can only carry on a conversation with 3 other people Guests mingle well – they spend a short time talking to most people, but a long time to a small number (such as their date). Mingling is somewhat space-based – you talk to the people you bump into, then move on to someone else after a short time. The bit of gossip moves instantaneously across connected sets (so time-to-diffuse=0).

Network Connections: Network Dynamics A (seemingly) simple network problem: record who talks to whom, and map the network. Mean distance: 1.99 Diameter: 4 steps

Network Connections: Network Dynamics But such an image conflates many temporally distinct events. A more accurate image is something like this: In general, the graphs over which diffusion happens often have timed edges, have nodes that enter and leave, have edges that re-occur multiple times, and have edges that are concurrent. These features break transmission paths, generally lowering diffusion potential, and open a host of interesting questions about the intersection of structure and time in networks.

Network Connections: Network Evolution Timing in networks A focus on contact structure has often slighted the importance of network dynamics, though a number of recent pieces are addressing this. Time affects networks in two important ways: 1) The structure itself evolves, in ways that will affect the topology and thus flow. Wasserheit and Aral, “The dynamic topology of Sexually Transmitted Disease Epidemics” The Journal of Infectious Diseases 74:S201-13 Rothenberg, et al “Using Social Network and Ethnographic Tools to Evaluate Syphilis Transmission” Sexually Transmitted Diseases 25: 2) The timing of contact constrains flow. Moody 2002, Social Forces; Morris and Kretzschmar, 1995

Network Connections: Network Evolution Sexual Relations among A syphilis outbreak Rothenberg et al map the pattern of sexual contact among youth involved in a Syphilis outbreak in Atlanta over a one year period. (Syphilis cases in red) Jan - June, 1995

Sexual Relations among A syphilis outbreak
July-Dec, 1995

Network Connections: Network Evolution Drug Relations, Colorado Springs, Year 1 Data on drug users in Colorado Springs, over 5 years

Network Connections: Network Evolution Drug Relations, Colorado Springs, Year 2 Current year in red, past relations in gray

Network Connections: Network Evolution Drug Relations, Colorado Springs, Year 5 Current year in red, past relations in gray 616 actors eventually part of the largest component Diameter is 13 steps There are a number of major bi-components, the largest having 214 members.

Network Connections: Network Evolution How do we analyze change in networks over time? a) Descriptive techniques (change in measures over time) b) Visualization c) Network statistical models (SIENA, see below under models)

Network Connections: Social Balance One of the best theoretical approaches to understanding change in networks over time is to ask how the current relational patterns are likely to affect future relations. That is, make relational change endogenous. There are many models that do this, but the most famous for affective relations is social balance. Other models include: Preferential attachment, “the rich get richer” (Barabasi); Avoiding asymmetry (Gould); Avoiding close past relations (cycles of 4) (Bearman, Moody & Stovel); Development of hierarchy (Ivan Chase).

Network Connections: Social Balance Social Balance & Transitivity We determine balance based on the product of the edges:
“A friend of a friend is a friend”: (+)(+)(+) = (+) Balanced
“An enemy of my enemy is a friend”: (-)(+)(-) = (+) Balanced
“An enemy of my enemy is an enemy”: (-)(-)(-) = (-) Unbalanced
“A friend of a friend is an enemy”: (+)(-)(+) = (-) Unbalanced
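The product rule can be sketched in a few lines of Python. The function name and the +1 / -1 coding of ties are assumptions made for illustration.

```python
def triad_balance(sign_ab, sign_bc, sign_ac):
    """Classify a signed triad by the product of its three edge signs.

    Each sign is +1 (friend) or -1 (enemy). A positive product means
    the triad is balanced; a negative product means it is unbalanced.
    """
    product = sign_ab * sign_bc * sign_ac
    return "balanced" if product > 0 else "unbalanced"

# The four cases from the slide.
friend_of_friend_is_friend = triad_balance(+1, +1, +1)
enemy_of_enemy_is_friend = triad_balance(-1, +1, -1)
enemy_of_enemy_is_enemy = triad_balance(-1, -1, -1)
friend_of_friend_is_enemy = triad_balance(+1, -1, +1)
```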

Network Connections: Social Balance Heider argued that unbalanced triads would be unstable: They should transform toward balance. (Figure: unbalanced signed triads and the changes that balance them, labeled “Become Friends” and “Become Enemies”.)

Network Connections: Social Balance If such a balancing process were active throughout the graph, all intransitive triads would be eliminated from the network. This would result in one of two possible graphs (Balance Theorem): (1) Complete Clique: everyone is friends with everyone else; (2) Balanced Opposition: two groups, whose members are friends with those in their own group and enemies with those in the other group.

Network Connections: Social Balance Empirically, we often find that graphs break up into more than two groups. What does this imply for balance theory? It turns out that if you allow all-negative triads, you can get a graph with many clusters. That is, instead of treating (-)(-)(-) as a forbidden triad, treat it as allowed. This implies that the micro rule is different: negative ties among enemies are not as motivating as positive ties.

Network Connections: Social Balance Empirically, we also rarely have symmetric relations (at least on affect), thus we need to identify balance in directed relations. Directed dyads can be in one of three states: 1) Mutual 2) Asymmetric 3) Null. Every triad is composed of 3 dyads, and we can identify triads based on the number of each type, called the MAN label system.
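A minimal sketch of the MAN count for a single triad, assuming directed ties are stored as a set of (sender, receiver) pairs. The function names and the example ties are hypothetical, and the sketch returns only the three digits, not the D/U/C/T suffix that distinguishes triads with the same counts.

```python
from itertools import combinations

def dyad_state(arcs, i, j):
    """Return 'M' (mutual), 'A' (asymmetric) or 'N' (null) for the pair i, j."""
    ij, ji = (i, j) in arcs, (j, i) in arcs
    if ij and ji:
        return "M"
    if ij or ji:
        return "A"
    return "N"

def man_label(arcs, triad):
    """Count the mutual, asymmetric and null dyads among three actors."""
    states = [dyad_state(arcs, i, j) for i, j in combinations(triad, 2)]
    return f"{states.count('M')}{states.count('A')}{states.count('N')}"

# Hypothetical ties: a->b, b->c, and a mutual tie between a and c.
arcs = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}
label = man_label(arcs, ("a", "b", "c"))
empty_label = man_label(set(), ("a", "b", "c"))
```

Here `label` is "120" (one mutual, two asymmetric, no null dyads) and `empty_label` is "003".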

Network Connections: Social Balance Balance in directed relations Actors seek out transitive relations, and avoid intransitive relations. A triple is transitive if: i → j and j → k, then i → k. Transitivity is a property of triples within triads and assumes directed relations. The saliency of a triad may differ for each actor, depending on their position within the triad.

Network Connections: Social Balance Once we admit directed relations, we need to decompose triads into their constituent triples. Ordered triples for triad 120C:
a → b → c; a → c: Transitive
a → c → b; a → b: Vacuous
b → a → c; b → c: Vacuous
b → c → a; b → a: Intransitive
c → a → b; c → b: Intransitive
c → b → a; c → a: Vacuous
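The decomposition into ordered triples can be sketched as follows. The tie set for 120C (a → b, b → c, plus a mutual tie between a and c) is an assumption, chosen because it is the configuration consistent with the six labels listed on the slide; the function names are hypothetical.

```python
from itertools import permutations

def classify_triple(arcs, i, j, k):
    """Classify the ordered triple (i, j, k).

    If i -> j and j -> k both hold, the triple is transitive when i -> k
    also holds and intransitive otherwise. If either premise is missing,
    the triple is vacuous.
    """
    if (i, j) in arcs and (j, k) in arcs:
        return "transitive" if (i, k) in arcs else "intransitive"
    return "vacuous"

def triple_census(arcs, triad):
    """Classify all six ordered triples of a triad."""
    return {(i, j, k): classify_triple(arcs, i, j, k)
            for i, j, k in permutations(triad, 3)}

# Assumed ties for triad 120C: a->b, b->c, and a<->c.
arcs_120c = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}
census = triple_census(arcs_120c, ("a", "b", "c"))
```

Under that assumption the census gives one transitive, two intransitive, and three vacuous triples.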

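Network Connections: Social Balance Network Sub-Structure: Triads. The 16 triad types, arranged by number of ties:
(0) 003
(1) 012
(2) 102, 021D, 021U, 021C
(3) 111D, 111U, 030T, 030C
(4) 201, 120D, 120U, 120C
(5) 210
(6) 300
(In the figure, triads are marked as Intransitive, Transitive, or Mixed.)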

Network Connections: Social Balance An Example of the triad census (Table: triad type by number of triads, with the sum over types 2 - 16.) Pajek & SPAN will give you the triad census.

Network Connections: Social Balance As with undirected graphs, you can use the type of triads allowed to characterize the total graph. But now the potential patterns are much more diverse 1) All triads are 030T: A perfect linear hierarchy.

Network Connections: Social Balance Triads allowed: {300, 102} (Figure: image matrix with blocks labeled M, N*, and 1.)

Network Connections: Social Balance Cluster Structure, allows triads: {003, 300, 102} Eugene Johnsen (1985, 1986) specifies a number of structures that result from various triad configurations. (Figure: image matrix with M blocks on the diagonal and N* blocks off the diagonal.)

Network Connections: Social Balance Ranked Cluster: PRC {300, 102, 003, 120D, 120U, 030T, 021D, 021U} (Figure: image matrix with blocks labeled M, N*, A*, and 1.) And many more...

Network Connections: Social Balance Substantively, specifying a set of triads defines a behavioral mechanism, and we can use the distribution of triads in a network to test whether the hypothesized mechanism is active. We do this by (1) counting the number of each triad type in a given network and (2) comparing it to the expected number, given some random distribution of ties in the network. See Wasserman and Faust, Chapter 14 for computation details, and the SPAN manual for SAS code that will generate these distributions, if you so choose.

Network Connections: Social Balance Structural Indices based on the distribution of triads The observed distribution of triads can be fit to the hypothesized structures using weighting vectors for each type of triad: $\tau(l) = \dfrac{l'T - l'\mu_T}{\sqrt{l'\Sigma_T l}}$ Where: l = 16 element weighting vector for the triad types; T = the observed triad census; $\mu_T$ = the expected value of T; $\Sigma_T$ = the variance-covariance matrix for T

Network Connections: Social Balance For the Add Health data, the observed distribution of the tau statistic for various models was: (figure) Indicating that a ranked-cluster model fits the best.

Network Connections: Social Balance So far, the structural features of a network focus on the graph ‘at equilibrium.’ That is, we have hypothesized structures once people have made all the choices they are going to make. What we have not done, is really look closely at the implication of changing relations. That is, we might say that triad 030C should not occur, but what would a change in this triad imply from the standpoint of the actor making a relational change?

Network Connections: Social Balance (Figure: network of possible transitions among the 16 triad types. Transitions are marked as vacuous, increasing # transitive, decreasing # intransitive, decreasing # transitive, or increasing # intransitive; triads are marked as no-prediction, intransitive, or transitive.) Some transitions will both increase transitivity & decrease intransitivity. The effects are independent; they are colored here for net balance.

Network Connections: Social Balance Observed triad transition patterns, from Sorensen and Hallinan (1976) (Figure: transition network among the 16 triad types.)

Network Connections: Social Balance At the micro level, we can ask how different rules about individual behavior affect which paths in the overall network of triad states will be preferred. This gives us the first step in understanding which features can simultaneously create coordinated (rule-based) action while maintaining a fluid state-space. Example: Favor transitions that avoid intransitivity. (Figure: transition network among the 16 triad types.)

Dynamic Social Balance
Triad-Transition models on observed data The triad transition model can be tested on observed graphs within the ERG framework by specifying the triad-transition counts weighted by the number of transitive and intransitive triples that would be created in each transition. In addition to the triad transition parameters, the model includes parameters for dyadic attributes, individual expansiveness and attractiveness, and reciprocity and school activities.

Triad-Transition models on observed data (Figure: standardized coefficients from an Exponential Random Graph Model, on a scale from -0.6 to 0.8. Panels: Endogenous; Focal Orgs.; Dyadic Similarity/Distance. Terms shown: SES, GPA, Fight, College, Drinking, Transitivity, Same Clubs, Same Sex, Intransitivity, Reciprocity, Same Grade, Same Race, Both Smoke.)


Network Connections: Social Balance Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Reanalyzes the Newcomb fraternity data, to look at changes in social balance over time. The basic balance theory hypothesis is that people who find themselves in an unbalanced position should change their relations to generate balance. Hypothetically, this should lead to greater balance over time. After discussing a set of problems imposed because the data are forced ranks, they first look at simple reciprocity.


Network Connections: Social Balance Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. (Figure: Relational Stability. Y-axis: % change in ties, 10 to 40; x-axis: week, 1 to 14.)

Network Connections: Social Balance Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. In addition to the simple degree of transitivity, they want to measure whether the structure as a whole conforms to the prediction of structural balance. They identify groups by partitioning the network to minimize the number of negative ties within groups and the number of positive ties between groups (this algorithm is implemented in PAJEK). They can then measure structural imbalance as the sum of departures from structural balance (2 and only 2 groups) and generalized balance (greater than 2 groups).
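The quantity being minimized can be sketched as a simple count for one candidate partition. This is only the scoring step, not the partitioning search that PAJEK performs; the function name, the tie coding (+1 / -1), and the example data are hypothetical.

```python
def imbalance(signed_ties, group_of):
    """Count ties that are inconsistent with a given partition.

    signed_ties maps (i, j) pairs to +1 or -1; group_of maps each actor
    to a group label. A tie is inconsistent when it is negative within
    a group or positive between groups.
    """
    negative_within = 0
    positive_between = 0
    for (i, j), sign in signed_ties.items():
        same_group = group_of[i] == group_of[j]
        if sign < 0 and same_group:
            negative_within += 1
        elif sign > 0 and not same_group:
            positive_between += 1
    return negative_within + positive_between

# Hypothetical signed network among four actors.
ties = {("a", "b"): +1, ("c", "d"): +1, ("a", "c"): -1,
        ("b", "d"): -1, ("b", "c"): +1}
two_pairs = {"a": 1, "b": 1, "c": 2, "d": 2}
one_vs_three = {"a": 1, "b": 2, "c": 2, "d": 2}
score_pairs = imbalance(ties, two_pairs)
score_one_vs_three = imbalance(ties, one_vs_three)
```

A search procedure would compare scores like these across many partitions and keep the lowest; here the two-pairs partition scores 1 and the one-versus-three partition scores 2.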

Network Connections: Social Balance Extent of Structural Imbalance (Figure: structural imbalance and generalized imbalance, on a scale from 9 to 17, by week, 1 to 15.)

Network Connections: Social Balance Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. They point out that the dynamic action of individuals had group implications, which is part of what makes balance so attractive. “…the micro-level processes can be viewed as generating social forces that move the structure toward group balance.” They also point out that negative ties within groups are likely less tolerated than positive ties between groups, as negatives within group may threaten the group in ways that positive ties between groups do not.

Network Connections: Time Constraint What impact does this kind of timing have on flow through a network? Sequence constrains reachability: relations have to be reachable through time. The most dramatic effect occurs with the distinction between concurrent and serial relations. Relations are concurrent whenever an actor has more than one sex partner during the same time interval. Concurrency is dangerous for disease spread because: a) compared to serially monogamous couples, an STD is not trapped inside a single dyad; b) the STD can travel in two directions, through ego, to either of his/her partners at the same time.

Network Connections: Time Constraint Concurrency and Epidemic Size, Morris & Kretzschmar (1995) (Figure: epidemic size, y-axis 400 to 1200, x-axis 1 to 7, with series for Monogamy, Disassortative, Random, and Assortative mixing.) Population size is 2000, simulation ran over 3 ‘years’

Network Connections: Time Constraint Concurrency and disease spread Variable Constant Concurrent K2 Degree Correlation Bias Coefficient 84.18 357.07 440.38 982.31 Adjusting for other mixing patterns: Each .1 increase in concurrency results in 45 more positive cases

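Network Connections: Time Constraint What impact does timing have on flow through the network? (Figure: hypothetical contact network among actors A, B, C, D, E, and F, with edges labeled by contact periods 0 - 1, 2 - 5, 3 - 5, 3 - 7, and 8 - 9.) Numbers above lines indicate contact periods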

Network Connections: Time Constraint The path graph for the hypothetical contact network (Figure: path graph among actors A, B, C, D, E, and F.) While clearly important, this is not often handled well by current software.

Network Connections: Time Constraint Direct Contact Network of 8 people in a ring

Network Connections: Time Constraint Implied Contact Network of 8 people in a ring All relations Concurrent

Network Connections: Time Constraint Implied Contact Network of 8 people in a ring Mixed Concurrent 3 2 1 2 2 1 2 3

Network Connections: Time Constraint Implied Contact Network of 8 people in a ring Serial Monogamy (1) 8 1 7 2 6 3 5 4

Edge timing constraints on diffusion
Timing alone can change mean reachability from 1.0 when all ties are concurrent to 0.42. In general, ignoring time order is equivalent to assuming all relations occur simultaneously, that is, assuming perfect concurrency across all relations.
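To make the idea concrete, here is a minimal Python sketch of time-ordered reachability on an 8-person ring. It assumes each tie is undirected and active at a single time stamp, and that a good can cross ties in non-decreasing time order (so ties active at the same time can be chained). The function names and the serial timing used below are illustrative assumptions and are not meant to reproduce the 0.42 figure.

```python
import math

def temporal_reach(n, timed_edges, source):
    """Set of nodes the source can reach along time-respecting paths."""
    arrival = [math.inf] * n      # earliest time each node can hold the good
    arrival[source] = -math.inf   # the source holds it from the start
    changed = True
    while changed:                # relax edges until nothing improves
        changed = False
        for u, v, t in timed_edges:
            for a, b in ((u, v), (v, u)):
                if arrival[a] <= t < arrival[b]:
                    arrival[b] = t
                    changed = True
    return {j for j in range(n) if j != source and arrival[j] < math.inf}

def mean_reachability(n, timed_edges):
    """Share of ordered pairs (i, j) where i can reach j through time."""
    reached = sum(len(temporal_reach(n, timed_edges, i)) for i in range(n))
    return reached / (n * (n - 1))

ring = [(i, (i + 1) % 8) for i in range(8)]
concurrent = [(u, v, 1) for u, v in ring]                   # all ties at once
serial = [(u, v, k + 1) for k, (u, v) in enumerate(ring)]   # ties in sequence
reach_concurrent = mean_reachability(8, concurrent)
reach_serial = mean_reachability(8, serial)
```

With all ties concurrent every pair is mutually reachable; with the ties ordered one after another around the ring, some actors can no longer reach others even though the contact structure is identical.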

Network Connections: Time Constraint Identifying the Minimum Path Density of a Graph It turns out that the lowest generalized reach network (a minimum time-connectivity) is one where relations are ‘inter-woven’ in an “early-late-earlier” pattern. To identify the paths empirically, you must search all possible paths in the network.

Network Connections: Time Constraint Implications of Time-ordered Networks: 1) Any measure calculated on the adjacency structure that rests on reachability or flow may be misleading. 2) There are highly non-linear effects of changing the timing of a relation on total reachability. 3) Within connected components, time order may partition the network into reachable sub-groups. 4) Infection risk can be assessed on a continuum from complete concurrency to some minimum level of reachability in the network.


Network Connections: Time Constraint The distribution of paths is important for many of the measures we typically construct on networks, and these will all change if edge timing is considered: Centrality: closeness centrality, path centrality, information centrality, betweenness centrality. Network Topography: clustering, path distance. Groups & Roles: correspondence between degree-based position and reach-based position; structural cohesion & embeddedness; opportunities for time-based block-models (similar reachability profiles).

Network Connections: Time Constraint New versions of classic reachability measures: Temporal reach: The ij cell = 1 if i can reach j through time. Temporal geodesic: The ij cell equals the number of steps in the shortest path linking i to j over time. Temporal paths: The ij cell equals the number of time-ordered paths linking i to j. These will only equal the standard versions when all ties are concurrent.

Network Connections: Time Constraint Duration explicit measures Quickest path: The ij cell equals the shortest time within which i could reach j. Earliest path: The ij cell equals the real-clock time when i could first reach j. Latest path: The ij cell equals the real-clock time when i could last reach j. Exposure duration: The ij cell equals the longest (shortest) interval of time over which i could transfer a good to j. Each of these also implies different types of “betweenness” roles for nodes or edges, such as a “limiting time” edge, which would be the edge whose comparatively short duration places the greatest limits on other paths.
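As one possible reading of the “earliest path” measure, the sketch below computes the earliest clock time at which each actor could receive a good introduced at a source. It assumes undirected contacts that are active over a closed interval [start, end] and instantaneous transmission while a contact is active; the function name and the example contacts are hypothetical, not the network from the slides.

```python
import heapq
import math

def earliest_arrival(contacts, source, t0=0):
    """Earliest time each reachable actor could hold a good sent at t0."""
    neighbors = {}
    for u, v, start, end in contacts:
        neighbors.setdefault(u, []).append((v, start, end))
        neighbors.setdefault(v, []).append((u, start, end))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, math.inf):
            continue  # stale heap entry
        for v, start, end in neighbors.get(u, []):
            if t <= end:                 # contact has not ended yet
                t_v = max(t, start)      # wait until the contact begins
                if t_v < arrival.get(v, math.inf):
                    arrival[v] = t_v
                    heapq.heappush(heap, (t_v, v))
    return arrival

# Hypothetical chain A - B - C - D with contact periods on each tie.
contacts = [("A", "B", 0, 1), ("B", "C", 2, 5), ("C", "D", 0, 1)]
from_a = earliest_arrival(contacts, "A")
from_d = earliest_arrival(contacts, "D")
```

In this example A reaches C at time 2 but never reaches D, because the C - D contact has already ended; the same holds in reverse for D and A, so reachability differs from what the static chain suggests.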

Network Connections: Time Constraint Define time-dependent closeness as the inverse of the sum of the distances needed for an actor to reach others in the network.* Actors with high time-dependent closeness centrality are those that can reach others in few steps given temporal order. Note this is directed, since Dij ≠ Dji (in most cases) once you take time into account. *If i cannot reach j, I set the distance to n+1

Network Connections: Time Constraint Define fastness centrality as the average of the clock-time needed for an actor to reach others in the network: Actors with high fastness centrality are those that would reach the most people early. These are likely important for any “first mover” problem.

Network Connections: Time Constraint Define quickness centrality as the average of the minimum amount of time needed for an actor to reach others in the network: Where Tjit is the time that j receives the good sent by i at time t, and Tit is the time that i sent the good. This then represents the shortest duration between transmission and receipt between i and j. Note that this is a time-dependent feature, depending on when i “transmits” the good out into the population. The min is one of many functions, since the time-to-target speed is really a profile over the duration of t.

Network Connections: Time Constraint Define exposure centrality as the average of the amount of time that actor j is at risk to a good introduced by actor i. Where Tijl is the last time that j could receive the good from i and Tijf is the first time that j could receive the good from i, so the difference is the interval in time when j is at risk from i.

Network Connections: Time Constraint How do these centrality scores compare? Here I compare the duration-dependent measures to the standard measures on this example graph. Based only on the structure of the ties, this graph has lots of different centers, depending on closeness, betweenness or degree. In this graph, closeness and betweenness correlate at 0.64, closeness and degree at 0.56, and betweenness and degree at 0.71. Node size proportional to degree

Network Connections: Time Constraint How do these centrality scores compare? Here I compare the duration-dependent measures to the standard measures on this example graph. But these edges are timed, since publications occur at a particular date. Here I treat the edges as lasting between the first and last publication date, and animate the resulting network. Dark blue edges are active, past edges are “ghosted” onto the map. Make note of the fairly high concurrency (some of it necessary due to two-mode data).

Network Connections: Time Constraint How do these centrality scores compare? What is the relation between structural centrality and duration centrality? Here for the observed edge timings.

Network Connections: Time Constraint How do these centrality scores compare? (Figure: box plots of correlation w. closeness centrality, y-axis 0.2 to 0.6.) Box plots based on 500 permutations of the observed time durations, which holds constant the duration distribution and the number of edges active at any given time.

Network Connections: Time Constraint How do these centrality scores compare? The “most important actors” in the graph depend crucially on when they are active. The correlations can range wildly over the exact same contact structure. Concurrency is important, but not determinant (at least within the range studied here). We need to extend our intuition on the global distribution of time in the graph.

When is a network? Diffusion: What do networks do? At the graph level, we are interested in two properties immediately: a) The temporally implied reachability (perhaps relative to minimum). b) The asymmetry in reachability: what proportion of reachable dyads can mutually reach each other? These are directly relevant for overall diffusion potential in a network. Can we identify actor rules that account for this? That interact w. the structure of the graph?

Dynamic Networks : Open Problems There are a number of “open problems” w. dynamic networks. 1) Observation window lengths Source: Bender-deMoll & McFarland “The Art and Science of Dynamic Network Visualization” JoSS Forthcoming

Dynamic Networks : Open Problems There are a number of “open problems” w. dynamic networks. 2) Scale: How do we effectively visualize very large dynamic nets? Overlay points and lines with density contours. Works very well for networks that can be projected (fairly) well in 2 or 3 dimensions: sparse, strongly clustered, etc.

Dynamic Networks : Open Problems There are a number of “open problems” w. dynamic networks. 2) Scale: How do we effectively visualize very large dynamic nets? Replace points & lines w. 3D surfaces. Dynamically, this should give us a real “dancing landscape” (to borrow a phrase from McPherson).

When is a network? Structural Change: What are networks? Open Problems: Where does dynamic visualization & modeling of networks need to go? 3) Image “Fit”: What makes a scientifically useful dynamic layout? Two versions of the same dynamic data: “Poor” and “Good”

When is a network? Structural Change: What are networks? Open Problems: Where does dynamic visualization & modeling of networks need to go? 4) Groups and affiliation networks: how do we incorporate membership information in a meaningful way? Hyper “ellipses” are promising, but can overlap in very uninformative ways. 5) How much object information (shape, color, etc.) is useful before the graphs become unreadable? Some good work on information perception is coming out of the information sciences. 6) Can we link clear analytic features to the layout itself? Explicit graph “spaces” through statistical models (Hoff & Handcock on Latent Space models for graphs, for example).

Network Connections: Network Diffusion & Peer Influence Most of our interest in networks is in how things flow through the network, which brings us to questions about the diffusion of goods through networks. We have already seen the limits to diffusion through network connections and timing, but a number of studies focus on how network structure affects diffusion directly. These include questions about the diffusion of goods and ideas through a network as well as the outcomes of diffusion. Examples include: Spatial Diffusion Models Critical Mass Models Dyadic Contact models Peer Influence Models

Network Connections: Network Diffusion & Peer Influence Coleman, Katz and Menzel, “Diffusion of an innovation among physicians” Sociometry (1957) (Figure: cumulative % using “Gammanym” by week since introduction, weeks 2 to 18, with separate curves for physicians with 0 nominations, 1 - 2 nominations, and > 3 nominations.)

Network Connections: Network Diffusion & Peer Influence Attitudes are a function of two sources: a) Individual characteristics: Gender, Age, Race, Education, etc. (standard sociology). b) Interpersonal influences: Actors negotiate opinions with others.

Network Connections: Network Diffusion & Peer Influence Friedkin claims in his Structural Theory of Social Influence that the theory has four benefits: 1) It relaxes the simplifying assumption of actors who must either conform or deviate from a fixed consensus of others (public choice model). 2) It does not necessarily result in consensus, but can have a stable pattern of disagreement. 3) It is a multi-level theory: at the micro level, a cognitive theory about how people weigh and combine others’ opinions; at the macro level, concerned with how social structural arrangements enter into and constrain the opinion-formation process. 4) It allows an analysis of the systemic consequences of social structures.

Network Connections: Network Diffusion & Peer Influence Formal Model (1) $Y^{(1)} = XB$ (2) $Y^{(t)} = aWY^{(t-1)} + (1-a)Y^{(1)}$ Where: Y(1) = an N x M matrix of initial opinions on M issues for N actors; X = an N x K matrix of K exogenous variables that affect Y; B = a K x M matrix of coefficients relating X to Y; a = a weight of the strength of endogenous interpersonal influences; W = an N x N matrix of interpersonal influences

Network Connections: Network Diffusion & Peer Influence Formal Model (1) $Y^{(1)} = XB$ This is the standard sociology model for explaining anything: the General Linear Model. It says that a dependent variable (Y) is some function (B) of a set of independent variables (X). At the individual level, the model says that: $Y^{(1)}_{im} = \sum_{k} X_{ik} B_{km}$ Usually, one of the X variables is e, the model error term.

Network Connections: Network Diffusion & Peer Influence (2) $Y^{(t)} = aWY^{(t-1)} + (1-a)Y^{(1)}$ This part of the model taps social influence. It says that each person’s final opinion is a weighted average of their own initial opinions and the opinions of those they communicate with (which can include their own current opinions).

Network Connections: Network Diffusion & Peer Influence The key to the peer influence part of the model is W, a matrix of interpersonal weights. W is a function of the communication structure of the network, and is usually a transformation of the adjacency matrix. In general: Various specifications of the model change the value of wii, the extent to which one weighs their own current opinion and the relative weight of alters.

Network Connections: Network Diffusion & Peer Influence (Figure: example network of four actors, 1 - 4, with W shown under three self-weight specifications: Even, 2*self, and degree.)

Network Connections: Network Diffusion & Peer Influence Formal Properties of the model (2) $Y^{(t)} = aWY^{(t-1)} + (1-a)Y^{(1)}$ When interpersonal influence is complete (a = 1), the model reduces to: $Y^{(t)} = WY^{(t-1)}$ When interpersonal influence is absent (a = 0), the model reduces to: $Y^{(t)} = Y^{(1)}$

Network Connections: Network Diffusion & Peer Influence Formal Properties of the model If we allow the model to run over t, we can describe the model as: $Y^{(\infty)} = aWY^{(\infty)} + (1-a)XB$ The model is directly related to spatial econometric models: $Y = aWY + Xb + e$ Where the two coefficients (a and b) are estimated directly (See Doreian, 1982, SMR)
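Running the model over t can be sketched in plain Python, assuming the recursive form Y(t) = aWY(t-1) + (1-a)Y(1) with a single issue (M = 1) and a row-normalized W. The four-actor ring with equal self weights is a hypothetical network; the initial opinions (1, 3, 5, 7) and a = .8 follow the simple example in the slides.

```python
def row_normalize(weights):
    """Scale each row of a weight matrix so that it sums to 1."""
    return [[w / sum(row) for w in row] for row in weights]

def peer_influence(W, y_initial, a, iterations):
    """Iterate Y(t) = a * W * Y(t-1) + (1 - a) * Y(1) for one issue."""
    n = len(y_initial)
    y = list(y_initial)
    for _ in range(iterations):
        y = [a * sum(W[i][j] * y[j] for j in range(n))
             + (1 - a) * y_initial[i]
             for i in range(n)]
    return y

# Hypothetical ring of four actors; the diagonal gives each actor a self
# weight equal to the weight on each neighbor.
adjacency = [[1, 1, 0, 1],
             [1, 1, 1, 0],
             [0, 1, 1, 1],
             [1, 0, 1, 1]]
W = row_normalize(adjacency)
y1 = [1.0, 3.0, 5.0, 7.0]

partial = peer_influence(W, y1, a=0.8, iterations=7)    # some influence
complete = peer_influence(W, y1, a=1.0, iterations=50)  # influence only
absent = peer_influence(W, y1, a=0.0, iterations=7)     # no influence
```

With a = 1 opinions in this connected network converge to a common value, with a = 0 they stay at their initial values, and with a = .8 they narrow without reaching consensus, which matches the two limiting cases described for the model.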

Network Connections: Network Diffusion & Peer Influence Simple example: a network of four actors (1 - 4) with opinions Y = (1, 3, 5, 7) and a = .8. (Table: opinions at each iteration T.)

Network Connections: Network Diffusion & Peer Influence Simple example: a network of four actors (1 - 4) with opinions Y = (1, 3, 5, 7) and a = 1.0. (Table: opinions at each iteration T.)

Network Connections: Network Diffusion & Peer Influence Extended example: building intuition Consider a network with three cohesive groups, and an initially random distribution of opinions: (to run this model, use peerinfl1.sas)

Network Connections: Network Diffusion & Peer Influence Simulated Peer Influence: 75 actors, 2 initially random opinions, Alpha = .8, 7 iterations

Network Connections: Network Diffusion & Peer Influence Extended example: building intuition Consider a network with three cohesive groups, and an initially random distribution of opinions: Now weight in-group ties higher than between group ties

Network Connections: Network Diffusion & Peer Influence Simulated Peer Influence: 75 actors, 2 initially random opinions, Alpha = .8, 7 iterations, in-group tie: 2

Network Connections: Network Diffusion & Peer Influence Consider the implications for populations of different structures. For example, we might have two groups, a large orthodox population and a small heterodox population. We can imagine the groups mixing in various levels: Heterodox: 10 people Orthodox: 100 People Little Mixing Moderate Mixing Heavy Mixing

Network Connections: Network Diffusion & Peer Influence Heavy Light Moderate

Network Connections: Network Diffusion & Peer Influence Light mixing

Network Connections: Network Diffusion & Peer Influence Moderate mixing

Network Connections: Network Diffusion & Peer Influence High mixing

Network Connections: Network Diffusion & Peer Influence In an unbalanced situation (small group vs large group) the extent of contact can easily overwhelm the small group. Applications of this idea are evident in: Missionary work (Must be certain to send missionaries out into the world with strong in-group contacts) Overcoming deviant culture (I.e. youth gangs vs. adults) Work by Hyojung Kim (U Washington) focuses on the first of these two processes in social movement models

Network Connections: Network Diffusion & Peer Influence In extensions (Friedkin, 1998), Friedkin generalizes the model so that alpha varies across people. We can extend the basic model by (1) simply changing a to a vector (A), which then changes each person’s opinion directly, and (2) by linking the self weight (wii) to alpha. Where A is a diagonal matrix of endogenous weights, with 0 < aii < 1. A further restriction on the model sets wii = 1-aii. This leads to a great deal more flexibility in the theory, and some interesting insights. Consider the case of group opinion leaders with unchanging opinions (i.e. many people have high aii, while a few have low):

Network Connections: Network Diffusion & Peer Influence Peer Opinion Leaders Group 1 Leaders Group 2 Leaders Group 3 Leaders

Network Connections: Network Diffusion & Peer Influence Peer Opinion Leaders

Network Connections: Network Diffusion & Peer Influence Further extensions of the model might: (1) make a time dependent: people likely value others' opinions more early than later in a decision context; (2) interact a with XB: people's self weights are a function of their behaviors and attributes; (3) make W dependent on the structure of the network (weight transitive ties more heavily than intransitive ties, for example); (4) make W time dependent: the network of contacts does not remain constant but is dynamic, meaning that influence likely moves unevenly through the network. Others likely abound.

Network Connections: Network Diffusion & Peer Influence Testing the fit of the general model: identifying peer influence in real data. There are two general ways to test for peer influence in an observed network. The first estimates the parameters (a and b) of the peer influence model directly; the second transforms the network into a dyadic model, predicting similarity among actors. For details, see Doreian, 1982, Sociological Methods and Research; Ord (1976) JASA. Also Roger Gould (AJS, Paris Commune paper for example).

Network Connections: Network Diffusion & Peer Influence The basic model says that people's opinions are a function of the opinions of others and their own characteristics. WY is a simple vector that can be added to your model: multiply Y by the W matrix and run the regression with WY as a new variable. The regression coefficient on WY is an estimate of a. This is what Doreian calls the QAD estimate of peer influence.
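A minimal sketch of the QAD approach: form WY, add it as a regressor, and read its OLS coefficient as the estimate of a. It assumes the model Y = a*W*Y + XB + e; the simulated network, the parameter values, and the function names are illustrative assumptions.

```python
import numpy as np

def qad_estimate(W, y, X):
    """OLS of y on [1, Wy, X]. Returns (intercept, a_hat, b_hat...)."""
    wy = W @ y  # the network-lagged outcome
    design = np.column_stack([np.ones(len(y)), wy, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def simulate_peer_data(n=60, a=0.4, b=(1.0, 0.5), noise=1.0, seed=0):
    """Hypothetical data generated from y = a*W*y + b0 + b1*x + e."""
    rng = np.random.default_rng(seed)
    adj = (rng.random((n, n)) < 0.1).astype(float)
    adj = np.maximum(adj, adj.T)  # treat ties as symmetric
    np.fill_diagonal(adj, 0.0)
    row_sums = adj.sum(axis=1, keepdims=True)
    # Row-normalize; isolates keep an all-zero row.
    W = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    X = rng.normal(size=(n, 1))
    e = noise * rng.normal(size=n)
    xb = b[0] + b[1] * X[:, 0]
    # Reduced form: y = (I - aW)^{-1} (XB + e)
    y = np.linalg.solve(np.eye(n) - a * W, xb + e)
    return W, y, X

W, y, X = simulate_peer_data()
intercept, a_hat, b_hat = qad_estimate(W, y, X)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, intercept={intercept:.3f}")
```

Because WY is built from Y itself, the estimate is only exact when there is no error term; with noise it is the quick approximation the slide describes.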

Network Connections: Network Diffusion & Peer Influence The problem with the above regression is that cases are, by definition, not independent. In fact, the coefficient on WY is also known as the 'network autocorrelation' coefficient, since a 'peer influence' effect is an autocorrelation effect: your value is a function of the people you are connected to. In general, OLS is not the best way to estimate this equation. That is, QAD = Quick and Dirty, and your results will not be exact. In practice, the QAD approach (perhaps combined with a GLS estimator) results in empirical estimates that are “virtually indistinguishable” from MLE (Doreian et al, 1984). The proper way to estimate the peer equation is to use maximum likelihood estimates, and Doreian gives the formulas for this in his paper. The other way is to use non-parametric approaches, such as the Quadratic Assignment Procedure, to estimate the effects. Doreian, Patrick, Klaus Teuter and Chi-Hsein Wang “Network Autocorrelation Models: Some Monte Carlo Results” Sociological Methods and Research, 13:

Network Connections: Network Diffusion & Peer Influence An empirical example: peer influence in a graduate student network. Each person was asked to rank their satisfaction with the program, which is the dependent variable in this analysis. I constructed two W matrices, one from HELP, the other from Best Friend. I treat relations as symmetric and valued, such that: I also include Race (white/non-white), Gender, and Cohort Year as exogenous variables in the model. (To run the model, see osupeerpi1.sas.)

Network Connections: Network Diffusion & Peer Influence Distribution of Satisfaction with the department.

Network Connections: Network Diffusion & Peer Influence Parameter Estimates (columns: Variable, Parameter Estimate, Pr > |t|, Standardized Estimate; rows: Intercept, FEMALE, NONWHITE, y y y y, PEER_BF, PEER_H). If we treat this as an ordered logit instead of a continuous variable, the coefficient for help becomes significant at the .1 level. There are all kinds of statistical problems with this model. It is likely underspecified (there are things left out of the model) and there is no direct correction for autocorrelated errors. Recall, however, that the Monte Carlo work seems to suggest this will be a pretty robust measure in the face of different methods. Model R2 = .41, compared to .15 without the peer effects. An alternative is to use a QAP model (see below).

Network Connections: Network Diffusion & Peer Influence Peer influence through Dyad Models Another way to get at peer influence is not through the level of Y, but through the extent to which actors are similar with respect to Y. Recall the simulated example: peer influence is reflected in how close points are to each other.

Network Connections: Network Diffusion & Peer Influence Peer influence through Dyad Models The model is now expressed at the dyad level, where Y is a matrix of similarities, A is an adjacency matrix, and Xk is a matrix of similarities on attributes.

Network Connections: Network Diffusion & Peer Influence If we break the original peer influence model into its components, the attribute part of the model suggests that any two people with the same attribute should have the same value for Y. The peer influence model says that (a) if you and I are tied to each other, then we should have similar opinions and (b) if we are tied to many of the same people, then we should have similar opinions. We can test both of these (and many other dyadic properties) directly at the dyad level.

Network Connections: Network Diffusion & Peer Influence NODE ADJMAT SAMERCE SAMESEX

Network Connections: Network Diffusion & Peer Influence Y: 0.32, 0.59, 0.54, 0.50, 0.04, 0.02, 0.41, 0.01, -0.17. Distance: Dij = abs(Yi - Yj)

Network Connections: Network Diffusion & Peer Influence Obs SENDER RCVER SIM NOM SAMERCE SAMESEX

Network Connections: Network Diffusion & Peer Influence The REG Procedure Model: MODEL1 Dependent Variable: SIM Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept <.0001 NOM SAMERCE SAMESEX NCOMFND Note: Because everything in this model is symmetric, I run it with the statement: where sender > rcver; so I only get the top half of the distance matrix.
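A minimal sketch of how the dyad-level data set can be assembled and fit. The Y values are the nine shown on the earlier slide; the adjacency matrix and the race and sex codes are randomly generated placeholders, since the originals are not recoverable here. Column names follow the SAS output, with DIST standing in for the dependent variable and NCOMFND taken to mean the number of common friends (an assumption).

```python
import numpy as np

y = np.array([0.32, 0.59, 0.54, 0.50, 0.04, 0.02, 0.41, 0.01, -0.17])
n = len(y)

# Placeholder network and attributes (hypothetical, not the slide's data).
rng = np.random.default_rng(1)
adj = np.triu((rng.random((n, n)) < 0.3).astype(int), k=1)
adj = adj + adj.T  # symmetric ties
race = rng.integers(0, 2, n)
sex = rng.integers(0, 2, n)

def dyad_table(y, adj, race, sex):
    """One row per unordered pair (i < j), i.e. one half of the matrix."""
    i, j = np.triu_indices(len(y), k=1)
    common = adj @ adj  # entry (i, j) counts shared neighbors
    return {
        "SENDER": i + 1,
        "RCVER": j + 1,
        "DIST": np.abs(y[i] - y[j]),  # Dij = abs(Yi - Yj)
        "NOM": adj[i, j],
        "SAMERCE": (race[i] == race[j]).astype(int),
        "SAMESEX": (sex[i] == sex[j]).astype(int),
        "NCOMFND": common[i, j],
    }

table = dyad_table(y, adj, race, sex)
predictors = ["NOM", "SAMERCE", "SAMESEX", "NCOMFND"]
design = np.column_stack([np.ones(len(table["DIST"]))] + [table[p] for p in predictors])
coef, *_ = np.linalg.lstsq(design, table["DIST"], rcond=None)

for name, value in zip(["Intercept"] + predictors, coef):
    print(f"{name:>9}: {value: .3f}")
```

The coefficients here carry no substantive meaning because the network is random; the point is the reshaping from node-level to dyad-level data. The OLS standard errors would not be trustworthy for dyads in any case.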

Network Connections: Network Diffusion & Peer Influence Like the basic peer influence model, cases in a dyad model are not independent. However, the non-independence now comes from two sources: the fact that the same person is represented in (n-1) dyads and the fact that i and j are linked through relations. One of the best solutions to this problem is the Quadratic Assignment Procedure (QAP), a non-parametric procedure for significance testing. QAP runs the model of interest on the real data, then randomly permutes the rows/cols of the data matrix and estimates the model again. In so doing, it generates an empirical distribution of the coefficients.

Network Connections: QAP Comparing multiple networks: QAP The substantive question is how one set of relations (or dyadic attributes) relates to another. For example: Do marriage ties correlate with business ties in the Medici family network? Are friendship relations correlated with joint membership in a club?

Network Connections: QAP Assessing the correlation is straightforward, as we simply correlate each corresponding cell of the two matrices. Dyads: Marriage and Business matrices over 1 ACCIAIUOL, 2 ALBIZZI, 3 BARBADORI, 4 BISCHERI, 5 CASTELLAN, 6 GINORI, 7 GUADAGNI, 8 LAMBERTES, 9 MEDICI, 10 PAZZI, 11 PERUZZI, 12 PUCCI, 13 RIDOLFI, 14 SALVIATI, 15 STROZZI, 16 TORNABUON. Correlation:

Network Connections: QAP But is the observed value statistically significant? We can't use standard inference, since its independence assumptions are violated. Instead, we use a permutation approach. Essentially, we are asking whether the observed correlation is large (small) compared to that which we would get if the assignment of variables to nodes were random, but the interdependencies within variables were maintained. Do this by randomly sorting the rows and columns of the matrix, then re-estimating the correlation.

Network Connections: QAP Comparing multiple networks: QAP When you permute, you have to permute both the rows and the columns simultaneously to maintain the interdependencies in the data: ID ORIG A B C D E Sorted A D B C E

Network Connections: QAP Procedure: 1. Calculate the observed correlation. 2. For K iterations do: a) randomly sort one of the matrices, b) recalculate the correlation, c) store the outcome. 3. Compare the observed correlation to the distribution of correlations created by the random permutations.
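The procedure above can be sketched as follows. This is a minimal illustration, not UCINET's implementation: the function names, the default number of permutations, and the random example matrices are assumptions. Note that the same permutation is applied to rows and columns at once.

```python
import numpy as np

def offdiag(m):
    """Off-diagonal cells of a square matrix, as a flat vector."""
    mask = ~np.eye(m.shape[0], dtype=bool)
    return m[mask]

def matrix_corr(a, b):
    """Pearson correlation of corresponding off-diagonal cells."""
    return np.corrcoef(offdiag(a), offdiag(b))[0, 1]

def qap_correlation(a, b, k=2000, seed=0):
    rng = np.random.default_rng(seed)
    observed = matrix_corr(a, b)                  # step 1
    perms = np.empty(k)
    for t in range(k):                            # step 2
        p = rng.permutation(a.shape[0])
        a_perm = a[np.ix_(p, p)]                  # same order for rows and columns
        perms[t] = matrix_corr(a_perm, b)
    p_large = np.mean(perms >= observed)          # step 3
    p_small = np.mean(perms <= observed)
    return observed, p_large, p_small

# Hypothetical example: two related random symmetric binary networks.
rng = np.random.default_rng(42)
n = 16
base = np.triu((rng.random((n, n)) < 0.2).astype(float), k=1)
noise = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
net_a = base + base.T
net_b = np.clip(base + noise, 0, 1)
net_b = net_b + net_b.T

obs, p_large, p_small = qap_correlation(net_a, net_b)
print(f"observed r={obs:.3f}, P(large)={p_large:.4f}, P(small)={p_small:.4f}")
```

P(large) is the share of permuted correlations at least as large as the observed one, which plays the role of a one-tailed p-value.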

Network Connections: QAP

Network Connections: QAP This can be done simply in UCINET. QAP MATRIX CORRELATION Observed matrix: PadgBUS Structure matrix: PadgMAR # of Permutations: Random seed: Univariate statistics (PadgBUS, PadgMAR): 1 Mean, 2 Std Dev, 3 Sum, 4 Variance, 5 SSQ, 6 MCSSQ, 7 Euc Norm, 8 Minimum, 9 Maximum, 10 N of Obs. Hubert's gamma: Bivariate Statistics (Value, Signif, Avg, SD, P(Large), P(Small), NPerm): 1 Pearson Correlation, 2 Simple Matching, 3 Jaccard Coefficient, 4 Goodman-Kruskal Gamma, 5 Hamming Distance.

Network Connections: QAP Using the same logic, we can estimate alternative models, such as regressions, logits, probits, etc. The only complication is that you need to permute all of the independent matrices in the same way on each iteration.
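A minimal sketch of a QAP regression along these lines, with every independent matrix permuted by the same node ordering on each iteration. The function name, the permutation count, and the synthetic matrices are illustrative assumptions; UCINET's MRQAP routines may differ in detail.

```python
import numpy as np

def qap_regression(y, xs, k=1000, seed=0):
    """OLS on off-diagonal dyads, with permutation-based significance."""
    n = y.shape[0]
    mask = ~np.eye(n, dtype=bool)  # diagonal is not a valid dyad

    def fit(x_list):
        cols = [np.ones(mask.sum())] + [x[mask] for x in x_list]
        coef, *_ = np.linalg.lstsq(np.column_stack(cols), y[mask], rcond=None)
        return coef

    observed = fit(xs)
    rng = np.random.default_rng(seed)
    perm_coefs = np.empty((k, len(observed)))
    for t in range(k):
        p = rng.permutation(n)
        idx = np.ix_(p, p)
        # One shared permutation for all independent matrices.
        perm_coefs[t] = fit([x[idx] for x in xs])
    p_large = (perm_coefs >= observed).mean(axis=0)
    p_small = (perm_coefs <= observed).mean(axis=0)
    return observed, p_large, p_small

def random_symmetric(n, rng):
    m = rng.normal(size=(n, n))
    return (m + m.T) / 2

# Hypothetical example: y depends on x1 only.
rng = np.random.default_rng(7)
n = 12
x1 = random_symmetric(n, rng)
x2 = random_symmetric(n, rng)
y_mat = 2.0 + 3.0 * x1 + 0.5 * random_symmetric(n, rng)

coef, p_large, p_small = qap_regression(y_mat, [x1, x2])
for name, c, pl, ps in zip(["Intercept", "x1", "x2"], coef, p_large, p_small):
    print(f"{name:>9}: coef={c: .3f}  as large={pl:.3f}  as small={ps:.3f}")
```

The "as large" and "as small" proportions correspond to the "Proportion As Large" and "Proportion As Small" columns in the UCINET output.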

Network Connections: QAP Peer-influence results on similarity dyad model, using QAP # of permutations: Diagonal valid? NO Random seed: Dependent variable: EX_SIM Expected values: C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted Independent variables: EX_SSEX EX_SRCE EX_ADJ Number of valid observations among the X variables = 72 N = 72 Number of permutations performed: 1999 MODEL FIT R-square Adj R-Sqr Probability # of Obs REGRESSION COEFFICIENTS Un-stdized Stdized Proportion Proportion Independent Coefficient Coefficient Significance As Large As Small Intercept EX_SSEX EX_SRCE EX_ADJ
