Complete Network Analysis Exploratory Analysis Social Networks capture the relations between people. These relations form a system that can be thought.

1 Complete Network Analysis Exploratory Analysis Social Networks capture the relations between people. These relations form a system that can be thought of as a social space. The advantage of the space analogy is that it captures the “topography” of social networks: classes, clusters, distance, “centrality”, etc. The disadvantage is that “spaces” and “fields” are notoriously difficult to study, because key features are simultaneously active. Current calls for “relational” sociology make this point clearly (See Martin 2003, Abbott 2001). “Field serves as some sort of representation for those overarching social regularities that may also be visualized … as quasi-organisms, systems or structures” J. L. Martin AJS 2003. Examples of fields range from abstract notions of status spaces to concrete examples such as the French academic system.

2 Complete Network Analysis Exploratory Analysis Sociologists often use spatial analogies, such as MDS or correspondence analysis, based on patterns of actor attributes. Social Network Analysis lets you explore the relational space directly, by mapping relations directly. The first step in this exploration is often visualizing the network. Bourdieu “Social Space and Symbolic Space”

3 Complete Network Analysis Exploratory Analysis: Network visualization Network visualization helps build intuition, but you have to keep the drawing algorithm in mind:
Tree-based layouts: Most effective for very sparse, regular graphs. Very useful when relations are strongly directed, such as organization charts or internet connections.
Spring-embedder layouts: Most effective with graphs that have a strong community structure (clustering, etc.). They provide a very clear correspondence between social distance and plotted distance.
Two images of the same network

4 Complete Network Analysis Exploratory Analysis: Network visualization Network visualization helps build intuition, but you have to keep the drawing algorithm in mind: Tree-based layouts; Spring-embedder layouts. Two images of the same network

5 Complete Network Analysis Exploratory Analysis: Network visualization Network visualization helps build intuition, but you have to keep the drawing algorithm in mind.
Hierarchy & tree models: Use optimization routines to add meaning to the “Y-axis” of the plot. This makes it possible to easily see who is most central because of who is on the top of the figure. Usually includes some routine for minimizing line-crossing.
Spring-embedder layouts: Work on an analogy to a physical system: ties connecting a pair have ‘springs’ that pull them together. Unconnected nodes have springs that push them apart. The resulting image reflects the balance of these two features. This usually creates a correspondence between physical closeness and network distance.

6 Complete Network Analysis Exploratory Analysis: Network visualization

7 Complete Network Analysis Exploratory Analysis: Network visualization Using colors to code attributes makes it simpler to compare attributes to relations. Here we can assess the effectiveness of two different clustering routines on a school friendship network.

8 Complete Network Analysis Exploratory Analysis: Network visualization As networks increase in size, the effectiveness of a point-and-line display diminishes, because you simply run out of plotting dimensions. I’ve found that you can still get some insight by using the ‘overlap’ that results from a space-based layout as information. Here you see the clustering evident in movie co-starring for about 8000 actors.

9 Complete Network Analysis Exploratory Analysis: Network visualization As networks increase in size, the effectiveness of a point-and-line display diminishes, because you simply run out of plotting dimensions. I’ve found that you can still get some insight by using the ‘overlap’ that results from a space-based layout as information. This figure contains over 29,000 social science authors. The two dense regions reflect different topics.

10 Complete Network Analysis Exploratory Analysis: Network visualization Adding time to social networks is also complicated, as you run out of space to put time in most network figures. One solution is to animate the network. Here we see streaming interaction in a classroom, where the teacher (yellow square) has trouble maintaining order. The SONIA software program (McFarland and Bender-deMoll) will produce these figures.

11 Complete Network Analysis Exploratory Analysis: Network visualization Visualization is a tool, but networks are complex and our visualization tools can sometimes confound. The strong advantage is that you get a complete overview of multiple features at once. The difficulty comes with trying to map a complex multi-dimensional object in low-dimensional space.

12 “Goods” flow through networks: Complete Network Analysis Network Connections

13 Complete Network Analysis Network Connections We often care about networks because of how “goods” travel through the network. In addition to the simple pairwise probability that one actor passes information on to another (p_ij), two factors affect flow through a network:
Topology: the shape, or form, of the network. Example: one actor cannot pass information to another unless they are either directly or indirectly connected.
Time: the timing of contact matters. Example: an actor cannot pass information he has not yet received.

14 Two features of the network’s topology are known to be important: connectivity and centrality Connectivity refers to how actors in one part of the network are connected to actors in another part of the network. Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another. Distance: Given they can be reached, how many steps are they from each other? Number of paths: How many different paths connect each pair? Complete Network Analysis Network Connections: Topology

15 Complete Network Analysis Network Connections: Topology Without full network data, you can’t distinguish actors with limited flow potential from those more deeply embedded in a setting. (Figure: three example panels, a, b, and c.)

16 Complete Network Analysis Network Connections: Connectivity Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them. Paths can be directed, leading to a distinction between strong and weak components. (Figure: example graphs on nodes a through f.)

17 Complete Network Analysis Network Connections: Connectivity Basic elements in connectivity:
A path is a sequence of nodes and edges starting with one node and ending with another, tracing the indirect connection between the two. On a path, you never go backwards or revisit the same node twice. Example: a → b → c → d
A walk is any sequence of nodes and edges, and may go backwards. Example: a → b → c → b → c → d
A cycle is a path that starts and ends with the same node. Example: a → b → c → a
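
The three definitions above can be checked mechanically. The sketch below is only an illustration: the edge list is a hypothetical graph chosen so that the slide's three example sequences exist in it, and `classify` is a made-up helper name.

```python
# Hypothetical undirected graph in which the slide's example sequences exist.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "a")]

def classify(sequence, edges):
    """Label a node sequence as 'path', 'cycle', 'walk', or 'invalid'."""
    edge_set = {frozenset(e) for e in edges}
    steps = list(zip(sequence, sequence[1:]))
    # Every consecutive pair must be tied, otherwise it is not even a walk.
    if not steps or any(frozenset(s) not in edge_set for s in steps):
        return "invalid"
    # A path never revisits a node.
    if len(set(sequence)) == len(sequence):
        return "path"
    # A cycle is closed and repeats nothing except its start/end node.
    closed = sequence[0] == sequence[-1]
    if closed and len(sequence) > 3 and len(set(sequence[:-1])) == len(sequence) - 1:
        return "cycle"
    return "walk"

print(classify(list("abcd"), EDGES))
print(classify(list("abcbcd"), EDGES))
print(classify(list("abca"), EDGES))
```

The `len(sequence) > 3` check keeps a back-and-forth move such as a, b, a from being counted as a cycle in an undirected graph.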

18 Reachability If you can trace a sequence of relations from one actor to another, then the two are reachable. If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component. Intuitively, a component is the set of people who are all connected by a chain of relations. Complete Network Analysis Network Connections: Connectivity
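
As a rough illustration of the component idea, the sketch below groups the nodes of an undirected graph by tracing chains of relations outward from each unvisited node. The node names and ties are invented for the example and do not come from the slides.

```python
from collections import deque

def components(nodes, edges):
    """Return the components of an undirected graph as sorted lists of nodes."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Everything reachable from `start` belongs to the same component.
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.append(sorted(members))
    return result

# Hypothetical data: one chain of three, one pair, and two isolates.
NODES = ["a", "b", "c", "d", "e", "f", "g"]
TIES = [("a", "b"), ("b", "c"), ("d", "e")]
print(components(NODES, TIES))
```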

19 This example contains many components. Complete Network Analysis Network Connections: Connectivity

20 Complete Network Analysis Network Connections: Connectivity Because relations can be directed or undirected, components come in two flavors. For a graph with any directed edges, there are two types of components:
Strong components consist of the set(s) of all nodes that are mutually reachable.
Weak components consist of the set(s) of all nodes that are connected when the direction of the ties is ignored.
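
A small sketch of the strong/weak distinction follows. It assumes that weak components are found by ignoring tie direction and strong components by requiring reachability in both directions; the four-node directed graph is hypothetical. The approach is a simple reachability comparison and is only suitable for small graphs.

```python
def reachable(start, successors):
    """Set of nodes reachable from `start` by following ties (includes start)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strong_and_weak(nodes, arcs):
    """Return (strong components, weak components) as sorted lists of lists."""
    out = {n: set() for n in nodes}    # ties followed in their own direction
    both = {n: set() for n in nodes}   # ties with direction ignored
    for a, b in arcs:
        out[a].add(b)
        both[a].add(b)
        both[b].add(a)
    reach = {n: reachable(n, out) for n in nodes}
    # Strong: n and m can each reach the other.
    strong = {frozenset(m for m in reach[n] if n in reach[m]) for n in nodes}
    # Weak: connected once direction is dropped.
    weak = {frozenset(reachable(n, both)) for n in nodes}
    tidy = lambda groups: sorted(sorted(g) for g in groups)
    return tidy(strong), tidy(weak)

# Hypothetical directed graph: a 3-cycle a -> b -> c -> a plus c -> d.
ARCS = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
strong, weak = strong_and_weak(["a", "b", "c", "d"], ARCS)
print("strong:", strong)
print("weak:  ", weak)
```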

21 There are only 2 strong components with more than 1 person in this network. Components are the minimum requirement for social groups. As we will see later, they are necessary but not sufficient Complete Network Analysis Network Connections: Connectivity All of the major network analysis software identifies strong and weak components

22 We can extend our conception of component to increase the structural cohesion of the definition. Multiple connectivity: Two paths with the same start and end point, but that have no other nodes in common are called node independent. In every component, the paths linking actors i and j must pass through a set of nodes, S, that if removed would disconnect the graph. The number of nodes in the smallest S is equal to the number of independent paths connecting i and j. Complete Network Analysis Network Connections: Connectivity

23 Complete Network Analysis Network Connections: Connectivity Simple component: Every path from 1 to 8 must go through 4. S(1,8) = 4, and N(1,8) = 1. That is, the graph is a component. (Figure: graph on nodes 1 through 8.)

24 1 2 5 4 3 6 8 7 Multiple connectivity In this graph, there are multiple paths connecting nodes 1 and 8. 1 2 5 8 3 6 7 8 4 6 7 8 4 5 8 But only 2 of them are independent. 1 5 8 1 2 3 6 7 8 N(1,8) = 2. Complete Network Analysis Network Connections: Connectivity

25 A bicomponent is the set of all nodes connected by at least 2 node-independent paths. Complete Network Analysis Network Connections: Connectivity

26 Complete Network Analysis Network Connections: Connectivity Bicomponents can overlap by at most 1 person. Such a node is a cutpoint in the graph: if that node is removed, the graph would be disconnected. (Figure labels: 4 is a cutpoint; 1 is a cutpoint.)
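
One direct, if slow, way to find cutpoints is to remove each node in turn and see whether the number of components goes up. The sketch below does exactly that on a hypothetical graph of two triangles that share node 4; it is not the graph drawn on the slide, and faster depth-first algorithms exist for large networks.

```python
def count_components(nodes, edges):
    """Number of components in an undirected graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count

def cutpoints(nodes, edges):
    """Nodes whose removal disconnects the graph (brute-force check)."""
    baseline = count_components(nodes, edges)
    found = []
    for n in nodes:
        rest = [m for m in nodes if m != n]
        kept = [(a, b) for a, b in edges if n not in (a, b)]
        if count_components(rest, kept) > baseline:
            found.append(n)
    return found

# Hypothetical graph: triangles {1, 2, 4} and {4, 5, 6} overlap only at node 4.
NODES = [1, 2, 4, 5, 6]
EDGES = [(1, 2), (2, 4), (1, 4), (4, 5), (5, 6), (4, 6)]
print(cutpoints(NODES, EDGES))
```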

27 Complete Network Analysis Network Connections: Distance Geodesic distance is measured by the smallest (weighted) number of relations separating a pair. Actor “a” is: 1 step from 4 actors, 2 steps from 5, 3 steps from 4, 4 steps from 3, and 5 steps from 1.

28 Complete Network Analysis Network Connections: Distance Probability of transfer by distance and number of paths, assuming a constant p_ij of 0.6. (Figure: probability, 0 to 1.2, plotted against path distance, 2 to 6, with separate curves for 1, 2, 5, and 10 paths.)
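
A back-of-the-envelope version of this relationship can be computed directly. The sketch below rests on an assumption the slide does not spell out: each step succeeds independently with probability p, and the alternative paths are independent of one another. Under that assumption a single path of length d succeeds with probability p^d, and at least one of k paths succeeds with probability 1 - (1 - p^d)^k. The function name is hypothetical.

```python
def transfer_probability(p, distance, n_paths=1):
    """Chance that a good reaches a target `distance` steps away.

    Assumes independent steps and independent paths (an illustrative
    simplification, not necessarily the calculation behind the slide).
    """
    single_path = p ** distance               # every step on the path must succeed
    return 1 - (1 - single_path) ** n_paths   # at least one path gets through

for paths in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, paths), 3) for d in range(2, 7)]
    print(paths, "path(s), distance 2..6:", row)
```

The pattern is the one the slide points to: probability falls quickly with distance and is partly restored by additional paths.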

29 Complete Network Analysis Network Connections: Distance Reachability in Colorado Springs (sexual contact only; node size = log of degree):
High-risk actors over 4 years
695 people represented
Longest path is 17 steps
Average distance is about 5 steps
Average person is within 3 steps of 75 other people
137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths

30 Complete Network Analysis Network Connections: Distance Calculating distance in global networks: Powers of the adjacency matrix. Calculate reachability through matrix multiplication (see p. 162 of W&F). (Figure: graph with nodes labeled a, b, c, e, d, f.)
X:
0 1 0 0 0 1
1 0 1 0 0 0
0 1 0 1 1 1
0 0 1 0 1 0
0 0 1 1 0 0
1 0 1 0 0 0
X^2:
2 0 2 0 0 0
0 2 0 1 1 2
2 0 4 1 1 0
0 1 1 2 1 1
0 1 1 1 2 1
0 2 0 1 1 2
X^3:
0 4 0 2 2 4
4 0 6 1 1 0
0 6 2 5 5 6
2 1 5 2 3 1
2 1 5 3 2 1
4 0 6 1 1 0
Distance after X^2 (0 = not yet reached):
. 1 2 0 0 1
1 . 1 2 2 2
2 1 . 1 1 1
0 2 1 . 1 2
0 2 1 1 . 2
1 2 1 2 2 .
Distance after X^3:
. 1 2 3 3 1
1 . 1 2 2 2
2 1 . 1 1 1
3 2 1 . 1 2
3 2 1 1 . 2
1 2 1 2 2 .
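
The matrix-power procedure can be sketched in a few lines. The adjacency matrix below is the X from this slide; the function names are hypothetical. The distance between i and j is taken to be the first power of X whose (i, j) cell is nonzero.

```python
# Adjacency matrix X from the slide (rows and columns in the slide's order).
X = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
]

def matmul(A, B):
    """Plain matrix product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distances_by_powers(adj):
    """Distance = the first power of adj whose (i, j) entry is nonzero."""
    n = len(adj)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in adj]          # starts as X^1
    for step in range(1, n):                 # a geodesic has at most n - 1 steps
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = step
        power = matmul(power, adj)           # move on to the next power
    return dist

for row in distances_by_powers(X):
    print(row)
```

Pairs that are never reached keep the value `None`, which corresponds to the cells still blank after the highest power.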

31 Complete Network Analysis Network Connections: Distance Calculating distance in global networks: Breadth-First Search In large networks, matrix multiplication is just too slow. A breadth-first search algorithm works by walking through the graph, reaching all nodes from a particular start node. Distance is calculated directly in most SNA software packages.
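
A minimal breadth-first search looks like the sketch below. It reuses the six-node graph from the matrix-power slide, with nodes numbered 0 to 5 by row (an arbitrary labeling chosen for the example), and returns the distance from one start node to every node it can reach.

```python
from collections import deque

# Neighbor lists for the six-node example graph, nodes numbered by matrix row.
NEIGHBORS = {
    0: [1, 5],
    1: [0, 2],
    2: [1, 3, 4, 5],
    3: [2, 4],
    4: [2, 3],
    5: [0, 2],
}

def bfs_distances(start, neighbors):
    """Geodesic distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:              # first visit is the shortest route
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

print(bfs_distances(0, NEIGHBORS))
```

Each node and tie is examined once per start node, which is why this scales better than repeated matrix multiplication on large, sparse networks.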

32 Complete Network Analysis Network Connections: Distance As a graph statistic, the distribution of distance can tell you a good deal about how close people are to each other (we’ll see this more fully when we get to closeness centrality). The diameter of a graph is the longest geodesic, giving the maximum distance. We often use l, the mean distance between every pair, to characterize the entire graph. For example, all else equal, we would expect rumors to travel faster through settings where the average distance is small.
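
Both summary statistics follow directly from the pairwise distances. The sketch below computes them for a hypothetical line of seven actors; only reachable pairs are counted, which is one common convention and an assumption here.

```python
from collections import deque

def all_distances(nodes, edges):
    """Geodesic distances between all reachable pairs (undirected graph)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    table = {}
    for start in nodes:
        dist, queue = {start: 0}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        table[start] = dist
    return table

def diameter_and_mean_distance(nodes, edges):
    """Return (longest geodesic, mean geodesic) over reachable pairs."""
    table = all_distances(nodes, edges)
    values = [table[a][b] for a in nodes for b in nodes
              if a != b and b in table[a]]
    return max(values), sum(values) / len(values)

# Hypothetical example: seven actors arranged in a line.
LINE_NODES = list(range(7))
LINE_EDGES = [(i, i + 1) for i in range(6)]
diameter, mean_distance = diameter_and_mean_distance(LINE_NODES, LINE_EDGES)
print(diameter, round(mean_distance, 3))
```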

33 Complete Network Analysis Network Connections: Distance

34 Complete Network Analysis Network Connections: Distance

35 Travers and Milgram’s work on the small world is responsible for the standard belief that “everyone is connected by a chain of about 6 steps.” Two questions: Given what we know about networks, what is the longest path (defined by handshakes) that separates any two people? Is 6 steps a long distance or a short distance? Complete Network Analysis Network Connections: Distance

36 Complete Network Analysis Network Connections: Distance When the graph is directed, distance is also directed (distance to vs distance from), following the direction of the tie. (Figure: directed graph on nodes a through m.) Distance from row actor to column actor (. = not reachable):
   a b c d e f g h i j k l m
a  . 1 2 . . . . . . . . 2 1
b  3 . 1 . . . . . . . . 1 2
c  . . . . . . . . . . . . .
d  4 3 1 . 1 2 1 . 2 . . 2 3
e  3 2 2 1 . 1 2 . 1 . . 1 2
f  4 3 3 2 1 . 3 . 2 . . 2 3
g  5 4 4 3 2 1 . . 3 . . 3 4
h  . . . . . . . . 1 . . . .
i  . . . . . . . . . . . . .
j  . . . . . . . . 1 . . . .
k  . . . . . . . . 1 . . . .
l  2 1 2 . . . . . . . . . 1
m  1 2 3 . . . . . . . . 1 .

37 What if everyone maximized structural holes? Associates do not know each other: Results in an exponential growth curve. Reach entire planet quickly. Complete Network Analysis Network Connections: Distance

38 What if people know each other randomly?: Random graph theory shows that we could reach people quite quickly if ties were random Complete Network Analysis Network Connections: Distance

39 Complete Network Analysis Network Connections: Distance Random Reachability: By number of close friends. (Figure: percent contacted, 0 to 100%, plotted against remove, 0 to 15, with separate curves for degree = 2, degree = 3, and degree = 4.)

40 Distance-Reach Distribution for a large Jr. High School (Add Health data) Random graph Observed Complete Network Analysis Network Connections: Distance

41 Milgram’s test: Send a packet from sets of randomly selected people to a stockbroker in Boston. Experimental Setup: Arbitrarily select people from 3 pools: a) People in Boston b) Random in Nebraska c) Stockholders in Nebraska Complete Network Analysis Network Connections: Distance

42 Milgram’s Findings: Distance to target person, by sending group. Complete Network Analysis Network Connections: Distance

43 Complete Network Analysis Network Connections: Distance Most chains found their way through a small number of intermediaries. Understanding why this is true has been called the “Small-World Problem,” which has since been generalized to a much more formal understanding of tie patterns in large networks (see below). For purposes of flow through graphs, distance is a primary concern so long as p_ij < 1. Most measures of position in a network account for some aspect of distance.

44 Distance measures “locate” a node by number of steps that separate them from the remainder of the network, but there are many other ways of locating nodes in networks. Centrality refers to (one dimension of) location, identifying where an actor resides in a network. For example, we can compare actors at the edge of the network to actors at the center. In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders. As a terminology point, some authors distinguish centrality from prestige based on the directionality of the tie. Since the formulas are the same in every other respect, I stick with “centrality” for simplicity. Complete Network Analysis Network Connections: Centrality

45 Complete Network Analysis Network Connections: Centrality Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated, but substantively we often have reason to believe that people at the center are very important. The standard centrality measures capture a wide range of “importance” in a network:
Degree
Closeness
Betweenness
Eigenvector / Power measures
After discussing these, I will describe measures that combine features of each of them.

46 Complete Network Analysis Network Connections: Centrality The most intuitive notion of centrality focuses on degree. Degree is the number of direct contacts a person has. The idea is that the actor with the most ties is the most important.

47 Complete Network Analysis Network Connections: Centrality In a simple random graph, G(n,p), degree will have a Poisson distribution, and the nodes with high degree are likely to be at the intuitive center. Deviations from a Poisson distribution suggest non-random processes, which is at the heart of current “scale-free” work on networks (see below).

48 Degree centrality, however, can be deceiving, because it is a purely local measure. Complete Network Analysis Network Connections: Centrality

49 Complete Network Analysis Network Connections: Centrality If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality. Simple: variance of the individual centrality scores. Or, using Freeman’s general formula for centralization (which ranges from 0 to 1): $C_A = \frac{\sum_{i=1}^{g} [C_A(n^*) - C_A(n_i)]}{\max \sum_{i=1}^{g} [C_A(n^*) - C_A(n_i)]}$, where $C_A(n^*)$ is the largest centrality score observed in the graph. UCINET, SPAN, PAJEK and most other network software will calculate these measures.
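
Both dispersion measures are easy to compute for degree. The sketch below assumes the usual degree version of Freeman's index, in which the denominator is (n - 1)(n - 2), the value reached by a star. The 8-node star and 8-node ring are hypothetical test graphs and the function names are invented for the example.

```python
def degree_scores(nodes, edges):
    """Number of direct contacts for each node (undirected ties)."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def degree_variance(degree):
    """Population variance of the degree scores."""
    values = list(degree.values())
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def freeman_degree_centralization(degree):
    """Sum of (max - score), divided by its maximum possible value."""
    n = len(degree)
    top = max(degree.values())
    return sum(top - v for v in degree.values()) / ((n - 1) * (n - 2))

NODES = list(range(8))
STAR = [(0, i) for i in range(1, 8)]           # one hub tied to everyone
RING = [(i, (i + 1) % 8) for i in range(8)]    # everyone has two ties

for name, edges in (("star", STAR), ("ring", RING)):
    deg = degree_scores(NODES, edges)
    print(name,
          "Freeman:", round(freeman_degree_centralization(deg), 2),
          "Variance:", round(degree_variance(deg), 2))
```

The star is the most centralized possible graph and the ring the least, which brackets the range of the index.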

50 Degree Centralization Scores Freeman:.07 Variance:.20 Freeman: 1.0 Variance: 3.9 Freeman:.02 Variance:.17 Freeman: 0.0 Variance: 0.0 Complete Network Analysis Network Connections: Centrality

51 Complete Network Analysis Network Connections: Centrality A second measure of centrality is closeness centrality. An actor is considered important if he/she is relatively close to all other actors. Closeness is based on the inverse of the distance of each actor to every other actor in the network. Closeness Centrality: $C_C(n_i) = \left[\sum_{j=1}^{g} d(n_i, n_j)\right]^{-1}$. Normalized Closeness Centrality: $C'_C(n_i) = (g - 1)\, C_C(n_i)$.
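
As a sketch of the calculation, the code below takes closeness to be 1 divided by the sum of an actor's distances to all others, and the normalized score to be that value multiplied by (n - 1). The 8-node star is a hypothetical example, and the code assumes a connected graph.

```python
from collections import deque

def distances_from(start, neighbors):
    """Breadth-first distances from one node."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def closeness(nodes, edges):
    """Return {node: (closeness, normalized closeness)} for a connected graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for n in nodes:
        total = sum(distances_from(n, neighbors).values())  # sum of distances
        scores[n] = (1 / total, (len(nodes) - 1) / total)
    return scores

# Hypothetical 8-node star: node 0 is tied to everyone else.
NODES = list(range(8))
STAR = [(0, i) for i in range(1, 8)]
for node, (raw, norm) in closeness(NODES, STAR).items():
    print(node, round(raw, 3), round(norm, 3))
```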

52 Complete Network Analysis Network Connections: Centrality Closeness Centrality in the examples
Distance          Closeness  Normalized
0 1 1 1 1 1 1 1   .143       1.00
1 0 2 2 2 2 2 2   .077       .538
1 2 0 2 2 2 2 2   .077       .538
1 2 2 0 2 2 2 2   .077       .538
1 2 2 2 0 2 2 2   .077       .538
1 2 2 2 2 0 2 2   .077       .538
1 2 2 2 2 2 0 2   .077       .538
1 2 2 2 2 2 2 0   .077       .538
Distance            Closeness  Normalized
0 1 2 3 4 4 3 2 1   .050       .400
1 0 1 2 3 4 4 3 2   .050       .400
2 1 0 1 2 3 4 4 3   .050       .400
3 2 1 0 1 2 3 4 4   .050       .400
4 3 2 1 0 1 2 3 4   .050       .400
4 4 3 2 1 0 1 2 3   .050       .400
3 4 4 3 2 1 0 1 2   .050       .400
2 3 4 4 3 2 1 0 1   .050       .400
1 2 3 4 4 3 2 1 0   .050       .400

53 Complete Network Analysis Network Connections: Centrality Closeness Centrality in the examples
Distance        Closeness  Normalized
0 1 2 3 4 5 6   .048       .286
1 0 1 2 3 4 5   .063       .375
2 1 0 1 2 3 4   .077       .462
3 2 1 0 1 2 3   .083       .500
4 3 2 1 0 1 2   .077       .462
5 4 3 2 1 0 1   .063       .375
6 5 4 3 2 1 0   .048       .286

54 Complete Network Analysis Network Connections: Centrality Closeness Centrality in the examples
Distance                    Closeness  Normalized
0 1 1 2 3 4 4 5 5 6 5 5 6   .021       .255
1 0 1 1 2 3 3 4 4 5 4 4 5   .027       .324
1 1 0 1 2 3 3 4 4 5 4 4 5   .027       .324
2 1 1 0 1 2 2 3 3 4 3 3 4   .034       .414
3 2 2 1 0 1 1 2 2 3 2 2 3   .042       .500
4 3 3 2 1 0 2 3 3 4 1 1 2   .034       .414
4 3 3 2 1 2 0 1 1 2 3 3 4   .034       .414
5 4 4 3 2 3 1 0 1 1 4 4 5   .027       .324
5 4 4 3 2 3 1 1 0 1 4 4 5   .027       .324
6 5 5 4 3 4 2 1 1 0 5 5 6   .021       .255
5 4 4 3 2 1 3 4 4 5 0 1 1   .027       .324
5 4 4 3 2 1 3 4 4 5 1 0 1   .027       .324
6 5 5 4 3 2 4 5 5 6 1 1 0   .021       .255

55 Betweenness Centrality: Model based on communication flow: A person who lies on communication paths can control communication flow, and is thus important. Betweenness centrality counts the number of shortest paths between i and k that actor j resides on. b a C d e f g h Complete Network Analysis Network Connections: Centrality

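56 Complete Network Analysis Network Connections: Centrality Betweenness Centrality: $C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$, where $g_{jk}$ = the number of geodesics connecting j and k, and $g_{jk}(n_i)$ = the number that actor i is on. Usually normalized by the number of pairs that do not include i: $C'_B(n_i) = C_B(n_i) / [(g-1)(g-2)/2]$.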
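
The counting behind betweenness can be sketched as follows. For every pair j, k the code counts the geodesics between them and the share that pass through each other actor i, using the fact that i lies on a j-k geodesic exactly when d(j,i) + d(i,k) = d(j,k). The normalization by (g-1)(g-2)/2 is the standard one for undirected graphs and is assumed here; the example graphs are hypothetical. This pairwise approach is meant for small graphs only.

```python
from collections import deque

def bfs_counts(start, neighbors):
    """Distances from `start` and the number of geodesics to each node."""
    dist, paths = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                paths[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:   # `node` precedes `nxt` on a geodesic
                paths[nxt] += paths[node]
    return dist, paths

def betweenness(nodes, edges, normalized=False):
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    info = {n: bfs_counts(n, neighbors) for n in nodes}
    scores = {n: 0.0 for n in nodes}
    for index, j in enumerate(nodes):
        dist_j, paths_j = info[j]
        for k in nodes[index + 1:]:           # each unordered pair once
            if k not in dist_j:
                continue
            for i in nodes:
                if i in (j, k) or i not in dist_j:
                    continue
                dist_i, paths_i = info[i]
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    scores[i] += paths_j[i] * paths_i[k] / paths_j[k]
    if normalized:
        g = len(nodes)
        scores = {n: s / ((g - 1) * (g - 2) / 2) for n, s in scores.items()}
    return scores

STAR_NODES = list(range(8))
STAR = [(0, i) for i in range(1, 8)]
SQUARE_NODES = [0, 1, 2, 3]
SQUARE = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(betweenness(STAR_NODES, STAR, normalized=True))
print(betweenness(SQUARE_NODES, SQUARE))
```

In the four-node ring each opposite pair has two geodesics, so each intermediate actor receives half a point from that pair.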

57 Centralization: 1.0 Centralization:.31 Centralization:.59 Centralization: 0 Betweenness Centrality: Complete Network Analysis Network Connections: Centrality

58 Centralization:.183 Betweenness Centrality: Complete Network Analysis Network Connections: Centrality

59 Information Centrality: It is quite likely that information can flow through paths other than the geodesic. The Information Centrality score uses all paths in the network, and weights them based on their length. Complete Network Analysis Network Connections: Centrality

60 Graph Theoretic Center (Barry or Jordan Center). Identify the points with the smallest, maximum distance to all other points. Value = longest distance to any other node. The graph theoretic center is ‘3’, but you might also consider a continuous measure as the inverse of the maximum geodesic Complete Network Analysis Network Connections: Centrality

61 Information Centrality: Complete Network Analysis Network Connections: Centrality

62 Complete Network Analysis Network Connections: Centrality Comparing across these 3 centrality values: Generally, the 3 centrality types will be positively correlated. When they are not (or are only weakly) correlated, it probably tells you something interesting about the network.
High degree, low closeness: Embedded in a cluster that is far from the rest of the network.
High degree, low betweenness: Ego's connections are redundant; communication bypasses him/her.
High closeness, low degree: Key player tied to important/active alters.
High closeness, low betweenness: Probably multiple paths in the network; ego is near many people, but so are many others.
High betweenness, low degree: Ego's few ties are crucial for network flow.
High betweenness, low closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.

63 Complete Network Analysis Network Connections: Centrality Bonacich Power Centrality: Actor’s centrality (prestige) is equal to a function of the prestige of those they are connected to. Thus, actors who are tied to very central actors should have higher prestige/centrality than those who are not. $C(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1}$, where α is a scaling constant, which is set to normalize the score; β reflects the extent to which you weight the centrality of people ego is tied to; R is the adjacency matrix (can be valued); I is the identity matrix (1s down the diagonal); and 1 is a column vector of all ones.

64 Complete Network Analysis Network Connections: Centrality Bonacich Power Centrality: The magnitude of β reflects the radius of power. Small values of β weight local structure, larger values weight global structure. If β is positive, then ego has higher centrality when tied to people who are central. If β is negative, then ego has higher centrality when tied to people who are not central. As β approaches zero, you get degree centrality.
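
The role of the weighting parameter can be explored numerically. The sketch below assumes the standard form of the measure, c = α (I - βR)^(-1) R 1, and computes it by repeatedly applying c = R1 + βRc, which converges only when |β| is smaller than the reciprocal of the largest eigenvalue of R. The scaling constant α is set so that the squared scores sum to the number of actors, which is one common normalization and an assumption here. The three-actor line a - b - c is hypothetical.

```python
def bonacich_power(R, beta, iterations=200):
    """Power centrality by fixed-point iteration (small |beta| only)."""
    n = len(R)
    degree = [sum(row) for row in R]          # R times a vector of ones
    c = degree[:]
    for _ in range(iterations):
        # One pass of c = R1 + beta * R c
        c = [degree[i] + beta * sum(R[i][j] * c[j] for j in range(n))
             for i in range(n)]
    # Scaling constant: squared scores sum to n (assumed normalization).
    alpha = (n / sum(v * v for v in c)) ** 0.5
    return [alpha * v for v in c]

# Hypothetical line of three actors: a - b - c.
LINE = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
for beta in (0.0, 0.5, -0.5):
    print(beta, [round(v, 3) for v in bonacich_power(LINE, beta)])
```

With β = 0 the scores are proportional to degree. With a positive β the middle actor's advantage over the ends shrinks in relative terms, because the end actors gain from being tied to a central alter; with a negative β the end actors are penalized for depending on a central alter.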

65 Complete Network Analysis Network Connections: Centrality Bonacich Power Centrality: β = 0.23

66 Complete Network Analysis Network Connections: Centrality Bonacich Power Centrality: β = .35; β = -.35

67 Complete Network Analysis Network Connections: Centrality Bonacich Power Centrality: β = .23; β = -.23

68 Complete Network Analysis Network Connections: Centrality In recent work, Borgatti (2003; 2005) discusses centrality in terms of two key dimensions (radial vs. medial, frequency vs. distance):
Radial, frequency: Degree centrality; Bon. power centrality
Radial, distance: Closeness centrality
Medial, frequency: Betweenness
Medial, distance: (empty: but would be an interruption measure based on distance)

69 Complete Network Analysis Network Connections: Centrality In recent work, Borgatti (2003; 2005) discusses centrality in terms of two key dimensions. Substantively, the key question for centrality is knowing what is flowing through the network. The key features are:
Whether the actor retains the good to pass to others (information, diseases) or whether they pass the good and then lose it (physical objects).
Whether the key factor for spread is distance (disease with low p_ij) or multiple sources (information).
The off-the-shelf measures do not always match the social process of interest, so researchers need to be mindful of this.

70 (Node size proportional to betweenness centrality ) Actors that appear very different when seen individually, are comparable in the global network. Complete Network Analysis Network Connections: Centrality Graph is 27% centralized

71 Centrality example: Add Health Node size proportional to betweenness centrality Graph is 45% centralized Complete Network Analysis Network Connections: Centrality

72 Network Topology: Centrality and Centralization. Measures research:
Rothenberg, et al. 1995. "Choosing a Centrality Measure: Epidemiologic Correlates in the Colorado Springs Study of Social Networks." Social Networks: Special Edition on Social Networks and Infectious Disease: HIV/AIDS 17:273-97. Found that the HIV positive actors were not central to the overall network.
Bell, D. C., J. S. Atkinson, and J. W. Carlson. 1999. "Centrality Measures for Disease Transmission Networks." Social Networks 21:1-21. Using a data-based simulation on 22 people, found that simple degree measures were adequate, relative to complexity.
Poulin, R., M.-C. Boily, and B. R. Masse. 2000. "Dynamical Systems to Define Centrality in Social Networks." Social Networks 22:187-220. Method that allows one to compare across non-connected portions of a network, applied to a network of 40 people w. AIDS.

73 Two factors that affect network flows: Topology - the shape, or form, of the network - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected Time - the timing of contacts matters - simple example: an actor cannot pass information he has not yet received. Complete Network Analysis Network Connections: Network Evolution

74 Complete Network Analysis Network Connections: Network Evolution Timing in networks: A focus on contact structure has often slighted the importance of network dynamics, though a number of recent pieces are addressing this. Time affects networks in two important ways:
1) The structure itself evolves, in ways that will affect the topology and thus flow. Wasserheit and Aral, 1996. “The dynamic topology of Sexually Transmitted Disease Epidemics” The Journal of Infectious Diseases 74:S201-13. Rothenberg, et al. 1997 “Using Social Network and Ethnographic Tools to Evaluate Syphilis Transmission” Sexually Transmitted Diseases 25: 154-160.
2) The timing of contact constrains flow. Moody 2002, Social Forces; Morris and Kretchmar, 1995.

75 A Network-Informed Approach to Investigating a Tuberculosis Outbreak: Implications for Enhancing Contact Investigations Peter D. McElroy, Richard B. Rothenberg, Reuben Varghese, Ruth Woodruff, Gerald Minns, Lauren A. Lambert, Stephen Muth, and Renee Ridzon

76 A Network-Informed Approach to Investigating a Tuberculosis Outbreak: Implications for Enhancing Contact Investigations Peter D. McElroy, Richard B. Rothenberg, Reuben Varghese, Ruth Woodruff, Gerald Minns, Lauren A. Lambert, Stephen Muth, and Renee Ridzon Circled nodes are crack users

79 Sexual Relations among A syphilis outbreak Jan - June, 1995 Rothenberg et al map the pattern of sexual contact among youth involved in a Syphilis outbreak in Atlanta over a one year period. (Syphilis cases in red) Complete Network Analysis Network Connections: Network Evolution

80 Sexual Relations among A syphilis outbreak July-Dec, 1995

82 Data on drug users in Colorado Springs, over 5 years Drug Relations, Colorado Springs, Year 1 Complete Network Analysis Network Connections: Network Evolution

83 Drug Relations, Colorado Springs, Year 2 Current year in red, past relations in gray Complete Network Analysis Network Connections: Network Evolution

84 Drug Relations, Colorado Springs, Year 3 Current year in red, past relations in gray Complete Network Analysis Network Connections: Network Evolution

85 Drug Relations, Colorado Springs, Year 4 Current year in red, past relations in gray Complete Network Analysis Network Connections: Network Evolution

86 Drug Relations, Colorado Springs, Year 5 Current year in red, past relations in gray Complete Network Analysis Network Connections: Network Evolution

87 Complete Network Analysis Network Connections: Network Evolution How do we analyze change in networks over time? a) Descriptive techniques (change in measures over time) b) Visualization c) Network statistical models (SIENA, see below under models)

88 Complete Network Analysis Network Connections: Social Balance One of the best theoretical approaches to understanding change in networks over time is to ask how the current relational patterns are likely to affect future relations. That is, make relational change endogenous. There are many models that do this, but the most famous for affective relations is social balance. Other models include: Preferential attachment: “the rich get richer” (Barabasi) Avoiding asymmetry (Gould) Avoiding close past relations (cycles of 4) (Bearman, Moody & Stovel) Development of Hierarchy (Ivan Chase)

89 Complete Network Analysis Network Connections: Social Balance Social Balance & Transitivity: We determine balance based on the product of the edges.
Balanced: (+)(+)(+) = (+), “A friend of a friend is a friend”; (-)(+)(-) = (+), “An enemy of my enemy is a friend”
Unbalanced: (-)(-)(-) = (-), “An enemy of my enemy is an enemy”; (+)(-)(+) = (-), “A friend of a friend is an enemy”
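
The product rule is simple enough to state as code. In the sketch below a positive tie is coded +1 and a negative tie -1 (a coding chosen for the example), and a triad is balanced when the product of its three signs is positive.

```python
def is_balanced(signs):
    """A triad is balanced when the product of its tie signs is positive."""
    product = 1
    for sign in signs:
        product *= sign
    return product > 0

# The four sign patterns, coded +1 for friend and -1 for enemy.
for triad in [(+1, +1, +1), (-1, +1, -1), (-1, -1, -1), (+1, -1, +1)]:
    print(triad, "balanced" if is_balanced(triad) else "unbalanced")
```

Equivalently, a triad is balanced when it contains an even number of negative ties.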

90 Complete Network Analysis Network Connections: Social Balance Heider argued that unbalanced triads would be unstable: they should transform toward balance, either by becoming friends or by becoming enemies. (Figure: unbalanced triads and the balanced triads they turn into.)

91 Complete Network Analysis Network Connections: Social Balance If such a balancing process were active throughout the graph, all intransitive triads would be eliminated from the network. This would result in one of two possible graphs (Balance Theorem): Balanced Opposition or Complete Clique. (Figure legend: Friends with; Enemies with.)

92 Complete Network Analysis Network Connections: Social Balance Empirically, we often find that graphs break up into more than two groups. What does this imply for balance theory? It turns out that if you allow all-negative triads, you can get a graph with many clusters. That is, instead of treating (-)(-)(-) as a forbidden triad, treat it as allowed. This implies that the micro rule is different: negative ties among enemies are not as motivating as positive ties.

93 Complete Network Analysis Network Connections: Social Balance Empirically, we also rarely have symmetric relations (at least on affect), thus we need to identify balance in directed relations. Directed dyads can be in one of three states: 1) Mutual 2) Asymmetric 3) Null. Every triad is composed of 3 dyads, and we can identify triads based on the number of each type, called the MAN label system.
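
The counting part of the MAN label can be sketched as below. The code classifies each of a triad's three dyads as mutual, asymmetric, or null and joins the three counts into a label. It produces only the three digits; the letter suffixes used to separate triads with the same counts (D, U, C, T) are not computed. The arcs are hypothetical.

```python
from itertools import combinations

def dyad_state(i, j, arcs):
    """'M' (mutual), 'A' (asymmetric), or 'N' (null) for the pair i, j."""
    forward, backward = (i, j) in arcs, (j, i) in arcs
    if forward and backward:
        return "M"
    if forward or backward:
        return "A"
    return "N"

def man_label(triad, arcs):
    """Counts of mutual, asymmetric, and null dyads, e.g. '120'."""
    states = [dyad_state(i, j, arcs) for i, j in combinations(triad, 2)]
    return "".join(str(states.count(kind)) for kind in "MAN")

# Hypothetical arcs: a and c choose each other, a chooses b, b chooses c.
ARCS = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}
print(man_label(("a", "b", "c"), ARCS))
print(man_label(("a", "b", "c"), set()))
```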

94 Complete Network Analysis Network Connections: Social Balance Balance in directed relations: Actors seek out transitive relations, and avoid intransitive relations. A triple is transitive if: i → j and j → k; then: i → k. Transitivity is a property of triples within triads and assumes directed relations. The saliency of a triad may differ for each actor, depending on their position within the triad.

95 Complete Network Analysis Network Connections: Social Balance Once we admit directed relations, we need to decompose triads into their constituent triples. Ordered triples in triad 120C (nodes a, b, c), each listed with the tie it would require:
abc; a → c: Transitive
acb; a → b: Vacuous
bac; b → c: Vacuous
bca; b → a: Intransitive
cab; c → b: Intransitive
cba; c → a: Vacuous
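
The decomposition into ordered triples can be reproduced in a few lines. The sketch assumes the 120C triad has the arcs a → b, b → c, and a mutual tie between a and c; that arc set is inferred from the classifications on this slide rather than read off the figure. A triple (i, j, k) is vacuous unless both i → j and j → k are present, and is then transitive or intransitive depending on whether i → k is present.

```python
from itertools import permutations

def classify_triple(i, j, k, arcs):
    """Transitive, intransitive, or vacuous for the ordered triple (i, j, k)."""
    if (i, j) in arcs and (j, k) in arcs:      # the premise holds
        return "transitive" if (i, k) in arcs else "intransitive"
    return "vacuous"                            # premise not met: nothing to test

# Assumed arcs for the 120C example: a -> b, b -> c, a <-> c.
ARCS_120C = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}

results = {"".join(t): classify_triple(*t, ARCS_120C)
           for t in permutations("abc")}
for triple, label in results.items():
    print(triple, label)
```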

96 Network Sub-Structure: Triads, grouped by number of ties:
(0) 003
(1) 012
(2) 102, 021D, 021U, 021C
(3) 111D, 111U, 030T, 030C
(4) 201, 120D, 120U, 120C
(5) 210
(6) 300
[Figure legend: each triad is marked as Intransitive, Transitive, or Mixed.] Complete Network Analysis Network Connections: Social Balance

97 An Example of the triad census

Type         Number of triads
---------------------------------------
 1 - 003     21
---------------------------------------
 2 - 012     26
 3 - 102     11
 4 - 021D     1
 5 - 021U     5
 6 - 021C     3
 7 - 111D     2
 8 - 111U     5
 9 - 030T     3
10 - 030C     1
11 - 201      1
12 - 120D     1
13 - 120U     1
14 - 120C     1
15 - 210      1
16 - 300      1
---------------------------------------
Sum (2 - 16): 63

Pajek & SPAN will give you the triad census. Complete Network Analysis Network Connections: Social Balance

98 As with undirected graphs, you can use the type of triads allowed to characterize the total graph. But now the potential patterns are much more diverse 1) All triads are 030T: A perfect linear hierarchy. Complete Network Analysis Network Connections: Social Balance

99 Triads allowed: {300, 102} M M N* 1 1 0 0 Complete Network Analysis Network Connections: Social Balance

100 Cluster Structure, allows triads: {003, 300, 102} M M N* M M Eugene Johnsen (1985, 1986) specifies a number of structures that result from various triad configurations 1 1 1 1 Complete Network Analysis Network Connections: Social Balance

101 P RC {300,102, 003, 120D, 120U, 030T, 021D, 021U} Ranked Cluster: M M N* M M M A* 1 1 1 1 1 1 1 1 1 0 1 1 1 10 0 0 0000 00 00 And many more... Complete Network Analysis Network Connections: Social Balance

102 Substantively, specifying a set of triads defines a behavioral mechanism, and we can use the distribution of triads in a network to test whether the hypothesized mechanism is active. We do this by (1) counting the number of each triad type in a given network and (2) comparing it to the expected number, given some random distribution of ties in the network. See Wasserman and Faust, Chapter 14 for computation details, and the SPAN manual for SAS code that will generate these distributions, if you so choose. Complete Network Analysis Network Connections: Social Balance

103 Structural Indices based on the distribution of triads. The observed distribution of triads can be fit to the hypothesized structures using weighting vectors for each type of triad. Where: l = 16 element weighting vector for the triad types; T = the observed triad census; $\mu_T$ = the expected value of T; $\Sigma_T$ = the variance-covariance matrix for T. Complete Network Analysis Network Connections: Social Balance
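The index itself does not appear in this transcript. A sketch of what these definitions describe, assuming the standard Holland and Leinhardt form of the tau statistic (with $\mu_T$ the expected triad census and $\Sigma_T$ its variance-covariance matrix), would be:

```latex
\tau(l) = \frac{l'T - l'\mu_T}{\sqrt{l'\,\Sigma_T\,l}}
```

Under this form, the numerator compares the weighted count of observed triads with its expectation, and the denominator scales the difference by its standard deviation, so tau reads like a z-score for the hypothesized structure.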

104 For the Add Health data, the observed distribution of the tau statistic for various models was: Indicating that a ranked-cluster model fits the best. Complete Network Analysis Network Connections: Social Balance

105 So far, the structural features of a network focus on the graph ‘at equilibrium.’ That is, we have hypothesized structures once people have made all the choices they are going to make. What we have not done, is really look closely at the implication of changing relations. That is, we might say that triad 030C should not occur, but what would a change in this triad imply from the standpoint of the actor making a relational change? Complete Network Analysis Network Connections: Social Balance

106 [Figure: transition diagram among the 16 triad types (003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300). Arrows are marked as a transition to a vacuous triple, a transition to a transitive triple, or a transition to an intransitive triple.] Complete Network Analysis Network Connections: Social Balance

107 [Figure: the same diagram of the 16 triad types, showing observed triad transition patterns, from Hallinan’s data.] Complete Network Analysis Network Connections: Social Balance

108 Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Reanalyzes the Newcomb fraternity data to look at changes in social balance over time. The basic balance theory hypothesis is that people who find themselves in an unbalanced position should change their relations to generate balance. Hypothetically, this should lead to greater balance over time. After discussing a set of problems imposed because the data are forced ranks, they first look at simple reciprocity. Complete Network Analysis Network Connections: Social Balance

109 Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Complete Network Analysis Network Connections: Social Balance

111 [Figure: Relational Stability. Percent change in ties (y-axis, 0 to 40) by week (x-axis, weeks 1 to 14).] Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Complete Network Analysis Network Connections: Social Balance

112 In addition to the simple degree of transitivity, they want to measure whether the structure as a whole conforms to the prediction of structural balance. They identify groups by partitioning the network to minimize the number of negative ties within groups and the number of positive ties between groups (this algorithm is implemented in PAJEK). They can then measure structural imbalance as the sum of departures from structural balance (2 and only 2 groups) and from generalized balance (more than 2 groups). Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Complete Network Analysis Network Connections: Social Balance

113 [Figure: Extent of Structural Imbalance (y-axis, 9 to 17) by week (x-axis, weeks 1 to 15), with two series: Structural Imbalance and Generalized Imbalance.] Complete Network Analysis Network Connections: Social Balance

114 They point out that the dynamic action of individuals had group implications, which is part of what makes balance so attractive. “…the micro-level processes can be viewed as generating social forces that move the structure toward group balance.” They also point out that negative ties within groups are likely less tolerated than positive ties between groups, as negatives within group may threaten the group in ways that positive ties between groups do not. Doreian, Kapuscinski, Krackhardt & Szczypula: A brief history of balance through time. Complete Network Analysis Network Connections: Social Balance

115 What impact does this kind of timing have on disease flow? The most dramatic effect occurs with the distinction between concurrent and serial relations. Relations are concurrent whenever an actor has more than one sex partner during the same time interval. Concurrency is dangerous for disease spread because: a) compared to serially monogamous couples, an STD is not trapped inside a single dyad; b) the STD can travel in two directions, through ego, to either of his/her partners at the same time. Complete Network Analysis Network Connections: Time Constraint

116 Concurrency and Epidemic Size, Morris & Kretzschmar (1995). [Figure: epidemic size (y-axis, 0 to 1200) plotted over an x-axis running from 0 to 7, with series for Monogamy, Disassortative, Assortative, and Random mixing.] Population size is 2000, simulation ran over 3 ‘years’. Complete Network Analysis Network Connections: Time Constraint

117 Concurrency and disease spread. Variable: Constant, Concurrent, K 2, Degree Correlation, Bias. Coefficient: 84.18, 357.07, 440.38, -557.40, 982.31. Adjusting for other mixing patterns: each 0.1 increase in concurrency results in 45 more positive cases. Complete Network Analysis Network Connections: Time Constraint

118 What impact does timing have on flow through the network? [Figure: hypothetical contact network among actors A, B, C, D, E, F. Numbers above lines indicate contact periods: 2 - 5, 3 - 7, 0 - 1, 8 - 9, 3 - 5.] Complete Network Analysis Network Connections: Time Constraint

119 [Figure: the path graph for the hypothetical contact network among A, B, C, D, E, F.] While clearly important, this is not often handled well by current software. Complete Network Analysis Network Connections: Time Constraint

120 Direct Contact Network of 8 people in a ring Complete Network Analysis Network Connections: Time Constraint

121 Implied Contact Network of 8 people in a ring All relations Concurrent Complete Network Analysis Network Connections: Time Constraint

122 Implied Contact Network of 8 people in a ring: Mixed Concurrent. [Figure: ring with tie time labels 2, 2, 1, 1, 2, 2, 3, 3.] Complete Network Analysis Network Connections: Time Constraint

123 Implied Contact Network of 8 people in a ring: Serial Monogamy (1). [Figure: ring with tie time labels 1, 2, 3, 7, 6, 5, 8, 4.] Complete Network Analysis Network Connections: Time Constraint

124 Implied Contact Network of 8 people in a ring: Serial Monogamy (2). [Figure: ring with tie time labels 1, 2, 3, 7, 6, 1, 8, 4.] Complete Network Analysis Network Connections: Time Constraint

125 Implied Contact Network of 8 people in a ring: Serial Monogamy (3). [Figure: ring with tie time labels 1, 2, 1, 1, 2, 1, 2, 2.] Complete Network Analysis Network Connections: Time Constraint

126 Identifying the Minimum Path Density of a Graph. It turns out that the safest network is one where relations are ‘inter-woven’ in an “early-late-earlier” pattern. To identify the paths empirically, you must search all possible paths in the network. [Figure: network with ties alternately labeled t1 and t2.] Complete Network Analysis Network Connections: Time Constraint

127 Implications of Time-ordered Networks:
1. Any measure calculated on the adjacency structure that rests on reachability or flow may be misleading.
2. There are highly non-linear effects to changing the timing of a relation on total reachability.
3. Within connected components, time order may partition the network into reachable sub-groups.
4. Infection risk can be assessed on a continuum from complete concurrency to some minimum level of reachability in the network.
Complete Network Analysis Network Connections: Time Constraint
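To make the reachability point concrete, here is a minimal sketch of a time-respecting search. It is not the method used in the course software: the function name, the contact tuples, and the rule that something arriving at time t can cross a tie only if the tie is still active at or after t are all assumptions chosen for illustration.

```python
import heapq

def time_respecting_reach(contacts, source, t0=0):
    """Earliest arrival time at every actor reachable from `source`.

    contacts: list of (u, v, start, end) undirected ties active on [start, end].
    Assumption: a carrier holding the infection at time t can pass it across
    a tie if t <= end, and the partner then holds it from max(t, start).
    """
    neighbors = {}
    for u, v, start, end in contacts:
        neighbors.setdefault(u, []).append((v, start, end))
        neighbors.setdefault(v, []).append((u, start, end))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale queue entry
        for v, start, end in neighbors.get(u, []):
            if t <= end:
                t_v = max(t, start)
                if t_v < arrival.get(v, float("inf")):
                    arrival[v] = t_v
                    heapq.heappush(heap, (t_v, v))
    return arrival

# Hypothetical chain A-B-C-D; the C-D tie ends before B-C begins
contacts = [("A", "B", 0, 1), ("B", "C", 3, 5), ("C", "D", 1, 2)]
from_a = time_respecting_reach(contacts, "A")
from_d = time_respecting_reach(contacts, "D")
print(sorted(from_a), sorted(from_d))
```

In the static adjacency structure all four actors are connected, but with time order A cannot reach D and D cannot reach A, so reachability is no longer symmetric even though every tie is undirected.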

128 Complete Network Analysis Network Connections: Time Constraint

130 Complete Network Analysis Network Connections: Network Diffusion & Peer Influence Most of our interest in networks is in how things flow through the network, which brings us to questions about the diffusion of goods through networks. We have already seen the limits to diffusion through network connections and timing, but a number of studies focus on how network structure affects diffusion directly. These include questions about the diffusion of goods and ideas through a network as well as the outcomes of diffusion. Examples include: Spatial Diffusion Models Critical Mass Models Dyadic Contact models Peer Influence Models

131 Coleman, Katz and Menzel, “Diffusion of an innovation among physicians,” Sociometry (1957). [Figure: cumulative % using “Gammanym” (y-axis, 0 to 1) by week since introduction (x-axis, weeks 2 to 18), with separate curves for physicians with > 3 nominations, 1 - 2 nominations, and 0 nominations.] Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

132 Attitudes are a function of two sources: a) Individual characteristics Gender, Age, Race, Education, Etc. Standard sociology b) Interpersonal influences Actors negotiate opinions with others Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

133 Friedkin claims in his Structural Theory of Social Influence that the theory has four benefits:
1. It relaxes the simplifying assumption of actors who must either conform to or deviate from a fixed consensus of others (public choice model).
2. It does not necessarily result in consensus, but can have a stable pattern of disagreement.
3. It is a multi-level theory. Micro level: a cognitive theory about how people weigh and combine others’ opinions. Macro level: concerned with how social structural arrangements enter into and constrain the opinion-formation process.
4. It allows an analysis of the systemic consequences of social structures.
Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

134 Formal Model
(1) $Y^{(1)} = XB$
(2) $Y^{(t)} = \alpha W Y^{(t-1)} + (1-\alpha) Y^{(1)}$
where
$Y^{(1)}$ = an N x M matrix of initial opinions on M issues for N actors
X = an N x K matrix of K exogenous variables that affect Y
B = a K x M matrix of coefficients relating X to Y
$\alpha$ = a weight of the strength of endogenous interpersonal influences
W = an N x N matrix of interpersonal influences
Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

135 Formal Model (1): $Y^{(1)} = XB$. This is the standard sociology model for explaining anything: the General Linear Model. It says that a dependent variable (Y) is some function (B) of a set of independent variables (X). At the individual level, the model says that $y_i = \sum_k x_{ik} b_k$. Usually, one of the X variables is $\varepsilon$, the model error term. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

136 Formal Model (2): $Y^{(t)} = \alpha W Y^{(t-1)} + (1-\alpha) Y^{(1)}$. This part of the model taps social influence. It says that each person’s final opinion is a weighted average of their own initial opinions and the opinions of those they communicate with (which can include their own current opinions). Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

137 The key to the peer influence part of the model is W, a matrix of interpersonal weights. W is a function of the communication structure of the network, and is usually a transformation of the adjacency matrix. In general, the weights in each row of W are non-negative and sum to 1. Various specifications of the model change the value of $w_{ii}$, the extent to which one weighs their own current opinion, and the relative weight of alters. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

138 Three ways to set the self weight. Each W is the row-normalized version of the matrix to its left.

Self weight: Even (diagonal = 1)
     1  2  3  4           1    2    3    4
1    1  1  1  0      1  .33  .33  .33    0
2    1  1  1  0      2  .33  .33  .33    0
3    1  1  1  1      3  .25  .25  .25  .25
4    0  0  1  1      4    0    0  .50  .50

Self weight: 2*self (diagonal = 2)
     1  2  3  4           1    2    3    4
1    2  1  1  0      1  .50  .25  .25    0
2    1  2  1  0      2  .25  .50  .25    0
3    1  1  2  1      3  .20  .20  .40  .20
4    0  0  1  2      4    0    0  .33  .67

Self weight: degree (diagonal = number of alters)
     1  2  3  4           1    2    3    4
1    2  1  1  0      1  .50  .25  .25    0
2    1  2  1  0      2  .25  .50  .25    0
3    1  1  3  1      3  .17  .17  .50  .17
4    0  0  1  1      4    0    0  .50  .50

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence
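The three weight matrices on this slide can be reproduced with a short sketch. The function name `influence_weights` and the option labels are illustrative assumptions; the input is the 4-actor adjacency matrix from the slide with the diagonal removed.

```python
def influence_weights(adj, self_weight="even"):
    """Row-normalize an adjacency matrix after setting the diagonal (self weight).

    self_weight: "even"   -> diagonal = 1
                 "double" -> diagonal = 2  (the slide's "2*self")
                 "degree" -> diagonal = number of alters
    """
    n = len(adj)
    w = []
    for i in range(n):
        row = [float(adj[i][j]) if i != j else 0.0 for j in range(n)]
        degree = sum(row)
        if self_weight == "even":
            row[i] = 1.0
        elif self_weight == "double":
            row[i] = 2.0
        elif self_weight == "degree":
            row[i] = degree if degree > 0 else 1.0  # isolates keep all weight on self
        else:
            raise ValueError(f"unknown self_weight: {self_weight}")
        total = sum(row)
        w.append([x / total for x in row])
    return w

# Adjacency among the four actors on the slide (no self ties)
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
w_even = influence_weights(adj, "even")
w_double = influence_weights(adj, "double")
w_degree = influence_weights(adj, "degree")
for row in w_degree:
    print([round(x, 2) for x in row])
```

Every row sums to 1 under all three options, so each actor's next opinion is always a weighted average of current opinions.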

139 Formal Properties of the model. When interpersonal influence is complete ($\alpha = 1$), model (2) reduces to: $Y^{(t)} = W Y^{(t-1)}$. When interpersonal influence is absent ($\alpha = 0$), model (2) reduces to: $Y^{(t)} = Y^{(1)}$. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

140 Formal Properties of the model. The model is directly related to spatial econometric models. If we allow the model to run over t, we can describe the model as: $Y = \alpha W Y + X\beta + e$, where the two coefficients ($\alpha$ and $\beta$) are estimated directly (see Doreian, 1982, SMR). Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

141 Simple example, $\alpha$ = .8

W:
     1    2    3    4
1  .33  .33  .33    0
2  .33  .33  .33    0
3  .25  .25  .25  .25
4    0    0  .50  .50

Initial opinions Y: 1, 3, 5, 7

Opinions at each iteration T:
Actor  T=0   T=1   T=2   T=3   T=4   T=5   T=6   T=7
1     1.00  2.60  2.81  2.93  2.98  3.00  3.01  3.01
2     3.00  3.00  3.21  3.33  3.38  3.40  3.41  3.41
3     5.00  4.20  4.20  4.16  4.14  4.14  4.13  4.13
4     7.00  6.20  5.56  5.30  5.18  5.13  5.11  5.10

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence
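The first iterations of this example can be reproduced with a few lines of Python. This is a sketch that assumes the update rule is Y(t) = alpha * W * Y(t-1) + (1 - alpha) * Y(1) and that the .33 entries stand for exact thirds; the function name `peer_influence` is illustrative and is not the course's peerinfl1.sas program.

```python
def peer_influence(w, y1, alpha, steps):
    """Iterate Y(t) = alpha * W Y(t-1) + (1 - alpha) * Y(1); return all iterations."""
    history = [list(y1)]
    y = list(y1)
    for _ in range(steps):
        y = [alpha * sum(w_ij * y_j for w_ij, y_j in zip(row, y)) + (1 - alpha) * y_init
             for row, y_init in zip(w, y1)]
        history.append(y)
    return history

third = 1 / 3  # the slide rounds these entries to .33
W = [[third, third, third, 0.0],
     [third, third, third, 0.0],
     [0.25, 0.25, 0.25, 0.25],
     [0.0, 0.0, 0.5, 0.5]]
history = peer_influence(W, [1, 3, 5, 7], alpha=0.8, steps=7)
for t, y in enumerate(history[:3]):
    print(t, [round(v, 2) for v in y])
```

Because each actor keeps a weight of (1 - alpha) on their initial opinion, opinions move toward each other but settle at different values, which is the stable pattern of disagreement the theory allows.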

142 Simple example, $\alpha$ = 1.0

W:
     1    2    3    4
1  .33  .33  .33    0
2  .33  .33  .33    0
3  .25  .25  .25  .25
4    0    0  .50  .50

Initial opinions Y: 1, 3, 5, 7

Opinions at each iteration T:
Actor  T=0   T=1   T=2   T=3   T=4   T=5   T=6   T=7
1     1.00  3.00  3.33  3.56  3.68  3.74  3.78  3.81
2     3.00  3.00  3.33  3.56  3.68  3.74  3.78  3.81
3     5.00  4.00  4.00  3.92  3.88  3.86  3.85  3.84
4     7.00  6.00  5.00  4.50  4.21  4.05  3.95  3.90

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

143 Extended example: building intuition Consider a network with three cohesive groups, and an initially random distribution of opinions: (to run this model, use peerinfl1.sas) Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

144 Simulated Peer Influence: 75 actors, 2 initially random opinions, Alpha =.8, 7 iterations Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

152 Extended example: building intuition Consider a network with three cohesive groups, and an initially random distribution of opinions: Now weight in-group ties higher than between group ties Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

153 Simulated Peer Influence: 75 actors, 2 initially random opinions, Alpha =.8, 7 iterations, in-group tie: 2 Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

159 Consider the implications for populations of different structures. For example, we might have two groups, a large orthodox population (100 people) and a small heterodox population (10 people). We can imagine the groups mixing at various levels (little, moderate, heavy mixing). The three mixing matrices share within-group values of .95 and .02 and differ in the between-group value:
.95  .05      .95  .008      .95  .001
.05  .02      .008 .02       .001 .02
Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

160 Light Heavy Moderate Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

161 Light mixing Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

167 Moderate mixing Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

173 High mixing Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

179 In an unbalanced situation (small group vs large group) the extent of contact can easily overwhelm the small group. Applications of this idea are evident in: Missionary work (Must be certain to send missionaries out into the world with strong in-group contacts) Overcoming deviant culture (I.e. youth gangs vs. adults) Work by Hyojung Kim (U Washington) focuses on the first of these two processes in social movement models Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

180 In extensions (Friedkin, 1998), Friedkin generalizes the model so that alpha varies across people. We can extend the basic model by (1) simply changing $\alpha$ to a vector (A), which then changes each person’s opinion directly, and (2) by linking the self weight ($w_{ii}$) to alpha: $Y^{(t)} = A W Y^{(t-1)} + (I - A) Y^{(1)}$, where A is a diagonal matrix of endogenous weights, with $0 < a_{ii} < 1$. A further restriction on the model sets $w_{ii} = 1 - a_{ii}$. This leads to a great deal more flexibility in the theory, and some interesting insights. Consider the case of group opinion leaders with unchanging opinions (i.e., many people have high $a_{ii}$, while a few have low): Complete Network Analysis Network Connections: Network Diffusion & Peer Influence
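A small sketch of the opinion-leader case follows. It assumes the update Y(t) = A W Y(t-1) + (I - A) Y(1) with the self weight set to 1 - a_ii and the remaining weight a_ii spread over alters in proportion to a relative-weight matrix C; the function name, the three-actor network, and the specific a values are hypothetical.

```python
def friedkin_step(c, a, y_prev, y1):
    """One iteration with person-specific susceptibility a[i].

    c: relative weights on alters (zero diagonal, rows sum to 1)
    Self weight is w_ii = 1 - a[i]; alter j gets weight a[i] * c[i][j].
    """
    n = len(y1)
    y_next = []
    for i in range(n):
        from_alters = sum(c[i][j] * y_prev[j] for j in range(n) if j != i)
        social = (1 - a[i]) * y_prev[i] + a[i] * from_alters   # row i of W times Y(t-1)
        y_next.append(a[i] * social + (1 - a[i]) * y1[i])
    return y_next

# Hypothetical trio: actor 0 is a leader (low a), actors 1 and 2 are followers (high a)
C = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
a = [0.01, 0.9, 0.9]
y1 = [10.0, 0.0, 0.0]
y = list(y1)
for _ in range(200):
    y = friedkin_step(C, a, y, y1)
print([round(v, 2) for v in y])
```

The leader's opinion barely moves from its starting value, while the followers are pulled most of the way toward it.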

181 Group 1 Leaders Group 2 Leaders Group 3 Leaders Peer Opinion Leaders Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

182 Peer Opinion Leaders Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

187 Further extensions of the model might include:
1. Time-dependent $\alpha$: people likely value others’ opinions more early than later in a decision context.
2. Interacting $\alpha$ with XB: people’s self weights are a function of their behaviors & attributes.
3. Making W dependent on the structure of the network (weighting transitive ties more than intransitive ties, for example).
4. Time-dependent W: the network of contacts does not remain constant, but is dynamic, meaning that influence likely moves unevenly through the network.
And others likely abound. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

188 Testing the fit of the general model: identifying peer influence in real data. There are two general ways to test for peer influence in an observed network. The first estimates the parameters ($\alpha$ and $\beta$) of the peer influence model directly; the second transforms the network into a dyadic model, predicting similarity among actors. Peer influence model: $Y = \alpha W Y + X\beta + e$. For details, see Doreian, 1982, Sociological Methods and Research. Also Roger Gould (AJS, Paris Commune paper for example). Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

189 For details, see Doreian, 1982, Sociological Methods and Research. Also Roger Gould (AJS, Paris Commune paper for example). The basic model says that people’s opinions are a function of the opinions of others and their characteristics. WY is a simple vector which can be added to your model. That is, multiply Y by a W matrix, run the regression with WY as a new variable, and the regression coefficient is an estimate of $\alpha$. This is what Doreian calls the QAD estimate of peer influence. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence
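The QAD recipe amounts to two steps: build WY, then regress Y on it. The sketch below uses a single predictor (WY plus an intercept, with no X covariates) and made-up opinion values, so it shows the mechanics only and the resulting numbers carry no substantive meaning; `qad_estimate` is a hypothetical name.

```python
def qad_estimate(w, y):
    """Quick-and-dirty peer effect: OLS slope from regressing y on WY."""
    wy = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]
    n = len(y)
    mean_wy, mean_y = sum(wy) / n, sum(y) / n
    sxy = sum((p - mean_wy) * (q - mean_y) for p, q in zip(wy, y))
    sxx = sum((p - mean_wy) ** 2 for p in wy)
    slope = sxy / sxx                      # estimate of the peer-influence weight
    intercept = mean_y - slope * mean_wy
    return wy, intercept, slope

third = 1 / 3
W = [[third, third, third, 0.0],
     [third, third, third, 0.0],
     [0.25, 0.25, 0.25, 0.25],
     [0.0, 0.0, 0.5, 0.5]]
y = [1.0, 2.0, 3.0, 4.0]                   # made-up opinions for four actors
wy, intercept, slope = qad_estimate(W, y)
print([round(v, 2) for v in wy], round(slope, 3))
```

In a real analysis the exogenous variables would enter the same regression alongside WY, and the caveats about non-independent cases apply to the resulting standard errors.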

190 The problem with the above regression is that cases are, by definition, not independent. In fact, the coefficient on WY is also known as the ‘network autocorrelation’ coefficient, since a ‘peer influence’ effect is an autocorrelation effect: your value is a function of the people you are connected to. In general, OLS is not the best way to estimate this equation. That is, QAD = Quick and Dirty, and your results will not be exact. In practice, the QAD approach (perhaps combined with a GLS estimator) results in empirical estimates that are “virtually indistinguishable” from MLE (Doreian et al, 1984). The proper way to estimate the peer equation is to use maximum likelihood estimates, and Doreian gives the formulas for this in his paper. The other way is to use non-parametric approaches, such as the Quadratic Assignment Procedure, to estimate the effects. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

191 An empirical example: peer influence in the OSU Graduate Student Network (to run the model, see osupeerpi1.sas). Each person was asked to rank their satisfaction with the program, which is the dependent variable in this analysis. I constructed two W matrices, one from HELP, the other from Best Friend. I treat relations as symmetric and valued. I also include Race (white/non-white), Gender, and Cohort Year as exogenous variables in the model. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

192 Distribution of Satisfaction with the department. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

193 Parameter Estimates

Variable    Parameter Estimate   Pr > |t|   Standardized Estimate
Intercept        2.60252          0.0931         0
FEMALE          -1.07540          0.0142        -0.25455
NONWHITE        -0.22087          0.5975        -0.05491
y00              0.93176          0.0798         0.21627
y99             -0.19375          0.7052        -0.04586
y98             -0.45912          0.4637        -0.08289
y97              0.60670          0.3060         0.11919
PEER_BF          0.23936          0.0002         0.42084
PEER_H           0.50668          0.0277         0.23321

Model R² = .41, compared to .15 without the peer effects. An alternative is to use a QAP model (see below). Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

194 Peer influence through Dyad Models Another way to get at peer influence is not through the level of Y, but through the extent to which actors are similar with respect to Y. Recall the simulated example: peer influence is reflected in how close points are to each other. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

195 Peer influence through Dyad Models. The model is now expressed at the dyad level as: $Y_{ij} = b_0 + b_1 A_{ij} + \sum_k b_k X_{k,ij} + e_{ij}$, where Y is a matrix of similarities, A is an adjacency matrix, and $X_k$ is a matrix of similarities on attributes. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

196 If we break the original peer influence model into its components, the attribute part of the model suggests that any two people with the same attribute should have the same value for Y. The peer influence part says that (a) if you and I are tied to each other, then we should have similar opinions and (b) if we are tied to many of the same people, then we should have similar opinions. We can test both sides of these (and many other dyadic properties) directly at the dyad level. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

197 Example data: adjacency, same-race, and same-sex matrices for 9 nodes

NODE  ADJMAT             SAMERCE            SAMESEX
1     0 1 1 1 0 0 0 0 0  0 1 0 0 1 0 0 0 1  0 0 1 1 0 0 1 1 0
2     1 0 1 0 0 0 1 0 0  1 0 0 0 1 0 0 0 1  0 0 0 0 1 1 0 0 1
3     1 1 0 0 1 0 1 0 0  0 0 0 1 0 1 1 1 0  1 0 0 1 0 0 1 1 0
4     1 0 0 0 1 0 0 0 0  0 0 1 0 0 1 1 1 0  1 0 1 0 0 0 1 1 0
5     0 0 1 1 0 1 0 1 0  1 1 0 0 0 0 0 0 1  0 1 0 0 0 1 0 0 1
6     0 0 0 0 1 0 0 1 1  0 0 1 1 0 0 1 1 0  0 1 0 0 1 0 0 0 1
7     0 1 1 0 0 0 0 0 0  0 0 1 1 0 1 0 1 0  1 0 1 1 0 0 0 1 0
8     0 0 0 0 1 1 0 0 1  0 0 1 1 0 1 1 0 0  1 0 1 1 0 0 1 0 0
9     0 0 0 0 0 1 0 1 0  1 1 0 0 1 0 0 0 0  0 1 0 0 1 1 0 0 0

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

198 Distance: $D_{ij} = |Y_i - Y_j|$

Y: 0.32 0.59 0.54 0.50 0.04 0.02 0.41 0.01 -0.17

      1     2     3     4     5     6     7     8     9
1  .000  .277  .228  .181  .278  .298  .095  .307  .481
2  .277  .000  .049  .096  .555  .575  .182  .584  .758
3  .228  .049  .000  .047  .506  .526  .134  .535  .710
4  .181  .096  .047  .000  .459  .479  .087  .488  .663
5  .278  .555  .506  .459  .000  .020  .372  .029  .204
6  .298  .575  .526  .479  .020  .000  .392  .009  .184
7  .095  .182  .134  .087  .372  .392  .000  .401  .576
8  .307  .584  .535  .488  .029  .009  .401  .000  .175
9  .481  .758  .710  .663  .204  .184  .576  .175  .000

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

199 The same data arranged as a dyad list (first 15 observations)

Obs  SENDER  RCVER  SIM      NOM  SAMERCE  SAMESEX
  1    1       2    0.27694   1     1        0
  2    1       3    0.22828   1     0        1
  3    1       4    0.18136   1     0        1
  4    1       5    0.27766   0     1        0
  5    1       6    0.29763   0     0        0
  6    1       7    0.09473   0     0        1
  7    1       8    0.30671   0     0        1
  8    1       9    0.48148   0     1        0
  9    2       1    0.27694   1     1        0
 10    2       3    0.04866   1     0        0
 11    2       4    0.09559   0     0        0
 12    2       5    0.55460   0     1        1
 13    2       6    0.57457   0     0        1
 14    2       7    0.18221   1     0        0
 15    2       8    0.58365   0     0        0

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

200 The REG Procedure
Model: MODEL1
Dependent Variable: SIM

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4      0.90657         0.22664       9.29    <.0001
Error             31      0.75591         0.02438
Corrected Total   35      1.66248

Root MSE          0.15615    R-Square   0.5453
Dependent Mean    0.33161    Adj R-Sq   0.4866
Coeff Var        47.08929

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1        0.51931            0.05116        10.15    <.0001
NOM          1       -0.17054            0.05963        -2.86    0.0075
SAMERCE      1        0.05387            0.05916         0.91    0.3696
SAMESEX      1       -0.06535            0.05365        -1.22    0.2324
NCOMFND      1       -0.16134            0.03862        -4.18    0.0002

Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

201 Like the basic Peer influence model, cases in a dyad model are not independent. However, the non-independence now comes from two sources: the fact that the same person is represented in (n-1) dyads and that i and j are linked through relations. One of the best solutions to this problem is QAP: Quadratic Assignment Procedure. A non-parametric procedure for significance testing. QAP runs the model of interest on the real data, then randomly permutes the rows/cols of the data matrix and estimates the model again. In so doing, it generates an empirical distribution of the coefficients. Complete Network Analysis Network Connections: Network Diffusion & Peer Influence

202 Comparing multiple networks: QAP The substantive question is how one set of relations (or dyadic attributes) relates to another. For example: Do marriage ties correlate with business ties in the Medici family network? Are friendship relations correlated with joint membership in a club? Complete Network Analysis Network Connections: QAP

203 Assessing the correlation is straightforward, as we simply correlate each corresponding cell of the two matrices:

Marriage
 1 ACCIAIUOL  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 2 ALBIZZI    0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
 3 BARBADORI  0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
 4 BISCHERI   0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
 5 CASTELLAN  0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
 6 GINORI     0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 7 GUADAGNI   0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
 8 LAMBERTES  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 9 MEDICI     1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
10 PAZZI      0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
11 PERUZZI    0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
12 PUCCI      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 RIDOLFI    0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
14 SALVIATI   0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
15 STROZZI    0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
16 TORNABUON  0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0

Business
 1  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 2  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 3  0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0
 4  0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0
 5  0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0
 6  0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
 7  0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
 8  0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0
 9  0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1
10  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
11  0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0
12  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
15  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

Dyads (columns: i, j, marriage, business):
1  2 0 0      2  1 0 0
1  3 0 0      2  3 0 0
1  4 0 0      2  4 0 0
1  5 0 0      2  5 0 0
1  6 0 0      2  6 1 0
1  7 0 0      2  7 1 0
1  8 0 0      2  8 0 0
1  9 1 0      2  9 1 0
1 10 0 0      2 10 0 0
1 11 0 0      2 11 0 0
1 12 0 0      2 12 0 0
1 13 0 0      2 13 0 0
1 14 0 0      2 14 0 0
1 15 0 0      2 15 0 0
1 16 0 0      2 16 0 0

Correlation:
1          0.3718679
0.3718679  1

Complete Network Analysis Network Connections: QAP

204 But is the observed value statistically significant? We can't use standard inference, because its assumptions are violated: the dyads are not independent observations, since each actor appears in many cells. Instead, we use a permutation approach. Essentially, we ask whether the observed correlation is large (or small) compared to what we would get if the assignment of variables to nodes were random, but the interdependencies within each variable were maintained. We do this by randomly reordering the rows and columns of one matrix, then re-estimating the correlation.

205 Comparing multiple networks: QAP When you permute, you have to permute both the rows and the columns simultaneously to maintain the interdependencies in the data:

ID  ORIG
A   0 1 2 3 4
B   0 0 1 2 3
C   0 0 0 1 2
D   0 0 0 0 1
E   0 0 0 0 0

ID  Sorted
A   0 3 1 2 4
D   0 0 0 0 1
B   0 2 0 1 3
C   0 1 0 0 2
E   0 0 0 0 0
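The simultaneous row-and-column permutation on this slide can be sketched as follows, assuming Python with NumPy (not part of the original deck). The variable names are illustrative; `order` encodes the slide's new ordering A, D, B, C, E.

```python
import numpy as np

labels = ["A", "B", "C", "D", "E"]
orig = np.array([
    [0, 1, 2, 3, 4],
    [0, 0, 1, 2, 3],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
])

# New ordering A, D, B, C, E as positions in the original labels.
order = [0, 3, 1, 2, 4]

# Apply the SAME ordering to rows and columns at once.
sorted_m = orig[np.ix_(order, order)]

# Reordering rows alone would leave the columns in A..E order, so cell (i, j)
# would no longer refer to the pair of actors labelling row i and column j.
rows_only = orig[order, :]

print([labels[i] for i in order])
print(sorted_m)
```

Each actor keeps its full set of ties under this operation; only the labels attached to positions change.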

206 Procedure:
1. Calculate the observed correlation.
2. For K iterations do:
   a) randomly permute the rows and columns of one of the matrices (same order for both)
   b) recalculate the correlation
   c) store the outcome
3. Compare the observed correlation to the distribution of correlations created by the random permutations.
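The procedure can be sketched in a few lines, assuming Python with NumPy. This is an illustrative implementation, not UCINET's; the function name `qap_correlation`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def qap_correlation(x, y, n_perm=1000, seed=0):
    """Permutation test for the correlation between two square matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore self-ties

    def corr(a, b):
        return np.corrcoef(a[off_diag], b[off_diag])[0, 1]

    # Step 1: observed correlation.
    observed = corr(x, y)

    # Step 2: permute rows and columns of x together, K = n_perm times.
    rng = np.random.default_rng(seed)
    permuted = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        permuted[k] = corr(x[np.ix_(p, p)], y)

    # Step 3: compare the observed value to the permutation distribution.
    p_large = np.mean(permuted >= observed)
    p_small = np.mean(permuted <= observed)
    return observed, permuted, p_large, p_small


# Toy data (made up): a random symmetric 0/1 network compared with itself.
toy_rng = np.random.default_rng(1)
n_nodes = 12
upper = np.triu(toy_rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
toy = upper + upper.T
obs, dist, p_large, p_small = qap_correlation(toy, toy, n_perm=500, seed=2)
print(obs, p_large, p_small)
```

Here `p_large` and `p_small` are the proportions of permuted correlations at least as large, or at least as small, as the observed one.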

208 This can be done simply in UCINET:

QAP MATRIX CORRELATION
--------------------------------------------------------------------------------
Observed matrix:    PadgBUS
Structure matrix:   PadgMAR
# of Permutations:  2500
Random seed:        356

Univariate statistics

                    1        2
              PadgBUS  PadgMAR
              -------  -------
 1 Mean         0.125    0.167
 2 Std Dev      0.331    0.373
 3 Sum         30.000   40.000
 4 Variance     0.109    0.139
 5 SSQ         30.000   40.000
 6 MCSSQ       26.250   33.333
 7 Euc Norm     5.477    6.325
 8 Minimum      0.000    0.000
 9 Maximum      1.000    1.000
10 N of Obs   240.000  240.000

Hubert's gamma: 16.000

Bivariate Statistics

                                  1        2        3        4         5         6         7
                              Value   Signif      Avg       SD  P(Large)  P(Small)     NPerm
                          ---------  -------  -------  -------  --------  --------  --------
1 Pearson Correlation:        0.372    0.000    0.001    0.092     0.000     1.000  2500.000
2 Simple Matching:            0.842    0.000    0.750    0.027     0.000     1.000  2500.000
3 Jaccard Coefficient:        0.296    0.000    0.079    0.046     0.000     1.000  2500.000
4 Goodman-Kruskal Gamma:      0.797    0.000   -0.064    0.382     0.000     1.000  2500.000
5 Hamming Distance:          38.000    0.000   59.908    5.581     1.000     0.000  2500.000

209 Using the same logic, we can estimate alternative models, such as regression, logits, probits, etc. The only complication is that you need to permute all of the independent matrices in the same way in each iteration.
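A minimal sketch of the regression case, assuming Python with NumPy and ordinary least squares on the off-diagonal cells. It is not UCINET's routine: the function name `mrqap`, its parameters, and the toy data are hypothetical. The one point it illustrates is that a single shared permutation is applied to every independent matrix in each iteration.

```python
import numpy as np


def mrqap(y, xs, n_perm=500, seed=0):
    """OLS on dyads, with permutation-based proportions for each coefficient."""
    y = np.asarray(y, dtype=float)
    xs = [np.asarray(x, dtype=float) for x in xs]
    n = y.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # diagonal treated as not valid
    n_obs = int(off_diag.sum())

    def fit(x_list):
        # Design matrix: intercept plus one column per independent matrix.
        design = np.column_stack([np.ones(n_obs)] + [x[off_diag] for x in x_list])
        coefs, *_ = np.linalg.lstsq(design, y[off_diag], rcond=None)
        return coefs

    observed = fit(xs)
    rng = np.random.default_rng(seed)
    permuted = np.empty((n_perm, len(xs) + 1))
    for k in range(n_perm):
        p = rng.permutation(n)
        # One shared permutation for ALL independent matrices this iteration.
        permuted[k] = fit([x[np.ix_(p, p)] for x in xs])

    prop_as_large = (permuted >= observed).mean(axis=0)
    prop_as_small = (permuted <= observed).mean(axis=0)
    return observed, prop_as_large, prop_as_small


# Toy data (made up): y is an exact linear function of two dyadic covariates.
toy_rng = np.random.default_rng(3)
x1 = toy_rng.normal(size=(9, 9))
x2 = toy_rng.normal(size=(9, 9))
y_toy = 0.5 + 2.0 * x1 - 1.0 * x2
coefs, as_large, as_small = mrqap(y_toy, [x1, x2], n_perm=200, seed=4)
print(coefs)
```

The returned arrays are ordered intercept first, then one entry per independent matrix.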

210 Peer-influence results on similarity dyad model, using QAP

# of permutations:      2000
Diagonal valid?         NO
Random seed:            995
Dependent variable:     EX_SIM
Expected values:        C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:  EX_SSEX EX_SRCE EX_ADJ

Number of valid observations among the X variables = 72
N = 72
Number of permutations performed: 1999

MODEL FIT

R-square  Adj R-Sqr  Probability  # of Obs
--------  ---------  -----------  --------
   0.289      0.269        0.059        72

REGRESSION COEFFICIENTS

              Un-stdized      Stdized                  Proportion   Proportion
Independent  Coefficient  Coefficient  Significance      As Large     As Small
-----------  -----------  -----------  ------------   -----------  -----------
Intercept       0.460139     0.000000         0.034         0.034        0.966
EX_SSEX        -0.073787    -0.170620         0.140         0.860        0.140
EX_SRCE        -0.020472    -0.047338         0.272         0.728        0.272
EX_ADJ         -0.239896    -0.536211         0.012         0.988        0.012

