Social Network Analysis: What It Is, How It Works, and How You Can Do It
Prof. Paul Beckman
San Francisco State University
Agenda
- “Social networks”
- SNA: Social network analysis
- The math behind SNA
- My contribution to SNA research
- My SNA research
- SNA software
- Large dataset research example
Commercially, What Are Social Networks?
Theoretically, What Are Social Networks?
- Groups of individuals
  - often humans, but not necessarily: gorillas, dolphins, birds, etc.
- who “interact” in some setting
  - creating “links” between the individuals
- With humans, the setting is often professional (for example, the workplace) and NOT necessarily social
  - because social interactions are often hard to record without some computerized process, such as those used by the firms on the previous slide
Social Network Analysis
- SNA: research about, or using, social networks
- Other criteria are often needed
  - Frequently the “interaction” relates to a task
- For my research purposes:
  - the group of individuals must be precisely defined
  - the task must be standardized
  - the task must have quantifiable performance measurements
  - the task must be completed many times
  - sub-groups must form, break up, and re-form in different configurations to complete the task
The Math Behind SNA
- In math, SNA is called “graph theory”
  - one branch of mathematics
- Graph theory is the study of groups of
  - “nodes”: points in a network
  - “edges”: links between points in a network
- It is also the study of:
  - measures of node interaction: things you can say about an individual node
  - measures of network structure: things you can say about the network as a whole
Other Graph Theory Terms
- Edge weight
  - Did two nodes get linked just once, or were they linked more than once?
  - Did you just meet me, or are you currently in ISYS 464?
- Directed vs. non-directed graphs
  - Does an edge run from one node to the other, or is the edge non-directional?
  - Example: at a party, you may
    - know about someone: directional
    - shake hands: non-directional
  - Example: “recommendation networks”: each node recommends other nodes and can be recommended
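As a rough illustration of these two terms, the sketch below stores a weighted non-directed graph and a directed graph using plain Python containers. The names (Ann, Bob, Cat) and the recorded interactions are hypothetical and are not taken from any dataset in this talk.

```python
from collections import Counter

# Hypothetical handshakes at a party: one tuple per recorded interaction.
handshakes = [("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Cat")]

# Non-directed and weighted: (Ann, Bob) and (Bob, Ann) are the same edge,
# so each pair is sorted before counting. The count is the edge weight.
undirected_weights = Counter(tuple(sorted(pair)) for pair in handshakes)

# Directed: "Ann knows about Bob" does not imply "Bob knows about Ann",
# so the order inside each pair is kept as (source, target).
knows_about = [("Ann", "Bob"), ("Cat", "Ann")]
directed_edges = set(knows_about)

print(undirected_weights[("Ann", "Bob")])   # weight of the Ann-Bob edge
print(("Bob", "Ann") in directed_edges)     # is there an edge Bob -> Ann?
```

Here the Ann-Bob edge has weight 2 because the pair was linked twice, while the directed graph contains Ann -> Bob but not Bob -> Ann.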
Example: Measuring “Degrees of Separation”

Firm #   Board members
1        A, D
2        B, C
3        A, C
4        C, D, E
5        F, G

[Figure: graph with 2 islands. Each node is a board member; island 1 contains A, B, C, D, and E, and island 2 contains F and G.]

Shortest path between each pair of nodes (island 1):

     A  B  C  D  E
A    -  2  1  1  2
B    2  -  1  2  2
C    1  1  -  1  1
D    1  2  1  -  1
E    2  2  1  1  -

Calculating connectivity
1. for each node: calculate the shortest path to each other node
2. for each node: calculate the mean of all shortest paths for that node
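The two steps can be sketched in Python with a breadth-first search. This is a minimal sketch, not the method any particular SNA tool uses. The edge list is derived from the firm table (two board members are linked when they share a board), and averaging only over reachable nodes is an assumption made here so that the F-G island does not break the calculation.

```python
from collections import deque

# Edges implied by the firm table: members of the same board are linked.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"),
         ("C", "E"), ("D", "E"), ("F", "G")]

def build_adjacency(edge_list):
    """Map each node to the set of nodes it is directly linked to."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def shortest_paths_from(adjacency, start):
    """Step 1: breadth-first search giving the number of links on the
    shortest path from start to every other reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # a node is not separated from itself
    return distance

def mean_separation(adjacency):
    """Step 2: mean shortest path per node (reachable nodes only)."""
    means = {}
    for node in adjacency:
        distances = shortest_paths_from(adjacency, node)
        means[node] = sum(distances.values()) / len(distances)
    return means

adjacency = build_adjacency(edges)
means = mean_separation(adjacency)
for node in sorted(means):
    print(node, means[node])
```

For this dataset the means are A 1.5, B 1.75, C 1.0, D 1.25, E 1.5, and 1.0 for both F and G, whose only reachable node is each other.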
Other Connectivity Measures
- In graph theory, we say “centrality” instead of “connectivity”
- There are four common measures of centrality
  - Degree centrality: simply the number of directly connected nodes
  - Betweenness centrality: a more complex measure built on “degrees of separation”: how often a node lies on the shortest paths between other nodes
  - Closeness centrality: the average of all “degrees of separation” for a node
  - Eigenvector centrality: measures the “importance” of a node (Google’s PageRank is a variant)
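To make three of these measures concrete, here is a small pure-Python sketch for the board-member example. It is an illustration under stated assumptions, not how UCINET, Pajek, or NodeXL compute them: betweenness is left unnormalized, closeness follows the slide's definition (mean shortest path, so lower means closer) rather than the reciprocal form many tools report, and eigenvector centrality is omitted.

```python
from collections import deque
from itertools import combinations

# Edge list for the board-member example, including the F-G island.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"),
         ("C", "E"), ("D", "E"), ("F", "G")]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

def bfs_counts(start):
    """Return (distance, number of shortest paths) from start to each
    reachable node, using breadth-first search."""
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                paths[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                paths[neighbor] += paths[node]
    return dist, paths

info = {node: bfs_counts(node) for node in adjacency}

# Degree centrality: number of directly connected nodes.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

# Closeness as defined on the slide: mean shortest path to reachable nodes.
mean_distance = {node: sum(dist.values()) / (len(dist) - 1)
                 for node, (dist, _) in info.items()}

# Betweenness: for every pair (s, t), the fraction of shortest s-t paths
# that pass through an intermediate node v, summed over all pairs.
betweenness = {node: 0.0 for node in adjacency}
for s, t in combinations(sorted(adjacency), 2):
    dist_s, paths_s = info[s]
    if t not in dist_s:
        continue  # s and t sit on different islands
    for v in dist_s:
        if v in (s, t):
            continue
        dist_v, paths_v = info[v]
        if dist_s[v] + dist_v[t] == dist_s[t]:
            betweenness[v] += paths_s[v] * paths_v[t] / paths_s[t]

for node in sorted(adjacency):
    print(node, degree[node], mean_distance[node], betweenness[node])
```

In this small graph, C has the highest degree (4) and the highest betweenness (3.5), since every shortest path involving B runs through C.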
My Contribution to SNA Research
- Move beyond simply measuring network centrality or other graph-theoretic constructs and measures
- Because: which is important in the real world?
  - Who is most connected?
  - or: How does connectivity relate to real-world task performance?
My Contribution Requires...
- Addition of a standard task
  - because the real world cares about task performance, NOT connectivity values
  - but most SNA research focuses strictly on information flow through the network and on who is “important” in that flow
- A task with quantifiable performance measures
  - so we can relate (in a mathematical way) network measures to performance measures
My SNA Research
- MIS researchers
  - Kevin Bacon, Degrees-of-Separation, and MIS Research
- VC board members
  - Do a Firm’s Board Member Linkages Relate to Perceived or Actual Firm Financial Performance?
- Baseball players
  - More Highly-Connected Baseball Players Have Better Offensive Performance
SNA Software
- Powerful (and complex) tools:
  - UCINET: for very complex network calculations
  - Pajek: for very large datasets
  - Netdraw: for visualizing networks
- Weaker (and simpler) tools:
  - NodeXL
SNA Software: NodeXL
- Easy-to-use tool
- Runs inside Microsoft Excel as a template
NodeXL Example: Using Our Previous Dataset

Firm #   Board members
1        A, D
2        B, C
3        A, C
4        C, D, E
5        F, G

[Figure: graph with 2 islands. Each node is a board member; island 1 contains A, B, C, D, and E, and island 2 contains F and G.]

Shortest path between each pair of nodes (island 1):

     A  B  C  D  E
A    -  2  1  1  2
B    2  -  1  2  2
C    1  1  -  1  1
D    1  2  1  -  1
E    2  2  1  1  -

Calculating connectivity
1. for each node: calculate the shortest path to each other node
2. for each node: calculate the mean of all shortest paths for that node
NodeXL Centrality Calculations
- We need data in “edgelist” format
  - a standard format for entering data into SNA tools
  - this is sometimes a tricky transformation

Node1   Node2
A       C
A       D
B       C
C       D
C       E
D       E
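One way to do this transformation is sketched below in Python: every pair of members on the same board becomes one edge. The `boards` dictionary and the function name `boards_to_edgelist` are assumptions introduced for this illustration; NodeXL itself only needs the resulting two columns pasted into its worksheet. Note that firm 5 also yields an F-G edge, which the list on the slide does not show.

```python
from itertools import combinations

# Firm number -> board members, copied from the example dataset.
boards = {
    1: ["A", "D"],
    2: ["B", "C"],
    3: ["A", "C"],
    4: ["C", "D", "E"],
    5: ["F", "G"],
}

def boards_to_edgelist(boards):
    """Turn board memberships into a sorted, de-duplicated edge list."""
    edges = set()
    for members in boards.values():
        # Every pair of members on the same board is linked once.
        for node1, node2 in combinations(sorted(members), 2):
            edges.add((node1, node2))
    return sorted(edges)

edgelist = boards_to_edgelist(boards)
print("Node1", "Node2")
for node1, node2 in edgelist:
    print(node1, node2)
```

Using a set keeps the links binary: a pair that shares two boards would still appear only once. Counting repeats instead would give weighted edges.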
Large Dataset Example
- U.S. professional baseball
- Players are nodes
- Links occur when players play together
  - as measured from team rosters
My Five Research Criteria
1. Precisely defined group of nodes?
   - Yes: you are either an MLB player or not
2. Nodes interact in precisely quantifiable sub-groups?
   - Yes: team rosters are defined by specific rules
3. Standard task that sub-groups perform?
   - Yes: a baseball game has a specific set of (exact) rules
4. Task has quantifiable performance measures?
   - Yes: both for players (BA, RBIs, etc.) and teams (wins, runs, etc.)
5. Sub-groups break up, re-form, and re-do the task?
   - Yes: rosters change from day to day and year to year
Research Methodology
1. Get the dataset
   - available online
2. Calculate centrality for each NON-pitcher over some time period
   - Why non-pitchers?
3. Calculate task performance
   - batting average, home runs, RBIs, slugging pct.
4. Calculate correlation between centrality and individual performance
Correlation Calculation
- Correlations between centrality measures and individual performance measures
- “Correlation” varies from -1.00 to +1.00
  - example: list 1000 players by height vs. home runs
    - exactly the same order? correlation = +1.00
    - exact inverse order? correlation = -1.00
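The slide does not say which correlation coefficient was used, so the sketch below is an assumption: it computes a Pearson correlation, and applies it to ranks (Spearman's rank correlation) to match the "same order" example. The height and home-run numbers are made up for illustration, and the simple ranking helper assumes there are no ties.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return covariance / sqrt(spread_x * spread_y)

def ranks(values):
    """Rank of each value, 1 = smallest (assumes no tied values)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical players: height in cm and home runs hit.
heights = [170, 180, 175, 190]
same_order = [5, 20, 9, 40]      # taller players always hit more
inverse_order = [40, 9, 20, 5]   # taller players always hit fewer

print(round(spearman(heights, same_order), 2))
print(round(spearman(heights, inverse_order), 2))
```

The first call prints 1.0 and the second prints -1.0, matching the two extremes on the slide. Real data falls somewhere in between.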
Results
- BA = batting average
- HR = home runs hit
- RBI = runs batted in
- SLG = slugging percentage
- FPCT = fielding percentage
Conclusions
- Players with higher centrality have higher individual OFFENSIVE performance measures
  - but not higher defensive performance measures
- This does NOT mean higher centrality leads to higher performance
  - only that they are correlated
Limitations
- Simplistic measure of a “link”
  - opening day roster
  - misses subsequent changes in player connection
- Further simplistic measure of a “link”
  - binary, not weighted, links
  - misses players who play together over a long time
- Only measured correlation, not causality
  - don’t know if one causes the other, or if both are caused by some other factor (age, experience, etc.)
So, Today We’ve Talked About:
- What social networks are
- The math behind social networks
  - graph theory
- A free social network analysis tool you can use
  - NodeXL
- One particular large SNA research project
  - connectivity vs. performance of MLB players
Questions?