Social Network Analysis: What it Is, How it Works, and How You Can Do It Prof. Paul Beckman San Francisco State University.



Agenda
- “Social networks”
- SNA: social network analysis
- The math behind SNA
- My contribution to SNA research
- My SNA research
- SNA software
- Large dataset research example

Commercially, What Are Social Networks?

Theoretically, What Are Social Networks?
- Groups of individuals
  - who are often humans, but not necessarily: gorillas, dolphins, birds, etc.
  - who “interact” in some setting, creating “links” between the individuals
- With humans, the setting is often professional (for example, the workplace) and NOT necessarily social
  - because social interactions are often hard to record without some computerized, programmatic process, such as those used by the firms on the previous slide

Social Network Analysis
- SNA: research about/using social networks
- Other criteria are often needed
  - Frequently the “interaction” relates to a task
- For my research purposes:
  - the group of individuals must be precisely defined
  - the task must be standardized
  - the task must have quantifiable performance measurements
  - the task must be completed many times
  - sub-groups must form, break up, and re-form in different configurations to complete the task

The Math Behind SNA
- In math, SNA is called “graph theory”, one branch of mathematics
- Graph theory is the study of groups of:
  - “nodes”: points in a network
  - “edges”: links between points in a network
- It is also the study of:
  - measures of node interaction: things you can say about an individual node
  - measures of network structure: things you can say about the network as a whole

Other Graph Theory Terms
- Edge weight
  - Did two nodes get linked just once, or were they linked more than once?
  - Did you just meet me, or are you currently in ISYS 464?
- Directed vs. non-directed graphs
  - Does an edge run from one node to the other, or is the edge non-directional?
  - Example: at a party, you may:
    - know about someone: directional
    - shake hands: non-directional
  - Example: “recommendation networks”: each node recommends other nodes and can be recommended
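
The two ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the names and interaction records below are hypothetical and do not come from the talk. Edge weight is taken here to be the number of times a pair was linked; a directed edge keeps track of who points at whom, while a non-directed edge does not.

```python
from collections import Counter

# Hypothetical interaction records (not from the talk): each pair is one
# recorded interaction between two people.
handshakes = [("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Cy")]   # non-directional
knows_about = [("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Cy")]  # directional


def undirected_weights(pairs):
    """Count links per unordered pair, so (a, b) and (b, a) are the same edge."""
    return Counter(frozenset(pair) for pair in pairs)


def directed_weights(pairs):
    """Count links per ordered pair, so (a, b) and (b, a) are different edges."""
    return Counter(pairs)


shake = undirected_weights(handshakes)
know = directed_weights(knows_about)

# Ann and Bob shook hands twice, so that edge has weight 2.
print(shake[frozenset({"Ann", "Bob"})])
# Ann knows about Cy, but Cy does not know about Ann.
print(know[("Ann", "Cy")], know[("Cy", "Ann")])
```

The same three records give different graphs depending on the choice: as handshakes they collapse into two weighted edges, as "knows about" links they stay three separate directed edges.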

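Example: Measuring “Degrees of Separation”

Firm #   Board Members
1        A, D
2        B, C
3        A, C
4        C, D, E
5        F, G

[Figure: graph in which each node is a board member. A, B, C, D, and E form one island; F and G form a second. Graph with 2 islands.]

Shortest path between each pair of nodes in the first island:

     A  B  C  D  E
A    x  2  1  1  2
B    2  x  1  2  2
C    1  1  x  1  1
D    1  2  1  x  1
E    2  2  1  1  x

Calculating connectivity
1. for each node: calculate the shortest path to each other node
2. for each node: calculate the mean of all shortest paths for that node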
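
The two calculation steps can be sketched in Python using a breadth-first search over the board-member data from this slide. This is a minimal sketch, not the method used in the talk; in particular, averaging only over the nodes a member can actually reach (so that the F-G island does not affect A through E) is an assumption made here.

```python
from collections import deque

# Board memberships from the slide: firm number -> board members.
boards = {1: ["A", "D"], 2: ["B", "C"], 3: ["A", "C"], 4: ["C", "D", "E"], 5: ["F", "G"]}

# Two members are linked if they sit on the same board.
neighbors = {}
for members in boards.values():
    for m in members:
        neighbors.setdefault(m, set()).update(x for x in members if x != m)


def shortest_paths(start):
    """Breadth-first search: number of links from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # drop the zero-length path from a node to itself
    return dist


def mean_separation(node):
    """Mean shortest-path length from node to the nodes it can reach (assumption)."""
    dist = shortest_paths(node)
    return sum(dist.values()) / len(dist)


for node in "ABCDE":
    print(node, shortest_paths(node), mean_separation(node))
```

On this data, C has the lowest mean separation because it is one link away from every other member of its island.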

Other Connectivity Measures
- In graph theory, we say “centrality” instead of “connectivity”
- There are four common measures of centrality:
  - Degree centrality: simply the number of other nodes directly connected to a node
  - Betweenness centrality: a more complex measure of “degrees of separation”
  - Closeness centrality: the average of all “degrees of separation” for a node
  - Eigenvector centrality: measures the “importance” of a node (the idea behind Google’s PageRank)
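
Three of these measures are small enough to sketch in plain Python on the five-node island from the board-member example. The conventions chosen here are assumptions, not necessarily those of any particular SNA tool: closeness is computed as (n - 1) divided by the sum of shortest-path lengths, which is the reciprocal of the average "degrees of separation", and eigenvector centrality is approximated by power iteration and scaled so the top node scores 1. Betweenness is left out to keep the sketch short.

```python
from collections import deque

# Edges of the five-node island from the board-member example.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "E")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def degree(node):
    """Degree centrality: number of directly connected nodes."""
    return len(adj[node])


def closeness(node):
    """Closeness centrality as (n - 1) / (sum of shortest-path lengths)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return (len(dist) - 1) / sum(dist.values())


def eigenvector(iterations=100):
    """Eigenvector centrality by power iteration, scaled so the largest score is 1."""
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbors' current scores.
        new = {n: sum(score[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        score = {n: s / top for n, s in new.items()}
    return score


eig = eigenvector()
for n in sorted(adj):
    print(n, degree(n), round(closeness(n), 3), round(eig[n], 3))
```

All three measures rank C first on this graph, but they need not agree in general, which is why more than one measure is reported.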

My Contribution to SNA Research
- Move beyond simply measuring network centrality or other graph-theoretic constructs and measures
- Because: which is more important in the real world?
  - Who is most connected?
  - or: How does connectivity relate to real-world task performance?

My Contribution Requires...
- Addition of a standard task
  - because the real world cares about task performance, NOT connectivity values
  - but most SNA research focuses strictly on information flow through the network and on who is “important” in that flow
- A task with quantifiable performance measures
  - so we can relate (in a mathematical way) network measures to performance measures

My SNA Research
- MIS researchers
  - Kevin Bacon, Degrees-of-Separation, and MIS Research
- VC board members
  - Do a Firm’s Board Member Linkages Relate to Perceived or Actual Firm Financial Performance?
- Baseball players
  - More Highly-Connected Baseball Players Have Better Offensive Performance

SNA Software
- Powerful (and complex) tools:
  - UCINET: for very complex network calculations
  - Pajek: for very large datasets
  - Netdraw: for visualizing networks
- Simpler (and less powerful) tools:
  - NodeXL

SNA Software: NodeXL
- Easy-to-use tool
- Runs inside Microsoft Excel as a template

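NodeXL Example: Using Our Previous Dataset

Firm #   Board Members
1        A, D
2        B, C
3        A, C
4        C, D, E
5        F, G

[Figure: graph in which each node is a board member. A, B, C, D, and E form one island; F and G form a second. Graph with 2 islands.]

Shortest path between each pair of nodes in the first island:

     A  B  C  D  E
A    x  2  1  1  2
B    2  x  1  2  2
C    1  1  x  1  1
D    1  2  1  x  1
E    2  2  1  1  x

Calculating connectivity
1. for each node: calculate the shortest path to each other node
2. for each node: calculate the mean of all shortest paths for that node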

NodeXL Centrality Calculations
- We need data in “edgelist” format
  - a standard format for entering data into SNA tools
  - this is sometimes a tricky transformation

Node1   Node2
A       C
A       D
B       C
C       D
C       E
D       E
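
One way to do the transformation from the firm/board-member table to an edgelist is sketched below in Python. The function name `to_edgelist` is made up for this illustration, and treating every pair of members on the same board as one link is the assumption behind it. Run on the full table, it also produces the F-G row for the second island, which the slide's edgelist leaves out.

```python
from itertools import combinations

# Board memberships from the earlier slide: firm number -> board members.
boards = {1: ["A", "D"], 2: ["B", "C"], 3: ["A", "C"], 4: ["C", "D", "E"], 5: ["F", "G"]}


def to_edgelist(groups):
    """Return one sorted (node1, node2) row for every pair that shares a group."""
    edges = set()  # a set drops pairs that appear together in more than one group
    for members in groups.values():
        for pair in combinations(sorted(members), 2):
            edges.add(pair)
    return sorted(edges)


edgelist = to_edgelist(boards)
print("Node1\tNode2")
for node1, node2 in edgelist:
    print(f"{node1}\t{node2}")
```

The tab-separated rows can be pasted into two adjacent spreadsheet columns. Because the pairs are collected in a set, repeated pairings are collapsed into a single binary link rather than counted as a weight.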

Large Dataset Example
- U.S. professional baseball
  - Players are nodes
  - Links occur when players play together, as measured from team rosters

My Five Research Criteria
1. Precisely defined group of nodes?
   Yes: you are either an MLB player or not
2. Nodes interact in precisely quantifiable sub-groups?
   Yes: team rosters are defined by specific rules
3. Standard task that sub-groups perform?
   Yes: a baseball game has a specific set of (exact) rules
4. Task has quantifiable performance measures?
   Yes: both for players (BA, RBIs, etc.) and teams (wins, runs, etc.)
5. Sub-groups break up, re-form, and redo the task?
   Yes: rosters change from day to day and year to year

Research Methodology
1. Get the dataset (available online)
2. Calculate centrality for each NON-pitcher over some time period
   - Why non-pitchers?
3. Calculate task performance
   - batting average, home runs, RBIs, slugging pct.
4. Calculate the correlation between centrality and individual performance

Correlation Calculation
- Correlations between centrality measures and individual performance measures
- “Correlation” varies from -1.00 to +1.00
  - example: list 1000 players by height vs. home runs
    - exactly the same order? correlation = +1.00
    - exact inverse order? correlation = -1.00
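
The "same order" example corresponds most directly to a rank correlation, so the Python sketch below computes a Pearson correlation and then applies it to ranks (Spearman's rank correlation). The talk does not say which correlation coefficient was used, so this choice is an assumption, and the height and home-run numbers are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank of each value (1 = smallest); assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def rank_correlation(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical numbers, invented for illustration only.
height_cm = [170, 175, 180, 185, 190]
home_runs = [3, 8, 9, 20, 41]

print(rank_correlation(height_cm, home_runs))        # same order
print(rank_correlation(height_cm, home_runs[::-1]))  # exact inverse order
```

With the players in exactly the same order on both lists the rank correlation is +1, and with the order exactly reversed it is -1; real data falls somewhere in between.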

Results
- BA = batting average
- HR = home runs hit
- RBI = runs batted in
- SLG = slugging percentage
- FPCT = fielding percentage

Conclusions
- Players with higher centrality have higher individual OFFENSIVE performance measures
  - but not higher defensive performance measures
- This does NOT mean higher centrality leads to higher performance
  - only that they are correlated

Limitations
- Simplistic measure of a “link”
  - the opening-day roster misses subsequent changes in player connections
- Further simplistic measure of a “link”
  - binary, not weighted, links miss players who play together over a long time
- Only measured correlation, not causality
  - we don’t know if one causes the other, or whether both are caused by some other factor (age, experience, etc.)

So, Today We’ve Talked About:
- What social networks are
- The math behind social networks
  - graph theory
- A free social network analysis tool you can use
  - NodeXL
- One particular large SNA research project
  - connectivity vs. performance of MLB players

Questions?