Social Network Analysis with Apache Spark and Neo4J


1 Social Network Analysis with Apache Spark and Neo4J
Charles Copley Nathan Begbie Eli Copley

2 OVERVIEW
Introduction to social network concepts
Workshop data & data handling
Applied visualisation and network computations
By the end of the workshop, participants will have the basic skills needed to learn to use Apache Spark with Neo4j for social network analysis.

3 01 Introduction to Social Networks
Introduction to Concepts & Terminology Used in Social Network Analysis

4

5 SOCIAL NETWORK ANALYSIS
Levels of Analysis
Individuals affect other individuals
Individual behaviours and decisions determine network structures and dynamics
Network properties and an individual’s network location affect individual behaviour
Network structures, dynamics and evolution mechanisms at time 1 affect network dynamics and structures at time 2

6 SOCIAL NETWORK CONCEPTS & TERMINOLOGY
Diagram labels: Isolates, Node (degree = 4), Component, Edge
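
The terms on this slide can be made concrete with a small sketch. The toy graph and the helper names below are assumptions made up for illustration, not workshop data: a node's degree is the number of edges touching it, an isolate is a node with degree 0, and a component is a set of nodes that can all reach one another.

```python
def build_adjacency(nodes, edges):
    """Undirected adjacency sets; nodes without edges are kept as isolates."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def components(adj):
    """Group nodes into connected components with a depth-first search."""
    seen = set()
    result = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(comp)
    return result


# Hypothetical toy network: "a" has four edges, "f" and "g" have none.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]

adj = build_adjacency(nodes, edges)
degree = {n: len(nbs) for n, nbs in adj.items()}
isolates = sorted(n for n, d in degree.items() if d == 0)
comps = components(adj)
```

Here node "a" has degree 4, "f" and "g" are isolates, and the graph has three components (each isolate counts as its own component).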

7 SOCIAL NETWORK CONCEPTS & TERMINOLOGY
Homophily: Birds of a feather flock together
Image from Moody, J. (2004)

8 Sourced by Ambika Samarthya-Howard, Praekelt.Org

9 SOCIAL NETWORK CONCEPTS & TERMINOLOGY
Influence and Selection
We influence and are influenced by the people we are connected to; but we also select those who are similar to us.

10 SOCIAL NETWORK CONCEPTS & TERMINOLOGY
Triadic Closure Triad

11 SOCIAL NETWORK CONCEPTS & TERMINOLOGY
How connected are your friends?
Diagram panels: Clustering Coefficient 1/3, Clustering Coefficient 2/3, Clustering Coefficient 3/3
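
As a rough sketch of how values like 1/3, 2/3 and 3/3 arise: the local clustering coefficient of a node is the number of edges that exist between its friends, divided by the number of friend pairs that could be connected. The "ego" node with three friends below is an assumed example, chosen so that one, two or three of the three possible friend pairs are linked.

```python
from fractions import Fraction
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of the node's friend pairs that are themselves connected."""
    friends = adj[node]
    k = len(friends)
    if k < 2:
        return Fraction(0)  # fewer than two friends: no pairs to close
    possible = k * (k - 1) // 2
    linked = sum(1 for u, v in combinations(sorted(friends), 2) if v in adj[u])
    return Fraction(linked, possible)


# Hypothetical ego network: "ego" knows x, y and z.
ego_edges = [("ego", "x"), ("ego", "y"), ("ego", "z")]
one_link = adjacency(ego_edges + [("x", "y")])
two_links = adjacency(ego_edges + [("x", "y"), ("y", "z")])
three_links = adjacency(ego_edges + [("x", "y"), ("y", "z"), ("x", "z")])

cc_one = local_clustering(one_link, "ego")
cc_two = local_clustering(two_links, "ego")
cc_three = local_clustering(three_links, "ego")
```

With three friends there are three possible friend pairs, so each extra friendship between friends raises the coefficient by one third.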

12 Page Rank
Your influence is determined by the influence of people you are connected to.
Your influence is passed on to people that you link to.
Then you iterate... MANY TIMES
Diagram labels: PR=1.35, PR=1.35, PR=0.15
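
The "iterate many times" idea can be sketched in a few lines. This is a minimal illustration, not the workshop's Spark code, and it assumes the un-normalised form PR(v) = (1 - d) + d × Σ PR(u) / out(u) with damping factor d = 0.85. The slide does not say which form or damping factor it used, and the three-node graph is made up. Under this assumed form, a node that nobody links to settles at 1 - d = 0.15.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Iterative PageRank on a dict mapping each node to the nodes it links to.

    Assumed un-normalised form: every node starts at 1.0 and keeps a base
    score of (1 - damping). Nodes with no out-links pass nothing on here,
    which is a simplification.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 1.0 - damping for n in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            # A node splits its current influence evenly across its links.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a and b link to each other, c links to a.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank(links)
```

In this toy graph "c" receives no links and ends at 0.15, while "a" ranks highest because it is linked to by both other nodes.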

13 02 Workshop Data
Why and how we use specific tools to handle large network datasets

14 DATASET
US National Longitudinal Study of Adolescent Health
Longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the school year
Includes Race, Gender and Grade.
Reference: A Statnet Tutorial (Goodreau, Handcock, Hunter, Butts and Morris), Journal of Statistical Software, February 2008, Volume 24.

15 DATA HANDLING
Raw Data: holds your primary data (could also be in a database)
Distributed Computation: first import the data into Spark for data handling, formatting and calculation
Graph Database: then move the data into Neo4j, which allows you to query relationship patterns and conduct SNA.
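
The hand-off between the stages can be sketched as follows. Plain Python stands in for the Spark step here so that the example runs anywhere; the raw rows, the `Person` label, the `id` property and the helper name are all assumptions for illustration. The Cypher string uses `UNWIND` and `MERGE` with a `$rows` parameter, and would be handed to Neo4j along with the `rows` list; the actual connection code is left out.

```python
def clean_edges(raw_rows):
    """Stand-in for the Spark formatting step.

    Trims whitespace, drops blank ids and self-loops, and keeps each
    undirected friendship only once.
    """
    seen = set()
    edges = []
    for src, dst in raw_rows:
        src, dst = src.strip(), dst.strip()
        if not src or not dst or src == dst:
            continue
        key = tuple(sorted((src, dst)))
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


# Parameterised Cypher (assumed labels and properties): one MERGE per row,
# so re-running the load does not duplicate people or relationships.
LOAD_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (a:Person {id: row.source}) "
    "MERGE (b:Person {id: row.target}) "
    "MERGE (a)-[:knows]-(b)"
)

# Hypothetical raw data with a duplicate, a self-loop and a blank id.
raw = [("1 ", "2"), ("2", "1"), ("3", "3"), ("2", "3"), ("", "4")]
edges = clean_edges(raw)
rows = [{"source": a, "target": b} for a, b in edges]
```

Cleaning before loading keeps the graph database free of duplicates and self-loops, so relationship queries only see real friendships.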

16 03 Data Practical
Visualising network data and computing basic metrics

17 DATA PRACTICAL
A recommender system could consist of searching for people connected to your friends, e.g. via LinkedIn
Person 1 knows Person 2 → Person 2 knows Person 3
MATCH (p1)-[r1:knows]-(p2), (p1)-[r2:knows]-(p3), (p3)-[r3:knows]-(p2)
RETURN p1, p2, p3, r1, r2
LIMIT 10
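
The query on this slide matches three people who all know one another. For a recommender, the more useful case is usually the open one: Person 1 knows Person 2, Person 2 knows Person 3, but Person 1 and Person 3 are not yet connected. The sketch below shows that friend-of-friend logic in plain Python; the people, edges and function names are assumed for illustration, and candidates are ranked by how many mutual friends they share with the person.

```python
from collections import Counter


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def recommend(adj, person):
    """Rank friends-of-friends who are not already friends, by mutual friends."""
    mutual = Counter()
    for friend in adj[person]:
        for candidate in adj[friend]:
            if candidate != person and candidate not in adj[person]:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken by name for a stable order.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical "knows" relationships.
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
    ("p2", "p4"), ("p3", "p4"), ("p3", "p5"),
]
adj = adjacency(edges)
suggestions = recommend(adj, "p1")
```

Here p4 is suggested to p1 ahead of p5, because p4 shares two mutual friends with p1 (p2 and p3) while p5 shares only one.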

18 Thank you! Any questions? charles@praekelt.org nathan@praekelt.org

19 More Reading
Social Network Analysis with Big Data: Charles Copley, Head of Data Science at Praekelt:
Homophily and Influence: Sinan Aral (2013) What would Ashton Do? Harvard Business Review. On how homophily and social location impact our choices
Weak Ties, Social Capital: Granovetter, M. S. (1973) The Strength of Weak Ties. American Journal of Sociology, 78(6),

