Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS224w: Social and Information Network Analysis

Similar presentations


Presentation on theme: "CS224w: Social and Information Network Analysis"— Presentation transcript:

1 CS224w: Social and Information Network Analysis
Jure Leskovec, Stanford University

2 About the Class Introduce properties, models and tools for
Large real-world networks Processes taking place in networks through real applications and case studies Goal: find patterns, rules, clusters, outliers, … … in large static and evolving graphs … in processes spreading over the networks 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

3 ****************** Faster and (less complex systems) more Web and Social Networks based motivation Example pictures from NetInf 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

4 Networks & Complex Systems
We are surrounded by hopelessly complex systems: Society is a collection of six billion individuals Communication systems link electronic devices Information and knowledge is organized and linked Thousands of genes in our cells work together in a seamless fashion Our thoughts are hidden in the connections between billions of neurons in our brain These systems, random looking at first, display signatures of order and self-organization 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

5 Networks Each such system can be represented as a network, that defines the interactions between the components 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

6 Networks: Communication
Graph of the Internet (Autonomous Systems) 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

7 Networks: Information
Connections between political blogs 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

8 Seven Bridges of Königsberg (Euler 1735)
Networks: Technology Seven Bridges of Königsberg (Euler 1735) Return to the starting point by traveling each link of the graph once and only once. London Underground 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

9 Networks: Organizations
: departments : consultants : external experts 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

10 Networks: Economy Nodes: Links: Bio-tech companies, 1991 Companies
Investment Pharma Research Labs Public Biotechnology Links: Collaborations Financial R&D Bio-tech companies, 1991 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

11 Human brain has between 10-100 billion neurons
Networks: Brain Human brain has between billion neurons 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

12 Protein-Protein Interaction Networks:
Networks: Cells Protein-Protein Interaction Networks: Nodes: Proteins Edges: ‘physical’ interactoins Metabolic networks: Nodes: Metabolites and enzymes Edges: Chemical reactions 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

13 Why Networks? Behind each such system there is an intricate wiring diagram, a network, that defines the interactions between the components We will never understand a complex system unless we understand the networks behind it 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

14 Reasoning about Networks
How do we reason about networks Empirical: Study network data to find organizational principles Mathematical models: Probabilistic, graph theory Algorithms for analyzing graphs What do we hope to achieve from models of networks? Patterns and statistical properties of network data Design principles and models Understand why networks are organized the way they are (Predict behavior of networked systems) 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

15 Networks: Structure & Process
What do we study in networks? Structure and evolution: What is the structure of a network? Why and how did it became to have such structure? Processes and dynamics: Networks provide “skeleton” for spreading of information, behavior, diseases 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

16 Age and size of networks
Networks: Why Now? Age and size of networks 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

17 Networks: Why Now? 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

18 Networks: Why Now? 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

19 Why Networks? Why Now? Why is the role of networks expanding?
Data availability Rise of the Web 2.0 and Social media Universality Networks from various domains of science, nature, and technology are more similar than one would expect Shared vocabulary between fields Computer Science, Social science, Physics, Economics, Statistics, Biology 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

20 Networks: Impact 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

21 Networks: Impact Intelligence and fighting (cyber) terrorism 9/19/2018
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

22 Networks: Impact Predicting epidemics Real Predicted 9/19/2018
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

23 Networks: Impact Interactions of human disease Drug design 9/19/2018
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

24 Networks Really Matter
Use more WEB/SOCIAL examples, less complex systems If you were to understand the spread of diseases, can you do it without networks? If you were to understand the WWW structure and information, hopeless without invoking the Web’s topology. If you want to understand human diseases, it is hopeless without considering the wiring diagram of the cell. 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

25 Course Logistics

26 Course Syllabus Covers a wide range of network analysis techniques – from basic to state-of-the-art You will learn about things you heard about: Six degrees of separation, small-world, page rank, network effects, P2P networks, network evolution, spectral graph theory, virus propagation, link prediction, power-laws, scale free networks, core-periphery, network communities, hubs and authorities, bipartite cores, information cascades, influence maximization, … Covers algorithms, theory and applications It’s going to be fun  9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

27 Prerequisites Good background in: Programming: 4 recitation sessions:
Algorithms Graph theory Probability and Statistics Linear algebra Programming: You should be able to write non-trivial programs 4 recitation sessions: 2 to review basic mathematical concepts 2 to review programming tools (SNAP, NetworkX) 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

28 Logistics Course website: http://cs224w.stanford.edu
Slides posted at least 30 min before the class Required readings: Mostly chapters from Easley&Kleinberg book Papers Optional readings: Papers and pointers to additional literature This will be very useful for reaction paper and project proposal 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

29 Text Books Recommended textbook: Optional books:
D. Easley, J. Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World Freely available at: Optional books: Matthew Jackson: Social and Economic Networks Mark Newman: Networks: An introduction 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

30 Work for the Course Assignment Due on Homework 1 October 13
Reaction paper October 20 Project proposal October 27 Homework 2 November 3 Competition November 10 Project milestone November 17 Project write-up December 11 (no late days!) Project poster presentation December 16 12:15-3;15pm 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

31 Grading Final grade will be composed of: 2 homeworks: 15% each
Reaction paper: 10% Substantial class project: 60% Proposal: 15% Project milestone: 15% Final report: 60% Poster session: 10% 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

32 Homeworks, Write-ups, Reports
Assignments (homeworks, write-ups, reports) take time. Start early! How to submit? Paper: Box outside the class and in Gates basement We will grade on paper! You should also submit electronic copy: 1 PDF/ZIP file (writeups, experimental results, code) Submission website: SCPD: Only submit electronic copy & send us 7 late days for the quarter: Max 4 late days per assignment 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

33 Course Projects Substantial course project:
Where do they get the data? What code/machines they use? Substantial course project: Experimental evaluation of algorithms and models on an interesting network dataset A theoretical project that considers a model, an algorithm and derives a rigorous result about it An in-depth critical survey of one of the course topics and offering a novel perspective on the area Performed in groups of (exactly) 3 students Project is the main work for the class 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

34 Course Assistants Office hours schedule!!! Borja Peleato (head TA)
Chenguang Zhu Evan Rosen Dakan Wang 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

35 Communication Piazza Q&A website:
If you don’t address, send us and we will register you to Piazza For ing course staff, always use: For course announcements subscribe to: 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

36 Sitting-in & Auditing You are welcome to sit-in and audit the class
Please send us saying that you will be auditing the class To receive announcements, subscribe to the mailing list: 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

37 Starter Topic: Structure of Networks

38 Structure of Networks? Network is a collection of objects where some pairs of objects are connected by links What is the structure of the network? 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

39 Components of a Network
Objects: nodes, vertices N Interactions: links, edges E System: network, graph G(N,E) 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

40 Networks or Graphs? Network often refers to real systems
Web, Social network, Metabolic network Language: Network, node, link Graph: mathematical representation of a network Web graph, Social graph (a Facebook term) Language: Graph, vertex, edge We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

41 Networks: Common Language
Movie 1 Movie 3 Movie 2 Actor 3 Actor 1 Actor 2 Actor 4 Peter Mary Albert co-worker friend brothers Protein 1 Protein 2 Protein 5 Protein 9 |N|=4 |E|=4 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

42 Choosing Proper Representation
Choice of the proper network representation determines our ability to use networks successfully: In some cases there is a unique, unambiguous representation In other cases, the representation is by no means unique The way you assign links will determine the nature of the question you can study 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

43 Choosing Proper Representation
If you connect individuals that work with each other, you will explore a professional network If you connect those that have a sexual relationship, you will be exploring sexual networks If you connect scientific papers that cite each other, you will be studying the citation network If you connect all papers with the same word in the title, you will be exploring what? It is a network, nevertheless 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

44 Undirected vs. Directed Networks
Links: undirected (symmetrical) Undirected links: Collaborations Friendship on Facebook Directed Links: directed (arcs) Directed links: Phone calls Following on Twitter A B D C L M F G H I D B C E G A F 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

45 Connectivity of Graphs
Connected (undirected) graph: Any two vertices can be joined by a path. A disconnected graph is made up by two or more connected components D C A B F G D C A B F G Largest Component: Giant Component Isolated node (F) Bridge edge: If we erase it, the graph becomes disconnected. Articulation point: If we erase it, the graph becomes disconnected. 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

46 Connectivity of Directed Graphs
Strongly connected directed graph has a path from each node to every other node and vice versa (e.g., A-B path and B-A path) Weakly connected directed graph is connected if we disregard the edge directions E F B A Graph on the left is connected but not strongly connected. D C G 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

47 Web as a Graph Q: What does Web “look like” at a global level?
Explain what we will do: -- take a real system -- represent it as a graph -- use language of graph theory to reason about the shape of the web -- do a computational experiment -- learn something about the structure of the web Q: What does Web “look like” at a global level? Web as a graph: Nodes = pages Edges = hyperlinks What is a node? Problems: Dynamic pages created on the fly “dark matter” – inaccessible database generated pages 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

48 The Web as a Graph I teach a class on Networks.
CS224W: Classes are in the Computer Science building Computer Science Department at Stanford Stanford University 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

49 The Web as a Graph In early days of the Web links were navigational
I teach a class on Networks. CS224W: Classes are in the Computer Science building Computer Science Department at Stanford Stanford University In early days of the Web links were navigational Today many links are transactional 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

50 The Web as a Directed Graph
9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

51 Other Information Networks
Citations References in an Encyclopedia 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

52 How Does the Web Look Like?
How is the Web linked? What is the “map” of the Web? Web as a directed graph [Broder et al. 2000]: Given node v, what can v reach? What other nodes can reach v? E C A B G F D In(A) = {A,B,C,E,G} Out(A)={A,B,C,D,F} In(v) = {w | w can reach v} Out(v) = {w | v can reach w} 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

53 Directed Graphs Two types of directed graphs:
Strongly connected: Any node can reach any node via a directed path In(A)=Out(A)={A,B,C,D,E} DAG – Directed Acyclic Graph: Has no cycles: if u can reach v, then v can not reach u Any directed graph can be expressed in terms of these two types E C A B D E C A B D 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

54 Strongly Connected Component
Strongly connected component (SCC) is a set of nodes S so that: Every pair of nodes in S can reach each other There is no larger set containing S with this property E F B A Strongly connected components of the graph: {A,B,C,G}, {D}, {E}, {F} D C G 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

55 Strongly Connected Component
Fact: Every directed graph is a DAG on its SCCs (1) SCCs partitions the nodes of G Each node is in exactly one SCC (2) If we build a graph G’ whose nodes are SCCs, and with an edge between nodes of G’ if there is an edge between corresponding SCCs in G, then G’ is a DAG Strongly connected components of graph G: {A,B,C,G}, {D}, {E}, {F} G’ is a DAG: E F B G {E} A {F} D C G {A,B,C,G} G’ {D} 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

56 Proof Why is (2) true? G’ (graph of SCCs) is a DAG
Expand 2 slides: be more explicit about the proof by contradiction. Spell it out. Why is (1) true? SCCs partitions the nodes of G. Suppose node v is a member of 2 SCCs S and S’. Then SS’ is one large SCC: Why is (2) true? G’ (graph of SCCs) is a DAG If G’ is not a DAG, then we have a directed cycle. Now all nodes on the cycle are mutually reachable, and all are part of the same SCC. v S S’ G’ {E} {F} {A,B,C,G} {D} Now {A,B,C,G,E,F} is a SCC 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

57 Graph Structure of the Web
Picture, animation of the BFS Picture why the intersection of In and Out is SCC Goal: Take a large snapshot of the Web and try to understand how it’s SCCs “fit together” as a DAG Computational issue: Want to find a SCC containing node v? Observation: Out(v) … nodes that can be reached from v SCC containing v is: Out(v) ∩ In(v) = Out(v,G) ∩ Out(v,G), where G is G with all edge directions flipped v Out(v) 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

58 Graph Structure of the Web
There is a giant SCC There won’t be 2 giant SCCs: It just takes 1 page from one SCC to link to the other SCC If the components have millions of pages the likelihood of this not happening is very small Giant SCC1 Giant SCC2 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

59 Bow-tie Structure of the Web
Give better explanation of how can we conclude the bowtie structure 250 million pages, 1.5 billion links [Broder et al. 2000] 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

60 What did We Learn/Not Learn ?
Just managed to finish the lecture. Need to kill a few slides in the intro to save time Learn: Some conceptual organization of the Web (i.e., the bowtie) Not learn: Treats all pages as equal Google’s homepage == my homepage What are the most important pages How many pages have k in-links as a function of k? The degree distribution: ~ 1 / k2 Link analysis ranking -- as done by search engines (PageRank) Internal structure inside giant SCC Clusters, implicit communities? How far apart are nodes in the giant SCC: Distance = # of edges in shortest path Avg = 16 [Broder et al.] 9/19/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis


Download ppt "CS224w: Social and Information Network Analysis"

Similar presentations


Ads by Google