Understanding Crowds’ Migration on the Web
Yong Wang, Komal Pal, Aleksandar Kuzmanovic
Northwestern University



A User-Driven Web Network
Node: a website, weighted by its number of unique visitors.
Edge: weighted by the number of visitors common to its two endpoints.
[Figure: Target graph. Labeled nodes include MSN and CNN; weights shown: 5.8M, 6.1M, 14.3M, 4.3M, 19M, 2.3M, 1.3M, 4.7M, 2M.]

Motivation
Study the Web from the point of view of its users
– Evaluate properties of the network
  – Analyze user movement among websites
  – Determine properties of the user-driven Web network
  – Compare to Online Social Networks and “classical” Web networks
– Mine data to serve
  – Online advertisers
  – Search engines

Our Contributions
– Generate the user-driven Web network
– Study the user-driven Web
– Apply the user-driven Web

Outline
– Generate the user-driven Web network
– Study the user-driven Web
– Apply the user-driven Web

Information Reconstruction
Fact
– A plethora of information is made publicly available on a daily basis
  – E.g., Google Trends, AdPlanner, Analytics, Alexa, etc.
Problem
– The publicly available information snippets are not comprehensive
Approach
– Combine multiple data sources and develop methods to reconstruct globally meaningful information


Generating a User-Driven Web
[Figure: a parent node and its child/edge nodes.]

Crawling
Breadth-first search for 15 days
3 seeds: nytimes.com, sina.com.cn, timesofindia.com
– US-centric network: ~297K nodes and 2M edges
– China-centric network: ~290K nodes and 2.7M edges
– India-centric network: ~297K nodes and 2.8M edges
Captured information:
– Number of unique users: Google AdPlanner
– Shared users: Google Trends
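The breadth-first crawl can be sketched as follows. This is a minimal illustration, not the authors' crawler: `fetch_related` is a hypothetical callback standing in for whatever lookup returns a site's related sites with their weights, and `max_nodes` is an assumed stopping rule (the slides stop after 15 days instead).

```python
from collections import deque

def bfs_crawl(seeds, fetch_related, max_nodes=1000):
    """Breadth-first crawl starting from the seed sites.

    fetch_related(site) is a hypothetical callback that returns
    {neighbor_site: weight} for one site.
    """
    visited = set(seeds)
    queue = deque(seeds)
    edges = {}
    while queue:
        site = queue.popleft()
        for neighbor, weight in fetch_related(site).items():
            if neighbor not in visited:
                if len(visited) >= max_nodes:
                    continue  # node budget exhausted: skip unseen sites
                visited.add(neighbor)
                queue.append(neighbor)
            # Record the edge only when both endpoints are in the crawl.
            edges[(site, neighbor)] = weight
    return visited, edges

# Toy data in place of real queries (values are made up).
toy = {"a": {"b": 1.0, "c": 0.5}, "b": {"a": 0.8}, "c": {"d": 0.2}, "d": {}}
nodes, edges = bfs_crawl(["a"], lambda s: toy.get(s, {}))
```

A queue (rather than a stack) is what makes the traversal breadth-first: sites closest to the seeds are expanded before more distant ones.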

Problems without Normalization
Network without normalization (problems!)
[Figure: Sub-graph A; Sub-graph B; merged graphs A and B without normalization (nodes A to G).]
Weight to the first child is always set to

Ideal Normalized Network
[Figure: Normalized graph, the target scenario (nodes A to G).]
Weights scaled w.r.t. weight(AD)

Normalization Process
[Figure: parent nodes; relationship between Website 2 and the child nodes of Website 1.]

Normalization Process
Phase 1: Select a starting point (a node with maximum in-degree, say C)
– Select a parent (A) of C, and a child (D) of A
– Normalize all other parent nodes to the weight of AD (by querying the parent nodes together with A)
Normalized nodes: nodes all of whose edges are normalized
[Figure: nodes A, B, C, D, F, G. Legend: normalized node; child of a normalized node.]

Normalization Process
Phase 2: Back link from a child of a normalized node to its parent
– The weight of the forward link must be equal to the weight of the backward link
[Figure: nodes A, B, C, D, E. Legend: normalized node; child of a normalized node.]

Normalization Process
Phase 3: A child of a normalized node (D) shares a child (C) with a normalized node (A)
– We can normalize D (by querying it together with node A)
– Note: the shared child (green) could itself be either a normalized node or a child of a normalized node
[Figure: nodes A, B, C, D, E. Legend: normalized node; child of a normalized node; either a normalized node or a child of a normalized node.]

Normalization Process
Phase 4: A node (D) shares a child (C) with a normalized node (A)
– We can normalize D (by querying it together with node A)
– Note: node D (black) is initially neither a normalized node nor a child of a normalized node
[Figure: nodes A, B, C, D, E. Legend: normalized node; child of a normalized node; either a normalized node or a child of a normalized node; neither a normalized node nor a child of a normalized node.]
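The phases above share one step: a joint query puts several nodes' edges on a single scale, and one already-normalized edge in that query fixes the factor that maps the whole result onto the global scale. A minimal sketch of that rescaling, under the assumption that a joint query returns weights on a common but arbitrary scale; the function name `rescale_joint_query` and the dictionary layout are hypothetical and not taken from the slides.

```python
def rescale_joint_query(joint, normalized, ref_edge):
    """Map weights from one joint query onto the global scale.

    joint:      {edge: weight} from a single joint query (assumed to
                share one arbitrary scale)
    normalized: {edge: weight} already on the global scale
    ref_edge:   an edge present in both, e.g. ("A", "D")
    """
    factor = normalized[ref_edge] / joint[ref_edge]
    return {edge: weight * factor for edge, weight in joint.items()}

# Made-up example: A's edge AD is already normalized to 1.0.
normalized = {("A", "D"): 1.0}
joint = {("A", "D"): 0.5, ("B", "C"): 1.0, ("B", "F"): 0.25}
rescaled = rescale_joint_query(joint, normalized, ("A", "D"))
```

After rescaling, B's edges can be merged into the normalized network, and B itself can serve as the reference for later joint queries.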

Normalization Process
Validation
– Popularity ranking of our normalized network compared to Google AdPlanner
– The two ranking results match in 91.66% of cases
Adding absolute traffic
– Google AdPlanner for the number of unique users
Unifying two scale systems
– Top 10 children are sufficient
– Relative weight -> absolute weight

Outline
– Generate the user-driven Web network
– Study the user-driven Web
– Apply the user-driven Web

Weighted Degree Distribution
– Weighted degree: the sum of link weights for each node
– Log-normal distribution
  – OSN and WWW follow a power-law distribution
– Small-traffic sites filtered by Google Trends
– Seed-free properties with distinctions
  – Extreme values
[Figure: weighted degree distributions for the US, India, and China networks. Annotations: minimum degree nodes; maximum degree nodes; high peak => strong connectedness.]
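The weighted degree defined on this slide can be computed as in the sketch below. It assumes the network is stored as an undirected edge dictionary with each edge listed once; that layout and the example values are illustrative only.

```python
from collections import defaultdict

def weighted_degrees(edges):
    """Sum the weights of all links incident to each node.

    edges: {(u, v): weight}, each undirected edge listed once.
    """
    degree = defaultdict(float)
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return dict(degree)

# Made-up weights (shared visitors, in millions).
example_edges = {("msn", "cnn"): 2.0, ("cnn", "bbc"): 1.5}
degrees = weighted_degrees(example_edges)
```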

Average Path Length and Diameter
– The user-driven Web has properties closer to Online Social Networks than to the WWW
  – The human component makes the network more connected
– Larger average path length for the Chinese network
  – Because high-degree clusters in the core are loosely connected with low-degree clusters at the edges
  – For the other 2 networks, high-degree clusters in the core are well connected to the nodes at the edges
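For reference, both metrics can be computed with one breadth-first search per node. The sketch below assumes an unweighted, undirected adjacency-list representation and averages only over reachable pairs; the slides do not say how the authors computed these values, and an all-pairs search like this would be slow on networks of ~300K nodes without sampling.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def path_stats(adj):
    """Return (average shortest-path length, diameter) over reachable pairs."""
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else 0.0
    return average, diameter

# Toy path graph a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_len, diam = path_stats(path_graph)
```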

Clustering Coefficient
– High clustering coefficients
  – 4 orders of magnitude higher than the corresponding random graphs
– Clustering coefficients uniform for the three networks
China:
– High-degree and low-degree nodes are separately clustered and loosely connected
US:
– High-degree nodes are clustered in the core while low-degree nodes are not well clustered
India:
– A smaller difference between high- and low-degree node clusters
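As a reminder of what is being measured, the sketch below computes the standard local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked) and its average over the graph. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets; whether the authors used this unweighted definition is not stated on the slide.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are directly linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy graph: triangle a-b-c plus a pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
avg_cc = average_clustering(toy_graph)
```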

Network Properties
The user-driven Web is closer to Online Social Networks than to the WWW in all properties
– The human component prevails
Seed-free properties
– Independent of the starting crawling point
Scale-free properties
– Independent of the network scale

Outline
– Generate the user-driven Web network
– Study the user-driven Web
– Apply the user-driven Web

Online Advertising
[Figure: user-driven Web graph with labeled nodes MSN and CNN; weights shown: 5.8M, 6.1M, 14.3M, 4.3M, 19M, 2.3M, 1.3M, 1700, 2M.]

Website Selector
Problem: Find the best selection of websites (ad hosts) that provides maximum visibility at minimum cost
Target users
– Independent advertisers
– Ad commissioners
Alternative approaches:
– Greedy
  – Choose the websites in descending order of their popularity
– Sub-optimal
  – Linear optimization without shared-user information
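The greedy alternative can be sketched as below. The slide only says that sites are chosen in descending order of popularity; the budget check, and the choice to skip a site that does not fit and keep going rather than stop, are assumptions made for this illustration. All numbers are made up.

```python
def greedy_select(users, cost, budget):
    """Pick sites by descending unique-user count while the budget allows.

    users: {site: unique users}, cost: {site: cost of advertising there}.
    Shared visitors between sites are ignored, which is the weakness
    of this baseline.
    """
    chosen, spent = [], 0.0
    for site in sorted(users, key=users.get, reverse=True):
        if spent + cost[site] <= budget:
            chosen.append(site)
            spent += cost[site]
    return chosen

toy_users = {"a": 10.0, "b": 8.0, "c": 5.0}
toy_cost = {"a": 5.0, "b": 5.0, "c": 4.0}
picked = greedy_select(toy_users, toy_cost, budget=10.0)
```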

Modeling
Inputs
– CPI model: random normal distribution
– User-driven Web
– Budget
Output
– List of potential ad hosts providing maximum visibility within budget constraints

Optimization Problem
Maximize:
$$\sum_i u_i x_i - \sum_j \sum_{k \neq j} s_{jk} x_j x_k$$
subject to the linear constraint:
$$\sum_i c_i x_i \le B$$
where
– $x_i$: website (node) $i$
– $u_i$: number of unique users on node $i$
– $s_{jk}$: number of users shared between nodes $j$ and $k$
– $c_i$: CPI for node $i$
– $B$: budget constraint
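To make the objective concrete, the sketch below evaluates it for a candidate selection and finds the best feasible selection by exhaustive search. It assumes each x_i is a 0/1 choice (site i selected or not), which the slide implies but does not state, and it uses brute force only because the toy instance is tiny; the slides do not say which solver the Website Selector uses. All numbers are made up.

```python
from itertools import combinations

def objective(selected, users, shared):
    """Unique users gained minus the shared-user penalty.

    As in the double sum over j and k != j, every unordered pair of
    selected sites is counted twice.
    """
    gain = sum(users[i] for i in selected)
    overlap = sum(
        shared.get(frozenset((j, k)), 0.0)
        for j in selected for k in selected if j != k
    )
    return gain - overlap

def best_selection(users, cost, shared, budget):
    """Exhaustive search over all subsets that fit the budget."""
    best, best_value = (), 0.0
    sites = sorted(users)
    for size in range(1, len(sites) + 1):
        for subset in combinations(sites, size):
            if sum(cost[i] for i in subset) <= budget:
                value = objective(subset, users, shared)
                if value > best_value:
                    best, best_value = subset, value
    return best, best_value

toy_users = {"a": 10.0, "b": 8.0, "c": 5.0}
toy_cost = {"a": 5.0, "b": 5.0, "c": 4.0}
toy_shared = {frozenset(("a", "b")): 3.0}  # a and b share 3 visitors
choice, value = best_selection(toy_users, toy_cost, toy_shared, budget=10.0)
```

In this toy instance the two most popular sites, a and b, fit the budget together, but their shared visitors make the pair (a, c) the better choice.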

Performance Results
Greedy approach used as a baseline
Sub-optimal approach lacks shared-user information
– And hence doesn’t perform well in improving ad visibility
Website Selector improves performance by %

Eliminating High-Volume Websites
5% of the top 1,000 websites eliminated (volume >= 1M)
Several cases of high-volume nodes being ignored due to a significant number of shared users
[Figure: example with MSN and CNN; weights shown: 2.9M, 11M, 23M, 1.2M, 0.7M; CPI ~$42, ~$49, ~$53; one node marked as eliminated (✗).]

Conclusions
Generated the user-driven Web
– Used publicly available information
– Designed methods to fuse pieces into a global network
Studied the user-driven Web and its properties
– Scale- and seed-free network properties
– User-driven Web different from the “classical” Web but similar to Online Social Networks
Designed the Website Selector
– Incorporates the idea of “shared visitors” between websites
– Increases visibility of ads by 22-25%, increases revenue
– Tailored for ad commissioners

Thank You