

AN AUTOMATIC ADVERTISEMENT/TOPIC MODELING AND RECOMMENDING SYSTEM Yi Hou, Center for Clinical Investigation (CCI) EECS Department, Case Western Reserve University

Review: Motivation
There is no systematic, automatic system for categorizing ADs/topics, and no place to specify a category.
Social network platforms are popular: revenue from Facebook advertising shot up 191 percent year-over-year in the first quarter of 2014.
Words matter!

Tasks
1) Given all the ADs/topics, establish a word network in which two words share an edge iff they co-occur in at least one AD/topic; the edge weight is the number of times they occur together in an AD/topic. Properties examined: small world, power-law distribution.
2) Given the word network, build a taxonomy T using modularity-based clustering, describe each cluster by its top 20 TF-IDF keywords (due to the vocabulary issue), and carry out an empirical network analysis.
3) Given a user's current text, e.g. the most recent few Tweets/posts (we initially use the 10 most recent), build a ranking model R; each AD is ranked by R and the top 10 ADs are returned to the user.

Data Source
Data crawling:
- Twitter streaming APIs, accessed through the Ruby gem twitterstream
- Acquired application-only authentication tokens
- Set up a listening point recording global Tweets
- Selected only 5 categories of ADs/topics by keyword filtering: Car, Dating, Education, Grocery, Hiring
Manually collected data (used in the experiments):
- Selected only 5 categories of ADs/topics: Car, Dating, Education, Grocery, Hiring
[Two tables with columns Car, Dating, Education, Grocery, Hiring, Total]

Method
1) Data preprocessing
2) Build word network
3) Build topic taxonomy
4) ADs/topic ranking

Data Preprocessing
Remove stop words: such as is, are, when, ... (list from the Stanford NLP lab).
Stemming: reduce inflected words to their stem, base, or root form, using the Porter stemmer, e.g. stemming -> stem.
Result:
Original: "I like data mining. It is awesome."
New: "I like data mine It awesom"
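
The two preprocessing steps can be sketched roughly as follows. This is only an illustration: the stop-word list is a tiny hypothetical stand-in for the Stanford list, and `crude_stem` is a simplified suffix stripper, not the Porter algorithm, so its output will differ from the result shown on the slide (for example it leaves "awesome" unchanged).

```python
import re

# Hypothetical, tiny stop-word list for illustration only
# (the slides use a list from the Stanford NLP lab).
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "when"}


def crude_stem(word):
    """Strip one common suffix. A crude stand-in, NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, stem=crude_stem):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

A real Porter implementation can be passed in through the `stem` parameter without changing the rest of the pipeline.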

Data Visualization
In total there are 1104 unique words, shown as a word cloud.

Build Word Network
Co-occurrence matrix of words:
- The co-occurrence count serves as the similarity measure for a word pair.
- The co-occurrence matrix serves as our adjacency matrix.
- The co-occurrence count serves as the edge weight.
Coded in C++.
Number of nodes: 1104. Number of edges: 18972.
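
A minimal sketch of the co-occurrence counting, written in Python for brevity (the original was coded in C++ and is not shown). It assumes each AD/topic has already been preprocessed into a list of words, and that a word pair is counted once per AD/topic in which both words appear; the function name `build_word_network` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_word_network(docs):
    """Return {(word_a, word_b): weight} with word_a < word_b.

    The weight is the number of documents (ADs/topics) in which
    both words appear, i.e. the sparse upper triangle of the
    co-occurrence (adjacency) matrix.
    """
    weights = Counter()
    for doc in docs:
        # set() so a repeated word is counted once per document;
        # sorted() so each pair has one canonical key.
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1
    return weights
```

The number of edges is then `len(weights)`, and the number of nodes is the number of distinct words appearing in any key.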

Build Topic Taxonomy
Modularity-based community finding:
- The algorithm searches the graph to maximize the modularity measure.
- Heavily connected components signify the topic models.
- Each cluster/topic is described by its top-K highest TF-IDF keywords.
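
The slides do not say exactly how the TF-IDF score of a keyword is computed, so the following is only one plausible sketch. It assumes a word's score is its term frequency summed over all ADs/topics, multiplied by log(N / df), where N is the number of ADs/topics and df is the number containing the word; `top_k_keywords` and its parameters are hypothetical names.

```python
import math
from collections import Counter


def top_k_keywords(cluster_words, docs, k=20):
    """Rank the words of one cluster by an aggregated TF-IDF score."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    scores = {}
    for word in cluster_words:
        if doc_freq[word] == 0:
            continue  # word never appears in the corpus
        idf = math.log(n_docs / doc_freq[word])
        # Relative frequency of the word, summed over all documents.
        tf_total = sum(doc.count(word) / len(doc) for doc in docs if doc)
        scores[word] = tf_total * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Under this scoring, a word that appears in every AD/topic gets a score of zero, which is the usual reason for preferring TF-IDF over raw counts when describing a cluster.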

Modularity-based finding
Modularity is one measure of the structure of networks or graphs: a measure of the goodness of a division of a network into sub-clusters.
- Q represents the measure of goodness
- C represents the set of clusters
- e_ij stands for the number of edges between clusters i and j
- m represents the total number of edges
Reference: 1. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P1000
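
For orientation, one standard way of writing modularity with the symbols listed above is given here. It is not necessarily the exact form used on the slide. It assumes that e_ii counts each edge inside cluster i once, and it introduces d_i, the total degree of the nodes in cluster i, which is a symbol not present in the original.

```latex
Q = \sum_{i \in C} \left[ \frac{e_{ii}}{m} - \left( \frac{d_i}{2m} \right)^{2} \right],
\qquad
d_i = 2\,e_{ii} + \sum_{j \in C,\; j \neq i} e_{ij}

% Change in Q when two clusters i and j are merged:
\Delta Q_{ij} = \frac{e_{ij}}{m} - \frac{d_i\, d_j}{2m^{2}}
```

The second expression follows from the first by subtracting the contributions of clusters i and j before the merge from the contribution of the merged cluster.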

How the algorithm works
1) Start with every vertex as its own isolated cluster.
2) Successively join the pair of clusters that gives the greatest increase in the modularity Q.
3) Stop when joining any two clusters would result in ΔQ ≤ 0.
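
The three steps above can be sketched as a small greedy merge loop. This is an unoptimized illustration for an unweighted, undirected graph without self-loops, not the implementation used in the talk. It assumes the standard merge gain ΔQ = e_ij/m - d_i*d_j/(2m^2), where d_i is the total degree of cluster i; the function name is hypothetical.

```python
from collections import defaultdict


def greedy_modularity_clusters(edges):
    """Agglomerative modularity clustering; returns a list of node sets."""
    edges = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    m = len(edges)
    members = {}                # cluster id -> set of nodes
    degree = defaultdict(int)   # cluster id -> total degree d_i
    between = defaultdict(int)  # frozenset({i, j}) -> edge count e_ij
    for u, v in edges:
        for node in (u, v):
            members.setdefault(node, {node})  # step 1: singleton clusters
            degree[node] += 1
        between[frozenset((u, v))] += 1

    while True:
        best_gain, best_pair = 0.0, None
        # Only connected pairs can have a positive gain.
        for pair, e_ij in between.items():
            i, j = tuple(pair)
            gain = e_ij / m - degree[i] * degree[j] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:   # step 3: every merge has gain <= 0
            break
        i, j = best_pair        # step 2: merge cluster j into cluster i
        members[i] |= members.pop(j)
        degree[i] += degree.pop(j)
        for pair in list(between):
            if j in pair:
                count = between.pop(pair)
                (other,) = pair - {j}
                if other != i:  # edges between i and j become internal
                    between[frozenset((i, other))] += count
    return list(members.values())
```

Each pass scans every connected cluster pair, which is fine for a graph of about a thousand nodes but would need a priority queue or a different algorithm for much larger networks.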

Clustering
We found 13 clusters, visualized with a different color for each cluster.

Clustering
We found 13 clusters. Why not 5? If we zoom in on two clusters, yellow and blue, we can see that both actually belong to grocery. So modularity-based clustering categorizes words at a finer granularity (it divided grocery into food, electronics, ...).

Clustering
Percentage distribution of the 13 clusters.

Clustering
Top 20 TF-IDF keywords in each cluster. Intuitively:
Cluster 1: chevy, ford -> cars
Cluster 2: date, single -> dating
Cluster 3: lunch, friend -> social (new)
Cluster 4: hire, join -> hiring
...
We observed well-defined clusters. We observed new categories.
Keywords: meet chevi build event open cover 2015 volt hous alway project american everyon truck ford parti convert built boss cruze date singl see start relationship still peopl pic need matchcom think site area tri women profil ladi facebook reason cant can best friend wait contract problem bet friday lunch latest colleg success stop celebr 33 tgi appet kickstart budget match hire join look team help manag us engin come design work social market make media great offic sale softwar summer onlin appli degre now program today learn new take click tech will earn info offer univers educ applic avail find like 2014 nissan big dont star go allnew dream murano receiv follow remain rogu 1st fool reveal forev get free one just use right fit happi easter http fun zipcar everi galaxi sign 0 fast rate plu low next top car share save toyota end feedback way honda easi two tip php 5 w read video announc invest 10 time spring someon delici drive busi win idea give pretti cake egg recip chanc never day prepar cook dish love want lyft check commun safeti coupl visit pink stach zoosk thing 1 person account fuzzi motorcycl send execut amaz quota s ever memor america everyth onlineonli 3 acceler town name trip hdtv slim rca kid assembl roster road bring mt king room mango pineappl collect store add walmart put winter weekend long theater preorder soldier ripen faster paper captain bellalif bag

Empirical Network Analysis
Property definitions:
- Diameter $d$: the largest geodesic distance in the (connected) network.
- Shortest path $l_{u,v}$: the shortest path between two nodes u and v in the network.
- Average shortest path $l_{network}$: the shortest path length averaged over every pair of nodes in the network.
- Power-law distribution: the node degree distribution follows a power law, at least asymptotically.
- Small-world property: holds under two conditions: 1) a high clustering coefficient (as compared to the Erdős-Rényi model) and 2) a low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network, $l_{network} \propto \ln(N)$.
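
As a rough illustration of the first three definitions, the diameter and the average shortest path of a small unweighted graph can be computed with one breadth-first search per node. The sketch assumes a connected, undirected graph with at least one edge and ignores edge weights; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(edges):
    """Return (average shortest path length, diameter)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1  # ordered pairs; the average is unaffected
                diameter = max(diameter, d)
    return total / pairs, diameter
```

For the small-world check, the average returned here is what would be compared against ln(N).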

Empirical Network Analysis
Clustering coefficient definitions:
1) Global clustering coefficient: $C = 3 N_t / N_c$, where $N_t$ is the number of triangles formed in the graph and $N_c$ is the number of connected triples of nodes in the graph.
2) Local clustering coefficient of node i:
Directed graph: $C_i = n_c / (n_i (n_i - 1))$
Undirected graph: $C_i = 2 n_c / (n_i (n_i - 1))$
where $n_i$ is the number of direct neighbors of node i and $n_c$ is the number of direct connections between i's direct neighbors.
Averaged over all N nodes: $\bar{C} = \frac{1}{N} \sum_i C_i$
Reference: Social Network Analysis, by Lada Adamic, University of Michigan
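
A short sketch of the averaged local clustering coefficient for an undirected graph. It assumes the standard undirected form C_i = 2*n_c / (n_i*(n_i - 1)) and, as a common convention not stated on the slide, takes C_i = 0 for nodes with fewer than two neighbors.

```python
from itertools import combinations


def average_local_clustering(edges):
    """Mean of the local clustering coefficients over all nodes."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not form triangles
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0.0
    for node, neighbors in adj.items():
        n_i = len(neighbors)
        if n_i < 2:
            continue  # C_i taken as 0 (assumed convention)
        # n_c: number of edges among the neighbors of this node.
        n_c = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2 * n_c / (n_i * (n_i - 1))
    return total / len(adj)
```

For the small-world test, this value would be compared against the same quantity on an Erdős-Rényi random graph with the same number of nodes and edges.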

Empirical Network Analysis
In our experiment we use the local clustering coefficient definition (for undirected graphs). The statistics of the experiment show that the network satisfies the small-world property!
Let's recall the small-world property, which holds under two conditions: 1) a high clustering coefficient (as compared to the Erdős-Rényi model) and 2) a low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network, $l_{network} \propto \ln(N)$.
[Table columns: Dataset, Avg Degree, Diameter, Avg Path Length, Modularity, Avg Clustering Coefficient. Row: ADs/Topic]

Network Diameter
Betweenness centrality, closeness centrality, eccentricity.
Reference: 1. Ulrik Brandes, A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2), 2001.

Power Law Distribution
Degree distribution: in-degree, out-degree.


Power Law Distribution
The nodes with high degrees follow a power-law distribution; the nodes with low degrees do not, because of the limited data (1104 words in total).
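
Since only the high-degree nodes follow the power law, one way to quantify the tail is to fit the exponent above a chosen minimum degree. The sketch below uses the standard continuous maximum-likelihood estimate, alpha = 1 + n / sum(ln(k / k_min)), as an approximation for integer degrees. The slides do not say how (or whether) an exponent was fitted, so both the estimator and the cutoff `k_min` are assumptions made here for illustration.

```python
import math


def power_law_alpha(degrees, k_min):
    """Continuous MLE of the power-law exponent for degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above k_min")
    return 1 + len(tail) / log_sum
```

Raising `k_min` discards more of the low-degree nodes that deviate from the power law, at the price of fitting on fewer points.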

Continued work: ranking
FB ranking assigns a weight to each feature. YouTube, in contrast, adds randomness to increase recall at the cost of precision.
Reference: 1. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath, The YouTube video recommendation system. RecSys 2010.

Continued work: ranking
Our ranking combines the FB news ranking and the YouTube ranking:
- We use cosine similarity to measure which topic cluster the user is most interested in.
- We generate the top 8 ADs/topics with the FB ranking algorithm.
- We add two more ADs/topics at random.
This increases the breadth of the predictions (higher recall) at the cost of precision.
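
The combined ranking can be sketched as follows. Everything concrete here is an assumption made for illustration: each ad is a dict with hypothetical `id`, `cluster`, and `score` fields, where `score` stands in for the weighted feature sum of the FB-style ranking, and each cluster is represented by its keyword list.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_words, cluster_keywords, ads,
              n_ranked=8, n_random=2, rng=None):
    """Top n_ranked ads from the best-matching cluster plus n_random extras."""
    rng = rng or random.Random()
    user_vec = Counter(user_words)
    # Pick the topic cluster most similar to the user's recent text.
    best = max(cluster_keywords,
               key=lambda c: cosine(user_vec, Counter(cluster_keywords[c])))
    in_topic = sorted((ad for ad in ads if ad["cluster"] == best),
                      key=lambda ad: -ad["score"])
    ranked = in_topic[:n_ranked]
    # Random extras from everything not already chosen (any cluster).
    chosen = {ad["id"] for ad in ranked}
    rest = [ad for ad in ads if ad["id"] not in chosen]
    extras = rng.sample(rest, min(n_random, len(rest)))
    return ranked + extras
```

Passing a seeded `random.Random` as `rng` makes the two random picks reproducible, which is convenient when comparing ranking variants.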

Limitations and Future Work
We will run the system on a larger-scale dataset.
Since we don't have real data, e.g. the performance (CTR) of each AD/topic, we need to generate it from a Gaussian model.

Thank you! Questions?