AN AUTOMATIC ADVERTISEMENT/TOPIC MODELING AND RECOMMENDING SYSTEM Yi Hou, Center for Clinical Investigation (CCI) EECS Department, Case Western Reserve University
Review Motivation Lack of a systematic and automatic ADs/Topic categorizing system --> no place to specify a category. Social network platform popularity --> revenue from Facebook advertising shot up 191 percent year-over-year in the first quarter of 2014. Words matter!
Tasks 1) Given all the ADs/topics, establish a word network in which two words share an edge iff they co-occur in at least one AD/topic, and the edge weight is the number of times they have occurred together in an AD/topic. (Small world, power-law distribution.) 2) Given a word network, build a taxonomy T. (Modularity-based clustering; top 20 TF-IDF keywords, due to the vocabulary issue; empirical network analysis.) 3) Given a user's current text information, e.g. the most recent few Tweets/posts (we initialize the value to 10 here), build a ranking model R, where each AD is ranked based on R and the top-10 ADs are returned to the user.
Data Source Data crawling: Twitter's streaming APIs, via the Ruby gem twitterstream; acquired application-only authentication tokens; set up a listening point recording global Tweets; selected only 5 categories of ADs/topics (Car/Dating/Education/Grocery/Hiring) by keyword filtering. Manually collected data (experimented on): selected only 5 categories of ADs/topics (Car/Dating/Education/Grocery/Hiring). [Table columns: Car | Dating | Education | Grocery | Hiring | Total; values not preserved]
Method Data Preprocessing Build word network Build topic taxonomy ADs/topic ranking
Data Preprocessing Remove stop words: such as is, are, when … (list from the Stanford NLP lab). Stemming: reducing inflected words to their stem, base or root form. Used the Porter stemmer, e.g. stemming --> stem. Result: Original: "I like data mining. It is awesome." New: "I like data mine It awesom"
Data Visualization In total, 1104 unique words, with word cloud representation.
Build Word Network Co-occurrence matrix of words: the co-occurrence count serves as the similarity measure for word pairs; the co-occurrence matrix serves as our adjacency matrix; the co-occurrence count serves as the edge weight. Coded in C++. # of nodes: 1104; # of edges: 18972
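The construction above can be sketched in a few lines. This is only an illustration in Python (the original was coded in C++), and it assumes each AD/topic is already a list of preprocessed tokens and that a word pair is counted once per AD/topic; `build_word_network` and the toy `ads` list are hypothetical names and data, not taken from the slides.

```python
from collections import Counter
from itertools import combinations

def build_word_network(documents):
    """Return {(w1, w2): weight} with w1 < w2.

    Assumption: the weight is the number of ADs/topics in which
    both words appear (each pair counted once per AD/topic).
    """
    edges = Counter()
    for tokens in documents:
        # set() removes repeats inside one AD; sorted() fixes the pair order.
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += 1
    return edges

# Toy input: three already-stemmed ADs (made up for illustration).
ads = [
    ["ford", "truck", "built"],
    ["ford", "truck", "sale"],
    ["date", "singl", "profil"],
]
network = build_word_network(ads)
nodes = {w for pair in network for w in pair}
print(len(nodes), "nodes,", len(network), "edges")
```

The dictionary is a sparse form of the co-occurrence (adjacency) matrix: a missing key means weight 0.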
Build Topic Taxonomy Modularity-based community finding: the algorithm exhaustively searches the graph to maximize the modularity measure. Heavily connected components signify the topic models. Each cluster/topic is described by its top-K highest TF-IDF keywords.
Modularity-based finding Modularity: one measure of the structure of networks or graphs; a measure of the goodness of a division of a network into sub-clusters. Q represents the measure of goodness; C represents the set of clusters; e_ij stands for the number of edges between clusters i and j; m represents the total number of edges. Reference: 1. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P1000
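The formula for Q is not preserved in this transcript. One standard form of modularity that is consistent with the symbols Q, C, e_ij and m defined on this slide is given below; d_i (the total degree of the nodes in cluster i) is a symbol introduced here for convenience, and whether the slide used exactly this form is an assumption.

```latex
Q \;=\; \sum_{i \in C} \left[ \frac{e_{ii}}{m} \;-\; \left( \frac{d_i}{2m} \right)^{2} \right],
\qquad
d_i \;=\; 2\,e_{ii} \;+\; \sum_{j \neq i} e_{ij}
```

Here e_ii is the number of edges with both endpoints inside cluster i. The first term is the fraction of edges that fall inside the cluster, and the second is the fraction expected if edges were placed at random with the same node degrees.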
How the algorithm works Start with all vertices initialized as isolated clusters; successively join the pair of clusters that gives the greatest increase in the modularity Q; stop when joining any two clusters would no longer increase Q (ΔQ ≤ 0).
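The three steps above can be sketched as a naive greedy merge. This is a minimal illustration, not the implementation used in the talk: it recomputes Q from scratch for every candidate merge, which is only practical for toy graphs, and it follows the join-based description on this slide rather than the local-move-and-aggregate scheme of the cited Blondel et al. paper. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def modularity(edges, cluster_of):
    """Q = sum_c [ internal_c / m - (degree_c / (2m))^2 ] for weighted undirected edges."""
    m = sum(edges.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def greedy_communities(edges):
    nodes = sorted({n for pair in edges for n in pair})
    cluster_of = {n: n for n in nodes}       # step 1: every vertex is its own cluster
    best_q = modularity(edges, cluster_of)
    while True:
        best_merge = None
        labels = sorted(set(cluster_of.values()))
        for a, b in combinations(labels, 2):  # step 2: try every pair of clusters
            trial = {n: (a if c == b else c) for n, c in cluster_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:            # keep the merge with the largest Q so far
                best_q, best_merge = q, trial
        if best_merge is None:                # step 3: no merge increases Q, so stop
            return cluster_of, best_q
        cluster_of = best_merge

# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
         ("c", "d"): 1}
clusters, q = greedy_communities(edges)
print(clusters, round(q, 4))
```

On this toy graph the procedure stops with the two triangles as separate clusters, since joining them would lower Q.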
Clustering We found 13 clusters. Visualized: different clusters are shown in different colors.
Clustering We found 13 clusters. Why not 5? If we zoom in on 2 clusters, yellow and blue, we can see that both actually belong to grocery. So modularity-based clustering categorizes words at a finer granularity. (It divided grocery into food/electronics…)
Clustering Percentage distribution of 13 clusters:
Clustering Top 20 TF-IDF keywords in each cluster. Intuitively: Cluster 1: chevy, ford --> cars; Cluster 2: date, single --> dating; Cluster 3: lunch, friend --> social (new); Cluster 4: hire, join --> hiring; ……. We observed well-defined clusters. We observed new categories. Keywords: meet chevi build event open cover 2015 volt hous alway project american everyon truck ford parti convert built boss cruze date singl see start relationship still peopl pic need matchcom think site area tri women profil ladi facebook reason cant can best friend wait contract problem bet friday lunch latest colleg success stop celebr 33 tgi appet kickstart budget match hire join look team help manag us engin come design work social market make media great offic sale softwar summer onlin appli degre now program today learn new take click tech will earn info offer univers educ applic avail find like 2014 nissan big dont star go allnew dream murano receiv follow remain rogu 1st fool reveal forev get free one just use right fit happi easter http fun zipcar everi galaxi sign 0 fast rate plu low next top car share save toyota end feedback way honda easi two tip php 5 w read video announc invest 10 time spring someon delici drive busi win idea give pretti cake egg recip chanc never day prepar cook dish love want lyft check commun safeti coupl visit pink stach zoosk thing 1 person account fuzzi motorcycl send execut amaz quota s ever memor america everyth onlineonli 3 acceler town name trip hdtv slim rca kid assembl roster road bring mt king room mango pineappl collect store add walmart put winter weekend long theater preorder soldier ripen faster paper captain bellalif bag
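How the TF-IDF score was computed per cluster is not stated in the slides. The sketch below is one plausible reading, under clearly labeled assumptions: term frequency is the word's count over all ADs, inverse document frequency is log(N / df) over the N ADs, and the words of a cluster are ranked by the product. `top_k_keywords` and the toy data are hypothetical.

```python
import math
from collections import Counter

def top_k_keywords(documents, cluster_words, k=20):
    """Rank the words of one cluster by tf * idf (assumed definition, see lead-in)."""
    n_docs = len(documents)
    tf = Counter(w for doc in documents for w in doc)       # corpus-wide term count
    df = Counter(w for doc in documents for w in set(doc))  # number of ADs containing w
    scores = {w: tf[w] * math.log(n_docs / df[w])
              for w in cluster_words if df[w] > 0}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Toy corpus of five stemmed ADs and one word cluster (made up for illustration).
docs = [
    ["ford", "truck", "ford"],
    ["ford", "sale"],
    ["truck", "built", "built", "built"],
    ["date", "singl"],
    ["hire", "team"],
]
cluster = {"ford", "truck", "built", "sale"}
keywords = top_k_keywords(docs, cluster, k=2)
print(keywords)
```

With k=20 the same call would produce a keyword list of the kind shown on the slide, one list per cluster.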
Empirical Network Analysis Property definitions: Diameter d: the largest geodesic distance in the (connected) network. Shortest path l_{u,v}: the shortest path between two nodes u and v in the network. Average shortest path l_network: the average shortest path over all pairs of nodes in the network. Power-law distribution: the node degree distribution follows a power law, at least asymptotically. Small-world property: two conditions hold, namely 1) a high clustering coefficient (compared to the Erdos-Renyi model) and 2) a low average shortest path, meaning that the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network: l_network ∝ ln(N).
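The diameter and the average shortest path can both be obtained from breadth-first searches. The sketch below assumes a connected, undirected graph and measures distance in hops (edge weights are ignored, which is an assumption about how the slides measured path length); the function names and the toy graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_average_path(adj):
    """Return (diameter, average shortest path) over all ordered node pairs."""
    lengths = []
    for s in adj:
        d = bfs_distances(adj, s)
        lengths.extend(hops for t, hops in d.items() if t != s)
    return max(lengths), sum(lengths) / len(lengths)

# Toy graph: the path a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
diameter, avg_path = diameter_and_average_path(adj)
print(diameter, round(avg_path, 4))
```

For the small-world check, the average path returned here would be compared against ln(N) for a network of N nodes.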
Empirical Network Analysis Clustering coefficient definitions: 1) Global clustering coefficient: C = 3 N_t / N_c, where N_t is the number of triangles formed in the graph and N_c is the number of connected triples of nodes in the graph. 2) Local clustering coefficient: for a directed graph, C_i = n_c / (n_i (n_i - 1)); for an undirected graph, C_i = 2 n_c / (n_i (n_i - 1)), where n_i is the number of direct neighbors of node i and n_c is the number of direct connections between i's direct neighbors. Averaged over all nodes. Reference: Social Network Analysis, by Lada Adamic, University of Michigan
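As a small worked example, the average local clustering coefficient of an undirected graph can be computed as below. It assumes a simple graph stored as an adjacency dictionary and takes the coefficient of a node with fewer than two neighbors to be 0 (a common convention, assumed here); the function name and toy graph are hypothetical.

```python
from itertools import combinations

def average_local_clustering(adj):
    """adj: {node: set(neighbors)} for an undirected simple graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)                 # n_i: number of direct neighbors
        if k < 2:
            continue                  # coefficient taken as 0 (assumed convention)
        # n_c: number of edges among the neighbors of this node
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy graph: triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
avg_cc = average_local_clustering(adj)
print(round(avg_cc, 4))
```

Here a and b score 1, c scores 1/3 (one link among three neighbors), and d scores 0, giving an average of 7/12.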
Empirical Network Analysis In our experiment, we use the local clustering coefficient definition (for undirected graphs); here are the statistics of the experiments. The network satisfies the small-world property! Let's recall: Small-world property: two conditions hold, namely 1) a high clustering coefficient (compared to the Erdos-Renyi model) and 2) a low average shortest path, meaning that the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network: l_network ∝ ln(N). [Table columns: Dataset | Avg Degree | Diameter | Avg Path Length | Modularity | Avg Clustering Coefficient; row: ADs/Topic; values not preserved]
Network Diameter Betweenness Centrality; Closeness Centrality; Eccentricity. Reference: 1. Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology 25(2), 2001
Power Law Distribution Degree Distribution In-degree Out-degree
Power Law Distribution The nodes with high degrees follow a power-law distribution; the nodes with low degrees don't, because of the limited data (1104 words in total).
Continue work: ranking FB ranking: assign a weight to each feature. YouTube, by contrast, adds randomness to increase recall at the cost of precision. Reference: 1. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath: The YouTube video recommendation system. RecSys 2010
Continue work: ranking Our ranking: a combination of FB news ranking and YouTube ranking. We use cosine similarity to measure which topic cluster the user is most interested in. We generate the top 8 ADs/topics with the FB ranking algorithm and add two more ADs/topics at random. This increases the prediction broadness (increases recall) at the cost of precision.
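The combined ranking can be sketched as follows. Several details are assumptions made only for illustration: the user's recent posts and each cluster's keywords are compared as bag-of-words vectors, each AD already carries a precomputed weighted-feature score (standing in for the FB-style ranking), and the two random ADs are drawn from all ADs not already selected. `cosine`, `recommend` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_tokens, cluster_keywords, ads_by_cluster,
              n_ranked=8, n_random=2, rng=None):
    rng = rng or random.Random()
    user_vec = Counter(user_tokens)
    # Pick the topic cluster most similar to the user's recent text.
    best = max(cluster_keywords,
               key=lambda c: cosine(user_vec, Counter(cluster_keywords[c])))
    # Top ADs of that cluster by their (assumed, precomputed) score.
    ranked = sorted(ads_by_cluster[best], key=lambda ad: ad[1], reverse=True)
    top = [ad_id for ad_id, _ in ranked[:n_ranked]]
    # Add random ADs from everything not yet chosen, to broaden recall.
    pool = [ad_id for ads in ads_by_cluster.values()
            for ad_id, _ in ads if ad_id not in top]
    return top + rng.sample(pool, min(n_random, len(pool)))

# Toy data (made up): two clusters, each AD is (id, score).
cluster_keywords = {"car": ["ford", "truck", "chevi"], "dating": ["date", "singl"]}
ads_by_cluster = {
    "car": [(f"car{i}", i / 10) for i in range(10)],
    "dating": [(f"date{i}", i / 10) for i in range(5)],
}
recs = recommend(["ford", "truck", "ford", "lunch"], cluster_keywords,
                 ads_by_cluster, rng=random.Random(0))
print(recs)
```

The first eight results are deterministic for a given score table; only the last two vary with the random generator, which is where recall is traded against precision.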
Limitation and Future Work We will run the system on a larger-scale dataset. Since we don't have real data, e.g. the performance (CTR) of each AD/topic, we need to generate it based on a Gaussian model.