Community Discovery and Profiling with Social Messages
Wenjun Zhou, Hongxia Jin, Yan Liu
KDD 2012, Beijing, China

Background
Statistics show that email is ubiquitous in the workplace.
With the availability of additional online social media, information overload has become problematic.
  Use email daily or several times each week: 97%
  Consider email "essential" for their everyday work: 71%
  Source: Institute for the Future (Bowes 2000)
US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001).

personal-analytics-of-my-life/

Motivation
Help users boost productivity:
  Summarize their work areas automatically
  Keep track of past and ongoing collaborations
  Prioritize work-related tasks

Problem Formulation
Given: a user's emails
Find: the user's work profile (a set of work areas)
Constraints:
  Unsupervised (or semi-supervised later on)
  Effectiveness in providing insights
  Computational efficiency
Example work areas (keywords; people):
  Teaching: class, homework, score; Alice, Bob, Charlie
  Research: email, mining, data, paper; Hongxia, Yan
  Advising: meeting, report, draft; Dane, Ellen, Flint
  Grants: project, proposal, grant, due; Sarah, Tim
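The target output can be pictured as a small data structure. The sketch below is only an illustration of a "work profile" as a set of work areas, each with keywords and people; the class name `WorkArea`, the helper `areas_for_person`, and the field names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkArea:
    """One work area: a label, its characteristic keywords, and its collaborators."""
    name: str
    keywords: List[str] = field(default_factory=list)
    people: List[str] = field(default_factory=list)

# A work profile is a collection of work areas (values taken from the slide's example).
profile = [
    WorkArea("Teaching", ["class", "homework", "score"], ["Alice", "Bob", "Charlie"]),
    WorkArea("Advising", ["meeting", "report", "draft"], ["Dane", "Ellen", "Flint"]),
    WorkArea("Grants", ["project", "proposal", "grant", "due"], ["Sarah", "Tim"]),
]

def areas_for_person(profile, person):
    """Hypothetical lookup: names of the work areas in which a person appears."""
    return [area.name for area in profile if person in area.people]

print(areas_for_person(profile, "Sarah"))  # prints ['Grants']
```

In an unsupervised setting the area labels would not be given; they are shown here only to make the example readable.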

Traditional Community Finding

Community (i.e. Work Area)
Two aspects:
  people (whom you collaborate with)
  task (what you collaborate on)

The Data

Data Preprocessing
People (email accounts): disregarded roles, only considered occurrence.
Content (subject + body): removed punctuation and stop words; stemmed words; converted documents into bags of words.
Unused: replicated messages; time-stamps; attachments.
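The content steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list is a tiny made-up sample, and `crude_stem` is a hypothetical stand-in for a real stemmer (the slides do not say which stemmer or stop-word list was used).

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the talk.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "for", "on", "in", "please"}

def crude_stem(word):
    """Stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def to_bag_of_words(subject, body):
    """Combine subject and body, then tokenize, filter, stem, and count."""
    text = f"{subject} {body}".lower()
    tokens = re.findall(r"[a-z]+", text)  # keeps letters only, so punctuation and digits are dropped
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

bag = to_bag_of_words("Grant proposal due", "Please send the proposals and reports.")
print(bag)
```

Word order is discarded at the last step, which is what "bag of words" means: only the per-message counts of each stem remain.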

Topic Models: A Bayesian Approach
Assume:
  a topic is a unique distribution of words
  a document has a mixture of topics
  documents are generated by sampling a topic for each word position, then a word from that topic
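The generative assumption can be illustrated with a toy sampler. The topics, their word probabilities, and the function name `generate_document` below are made up for illustration; a real topic model would infer these distributions from data rather than fix them by hand.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPICS = {
    "teaching": {"class": 0.5, "homework": 0.3, "score": 0.2},
    "grants": {"project": 0.4, "proposal": 0.3, "grant": 0.2, "due": 0.1},
}

def generate_document(topic_mixture, n_words, rng):
    """Sample a bag of words: pick a topic per word, then a word from that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(n_words):
        # Step 1: draw a topic according to the document's topic mixture.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step 2: draw a word according to that topic's word distribution.
        vocab = list(TOPICS[topic])
        word_weights = [TOPICS[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights, k=1)[0])
    return words

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
doc = generate_document({"teaching": 0.7, "grants": 0.3}, n_words=10, rng=rng)
print(doc)
```

Inference runs this story in reverse: given only the observed words, it estimates which topics and mixtures most plausibly produced them.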

Latent Dirichlet Allocation (Blei et al., 2003)
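The slide's figure did not survive in the transcript. For reference, the generative process of the commonly used smoothed variant of LDA can be written as below; the symbols ($K$ topics, $D$ documents, $N_d$ words in document $d$, hyperparameters $\alpha$ and $\beta$) are a conventional choice of notation and may differ from what the slide showed.

```latex
\begin{align*}
\phi_k   &\sim \mathrm{Dirichlet}(\beta),  & k &= 1,\dots,K   && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D   && \text{(topic mixture of document } d\text{)} \\
z_{d,n}  &\sim \mathrm{Multinomial}(\theta_d),     & n &= 1,\dots,N_d && \text{(topic of the } n\text{-th word)} \\
w_{d,n}  &\sim \mathrm{Multinomial}(\phi_{z_{d,n}}) &  &               && \text{(the observed word)}
\end{align*}

% Joint distribution implied by the process above:
\[
p(w, z, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}})
\]
```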

COllaborator COMmunity Profiling Model (COCOMP)

Enron Emails

Social Messages

Summary
COCOMP: a latent community model
  Each social media document corresponds to a sharing activity within a community.
  A community is represented by a list of top participants and an associated list of topics.
  Experiments on email and social media datasets demonstrate interesting results.
Future work
  Different sources of data for the same user
  Evolution over time with incremental learning
  Scalable inference with user feedback

Thank You!