Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor : Dr. Koh Jia-Ling Speaker : Tu.


Identifying Opinion Leaders in the Blogosphere
Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng
CIKM 2007
Advisor : Dr. Koh Jia-Ling
Speaker : Tu Yi-Lang

2 Outline
• Introduction
• What Is an Opinion Leader
• Motivation
• InfluenceRank Algorithm
• Experiments
• Experimental Setup
• Identifying Opinion Leaders
• Conclusions

3 Introduction
• The blogosphere is a fruitful medium for understanding people's responses to events and customers' opinions on a company's products and services, since blogs reflect as many topics, events, and opinions as there are people writing about them.
• The blogosphere is conversational in style: a discussion usually starts with those who introduce new information, ideas, and opinions, and then spreads to their friends, families, and peers.

4 Introduction
• Social influence, the phenomenon by which the behavior of an individual can directly or indirectly affect the thoughts, feelings, and actions of others in a population, is present in the conversations in the blogosphere.
• Those who play a crucial role in forming and reflecting the opinions of the masses are called opinion leaders.

5 Introduction
• Opinion leaders capture the most representative opinions in the social network, and consequently are important for understanding the massive and complex blogosphere.
• The important role of opinion leaders has attracted growing attention recently, since massive quantities of network data are available through the Internet.

6 Introduction
• Example:
• Blogs A, B, C, and D discuss the same topic, e.g. how to use Riya to find similar faces and objects in images across the web.
• Blog E initiates the discussion of a rumor that Google is acquiring Riya, and links to blogs A and C, which introduce how to use Riya's visual search.
• Following blog E, blogs F and G start to discuss this acquisition rumor.

7 Introduction
• Based on the characteristics of opinion leaders, the paper proposes the InfluenceRank algorithm, which ranks blogs according to how important they are in the network and how novel the information they provide is.
• The top blogs ranked by InfluenceRank tend to be more influential and informative in the network, and thus are more likely to be opinion leaders.

8 InfluenceRank Algorithm
• Both the novelty of a blog's information and the importance of its position in the blogosphere are essential in determining its leadership.
• We can imagine that each blog has an extra source from which it contributes novel information to the network; we model this extra source as a hidden node that the blog links to.

9 InfluenceRank Algorithm
• To measure the information novelty of a blog, let us first regard each entry in the blog as a document.
• We first perform Latent Dirichlet Allocation (LDA) to reduce data dimensionality and to generate a topic space for representing entries, and then project each entry onto this topic space to generate a feature vector that represents the entry.

10 InfluenceRank Algorithm
• After each document is represented as a feature vector, the dissimilarity between two documents can be computed from their cosine similarity or their Kullback-Leibler (KL) divergence.
• The information novelty provided by the hidden node of an entry A_e in blog A is measured as the information novelty between this entry and the entries it links to.
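The dissimilarity step can be illustrated with a minimal Python sketch. It assumes each entry has already been projected onto the LDA topic space as a list of non-negative weights, and it takes dissimilarity to be one minus cosine similarity; the function name and the sample vectors are hypothetical and do not come from the slides.

```python
import math

def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector shares nothing with any other vector.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical topic vectors for two blog entries (three topics).
entry_a = [0.7, 0.2, 0.1]
entry_b = [0.1, 0.2, 0.7]
print(cosine_dissimilarity(entry_a, entry_a))  # close to 0: identical topics
print(cosine_dissimilarity(entry_a, entry_b))  # larger: different topic mix
```

For non-negative topic weights this score stays between zero and one, which matches the range the slides give for information novelty.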

11 InfluenceRank Algorithm
• The information novelty provided by the hidden node of blog A is measured as the average of the novelty scores of the entries it contains.
• card(Set(A_e)) is the total number of entries of interest in blog A, and information novelty ranges strictly between zero and one.
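A minimal Python sketch of the averaging step follows. The slides do not preserve the formula for entry-level novelty, so two choices here are assumptions made only for illustration: an entry's novelty is the smallest dissimilarity to any entry it links to, and an entry with no links is treated as fully novel. All names are hypothetical.

```python
import math

def dissimilarity(u, v):
    """1 - cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 if nu == 0.0 or nv == 0.0 else 1.0 - dot / (nu * nv)

def entry_novelty(entry_vec, linked_vecs):
    """Novelty of one entry relative to the entries it links to."""
    if not linked_vecs:
        return 1.0  # assumption: nothing linked, so everything is new
    # Assumption: the closest linked entry bounds how novel the entry is.
    return min(dissimilarity(entry_vec, w) for w in linked_vecs)

def blog_novelty(entries):
    """Average entry novelty; `entries` is a list of (vector, linked_vectors)."""
    scores = [entry_novelty(vec, linked) for vec, linked in entries]
    return sum(scores) / len(scores)  # len(scores) plays the role of card(Set(A_e))

# Hypothetical blog with two entries over two topics.
blog_a = [
    ([1.0, 0.0], [[1.0, 0.0]]),  # repeats what it links to
    ([1.0, 0.0], [[0.0, 1.0]]),  # says something different from its link
]
print(blog_novelty(blog_a))
```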

12 InfluenceRank Algorithm
• Given the information novelty of each blog, to calculate the InfluenceRank of a blog, let us first regard a set of n blogs as a directed graph with adjacency matrix G, where G_ij = 1 if blog i links to blog j and G_ij = 0 otherwise.
• Then we scale the adjacency matrix G by its row sums to obtain a normalized adjacency matrix W.
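The row scaling can be sketched in a few lines of Python. The three-blog graph is a made-up example, and leaving all-zero rows untouched is an assumption of this sketch; such rows correspond to blogs with no outgoing links.

```python
def row_normalize(G):
    """Divide each row of the adjacency matrix by its row sum."""
    W = []
    for row in G:
        total = sum(row)
        if total:
            W.append([x / total for x in row])
        else:
            # Blog with no outgoing links: keep the row at zero for now.
            W.append([0.0] * len(row))
    return W

# Hypothetical graph: blog 0 links to blogs 1 and 2, blog 1 links to blog 2.
G = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
W = row_normalize(G)
print(W)
```

Every row of W with at least one link sums to one, so W_ij can be read as the share of blog i's links that point to blog j.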

13 InfluenceRank Algorithm
• The InfluenceRank of a blog A, denoted as IR(A), indicates the opinion leadership of A.
• Parameter β is used to adjust how important the information novelty is for the leadership, and In(·) denotes the set of nodes that node · is linked to.
• Generally speaking:
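The equation for this slide is not preserved in the transcript. One PageRank-style form that is consistent with the symbols defined here is written out below; it is an assumption offered for orientation, not the authors' confirmed formula. It reads In(A) as the set of blogs that link to A, and Nov(A) as the information novelty of blog A.

```latex
\[
  \mathrm{IR}(A) \;=\; (1-\beta) \sum_{B \in \mathrm{In}(A)} W_{BA}\,\mathrm{IR}(B)
  \;+\; \beta\,\mathrm{Nov}(A)
\]
% Equivalent matrix form under the same assumption:
\[
  \mathbf{IR} \;=\; (1-\beta)\, W^{\mathsf{T}}\, \mathbf{IR} \;+\; \beta\, \mathbf{Nov}
\]
```

Under this reading, β = 0 leaves a purely link-based score, while β = 1 ranks blogs by novelty alone.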

14 InfluenceRank Algorithm
• Because the network contains dangling nodes, a unique solution of the InfluenceRank equation may not exist.
• To remedy this, we apply "random jumps", as PageRank does.
• W after the adjustment can be written as:
• e is the n-vector of all ones, and a is the vector with components a_i = 1 if the ith row of W corresponds to a dangling node and a_i = 0 otherwise.
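The adjusted matrix is not preserved in the transcript. The sketch below assumes the standard PageRank correction, in which each dangling row is replaced by a uniform row, that is, W plus (1/n) times the outer product of a and e. Whether the paper uses exactly this form is an assumption; the function name is hypothetical.

```python
def fix_dangling(W):
    """Return W + (1/n) * a * e^T, with a_i = 1 for all-zero (dangling) rows."""
    n = len(W)
    a = [1 if sum(row) == 0 else 0 for row in W]
    # e is the all-ones vector, so a * e^T adds a_i / n to every entry of row i.
    return [[W[i][j] + a[i] / n for j in range(n)] for i in range(n)]

# Hypothetical normalized matrix whose last row is a dangling node.
W = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
W_bar = fix_dangling(W)
print(W_bar)
```

After the adjustment every row sums to one, so a random walk that reaches a dangling blog jumps to any blog with equal probability.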

15 InfluenceRank Algorithm
[Algorithm steps 1) to 5): the equations on this slide are not preserved in the transcript.]
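Since the individual steps are missing from the transcript, the following Python sketch shows one way the pieces described on the earlier slides could fit together. It is not the authors' algorithm as published. It assumes a PageRank-style update, IR = (1 - β) Wᵀ IR + β Nov, uniform rows for dangling nodes, and an arbitrary default β = 0.5; the function name, tolerance, and sample data are hypothetical.

```python
def influence_rank(G, nov, beta=0.5, tol=1e-10, max_iter=1000):
    """Iterate IR = (1 - beta) * W^T IR + beta * Nov until it stabilizes."""
    n = len(G)
    # Row-normalize G; dangling rows become uniform (random-jump assumption).
    W = []
    for row in G:
        total = sum(row)
        W.append([x / total for x in row] if total else [1.0 / n] * n)
    ir = [1.0 / n] * n  # assumption: uniform starting scores
    for _ in range(max_iter):
        new_ir = [
            (1 - beta) * sum(W[b][a] * ir[b] for b in range(n)) + beta * nov[a]
            for a in range(n)
        ]
        delta = sum(abs(x - y) for x, y in zip(new_ir, ir))
        ir = new_ir
        if delta < tol:
            break
    return ir

# Hypothetical graph: blogs 1 and 2 both link to blog 0; equal novelty scores.
G = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]
scores = influence_rank(G, nov=[0.5, 0.5, 0.5])
print(scores)
```

Because each row of W sums to one and the link term is scaled by (1 - β) < 1, the update is a contraction and converges for any β in (0, 1]. In this toy graph the blog that receives both links ends up with the highest score.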

16 InfluenceRank Algorithm

17 Experiments
• Experimental Setup
• Dataset: The dataset used for the experimental studies was collected by an NEC focused blog crawler. The crawled data is cleaned by first removing stop words from entries and then removing entries that contain fewer than ten terms. The cleaned dataset contains 407 English blogs with 67,549 entries.

18 Experiments
• Experimental Setup (cont.)
• Evaluation metrics:
1. Coverage
   – One-step coverage
   – All-path coverage
2. Diversity
3. Distortion

19 Experiments
• Experimental Setup (cont.)
• Coverage: Given a set of nodes in a network, the coverage is defined as the number of nodes that are either directly or indirectly influenced by this set of nodes.
   – One-step coverage: the number of nodes that are directly influenced by this set of nodes.
   – All-path coverage: the number of nodes that are either directly or indirectly influenced by this set of nodes.
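Both coverage variants can be sketched as a breadth-first search in Python. The sketch assumes that a blog is influenced by the blogs it links to, so influence travels against the direction of the links, and that the selected blogs themselves are not counted. The sample graph borrows the blog names from the Riya example and additionally assumes that F and G link to E; the function name is hypothetical.

```python
from collections import deque

def coverage(links, seeds, one_step=False):
    """Count blogs influenced by `seeds`; links[i] is the set of blogs i links to."""
    # Invert the link map: followers[j] holds the blogs that link to j.
    followers = {}
    for src, targets in links.items():
        for dst in targets:
            followers.setdefault(dst, set()).add(src)
    seeds = set(seeds)
    reached = set()
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if one_step and depth == 1:
            continue  # one-step coverage stops after direct followers
        for f in followers.get(node, ()):
            if f not in seeds and f not in reached:
                reached.add(f)
                queue.append((f, depth + 1))
    return len(reached)

# E links to A and C; F and G are assumed to link to E.
links = {"E": {"A", "C"}, "F": {"E"}, "G": {"E"}}
print(coverage(links, {"A"}, one_step=True))  # direct followers of A
print(coverage(links, {"A"}))                 # followers along all paths
```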

20 Experiments
• Experimental Setup (cont.)
• Diversity: The diversity of a set of items is defined as the average pairwise dissimilarity of the items v_i, i = 1, 2, ..., n.
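The diversity definition translates directly into an average over all unordered pairs. The sketch below assumes cosine dissimilarity between topic vectors as the pairwise measure, since the slide's formula is not preserved; the names and sample vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine_dissimilarity(u, v):
    """1 - cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 if nu == 0.0 or nv == 0.0 else 1.0 - dot / (nu * nv)

def diversity(items, dissim=cosine_dissimilarity):
    """Average dissimilarity over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # assumption: fewer than two items have no diversity
    return sum(dissim(u, v) for u, v in pairs) / len(pairs)

# Hypothetical topic vectors of three selected blogs.
leaders = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(diversity(leaders))
```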

21 Experiments
• Experimental Setup (cont.)
• The topic distribution over the opinion leaders should be consistent with that of the original information space.
• Distortion: Given the topic distributions of the original information space P_o and the sampling space P_s, the distortion of the sampling space over the original space is defined as the KL divergence of P_s and P_o.
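A minimal Python sketch of the distortion measure follows. The slide does not state the direction of the divergence, so the sketch assumes KL(P_s || P_o), and it clamps zero probabilities in P_o to a small constant to avoid division by zero; both choices, the function name, and the sample distributions are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            # Terms with p_i = 0 contribute nothing; q_i is clamped at eps.
            total += pi * math.log(pi / max(qi, eps))
    return total

# Hypothetical topic distributions over two topics.
p_original = [0.9, 0.1]  # P_o: the whole blog collection
p_sample = [0.5, 0.5]    # P_s: the selected opinion leaders
print(kl_divergence(p_sample, p_original))
```

A distortion of zero means the selected blogs reproduce the topic mix of the whole collection; larger values mean the sample over-represents some topics.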

22 Experiments
• Experimental Setup (cont.)
• InfluenceRank (IR) is compared with:
   – PageRank (PR)
   – Random Sampling (RS)
   – Time-based Ranking (Time)
   – Information Novelty-based Ranking (IN)

23 Experiments

24 Experiments

25 Experiments

26 Conclusions
• Opinion leaders are the most informative and influential nodes, and capture the most representative opinions in the network.
• We propose a novel ranking algorithm, InfluenceRank, to identify those opinion leaders who are novel information contributors and also highly influential in the network.
• The opinion leaders detected by InfluenceRank achieve better performance in terms of coverage, diversity, and distortion compared with four baseline algorithms.