Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor : Dr. Koh Jia-Ling Speaker : Tu.


1 Identifying Opinion Leaders in the Blogosphere
Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng
CIKM 2007
Advisor: Dr. Koh Jia-Ling
Speaker: Tu Yi-Lang
Date: 2008.08.28

2 Outline
• Introduction
  – What Is an Opinion Leader
  – Motivation
• InfluenceRank Algorithm
• Experiments
  – Experimental Setup
  – Identifying Opinion Leaders
• Conclusions

3 Introduction
• The blogosphere is a fruitful medium for understanding people's responses to events and customers' opinions on a company's products and services, since blogs reflect as many topics, events, and opinions as there are people writing about them.
• The blogosphere is also more conversational in style: a discussion usually starts with those who introduce new information, ideas, and opinions, and then spreads to their friends, families, and peers.

4 Introduction
• Social influence, the phenomenon by which the behavior of an individual can directly or indirectly affect the thoughts, feelings, and actions of others in a population, is present in the conversations in the blogosphere.
• Those who play a crucial role in forming and reflecting the opinions of the masses are called opinion leaders.

5 Introduction
• Opinion leaders capture the most representative opinions in the social network, and consequently are important for understanding the massive and complex blogosphere.
• The important role of opinion leaders has attracted growing attention recently, since massive quantities of network data are available through the Internet.

6 Introduction
• Example:
  – Blogs A, B, C, and D discuss the same topic, e.g. how to use Riya to find similar faces and objects in images across the web.
  – Blog E initiates the discussion of a rumor that Google is acquiring Riya, and links to blogs A and C, which introduce how to use Riya's visual search.
  – Following blog E, blogs F and G start to discuss this acquisition rumor.

7 Introduction
• Based on the characteristics of opinion leaders, the paper proposes the InfluenceRank algorithm, which ranks blogs according to how important they are in the network and how novel the information they provide is.
• The top blogs ranked by InfluenceRank tend to be more influential and informative in the network, and thus are more likely to be opinion leaders.

8 InfluenceRank Algorithm
• Both the novelty of a blog's information and the importance of its position in the blogosphere are essential in determining its leadership.
• We can imagine that each blog has an extra source from which it contributes novel information to the network, and we model this extra source as a hidden node that the blog links to.

9 InfluenceRank Algorithm
• To measure the information novelty of a blog, we first regard each entry in the blog as a document.
• We then perform Latent Dirichlet Allocation (LDA) to reduce data dimensionality and to generate a topic space, and project each entry onto this topic space to obtain a feature vector that represents the entry.

10 InfluenceRank Algorithm
• After each document is represented as a feature vector, cosine similarity or Kullback-Leibler (KL) divergence can be used to calculate the dissimilarity between two entries.
• The information novelty provided by the hidden node of an entry A_e in blog A is measured by the dissimilarity between this entry and the entries it links to.
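
As an illustration of the cosine-based option mentioned above, here is a minimal sketch of a dissimilarity function over topic vectors. The function name `cosine_dissimilarity` and the handling of all-zero vectors are assumptions made for this example, not details taken from the slides.

```python
import math

def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

For non-negative topic weights, such as those produced by LDA, this value stays between zero and one: entries with identical topic mixes score near zero and entries with no shared topics score one.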

11 InfluenceRank Algorithm
• The information novelty provided by the hidden node of blog A is measured as the average of the novelty scores of the entries it contains.
• card(Set(A_e)) is the total number of entries of interest in blog A, and information novelty ranges strictly between zero and one.
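
The averaging step can be sketched as follows. The exact per-entry formula is not preserved in this transcript, so `entry_novelty` is only one plausible reading: it assumes the minimum dissimilarity to any linked entry, and assumes an entry with no links counts as fully novel. Both function names and the `dissim` callable are hypothetical.

```python
def entry_novelty(entry_vec, linked_vecs, dissim):
    """Novelty of one entry relative to the entries it links to.

    Assumptions (not from the slides): use the minimum dissimilarity to any
    linked entry, and treat an entry without links as fully novel (1.0).
    """
    if not linked_vecs:
        return 1.0
    return min(dissim(entry_vec, other) for other in linked_vecs)

def blog_novelty(entry_scores):
    """Average of per-entry novelty scores: sum of scores / card(Set(A_e))."""
    if not entry_scores:
        return 0.0  # assumption: a blog with no entries of interest has no novelty
    return sum(entry_scores) / len(entry_scores)
```

Passing the dissimilarity as a callable keeps the sketch independent of whether cosine or KL-based dissimilarity is chosen.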

12 InfluenceRank Algorithm
• Given the information novelty of each blog, to calculate the InfluenceRank of a blog, we first regard a set of n blogs as a directed graph with adjacency matrix G, where G_ij = 1 if blog i links to blog j and G_ij = 0 otherwise.
• We then scale the adjacency matrix G by its row sums to obtain a normalized adjacency matrix W.
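
The row scaling can be sketched in a few lines. The helper name `row_normalize` is hypothetical, and leaving an all-zero row untouched is an assumption about how blogs without outgoing links are represented at this stage.

```python
def row_normalize(G):
    """Scale each row of a 0/1 adjacency matrix by its row sum."""
    W = []
    for row in G:
        total = sum(row)
        if total == 0:
            # Blog with no outgoing links: the row is left as zeros here.
            W.append([0.0] * len(row))
        else:
            W.append([x / total for x in row])
    return W

# Hypothetical 3-blog graph: blog 0 links to 1 and 2, blog 1 links to 0,
# blog 2 links to nobody.
G = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
W = row_normalize(G)
```

Each non-empty row of `W` sums to one, so a blog's weight is split evenly among the blogs it links to.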

13 InfluenceRank Algorithm
• The InfluenceRank of a blog A, denoted as IR(A), indicates the opinion leadership of A.
• Parameter β is used to adjust how important the information novelty is for the leadership, and In(⋅) denotes the set of nodes that node ⋅ is linked to.
• Generally speaking: [equation not preserved]

14 InfluenceRank Algorithm
• Due to the presence of dangling nodes in the network, a unique solution of the InfluenceRank equation may not exist.
• To remedy this problem, we apply "random jumps", as PageRank does.
• W after the adjustment can be written as: [equation not preserved]
• e is the n-vector of all ones, and a is the vector with components a_i = 1 if the ith row of W corresponds to a dangling node, and 0 otherwise.
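
The adjusted matrix itself did not survive in this transcript. Given the definitions of e and a, a reasonable assumption is the standard PageRank fix, W + (1/n) a eᵀ, which replaces each dangling row with a uniform row. The sketch below implements that assumed form; `adjust_dangling` is a hypothetical name.

```python
def adjust_dangling(W):
    """Return W + (1/n) * a * e^T under the assumed PageRank-style fix.

    A row of all zeros (a_i = 1) becomes a uniform row of 1/n; every other
    row is copied unchanged.
    """
    n = len(W)
    adjusted = []
    for row in W:
        is_dangling = all(x == 0 for x in row)
        adjusted.append([1.0 / n] * n if is_dangling else list(row))
    return adjusted
```

After this step every row sums to one, so a random walk on the adjusted matrix never gets stuck at a blog with no outgoing links.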

15 InfluenceRank Algorithm
[Steps 1) to 5) of the algorithm: equations not preserved]
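
The five steps on this slide did not survive, so the following is only a guess at how the pieces described earlier might fit together, not the authors' procedure. It assumes a PageRank-style recurrence in which a blog's score is (1 − β) times the score flowing in from blogs that link to it, plus β times its normalized novelty, solved by power iteration. The function name, the direction of score flow, the default β, and the stopping rule are all assumptions.

```python
def influence_rank(W_bar, novelty, beta=0.5, tol=1e-10, max_iter=1000):
    """Power iteration for an assumed recurrence:
    IR[j] = (1 - beta) * sum_i W_bar[i][j] * IR[i] + beta * nov[j].

    W_bar is expected to be row-stochastic (dangling rows already fixed).
    """
    n = len(W_bar)
    total = sum(novelty)
    # Normalize novelty so the scores keep summing to one (assumption).
    nov = [x / total for x in novelty] if total > 0 else [1.0 / n] * n
    ir = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1.0 - beta) * sum(W_bar[i][j] * ir[i] for i in range(n)) + beta * nov[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, ir)) < tol:
            return new
        ir = new
    return ir
```

Under these assumptions, a larger β lets novelty dominate the ranking, while β close to zero reduces the score to a plain link-based ranking.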

16 InfluenceRank Algorithm

17 Experiments
• Experimental Setup
• Dataset: The dataset used for the experimental studies was collected by an NEC focused blog crawler. We clean the crawled data by first removing stop words from entries and then removing entries that contain fewer than ten terms. The cleaned dataset contains 407 English blogs with 67,549 entries.

18 Experiments
• Experimental Setup (cont.)
• Evaluation metrics:
  1. Coverage
     – One-step coverage
     – All-path coverage
  2. Diversity
  3. Distortion

19 Experiments
• Experimental Setup (cont.)
• Coverage: Given a set of nodes in a network, the coverage is defined as the number of nodes that are either directly or indirectly influenced by this set of nodes.
  – One-step coverage: the number of nodes that are directly influenced by this set of nodes.
  – All-path coverage: the number of nodes that are either directly or indirectly influenced by this set of nodes.
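
The two coverage variants can be sketched with a set union and a breadth-first search. The function names, the `influences` mapping (node to the nodes it directly influences), and the choice to exclude the seed nodes from the count are assumptions for illustration. The small graph is hypothetical, loosely modeled on the Riya example from the introduction.

```python
from collections import deque

def one_step_coverage(influences, seeds):
    """Count nodes directly influenced by any seed (seeds excluded)."""
    seeds = set(seeds)
    reached = set()
    for s in seeds:
        reached.update(influences.get(s, ()))
    return len(reached - seeds)

def all_path_coverage(influences, seeds):
    """Count nodes reachable from the seeds along influence edges (seeds excluded)."""
    seeds = set(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in influences.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen - seeds)

# Hypothetical influence edges: A and C influence E; E influences F and G.
influences = {"A": ["E"], "C": ["E"], "E": ["F", "G"]}
direct = one_step_coverage(influences, ["A"])    # E only
indirect = all_path_coverage(influences, ["A"])  # E, F, G
```

In this toy graph blog A directly reaches one blog but indirectly reaches three, which is the gap the all-path metric is meant to capture.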

20 Experiments
• Experimental Setup (cont.)
• Diversity: The diversity of a set of items is defined as the average pairwise dissimilarity of the items v_i, i = 1, 2, ..., n.
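
A minimal sketch of this metric follows, assuming the average is taken over unordered pairs of distinct items; the formula from the slide is not preserved. The function name `diversity` and the `dissim` callable are hypothetical.

```python
from itertools import combinations

def diversity(items, dissim):
    """Average dissimilarity over all unordered pairs of distinct items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # assumption: fewer than two items have zero diversity
    return sum(dissim(u, v) for u, v in pairs) / len(pairs)
```

With a symmetric dissimilarity, averaging over ordered pairs would give the same value, so the choice of unordered pairs only saves work.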

21 Experiments
• Experimental Setup (cont.)
• The topic distribution over the opinion leaders should be consistent with that of the original information space.
• Distortion: Given the topic distributions of the original information space P_o and the sampling space P_s, the distortion of the sampling space over the original space is defined as the KL divergence of P_s and P_o.
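
A short sketch of the distortion computation, reading "the KL divergence of P_s and P_o" as KL(P_s || P_o). That direction is an assumption, as are the function name and the small `eps` smoothing term used to avoid taking the logarithm of zero.

```python
import math

def distortion(p_s, p_o, eps=1e-12):
    """KL(P_s || P_o) for two topic distributions given as equal-length lists."""
    return sum(
        ps * math.log((ps + eps) / (po + eps))
        for ps, po in zip(p_s, p_o)
        if ps > 0.0  # terms with zero probability in P_s contribute nothing
    )
```

A distortion near zero means the topics covered by the selected blogs are distributed much like the topics of the whole dataset.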

22 Experiments
• Experimental Setup (cont.)
• InfluenceRank (IR) is compared with:
  – PageRank (PR)
  – Random Sampling (RS)
  – Time-based Ranking (Time)
  – Information Novelty-based Ranking (IN)

23 Experiments

24 Experiments

25 Experiments

26 Conclusions
• Opinion leaders are the most informative and influential nodes, and capture the most representative opinions in the network.
• We propose a novel ranking algorithm, InfluenceRank, to identify opinion leaders who are both contributors of novel information and highly influential in the network.
• The opinion leaders detected by InfluenceRank achieve better performance in terms of coverage, diversity, and distortion compared with the four baseline algorithms.

