Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith.


1 Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

2 Introduction
- People increasingly publish their reactions to public events using blogs.
  - A blog is a tool that enables this information to be published quickly.
  - It is a journal that is available on the web.
- There is a need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web).
- Our goal is to develop a method of capturing hot conversations by automating readers' processes for characterizing and monitoring blogs.

3 Overview
- Data-mining techniques
  - Creation of blog link structure
  - Analysing link structure
- Types of important bloggers
  - Agitators
  - Summarisers
- Applications, analysis and conclusions
  - Real-world applications and extensions
  - Pros and cons of the paper

4 Crawling blogs Extracting hyperlinks Extracting blog threads

5 Crawling blogs
- The system crawls through the RSS list, registering for each entry:
  - Title
  - Permalink
  - List entry date
- Aggregator: gathers RSS feeds from multiple sources and organises them
- OPML: a file format used to share RSS feed lists
- RSS: a format for distributing content on the web
(Diagram labels: Aggregators, RSS list, RSS feeds, OPML)
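The registration step on this slide can be sketched in a few lines. This is a minimal illustration, not the system described in the paper: it assumes plain RSS 2.0 items, takes the `<link>` element as the permalink and `<pubDate>` as the entry date, and the names `register_entries` and `SAMPLE_RSS` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to exercise the sketch.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/1</link>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2</link>
      <pubDate>Tue, 03 Jan 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return one record (title, permalink, date) per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the item's <link> is treated as its permalink.
            "permalink": item.findtext("link", default=""),
            # Assumption: <pubDate> is kept as the raw date string.
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entries = register_entries(SAMPLE_RSS)
```

A real crawler would first read the feed URLs from the OPML list and fetch each feed over HTTP; both steps are left out here.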

6 Extracting hyperlinks
- Problem: different tag structures per server
(Diagram labels: RSS feed from list, Description, Blog entries, Hyperlink list)
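One way to cope with differing tag structures is to ignore the surrounding markup and collect only anchor targets. The sketch below assumes the entry description is an HTML fragment and uses Python's standard `html.parser`; the slide does not say how the original system handled this, and `LinkCollector` and `extract_hyperlinks` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, whatever markup surrounds it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(description_html):
    """Return the hyperlink list for one blog entry description."""
    collector = LinkCollector()
    collector.feed(description_html)
    collector.close()
    return collector.links


links = extract_hyperlinks(
    '<div><p>See <a href="http://a.example/1">this</a></p>'
    '<A HREF="http://b.example/2">that</A></div>'
)
```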

7 Extracting blog threads
For each hyperlink:
- If sourceLink: check that the links exist in the thread data, then add.
- If replyLink: check whether the departure URL exists in the thread data and whether the destination URL points to an entry on the list, then:
  - 11: add the destination entry to the thread
  - 10: add the destination entry to the entry list and add it to the thread
  - 01: add the departure entry to the thread
  - 00: create a new thread
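The four replyLink outcomes can be read as a two-bit decision table. The sketch below encodes one reading of the flowchart: the first bit is assumed to mean "departure URL exists in the thread data" and the second "destination URL points to an entry on the list". The bit order is an interpretation of the slide, and `reply_link_action` is a hypothetical name.

```python
def reply_link_action(departure_in_thread, destination_on_list):
    """Map the two checks for a replyLink to the action labelled on the slide.

    Assumption: first bit = departure URL found in thread data,
    second bit = destination URL points to an entry on the list.
    """
    code = f"{int(departure_in_thread)}{int(destination_on_list)}"
    actions = {
        "11": "add destination entry to thread",
        "10": "add destination entry to entry list and add to thread",
        "01": "add departure entry to thread",
        "00": "create new thread",
    }
    return code, actions[code]


code, action = reply_link_action(False, False)
```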

8 Example Results

9 Agitators Summarisers Joe Bloggs

10 Agitators
- Discussion stimulator
- Threads often grow after an agitator's entry
- Three discriminants for an agitator:
  - Link (Agi_1)
  - Popularity (Agi_2)
  - Topic (Agi_3)
- The three discriminants can be weighted using a formula (not reproduced in this transcript).

11 Link-based discriminant
e_x is an agitator if k_x > θ_1, where
- e_x = a blog entry
- k_x = number of entries in thread i with a replyLink to e_x
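As a small illustration of the link-based test, the sketch below assumes a thread is given as a list of (replying entry, replied-to entry) pairs, one per replyLink. That data layout and the name `link_based_agitators` are assumptions, not part of the slides.

```python
from collections import Counter


def link_based_agitators(reply_links, theta_1):
    """Return the entries e_x of one thread whose k_x exceeds theta_1.

    reply_links: iterable of (source_entry, target_entry) pairs, meaning
    source_entry has a replyLink to target_entry (assumed layout).
    """
    # k_x counts distinct entries replying to e_x, so duplicate pairs are dropped.
    k = Counter(target for _source, target in set(reply_links))
    return {entry for entry, count in k.items() if count > theta_1}


thread = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b"), ("c", "a")]
agitators = link_based_agitators(thread, theta_1=2)
```

Here entry "a" receives replyLinks from three distinct entries, so it passes a threshold of 2, while "b" (one replyLink) does not.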

12 Popularity-based discriminant
e_x is an agitator if (l_x / m_x) > θ_2, where
- e_x = a blog entry
- l_x = number of entries in thread i published t days after e_x
- m_x = number of entries in thread i published t days before e_x
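The ratio can be computed from publication timestamps alone. The sketch below makes two assumptions the slide does not state: "t days after/before" is read as "within the t days after/before" e_x, and the ratio is treated as undefined (returned as `None`) when m_x is zero. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def popularity_ratio(entry_time, thread_times, t_days):
    """Return l_x / m_x for an entry, or None when m_x is zero (assumed handling)."""
    window = timedelta(days=t_days)
    # l_x: entries published within t days after e_x (assumed reading).
    l_x = sum(1 for ts in thread_times if entry_time < ts <= entry_time + window)
    # m_x: entries published within t days before e_x (assumed reading).
    m_x = sum(1 for ts in thread_times if entry_time - window <= ts < entry_time)
    if m_x == 0:
        return None
    return l_x / m_x


def is_popularity_agitator(entry_time, thread_times, t_days, theta_2):
    """Apply the popularity-based discriminant l_x / m_x > theta_2."""
    ratio = popularity_ratio(entry_time, thread_times, t_days)
    return ratio is not None and ratio > theta_2


e_x = datetime(2006, 1, 10)
others = [datetime(2006, 1, 9), datetime(2006, 1, 11),
          datetime(2006, 1, 12), datetime(2006, 1, 13)]
ratio = popularity_ratio(e_x, others, t_days=3)
```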

13 Topic-based discriminant
e_x is an agitator if (condition not reproduced in this transcript), where
- e_x = a blog entry
- n = number of entries

14 Summarizers
- Publish entries that collate and compact previous posts
- Provide a convenient way of digesting an entire thread
- The discriminant for summarizers is link-based: e_x is a summarizer if p_x > θ_4, where
  - e_x = a blog entry
  - p_x = number of entries in thread i that have a replyLink from e_x
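The summarizer test counts links in the opposite direction from the link-based agitator test: entries that e_x links to, instead of entries that link to e_x. The sketch assumes a thread is a list of (source entry, target entry) replyLink pairs; that layout and the name `summarizers` are assumptions.

```python
from collections import Counter


def summarizers(reply_links, theta_4):
    """Return the entries e_x of one thread whose p_x exceeds theta_4.

    reply_links: iterable of (source_entry, target_entry) pairs, meaning
    source_entry has a replyLink to target_entry (assumed layout).
    """
    # p_x counts distinct entries that e_x links to, so duplicate pairs are dropped.
    p = Counter(source for source, _target in set(reply_links))
    return {entry for entry, count in p.items() if count > theta_4}


thread = [("s", "a"), ("s", "b"), ("s", "c"), ("b", "a")]
found = summarizers(thread, theta_4=2)
```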

15 Applications Pros and Cons Conclusions

16 Applications
- Supplementary information, e.g. for TV, news sites, etc.
- Agitator: Home and Away, who shot Josh West
- Summariser: sports, etc., used by studios and media to highlight points of interest in a match

17 Analysis – Pros
- Basis for future research: a brief introduction to the subject
- Multiple thread analysis
- Identification of areas of bloggers' expertise
- Highly effective in certain specific areas
  - News and reviews
- Implementation of theory (feature vector)

18 Analysis – Cons
- Only 25 sites used in the sample (but 1000s of blogs)
- Does not take context into consideration
  - E.g., an agitator may be posting offensive entries
- No measurement of summary success
- Comments are not analysed
- Inappropriate for certain areas
  - MySpace, Bebo, et al. (due to target audience)

19 Conclusions
- Created a data-mining framework for future research
- May instigate research into further work
- Nice idea and potentially useful, but needs to be extended

20 Thank you for your time

