Topic cluster of Streaming Tweets based on GPU-Accelerated Self Organizing Map. Group 15: Chen Zhutian, Huang Hengguang.

1 Topic cluster of Streaming Tweets based on GPU-Accelerated Self Organizing Map. Group 15: Chen Zhutian, Huang Hengguang

2

3

4 Unsupervised clustering algorithm. Organizes large document collections according to textual similarity. Produces a visual result for searching and exploring large document collections.

5 WEBSOM system. Based on the Self-Organizing Map. Generates a topic map for documents. Lets users explore large document collections much like browsing Google Maps.

6 What does WEBSOM look like?

7 Gap. WEBSOM: long documents, static, long training time. Twitter: short text, dynamic, streaming data. How to adapt SOM to streaming Twitter data?

8 What our system looks like

9

10

11 Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map. Current step: Detect Event

12 We focus only on unusual events. How can abnormal events be identified on Twitter?

13 Detect Event. 1. Similar to TCP’s congestion control mechanism. 2. Count the number of tweets in a moving window. 3. Maintain a weighted moving average and variance. 4. Apply a threshold to decide whether the window is an event.
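The four steps above can be sketched as follows. This is a minimal illustration, not the presenters' code: it borrows the smoothed mean and mean-deviation update that TCP uses for its timeout estimate, and the class name `BurstDetector`, the weights `alpha` and `beta`, and the threshold factor `k` are all assumed values chosen for the example.

```python
class BurstDetector:
    """Flags a window whose tweet count is far above the smoothed mean.

    alpha, beta and k are illustrative assumptions, not values from the slides.
    """

    def __init__(self, alpha=0.125, beta=0.25, k=4.0):
        self.alpha = alpha  # weight of the newest count in the moving average
        self.beta = beta    # weight of the newest deviation
        self.k = k          # number of deviations above the mean that counts as an event
        self.mean = None
        self.dev = 0.0

    def update(self, count):
        """Feed the tweet count of one window; return True if it looks like an event."""
        if self.mean is None:
            # First window: initialise the estimator, nothing to compare against yet.
            self.mean = float(count)
            self.dev = count / 2.0
            return False
        is_event = count > self.mean + self.k * self.dev
        # Update the deviation before the mean, using the old mean.
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(count - self.mean)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * count
        return is_event


def detect_events(counts, **params):
    """Return the indices of the windows flagged as events."""
    detector = BurstDetector(**params)
    return [i for i, c in enumerate(counts) if detector.update(c)]
```

Calling `detect_events` on a list of per-window tweet counts returns the indices of the windows that exceed the adaptive threshold. A larger `k` makes the detector more conservative.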

14 Test Data

15 Detect Event

Time of peak   What happened?
4:11           First goal!
4:25           Goal! x3 in 3 minutes
4:30           Goal!
5:07           Second half begins
5:25           Goal!
5:35           Goal!
5:46           Goal!
5:50           End!

16 Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map. Current step: Build Dictionary

17 Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map. Current step: Build Dictionary

18 Build Dictionary. 1. Remove stop words. 2. Stemming (Snowball). 3. Remove words whose occurrence is less than 10%. 4. Remove words whose occurrence is greater than 50%.
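A rough sketch of the dictionary-building steps, under stated assumptions: the percentages are read here as the fraction of tweets containing a term, the stop-word list is a tiny illustrative one, and `naive_stem` is a hypothetical stand-in for a real Snowball stemmer, which is not part of the Python standard library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a full one.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "for", "on"}


def naive_stem(word):
    """Placeholder for a Snowball stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(tweet):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def build_dictionary(tweets, min_df=0.10, max_df=0.50):
    """Map each kept term to a column index.

    A term is kept when the fraction of tweets containing it lies in
    [min_df, max_df] (assumed reading of the 10% / 50% cut-offs).
    """
    doc_freq = Counter()
    for tweet in tweets:
        doc_freq.update(set(tokenize(tweet)))
    n = len(tweets)
    kept = sorted(t for t, df in doc_freq.items() if min_df <= df / n <= max_df)
    return {term: index for index, term in enumerate(kept)}
```

Dropping very rare terms removes typos and one-off words, while dropping very frequent terms removes words that carry little topical information for the event being clustered.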

19 Vectorize Tweets. 1. Vector space model. 2. TF-IDF. 3. Normalization.
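The vectorization step might look like the sketch below. It assumes the plain `tf * log(N / df)` weighting and L2 normalization. The slides do not say which TF-IDF variant or norm was used, and the function name `tfidf_vectors` is made up for this example.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists, dictionary):
    """Turn tokenized tweets into L2-normalized TF-IDF vectors.

    token_lists: one list of tokens per tweet.
    dictionary:  term -> column index.
    """
    n_docs = len(token_lists)
    # Document frequency: number of tweets containing each dictionary term.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(t for t in set(tokens) if t in dictionary)

    vectors = []
    for tokens in token_lists:
        counts = Counter(t for t in tokens if t in dictionary)
        vec = [0.0] * len(dictionary)
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            vec[dictionary[term]] = tf * idf
        # Normalize to unit length so tweet length does not dominate.
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            vec = [x / norm for x in vec]
        vectors.append(vec)
    return vectors
```

With this weighting a term that appears in every tweet gets weight zero, and a tweet with no dictionary terms stays an all-zero vector.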

20

21 Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map. Current step: Reduce Dimension

22 Reduce Dimension. Random projection: 1. No training. 2. Matrix operation. Based on the Johnson-Lindenstrauss lemma.
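A minimal sketch of random projection, assuming a dense Gaussian projection matrix scaled by `1 / sqrt(out_dim)`. The slides do not state which matrix distribution was used, and the function names and dimensions here are illustrative only.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of scaled Gaussian entries.

    No training is involved: the matrix is drawn once and reused.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(vector, matrix):
    """Multiply the matrix by the vector (one dot product per output row)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]
```

Because the projection is a single matrix multiplication, it is cheap to apply to each incoming tweet vector. The Johnson-Lindenstrauss lemma is what justifies expecting pairwise distances to be roughly preserved, at the cost of some distortion.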

23 Pipeline: Detect Event -> Build Dictionary -> Vectorize Tweets -> Reduce Dimension -> SOM Cluster -> Show the SOM map. Current step: SOM Cluster

24 SOM Cluster. What is SOM? Self-Organizing Map.
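For readers unfamiliar with the technique, the sketch below shows a textbook online SOM on a rectangular grid in plain Python. It is not the GPU implementation from this presentation. The grid size, the linear decay schedules, and the names `train_som` and `best_matching_unit` are assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the node toward x, more strongly near the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, each input is assigned to its best matching unit, so similar inputs land on the same or neighboring grid cells. That grid is what gets drawn as the topic map.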

25

26

27 Test Data: http://web.ist.utl.pt/acardoso/datasets/

28 20 Newsgroup Test

Method         Random Projection   Macro Accuracy (%)   Micro Accuracy (%)
Renato’s SOM   NO                  68                   67
Our Method     YES                 60                   61

Conclusion: Random projection loses precision, so performance decreases after dimension reduction.

29 20 Newsgroup Test

Method                                Random Projection   Macro Accuracy (%)   Micro Accuracy (%)
Renato’s SOM                          NO                  68                   67
Our Method                            YES                 60                   61
Renato’s SOM (Matlab reproduction)    NO                  63                   62
Renato’s SOM (Matlab reproduction)    YES                 61                   60

30 FIFA Data

31

32

33

34 Conclusion

35 Thanks for Watching Q & A

