Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel


1 Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel
- Truc-Vien T. Nguyen, Lugano University, Switzerland
- Minh-Son Dao, University of Information Technology, Vietnam
- Riccardo Mattivi, Trento University, Italy
- Francesco G.B. De Natale, Trento University, Italy
SEWM – ICMR 2014, Glasgow, UK, April 2014

2 Outline
- Social Event and Web-media
- User-centric Parallel Split-n-merge for Events Clustering
- Composite Kernel for Event Classification
- Ongoing work
- Conclusion

3
- Tsunami
- Miyagi, Japan
- Mar 11, 2011

4 Observations
- Time-Location: a user cannot attend two events at the same time if their locations are far apart.
- Theme: users in the same community tend to tag the same event with similar words.
- Users tend to take a series of images within a short time interval of whatever holds their attention.
- Images related to an event of a given type share common visual features that are characteristic of that event type.
- Spatio-Temporal-Theme

5 User-centric Parallel Split-n-merge
1. Web media collection A crawled from social networks
2. Convert A to UT-image
3. Split each row of UT-image into clusters {b_i}
4. Merge {b_i} using {location, time, theme}
5. Merge {b_i} using {location, time, theme} and common-sense
6. Merge {b_i} using visual information

6 UT-Image
- Metadata fields: photo_url, username, dateTaken, title, description, tags, locations
- Rows: users; columns: time
- Each row is sorted by time. Pixels in the same row that have no time information are grouped and placed at the beginning of the row.
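The row layout of the UT-image can be sketched as follows. This is only an illustration: it assumes each pixel is one media item stored as a dictionary, that `dateTaken` is a numeric timestamp or `None`, and that `build_ut_image` is a hypothetical helper name; only the field names `photo_url`, `username`, and `dateTaken` come from the slide.

```python
from collections import defaultdict

def build_ut_image(items):
    """Group media items into one row per user and sort each row by time.

    Items without a timestamp (dateTaken is None) are placed at the
    beginning of their row, as the slide describes.
    """
    rows = defaultdict(list)
    for item in items:
        rows[item["username"]].append(item)
    for row in rows.values():
        # False sorts before True, so untimed items come first;
        # timed items follow in ascending time order.
        row.sort(key=lambda it: (it["dateTaken"] is not None, it["dateTaken"] or 0))
    return dict(rows)

# Hypothetical toy collection
items = [
    {"photo_url": "p1", "username": "ann", "dateTaken": 30},
    {"photo_url": "p2", "username": "bob", "dateTaken": None},
    {"photo_url": "p3", "username": "ann", "dateTaken": None},
    {"photo_url": "p4", "username": "ann", "dateTaken": 10},
    {"photo_url": "p5", "username": "bob", "dateTaken": 5},
]
ut_image = build_ut_image(items)
ann_row = [it["photo_url"] for it in ut_image["ann"]]
bob_row = [it["photo_url"] for it in ut_image["bob"]]
```

Because every row depends only on one user's items, rows could be processed independently, which is one plausible reading of "parallel" in the method's name.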

7 Split by TIME
- If there is no time information, each pixel is treated as one cluster
- If there is time information

8 Merge by spatio-time-theme
For each selected cluster b_k, create:
- the time-taken boundary T_k
- the location union L_k
- the document (tags, title, description) D_k
For any pair of clusters (b_k, b_l), merge if at least two of the following three conditions hold:
- Tdistance(T_k, T_l) ≤ α
- Ldistance(L_k, L_l) ≤ β
- JaccardIndex(D_k, D_l) ≥ γ
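The two-out-of-three merge test could look like the sketch below. Several details are assumptions, since the slide does not define them: Tdistance is taken as the gap between two time intervals (in hours), Ldistance as the smallest great-circle distance between any two points of the location unions (in km), and the `Cluster` class and function names are hypothetical. The default thresholds are the first-run values reported in the results.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Cluster:
    t_start: float   # earliest time taken, in hours (assumed unit)
    t_end: float     # latest time taken, in hours
    locations: list  # location union: list of (lat, lon) pairs
    words: set       # document: words from tags, title, description

def t_distance(a, b):
    # Gap between the two time intervals; 0 when they overlap.
    return max(0.0, max(a.t_start, b.t_start) - min(a.t_end, b.t_end))

def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def l_distance(a, b):
    # Smallest distance between any pair of points; infinite if a side has none.
    if not a.locations or not b.locations:
        return float("inf")
    return min(haversine_km(p, q) for p in a.locations for q in b.locations)

def jaccard(s, t):
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def should_merge(a, b, alpha=24.0, beta=5.0, gamma=0.2):
    votes = [
        t_distance(a, b) <= alpha,
        l_distance(a, b) <= beta,
        jaccard(a.words, b.words) >= gamma,
    ]
    return sum(votes) >= 2  # at least two of the three conditions

a = Cluster(0, 2, [(55.8642, -4.2518)], {"concert", "glasgow", "music"})
b = Cluster(3, 4, [(55.8650, -4.2500)], {"concert", "glasgow", "live"})
c = Cluster(100, 101, [(46.0, 8.95)], {"concert", "glasgow"})
d = Cluster(1, 2, [(46.0, 8.95)], {"concert", "music"})
```

Here `a` and `b` satisfy all three conditions, `a` and `d` satisfy time and theme but not location, and `a` and `c` satisfy theme only.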

9 Merge by common-sense
- Compute tf-idf on D_k and select the most common keywords to create ND_k
- For any pair of clusters (b_k, b_l), merge if JaccardIndex(ND_k, ND_l) ≥ γ
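A minimal sketch of this step, under stated assumptions: each D_k is a list of tokens, the tf-idf weighting uses a smoothed idf, log((1 + N) / (1 + df)) + 1, and ND_k keeps the `n` highest-scoring words. The slide does not specify the weighting variant or how many keywords are kept, so both choices, and the function names, are hypothetical.

```python
from collections import Counter
from math import log

def top_keywords(docs, n=2):
    """Return, for each token list in docs, the set of its n top tf-idf words."""
    num_docs = len(docs)
    # Document frequency: in how many documents each word appears
    df = Counter(word for doc in docs for word in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            w: (count / len(doc)) * (log((1 + num_docs) / (1 + df[w])) + 1)
            for w, count in tf.items()
        }
        # Sort by descending score; ties are broken alphabetically
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keyword_sets.append(set(ranked[:n]))
    return keyword_sets

def jaccard(s, t):
    union = s | t
    return len(s & t) / len(union) if union else 0.0

# Hypothetical cluster documents D_0, D_1, D_2
docs = [
    ["concert", "concert", "glasgow", "rock"],
    ["concert", "glasgow", "stage", "rock"],
    ["protest", "march", "city", "march"],
]
nd = top_keywords(docs, n=2)
gamma = 0.2
merge_0_1 = jaccard(nd[0], nd[1]) >= gamma
merge_0_2 = jaccard(nd[0], nd[2]) >= gamma
```

With these toy documents, the first two clusters share the keyword "concert" and pass the threshold, while the third shares nothing with the first.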

10 Merge with visual features
For any pair of clusters (b_k, b_l), merge if JaccardIndex(BoW_k, BoW_l) ≥ θ

11 Results – Events clustering
MediaEval 2013 dataset and participants

12 Results – Events clustering
- First run (split, merge by spatio-time-theme): α = 24 hours, β = 5 km, γ = 0.2
- Second run (as the first): α = 8 hours, β = 2 km, γ = 0.2
- Third run: as the first, plus common-sense merging
- Last run: as the third, plus visual features, θ = 0.3

13 Classification Problems
- Supervised learning: learn a function f: X → Y from examples
- Binary classification: Y = {-1, +1}
- Multi-class classification: Y = {1, 2, …, k}
- Event classification: each member of X has a set of features

14 SVM – Multiclass Classification
- Support Vector Machines (SVMs)
  - Binary classification
  - Computing a function (kernel) between each pair of samples
  - One vs. Rest
- Multi-class classification
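The One-vs-Rest reduction from multi-class to binary classification can be sketched as below. To stay dependency-free, the binary learner here is a kernel perceptron, not an SVM; it is swapped in only because it also works purely through kernel evaluations between pairs of samples. All function names, the toy data, and the kernel are assumptions for illustration.

```python
def kernel(x, y):
    # Linear kernel with a constant term (acts as a bias)
    return sum(a * b for a, b in zip(x, y)) + 1.0

def train_binary(X, y, epochs=20):
    """Kernel perceptron: alpha[i] counts the mistakes made on sample i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, xi in enumerate(X):
            score = sum(alpha[j] * y[j] * kernel(X[j], xi) for j in range(len(X)))
            if y[i] * score <= 0:  # misclassified (or undecided): update
                alpha[i] += 1
    return alpha

def decision(alpha, X, y, x):
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))

def train_one_vs_rest(X, labels):
    """Train one binary model per class: that class (+1) against all others (-1)."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = (train_binary(X, y), y)
    return models

def predict(models, X, x):
    # Choose the class whose binary model gives the highest score
    return max(models, key=lambda c: decision(models[c][0], X, models[c][1], x))

# Hypothetical 2-D feature vectors, one training sample per class
X = [(2.0, 0.0), (0.0, 2.0), (-2.0, -2.0)]
labels = ["Concert", "Sports", "Protest"]
models = train_one_vs_rest(X, labels)
predictions = [predict(models, X, x) for x in [(3.0, 0.0), (0.0, 3.0), (-3.0, -3.0)]]
```

Replacing `train_binary` and `decision` with an SVM trained on the same kernel would leave the One-vs-Rest wrapper unchanged.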

15 Event Categories
Class  Event Type
0      Conference
1      Fashion
2      Concert
3      Non_event
4      Sports
5      Protest
6      Other
7      Exhibition
8      Theater_dance

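16 Composite Kernel
$CK(E_1, E_2) = \alpha \cdot K_T(E_1, E_2) + (1 - \alpha) \cdot K_V(E_1, E_2)$
where K_T is the kernel on text features, K_V is the kernel on visual features, and α is the combination coefficient (not the time threshold α of the clustering step).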
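As an illustration, assume the composite kernel is a convex combination of a text kernel and a visual kernel, weighted by a single coefficient (called `alpha` here). The two toy kernels below, shared-token count for text and a dot product for visual features, are stand-ins chosen for brevity and are not the kernels used in the presentation; the event representation as a dictionary is also hypothetical.

```python
def text_kernel(tokens1, tokens2):
    # Number of shared tokens (dot product of binary indicator vectors)
    return float(len(tokens1 & tokens2))

def visual_kernel(v1, v2):
    # Plain dot product between two visual feature vectors
    return sum(a * b for a, b in zip(v1, v2))

def composite_kernel(k_text, k_visual, alpha):
    """Build CK(e1, e2) = alpha * K_T + (1 - alpha) * K_V."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    def ck(e1, e2):
        return (alpha * k_text(e1["text"], e2["text"])
                + (1.0 - alpha) * k_visual(e1["visual"], e2["visual"]))

    return ck

# Hypothetical events with a token set and a small visual feature vector
e1 = {"text": {"concert", "stage"}, "visual": [0.5, 0.5, 0.0]}
e2 = {"text": {"concert", "crowd"}, "visual": [0.25, 0.25, 0.5]}

ck_balanced = composite_kernel(text_kernel, visual_kernel, alpha=0.5)(e1, e2)
ck_text_only = composite_kernel(text_kernel, visual_kernel, alpha=1.0)(e1, e2)
ck_visual_only = composite_kernel(text_kernel, visual_kernel, alpha=0.0)(e1, e2)
```

Setting `alpha` to 1 or 0 recovers the text-only or visual-only kernel, which makes it easy to compare the combination against each modality alone.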

17 Text Features
- NLP basic features: the word, its lower-case form, four prefixes, four suffixes, orthographic feature, word-form feature
- Ontological features: obtained by matching w_i with a knowledge base, e.g. “Washington” -> City
- Encyclopedic features: obtained by associating w_i with Wikipedia, e.g. “Washington” -> http://en.wikipedia.org/wiki/Washington,_D.C.

18 An excerpt from the ontology

19 Visual Features
- Dense RGB-SIFT
- SVM with histogram intersection kernel
- The SVMs were trained with the images given in the SED training set
- Codebook for the bag of words with 4096 visual words
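The bag-of-visual-words representation and the histogram intersection kernel can be sketched on toy data as follows. The sketch assumes descriptors are assigned to their nearest codeword by Euclidean distance and that histograms are L1-normalised; it uses a 2-word codebook of 2-D points in place of the 4096-word codebook over dense RGB-SIFT descriptors, and the function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    # Index of the codeword with the smallest squared Euclidean distance
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )

def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def histogram_intersection(h1, h2):
    # K(h1, h2) = sum over bins of min(h1[i], h2[i])
    return sum(min(a, b) for a, b in zip(h1, h2))

codebook = [(0.0, 0.0), (10.0, 10.0)]            # toy codebook with 2 visual words
image1 = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]    # toy local descriptors
image2 = [(10.0, 9.0), (8.0, 8.0), (1.0, 0.0)]

h1 = bow_histogram(image1, codebook)
h2 = bow_histogram(image2, codebook)
similarity = histogram_intersection(h1, h2)
```

For normalised histograms the kernel value lies between 0 and 1, reaching 1 only when the two histograms are identical.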

20 Results – Events classification
- Run with test-set
- Cross-validation on the training set

21 Ongoing work
Diagram labels: Web media, Events clustering, Events classification, Training data, Topic modeling (applied to the set of documents D_k), name clusters, classifiers, events
- Set of instances of events
- Ability to annotate events automatically
- Extend to automatic image annotation
- Improve events clustering quality

22 Conclusion
1. Event clustering
- Simple and easy to develop
- Can be developed to run in parallel mode
- Need to find a way to adjust parameters automatically
2. Event classification
- The composite kernel combines both text and visual features
- The combination has proved its robustness with a significant improvement in performance (from 45.83% to 53.58% with basic features, and from 47.61% to 54.86% with our new features)
- Encyclopedic knowledge, such as Wikipedia, could provide a great additional resource

23 Thanks for your attention
Q & A

24 Features
- w_i is the text of the title, description, or tag in each event
- l_i is the word w_i in lower case
- p1_i, p2_i, p3_i, p4_i are the four prefixes of w_i
- s1_i, s2_i, s3_i, s4_i are the four suffixes of w_i
- f_i is the part of speech of w_i
- g_i is the orthographic feature that tests whether a word is all upper-case, has an upper-case initial letter, or is all lower-case
- k_i is the word-form feature that tests whether a token is a word, a number, a symbol, or a punctuation mark
- o_i is the ontological feature. We used an ontology and knowledge base that contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, w_i is matched to the deepest subsumed child class.
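The basic features listed above might be extracted as in the sketch below. It assumes the four prefixes and suffixes are the leading and trailing substrings of lengths 1 to 4, and it uses a small hand-picked punctuation set; the part-of-speech and ontological features are left out because they need an external tagger and knowledge base. The function names and category labels are hypothetical.

```python
PUNCTUATION = set(".,;:!?\"'()-")

def orthographic(word):
    """g_i: all upper-case, initial upper-case, all lower-case, or other."""
    if word.isupper():
        return "all_upper"
    if word[:1].isupper():
        return "init_upper"
    if word.islower():
        return "all_lower"
    return "other"

def word_form(token):
    """k_i: word, number, punctuation mark, or symbol."""
    if token.isalpha():
        return "word"
    try:
        float(token)
        return "number"
    except ValueError:
        pass
    if token and all(ch in PUNCTUATION for ch in token):
        return "punctuation"
    return "symbol"

def basic_features(w):
    features = {"w": w, "l": w.lower()}
    for n in range(1, 5):
        features[f"p{n}"] = w[:n]    # prefix of length n (assumed definition)
        features[f"s{n}"] = w[-n:]   # suffix of length n (assumed definition)
    features["g"] = orthographic(w)
    features["k"] = word_form(w)
    return features

f = basic_features("Washington")
```

For the token "Washington" this yields the lower-case form "washington", prefixes "W" to "Wash", suffixes "n" to "gton", an initial-upper-case orthographic feature, and the word form "word".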

