EARLY DETECTION OF TWITTER TRENDS
MILAN STANOJEVIC
UNIVERSITY OF BELGRADE, SCHOOL OF ELECTRICAL ENGINEERING


CONTENTS
• Introduction
• Trending topics
• Parametric model
• Data-driven approach
• Experiment results
• Conclusion
2/22 Milan Stanojevic

INTRODUCTION
• Events occur in large datasets
• We need:
  • detection
  • classification
  • prediction
• Parametric models are popular but overly simplistic
• Nonparametric approach is proposed for time series inference
• Observed signal is compared to two sets of reference signals: positive and negative examples
Is there enough information for earlier prediction? (spoiler alert: YES)
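
The comparison of an observed signal against positive and negative reference signals can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as given in the slides: the distance (smallest squared Euclidean distance over same-length windows of a reference), the exponential weighting, and the parameters `gamma` and `theta` are all hypothetical choices made for the example.

```python
import math


def distance(observed, reference):
    """Smallest squared Euclidean distance between `observed` and any
    same-length window of `reference` (an assumed distance measure)."""
    n = len(observed)
    if len(reference) < n:
        raise ValueError("reference must be at least as long as observed")
    best = math.inf
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        d = sum((a - b) ** 2 for a, b in zip(observed, window))
        best = min(best, d)
    return best


def votes(observed, positives, negatives, gamma=1.0):
    """Each reference signal casts a vote that decays with its distance
    to the observed signal. Returns (positive_votes, negative_votes)."""
    pos = sum(math.exp(-gamma * distance(observed, r)) for r in positives)
    neg = sum(math.exp(-gamma * distance(observed, r)) for r in negatives)
    return pos, neg


def is_trending(observed, positives, negatives, gamma=1.0, theta=1.0):
    """Flag a trend when positive votes exceed theta times negative votes.
    Written without division so that vanishing votes cannot raise."""
    pos, neg = votes(observed, positives, negatives, gamma)
    return pos > theta * neg
```

Raising `theta` makes the rule more conservative (fewer detections), while `gamma` controls how quickly distant reference signals stop counting.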

TRENDING TOPICS
• Twitter: a global communication network
• Tweet: a short, public message
• Topic: a phrase in a tweet
• Trending topic (trend): a topic that becomes popular

PARAMETRIC MODEL
• Expect a certain type of pattern
  • usually constant + jumps
• Fit parameters to the data
  • e.g. size of a jump
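
As an illustration of the parametric idea, the sketch below fits a "constant + one jump" model to a signal by least squares. The function name `fit_step` and the restriction to a single jump are assumptions made for the example; the slides do not specify a particular fitting procedure.

```python
def fit_step(signal):
    """Fit a constant-plus-one-jump model by least squares.

    Tries every possible jump position, models each side by its mean,
    and keeps the position with the smallest squared error.
    Returns (jump_index, baseline, jump_size).
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    best = None
    for k in range(1, len(signal)):
        before, after = signal[:k], signal[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        sse = (sum((x - mean_before) ** 2 for x in before)
               + sum((x - mean_after) ** 2 for x in after))
        if best is None or sse < best[0]:
            best = (sse, k, mean_before, mean_after - mean_before)
    _, jump_index, baseline, jump_size = best
    return jump_index, baseline, jump_size
```

A model like this has only a few parameters (position, baseline, jump size), which is what makes it simple to fit and also what limits the shapes it can describe.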

DATA-DRIVEN APPROACH
• All the information needed is in the data
• Assumptions:
  • tweets are written by people
  • people are simple:
    • in how they spread information
    • in how they connect to each other
  • there is a small number of distinct ways in which a topic becomes trending

DATA-DRIVEN APPROACH
(slides 7-8: no text content in the transcript)

CLASSIFICATION BY EXPERTS
(slides 9-17: no text content in the transcript)

CLASSIFICATION BY EXPERTS
Properties
• simple: computation of distances
• scalable: computation is easily parallelized
• nonparametric: model “parameters” scale along with the data
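
The "simple" and "scalable" properties can be illustrated with a short sketch: the distance from the observed signal to each reference signal is independent of the others, so the computations can be mapped over a pool of workers. The function names, the equal-length squared Euclidean distance, and the use of a thread pool are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def all_distances(observed, references, workers=4):
    """Compute the distance to every reference signal independently.

    Each distance depends only on one reference, so the work splits
    cleanly across workers. Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: squared_distance(observed, r),
                             references))
```

A thread pool is used here only to show the structure. In standard CPython, threads give little speedup for CPU-bound work like this, so a process pool or a vectorized numerical library would be the more likely choice in practice.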

EXPERIMENT SETUP
• Dataset:
  • 500 trends
  • 500 non-trends
• Trend detection on a 50% holdout set of topics
• Online signal classification

RESULTS
• Early detection
  • 79% rate of early detection, 1.43 hours earlier on average
• Low rate of error
  • 95% true positive rate, 4% false positive rate
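
For reference, the true positive rate and false positive rate reported above are computed from the holdout labels and the classifier's decisions as sketched below. The function name `rates` and the toy inputs in the usage note are hypothetical; they are not the experiment's data.

```python
def rates(labels, predictions):
    """Return (true_positive_rate, false_positive_rate).

    labels:      True for topics that really trended, False otherwise
    predictions: True where the classifier flagged a trend
    Assumes at least one positive and one negative label.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # trends caught
    fn = sum(1 for y, p in pairs if y and not p)      # trends missed
    fp = sum(1 for y, p in pairs if not y and p)      # false alarms
    tn = sum(1 for y, p in pairs if not y and not p)  # correct rejections
    return tp / (tp + fn), fp / (fp + tn)
```

For example, with four true trends of which three are flagged, and four non-trends of which one is flagged, `rates` returns `(0.75, 0.25)`.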

EXPERIMENT
[Figures: FPR / TPR tradeoff; early / late tradeoff]
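
A tradeoff between false positive rate and true positive rate of this kind typically arises from sweeping a decision threshold. The sketch below assumes each holdout topic has a numeric score and is flagged as a trend when the score exceeds a threshold `theta`; the scoring scheme and the names used are assumptions for illustration only.

```python
def tradeoff(scores, labels, thetas):
    """For each threshold, flag a trend when score > theta and report
    (theta, true_positive_rate, false_positive_rate).

    Assumes labels contain at least one True and one False.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for theta in thetas:
        flagged = [s > theta for s in scores]
        tp = sum(1 for y, f in zip(labels, flagged) if y and f)
        fp = sum(1 for y, f in zip(labels, flagged) if not y and f)
        points.append((theta, tp / n_pos, fp / n_neg))
    return points
```

Lower thresholds catch more trends at the cost of more false alarms; higher thresholds do the reverse.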

CONCLUSION
• New approach to detecting Twitter trends
• Generalized time series analysis method:
  • Classification
  • Prediction
  • Anomaly detection
• Possible applications:
  • Movie ticket sales
  • Stock prices
  • etc.

BIBLIOGRAPHY
Stanislav Nikolov. Trend or No Trend: A Novel Nonparametric Method for Classifying Time Series. Master's thesis, Massachusetts Institute of Technology (2011).