Online Discovery of Group Level Events in Time Series Xi C. Chen Computer Science & Engineering University of Minnesota Vijay K Narayanan.

Presentation transcript:

Online Discovery of Group Level Events in Time Series

Xi C. Chen, Computer Science & Engineering, University of Minnesota
Vijay K Narayanan, Cloud and Information Services Lab, Microsoft Corporation

Changes in time series

Traditional changes in time series
• Assume that the auto-correlation of a time series is preserved when no change happens.
• Most algorithms detect breaking points where predefined statistics (e.g., mean and variance) change.
• CUSUM considers the mean of an unchanged time series to be stable, and hence it searches for breaking points where the mean shifts.
• BIFAST assumes that values in a time series are periodically generated from a certain model. Therefore, it detects a change point when the coefficients of the model change.

Contextual changes in time series
• Events that change cross-correlation.
• There are two types of contextual changes:
i. Singleton contextual changes
ii. Group level contextual changes

i. Singleton contextual event [1]
[Figure panels: Unchanged, Changed]

ii. Group level contextual event
a. Group Disbanding (top panel): detecting the time when one group of time series disbands into two or more subgroups.
b. Group Formation (bottom panel): detecting the time when two or more groups of time series merge into a single larger group.

Proposed framework
As new observations are collected, the method performs three steps:
i. AutoDBSCAN to group similar time series.
ii. Similarity-aware entropy to score group formation/disbanding events.
iii. Threshold-based method to report group formation/disbanding detection results.

Authors (continued)
Abdullah Mueen, Computer Science & Engineering, University of Minnesota
Vipin Kumar, Computer Science & Engineering, University of Minnesota
Nikos Karampatziakis, Cloud and Information Services Lab, Microsoft Corporation
Gagan Bansal, Cloud and Information Services Lab, Microsoft Corporation

Acknowledgement: Part of this work was done when the first author was an intern in the Cloud & Information Services Lab at Microsoft. It was supported in part by the National Science Foundation under Grants IIS and IIS, as well as the Planetary Skin Institute. Access to computing facilities was provided by the University of Minnesota Supercomputing Institute.

Real world events in time series
Real world events can be observed in time series.
[Figure panel: Forest Fire; states labeled Unchanged and Changed]

Changes in time series (Continued): group level contextual events
Figure: A set of EVI time series which disbands in August 2009 because of a forest fire. Such a disbanding pattern is useful for detecting events in time series datasets. Red and green points show the time series of points inside (marked as the red locations) and outside (marked as the green locations) the fire-affected region, respectively.

Experimental results

Vegetation index dataset
• A region in California that is bounded by 36.5°N, 35.9°N, °W and 122°W and contains 3345 grid cells.
• We ran our algorithm with a window of two years (46 time steps) on this dataset to find historical contextual changes.
• Twenty-five group disbanding events were detected for the period Aug May.

Fig. (a) shows all the locations that belong to one of the disbanding events around the area shown. Fig. (b)-(e) show four of these group disbanding events.
In each of these events, EVI time series that show similar patterns over two years are grouped together and then split into roughly two groups around Aug 2009, when the patch bounded by the red line was burned. The algorithm reports that all the events occurred during the time window between the two arrows.

Stock price data
• 5825 time series of the daily closing price of different stock tickers on the NYSE, from April 1996 to July 2013.
• We ran our algorithm with a window of 60 business days on this dataset to find historical contextual changes of stock tickers.

Figure: A disbanding event detected from the stock price data. All tickers in this group are REITs (Real Estate Investment Trusts). Two of the tickers rise significantly after June 2012, while the others remain within the context and show stability for more than six months after the event. The two rising time series are stock tickers from two self-service storage companies (EXR and SSS). The others are real estate companies in different parts of the US, and none of them is in the self-service storage business according to Google Finance. The event started in June 2012, which is the usual time of year for publishing quarterly/annual financial reports.

AutoDBSCAN
[Flowchart; recoverable labels: Same results as, Yes, No, Stop, More efficient]

Similarity-aware entropy
Figure, panels (a) and (b): Both panels show 10 time series that belong to the same cluster before the current observation. Assuming that a clustering method discovers 2 clusters in both, event scores based on entropy are identical in the two scenarios, whereas similarity-aware entropy scores (b) higher than (a). In many scenarios, we believe the event in panel (b) is more significant than the event shown in panel (a).

Scalable DBSCAN
[Plot: running time (milliseconds) versus number of time series; series: Naive, Indexing, Indexing + Iterative distance]
Figure: The running time of the original DBSCAN implementation and the two optimized implementations.
We see a speedup of up to 57× over the original implementation, which enabled us to search over a larger land area as well as a larger correlation window in the experiments using EVI data.
• Indexing technique
i. Order the time series dataset based on a reference time series.
ii. Reduce the search space based on the properties of the ordered dataset.
• Iterative technique
i. Reduce the time of distance calculation by using the results from the previous time step.

Reference
[1] Xi C. Chen, Karsten Steinhaeuser, Shyam Boriah, Snigdhansu Chatterjee, and Vipin Kumar: Contextual Time Series Change Detection. SDM 2013.

Figure annotations (real world events in time series)
• EVI (the shown time series) is an indicator of the "greenness" of the earth's surface. When a fire occurs, EVI drops significantly due to the drastic change in the greenness of the area.
• The Dow fell 22.61% on Black Monday (1987), from about the 2,500 level to around 1,750. Two days later, it rose 10.15% above the 2,000 level for a mild recovery attempt.
• Significant drop during the Great Depression.
• The date of the forest fire.

[Figure labels: panels (a)-(e); stock price plot marked Jun-2012 with tickers AVB, BRE, CLP, CPT, EQY, ESS, EXR, MAC, SSS, TCO, HIW, KRC; x-axis: Days]
[Framework flowchart: New data obtained? (Yes/No), AutoDBSCAN, Calculate similarity-aware entropy, Report events]
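The score-and-report part of the framework in this transcript can be illustrated for the disbanding case with a minimal sketch. It assumes plain Shannon entropy over how the members of one former group are spread across the current clusters; the transcript does not give the formula for the similarity-aware variant, so that variant is not implemented here. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter

def disbanding_entropy(current_labels):
    """Shannon entropy (bits) of how one former group is split across current clusters."""
    counts = Counter(current_labels)
    n = len(current_labels)
    # p * log2(1/p) summed over the clusters that received members
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def report_disbanding(current_labels, threshold=0.5):
    """Threshold-based reporting; the threshold is an illustrative assumption."""
    return disbanding_entropy(current_labels) >= threshold

# Ten series that formed one group before the current observation.
intact = ["A"] * 10                    # still one cluster
split_9_1 = ["A"] * 9 + ["B"]          # one series leaves
split_5_5 = ["A"] * 5 + ["B"] * 5      # group splits in half

print(disbanding_entropy(intact))               # 0.0
print(round(disbanding_entropy(split_9_1), 3))  # 0.469
print(disbanding_entropy(split_5_5))            # 1.0
print(report_disbanding(split_5_5))             # True
```

Because this score depends only on cluster labels, two splits with the same membership counts receive the same value no matter how far apart the subgroups move, which is the limitation the similarity-aware entropy panels point out.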
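The indexing technique (order the dataset by a reference time series, then shrink the search space) could look roughly like the sketch below. The transcript does not say which property of the ordered dataset is used; this sketch assumes Euclidean distance and the triangle inequality, under which any neighbor within eps of a query must have a distance to the reference within eps of the query's own. The names `build_index` and `neighbors` are hypothetical.

```python
import bisect
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(series, reference):
    """Sort series ids by their distance to the reference series."""
    keyed = sorted((euclidean(s, reference), i) for i, s in enumerate(series))
    ref_dists = [d for d, _ in keyed]
    order = [i for _, i in keyed]
    return ref_dists, order

def neighbors(series, ref_dists, order, reference, query_id, eps):
    """Return (ids within eps of the query, number of candidates actually checked)."""
    dq = euclidean(series[query_id], reference)
    # Triangle inequality (assumed pruning rule): |d(y, ref) - d(q, ref)| <= d(q, y)
    lo = bisect.bisect_left(ref_dists, dq - eps)
    hi = bisect.bisect_right(ref_dists, dq + eps)
    candidates = order[lo:hi]
    found = [i for i in candidates if euclidean(series[i], series[query_id]) <= eps]
    return sorted(found), len(candidates)

series = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
reference = [0.0, 0.0]
ref_dists, order = build_index(series, reference)
print(neighbors(series, ref_dists, order, reference, 2, 0.5))  # ([2, 3], 2)
```

Here only 2 of the 5 series are compared against the query; a naive neighbor search, as used inside DBSCAN, would compare all 5.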
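The iterative technique (reuse the results from the previous time step when computing distances) can be sketched for a sliding window as follows. This is an assumption-laden illustration: it uses squared Euclidean distance, whereas the transcript does not state the distance measure, and the function names are hypothetical. When the window advances by one step, the oldest term is subtracted and the newest term is added instead of recomputing the whole sum.

```python
def window_sq_dist(x, y, end, w):
    """Squared Euclidean distance over the window ending at index `end`, from scratch."""
    return sum((x[t] - y[t]) ** 2 for t in range(end - w + 1, end + 1))

def slide_sq_dist(prev, x, y, end, w):
    """Update the distance for the window ending at `end` from the one ending at `end - 1`."""
    dropped = (x[end - w] - y[end - w]) ** 2  # term that left the window
    added = (x[end] - y[end]) ** 2            # term that entered the window
    return prev - dropped + added

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 9]
w = 3

d = window_sq_dist(x, y, w - 1, w)  # first full window, computed once
history = [d]
for end in range(w, len(x)):
    d = slide_sq_dist(d, x, y, end, w)
    history.append(d)

print(history)  # [5, 6, 5, 10]
```

Each update costs constant time per pair of series regardless of the window length, which is one plausible reason a larger correlation window becomes affordable.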