Feature Selection on Time-Series Cab Data


Feature Selection on Time-Series Cab Data Yingkit (Keith) Chow

Contents: Introduction; Features Considered; FCBF (filter-type feature selection); FCBF-PCA (my variation); Conclusion

All Features Considered. Each time sample consists of the following features: Day of Week and Time of Day (the first two features), plus taxis[t, 6:9], taxis[t-1, 6:9], …, taxis[t-5, 6:9]. Here 6:9 indexes four columns of the taxis matrix: cabs entering with meter off, cabs entering with meter on, cabs exiting with meter off, and cabs exiting with meter on. Not all of these features will be relevant to classifying whether a game is present. A sketch of the feature assembly follows below.
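A minimal sketch of how one sample's feature vector might be assembled. The column layout of taxis and the indexing convention are assumptions, not stated on the slide: I read the slide's 6:9 as an inclusive range over four columns, i.e. 6:10 in NumPy terms, and take columns 0-1 to hold day of week and time of day.

```python
import numpy as np

def sample_features(taxis, t):
    """Build one sample's 26-dim feature vector: day of week, time of
    day, and the four cab-count columns at lags t, t-1, ..., t-5.
    Assumes taxis columns 0-1 hold day/time and columns 6:10 hold the
    cab counts (my reading of the slide's 6:9 notation)."""
    lags = [taxis[t - k, 6:10] for k in range(6)]   # lags 0 through 5
    return np.concatenate([taxis[t, 0:2], *lags])
```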

Fast Correlation-Based Filter (FCBF) Algorithm: First, find the features that are relevant to the class, i.e. those with SU(i, C) > threshold, where SU is the symmetric uncertainty described on the next slide. Then remove redundant features by comparing the remaining ones pairwise: feature j is removed if SU(i, j) >= SU(j, C) for some already-selected feature i. See [1] for the pseudocode. Basically, i is a feature in the ideal subset (starting with the single most informative feature); each candidate j is then checked for redundancy against it and for whether it remains informative about class C.

Equations [1]: Information gain: IG(X|Y) = H(X) - H(X|Y). Symmetric uncertainty: SU(X, Y) = 2 * IG(X|Y) / [H(X) + H(Y)]. H(X) is the entropy of feature X, and H(X|Y) is the conditional entropy of X given Y. SU is used instead of IG because it compensates for features having more values and normalizes the score to [0, 1] [1].
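Putting the last two slides together, here is a minimal Python sketch of SU and the FCBF selection loop. This is my reading of the slides, not the reference implementation from [1], and it assumes the features have already been discretized:

```python
import numpy as np
from collections import Counter

def entropy(x):
    """Shannon entropy (in bits) of a discrete sequence."""
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def su(x, y):
    """Symmetric uncertainty: 2 * IG(X|Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    ig = hx + hy - hxy               # IG(X|Y) = H(X) - H(X|Y)
    return 2.0 * ig / (hx + hy) if hx + hy > 0 else 0.0

def fcbf(X, c, threshold=0.01):
    """FCBF on discretized features X (n_samples x n_features) and
    class labels c. Returns indices of the selected subset."""
    # Step 1: relevance -- keep features with SU(f, C) above the
    # threshold, ordered from most to least relevant.
    rel = [(su(X[:, j], c), j) for j in range(X.shape[1])]
    ranked = sorted([(s, j) for s, j in rel if s > threshold], reverse=True)
    # Step 2: redundancy -- drop feature j if an already-selected,
    # more relevant feature i satisfies SU(i, j) >= SU(j, C).
    selected = []
    for s_j, j in ranked:
        if all(su(X[:, i], X[:, j]) < s_j for i in selected):
            selected.append(j)
    return selected
```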

FCBF Classifier (MATLAB classify, linear). Number of bins = 96, threshold = 0.01, accuracy = 91.9%. I was expecting the features immediately before the start of a game to be selected. However, since the task is to classify whenever a game is active, samples from the middle and end of a game matter too, and those rely on data from up to 5 samples back in the decision process.
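The slides do not say how the continuous cab counts were binned or exactly how MATLAB's classify was called. A plausible Python equivalent, using equal-width bins and scikit-learn's linear discriminant in place of classify(..., 'linear'), might look like this (fcbf is the helper sketched above):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def discretize(X, n_bins=96):
    """Equal-width binning per feature (an assumption; the slides do
    not state the binning scheme). FCBF needs discrete inputs."""
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        Xd[:, j] = np.digitize(X[:, j], edges[1:-1])  # bins 0..n_bins-1
    return Xd

# Select features on the discretized training data, then train a
# linear classifier on the selected columns of the original data:
# selected = fcbf(discretize(X_train, n_bins=96), y_train, threshold=0.01)
# clf = LinearDiscriminantAnalysis().fit(X_train[:, selected], y_train)
# print("accuracy:", clf.score(X_test[:, selected], y_test))
```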

Choice of Number of Bins. With 96 bins, the results are shown on the previous slide (red is the game ground truth, blue is my classification). With 20 bins, accuracy drops to 58.6%: the algorithm breaks down and selects only feature 2, "time of day", so the blue curve becomes periodic, with the same time segment classified as a game every day. Throughout this presentation I train on samples 11:10000 of the taxis data and test on samples 25000:32000. Both the blue and red plots are boolean; the scaling is only there to help visualize how the classification is doing.
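For reference, the stated split as NumPy slices, assuming the slide's ranges are 1-indexed and that X and y hold the assembled feature matrix and game labels:

```python
# Slide's 1-indexed ranges converted to 0-indexed slices (an assumption
# about the indexing convention).
train = slice(10, 10000)      # training: samples 11:10000
test = slice(24999, 32000)    # testing:  samples 25000:32000
X_train, y_train = X[train], y[train]
X_test, y_test = X[test], y[test]
```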

FCBF-PCA. FCBF compares individual features with each other. We can instead use PCA to try to capture a group of features: for example, one eigenvector might capture the shape of the surge in cabs entering with meters on before a game, or the increase in cabs entering with meters off towards the end of a game. An example is shown on the next slide.

Cab Traffic Behavior. Before the start of a game, cabs entering with meter on and cabs exiting with meter off are high. Towards the end of a game, cabs entering with meter off and cabs exiting with meter on are high.

FCBF-PCA Classifier (MATLAB classify, linear). Number of bins = 20, threshold = 0.01, accuracy = 92.9%. Note: the features here are projections onto the eigenvectors, not the original feature dimensions. A sketch of the variant follows below.
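A sketch of the FCBF-PCA variant as I understand it from these slides: project the features onto the principal components, then run FCBF on the discretized projections. The component count and the use of scikit-learn's PCA are my assumptions; discretize and fcbf are the helpers sketched earlier.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

pca = PCA(n_components=10).fit(X_train)   # component count assumed
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# FCBF now selects among eigenvector projections, not raw features.
selected = fcbf(discretize(Z_train, n_bins=20), y_train, threshold=0.01)
clf = LinearDiscriminantAnalysis().fit(Z_train[:, selected], y_train)
print("accuracy:", clf.score(Z_test[:, selected], y_test))
```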

Conclusions. The choice of the number of bins has an enormous impact on FCBF's performance (possibly because the time-of-day variable takes 96 discrete values). FCBF-PCA was less susceptible to the choice of numBins: 10, 20, and 100 bins all resulted in approximately 91% accuracy.

Future Work. I'm currently using binary game / not-game labels. I'll try training one classifier to detect the first sample of a game and another to detect the last sample, since mid-game traffic generally has entirely different characteristics from the beginning and end of a game. However, I might be limited by the number of samples.

Questions. I'm not currently in NYC, so please send questions or comments to yingkit.chow@gmail.com.

Citations
[1] Lei Yu and Huan Liu, "Feature Selection for High Dimensional Data: A Fast Correlation-Based Filter Solution," ICML 2003.
[2] Lei Yu and Huan Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," Journal of Machine Learning Research 5 (2004).