Local Discriminative Distance Metrics and Their Real World Applications

Yang Mu, Wei Ding, University of Massachusetts Boston. 2013 IEEE International Conference on Data Mining (ICDM), Dallas, Texas, Dec. 7, PhD Forum

Overview: a large-scale data analysis framework with four components.
Feature extraction: representation, discrimination.
Feature selection: linear time, online algorithm.
Distance learning: structure, pairwise constraints.
Classification: separability, performance.
Related publications: IEEE TKDE (in submission); ICAMPAM 2013 (two papers); IJCNN 2011; KSEM 2011; ACM TIST 2011; IEEE TSMC-B 2011; Neurocomputing 2010; Cognitive Computation 2009; KDD 2013; ICDM 2013; Pattern Recognition 2013; ICDM PhD Forum 2013.

Feature extraction: representation and discrimination

Mars impact crater data. Feature extraction pipeline: the input crater image is filtered into two S1 maps in one band (linear summation); the S1 maps are pooled over scales within the band into a C1 map (max operation within the S1 band); the C1 map is then pooled over a local neighborhood (max operation within the C1 map).
Y. Mu, W. Ding, D. Tao, T. Stepinski: Biologically inspired model for crater detection. IJCNN 2011.
W. Ding, T. Stepinski, Y. Mu: Sub-Kilometer Crater Discovery with Boosting and Transfer Learning. ACM TIST 2(4): 39 (2011).
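The S1/C1 stages named on the slide follow the HMAX family of biologically inspired models. Below is a minimal NumPy sketch of one band. It assumes Gabor filters for S1 (a linear summation over each window, then rectification) and non-overlapping 4 x 4 max pooling for C1. The filter sizes, wavelengths, single orientation, and random input are illustrative stand-ins, not the settings of the IJCNN 2011 model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.3):
    """Zero-mean, unit-norm Gabor filter (odd size), used here as an illustrative S1 unit."""
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = xx * np.cos(theta) + yy * np.sin(theta)
    y_r = -xx * np.sin(theta) + yy * np.cos(theta)
    g = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    g = g * np.cos(2 * np.pi * x_r / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)


def s1_map(image, kernel):
    """S1: linear summation (filter response) at every valid position, then rectify."""
    windows = sliding_window_view(image, kernel.shape)
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))


def center_crop(a, h, w):
    top, left = (a.shape[0] - h) // 2, (a.shape[1] - w) // 2
    return a[top:top + h, left:left + w]


def c1_map(s1_small, s1_large, pool=4):
    """C1: max over the two scales of one band, then max over local neighborhoods."""
    h = min(s1_small.shape[0], s1_large.shape[0])
    w = min(s1_small.shape[1], s1_large.shape[1])
    m = np.maximum(center_crop(s1_small, h, w), center_crop(s1_large, h, w))
    H, W = (h // pool) * pool, (w // pool) * pool
    return m[:H, :W].reshape(H // pool, pool, W // pool, pool).max(axis=(1, 3))


image = np.random.default_rng(0).random((64, 64))  # stand-in for a crater image patch
band = [gabor_kernel(7, 3.5, 0.0, 2.8), gabor_kernel(9, 4.6, 0.0, 3.6)]  # one band, two scales
c1 = c1_map(s1_map(image, band[0]), s1_map(image, band[1]))
print(c1.shape)  # (14, 14)
```

A real model would repeat this over several orientations and bands. The max operations are what give C1 its tolerance to small shifts and scale changes.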

Crime data: spatial influence, temporal influence, and the influence of other criminal events. Other events may influence residential burglaries: construction permits, foreclosures, mayor's hotline inputs, motor vehicle larceny, social events, and offender data. Crimes are never spatially isolated (broken windows theory) … Time-series patterns follow social disorganization theory.

Feature representation. Flattening the original data into a vector feature such as [1, 0, 1, 1, 1, 0, 1, 0, 0] destroys its geometric structure. Tensor feature: an example of residential burglary represented as a fourth-order tensor over [Residential Burglary, Social Events, …, Offender data].
Y. Mu, W. Ding, M. Morabito, D. Tao: Empirical Discriminative Tensor Analysis for Crime Forecasting. KSEM 2011.
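Index arithmetic shows why vectorization destroys geometric structure. The slide does not name the four modes, so this sketch assumes hypothetical ones: grid rows, grid columns, time steps, and event types. Spatially adjacent cells end up far apart in the flat vector. A mode-n unfolding instead keeps one mode intact as the row index, which is the kind of operation tensor-based discriminant analysis builds on.

```python
import numpy as np


def mode_n_unfold(tensor, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# Hypothetical mode sizes: grid rows, grid columns, time steps, event types.
shape = (4, 4, 3, 2)
x = np.random.default_rng(0).integers(0, 2, size=shape)

# Vectorization: two spatially adjacent cells (same time step, same event type)
# land 4 * 3 * 2 = 24 positions apart in the flat vector.
v = x.ravel()
i = np.ravel_multi_index((0, 0, 0, 0), shape)
j = np.ravel_multi_index((1, 0, 0, 0), shape)
print(v.shape, j - i)  # (96,) 24

# Unfolding keeps each mode's index structure as the rows of a matrix.
print([mode_n_unfold(x, m).shape for m in range(4)])
```
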

Accelerometer data: one activity yields multiple feature vectors, so we proposed a block feature representation for each activity.
Y. Mu, H. Lo, K. Amaral, W. Ding, S. Crouter: Discriminative Accelerometer Patterns in Children Physical Activities. ICAMPAM 2013.
K. Amaral, Y. Mu, H. Lo, W. Ding, S. Crouter: Two-Tiered Machine Learning Model for Estimating Energy Expenditure in Children. ICAMPAM 2013.
Y. Mu, H. Lo, W. Ding, K. Amaral, S. Crouter: Bipart: Learning Block Structure for Activity Detection. IEEE TKDE (submitted).
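Below is a minimal sketch of the idea behind a block representation. It assumes fixed-length windows over a one-dimensional accelerometer signal and four hypothetical summary features per window. Instead of collapsing an activity into a single vector, all of its window-level feature vectors are kept together as one matrix. This is not the Bipart algorithm, which learns block structure in a way the slide does not describe.

```python
import numpy as np


def window_features(signal, win=50, step=25):
    """Hypothetical per-window features: mean, std, 10th and 90th percentiles."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), np.percentile(w, 10), np.percentile(w, 90)])
    return np.array(feats)


def activity_blocks(signal, labels, **kw):
    """One (windows x features) block per activity; assumes each activity is one contiguous segment."""
    return {act: window_features(signal[labels == act], **kw) for act in np.unique(labels)}


rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.1, 0.05, 150), rng.normal(1.0, 0.3, 150)])
labels = np.repeat(["sitting", "running"], 150)  # hypothetical activity labels
blocks = activity_blocks(signal, labels)
print({str(act): b.shape for act, b in blocks.items()})
```

Keeping the block rather than an average preserves the variation between windows within an activity, which a single summary vector would discard.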

Other feature extraction work:
Y. Mu, D. Tao: Biologically inspired feature manifold for gait recognition. Neurocomputing 73(4-6) (2010).
B. Xie, Y. Mu, M. Song, D. Tao: Random Projection Tree and Multiview Embedding for Large-Scale Image Retrieval. ICONIP (2) 2010.
Y. Mu, D. Tao, X. Li, F. Murtagh: Biologically Inspired Tensor Features. Cognitive Computation 1(4) (2009).

Feature selection: linear time, online algorithm

Online feature selection methods include the lasso, group lasso, elastic net, and others. Their common issue is least-squares loss optimization. We proposed a fast least-squares loss optimization approach that can be applied to any least-squares-based algorithm.
Y. Mu, W. Ding, T. Zhou, D. Tao: Constrained stochastic gradient descent for large-scale least squares problem. KDD 2013.
K. Yu, X. Wu, Z. Zhang, Y. Mu, H. Wang, W. Ding: Markov blanket feature selection with non-faithful data distributions. ICDM 2013.
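The slide names least-squares loss optimization as the shared bottleneck. As a point of reference only, here is plain stochastic gradient descent on the least-squares loss, checked against NumPy's direct solver. The constrained SGD of the KDD 2013 paper adds a step that the slide does not describe, so this sketch is not that method. The learning rate, epoch count, and synthetic data are arbitrary choices.

```python
import numpy as np


def sgd_least_squares(X, y, lr=0.01, epochs=50, seed=0):
    """Plain SGD on 0.5 * (x_i^T w - y_i)^2, one shuffled sample at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            w -= lr * residual * X[i]  # gradient of the per-sample squared loss
    return w


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=500)

w_sgd = sgd_least_squares(X, y)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # direct solution for comparison
print(np.max(np.abs(w_sgd - w_ls)))  # small: SGD reaches the least-squares solution
```

Each update costs O(d), so an epoch is linear in the data size. That per-sample cost is what makes SGD-style solvers attractive for large-scale and online feature selection.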

Distance learning: structure, pairwise constraints

Why am I close to that guy? Why not use Euclidean space?
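A common answer to "why not use Euclidean space?" is to learn a Mahalanobis-type distance d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semidefinite. When M = L^T L, this equals the Euclidean distance after the linear map L. In the small sketch below, the map L is hypothetical. It shows how the answer to "who am I close to?" can flip once the metric changes.

```python
import numpy as np


def mahalanobis(a, b, M):
    """d_M(a, b) = sqrt((a - b)^T M (a - b)) for a positive semidefinite M."""
    diff = a - b
    return float(np.sqrt(diff @ M @ diff))


L = np.array([[2.0, 0.0], [0.0, 0.5]])  # hypothetical learned linear map
M = L.T @ L                              # M = L^T L is positive semidefinite
x, y, z = np.array([1.0, 1.0]), np.array([2.0, 1.0]), np.array([1.0, 3.0])

print(np.linalg.norm(x - y), np.linalg.norm(x - z))  # 1.0 2.0  (y is closer)
print(mahalanobis(x, y, M), mahalanobis(x, z, M))    # 2.0 1.0  (z is closer)
print(np.linalg.norm(L @ x - L @ y))                 # 2.0, same as d_M(x, y)
```
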

Representative state-of-the-art methods

Our approach (i): a generalized form.
Y. Mu, W. Ding, D. Tao: Local discriminative distance metrics ensemble learning. Pattern Recognition 46(8) (2013).
Y. Mu, W. Ding: Local Discriminative Distance Metrics and Their Real World Applications. ICDM PhD Forum 2013.

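Can the goals be satisfied? Consider local region 1, which contains left-shadowed craters, and local region 2, which contains right-shadowed craters. The projection directions that separate craters from non-craters in the two regions conflict, so a single global metric faces an optimization issue: some of its constraints will be compromised.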
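Here is a toy illustration of the conflict, assuming hypothetical two-dimensional shading features. In each local region, the sketch computes the Fisher discriminant direction that separates craters from non-craters. The two directions point in nearly opposite ways, so no single global projection suits both regions.

```python
import numpy as np


def fisher_direction(pos, neg):
    """Unit Fisher discriminant direction S_w^{-1} (mu_pos - mu_neg)."""
    s_w = np.cov(pos, rowvar=False) + np.cov(neg, rowvar=False)
    w = np.linalg.solve(s_w, pos.mean(axis=0) - neg.mean(axis=0))
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)


def noise(n):
    return 0.3 * rng.normal(size=(n, 2))


# Hypothetical features. Region 1: left-shadowed craters sit at +x; region 2:
# right-shadowed craters sit at -x; non-craters sit near the origin in both.
w1 = fisher_direction(np.array([2.0, 0.0]) + noise(50), noise(50))
w2 = fisher_direction(np.array([-2.0, 0.0]) + noise(50), noise(50))
print(round(float(w1 @ w2), 2))  # close to -1: the two local directions conflict
```

Pooled together, craters lie on both sides of the non-craters along x, which is why a single linear projection has to give up some constraints.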

Our approach (ii). Comments:
1. The summation is not taken over i, so there are n distance metrics in total for n training samples.
2. The distances between samples of different classes are maximized.
Y. Mu, W. Ding, D. Tao: Local discriminative distance metrics ensemble learning. Pattern Recognition 46(8) (2013).
Y. Mu, W. Ding: Local Discriminative Distance Metrics and Their Real World Applications. ICDM PhD Forum 2013.
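The comments state that there is one distance metric per training sample and that distances to other-class samples are maximized. The objective itself is not in the transcript, so this sketch substitutes a simple local Fisher-style criterion. This criterion is an assumption, not the Pattern Recognition 2013 formulation. For anchor i, the sketch maximizes tr(U^T (S_b - lam * S_w) U) over orthonormal U, where S_b and S_w are the scatters of the anchor's nearest different-class and same-class neighbors. The resulting metric is M_i = U U^T.

```python
import numpy as np


def local_projection(X, y, i, k=5, lam=1.0, r=1):
    """Stand-in local metric for anchor i: top-r eigenvectors of S_b - lam * S_w.

    The induced metric is M_i = U U^T, i.e. d_i(a, b) = ||U^T (a - b)||.
    """
    dist = np.linalg.norm(X - X[i], axis=1)
    same = np.flatnonzero(y == y[i])
    same = same[same != i]  # exclude the anchor itself
    other = np.flatnonzero(y != y[i])
    d_same = X[same[np.argsort(dist[same])[:k]]] - X[i]
    d_other = X[other[np.argsort(dist[other])[:k]]] - X[i]
    s_w = d_same.T @ d_same / len(d_same)      # local same-class scatter
    s_b = d_other.T @ d_other / len(d_other)   # local different-class scatter
    _, vecs = np.linalg.eigh(s_b - lam * s_w)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :r]                # top-r eigenvectors (d x r)


def local_metrics(X, y, **kw):
    """One metric per training sample: n metrics for n samples."""
    return [local_projection(X, y, i, **kw) for i in range(len(X))]


rng = np.random.default_rng(0)
# Two classes separated along x, with a large uninformative spread along y.
X = np.vstack([np.column_stack([rng.normal(0, 0.1, 30), rng.normal(0, 1, 30)]),
               np.column_stack([rng.normal(2, 0.1, 30), rng.normal(0, 1, 30)])])
y = np.repeat([0, 1], 30)
metrics = local_metrics(X, y)
print(len(metrics), np.mean([abs(U[0, 0]) for U in metrics]))  # most metrics align with x
```

The Ky Fan theorem guarantees that the top eigenvectors maximize this trace objective under the orthonormality constraint. That keeps each local problem a single symmetric eigendecomposition.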

Classification: separability, performance

VC dimension issues. In a classification problem, the distance metric serves the classifier, and most classifiers have a limited VC dimension. For example, a linear classifier in two-dimensional space has VC dimension 3, so it fails on some labelings of four points. Therefore, a good distance metric does not guarantee a good classification result.
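The VC-dimension example can be checked numerically. The perceptron converges whenever a labeling is linearly separable. In the sketch below, it separates all 8 labelings of three non-collinear points but never reaches zero errors on the four-point XOR labeling. This is an illustration, not a proof, and the epoch cap is arbitrary.

```python
import itertools

import numpy as np


def perceptron_separates(X, y, epochs=1000):
    """Return True if the perceptron (with bias) reaches zero training errors."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                errors += 1
        if errors == 0:
            return True
    return False


three = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
all_three = all(perceptron_separates(three, np.array(lab))
                for lab in itertools.product([-1, 1], repeat=3))

xor_X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
xor_y = np.array([-1, -1, 1, 1])
print(all_three, perceptron_separates(xor_X, xor_y))  # True False
```
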

Our approach (iii). We have n distance metrics for n training samples. Training a classifier under each distance metric yields n classifiers. This is similar to the k-nearest-neighbor classifier, which has infinite VC dimension.
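The slide does not spell out how the n local classifiers are combined. The sketch below is one hypothetical reading. Each training sample's local classifier is a 1-NN rule in its own projected space, using the same stand-in local Fisher-style metric (an assumption). A query is labeled by a majority vote of the local classifiers attached to its nearest training samples, which is where the resemblance to k-NN comes from.

```python
import numpy as np


def local_projection(X, y, i, k=5, lam=1.0, r=1):
    """Stand-in local metric for anchor i: top-r eigenvectors of S_b - lam * S_w."""
    dist = np.linalg.norm(X - X[i], axis=1)
    same = np.flatnonzero(y == y[i])
    same = same[same != i]
    other = np.flatnonzero(y != y[i])
    d_same = X[same[np.argsort(dist[same])[:k]]] - X[i]
    d_other = X[other[np.argsort(dist[other])[:k]]] - X[i]
    s_w = d_same.T @ d_same / len(d_same)
    s_b = d_other.T @ d_other / len(d_other)
    _, vecs = np.linalg.eigh(s_b - lam * s_w)
    return vecs[:, ::-1][:, :r]


def local_classify(X, y, U, q):
    """Local classifier: 1-NN in the space projected by one anchor's metric."""
    z, zq = X @ U, q @ U
    return y[np.argmin(np.linalg.norm(z - zq, axis=1))]


def ensemble_predict(X, y, projections, q, n_voters=3):
    """Hypothetical combination: majority vote of the query's nearest anchors."""
    anchors = np.argsort(np.linalg.norm(X - q, axis=1))[:n_voters]
    votes = [local_classify(X, y, projections[a], q) for a in anchors]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]


rng = np.random.default_rng(0)
X = np.vstack([np.column_stack([rng.normal(0, 0.1, 30), rng.normal(0, 1, 30)]),
               np.column_stack([rng.normal(2, 0.1, 30), rng.normal(0, 1, 30)])])
y = np.repeat([0, 1], 30)
projections = [local_projection(X, y, i) for i in range(len(X))]  # n metrics, n classifiers
print(ensemble_predict(X, y, projections, np.array([0.1, 0.5])),
      ensemble_predict(X, y, projections, np.array([1.9, -0.5])))  # 0 1
```

Because each vote comes from a classifier fitted to its own neighborhood, the ensemble's decision boundary can bend with the data, much as a nearest-neighbor rule does.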

Complexity analysis

Theoretical analysis:
1. The convergence rate to the generalization error for each distance metric (with VC dimension).
2. The error bound for each local classifier (with VC dimension).
3. The error bound for the classifier ensemble (without VC dimension).
For detailed proofs, please refer to:
Y. Mu, W. Ding, D. Tao: Local discriminative distance metrics ensemble learning. Pattern Recognition 46(8) (2013).
Y. Mu, W. Ding: Local Discriminative Distance Metrics and Their Real World Applications. ICDM PhD Forum 2013.
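For reference, items 1 and 2 rest on VC-type bounds. The classical Vapnik bound below is not the paper's own result. It states that, with probability at least 1 - δ over n i.i.d. training samples, every classifier f from a class of VC dimension h < n satisfies:

```latex
R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

Here R is the expected risk and R_emp is the training error. The gap term grows with h and shrinks roughly as sqrt(h ln n / n) as the sample size grows.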

Applications: accelerometer-based activity recognition, crater detection, and crime prediction. Figure panels: new crater feature under the proposed distance metric; proposed method.