Random feature for sparse signal classification

Presentation on theme: "Random feature for sparse signal classification"— Presentation transcript:

1 Random feature for sparse signal classification
Jen-Hao Rick Chang, Aswin C. Sankaranarayanan, B. V. K. Vijaya Kumar. Hi, I am Jen-Hao Rick Chang from Carnegie Mellon University. Our work is called "Random feature for sparse signal classification." We provide a tighter bound for the random feature method on sparse signals and propose compressive random features, which exploit the sparsity of input signals to make kernel methods scalable.

2 Kernel method
Kernel method does not scale well. (Slide: N training samples, kernel SVM; costs shown for data storage, kernel matrix computation, kernel matrix storage, and testing storage.) The kernel method does not scale well with the size of the dataset. It handles linearly inseparable datasets by lifting them into possibly infinite-dimensional spaces where the data may become separable. Kernel functions let us avoid constructing this high-dimensional lifted data explicitly, but the kernel trick still requires a kernel matrix whose size grows quadratically with the number of training samples, and evaluating a single test sample requires access to a large portion of the training set. So while the kernel method offers benefits such as convexity and insight into the problem, it does not scale well with the dataset size: with one million training samples, you need at least one terabyte of memory just to store the kernel matrix, and you must touch almost all one million samples to evaluate a single test point.
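To make the scaling concrete, here is a minimal Python sketch (not the authors' code; the RBF kernel, sizes, and variable names are illustrative assumptions) of why a kernel SVM is expensive: the Gram matrix is N x N, and the decision function touches the training set at test time.

import numpy as np

def rbf_kernel(X, Y, gamma=0.05):
    # K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

N, d = 2000, 784                     # at N = 1e6 the Gram matrix alone needs terabytes
X_train = np.random.randn(N, d)
K = rbf_kernel(X_train, X_train)     # N x N kernel matrix: O(N^2) storage and computation

# Test phase: f(x) = sum_i alpha_i * k(x_i, x) + b, so evaluating one test point
# requires comparing it against (a large portion of) the training set.
alpha = np.random.randn(N)           # placeholder dual coefficients from a kernel SVM
x_test = np.random.randn(1, d)
f = rbf_kernel(X_train, x_test)[:, 0] @ alpha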

3 Kernel method
Kernel method does not scale well. (Slide: N training samples, kernel SVM; costs: data storage, kernel matrix computation, kernel matrix storage, testing storage; sparse signals: this work.) Our work exploits a method called random features, together with the sparsity of the input signals, to greatly reduce the storage, computation, and acquisition costs.

4 Make kernel method scale gracefully
(Slide: N training samples, linear SVM; costs: data storage, random feature computation, random feature storage, testing storage.) Random features were first developed by Rahimi and Recht to make kernel methods scale gracefully with the size of the dataset. By constructing an M-dimensional feature for each data point, such that inner products of the features approximate the original kernel function, the method removes the quadratic dependence on the dataset size during training and makes the test phase independent of the training dataset. Their result shows that M needs to be proportional to the original dimensionality of the signals to achieve a good approximation. [Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. NIPS 2007.]
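For concreteness, a minimal sketch of random Fourier features for the Gaussian (RBF) kernel in the spirit of Rahimi and Recht; the function name, seed handling, and parameter values are illustrative assumptions, not the paper's code.

import numpy as np

def random_fourier_features(X, M, gamma=0.05, seed=0):
    # z(x) = sqrt(2/M) * cos(W x + b), with columns of W drawn from N(0, 2*gamma*I)
    # and b uniform on [0, 2*pi), so that z(x)^T z(y) ~= exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

# A linear SVM trained on Z replaces the kernel SVM: training avoids the
# N x N kernel matrix, and a test point only needs its own M-dimensional feature.
X = np.random.randn(1000, 784)
Z = random_fourier_features(X, M=500)    # N x M instead of N x N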

5 Our contributions
We provide an enhanced bound for sparse signals such as images and videos: for k-sparse, d-dimensional signals, the feature dimension M needs to grow only like k log(d/k). We also propose compressive random features, which exploit signal sparsity to improve data storage, computation, and acquisition costs. Our work has two contributions. First, we analyze the performance of random features on sparse signals. For sparse signals such as images and videos, we provide a tighter bound on the dimension of the random features: for k-sparse, d-dimensional signals, M only needs to be proportional to k log(d/k) to achieve good kernel approximation. When the signal is highly sparse, our result greatly tightens the bound. Second, we propose a new scheme, compressive random features, which exploits the sparsity of the signals to further reduce storage, computation, and data acquisition costs.
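Stated compactly (the labels "dense" and "sparse" are mine, constants and failure-probability factors are omitted, and the exact statement is in the paper), the contrast with the original random feature bound is:

M_{\text{dense}} \propto d
\qquad \text{versus} \qquad
M_{\text{sparse}} \propto k \log\frac{d}{k}
\quad \text{for } k\text{-sparse } x \in \mathbb{R}^{d}.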

6 Compressive random feature
(Slide: N training samples, k-sparse, compress, random feature; cost comparison of the proposed method, random features, and the kernel method in terms of data storage, feature computation, feature storage, and testing storage; for sparse signals, e.g., images.) The proposed compressive random feature is an effective combination of compressive sensing and random features. Specifically, for signals that are k-sparse, either canonically or after a transformation, we first apply a random projection to reduce the dimensionality of the signals, and then compute the usual random features on the compressed signals. We prove that compressive random features retain a similar kernel approximation ability despite this additional dimensionality reduction. The dimensionality reduction effectively reduces data storage and computation, and the compressive sensing step also reduces the data acquisition cost.
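A minimal sketch of this two-step pipeline, assuming a Gaussian measurement matrix and reusing the random-Fourier-feature construction from the previous slide; m, M, and all names here are illustrative assumptions, not the authors' code.

import numpy as np

def compressive_random_features(X, m, M, gamma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Step 1: compressive measurements y = A x of the k-sparse signals,
    # with a random Gaussian A of size m x d (m << d, on the order of k log(d/k)).
    A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))
    Y = X @ A.T
    # Step 2: ordinary random Fourier features computed on the compressed signals.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(m, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    return np.sqrt(2.0 / M) * np.cos(Y @ W + b)

# The M-dimensional output feeds a linear SVM exactly as before, but storage,
# feature computation, and (with a compressive sensor) acquisition now operate
# on m-dimensional rather than d-dimensional signals.
Z = compressive_random_features(np.random.randn(1000, 784), m=200, M=500)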

7 Classification result on MNIST
Since the proposed compressive random feature has lower computational cost while retaining the ability to approximate the kernel function, on the MNIST dataset it achieves classification accuracy similar to the original random features in less time. Similar results hold across many datasets, including CIFAR-10 and the Street View House Numbers dataset. We welcome you to visit our poster for more details.

