Approximation Algorithms for Large-Scale Kernel Methods
Taher Dameh, School of Computing Science, Simon Fraser University
March 29th, 2010
Joint work with F. Gao, M. Hefeeda, and W. Abd-Almageed

Outline
− Introduction
− Motivation
− Locality Sensitive Hashing
− Z and H Curves
− Affinity Propagation
− Results

Introduction: Machine Learning
Kernel-based methods require O(N^2) time and space to compute and store the non-sparse N x N Gram matrix of N input points. We are developing methods to approximate this Gram matrix with a band matrix.
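For reference, the standard definition (not spelled out on the slide): given points x_1, ..., x_N and a kernel function k, the Gram matrix is
G(i,j) = k(x_i, x_j),  i, j = 1, ..., N,
so a dense representation needs N^2 stored entries and N^2 kernel evaluations.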

Motivation: Exact vs. Approximate Answer
− An approximate answer might be good enough and much faster
− Time-quality and memory trade-off
− From a machine learning point of view, we can live with a bounded (controlled) error as long as it lets us run on large-scale data that we could not handle at all in the usual way because of the memory usage.

Ideas of approximation
To construct the approximated band matrix, we evaluate the kernel function only within a fixed neighborhood around each point. This low-rank method relies on the observation that the kernel function is a Radial Basis Function (a real-valued function whose value depends only on the Euclidean distance between its inputs), so most of the information in the Gram matrix is captured by its first few eigenvectors.
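A concrete example of such a kernel (the standard Gaussian RBF, not written out on the slide):
k(x_i, x_j) = exp( -||x_i - x_j||^2 / (2 * sigma^2) ),
which depends on the inputs only through their Euclidean distance and decays monotonically as that distance grows.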

How to choose this neighborhood window?
Since the kernel function decreases monotonically with the Euclidean distance between the input points, we can compute it only between close points. We need a fast and reliable technique to order (group) the points:
− Space-filling curves: Z-Curve and H-Curve (see the Morton-order sketch below)
− Locality Sensitive Hashing
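A minimal sketch of one such ordering, the Z-curve (Morton order), assuming the coordinates have already been quantized to non-negative integers; the bit width and the sort-by-key usage are illustrative choices, not taken from the slides:

```python
def morton_key(coords, bits=10):
    """Interleave the bits of the integer coordinates to get a Z-curve index."""
    key = 0
    ndims = len(coords)
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * ndims + d)
    return key

# Sorting points by their Morton key places points that are close in space
# near each other in the ordering (most of the time), so the kernel can be
# evaluated only inside a sliding window over this ordering, e.g.:
# order = sorted(range(len(qpoints)), key=lambda j: morton_key(qpoints[j]))
```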

LSH: Motivation
Similarity search over large-scale, high-dimensional data. Exact vs. approximate answer:
− Approximate might be good enough and much faster
− Time-quality trade-off

LSH: Key idea
Hash the data points using several LSH functions so that the probability of collision is higher for closer objects.
Algorithm:
Input
− Set of N points { p_1, ..., p_N }
− L (number of hash tables)
Output
− Hash tables T_i, i = 1, 2, ..., L
Foreach i = 1, 2, ..., L
− Initialize T_i with a random hash function g_i(.)
Foreach i = 1, 2, ..., L
− Foreach j = 1, 2, ..., N
− Store point p_j in bucket g_i(p_j) of hash table T_i
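A minimal Python sketch of this bucketing step. The slides do not say which hash family is used; this sketch assumes the common random-projection (p-stable) family, and k, L, and the bucket width w are illustrative parameters:

```python
import numpy as np
from collections import defaultdict

def build_lsh_tables(points, k=4, L=8, w=1.0, seed=0):
    """Hash every point into L hash tables; each key concatenates k quantized projections."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    tables = []
    for _ in range(L):
        A = rng.normal(size=(k, d))          # k random projection directions
        b = rng.uniform(0, w, size=k)        # random offsets
        g = lambda p, A=A, b=b: tuple(np.floor((A @ p + b) / w).astype(int))
        buckets = defaultdict(list)
        for j in range(n):
            buckets[g(points[j])].append(j)  # store index j in bucket g_i(p_j) of table T_i
        tables.append((g, buckets))
    return tables
```

Nearby points tend to share a bucket key in at least one of the L tables, which is what the kernel-approximation step below relies on.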

LSH: Algorithm (diagram): a point p_i is hashed by each of the functions g_1(p_i), g_2(p_i), ..., g_L(p_i) and stored in the corresponding buckets of hash tables T_1, T_2, ..., T_L.

LSH: Analysis
A family H of (r_1, r_2, p_1, p_2)-sensitive functions {h_i(.)} satisfies:
− dist(p,q) < r_1 implies Prob_H[h(q) = h(p)] >= p_1
− dist(p,q) >= r_2 implies Prob_H[h(q) = h(p)] <= p_2
− with p_1 > p_2 and r_1 < r_2
LSH functions: g_i(.) = ( h_1(.), ..., h_k(.) )

Our approach
Hash the N points using an LSH function family, then compute the kernel function only between points that fall in the same bucket (the entry is 0 for points in different buckets). With a hash table of m buckets, the best case is O(N^2/m) memory and computation (see the sketch below).
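A sketch of this step under the same assumptions as the LSH sketch above (Gaussian kernel with an illustrative sigma, and a single hash table for simplicity, where the slides use L tables):

```python
import numpy as np

def bucketed_kernel_blocks(points, buckets, sigma=1.0):
    """Evaluate the Gaussian kernel only inside each LSH bucket.

    Returns a dict: bucket key -> (point indices, dense kernel block).
    Entries between points in different buckets are implicitly 0.
    """
    blocks = {}
    for key, idx in buckets.items():
        X = points[idx]                                       # points sharing this bucket
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        blocks[key] = (idx, np.exp(-sq / (2 * sigma ** 2)))
    return blocks

# Usage with the previous sketch:
# _, buckets = build_lsh_tables(points, L=1)[0]
# blocks = bucketed_kernel_blocks(points, buckets)
```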

Validation Methods
Low level (matrix level)
− Frobenius norm
− Eigen-spectrum
High level (application level)
− Affinity Propagation
− Support Vector Machines

Example (diagram): six points P0-P5 are hashed by LSH into two buckets, bucket 0 = {P0, P1, P2} and bucket 1 = {P3, P4, P5}. The full similarity matrix S(i,k) is approximated by the two diagonal blocks S_0(i,k) and S_1(i,k), and the quality of the approximation is checked by comparing FrobNorm(S) against FrobNorm([S_0 S_1]).
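A small sketch of this Frobenius-norm check, reusing the helpers above (it builds the full Gram matrix, so it is only meant for validating on data small enough to fit in memory):

```python
import numpy as np

def frobenius_ratio(points, blocks, sigma=1.0):
    """Ratio of the block approximation's Frobenius norm to the full Gram matrix's."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq / (2 * sigma ** 2))                         # full dense Gram matrix
    full = np.linalg.norm(S, "fro")
    approx = np.sqrt(sum(np.linalg.norm(B, "fro") ** 2 for _, B in blocks.values()))
    return approx / full                                       # close to 1.0 => little energy lost
```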

Results: Memory Usage

N        All data   Z512     Z1024    LSH 3000   LSH
64 K     4 G        32 M     64 M     19 M       18 M
128 K    16 G       64 M     128 M    77 M       76 M
256 K    64 G       128 M    256 M    309 M      304 M
512 K    256 G      256 M    512 M    1244 M     1231 M


References
[1] M. Hussein and W. Abd-Almageed, "Efficient band approximation of Gram matrices for large scale kernel methods on GPUs," in Proc. of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SuperComputing'09), Portland, OR, November 2009.
[2] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," Communications of the ACM, vol. 51, no. 1, pp. 117-122, January 2008.
[3] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972-976, 2007.

AP: Motivation
− No prior assumption on the number of clusters
− Independent of initialization
− Good performance with reasonable processing time

Affinity Propagation
− Treat each data point as a node in a network
− Consider all data points as potential cluster centers (exemplars)
− Start the clustering from a similarity defined between every pair of data points
− Exchange messages between data points until a good set of cluster centers emerges

Terminology and Notation
Similarity s(i,k): the evidence, from the data alone, that point k is suited to be the exemplar for point i (here it is the kernel function that fills the Gram matrix).

Responsibility r(i,k): accumulated evidence, sent from point i to candidate exemplar k, for how well-suited k is to serve as the exemplar for i.
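The slide does not show the update rule; the standard one from Frey and Dueck (2007) is
r(i,k) ← s(i,k) − max over k' ≠ k of { a(i,k') + s(i,k') }.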

Availability a(i,k): accumulated evidence, sent from candidate exemplar k to point i, for how appropriate it would be for point i to pick point k as its exemplar.
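Again, the corresponding standard updates from Frey and Dueck (2007), not written out on the slide:
a(i,k) ← min{ 0, r(k,k) + sum over i' ∉ {i,k} of max{0, r(i',k)} }, for i ≠ k
a(k,k) ← sum over i' ≠ k of max{0, r(i',k)}.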


Flow chart (figure)