Time and Space Optimization of Document Content Classifiers
Dawei Yin, Henry S. Baird, and Chang An
Computer Science and Engineering Department, Lehigh University

Background
- Document image analysis
- Pixel-accurate document image content extraction
- The k-Nearest Neighbor (kNN) classifier is suitable for this problem

k-Nearest Neighbors Classifier
For each test sample, find the k (e.g., 5) nearest training samples and choose the most frequent class among them.
Problems (both space and time):
- Training set (space): too large to fit in main memory
- Brute force (time): must compute the distance to every training sample
How can we speed this up?
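To make the brute-force cost concrete, here is a minimal kNN sketch in Python. It is illustrative only, not the authors' implementation, and it assumes the per-pixel feature vectors are already given as arrays.

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, test_x, k=5):
    """Brute-force kNN: compute the distance to every training sample."""
    dists = np.linalg.norm(train_X - test_x, axis=1)  # one distance per training sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]                 # most frequent class among the k
```

With millions of training samples, the full distance pass is the time bottleneck and holding train_X in memory is the space bottleneck described above.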

Related work
- k-d trees (Bentley et al., 1975)
- Voronoi methods (e.g., Preparata & Shamos, 1985)
- ANN (Arya & Mount, 2001)
- Locality Sensitive Hashing (Indyk et al., 2005)
- Hashed k-d trees (Baird, Casey, & Moll, 2006)

Hashed k-d trees
- Split the feature space into a large number of bins
- Training and test samples fall into bins
- Hashing into bins makes lookup fast
- Distances are computed only among samples within the same bin
- May not find the exact k nearest neighbors, but the loss of accuracy is often small
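A sketch of the binning idea, assuming a simple uniform quantization of each feature coordinate as the hash; the bin width, the dictionary-based hash table, and the behavior on empty bins are illustrative choices, not details taken from the paper.

```python
import numpy as np
from collections import Counter, defaultdict

class HashedBinKNN:
    def __init__(self, k=5, bin_width=0.25):
        self.k = k
        self.bin_width = bin_width
        self.bins = defaultdict(list)   # bin key -> list of (feature_vector, label)

    def _key(self, x):
        # Quantize each coordinate; the tuple of integer indices addresses one bin.
        return tuple(np.floor(np.asarray(x, dtype=float) / self.bin_width).astype(int))

    def fit(self, X, y):
        for x, label in zip(X, y):
            self.bins[self._key(x)].append((np.asarray(x, dtype=float), label))

    def classify(self, x):
        candidates = self.bins.get(self._key(x), [])
        if not candidates:
            return None                 # empty bin: the sample is unclassifiable here
        x = np.asarray(x, dtype=float)
        nearest = sorted(candidates, key=lambda c: np.linalg.norm(c[0] - x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Distances are computed only against samples sharing the query's bin, so lookups are fast; a true neighbor lying just across a bin boundary can be missed, which is the source of the small accuracy loss.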

Pre-decimation
- Throw away, at random, most of the training samples before loading them into bins
- Saves both space and time: a 9× speedup
- Loss of accuracy, again, is small: less than 1%
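Pre-decimation itself is just uniform random subsampling before binning; a minimal sketch (the keep fraction and seed are example values, not parameters from the paper):

```python
import random

def pre_decimate(samples, keep_fraction=0.1, seed=0):
    """Keep each training sample independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [s for s in samples if rng.random() < keep_fraction]
```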

But…

Pre-decimation problems

Bin-decimation
The key idea of bin-decimation is to enforce an upper bound M, approximately, on the number of training samples stored in each bin.
We propose an adaptive statistical technique that does this online (reading the training data exactly once) and in linear time.

Bin-decimation
If we read the training data twice, we can easily enforce the bound M exactly on every bin, but this is slow.
We can read the training data only once ("online") and still enforce the bound approximately, if this assumption holds: for every bin, the samples falling in that bin tend to be distributed uniformly within the sequence of training samples, in the order in which they are read.

Online Bin-decimation
Notation:
- N: the total number of training samples
- N_t: the number of samples read so far, at time t
- N_t(b): the number of those samples that have fallen into bin b
- S_e(b): the estimated number of samples that will eventually belong in bin b, S_e(b) = N_t(b) · N / N_t
At time t, when a training sample is read and falls into bin b, the probability of keeping it is p = min(1, M / S_e(b)). With this probability, we pseudorandomly keep the sample.
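A sketch of the online rule as reconstructed from the definitions above; the estimator S_e(b) = N_t(b)·N/N_t and the min(1, M/S_e(b)) keep probability are my reading of the slide, not code from the paper.

```python
import random
from collections import defaultdict

def bin_decimate(stream, bin_of, N, M, seed=0):
    """One pass over the training stream, keeping roughly at most M samples per bin."""
    rng = random.Random(seed)
    kept = defaultdict(list)    # bin key -> retained samples
    seen = defaultdict(int)     # N_t(b): samples seen so far in each bin
    n_read = 0                  # N_t: samples read so far
    for sample in stream:
        n_read += 1
        b = bin_of(sample)
        seen[b] += 1
        # Estimate the bin's final size, assuming its samples arrive
        # uniformly throughout the stream: S_e(b) = N_t(b) * N / N_t.
        est_final = seen[b] * N / n_read
        p_keep = min(1.0, M / est_final)
        if rng.random() < p_keep:
            kept[b].append(sample)
    return kept
```

Each sample is examined exactly once and the per-sample work is constant, so the pass is linear in N, as claimed above.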

Experiments
- The training set contains 1,658,060 samples.
- The test set contains 340,054 samples.
- Test and training images were collected from books, magazines, newspapers, technical articles, students' notes, etc.
- Each pixel is a sample.

Pre-decimation results
Runtime and accuracy (on separate scales), as functions of the pre-decimation factor. Up to a factor of 1/100, accuracy falls by only 6%, while runtime falls by a factor of 100.

Pre-decimation results
The number of unclassifiable samples, and accuracy, as functions of the pre-decimation factor. For factors beyond 1/100, the number of unclassifiable samples increases dramatically.

Bin-decimation results
Accuracy and runtime of bin-decimation as functions of M, the runtime parameter controlling maximum bin size. Accuracy remains nearly unaffected until M falls below 5, whereas runtime drops significantly even for M greater than 100.

Comparison
- With parameters chosen so that both methods consume roughly the same runtime (18 CPU seconds), bin-decimation achieves higher accuracy than pre-decimation (roughly 6% better).
- With parameters chosen so that both methods achieve roughly the same accuracy (roughly 77% correct), bin-decimation consumes less time (less than 1/10th).

A larger-scale experiment
- The training set contains 33 images, a total of 86.7M samples.
- The test set contains 83 images, a total of 221.6M samples.
- Experiment environment: the high-performance computing (Beowulf) cluster at Lehigh University. The HPC cluster contains 40 nodes; each node is equipped with an 8-core Intel Xeon at 1.8 GHz and 16 GB of memory.

Results
- A 23× speedup with less than 0.1% loss of accuracy (M=100)
- A 60× speedup with less than 5% loss of accuracy (M=30)
- For M=500, bin-decimation actually improves accuracy slightly (+0.06%) while still giving a 2.3× speedup

Future work
- More systematic trials: variance resulting from randomization
- Protect against imbalanced training sets and concentration (too many samples in too few bins)

Thank you!
