Image Classifier Digital Image Processing A.A


1 Image Classifier Digital Image Processing A.A. 2014-2015
Sansoni Davide Trivella Emanuele

2 Introduction
The aim of this project is to develop software able to classify specific categories of objects. We used Python as the programming language, with OpenCV and scikit-learn as supporting libraries. Classification is performed with machine learning techniques.
[1] The objects are given as input in the form of images.

3 Introduction (2)
The program relies on a command-line interface with two main modes of operation:
Batch Mode
Interactive Mode
In both cases, the execution outputs the classification result and the probability that the input image belongs to each category.
[1] Users run the program from the command line. Once started, it offers two main modes of operation:
Batch Mode: classifies all the images in the test dataset, which have been downloaded previously. It outputs the accuracy, the confusion matrix and some indices (recall, precision).
Interactive Mode: the user is asked to enter an image path, and the software classifies that image.
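The two modes of operation could be exposed through a command-line parser along these lines. This is only a sketch: the subcommand names, the `--testset` option and its default directory are assumptions made for illustration, not the program's actual interface.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per mode (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Image classifier (sketch)")
    modes = parser.add_subparsers(dest="mode", required=True)

    # Batch mode: classify every image found in the test dataset directory.
    batch = modes.add_parser("batch", help="classify all images in the test dataset")
    batch.add_argument("--testset", default="testset", help="directory of test images")

    # Interactive mode: classify the single image whose path is given.
    interactive = modes.add_parser("interactive", help="classify a single image")
    interactive.add_argument("image", help="path of the image to classify")
    return parser


args = build_parser().parse_args(["interactive", "dogs_test.jpg"])
print(args.mode, args.image)
```

Using subcommands keeps the arguments of each mode separate, so batch mode never asks for an image path and interactive mode never needs a test directory.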

4 Feature detection and description
The most common algorithm to detect and compute features from images is SIFT (patented).
A sped-up version of it is SURF (patented).
A similar algorithm is ORB (free).
[2] In 2004, D. Lowe of the University of British Columbia introduced a new algorithm, the Scale-Invariant Feature Transform (SIFT), in his paper "Distinctive Image Features from Scale-Invariant Keypoints".

5 Dataset creation
The program uses an algorithm to detect keypoints (kp) and compute their descriptors (des). The information for each kp forms one record of our dataset, so an image is described by many records (one for each kp extracted from it). [2]

6 Dataset creation (2)
A record is composed of 134 attributes:
6 are properties of the kp (coordinates (x, y), size, angle, response, octave)
128 are the descriptor values of the kp
The whole dataset is stored in a CSV file that can be read easily by any machine learning application.
[2] The properties of the kp are:
coordinates: the coordinates of the keypoint.
size: the diameter of the meaningful keypoint neighborhood.
angle: the computed orientation of the keypoint (-1 if not applicable). Its possible values are in the range [0, 360) degrees. It is measured relative to the image coordinate system (the y-axis points downward), i.e. clockwise.
response: the response by which the strongest keypoints have been selected. It can be used for further sorting or subsampling.
octave: the octave (pyramid layer) from which the keypoint was extracted.
class_id: can be used to cluster keypoints by the object they belong to.

7 Classification algorithms
We tested many classification algorithms; the best one is DecisionTreeClassifier (similar to C4.5). The other algorithms used are:
GaussianNB (Naive Bayes)
BernoulliNB (Bernoulli Naive Bayes)
SVC (SVM)
RandomForestClassifier (Random Forest)
ExtraTreesClassifier (Extra Trees Classifier)
[1] The other algorithms did not perform as well. Gaussian NB and Bernoulli NB performed poorly because they are better suited to text classification. SVM took a long time to compute and often did not finish, so we decided to discard it. With RF and ETC we obtained accuracy comparable with DT, but often lower.
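A comparison of this kind can be sketched with scikit-learn as follows. The data here is synthetic (generated with `make_classification`), so the printed accuracies say nothing about the results in these slides; SVC is left out, as it was in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the keypoint dataset (assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Random Forest": RandomForestClassifier(n_estimators=10, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=10, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: {scores[name]:.2%}")
```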

8 Classification algorithms (2)
At first we tested the program with only three classification categories. In this case Naive Bayes showed accuracy comparable with Decision Tree. But when we added new categories, the performance of NB dropped drastically. With Random Forest we obtained fair results, but lower than DT.
[1] We started with only three categories to keep the whole system simple. The drastic drop in NB performance shows that this algorithm is completely inadequate for the purpose.

9 Parameters
The SIFT and ORB algorithms take some parameters as input:
nfeature, the number of best features to retain
nOctaveLayers, the number of layers in each octave
contrastThreshold, the contrast threshold used to filter out weak features
edgeThreshold, the threshold used to filter out edge-like features
sigma, the sigma of the Gaussian applied to the input image at octave 0
We specified only nfeature because the default values of the other parameters are already good.
[2] nfeature: the number of best features to retain.
nOctaveLayers: the number of layers in each octave. Searching for keypoints at multiple scales is done by constructing a so-called "Gaussian scale space", which is just a collection of images obtained by progressively smoothing the input image, analogous to gradually reducing the image resolution. Increasing the scale by an octave means doubling the size of the smoothing kernel, whose effect is roughly equivalent to halving the image resolution.
contrastThreshold: the contrast threshold used to filter out weak features.
edgeThreshold: the threshold used to filter out edge-like features. Note that its meaning is different from that of contrastThreshold: the larger the edgeThreshold, the fewer features are filtered out (more features are retained).
sigma: the sigma of the Gaussian applied to the input image at octave 0.

10 Parameters (2)
The SURF algorithm, instead, takes as input:
hessianThreshold, the threshold for the Hessian kp detector
nOctaves, the number of pyramid octaves the kp detector will use
nOctaveLayers, the number of octave layers within each octave
extended, the extended descriptor flag (true: use 128 descriptors; false: use 64 descriptors)
upright, the upright or rotated features flag (true: do not compute orientation; false: compute orientation)
As before, we changed only one parameter (hessianThreshold), leaving the others at their defaults.
[2] hessianThreshold: the threshold for the keypoint detector. Only features whose Hessian is larger than hessianThreshold are retained by the detector. Therefore, the larger the value, the fewer keypoints you will get.
extended: 0 means that the basic descriptors (64 elements each) are computed; 1 means that the extended descriptors (128 elements each) are computed.
upright: 0 means that the detector computes the orientation of each feature; 1 means that the orientation is not computed (which is much faster).

11 Parameters (3)
For the Decision Tree Classifier, the most important parameters are:
criterion, the measure of the quality of a split
splitter, the strategy used to choose the split at each node
max_depth, the maximum depth of the tree
min_samples_split, the minimum number of samples required to split an internal node
min_samples_leaf, the minimum number of samples required to be at a leaf node
Our program allows the user to enter a value for all of these parameters.
[2] criterion: the function used to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
splitter: the strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
max_depth: the maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Ignored if max_leaf_nodes is not None.
min_samples_split: the minimum number of samples required to split an internal node (default: 2).
min_samples_leaf: the minimum number of samples required to be at a leaf node (default: 1).
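The parameters listed above map directly onto the constructor of scikit-learn's `DecisionTreeClassifier`. A minimal sketch, assuming the bundled iris dataset as a stand-in for the keypoint records:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is used only so that the sketch is self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

tree = DecisionTreeClassifier(
    criterion="entropy",   # "gini" or "entropy"
    splitter="best",       # "best" or "random"
    max_depth=None,        # grow until the leaves are pure
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=0,
)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
print(f"depth={tree.get_depth()}, accuracy={accuracy:.2%}")
```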

12 Parameters (4)
If the user wants to use the Random Forest or Extra Trees Classifier, the program asks for:
n_estimators, the number of trees in the forest
criterion, the measure of the quality of a split
max_depth, the maximum depth of the tree
min_samples_split, the minimum number of samples required to split an internal node
min_samples_leaf, the minimum number of samples required to be at a leaf node
bootstrap, whether bootstrap samples are used when building trees
[2] RANDOM FOREST: in random forests (see the RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features.
n_estimators: the number of trees in the forest.
bootstrap: whether bootstrap samples are used when building trees.
EXTREMELY RANDOMIZED TREES: in extremely randomized trees (see the ExtraTreesClassifier and ExtraTreesRegressor classes), randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly generated thresholds is picked as the splitting rule.
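Both ensemble classifiers accept the same set of parameters in scikit-learn, so they can be configured from one shared dictionary. The sketch below uses the bundled iris dataset purely as a placeholder for the real records.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)  # placeholder data for illustration

shared = dict(
    n_estimators=10,       # number of trees in the forest
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    bootstrap=True,        # draw each tree's sample with replacement
    random_state=0,
)

forest = RandomForestClassifier(**shared).fit(X, y)
extra = ExtraTreesClassifier(**shared).fit(X, y)
print(len(forest.estimators_), len(extra.estimators_))
```

Note that scikit-learn's own default for `ExtraTreesClassifier` is `bootstrap=False`, so the value has to be passed explicitly if bootstrap sampling is wanted for both ensembles.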

13 Default Parameters
Feature Detection Algorithm
nfeature: 0, meaning that all possible features are detected (SIFT, ORB)
hessianThreshold: 100 (SURF)
Classification Algorithm
criterion: gini
max_depth: None
min_samples_split: 2
min_samples_leaf: 1
splitter: best (Decision Tree)
n_estimators: 10 (Random Forest, ETC)
bootstrap: True (Random Forest, ETC)
[1] After explaining all the modifiable parameters, we turn to their default values.
nfeature is the maximum number of features that can be extracted from each image of the dataset.
hessianThreshold is a threshold for the keypoint detector. Only features whose Hessian is larger than hessianThreshold are retained by the detector. Therefore, the larger the value, the fewer keypoints you will get. A good default value could be from 300 to 500, depending on the image contrast.
criterion is the function used to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
max_depth is the maximum depth of the tree. If None, the program lets nodes expand until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Ignored if max_leaf_nodes is not None.
min_samples_split is the minimum number of samples required to split an internal node.
min_samples_leaf is the minimum number of samples required to be at a leaf node.
splitter is the strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
n_estimators is the number of trees in the forest.
bootstrap states whether bootstrap samples are used when building trees.

14 Performance
At first we used only three categories: cars, cats and chairs. For each category we downloaded 50 training images and 10 testing images from Google, using the software provided by Damioli's group.
Classifier/Detector SIFT SURF ORB
Decision Tree Classifier (default) 96.67% 100.00%
Naive Bayes 93.33% 70.00%
Bernoulli Naive Bayes 66.67% 60.00%
Random Forest (default)
Extra Tree Classifier (default) 73.33% 76.67%
[1] Dataset size: SIFT -> 50 MB, SURF -> 180 MB, ORB -> 50 MB.
We made this choice to test the behavior of the classifiers with very different kinds of categories. It is easy to see that BNB quickly proved to be inadequate. Keep in mind that results may vary slightly between runs of the software because of how the classification algorithms are implemented.

15 Performance (2)
Then we added other categories to test the recognition of similar objects: dogs, doors and tables.
Classifier/Detector SIFT SURF ORB
Decision Tree Classifier (default) 75.00% 86.67% 78.33%
Naive Bayes 55.00% 33.33%
Bernoulli Naive Bayes 51.67% 35.00%
Random Forest (default) 65.00% 73.33% 61.67%
Extra Tree Classifier (default)
[2] Dataset size: SIFT -> 93 MB, SURF -> 335 MB, ORB -> 93 MB. [UPDATED TABLE]

16 Performance (3)
Because the best results were achieved by DT and RF, we decided to modify some parameters of these algorithms.
Decision Tree [%] (default accuracy = 75%, 88.3%, 76.7%)
criterion splitter max_depth min_split min_leaf SIFT SURF ORB
entropy best None 2 1 80.0 90.0 78.3
gini 5 71.7 86.7 70.0 73.3 75.0 4 76.7
Random Forest [%] (default accuracy = 61.7%, 71.7%, 66.7%)
n_est crit. max_dep. min_split min_leaf b.strap SIFT SURF ORB
10 entropy None 2 1 True 61.7 70.0 63.3
20 68.3 66.7 gini 75.0 50 65.0
[2]

17 Performance (7)
We added 4 more classes (bottles, shoes, flowers, pineapples).
Decision Tree [%] (default accuracy = 61%, 87%, 60%)
criterion splitter max_depth min_split min_leaf SIFT SURF ORB
entropy best None 2 1 62.0 82.0 58.0
gini 5 57.0 81.0 59.0 79.0 60.0 4 63.0 86.0 61.0
Random Forest [%] (default accuracy = 57%, 59%, 48%)
n_est crit. max_dep. min_split min_leaf b.strap SIFT SURF ORB
10 entropy None 2 1 True 51.0 60.0 50.0
20 52.0 64.0 49.0 gini 50 46.0 69.0 47.0
[1] Dataset size: SIFT -> 160 MB, SURF -> 586 MB, ORB -> 197 MB.
At this point we had 6 categories, so we decided to add 4 more classes, because we think 10 categories is a reasonable number for understanding how the classifiers perform. We made the same changes described before by my colleague: we modified the main parameters of the two classification algorithms that work better than the others. After adding the 4 classes, performance decreased.

18 Performance (11)
To keep things simple we used the default parameters, because we did not observe a significant performance improvement. As a reminder:
Decision Tree SIFT: 58% SURF: 87% ORB: 60%
Random Forest SIFT: 57% SURF: 59% ORB: 48%
To reduce spatial and computational complexity, we decided to try modifying the parameters of the feature detection algorithms as well.
Analysis of accuracy from default to modified parameters
SIFT (nfeature) SURF (hessianT.) ORB (nfeature)
100 150 200 250 50 300 400
DT[%] 63 69 68 86 77 74 70 72 71
RF[%] 67 62 59 61 58
[1] Using the defaults simplifies the use of the program for the user.
[DECISION TREE] Default parameters: criterion = Gini / split = best / max_depth = None / min_split = 2 / min_samples_leaf =
Detector parameters: SIFT = 0, ORB = 0, SURF -> HessianT = 100
[RANDOM FOREST] Default parameters: n_estimator = 10 / criterion = Gini / split = best / max_depth = None / min_split = 2 / min_samples_leaf = 1 / bootstrap = true (the examples used for boosting can be used in training)
Detector parameters: SIFT = 0, ORB = 0, SURF -> HessianT = 100
Dataset size: SIFT -> x, 45, 57, 68 MB; SURF -> 700, 463, 388, 337 MB; ORB -> x MB (similar to the SIFT size)

19 Preprocessing
Given these results, we tried to improve them further by applying some preprocessing to our dataset:
Image resizing
Low pass filtering
Image background removal
Feature standardization
Feature normalization
[2]

20 Image resizing
The software of Damioli's group resizes the downloaded images in order to reduce spatial and computational complexity, so we decided to implement resizing as well. As a result, the whole dataset contains only images that are 500 px wide, with their original aspect ratio preserved. [2]
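The target size for an aspect-preserving resize to a width of 500 px follows from a single scale factor. The helper below is a hypothetical sketch of that arithmetic, not the project's code.

```python
def resized_dimensions(width, height, target_width=500):
    """Return (new_width, new_height) for the target width, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width
    # Round to whole pixels and never let the height collapse to zero.
    return target_width, max(1, round(height * scale))


print(resized_dimensions(1000, 800))
print(resized_dimensions(250, 100))
```

The resulting pair is in (width, height) order, which is also the order OpenCV's `cv2.resize` expects for its `dsize` argument.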

21 Low pass filtering
We opted for these two low pass filters in order to remove some noise from the dataset images:
Gaussian filter
Bilateral filter
[2] Gaussian filter: blurs an image using a Gaussian filter.
ksize: the Gaussian kernel size. ksize.width and ksize.height can differ, but they must both be positive and odd. Alternatively, they can be zero, in which case they are computed from the sigma values.
sigmaX: the Gaussian kernel standard deviation in the X direction.
sigmaY: the Gaussian kernel standard deviation in the Y direction. If sigmaY is zero, it is set equal to sigmaX; if both sigmas are zero, they are computed from ksize.width and ksize.height respectively (see getGaussianKernel() for details). To fully control the result regardless of possible future modifications of these semantics, it is recommended to specify all of ksize, sigmaX and sigmaY.
Bilateral filter: bilateralFilter can reduce unwanted noise very well while keeping edges fairly sharp. However, it is very slow compared to most filters.
Sigma values: for simplicity, you can set the two sigma values to be the same. If they are small (< 10), the filter will not have much effect, whereas if they are large (> 150), they will have a very strong effect, making the image look "cartoonish".
Filter size: large filters (d > 5) are very slow, so it is recommended to use d = 5 for real-time applications, and perhaps d = 9 for offline applications that need heavy noise filtering.
Gaussian, SIFT 150 -> 63
Gaussian, SIFT 200 -> 68
Bilateral, SIFT 200 -> 63
Bilateral, SURF 200 -> 76

22 Low pass filtering (2)
In both cases the performance we obtained got worse.
Gaussian
SIFT [nfeature = 150]: 70% -> 63%
SIFT [nfeature = 200]: 72% -> 68%
Bilateral
SIFT [nfeature = 200]: 72% -> 63%
SURF [hessianT. = 200]: 83% -> 76%
As a consequence, we decided to avoid low pass filtering during preprocessing. [2]

23 Image Background Removal
A further step we considered introducing in the preprocessing phase is background removal. This step might be useful if the background is well separated from the subject of the image (the category). We observed that performance decreases, because the algorithm that provides this functionality depends on a threshold value that is strongly related to the content of the image itself.
[1] The first reason why we did not use this step is that it is only useful if the background is well separated from the subject of the image, and this is not always true for the dataset images. The second reason is that the algorithm depends on a threshold value strongly related to the image content. In some cases the background removal algorithm worked well and recognized the background; in other cases it obscured or damaged the content of the images (for example, it filled the face of a cat with black blobs).

24 Feature Standardization
In general, machine learning estimators might behave badly if the individual features do not look more or less like standard normally distributed data (Gaussian with zero mean and unit variance). However, in our case, we achieved worse performance.
[1] In general, machine learning libraries make feature standardization and normalization available to improve performance. The first technique gives all the attributes of the dataset zero mean and unit variance. In many cases using standardized attributes can improve accuracy; however, we obtained:
SIFT 150 -> 61 instead of 70
SIFT 200 -> 50 instead of 72
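Standardization to zero mean and unit variance can be sketched with scikit-learn's `StandardScaler`. The tiny matrix below is made-up data chosen only to show two columns on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two attributes on very different scales (made-up values).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

X_std = StandardScaler().fit_transform(X)

print(X_std.mean(axis=0))  # each column mean is (numerically) zero
print(X_std.std(axis=0))   # each column standard deviation is one
```

In a real pipeline the scaler would be fitted on the training records only and then applied unchanged to the test records.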

25 Feature Normalization
Normalization is the process of scaling individual samples to have unit norm. This process can be useful if you plan to use a quadratic form such as the dot product or any other kernel to quantify the similarity of any pair of samples. As with standardization, this technique is commonly used to increase the performance of classification algorithms. Despite that, we observed a performance decrease in this case too. [1]
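Unlike standardization, which works column by column, normalization rescales each sample (row) on its own. A minimal sketch with scikit-learn's `normalize`, using made-up rows and the L2 norm as an assumed choice:

```python
import numpy as np
from sklearn.preprocessing import normalize

# Each row is one sample (made-up values).
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

X_norm = normalize(X, norm="l2")  # divide every row by its Euclidean length
row_norms = np.linalg.norm(X_norm, axis=1)

print(X_norm)
print(row_norms)
```

The first and third rows point in the same direction, so they become identical after normalization: only the direction of a sample survives, not its magnitude.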

26 Feature Selection
Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step for an estimator. We used SelectKBest(f_classif, k) to select the k highest-scoring features from our dataset records. We tried different k values, 50, 100 and 130, scoring respectively 23%, 44% and 65% accuracy against the default 70% with SIFT (nfeature = 150).
[1] The last machine learning preprocessing technique we tried is feature selection, applied during the creation of the model. In particular we tested univariate feature selection. The results were disastrous. Increasing the value of k also increases the accuracy, but to achieve the best performance we have to use the whole set of attributes (134).
50 attributes -> 23%
100 attributes -> 44%
130 attributes -> 65%
(SIFT, nfeature = 150)
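The `SelectKBest(f_classif, k)` call can be sketched as follows for the three k values that were tried. The 134-column matrix is synthetic (generated with `make_classification`), so it only mirrors the shape of the real records, not their content.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data with 134 attributes, matching only the record width.
X, y = make_classification(n_samples=300, n_features=134, n_informative=20,
                           n_classes=3, random_state=0)

shapes = {}
for k in (50, 100, 130):
    selector = SelectKBest(f_classif, k=k)      # keep the k highest ANOVA F-scores
    X_reduced = selector.fit_transform(X, y)
    shapes[k] = X_reduced.shape
    print(k, X_reduced.shape)
```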

27 Conclusions and Results
In the end, the best results we achieved are:
SIFT: nfeature = 200, Decision Tree, accuracy ~72%
SURF: hessianThreshold = 100, accuracy ~87%
ORB: nfeature = 200, Decision Tree, accuracy ~72%
[2]

28 Results of SIFT
[2] recall = TP / (TP + FN)
precision = TP / (TP + FP)
The F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct positive results divided by the number of all positive results, and r is the number of correct positive results divided by the number of positive results that should have been returned. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.
F1 = 2 * (precision * recall) / (precision + recall)
We have a great many categories. With only 3, for example, the 100% would be spread over 3 classes, so the correctly classified one would get a significant percentage -> high precision and recall. Introducing many classes means spreading the 100% over many classes -> for the same correct prediction, the percentage of membership in the correct class drops -> precision and recall drop, even though performance is good.
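The three formulas can be checked with a few lines of Python. The counts in the example are made up for illustration and are not taken from the project's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Made-up counts: 8 correct detections, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With these counts the precision is 8/10, the recall is 8/12, and F1 works out to 16/22, which lies between the two and closer to the lower one, as expected of a harmonic mean.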

29 Results of SIFT (2) [2]

30 Results of SIFT (3) How SIFT performs in bottles classification:
How SIFT performs in cars classification: [2]

31 Results of SURF [1] Precision and recall values are higher than SIFT

32 Results of SURF (2) [1]

33 Results of SURF (3) How SURF performs in bottles classification:
How SURF performs in cars classification: [1]

34 Results of ORB [1] Precision and recall values are comparable with SIFT

35 Results of ORB (2) [1]

36 Results of ORB (3) How ORB performs in bottles classification:
How ORB performs in cars classification: [1]

37 Results of INTERACTIVE MODE
Input files: dogs_test.jpg, cars_test.jpg
SIFT (nfeature = 200):
SURF (hessianThreshold = 100):
ORB (nfeature = 200):
[2] The results in the previous slides came from batch mode, which takes as input all the images in the test set directory and tries to classify them. This slide reports an example of the results of interactive mode, which asks the user for an image path and then tries to classify the image as one of the categories described before. The two test images we used show a dog and a car. SIFT classifies both of them correctly. With SURF, instead, the dog is misclassified as a cat. ORB, like SIFT, classifies both images correctly.
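The slides do not say how the per-keypoint predictions of one image are turned into a single result with a probability per category. One plausible scheme, shown here purely as an assumption, is to treat each keypoint's predicted label as a vote and report the vote shares; the function name and the sample votes are hypothetical.

```python
def image_probabilities(keypoint_predictions, categories):
    """Turn per-keypoint predicted labels into per-category vote shares (assumed scheme)."""
    if not keypoint_predictions:
        raise ValueError("the image produced no keypoints to classify")
    counts = {category: 0 for category in categories}
    for label in keypoint_predictions:
        counts[label] += 1
    total = len(keypoint_predictions)
    return {category: counts[category] / total for category in categories}


# Hypothetical votes from four keypoints of one image.
probs = image_probabilities(["dogs", "dogs", "cats", "dogs"],
                            ["cars", "cats", "dogs"])
best = max(probs, key=probs.get)
print(best, probs)
```

Under this scheme the shares always sum to 1, so adding categories tends to lower the share of the winning class even when the final decision is still correct.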

38 THANKS FOR YOUR ATTENTION

