Download presentation
Presentation is loading. Please wait.
Published byLouisa Davis Modified over 9 years ago
1
Recognition Template Matching Statistical Pattern Recognition (SPR)
Chamfer Matching Statistical Pattern Recognition (SPR) Robust object recognition using a cascade of Haar classifiers (Haar) Performance Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
2
Template Matching - Topics
Applications Matching criteria Use of Fourier space Use of chamfering Control strategies Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
3
Template – Applications for template matching
Searching Locating objects We can address a range of problems: - Searching (Locating a specific subimage) - Recognition (where we have images of what we want to recognise, and can pick the ones which match best) - Visual Inpsection (where we may need to inspect the visual appearance of a product. However the product presentation may vary slightly so we need to align a golden template with each product and evaluate any differences) - Matching (Stereo vision – left to right frame using very small sub-images from the left frame, Tracking where we assume that the appearance of an object being tracked will change slowly from frame to frame) Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
4
Template – Applications for template matching
Recognition We can address a range of problems: - Searching (Locating a specific subimage) - Recognition (where we have images of what we want to recognise, and can pick the ones which match best) - Visual Inpsection (where we may need to inspect the visual appearance of a product. However the product presentation may vary slightly so we need to align a golden template with each product and evaluate any differences) - Matching (Stereo vision – left to right frame using very small sub-images from the left frame, Tracking where we assume that the appearance of an object being tracked will change slowly from frame to frame) Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
5
Template – Applications for template matching
Visual inspection Golden template matching Matching Stereo Vision Tracking We can address a range of problems: - Searching (Locating a specific subimage) - Recognition (where we have images of what we want to recognise, and can pick the ones which match best) - Visual Inpsection (where we may need to inspect the visual appearance of a product. However the product presentation may vary slightly so we need to align a golden template with each product and evaluate any differences) - Matching (Stereo vision – left to right frame using very small sub-images from the left frame, Tracking where we assume that the appearance of an object being tracked will change slowly from frame to frame) Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
6
Template – Matching Algorithm
Basic Algorithm Inputs – Image & Object For every possible position of the object in the image Evaluate a match criterion Search for local maxima of the match criterion above some threshold Problems ‘Every possible position’? ‘Match criterion’ ‘Local maxima above some threshold’ Our inputs are an image (in which we want to search) and an object (generally a smaller ‘template’ image containing a picture of whatever we are searching for). Every possible position may mean every location AND every possible rotation AND every possible scale. This explosion in degree of freedom can make the computational complexity get out of hand. Generally we try to restrict these degrees of freedom (e.g. normalising the size and orientation of characters for license plate recognition). The ‘matching criterion’ needs to be defined. This criterion may have to deal with problems of noise, partial occlusion, geometric distortion, etc. so an exact match cannot be expected. Carrying on from the previous point, the local maxima must be above some threshold. However pick the threshold too high and you miss some objects, but make it too low and you get false positives. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
7
Template – Matching criteria
In terms of gernal matching criteria it is possible to extend the idea of Euclidean distance to the notion of distance between two images – where the number of dimensions is the number of image points being compared. A simpler way to do the comparison is to just sum the absolute differences. Both of these methods result in number with maximum values that depend on the images (size and contents). It is possible to derive formulas for cross correlation and normalised (0-1) cross correlation based on Euclidean distance. Matching criteria for template matching. The sum of absolute differences (top), the Euclidean distance (middle) and the normalized cross correlation (bottom). Note that the computations are done over the entire image, but the template is only defined in a small section of the image the location of which is determined by (m,n). Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
8
Template – Matching criteria example
Simple example Matching criteria Boundaries… Real example Uses normalised cross correlation. The simple example shows a binary image and a binary template (with the center of the template marked with an X). Putting the template in every position in the image (where it is fully in the image- What should we really do at the boundaries?) we can computer a degree of fit. In this case we have just computed the difference (so lower is better). The demonstration is intended to be similar to figure 5.55 in the text. TIPS uses normalised cross correlation for it matching criteria. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
9
Template Matching in OpenCV
Mat matching_space; matching_space.create( search_image.cols–template_image.cols+1, search_image.rows–template_image.rows+1, CV_32FC1 ); matchTemplate( search_image, template_image, matching_space, CV_TM_CCORR_NORMED ); // Other measures: CV_TM_CCORR, CV_TM_SQDIFF, // CV_TM_SQDIFF_NORMED In terms of gernal matching criteria it is possible to extend the idea of Euclidean distance to the notion of distance between two images – where the number of dimensions is the number of image points being compared. A simpler way to do the comparison is to just sum the absolute differences. Both of these methods result in number with maximum values that depend on the images (size and contents). It is possible to derive formulas for cross correlation and normalised (0-1) cross correlation based on Euclidean distance. Matching criteria for template matching. The sum of absolute differences (top), the Euclidean distance (middle) and the normalized cross correlation (bottom). Note that the computations are done over the entire image, but the template is only defined in a small section of the image the location of which is determined by (m,n). Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
10
Template – Finding Local Maxima in OpenCV
Local maxima – Dilate + Look for unchanged values + Threshold Local minima – Erode + Look for unchanged values + Threshold Mat dilated, thresholded_matching_space, local_maxima, thresholded_8bit; dilate( matching_space, dilated, Mat()); compare( matching_space, dilated, local_maxima, CMP_EQ ); threshold( matching_space, thresholded_matching_space, threshold, 255, THRESH_BINARY ); thresholded_matching_space.convertTo( thresholded_8bit, CV_8U ); bitwise_and( local_maxima,thresholded_8bit,local_maxima ); In terms of gernal matching criteria it is possible to extend the idea of Euclidean distance to the notion of distance between two images – where the number of dimensions is the number of image points being compared. A simpler way to do the comparison is to just sum the absolute differences. Both of these methods result in number with maximum values that depend on the images (size and contents). It is possible to derive formulas for cross correlation and normalised (0-1) cross correlation based on Euclidean distance. Matching criteria for template matching. The sum of absolute differences (top), the Euclidean distance (middle) and the normalized cross correlation (bottom). Note that the computations are done over the entire image, but the template is only defined in a small section of the image the location of which is determined by (m,n). Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
11
Template – Control Strategies for Matching
Goal: Localise close copies Size, orientation Geometric distortion Use an image hierarchy Low resolution first Limit higher resolution search Search higher probability locations first Known / learnt likelihood From lower resolution Localise close copies Size, orientation Must match Geometric distortion Must be small Multiple templates vs. Geometric transforms To match rotated, expanded, shrinked No difference Global vs. local template matching Object consists of parts with rubber links between them Rubber links minimum force to match Possible matching strategy Find partial matches as in local template Heuristic graph construction to find the best combination Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
12
Template – Chamfer matching
Template matching requires very close matches Objects often appear very slightly different Orientation Noise Sampling We want a more flexible approach We may not have covered edge detection yet – so this may have to wait or alternatively give a bit of an early intro. The chamfered image gives the distance of any point from any edge. Simple algorithm in test allows this to be computed in two passes (algorithm 2.1 – p.28). Template matching can then be used to find best fit of some boundary image (template) just by summing the values in the chamfered image. Low values good, high values bad. Demo… Shape -> binary edges -> chamfer Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
13
Template – Chamfering Compute chamfered image for a binary edge image.
For every point If (edge point) Set c(i, j) = 0 Else Set c(i, j) = ∞ For j = min to max For i = min to max c(i, j) = minqAL [distance( (i, j), q ) + f( q )] For j = max to min For i = max to min c(i, j) = minqBR [distance( (i, j), q ) + f( q )] Chamfering is achieved in two passes. One which looks at points AL Above and to the Left of the current point and the other (which looks at the image bottom to top, right to left considers points BR (Botteom and to the Right)… Canny( gray_image, edge_image, 100, 200, 3); threshold(edge_image,edge_image,127,255,THRESH_BINARY_INV); distanceTransform(edge_image,chamfer_image,CV_DIST_L2,3); Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
14
Template – Chamfer matching
Chamfering is achieved in two passes. One which looks at points AL Above and to the Left of the current point and the other (which looks at the image bottom to top, right to left considers points BR (Botteom and to the Right)… Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
15
Template – Chamfering for matching
Compare with boundary template Sum of chamfer values for boundary points Low values best We may not have covered edge detection yet – so this may have to wait or alternatively give a bit of an early intro. The chamfered image gives the distance of any point from any edge. Simple algorithm in test allows this to be computed in two passes (algorithm 2.1 – p.28). Template matching can then be used to find best fit of some boundary image (template) just by summing the values in the chamfered image. Low values good, high values bad. Demo… Shape -> binary edges -> chamfer Chamfer matching is a simple routine to write (provided in the text); Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
16
SPR – Statistical Pattern Recognition
Probability Review Features Classification The probability of an event A can be defined as P(A) = limn N(A) / n where N(A) is the number of times that event A occurs in n trials. If there are two events A and B then these events can either be independent in which case the probability of both A and B occurring is: P(AB) = P(A)P(B) (e.g. A = “today is Thursday” and B = “it is snowing”) or can be dependent in which case the likelihood of A is dependent on whether or not B has occurred: P(AB) = P(A|B)P(B) (e.g. A = “there are clouds in the sky” and B = “it is raining”) where P(A|B) is referred to as the conditional probability of A occurring if B has occurred. From a point of view of statistical pattern recognition we are interested in events such as an unknown object being of a class Wi in the presence of some evidence (features) x. Through training we can determine the p(x | Wi) and we can determine the relative probabilities of each object class occurring p(Wi). The probability p(x | Wi) is the a-priori probability that feature value x will occur given an object of class Wi is viewed. We are often more interested in the a-posteriori probability – what is the probability that we have located an object of class Wi if feature value x has been observed. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
17
SPR – Probabitity Review – Basics
P(A) = limn N(A) / n Probability of two events A and B: Independent: P(AB) = P(A)P(B) Dependent: P(AB) = P(A|B)P(B) Conditional Probability P(A|B) Typical problem: Given some evidence x from an unknown object. What class Wi is the object? Training A-priori probability – p(x | Wi) Relative probability – p(Wi) A-posteriori probability – p(Wi | x) The probability of an event A can be defined as P(A) = limn N(A) / n where N(A) is the number of times that event A occurs in n trials. If there are two events A and B then these events can either be independent in which case the probability of both A and B occurring is: P(AB) = P(A)P(B) (e.g. A = “today is Thursday” and B = “it is snowing”) or can be dependent in which case the likelihood of A is dependent on whether or not B has occurred: P(AB) = P(A|B)P(B) (e.g. A = “there are clouds in the sky” and B = “it is raining”) where P(A|B) is referred to as the conditional probability of A occurring if B has occurred. From a point of view of statistical pattern recognition we are interested in events such as an unknown object being of a class Wi in the presence of some evidence (features) x. Through training we can determine the p(x | Wi) and we can determine the relative probabilities of each object class occurring p(Wi). The probability p(x | Wi) is the a-priori probability that feature value x will occur given an object of class Wi is viewed. We are often more interested in the a-posteriori probability – what is the probability that we have located an object of class Wi if feature value x has been observed. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
18
SPR – Probabitity Review – Bayes Theorem
For two classes A and B the a-posteriori probability is: P(B|A) = P(A|B)P(B) / P(A) Where Wi forms a partitioning of the event space: p(Wi | x) = _p(x | Wi)P(Wi)__ j p(x | Wj)P(Wj) For two classes A and B the a-posteriori probability is: P(B|A) = P(A|B)P(B) / P(A) Where Wi forms a partitioning of the event space: p(Wi | x) = __p(x | Wi)P(Wi)__ j p(x | Wj)P(Wj) Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
19
SPR – Probability Density Functions
Given a class (Wi) and an event (x) Determine the probability of any particular value occurring… Given any event which can have multiple outcomes we can develop a probability density function for the likelihood of a particular value occurring for the event. For example if considering a feature x when viewing an objects of class W1 and W2 then we will be able to compute separate probability density functions for each class. See Figure 10‑2. Figure 10‑2 The probability density functions P(W|x) for two object classes (W1 and W2) for a single feature x. Note that the two classes are not separable (as their PDFs overlap for the feature) and hence (assuming that only this feature is being used) there will be some misclassification. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
20
SPR – Features: Area Count the points
Location dependent? Determine from n polygon vertices vector<vector<Point>> contours; vector<Vec4i> hierarchy; findContours( binary_image, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_NONE); for (int contour=0; (contour>=0); contour=hierarchy[contour][0]) { double area = contourArea(contours[contour]) + contours[contour].size()/2 + 1; for (int hole=hierarchy[contour][2]; (hole>=0); hole=hierarchy[hole][0]) area -= ( contourArea(contours[hole]) – contours[hole].size()/2 + 1 ); } Looking at an image counting the pixels may be proportional to determining the area of a region, but in the case of many images this is not the case. For example in satellite images the area depend upon where in the image the point is (earth’s curvature, etc.). We can calculate area in a quadtree representation quickly by using the level to indicate the area. The determination from polygon vertices is interesting as it is so simple. Best way to explain it is to try it out on a square with coordinates (1,1), (1,5), (10,5), (10,1). This will give 0.5 * | (5-1)+(5-45)+(5-50)+(10-1) | = 05. * | | =0.5*72=36. What is happening is that each pair is being considered as two square with regard to the origin (one of which is added the other subtracted). Nice algorithm. The determination for a BCC is kept simple by only considering 4 connectedness. If moving horizontally area is added or subracted. If moving vertically only the vertical_position is updated. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
21
SPR – Features: Elongatedness
CANNOT BE the ratio of the length and width of the minimum bounding rectangle Ratio of region area divided by the square of it’s thickness Erosion Sonka provides an efficient method for determining the minimum bounding rectangle from the boundary points given its orientation. Before that though he says that this won’t work for curved regions! – such as that in (b) Elongatedness can (should) instead be defined as the ratio of the region area divided by the square of it’s thickness. It’s thickness can be computed by erosion. In the formula shown d is the number of iterations of erosion before the region disappears completely. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
22
SPR – Features: Minimum Bounding Rectangle
Turn rectangle through discrete steps Only one quadrant Metrics: Length to Width ratio Length / Width Rectangularity Area / (Length * Width) Convex Hull to Minimum Bounding Rectangle area ratio Area inside convex hull / (Length * Width) Fk is the ratio of the region area and the area of the bounding rectangle. The rectangle is considered in all possible orientations – which actually means turning it through just one quadrant – This effectively means we are searching for the minimum bounding rectangle by considering all possible orientations. The range of values is (0,1] with 1 representing a perfect rectangle. As an alternative which is faster to calculate just use the bounding rectangle in the “Direction” orientation. Direction is dealt with on the next slide. Note that this will not always be the minimum bounding rectangle but will often be. It should give consistent results in any case. RotatedRect min_bounding_rectangle = minAreaRect(contours[contour_number]); Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
23
SPR – Features: Convex hull
Smallest convex region Algorithm Start with Any point on convex hull Previous vector direction Search all other boundary points Find point with least angle to previous vector Switch to new point and vector Go to 2 unless new point = start point. The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
24
SPR – Features: Convex hull
vector<vector<Point>> hulls(contours.size()); for (int contour=0; (contour<contours.size()); contour++) { convexHull(contours[contour], hulls[contour]); } The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
25
SPR – Features: Concavities and Holes
Identify concavities from the convex hull Identify holes in the binary shape The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
26
SPR – Features: Concavities and Holes
vector<vector<int>> hull_indices(contours.size()); vector<vector<Vec4i>> convexity_defects(contours.size()); for (int contour=0; (contour<contours.size()); contour++) { convexHull( contours[contour], hull_indices[contour] ); convexityDefects( contours[contour], hull_indices[contour], convexity_defects[contour]); } The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
27
SPR – Features: Perimeter Length and Circularity
Approximately the number of boundary elements Should really take into account direction Circularity = Perimeter length / (4*π*Area) contours[contour].size() The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
28
SPR – Features: Moments & Moment invariants
Moments measure the distribution of shape Central moments: Scale invariant moments: The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
29
SPR – Features: Moment invariants
Moment invariants / Hu moments: The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
30
SPR – Features: Moments and Moment invariants
Moments contour_moments; double hu_moments[7]; contour_moments = moments( contours[contour] ); HuMoments( contour_moments, hu_moments ); The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
31
SPR – Classification – Introduction
Object recognition Classes (w1, w2, … wR) Classifier Input Pattern / features (x1, x2, … xn) Feature space Choosing the features (Example) Clusters in feature space Separability Hyper-surfaces Linear separability Inseperable classes Classifiers Minimum Distance Classifier Linear Classifier Probabilistic Classifier We are addressing a problem of object recognition amongst a number of possible (known) objects. There are R possibilities. We should be considering the possibility that the object is not one we have seen before but this is often ignored. To recognise the object we consider information derived from the unknown instances. This information takes the form of features which are considered a “pattern”. So object recognition and pattern recognition are synonymous. We map the features into a feature space and if the features have been “appropriately chosen” the classes should for clusters in feature space. Ideally these classes should be seperable and we should be able to locate hyper-surfaces between them (see diagram). If the hyper surfaces are planar then the classes are linearly separable (see diagram). Unfortunately for most vision tasks the classes cannot be completely reliably seperated (see diagram of very inseperable classes) and hence some misclassification will result. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
32
SPR – Minimum distance classifier
Each class represented by an exemplar For an unknown object Determine the distance to the exemplars Pick the class with the smallest distance Unknown class? Distance must be less than some threshold Advantages Training Computational This approach has significant advantages in computational terms over more complex classifiers. Exemplars are sample patterns. Each sample pattern is associated with some class (they need not be distinct). In other words a class may have many exemplars and these may map to disjoint subsets of feature space. To classify an object just pick the examplar which is closet to the unknown feature vector x. In the example the nearest class exemplar is BLUE although arguably GREEN is closer. Also note that perhaps this should really be getting an UNKNOWN classification (something usually not considered). Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
33
SPR – Linear classifier
Linear discrimination functions gr(x) = qro + qr1x1 + … + qrnxn Decision rule wr = d(x) gr(x) gs(x) Unknown class gr(x) > threshold Decision rule (decide on what class given a feature vector) - wr = d(x) Decision rule divides the feature space into R disjoint subsets Kr , r= 1, … , R It could easily be more or less In the example shown the space is divided in two and there are two classes. The hyper-surfaces are the borders between the subsets of feature space. For each class we can define a discrimination function gr(x) for which gr(x) gs(x) for all values of s for any point with Kr Basically the discrimination function which has the highest value defines which class gets selected. The hyper-surface between any two subsets of feature space is defined as gr(x) - gs(x) = 0 A linear discrimination function is the simplest form (think about each q being 1) If all the discrimination functions are linear (i.e. for every class) then it is a linear classifier. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
34
SPR – Probabilistic classifier
Optimal classifier Based on Probability Density Functions Remove normalisation from Bayes rule… Resultant Discrimination function: Mean Loss Function: Advantages / Disadvantages Accuracy Extensive training A classifier based on dicrimination functions is a deterministic machine. Any particular pattern will always be classified into the same class. However errors of misclassification can occur and to minimise these the setting of the optimal classifier should be probablistic – according to some criteria for loss. The optimal function is the posterior probability and (removing the normalisation) this give the optimal discrimination functions for each class. (Remember d(x) returns wr where g(xr) > g(xs) for all s!=r J(q) is the mean loss w = d(x,q) is a decision rule for selecting a class based on a pattern x and an ordering of decision rules as defined by a vector q. The vector of optimal parameters gives the minimum mean loss. [wr |ws] gives the loss incurred if a pattern which should be classified as ws is instead classified as wr p(x|ws) is the probability of the pattern x given the class ws P(ws) is the probability of occurrence of the class ws To develop a classifier based on minimum loss we Choose a fixed loss of 1 for any misclassification and a loss of 0 for a correct classification. Set the discrimination function to a measure of the posteriori probability Note that the posteriori probability is defined using Bayes theorem. In the text the demoninator is p(x) which is describe as the mixture density. In the slide this is replaced with the real thing (as it is pretty simple). The two pictures should how the hyperplane (green line) moves as the relative probability of each class is changed. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
35
SPR – Classifier learning
Training set Must be representative Must be inductive Training set size Training set provides the unknown statistical information Size will typically have to be increased several times Sample learning strategies Supervised: Probability density estimation – estimating p(x|wr) & P(wr) Training set includes class specification for every instance Unsupervised: Cluster Analysis Look for similarities in feature space Not covering section or Just considering a general overview of this problem. The quality of the classifier will depend upon the quality and size of the training set. This set must be representative of all possible presentations of all possible objects (i.e. the set must be inductive) – but effecitvely allowing the classifier to correctly recognise presentations of objects that it has never seen before (the book says “recognise even those objects that it has never ‘seen’ before” but this is impossible!). The training is a substitute for the statistical information (probabilities) about the patterns. Unfortunately this statistical information really defines the size of the training set so it’s a bit of a chicken and egg problem. So the size of the training set has to be increased until the discriminant functions are estimated accurately enough. We are not going to go into the details of the learning strategies or cluster analysis (which is really a learning strategy). Instead just comment on Probability density estimation and Cluster analysis. Cluster analysis is a type of unsupervised learning in which the classes are not presented. The system infers the existence of classes based on the similarity of instances (and the dissimilarities with other instances). For example we could look for peaks in (continuous) feature space to indicate the presence of classes. This is the first time that learning has been introduced so make a big deal about it!!! Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
36
SPR: Real example The convex hull is a the smallest region H which is convex and within which R fits. This representation can be used to derive other shape representations such as number of concavities, etc. One such tree is shown in the text. The algorithm presented here is to aid understanding of that which is presented in the book. It is not complete (as is the case with most algorithms that I show on PP slides – the one in the text is the important one). The starting point (P1) in the text is the uppermost, leftmost point and the previous vector (this is not the real vector but is fine for starting the algorithm) is effectively horizontal to the left. We then search for Pq in the diagram above – the boundary point which gives the minimum angle in an anti-clockwise direction. (Note that the diagram is not showing the starting point – that would be at the very top). Move to Pq Keep going until all the way around. Note that this is not an efficient algorithm but it does give an intuitive sense of what we are trying to do. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
37
Haar – Robust Object Detection using a cascade of classifiers.
Features Efficient calculation Training Weak & Strong Classifiers Adaboost Recognition This technique for object detection learns to identify objects (e.g. faces) based on a number of positive and negative samples. It uses only simple features (which are reminiscent of Haar like basis functions – in 1-D a positive square wave followed by an equal but opposite negative one) to decide if a sub-image contains the object in question. It selects a large number of these features during training and creates classifiers with them which provide an accept/reject response. The classifiers are organized into a cascade (i.e. sequentially) where if the sub-image is rejected by any classifier in the cascade it is not processed any further. This has significant computational advantages as most sub-images are not processed past one or two classifiers. The system is trained using objects at a particular scale in a standard size sub-image, but the classifiers are designed so they are easily resized allowing this object detection technique to be applied at different scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
38
Haar – In OpenCV. CascadeClassifier cascade; if( !cascade.load(
"haarcascades/haarcascade_frontalface_alt.xml" ) { vector<Rect> faces; equalizeHist( gray_image, gray_image ); cascade.detectMultiScale( gray_image, faces, 1.1, 2, CV_HAAR_SCALE_IMAGE, Size(30, 30) ); } This technique for object detection learns to identify objects (e.g. faces) based on a number of positive and negative samples. It uses only simple features (which are reminiscent of Haar like basis functions – in 1-D a positive square wave followed by an equal but opposite negative one) to decide if a sub-image contains the object in question. It selects a large number of these features during training and creates classifiers with them which provide an accept/reject response. The classifiers are organized into a cascade (i.e. sequentially) where if the sub-image is rejected by any classifier in the cascade it is not processed any further. This has significant computational advantages as most sub-images are not processed past one or two classifiers. The system is trained using objects at a particular scale in a standard size sub-image, but the classifiers are designed so they are easily resized allowing this object detection technique to be applied at different scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
39
Haar – Overview Training using a number of positive and negative samples Uses simple (Haar like) features Efficient calculation… Selects a large number of these features during training to create strong classifiers Links a number of strong classifiers into a cascade for recognition Efficiency… Can work at different scales This technique for object detection learns to identify objects (e.g. faces) based on a number of positive and negative samples. It uses only simple features (which are reminiscent of Haar like basis functions – in 1-D a positive square wave followed by an equal but opposite negative one) to decide if a sub-image contains the object in question. It selects a large number of these features during training and creates classifiers with them which provide an accept/reject response. The classifiers are organized into a cascade (i.e. sequentially) where if the sub-image is rejected by any classifier in the cascade it is not processed any further. This has significant computational advantages as most sub-images are not processed past one or two classifiers. The system is trained using objects at a particular scale in a standard size sub-image, but the classifiers are designed so they are easily resized allowing this object detection technique to be applied at different scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
40
Haar – Features Features determined as the difference of the sums of a number of rectangular regions Place the mask in a specific location and at a specific scale Then subtract the sum of the ‘white pixels’ from the sum of the ‘black pixels’ Why does this work? The features used are determined by taking the difference of the sums of a number (usually 2 or 3) of rectangular regions. Given a sub-image the mask for a feature is placed at some specific location within the sub-image, and at some particular scale, and the feature value is simply the sum of the pixels in the black area is subtracted from the sum of the pixels in the white area(s). Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
41
Haar – Efficient Calculation – Integral Image
Every point ii(i,j) = i’=0..i j’=0..j image(i’,j’) Sum of points in rectangle D: sum(D) = ii(p) + ii(s) – ii(q) – ii(r) Features can be computed at any scale for the same cost To make the calculation features more efficient Viola & Jones put forward the idea of the integral image from which the sum of all pixels in an arbitrary rectangle can be calculated by an addition and two subtractions. See Figure 0‑4 (a) The integral image ii(i,j) of some image image(i’,j’) has every point in the image (i,j) as the sum of all pixels value in image(i’,j’) where i’ ≤ i and j’ ≤ j. (b)To compute the total sum of all pixels in an arbitrary rectangle D: sum(D) = ii(p)+ii(s)-ii(q)-ii(r) where p.q,r,s are coordinates in the integral image and ii(p) = sum(A), ii(q) = sum(A+B), ii(r) = sum(A+C), ii(s) = sum(A+B+C+D). These diagrams are based on some in [Viola01].Figure 0‑4 for details. This means that features can be computed at any scale with the same cost. This is very important as in order to detect objects at different scales we must apply the operator at all possible scales. An integral function is provided which computes the integral image both normally and at 45o Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
42
Haar – Training Number of possible features.
the variety of feature types allowed variations in size and position Training must Identify the best features to use at each stage To do this positive and negative samples are needed… For a typical sub-image, due to the variety of feature types together with the allowed variations in size and position, there are hundreds of thousands of possible features. The training phase for object detection then must identify which features are best to use at each stage in the classifier cascade. Note that to do so the system must be provided with a number of positive and negative samples. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
43
Haar – Weak & Strong Classifiers
Weak Classifier pjfeaturej(x) < pjθj Tune threshold (θj) Strong Classifiers Combine a number of weak classifiers… E.g. 100% true positives with only 40% false positives using only two face features… Weak classifiers A weak classifier can be created by combining a specific feature with a threshold and through comparison make an accept or reject classification. The feature is accepted if pjfeaturej(x) < pjθj where featurej(x) is the value of feature j at some location x, pj is a parity value and θj is a threshold for feature j. The threshold must be tuned to minimize the amount of misclassification (on a set of training data). Strong classifiers – AdaBoost To create a strong classifier, a number of weak classifiers are combined using a boosting algorithm called AdaBoost. For example if the two features shown in Figure 0‑3 are combined into a two-feature first stage classifier for frontal face detection, Viola and Jones [Viola01] reports that it was possible to detect 100% of the faces from a validation training set with a false positive rate of only 40%. Viola and Jones also reported that a strong classifier constructed using 200 features achieved a detection rate of 95% with a false positive rate of 0.01%. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
44
Haar – Strong Classifiers – Adaboost
Given n example images x1..xn together with classifications y1..yn where yi = 0, 1 for negative and positive examples respectively. Initialise weights w1,i = 1 / (2*(m*(1- yi)+l*yi)) where m and l are the number of negative and positive examples respectively. For t=1,…,T 1. Normalize the weights (i.e. for all i): wt,i = wt,i / (j=1..n wt,j) 2. For each feature, j, train a weak classifier hj(x) and evaluate the error taking into account the weights: j = i wt,i | hj(xi) – yi | 3. Select the classifier, hj(x), with the lowest j , save as ct(x) with error Et 4. Update the weights (i.e. for all i): wt+1,i = wt,i t(1- ei ) where ei = | ct(xi) – yi | and t = Et / (1–Et) The final strong classifier is: h(x)= 1 if t=1..T αtct(x) ½ t=1..T αt , 0 otherwise where αt = log 1/t AdaBoost algorithm (based on that described in [Viola01]): Given n example images x1..xn together with classifications y1..yn where yi = 0, 1 for negative and positive examples respectively. Initialise weights w1,i = 1 / (2*(m*(1- yi)+l*yi))where m and l are the number of negative and positive examples respectively.The weights are changed at each “round of boosting”. Considering wa,b a is the number of the current round and b is the example image number. For t=1,…,T T is the number of rounds of boosting. Normalize the weights (i.e. for all i):wt,i = wt,i / (j=1..n wt,j) As a result of this stage the sum of all weights will be 1.0 which means that wt,i can be considered as a probability distribution. For each feature, j, train a weak classifier hj(x) and evaluate the error taking into account the weights: j = i wt,i | hj(xi) – yi |Training a classifier means taking the single feature and determining the threshold which minimizes the misclassifications. Select the classifier, ht(x), with the lowest t Update the weights (i.e. for all i):wt+1,i = wt,i t(1- ei )where ei = | ht(xi) – yi | and t = t / (1–t)Update the weights on the images leaving the weights on misclassified images the same and reducing the weights on correctly classified images by t The final strong classifier is:h(x)= 1 if t=1..T αtht(x) ½ t=1..T αt otherwisewhere αt = log 1/tThe final strong classifier is a weighted combination of the weak classifiers where the weights are related to the training errors from each of the weak classifiers. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
45
Haar – Classifier cascade
Object recognition is possible with a single strong classifier To improve detection rates AND to reduce computation time: A cascade of strong classifiers can be used Each stage either accepts or rejects Only those accepted pass to the next stage Efficient computation… Strong classifiers trained using AdaBoost On the remaining set of data While object recognition is possible with a single classifier, a cascade of classifiers can be used to improve detection rates AND to reduce computation time. At each stage within the cascade there is a single strong classifier which either accepts or rejects the sub-image under consideration. If it is rejected by a classifier in the cascade no further processing is performed on the sub window. If it is accepted by a classifier it is passed to the next classifier in the cascade. In this way the sub-images which do not contain the object are gradually removed from consideration leaving just the objects which were sought. Most negative windows are rejected by the couple of stages in the cascade and hence the computation for these windows is low (relative to those which have to go through more stages in the cascades). It is this which reduces the computation time. Note that the classifiers in the cascade are trained using AdaBoost on the remaining set of example images (i.e. if the first stage classifier rejects a number of images then these images are not included when training the second stage classifer). Also note that the threshold determined by AdaBoost for the classifier at each stage is tuned so that the false negative rate is close to zero. Note 38 stages used in Voila and Jones’s final face recognition system. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
46
Haar – Recognition Face recognition 38 stages 6000+ features
4916 positive samples 9544 negative samples Scale independence The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
47
Performance Two aspects Computation time
Success and failure rates What is the correct answer? We need ground truth How do we assess success? We need metrics. double before_tick_count=static_cast<double>(getTickCount()); // Put methods to be timed here… double after_tick_count=static_cast<double>(getTickCount()); double duration_in_ms=1000.0*(after_tick_count- before_tick_count) / getTickFrequency(); The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
48
Performance – Ground Truth
Has to be manually computed Very difficult to get agreement… The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
49
Performance – Metrics Compute TP TN FP FN and then compute…
The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
50
Performance – Tuning We can tune performance by altering parameters
For example altering the threshold used for accepting a match in our template matching example The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
51
Performance – Precision-Recall Curves
Alternative (common) visualisation Select the parameters which result in the point on the PR curve which is closest to P=1.0, R=1.0 The complete frontal face detection cascade hads 38 stages with over 6000 features [Viola01]. This was trained on a set of 4916 positive example images and 9544 negative example images. See Figure 0‑6 and Figure 0‑7 for some sample output (of a slightly more advanced version of the system; i.e. which incorporates the ideas from [Lienhart02]). It can be seen from the varying rectangle sizes that the images are being processed at a variety of scales. Recognition Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.