Perception Vision, Sections 24.1 - 24.3 Speech, Section 24.7.


1 Perception Vision, Sections 24.1 - 24.3 Speech, Section 24.7

2 Computer Vision §“the process by which descriptions of physical scenes are inferred from images of them.” -- S. Zucker §“produces from images of the external 3D world a description that is useful to the viewer and not cluttered by irrelevant information”

3 Typical Applications §Medical Image Analysis §Aerial Photo Interpretation §Material Handling §Inspection §Navigation

4 Multimedia Applications §Image compression §Video teleconferencing §Virtual classrooms

5 Image pixelation

6 Pixel values

7 How to recognize faces?

8 Problem Background §M training images §Each image is N x N pixels §Each image is normalized for face position, orientation, scale, and brightness §There are several pictures of each face, showing different “moods”

9 Your Task §Determine if the test image contains a face §If it contains a face, is it a face of a person in our database? §If it is a person in our database, which one? §Also, what is the probability that it is Jim?

10 Image Space §An N x N image can be thought of as a point in an N^2-dimensional image space §Each pixel is a feature with a grayscale value §Example: l 512 x 512 image l each pixel can be 0 (black) to 255 (white)
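
A minimal sketch of the idea that an image is a single point in image space. The 3 x 3 image and the helper name `flatten_image` are made up for illustration; only the 512 x 512 size comes from the slide.

```python
def flatten_image(image):
    """Turn an N x N grid of gray values into one point with N*N coordinates."""
    return [pixel for row in image for pixel in row]

# A tiny hypothetical 3 x 3 image; values run from 0 (black) to 255 (white).
tiny = [
    [0, 128, 255],
    [64, 64, 64],
    [255, 0, 0],
]
point = flatten_image(tiny)

# Number of dimensions of image space for the slide's 512 x 512 example.
dimensions_512 = 512 * 512
print(len(point), dimensions_512)
```

For the 512 x 512 example this gives 262,144 coordinates per image, which is why the later slides look for a lower-dimensional representation.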

11 Nearest Neighbor §The most likely match is the nearest neighbor §But that would take too much processing §Since all images are faces, they will have very high similarity

12 Face Space §Lower dimensionality to both simplify the storage and generalize the answer §Use eigenvectors to distill the 20 most distinctive metrics §Make a 20-item array for each face that contains the values of 20 features that most distinguish faces. §Now each face can be stored in 20 words

13 The average face §Training images are I_1, I_2, ..., I_M §Average image is A = (I_1 + I_2 + ... + I_M) / M
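
The average face is the pixel-by-pixel mean of the training images. A small sketch, assuming each image has already been flattened into a list of gray values; the three 4-pixel "images" are invented for illustration.

```python
def average_face(images):
    """Pixel-wise mean of M flattened training images."""
    m = len(images)
    return [sum(pixels) / m for pixels in zip(*images)]

# Three hypothetical training images of 4 pixels each.
training = [
    [10, 20, 30, 40],
    [20, 30, 40, 50],
    [30, 40, 50, 60],
]
A = average_face(training)
print(A)
```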

14 Weight of an image in each feature §For each of the k = 1, ..., 20 features, compute the similarity between the input image, I, and the kth eigenvector, E_k
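
The slide does not show the formula for the similarity. In the usual eigenface formulation it is the dot product of E_k with the image after the average face has been subtracted, w_k = E_k . (I - A); the sketch below assumes that form. The two orthonormal 4-pixel "eigenfaces" and the image values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(image, average, eigenfaces):
    """Return the weights [w_1, ..., w_K], assuming w_k = E_k . (I - A)."""
    centered = [p - a for p, a in zip(image, average)]
    return [dot(e, centered) for e in eigenfaces]

# Hypothetical data: a 4-pixel average face and two orthonormal eigenfaces.
A = [20.0, 30.0, 40.0, 50.0]
E = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
I = [24.0, 32.0, 46.0, 50.0]
W = project(I, A, E)
print(W)
```

With 20 eigenfaces instead of two, the same call would return the 20 weights that describe the face.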

15 Image in Face Space §“Only” 20-dimensional space §W = [w_1, w_2, ..., w_20], a column vector of weights that indicate the contribution of each of the 20 eigenfaces to I §Each image is projected from a point in high-dimensional image space into face space §20 features * 32 bits = 640 bits per image

16 Reconstructing image I §If M’ < M (we keep only M’ eigenfaces, here 20, out of the M available), we can only approximate I §Good enough for recognizing faces
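
The reconstruction formula is not shown on the slide. A sketch under the usual assumption that the approximation is the average face plus the weighted eigenfaces, I ≈ A + w_1 E_1 + ... + w_M’ E_M’. All values are hypothetical 4-pixel examples.

```python
def reconstruct(weights, average, eigenfaces):
    """Approximate an image as A + sum over k of w_k * E_k."""
    image = list(average)
    for w, e in zip(weights, eigenfaces):
        image = [p + w * x for p, x in zip(image, e)]
    return image

A = [20.0, 30.0, 40.0, 50.0]
E = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
W = [6.0, 4.0]                       # weights of the image [24, 32, 46, 50]
approx = reconstruct(W, A, E)
print(approx)
```

The result is close to, but not equal to, the image [24, 32, 46, 50] that produced these weights, because two eigenfaces cannot span a 4-dimensional image space.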

17 Picking the 20 Eigenfaces §Principal Component Analysis l (also called Karhunen-Loeve transform) §Create 20 images that maximize the information content in eigenspace §Normalize by subtracting the average face §Compute the covariance matrix, C §Find the eigenvectors of C that have the 20 largest eigenvalues
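
One way to sketch the last three steps without a linear algebra library is power iteration with deflation. This is a swapped-in technique chosen to keep the example self-contained, not necessarily the method the slides have in mind; real systems would call an eigenvalue routine. The function name, the start vector, and the 2-pixel data are assumptions, and the code expects `count` to be no larger than the rank of the data.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def top_eigenvectors(images, count, iterations=200):
    """Leading eigenvectors of the covariance matrix C, by power iteration."""
    m = len(images)
    average = [sum(p) / m for p in zip(*images)]
    # Normalize by subtracting the average face.
    centered = [[p - a for p, a in zip(img, average)] for img in images]
    found = []
    for k in range(count):
        v = [1.0 / (i + 1) for i in range(len(average))]  # arbitrary start vector
        for step in range(iterations):
            # Deflation: remove components along eigenvectors already found.
            for e in found:
                proj = dot(v, e)
                v = [x - proj * y for x, y in zip(v, e)]
            # Multiply by C without building it: C v = (1/M) * sum_i phi_i (phi_i . v)
            cv = [0.0] * len(v)
            for phi in centered:
                scale = dot(phi, v) / m
                cv = [c + scale * x for c, x in zip(cv, phi)]
            v = normalize(cv)
        found.append(v)
    return found

# Hypothetical 2-pixel "images": most variation lies along the direction (1, 1).
data = [[2, 2], [-2, -2], [1, -1], [-1, 1]]
E = top_eigenvectors(data, 2)
print(E)
```

For this data the covariance matrix is [[2.5, 1.5], [1.5, 2.5]], so the two vectors returned should be close to (1, 1)/sqrt(2) and (1, -1)/sqrt(2), in that order.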

18 Build a database of faces §Given a training set of face images, compute the 20 eigenvectors with the largest eigenvalues, E_1, E_2, ..., E_20 l Offline because it is slow §For each face in the training set, compute its point in eigenspace, W = [w_1, w_2, ..., w_20] l Offline, because it is big

19 Categorizing a test face §Given a test image, I_test, project it into the 20-space by computing W_test §Find the closest face in the database to the test face: d = min over k of || W_test - W_k || l where W_k is the point in facespace associated with the kth person l || * || denotes the Euclidean distance in facespace
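
A small sketch of the nearest-neighbor search in facespace. The database, the names in it, and the 2-dimensional weight vectors are invented; a real database would hold one 20-element vector per person.

```python
import math

def distance(u, v):
    """Euclidean distance between two points in facespace."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_face(w_test, database):
    """Return (person, d) for the stored point nearest to w_test."""
    person = min(database, key=lambda name: distance(w_test, database[name]))
    return person, distance(w_test, database[person])

# Hypothetical facespace points, one per person.
database = {"Jim": [6.0, 4.0], "Ann": [-3.0, 1.0], "Raj": [0.0, -5.0]}
person, d = closest_face([5.0, 4.0], database)
print(person, d)
```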

20 Distance from facespace §Find the distance, dffs, of the test image from eigenspace

21 Is this a face? §If dffs < threshold1, the image is a face l if d < threshold2, it is very close to its nearest neighbor: classify it as that person l else it is a face, but not one we recognize §else the image probably does not contain a face
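
The two tests can be sketched as follows. The slides do not give the formula for the distance from facespace; this assumes the common definition, the length of the part of (I - A) that the eigenfaces cannot reproduce, and it assumes orthonormal eigenfaces. The threshold values and return labels are placeholders.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_from_face_space(image, average, eigenfaces):
    """Length of the residual after projecting (I - A) onto the eigenfaces."""
    centered = [p - a for p, a in zip(image, average)]
    projected = [0.0] * len(centered)
    for e in eigenfaces:
        w = dot(e, centered)
        projected = [p + w * x for p, x in zip(projected, e)]
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(centered, projected)))

def classify(dffs, d, person, threshold1, threshold2):
    """Decision rule from the slide; thresholds must be tuned on real data."""
    if dffs < threshold1:
        if d < threshold2:
            return person            # a face we recognize
        return "unknown face"        # a face, but not in the database
    return "not a face"

A = [20.0, 30.0, 40.0, 50.0]
E = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
dffs = distance_from_face_space([24.0, 32.0, 46.0, 50.0], A, E)
print(dffs, classify(dffs, 1.0, "Jim", threshold1=3.0, threshold2=1.5))
```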

22 Face Recognition Accuracy §Using 20-dimensional facespace resulted in about 95% correct classification on a database of 7500 images of 3000 people §If there are several images per person, the average W for that person helps improve accuracy

23 Edge Detection §Finding simple descriptions of objects in complex images l find edges l interrelate edges

24 Causes of edges §Depth discontinuity l One surface occludes another §Surface orientation discontinuity l The edge of a block §Reflectance discontinuity l Texture or color changes §Illumination discontinuity l Shadows

25 Examples of edges

26 Finding Edges §Image intensity along a line §First derivative of intensity §Smoothed via convolving with a Gaussian
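
A one-dimensional sketch of the procedure: smooth the intensity profile with a Gaussian, take the first derivative, and mark the position where the derivative is largest in magnitude. The intensity values, sigma, and kernel radius are arbitrary choices for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled Gaussian weights, scaled so that they sum to 1."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve(signal, kernel):
    """Same-length convolution; the ends are padded by repeating end values."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - radius, 0), len(signal) - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out

def derivative(signal):
    """Forward difference: entry i is signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]

# Hypothetical noisy intensity profile with a step between positions 5 and 6.
intensity = [10, 11, 9, 10, 12, 10, 60, 61, 59, 60, 62, 60]
kernel = gaussian_kernel(sigma=1.0, radius=2)
smoothed = convolve(intensity, kernel)
slope = derivative(smoothed)
edge_at = max(range(len(slope)), key=lambda i: abs(slope[i]))
print(edge_at)
```

Smoothing first keeps the small pixel-to-pixel noise from producing spurious derivative peaks, so only the large step stands out.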

27 Pixels on edges

28 Edges found

29 Human-Computer Interfaces §Handwriting recognition §Optical Character Recognition §Gesture recognition §Gaze tracking §Face recognition

30 Vision Conclusion §Machine Vision is so much fun, we have a full semester course in it §Current research in vision modeling is very active l More breakthroughs are needed

31 Speech Recognition Section 24.7

32 Speech recognition goal §Find a sequence of words that maximizes P(words | signal)

33 Signal Processing §“Toll quality” was the Bell Labs definition of digitized speech good enough for long distance calls (“toll” calls) l Sampling rate: 8000 samples per second l Quantization factor: 8 bits per sample §Too much data to analyze to find utterances directly
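
The data rate implied by these two numbers can be worked out directly. The sketch assumes one channel and no compression.

```python
SAMPLES_PER_SECOND = 8000
BITS_PER_SAMPLE = 8

bits_per_second = SAMPLES_PER_SECOND * BITS_PER_SAMPLE
bytes_per_second = bits_per_second // 8
bytes_per_minute = bytes_per_second * 60
print(bits_per_second, bytes_per_second, bytes_per_minute)
```

That is 64,000 bits per second, or 480,000 bytes per minute, which is roughly the 500 KB/minute figure quoted on the vector quantization slide.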

34 Computational Linguistics §Human speech is limited to a repertoire of about 40 to 50 sounds, called phones §Our problem: l What speech sounds did the speaker utter? l What words did the speaker intend? l What meaning did the speaker intend?

35 Finding features

36 Vector Quantization §The 255 most common clusters of feature values are labeled C1, …, C255 §Send only the 8 bit label §One byte per frame (a 100-fold improvement over the 500 KB/minute)
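
A sketch of the labeling step: each frame's feature vector is replaced by the index of the nearest cluster center, which fits in one byte. The 4-entry codebook and the 2-dimensional features are hypothetical, and the labels here are 0-based indices rather than C1, C2, and so on.

```python
def quantize(frame, codebook):
    """Index of the codebook entry nearest to the frame's feature vector."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(frame, center))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))

# Hypothetical codebook of cluster centers; a real one would have up to 256.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]

labels = [quantize(f, codebook) for f in frames]
encoded = bytes(labels)              # one byte per frame
print(labels, len(encoded))
```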

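37 How to Wreck a Nice Beach §By Bayes' rule, P(words | signal) = P(signal | words) P(words) / P(signal) l where P(signal) is a constant (it is the signal we received) §So we want the words that maximize P(signal | words) P(words)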
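
Because P(signal) is the same for every candidate, comparing candidates only needs the product of an acoustic score P(signal | words) and a language-model score P(words). The numbers below are invented purely to show the comparison; they are not measurements.

```python
# Hypothetical scores for two candidate transcriptions of the same signal.
candidates = {
    "recognize speech": {"acoustic": 0.30, "language": 0.0010},
    "wreck a nice beach": {"acoustic": 0.35, "language": 0.00001},
}

def score(words):
    """P(signal | words) * P(words); P(signal) is dropped as a constant."""
    entry = candidates[words]
    return entry["acoustic"] * entry["language"]

best = max(candidates, key=score)
print(best)
```

Even though the second candidate matches the sound slightly better in this made-up example, the language model makes the first one the more probable transcription.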

38 Unigram Frequency §Word frequency §Even though his handwriting was sloppy, Woody Allen’s bank hold-up note probably should not have been interpreted as “I have a gub” l The word “gun” is common l The word “gub” is unlikely

39 Language model §Use the language model to compare l P(“wreck a nice beach”) l P(“recognize speech”) §Use naïve Bayes to assess the likelihood, for each word, that it will appear in this context

40 Bigram model §want P(w_i | w_1, w_2, …, w_{i-1}) l approximate it by P(w_i | w_{i-1}) §Easy to train l Simply count the number of times each word pair occurs l “I has” is unlikely, “I have” is likely l “an gun” is unlikely, “a gun” is likely
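
Training by counting can be sketched in a few lines. The three-sentence corpus is a toy example, and the estimate is the plain relative frequency with no smoothing, so unseen pairs get probability 0.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count word pairs and how often each word appears as the first of a pair."""
    contexts = Counter()
    pairs = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        contexts.update(words[:-1])
        pairs.update(zip(words, words[1:]))
    return contexts, pairs

def bigram_probability(prev, word, contexts, pairs):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]

corpus = ["I have a gun", "I have a note", "I have an idea"]
contexts, pairs = train_bigrams(corpus)
print(bigram_probability("i", "have", contexts, pairs),
      bigram_probability("a", "gun", contexts, pairs),
      bigram_probability("an", "gun", contexts, pairs))
```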

41 Trigram §Some trigrams are very common l only track the most common trigrams §Use a weighted sum of l unigram l bigram l trigram
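
The weighted sum can be written as a single expression. The weights below are placeholders that sum to 1; in practice they would be tuned on held-out data.

```python
def interpolated(p_unigram, p_bigram, p_trigram, weights=(0.1, 0.3, 0.6)):
    """Weighted sum of the unigram, bigram, and trigram estimates for a word."""
    c1, c2, c3 = weights
    return c1 * p_unigram + c2 * p_bigram + c3 * p_trigram

# Hypothetical estimates for one word in one context.
p = interpolated(0.01, 0.2, 0.5)
print(p)
```

If a trigram was never seen, its estimate is 0 and the bigram and unigram terms still give the word a nonzero probability.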

42 Near the end of the semester l Time flies like an arrow l Fruit flies like a banana §It is currently hard to incorporate parts of speech and sentence grammar into the probability calculation l lots of ambiguity l but humans seem to do it

43 Conclusion §Speech recognition technology is changing very quickly §Highly parallel §Amenable to hardware implementations

