Computer Vision Systems for the Blind and Visually Disabled. STATS 19 SEM 2. 263057202. Talk 3. Alan Yuille. UCLA. Dept. Statistics and Psychology. www.stat.ucla/~yuille.


1 Computer Vision Systems for the Blind and Visually Disabled. STATS 19 SEM 2. 263057202. Talk 3. Alan Yuille. UCLA. Dept. Statistics and Psychology. www.stat.ucla/~yuille

2 Computer Vision Systems. Digital Camera + Portable Computer + Speech Synthesizer. (I) Input image from camera. (II) Algorithm on PC searches the image to detect and read text. (III) Speech Synthesizer speaks the text.

3 LED Reader. LED/LCD displays are very common, but impossible for the Blind to use. Controlled domain. Design system to detect and read the displays.

4 LED Reader. Prototype System (1999). Subjects using the LED Reader. Implementation using special purpose hardware being built.

5 Blind Volunteer with Camera. Blind volunteers take photographs. Still digital camera, or video camera. Automatic camera settings. Gain control. Dynamic range of the eye is far larger than the range of a camera.

6 Gain Control: Digital Cameras. Limitation due to the quality of the input images. Blind users cannot point camera, focus, adjust camera gain, or keep the camera steady. Enormous variation in the intensity in natural images: range 10,000,000; camera range is 100.
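
To put the two ranges on slide 6 side by side, the short sketch below converts them to photographic stops (doublings of intensity) and computes the shortfall. The variable names are illustrative, and the figures are simply the ones quoted on the slide.

```python
import math

# Intensity ranges quoted on the slide (ratio of brightest to darkest).
scene_range = 10_000_000   # natural images
camera_range = 100         # digital camera

# One "stop" is a doubling of intensity, so the number of stops is log2(range).
scene_stops = math.log2(scene_range)
camera_stops = math.log2(camera_range)

# Factor by which the scene range exceeds what the camera can capture at once.
shortfall = scene_range / camera_range

print(f"scene:  {scene_stops:.2f} stops")
print(f"camera: {camera_stops:.2f} stops")
print(f"shortfall factor: {shortfall:,.0f}")
```

This gap is the reason gain control matters: the camera has to choose which slice of the scene's range to record.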

7 Biologically Inspired Cameras. Ideal: cameras with the ability of the human retina: (I) Large gain control (from 100 to 100,000,000). (II) More than 30 frames/second (to decrease motion blur). Companies are designing cameras with these abilities. (Carver Mead).

8 Images taken by the Blind. Top two rows are images taken by blind volunteers. Bottom two rows are images by Scientists. Scientists are better at orienting the camera and centering text.

9 Experiments with Blind Volunteers. In San Francisco. Experiments showed: 1. Blind volunteers could keep the camera approximately horizontal. 2. They could hold it steady so there is little motion blur. 3. Automatic gain control was usually sufficient to give good quality images.

10 Visual Search to Detect Text. The human visual system has mechanisms for directing attention to “interesting parts” of images. Known as “Visual Attention”. Visual attention causes eye movements and directs gaze. We need a form of visual attention to detect text. This must be fast. We want to quickly reject non-text areas of the image.

11 Strategy I: Twenty Questions. Divide the image up into many small windows. Apply “filter tests” to each window. If the window fails the test, then eliminate it. If it passes, then proceed to the next test. Apply tests until there are only a few (1-5) windows in the image which pass all tests.
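
The elimination loop on slide 11 can be sketched as follows. This is a minimal illustration, not the system's actual code: the window size, the two tests, and their thresholds are hypothetical, chosen only so the toy image has exactly one text-like window.

```python
def split_into_windows(image, size):
    """Cut a grayscale image (list of rows) into non-overlapping size x size windows."""
    windows = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            block = [row[c:c + size] for row in image[r:r + size]]
            windows.append((r, c, block))
    return windows

def mean_brightness(block):
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)

def local_variability(block):
    """Variance of the pixel intensities in the window."""
    pixels = [p for row in block for p in row]
    m = sum(pixels) / len(pixels)
    return sum((p - m) ** 2 for p in pixels) / len(pixels)

def run_cascade(windows, tests, max_survivors=5):
    """Apply tests in order, dropping failing windows, until few windows remain."""
    survivors = list(windows)
    for test in tests:
        if len(survivors) <= max_survivors:
            break  # few enough candidates left; stop asking questions
        survivors = [w for w in survivors if test(w[2])]
    return survivors

# Toy 4x4 image: only the bottom-right 2x2 window has high contrast.
image = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [90, 90, 0, 255],
    [90, 90, 255, 0],
]
tests = [
    lambda b: mean_brightness(b) > 20,      # hypothetical threshold
    lambda b: local_variability(b) > 1000,  # hypothetical threshold
]
survivors = run_cascade(split_into_windows(image, 2), tests, max_survivors=1)
positions = [(r, c) for r, c, _ in survivors]
print(positions)
```

Ordering cheap tests first means most windows are rejected before any expensive test runs, which is what makes the search fast.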

12 Strategy II: Test Selection. Choose a vocabulary of tests. E.g. average image brightness, local image variability. Use a Machine Learning algorithm “AdaBoost” to select and combine tests. Requires a training dataset of text and non-text. (Learning with a teacher). AdaBoost combines “weak tests” into a “strong test”.
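
To illustrate how AdaBoost combines “weak tests” into a “strong test”, here is a minimal sketch of discrete AdaBoost with single-feature threshold tests. The one-feature toy dataset and all function names are assumptions for illustration; the talk does not give its features or training data.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Weak test: vote +1 (text) or -1 (non-text) by thresholding one feature."""
    return polarity if x[feature] > threshold else -polarity

def best_stump(X, y, weights):
    """Select the weak test with the lowest weighted error on the training set."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(w for x, label, w in zip(X, y, weights)
                          if stump_predict(x, feature, threshold, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_train(X, y, rounds):
    n = len(X)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this test
        strong.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(x, feature, threshold, polarity))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong

def adaboost_predict(strong, x):
    """Strong test: sign of the weighted vote of the selected weak tests."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in strong)
    return 1 if score > 0 else -1

# Toy data: one hypothetical feature per window; +1 = text, -1 = non-text.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost_train(X, y, rounds=1)
three_rounds = adaboost_train(X, y, rounds=3)
acc_one = sum(adaboost_predict(one_round, x) == t for x, t in zip(X, y)) / len(X)
acc_three = sum(adaboost_predict(three_rounds, x) == t for x, t in zip(X, y)) / len(X)
print(acc_one, acc_three)
```

No single threshold separates this toy data, but the weighted vote of three thresholds does, which is the sense in which weak tests become a strong one.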

13 AdaBoost Example: Face Detection. AdaBoost was used in Computer Vision to detect faces. Best test: Forehead brighter than eyes.

14 Example Sequence I: Series of tests, selected by AdaBoost.

15 Example II.

16 Results of AdaBoost. Strong Performance: Very High Detection Rate.

17 Failures of AdaBoost. AdaBoost fails to detect some text.

18 Next Stage: Binarization. AdaBoost detects regions of text in windows of the image. Apply a binarization algorithm. Label the points within the window as letters/digits or as background. Extend the binarization to areas outside the window, to include letters/digits that are just outside the window.
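
The talk does not say which binarization algorithm is used. As a stand-in, the sketch below thresholds at the midpoint between the darkest and brightest pixel in the detected window, assumes dark text on a light background, and reuses that threshold on a small margin around the window. The function name, the `margin` parameter, and the toy image are all hypothetical.

```python
def binarize_region(image, top, left, height, width, margin=1):
    """Label pixels as text (1) or background (0) in and just around a window.

    The threshold is estimated inside the detected window only, then applied
    to the window grown by `margin` pixels on each side (clipped to the image).
    Returns the mask and the (row, col) of its top-left corner in the image.
    """
    window = [p for row in image[top:top + height]
              for p in row[left:left + width]]
    threshold = (min(window) + max(window)) / 2  # midpoint rule (assumption)

    r0, r1 = max(0, top - margin), min(len(image), top + height + margin)
    c0, c1 = max(0, left - margin), min(len(image[0]), left + width + margin)

    # Assumes dark text on a light background.
    mask = [[1 if image[r][c] < threshold else 0 for c in range(c0, c1)]
            for r in range(r0, r1)]
    return mask, (r0, c0)

# Toy image: the dark pixel at row 1, col 3 lies just outside the window.
image = [
    [250, 250, 250, 250, 250],
    [250,  20,  30,  25, 250],
    [250, 240,  20, 245, 250],
    [250, 250, 250, 250, 250],
]
mask, origin = binarize_region(image, top=1, left=1, height=2, width=2)
for row in mask:
    print(row)
```

In the toy run the stroke pixel just outside the window is still labelled as text, which is the purpose of extending the binarization beyond the window.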

19 Results of Binarization.

20 Optical Character Recognition (OCR). OCR has been developed for reading text on documents. Black and white images. High resolution. We apply it to the binarized output of AdaBoost. OCR will read the text and reject regions which are not text.

21 Text detected by AdaBoost, Binarized, and read by OCR.

22 Text detected, but not read. Non-text detected, rejected by OCR. Non-text detected, read by OCR.

23 Performance. Can detect text within our dataset (San Francisco) with false negative rate of 2.8%. We can read the detected text correctly at 93.0%. Read detected non-text as text at 1.0%. Prototype System: room for improvement.
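
As a rough reading of these figures, the sketch below chains the detection and reading rates into an end-to-end estimate. It assumes the 93.0% reading rate is measured only on text that was detected; the slide does not state how the rates were computed, so treat the result as indicative only.

```python
# Figures quoted on the slide.
false_negative_rate = 0.028       # text regions missed by the detector
read_rate_given_detected = 0.930  # detected text that is read correctly

# Fraction of text regions that the detector finds.
detection_rate = 1 - false_negative_rate

# Assumption: reading accuracy is conditional on detection, so the two
# rates multiply to give the fraction of all text detected and read.
end_to_end = detection_rate * read_rate_given_detected

print(f"detection rate: {detection_rate:.1%}")
print(f"end-to-end:     {end_to_end:.1%}")
```

Under that assumption, roughly nine in ten text regions in the dataset would be both found and read correctly.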

24 Summary. It will soon be practical to build Computer Vision systems for text detection and reading that work in unconstrained domains.

