1
UW Computer Vision (CSE 576)
Web Page
Staff: Steve Seitz, Rick Szeliski
TA: Jiun-Hung Chen
2
Today
Introduction
Computer vision overview
Course overview
Image processing
Readings
Book: Richard Szeliski, Computer Vision: Algorithms and Applications (please check the Web site weekly for updated drafts)
Introduction: Chapter 1.0
3
1.1 What Is Computer Vision?
Optical character recognition (OCR): reading handwritten postal codes on letters (Figure 1.4a) and automatic number plate recognition (ANPR);
Machine inspection: rapid parts inspection for quality assurance using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts (Figure 1.4b) or looking for defects in steel castings using X-ray vision;
Retail: object recognition for automated checkout lanes (Figure 1.4c);
3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs used in systems such as Bing Maps;
Clips: Terminator 2, Enemy of the State (from UCSD “Fact or Fiction” DVD)
4
1.1 What Is Computer Vision?
Medical imaging: registering pre-operative and intra-operative imagery (Figure 1.4d) or performing long-term studies of people’s brain morphology as they age;
Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well (Figure 1.4e; see also Miller, Campbell, Huttenlocher et al. (2008); Montemerlo, Becker, Bhat et al. (2008); Urmson, Anhalt, Bagnell et al. (2008) for examples of fully automated driving);
5
1.1 What Is Computer Vision?
Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source video to estimate the 3D camera motion and shape of the environment. Such techniques are widely used in Hollywood (e.g., in movies such as Jurassic Park) (Roble 1999; Roble and Zafar 2009). They also require the use of precise matting to insert new elements between foreground and background elements (Chuang, Agarwala, Curless et al. 2002).
6
1.1 What Is Computer Vision?
Motion capture (mocap): using retro-reflective markers viewed from multiple cameras or other vision-based techniques to capture actors for computer animation;
Surveillance: monitoring for intruders, analyzing highway traffic (Figure 1.4f), and monitoring pools for drowning victims;
Fingerprint recognition and biometrics: for automatic access authentication as well as forensic applications.
12
What Is Computer Vision?
Does human vision (left) or computer vision (right) perform better?
Terminator 2
How can we write a computer program to tell that the left side is a real human and the right side is a robot?
13
Every Picture Tells a Story
Black-and-white photo by H Roger-Viollet of a train accident at La Gare Montparnasse station in Paris on 22 October 1895, when the engine failed to stop at the platform, went through a first-floor window, and crashed down onto the street.
The goal of computer vision is to write computer programs that can interpret images.
14
Can Computers Match (or Beat) Human Vision?
Yes and no (but mostly no!)
Humans are much better at “hard” and versatile things.
Computers can be better at “easy,” specific, and repetitive things.
If you can write a formula for it, computers can excel.
Computer vision can’t solve the whole problem (yet), so it breaks the problem down into pieces. Many of the pieces have important applications.
15
Human Perception Has Its Shortcomings…
Example where humans make mistakes that computers can avoid
Although this image appears to be a fairly run-of-the-mill picture of Bill Clinton and Al Gore, a closer inspection reveals that both men have been digitally given identical inner face features and their mutual configuration. Only the external features are different. It appears, therefore, that the human visual system makes strong use of the overall head shape in order to determine facial identity.
Sinha and Poggio, Nature, 1996
16
Copyright A.Kitaoka 2003
17
Rotating Snakes
Circular snakes appear to rotate spontaneously.
18
Current State of the Art
The next slides show some examples of what current vision systems can do.
19
Earth Viewers (3D Modeling)
Image from Microsoft’s Virtual Earth (see also: Google Earth)
20
Photosynth
Based on Photo Tourism technology developed here in CSE by Noah Snavely, Steve Seitz, and Rick Szeliski!
CSE: Computer Science and Engineering
21
Optical Character Recognition (OCR)
Technology to convert scanned documents to text
If you have a scanner, it probably came with OCR software
Digit Recognition, AT&T Laboratories
License plate readers
AT&T: American Telephone and Telegraph
22
Face Detection
Many new digital cameras now detect faces: Canon, Sony, Fuji, …
Why would this be useful? The main reason is focus. It also enables “smart” cropping.
Smile shutter
Subject metering for auto focus, auto exposure, and auto white balance
23
Smile Detection?
Sony Cyber-shot® T70 Digital Still Camera
24
Object Recognition (in Supermarkets)
LaneHawk by Evolution Robotics
“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”
25
Face Recognition
Who is she?
26
Vision-Based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns”
27
Login without a Password…
Face recognition systems are now beginning to appear more widely
Fingerprint scanners on many new laptops and other devices
28
Object Recognition (in Mobile Phones)
This is becoming real:
Microsoft Research
Point & Find, Nokia
29
Special Effects: Shape Capture
The Matrix movies, ESC Entertainment, XYZ RGB, NRC
NRC: National Research Council
30
Special Effects: Motion Capture
Pirates of the Caribbean, Industrial Light and Magic
31
Sports
Sportvision first down line
Nice explanation on
The system has to know the orientation of the field with respect to the camera so that it can paint the first-down line with the correct perspective from that camera's point of view. The system has to know, in that same perspective framework, exactly where every yard line is.
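The perspective requirement described above can be illustrated with a planar homography: because the field is flat, a single 3x3 matrix maps field coordinates (in yards) to image pixels for a given camera pose. The sketch below only illustrates that idea and is not Sportvision's actual system; the matrix `H`, the functions `project` and `first_down_line`, and the coordinate conventions are all hypothetical.

```python
# An American football field is 160 feet (53 1/3 yards) wide.
FIELD_WIDTH_YARDS = 160 / 3


def project(H, x, y):
    """Map a field point (x, y) to an image pixel using a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w is what produces the perspective foreshortening.
    return (u / w, v / w)


def first_down_line(H, yard_line):
    """Return the image endpoints of the line that crosses the field at yard_line."""
    near = project(H, yard_line, 0.0)                # near sideline
    far = project(H, yard_line, FIELD_WIDTH_YARDS)   # far sideline
    return near, far


# Hypothetical homography for one camera pose (made-up numbers, illustration only).
H = [
    [8.0, 2.0, 100.0],
    [0.0, -3.0, 400.0],
    [0.0, 0.002, 1.0],
]

near, far = first_down_line(H, 40.0)
print("near sideline pixel:", near)
print("far sideline pixel:", far)
```

In a real broadcast the homography would have to be re-estimated as the camera pans, tilts, and zooms, which is why the system needs to know the field orientation for each camera.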
32
Smart Cars
Slide content courtesy of Amnon Shashua
Mobileye
Vision systems currently in high-end BMW, GM, Volvo models
By 2010: 70% of car manufacturers.
BMW: Bavarian Motor Works
GM: General Motors
33
Vision-Based Interaction (and Games)
Digimask: put your face on a 3D avatar.
Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!
“Game turns moviegoers into Human Joysticks”, CNET
Camera tracking a crowd, based on this work.
IR: Infra-Red
CMU: Carnegie Mellon University
34
Vision in Space
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks:
Panorama stitching
3D terrain modeling
Obstacle detection, position tracking
For more, read “Computer Vision on Mars” by Matthies et al.
JPL: Jet Propulsion Laboratory
35
Robotics
NASA’s Mars Spirit Rover
http://www.robocup.org/
NASA: National Aeronautics and Space Administration
36
Medical Imaging
3D imaging: MRI, CT
Image guided surgery: Grimson et al., MIT
MIT: Massachusetts Institute of Technology
MRI: Magnetic Resonance Imaging
CT: Computed Tomography
37
Current State of the Art
You just saw examples of current systems. Many of these are less than 5 years old.
This is a very active and rapidly changing research area. Many new applications are expected in the next 5 years.
To learn more about vision applications and companies: David Lowe maintains an excellent overview of vision companies.
38
Consumer-Level Applications
Stitching: turning overlapping photos into a single seamlessly stitched panorama (Figure 1.5a), as described in Chapter 9;
Exposure bracketing: merging multiple exposures taken under challenging lighting conditions (strong sunlight and shadows) into a single perfectly exposed image (Figure 1.5b), as described in Section 10.2;
Morphing: turning a picture of one of your friends into another, using a seamless morph transition (Figure 1.5c);
39
3D modeling: converting one or more snapshots into a 3D model of the object or person you are photographing (Figure 1.5d), as described in Section 12.6;
Video match move and stabilization: inserting 2D pictures or 3D models into your videos by automatically tracking nearby reference points (see Section 7.4.2) or using motion estimates to remove shake from your videos (see Section 8.2.1);
40
Photo-based walkthroughs: navigating a large collection of photographs, such as the interior of your house, by flying between different photos in 3D (see Sections and );
Face detection: for improved camera focusing as well as more relevant image searching (see Section );
Visual authentication: automatically logging family members onto your home computer as they sit down in front of the webcam (see Section 14.2).
41
1.2 A Brief History
Computational theory: What is the goal of the computation (task) and what are the constraints that are known or can be brought to bear on the problem?
Representations and algorithms: How are the input, output, and intermediate information represented and which algorithms are used to calculate the desired result?
42
1.2 A Brief History
Hardware implementation: How are the representations and algorithms mapped onto actual hardware, e.g., a biological vision system or a specialized piece of silicon? Conversely, how can hardware constraints be used to guide the choice of representation and algorithm? With the increasing use of graphics chips (GPUs) and many-core architectures for computer vision (see Section C.2), this question is again becoming quite relevant.
48
1.3 Book Overview
51
This Course
http://www.csie.ntu.edu.tw/~fuh/vcourse/szeliski
52
Optional Project 1: Features
53
Optional Project 2: Panorama Stitching
Indri Atmosukarto, sp
54
Optional Project 3: Face Recognition
55
Final Project
Open-ended project of your choosing
56
General Comments
Prerequisites (these are essential!):
Data structures
A good working knowledge of C and C++ programming (or willingness/time to pick it up quickly!)
Linear algebra
Vector calculus
The course does not assume prior imaging experience (computer vision, image processing, graphics, etc.).
57
Project due Mar. 15
Use correlation to do image matching
find to minimize
DC & CV Lab. CSIE NTU
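One common way to do image matching by correlation is to slide a small template over the image and score every position. The sketch below uses zero-mean normalized cross-correlation (NCC), which is maximized at the best match. That choice is an assumption: the slide's own objective (the quantity to minimize) is not recoverable from the text. The function names and the toy data are hypothetical.

```python
import math


def normalized_cross_correlation(patch, template):
    """Return the zero-mean NCC score in [-1, 1] for two equal-size 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A constant patch or template has no structure to correlate with.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template):
    """Slide the template over the image; return ((row, col), score) of the best NCC."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, -2.0  # below the minimum possible NCC of -1
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_cross_correlation(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy grayscale image with the template embedded at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 9]]

position, score = best_match(image, template)
print(position, score)
```

A sum-of-squared-differences score, which is minimized rather than maximized, could be swapped in by changing the scoring function and flipping the comparison in `best_match`.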