Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer Vision CS 223b Jana Kosecka CA: Sanaz Mohatari

Similar presentations


Presentation on theme: "Computer Vision CS 223b Jana Kosecka CA: Sanaz Mohatari"— Presentation transcript:

1 Computer Vision CS 223b Jana Kosecka http://cs223b.stanford.edu/ kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu Han-I Su hanisu@stanfor.eduhanisu@stanfor.edu

2 Logistics Grading: Homeworks (about every 2 weeks) 30% Midterm: 30% Final project: 40% Prerequisites: basic statistical concepts, geometry, linear algebra, calculus Recommended text: Introductory Techniques for 3D Computer Vision (E. Trucco, A. Verri, Prentice Hall, 1998) From Images to Geometric Models: Y. Ma, S. Soatto, J.Kosecka and S. Sastry, Springer Verlag 2003 Computer Vision: Algorithms and Applications R. Szeliski (preprint) Computer Vision a Modern Approach (D. Forsyth, J. Ponce, Prentice Hall 2002) Required Software MATLAB (with Image Processing toolbox) Open CV library

3 The Texts

4 Project Deadlines Check Web site for proposals, or develop your own Team up! Dates –Feb 13: Project proposals due –Mar 12: Final report due –Mar 15: Mini-workshop on projects

5 To define your own project… Find a mentor Generate project description for the Class Web site Gather data, process data Write suitable project proposal Examples: Learn to find sports videos on youtube.com Match images of same location at flickr.com Fly autonomous helicopter with camera

6 Today’s Goals Learn about CS223b Get Excited about Computer Vision Learn about image formation (Part 1)

7 What is vision? From the 3-D world to 2-D images: image formation (physics). –Domain of artistic reproduction (synthesis): painting, graphics. From 2-D images to the 3-D world: image analysis (mathematical modeling, inference). –Domain of vision: biological (eye+brain), computational

8 IMAGE SYNTHESIS: simulation of the image-formation process Pinhole (perspective) imaging in most ancient civilizations. Euclid, perspective projection, 4th century B.C., Alexandria (Egypt) Pompeii frescos, 1st century A.D. (ubiquitous). Geometry understood very early on, then forgotten. Image courtesy of C. Taylor

9 PERSPECTIVE IMAGING (geometry) Image courtesy of C. Taylor Re-discovered and formalized in the Renaissance: Fillippo Brunelleschi, first Renaissance artist to paint with correct perspective,1413 “Della Pictura”, Leon Battista Alberti, 1435, first treatise Leonardo Da Vinci, stereopsis, shading, color, 1500s Raphael, 1518

10 Biological motivations Understanding visual sensing modality and its role - We have no difficulties to navigate, manipulate objects recognize familiar places and faces - How can we successfully carry out all these everyday tasks ? - How does the visual perception mediates these activities ? - Overall system Sensation and perception – integrate and interpret sensory readings Behavior – control muscles and glands to mediate some behavior Memory – long term memory (declarative – faces, places, events) short term memory (procedural – associations, skills) Higher level functions – internal models and representations of the environments

11 Goals of Computer Vision Build machines and develop algorithms which can automatically replicate some functionalities of biological visual system -Systems which navigate in cluttered environments -Systems which can recognize objects, activities -Systems which can interact with humans/world Synergies with other disciplines and various applications Artificial Intelligence - robotics, natural language understanding Vision as a sensor – medical imaging, Geospatial Imaging, robotics, visual surveillance, inspection

12 Computer Vision Visual Sensing Images I(x,y) – brightness patterns - image appearance depends on structure of the scene - material and reflectance properties of the objects - position and strength of light sources

13 [Felleman & Van Essen, 1991] This is the part of your brain that processes visual information VISUAL INFORMATION PROCESSING

14 This is how a computer represents it

15 And so is this …

16 And so are these! We need to extract some “invariant”, i.e. what is common to all these images (they are all images of my office)

17 BUMMER! THIS IS IMPOSSIBLE! –THM: [Weiss, 1991]: There exists NO generic viewpoint invariant! –THM: [Chen et al., 2003]: There exists NO photometric invariant!! So, how do we (primates) solve the problem?

18 EXAMPLE OF A (VERY COMMON) SENSOR NETWORK Retina performs distributed computation: –Contrast adaptation (lateral inhibition) –Enhancement/edgels (e.g. Mach bands) –Motion detection (leap frog) Does the human visual system really solve the problem? Not really!

19 Optical Illusion

20 http://web.mit.edu/persci/adelson/checkershadow_illusion.html Checker A and B are of the same gray-level value

21 SIDE EFFECTS OF LATERAL INHIBITION THESE ARE NOT SPIRALSTHESE ARE NOT SPIRALS AND THEY ARE NOT MOVING!AND THEY ARE NOT MOVING! Cause Peripheral Drift http://www.psy.rittsumei.ac.jp/~akitaoka/rotsnakee.html

22 Challenges/Issues About 40% of our brain is devoted to vision We see immediately and can form and understand images instantly Several strategies for forming a representation of the scenes Task dependent attentional mechanisms Detailed representations are often not necessary Different approaches in the past Animate Vision (biologically inspired), Purposive Vision (depending on the task/purpose) - have many shortcomings

23 Recovery of the properties of the environment Recovery of the properties of the environment from single or multiple views using various visual cues from single or multiple views using various visual cues Vision problems Segmentation Segmentation Recognition Recognition Reconstruction Reconstruction Vision Based Control - Action Vision Based Control - Action stereo stereo motion motion shading shading texture texture brightness, color brightness, color Visual Cues

24 Example 1:Stereo See http://schwehr.org/photoRealVR/example.html

25 Example 2: Structure From Motion http://www.cs.unc.edu/Research/urbanscape

26 Example 2: Structure From Motion http://www.cs.unc.edu/Research/urbanscape

27 Example 2: Structure From Motion http://www.cs.unc.edu/Research/urbanscape

28 Example 3: 3D Modeling http://www.photogrammetry.ethz.ch/research/cause/3dreconstruction3.html

29 Example 5: Segmentation http://elib.cs.berkeley.edu/photos/classify/

30 Example 6: Classification

31

32 Example 9: Human Vision

33 Segmentation – partition image into separate objects Clustering and search algorithms in the space of visual cues Clustering and search algorithms in the space of visual cues

34

35 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. ICCV, copyright 1998, IEEE Recognition by finding patterns Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, CVPR, 2000. Face detection Human detection

36 Samples from motorbikes appearance model (Fergus’s et. al results) Recognition by finding patterns

37 Focus of this course The focus of the course : 1. Geometry of Single and Multiple Views Shape and Motion Recovery, Matching, Alignment Problems Reconstruction (from 2D to 3D) Computation of Pictorial cues – shading, texture, blur, contour, stereo, motion cues 2. Object Detection and Recognition Object representation, detection in cluttered scenes Recognition of object categories

38 How to reliably recover and represent the geometric model from single image or video and camera motion/pose Representation issues depends on the task/applications Image-based rendering, Computer Graphics Virtual and Augmented Reality Vision based control, surveillance, target tracking Human computer interaction Medical imaging (alignment, monitoring of change) Video Analysis 1. Geometry of Single and Multiple Views, Video

39 Vision and Computer Graphics - image based rendering techniques - 3D reconstruction from multiple views or video - single view modeling - view morphing (static and dynamic case)

40 Modeling with Multiple Images University High School, Urbana, Illinois Three of Twelve Images, courtesy Paul Debevec

41 Final Model

42 Visual surveillance wide area surveillance, traffic monitoring Interpretation of different activities

43 Virtual and Augmented Reality, Human computer Interaction Virtual object insertion various gesture based interfaces Interpretation of human activities Enabling technologies of intelligent homes, smart spaces

44 Topics Image Formation, Representation of Camera Motion, Camera Calibration Image Features – filtering, edge detection, point feature detection Image alignment, 3D structure and motion recovery, stereo Analysis of dynamic scenes, detection, tracking Object detection and object recognition

45 Surveillance

46

47 Image Morphing, Mosaicing, Alignment Images of CSL, UIUC

48 3D RECONSTRUCTION FROM MULTIPLE VIEWS: GEOMETRY AND PHOTOMETRY jin-soatto-yezzi; image courtesy: j-y bouguet - intel

49 estimated shape laser-scanned, manually polished jin-soatto-yezzi

50 with h. jin

51 APPLICATIONS – Unmanned Aerial Vehicles (UAVs) Berkeley Aerial Robot (BEAR) Project Rate: 10Hz Accuracy: 5cm, 4 o


Download ppt "Computer Vision CS 223b Jana Kosecka CA: Sanaz Mohatari"

Similar presentations


Ads by Google