Intelligent Vision Processor
John Morris, Computer Science / Electrical & Computer Engineering, The University of Auckland
[Title image: “Iolanthe II” rounds Channel Island, Auckland-Tauranga Race, 2007]

Intelligent Vision Processor Applications
- Robot Navigation
  - Collision avoidance: autonomous vehicles
  - Manoeuvring in dynamic environments
- Biometrics
  - Face recognition
  - Tracking individuals
- Films
  - Markerless motion tracking
- Security
  - Intelligent threat detection
- Civil Engineering
- Materials Science
- Archaeology

Background

Intelligent Vision
- Our vision system is extraordinary
  - Its capabilities currently exceed those of any single processor
- Our brains
  - Operate on a very slow ‘clock’: kHz region
  - Are massively parallel: >10^10 neurons can compute in parallel
- The vision system (eyes) can exploit this parallelism
  - ~3 × 10^6 sensor elements (rods and cones) in the human retina

Intelligent Vision
- Matching and recognition
  - Artificial intelligence systems are currently not in the race! For example:
- Face recognition: we can recognize faces
  - From varying angles
  - Under extreme lighting conditions
  - With or without glasses, beards, bandages, makeup, etc.
  - With skin tone changes, e.g. sunburn
- Games
  - We can strike balls travelling at >100 km/h and direct that ball with high precision

Human vision
- Uses a relatively slow, but massively parallel, processor (our brains)
- Able to perform tasks at speeds, and with accuracy, beyond the capabilities of state-of-the-art artificial systems

Intelligent Artificial Vision
- A (conventional) high performance processor is too slow for high resolution (Mpixel+) images in real time (~30 frames per second)
- Useful vision systems must be able to
  - Produce 3D scene models
  - Update scene models quickly
    - Immediate goal: 20-30 Hz to mimic human capabilities
    - Long term goal: >30 Hz to provide enhanced capabilities
  - Produce accurate scene models

Intelligent Artificial Vision
- Use the human brain as the fundamental model
  - We know it works better than a conventional processor!
- We need, in the artificial system:
  - Large numbers of (small) processing elements (the brain's neurons)
  - Many parallel connections (the brain's nerves)

Human Vision Systems
- Higher order animals all use binocular vision systems
  - Permits estimation of the distance to an object
  - Vital for many survival tasks: hunting, avoiding danger, fighting predators
- Distance (or depth) is computed by triangulation
  - A point P is imaged at P’ and P’’ in the two eyes; P’ - P’’ is the disparity
  - It increases as P comes closer
[Figure: two eyes (eyeball, lens, retina) imaging point P at P’ and P’’]

Artificial Vision
- Evolution took millions of years to optimize vision
  - Don’t ignore those lessons!
- Binocular vision works
- Verging optics
  - Human eyes are known to swivel to ‘fixate’ on an object of interest
[Figure: parallel optical axes vs verging axes; point P imaged at P’ and P’’; fixation point F]

Real vs Ideal Systems
- Real lenses distort images
  - Distortion must be removed for high precision work!
- Easy, but
  - The conventional technique uses an iterative solution: slow!
  - A faster approach is needed for real time work
[Figure: image of a rectangular grid taken with a real lens]

Why Stereo?
- Range finders give depth information directly
- SONAR
  - Simple
  - Not very accurate (long wavelength)
  - Beam spread → low spatial resolution
- Lasers
  - Precise
  - Low divergence → high spatial resolution
  - Require fairly sophisticated electronics, but nothing too challenging in 2008
- Why use an indirect measurement when direct ones are available?

Why Stereo?
- Passive
  - Suitable for dense environments
  - Sensors do not interfere with each other
- Wide area coverage
  - Multiple overlapping views obtainable without interference
  - Wide area 3D data can be acquired at high rates
- 3D data aids unambiguous recognition
  - The 3rd dimension provides additional discrimination
- Textureless regions cause problems, but
  - Active illumination can resolve these
  - Active patterns can use IR (invisible, eye-safe) light

Artificial Vision Challenges

Artificial Vision - Challenges
- High processor power
  - Match the parallel capabilities of the human brain
- Distortion removal
  - Real lenses always show some distortion
- Depth accuracy
  - Evolution learnt about verging optics millions of years ago!
- Efficient matching
  - Good correspondence algorithms

Artificial Vision
- Simple stereo systems are being produced
  - Point Grey, etc.
- All use the canonical configuration
  - Parallel axes, coplanar image planes
  - Computationally simpler
  - A high performance processor doesn’t have time to deal with the extra computational complexity of verging optics
[Figure: Point Grey Research trinocular vision system]

Artificial System Requirements
- Highly parallel computation
  - The calculations are not complex, but
  - There are a lot of them in megapixel+ (>10^6 pixel) images!
- High resolution images
  - Depth is calculated from the disparity
  - If the disparity is only a few pixels, then depth accuracy is low
- Basic equation (canonical configuration only!):
  $z = \dfrac{b\,f}{d\,p}$
  where z is the depth, b the baseline, f the focal length, d the disparity (in pixels) and p the pixel size
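As a numerical illustration of the basic equation z = b f / (d p), the sketch below evaluates depth for a hypothetical camera (100 mm baseline, 8 mm lens, 5 µm pixels; these values are assumptions, not the system described in these slides). It shows why a disparity of only a few pixels corresponds to a distant, poorly resolved point.

```python
def depth_from_disparity(b: float, f: float, d: float, p: float) -> float:
    """Depth z = b*f / (d*p), valid for the canonical (parallel-axis) configuration only.

    b: baseline (m), f: focal length (m), d: disparity (pixels), p: pixel size (m).
    """
    if d <= 0:
        raise ValueError("disparity must be positive (d = 0 means a point at infinity)")
    return b * f / (d * p)


# Hypothetical camera parameters: 100 mm baseline, 8 mm lens, 5 um pixels
b, f, p = 0.10, 0.008, 5e-6
for d in (4, 40, 400):
    print(f"d = {d:3d} px  ->  z = {depth_from_disparity(b, f, d, p):6.2f} m")
```

Depth is inversely proportional to disparity, so a fixed one-pixel matching error matters far more for distant points than for near ones.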

Artificial System Requirements
- Depth resolution is critical!
  - A cricket* player can catch a 100 mm ball travelling at 100 km/h
- High resolution images needed
  - Disparities are large numbers of pixels
  - Small depth variations can be measured, but
  - High resolution images increase the demand for processing power!
*Strange game played in former British colonies, in which a batsman defends 3 small sticks in the centre of a large field against a bowler who tries to knock them down!
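Differentiating z = b f / (d p) with respect to d gives the depth change caused by a one-pixel disparity step, |Δz| ≈ z² p / (b f). The sketch below uses the same hypothetical camera (100 mm baseline, 8 mm lens) and an assumed object distance of 2 m; it shows that halving the pixel size doubles the disparity and halves the depth step, which is why high resolution images are needed.

```python
def depth_step(z: float, b: float, f: float, p: float) -> float:
    """First-order depth change for a one-pixel disparity change at depth z: z^2 p / (b f)."""
    return z * z * p / (b * f)


# Hypothetical camera (100 mm baseline, 8 mm lens) viewing an object 2 m away
b, f, z = 0.10, 0.008, 2.0
for p in (10e-6, 5e-6, 2.5e-6):          # progressively smaller pixels
    d = b * f / (z * p)                  # disparity of the object, in pixels
    print(f"pixel {p * 1e6:4.1f} um: disparity {d:4.0f} px, "
          f"depth step {depth_step(z, b, f, p) * 1000:5.1f} mm")
```

The formula is a first-order approximation; it is accurate when the disparity is many pixels, which is exactly the high resolution regime the slide argues for.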

Artificial System Requirements
- Conventional processors do not have sufficient processing power, but Moore’s Law says:
  - Wait 18 months and the power will have doubled, but
  - The changes that give you twice the power also give you twice as many pixels in a row, and four times as many in an image!
- Specialized highly parallel hardware is the only solution!
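The argument can be made a little sharper with a rough cost model. Assuming (our assumption, not a figure from the slides) that a matcher evaluates one matching cost per pixel per candidate disparity, the work per second is width × height × disparity range × frame rate. Halving the pixel size doubles the pixels per row and per column and, because d = b f / (z p), also doubles the disparity range in pixels, so the workload grows eightfold while Moore's Law supplies a factor of two. The image sizes below are hypothetical.

```python
def match_evaluations_per_second(width: int, height: int, disparities: int, fps: int) -> int:
    """Rough stereo workload: one cost evaluation per pixel per candidate disparity."""
    return width * height * disparities * fps


# Hypothetical sensor today vs. one with pixels half the size (same optics, 25 frames/s)
base = match_evaluations_per_second(1024, 1024, 128, 25)
next_gen = match_evaluations_per_second(2048, 2048, 256, 25)
print(f"{base:.2e} -> {next_gen:.2e} evaluations/s (x{next_gen / base:.0f})")
```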

Processing Power Solution

FPGA Hardware
- FPGA = Field Programmable Gate Array
- ‘Soft’ hardware
  - Connections and logic functions are ‘programmed’ in much the same way as a conventional von Neumann processor
  - Creating a new circuit is about as difficult as writing a programme!
- High order parallelism is easy
  - Replicate the circuit n times: as easy as writing a for loop!

FPGA Hardware
- FPGA = Field Programmable Gate Array
- The ‘circuit’ is stored in static RAM cells
  - It can be changed as easily as reloading a new program

FPGA Hardware
- Why is programmability important? or Why not design a custom ASIC?
- Optical systems don’t have the flexibility of a human eye
  - Lenses are fabricated from rigid materials
- It is not possible to make a ‘one system fits all’ system
  - Optical configurations must be designed for each application: field of view, resolution required, physical constraints, …
- Processing hardware has to be adapted to the optical configuration
  - If we design an ASIC, it will only work for one application!!

Correspondence or Matching

Stereo Correspondence
- Can you find all the matching points in these two images?
  - “Of course! It’s easy!”
- The best computer matching algorithms get 5% or more of the points completely wrong!
  - …and take a long time to do it!
  - They’re not candidates for real time systems!!

Stereo Correspondence
- High performance matching algorithms are global in nature
  - They optimize over large image regions using energy minimization schemes
- Global algorithms are inherently slow
  - They iterate many times over small regions to find optimal solutions

Correspondence Algorithms
- Good matching performance, global, low speed
  - Graph-cut, belief-propagation, …
- High speed, simple, local, high parallelism, lowest performance
  - Correlation
- High speed, moderate complexity, parallel, medium performance
  - Dynamic programming algorithms

Depth Accuracy

Stereo Configuration
- Canonical configuration: two cameras with parallel optical axes
- Rays are drawn through each pixel in the image
  - Ray intersections represent points imaged onto the centre of each pixel
  - Points along these lines have the same disparity, but
- To obtain depth information, a point must be seen by both cameras, i.e. it must be in the Common Field of View (CFoV)
[Figure: depth resolution of the canonical configuration]

Stereo Camera Configuration
- Now consider an object of extent a
  - To be completely measured, it must lie in the Common Field of View, but
  - Place it as close to the camera as you can so that you obtain the best accuracy, say at D
- Now increase b to increase the accuracy at D!
  - But you must then increase D so that the object stays within the CFoV!
- Detailed analysis leads to an optimum value of b
[Figure: object of extent a at distance D from two cameras separated by baseline b]
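The trade-off can be explored with a deliberately simplified model; this is a sketch under our own assumptions, not the detailed analysis referred to on the slide. Two parallel cameras each see a width 2 D tan α at distance D (α is the half field of view), and their views overlap by 2 D tan α - b, so an object of extent a fits only when D ≥ (a + b) / (2 tan α). Substituting this closest distance into the depth step z² p / (b f) gives an error proportional to (a + b)² / b, which is smallest at b = a. The grid search below confirms this numerically for hypothetical values (1 m object, 8 mm lens, 1024 pixels of 5 µm).

```python
import math


def min_distance(a: float, b: float, half_fov: float) -> float:
    """Closest distance at which an object of width a lies entirely in the CFoV.

    Each parallel camera sees a width 2*D*tan(half_fov) at distance D; the two views
    overlap by 2*D*tan(half_fov) - b, which must be at least a.
    """
    return (a + b) / (2.0 * math.tan(half_fov))


def depth_step(z: float, b: float, f: float, p: float) -> float:
    """First-order depth change for a one-pixel disparity change at depth z."""
    return z * z * p / (b * f)


a = 1.0                                      # object extent (m), hypothetical
f, p, n_pixels = 0.008, 5e-6, 1024           # hypothetical lens and sensor
half_fov = math.atan(0.5 * n_pixels * p / f)

best_b, best_step = 0.0, float("inf")
for i in range(1, 401):                      # baselines from 1 cm to 4 m
    b = i * 0.01
    step = depth_step(min_distance(a, b, half_fov), b, f, p)
    if step < best_step:
        best_b, best_step = b, step

print(f"optimum baseline ~ {best_b:.2f} m, depth step {best_step * 1000:.1f} mm")
```

This model considers geometry only; it ignores the matching problems that grow with the baseline (discussed on the following slides), which would penalise large baselines further.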

Increasing the baseline
[Figure: % good matches vs baseline b. Images: ‘corridor’ set (ray-traced); matching algorithms: P2P, SAD]
- Increasing the baseline decreases performance!!

Increasing the baseline: standard deviation
- Examine the distribution of errors
[Figure: standard deviation of errors vs baseline b. Images: ‘corridor’ set (ray-traced); matching algorithms: P2P, SAD]
- Increasing the baseline decreases performance!!

Increased Baseline → Decreased Performance
- Statistical
  - Higher disparity range → increased probability of matching incorrectly: you’ve simply got more choices!
- Perspective
  - Scene objects are not fronto-planar
  - Surfaces angled to the camera axes subtend different numbers of pixels in the L and R images
- Scattering
  - The perfect scattering (Lambertian) surface assumption is OK at small angular differences
  - It fails increasingly at higher angles
- Occlusions
  - The number of hidden regions increases as the angular difference increases
  - → an increasing number of ‘monocular’ points for which there is no 3D information!

Evolution
- Human eyes ‘verge’ on an object to estimate its distance, i.e. the eyes fix on the object in the field of view
[Figure: the configuration commonly used in stereo systems compared with the configuration discovered by evolution millions of years ago]
- Note immediately that the CFoV is much larger!

Look at the optical configuration!
- If we increase f, then D_min returns to the critical value!
[Figure: verging configuration with the original f and with increased f]

Depth Accuracy - Verging axes, increased f
- Now the depth accuracy has increased dramatically!
- Note that at large f, the CFoV does not extend very far!

Summary

Summary: Real time stereo
- General data acquisition is:
  - Non contact: adaptable to many environments
  - Passive: not susceptible to interference from other sensors
  - Rapid: acquires complete scenes in each shot
  - Imaging technology is well established: cost effective, robust, reliable
- 3D data enhances recognition
  - Full capabilities of a 2D imaging system + depth data
- With hardware acceleration
  - 3D scene views are available for control and monitoring in real time
  - Rapid response → rapid throughput
  - The host computer is free to process complex control algorithms
- Intelligent Vision Processing Systems which can mimic human vision system capabilities!

Our Solution

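System Architecture
[Figure: the L and R cameras deliver images over a serial interface (Firewire / GigE / CameraLink) to the FPGA, which holds line buffers, distortion removal, image alignment and stereo matching (disparity → depth). The FPGA passes corrected images and a depth map to the host PC for higher order interpretation; the diagram also shows control signals.]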

Distortion removal
[Figure: image of a rectangular grid from a camera with a simple zoom lens; the lines should be straight!]
- Store the displacements of the actual image from the ideal points in a LUT
- Removal algorithm: for each ideal pixel position
  - Get the displacement to the real image
  - Calculate the intensity of the ideal pixel (bilinear interpolation)
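The removal algorithm can be sketched in a few lines of Python. This version assumes a full-resolution LUT held as two arrays, dx and dy (hypothetical names), giving the displacement from each ideal pixel to its position in the real image; positions falling outside the image are clamped to the border, which is one of several reasonable choices.

```python
def bilinear(img, x, y):
    """Intensity at real-valued position (x, y) by bilinear interpolation (img: list of rows)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)            # clamp to the image border
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)                  # top-left neighbour (x, y >= 0, so int() floors)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def remove_distortion(real, dx, dy):
    """For each ideal pixel (x, y), sample the real image at (x + dx, y + dy)."""
    h, w = len(real), len(real[0])
    return [[bilinear(real, x + dx[y][x], y + dy[y][x]) for x in range(w)] for y in range(h)]


# Toy 4 x 4 ramp image and a uniform half-pixel horizontal displacement (hypothetical LUT)
real = [[10 * x + y for x in range(4)] for y in range(4)]
dx = [[0.5] * 4 for _ in range(4)]
dy = [[0.0] * 4 for _ in range(4)]
ideal = remove_distortion(real, dx, dy)
print(ideal[1])
```

Each output pixel needs four neighbouring input pixels from at most two rows, which is why a hardware version only has to buffer the few scan lines spanned by the largest vertical displacement.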

Distortion Removal
- Fundamental idea
  - Calculation of the undistorted pixel position is simple but slow: not suitable for real time, but
  - It’s the same for every image! So, calculate it once!
- Create a look up table containing the ideal → actual displacement for each pixel, using
  $u_d = u_{ud}\,(1 + \kappa_1 r^2 + \kappa_2 r^4 + \dots), \qquad r^2 = u_{ud}^2 + v_{ud}^2$
  where the subscripts d and ud denote distorted and undistorted coordinates
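Because the table is computed only once, the model can be evaluated directly in the ideal → actual direction, which needs no iteration. The sketch below assumes a radial model with two coefficients k1 and k2 (standing for κ1 and κ2), coordinates in pixels measured from an assumed distortion centre at the image centre, and a per-pixel (dx, dy) table; the coefficient values are toys.

```python
def radial_displacement(u, v, k1, k2):
    """Displacement from an ideal (undistorted) position to where it is actually imaged.

    u, v are measured from the distortion centre: u_d = u * (1 + k1 r^2 + k2 r^4), r^2 = u^2 + v^2.
    """
    r2 = u * u + v * v
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return u * scale - u, v * scale - v


def build_lut(width, height, k1, k2):
    """Compute (dx, dy) once for every ideal pixel; the table is reused for every frame."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # assumed distortion centre
    dx = [[0.0] * width for _ in range(height)]
    dy = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx[y][x], dy[y][x] = radial_displacement(x - cx, y - cy, k1, k2)
    return dx, dy


dx, dy = build_lut(5, 5, k1=0.01, k2=0.0)   # toy 5 x 5 image, toy coefficient
print(dx[0][0], dy[2][2])
```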

Distortion Removal
- Creating the LUT
  - One entry (dx, dy) per pixel
  - Each entry is a float, so (dx, dy) requires 8 bytes: a 1 Mpixel image needs 8 Mbytes!
- However, distortion is a smooth curve
  - Store one entry per n pixels
  - Trials show that n = 64 is OK for a severely distorted image
  - A LUT row contains 2^10 / 2^6 = 2^4 = 16 entries, so the total LUT is 256 entries
- Displacement for pixel (j, k):
  $du_{jk} = (j \bmod 64)\,\delta u_{\lfloor j/64 \rfloor, \lfloor k/64 \rfloor}$, where $\delta u_{\lfloor j/64 \rfloor, \lfloor k/64 \rfloor}$ is stored in the LUT
- Simple, fast circuit
  - Since the algorithm runs along scan lines, this multiplication is done by repeated addition
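The sparse LUT and the repeated-addition trick can be checked in software. The sketch assumes a fixed-point format with 16 fractional bits and, as an assumption beyond the slide, that each LUT entry also holds the displacement at the start of its 64-pixel segment, so the displacement is piecewise linear along the row. Inside a segment each pixel simply adds the stored increment instead of computing (j mod 64) × δu; the assertion checks that both give identical results. All LUT values are invented.

```python
SEG = 64                 # one LUT entry per 64 pixels (n = 64)
FRAC_BITS = 16           # hypothetical fixed-point format: 16 fractional bits


def scanline_displacements(base, step, width):
    """Per-pixel displacements along one scan line using additions only.

    base[s]: displacement at the first pixel of segment s (assumed LUT field)
    step[s]: per-pixel increment within segment s (the stored delta-u)
    Values are fixed-point integers, as a hardware implementation would use.
    """
    out = []
    acc = 0
    for j in range(width):
        s = j // SEG          # segment index: the upper bits of the column counter
        if j % SEG == 0:
            acc = base[s]     # new segment: reload the accumulator from the LUT
        else:
            acc += step[s]    # replaces the multiplication (j mod 64) * step[s]
        out.append(acc)
    return out


width = 1024
segments = width // SEG                                         # 2^10 / 2^6 = 16 entries per row
base = [s * (1 << FRAC_BITS) // 4 for s in range(segments)]     # invented LUT contents
step = [(s - 8) * 37 for s in range(segments)]
disp = scanline_displacements(base, step, width)
direct = [base[j // SEG] + (j % SEG) * step[j // SEG] for j in range(width)]
assert disp == direct
print(segments, "entries per LUT row; displacement at j = 70:", disp[70] / (1 << FRAC_BITS), "px")
```

Using integers keeps the accumulated sum exact; with floating point, repeated addition would drift slightly from the multiplied value.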

Alignment correction
- In general, the cameras will not be perfectly aligned in the canonical configuration
  - We may also be using verging axes to improve depth resolution
- Calculate the locations of the epipolar lines once!
  - Add these displacements to the LUT for distortion!
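One way to fold alignment into the same table is to compose the two mappings while the LUT is being built: first map each pixel of the aligned image to its position in the undistorted camera image, then apply the distortion model at that point, and store the total displacement. The sketch assumes, purely for illustration, that the misalignment is a small rotation theta about the image centre plus a vertical offset ty, with a one-coefficient radial distortion model; a real system would derive these from calibration. The run-time cost is unchanged: one table lookup per pixel.

```python
import math


def radial(u, v, k1):
    """One-coefficient radial model: displacement (u_d - u, v_d - v) at ideal point (u, v)."""
    s = k1 * (u * u + v * v)
    return u * s, v * s


def combined_lut(width, height, theta, ty, k1):
    """Total displacement (alignment + distortion) for every pixel of the aligned image.

    theta: hypothetical camera roll (radians) about the image centre; ty: hypothetical
    vertical offset (pixels); k1: radial distortion coefficient (pixel units).
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx = [[0.0] * width for _ in range(height)]
    dy = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = x - cx, y - cy
            # Alignment: where this aligned pixel lies in the undistorted camera image
            ua = cos_t * u - sin_t * v
            va = sin_t * u + cos_t * v + ty
            # Distortion: where that undistorted point is actually imaged
            ddx, ddy = radial(ua, va, k1)
            dx[y][x] = (ua - u) + ddx
            dy[y][x] = (va - v) + ddy
    return dx, dy


dx, dy = combined_lut(640, 480, theta=0.002, ty=1.5, k1=1e-7)
print(round(dx[0][0], 3), round(dy[0][0], 3))
```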

Real time 3D data acquisition
- Real time stereo vision
  - Implemented Gimel’farb’s Symmetric Dynamic Programming Stereo in FPGA hardware
- Real time precise stereo vision
  - Faster, smaller hardware circuit
- Real time 3D maps
  - 1% depth accuracy with 2 scan line latency at 25 frames/s
[Figure: system block diagram: lens distortion removal, misalignment correction and depth calculator]
- Output is a stream of depth values: a 3D movie!

Real time 3D data acquisition
- Possible applications
  - Collision avoidance for robots
  - Recognition via 3D models
    - Fast model acquisition: imaging technology, not scanning!
    - Recognition of humans without markers
    - Tracking objects
    - Recognizing orientation, alignment
  - Process monitoring, e.g. resin flow in flexible (‘bag’) moulds
  - Motion capture: robot training

FPGA Stereo System
[Photo: Firewire cables, Firewire physical layer ASIC, Firewire link layer ASIC, Altera Stratix FPGA, parallel host interface, FPGA programming cable]

Summary

- Challenges of artificial vision systems
  - Real-time image processing requires compute power!
  - Correspondence (matching)
  - Depth accuracy
- Evolution lessons
  - Emulate the parallel processing capability of the human brain
  - Use verging optics

Summary
- Our system
  - FPGA ‘front end’ processor
    - Removes distortion
    - Corrects camera misalignment
    - Performs stereo matching using dynamic programming
  - Latency: several scan lines (of the order of 1 millisecond)
    - Depends on lens distortion and camera alignment
    - The host does not have to wait for a whole image!
  - Depth (distance) maps in real time: 3D vision!
  - Frees the host processor for image interpretation
- Use both technologies (FPGA, conventional CPU) where they perform best!
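The latency figure follows from simple arithmetic: a camera delivers one scan line every 1 / (frame rate × lines per frame) seconds. Assuming a hypothetical 1024-line sensor at 25 frames/s and ignoring blanking intervals, the sketch below compares a few scan lines with a whole frame (40 ms).

```python
def scanline_latency_ms(lines: int, lines_per_frame: int, fps: float) -> float:
    """Time taken by the camera to deliver `lines` scan lines (blanking ignored), in ms."""
    line_period_s = 1.0 / (fps * lines_per_frame)
    return lines * line_period_s * 1000.0


# Hypothetical 1024-line sensor at 25 frames/s
for lines in (2, 10, 25, 1024):
    print(f"{lines:5d} lines -> {scanline_latency_ms(lines, 1024, 25.0):7.3f} ms")
```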

Ongoing Photogrammetry Projects

Ongoing Projects
- Face Recognition
  - Development of face models
  - Animation
- Automated Driving
  - With Daimler-Benz
- Stereo Algorithms
  - Improved correspondence algorithms
- High Quality Rendering
  - Movie special effects, e.g. “The Lord of the Rings”
- Using reconfigurable hardware (FPGA)

Spare slides

Stereo matching
- Automated stereo systems find matching regions in the two images
  - The separation of the matching regions is the disparity, from which depth is calculated
- Matching algorithms generally search over a range of possible disparities
  - Looking for the best ‘match’ in the two images
- Stereo correspondence is a classical challenge for AI systems
  - Our brains match regions in images without effort... but computers struggle to match as well!
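The search over candidate disparities is easiest to see in the simplest local method: correlation with a sum of absolute differences (SAD) cost and winner-take-all selection. This is a generic textbook sketch, not the algorithm used in the FPGA system; it assumes rectified images (so matches lie on the same scan line), that a left-image pixel x matches right-image pixel x - d, and a 5-pixel window.

```python
def sad_disparities(left, right, max_disp, half_win=2):
    """Winner-take-all SAD matching along one rectified scan line.

    For each left pixel x, the window centred on x is compared with the window centred
    on x - d in the right line, for every candidate d, and the d with the smallest
    sum of absolute differences wins.
    """
    width = len(left)
    disparities = [0] * width
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(min(max_disp, x) + 1):        # keep x - d inside the line
            cost = 0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), width - 1)    # clamp windows at the borders
                xr = min(max(x - d + k, 0), width - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities


right = [(i * 37) % 101 for i in range(40)]          # synthetic textured scan line
left = [right[max(x - 3, 0)] for x in range(40)]     # same line shifted by 3 pixels
disp = sad_disparities(left, right, max_disp=8)
print(disp)
```

Every (x, d) cost is independent of the others, which is what makes correlation so easy to parallelize; the price is that each pixel's decision ignores its neighbours' decisions.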

Stereo Photogrammetry
- Pairs of images giving different views of the scene can be used to compute a depth (disparity) map
- Key task: correspondence
  - Locate matching regions in both images
- Epipolar constraint
  - Align the images so that matches must appear in the same scan line in the L & R images

Detail System Architecture
[Figure: blocks shown are pixel buffers; a pixel address generator, which removes distortion and misalignment; n disparity calculators, one for each possible disparity value; a predecessor matrix (dynamic programming); output: a stream of disparity values]
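The detailed architecture (one calculator per disparity value feeding a predecessor matrix) follows the usual structure of dynamic-programming stereo. The sketch below is a generic scan-line DP, not Gimel’farb’s Symmetric Dynamic Programming Stereo itself: the cost of disparity d at pixel x is the absolute intensity difference plus a penalty proportional to the disparity change from the previous pixel (weight `smooth`, an assumed parameter). The loop over d corresponds to the parallel disparity calculators; tracing back through the predecessor matrix produces the stream of disparity values.

```python
def dp_disparities(left, right, max_disp, smooth=4):
    """Generic dynamic-programming stereo on one rectified scan line."""
    width, n_disp = len(left), max_disp + 1
    INF = float("inf")

    def match(x, d):
        # Absolute difference; disparities that would fall off the right image are forbidden
        return abs(left[x] - right[x - d]) if x - d >= 0 else INF

    cost = [match(0, d) for d in range(n_disp)]     # best path cost ending at (x, d)
    pred = [[0] * n_disp for _ in range(width)]     # predecessor matrix
    for x in range(1, width):
        new_cost = [INF] * n_disp
        for d in range(n_disp):                     # one 'disparity calculator' per d
            m = match(x, d)
            if m == INF:
                continue
            best_prev, best_c = 0, INF
            for prev_d in range(n_disp):
                c = cost[prev_d] + smooth * abs(d - prev_d)
                if c < best_c:
                    best_prev, best_c = prev_d, c
            new_cost[d] = best_c + m
            pred[x][d] = best_prev
        cost = new_cost

    # Trace back from the cheapest disparity at the end of the line
    d = min(range(n_disp), key=lambda k: cost[k])
    disparities = [0] * width
    for x in range(width - 1, -1, -1):
        disparities[x] = d
        d = pred[x][d]
    return disparities


# Synthetic test: the left line is the right line shifted by 3 pixels
right = [(i * 37) % 101 for i in range(40)]
left = [right[max(x - 3, 0)] for x in range(40)]
disp_dp = dp_disparities(left, right, max_disp=8)
print(disp_dp)
```

Only the predecessor matrix has to be kept until the traceback, which fits the small-latency, scan-line-at-a-time style of processing described in these slides.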
