3D Object Recognition Using Computer Vision VanGogh Imaging, Inc.

Kenneth Lee CEO/Founder

Corporate Overview
Founded in 2007, located in McLean, VA
Mission: "Provide easy-to-use, real-time 3D computer vision (CV) technology for embedded and mobile applications"
– 2D to 3D for better visualization, higher reliability, and accuracy
– Solve problems that require spatial measurements (e.g., parts inspection)
Target customer: application and system developers
– Enhance existing products or develop new products
Product: 'Starry Night' 3D-CV middleware (Unity plugin)
– Operating systems: Android and Linux
– 3D sensors: Occipital Structure and Intel RealSense
– Processors: ARM and Xilinx Zynq
Our focus
– Object recognition
– Feature detection
– Analysis (e.g., measurements)

Potential Applications
– 3D printing
– Parts inspection
– Robotics
– Entertainment
– Automotive safety
– Security
– Medical imaging

Challenges for Implementing Real-Time 3D Computer Vision
– Busy, uncontrolled real-world environments
– Limited processing power and memory
– Noisy and uncalibrated low-cost scanners
– Difficult-to-use libraries
– Hard to find proficient computer vision engineers
– Lack of standards
– Large development investment

Starry Night Unity Plugin (patent pending) Starry Night Video:

The ‘Starry Night’ Template-Based 3D Model Reconstruction
Reliable – the output is always a fully formed 3D model with known feature points, despite noisy or partial scans
Easy to use – fully automated process
Powerful – known data structure for easy analysis and measurement
Fast – real-time modeling
Input Scan (Partial) + Reference Model = Full 3D Model
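The "partial scan + reference model = full model" idea rests on rigidly aligning the reference model to the scan. A minimal sketch of that alignment step, assuming known point correspondences and the standard SVD (Kabsch) least-squares solution; the function name and setup are illustrative, not VanGogh Imaging's implementation:

```python
import numpy as np

def align_reference(ref_pts, scan_pts):
    """One rigid-alignment step given correspondences ref_pts[i] <-> scan_pts[i].

    Returns (R, t) such that scan ~= ref @ R.T + t; the transformed full
    reference model then stands in for the partial scan.
    """
    rc, sc = ref_pts.mean(axis=0), scan_pts.mean(axis=0)
    H = (ref_pts - rc).T @ (scan_pts - sc)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = sc - R @ rc
    return R, t
```

In practice correspondences are unknown, so this step is iterated inside an ICP loop (nearest neighbors, then re-align) as later slides describe.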

3D Object Recognition Algorithm for Mobile and Embedded Devices

Challenges – Scene
Busy scenes, object orientation, and occlusion

Challenges – Platform
Mobile and embedded devices
– ARM Cortex-A9 or A15, <2 GB RAM
– Existing libraries were built for laptop/desktop platforms
– GPU processing is not always available

Previous Approaches Tried
2D (texture-based) methods
– Color-based → depends heavily on the lighting or the color of the object
– Machine learning → robust, but requires training for each object
– Neither method provides a transform (i.e., orientation)
3D methods
– Hough transform → slow
– Geometric hashing → even slower
– Tensor matching → not good for noisy and sparse scenes
– Correspondence-based methods using rigid geometric descriptors → require models with distinctive feature points, which is not true for many models (e.g., a cylinder)

General Concept for CV-Based Object Recognition
[Flow diagram: the reference object's descriptor and the scene are compared via the distance & normal of random sample points; match criteria select a candidate, then the orientation, location, and transpose are fine-tuned]

Block Diagram

Model Descriptor (Pre-Processed)
Sample all point pairs in the model that are separated by the same distance D
Use the surface normals of each pair to group the pairs under a hash-table key:
– (α1, β1, Ω1) → P1,P2; P3,P4
– (α2, β2, Ω2) → P5,P6; P7,P8; P9,P10; P11,P12
– (α3, β3, Ω3) → P13,P14
Note: In the bear example, D = 5 cm, which resulted in 1000 pairs
Note: The keys are angles derived from the normals of the points:
– alpha (α) = angle of the first normal to the second point
– beta (β) = angle of the second normal to the first point
– omega (Ω) = angle of the plane between the two points
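The grouping above can be sketched in code. This is a hedged illustration only: the three key angles are computed as the slide describes, but the function names, the angle-bin width, and the distance tolerance are all assumptions, not the actual Starry Night descriptor:

```python
import numpy as np

def pair_key(p1, n1, p2, n2, bin_deg=30.0):
    """Quantize the slide's (alpha, beta, omega) angles into a hash key."""
    d_hat = (p2 - p1) / np.linalg.norm(p2 - p1)
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    alpha = ang(n1, d_hat)    # first normal vs. direction to second point
    beta = ang(n2, -d_hat)    # second normal vs. direction to first point
    omega = ang(n1, n2)       # angle between the two normals
    b = np.radians(bin_deg)
    return (int(alpha // b), int(beta // b), int(omega // b))

def build_descriptor(points, normals, D, tol=0.25):
    """Group all point pairs separated by ~D under their hash key."""
    table = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(np.linalg.norm(points[j] - points[i]) - D) <= tol:
                key = pair_key(points[i], normals[i], points[j], normals[j])
                table.setdefault(key, []).append((i, j))
    return table
```

Pre-processing the model this way turns recognition into constant-time hash lookups at run time.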

Object Recognition Workflow
1. Grab the scene
2. Sample a point pair with distance D using RANSAC
3. Generate a key using the same hash function
4. Use the key to retrieve similarly oriented point pairs in the model and a rough transform
5. Apply the match criteria to find the best match
6. Use ICP to refine the transform
Note: The example scene has around 16K points
Note: We iterated this sampling process 100 times
Note: The entire process can be easily parallelized
Very important: Multiple models can be found using a single hash table from the same sampled point pair in the scene
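The sampling-and-lookup steps above can be sketched as follows. This is an illustration under assumptions (the hash function, voting scheme, and every name are hypothetical), not the actual Starry Night code:

```python
import random
import numpy as np

def pair_key(p1, n1, p2, n2, bin_deg=10.0):
    """Same hash function assumed to have built the model descriptor."""
    d_hat = (p2 - p1) / np.linalg.norm(p2 - p1)
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    b = np.radians(bin_deg)
    return (int(ang(n1, d_hat) // b),
            int(ang(n2, -d_hat) // b),
            int(ang(n1, n2) // b))

def recognize(scene_pts, scene_nrm, model_table, D, tol=0.25, iters=100):
    """RANSAC-style loop: sample scene pairs spanning ~D, hash them, and
    vote for model pairs stored under the same key."""
    votes = {}
    for _ in range(iters):
        i, j = random.sample(range(len(scene_pts)), 2)
        if abs(np.linalg.norm(scene_pts[j] - scene_pts[i]) - D) > tol:
            continue                         # pair does not span distance D
        key = pair_key(scene_pts[i], scene_nrm[i], scene_pts[j], scene_nrm[j])
        for pair in model_table.get(key, []):
            votes[pair] = votes.get(pair, 0) + 1
    # The best-voted correspondence yields a rough transform; ICP refines it.
    return max(votes, key=votes.get) if votes else None
```

Each loop iteration is independent, which is why the slide notes the process parallelizes easily.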

Implementation Result Object Recognition Video:

Object Recognition Examples

Adaptive 3D Object Recognition Algorithm – Resize and Reshape

Object Recognition for Different Sizes & Shapes
Objects in the real world are not always identical
A similarity factor, S%, denotes the tolerated % of shape difference
– This allows recognition of an object that is similar to, but does not have exactly the same shape as, the reference model
A size factor, Z%, denotes the tolerated % of size difference
– This allows recognition of an object whose size differs from the reference model
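One simple way to realize the size factor Z% is to try several scales of the reference model and keep the best-fitting one. A minimal sketch, assuming a brute-force scale search and mean nearest-neighbor distance as the match score (both are assumptions, not the product's method):

```python
import numpy as np

def best_scale(ref_pts, scene_pts, z_pct=20.0, steps=9):
    """Scale the reference model about its centroid within +/- z_pct and
    return (scale, residual) for the scale that best matches the scene."""
    centroid = ref_pts.mean(axis=0)
    best = (None, np.inf)
    for s in np.linspace(1 - z_pct / 100, 1 + z_pct / 100, steps):
        scaled = centroid + s * (ref_pts - centroid)
        # mean distance from each scaled reference point to its nearest scene point
        d = np.sqrt(((scaled[:, None, :] - scene_pts[None, :, :]) ** 2).sum(-1))
        err = d.min(axis=1).mean()
        if err < best[1]:
            best = (s, err)
    return best
```

The shape factor S% would be handled analogously, accepting matches whose residual stays within the tolerated fraction of the model size.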

General Approach
Dynamically resize the reference model
Dynamically reshape the reference model
– Uses our 'Shape-Based Registration' technique
Hence, the reference model is 'deformed' to match the object in the scene
This results in very robust object recognition
The final reference model best represents the object in the scene in both size and shape

Block Diagram – Adaptive Object Recognition with Feedback
The reference model is iteratively modified with every new frame until it converges to the object in the scene
Note: Currently being implemented; will be available in Version 1.2 later this year

Object Recognition Performance Numbers

Reliability (with bear model)
% false positives (depends on the scene)
– Clean scene: <1%
– Noisy scene: 5% (1 out of 20 frames)
% negative results (cannot find the object)
– Clean scene: <1%
– Noisy scene: 10% (also takes longer)
Effect of orientation on success ratio
– Model facing front: >99%
– Model facing backwards: >99%
– Model facing sideways (narrower profile): 85%

Performance – Mobile
Performance on a 2 GHz ARM Cortex-A15 (Android mobile)
Time to find one object
– Single thread: 2 seconds
– Multi-threaded + NEON: 0.3 seconds
Time to find two objects
– Single thread: 2.5 seconds
– Multi-threaded + NEON: 0.5 seconds
Note: Effective use of NEON led to significant performance gains of 2.5× for certain functions

Hardware Acceleration Using FPGA
The Xilinx Zynq SoC provides 20 to 1,000 parallel voxel processors, depending on the size of the FPGA
[Diagram: a voxel scan feeds the Zynq's ARM cores and FPGA fabric, which hosts 20+ parallel voxel processors]

Hardware Acceleration: FPGA (Xilinx Zynq)
Select functions to be implemented on the Zynq
– FPGA: matrix operations
– Dual-core ARM: data management + floating point
– Entire implementation done in C++ (Xilinx Vivado HLS)

Performance: Embedded Using FPGA
Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM A9. Speed will improve substantially once the remaining high-intensity functions are moved to the FPGA.
Performance on Xilinx Zynq (ARM Cortex-A9 + FPGA)
Time to find one object
– Zynq 7020: 0.7 seconds
– Zynq 7045 (est.): 0.1 seconds
No test results for two objects yet, but scaling should match the ARM results

Future
The chosen algorithm works well in most real-world conditions
It is tolerant to size and shape differences with respect to the reference model
It can find multiple objects at the same time with minimal additional processing power
Additional performance improvements are needed:
– Algorithm
– Application-specific parameters (e.g., size of the model descriptor)
– ARM NEON
– Optimized use of the FPGA cores

Summary
Key implementation issues
– Model descriptor
– Data structure
– Sampling technique
– Platform
Important: both ARM and FPGA provide the scalability
Therefore: real-time 3D object recognition was very difficult, but was successfully implemented on both mobile and embedded platforms!
LIVE DEMO AT THE Xilinx BOOTH!

Resources
Android 3D printing:
– "Challenges and Techniques in Using CPUs and GPUs for Embedded Vision," Ken Lee, VanGogh Imaging – vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
– "Using FPGAs to Accelerate Embedded Vision Applications," Kamalina Srikant, National Instruments – vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
– "Demonstration of Optical Flow Algorithm on an FPGA" – vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg
Reference: "An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes," Chavdar Papazov and Darius Burschka, Technische Universität München (TUM), Germany