1
Visual Grouping and Recognition
David Martin, UC Berkeley
2
UCB Collaborators
Prof. Jitendra Malik
Prof. Dave Patterson
Charless Fowlkes
Doron Tal
See http://www.cs.berkeley.edu/~dmartin/papers/iccv01.pdf
3
From images to objects
Labeled sets: tiger, grass, etc.
4
Recognition
Object classes are hierarchical (e.g. leopard vs. clouded leopard vs. my pet clouded leopard Leopold)
Recognition must be tolerant to changes in pose and illumination
5
Framework for Recognition: Three Stages
Segmentation: images → regions
Association: regions → super-regions
Matching: super-regions → prototype views
6
Segmentation: Images → Regions
Slight over-segmentation of objects is OK
Under-segmentation is BAD
(Fragments of an over-segmented object can be merged back together during association; a region that straddles two objects cannot be split by the later stages.)
7
Association: Regions → Super-Regions
Simple enumeration of connected components (connected groups of adjacent regions)
The number of super-regions of size k in an image with n regions is approximately 4^k · n / k
For typical images, this number is 1K-10K
A plausibility ordering could reduce the effective number substantially
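The enumeration of super-regions can be sketched as below. This is a minimal sketch, assuming the regions and their adjacencies are given as a region adjacency graph (a dict from a region id to the set of neighbouring region ids); the function name `enumerate_super_regions` is hypothetical and not taken from the original work. Every connected set of k regions is produced by adding one neighbouring region to a connected set of k-1 regions, which reaches all of them because removing a leaf of a spanning tree leaves a connected set connected.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def enumerate_super_regions(
    adj: Dict[Hashable, Set[Hashable]], max_size: int
) -> List[Set[FrozenSet[Hashable]]]:
    """Hypothetical helper: levels[k-1] holds every connected set of k regions."""
    if max_size < 1:
        return []
    # Size 1: every region on its own is a super-region.
    levels: List[Set[FrozenSet[Hashable]]] = [{frozenset([r]) for r in adj}]
    for _ in range(max_size - 1):
        grown: Set[FrozenSet[Hashable]] = set()
        for sub in levels[-1]:
            # Regions adjacent to the current super-region but not yet in it.
            frontier = set().union(*(adj[r] for r in sub)) - sub
            for r in frontier:
                # frozensets deduplicate sets reached along different growth orders.
                grown.add(sub | {r})
        levels.append(grown)
    return levels
```

For a 2×2 grid of four regions (each touching its two edge-neighbours), the counts per size are 4, 4, 4 and 1. On real images the sets grow quickly with k, which is why a plausibility ordering matters.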
8
Matching: Super-regions → Prototype Views
Objects are represented by a set of prototypical views (~10 per object)
– For each super-region S, calculate the probability that it is an instance of view V
– Determine the most probable labeling of the image into objects
Tolerant to:
– pose and illumination changes
– intra-category variation
– errors in the segmentation and association steps
9
Focus on Segmentation
Segmentation = Recognition (?!)
Need quantitative measures: a benchmark!
– MNIST for handwritten digits
– SPEC for CPUs
– WinStone for PCs
– TPC[-C] for transaction-processing DBs
Segmentation / Recognition on ISTORE
10
Step 1: Establish Ground Truth
We want high-level “Gold Standard” segmentations at a range of granularities
Segmentation tool in Java
– Explicit partition of image into pixel sets
– Ease of deployment
10 UCB grad students
– 60 images, 180 total segmentations
– Up to 5 segmentations/image by different people
– Goal: 1K images, 5K segmentations
11
Human Segmentations (1)
12
Human Segmentations (2)
13
Step 2: Similarity / Error Measures…
14
Segmentation Refinement
(Figure: example segmentations labeled A, B, C and D.)
15
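Local Refinement Error
How far is segmentation $S_1$ from being a refinement of segmentation $S_2$ at pixel $p_i$?
(Figure: $S_1$ as a refinement of $S_2$.)
$$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$
where $R(S, p)$ is the set of pixels in the region of segmentation $S$ that contains pixel $p$. $E$ is 0 when $R(S_1, p_i) \subseteq R(S_2, p_i)$, i.e. when $S_1$ refines $S_2$ at $p_i$, and grows toward 1 as more of the region of $S_1$ lies outside the region of $S_2$.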
16
Segmentation Error Measures
Global Consistency Error (GCE): refinement in the same direction at all pixels
$$\mathrm{GCE}(S_1, S_2) = \frac{1}{n} \min\left\{ \sum_i E(S_1, S_2, p_i),\ \sum_i E(S_2, S_1, p_i) \right\}$$
Local Consistency Error (LCE): refinement in either direction at each pixel
$$\mathrm{LCE}(S_1, S_2) = \frac{1}{n} \sum_i \min\left\{ E(S_1, S_2, p_i),\ E(S_2, S_1, p_i) \right\}$$
where $n$ is the number of pixels. Since a sum of minima never exceeds the minimum of the sums, LCE ≤ GCE.
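Both measures can be computed in one pass over the pixels. This is a minimal sketch, assuming each segmentation is given as a flat sequence of per-pixel region labels over the same pixels; the function names are hypothetical. Counting labels gives $|R(S_1, p_i)|$, counting label pairs gives $|R(S_1, p_i) \cap R(S_2, p_i)|$, and the set difference in $E$ is the first count minus the second.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def local_refinement_errors(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> List[float]:
    """E(S1, S2, p_i) for every pixel i, given per-pixel region labels."""
    if len(s1) != len(s2):
        raise ValueError("segmentations must label the same pixels")
    size1 = Counter(s1)           # |R(S1, p)|, keyed by the S1 label of p
    joint = Counter(zip(s1, s2))  # |R(S1, p) ∩ R(S2, p)|, keyed by p's label pair
    return [(size1[a] - joint[(a, b)]) / size1[a] for a, b in zip(s1, s2)]


def gce_lce(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (GCE, LCE) following the two formulas above."""
    n = len(s1)
    if n == 0:
        raise ValueError("empty segmentation")
    e12 = local_refinement_errors(s1, s2)
    e21 = local_refinement_errors(s2, s1)
    gce = min(sum(e12), sum(e21)) / n                    # one direction for all pixels
    lce = sum(min(a, b) for a, b in zip(e12, e21)) / n   # best direction per pixel
    return gce, lce
```

For example, `gce_lce([0, 0, 1, 1], [0, 0, 0, 0])` returns `(0.0, 0.0)` because the first segmentation refines the second. The same property means a segmentation with one region per pixel (or a single region) scores 0 against anything, so these measures are most informative when comparing segmentations of comparable granularity.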
17
Measure Results

| Measure | Same image | Different images |
|---|---|---|
| GCE (human vs. human) | 0.11 | 0.39 |
| LCE (human vs. human) | 0.07 | 0.30 |
| GCE (NCuts vs. human) | 0.28 | 0.38 |
18
NCuts Per-Image Error
(Figure: per-image error. Blue: human vs. human; red: NCuts vs. human.)
19
Future Work: Dataset
Obtain 1000-5000 segmentations
– Hire undergrads
– Vision groups at other schools
– More widespread deployment on the web?
Release dataset to community
Informal public forum for segmentation algorithm comparison
– e.g. MNIST
20
Future Work: Segmentation Algorithms
Cue combination is the key
– luminance, color, texture, motion, stereoscopic depth, familiar configuration
Feedback between segmentation and matching
Distributed computational framework for exploring these issues
– Millennium or ISTORE
21
Visual Recognition on ISTORE…
Sequential code:
– Segmentation: 5 minutes / image
– Association: negligible
– Matching: 0.5 sec / match
Memory requirements are very low:
– 10K object categories × 10 views/category × 100 × 100 pixels/view × 1 byte/pixel = 1 GB
Computation on a 10^4-node ISTORE
– Segmentation: 50% embarrassingly parallel (many convolutions), 50% sparse eigenvalue problem
– Frame-rate throughput, but not frame-rate latency
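The memory and segmentation figures can be checked with a few lines of arithmetic. This is a sketch under one simple reading of "frame-rate throughput, but not latency": it assumes whole images are farmed out one per node, so throughput scales with the node count while each image still takes the sequential 5 minutes. That assumption and the function names are hypothetical, not from the slides.

```python
from typing import Tuple


def prototype_memory_bytes(categories: int = 10_000, views_per_category: int = 10,
                           view_side: int = 100, bytes_per_pixel: int = 1) -> int:
    """Storage for all prototype views: 10^4 x 10 x (100 x 100) x 1 byte."""
    return categories * views_per_category * view_side * view_side * bytes_per_pixel


def segmentation_rate(nodes: int = 10**4,
                      seconds_per_image: float = 5 * 60) -> Tuple[float, float]:
    """Return (images/sec over the whole machine, latency in seconds per image).

    Assumes one image per node: throughput grows with the number of nodes,
    but the time to segment any single image stays at the sequential figure.
    """
    return nodes / seconds_per_image, seconds_per_image
```

`prototype_memory_bytes()` gives 10^9 bytes (1 GB), and `segmentation_rate()` gives about 33 images/sec, roughly video frame rate, with a latency that is still 300 seconds per image.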
22
Matching: embarrassingly parallel
– 1K candidate super-regions
– 20K matches/sec at full resolution
– Consider only 1% of matches at full resolution (10% pass a color/texture filter; 10% of those pass a low-resolution shape filter)
– If half the time is spent in pruning and half in full-resolution matching, we get 10K matches/sec
Worst case: 100 object categories
Best case depends on how well one can exploit context, hierarchy and hashing
Humans can recognize 10K-100K objects
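One reading of the matching arithmetic, written out as a sketch: 20K matches/sec corresponds to 10^4 nodes each doing one 0.5 s match at a time, and the 100-category worst case follows if every one of the 1K super-regions is compared against every view of every category within one second of compute (one image per second, exhaustive matching). Both of those are assumptions, not statements from the slides, and the function name and parameters are hypothetical.

```python
from typing import Tuple


def matching_budget(
    nodes: int = 10**4,
    seconds_per_match: float = 0.5,     # sequential full-resolution match time
    color_texture_pass: float = 0.1,    # fraction surviving the color/texture filter
    shape_pass: float = 0.1,            # fraction of those surviving the low-res shape filter
    full_res_time_share: float = 0.5,   # half the time matching, half pruning
    super_regions: int = 1_000,
    views_per_category: int = 10,
) -> Tuple[float, float, float]:
    """Return (full-res matches/sec, full-res matches/sec with pruning, categories/sec)."""
    full_res_rate = nodes / seconds_per_match                   # 10^4 / 0.5 = 20K/sec
    achieved_full_res = full_res_rate * full_res_time_share     # 10K/sec
    full_res_fraction = color_texture_pass * shape_pass         # 1% reach full resolution
    screened_per_sec = achieved_full_res / full_res_fraction    # about 1M candidates/sec
    matches_per_category = super_regions * views_per_category   # 10K per category per image
    categories_per_sec = screened_per_sec / matches_per_category  # about 100 categories
    return full_res_rate, achieved_full_res, categories_per_sec
```

With the default values this reproduces the slide's 20K, 10K and 100-category figures; tightening the filters (smaller pass fractions) is what would push the category count toward the 10K-100K range humans handle.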
23
ISTORE Applications Summary
Segmentation algorithm development
– Much compute, little storage
Real-time recognition of image/video stream content
– Plus storage for subsequent retrieval
Content-based indexing of all the images/video on the Internet!