Visual Grouping and Recognition. David Martin, UC Berkeley.


1 Visual Grouping and Recognition
David Martin, UC Berkeley

2 UCB Collaborators
Prof. Jitendra Malik
Prof. Dave Patterson
Charless Fowlkes
Doron Tal
See http://www.cs.berkeley.edu/~dmartin/papers/iccv01.pdf

3 From images to objects Labeled sets: tiger, grass etc

4 Recognition Object classes are hierarchical (e.g. leopard vs. clouded leopard vs. my pet clouded leopard Leopold) Must be tolerant to changes in pose and illumination

5 Framework for Recognition: Three stages
Segmentation: Images → Regions
Association: Regions → Super-regions
Matching: Super-regions → Prototype views

6 Segmentation: Images → Regions
Slight over-segmentation of objects is OK
Under-segmentation is BAD

7 Association: Regions → Super-Regions
Simple enumeration of connected components
Number of super-regions of size k in an image with n regions is approximately (4**k)*n/k
For typical images, number is 1K-10K
Plausibility ordering could reduce effective number substantially
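The counting argument on this slide can be illustrated with a small sketch. The closed-form estimate is the slide's own; the brute-force enumerator, the adjacency-dictionary representation, and the function names are hypothetical choices made for illustration, not the authors' implementation. On tiny graphs the enumerated count falls well below the estimate, which the slide presents only as an approximation.

```python
def estimate_super_regions(n, k):
    """Slide's approximation: about (4**k) * n / k super-regions of size k."""
    return (4 ** k) * n / k


def enumerate_super_regions(adjacency, k):
    """Enumerate connected sets of exactly k regions.

    `adjacency` maps each region id to the ids of its neighbouring regions
    (an assumed region adjacency graph). Sets are grown one neighbour at a
    time, so every returned set is connected.
    """
    frontier = {frozenset([r]) for r in adjacency}
    for _ in range(k - 1):
        grown = set()
        for group in frontier:
            for region in group:
                for neighbour in adjacency[region]:
                    if neighbour not in group:
                        grown.add(group | {neighbour})
        frontier = grown
    return frontier


# Four regions arranged in a 2x2 grid (each touches two others).
grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
pairs = enumerate_super_regions(grid, 2)
triples = enumerate_super_regions(grid, 3)
```

With the assumed values n = 100 and k = 3, `estimate_super_regions` gives roughly 2.1K, inside the 1K-10K range quoted on the slide.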

8 Matching: Super-regions → Prototype Views
Objects are represented by a set of prototypical views (~10 per object)
–For each super-region S, calculate probability that it is an instance of view V
–Determine most probable labeling of image into objects
Tolerant to:
–pose and illumination changes
–intra-category variation
–error in segmentation and association steps

9 Focus on Segmentation
Segmentation = Recognition (?!)
Need quantitative measures: A benchmark!
MNIST for handwritten digits
SPEC for CPUs
WinStone for PCs
TPC[-C] for transaction processing DBs
Segmentation / Recognition on ISTORE

10 Step 1: Establish Ground-truth
We want high-level “Gold Standard” segmentations.
Range of granularities
Segmentation tool in Java
–Explicit partition of image into pixel sets
–Ease of deployment
10 UCB grad students
–60 images, 180 total segmentations
–Up to 5 segmentations/image by different people
–Goal: 1K images, 5K segmentations

11 Human Segmentations (1)

12 Human Segmentations (2)

13 Step 2: Similarity / Error Measures…

14 Segmentation Refinement
[Figure: panels A, B, C, D]

15 Local Refinement Error
How much is segmentation S_1 a refinement of segmentation S_2 at pixel p_i?
$$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$
where R(S, p_i) denotes the region of segmentation S that contains pixel p_i.

16 Segmentation Error Measures
Global Consistency Error (GCE): Refinement in same direction at all pixels:
$$\mathrm{GCE} = \frac{1}{n} \min\left\{ \sum_i E(S_1, S_2, p_i),\ \sum_i E(S_2, S_1, p_i) \right\}$$
Local Consistency Error (LCE): Refinement in either direction at each pixel:
$$\mathrm{LCE} = \frac{1}{n} \sum_i \min\left\{ E(S_1, S_2, p_i),\ E(S_2, S_1, p_i) \right\}$$
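A minimal sketch of how these measures might be computed, assuming each segmentation is given as a flat list of region labels, one per pixel, in the same pixel order. That representation and the function names are assumptions made for illustration; the slides do not specify an implementation.

```python
from collections import Counter


def local_refinement_errors(s1, s2):
    """Return E(S1, S2, p_i) for every pixel i."""
    size1 = Counter(s1)             # |R(S1, p)| for each S1 label
    overlap = Counter(zip(s1, s2))  # |R(S1, p) ∩ R(S2, p)| per label pair
    # |R(S1, p) \ R(S2, p)| = |R(S1, p)| - |R(S1, p) ∩ R(S2, p)|
    return [(size1[a] - overlap[(a, b)]) / size1[a] for a, b in zip(s1, s2)]


def gce(s1, s2):
    """Global Consistency Error: one refinement direction for the whole image."""
    e12 = local_refinement_errors(s1, s2)
    e21 = local_refinement_errors(s2, s1)
    return min(sum(e12), sum(e21)) / len(s1)


def lce(s1, s2):
    """Local Consistency Error: refinement direction chosen per pixel."""
    e12 = local_refinement_errors(s1, s2)
    e21 = local_refinement_errors(s2, s1)
    return sum(min(a, b) for a, b in zip(e12, e21)) / len(s1)


# Four pixels: S1 is finer on the left, S2 is finer on the right.
s1 = [0, 1, 2, 2]
s2 = [0, 0, 1, 2]
print(gce(s1, s2))  # 0.25
print(lce(s1, s2))  # 0.0
```

In this toy case neither segmentation refines the other globally, so GCE is nonzero, while LCE is zero because at every pixel one of the two regions is contained in the other.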

17 Measure Results
                         SAME   DIFFERENT
GCE (human vs. human)    0.11   0.39
LCE (human vs. human)    0.07   0.30
GCE (NCuts vs. human)    0.28   0.38

18 NCuts Per-Image Error Blue: Human vs. Human Red: NCuts vs. Human

19 Future Work: Dataset
Obtain 1000-5000 segmentations
–Hire undergrads
–Vision groups at other schools
–More widespread deployment on web?
Release dataset to community
Informal public forum for segmentation algorithm comparison
–e.g. MNIST

20 Future Work: Segmentation Algorithms
Cue combination is the key
–luminance, color, texture, motion, stereoscopic depth, familiar configuration
Feedback between segmentation and matching
Distributed computational framework for exploring these issues
–Millennium or ISTORE

21 Visual Recognition on ISTORE…
Sequential code:
–Segmentation: 5 minutes / image
–Association: Negligible
–Matching: 0.5 sec / match
Memory requirements are very low:
–10K object categories * 10 views/category * 100 * 100 pixels/view * 1 byte/pixel = 1 GB
Computation on 10^4 node ISTORE
–Segmentation: 50% embarrassingly parallel (many convolutions), 50% sparse eigenvalue problem → Frame rate throughput, but not latency
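The arithmetic on this slide can be checked with a short sketch. The memory figure follows directly from the slide's numbers. The throughput estimate additionally assumes, purely for illustration, that each of the 10^4 nodes segments its own image independently; the slide does not say how the work is distributed, and the function names are hypothetical.

```python
def prototype_memory_bytes(categories=10_000, views_per_category=10,
                           pixels_per_view=100 * 100, bytes_per_pixel=1):
    """Storage for all prototype views, using the slide's figures."""
    return categories * views_per_category * pixels_per_view * bytes_per_pixel


def segmentation_throughput(nodes=10**4, seconds_per_image=5 * 60):
    """Images per second if every node segments a separate image (assumption)."""
    return nodes / seconds_per_image


print(prototype_memory_bytes())             # 1000000000 bytes, i.e. 1 GB
print(round(segmentation_throughput(), 1))  # 33.3 images/sec
```

Under that assumption the aggregate rate is in the range of video frame rates, while the latency for any single image remains the sequential 5 minutes, which matches the slide's "throughput, but not latency" remark.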

22 Matching: Embarrassingly parallel
–1K candidate super-regions
–20K matches/sec at full resolution
–Consider only 1% of matches at full resolution (10% pass color/texture filter, 10% of those pass low resolution shape filter)
–If half time spent in pruning and half in full resolution matching, we get 10K matches/sec
Worst case: 100 object categories
Best case depends on how well one can exploit context, hierarchy and hashing. Humans can recognize 10K-100K objects
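One way to reproduce the "100 object categories" figure from the numbers on these slides is sketched below. The processing rate of one image per second is an assumption introduced here to make the units work out; the slide does not state it, and the function and parameter names are hypothetical.

```python
def worst_case_categories(nodes=10**4, seconds_per_match=0.5,
                          color_texture_pass=0.10, shape_pass=0.10,
                          full_res_time_share=0.5, super_regions=1000,
                          views_per_category=10, images_per_second=1.0):
    """Categories that can be matched per image under the slide's figures."""
    raw_rate = nodes / seconds_per_match              # 20K full-res matches/sec
    full_res_rate = raw_rate * full_res_time_share    # 10K/sec after pruning cost
    surviving_fraction = color_texture_pass * shape_pass  # about 1% reach full res
    candidate_rate = full_res_rate / surviving_fraction   # candidate pairs/sec
    # images_per_second = 1.0 is an assumption, not a figure from the slide.
    pairs_per_image = candidate_rate / images_per_second
    views_per_image = pairs_per_image / super_regions
    return views_per_image / views_per_category


print(round(worst_case_categories()))  # 100
```

Exploiting context, hierarchy, or hashing would raise this figure by shrinking the set of candidate pairs that must be considered at all.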

23 ISTORE Applications Summary
Segmentation algorithm development
–Much compute, little storage
Real-time recognition of image/video stream content
–Plus storage for subsequent retrieval
Content-based indexing of all the images/video on the Internet!

