Wonwoo Lee, Youngmin Park, Vincent Lepetit, Woontack Woo IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011.


1 Wonwoo Lee, Youngmin Park, Vincent Lepetit, Woontack Woo IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011

2 Introduction Online Target Learning Detection and Tracking Experimental Results Conclusion

3 Objective: augment a real-world scene with minimal user intervention on a mobile phone ("anywhere augmentation"). Considerations: avoid reconstruction of the 3D scene; perspective patch recognition; mobile phone processing power; mobile phone accelerometers; mobile phone Bluetooth connectivity. http://www.youtube.com/watch?v=Hg20kmM8R1A

4 The proposed method follows a standard procedure of target learning and detection. Input Image → Online Learning → Real-time Detection

5 The proposed method follows a standard procedure of target learning and detection. Input Image → Online Learning → Real-time Detection

6 Input: Image of the target plane Output: Patch data and camera poses Assumptions Known camera parameters Horizontal or vertical surface

7 Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

8 We need a frontal view to create the patch data and their associated poses. Targets whose frontal views are available.

9 However, frontal views are not always available in the real world. Targets whose frontal views are NOT available.

10 Objective: generate a fronto-parallel view image from the input image. Approach: exploit the phone's built-in accelerometer. Assumption: the patch lies on a horizontal or vertical surface.

11 The orientation of a target (horizontal or vertical) is suggested based on the current pose of the phone: when the viewing direction is within π/4 of parallel to the ground, relative to the gravity vector G detected by the accelerometer, a vertical surface is suggested; otherwise a horizontal one.
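The H/V suggestion above can be sketched as a small test on the gravity vector. This is a minimal sketch, not the paper's code: the function name and the axis convention (camera optical axis along +z in the phone frame) are assumptions.

```python
import math

def suggest_orientation(gx, gy, gz):
    """Suggest the target orientation from the accelerometer's gravity
    vector. Assumed convention: the camera's optical axis is +z, so the
    gravity component along z is ~0 when the phone faces a wall and ~1
    (normalized) when it looks straight down at a table."""
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    c = abs(gz) / norm
    # Angle of the viewing direction above/below parallel-to-ground.
    angle_from_ground = math.asin(min(c, 1.0))
    # Within pi/4 of parallel-to-ground -> suggest a vertical surface.
    return "vertical" if angle_from_ground < math.pi / 4 else "horizontal"
```

For example, gravity along −y (phone held upright, facing a wall) yields "vertical", while gravity along +z (phone facing a tabletop) yields "horizontal".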

12 Under the one-degree-of-freedom assumption: frontal-view camera [I|0]; captured-view camera [R|t], with t = -Rc. A homography warps the image to the virtual frontal view [12]. [12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
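The frontal-view warp can be sketched with the standard plane-induced homography from [12], assuming the target plane is z = 0 in the frontal frame. The function names and the plain-list matrix representation are illustrative choices, not the paper's implementation.

```python
def mat3_mul(A, B):
    """3x3 matrix product on plain nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def homography_to_input(K, R, t):
    """Plane-induced homography for a target plane z = 0 in the frontal
    frame. With frontal camera [I|0] and captured camera [R|t], t = -Rc,
    a frontal-plane point (x, y, 1) maps into the input image through
    H = K [r1 r2 t]; warping the input image by H's inverse therefore
    yields the virtual frontal view."""
    # Columns: first two columns of R, then the translation t.
    r1r2t = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return mat3_mul(K, r1r2t)
```

With K = R = I and t = (0, 0, 1), the captured camera already sees the plane frontally at unit distance, and H reduces to the identity.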

13

14 Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

15 Objective: learn the appearances of a target surface quickly. Approach: adopt the patch-learning approach of Gepard [6], which achieves real-time learning of a patch on a desktop computer. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, Learning real-time perspective patch rectification, Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.

16 Gepard achieves fast patch learning by linearizing image warping with principal component analysis, and uses the mean patch as a patch descriptor. It is difficult to apply directly to the mobile phone platform: the mobile CPU's performance is low, and a large amount of pre-computed data is required (about 90 MB).

17 Changes from Gepard: remove the need for a fronto-parallel view by using the phone's accelerometer and limiting targets to two plane orientations; skip the feature-point detection step and instead use larger patches for robustness; replace how templates are constructed, by blurring instead; add Bluetooth sharing of the AR configuration.

18 Approach: use a blurred patch instead of the mean patch.

19 Generate blurred patches through multi-pass rendering on the GPU, exploiting the GPU's parallelism for faster image processing.

20 1st pass (warping): render the input patch from a given viewpoint. Much faster than on the CPU.

21 2nd pass (radial blur): apply radial blurring to the warped patch so that the blurred patch covers a range of poses close to the exact pose.
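The radial-blur pass can be understood as averaging the patch over small in-plane rotations. The paper performs this per-fragment in a GPU shader; the CPU sketch below, with nearest-neighbor sampling and hypothetical function names, only illustrates the idea (the 10-degree maximum range matches the parameters given later).

```python
import math

def rotate_nn(patch, angle):
    """Rotate a square grayscale patch (list of lists) about its center
    using nearest-neighbor sampling; out-of-bounds samples fall back to
    the unrotated pixel."""
    n = len(patch)
    c = (n - 1) / 2.0
    ca, sa = math.cos(angle), math.sin(angle)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse rotation: sample the source at the rotated position.
            sx = ca * (x - c) + sa * (y - c) + c
            sy = -sa * (x - c) + ca * (y - c) + c
            ix, iy = int(round(sx)), int(round(sy))
            out[y][x] = patch[iy][ix] if 0 <= ix < n and 0 <= iy < n else patch[y][x]
    return out

def radial_blur(patch, max_angle_deg=10.0, steps=7):
    """Average the patch over rotations in [-max_angle, +max_angle] so the
    result covers a range of poses close to the exact one."""
    n = len(patch)
    angles = [math.radians(max_angle_deg) * (2 * k / (steps - 1) - 1)
              for k in range(steps)]
    acc = [[0.0] * n for _ in range(n)]
    for a in angles:
        r = rotate_nn(patch, a)
        for y in range(n):
            for x in range(n):
                acc[y][x] += r[y][x]
    return [[v / steps for v in row] for row in acc]
```

A uniform patch is a fixed point of this operation, which is a quick sanity check on the averaging.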

22 3rd pass (Gaussian blur): apply Gaussian blurring to the radial-blurred patch to make it robust to image noise.

23 Fig. 7. Effectiveness of radial blur. Combining the radial blur and the Gaussian blur outperforms simple Gaussian blurring.

24 4th pass (accumulation): accumulate blurred patches in a texture unit to reduce the number of readbacks from GPU memory to CPU memory.

25 Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

26 Downsample the blurred patches from 128x128 to 32x32 and normalize them to zero mean and a standard deviation of 1.
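This post-processing step is straightforward to sketch on the CPU. The block-averaging downsampler and the function name below are assumptions; the sizes and the zero-mean, unit-standard-deviation normalization come from the slide.

```python
def downsample_and_normalize(patch, out_size=32):
    """Box-downsample a square grayscale patch (e.g. 128x128 -> 32x32),
    then normalize the result to zero mean and unit standard deviation."""
    n = len(patch)
    f = n // out_size  # block size; assumes n is a multiple of out_size
    small = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            block = [patch[by * f + dy][bx * f + dx]
                     for dy in range(f) for dx in range(f)]
            row.append(sum(block) / len(block))
        small.append(row)
    vals = [v for r in small for v in r]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    std = var ** 0.5 or 1.0  # guard against a constant patch
    return [[(v - mean) / std for v in r] for r in small]
```

Normalizing every learned patch this way is what lets detection later score candidates with a plain dot product.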

27 The user points the camera at the target; the square patch at the center of the image is used for detection.

28 The initial pose is retrieved by comparing the input patch with the learned mean patches. ESM-Blur [20] is applied for further pose refinement; NEON instructions are used to speed up the refinement. [20] Y. Park, V. Lepetit, and W. Woo, ESM-blur: Handling and rendering blur in 3D tracking and augmentation, in Proc. Int. Symp. Mixed Augment. Reality, 2009, pp. 163–166.
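The initial-pose lookup can be sketched as a nearest-neighbor search over the learned patches. Because every patch is normalized to zero mean and unit variance, normalized cross-correlation reduces to a dot product; the data layout (a dict from pose id to flattened patch) and the function name are illustrative assumptions.

```python
def best_matching_view(query, learned):
    """Return the pose id whose learned mean patch correlates best with
    the (already normalized, flattened) query patch. With zero-mean,
    unit-variance patches, NCC is just the dot product."""
    best_id, best_score = None, float("-inf")
    for pose_id, mean_patch in learned.items():
        score = sum(a * b for a, b in zip(query, mean_patch))
        if score > best_score:
            best_id, best_score = pose_id, score
    return best_id, best_score
```

The winning pose id then seeds ESM-Blur, which refines the pose from that coarse initialization.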

29 Patch size: 128x128. Number of views used for learning: 225. Maximum radial blur range: 10 degrees. Gaussian blur kernel: 11x11. Memory requirement: 900 KB per target.

30

31

32

33

34 Experimental setup:
          iPhone 3GS / 4     PC
CPU       600 MHz / 1 GHz    Intel Quad-Core 2.4 GHz
GPU       PowerVR SGX 535    GeForce 8800 GTX
Renderer  OpenGL ES 2.0      OpenGL 2.0
Video     480x360            640x480

35 More views means more rendering. Radial blur is slow on the mobile phone; speed improvement may be possible through shader optimization. (Chart: learning timings on PC, iPhone 3GS, and iPhone 4.)

36 Comparison with Gepard [6]. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, Learning real-time perspective patch rectification, Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011. Fig. 11. Planar targets used for evaluation. (a) Sign-1. (b) Sign-2. (c) Car. (d) Wall. (e) City. (f) Cafe. (g) Book. (h) Grass. (i) MacMini. (j) Board. The patches delimited by the yellow squares are used as reference patches.

37 Our approach performs slightly worse in terms of recognition rates, but it is better adapted to mobile phones.

38 The mean-patch comparison takes about 3 ms with 225 views. The speed of pose estimation and tracking with ESM-Blur depends on the accuracy of the initial pose provided by patch detection.

39 Limitations: weak against repetitive textures and reflective surfaces; currently supports only a single target.

40 Potential applications: AR tagging of the real world; AR apps anywhere, anytime. Future work: more optimization on mobile phones; detection of multiple targets at the same time.

41 http://www.youtube.com/watch?v=DLegclJVa0E

42

