Presentation on theme: "2D & 3D VIDEO PROCESSING FOR IMMERSIVE APPLICATIONS"— Presentation transcript:
1. 2D & 3D VIDEO PROCESSING FOR IMMERSIVE APPLICATIONS
  Emerging Convergence of Video, Vision & Graphics
  Harpreet S. Sawhney, Rakesh Kumar
2. ACKNOWLEDGEMENTS
  Collaborative work with: Hai Tao, Yanlin Guo, Steve Hsu, Supun Samarasekera, Keith Hanna, Aydin Arpa, Rick Wildes
3. TECHNICAL SUCCESS OF CONVERGENCE TECHNOLOGIES
  PC-based near real-time mosaicing
  Image-based modeling for entertainment
  Automated video enhancement: VHS-to-DVD
  Real-time video insertion
  Iris recognition, active vision
4. Immersive and Interactive Telepresence: Modes of Operation
  Observation Mode
    User observes a remote site from any perspective.
    User “walks” through the site to view activities of interest “up close”.
    Example: security, facility guards, sports & entertainment
  Conversation Mode
    Users talk and observe one another as if in the same room.
    Users walk around yet maintain eye contact.
    Example: immersive tele-conferencing
  Interaction Mode
    Remote users share a common work space.
    Users observe each other’s hands as they manipulate shared objects, such as war room wall displays.
    Example: mission planning, remote surgery
5. Quality of Service for Tele-presence: Critical Issues
  High quality for immersive experience
    Artifact-free recovery of 3D shape from video streams
    Efficient 3D video representation and compression
    High quality rendering of new views using 3D shape and video streams
    Bandwidth available in the Next Generation Internet
  Low latency for interactive applications
    Real-time 3D geometry recovery at the content server end
    Real-time new view rendering at the browser client end
    Adaptive stream management to handle user requests and network loads
    Error resilience and concealment to fill in missing packets
6. Convergence Technologies
  … for immersive & interactive visual applications ...
  Vision algorithms: high-quality 3D shape recovery and dynamic scene analysis
  ASICs, high performance hardware: real-time video processing
  Compact, low-cost cameras: CMOS cameras
  Low latency and high quality compression: error resilience
  Real-time view synthesis: standard platforms, e.g. PCs
  Immersive displays
7. Vision algorithm performance over time
  [Chart: algorithm complexity vs. time]
  1990: 2D stabilization; mosaicing for entertainment & surveillance
  1993: 2D video insertion; real-time insertion in live TV
  1995: Coarse 3D depth recovery; face finding for iris recognition
  1998: Video registration to 3D site models; geo-registration with visual databases
  2000: High quality 3D shape extraction; immersive telepresence
8. HW Performance/Size/Cost over time
  [Timeline: VFE-100 (1992), VFE-200 (1997), ACADIA ASIC (2000)]
  Sarnoff ACADIA ASIC performance
    100 MHz system clock, processes 100 million pixels/sec in each processing element
    10 billion operations/sec total IC performance
    800 MB/sec SDRAM interface using a 64-bit bus
    Enables building smart 3D cameras for immersive applications.
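The quoted SDRAM figure can be sanity-checked with simple arithmetic. The sketch below assumes (the slide does not say so) that the 64-bit bus moves one word per cycle of the 100 MHz system clock; under that assumption the numbers agree.

```python
# Back-of-the-envelope check of the quoted SDRAM bandwidth.
# Assumption (not stated on the slide): one 64-bit transfer per cycle
# of the 100 MHz system clock.
BUS_WIDTH_BITS = 64
CLOCK_HZ = 100_000_000

bytes_per_transfer = BUS_WIDTH_BITS // 8
bandwidth_mb_per_s = bytes_per_transfer * CLOCK_HZ / 1_000_000

print(bandwidth_mb_per_s)  # 800.0
```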
9. Application Performance
  Parametric Motion: Stabilization & Mosaicing
    720x Hz OR 720x Hz
  Pyramid based Fusion: Dynamic Range, Focus Enhancement
  Stereo Depth Extraction
    720x240 field, 32 disparity levels in 4 ms (250 Hz)
    720x240 field, 60 disparity levels in 10 ms (100 Hz)
    60 disparities on 1k x 1k images at 55 ms (18 Hz)
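The stereo timings above convert to frame rates and to an implied number of per-pixel disparity evaluations per second. This is a rough sketch: the helper names are hypothetical, "1k x 1k" is assumed to mean 1024x1024, and counting one evaluation per pixel per disparity level is a simplification rather than a description of the hardware.

```python
def rate_hz(frame_time_ms):
    """Convert a per-frame processing time in milliseconds to a rate in Hz."""
    return 1000.0 / frame_time_ms

def disparity_evals_per_second(width, height, levels, frame_time_ms):
    """Pixels x disparity levels x frames per second (simplified cost model)."""
    return width * height * levels * rate_hz(frame_time_ms)

# (width, height, disparity levels, time per frame in ms) as quoted above;
# 1024 x 1024 is an assumed reading of "1k x 1k".
configs = [(720, 240, 32, 4.0), (720, 240, 60, 10.0), (1024, 1024, 60, 55.0)]

for w, h, levels, ms in configs:
    print(w, h, levels, round(rate_hz(ms), 1),
          f"{disparity_evals_per_second(w, h, levels, ms):.3g}")
```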
10. Sarnoff Compression Technology
  … Required algorithm components for tele-presence are emerging ...
  [Chart: algorithm complexity vs. time]
  Pyramid & wavelet based encoding: still image compression
  MPEG2 encoding and transmission: DIREC-TV & HDTV
  VideoPhone, H.263: LG Electronics
  Just Noticeable Difference (JND), MPEG2 encoding and quality measurement: Tektronix
  Low latency MPEG2 multiplexing service: ICTV
  MPEG4, progressive encoding: E-vue (1999)
11. A FRAMEWORK FOR VIDEO PROCESSING
  ALIGN
    2D & 3D models of motion & structure
    Model-based image sequence alignment
  TEST
    Warp/render with 2D/3D models
    Test alignment quality
  SYNTHESIZE
    Create output representations
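The align / test / synthesize loop can be illustrated on a toy problem. The sketch below is not the presenters' implementation: it assumes the simplest possible motion model (an integer shift between two 1D "frames"), and the function names `align`, `alignment_ok` and `synthesize` are hypothetical.

```python
def align(ref, img, max_shift):
    """ALIGN: find the integer shift s (img[i] ~ ref[i + s]) with the lowest
    mean squared error over the overlapping samples."""
    best_shift, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref[i + s], img[i]) for i in range(len(img))
                 if 0 <= i + s < len(ref)]
        if len(pairs) < len(img) // 2:  # require a reasonable overlap
            continue
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift, best_err

def alignment_ok(err, tol=1e-6):
    """TEST: accept the alignment only if the residual is small enough."""
    return err <= tol

def synthesize(ref, img, shift):
    """SYNTHESIZE: paste both frames into one mosaic in ref's coordinates."""
    start = min(0, shift)
    end = max(len(ref), shift + len(img))
    mosaic = [None] * (end - start)
    for i, v in enumerate(ref):
        mosaic[i - start] = v
    for i, v in enumerate(img):
        if mosaic[i + shift - start] is None:  # keep ref where both overlap
            mosaic[i + shift - start] = v
    return mosaic

# Two overlapping views of the same 1D scene.
scene = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
frame_a, frame_b = scene[0:8], scene[4:12]

shift, err = align(frame_a, frame_b, max_shift=6)
if alignment_ok(err):
    mosaic = synthesize(frame_a, frame_b, shift)
    print(shift, mosaic == scene)  # 4 True
```

Real systems replace the shift with 2D parametric or 3D motion models, but the same three-stage structure applies.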
12. Highlights of Sarnoff’s Video Analysis Technologies
  … framework applied to create immersive representations ...
  Core Vision Algorithms for (Real-time) Motion & 3D Video Analysis
  2D Immersive & Layered Representations
    Spherical mosaics
    Dynamic & synopsis mosaics
  Model-centric Video Visualization
    Dynamic model & video visualization
    Geo-registration with reference image database
  Stereo & Video Sequence Enhancement
    Hi-Q IBR based mixed resolution synthesis
    Video quality enhancement for efficient compression
  Multi-camera Immersive Dynamic Rendering
    Hi-Q depth extraction
    Image-based rendering with dynamic depth
13. TOPOLOGY INFERENCE & LOCAL-TO-GLOBAL ALIGNMENT: SPHERICAL MOSAICS
  [Sawhney, Hsu, Kumar ECCV98; Szeliski, Shum SIGGRAPH98]
  Sarnoff Library Video
  Captures almost the complete sphere with 380 frames
19. ALIGNMENT & SYNTHESIS FOR HI-RES STEREO SYNTHESIS: A HIGH END APPLICATION OF IBMR
  [Sawhney, Guo, Hanna, Kumar, Zhou, Adkins SIGGRAPH2001]
  [Figure panels: Low-Res Left, Synthesized High-Res Left, Original High-Res Right]
20. THE PROBLEM SCENARIO
  [Figure labels: INPUT, OUTPUT; Left Eye, Right Eye (Typically 1.5K)]
21. 3D & Motion Alignment Based Stereo Sequence Processing
  [Figure: Left and Right sequences at times t-2 through t+3, with links labeled "flow" and "stereo"]
  Highlights:
    Scintillation effect is reduced.
    Occlusion regions are better handled.
23. IMPLICATIONS FOR IMMERSIVE IBMR: CAMERA CONFIGURATIONS
  [Figure labels: Lo-res camera, Hi-res camera]
  A multi-resolution camera configuration allows 3D capture at the highest resolution as well as a user-controlled large range of zooms, without the need for zoom control on the cameras.
24. Model-Centric Video Visualization OR Video-Centric Model Visualization
  [Hsu, Supun, Kumar, Sawhney CVPR00]
  [Figure panels: Original video; Site model; Geo-registration of video to site model; Re-projection of video after merging with model]
25. Video to Site Model Alignment
  Model to frame alignment
  REFINE
  Correspondence-less exterior orientation from 3D-2D line pairs
26. Oriented Energy Pyramid
  Goal: a representation which indicates edge strength in the image at various orientations and scales
  Orientation selectivity: reduce false matches
  Coarse-to-fine: increase capture range
  [Figure panels: 0°, 45°, 90°, 135°]
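To make the idea of an oriented energy pyramid concrete, here is a minimal sketch. It is a stand-in, not the filters used in the presented work: "energy" is approximated as the squared directional derivative along each of the four angles, and the pyramid is built by plain 2x2 averaging. All function names are hypothetical. With this convention an angle of 0° measures intensity change along x, so it responds most strongly to vertical edges.

```python
import math

def gradients(img):
    """Central-difference gradients (gx, gy) with clamped borders."""
    h, w = len(img), len(img[0])
    gx = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    gy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return gx, gy

def oriented_energy(img, theta_deg):
    """Squared directional derivative along direction theta (simplified)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    gx, gy = gradients(img)
    return [[(c * a + s * b) ** 2 for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def oriented_energy_pyramid(img, levels=3, angles=(0, 45, 90, 135)):
    """One dict {angle: energy image} per scale, finest level first."""
    pyramid = []
    for _ in range(levels):
        pyramid.append({a: oriented_energy(img, a) for a in angles})
        img = downsample(img)
    return pyramid

def total(energy):
    return sum(map(sum, energy))

# A vertical step edge: intensity changes only along x.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
pyr = oriented_energy_pyramid(image, levels=2)
for angle in (0, 45, 90, 135):
    print(angle, round(total(pyr[0][angle]), 6))
```

Matching on the coarse levels first tolerates larger misalignment; the separate orientation channels keep, for example, horizontal structure from matching vertical structure.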
27. Pose Refinement Algorithm
  … iterative coarse to fine adjustment of pose ...
  [Animation placeholder: gradual improvement of alignment during the coarse-to-fine iterations (regsite_animation.avi)]
28. Geo-Registration: Video to Reference Database Alignment
  [Wildes et al. ICCV01]
  [Figure panels: Current video, 3D reference imagery]
30. Dynamic 3D Capture & Rendering
  … global modeling is not feasible ...
  Recovering depth from local views
  Depth refinement across multiple local views
  New view synthesis using multiple local views
  Cross view depth checking
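The slide does not say how cross view depth checking is done. One common form of it is a left-right consistency check on disparity maps, sketched below for a single scanline. The convention that left pixel x matches right pixel x - d, the tolerance, and the function name `cross_check` are all assumptions made for illustration.

```python
def cross_check(disp_left, disp_right, tol=1):
    """Flag left-view disparities that the right view confirms.

    Assumed convention: left pixel x with disparity d corresponds to right
    pixel x - d. A disparity is kept only if that right pixel exists and
    its own disparity agrees within tol.
    """
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok = 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol
        valid.append(ok)
    return valid

disp_left = [2, 2, 2, 2, 4, 2]   # the 4 is an outlier
disp_right = [2, 2, 2, 2, 2, 2]

print(cross_check(disp_left, disp_right))
# [False, False, True, True, False, True]
```

Pixels that fail the check (out-of-view matches and the outlier here) are candidates for occlusion handling or for refinement using other local views.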
31. 3D Shape/Depth Estimation from Multiple Views of a Scene
  [Figure: Stereo pair]
  Estimation of high quality, artifact-free depth maps co-registered with video imagery for rendering new views.
  Must work both outdoors and indoors.