The Long March ( 長征 ) to 3D Video Leonardo Chiariglione Speech at 3D Systems and Applications Seoul – 2014/05/28.


1 The Long March ( 長征 ) to 3D Video Leonardo Chiariglione Speech at 3D Systems and Applications Seoul – 2014/05/28

2 It has already been not a short march. Analogue: Printing, Photography, Telegraphy, Telephony, Audio recording, Radio, Television, Video recording. Digital: Video conference, Video telephony, Video, Interactive Television (3D TV).

3 The dimensions of future media: Time/space resolution, Screen content, Colour, Brightness, Scalability, 3D Video, 3D Audio, Metadata, File format, Sensors/actuators, Human interaction, Fusion of real & virtual, Detection/analysis, Linking, Energy saving, User profile.

4 There has been progress in resolution: QSIF → SIF → Standard Definition (interlaced) → High Definition (interlaced/progressive) → 4K (progressive) → 8K (progressive).

5 The cost of being digital

Video:
                VHS     SD      HD      4K      8K
  #lines        288     576     1080    2160    4320
  #pixels/line  360     720     1920    3840    7680
  Frame freq.   25      25      25      50      50
  Mbit/s        41      166     829     6636    26542

Audio:
                Speech  CD stereo  5.1    22.2
  Sampling kHz  8       44.1       48     48
  bits/sample   8       16         16     16
  #channels     1       2          5.33   22.66
  Mbit/s        0.064   1.411      4.093  17.403
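The uncompressed bitrates in the table are plain arithmetic. A short sketch reproducing them, under two assumptions the slide does not state explicitly: 16 bits per pixel (e.g. 8-bit 4:2:2 sampling), and interlaced formats counted as 25 full frames per second.

```python
def video_mbit_s(lines, pixels_per_line, frames_per_s, bits_per_pixel=16):
    """Uncompressed video bitrate in Mbit/s."""
    return lines * pixels_per_line * frames_per_s * bits_per_pixel / 1e6

def audio_mbit_s(sampling_khz, bits_per_sample, channels):
    """Uncompressed audio bitrate in Mbit/s."""
    return sampling_khz * 1e3 * bits_per_sample * channels / 1e6

for name, (lines, pixels, fps) in {
    "VHS": (288, 360, 25), "SD": (576, 720, 25), "HD": (1080, 1920, 25),
    "4K": (2160, 3840, 50), "8K": (4320, 7680, 50),
}.items():
    print(f"{name}: {video_mbit_s(lines, pixels, fps):.0f} Mbit/s")

print(f"CD stereo: {audio_mbit_s(44.1, 16, 2):.3f} Mbit/s")    # 1.411
print(f"22.2:      {audio_mbit_s(48, 16, 22.66):.3f} Mbit/s")  # 17.403
```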

6 Compression is making progress affordable

         Base      Scalable  Stereo  Depth  Selectable viewpoint  yr
  MP1    ~VHS      -         -       -      -                     92
  MP2    2 Mbit/s  -10%      -15%    -      -                     94
  MP4-V  -25%      -10%      -15%    -      -                     98
  AVC    -30%      -25%      -25%    -20%   5/10%                 03
  HEVC   -60%      -25%      -25%    -20%   5/10%                 13
  ?      ?         ?         ?       ?      ?                     ?

7 Are there limits to compression? Input bandwidth to humans. Eyes: 2 channels of 430–790 THz. Ears: 2 channels of 20 Hz – 20 kHz. A nerve fiber connecting the senses to the brain can transmit a new impulse every ~6 ms ≈ 167 spikes/s (1 bit ≈ 16 spikes). Eye: 1.2 M fibers transmit ~10 bit/s each, so an eye sends ~12 Mbit/s to the brain. Ear: 30 k fibers in the cochlear nerve, so an ear sends ~300 kbit/s to the brain.
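The slide's back-of-envelope estimate can be reproduced directly; every figure below (impulse every ~6 ms, ~16 spikes per bit, fiber counts) comes from the slide itself.

```python
# One spike every ~6 ms, and ~16 spikes to convey one bit.
spikes_per_s = 1 / 0.006             # ~167 spikes/s per nerve fiber
bits_per_fiber = spikes_per_s / 16   # ~10 bit/s per fiber

eye_mbit_s = 1.2e6 * bits_per_fiber / 1e6  # 1.2 M optic-nerve fibers
ear_mbit_s = 30e3 * bits_per_fiber / 1e6   # 30 k cochlear-nerve fibers

print(f"eye: ~{eye_mbit_s:.0f} Mbit/s, ear: ~{ear_mbit_s * 1000:.0f} kbit/s")
```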

8 Sensors-to-brain bitrates. [diagram] Eyes: light at 430–790 THz, 1.2 M nerve fibers, ~12 Mbit/s to the brain. Ears: sound at 20 Hz – 20 kHz, 30 k nerve fibers, ~0.3 Mbit/s to the brain.

9 High Dynamic Range and Wider Color Gamut. Higher dynamic range and a wider color gamut can give users a better sense of “being there”, with a viewing experience closer to real life. Light bulb: > 10,000 nits. Surface lit by sunlight: > 100,000 nits. Night sky: < 0.005 nits. Question: if dynamic ranges and the volume of the color gamut increase significantly, can existing MPEG video coding standards efficiently support future needs?

10 Wider Color Gamut. [chromaticity diagram comparing the ITU-R BT.709 and ITU-R BT.2020 gamuts]
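One rough way to compare the two gamuts on the slide is the area of each RGB triangle in CIE 1931 (x, y) chromaticity space, computed with the shoelace formula. The primary coordinates below are the published values from the two ITU-R recommendations; the area ratio is only a sketch of the difference, since perceptual gamut comparisons are more involved.

```python
def triangle_area(primaries):
    """Shoelace area of an RGB primary triangle in (x, y) space."""
    (xr, yr), (xg, yg), (xb, yb) = primaries
    return abs(xr * (yg - yb) + xg * (yb - yr) + xb * (yr - yg)) / 2

BT709  = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
BT2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

ratio = triangle_area(BT2020) / triangle_area(BT709)
print(f"BT.2020 triangle is ~{ratio:.1f}x the area of BT.709")
```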

11 Dynamic Range: Examples. Bright areas can have > 10,000 cd/m² luminance; dark areas can have < 0.01 cd/m² luminance.
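The slide's two extremes can be expressed as a contrast ratio and in photographic "stops" (doublings of luminance), which is how display dynamic range is usually quoted:

```python
import math

bright = 10_000   # cd/m2, bright areas
dark = 0.01       # cd/m2, dark areas

contrast = bright / dark       # 1,000,000 : 1
stops = math.log2(contrast)    # ~20 stops of dynamic range
print(f"{contrast:.0e}:1, about {stops:.0f} stops")
```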

12 Screen Content applications: Wireless display, Companion screen, Control rooms with high-resolution display wall, Digital operating room (DiOR), Virtual desktop infrastructure (VDI), Screen/desktop sharing and collaboration, Cloud computing and gaming, Factory automation display, Supervisory control and data acquisition (SCADA) display, Automotive/navigation display, PC over IP (PCoIP), Ultra-thin client, Remote sensing.

13 Use case #1: Hi-res display wall

14 Use case #2: collaboration

15 Use case #3: DiOR

16 Where we are. January 2014: Joint Call for Proposals for Coding of Screen Content. April 2014: evaluation of proposals; conclusion: evidence that significantly improved coding efficiency can be obtained by exploiting screen-content characteristics with novel dedicated coding tools. April 2014: standardization plan and tentative timeline. First Test Model: Apr. 2014; PDAM: Oct. 2014; DAM: Feb. 2015; FDAM: Oct. 2015.
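A toy illustration of why screen content rewards dedicated tools: a block of rendered text or graphics typically contains only a handful of distinct colours, so it can be sent as a small palette plus per-pixel indices rather than transformed residuals. This sketch is illustrative only; it is not the actual palette-mode algorithm of the HEVC SCC extensions.

```python
def palette_encode(block):
    """Split a block into its distinct colours and per-pixel indices."""
    palette = sorted(set(block))
    index = {colour: i for i, colour in enumerate(palette)}
    return palette, [index[colour] for colour in block]

def palette_decode(palette, indices):
    """Rebuild the block from palette and indices (lossless here)."""
    return [palette[i] for i in indices]

block = [255, 255, 0, 0, 255, 0, 128, 128]   # text-like block: 3 colours
palette, indices = palette_encode(block)
assert palette_decode(palette, indices) == block
print(f"{len(block)} pixels -> {len(palette)} palette entries + indices")
```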

17 Test sequence #1 (text and graphics with motion)

18 Test sequence #2 (text and graphics with motion)

19 Test sequence #3 (mixed content)

20 Test sequence #4 (animation)

21 MPEG standards for coding multiple cameras. A long history, starting from MPEG-2 (mid 1990s). MPEG standards (existing and under development): Multiview coding – can only display views captured at the source; Depth-based coding – can also display a limited number of additional views. Camera arrangement: cameras are assumed to be linearly arranged.

22 Free viewpoint television (FTV)/1. Free viewpoint television (FTV): a hypothetical 3D transmission system that enables a viewer to select arbitrary viewpoints, inside and outside a scene. FTV requires many technologies, not just from MPEG. A 3D video format supporting the generation of views not already included in the bitstream generated by the encoder would be a major enabler for FTV. Purpose of the MPEG FTV exploration: to develop the know-how that enables MPEG to develop that 3D video format.

23 Free viewpoint television (FTV)/2. Areas considered in the MPEG FTV exploration: Compare and evaluate the depth quality attainable with general camera arrangements; Evaluate view-synthesis algorithms and improve their performance; Investigate the coding efficiency of the most promising coding technologies currently available; Investigate the influence of mis-registration on view-synthesis performance; Investigate the representation capability of BIFS to clarify the elements that need to be standardized.
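The view synthesis mentioned above is usually depth-image-based rendering: warping a reference view to a virtual camera by shifting each pixel by its disparity d = f · b / z (focal length f in pixels, baseline b, depth z). A minimal 1-D sketch, with illustrative names and none of the hole-filling a real system needs:

```python
def synthesize_view(row, depth, f, b):
    """Warp a 1-D pixel row to a virtual camera displaced by baseline b.
    A z-buffer keeps the nearest pixel when two land on the same spot;
    unfilled positions (None) are holes left for inpainting."""
    out = [None] * len(row)
    zbuf = [float("inf")] * len(row)
    for x, (colour, z) in enumerate(zip(row, depth)):
        xt = x + round(f * b / z)          # disparity shift in pixels
        if 0 <= xt < len(row) and z < zbuf[xt]:
            out[xt], zbuf[xt] = colour, z
    return out

row = ["a", "b", "c", "d"]
print(synthesize_view(row, [10, 10, 10, 10], f=10, b=1))
# a flat scene shifts rigidly: [None, 'a', 'b', 'c']
```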

24 FTV Seminar: A Viewing Revolution in the Making. Date: July 8, 2014, 14:00–18:00. Venue: Main Hall B, Sapporo Convention Center, Sapporo, Japan. Exhibition of FTV demos: Room 101, 10:00–17:00, July 1 to 4.

25 3D Audio – NHK Loudspeaker Array Frame

26 Parallel worlds. For centuries humans have been building two different types of worlds: the Physical world and the Informational world (books, music, films, knowledge).

27 Immersion. A definition of immersion: a state in which a human's connections with the Physical world are severed and connections with the Informational world are activated.

28 How far is immersion progressing? Fairly… or too far?

29 Can we reconnect the two worlds? Smartphones enable universal access to the Informational world while also sensing the Physical world, and enhance the history and meaning of the real world with powerful digital elements. Let's create two-way bridges: extend reality to the virtual; add reality to the virtual. Physical & Informational.

30 Functions of an Augmented Reality browser: Retrieve a scenario from the internet; Start video acquisition and track objects; Recognise objects and recover the camera pose; Get streamed 3D graphics and compose new scenes; Get input from various sensors; Access interaction possibilities and objects from a remote server; Adapt to offer an optimal AR experience.
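The functions above can be arranged as a control loop. The sketch below is heavily simplified and every name in it is hypothetical; ARAF itself defines scene-graph nodes, not this API.

```python
def run_ar_browser(frames, scenario):
    """Match scenario targets in each frame and compose augmented scenes."""
    scenes = []
    for frame in frames:                     # video acquisition / tracking
        found = [t for t in scenario["targets"] if t in frame]  # recognition
        if found:                            # compose the new scene
            scenes.append({
                "frame": frame,
                "overlays": [scenario["overlays"][t] for t in found],
            })
    return scenes

scenario = {"targets": {"poster"},           # retrieved from the internet
            "overlays": {"poster": "3D model"}}
frames = [{"sky"}, {"poster", "street"}]     # toy "camera" input
print(run_ar_browser(frames, scenario))
```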

31 The AR technology chain. [diagram] An ARAF (Augmented Reality Application Format) browser, built with Authoring Tools, connects the User to Media Servers, Service Servers, and Local and Remote Sensors & Actuators, bridging the Local and Remote Real-World Environments via MPEG ARAF.

32 Augmented Reality Application Format (ARAF). A set of MPEG-4 scene-graph nodes: audio, image, video, graphics, programming, communication, user interactivity, animation; Map, MapMarker, Overlay, ReferenceSignal, ReferenceSignalLocation, CameraCalibration, AugmentedRegion. Connection to sensors defined in MPEG-V: orientation, position, angular velocity, acceleration, GPS, geomagnetic, altitude, local camera(s). Compressed media: image, (3D) sound, (3D) video, 2D/3D graphics.

33 The whole used to be the message. Classic books: the value is in the content as a whole.

34 Today the link adds value to the message. On-line knowledge: the value is in the link.

35 The video used to be the message. Classic video content: the value is in the content as a whole.

36 Next the link will add value to the video message. New video content: the value is in the link. From the EU FP7 BRIDGET project.

37 An unequal fight. Many new services, all more demanding in bandwidth. Compression improves, but cannot cope with all the demands by itself: UHD has 4 times the uncompressed bitrate of HD, but HEVC “only” compresses about twice as well as AVC. And we also have HDR, WCG, SCC, FTV… At prime time, 30% of USA internet traffic is taken by Netflix. We need more tools to solve the problem.
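The slide's arithmetic made explicit: moving from HD/AVC to UHD/HEVC quadruples the raw pixel rate but only halves the bits needed per pixel, so the delivered stream still roughly doubles.

```python
hd_avc_bitrate = 1.0      # normalised HD-over-AVC bitrate
resolution_factor = 4     # UHD has 4x the pixels of HD
hevc_gain = 2             # HEVC ~2x more efficient than AVC

uhd_hevc_bitrate = hd_avc_bitrate * resolution_factor / hevc_gain
print(uhd_hevc_bitrate)   # 2.0: still twice the bandwidth of HD/AVC
```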

38 The mobile industry perspective: 10× more spectrum × 10× better spectrum utilisation × 10× more base stations = 1000× more capacity.

39 Making the network smarter. Video has the lion's share of internet traffic, and more so as we add more dimensions to the user experience. We need to cope with (human and vehicle) mobility: more and more of human life happens on the move. We need new, smarter approaches instead of just throwing more network capacity at the problem, beyond: Digital video recording (on premises or networked); Peer-to-peer (P2P) overlays; Content Distribution Networks (CDNs).

40 Video and Information Centric Networking. [diagram: an Information Centric Network layered over today's IP Network] The same content is available at different network locations. A migration path leads from today's IP infrastructure to pub/sub support for ICN. Client, content, and network mobility are handled under energy-consumption constraints. From the FP7/NICT EU-JAPAN GreenICN project.
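The key ICN idea, that the same content is available at different network locations, can be sketched as a toy name-based lookup: clients request content by name, not by host, and the network answers from the nearest copy. Cache names and distances below are invented for illustration.

```python
caches = {
    "edge-seoul": {"/video/talk.mp4"},
    "edge-tokyo": {"/video/talk.mp4", "/video/other.mp4"},
    "origin":     {"/video/talk.mp4", "/video/other.mp4", "/video/rare.mp4"},
}
hops = {"edge-seoul": 1, "edge-tokyo": 3, "origin": 10}

def fetch(name):
    """Return the nearest node holding the named content."""
    holders = [node for node, contents in caches.items() if name in contents]
    return min(holders, key=hops.__getitem__)

print(fetch("/video/talk.mp4"))   # popular content is served from the edge
print(fetch("/video/rare.mp4"))   # rare content falls back to the origin
```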

41 Green MPEG. [architecture diagram] On the sending side, a Media Pre-processor and Media Encoder produce the encoded media while a Green-Metadata Generator produces Green Metadata; on the receiving side, a Green-Metadata Extractor feeds a Power-optimization module that applies power control to the Media Decoder and the Presentation Subsystem, and Green Feedback flows back to the sender.
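A hedged sketch of how such green metadata can save power at the presentation subsystem: the sender includes per-frame statistics (here, peak brightness) so the display can lower its backlight on dark frames without visible loss. The field name "peak" and the linear scaling are illustrative, not taken from the standard.

```python
def backlight_level(frame_peak, display_max=255):
    """Scale the backlight to what the frame actually needs (0.0-1.0)."""
    return frame_peak / display_max

# Per-frame green metadata accompanying the encoded media.
frames = [{"peak": 128}, {"peak": 255}, {"peak": 64}]
levels = [backlight_level(f["peak"]) for f in frames]
print(levels)   # dimmer backlight on dark frames saves display power
```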

