1 Emerging Technologies in Multimedia Communications
Hsueh-Ming Hang (杭學鳴), Dean of EECS College, Taipei Univ. of Technology (台北科技大學)

2 Contents
-- Audio and video standards
-- Video standards evolution
-- Emerging techniques in video coding
-- Audio standards evolution
Feb 2009, hmhang/EECS, NTUT

3 Video Coding Standards
Standard | Typical rates | Applications
ITU-T (CCITT) H.261 | 128–384k bits/s | Videophone over ISDN
ISO MPEG-1 (11172-2) | 1.2 Mbits/s | Video CD
ISO MPEG-2 (13818-2) (ITU-T H.262) | 4–10 Mbits/s, 20 Mbits/s | Digital TV/HDTV over air/networks
ITU-T H.263 | < 64k bits/s | Videophone
ISO MPEG-4 (14496-2) | Low/high rates | Object-oriented
ISO MPEG-7 (15938) | Database | Content description
ITU-T H.263 v2 | < 64k bits/s | PSTN/wireless videophone
ITU-T H.264 (JVT, AVC) | < 40k bits/s | Net/wireless videophone
ITU-T H.264 ext (SVC) | Multi-layer | Net/wireless streaming
ISDN: Integrated Services Digital Network

4 MPEG Audio Standards
-- MPEG-1 Layer 1: 1992 (good: 256k / 2 ch), 1-2 chs
-- MPEG-1 Layer 2: 1992 (good: 192k / 2 ch), 1-2 chs
-- MPEG-1 Layer 3: 1993 (MP3) (good: 128k / 2 ch), 1-2 chs
-- MPEG-2 Layers 1, 2, 3: 1994, 1-5.1 chs
-- MPEG-2 AAC: 1997 (Advanced Audio Coding) (good: 96k / 2 ch), 1-96 chs
-- MPEG-4 (v1) subpart 3 General Audio Coding, AAC: 1999 (new tools: PNS, LTP, TwinVQ), 1-96 chs
-- MPEG-4 Amd 1 (2003): Bandwidth extension (SBR, Spectral Band Replication), HE-AAC, AAC+ (good: 48k)
-- MPEG-4 Amd 2 (2004): Parametric Audio extension → MPEG Surround (MPEG-D, 2006) (good: 24k?)

5 Image/Video Standards
ISO/IEC JTC1 SC29: ISO and IEC Joint Technical Committee (on Information Technology), Subcommittee 29 (Coding of audio, picture, multimedia and hypermedia)
-- Working Group (WG) 1:
   JBIG (Joint Bi-level Image Group): 1-bit to 4/5-bit still pictures
   JPEG (Joint Photographic Experts Group): 8-bit or more still pictures
-- WG 11: MPEG (Moving Picture Experts Group): motion pictures
-- WG 12: MHEG (Multimedia-Hypermedia Experts Group): multi/hyper-media exchange format

6 Standards Organizations
-- CCITT: Comité Consultatif International Télégraphique et Téléphonique (International Telegraph and Telephone Consultative Committee)
-- ITU: International Telecommunication Union
-- ISO: International Organization for Standardization
-- IEC: International Electrotechnical Commission

7 MPEG Committee
Convener: Leonardo Chiariglione
Standards:
-- MPEG-1: done
-- MPEG-2: done
-- MPEG-4: done?!
-- MPEG-7: done?!
-- MPEG-21: done?
-- MPEG A, B, C, D, E: on-going

8 MPEG Chair Dr. Chiariglione at NCTU (2003.12)

9 MPEG-A, B, C
MPEG-A (ISO/IEC 23000) Multimedia Application Formats
-- Part 1: Purpose for Multimedia Application Formats
-- Part 2: Music Player Application Format
-- Part 3: Photo Player Application Format
-- … Part 12
MPEG-B (ISO/IEC 23001) MPEG Systems Technologies
-- Part 1: Binary MPEG format for XML
-- … Part 5
MPEG-C (ISO/IEC 23002) MPEG Video Technologies
-- Part 1: Accuracy specification for implementation of integer-output IDCT
-- Part 2: Fixed point implementation of DCT/IDCT
-- Part 3: Auxiliary Video Data Representation
-- Part 4: Video Tool Library

10 MPEG-D, E
MPEG-D (ISO/IEC 23003) MPEG Audio Technologies
-- Part 1: MPEG Surround
-- Part 2: Spatial Audio Object Coding
-- Part 3: Unified Speech and Audio Coding
MPEG-E (ISO/IEC 23004) MPEG Multimedia Middleware
-- Part 1: Architecture
-- Part 2: Multimedia API
-- Part 3: Component Model
-- Part 4: Resource and Quality Management
-- Part 5: Component Download
-- Part 6: Fault Management
-- Part 7: System Integrity Management
-- Part 8: Reference Software and Conformance

11 How I Got Involved?
-- 1984: Joined AT&T Bell Labs, Visual Comm. Dept. → H.261 video standard started
-- 1988.1: MPEG started
-- 1991.12: I joined NCTU → discontinued standard activities
-- 1999.9: NCTU formed a small group to participate in the MPEG activities

12 NCTU MPEG Activity
-- Tihao Chiang (蔣迪豪), C.J. Tsai (蔡淳仁), Wen Peng (彭文孝) and H.-M. Hang (杭學鳴) regularly attend MPEG meetings
-- Tihao Chiang: Co-editor, MPEG-4 Part 7 Optimised Reference Software (done)
-- C.J. Tsai: Co-editor, MPEG-21 Part 12 Multimedia Test Bed for Resource Delivery (done)
-- 107 contributions (input and output documents) in the past 5 years (2002–2007) [Dr. Y.-S. Tung, NTU]
-- Example: Call for Proposal on Scalable Video Coding (Feb. 2004), 2 out of 14 proposals

13 Image & Video Compression: JPEG → AVC (H.264)

14 Progress of Image/Video Coding
-- H.261 (CCITT/ITU; 1984, 88, 90): video (videoconf.)
-- JPEG (1986, 89, 92): image (digital camera)
-- MPEG-1 (1988–92): video (VCD)
-- MPEG-2 (1990–94): video (DVD, DTV)
-- MPEG-4 part 2 (1992–99): video (Internet, WL)
-- H.263 (1993–95; ver. 3: 2000): video (WL)
-- JPEG2000 (1996–2001): image
-- H.264 (MPEG-4 part 10) AVC (1998–03): video (WL, HD-DVD)
-- AVC Amd. 1 (2003–2008): scalable video coding

15 Scalable Bitstream
Progressive approximation
Bitstream layout: GOP Header | Motion Info. | Image Data
-- 300 kbps: PSNR = 32.2 dB
-- 500 kbps: PSNR = 34.6 dB
-- 1000 kbps: PSNR = 38.2 dB
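The PSNR figures quoted for each truncation point follow the usual definition, 10·log10(peak² / MSE). A minimal sketch of that computation is given here, assuming 8-bit samples (peak value 255) and frames flattened into plain lists; the function name `psnr` is only illustrative and is not taken from the slides.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(decoded):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the original and the decoded frame
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

Decoding more of a scalable bitstream lowers the MSE, so the PSNR rises with the bit rate, as in the three operating points on the slide.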

16 Spatial/SNR Scalability
-- 176x144, 256 kb/s
-- 352x288, 750 kb/s
-- 704x576, 1.5 Mb/s
-- 704x576, 6 Mb/s

17 Scalable Video Coding
Why scalable video coding?
-- Reliably deliver video to diverse clients over heterogeneous networks using available system resources
Types of Scalability
-- SNR scalability (quality)
-- Spatial scalability (frame resolution)
-- Temporal scalability (frame rate)
-- Combined scalability
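To illustrate delivery to diverse clients, the sketch below picks an operating point for a client from the four resolution/rate pairs listed on the Spatial/SNR Scalability slide. It assumes those rates are cumulative (each one is the total rate needed to decode that point); the table name `LAYERS` and the function `select_layer` are hypothetical and do not correspond to any SVC tool.

```python
# (width, height, kbit/s) operating points from the Spatial/SNR Scalability slide,
# assumed here to be cumulative bit rates, ordered from lowest to highest.
LAYERS = [(176, 144, 256), (352, 288, 750), (704, 576, 1500), (704, 576, 6000)]

def select_layer(bandwidth_kbps, max_width, max_height):
    """Return the highest operating point that fits the client, or None."""
    best = None
    for width, height, rate in LAYERS:
        fits_network = rate <= bandwidth_kbps
        fits_display = width <= max_width and height <= max_height
        if fits_network and fits_display:
            best = (width, height, rate)
    return best
```

A handheld client with a 352x288 display would be served the 750 kb/s point even on a fast link, while a client on a 1 Mb/s link gets the same point regardless of its display size.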

18 MPEG SVC Activity
-- 2003.10: Call-for-Proposal (CfP)
-- 2004.2: Proposals received (14 submitted) (M10737); NCTU submitted two proposals
-- 2004.3: Evaluations: two categories (M10480)
   Category 1: MCTF+Wavelet (10)
   Category 2: AVC based (incl. AVC/MCTF) (4)
-- 2004.3/7/10: Proposals and refinements evaluated
-- 2005.1: AVC-based SVC became Amd 1 of MPEG-4 Part 10
-- 2008: Standard

19 H.264/SVC Encoder

20 Hierarchical B Pictures
-- Lower temporal layers are generated first
-- Use reconstructed frames for prediction
[Figure: a group of pictures (GOP) in display order, with key pictures I0/P0 and B pictures at temporal levels B1, B2 and B3]
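The rule that lower temporal layers are coded first can be sketched as follows. This is a minimal illustration, assuming a dyadic GOP whose size is a power of two; the function names `temporal_layer` and `coding_order` are hypothetical and are not taken from the SVC reference software.

```python
def temporal_layer(index, gop_size):
    """Temporal layer of the frame at display position `index` within a GOP."""
    if index % gop_size == 0:
        return 0  # key picture (I0/P0)
    layer = 1
    step = gop_size // 2
    # Halve the step until the frame falls on the sampling grid of a layer
    while index % step != 0:
        step //= 2
        layer += 1
    return layer

def coding_order(gop_size):
    """Display indices 1..gop_size, sorted so lower temporal layers come first."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("this sketch assumes a power-of-two GOP size")
    return sorted(range(1, gop_size + 1),
                  key=lambda i: (temporal_layer(i, gop_size), i))
```

For a GOP of 8, the key picture at position 8 is coded first, then the B1 picture at position 4, the B2 pictures at 2 and 6, and finally the B3 pictures at the odd positions. Dropping the highest layer halves the frame rate, which is how this structure provides temporal scalability.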

21 Spatial Scalability
-- Same concepts in MPEG-2/4 and H.263
-- Each spatial layer is coded with texture/motion refinement
[Figure labels: scaling, coding, prediction, scalable stream]

22 Fast Algorithm: Intra Prediction
-- Base layer is coded with good quality (small Qp) → Enh. layer: IntraBL dominates
-- Base layer is coded with poor quality (large Qp) → Enh. layer: Intra4x4/IntraBL → Intra4x4
H.-C. Lin, W.-H. Peng, and H.-M. Hang, IEEE ICIP07; IEEE ICME08.

23 Examples of Research Topics: Interframe Wavelet, Contourlet Coding

24 Interframe Wavelet
-- Algorithm proposed and improved by Profs. Jens Ohm (Aachen U.) and John Woods (RPI)
-- Motion compensated temporal filtering (MCTF) + wavelet zero-tree coding
-- Key advantage: “full” scalability (temporal + spatial + SNR)
-- Disadvantage: long delay (storage)
-- High bit rates: good (~ advanced video coding); low rates: needs improvement
-- Many variations now

25 MCTF + Wavelet
MCTF: Motion Compensated Temporal Filtering
[Figure: codec block diagram]
-- Encoder (Input Video): Motion Estimation, MCTF (analysis), Spatial Analysis, Entropy Coding, Motion Info. Encoding, Packetizer
-- Decoder (Output Video): Depacketizer, Entropy Decoding, Motion Info. Decoding, Spatial Synthesis, MCTF (synthesis)

26 MCTF
MCTF = Motion Compensated Temporal Filtering
[Figure: temporal filtering across frames 1 to 5]

27 Temporal Subband Decomposition
[Figure: MCTF applied to a video sequence; one GOP (Group of Pictures) corresponds to a temporal level = 4 decomposition]
Legend:
-- Temporal low-pass frame
-- Temporal high-pass frame
-- Frames that remain after temporal decomposition
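The structure of the temporal decomposition can be sketched with a Haar filter in lifting form. This is a simplified illustration that assumes no motion compensation (each pixel is filtered straight along the time axis) and frames stored as flat lists; actual MCTF aligns pixels along motion trajectories and may use longer filters. The function names are hypothetical.

```python
def haar_temporal_step(frames):
    """One temporal level: pair up frames and return (low_pass, high_pass)."""
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        high = [y - x for x, y in zip(a, b)]        # predict step: b minus a
        low = [x + h / 2 for x, h in zip(a, high)]  # update step: (a + b) / 2
        highs.append(high)
        lows.append(low)
    return lows, highs

def temporal_decompose(frames, levels):
    """Repeat the step on the low-pass frames for the given number of levels."""
    if len(frames) % (2 ** levels) != 0:
        raise ValueError("GOP length must be a multiple of 2**levels")
    highs_per_level = []
    lows = frames
    for _ in range(levels):
        lows, highs = haar_temporal_step(lows)
        highs_per_level.append(highs)
    return lows, highs_per_level

def temporal_reconstruct(lows, highs_per_level):
    """Invert temporal_decompose by undoing the lifting steps level by level."""
    for highs in reversed(highs_per_level):
        frames = []
        for low, high in zip(lows, highs):
            a = [x - h / 2 for x, h in zip(low, high)]
            b = [x + h for x, h in zip(a, high)]
            frames.extend([a, b])
        lows = frames
    return lows
```

With a GOP of 16 frames and 4 levels, the decomposition leaves one temporal low-pass frame and 8 + 4 + 2 + 1 = 15 high-pass frames. Discarding the high-pass frames of the first levels yields a lower frame rate, which is the temporal scalability of this scheme.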

28 Hierarchical Variable Size Block Matching
[Figure: reference and predicted frames are down-sampled by 2; motion is searched at the coarse level and refined at finer levels; block sizes range from 64 x 64 to 4 x 4]
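The search-then-refine idea can be sketched for a single block as below. This is a two-level illustration only: it assumes images stored as lists of rows with even dimensions, even block position and size, a sum-of-absolute-differences (SAD) cost, and a fixed block size, so it does not show the variable block size splitting from 64 x 64 down to 4 x 4. All function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w - 1, 2)] for y in range(0, h - 1, 2)]

def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def search(cur, ref, cy, cx, size, center, radius):
    """Best (dy, dx) within `radius` of `center`, keeping the block inside ref."""
    h, w = len(ref), len(ref[0])
    best, best_cost = None, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(cur, ref, cy, cx, ry, rx, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best

def hierarchical_motion(cur, ref, cy, cx, size, radius=2):
    """Coarse search at half resolution, then a +/-1 refinement at full resolution."""
    coarse = search(downsample(cur), downsample(ref),
                    cy // 2, cx // 2, size // 2, (0, 0), radius)
    # A coarse vector covers twice the distance at full resolution
    return search(cur, ref, cy, cx, size, (2 * coarse[0], 2 * coarse[1]), 1)
```

The coarse search covers a range of ±2·radius full-resolution pixels at roughly a quarter of the cost per candidate, and the refinement then checks only nine candidates around the scaled vector.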

29 Spatial Scalability
-- Wavelet decomposition provides spatial scalability (~ JPEG 2000)
-- Rate-control!
-- Bit-plane coder
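Why a wavelet decomposition gives spatial scalability can be seen from one level of a 2D Haar-style transform: the low-pass band is already a half-resolution version of the image. The sketch below assumes the Haar filter for brevity (JPEG 2000 uses longer 5/3 and 9/7 filters) and images with even dimensions stored as lists of rows; the function names are hypothetical.

```python
def haar_2d(img):
    """One decomposition level: returns (ll, horiz, vert, diag) subbands."""
    ll, horiz, vert, diag = [], [], [], []
    for y in range(0, len(img), 2):
        row_ll, row_h, row_v, row_d = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            row_ll.append((a + b + c + d) / 4)  # local average (half-resolution image)
            row_h.append((a - b + c - d) / 4)   # detail across columns
            row_v.append((a + b - c - d) / 4)   # detail across rows
            row_d.append((a - b - c + d) / 4)   # diagonal detail
        ll.append(row_ll); horiz.append(row_h); vert.append(row_v); diag.append(row_d)
    return ll, horiz, vert, diag

def inverse_haar_2d(ll, horiz, vert, diag):
    """Rebuild the full-resolution image from the four subbands."""
    img = []
    for r in range(len(ll)):
        top, bottom = [], []
        for k in range(len(ll[0])):
            s, h, v, d = ll[r][k], horiz[r][k], vert[r][k], diag[r][k]
            top.extend([s + h + v + d, s - h + v - d])
            bottom.extend([s + h - v - d, s - h - v + d])
        img.extend([top, bottom])
    return img
```

A low-resolution client decodes only `ll`; a full-resolution client also decodes the three detail bands and applies the inverse transform. Applying `haar_2d` again to `ll` gives the next lower resolution.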

30 R-D Optimization in Interframe Wavelet Video
Wavelet coding structure
[Figure: open-loop coder with blocks Predictor, Motion Coder, Quantizer, Entropy coding and R-D; output: bits]
-- No feedback path!
-- Multiple R-D operation points!
-- Block-based, subband-based, inter-scaled hybrid coding!
C.-Y. Tsai and H.-M. Hang, “rho-GGD source modeling for wavelet coefficients in image/video coding,” in IEEE ICME, 2008

31 Contourlet Transform
Inefficiency of separable transform

32 Contourlet Representation
Image decomposition using Directional Filter Bank (DFB)

33 An Example of DFB
Decomposed by a DFB with 4 levels that leads to 16 subbands
[Figure: original image; image after DFB with 4 levels]

34 DFB-Based Coding
One example of mixed 2D wavelet decomposition
C.-H. Hung and H.-M. Hang, “Image Coding Using Short Wavelet-based Contourlet Transform,” IEEE ICIP, 2008

35 Audio Compression: MP3 → MPEG Surround

36 MPEG Audio Standards
-- MPEG-1 Layer 1: 1992 (good: 256k / 2 ch), 1-2 chs, 32k – 448k bits
-- MPEG-1 Layer 2: 1992 (good: 192k / 2 ch), 1-2 chs, 32k – 384k bits
-- MPEG-1 Layer 3: 1993 (MP3) (good: 128k / 2 ch), 1-2 chs, 32k – 320k bits
-- MPEG-2 Layers 1, 2, 3: 1994, 1-5.1 chs
-- MPEG-2 AAC: 1997 (Advanced Audio Coding) (good: 96k / 2 ch), 1-96 chs, 8-64k/ch
-- MPEG-4 (v1) subpart 3 General Audio Coding, AAC: 1999 (new tools: PNS, LTP, TwinVQ), 1-96 chs, 8-64k/ch

37 MPEG Audio Standards (2)
-- MPEG-4 (v1) subpart 2: Code-Excited Linear Prediction (CELP); speech, 3.8k – 23.8k bits
-- MPEG-4 (v1) subpart 2: Harmonic Vector eXcitation Coding (HVXC); speech, 2k – 4k bits
-- MPEG-4 (v1) subpart 4: Structured Audio; synthesized audio, 0k – 3k bits
-- MPEG-4 (v2) Parametric Audio Coding: HILN (Harmonic, Individual Line plus Noise); 6k – 16k bits/ch
-- MPEG-4 (v2) Fine Granule Audio: BSAC (Bit Sliced Arithmetic Coding); fine scale granularity: 1k

38 MPEG Audio Standards (3)
-- MPEG-4 Amd 1 (2003): Bandwidth extension (SBR, Spectral Band Replication), HE-AAC, AAC+ (48k)
-- MPEG-4 Amd 2 (2004): Parametric Audio extension (audio)
-- MPEG-4 Amd 4: Audio lossless coding (ALS) (lossless)
-- MPEG-4 Amd 5: Scalable to lossless audio coding (SLS) (scalable coding)
-- MPEG-4 Amd 6: Lossless coding of 1 bit oversampled audio signals
-- ISO/IEC 23003-1 (MPEG-D): MPEG Surround (FDIS 2006.7) (spatial audio coding)

39 Spectral Band Replication: SBR
Typical audio signal spectrum

40 SBR (2)
The high frequencies are reconstructed and adjusted

41 Spatial Hearing
Three parameters describing how humans locate a sound source in the horizontal plane:
-- Interaural Level Difference (ILD)
-- Interaural Time Difference (ITD)
-- Interaural Coherence (IC)
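The three cues can be estimated from a pair of ear (or channel) signals as sketched below. This is a broadband illustration under stated assumptions: ILD as an energy ratio in dB, ITD as the lag that maximizes the cross-correlation, and IC as the normalized correlation at that lag. Parametric coders typically estimate such cues per time/frequency band rather than over the whole signal. The function name `spatial_cues` and its parameters are hypothetical, and both signals are assumed to be non-silent.

```python
import math

def spatial_cues(left, right, sample_rate, max_lag):
    """Return (ild_db, itd_seconds, coherence) for two equal-length channels."""
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    # ILD: level difference in dB (positive when the left channel is louder)
    ild_db = 10 * math.log10(energy_l / energy_r)

    def xcorr(lag):
        # Correlate left[i] with right[i + lag], skipping out-of-range samples
        n = len(left)
        return sum(left[i] * right[i + lag]
                   for i in range(max(0, -lag), min(n, n - lag)))

    # ITD: lag with the highest correlation (positive when right lags left)
    best_lag = max(range(-max_lag, max_lag + 1), key=xcorr)
    # IC: normalized cross-correlation at that lag, at most 1 in magnitude
    coherence = xcorr(best_lag) / math.sqrt(energy_l * energy_r)
    return ild_db, best_lag / sample_rate, coherence
```

For a source on the listener's left, the left signal is louder and arrives earlier, so both the ILD and the ITD come out positive under these sign conventions; a coherence near 1 indicates a compact source rather than a diffuse one.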

42 MPEG Surround
-- Low-bitrate parametric coding technology for multi-channel audio signals: 64 kb/s or less
-- Backward compatibility with stereo equipment
-- Standardization
   CfP on Spatial Audio Coding (SAC) in March 2004
   Reference Model 0 (RM0) defined in 2005
   Renamed to “MPEG Surround” in 2005
   Finalized in July 2006 (ISO/IEC 23003-1)

43 MPEG Surround Encoder
-- Capture the spatial image of a multi-channel audio signal
-- Generate a mono/stereo downmixed signal

44 MPEG Surround Decoder
-- Synthesize the multi-channel output signal
-- Backward compatibility

45 Multimedia Communication System

46 Server-Client Structure
[Figure: Server, Network, Client]

47 NCTU Multimedia Test Bed
-- Designed for performance evaluation of scalable media streaming systems
-- Also included: MPEG-21 digital item adaptation, MPEG-4 IPMP, …
-- 2002.12: NCTU donated the source code to MPEG (M9182)
-- 2005.4: ISO/IEC JTC 1/SC 29, Information Technology – Multimedia Framework (MPEG-21) – Part 12: Test Bed for MPEG-21 Resource Delivery

48 MPEG-21 Testbed
[Figure labels: Data flow, Control path, IPMP, DIA]

49 MPEG Integrated Project
Main Project: MPEG Integrated Multimedia Platform and Applications
-- SP#1: Pre- and Post-processing Techniques of Scalable Video Streaming
-- SP#2: Video Streaming Server and Video Database Integration
-- SP#3: MPEG Multimedia Transport, Protocols, & Network Simulator
-- SP#4: Advanced Fine Granularity Scalability
-- SP#5: MPEG IPMP and Robust Video Decoder Design and Simulation
-- SP#6: Research in Multipoint Videoconferencing Technologies
[Figure: system diagram with blocks: video source, Pre-processor, MPEG-4/21 Scalable codec, MPEG-21 DIA Engine, Streaming Module, Network Simulator, MPEG IPMP, Post-processor, Client GUI, N-way Client/Server Module, N-Way Conference Client, Multimedia/Conference Client, Multimedia Database, Multimedia Server, Video Database]

50 Test Bed Demo Setup

51 MPEG-21 Test Bed Demo

52 What Next in MPEG?
Reconfigurable Video Coding
-- Many codecs share common/similar tools
-- A collection of tools (functional units): each tool has a single, clear functionality
Free viewpoint TV (FTV)
-- Watch (synthesize) video from “any” viewpoint
-- Scenes are captured by multiple cameras
-- Key: reduce information; simple/fast synthesis
Scalable Speech and Audio Coding

