1
CMPT 771 Internet Architecture and Protocols
Digital Media Basics
2
Media Basics
Contents: brief introduction to digital media (audio/video)
Digitization, representation, compression
3
Audio Digitization (PCM)
4
A few words about digital audio
Sampling theory (Nyquist theorem): the discrete-time sequence of a sampled continuous function { V(t_n) } contains enough information to reproduce the function V = V(t) exactly, provided that the sampling rate is at least twice the highest frequency contained in the original signal V(t).
Analog signal sampled at a constant rate:
telephone: 8,000 samples/sec
CD music: 44,100 samples/sec
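A minimal sketch of what goes wrong below the Nyquist rate, assuming the telephone rate of 8,000 samples/sec from the slide. The tone frequencies (3 kHz and 5 kHz) and the helper name `sample_cosine` are illustrative choices, not part of the original deck. A 5 kHz tone exceeds half the sampling rate, so its samples are indistinguishable from those of a 3 kHz tone.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz, taken at sample_rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

fs = 8000                            # telephone sampling rate (samples/sec)
low = sample_cosine(3000, fs, 16)    # below fs/2 = 4 kHz: representable
high = sample_cosine(5000, fs, 16)   # above fs/2: aliases onto 3 kHz

# Both tones yield the same sample sequence, so the receiver cannot tell them apart.
aliased = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(low, high))
print("5 kHz aliases to 3 kHz at 8 kHz sampling:", aliased)
```

This is why the analog signal has to be band-limited to below half the sampling rate before it is sampled.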
5
Audio Digitization (Pulse Code Modulation)
Sound in analogue formats must be digitized: at every time interval the sound is converted to a digital equivalent. Using 2 bits per sample, the following sound can be digitized:
6
Digitize audio
Each sample is quantized, i.e., rounded, e.g., to one of 2^8 = 256 possible quantized values.
Each quantized value is represented by bits: 8 bits for 256 values.
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to an analog signal: some quality reduction.
Example rates:
CD: 1.411 Mbps (44,100 samples/sec x 16 bits x 2 channels)
MP3: 96, 128, 160 kbps
Internet telephony: kbps range
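The arithmetic above can be sketched as follows. The function names and the choice of a [-1.0, 1.0] sample range are assumptions made for illustration; the sketch uses a plain uniform quantizer, whereas telephone systems typically use companded (non-uniform) PCM.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level.
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def dequantize(level, bits):
    """Receiver side: map a level back to [-1.0, 1.0]; rounding error is lost."""
    levels = 2 ** bits
    return level / (levels - 1) * 2.0 - 1.0

print(pcm_bit_rate(8000, 8))          # telephone: 8,000 samples/sec x 8 bits
print(pcm_bit_rate(44100, 16, 2))     # CD: 44,100 samples/sec x 16 bits x 2 channels
print(dequantize(quantize(0.3, 8), 8))  # close to 0.3, but not exact
```

The round trip through `quantize` and `dequantize` shows the "some quality reduction" noted on the slide: the error is bounded by half a quantization step.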
7
Approximate file sizes for 1 second of audio
Channels   Resolution   Fs         File size
Mono       8 bit        8 kHz      64 Kb
Stereo     8 bit        8 kHz      128 Kb
           16 bit       16 kHz     256 Kb
           16 bit       44.1 kHz   1411 Kb*
           24 bit       44.1 kHz   2116 Kb
* 1 CD: 700M
8
Psychoacoustic: Perceptual Coding
Hide errors where humans will not see or hear them.
Study the hearing and vision systems to understand how we see and hear.
Masking refers to one signal overwhelming or hiding another (e.g., a loud siren or a bright flash).
Natural bandlimiting: audio perception spans roughly 20 Hz to 20 kHz, but most sounds are in low frequencies (e.g., 2 kHz to 4 kHz).
Low frequencies may be encoded as a single channel.
Human ear can tolerate 200ps second delay
9
Psychoacoustics: Human aural response
10
Psychoacoustic Model
Basically: if you can't hear the sound, don't encode it.
Human frequency response
Frequency masking: if a stronger sound and a weaker sound compete within a critical band, you can't hear the weaker sound. Don't encode it.
Temporal masking: after a loud sound, there is a while before we can hear a soft sound.
Stereo redundancy: at low frequencies, we can't detect where the sound is coming from. Encode it mono.
The critical band and the masking threshold dictate how much quantization noise can be injected while staying inaudible.
11
Audio Compression
Makes use of psychoacoustic knowledge to reduce the amount of information required to achieve the same perceived quality (lossy compression).
MP3 = MPEG 1/2 layer 3 audio; achieves CD quality at about 192 kbps (a 3.7:1 compression ratio); higher compression is possible.
Sony MiniDisc uses Adaptive TRAnsform Coding (ATRAC) to achieve a 5:1 compression ratio (about 141 kbps).
Both ratios appear to be taken relative to a single 16-bit, 44.1 kHz CD channel (about 706 kbps).
12
Transform Coding Frequency analysis ? Time domain ? Not easy!
Time domain -> Transform domain Sequence to be coded is converted into new sequence using transformation rule. New sequence - transform coefficients. Process is reversible - get back to original sequence using inverse transformation. Example - the Fourier transform. Coefficients represent proportion of energy contributed by different frequencies.
13
Transform Coding (Cont…)
In transform coding - choose transformation such that only subset of coefficients have significant values. Energy confined to subset of ‘important’ coefficients. Known as ‘energy compaction’. Example - FT of bandlimited signal:
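Energy compaction can be illustrated with a small discrete Fourier transform. This is a sketch only: the 32-sample test signal (two cosines at bins 1 and 3), the direct O(N^2) `dft` helper, and the 1e-6 threshold are assumptions chosen for illustration.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2)), adequate for a short example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 32
# Band-limited test signal: only two low-frequency components.
signal = [math.cos(2 * math.pi * t / N) + 0.5 * math.cos(2 * math.pi * 3 * t / N)
          for t in range(N)]

spectrum = dft(signal)
# Indices of coefficients that carry non-negligible energy.
significant = [k for k, c in enumerate(spectrum) if abs(c) > 1e-6]
print(significant)   # bins 1 and 3 and their mirror images 29 and 31
```

All 32 time-domain samples are non-trivial, but only 4 of the 32 transform coefficients are significant, so only those need to be coded.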
14
Artefacts of compression
MP3-encoded recordings rarely sound identical to the original uncompressed audio files.
Whole areas of the spectrum are lost in the encoding process.
On small domestic 'hi-fi' or PC speakers, however, MP3-compressed audio can be acceptable.
15
WAV File (34Mb)
16
Mp3 file (3Mb)
17
Video Digitization and Compression
Video is a sequence of images (frames) displayed at a constant frame rate, e.g., 24 images/sec.
A digital image is a 2-D array of pixels (sampling theory applies).
Each pixel is represented by bits, as R:G:B or Y:U:V:
Y = 0.299R + 0.587G + 0.114B (luminance or brightness)
U = B - Y (chrominance 1, color difference)
V = R - Y (chrominance 2, color difference)
Redundancy: spatial and temporal
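The color conversion can be sketched as below, assuming the standard ITU-R BT.601 luma weights (0.299, 0.587, 0.114) and the unscaled color differences U = B - Y and V = R - Y given on the slide. The function name `rgb_to_yuv` is illustrative; practical systems additionally scale and offset U and V.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using unscaled color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (brightness)
    u = b - y                               # chrominance 1
    v = r - y                               # chrominance 2
    return y, u, v

print(rgb_to_yuv(255, 255, 255))  # white: full luminance, near-zero chrominance
print(rgb_to_yuv(255, 0, 0))      # pure red: V is strongly positive
```

Gray pixels (R = G = B) carry no chrominance, which is why the U and V planes of natural images compress well.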
18
JPEG (Joint Photographic Experts Group)
JPEG Lossy Sequential Mode: Transform -> Quantize -> Encode
JPEG compression ratios: 30:1 to 50:1 compression is possible with small to moderate defects. 100:1 compression is quite feasible for very-low-quality purposes.
19
JPEG Steps
1. Block Preparation: from RGB to YUV (YIQ) planes.
2. Transform: two-dimensional Discrete Cosine Transform (DCT) on 8x8 blocks.
3. Quantization: compute quantized DCT coefficients (lossy).
4. Encoding of quantized coefficients:
   Zigzag scan
   Differential Pulse Code Modulation (DPCM) on the DC component
   Run Length Encoding (RLE) on the AC components
   Entropy coding: Huffman or arithmetic
20
JPEG Overview
Compression: Block Preparation -> Transform -> Quantize -> Encode
Decompression: reverse the order
21
JPEG: Block Preparation
[Figure: RGB input data, and the planes after block preparation.]
Input image: 640 x 480 RGB (24 bits/pixel), transformed to three planes:
Y: (640 x 480, 8 bits/pixel) luminance (brightness) plane.
U, V: subsampled chrominance (color) planes, 320 pixels wide.
22
Discrete Cosine Transform (DCT)
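A transformation from the spatial domain to the frequency domain (similar to the FFT).
Definition of the 8-point DCT:
$F[u,v] = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f[x,y] \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$, with $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.
F[0,0] is the DC component and the other F[u,v] define the AC components of the DCT.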
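A direct, unoptimized sketch of the 8x8 forward DCT, assuming the standard JPEG definition with normalization factors C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The names `dct_2d` and `c` are illustrative; real codecs use fast factorizations rather than this four-level loop.

```python
import math

N = 8

def c(k):
    """Normalization factor: 1/sqrt(2) for the zero-frequency index, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_2d(block):
    """Forward 2-D DCT of an N x N block, straight from the definition."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * total
    return out

# A perfectly flat block: all energy lands in the DC coefficient F[0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

For the flat block every AC coefficient is zero up to rounding error, a small instance of the energy compaction that makes the later quantization step effective.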
23
The 64 (8 x 8) DCT Basis Functions
[Figure: the 64 DCT basis functions, indexed by u and v, with the DC component marked.]
24
8x8 DCT Example
[Figure: original values of an 8x8 block (in the spatial domain) and the corresponding DCT coefficients (in the frequency domain), with axes u and v and the DC component marked.]
25
JPEG: Quantized DCT Coefficients
Uniform quantization: divide by a constant N and round the result.
In JPEG, each DCT coefficient F[u,v] is divided by a constant q(u,v). The table of q(u,v) values is called the quantization table.
Quantized coefficient = round(F[u,v] / q(u,v))
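The quantization step can be sketched as follows. The table produced by `make_qtable` is a hypothetical one (step size growing with u + v), not one of the tables from the JPEG standard, and all function names are illustrative. Note that Python's `round` rounds exact halves to the nearest even integer, which is a detail of this sketch rather than of JPEG.

```python
def make_qtable(quality=2, n=8):
    """Hypothetical quantization table: coarser steps at higher frequencies."""
    return [[1 + (1 + u + v) * quality for v in range(n)] for u in range(n)]

def quantize_block(coeffs, qtable):
    """Divide each DCT coefficient by its table entry and round (lossy)."""
    return [[round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(coeffs, qtable)]

def dequantize_block(levels, qtable):
    """Decoder side: multiply back; the rounding error is not recovered."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800.0, -35.0, 10.0

qtable = make_qtable()
quantized = quantize_block(coeffs, qtable)
restored = dequantize_block(quantized, qtable)
print(quantized[0][0], quantized[0][1], quantized[7][7])
print(restored[0][0])
```

The small high-frequency coefficient is quantized to zero, and the DC value comes back slightly changed; both effects are where the loss in lossy JPEG comes from.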
26
JPEG: Zigzag Scan
Maps an 8x8 block into a 1 x 64 vector.
The zigzag pattern groups low-frequency coefficients at the top of the vector.
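A minimal sketch of the zigzag scan. The function names are illustrative; the sample block reuses the quantized coefficients shown on the intra-frame coding slide of this deck, with the rows not shown there assumed to be all zero.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col of each anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked bottom-left to top-right, odd ones the other way.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def zigzag(block):
    """Flatten a square block into a vector in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

block = [
    [12, 34, 0, 54, 0, 0, 0, 0],
    [87, 0, 0, 12, 0, 0, 0, 0],
    [16, 0, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(5)]   # remaining rows assumed zero

vector = zigzag(block)
print(vector[:16])
```

After the scan the non-zero values cluster at the front of the vector and the tail is one long run of zeros, which is what the run-length stage exploits.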
27
JPEG: Encoding of Quantized DCT Coefficients
DC components: the DC component of a block is large and varied, but often close to the DC value of the previous block. Encode the difference between the DC component and that of the previous 8x8 block using Differential Pulse Code Modulation (DPCM).
AC components: the 1 x 64 vector has lots of zeros in it. Using RLE, encode it as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component. Send (0,0) as the end-of-block value.
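Both encodings can be sketched in a few lines. The function names and sample values are illustrative, and the sketch simplifies real JPEG, which also caps the zero-run length at 15 and packs the pairs into Huffman symbols.

```python
def dpcm_encode(dc_values):
    """Encode each block's DC value as the difference from the previous block's."""
    diffs = []
    prev = 0                      # predictor starts at 0 for the first block
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def rle_encode_ac(ac):
    """Encode AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1              # count zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))          # end-of-block marker; trailing zeros are implied
    return pairs

print(dpcm_encode([12, 15, 14, 20]))
ac = [34, 87, 16, 0, 0, 54, 0, 0, 0, 0, 0, 0, 12] + [0] * 50
print(rle_encode_ac(ac))
```

The 63 AC values collapse to six pairs, and the DC differences are much smaller in magnitude than the DC values themselves.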
28
Intra-Frame Coding (JPEG)
Block-based 2-D DCT (Discrete Cosine Transform) on 8x8 blocks (Karhunen-Loeve (KL) transform?)
Frequency-domain compression -> run-length coding -> entropy (Huffman) coding
A typical 8x8 block of quantized DCT coefficients. Most of the higher-order coefficients have been quantized to 0 (first four rows shown; the remaining rows are taken to be zero):
12 34  0 54  0  0  0  0
87  0  0 12  0  0  0  0
16  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
Zig-zag scan: the sequence of DCT coefficients to be transmitted: 12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 0 ...
DC coefficient (12) is sent via a separate Huffman table.
Run-length coding of the remaining coefficients: 34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 0 ...
29
JPEG: Runlength Coding
After the zig-zag scan of the 8x8 block of quantized DCT coefficients from the previous slide, the DC coefficient (12) is sent via a separate Huffman table and the remaining coefficients are run-length coded.
Further compress: statistical (entropy) coding
30
A few words about Entropy
A measure of information content.
Entropy of the English language: how much information does each character in "typical" English text contain?
From a probability view:
If the probability of a binary event is 0.5 (like a coin), then, on average, you need one bit to represent the result of this event.
As the probability of a binary event increases or decreases, the number of bits you need, on average, to represent the result decreases.
The figure is expressing that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear.
31
Entropy (Shannon 1948)
For a set of messages S with probability p(s), s ∈ S, the self-information of s is:
$i(s) = \log \frac{1}{p(s)} = -\log p(s)$
Measured in bits if the log is base 2. The lower the probability, the higher the information.
Entropy is the weighted average of self-information:
$H(S) = \sum_{s \in S} p(s) \log \frac{1}{p(s)}$
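A short sketch of these two definitions with base-2 logarithms, so results are in bits. The function names and the example distributions are illustrative.

```python
import math

def self_information(p):
    """Information, in bits, conveyed by an outcome of probability p."""
    return math.log2(1 / p)

def entropy(probs):
    """Shannon entropy in bits: the probability-weighted average self-information."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(self_information(0.5))     # a fair coin flip carries one bit
print(entropy([0.5, 0.5]))       # fair coin
print(entropy([0.9, 0.1]))       # biased coin: less than one bit on average
print(entropy([0.25] * 4))       # four equally likely messages
```

The biased coin has lower entropy than the fair one, matching the earlier observation that a less random event can be conveyed in fewer bits on average.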
32
Entropy Example
33
Entropy Coding
Entropy coding (variable-length coding, statistical coding):
Lossless coding
Takes advantage of the probabilistic nature of information
Examples: Huffman coding, arithmetic coding
Theorem (Shannon) (lower bound): for any probability distribution p(S) with associated uniquely decodable code C, $H(S) \le l_a(C)$, where $l_a(C)$ is the average code length of C.
Recall Huffman coding…
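As a reminder of how Huffman coding works, here is a compact sketch that builds a code from symbol probabilities and compares its average length with the entropy. The function name, the four-symbol distribution, and the tie-breaking rule are illustrative assumptions; the sketch expects at least two symbols.

```python
import heapq
import math

def huffman_code(probs):
    """Build a prefix code {symbol: bitstring} from {symbol: probability}."""
    # Heap entries: (probability, tie_breaker, partial code table for that subtree).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)    # two least probable subtrees
        p2, _, high = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in low.items()}
        merged.update({s: "1" + code for s, code in high.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
h = sum(p * math.log2(1 / p) for p in probs.values())
print(code)
print(avg_len, h)
```

For this distribution every probability is a power of 1/2, so the average code length meets the entropy lower bound exactly; in general a Huffman code stays within one bit of it.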
34
Example JPEG
[Figure: original image, compressed image, and the quantization table used.]
JPEG Example
[Figure: the original image alongside versions with compression ratios 7.7, 12.3, 33.9, and 60.1.]
Produced using the interactive JPEG Java applet.
35
Inter-Frame Prediction
[Figure: an intra-coded I-frame and a predicted P-frame.]
36
Motion Estimation and Compensation
37
Video compression: A big picture
38
Bi-Directional Prediction
[Figure: a group of frames (GOF) with frames labeled I, B, and P, showing an intra-coded I-frame and a bi-directionally predicted B-frame.]
39
VBR vs CBR: Rate Control
Variable bit rate (VBR): fixed quantizer Qp; "constant" quality; e.g., RMVB
Constant bit rate (CBR): adaptive quantizer; "constant" rate, easier control; the difference compared to the target rate can be 0.5% or less; e.g., RM, MPEG-1
Rate-distortion optimization
Recall that the transport layer also has rate control …
[Figure: block diagram with raw video, video encoder, VBR output, smoothing buffer, CBR output, and a rate controller setting Qp.]
40
Standardization Organizations
ITU-T VCEG (Video Coding Experts Group): standards for advanced moving image coding methods appropriate for conversational and non-conversational audio/visual applications.
ISO/IEC MPEG (Moving Picture Experts Group): standards for compression and coding, decompression, processing, and coded representation of moving pictures, audio, and their combination.
Relation:
ITU-T H.262 ~ ISO/IEC (MPEG-2): Generic Coding of Moving Pictures and Associated Audio
ITU-T H.263 ~ ISO/IEC (MPEG-4)
WG: work group; SG: sub group
ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures
ISO/IEC JTC 1/SC 29/WG 11
41
Coding Rate and Standards
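[Figure: applications and standards along a bit-rate axis (8, 16, 64, 384 kbit/s; 1.5, 5, 20 Mbit/s), from very low bitrate through low and medium to high bitrate. Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV. Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2.]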
43
ISO MPEG-1 (Moving Pictures Experts Group).
Progressively scanned video for multimedia applications, at a bit rate of 1.5 Mb/s (the access rate of CD-ROM players).
Video format: near VHS quality
44
ISO MPEG-2
Standard for digital television, DVD
4 to 8 Mb/s / 10 to 15 Mb/s (>> MPEG-1)
Supports various modes of scalability (spatial, temporal, SNR)
There are differences in quantization, and better variable-length code tables for progressive video sequences.
45
MPEG-3?
Originally envisioned for high-bit-rate applications such as HDTV.
Cancelled since the target rate can be handled by MPEG-2.
J. Liang, SFU ENSC861, 2019/4/7
46
ISO MPEG-4
A much broader standard.
MPEG-4 was aimed primarily at low-bit-rate video communication, but is not limited to it.
Applications: digital television, interactive graphics applications, interactive multimedia (World Wide Web)
Two versions: DivX 3 and DivX 4 (Internet world)
Important concept: video object
47
MPEG-4 Structure
[Figure: MPEG-4 structure, with labels: audio/video scene, A/V object, MUX, bitstream, A/V object decoder, compositor.]
48
MPEG-4 Object Video
Instead of "frames": Video Object Planes (VOPs)
Shape-Adaptive DCT (SA-DCT), alpha map
[Figure: a video frame split into a background VOP and object VOPs, coded with SA-DCT.]
49
Example
[Figure: a scene with regions labeled Object 1 through Object 4.]
Problems, comments?
50
Another Example
51
Status
MPEG-4 Part 2: Microsoft, RealVideo, QuickTime, ...
But only rectangular frame based
MPEG-4 Part 2: DivX/Xvid, QuickTime 6
H.264 = MPEG-4 Part 10 (2003): iTunes video store, YouTube, HD video, Blu-ray (later version)
52
Next Step: H.26L->H.264 - Done
ITU-T Recommendations: real-time video communication applications.
MPEG Standards: video storage, broadcast video, video streaming applications.
H.26L = ITU-T + MPEG = JVT coding
Current project of the Joint Video Team, formed by ITU-T SG16 Q6 (VCEG) and ISO/IEC JTC 1/SC 29 WG 11 (MPEG)
Basic configuration similar to H.263 and MPEG-4 Part 2
53
Coding Evolution H.264/AVC
54
H.264 History
1998: Call for proposals for H.26L issued by ITU-T VCEG (Video Coding Experts Group)
Oct. 1999: First draft design
Dec. 2001: ITU and ISO formed the Joint Video Team (JVT)
Mar. 2003: approved as ITU-T H.264 and ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC)
Jul. 2004: Fidelity Range Extensions (FRExt)
Current: Scalability Extensions
Default YouTube format
Objectives:
50% bit rate savings compared to MPEG-2
High quality video at both low and high bit rates: 64 kbps to 240 Mbps
Network-friendly: more error-resilience tools
Support both conversational and non-conversational applications:
Conversational: video conference
Non-conversational: storage, broadcast, streaming
55
H.264 Design
Draft design adopted in Aug 1999; it has evolved into a test model long term (TML) reference design.
Goals:
Enhanced compression performance
Provision of a network-friendly, packet-based video representation addressing conversational and non-conversational applications
Conceptual separation between the Video Coding Layer (VCL) and the Network Adaptation Layer (NAL)
56
H.264 Design (Contd.)
[Figure: layered structure, with labels: control data, Video Coding Layer, macro-block, data partitioning, slice/partition, Network Adaptation Layer.]
57
New Developments for/beyond H.264
58
New Developments for/beyond H.264
59
New Developments for/beyond H.264
Scalable video coding
4K UHD
Multiview video/3D video
Virtual Reality/Augmented Reality
GPU acceleration
Learning/model-based coding?