Presentation is loading. Please wait.

Presentation is loading. Please wait.

Signal Prediction, Transformation & Decomposition Trần Duy Trác ECE Department The Johns Hopkins University Baltimore MD 21218.

Similar presentations


Presentation on theme: "Signal Prediction, Transformation & Decomposition Trần Duy Trác ECE Department The Johns Hopkins University Baltimore MD 21218."— Presentation transcript:

1 Signal Prediction, Transformation & Decomposition Trần Duy Trác ECE Department The Johns Hopkins University Baltimore MD 21218

2 Outline  Prediction  Open-loop differential pulse-code modulation (DPCM)  Closed-loop DPCM  Optimal linear prediction  Transformation & Signal Decomposition  Transform fundamentals  Basis functions, transform coefficients  Invertibility, unitary  1D, 2D  Karhunen-Loeve Transform (KLT)  Optimal linear transform  Discrete Cosine Transform (DCT)  Filter Banks and the Discrete Wavelet Transform

3 Reminder reconstructed signal original signal compressed bit-stream Information theory VLC Huffman Arithmetic Run-length Quantization Prediction Transform

4 Predictive Coding  We have only dealt with memory-less model so far  Each symbol/sample is quantized and/or coded without much knowledge on previous ones  There is usually a strong correlation between neighboring symbols/samples  Simplest prediction scheme: take the difference!  The range of the differences should be a lot smaller – a reduction in the signal entropy!

5 Open-Loop DPCM input signal communication channel encoder decoder reconstructed signal

6 Open-Loop DPCM: Analysis Q Q + Quantization Error Accumulation Encoder Decoder

7 Open-Loop DPCM: Analysis  There seems to be a model mismatch  Encoder:  Decoder:  How about using the reconstructed sample for the difference?

8 Closed-Loop DPCM: Analysis Modified prediction scheme Decoder remains the same No error accumulation!

9 Closed-Loop DPCM input signal communication channel encoder decoder reconstructed signal

10 Closed-Loop DPCM: Observations  Quantization error does not accumulate  Minor modification in prediction scheme leads to major encoder modification  Encoder now has decoder embedded inside  Closed-loop & open-loop DPCM has the same decoder  DPCM predicts current sample from last reconstructed one  Generalization?  Replace the simple delay operator by more complicated & more sophisticated predictor P(z)

11 Linear Prediction communication channel encoder decoder

12 Optimal Linear Prediction  Problem  Assumptions  Signal is WSS  High bit rates, i.e., fine quantization  Approach

13 Optimal Linear Prediction

14

15 Correlation matrix of WSS RP x[n] Cross correlation vector between x[n] and past observed samples approximated from time averaging

16 Linear Signal Representation input signal transform coefficient basis function Synthesis Analysis Approximation using as few coefficients as possible

17 Transform Fundamentals Q AnalysisSynthesis  1D Analysis Transform  1D Synthesis Transform

18 Invertibility & Unitary  Invertibility  perfect reconstruction, bi-orthogonal, reversible  Unitary  orthogonal, orthonormal same analysis & synthesis basis functions

19 Norm Preservation  Norm preservation property of orthonormal transform  From a coding perspective  Q error in the transform domain equals Q error in the spatial domain!  Concentrate on the quantization of the transform coefficients

20 2D Separable Transformation  2D Analysis  2D Synthesis transforming columns transforming rows N x N  2D Orthogonal Synthesis

21 Example: Fourier Transform  Discrete Fourier Transform (DFT) k n Jean Baptiste Joseph Fourier

22 KLT: Optimal Linear Transform  Karhunen-Loeve Transform (KLT)  Hotelling transform, principle component analysis (PCA)  Question:  Amongst the linear transforms, which one is the best decorrelator, offering the best “compression”?  Problem:  Assumptions: x is zero-mean WSS RP.

23 KLT  Reminder:  Orthonormal constraint:  Autocorrelation matrix

24 KLT basis = Eigenvectors  Reminder: eigenvector eigenvalue  Lagrange Multiplier  Optimal coding scheme: send the larger eigenvalues first!

25 KLT Problems  KLT problems  Signal dependent  Computationally expensive  statistics need to be computed  no structure, no symmetry, no guarantee of stability  Real signals are really stationary  Encoder/Decoder communication  Practical solutions  Assume a reasonable signal model  Blocking the signals to ensure stationary assumption holds  Making the transform matrix sparse & symmetric  Good KLT approximation for smooth signals: DCT!  Assumptions: x is zero-mean WSS RP.

26 KLT: Summary eigenvectors  Signal dependent  Require stationary signals  How do we communicate bases to the decoder?  How do we design “good” signal-independent transform?

27 Best Basis Revisited  Fundamental question: what is the best basis?  energy compaction: minimize a pre-defined error measure, say MSE, given L coefficients  maximize perceptual reconstruction quality  low complexity: fast-computable decomposition and reconstruction  intuitive interpretation  How to construct such a basis? Different viewpoints!  Applications  compression, coding  signal analysis  de-noising, enhancement  communications

28 Discrete Cosine Transforms  Type I  Type II  Type III  Type IV

29 DCT Type-II DC 8 x 8 block low frequency high frequency middle frequency horizontal edges vertical edges DCT basis orthogonal orthogonal real coefficients real coefficients symmetry symmetry near-optimal near-optimal fast algorithms fast algorithms

30 DCT Symmetry DCT basis functions are either symmetric or anti-symmetric Prof. K. R. Rao

31 DCT: Recursive Property  An M-point DCT–II can be implemented via an M/2-point DCT–II and an M/2-point DCT–IV Butterfly input samples DCT coefficients

32 Fast DCT Implementation 13 multiplications and 29 additions per 8 input samples

33 Block DCT input blocks, each of size M output blocks of DCT coefficients, each of size M

34 Filter Bank  First FB designed for speech coding, [Croisier-Esteban-Galand 1976]  Orthogonal FIR filter bank, [Smith-Barnwell 1984], [Mintzer 1985] Q 2 2 2 2 Analysis Bank Synthesis Bank low-pass filter high-pass filter

35 FB Analysis Q 2 2 2 2 Alternating-Sign Construction

36 History: Wavelets  Early wavelets: for geophysics, seismic, oil-prospecting applications, [Morlet-Grossman-Meyer 1980-1984]  Compact-support wavelets with smoothness and regularity, [Daubechies 1988]  Linkage to filter banks and multi-resolution representation, fast discrete wavelet transform (DWT), [Mallat 1989]  Even faster and more efficient implementations: lattice structure for filter banks, [Vaidyanathan-Hoang 1988]; lifting scheme, [Sweldens 1995] Prof. Y. MeyerProf. I. DaubechiesProf. S. Mallat

37 From Filter Bank to Wavelet  [Daubechies 1988], [Mallat 1989]  Constructed as iterated filter bank  Discrete Wavelet Transform (DWT): iterate FB on the lowpass subband 0 frequency spectrum

38 1-Level 2D DWT input image LP HP LP HP decompose columns decompose rows LP HP LL LH HLHH LL: smooth approximation LH: horizontal edges HL: vertical edges HH: diagonal edges LH

39 LP HP LP HP LP HP decompose rowsdecompose columns 2-Level 2D DWT LP HP LP HP decompose rowsdecompose columns LP HP LH HLHH

40 2D DWT

41 Time-Frequency Localization best time localization best frequency localization STFT uniform tiling wavelet dyadic tiling  Heisenberg’s Uncertainty Principle: bound on T-F product  Wavelets provide flexibility and good time-frequency trade-off

42 Wavelet Packet  Iterate adaptively according to the signal  Arbitrary tiling of the time-frequency plain 0 frequency spectrum Question: can we iterate any FB to construct wavelets and wavelet packets?

43 Scaling and Wavelet Function Discrete Basis Continuous-time Basis product filters scaling function wavelet function

44 Convergence & Smoothness  Not all FB yields nice product filters  Two fundamental questions  Will the infinite product converge?  Will the infinite product converge to a smooth function?  Necessary condition for convergence: at least a zero at  Sufficient condition for smoothness: many zeros at with 2 zeros at with 4 zeros at

45 Sparsity in Video Signals: Motion Estimation & Compensation Trac D. Tran ECE Department The Johns Hopkins University Baltimore MD 21218

46 Outline  Video coding algorithms and standards: an overview  Main observations. Several simple video codecs  Motion-compensated DCT coding (MC-DCT)  Motion-compensated wavelet coding (MC-DWT)  3D wavelet and subband coding  Block motion estimation (BME) and motion compensation  Principle. Error measures. Optimality. Search strategies.  Examples of fast sub-optimal algorithms  Fast optimal algorithms: spiral search, successive elimination, projection matching...  Combination searches: hierarchical, spatial correlation, rate- constrained...  Advanced topics in motion estimation / motion compensation  Global motion estimation  Advanced motion models: sub-pixel motion, affine motion, meshes, variable-block-size MEMC...

47 Main Observations  Video signal is a sequence of still images or frames  Correlation in video sequence  Temporal correlation: similar background with a few moving objects in the foreground  Spatial correlation: similar pixels seem to group together – just like spatial correlation in images  There is usually much more temporal correlation then spatial  Motion model in video sequence  Natural motion  moving people & moving objects  translational, rotational, scaling  Camera motion  camera panning, camera zooming, fading

48 Examples of Video Sequences Frame 1 51 71 91 111  Observations of Visual Data  There is a lot of redundancy, correlation, strong structure within natural image/video  Images  Spatial correlation: a lot of smooth areas with occasional edges  Video  Temporal correlation: neighboring frames seem to be very similar

49 Simple Video Coders  JPEG/JPEG2000 encodes every frame independently  Quite popular: Motion-JPEG, Motion-JPEG2000  Does not take into account any temporal correlation  Advantage: very simple and fast, no latency (frame delay), easy frame access for video editing  Disadvantage: low coding performance  JPEG/JPEG2000 encodes frame difference  Requires very stationary background to be effective  Advantage: improves compression, low latency  Disadvantage: still low coding performance, open-loop design, quantization error accumulation from coder/decoder mismatch  JPEG/JPEG2000 encodes frame difference – closed-loop  Advantage: no error accumulation, low latency, simple  Disadvantage: too simple motion model, still low compression

50 Motion-Compensated Framework  Transform-based coding on motion-compensated prediction error (residue)  Popular transformations: block DCT or wavelet  Closed-loop DPCM to prevent error propagation (drifting)  Usually employed block-translation motion model  All international video coding standards are based on this coding framework  Video teleconferencing: H.261, H.263, H.263++, H.26L/H.264  Video archive & play-back: MPEG-1, MPEG-2 (in DVDs) MPEG-4

51 Typical MC Codec Entropy Decoding, Inverse Q, Inverse Transform Motion Comp. Predictor Transform, Quantization, Entropy Coding Motion Estimation Frame Buffer (Delay) Motion Compensated Prediction Input Frame Encoded Residual (To Channel) Approximated Input Frame (To Display) Motion Vector and Block Mode Data (Side-Info, To Channel) Decoder

52 Motion Estimation  Goal: extract correlation between adjacent video frames to improve compression efficiency  General problem statement:  Practical motion model: small objects or regions moving in translational fashion  Block-based motion estimation (BME) and compensation (BMC) time axis

53 Intra- and Inter-Coding  Inter-coding:  blocks of predicted motion error labeled P-block is encoded  Intra-coding  any block that motion estimation fails to find a good match is labeled I-block and encoded as is (without any motion compensation)  Conditional replenishment  when prediction error is low, no coding or decoding is necessary  we simply record the motion vector and replenish the block from the reference frame.

54 Motion Models  Translation  Affine  Bilinear  Perspective

55 Principle of BME  Partition current frame into small non-overlapped blocks called macro-blocks (MB)  For each block, within a search window, find the motion vector (displacement) that minimizes a pre-defined mismatch error  For each block, motion vector and prediction error (residue) are encoded Search Window Reference Frame Current Frame MV

56 BME: Error Measure  Sum of absolute differences  Sum of squared errors (mean-squared error)  Discussions:  Other norms, correlation measure have been tested  Approximately same coding performance  SAD is less complex for some hardware architectures search range current blockreference block

57 Common Setting  Macro-block  Luminance: 16x16, four 8x8 blocks  Chrominance: two 8x8 blocks  Motion estimation only performed for luminance  Motion vector range  [ -15, 15] YY YY CrCb 8 8 16 15 Search Area

58 Search Strategies I  All possible MV candidates within the search range are investigated  Very computationally expensive  Optimal, highly regular, parallel computable Exhaustive Search dx dy

59 Search Strategies II  Sampling the search space  At every stage, investigate a few candidates  Move in the direction of the best match  Can adaptively reduce the displacement size for each stage Gradient Search  Sampling the search space  Divide search space into regions  Pick a center point for each region  Perform more elaborate search on the region with best center Divide and Conquer

60 Search Strategies III Multi-resolution or Hierarchical Search current frame reference frame DD BME MV Field Full Search

61 Fast BME Algorithm: Example I  Use + pattern at each stage (5 candidates)  Move center to best match  Reduce step size at each stage  Fast, reasonable quality  Similar algorithms: three-step-search, four- step-search, cross search... 2D Log Search dx dy

62 Fast BME Algorithm: Example II  Divide-and-conquer  Move search to region with best match  Finer search at later stages Binary Search

63 Optimal BME: Spiral Search  Early termination  Start with a predicted search center, default=(0,0)  Spiral search around center in diamond or square pattern  Keep track of best match so far; update whenever a better candidate is found  MPEG-4, H.263+ Spiral Search dx dy

64 Optimal BME: Partial Matching  Triangle inequality  Strategy  Compute partial sums for macro-blocks in both current and reference frame  Eliminate candidates based on partial sums  Nice partial sums: row projections, column projections!  Avoid full 2D matching, translate the problem into 1D matching  Can be combined with spiral search and early termination partial sums

65 Spatial Correlation Based BME  Exploit intra-frame spatial correlation to narrow down search space for BME  Blocks covering the same object should move together  MVs from causal neighboring blocks can be used to predict MV for the current block  Reduce bit-rate for MV encoding: regularize the MV field  Reduce BME complexity C1 234

66 BME/BMC: Example I Previous FrameCurrent FrameMotion Vector Field Frame DifferenceMotion-Compensated Difference

67 BME/BMC: Example II Previous FrameCurrent FrameMotion Vector Field Frame DifferenceMotion-Compensated Difference

68 Sub-Pixel Motion Estimation  Sub-pixel motion vector resolution  Use linear/bilinear interpolation to fill in sub-pixels  Trade-offs: motion accuracy versus MV bit-rate and complexity increase  H.26L uses down to 1/4-MEMC, maybe even 1/8 integer pixel half pixel AB CD b cd b = round[(A+B)/2] c = round[(A+C)/2] d = round[(A+B+C+D)/4]

69 B frames  Possible temporal prediction for B pictures: b C1 C2 Frame k-1 Frame k Frame k+1

70 B Frames  Can have better coding efficiency  Average of two predictions reduces the variance  New objects can be better predicted using future frames

71 Multiple Reference Frames  More than one previously decoded pictures can be used as reference  R-D optimization is also needed to choose the best reference frame  Employed in H.264

72 Variable-Block-Size BME  Motion model for H.264  Generalization of the traditional translation BME framework  Sub-divide macro-block: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 Mode 1Mode 2Mode 3Mode 4Mode 5Mode 6Mode 7

73 Mesh-Based Motion Estimation  MPEG-4 object motion  Affine warping motion model  Deformable polygon meshes  Similar MAD, SSE error measures  Trade-offs: more accurate ME vs. tremendous complexity  Bilinear and perspective motion models are rarely used in video coding

74 Global Motion Estimation  Rarely used in practice: BME/BMC mostly suffices  Reference frame resampling: an option in H.263+/H.263++  Global affine motion model: special-effect warping  3D subband & wavelet coding: align frames before temporal filtering [Taubman]

75 Conclusion  Temporal correlation dominates spatial correlation in video sequences  Motion estimation and motion compensation improve coding performance the most in video coding  State-of-the-art video coders, such as the current H.26L verification model, still employ BME-block DCT coding framework  Despite its simplicity, BME is still the bottleneck of real- time video communications


Download ppt "Signal Prediction, Transformation & Decomposition Trần Duy Trác ECE Department The Johns Hopkins University Baltimore MD 21218."

Similar presentations


Ads by Google