Introduction to Wavelet Transform


1 Introduction to Wavelet Transform

2 Time Series are Ubiquitous!
A random sample of 4,000 graphics from 15 of the world’s newspapers published from 1974 to 1989 found that more than 75% of all graphics were time series (Tufte, 1983).

3 Why is Working With Time Series so Difficult?
Answer: We are dealing with subjective notions of similarity. The definition of similarity depends on the user, the domain and the task at hand. We need to be able to handle this subjectivity.

4 Wavelet Transform - Overview
History: Fourier (1807), Haar (1910). [MathWorld]

5 Wavelet Transform - Overview
What kind of basis functions could be useful? Impulse functions (Haar): best time resolution. Sinusoids (Fourier): best frequency resolution. We want the best of both resolutions. Heisenberg (1930), Uncertainty Principle: there is a lower bound on the time-frequency resolution product (an intuitive proof in [Mac91]).

6 Wavelet Transform - Overview
Gabor (1945) Short Time Fourier Transform (STFT) Disadvantage: Fixed window size

7 Wavelet Transform - Overview
Constructing wavelets: Daubechies (1988), compactly supported wavelets. Computation of WT coefficients: Mallat (1989), a fast algorithm using filter banks.

8 Discrete Fourier Transform I
Basic idea: Represent the time series as a linear combination of sines and cosines, but keep only the first n/2 coefficients. Why n/2 coefficients? Because each sine wave requires two numbers, for the phase (w) and amplitude (A, B). Excellent free Fourier primer: Hagit Shatkay, "The Fourier Transform - a Primer", Technical Report CS, Department of Computer Science, Brown University, 1995. [Figure: Jean Fourier; the series X, its approximation X', and the first sinusoidal basis functions.]
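
As a concrete illustration of the keep-the-first-coefficients idea, here is a minimal NumPy sketch; the example signal x and the number of kept coefficients k are made-up values for illustration, not from the slides.

```python
import numpy as np

def dft_approximation(x, k):
    """Approximate a time series by keeping only its first k DFT coefficients."""
    coeffs = np.fft.rfft(x)          # complex coefficients (amplitude + phase)
    coeffs[k:] = 0                   # discard all but the first k coefficients
    return np.fft.irfft(coeffs, n=len(x))

# Example: a noisy series of length 128 reconstructed from 8 coefficients.
t = np.linspace(0, 1, 128, endpoint=False)
x = np.sin(2 * np.pi * 3 * t) + 0.2 * np.random.randn(128)
x_approx = dft_approximation(x, k=8)
```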

9 Discrete Fourier Transform II
Pros and cons of DFT as a time series representation. Good ability to compress most natural signals. Fast, off-the-shelf DFT algorithms exist, O(n log n). (Weakly) able to support time-warped queries. Difficult to deal with sequences of different lengths. Cannot support weighted distance measures. Note: the related transform, the DCT, uses only cosine basis functions; it does not seem to offer any particular advantages over the DFT. [Figure: X, its approximation X', and the first sinusoidal basis functions.]

10 History…

11 Discrete Wavelet Transform I
Basic idea: Represent the time series as a linear combination of wavelet basis functions, but keep only the first N coefficients. Although there are many different types of wavelets, researchers in time series mining/indexing generally use Haar wavelets. Haar wavelets seem to be as powerful as the other wavelets for most problems and are very easy to code. Excellent free wavelets primer: Stollnitz, E., DeRose, T., & Salesin, D. (1995). Wavelets for Computer Graphics: A Primer. IEEE Computer Graphics and Applications. [Figure: Alfred Haar; the series X, its DWT approximation X', and the Haar 0 - Haar 7 basis functions.]
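
A minimal sketch of the Haar decomposition described here, using repeated averaging and differencing in plain NumPy; the sqrt(2) normalisation and the example length are my assumptions for illustration.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet transform of a signal whose length is a power of two."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # coarse approximation
        diff = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
        coeffs.append(diff)
        x = avg
    coeffs.append(x)                             # final approximation (one value)
    return np.concatenate(coeffs[::-1])          # coarsest coefficients first

x = np.arange(8, dtype=float)
w = haar_dwt(x)   # keep only the first N entries for a compressed representation
```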

12 Wavelet Series

13 Discrete Wavelet Transform III
Pros and cons of wavelets as a time series representation. Good ability to compress stationary signals. Fast linear-time algorithms for the DWT exist. Able to support some interesting non-Euclidean similarity measures. Signals must have a length n = 2^some_integer. Works best if N = 2^some_integer; otherwise wavelets approximate the left side of the signal at the expense of the right side. Cannot support weighted distance measures. [Figure: X, its DWT approximation X', and the Haar 0 - Haar 7 basis functions.]

14 Singular Value Decomposition I
Basic idea: Represent the time series as a linear combination of eigenwaves, but keep only the first N coefficients. SVD is similar to the Fourier and wavelet approaches in that we represent the data as a linear combination of shapes (in this case eigenwaves); SVD differs in that the eigenwaves are data dependent. SVD has been successfully used in the text-processing community (where it is known as Latent Semantic Indexing) for many years. Good free SVD primer: "Singular Value Decomposition - A Primer", Sonia Leach. [Figure: X, its SVD approximation X', eigenwaves 0-7, and portraits of James Joseph Sylvester, Camille Jordan and Eugenio Beltrami.]

15 Singular Value Decomposition II
How do we create the eigenwaves? We have previously seen that we can regard time series as points in high-dimensional space. We can rotate the axes such that axis 1 is aligned with the direction of maximum variance, axis 2 is aligned with the direction of maximum variance orthogonal to axis 1, and so on. Since the first few eigenwaves contain most of the variance of the signal, the rest can be truncated with little loss. This process can be achieved by factoring an M by n matrix of time series into three other matrices and truncating the new matrices at size N. [Figure: X, its SVD approximation X', and eigenwaves 0-7.]
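
A small sketch of this factor-and-truncate step with NumPy; the matrix A of M time series of length n, the toy data, and the number of kept eigenwaves N are illustrative assumptions (mean-centering, often applied in practice, is omitted for brevity).

```python
import numpy as np

def svd_reduce(A, N):
    """Project M time series (rows of A) onto the first N eigenwaves."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U * diag(s) * Vt
    eigenwaves = Vt[:N]                 # data-dependent basis shapes
    reduced = A @ eigenwaves.T          # N coefficients per time series
    reconstructed = reduced @ eigenwaves
    return eigenwaves, reduced, reconstructed

A = np.random.randn(100, 128)           # 100 series of length 128 (toy data)
eigenwaves, coeffs, A_approx = svd_reduce(A, N=8)
```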

16 Singular Value Decomposition III
Pros and cons of SVD as a time series representation. Optimal linear dimensionality-reduction technique. The eigenvalues tell us something about the underlying structure of the data. Computationally very expensive: time O(Mn^2), space O(Mn). An insertion into the database requires recomputing the SVD. Cannot support weighted distance measures or non-Euclidean measures. Note: there has been some promising research into mitigating SVD's time and space complexity. [Figure: X, its SVD approximation X', and eigenwaves 0-7.]

17 Piecewise Linear Approximation I
Basic idea: Represent the time series as a sequence of straight lines. If the lines are connected, we are allowed N/2 lines; if the lines are disconnected, we are allowed only N/3 lines. Each connected segment needs a length and a left_height (the right_height can be inferred by looking at the next segment); each disconnected segment needs a length, a left_height and a right_height. Personal experience on dozens of datasets suggests disconnected is better. Also, only the disconnected version allows a lower-bounding Euclidean approximation. [Figure: Karl Friedrich Gauss; X and its piecewise linear approximation X'.]
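
A rough sketch of the disconnected variant, fitting one least-squares line per segment with NumPy; the segment count, the toy data, and the equal-width segmentation are invented for illustration.

```python
import numpy as np

def piecewise_linear(x, n_segments):
    """Approximate x with n_segments disconnected straight lines (left/right heights)."""
    segments = []
    bounds = np.linspace(0, len(x), n_segments + 1, dtype=int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        t = np.arange(lo, hi)
        slope, intercept = np.polyfit(t, x[lo:hi], 1)   # least-squares line
        left, right = slope * lo + intercept, slope * (hi - 1) + intercept
        segments.append((lo, hi, left, right))          # segment extent + two heights
    return segments

x = np.cumsum(np.random.randn(120))
approx = piecewise_linear(x, n_segments=10)
```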

18 Problem with Fourier analysis

· Fourier analysis breaks down a signal into constituent sinusoids of different frequencies. · A serious drawback: in transforming to the frequency domain, time information is lost. When looking at the Fourier transform of a signal, it is impossible to tell when a particular event took place.

19 Function Representations
Sequence of samples (time domain): finite difference method. Pyramid (hierarchical). Polynomial. Sinusoids of various frequencies (frequency domain): Fourier series. Piecewise polynomials (finite support): finite element method, splines. Wavelets (hierarchical, finite support, time/frequency domain).

20 What Are Wavelets? In general, a family of representations using:
hierarchical (nested) basis functions; finite ("compact") support; basis functions often orthogonal; fast transforms, often linear-time.

21 Function Representations – Desirable Properties
Generality – approximate anything well: discontinuities, nonperiodicity, ...; adaptable to the application: audio, pictures, flow fields, terrain data, ... Compact – approximate the function with few coefficients: facilitates compression, storage, transmission. Fast to compute with: differential/integral operators are sparse in this basis; convert an n-sample function to this representation in O(n log n) or O(n) time.

22 Wavelet History, Part 1 1805 Fourier analysis developed
1965 Fast Fourier Transform (FFT) algorithm. 1980's: beginnings of wavelets in physics, vision, speech processing (ad hoc) ... little theory ... why/when do wavelets work? 1985 Morlet & Grossman: continuous wavelet transform ... asking: how can you get perfect reconstruction without redundancy? 1986 Mallat unified the above work.

23 Wavelet History, Part 2 1985 Meyer tried to prove that no orthogonal wavelet other than Haar exists, found one by trial and error! 1987 Mallat developed multiresolution theory, DWT, wavelet construction techniques (but still noncompact) 1988 Daubechies added theory: found compact, orthogonal wavelets with arbitrary number of vanishing moments! 1990’s: wavelets took off, attracting both theoreticians and engineers

24 Time-Frequency Analysis
For many applications, you want to analyze a function in both time and frequency Analogous to a musical score Fourier transforms give you frequency information, smearing time. Samples of a function give you temporal information, smearing frequency. Note: substitute “space” for “time” for pictures.

25 Comparison to Fourier Analysis
Fourier: basis is global; sinusoids with frequencies in arithmetic progression. Short-time Fourier Transform (& Gabor filters): basis is local; sinusoid times Gaussian; fixed-width Gaussian "window". Wavelet: frequencies in geometric progression; basis has constant shape independent of scale.

26 Wavelets are faster than FFTs!

27 · The results of the CWT are many wavelet coefficients, which are a function of scale and position

28 Gabor’s Proposal: Short Time Fourier Transform
Requirements: Signal in time domain: require short time window to depict features of signal. Signal in frequency domain: require short frequency window (long time window) to depict features of signal.

29 What are wavelets? Haar wavelet
· Wavelets are functions defined over a finite interval and having an average value of zero. [Figure: the Haar wavelet.]

30 What is wavelet transform?
· The wavelet transform is a tool for carving up functions, operators, or data into components of different frequency, allowing one to study each component separately. · The basic idea of the wavelet transform is to represent any arbitrary function ƒ(t) as a superposition of a set of such wavelets or basis functions. · These basis functions or baby wavelets are obtained from a single prototype wavelet called the mother wavelet, by dilations or contractions (scaling) and translations (shifts).

31 The continuous wavelet transform (CWT)
· The Fourier Transform (FT) is the sum over all time of the signal f(t) multiplied by a complex exponential.
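
Written out, this is the standard definition (notation assumed, since the slide's equation is an image that is not reproduced in the transcript):

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$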

32 · Similarly, the Continuous Wavelet Transform (CWT) is defined as the sum over all time of the signal multiplied by scaled, shifted versions of the wavelet function, where * denotes complex conjugation (for z = r + iy, z* = r - iy). This equation shows how a function f(t) is decomposed into a set of basis functions, called the wavelets. The variables s and τ are the new dimensions, scale and translation (position), after the wavelet transform.
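
The equation referred to here is the standard CWT definition; in common notation (assumed, since the slide's formula is not reproduced in the transcript):

$$\gamma(s,\tau) = \int_{-\infty}^{\infty} f(t)\, \psi^{*}_{s,\tau}(t)\, dt$$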

33 · The wavelets are generated from a single basic wavelet, the so-called mother wavelet, by scaling and translation: s is the scale factor, τ is the translation factor, and the factor s^(-1/2) is for energy normalization across the different scales. · It is important to note that in the above transforms the wavelet basis functions are not specified. · This is a difference between the wavelet transform and the Fourier transform, or other transforms.
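
In the usual notation (assumed here), the scaled and translated wavelets are:

$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-\tau}{s}\right)$$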

34 Scale · Scaling a wavelet simply means stretching (or compressing) it.

35 Scale and Frequency; Translation (shift)
Low scale a: compressed wavelet, rapidly changing details, high frequency. High scale a: stretched wavelet, slowly changing details, low frequency. Translation (shift): translating a wavelet simply means delaying (or hastening) its onset.

36 Haar wavelet

37 Discrete Wavelets · A discrete wavelet is written as shown below, where j and k are integers and s0 > 1 is a fixed dilation step. The translation factor τ0 depends on the dilation step. The effect of discretizing the wavelet is that the time-scale space is now sampled at discrete intervals. We usually choose s0 = 2. The discrete wavelets can be chosen so that their inner product is 1 if j = m and k = n, and 0 otherwise.
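
The formulas omitted from this slide are, in the usual notation (assumed here), the discrete wavelet family and its orthonormality condition:

$$\psi_{j,k}(t) = \frac{1}{\sqrt{s_0^{\,j}}}\, \psi\!\left(\frac{t - k\,\tau_0\, s_0^{\,j}}{s_0^{\,j}}\right), \qquad
\int \psi_{j,k}(t)\, \psi^{*}_{m,n}(t)\, dt = \begin{cases} 1 & \text{if } j=m \text{ and } k=n \\ 0 & \text{otherwise} \end{cases}$$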

38 A band-pass filter · The wavelet has a band-pass-like spectrum. From Fourier theory we know that compression in time is equivalent to stretching the spectrum and shifting it upwards. Suppose a = 2: a time compression of the wavelet by a factor of 2 will stretch the frequency spectrum of the wavelet by a factor of 2 and also shift all frequency components up by a factor of 2.
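
The Fourier-theory fact being used is the scaling property, stated here for reference (standard result, not copied from the slide):

$$\mathcal{F}\{f(at)\}(\omega) = \frac{1}{|a|}\, F\!\left(\frac{\omega}{a}\right)$$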

39 Subband coding · If we regard the wavelet transform as a filter bank, then we can consider wavelet transforming a signal as passing the signal through this filter bank. · The outputs of the different filter stages are the wavelet- and scaling-function transform coefficients. · In general we will refer to this kind of analysis as a multiresolution analysis. · This filter-bank implementation is called subband coding.

40 · Splitting the signal spectrum with an iterated filter bank.
[Figure: the signal spectrum of width 4B repeatedly split by high-pass (HP) and low-pass (LP) filters into bands of width 2B, B, ...] · Summarizing, if we implement the wavelet transform as an iterated filter bank, we do not have to specify the wavelets explicitly! This is a remarkable result.

41 The Discrete Wavelet Transform
· Calculating wavelet coefficients at every possible scale is a fair amount of work, and it generates an awful lot of data. What if we choose only a subset of scales and positions at which to make our calculations? · It turns out, rather remarkably, that if we choose scales and positions based on powers of two -- so-called dyadic scales and positions -- then our analysis will be much more efficient and just as accurate. We obtain just such an analysis from the discrete wavelet transform (DWT).

42 Approximations and Details
· The approximations are the high-scale, low-frequency components of the signal. The details are the low-scale, high-frequency components. The filtering process, at its most basic level, looks like this: · The original signal, S, passes through two complementary filters and emerges as two signals.

43 Downsampling · Unfortunately, if we actually perform this operation on a real digital signal, we wind up with twice as much data as we started with. Suppose, for instance, that the original signal S consists of 1000 samples of data. Then the approximation and the detail will each have 1000 samples, for a total of 2000. · To correct this problem, we introduce the notion of downsampling. This simply means throwing away every second data point.

44 An example:

45 Reconstructing Approximation and Details
Upsampling

46 Wavelet Decomposition
Multiple-Level Decomposition The decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into many lower-resolution components. This is called the wavelet decomposition tree.

47 DWT · Scaling function (two-scale relation) · Wavelet
· The signal f(t) can be expressed as a DWT expansion in terms of these scaling and wavelet functions.
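
In standard notation (assumed here; h and g are the scaling and wavelet filter coefficients), the two-scale relations and the resulting expansion of f(t) are:

$$\varphi(t) = \sum_{k} h[k]\,\sqrt{2}\,\varphi(2t-k), \qquad \psi(t) = \sum_{k} g[k]\,\sqrt{2}\,\varphi(2t-k)$$

$$f(t) = \sum_{k} c_{j_0}[k]\,\varphi_{j_0,k}(t) + \sum_{j \ge j_0}\sum_{k} d_{j}[k]\,\psi_{j,k}(t)$$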

48

49

50 Wavelet Reconstruction (Synthesis)
Perfect reconstruction:
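
The omitted condition is, for a two-channel filter bank with analysis filters H0, H1 and synthesis filters G0, G1, the standard pair of requirements (stated in the usual z-domain textbook form, not copied from the slide):

$$G_0(z)H_0(z) + G_1(z)H_1(z) = 2z^{-d} \quad\text{(no distortion)}$$

$$G_0(z)H_0(-z) + G_1(z)H_1(-z) = 0 \quad\text{(alias cancellation)}$$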

51 [Figure-only slide: a worked numerical example involving the points (1,0) and (4,0).]

52 2-D Discrete Wavelet Transform
· A 2-D DWT can be done as follows: Step 1: replace each row with its 1-D DWT; Step 2: replace each column with its 1-D DWT; Step 3: repeat steps (1) and (2) on the lowest subband (LL) for the next scale; Step 4: repeat step (3) until as many scales as desired have been completed. [Figure: the original image, a one-scale decomposition into LL, LH, HL and HH subbands, and a two-scale decomposition.]
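
A compact sketch of steps 1 and 2 for one scale, using the Haar transform on rows and then columns (NumPy; assumes the image dimensions are even, and the helper names are mine, not from the slides).

```python
import numpy as np

def haar_1d(v):
    """One-level Haar transform of a 1-D array of even length."""
    a = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return np.concatenate([a, d])

def dwt2_one_scale(img):
    """One scale of the 2-D DWT: transform every row, then every column."""
    rows = np.apply_along_axis(haar_1d, 1, img)      # step 1: rows
    out = np.apply_along_axis(haar_1d, 0, rows)      # step 2: columns
    h, w = img.shape
    LL = out[:h // 2, :w // 2]                       # repeat on LL for the next scale
    return out, LL

img = np.random.rand(8, 8)
coeffs, LL = dwt2_one_scale(img)
```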

53 Image at different scales

54 Correlation between features at different scales

55 Wavelet construction – a simplified approach
Traditional approaches to wavelets have used a filter-bank interpretation. Fourier techniques are required to get the synthesis (reconstruction) filters from the analysis filters. Not easy to generalize.

56 Wavelet construction – lifting
Three steps: Split; Predict (P step); Update (U step).

57 Example – the Haar wavelet
S step: splits the signal into even and odd samples.

58 Example – the Haar wavelet
P step: predict the odd samples from the even samples. For the Haar wavelet, the prediction for an odd sample is the previous even sample.

59 Example – the Haar wavelet
Detail signal: the difference between each odd sample and its prediction (the previous even sample), d[n] = x_odd[n] - x_even[n].

60 Example – the Haar wavelet
U step: update the even samples to produce the next coarser-scale approximation. The signal average is maintained: each approximation sample is the even sample plus half the corresponding detail, a[n] = x_even[n] + d[n]/2, which equals the pairwise average.

61 Summary of the Haar wavelet decomposition
Can be computed 'in place': the P step applies a coefficient of -1 (subtract the even sample from the odd one), and the U step applies a coefficient of 1/2 (add half of the detail back onto the even sample).
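
Putting the three lifting steps together, a minimal sketch in Python; the function name and the interleaving convention (even indices = even samples) are assumptions for illustration.

```python
import numpy as np

def haar_lift_forward(x):
    """One level of the Haar wavelet transform via lifting: split, predict, update."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2].copy(), x[1::2].copy()   # S step: split
    d = odd - even                               # P step: detail = odd - prediction(even)
    a = even + d / 2                             # U step: approximation keeps the signal average
    return a, d

a, d = haar_lift_forward([4.0, 2.0, 5.0, 7.0])
# a == [3.0, 6.0] (pairwise averages), d == [-2.0, 2.0] (pairwise differences)
```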

62 Inverse Haar wavelet transform
Simply run the forward Haar wavelet transform backwards (undo the U step, then the P step), then merge the even and odd samples.
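
The corresponding inverse, undoing the U and P steps in reverse order and merging, under the same assumed conventions as the forward sketch above.

```python
import numpy as np

def haar_lift_inverse(a, d):
    """Invert one level of the Haar lifting transform."""
    a, d = np.asarray(a, dtype=float), np.asarray(d, dtype=float)
    even = a - d / 2                 # undo the U step
    odd = d + even                   # undo the P step
    x = np.empty(2 * len(a))
    x[0::2], x[1::2] = even, odd     # merge even and odd samples
    return x

x = haar_lift_inverse([3.0, 6.0], [-2.0, 2.0])   # recovers [4, 2, 5, 7]
```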

63 General lifting stage of wavelet decomposition
[Figure: a general lifting stage: Split, then subtract the output of P from the odd samples, then add the output of U to the even samples.]

64 Multi-level wavelet decomposition
We can produce a multi-level decomposition by cascading lifting stages (lift, lift, lift, ...).

65 General lifting stage of inverse wavelet synthesis
[Figure: the inverse lifting stage: subtract the U output, add the P output, then Merge.]

66 Multi-level inverse wavelet synthesis
We can produce a multi-level inverse wavelet synthesis by cascading lifting stages (lift, lift, lift, ...).

67 Advantages of the lifting implementation
Inverse transform: the inverse transform is trivial – just run the code backwards; no need for Fourier techniques. Generality: the design of the transform is performed without reference to particular forms for the predict and update operators; it can even include non-linearities (for integer wavelets).

68 Example 2 – the linear spline wavelet
A more sophisticated wavelet that uses slightly more complex P and U operators: it uses linear prediction to determine the odd samples from the even samples.

69 The linear spline wavelet
P step – linear prediction. [Figure: the original signal, the linear prediction at the odd samples, and the detail signal (the prediction error at the odd samples).]

70 The linear spline wavelet
The prediction for the odd samples is based on the two even samples on either side:
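
Written out, this is the standard linear-interpolating predict step (notation assumed; boundary handling is left unspecified here):

$$d[n] = x[2n+1] - \tfrac{1}{2}\left(x[2n] + x[2n+2]\right)$$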

71 The linear spline wavelet
The U step – use current and previous detail signal sample

72 The linear spline wavelet
Preserves signal average and first-order moment (signal position) :

73 The linear spline wavelet
Can still implement 'in place': the P step applies coefficients of -1/2 to the two neighbouring even samples, and the U step applies coefficients of 1/4 to the two neighbouring detail samples.

74 Summary of linear spline wavelet decomposition
Computing the inverse is trivial: run the U and P steps backwards with the signs reversed; the even and odd samples are then merged as before.
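
A minimal sketch of the linear spline lifting step and its inverse, with simple edge replication at the boundaries; the boundary treatment and the test signal are my assumptions, not specified on the slides.

```python
import numpy as np

def spline_lift_forward(x):
    """One level of the linear spline wavelet via lifting."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    right = np.append(even[1:], even[-1])         # even neighbour to the right (edge replicated)
    d = odd - 0.5 * (even + right)                # P step: predict odd from both even neighbours
    left = np.insert(d[:-1], 0, d[0])             # previous detail sample (edge replicated)
    a = even + 0.25 * (left + d)                  # U step: preserves average and first moment
    return a, d

def spline_lift_inverse(a, d):
    """Invert the linear spline lifting step and merge even/odd samples."""
    left = np.insert(d[:-1], 0, d[0])
    even = a - 0.25 * (left + d)                  # undo U
    right = np.append(even[1:], even[-1])
    odd = d + 0.5 * (even + right)                # undo P
    x = np.empty(2 * len(a))
    x[0::2], x[1::2] = even, odd
    return x

x = np.sin(np.linspace(0, 3, 16))
a, d = spline_lift_forward(x)
assert np.allclose(spline_lift_inverse(a, d), x)  # perfect reconstruction
```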

75 Wavelet decomposition applied to a 2D image
[Figure: one lifting stage applied to a 2D image, producing an approximation image and a detail image.]

76 Wavelet decomposition applied to a 2D image
[Figure: the lifting stage applied again to the approximation image, giving approximation and detail images at the next level.]

77 Why is wavelet-based compression effective?
Allows for intra-scale prediction (like many other compression methods); equivalently, the wavelet transform is a decorrelating transform, just like the DCT used by JPEG. Also allows for inter-scale (coarse-to-fine scale) prediction.

78 Why is wavelet-based compression effective?
[Figure: the original image and its 1-level Haar, 1-level linear spline, and 2-level Haar decompositions.]

79 Why is wavelet-based compression effective?
Wavelet coefficient histogram

80 Why is wavelet-based compression effective?
Coefficient entropies:
  Original image: 7.22
  1-level Haar wavelet: 5.96
  1-level linear spline wavelet: 5.53
  2-level Haar wavelet: 5.02
  2-level linear spline wavelet: 4.57
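
These figures are first-order (histogram) entropies. A sketch of how such a number can be computed for an array of coefficients (NumPy; the integer rounding and the random toy image are assumptions for illustration):

```python
import numpy as np

def entropy_bits(coeffs):
    """First-order entropy (bits per symbol) of the rounded coefficient values."""
    symbols = np.round(coeffs).astype(int).ravel()
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

img = np.random.randint(0, 256, size=(64, 64)).astype(float)
print(entropy_bits(img))   # entropy of the raw image
# Applying it to the wavelet coefficients instead shows the drop reported in the table.
```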

81 Why is wavelet-based compression effective?
Wavelet coefficient dependencies. [Figure: a wavelet coefficient X and its parent coefficient at the next coarser scale.]

82 Why is wavelet-based compression effective?
Let's define sets S (small) and L (large) of wavelet coefficients. The following two probabilities describe inter-scale dependencies.

83 Why is wavelet-based compression effective?
Without inter-scale dependencies

84 Why is wavelet-based compression effective?
Measured dependencies from Lena: 0.886, 0.529, 0.781, 0.219.

85 Why is wavelet-based compression effective?
Intra-scale dependencies. [Figure: a coefficient X and its spatial neighbours X1 ... X8.]

86 Why is wavelet-based compression effective?
Measured dependencies from Lena: 0.912, 0.623, 0.781, 0.219.

87 Why is wavelet-based compression effective?
Have to use a causal neighbourhood for spatial prediction

88 Example image compression algorithms
We will look at three state-of-the-art algorithms: Set Partitioning In Hierarchical Trees (SPIHT); Significance-Linked Connected Component Analysis (SLCCA); Embedded Block Coding with Optimal Truncation (EBCOT), which is the basis of JPEG2000.

89 The SPIHT algorithm
Coefficients are transmitted in partial order, bit-plane by bit-plane, from the most significant bit (msb) down to the least significant bit (lsb). [Figure: coefficients ordered by magnitude against bit-planes, msb to lsb.]

90 The SPIHT algorithm
Two components to the algorithm. Sorting pass: sorting information is transmitted on the basis of the most significant bit-plane. Refinement pass: bits in bit-planes lower than the most significant bit-plane are transmitted.

91 The SPIHT algorithm
N = msb of (max(abs(wavelet coefficient)))
for bit-plane-counter = N downto 1:
    transmit significance/insignificance with respect to the bit-plane counter
    transmit refinement bits of all coefficients that are already significant

92 The SPIHT algorithm Insignificant coefficients (with respect to current bitplane counter) organised into zerotrees

93 The SPIHT algorithm Groups of coefficients made into zerotrees by set paritioning

94 The SPIHT algorithm
SPIHT produces an embedded bitstream.

95 The SLCCA algorithm
Steps: wavelet transform; quantise the coefficients; cluster and transmit the significance map; bit-plane encode the significant coefficients.

96 The SLCCA algorithm The significance map is grouped into clusters

97 The SLCCA algorithm
Clusters are grown out from a seed. [Figure: a cluster of significant coefficients grown from a seed, with insignificant coefficients around it.]

98 The SLCCA algorithm Significance-link symbol. [Figure: a significance link between clusters.]

99 Image compression results
Evaluation: mean squared error; human visual system based metrics; subjective evaluation.

100 Image compression results
Mean-squared error, usually expressed as a peak signal-to-noise ratio (PSNR, in dB).
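
For an 8-bit M x N image x with reconstruction x-hat, the usual definition is:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(x_{ij} - \hat{x}_{ij}\bigr)^{2}, \qquad
\mathrm{PSNR} = 10\,\log_{10}\!\frac{255^{2}}{\mathrm{MSE}}\ \text{dB}$$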

101 Image compression results

102 Image compression results

103 Image compression results
SPIHT 0.2 bits/pixel JPEG 0.2 bits/pixel

104 Image compression results
SPIHT JPEG

105 EBCOT, JPEG2000 JPEG2000, based on embedded block coding with optimal truncation, is the state-of-the-art compression standard. It is wavelet-based and addresses the key issue of scalability: SPIHT is distortion scalable, as we have already seen; JPEG2000 introduces resolution and spatial scalability as well. An excellent reference to JPEG2000 and compression in general is "JPEG2000" by D. Taubman and M. Marcellin.

106 EBCOT, JPEG2000 Resolution scalability is the ability to extract from the bitstream the sub-bands representing any resolution level.

107 EBCOT, JPEG2000 Spatial scalability is the ability to extract from the bitstream the sub-bands representing specific regions in the image. Very useful if we want to selectively decompress certain regions of massive images.

108 Introduction to EBCOT JPEG2000 is able to implement this general scalability by implementing the EBCOT paradigm. In EBCOT, the unit of compression is the codeblock, which is a partition of a wavelet sub-band. Typically, following the wavelet transform, each sub-band is partitioned into small blocks (typically 32x32).

109 Introduction to EBCOT Codeblocks – partitions of wavelet sub-bands

110 Introduction to EBCOT
A simple bit stream organisation could comprise concatenated code-block bit streams, each preceded by the length of the next code-block stream.

111 Introduction to EBCOT This simple bit stream structure is resolution and spatially scalable, but not distortion scalable. Complete scalability is obtained by introducing quality layers. Each code-block bitstream is individually (optimally) truncated in each quality layer. The loss of parent-child redundancy is more than compensated by the ability to individually optimise the separate code-block bitstreams.

112 Introduction to EBCOT
Each code-block bit stream is partitioned into a set of quality layers.

113 EBCOT advantages
Multiple scalability: distortion, spatial and resolution scalability. Efficient compression: this results from the independent optimal truncation of each code-block bit stream. Local processing: independent processing of each code block allows for efficient parallel implementations as well as hardware implementations.

114 EBCOT advantages Error resilience
Again this results from independent code block processing which limits the influence of errors

115 Performance comparison
A performance comparison with other wavelet-based coders is not straightforward, as it would depend on the target bit rates for which the bit streams were truncated. With SPIHT, we simply truncate the bit stream when the target bit rate has been reached; however, we only have distortion scalability with SPIHT. Even so, we still get favourable PSNR (dB) results when comparing EBCOT (JPEG2000) with SPIHT.

116 Performance comparison
We can understand this more fully by looking at graphs of distortion (D) against rate (R, the bitstream length). [Figure: the R-D curve for a continuously modulated quantisation step size, with the truncation points marked on it.]

117 Performance comparison
Truncating the bit stream to some arbitrary rate will yield sub-optimal performance.

118 Performance comparison

119 Performance comparison
Comparable PSNR (dB) results between EBCOT and SPIHT, even though: results for EBCOT are for 5 quality layers (5 optimal bit rates); intermediate bit rates are sub-optimal; we have resolution, spatial and distortion scalability in EBCOT but only distortion scalability in SPIHT.

