Presentation is loading. Please wait.

Presentation is loading. Please wait.

E.M. Bakker LML Audio Processing and Indexing 20101.

Similar presentations


Presentation on theme: "E.M. Bakker LML Audio Processing and Indexing 20101."— Presentation transcript:

1 E.M. Bakker LML Audio Processing and Indexing 20101

2 Overview The Physics of Sound The Production of Speech The Perception of Speech and Audio Frequency Masking Noise Masking Temporal Masking Phonetics and Phonology Analog and Digital Signals LML Audio Processing and Indexing 20102

3 The Physics of Sound Speed of Sound Air (at sea level, 20 C)343 m/s (1230 km/h)V=( T) m/s Water (at 20 C)1482 m/s (5335 km/h) Steel5960 m/s (21460 km/h) Tendon1650 m/s LML Audio Processing and Indexing Speed of Sound Proportional to sqrt(elastic modulus/density) Second order dependency on the amplitude of the sound => nonlinear propagation effects

4 The Production of Speech LML Audio Processing and Indexing Speech Recognition Words “How are you?” Speech Signal How is SPEECH produced?  Characteristics of Acoustic Signal How is SPEECH produced?  Characteristics of Acoustic Signal

5 Human Speech Production Physiology Schematic and X-ray Sagittal View Vocal Cords at Work Transduction Spectrogram Acoustics Acoustic Theory Wave Propagation LML Audio Processing and Indexing (Picture by Y.Mrabet from Wikipedia)

6 Sagittal Plane View of the Human Vocal Apparatus LML Audio Processing and Indexing 20106

7 Sagittal Plane View of the Human Vocal Apparatus LML Audio Processing and Indexing 20107

8 Characterization of English Phonemes LML Audio Processing and Indexing 20108

9 English Phonemes LML Audio Processing and Indexing Bet Debt Get Pin Sp i n Allophone p

10 The Vowel Space We can characterize a vowel sound by the locations of the first and second spectral resonances, known as formant frequencies. Some voiced sounds, such as diphthongs, are transitional sounds that move from one vowel location to another. LML Audio Processing and Indexing

11 Phonetics Formant Frequency Ranges LML Audio Processing and Indexing

12 Vocal Chords LML Audio Processing and Indexing

13 Models for Speech Production LML Audio Processing and Indexing

14 Models for Speech Production LML Audio Processing and Indexing

15 Vocal Tract Workshop Articulatory Speech Synthesis using TractSyn (P. Birkholz, 2004). Reading material: P. Birkholz, D. Jackel, A Three-Dimensional Model of the Vocal Tract for Speech Synthesis. In Proceedings of the 15 th International Congress of Phonetic Sciences, pp , Barcelona, Spain, See: LML Speech Recognition

16 The Perception of Speech LML Audio Processing and Indexing Speech Recognition Words “How are you?” Speech Signal How is SPEECH perceived

17 The Perception of Speech Sound Pressure The ear is the most sensitive human organ. Vibrations on the order of angstroms are used to transduce sound. It has the largest dynamic range (~140 dB) of any organ in the human body. The lower portion of the curve is an audiogram - hearing sensitivity. It can vary up to 20 dB across listeners. Above 120 dB corresponds to a nice pop-concert (or standing under a Boeing 747 when it takes off). Typical ambient office noise is about 55 dB. x dB = 10 log 10 (x/x 0 )), x 0 = 1kHz signal with intensity that is just hearable. LML Audio Processing and Indexing

18 LML Audio Processing and Indexing

19 The Perception of Speech:The Ear Three main sections, outer, middle, and inner ear. The outer and middle ears reproduce the analog signal (impedance matching). the inner ear transduces the pressure wave into an electrical signal. The outer ear consists of the external visible part and the auditory canal. The tube is about 2.5 cm long. The middle ear consists of the eardrum and three bones (malleus, incus, and stapes). It converts the sound pressure wave to displacement of the oval window (entrance to the inner ear). LML Audio Processing and Indexing

20 The Perception of Speech:The Ear The inner ear primarily consists of a fluid-filled tube (cochlea) which contains the basilar membrane. Fluid movement along the basilar membrane displaces hair cells, which generate electrical signals. There are a discrete number of hair cells (30,000). Each hair cell is tuned to a different frequency. Place vs. Temporal Theory: firings of hair cells are processed by two types of neurons (onset chopper units for temporal features and transient chopper units for spectral features). LML Audio Processing and Indexing

21 Perception Psychoacoustics Psychoacoustics: a branch of science dealing with hearing, the sensations produced by sounds. A basic distinction must be made between the perceptual attributes of a sound vs measurable physical quantities: Many physical quantities are perceived on a logarithmic scale (e.g. loudness). Our perception is often a nonlinear function of the absolute value of the physical quantity being measured (e.g. equal loudness). Timbre can be used to describe why musical instruments sound different. What factors contribute to speaker identity? LML Audio Processing and Indexing

22 Perception Equal Loudness Just Noticeable Difference (JND): The acoustic value at which 75% of responses judge stimuli to be different (limen) The perceptual loudness of a sound is specified via its relative intensity above the threshold. A sound's loudness is often defined in terms of how intense a reference 1 kHz tone must be heard to sound as loud. LML Audio Processing and Indexing dB

23 Perception, Non-Linear Frequency Warping: Bark and Mel Scale Critical Bandwidths: correspond to approximately 1.5 mm spacings along the basilar membrane, suggesting a set of 24 bandpass filters. Critical Band: can be related to a bandpass filter whose frequency response corresponds to the tuning curves of auditory neurons. A frequency range over which two sounds will sound like they are fusing into one. Bark Scale: Mel Scale: LML Audio Processing and Indexing

24 Perception Bark and Mel Scale The Bark scale implies a nonlinear frequency mapping LML Audio Processing and Indexing

25 Perception Bark and Mel Scale Filter Banks used in ASR: The Bark scale implies a nonlinear frequency mapping LML Audio Processing and Indexing

26 Comparison of Bark and Mel Space Scales LML Audio Processing and Indexing

27 Perception: Tone-Masking Noise Frequency masking: one sound cannot be perceived if another sound close in frequency has a high enough level. The first sound masks the second. Tone-masking noise: noise with energy EN (dB) at Bark frequency g masks a tone at Bark frequency b if the tone's energy is below the threshold: TT(b) = EN g + Sm(b-g) (dB SPL) where the spread-of-masking function Sm(b) is given by: Sm(b) = (b+0.474)-17.5* sqrt(1 + (b+0.474)2) (dB) Temporal Masking: onsets of sounds are masked in the time domain through a similar masking process. Thresholds are frequency and energy dependent. Thresholds depend on the nature of the sound as well. LML Audio Processing and Indexing

28 Perception: Noise-Masking Tone Noise-masking tone: a tone at Bark frequency g with energy ET (dB) masks noise at Bark frequency b if the noise energy is below the threshold: TN(b) = ET g + Sm(b-g) (dB SPL) Masking thresholds are commonly referred to as Bark scale functions of just noticeable differences (JND). Thresholds are not symmetric. Thresholds depend on the nature of the noise and the sound. LML Audio Processing and Indexing

29 Masking LML Audio Processing and Indexing

30 Perceptual Noise Weighting Noise-weighting: shaping the spectrum to hide noise introduced by imperfect analysis and modeling techniques (essential in speech coding). Humans are sensitive to noise introduced in low-energy areas of the spectrum. Humans tolerate more additive noise when it falls under high energy areas of the spectrum. The amount of noise tolerated is greater if it is spectrally shaped to match perception. We can simulate this phenomena using "bandwidth-broadening" LML Audio Processing and Indexing

31 Perceptual Noise Weighting Simple Z-Transform interpretation: can be implemented by evaluating the Z-Transform around a contour closer to the origin in the z-plane: Hnw(z) = H(az). Used in many speech compression systems (Code Excited Linear Prediction). Analysis performed on bandwidth-broadened speech; synthesis performed using normal speech. Effectively shapes noise to fall under the formants. LML Audio Processing and Indexing

32 Perception: Echo and Delay Humans are used to hearing their voice while they speak - real-time feedback (side tone). When we place headphones over our ears, which dampens this feedback, we tend to speak louder. Lombard Effect: Humans speak louder in the presence of ambient noise. When this side-tone is delayed, it interrupts our cognitive processes, and degrades our speech. This effect begins at delays of approximately 250 ms. Modern telephony systems have been designed to maintain delays lower than this value (long distance phone calls routed over satellites). Digital speech processing systems can introduce large amounts of delay due to non-real-time processing. LML Audio Processing and Indexing

33 Perception: Adaptation Adaptation refers to changing sensitivity in response to a continued stimulus, and is likely a feature of the mechano-electrical transformation in the cochlea. Neurons tuned to a frequency where energy is present do not change their firing rate drastically for the next sound. Additive broadband noise does not significantly change the firing rate for a neuron in the region of a formant. Visual Adaptation The McGurk Effect is an auditory illusion which results from combining a face pronouncing a certain syllable with the sound of a different syllable. The illusion is stronger for some combinations than for others. For example, an auditory 'ba' combined with a visual 'ga' is perceived by some percentage of people as 'da'. A larger proportion will perceive an auditory 'ma' with a visual 'ka' as 'na'. Some researchers have measured evoked electrical signals matching the "perceived" sound. LML Audio Processing and Indexing

34 Perception: Timing Temporal resolution of the ear is crucial. Two clicks are perceived mono-aurally as one unless they are separated by at least 2 ms. 17 ms of separation is required before we can reliably determine the order of the clicks. (~58bps or ~3530bpm) Sounds with onsets faster than 20 ms are perceived as "plucks" rather than "bows". Short sounds near the threshold of hearing must exceed a certain intensity-time product to be perceived. Humans do not perceive individual "phonemes" in fluent speech - they are simply too short. We somehow integrate the effect over intervals of approximately 100 ms. Humans are very sensitive to long-term periodicity (ultra low frequency) – this has implications for random noise generation. LML Audio Processing and Indexing

35 Analog and Digital Signals 1. From Analog to Digital Signal 2. Sampling & Aliasing LML Audio Processing and Indexing

36 Analog and Digital Signals LML Audio Processing and Indexing Continuous function continuous Continuous function F of a continuous variable t ( t can be time, space etc) : F(t) Analog Signals Discrete functiondiscrete Discrete function F k of a discrete (sampling) variable t k, with k an integer: F k = F(t k ). Digital Signals Uniform (periodic) sampling with sampling frequency f S = 1/ t S, e.g., t s = sec => f s = 1000Hz

37 Digital System Implementation Important issues: Analysis bandwidth, Dynamic range LML Audio Processing and Indexing Analog InputAntialiasing FilterA/D ConversionDigital ProcessingDigital Output Pass/stop bands Sampling rate, Number of bits, and further parameters Digital format

38 Sampling How fast must we sample a continuous signal to preserve its information content? Example: turning wheels in a movie 25 frames per second, i.e., 25 samples/sec. Train starts => wheels appear to go clockwise Train accelerates => wheels go counter clockwise Frequency misidentification due to low sampling frequency. Sampling: independent variable (for example time) continuous -> discrete Quantisation: dependent variable (for example voltage) continuous -> discrete. Here we’ll talk about uniform sampling. LML Audio Processing and Indexing

39 Sampling LML Audio Processing and Indexing __ s(t) = sin(2  f 0 t) f S f 0 = 1 Hz, f S = 3 Hz __ s 1 (t) = sin(8  f 0 t) __ s 2 (t) = sin(14  f 0 t) s k (t) = sin( 2  (f 0 + k f S ) t ),  k   N f S represents exactly all sine-waves s k (t) defined by:

40 The sampling theorem LML Audio Processing and Indexing A signal s(t) with maximum frequency f MAX can be recovered if sampled at frequency f S > 2 f MAX. Theorem Condition on f S ? f S > 300 Hz F 1 =25 Hz, F 2 = 150 Hz, F 3 = 50 Hz F1F1 F2F2 F3F3 f MAX Example * Multiple proposers: Whittaker(s), Nyquist, Shannon, Kotel’nikov. Nyquist frequency (rate) f N = 2 f MAX

41 Frequency Domain Time and Frequency are two complementary signal descriptions. Signal can be seen as projected onto the time domain or frequency domain. Bandwidth indicates the rate of change of a signal: High bandwidth => signal changes fast Passband Bandwidth => lower and upper cutoff frequencies LML Audio Processing and Indexing Ear Ear + brain act as frequency analyser: audio spectrum split into many narrow bands low-power sounds detected out of loud background. Example

42 Sampling Low-Pass Signals LML Audio Processing and Indexing (a) Band-limited signal: frequencies in [-B, B] (f MAX = B). (a)(c) aliasing ! (c) f S 2 B aliasing ! Aliasing: signal ambiguity in frequency domain (b) Time sampling frequency repetition. f S > 2 B no aliasing. (b) sk (t) = sin( 2  (f 0 + k f S ) t ),  k  N Note: s(t) at f S represents all sine-waves sk(t) defined by:

43 Antialiasing Filter LML Audio Processing and Indexing Filter it before! (a),(b) Out-of-band noise can aliase into band of interest. Filter it before! (a) (b) (c) Passband : depends on bandwidth of interest. Antialiasing filter (c) Antialiasing filter

44 Under-sampling LML Audio Processing and Indexing m  N, selected so that f S > 2B f C = 20 MHz, B = 5MHz Without under-sampling f S > 40 MHz. With under-sampling f S = 22.5 MHz (m=1); = 17.5 MHz (m=2); = MHz (m=3).Example Advantages  Slower ADCs / electronics needed.  Simpler antialiasing filters. Using spectral replications to reduce sampling frequency f S requirements.

45 Over-sampling LML Audio Processing and Indexing Oversampling : sampling at frequencies f S >> 2 f MAX. Over-sampling & averaging may improve ADC resolution f OS = over-sampling frequency, w = additional bits required. f OS = 4 w · f S Each additional bit implies/requires over-sampling by a factor of four.

46 (Some) ADC parameters LML Audio Processing and Indexing Number of bits N (~resolution) 2.Data throughput (~speed) 3.Signal-to-noise ratio (SNR) 4.Signal-to-noise-&-distortion rate (SINAD) 5.Effective Number of Bits (ENOB) 6.… NB: Definitions may be slightly manufacturer-dependent! Imaging / video Communication Radar systems Static distortion Different applications have different needs.

47 ADC - Number of bits N LML Audio Processing and Indexing Continuous input signal digitized into 2 N levels. Uniform, bipolar transfer function (N=3) Quantisation error Quantisation step Quantisation step q = V max 2 N Ex: V max = 1V, N = 12 q =  V Voltage ( = q) Scale factor (= 1 / 2 N ) Percentage (= 100 / 2 N )

48 ADC - Quantisation error LML Audio Processing and Indexing  Quantisation Error e q in [-0.5 q, +0.5 q].  e q limits ability to resolve small signal.  Higher resolution means lower e q. QE for N = 12 V FS = 1

49 SNR of ideal ADC LML Audio Processing and Indexing (1) Also called SQNR (signal-to-quantisation-noise ratio) (RMS = root mean square) e q  Ideal ADC: only quantisation error e q p(e) (p(e) constant, no stuck bits…) e q  e q uncorrelated with signal.  ADC performance constant in time.Assumptions Input(t) = ½ V FSR sin(  t). (sampling frequency f S = 2 f MAX )

50 SNR of ideal ADC LML Audio Processing and Indexing (2)Substituting in (1) => One additional bit SNR increased by 6 dB - Real signals have noise. - Forcing input to full scale unwise. - Real ADCs have additional noise (aperture jitter, non-linearities etc). Real SNR lower because: Actually (2) needs correction factor depending on ratio between sampling freq & Nyquist freq. Processing gain due to oversampling.

51 ADC selection dilemma LML Audio Processing and Indexing Speed & resolution: a tradeoff. a tradeoff. Effective Number of Bits (ENOB) High resolution (bit #) - Higher cost & dissipation. - Tailored onto DSP word width. High speed - Large amount of data to store/analyse. - Lower accuracy & input impedance. * * DIFFICULT * DIFFICULT area moves down & right every year. Rule of thumb: 1 bit improvement every 3 years. may increase SNR. 2 Oversampling & averaging Oversampling & averaging (see ). Dithering Dithering ( = adding small random noise before quantisation). Sound cards: x

52 Finite word-length effects LML Audio Processing and Indexing Dynamic range dB Dynamic range dB = 20 log 10 largest value smallest value Fixed point ~ 180 dB Floating point ~1500 dB Overflow Overflow : arises when arithmetic operation result has one too many bits to be represented in a certain format. High dynamic range => wide data set representation with no overflow. Note: Different applications have different needs. For example: Telecommunication: 50 dB HiFi audio: 90 dB.

53 Erwin M. Bakker

54 Overview Fourier Transforms DFT Topics Material slightly adapted from lectures by Dr M.E. Angoletta at DISP2003, a DSP course given by CERN and University of Lausanne (UNIL) external/training/tech/special/DISP2003.asp

55 Fourier Transforms Frequency analysis: a powerful tool A tour of Fourier Transforms Continuous Fourier Series (FS) Discrete Fourier Series (DFS)

56 Frequency Analysis  Fast & efficient insight on signal’s building blocks.  Simplifies original problem - ex.: solving Part. Diff. Eqns. (PDE).  Powerful & complementary to time domain analysis techniques.  Several transforms: Fourier, Discrete Cosine, Laplace, z, Wavelet, etc. time, t frequency, f F s(t)S(f) = F [s(t)] analysis synthesis s(t), S(f) : Transform Pair General Transform as problem-solving tool

57 Bases of Vector Spaces v = xe 1 +ye 2 +ze 3 e1e1 e2e2 e3e3 v is a linear combination of the basis vectors e i (i = 1,2,3) e it e 2it e 3it f(t) = k 1 e it + k 2 e 2it +k 3 e 3it f is a linear combination of the basis functions e it,e 2it,e 3it

58 Fourier Coefficients v w cwcw v- c w Let an in-product for our vector space V. Then we calculate the Fourier coefficient of v in V with respect to (basis) vector w by: c w is the component of v along the direction of w.

59 Fourier Analysis - Tools Input Time Signal Frequency spectrum Discrete DFS Periodic (period T) Continuous DTFT Aperiodic Discrete DFT ** ** Calculated using FFT Periodic (period T) Discrete ContinuousFT Aperiodic FS Continuous Note: j =  -1,  = 2  /T, s[n]=s(t n ), N = No. of samples

60 History Fourier Transform  1669 corpuscular  1669 : Newton stumbles upon light spectra (specter = ghost) but fails to recognise “frequency” concept (corpuscular theory of light, & no waves).  18 th century two outstanding problems  18 th century : two outstanding problems  celestial bodies orbits: Lagrange, Euler & Clairaut approximate observation data with linear combination of periodic functions; Clairaut,1754(!) first DFT formula. BUT only  vibrating strings: Euler describes vibrating string motion by sinusoids (wave equation). BUT peers’ consensus is that sum of sinusoids only represents smooth curves. Big blow to utility of such sums for all but Fourier...  1807 Fourier presents his work on heat conduction  Fourier analysis born.  1807 : Fourier presents his work on heat conduction  Fourier analysis born.  Diffusion equation  series (infinite) of sines & cosines. Strong criticism by peers blocks publication. Work published, 1822 (“Theorie Analytique de la chaleur”).

61 History Fourier Transform -2  19 th / 20 th century two paths for Fourier analysis - Continuous & Discrete.  19 th / 20 th century : two paths for Fourier analysis - Continuous & Discrete. CONTINUOUS  Fourier extends the analysis to arbitrary function (Fourier Transform).  Dirichlet, Poisson, Riemann, Lebesgue address FS convergence.  Other FT variants born from varied needs (ex.: Short Time FT - speech analysis). DISCRETE: Fast calculation methods (FFT) 1805  Gauss, first usage of FFT (manuscript in Latin went unnoticed!!! Published 1866)  IBM’s Cooley & Tukey “rediscover” FFT algorithm (“An algorithm for the machine calculation of complex Fourier series”).  Other DFT variants for different applications (ex.: Warped DFT - filter design & signal compression).  FFT algorithm refined & modified for most computer platforms.

62 Another Space, Another Base F(t) = sin(2π.t)

63 Another Space, Another Base F(t) = sin(2π.2t)F(t) = sin(2π.t)F(t) = sin(2π.3t) F(t) = cos(2π.t) F(t) = cos(2π.2t)F(t) = cos(2π.3t) { cos(2π.kt), sin(2 π.kt) } k forms a orthogonal basis

64 Linear Combination of Functions F(t) = sin(2π.2t) F(t) = sin(2π.t) F(t) = sin(2π.t) + sin(2π.2t)

65 Linear Combination of Functions F(t) = sin(2π.2t) F(t) = sin(2π.3t) F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.3t) F(t) = sin(2π.t) + sin(2π.2t)

66 Linear Combination of Functions F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.30t) F(t) = sin(2π.t) + sin(2π.2t)F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.3t) F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.4t)

67 Linear Combination of Functions F(t) = sin(2π.t) F(t) = cos(2π.t) – 2.sin(2π.2t) Phase Shift:

68 Low Band Pass Filters F(t) = sin(2π.t) + sin(2π.2t) F(t) = sin(2π.t) + sin(2π.2t) + <- low freq. sin(2π.30t) + sin(2π.120t) <- high freq.

69 Fourier Transforms The values h of some process can be described in the time domain: h(t) values as a function of time t (-∞< t <∞) The same process can be described as amplitudes and phases (complex values) H in the frequency domain: H(f) values as a function of frequency f (-∞< f <∞) One can transform the representation in the time domain to the representation in the frequency domain by using the Fourier Transform equation: And back, using the FT-equation:

70 Fourier Series (FS) * see next slide A periodic function s(t) satisfying Dirichlet’s conditions * can be expressed as a Fourier series, with harmonically related sine/cosine terms. a 0, a k, b k : Fourier coefficients. k : harmonic number, T : period,  = 2  /T For all t but discontinuities Note: {cos(kωt), sin(kωt) } k form orthogonal base of function space. (signal average over a period, i.e. DC term & zero-frequency component.) analysis synthesis

71 Fourier Series Convergence s(t) piecewise-continuous; s(t) piecewise-monotonic; s(t) absolutely integrable, (a) (b) (c) Dirichlet conditions In any period: Example: square wave (a)(b)(c) if s(t) discontinuous then |a k | 0) Rate of convergence

72 Fourier Series Analysis - 1 * Even & Odd functions Odd : s(-x) = -s(x) Even : s(-x) = s(x) FS of odd* function: square wave. (zero average) (odd function)

73 Fourier Series Analysis - 2 f k =k  /2  r K = amplitude,  K = phase v k = r k cos (  k t +  k ) Polar Fourier spectrum representations Rectangular v k = a k cos(  k t) - b k sin(  k t) Fourier spectrum of square-wave.

74 Fourier Series Synthesis Square wave reconstruction from spectral terms Convergence may be slow (~1/k) - ideally need infinite terms. Practically, series truncated when remainder below computer tolerance error BUT (  error). BUT … Gibbs’ Phenomenon.

75 Gibbs Phenomenon Overshoot exist at each discontinuity Max overshoot pk-to-pk = 8.95% of discontinuity magnitude. Just a minor annoyance. in this caseFS converges to (-1+1)/2 = 0 at discontinuities, in this case. First observed by Michelson, Explained by Gibbs.

76 Fourier Series Time Shifting FS of even function:  /2-advanced square-wave  /2-advanced square-wave (even function) (zero average)phase amplitude BUT Note:amplitudes unchanged BUT phases advance by k  /2.

77 Complex Fourier Series Complex form of FS (Laplace 1782). Harmonics c k separated by  f = 1/T on frequency plot. Euler’s notation: e -jt = (e jt )* = cos(t) - j·sin(t)“phasor” Note Note: c -k = (c k )* Link to FS real coeffs. analysis synthesis

78 Fourier Series Properties Time Frequency Homogeneity a·s(t) a·S(k) Additivity s(t) + u(t) S(k)+U(k) Linearity a·s(t) + b·u(t) a·S(k)+b·U(k) Time reversal s(-t) S(-k) Multiplication * s(t)·u(t) Convolution * S(k)·U(k) Time shifting Frequency shifting S(k - m)

79 Fourier Series – ‘Oddities’ k = - , … -2,-1,0,1,2, …+ , ω k = kω,  k = ω k t, phasor turns anti-clockwise. Negative k  phasor turns clockwise (negative phase  k ), equivalent to negative time t,  time reversal. Negative frequencies & time reversal Fourier components { u k } form orthonormal base of signal space: u k = (1/  T) exp(jkωt) (|k| = 0,1 2, …+  ) = δ k,m (1 if k = m, 0 otherwise).(Remember (e jt )* = e -jt ) projection Then c k = (1/  T) i.e. (1/  T) times projection of signal s(t) on component u k Orthonormal base

80 W k = 2 W 0 sync 2 (k  ) W 0 = (  s MAX ) 2 Fourier Series - Power FS convergence ~1/kFS convergence ~1/k   lower frequency terms W k = |c k | 2 carry most power. W k vs. ω k : Power density spectrumW k vs. ω k : Power density spectrum. Example sync(u) = sin(  u)/(  u) Pulse train, duty cycle  = 2  / T b k = 0 a 0 =  s MAX a k = 2  s MAX sync(k  ) Average power W Average power W : Parseval’s Theorem

81 Fourier Series of Some Waveforms

82 Discrete Fourier Series (DFS) N consecutive samples of s[n] completely describe s in time or frequency domains. DFS generate periodic c k with same signal period Synthesis: finite sum  band-limited s[n] Band-limited signal s[n], period = N. Kronecker’s delta Orthogonality in DFS:synthesis Note: Note: c k+N = c k  same period N i.e. time periodicity propagates to frequencies! DFS defined as: ~~ analysis

83 Discrete Fourier Series Analysis DFS of periodic discrete 1-Volt square-wave amplitude phase Discrete signals  periodic frequency spectra. Compare to continuous rectangular function (slide # 20) NL/N s[n]: period N, duty factor L/N

84 Discrete Fourier Series Properties Time Frequency Homogeneity a·s[n] a·S(k) Additivity s[n] + u[n] S(k)+U(k) Linearity a·s[n] + b·u[n] a·S(k)+b·U(k) Multiplication s[n] ·u[n] Convolution S(k)·U(k) Time shifting s[n - m] Frequency shifting S(k - h)

85 TOPICS DFT windows DFT resolution - improvement Efficient DFT calculation: FFT Hints on system spectral analysis

86 DFT – Window Characteristics Finite discrete sequence  spectrum convoluted with rectangular window spectrum. Leakage amount depends on chosen window & on how signal fits into the window. Wish: as high as possible. Resolution : capability to distinguish different tones. Inversely proportional to main- lobe width. Wish: as high as possible. (1) Several windows used (application- dependent): Hamming, Hanning, Blackman, Kaiser... Rectangular window Peak-sidelobe level : maximum response outside the main lobe. Determines if small signals are hidden by nearby stronger ones. Wish: as low as possible. (2) Sidelobe roll-off : sidelobe decay per decade. Trade-off with (2). (3)

87 Sampled sequence In time it reduces end- points discontinuities. Non windowed Windowed DFT of Main Windows Windowing reduces leakage by minimising sidelobes magnitude. Some window functions

88 DFT - Window Choice Common windows characteristics NB: Strong DC component can shadow nearby small signals. Remove it!  Far & strong interfering components  high roll-off rate.  Near & strong interfering components  small max sidelobe level.  Accuracy measure of single tone  wide main-lobe Observed signal Window wish list

89 DFT - Window Loss Remedial Smooth data-tapering windows cause information loss near edges. Attenuated inputs get next window’s full gain & leakage reduced. Usually 50% or 75% overlap (depends on main lobe width). Drawback: increased total processing time. Solution: sliding (overlapping) DFTs.

90 Zero Padding Improves DFT frequency inter-sampling spacing (“resolution”). After padding frequencies N S = original samples, L = padded.

91 Zero Padding -2 Additional reason for zero-padding: to reach a power-of-two input samples number (see FFT). Zero-padding in frequency domain increases sampling rate in time domain. Note: it works only if sampling theorem satisfied! DFT spectral resolution not improved Capability to distinguish two closely- spaced frequencies: not improved by zero-padding!. increased Frequency inter-sampling spacing: increased by zero-padding (DFT “frequency span” unchanged due to same sampling frequency) after Apply zero-padding after windowing (if any)! Otherwise stuffed zeros will partially distort window function. NOTE

92 SL bin, k DFT - Scalloping Loss (SL) Worst case when f 0 falls exactly midway between 2 successive bins (|r|=½) Note: SL less severe Note: Non-rectangular windows broaden DFT main lobe  SL less severe Correction depends on window used. f0f0 Frequency error:  f = r f S /N, relative error:  R =  f / f 0 = r/[(k max +r)]  R  1/(1+2 k max ) k max bin, k We’re lucky here! May impact on data interpretation (wrong f 0 !) Input frequency f 0 btwn. bin centres causes magnitude loss SL = 20 Log 10 (|c r+k max /c k max |)~~ |r|  ½ f 0 = (k max + r) f S /N

93 DFT - SL Example DC bias correction, Rectang. window, zero padding, FFT DC bias correction, Hanning window, zero padding, FFT increasing N (?) improve windowing, zero-padding, interpolation around k max. SL remedial

94 DFT - Parabolic Interpolation  Parabolic interpolation often enough to find position of peak (i.e. frequency).  Other algorithms available depending on data. Rectangular window Hanning window

95 Efficient DFT Calculation: FFT W N n,k = twiddle factors k = 0,1.. N-1 DFT Direct DFT calculation redundancy W N kn periodic function calculated many times. Algorithms (= Fast Fourier Transform) developed to compute N-points DFT with ~ Nlog 2 N multiplications (complexity O(Nlog 2 N) ). Direct DFT calculation requires ~N 2 complex multiplications. complexity O(N 2 ) Impractical ! Impractical !

96 FFT Advantages DFT  N 2 FFT  N log 2 N NDFTRadix DSPs & PLDs influenced algorithms design. ‘60s & ‘70s: multiplication counts was “quality factor”. Now: number of additions & memory access (s/w) and communication costs (h/w) also important.

97 FFT Philosophy divide & conquer General philosophy (to be applied recursively): divide & conquer. sub-problemsmappingoriginal problem  cost( sub-problems ) + cost( mapping ) < cost( original problem ) Different algorithms balance costs differently. Example: Decimation-in-time algorithm time frequency Step 3: Frequency-domain synthesis. N spectra synthesised into one. Step 2: 1-point input spectra calculation. (Nothing to do!) Step 1: Time-domain decomposition. N-points signal  N, 1-point signals (interlace decomposition). Shuffled input data (bit- reversal). log 2 N stages. (*) (*) : only first decomposition shown. *(*)*(*)

98 FFT Family Tree Divide & conquer N : GCD(N 1,N 2 ) <> 1 Ex: N = 2 n MAPPING Cost: MAPPING. Twiddle-factors calculations. Easier sub-problems. Some algorithms: Cooley-Tukey, Decimation-in-time / in-frequency Radix-2, Radix-4, Split radix. N : GCD ( * ) (N 1,N 2 ) = 1 N 1, N 2 co-prime. Ex: 240 = 16·3·5 SUB-PROBLEMS Cost: SUB-PROBLEMS. No twiddle-factors calculations. Easier mapping (permutations). Some algorithms: Good-Thomas, Kolba, Parks, Winograd. ( * ) GCD= Greatest Common Divisor

99 References 1.On bandwidth, David Slepian, IEEE Proceedings, Vol. 64, No 3, pp The Shannon sampling theorem - Its various extensions and applications: a tutorial review, A. J. Jerri, IEEE Proceedings, Vol. 65, no 11, pp 1565 – What every computer scientist should know about floating-point arithmetic, David Goldberg. 4.IEEE Standard for radix-independent floating-point arithmetic, ANSI/IEEE Std Papers 1.Understanding digital signal processing, R. G. Lyons, Addison-Wesley Publishing, The scientist and engineer’s guide to digital signal processing, S. W. Smith, at 3.Discrete-time signal processing, A. V. Oppeheim & R. W. Schafer, Prentice Hall, Books

100 References Tom, Dick and Mary discover the DFT, J. R. Deller Jr, IEEE Signal Processing Magazine, pg , April On the use of windows for harmonic analysis with the Discrete Fourier Transform, F. J. Harris, IEEE Proceedings, Vol. 66, No 1, January Some windows with a very good sidelobe behaviour, A. H. Nuttall, IEEE Trans. on acoustics, speech and signal processing, Vol ASSP-29, no. 1, February Some novel windows and a concise tutorial comparison of windows families, N. C. Geckinli, D. Yavuz, IEEE Trans. on acoustics, speech and signal processing, Vol ASSP-26, no. 6, December Study of the accuracy and computation time requirements of a FFT-based measurement of the frequency, amplitude and phase of betatron oscillations in LEP, H.J. Schmickler, LEP/BI/Note Causes et corrections des erreurs dans la mesure des caracteristiques des oscillations betatroniques obtenues a partir d’une transformation de Fourier, E. Asseo, CERN PS 85-9 (LEA). Papers

101 References The Fourier Transform and its applications, R. N. Bracewell, McGraw-Hill, A History of scientific computing, edited by S. G. Nash, ACM Press, Introduction to Fourier analysis, N. Morrison, John Wiley & Sons, The DFT: An owner’s manual for the Discrete Fourier Transform, W. L. Briggs, SIAM, The FFT: Fundamentals and concepts, R. W. Ramirez, Prentice Hall, Books 7.Precise measurements of the betatron tune, R. Bartolini et al., Particle Accel., 1996, vol. 55, pp How the FFT gained acceptance, J. W. Cooley, IEEE Signal Processing Magazine, January A comparative analysis of FFT algorithms, A. Ganapathiraju et al., IEEE Trans.on Signal Processing, December 1997.


Download ppt "E.M. Bakker LML Audio Processing and Indexing 20101."

Similar presentations


Ads by Google