2 Analog and Digital In “reality”, sound is analog. Variations in air pressure are continuous: the signal has an amplitude value at all points in time, and there are an infinite number of possible air pressure values. Cf. an analog clock. Back in the bad old days, acoustic phonetics was strictly an analog endeavor.
3 Analog and Digital In the good new days, we can represent sound digitally in a computer. In a computer, sounds must be discrete: everything = 1 or 0. Cf. a digital clock. Computers represent sounds as sequences of discrete pressure values at separate points in time: a finite number of pressure values at a finite number of points in time.
4 Analog-to-Digital Conversion Recording sounds onto a computer requires an analog-to-digital conversion (A-to-D). When computers record sound, they need to digitize analog readings in two dimensions: X: Time (this is called sampling); Y: Amplitude (this is called quantization).
5 Sampling Example Thanks to Chilin Shih for making these materials available.
7 Sampling Rate Sampling rate = frequency at which samples are taken. What’s a good sampling rate for speech? Typical options include: 22050 Hz, … Hz, … Hz; sometimes even … Hz and … Hz. Higher sampling rate preserves sound quality. Lower sampling rate saves disk space (which is no longer much of an issue). Young, healthy human ears are sensitive to sounds from 20 Hz to 20,000 Hz.
8 One Consideration The Nyquist Frequency (after Harry Nyquist) = highest frequency component that can be captured with a given sampling rate = one-half the sampling rate. Problematic example: a 100 Hz sound sampled at a 100 Hz sampling rate.
9 Nyquist’s Implication An adequate sampling rate has to be at least twice as high as any frequency component in the signal that you’d like to capture. Example: a 100 Hz sound sampled at a 200 Hz sampling rate.
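The two examples above (a 100 Hz sound sampled at 100 Hz and at 200 Hz) can be reproduced with a minimal Python sketch. The function names and the quarter-cycle starting phase (chosen so that samples land on peaks and troughs) are assumptions made for illustration; they are not part of the original slides.

```python
import math

def nyquist_frequency(rate_hz):
    """Highest frequency that a given sampling rate can capture."""
    return rate_hz / 2

def sample_sine(freq_hz, rate_hz, n_samples, phase=math.pi / 2):
    """Sample a sine wave of freq_hz at rate_hz, starting at the given phase."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz + phase)
            for n in range(n_samples)]

# 100 Hz sound, 100 Hz sampling rate: every sample hits the same point
# of the cycle, so the digitized signal looks like a flat line.
at_100 = sample_sine(100, 100, 6)

# 100 Hz sound, 200 Hz sampling rate: samples alternate between peak
# and trough, so the oscillation is still visible.
at_200 = sample_sine(100, 200, 6)

print([round(s, 3) for s in at_100])
print([round(s, 3) for s in at_200])
```

With the 100 Hz rate the samples never change, so the 100 Hz oscillation is invisible; with the 200 Hz rate the sign flips on every sample.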
10 Sampling Rate Demo Speech should be sampled at at least … Hz (although there is little frequency information in speech above 10,000 Hz). Demo rates: 44100 Hz, 22050 Hz, 11025 Hz (watch out for [s]), 8000 Hz, 5000 Hz.
11 Another Problem When the continuous sound signal completes more than half a cycle in between samples (that is, when its frequency is above the Nyquist frequency), a phenomenon called aliasing occurs. The digital signal then contains a low frequency component which is not in the analog signal.
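Aliasing can be illustrated with a short Python sketch. The 1000 Hz sampling rate, the 900 Hz and 100 Hz test tones, and the helper names are all hypothetical choices for this example.

```python
import math

def alias_frequency(freq_hz, rate_hz):
    """Frequency at which a freq_hz component shows up after sampling at rate_hz."""
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))

def sample_cosine(freq_hz, rate_hz, n_samples):
    """Sample a cosine wave of freq_hz at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

RATE = 1000  # assumed sampling rate in Hz; its Nyquist frequency is 500 Hz

fast = sample_cosine(900, RATE, 20)  # above the Nyquist frequency
slow = sample_cosine(100, RATE, 20)  # below the Nyquist frequency

# Once sampled, the two tones are indistinguishable.
identical = all(abs(a - b) < 1e-9 for a, b in zip(fast, slow))
print(identical, alias_frequency(900, RATE))
```

The 900 Hz tone produces exactly the samples of a 100 Hz tone, which is the low frequency component that was never in the analog signal.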
12 The Aliasing Solution: Filtering Whenever sound is digitized, frequencies above the Nyquist frequency need to be filtered out of the end product. E.g., CDs digitize at a 44,100 Hz sampling rate and filter out any components over 22,050 Hz. “Low-pass” filters allow low frequencies to pass through the filter and remove high frequencies from the signal. Cf. “high-pass” filters, which allow high frequencies to pass through the filter.
13 Low-Pass Filter in Action Power spectrum of a 100 Hz + 1000 Hz combo: the filter passes the 100 Hz component, but not the 1000 Hz component.
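The effect of a low-pass filter on a 100 Hz + 1000 Hz combination can be sketched in Python. This uses a simple moving average as a crude stand-in for a low-pass filter; it is not the filter used in the slides or in any particular audio hardware. The 8000 Hz sampling rate and all function names are assumptions.

```python
import cmath
import math

RATE = 8000  # assumed sampling rate in Hz

def tone_mix(n_samples):
    """100 Hz and 1000 Hz sine waves of equal amplitude, added together."""
    return [math.sin(2 * math.pi * 100 * n / RATE)
            + math.sin(2 * math.pi * 1000 * n / RATE)
            for n in range(n_samples)]

def moving_average(signal, width):
    """Replace each sample by the mean of `width` consecutive samples."""
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]

def amplitude_at(signal, freq_hz):
    """Amplitude of the freq_hz component (signal must span whole cycles)."""
    total = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / RATE)
                for k, s in enumerate(signal))
    return 2 * abs(total) / len(signal)

raw = tone_mix(807)
# A width of 8 samples spans exactly one 1000 Hz cycle at 8000 Hz,
# so that component averages out to zero.
filtered = moving_average(raw, 8)  # 800 samples remain

for label, sig in (("raw", raw[:800]), ("filtered", filtered)):
    print(label, round(amplitude_at(sig, 100), 3), round(amplitude_at(sig, 1000), 3))
```

The 100 Hz component passes through almost unchanged, while the 1000 Hz component is removed.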
14 Digital Dimension #2: Quantization Each sample that is taken has a range of pressure values. This range is determined by the number of bits allotted to each sample. Remember: in computers, numbers are stored in binary format (sequences of ones and zeroes). Ex: 89 = 01011001 in 8-bit encoding. Typical sample sizes: 8 bits = 2^8 = 256 values; 12 bits = 2^12 = 4,096 values; 16 bits = 2^16 = 65,536 values.
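The relationship between sample size and the number of available values can be checked with a few lines of Python. The helper names `num_levels` and `to_binary` are hypothetical.

```python
def num_levels(bits):
    """Number of distinct values that fit in a sample of the given size."""
    return 2 ** bits

def to_binary(value, bits=8):
    """Write a non-negative integer as a fixed-width string of ones and zeroes."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value does not fit in the given number of bits")
    return format(value, f"0{bits}b")

for bits in (8, 12, 16):
    print(bits, "bits:", num_levels(bits), "values")

print(to_binary(89))
```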
15 Samples Go Small Sample size here = 2 bits = 2^2 = 4 values. We lose information when the sample size is too small, given the same sampling rate.
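The information loss from a small sample size can be made concrete with a minimal Python sketch. It assumes samples scaled to the range -1.0 to 1.0 and evenly spaced quantization levels; the function name `quantize` and the 16-sample test wave are illustrative choices.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # distance between neighboring levels
    index = round((sample + 1.0) / step)
    return index * step - 1.0

# One cycle of a sine wave, 16 samples long.
wave = [math.sin(2 * math.pi * n / 16) for n in range(16)]

coarse = [quantize(s, 2) for s in wave]   # 2 bits: only 4 possible values
fine = [quantize(s, 16) for s in wave]    # 16 bits: 65,536 possible values

# Largest rounding error introduced by each sample size.
worst_coarse = max(abs(a - b) for a, b in zip(wave, coarse))
worst_fine = max(abs(a - b) for a, b in zip(wave, fine))
print(worst_coarse, worst_fine)
```

The sampling rate is the same in both cases; only the rounding error changes, and it is far larger with 2 bits.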
18 Sample Size Demo 11k 16 bits; 11k 8 bits; 8k 16 bits; 8k 8 bits (telephone). Note: CDs sample at 44,100 Hz and have 16-bit quantization. Also check out bad and actedout examples in Praat.
19 Quantization Range With 16-bit quantization, we can encode 65,536 different possible amplitude values. Remember that I(dB) = 10 * log10 (A^2/r^2). Substitute the max and min amplitude values for A and r, respectively, and we get: I(dB) = 10 * log10 (65536^2/1^2) = 96.3 dB. Some newer machines have 24-bit quantization: 2^24 = 16,777,216 possible amplitude values. I(dB) = 10 * log10 (16777216^2/1^2) = 144.5 dB. This is bigger than the range of sounds we can listen to without damaging our hearing.
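The decibel calculation above can be checked with a short Python sketch. It follows the slide's substitution (maximum amplitude = number of values, minimum amplitude = 1); the function name `dynamic_range_db` is hypothetical.

```python
import math

def dynamic_range_db(bits):
    """I(dB) = 10 * log10(A^2 / r^2) with A = 2**bits and r = 1."""
    max_amplitude = 2 ** bits
    min_amplitude = 1
    return 10 * math.log10(max_amplitude ** 2 / min_amplitude ** 2)

for bits in (16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```

Each extra bit doubles the amplitude range, which adds about 6 dB.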
20 Problem: Clipping Clipping occurs when the pressure in the analog signal exceeds the quantization range in digitization. Check out sylvester and normal in Praat.
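A minimal Python sketch of clipping follows. It assumes samples are stored as signed integers, which is common for 16-bit audio, so the representable range is -32,768 to 32,767. The function name and the input values are made up for illustration.

```python
def clip_sample(value, bits=16):
    """Force a value into the range of a signed integer sample of the given size."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return max(lowest, min(highest, value))

# Hypothetical pressure readings, two of which exceed the 16-bit range.
too_loud = [0, 20000, 40000, 20000, 0, -20000, -40000]
recorded = [clip_sample(v) for v in too_loud]
print(recorded)
```

Peaks that exceed the range are flattened to the largest or smallest storable value, which distorts the waveform.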
21 A Note on Formats Digitized sound files come in different formats: .wav, .aiff, .au, etc. Lossless formats digitize sound in the way I’ve just described. They only differ in terms of “header” information, specified limits on file size, etc. Lossy formats use algorithms to condense the size of sound files, and the sound file loses information in the process. For instance, the .mp3 format primarily saves space by eliminating some very high frequency information (which is hard for people to hear).
22 AIFF vs. MP3 .aiff format vs. .mp3 format (digitized at 128 kbps). This trick can work pretty well…
23 MP3 vs. MP3 .mp3 format (digitized at 128 kbps) vs. .mp3 format. .mp3 conversion can induce reverb artifacts, and also cut down on temporal resolution (among other things).
24 Sound Digitization Summary Samples are taken of an analog sound’s pressure value at a recurring sampling rate. This digitizes the time dimension in a waveform. The sampling frequency needs to be at least twice as high as any frequency components you want to capture in the signal (e.g., … Hz for speech). Quantization converts the amplitude value of each sample into a binary number in the computer. This digitizes the amplitude dimension in a waveform. Rounding-off errors can lead to quantization noise. Excessive amplitude can lead to clipping errors.
25 The Digitization of Pitch Praat can give us a representation of speech in which a blue line represents the fundamental frequency (F0) of the speaker’s voice. This is also known as a pitch track. How can we automatically “track” F0 in a sample of speech?
26 Pitch Tracking Voicing: air flow through the vocal folds causes rapid opening and closing due to the Bernoulli Effect. Each cycle sends an acoustic shockwave through the vocal tract, which takes the form of a complex wave. The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.
29 Voicing = Complex Wave Note: voicing is not perfectly periodic. There is always some random variation from one cycle to the next. How can we measure the fundamental frequency of a complex wave?
30 duration = ??? The basic idea: figure out the period between successive cycles of the complex wave. Fundamental frequency = 1 / period.
31 Measuring F0 To figure out where one cycle ends and the next begins, the basic idea is to find how well successive “chunks” of a waveform match up with each other. One period = the length of the chunk that matches up best with the next chunk. Automatic pitch tracking parameters to think about: window size (i.e., chunk size), step size, and frequency range (= period range).
32 Window (Chunk) Size Here’s an example of a small window.
33 Window (Chunk) Size Here’s an example of a large(r) window.
34 Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform
35 Matching The waveforms in the two windows are compared to see how well they match up. Correlation = measure of how well the two windows match.
36 Autocorrelation The measure of correlation = sum of the point-by-point products of the two chunks. The technical name for this is autocorrelation, because two parts of the same wave are being matched up against each other (“auto” = self).
37 Autocorrelation Example Ex: consider window x, with n samples. What’s its correlation with window y? (Note: window y must also have n samples.) x1 = first sample of window x; x2 = second sample of window x; … xn = nth (final) sample of window x; y1 = first sample of window y, etc. Correlation (R) = x1*y1 + x2*y2 + … + xn*yn. The larger R is, the better the correlation.
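The sum-of-products formula translates directly into Python. In this sketch the function name `correlation` and all sample values are illustrative; they are not taken from the original materials.

```python
def correlation(x, y):
    """R = x1*y1 + x2*y2 + ... + xn*yn for two windows of equal length."""
    if len(x) != len(y):
        raise ValueError("both windows must have the same number of samples")
    return sum(a * b for a, b in zip(x, y))

x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]

# Hypothetical comparison windows: one shaped like x, one upside down.
similar = [0.7, 0.4, -0.1, -0.6, 0.3, 0.9]
opposite = [-0.7, -0.4, 0.1, 0.6, -0.3, -0.9]

print(correlation(x, similar))
print(correlation(x, opposite))
```

A window whose peaks and troughs line up with x gives a large positive R; a window with the opposite shape gives a negative R.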
38 By the Numbers
Sample    1    2    3    4    5    6
x        .8   .3  -.2  -.5   .4   .8
y         …
product   …
Sum of products = -.48. These two chunks are poorly correlated with each other.
39 By the Numbers, part 2
Sample    1    2    3    4    5    6
x        .8   .3  -.2  -.5   .4   .8
z         …
product   …
Sum of products = 1.26. These two chunks are well correlated with each other (or at least better than the previous pair). Note: matching peaks count for more than matches close to 0.
40 Back to (Digital) Reality The waveforms in the two windows are compared to see how well they match up. Correlation = measure of how well the two windows match. These two windows are poorly correlated.
41 Next: the pitch tracking algorithm moves further down the waveform and grabs a new window
42 “step” The distance the algorithm moves forward in the waveform is called the step size.
43 Matching, again The next window gets compared to the original.
44 Matching, again The next window gets compared to the original. These two windows are also poorly correlated.
45 another “step” The algorithm keeps chugging and, eventually…
46 Matching, again The best match is found. These two windows are highly correlated.
47 The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well correlated) window 2.
48 Mopping up Frequency is 1 / period. Q: How many possible periods does the algorithm need to check? That is set by the frequency range (default in Praat: 75 to 600 Hz).
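The whole procedure (compare an initial window against later windows, keep the best match, convert the period to a frequency) can be sketched in Python. This is a bare-bones illustration under stated assumptions, not Praat's actual algorithm: it steps one sample at a time, uses the raw sum of products with no normalization, and does nothing to guard against picking a multiple of the true period. The 8000 Hz sampling rate, the synthetic test wave, and the function names are hypothetical; only the 75 to 600 Hz range comes from the slide.

```python
import math

def candidate_lags(rate_hz, f0_min=75.0, f0_max=600.0):
    """Periods (in samples) that correspond to the allowed frequency range."""
    shortest = math.ceil(rate_hz / f0_max)   # highest F0 = shortest period
    longest = math.floor(rate_hz / f0_min)   # lowest F0 = longest period
    return range(shortest, longest + 1)

def estimate_f0(signal, rate_hz, window, f0_min=75.0, f0_max=600.0):
    """Return the F0 whose period gives the best match with the first window."""
    first = signal[:window]
    best_lag, best_score = None, -math.inf
    for lag in candidate_lags(rate_hz, f0_min, f0_max):
        later = signal[lag:lag + window]
        if len(later) < window:
            break  # ran out of waveform
        score = sum(a * b for a, b in zip(first, later))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate_hz / best_lag  # frequency = 1 / period

RATE = 8000  # assumed sampling rate in Hz
# Synthetic complex wave: 100 Hz fundamental plus a weaker 200 Hz harmonic.
voice = [math.sin(2 * math.pi * 100 * n / RATE)
         + 0.5 * math.sin(2 * math.pi * 200 * n / RATE)
         for n in range(400)]

print(len(candidate_lags(RATE)), "candidate periods")
print(estimate_f0(voice, RATE, window=160), "Hz")
```

At an 8000 Hz sampling rate, the 75 to 600 Hz range corresponds to periods of 14 to 106 samples, which answers the question of how many periods need to be checked for this particular rate.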