So, what is “Microdynamics”? And some free file-based audio metrics. James D. (jj) Johnston, Bell Labs Audio Researcher (and other stuff)
“Microdynamics”? Well, it’s not dynamic range. It’s not RMS, for sure. Is it variation in RMS? – How do you decide what time interval to use? – How do you relate that to hearing? Is it variation in loudness? – This relates to hearing – It provides a time interval to consider
A bit of review first. Loudness is SENSATION LEVEL; it’s how loud you feel something is. – It can be reasonably well modelled for most uses – For time domain issues, it’s a bit more tricky. RMS is an analytic measurement of what will become power when played in the real world. – It is trivial to calculate – What does it mean in this context? Not much.
From the last talk: A simple loudness model – It could be adapted to have proper time stride in regard to frequency – It would be a lot more complicated and about 32 times as slow – It would be more accurate For now, let’s use the one we have, and give it a try.
The difference between “dynamic range” and variation in loudness. It is possible to make many signals with exactly the same dynamic range – One can smoothly increase from the smallest to largest value – One can maximize the inter-block values, but keep exactly the same histogram and mean loudness – A hypothetical example
We will propose the following 10-block system, with a uniform distribution and block loudnesses of 1, 2, 3, … 8, 9, 10. There can be many orders that create that histogram: – One order is 1, 2, 3, 4, 5, … – One could have as well … – In fact there are 10! = 3,628,800 such orders one could observe, although many of them would be enormously unlikely in a real audio signal
For the sequence 1, 2, 3 … [Figure: loudness from 1 to 10 in order (top); block-to-block difference (bottom)]
What do we see there? Our sequence of loudnesses, 1 to 10, in order. Difference at each step is -1. RMS difference is 1. Not a big difference.
And now for a different sequence
And here? There’s rather a lot more difference. In fact, the RMS difference here is 5.67. The point? These two sequences have exactly the same mean and the same histogram. From the histogram and mean, you cannot determine any “micro” kinds of characteristics.
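The comparison can be checked numerically. The sketch below is illustrative only: the `zigzag` ordering is a hypothetical reordering chosen here, not necessarily the sequence on the slide, so its RMS difference (about 5.63) differs slightly from the 5.67 quoted above. Differences are taken as next block minus previous block; the sign convention does not affect the RMS.

```python
import math
from collections import Counter

def rms_block_difference(loudness):
    """RMS of the differences between consecutive blocks."""
    diffs = [b - a for a, b in zip(loudness, loudness[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

in_order = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Hypothetical reordering for illustration; same values, different order.
zigzag = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]

# Same histogram (same multiset of values) and same mean.
same_histogram = Counter(in_order) == Counter(zigzag)
same_mean = sum(in_order) / len(in_order) == sum(zigzag) / len(zigzag)

print(same_histogram, same_mean)                # True True
print(rms_block_difference(in_order))           # 1.0
print(round(rms_block_difference(zigzag), 2))   # 5.63
```

The histogram and the mean are identical for both orderings, yet the block-to-block measure differs by more than a factor of five.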
How to maximize the RMS difference? Well, there are 10! sequences. I leave it to the reader to figure out which one (or ones; the time reverse will have the same RMS value) has the highest RMS difference. Good luck!
My point? You need to look at the time series of something or other in order to get any sense of more than overall dynamic range. Hopefully, it’s obvious (given last year’s talk) why loudness is more useful than RMS values.
Ok, that was all hypothetical “block to block” differences. So, how long should the block be? – Well, this is hearing, so how long would make sense for the auditory system? – That is “interesting”. At low frequencies, 17 milliseconds makes sense (that’s about 735 samples at redbook rates), but don’t forget our window. That would suggest 1024 samples is a useful number.
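The sample counts can be worked out directly. This is a small sketch under stated assumptions: 44,100 Hz is the Red Book sample rate, and rounding up to a power of two is an assumption made here (it is convenient for FFT-based analysis), not something the slide states. Note that 735 samples is exactly 1/60 second (about 16.7 ms), which appears to be what "17 milliseconds" rounds.

```python
def samples_for(duration_s, sample_rate=44100):
    """Number of samples (rounded) covering a duration at the given rate."""
    return round(duration_s * sample_rate)

def next_power_of_two(n):
    """Smallest power of two that is >= n (assumed block-size rule)."""
    p = 1
    while p < n:
        p *= 2
    return p

print(samples_for(1 / 60))       # 735
print(samples_for(0.017))        # 750
print(next_power_of_two(735))    # 1024
```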
Again, recalling the ERB structure of the ear, which is about ¼ octave at high frequencies, at 16 kHz we’re talking about 0.33 milliseconds. Now, that’s not so helpful, is it? Could one make a loudness model that accommodates all of that? – Yes. But we’re not going to do that today. That would be a good subject for a 3 hour tutorial talk! – Here, let’s try the 1024 sample window, which has the necessary frequency resolution to have a chance of working at low frequencies. – Most of the energy (and loudness) in most signals is in low frequencies. – BUT percussion, which is very dynamic, has a broad spectrum.
The Loudness Model This is the same loudness model used in last year’s talk. No real changes. – It works on 1024 sample blocks, shifting by 512 samples per measurement. Yes, overlap is necessary. – This model is in the matlab file lplt_t.m, also on the PNW Section Web site. – You’re welcome to have fun with it. – “How to get Octave” is a question best asked of your nearest linux guru.
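The blocking scheme described above (1024-sample blocks, shifted by 512 samples) can be sketched as follows. This shows only the framing, not the loudness model in lplt_t.m; the function names are hypothetical, and the analysis window is omitted because its shape is not given here. Trailing samples that do not fill a whole block are dropped, which is also an assumption.

```python
def block_starts(num_samples, block_len=1024, hop=512):
    """Start index of every full block that fits in the signal."""
    if num_samples < block_len:
        return []
    return list(range(0, num_samples - block_len + 1, hop))

def split_into_blocks(samples, block_len=1024, hop=512):
    """Overlapping blocks; each block shares half its samples with the next."""
    return [samples[s:s + block_len]
            for s in block_starts(len(samples), block_len, hop)]

one_second = [0.0] * 44100          # one second of silence at 44.1 kHz
blocks = split_into_blocks(one_second)
print(len(blocks))                  # 85
```

With a 512-sample shift, one second of Red Book audio yields 85 loudness measurements, one roughly every 11.6 ms.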
What comes out of that program? There are 4 plots, each a measurement of some characteristic of the signal. There is a string of numbers below that. The interpretation of each plot will follow. – There is a lot of information in that one little plot. – This is where the rubber meets the road, or doesn’t.
A word about these loudness numbers They are arbitrary units, in the range of 0 to 400. – The listener has a volume control – His system has a sensitivity (and one that may vary with frequency) – So, we stick with arbitrary units of loudness, let’s call them ALU, for Arbitrary Loudness Units
Top Plot: Histogram of Loudness This is a histogram, NOT a time-domain plot. – The vertical axis goes from 0 to 1 (more about those negative values in a minute) – The horizontal axis goes from 0 to 400, units ALU – The top of each bar on the vertical axis shows the fraction of blocks in the clip being analyzed with loudness in its bin, the center value being the value on the horizontal axis.
Those negative values? (there’s no such thing as negative loudness, zero means you can’t hear it) They are marking three points on the loudness scale, from left to right: – The value where 5% of the blocks measure smaller than that value. – The mean value – The value where 95% of the blocks measure smaller than that value. The mean value is also shown numerically in the text at the bottom of the plot. The ratio of 5% value to 95% value makes a decent estimate of dynamic range. To convert that to dB, raise that ratio to the 3.5 power and then convert to dB: (10 log10 (ratio^3.5)).
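The markers and the dB conversion above can be sketched as follows. The percentile method (nearest rank) and the function names are assumptions; the program itself may compute the 5% and 95% points differently. The ratio is taken here as 95% value over 5% value so that the result is positive; taking 5% over 95% gives the same magnitude with the opposite sign.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with >= pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def dynamic_range_db(block_loudness, exponent=3.5):
    """10*log10(ratio^3.5), with ratio = 95% value / 5% value."""
    low = percentile(block_loudness, 5)
    high = percentile(block_loudness, 95)
    if low <= 0:
        return math.inf          # a silent 5% point has no finite ratio
    return 10 * math.log10((high / low) ** exponent)

# Hypothetical block loudness values in ALU, for illustration only.
loudness = list(range(1, 101))
print(percentile(loudness, 5), percentile(loudness, 95))   # 5 95
print(round(dynamic_range_db(loudness), 1))
```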
2nd from the top. This one is simpler; it is the plot of loudness as a function of time. – Loudness is the vertical axis, again ALU – Horizontal axis is the block number, where each block shift is 512 samples. – This shows how much loudness varies, in some sense, in a file.
The 3rd Plot. This is block to block normalized loudness difference, in ALU, of course. It shows attacks, decays, etc. This seems, to my ear, to maybe relate to “microdynamics”. The mean absolute value is shown in the bottom label. These numbers seem small, but that’s because they are normalized. It is not a loudness difference, it is a relative loudness difference.
Should that be RMS? I don’t know. If you want a hardcore psychoacoustics research project, give it a go! It does seem to scale with my personal sense of “dynamic signals”. It varies from about .04 to about .12 for most signals, and .04 sounds squished to toothpaste, while .12 sounds almost excessively dynamic. Your mileage may vary. It should, perhaps, when talking about preference.
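A relative block-to-block difference can be sketched as below, reporting both the mean absolute value and the RMS so the two can be compared. The normalization is an assumption: the slides do not say how the difference is normalized, so this sketch divides each change by the larger of the two neighbouring loudness values, which keeps every value between -1 and 1. The program's actual normalization may differ.

```python
import math

def relative_differences(loudness):
    """Block-to-block change divided by the larger neighbour (assumed normalization)."""
    out = []
    for prev, cur in zip(loudness, loudness[1:]):
        scale = max(prev, cur)
        out.append((cur - prev) / scale if scale > 0 else 0.0)
    return out

def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical loudness track in ALU: steady, a fast attack, a slower decay.
track = [100, 100, 200, 150, 150]
rel = relative_differences(track)
print(rel)              # [0.0, 0.5, -0.25, 0.0]
print(mean_abs(rel))    # 0.1875
print(round(rms(rel), 4))
```

RMS weights the large attacks more heavily than the mean absolute value does, so the two summaries rank signals differently when a few sharp transients dominate.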
The Last Plot That’s a histogram of the actual PCM levels in the signal over the whole clip. This has very little to do with the psychoacoustic realm, but it does point out clipping in a jiffy. I’ll show you some plots that are and are not clipped momentarily.
The text at the bottom: It shows in order: – File name of the file analyzed – Mean Loudness – Mean block to block change (absolute value of change) – RMS level of the file – Peak level of the file – Peak to RMS level in units of amplitude (not dB); convert to dB by 20 log10(Peak to RMS) – Number of blocks analyzed – Length of analysis block (twice the shift length)
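The RMS, peak, and peak-to-RMS figures, along with the dB conversion, can be sketched as follows. The function names are hypothetical, and the test signal is a synthetic sine rather than anything from the talk. A sine wave has a peak-to-RMS ratio of the square root of 2, or about 3.01 dB, which makes a convenient sanity check.

```python
import math

def rms_level(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak_level(samples):
    return max(abs(s) for s in samples)

def peak_to_rms(samples):
    """Peak-to-RMS ratio in units of amplitude (not dB)."""
    return peak_level(samples) / rms_level(samples)

def to_db(amplitude_ratio):
    """Amplitude ratio to dB: 20*log10(ratio)."""
    return 20 * math.log10(amplitude_ratio)

# Ten full cycles of a sine, 100 samples per cycle.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(round(peak_to_rms(sine), 4))          # 1.4142
print(round(to_db(peak_to_rms(sine)), 2))   # 3.01
```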
Ok, on to some plots. This is the same one as used in the example above:
What do we see there? The first thing is that the loudness histogram is fairly wide, indicating dynamic range. This is confirmed by the difference in the 5% and 95% values. The loudness histogram also has a “bulge” at lower loudness, and extends quite a ways toward higher loudness, showing that the range is mostly “upwards”, i.e. peaks, not periods of quiet sound. The loudness vs. time plot shows the same thing in another way, which helps to make clear the reasons for the shape of the loudness distribution in the first plot.
The difference plot. This shows that the differences are mostly large upward, followed by slower downward changes. This is typical of a physical process, and also of most audio signals. The upward peaks are close to 1, meaning that they are rapid increases in loudness. The string at the bottom gives a block to block change of .08. Without some observation of other signals, this is not yet clearly meaningful. (Remember, .08 is not necessarily a small average absolute value; we have no scaling here.)
The Level Histogram This shows that there is, effectively, no clipping, and that the center half of the PCM levels are heavily used. Zero (or close) is expected to be the most common level, and it is. Outside of the very center, the roll off in frequency of PCM bin is more or less a straight line, making it close to log-normal. This is “as expected” from the basic mathematics.
And, a different signal Yes, a horse of another color!
The Loudness Plot It’s very, very loud. The peak in the distribution is quite narrow, meaning level does not change a lot. There is a secondary near-peak at lower loudness. This is shown to be “the quiet part” when looking at the time-domain plot, not some kind of unusual local (in time) dynamics. It has a much lower dynamic range. If we ignore the “quiet part” it has very little dynamic range at all. Its mean loudness is 235. Compare that to the previous clip, which is at 87. Yes, it’s loud. It’s supposed to be, of course.
The difference plot. Here there are bursts of local dynamics, interspersed with sections of very little variation in the local dynamics. The mean block to block change is .06. This is not that much smaller than the .08 above, which suggests that small changes may be important.
The level histogram This is a poster boy for clipped. Do I need to explain? If I do, shout out, and I’ll explain. Those “train tracks” at the sides are classic evidence of clipping. You may be surprised to find that kind of clipping not always at max and min. Don’t ask me!
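One simple way to flag those “train tracks” numerically is to ask what fraction of samples sit exactly at the highest or lowest level used in the file. This is a sketch of a heuristic, not the method used by the plotting program; it uses the extremes actually observed rather than the format's maximum and minimum, since clipping is not always at full scale. What fraction counts as "clipped" is left to the reader.

```python
import math
from collections import Counter

def extreme_level_fraction(pcm):
    """Fraction of samples exactly at the highest or lowest level used."""
    counts = Counter(pcm)
    hi, lo = max(counts), min(counts)
    at_extremes = counts[hi] + (counts[lo] if lo != hi else 0)
    return at_extremes / len(pcm)

# Synthetic 16-bit-style sine, then the same signal hard-clipped at +/-20000.
clean = [int(30000 * math.sin(2 * math.pi * k / 1000)) for k in range(10000)]
clipped = [max(-20000, min(20000, s)) for s in clean]

print(extreme_level_fraction(clean))     # well under 1%
print(extreme_level_fraction(clipped))   # more than half the samples
```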
And a smooth vocal track
What else to say? It’s very, very smooth. No peaks, no dips. Every attack is the percussion machine behind the group. Not loud at all. Very nice use of the CD’s full range with no clipping. Interblock mean absolute difference of .05. – Without the cymbal it would be even smaller.
And then there’s this:
Yeah…. It’s loud. It has very little internal dynamics. It’s loud. Did I mention it’s loud? Note it doesn’t show extensive clipping like some of the other tracks, but here is a most unusual peak near negative maximum. Oh, and it’s loud. And squished flat.
And, One more for the road
What here? It’s relatively quiet, but has some strong peaks. Its mean block to block difference is high, at .09. There is something interesting in the level histogram. To explain: – There are many levels in 1 vertical pixel. – What’s plotted is the most common and least common in each bar. – Notice the many, many BOTTOMS to the level histogram? That’s kind of important.
That’s actually quite important. What the level histogram shows in this case is that there are many “missing codes”, that is to say that many PCM codewords are unused. This means that the file was mistreated somewhere.
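Missing codes can be counted directly from the integer PCM values. The sketch below is a hypothetical check, not the program's own: it reports the fraction of codes between the lowest and highest used that never occur. The "mistreated" example simulates one possible cause, an integer gain reduction followed by a gain increase, which leaves only even codes. A short or very quiet file will leave some codes unused naturally, so the number is only meaningful on a reasonably long clip.

```python
def missing_code_fraction(pcm):
    """Fraction of integer codes within the used range that never occur."""
    used = set(pcm)
    span = max(used) - min(used) + 1
    return 1 - len(used) / span

# Every code from -1000 to 1000 is used in the dense signal.
dense = list(range(-1000, 1001)) * 3
# Simulated mistreatment: halve with integer truncation, then double.
mistreated = [2 * (s // 2) for s in dense]

print(missing_code_fraction(dense))                   # 0.0
print(round(missing_code_fraction(mistreated), 3))    # 0.5
```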