Audio, Radio, Acoustics, Signal Processing? James D. Johnston Retired Audio Geek.

What a long, strange trip it’s been!

When I started life in Audio
That would be 3 times:
– In Jr. High School
– At CMU
– At Bell Labs
Each time, I was discouraged by teachers, professors, or management, on the basis that audio was full of nonsense, and we didn’t understand things very well, either. Nonetheless, Bell Labs has a long legacy of work in hearing and audio.

What did I learn at CMU about audio? Leased wideband lines are terrible. Oh, did I mention? Leased wideband lines are terrible. Yes, I hung out at the campus radio station. No, it’s not TPC to blame, it’s physics. Yes, folks, that was one of the facts that caused MP3.

The Bell Labs Legacy Harvey Fletcher worked at Bell Labs in the pre-WW2 days. His work underpins a great deal of the modern understanding of hearing. Stevens, Zwicker and others built and expanded on his work. He established the understanding of the ear as a frequency-sensitive organ. He used this, among other things:

Frequency Sensitivity Fletcher showed, by way of masking experiments, that the ear was a frequency selective device. This has a few important effects. – The first, and foremost, is that Signal to Noise Ratio is kind of pointless, unless you know it in some variety of frequency-sensitive sense. – Everything, give or take, needs to take the frequency selectivity into account.

How’s that?
Let’s consider two signals:
– Signal 1 has an SNR of 6.0 dB. The signal consists of narrowband noise between 950 and 1050 Hz. The noise is a tone at 1 kHz, 6 dB lower in energy than the narrowband noise.
– Signal 2 has an SNR of 60 dB. The signal consists of a 20 kHz sine wave. The noise consists of a 1.5 kHz sine wave 60 dB lower.

Which one can you hear?
Let’s have a show of hands. Which situation will allow you to determine signal vs. signal + noise in a properly constructed double-blind test in a quiet room?
– Signal 1: please show hands
– Signal 2: again, please show hands
Try it yourself! Download Octave, make the signals, and do the test on yourself.
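
The slide suggests Octave; as a hedged alternative, here is a minimal Python sketch (standard library only) that builds the two signal/noise pairs described above. The sample rate, the 5 Hz spacing used to approximate the narrowband noise as a sum of random-phase sinusoids, and all function names are assumptions made for illustration, not part of the original talk.

```python
import math
import random

FS = 48000  # Hz; assumed sample rate, high enough to carry a 20 kHz tone

def tone(freq_hz, power, n, fs=FS, phase=0.0):
    """Sine wave with the requested mean power (amplitude = sqrt(2 * power))."""
    amp = math.sqrt(2.0 * power)
    w = 2.0 * math.pi * freq_hz / fs
    return [amp * math.sin(w * i + phase) for i in range(n)]

def narrowband_noise(lo_hz, hi_hz, power, n, fs=FS, step_hz=5, seed=0):
    """Approximate band-limited noise as a sum of random-phase sinusoids."""
    rng = random.Random(seed)
    freqs = list(range(lo_hz, hi_hz + 1, step_hz))
    out = [0.0] * n
    for f in freqs:
        part = tone(f, power / len(freqs), n, fs, rng.uniform(0.0, 2.0 * math.pi))
        out = [a + b for a, b in zip(out, part)]
    return out

def mean_power(x):
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

def make_signal_1(n=FS // 5):
    """Narrowband noise 950-1050 Hz as 'signal', 1 kHz tone 6 dB lower as 'noise'."""
    return narrowband_noise(950, 1050, 1.0, n), tone(1000, 10 ** (-6.0 / 10), n)

def make_signal_2(n=FS // 5):
    """20 kHz sine as 'signal', 1.5 kHz sine 60 dB lower as 'noise'."""
    return tone(20000, 1.0, n), tone(1500, 10 ** (-60.0 / 10), n)
```

To listen, add each signal to its noise sample by sample, scale the result to avoid clipping, and compare it against the signal alone. The default length (0.2 s) is chosen so every component completes a whole number of cycles; use a multiple of it for longer clips.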

The Message? Very basic, older tests show that SNR, which is a superset of THD, is, to take a line from the Hitchhiker’s Guide to the Galaxy: Mostly Harmless. Yes, it is useful at extremes. Between extremes, well, not so much.

So, how does this lead to conflict? Most measurements available from literature give THD or SNR. Almost nowhere do we see noise spectra or anything of the sort, even for one signal at one frequency, let alone many. So, we hear the classic argument – THE THD IS GREAT, WHAT’S YOUR PROBLEM vs – No, It sounds like (bleep)

There we have it already, a start of the conflict that remains today in Audio. That was a long, long time ago! (and we didn’t even discuss the people who preferred steel vs. thorn needles on their Victrola)

Loudness vs. Sound Pressure, Intensity, and so on
Loudness is an internal, perceptual level. It is the SENSATION LEVEL.
SPL is Sound Pressure Level. This carries part of the information of the power in the atmosphere.
Intensity is the sound power in the atmosphere at a given point, only part of which is converted by HRTFs to the pressure at the eardrum.
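
For the two physical quantities on this slide, a small sketch of the standard definitions may help. It assumes the usual 20 micropascal reference for SPL in air and, for intensity, a free plane wave with an approximate characteristic impedance for air at room temperature; both constants and the function names are assumptions. Loudness is deliberately absent: it is perceptual and cannot be computed from pressure alone.

```python
import math

P_REF = 20e-6   # Pa RMS; standard SPL reference pressure in air
RHO_C = 413.0   # Pa*s/m; approximate characteristic impedance of air (assumed, ~20 C)

def spl_db(p_rms):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)

def plane_wave_intensity(p_rms):
    """Intensity in W/m^2, valid only for a free plane wave (not near the head or in a room)."""
    return p_rms ** 2 / RHO_C
```

For example, `spl_db(1.0)` is about 94 dB SPL, the level commonly used by microphone calibrators.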

Which Brings Me to Another Point or Two. First, what you like to listen to is PREFERENCE, not “accuracy”. You listen to what you prefer to hear, not what is measurably more accurate, unless of course, you prefer a good measurement. Preference is inviolate! Preference can arise from many, many things, in many, many ways.

A modern view of perception
[Figure: block diagram. Stage labels: Lateral inhibition; Frequency filtering, loudness analysis; Object Analysis; Feature Analysis; Cognitive-level understanding; Peripheral Processing. Data-rate labels: Megabits, Many Megabits, Kilobits, Bits.]
OBSERVE THE MASSIVE FEEDBACK AND THE LOSS OF DATA AT EACH STEP

How does this cause conflict?
First, it clearly shows the need for “blind testing”, in fact “double-blind testing”.
– No, that doesn’t mean you wear a blindfold.
– It does mean that you have to detect the differences you’re listening for WITHOUT HINTS FROM OTHER SENSES.
No, you can’t ignore them. It’s not delusion, hallucination, or stupidity, it’s how your brain works.
If it’s not a DBT, you have no idea what you were responding to, beyond “something.”
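
One way to make a double-blind listening result falsifiable is to score it against chance. The sketch below assumes a forced-choice format such as ABX, where a listener who hears no difference is right half the time; the talk does not name a specific test format, so that choice and the function name are assumptions.

```python
import math

def guessing_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by pure guessing (p = 0.5)."""
    hits = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials
```

For example, 12 correct out of 16 trials gives a probability of roughly 0.038 under guessing, while 8 out of 16 is entirely consistent with hearing nothing at all.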

And that’s bad?
Not always.
– If you like chartreuse wires, that’s preference. Repeat after me again: “preference is inviolate”.
– There are undoubtedly things beyond the sound of something that you care about, like: Reliability, WAF, (endless list).
– It all depends on what you want to do with your listening experience.

Your preference is not my preference, it is your preference.
When describing an aesthetic experience, opinions are just that, opinions, preferences, what-have-you.
– They do not extend beyond your own PREFERENCE.
– They may not match someone else’s.
– They certainly may not have much to do with the sound emitted during the experience.

When is preference bad? When you’re trying to determine what the auditory system, just the auditory system, and only the auditory system is providing to the rest of the process. The systems are so very flexible that only if you have a FALSIFIABLE result can you proceed with a scientific investigation. Anecdotes start the process, but there has to be more than anecdotes to investigate scientifically.

And there we have it
Another cause of the divide between the engineers and scientists, and perhaps the nastiest one of all.
The SNR experiences teach the artistic side to ignore the engineer.
The lack of DBTs and testability teaches the engineers to ignore the artist.

And that, ladies and gentlemen IS A REALLY BAD THING!

Ok, now onward.
After college, as I said, I went to Bell Labs, and was discouraged officially from working on audio. There were a few things to consider here, though.
– I worked for Dave Goodman
– He worked for Jim Flanagan
– He worked for Max Mathews

Bell Labs vs. Audio Thanks to a variety of legal and tariff issues, Bell Labs was not supposed to work on audio systems. Research was OK, but not things like loudspeakers, stereos, etc. That didn’t keep the people at Bell Labs from being interested.

My first summer at Bell Labs I designed and built an ADPCM coder that did from 2 to 8 bit ADPCM, using analog multipliers, integrators, and so on. It could cycle at 8kHz. Barely. I sat in Max’s Lab, next to another young college student type who went by the name of Bitsy Cohen at the time. – I forget what she was working on, you’ll have to ask her. At the end of the summer, I said something to the effect of: If I had faster ADC and DAC, I could make one of these that would code music.

Jim Flanagan
Jim Flanagan, who hired me into Bell Labs, should be known for a lot of things:
– The artificial larynx
– ADPCM coding (the CELP in your cell phone comes from that family of codecs)
– Being a very, very good manager in terms of supporting people who want to invent new things.
He was also interested in music coding, saying something like: Well, you know, it is a transmission problem, and we do that sort of thing.

So, how would he get support? In two words: Max Mathews

I suspect I don’t have to explain
Max, as I suspect everyone knows, was very, very much interested in computer music. He was Jim’s boss.

So, the next summer…
I was hired back to build another analog ADPCM codec:
– 2-12 bits this time
– 6 kHz to 32 kHz sampling rate
– High dynamic range
Soldered point to point on perfboard:
– At 12 bits, dynamic range of 110 dB (re 10 V RMS)
– DBX 202’s gave us true exponential step size control
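
The codec on this slide was analog hardware, but the idea of ADPCM with a selectable word length and exponential step-size control can be sketched in software. This is a minimal illustration, assuming a first-order predictor (the previous reconstructed sample) and a multiplicative step-size rule; the names and the values of `step`, `grow`, and `shrink` are hypothetical and are not taken from the original design.

```python
import math

def _update(pred, step, code, levels, grow, shrink):
    """State update shared by encoder and decoder so both stay in lockstep."""
    pred += code * step
    # Multiplicative (exponential) adaptation: large codes grow the step,
    # small codes shrink it. Threshold and factors are assumed values.
    step *= grow if abs(code) >= levels // 2 else shrink
    step = min(max(step, 1e-6), 1.0)
    return pred, step

def adpcm_encode(samples, bits=4, step=0.01, grow=1.5, shrink=0.9):
    levels = 2 ** (bits - 1)  # codes span -levels .. levels - 1
    pred, codes = 0.0, []
    for x in samples:
        code = round((x - pred) / step)             # quantize the prediction error
        code = max(-levels, min(levels - 1, code))  # clip to the word length
        codes.append(code)
        pred, step = _update(pred, step, code, levels, grow, shrink)
    return codes

def adpcm_decode(codes, bits=4, step=0.01, grow=1.5, shrink=0.9):
    levels = 2 ** (bits - 1)
    pred, out = 0.0, []
    for code in codes:
        pred, step = _update(pred, step, code, levels, grow, shrink)
        out.append(pred)
    return out

def snr_db(reference, decoded):
    """Broadband SNR of the decoded signal, for sanity checks only."""
    sig = sum(r * r for r in reference)
    err = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return 10.0 * math.log10(sig / err)
```

Because the decoder repeats the encoder's state updates from the codes alone, no step-size side information has to be transmitted. The broadband `snr_db` helper is only a sanity check; as the rest of the talk argues, it says little about how the result sounds.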

So..
To make a long story short, I stuck at Bell Labs, learning signal processing from Dave, Jim, Larry Rabiner, Nikil Jayant, and lots of other people.
– The “Two band sub-band commentary grade codec”
– 56 kb/s
– Two-band ADPCM/APCM
– G.722, much later, was the same, but with adaptive predictors
– The first perceptual lesson

The Lesson
This codec sounded great.
– We put classical through it.
– We put rock through it.
– We put pipe organ through it.
– We put male vocal through it.
And then we put female vocal (a cappella) through it. No, it didn’t sound great any more.
This was my first introduction to “upward spread of masking”.

Array Microphones
Along about this time, Jim Flanagan decided to build an array microphone for the Murray Hill auditorium. I don’t have any photos of the first mike and hardware, but it had 49 elements, and used CCD’s for delay. I know way too much about it: I designed the circuit boards for the CCD’s (8 channels per board, digitally addressable for delay setting), and Paula Bottone stuffed them, fixed the solder spillovers (from the board manufacture), and we tried it out.
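
The per-channel delays mentioned above are the heart of a delay-and-sum beamformer. The sketch below illustrates the principle only: it assumes a uniform linear array, a far-field plane wave, and whole-sample delays, none of which is stated in the talk for the 49-element array, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def arrival_samples(n_elements, spacing_m, angle_deg, fs):
    """Arrival time at each element, in samples, for a plane wave from angle_deg (0 = broadside)."""
    dt = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    raw = [k * dt * fs for k in range(n_elements)]
    base = min(raw)
    return [round(r - base) for r in raw]

def compensation_delays(arrivals):
    """Delay to apply to each channel so all arrivals line up with the latest one."""
    latest = max(arrivals)
    return [latest - a for a in arrivals]

def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay, then average across channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]
```

Sound from the steered direction adds coherently and keeps its full amplitude, while sound from other directions is averaged across misaligned copies and attenuated.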

This is what it turned into: Here we see Gary Elko looking at the more modern, higher-order, octave spaced array. And the beamforming hardware

What next?
Well, next was a digital earphone.
– It had 4 bits.
– You got the performance you expect from 4 bits.
– It used a 6th-order acoustic filter stuck on the output side of the electret, which was split into sections, hence the 4 bits of resolution.
I haven’t seen much like that since.
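
As a rough guide to "the performance you expect from 4 bits", the textbook figure for an ideal uniform quantizer driven by a full-scale sine is about 6.02 dB per bit plus 1.76 dB. The sketch assumes that idealized model; how closely the acoustic device followed it is not stated in the talk.

```python
import math

def ideal_sqnr_db(bits):
    """Signal-to-quantization-noise ratio of an ideal uniform quantizer, full-scale sine input."""
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)
```

`ideal_sqnr_db(4)` comes to about 25.8 dB, against about 98.1 dB for 16 bits.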

But back to coding: Implementing a real perceptual coder had to wait, there wasn’t enough memory on the high-speed minicomputers. They did have a pretty good memory space, it was a full 32 kB. And then the Alliant FX8 arrived…

The Alliant ran a Unix variant
I was the only Unix user in the department (thanks to needing lots of circuit design tools written by Joe Condon, Steve Bourne, and others).
It had lots of memory, 64 megabytes, if I recall correctly. LOTS of memory…
Here, jj, you test this thing.

Which, after a story for another day, brings us to this
The 13 dB miracle
You will hear 3 tracks in random order:
– Original
– 13 dB SNR white noise
– 13 dB SNR perceptually injected noise
Ok, which is which?
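
The white-noise track in this demonstration is easy to reproduce; the perceptually injected one needs a psychoacoustic model that the slides do not spell out, so it is not attempted here. A minimal sketch, assuming Gaussian white noise scaled to hit the target broadband SNR exactly (the function name and seed parameter are hypothetical):

```python
import math
import random

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus Gaussian white noise scaled to the requested broadband SNR."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    sig_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    # Scale the noise so that sig_power / scaled_noise_power == 10^(snr_db / 10).
    gain = math.sqrt(sig_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(signal, noise)]
```

Both noisy tracks in the demonstration share the same 13 dB figure, which is the point: the number is identical while the audible result is not.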

Once again SNR IS MOSTLY HARMLESS!

Now, if I only had a nickel for every time somebody said: “Yes, Mr. Johnston, but what is the SNR of that codec?”

From there:
– PXFM
– MP3
– PAC
– AAC
– PSR
– A bunch of other stuff, for another day.
If you want to know more, see my “Audio 2004” talk. It’s still as valid as it was in 2004 when I first gave that talk.

Which is why we have the problem we have today:
– Perception does not respond to broadband SNR in any really useful fashion.
– Perception integrates all senses.
– Reproducing one point in a room accurately does not reproduce the soundfield in the original venue.
This all comes down to one basic idea

AT ALL TIMES, IN ALL PLACES, ONE MUST ALWAYS CONSIDER PERCEPTION

What’s left?
– Array microphones
– Array speakers (not just wavefield synthesis)
– Perceptual soundfield capture
– Perceptual soundfield synthesis
– Capture and representation of soundfield parameters in PERCEPTUAL TERMS
– Object oriented audio
– A whole bunch of other stuff I’m not going to fit on this slide.

So, then
Can we stop arguing with each other, talk, and develop some understanding among the engineers of why the artistic side (mixers, etc.) does what it does?
Can we have the artistic side stop with the “talk to the hand” treatment?
Please, no more wideband SNR arguments. Puhleeeze!
I’m tired of both. This is partially why I’m retired.

Some examples of where to go from here courtesy of Gary Elko and friends, I don’t have a single photograph of my own stuff:

Thank you, and GOOD NIGHT