Presentation on theme: "MPEG-4 Structured Audio CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw John Lazzaro John Wawrzynek June 18, 2001 Modified."— Presentation transcript:
MPEG-4 Structured Audio. CS Division, University of California at Berkeley. John Lazzaro, John Wawrzynek. June 18, 2001. Modified by Francois Thibault, January 20, 2003.
MPEG 4 Standard Structured Audio: One “component” in the MPEG audio standard. [Diagram: MPEG 4 comprises audio, system, and video. Audio splits into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).] ISO/IEC sec5
Audio Compression Basics. Traditional Technique for Music: the encoder takes the waveform (amplitude vs. time), filters it into critical bands, computes masking, allocates bits, and formats the bitstream; a decoder reconstructs the audio. How well does this work? “Perceptually Lossless”: 10X-20X reduction (MP3, Dolby AC3, …). True Lossless: 2.5X reduction (Shorten, T. Robinson, Cambridge University).
The Kolmogorov alternative: Write a computer program that generates the desired audio stream. Transmit the computer program. To decode, execute the program. Similar to PostScript! MPEG-4 Structured Audio (MP4-SA) uses this approach. Eric Scheirer, Editor (MIT Media Lab).
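The idea can be illustrated with a minimal Python sketch. This is not MP4-SA or SAOL: the `PROGRAM` text, the `decode` helper, and the one-second 440 Hz sine are all assumptions chosen for illustration. The "bitstream" is the program text, and decoding means executing it.

```python
import math
import struct

# Hypothetical "encoded" form: the text of a tiny program that renders audio.
PROGRAM = """
def render(sample_rate, seconds, freq):
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]
"""

def decode(program_text, sample_rate=44100, seconds=1.0, freq=440.0):
    """'Decode' by executing the transmitted program and calling its render()."""
    namespace = {"math": math}
    exec(program_text, namespace)
    return namespace["render"](sample_rate, seconds, freq)

samples = decode(PROGRAM)

# Size of the rendered audio as 16-bit PCM versus the size of the program text.
pcm_bytes = len(samples) * struct.calcsize("<h")
program_bytes = len(PROGRAM.encode("utf-8"))
print(pcm_bytes, program_bytes)
```

The program text stays the same size whether it renders one second or one hour, which is where the large ratios for suitable material come from.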
MP4-SA Encoding may be a creative act: writing a program, directly (emacs) or indirectly (GUI, webpage). In this case, MP4-SA is a lossless compressor. Encoding may also be automatic: given a sound, an encoder writes a program that generates the sound. Automatic encoding is hard in the general case. MP4-SA Decoders are interpreters or compilers.
Key Application: Music Production. Modern music production is computer-based. Musicians enter performances into computers as control information, not audio waveforms. Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control. MP4-SA Maps to Modern Music Production: “The Program” holds the synthesis algorithms, effects “boxes”, and mixers; the control information holds the musical performance and mix-down; “The Decoder” does the sound rendering. Over a network, there is a premium on low bandwidth.
Key Application: Music Production (cont.). MP4-SA Maps to Modern Music Production: when “The Program” and the control information are kept in a file system, MP4-SA acts as a standard framework. Ideal for collaborative productions, remixes, and...
Key Application: Music Performance. Music performance requires dynamic control. True interactivity requires parameterized sounds. Musicians control instruments and effects with interactive controllers. Control could be indirect and remote (ex: games). MP4-SA Enables Networked Music Performance: each site runs “The Decoder” for sound rendering, and the sites exchange control over a network, where there is a premium on low bandwidth.
MPEG 4 Structured Audio: A binary file format that encodes: the programming language SAOL (pronounced: sail); the musical score language SASL; legacy support for MIDI; audio sample data. Result is normative: an MP4-SA file will sound identical on all compliant decoders. This is different from MIDI files.
Why SAOL and MP4-SA? Why not Java? Musical performances have temporal structure that changes over several timescales: Sample-by-sample: 10’s of µsec. Amplitude & timbre envelopes: 10’s of msec. Note-by-note: 100’s of msec. Writing sound generation code in a conventional language results in code dominated by time-scale management. Hard to maintain, hard to optimize.
Time management is built into SAOL. A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion. Work is scheduled to happen at the a-rate (the audio sample rate), at the k-rate (envelope control rate), or at the i-rate (rate for new notes). Language variables are typed as a/k/i-rate. A language statement is scheduled based on the rate of the variables it contains.
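A rough Python sketch of the three rates follows. It is not SAOL and not how any decoder is implemented: `SAMPLE_RATE`, `CONTROL_RATE`, `run_note`, and the linear decay envelope are assumptions for illustration. It shows the bookkeeping a conventional language forces the author to write by hand: setup runs once per note, the envelope updates once per control period, and the oscillator runs once per sample.

```python
import math

SAMPLE_RATE = 8000    # a-rate: samples per second (assumed value)
CONTROL_RATE = 100    # k-rate: control passes per second (assumed value)
SAMPLES_PER_K = SAMPLE_RATE // CONTROL_RATE

def run_note(freq, dur, counts):
    """Render one note, counting how often work runs at each rate."""
    # i-rate work: runs once, when the note starts.
    phase_step = 2 * math.pi * freq / SAMPLE_RATE
    k_cycles = int(round(dur * CONTROL_RATE))
    counts["i"] += 1

    out = []
    phase = 0.0
    for k in range(k_cycles):
        # k-rate work: update a linear decay envelope once per control period.
        env = 1.0 - k / k_cycles
        counts["k"] += 1
        for _ in range(SAMPLES_PER_K):
            # a-rate work: compute one output sample.
            out.append(env * math.sin(phase))
            phase += phase_step
            counts["a"] += 1
    return out

counts = {"i": 0, "k": 0, "a": 0}
audio = run_note(freq=440.0, dur=0.5, counts=counts)
print(counts)
```

In SAOL the nested loops disappear: the rate of each variable determines when its statements run.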
SAOL, SASL, and Scheduling: Sound creation in MP4-SA can be compared to a musician playing notes on an instrument. A SAOL subprogram (called an instr or instrument) serves as the instrument. SASL commands (called score lines) act to play notes on SAOL instruments. Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.
An example: SAOL instrument tone, that plays a gated sine wave. (SAOL code in next slide.) This SASL file plays a melody on tone: 0.5 tone tone tone tone tone tone tone end. Callouts on the score lines mark when the instance is launched, how long the instrument runs, and the instance parameters (note number, loudness).
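The relationship between score lines and instrument instances can be sketched in Python. This is neither SAOL nor SASL: the `SCORE` values are invented for illustration (the transcript does not preserve the slide's numbers), and the field order (launch time, instrument, duration, note number, loudness) is an assumption based on the callouts. Each score line launches one instance of `tone`, and overlapping instances are summed into the output.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate for this sketch

# Hypothetical score lines: (launch time s, instrument, duration s, note number, loudness)
SCORE = [
    (0.0, "tone", 0.5, 60, 0.5),
    (0.25, "tone", 0.5, 64, 0.5),
    (0.5, "tone", 0.5, 67, 0.5),
]

def midi_to_hz(note):
    """Convert a MIDI-style note number to frequency (A above middle C = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def tone(n_samples, note, loudness):
    """Gated sine: full loudness for the note's duration, nothing after it."""
    step = 2 * math.pi * midi_to_hz(note) / SAMPLE_RATE
    return [loudness * math.sin(step * i) for i in range(n_samples)]

INSTRUMENTS = {"tone": tone}

def render(score):
    """Launch one instrument instance per score line and mix the results."""
    end = max(line[0] + line[2] for line in score)
    mix = [0.0] * int(round(end * SAMPLE_RATE))
    for start, name, dur, *params in score:
        first = int(round(start * SAMPLE_RATE))
        note_audio = INSTRUMENTS[name](int(round(dur * SAMPLE_RATE)), *params)
        for i, sample in enumerate(note_audio):
            mix[first + i] += sample
    return mix

audio = render(SCORE)
print(len(audio))
```

The first two notes overlap for a quarter of a second, so two instances of the same instrument are active at once during that span.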
SAOL Features. Rate semantics: i/k/a-rate execution. Vector arithmetic: ex: A=B+C means, for i=1,n: A[i]=B[i]+C[i]. All floating-point arithmetic. Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects,...
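The vector-arithmetic shorthand corresponds to an element-wise loop. A small Python sketch of that expansion, with `vec_add` as a hypothetical helper and an equal-width check added as an assumption of this sketch:

```python
def vec_add(b, c):
    """Element-wise sum, the loop that A=B+C stands for."""
    if len(b) != len(c):
        raise ValueError("vectors must have the same width")
    return [bi + ci for bi, ci in zip(b, c)]

B = [0.5, 1.0, 1.5]
C = [0.25, 0.25, 0.25]
A = vec_add(B, C)
print(A)  # [0.75, 1.25, 1.75]
```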
Spectrum of implementations, trading startup delay against execution performance: Directly interpret (ISO/IEC sec 5, reference implementation). Translate to VM, interpret VM code (Zoia & Alverti, EPFL, ICASSP 2001). Translate to C, compile C code. Compile to machine code (significant development & maintenance complexity).
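The two ends of this trade-off can be sketched in Python on a toy expression tree. Nothing here reflects the reference implementation or any real decoder: `EXPR`, `interpret`, and `translate` are hypothetical. Direct interpretation starts immediately but walks the tree on every evaluation; translation pays a one-time cost to generate and compile host-language code.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# Toy expression tree for (x * 0.5) + y
EXPR = ("+", ("*", "x", 0.5), "y")

def interpret(node, env):
    """Direct interpretation: walk the tree on every evaluation."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, env), interpret(right, env))
    if isinstance(node, str):
        return env[node]   # variable lookup
    return node            # numeric constant

def to_source(node):
    """Emit host-language source text for the tree."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_source(left)} {op} {to_source(right)})"
    return str(node)       # variable name or numeric literal

def translate(node, arg_names):
    """Translation: generate source once, compile it, return a callable."""
    source = f"lambda {', '.join(arg_names)}: {to_source(node)}"
    return eval(compile(source, "<translated>", "eval"))

fast = translate(EXPR, ["x", "y"])
print(interpret(EXPR, {"x": 2.0, "y": 1.0}), fast(2.0, 1.0))
```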
Sfront - a SAOL-to-C translator. Converts MP4-SA files to an ANSI C program that, when executed, produces audio. [Diagram: sfront takes foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) and produces sa.c.] Handles SAOL, SASL, MIDI, and uncompressed samples. Runs on UNIX, Windows, MacOS. Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP.
Generator Techniques. Much of the SA standard describes a library: 104 core opcodes (ex: pow(), allpass(), reverb()) and 16 wave table generators (ex: harm, spline, random). Sfront optimizes the code produced for each library element instance based on the invocation attributes: rate, width, size, constancy, integral nature of the parameters, and number of parameters.
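Per-call-site specialization of this kind can be sketched in Python. This is not sfront's code generator and does not show its output: `emit_pow` and its small-integer threshold are assumptions for illustration. When the exponent is a constant small integer, the generator emits repeated multiplication; otherwise it emits a general library call.

```python
import math

def emit_pow(base_expr, exponent, exponent_is_constant):
    """Return source text for one pow() call site, specialized when possible."""
    if exponent_is_constant and float(exponent).is_integer() and 0 <= exponent <= 4:
        n = int(exponent)
        if n == 0:
            return "1.0"
        # Constant small integer exponent: unroll into multiplications.
        return "(" + " * ".join([base_expr] * n) + ")"
    # Exponent unknown until run time, or not a small integer: general call.
    return f"math.pow({base_expr}, {exponent})"

specialized = emit_pow("x", 3, True)
general = emit_pow("x", "p", False)
print(specialized)
print(general)

# Compile both forms to check that they agree.
cube = eval(f"lambda x: {specialized}")
power = eval(f"lambda x, p: {general}", {"math": math})
print(cube(2.0), power(2.0, 3.0))
```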
Interesting Issues: MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space. Physical modeling: good. Sampling natural instruments: bad. If models are chosen carefully, compression ratios of 100 to 10,000 are possible. Physical modeling is relatively immature, but holds much promise.
Interesting Issues (cont.): MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately. A new role for psychophysics: Instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations. Leverage spectral and temporal masking.