Multi-Threading LAME MP3 Encoder
TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY, Electrical Engineering Department, Software Systems Lab
Performed by: Gilad Riachshtian
Copyright, 2004 © Gilad Raichshtain.
Talk Layout
- What is the L.A.M.E. Project?
- Project Goal
- MP3 Encoding & Hyper-Threading Overview
- Multi-Threading Strategies
- Results & Remarks
- Future Work
What is the L.A.M.E. Project?
- An open source project
- An educational tool used for learning about MP3 encoding
- Its goal is to improve:
  - Psycho-acoustic quality
  - The speed of MP3 encoding
- LAME is the most popular state-of-the-art MP3 encoder/decoder, used by today’s leading products.
- For more info: http://lame.sourceforge.net
Project Goal
- Speed up the encoding of an audio stream
- Turn LAME into a multi-threaded (MT) engine
- Remain 1:1 bit-compatible with the original version
- Optimize specifically for SMT platforms (implemented on Intel’s P4 with Hyper-Threading Technology)
Thread-Level Parallelism
- Hyper-Threading Technology provides thread-level parallelism on each processor
- Resulting in:
  - Increased use of processor execution resources
  - Higher processing throughput
- Achieved by duplicating the architectural state on each processor while sharing one set of processor execution resources
MP3 Encoding Overview (specifically in LAME)
- Break up the audio stream into frames (uniform chunks, typically ~1K)
- [Diagram: Audio Stream = Frame 1 | Frame 2 | Frame 3 | Frame 4]
- [Diagram: encoding stages applied to each frame: Read Frame, Psycho-Acoustic Perceptual Model, Analysis Filterbank, MDCT, Quantization, Huffman Encoding, Bitstream Encode]
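The frame split described on this slide can be sketched as follows. This is a minimal illustration in Python, not LAME's code: the function name `split_into_frames` is hypothetical, and the frame length of 1152 samples is the MPEG-1 Layer III value, assumed here to be what the slide's "~1K" refers to.

```python
from typing import List, Sequence

# Samples per MPEG-1 Layer III frame (assumed meaning of the slide's "~1K").
FRAME_SIZE = 1152


def split_into_frames(samples: Sequence[int],
                      frame_size: int = FRAME_SIZE) -> List[List[int]]:
    """Cut a PCM sample stream into uniform frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        # A short final chunk is padded with silence so every frame is uniform.
        frame.extend([0] * (frame_size - len(frame)))
        frames.append(frame)
    return frames
```

Each frame produced this way would then pass through the encoding stages listed on the slide.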
LAME MT – Intuitive Approach
- The intuitive approach: divide the frames between the two threads
- [Diagram: Frames 1-6 assigned to Thread 1 and Thread 2]
- An unbreakable dependence between frames, due to Huffman encoding, rules this out
- This is actually data decomposition
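The slide names Huffman encoding as the source of the dependence without elaborating. One plausible reading, assumed here and not stated in the talk, is MP3's bit reservoir: the bits a frame leaves unused are carried over to the next frame, so a frame's bit allowance is unknown until the previous frame has been coded. The toy model below (all names and numbers are hypothetical) shows why handing frames to threads independently changes the output.

```python
from typing import List, Tuple

BUDGET = 100  # toy per-frame bit budget (hypothetical number)


def encode_frame(bits_needed: int, reservoir: int) -> Tuple[int, int]:
    """Toy stand-in for quantization + Huffman coding of one frame.

    Returns (bits_used, new_reservoir). Unused bits are carried forward.
    """
    available = BUDGET + reservoir
    used = min(bits_needed, available)
    return used, available - used


def encode_sequential(demands: List[int]) -> List[int]:
    """Correct order: each frame sees the reservoir left by its predecessor."""
    reservoir, out = 0, []
    for need in demands:
        used, reservoir = encode_frame(need, reservoir)
        out.append(used)
    return out


def encode_independent(demands: List[int]) -> List[int]:
    """Naive data decomposition: every frame is coded with no carried state."""
    return [encode_frame(need, 0)[0] for need in demands]


demands = [60, 150, 90, 130]
print(encode_sequential(demands))   # frames 2 and 4 borrow bits saved earlier
print(encode_independent(demands))  # differs, so not 1:1 bit compatible
```

In this toy model the two results differ on exactly the frames that would have drawn on bits saved by their predecessors, which is the kind of mismatch a 1:1 bit-compatible encoder cannot tolerate.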
LAME MT – Functional Decomposition
- [Diagram: for Frames 1-6, the stages (Psycho-Acoustic, Read Frame, Analysis Filterbank, MDCT, Quantization, Huffman Encoding) are split between two threads]
- T1: floating-point intensive
- T2: integer intensive
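A functional decomposition of this kind can be sketched as a two-thread pipeline. The sketch assumes one particular split, which the flattened diagram does not make explicit: one thread runs the psycho-acoustic analysis ahead of time while the other runs the remaining, state-carrying stages in frame order. The stage functions are hypothetical placeholders, not LAME routines, and Python threads are used only to show the structure (the interpreter lock means this sketch will not show the speedups reported in the talk).

```python
import queue
import threading
from typing import List

_DONE = object()  # sentinel marking the end of the frame stream


def psycho_acoustic(frame: List[float]) -> float:
    """Placeholder for the floating-point-heavy analysis stage."""
    return sum(x * x for x in frame) / len(frame)


def quantize_and_code(frame: List[float], energy: float, state: int) -> int:
    """Placeholder for the stages that carry state from frame to frame."""
    return (state * 31 + int(energy * 1000) + len(frame)) % 65521


def encode_serial(frames: List[List[float]]) -> List[int]:
    """Reference single-threaded encoder."""
    out, state = [], 0
    for frame in frames:
        state = quantize_and_code(frame, psycho_acoustic(frame), state)
        out.append(state)
    return out


def encode_pipelined(frames: List[List[float]]) -> List[int]:
    """Two-thread version: analysis runs ahead, coding stays in frame order."""
    handoff = queue.Queue(maxsize=2)  # bounds how far the analysis runs ahead

    def analysis_thread() -> None:
        for frame in frames:
            handoff.put((frame, psycho_acoustic(frame)))
        handoff.put(_DONE)

    t1 = threading.Thread(target=analysis_thread)
    t1.start()
    out, state = [], 0
    while True:
        item = handoff.get()
        if item is _DONE:
            break
        frame, energy = item
        state = quantize_and_code(frame, energy, state)
        out.append(state)
    t1.join()
    return out
```

Because the state-carrying stages still see the frames strictly in order, `encode_pipelined(frames)` returns the same values as `encode_serial(frames)`, which mirrors the 1:1 compatibility requirement from the project goal.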
Results
Results due to Multi-Threading (CBR / VBR)
                              SMT Platform    SMP Platform
Using Microsoft’s Compiler    22% / 32%       38% / 62%
Using Intel’s Compiler 8.1    20% / 29%       44% / 59%
Results using Intel’s Compiler 8.1 (CBR / VBR)
                            SMT Platform    SMP Platform
LAME Original Code 3.97a    21% / 19%       22% / 17%
LAME MT Code                19% / 17%       28% / 15%
Overall Performance Results (CBR / VBR)
                                       SMT Platform    SMP Platform
LAME MT code + Intel’s Compiler 8.1    52% / 70%       78% / 109%
Remarks
- Architectural issues
  - Pitfall found in version 3.93: memory accesses to two different pages with the same offset
    - ~11% speedup achieved by fixing it
    - No longer relevant in later versions
  - No major architectural issues found in versions 3.94-3.97a
- Implemented a PNI version of the FFT
  - No significant gain achieved
- Overall, ~40 blocks of code were changed, all under #ifdef
Future work
Future Work
- Splitting the encoding process into more than two steps
- Reading frames in parallel
That's all Folks