Embedded Software Optimization for MP3 Decoder Implemented on RISC Core Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao Zhejiang University Information.

Embedded Software Optimization for MP3 Decoder Implemented on RISC Core
Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao
Zhejiang University, Information Science & Electronic Engineering Department, P.R. China
IEEE Transactions on Consumer Electronics, November 2004

Abstract
This paper proposes general software optimization techniques for processor-based embedded systems. They mainly comprise general optimization methods in a high-level language and software/hardware co-optimization in assembly language. These techniques are then applied to optimize our MP3 decoder, which is based on RISC32, a RISC core compatible with the MIPS I instruction set. The final optimized decoder requires 48 MIPS and 49 Kbytes of memory to decode a 128 Kbps, 44.1 kHz joint stereo MP3 stream in real time with a CPI of 1.15. Compared with the original decoding software, this is a performance increase of 46.7% and a memory space decrease of 38.8%.

Outline
- What's the problem?
- Related work
- Introduction to RISC32
- Embedded software optimization techniques
- MP3 decoding software optimization
- Experiment result
- Conclusion

What's the problem?
- Low cost and fast time to market are the top requirements for embedded systems
- Systems are developed around a microprocessor architecture through software and hardware co-design
- Making software run efficiently on the selected processor has become a main problem

Related work
- Many portable MP3 players adopt a DSP core, or a dual core comprising a DSP and a RISC processor

Introduction to RISC32
- 6 pipeline stages: instruction fetch (IF), instruction decoding (ID), execute (EX), data memory access (DM), data tag comparing (TC) and write back (WB)
- Compatible with the MIPS-I ISA
- Harvard architecture with separate 16 Kbyte ICache and 16 Kbyte DCache
- Direct-mapped, write-through cache strategy
- 16 Kbytes of on-chip RAM
- 200 MHz clock frequency
- 1 mW power dissipation per MHz

Embedded software optimization techniques (1)
- Embedded software optimization can be divided into two parts
  - Algorithm optimization in a high-level language: trade off computation load, the complexity of data transfer and the size of coefficients
  - Code optimization at the assembly level: take the features of the instruction set, micro-architecture and pipeline into account

Embedded software optimization techniques (2)

Embedded software optimization techniques (3)
- Phase 1
  - Analyze all the constraints of the embedded system
- Phase 2
  - Analyze the features of the specific processor
- Phase 3
  - Software optimization in C and in assembly code
  - No compiler for the embedded processor is yet good enough to meet the requirements identified in Phase 1
  - Embedded software must be closely tied to the target processor
- Phase 4
  - Software and hardware co-validation to confirm that the goal is reached

Software optimization in high-level language
- Software module partition and performance estimation
  - Divide the application software into small modules
  - Run them and collect the profile information
- Module algorithm optimization
  - Modify the algorithm to match the processor in use
- General optimization
  - Apply general optimizations

Software optimization at assembly level
- Resource statistics
  - Memory size of the object code
  - Execution MIPS
- Memory size optimization
- Assembly program structure optimization
  - Take the features of the instruction set, micro-architecture and pipeline into account

MP3 decode flow

Profiling of ISO reference decoder  Employing floating-point computation

Performance analysis  Integer computation-based

Algorithm Optimization in C-code level: IMDCT
- Using Britanak & Rao's algorithm

Algorithm Optimization in C-code level: subband synthesis
- Two sources of complexity
  - Matrix operation: can be computed by a 32-point DCT and some data copy operations
  - Window filter
- Using Lee's DCT algorithm

Algorithm Optimization in C-code level: zero value optimization
- Main idea
  - The result is always zero when the input value is zero
  - Set a tag to indicate the position where the zeros begin
- Example
  Without zero value optimization:
    for (i = 0; i < 576; i++) Output[i] = function[huf[i]];
  With zero value optimization:
    for (i = 0; i < Position_zero; i++) Output[i] = function[huf[i]];
    for (i = Position_zero; i < 576; i++) Output[i] = 0;
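The zero value optimization above can be sketched in C as follows. This is only an illustration under stated assumptions: `requantize`, `find_zero_position` and `requantize_granule` are hypothetical names, the per-sample function is a stand-in that merely maps 0 to 0, and a real decoder would most likely obtain the zero position directly from the Huffman decoding stage rather than by scanning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE_LEN 576  /* frequency lines per granule, as on the slide */

/* Hypothetical stand-in for the per-sample function; it maps 0 to 0. */
static int32_t requantize(int16_t huf)
{
    return (int32_t)huf * 3;
}

/* Index of the first sample of the trailing run of zeros (assumed helper). */
static size_t find_zero_position(const int16_t *huf, size_t n)
{
    size_t pos = n;
    while (pos > 0 && huf[pos - 1] == 0) {
        pos--;
    }
    return pos;
}

/* Process only the non-zero region and clear the rest in one call. */
static void requantize_granule(const int16_t *huf, int32_t *out,
                               size_t zero_pos)
{
    for (size_t i = 0; i < zero_pos; i++) {
        out[i] = requantize(huf[i]);
    }
    memset(out + zero_pos, 0, (GRANULE_LEN - zero_pos) * sizeof out[0]);
}
```

The saving grows with the length of the zero tail, since the second loop does no table lookup or arithmetic at all.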

Memory Size Optimization (1)
- The memory size of the MP3 decoder depends on
  - The number of coefficients, the bytes per coefficient and the data space used during decoding
- Decoding needs $|huf_i|^{4/3}$ and $2^x$ (where $x$ is not an integer)

Memory Size Optimization (2)
- The value of $huf_i$ varies from 0 to 8206
- 16 Kbytes are needed to store these coefficients if each coefficient takes 2 bytes
- Modify the computation
  - $|huf_i|^{4/3} = 16 \times |huf_i / 8|^{4/3}$
  - Set an upper bound on the table index: (huf_i > 255) ? 255 : huf_i
- Only 256 coefficients are needed, with imperceptible distortion
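A minimal sketch of the 256-entry table idea, assuming the decomposition is applied to values of 256 and above and the index is clamped at 255. The names `pow43_init`, `pow43_lookup` and `cube_root` are hypothetical, and the table holds `double` values purely for readability; the decoder described here is integer-based and would store fixed-point coefficients instead.

```c
#include <stdlib.h>

#define POW43_TABLE_SIZE 256

static double pow43_table[POW43_TABLE_SIZE];

/* Newton iteration for the cube root; avoids a dependency on libm. */
static double cube_root(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    double r = (x > 1.0) ? x : 1.0;
    for (int k = 0; k < 60; k++) {
        r = (2.0 * r + x / (r * r)) / 3.0;
    }
    return r;
}

/* Fill the table once: x^(4/3) = x * cbrt(x) for x = 0..255. */
static void pow43_init(void)
{
    for (int i = 0; i < POW43_TABLE_SIZE; i++) {
        pow43_table[i] = (double)i * cube_root((double)i);
    }
}

/* Approximate |huf|^(4/3) using only the 256-entry table. */
static double pow43_lookup(int huf)
{
    int mag = abs(huf);
    if (mag < POW43_TABLE_SIZE) {
        return pow43_table[mag];            /* exact table hit */
    }
    int idx = mag >> 3;                     /* |huf| / 8, truncated */
    if (idx > POW43_TABLE_SIZE - 1) {
        idx = POW43_TABLE_SIZE - 1;         /* upper bound on the index */
    }
    return 16.0 * pow43_table[idx];         /* 8^(4/3) = 16 */
}
```

Truncating `|huf| / 8` discards the low three bits, which is where the small distortion comes from; the clamp affects only magnitudes of 2048 and above.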

Memory Size Optimization (3)
- Another problem: $2^{\frac{1}{4}(global\_gain[gr] - 210)}$, with $0 \le global\_gain[gr] < 255$
- It can be decomposed as $2^N \cdot 2^{i/4}$, where $N$ is an integer and $i \in \{0, 1, 2, 3\}$
- $2^N$ can be realized by a shift
- Only 4 coefficients are needed
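The decomposition into a shift and a 4-entry table can be sketched as below. The Q16 fixed-point format, the function name `gain_scale_q16` and the table name are assumptions made for illustration; the slides do not state the number format the decoder actually uses.

```c
#include <stdint.h>

/* 2^(i/4) for i = 0..3 in Q16 fixed point (assumed format). */
static const uint32_t frac_pow2_q16[4] = {65536u, 77936u, 92682u, 110218u};

/*
 * Returns 2^((global_gain - 210) / 4) in Q16, for 0 <= global_gain < 255.
 * Writes the exponent as 4*N + i; adding 212 (a multiple of 4) keeps the
 * division and remainder on non-negative values.
 */
static uint32_t gain_scale_q16(unsigned global_gain)
{
    unsigned biased = global_gain + 2u;   /* (global_gain - 210) + 212 */
    int n = (int)(biased / 4u) - 53;      /* N, with 212 / 4 = 53 removed */
    unsigned i = biased % 4u;             /* index of the fractional part */
    uint32_t base = frac_pow2_q16[i];

    if (n >= 0) {
        return base << n;                 /* n <= 11 here, so no overflow */
    }
    if (n <= -32) {
        return 0;                         /* underflows the Q16 range */
    }
    return base >> (-n);
}
```

For example, `gain_scale_q16(210)` yields 65536 (1.0 in Q16) and `gain_scale_q16(214)` yields 131072 (2.0). Very small gains underflow to zero in this format, so a real implementation would probably fold the shift into the sample scaling instead of materializing the factor.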

CPI Optimization
- $CPI = CPI_{ideal} + CPI_{stall}$
  - The goal is to eliminate $CPI_{stall}$
- The main causes of pipeline stalls
  - The JBU (Jump Branch Unit) is placed at the EX stage
  - Data cache access is split into the DM and TC stages
  - Cache miss penalty
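As a rough worked example of this decomposition, using the figures from the abstract (CPI of 1.15, 48 MIPS) and assuming $CPI_{ideal} = 1$ for a single-issue pipeline and that MIPS counts executed instructions (neither is stated on the slides):

```latex
\[
CPI_{stall} = CPI - CPI_{ideal} = 1.15 - 1 = 0.15
\]
\[
f_{required} = \text{MIPS} \times CPI = 48 \times 1.15 \approx 55.2\ \text{MHz}
\]
```

Under these assumptions the remaining stall cycles cost about 15% over the ideal, and real-time decoding would need roughly 55 MHz of the 200 MHz clock.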

CPI Optimization (cont.)
- Solutions
  - Reorder the assembly program to eliminate control hazards and load stalls using delay slot techniques
  - Reduce program and data space
  - Rearrange data space

Experiment result

Conclusion
- Proposed embedded software code optimization techniques
- Optimized the MP3 decoding program on RISC32 according to these techniques
  - Algorithm optimization in C to reduce computational complexity
  - Target code optimization to achieve a low CPI and a small memory size on RISC32