Performance Tuning Team Chia-heng Tu June 30, 2009


1 Performance Tuning Team Chia-heng Tu June 30, 2009
Summer projects

2 Optimization levels in General System architecture
Optimization levels: Design & Source code level; Compile level; Library level; OS level; Architecture level
System components shown in the diagram: Compiler; Bus; Processing Elements (ARM, PPC, etc.); Accelerators (DSPs, FPGAs, ASICs, etc.); I/O Devices (UART, USB, LCD, etc.)

3 List of summer projects
Performance Evaluation of the CUDA programs on Multicore platforms (Compiler, Architecture, Parallel Computing, Performance Tools)
Establishing Heterogeneous Multicore Environment (QEMU)
    (System software) Integrate an existing DSP simulator
    (System software) Communication facility (MSG library) on the environment
    (Compiler) Write a DSP emulator
Performance Analysis Infrastructure (QEMU)
    (System software) Port PAPI onto QEMU (ARM processor)
    (Architecture) Add Hardware Performance Monitoring Events
    (Performance tool) Tracing tool library porting on QEMU
Embedded Development Platform (TI Davinci)
    Port Tracing tool onto TI Davinci platform
    Port MSG Library onto TI Davinci platform
    Integrate PAPI onto TI Davinci platform
Study of the Impact of CPU Architecture on Program Performance (Architecture, performance tools)
Memory opportunity
MOEA Project

4 Performance Evaluation of the CUDA programs on Multicore platforms
Programming model vs. CPU architectures
Application Layer: Real Apps. (written in CUDA program model); Code translator; Real Apps. (Parallel C program); Cell compiler; Binaries (PPE+SPE)
OS Layer: Red Hat or Fedora 9 Linux
Platform Layer: Intel Nehalem Architecture vs. IBM Cell Broadband Engine Architecture
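The slide does not say how the code translator works. As a rough illustration only, one common way to turn a CUDA kernel into plain C is to make the block and thread indices explicit loop variables. The function names below (vec_add_body, vec_add_launch) are hypothetical and are not taken from the project.

```c
#include <assert.h>
#include <stddef.h>

/*
 * CUDA original, shown for reference only:
 *
 *   __global__ void vec_add(const float *x, const float *y, float *out, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) out[i] = x[i] + y[i];
 *   }
 */

/* Kernel body with the CUDA built-in indices passed in as parameters. */
void vec_add_body(int block_idx, int block_dim, int thread_idx,
                  const float *x, const float *y, float *out, int n) {
    int i = block_idx * block_dim + thread_idx;
    if (i < n) {
        out[i] = x[i] + y[i];
    }
}

/* Sequential stand-in for a kernel launch with grid_dim blocks of
 * block_dim threads. The outer loop iterations are independent, so a
 * parallel C version could hand each block to a different core. */
void vec_add_launch(int grid_dim, int block_dim,
                    const float *x, const float *y, float *out, int n) {
    assert(grid_dim > 0 && block_dim > 0);
    assert(x != NULL && y != NULL && out != NULL);
    for (int blk = 0; blk < grid_dim; blk++) {
        for (int thr = 0; thr < block_dim; thr++) {
            vec_add_body(blk, block_dim, thr, x, y, out, n);
        }
    }
}
```

Calling vec_add_launch(2, 4, x, y, out, 5) covers five elements with two blocks of four threads; the three surplus threads fail the bounds check and do nothing, as they would in the CUDA version.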

5 Establishing Heterogeneous Multicore Environment
Tasks: Integrate an existing DSP simulator; Build communication facility (MSG library) on the environment; Write a DSP emulator
Application Layer: Real Apps. (Crypto, Multimedia, etc.); ARM Binary; DSP Binary
Library Layer: High-level Communication Interface; Communication Library
OS Layer: OS (Linux)
Platform Layer (Virtual Platform): QEMU; ARM; Memory; Bus; I2C Bus; Accelerator (PAC DSP)
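The slides do not describe the MSG library's interface. The sketch below only illustrates the general idea of a communication facility between two processors: a fixed-size message queue that one side fills and the other drains. All names (msg_queue_t, msg_send, msg_recv) and sizes are hypothetical. A real ARM-to-DSP version would also need memory barriers and cache management, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_SLOTS   8u   /* hypothetical queue depth (power of two) */
#define MSG_PAYLOAD 32u  /* hypothetical maximum message size in bytes */

typedef struct {
    uint32_t len;
    uint8_t  data[MSG_PAYLOAD];
} msg_t;

/* Single-producer, single-consumer ring buffer. head counts messages
 * written, tail counts messages read; both increase monotonically. */
typedef struct {
    msg_t    slots[MSG_SLOTS];
    uint32_t head;
    uint32_t tail;
} msg_queue_t;

void msg_queue_init(msg_queue_t *q) {
    assert(q != NULL);
    q->head = 0;
    q->tail = 0;
}

/* Copy len bytes into the queue. Returns false if the message is too
 * large or the queue is full. */
bool msg_send(msg_queue_t *q, const void *buf, uint32_t len) {
    assert(q != NULL && buf != NULL);
    if (len > MSG_PAYLOAD || q->head - q->tail == MSG_SLOTS) {
        return false;
    }
    msg_t *m = &q->slots[q->head % MSG_SLOTS];
    m->len = len;
    memcpy(m->data, buf, len);
    q->head++;
    return true;
}

/* Copy the oldest message into buf (capacity cap). Returns false if the
 * queue is empty or buf is too small. */
bool msg_recv(msg_queue_t *q, void *buf, uint32_t cap, uint32_t *len) {
    assert(q != NULL && buf != NULL && len != NULL);
    if (q->head == q->tail) {
        return false;
    }
    const msg_t *m = &q->slots[q->tail % MSG_SLOTS];
    if (m->len > cap) {
        return false;
    }
    memcpy(buf, m->data, m->len);
    *len = m->len;
    q->tail++;
    return true;
}
```

In this sketch the sender would call msg_send(&q, "ping", 5) and the receiver would poll msg_recv until it returns true.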

6 Performance Analysis Infrastructure
Tasks: Port Tracing tool library on QEMU (performance tools): frequency and time of function calls, and call graph; Integrate Performance Application Programming Interface (PAPI); Add Hardware Performance Events
Application Layer: Real Apps. (Crypto, Multimedia, etc.); Source code instrumentor (PMU_dijkstra.c):
    int main(int argc, char **argv) {
        // The data structure recording the performance data
        struct perfctr_sum_ctrs before, after;
        // … prolog: setup the environment.
        read_PMU(&before);
        dijkstra.c();
        read_PMU(&after);
        // … Epilog: dump the performance data (Instruction Counts)
        return 0;
    }
Library Layer: High-level Performance Analysis Interface; Tracing lib.; Performance Application Programming Interface Library
OS Layer: OS (Linux); Perfctr (PMU Driver)
Platform Layer (Virtual Platform): QEMU; ARM (cache miss rate, etc.); Performance Monitoring Unit (PMU); Timing Facility; Logical Time Stamp Counter (TSC); Bus; I2C Bus; Accelerator (PAC DSP ISS)
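The instrumented main on this slide reads the counters before and after the code under test and reports the difference. A minimal, self-contained sketch of that pattern follows. It replaces the real PMU with a software counter so that it runs anywhere; pmu_sample_t, read_pmu_sim, g_sim_instructions and workload are hypothetical stand-ins, not the perfctr or PAPI interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter snapshot; the slide uses struct perfctr_sum_ctrs. */
typedef struct {
    uint64_t instructions;
} pmu_sample_t;

/* Software stand-in for a hardware event counter (assumption). */
uint64_t g_sim_instructions = 0;

/* Take a snapshot of the simulated counter. */
void read_pmu_sim(pmu_sample_t *s) {
    assert(s != NULL);
    s->instructions = g_sim_instructions;
}

/* Code under test: sums 0..n-1 and bumps the simulated counter once
 * per loop iteration. */
int workload(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
        g_sim_instructions++;
    }
    return sum;
}

/* Bracket the workload with two counter reads and return the delta,
 * mirroring the read_PMU(&before) / read_PMU(&after) pair on the slide. */
uint64_t measure_workload(int n, int *result) {
    pmu_sample_t before, after;
    assert(result != NULL);
    read_pmu_sim(&before);
    *result = workload(n);
    read_pmu_sim(&after);
    return after.instructions - before.instructions;
}
```

Reporting the difference between the two snapshots, rather than the raw counter value, keeps events from the prolog and epilog out of the measurement.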

7 Embedded Development Platform (TI Davinci)
Tasks: Port MSG Library onto TI Davinci platform; Port Tracing tool onto TI Davinci platform; Integrate PAPI onto TI Davinci platform
Application Layer: Real Apps. (Crypto, Multimedia, etc.)
Library Layer: High-level Performance Analysis Interface; High-level Communication Interface; Communication Library; Tracing lib.; Call graph; Performance Application Programming Interface Library
OS Layer: OS (Micro-kernel); OS (Micro-kernel)
Platform Layer (Virtual Platform): TI-Davinci; ARM; C64x DSP; Performance Monitoring Unit (PMU); Bus; I2C Bus

8 Study of the impact of CPU Architecture on Program Performance
Performance comparison of parallel programs on different multicore architectures
Impact factors: cache size, cache hierarchy, interconnection among cores, etc.
Application Layer: Real Apps. (Parallel C Programs)
OS Layer: Linux
Platform Layer: IBM Cell Broadband Engine Architecture; AMD Quad Core Architecture; Intel Nehalem Architecture
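One simple way to see an impact factor such as cache size is to time the same strided walk over buffers of different sizes and compare the cost per access. The sketch below is an assumption about how such a probe could look, not the project's actual benchmark; touch_buffer and seconds_per_access are hypothetical names, and clock() is only a coarse timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read every stride-th element of buf, passes times. The sum is
 * returned so the compiler cannot discard the loop. */
uint64_t touch_buffer(const int *buf, size_t n, size_t stride, int passes) {
    assert(buf != NULL && stride > 0);
    uint64_t sum = 0;
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < n; i += stride) {
            sum += (uint64_t)buf[i];
        }
    }
    return sum;
}

/* Average CPU seconds per access for a working set of n ints.
 * Returns -1.0 on invalid arguments or allocation failure. */
double seconds_per_access(size_t n, size_t stride, int passes,
                          uint64_t *checksum) {
    if (n == 0 || stride == 0 || passes <= 0) {
        return -1.0;
    }
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL) {
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = 1;
    }
    clock_t start = clock();
    uint64_t sum = touch_buffer(buf, n, stride, passes);
    clock_t end = clock();
    free(buf);
    if (checksum != NULL) {
        *checksum = sum;
    }
    size_t touched = (n + stride - 1) / stride;  /* accesses per pass */
    double elapsed = (double)(end - start) / (double)CLOCKS_PER_SEC;
    return elapsed / ((double)touched * (double)passes);
}
```

Running seconds_per_access with n growing from a few kilobytes to many megabytes would be expected to show the per-access cost rising as the working set outgrows each cache level; the exact shape depends on the machine and on hardware prefetching.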

9 Everyone is Welcome to join us!!!
Practical, system-wide, and up-to-date research projects.

