Performance Tuning Team Chia-heng Tu June 30, 2009

Presentation transcript:

Summer projects: Performance Tuning Team, Chia-heng Tu, June 30, 2009

Optimization levels in general
System architecture:
  Design & source code level
  Compile level: Compiler
  Library level
  OS level
  Architecture level:
    Bus
    Processing Elements (ARM, PPC, etc.)
    Accelerators (DSPs, FPGA, ASICs, etc.)
    I/O Devices (UART, USB, LCD, etc.)
Image source: http://i.zdnet.com/blogs/android-architecture-485b.jpg

List of summer projects
Performance Evaluation of the CUDA programs on Multicore platforms (Compiler, Architecture, Parallel Computing, Performance Tools)
Establishing Heterogeneous Multicore Environment (QEMU)
  (System software) Integrate an existing DSP simulator
  (System software) Communication facility (MSG library) on the environment
  (Compiler) Write a DSP emulator
Performance Analysis Infrastructure (QEMU)
  (System software) Port PAPI onto QEMU (ARM processor)
  (Architecture) Add Hardware Performance Monitoring Events
  (Performance tool) Tracing tool library porting on QEMU
Embedded Development Platform (TI Davinci)
  Port Tracing tool onto TI Davinci platform
  Port MSG Library onto TI Davinci platform
  Integrate PAPI onto TI Davinci platform
Study of the Impact of CPU Architecture on Program Performance (Architecture, performance tools)
Memory opportunity MOEA Project

Performance Evaluation of the CUDA programs on Multicore platforms
Programming model vs. CPU architectures
Application Layer: Real Apps. (written in CUDA program model); Code translator; Real Apps. (Parallel C program); Cell compiler; Binaries (PPE+SPE)
OS Layer: Red Hat or Fedora 9 Linux
Platform Layer: Intel Nehalem Architecture vs. IBM Cell Broadband Engine Architecture
Image source: http://images.google.com/imgres?imgurl=http://www.hec.nasa.gov/news/gallery_images/cell.chip_diagram.jpg&imgrefurl=http://www.hec.nasa.gov/news/features/2008/cell.074208.html&usg=__l70_zIt-_yhYeYWFoYHwMepKDmg=&h=297&w=490&sz=29&hl=en&start=16&sig2=cZATgCqvqi631FdAOa2Zig&um=1&tbnid=li1pDgl39bwC6M:&tbnh=79&tbnw=130&prev=/images%3Fq%3DIBM%2BCell%2Bprocessor%26hl%3Den%26rlz%3D1B3GGGL_enTW176TW243%26sa%3DN%26um%3D1&ei=YYRISs_JMY2CkQXczejvCQ
Nehalem image is from: http://news.cnet.com/8301-13924_3-10008472-64.html
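
The "Code translator" box on this slide turns an application written in the CUDA programming model into a parallel C program. The slides do not say how the translator works, so what follows is only a minimal sketch of the general idea, assuming a simple vector-add kernel: the block and thread indices that CUDA supplies implicitly become explicit loop variables. The names `idx1_t`, `vec_add_kernel`, and `launch_vec_add` are hypothetical and not taken from the project.

```c
#include <stddef.h>

/* Hypothetical stand-in for CUDA's built-in one-dimensional index variables. */
typedef struct { unsigned x; } idx1_t;

/* Body of a CUDA-style kernel, with the implicit indices passed explicitly.
 * In CUDA the index would be blockIdx.x * blockDim.x + threadIdx.x. */
void vec_add_kernel(idx1_t block_idx, idx1_t block_dim, idx1_t thread_idx,
                    const float *a, const float *b, float *c, size_t n)
{
    size_t i = (size_t)block_idx.x * block_dim.x + thread_idx.x;
    if (i < n)              /* guard: the last block may be partly empty */
        c[i] = a[i] + b[i];
}

/* Replacement for a kernel launch: visit every (block, thread) pair.
 * Iterations of the outer loop are independent for this kernel, so a
 * translator could hand whole blocks to different cores. */
void launch_vec_add(unsigned grid_dim, unsigned block_dim,
                    const float *a, const float *b, float *c, size_t n)
{
    for (unsigned bx = 0; bx < grid_dim; ++bx) {
        for (unsigned tx = 0; tx < block_dim; ++tx) {
            idx1_t bi = { bx }, bd = { block_dim }, ti = { tx };
            vec_add_kernel(bi, bd, ti, a, b, c, n);
        }
    }
}
```

Calling `launch_vec_add(2, 4, a, b, c, 5)` covers five elements with two blocks of four threads; the three surplus threads are filtered out by the bounds check, as they would be on a GPU.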

Establishing Heterogeneous Multicore Environment
Integrate an existing DSP simulator
Build communication facility (MSG library) on the environment
Write a DSP emulator
Application Layer: Real Apps. (Crypto, Multimedia, etc.); ARM Binary; DSP Binary
Library Layer: High-level Communication Interface; Communication Library
OS Layer: OS (Linux)
Platform Layer (Virtual Platform): QEMU; ARM; Memory; Bus; I2C Bus; Accelerator (PAC DSP)
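
The slides name a communication facility (MSG library) between the ARM core and the DSP but do not show its interface. As an illustration only, here is a minimal sketch of one way such a facility could be structured: a fixed-size, one-directional mailbox with send and receive calls. Every name and size here (`mailbox_t`, `msg_send`, `msg_recv`, `MSG_SLOTS`, `MSG_MAX_LEN`) is an assumption, and the sketch is single-threaded; a real cross-core version would also need synchronization and a shared memory region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_SLOTS    8    /* hypothetical: messages the mailbox can hold */
#define MSG_MAX_LEN  32   /* hypothetical: payload bytes per message */

typedef struct {
    size_t len;
    unsigned char data[MSG_MAX_LEN];
} msg_t;

/* One-directional mailbox, e.g. ARM to DSP, implemented as a ring buffer. */
typedef struct {
    msg_t slots[MSG_SLOTS];
    size_t head;   /* next slot to read */
    size_t tail;   /* next slot to write */
    size_t count;  /* messages currently queued */
} mailbox_t;

void mbox_init(mailbox_t *mb)
{
    mb->head = mb->tail = mb->count = 0;
}

/* Queue a message; returns false if it is too long or the mailbox is full. */
bool msg_send(mailbox_t *mb, const void *buf, size_t len)
{
    if (len > MSG_MAX_LEN || mb->count == MSG_SLOTS)
        return false;
    msg_t *m = &mb->slots[mb->tail];
    m->len = len;
    memcpy(m->data, buf, len);
    mb->tail = (mb->tail + 1) % MSG_SLOTS;
    mb->count++;
    return true;
}

/* Dequeue the oldest message into buf (capacity cap bytes).
 * Returns false if the mailbox is empty or buf is too small. */
bool msg_recv(mailbox_t *mb, void *buf, size_t cap, size_t *len_out)
{
    if (mb->count == 0 || mb->slots[mb->head].len > cap)
        return false;
    const msg_t *m = &mb->slots[mb->head];
    memcpy(buf, m->data, m->len);
    *len_out = m->len;
    mb->head = (mb->head + 1) % MSG_SLOTS;
    mb->count--;
    return true;
}
```

Messages come out in the order they were sent, and both calls fail immediately instead of blocking, which keeps the sketch easy to run inside a simulator loop.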

Performance Analysis Infrastructure
Port Tracing tool library on QEMU (performance tools): frequency and time of function calls, and call graph
Integrate Performance Application Programming Interface (PAPI)
Add Hardware Performance Events
Application Layer: Real Apps. (Crypto, Multimedia, etc.); Source code instrumentor
PMU_dijkstra.c:
    int main(int argc, char **argv)
    {
        // The data structure recording the performance data
        struct perfctr_sum_ctrs before, after;
        // … Prolog: set up the environment.
        read_PMU(&before);
        dijkstra.c();   // placeholder on the slide for the original dijkstra.c code being measured
        read_PMU(&after);
        // … Epilog: dump the performance data (instruction counts)
        return 0;
    }
Library Layer: High-level Performance Analysis Interface; Tracing lib.; Performance Application Programming Interface Library
OS Layer: OS (Linux); Perfctr (PMU Driver)
Platform Layer (Virtual Platform): QEMU; ARM; cache miss rate, etc.; Bus; I2C Bus; Accelerator (PAC DSP ISS); Logical Time Stamp Counter (TSC); Performance Monitoring Unit (PMU); Timing Facility
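
The tracing tool on this slide reports the frequency and time of function calls. The slides do not show how it is implemented, so the following is only a minimal sketch of the bookkeeping such a tool needs, assuming manually placed enter/exit hooks and the standard `clock()` timer. The names `trace_entry_t`, `trace_lookup`, `trace_enter`, and `trace_exit` are hypothetical. The sketch does not build a call graph and does not handle recursive calls.

```c
#include <string.h>
#include <time.h>

#define TRACE_MAX_FUNCS 16   /* hypothetical table size */

typedef struct {
    const char *name;     /* function name (a string literal is expected) */
    unsigned long calls;  /* how many times the function was entered */
    clock_t total;        /* CPU time accumulated across completed calls */
    clock_t started;      /* timestamp of the call currently in progress */
} trace_entry_t;

static trace_entry_t trace_table[TRACE_MAX_FUNCS];
static int trace_used;

/* Find the entry for name, creating it on first use.
 * Returns NULL if the table is full. */
trace_entry_t *trace_lookup(const char *name)
{
    for (int i = 0; i < trace_used; ++i)
        if (strcmp(trace_table[i].name, name) == 0)
            return &trace_table[i];
    if (trace_used == TRACE_MAX_FUNCS)
        return NULL;
    trace_entry_t *e = &trace_table[trace_used++];
    e->name = name;
    e->calls = 0;
    e->total = 0;
    e->started = 0;
    return e;
}

/* Hook to place at the start of a traced function. */
void trace_enter(const char *name)
{
    trace_entry_t *e = trace_lookup(name);
    if (e) {
        e->calls++;
        e->started = clock();
    }
}

/* Hook to place before each return of a traced function. */
void trace_exit(const char *name)
{
    trace_entry_t *e = trace_lookup(name);
    if (e)
        e->total += clock() - e->started;
}
```

A source code instrumentor like the one on the slide could insert the two hooks automatically. After the run, `calls` gives the call frequency and `total / CLOCKS_PER_SEC` the time spent in each function.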

Embedded Development Platform (TI Davinci)
Port MSG Library onto TI Davinci platform
Port Tracing tool onto TI Davinci platform
Integrate PAPI onto TI Davinci platform
Application Layer: Real Apps. (Crypto, Multimedia, etc.)
Library Layer: Call graph; High-level Performance Analysis Interface; High-level Communication Interface; Communication Library; Tracing lib.; Performance Application Programming Interface Library
OS Layer: OS (Micro-kernel); OS (Micro-kernel)
Platform Layer (Virtual Platform): TI-Davinci; Bus; I2C Bus; ARM; C64x DSP; Performance Monitoring Unit (PMU)

Study of the Impact of CPU Architecture on Program Performance
Performance comparison of parallel programs on different multicore architectures
Impact factors: cache size, cache hierarchy, interconnection among cores, etc.
Application Layer: Real Apps. (Parallel C Programs)
OS Layer: Linux
Platform Layer: IBM Cell Broadband Engine Architecture; AMD Quad Core Architecture; Intel Nehalem Architecture
Image source: http://www.digital-daily.com/cpu/quad_core_opteron/
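
Cache size is the first impact factor listed on this slide. To make the effect concrete, here is a minimal sketch of a direct-mapped cache model that counts misses over an address trace. It is not part of the project; the function name `count_misses` and the direct-mapped organization are assumptions chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the misses of a direct-mapped cache over an address trace.
 * line_bytes must be nonzero and no larger than cache_bytes.
 * Returns -1 if memory for the model cannot be allocated. */
long count_misses(const uintptr_t *trace, size_t n,
                  size_t cache_bytes, size_t line_bytes)
{
    size_t lines = cache_bytes / line_bytes;
    uintptr_t *tags = malloc(lines * sizeof *tags);
    unsigned char *valid = calloc(lines, 1);
    if (!tags || !valid) {
        free(tags);
        free(valid);
        return -1;
    }

    long misses = 0;
    for (size_t i = 0; i < n; ++i) {
        uintptr_t block = trace[i] / line_bytes;  /* memory block number */
        size_t set = block % lines;               /* cache line it maps to */
        if (!valid[set] || tags[set] != block) {  /* cold or conflict miss */
            valid[set] = 1;
            tags[set] = block;
            ++misses;
        }
    }
    free(tags);
    free(valid);
    return misses;
}
```

For a trace that alternates between addresses 0 and 64 with 16-byte lines, a 64-byte cache maps both addresses to the same line and misses on every access, while a 128-byte cache keeps both resident and misses only on the first touch of each.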

Everyone is welcome to join us! Practical, system-wide, and up-to-date research projects.