© Copyright, HiPERiSM Consulting, LLC, George Delic, Ph.D. HiPERiSM Consulting, LLC (919) 484-9803 P.O. Box 569, Chapel Hill, NC.



CHOOSING A COMPILER FOR AQM APPLICATIONS ON LINUX
George Delic, Ph.D.
Models-3 User’s Workshop, October 27-29, 2003, RTP, NC

Overview
1. Introduction
2. Choice of Hardware
3. Choice of Compilers
4. Choice of Benchmarks
5. Comparing Execution Times
6. Evaluation of SSE Results
7. Tests for AQMs
8. Conclusions

Introduction
▪ Motivation
  ▪ AQMs are migrating to COTS hardware
  ▪ Linux is preferred
  ▪ Rich choice of compilers is now available
  ▪ Need to learn about portability issues
▪ What is known about compilers for IA-32?
  ▪ CMAQ releases switch compilers without comment
  ▪ Where is the analysis of differences in:
    - Performance?
    - Numerical accuracy & stability?
    - Portability problems?

Choice of Hardware & Compilers
▪ Hardware
  ▪ Intel Pentium III (933 MHz, dual processor) with SSE extensions and 256 KB L2 cache
  ▪ Linux kernel
▪ Fortran compilers for IA-32
  ▪ Absoft 8.0
  ▪ Intel 7.1
  ▪ Lahey 5.6
  ▪ Portland CDK 4.0

Choice of Benchmarks
▪ Kallman Integer and Logical Algorithm
  ▪ Uses only integer and logical operations with bit intrinsics
  ▪ Negligible I/O and memory operations
  ▪ Six cases with problem size scaling
▪ Stommel Ocean Model (SOM): single-precision (sp) floating-point algorithm
  ▪ Jacobi iteration sweep over 2-D physical domain
  ▪ Regular loops optimal for testing vectorization
  ▪ Six cases in the range N = 2 x 10^3 to 7 x 10^3, with N^2 = 4 to 49 million data points
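To make the SOM bullet concrete, here is a minimal sketch of a Jacobi sweep over a 2-D grid. It is not the Stommel Ocean Model code: it assumes a generic 5-point stencil for laplacian(u) = f with fixed boundary values, and the names `jacobi_sweep`, `solve`, and `grid_points` are hypothetical. The point is the shape of the loop nest: each update reads only the old array, so the inner loop carries no dependence and is a natural target for vectorization.

```python
def grid_points(n):
    """Number of data points on an n x n grid."""
    return n * n


def jacobi_sweep(u, f, h):
    """Apply one Jacobi sweep of the 5-point stencil for laplacian(u) = f.

    Returns the updated grid and the largest pointwise change.
    Boundary rows and columns are treated as fixed values.
    (Assumed generic stencil, not the SOM equations.)
    """
    n = len(u)
    new = [row[:] for row in u]  # copy so every update reads only old values
    max_change = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                - h * h * f[i][j]
            )
            max_change = max(max_change, abs(new[i][j] - u[i][j]))
    return new, max_change


def solve(n, tol=1e-8, max_sweeps=10_000):
    """Repeat sweeps on an n x n grid with boundary value 1 and zero forcing."""
    h = 1.0 / (n - 1)
    u = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
         for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        u, change = jacobi_sweep(u, f, h)
        if change < tol:
            break
    return u, sweep


if __name__ == "__main__":
    u, sweeps = solve(8)
    print(f"stopped after {sweeps} sweeps, centre value {u[4][4]:.6f}")
    # Problem sizes quoted on the slide: N = 2000 and N = 7000.
    print(grid_points(2000), grid_points(7000))
```

The `grid_points` helper only confirms the slide's arithmetic: N = 2 x 10^3 gives 4 million points and N = 7 x 10^3 gives 49 million.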

Choice of Benchmarks (cont.)
▪ Princeton Ocean Model (POM): double-precision (dp) floating-point algorithm
  ▪ Example of “real-world” code that is numerically unstable with sp arithmetic!
  ▪ 500+ vectorizable loops to exercise compilers
  ▪ 9 procedures account for 85% of CPU time
  ▪ 2-day simulation for two cases:
    - Small problem: 65 x 49 x 21
    - Large problem: 100 x 40 x 15
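The remark about sp arithmetic can be illustrated with a toy accumulation. This sketch has nothing to do with POM itself; it only shows how rounding every intermediate result to single precision lets error build up over many operations, which is one mechanism by which a long simulation can drift. The helper `to_float32` is hypothetical and emulates sp by a round trip through the standard `struct` module.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times.

    With single_precision=True the step and every partial sum are rounded
    to sp, mimicking a REAL*4 accumulator; otherwise dp is used throughout.
    """
    total = 0.0
    if single_precision:
        step = to_float32(step)
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    exact = 100_000.0  # 0.1 added one million times
    sp = accumulate(0.1, 1_000_000, single_precision=True)
    dp = accumulate(0.1, 1_000_000, single_precision=False)
    print(f"sp error: {abs(sp - exact):.6g}")
    print(f"dp error: {abs(dp - exact):.6g}")
```

Running it shows the sp error exceeding the dp error by many orders of magnitude, which is the kind of gap that promotion switches such as `-r8` are meant to close.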

Comparing Execution Times: Kallman compiler switches
Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -O3 -ffixed
Intel 7.1               ifc -O3 -tpp6 -FI
Lahey 5.6               lf95 -tpp -fix
Portland 4.0            pgf90 -fast

Comparing Execution Times: Kallman (seconds)
N    Absoft    Intel    Lahey    Portland

Comparing Execution Times: Kallman (log10 seconds)

Comparing Execution Times: Kallman (ratio to Absoft time)

Comparing Execution Times: SOM (POM) compiler switches (without SSE)
Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -s -cpu:p6 -O3 (-N113) -ffixed
Intel 7.1               ifc -O3 (-r8) -tpp6 -FI
Lahey 5.6               lf95 -tpp (-dbl) -fix
Portland 4.0            pgf90 -fast (-r8) -Mvect

Comparing Execution Times: SOM without SSE (seconds)
N    Absoft    Intel    Lahey    Portland

Comparing Execution Times: SOM (without SSE)

Statistics for four compilers: SOM (without SSE)

Comparing Execution Times: POM (without SSE)
Case    Absoft    Intel    Lahey    Portland

Statistics for four compilers: Variability vs. problem size

Evaluation of SSE Results
▪ IA-32 Hardware
  ▪ Intel Pentium III+ supports Streaming Single-Instruction-Multiple-Data Extensions (SSE)
  ▪ Linux kernel supports SSE
▪ Fortran compilers that enable SSE
  ▪ Intel 7.1
  ▪ Portland CDK 4.0

Comparing Execution Times: SOM (POM) compiler switches (with SSE)
Compiler and version    Compiler command and selected switches
Intel 7.1               ifc -O3 -xK (-r8) -tpp6 -FI
Portland 4.0            pgf90 -fast (-r8) -Mvect=sse

Comparing Execution Times: SOM (with SSE)

Comparing Execution Times: POM (with SSE)

Evaluation of SSE Results
▪ Fortran compilers with SOM (sp)
  ▪ Intel 7.1: average speedup of 1.44
  ▪ Portland CDK 4.0: average speedup of 1.70
▪ Fortran compilers with POM (dp)
  ▪ Intel 7.1: average speedup of 1.25
  ▪ Portland CDK 4.0: average speedup of 1.19
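As a reading aid, the speedup figures above can be converted into the fraction of run time saved. The sketch below assumes the usual definition, speedup = (time without SSE) / (time with SSE), which the slides do not state explicitly. The function name `time_saved_fraction` is hypothetical. Because the quoted values are averages over several cases, the result is only indicative for any single run.

```python
def time_saved_fraction(speedup):
    """Fraction of the baseline run time removed by a given speedup factor.

    Assumes speedup = baseline_time / optimized_time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Average speedups quoted on the slide, keyed by (benchmark, compiler).
REPORTED_SPEEDUPS = {
    ("SOM (sp)", "Intel 7.1"): 1.44,
    ("SOM (sp)", "Portland CDK 4.0"): 1.70,
    ("POM (dp)", "Intel 7.1"): 1.25,
    ("POM (dp)", "Portland CDK 4.0"): 1.19,
}

if __name__ == "__main__":
    for (benchmark, compiler), s in REPORTED_SPEEDUPS.items():
        saved = 100 * time_saved_fraction(s)
        print(f"{benchmark:9s} {compiler:17s} speedup {s:.2f} -> {saved:.1f}% less run time")
```

Under that assumption, a speedup of 1.25 corresponds to a 20% shorter run, and the sp benchmark benefits noticeably more from SSE than the dp one for both compilers.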

Tests for AQMs
Next steps for CMAQ with four compilers:
▪ Report on portability issues
▪ Re-compilation of all libraries
▪ Performance instrumentation & analysis
▪ Numerical & stability analysis
▪ OpenMP performance study
Please propose scenarios worth using for these tests!

Conclusions
▪ Hardware: COTS is the way to go but ...
▪ Linux: Operating System is popular but ...
▪ Programming Environment: rich in choices
▪ Consequences for AQM: the combination of hardware, Linux, and programming environment needs careful ongoing evaluation.
HiPERiSM is ready for this task!

HiPERiSM’s URL
Talk to us about your requirements