What Programming Language/Compiler Researchers Should Know about Computer Architecture
Lizy Kurian John
Department of Electrical and Computer Engineering
The University of Texas at Austin

Somebody once said, “Computers are dumb actors and compilers/programmers are the master playwrights.”

Computer Architecture Basics
- ISAs
- RISC vs. CISC
- Assembly language coding
- Datapath (ALU) and controller
- Pipelining
- Caches
- Out-of-order execution
Reference: Hennessy and Patterson architecture books

Basics
- ILP
- DLP
- TLP
- Massive parallelism
- SIMD/MIMD
- VLIW
- Performance and power metrics
References: Hennessy and Patterson architecture books; ASPLOS, ISCA, MICRO, HPCA

The Bottom Line
- Programming language choice affects performance and power (e.g., Java).
- Compilers affect performance and power.

A Java Hardware Interpreter
- Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001)
- This technique was used by Nazomi Communications and Parthus (Chicory Systems)
[Figure: bytecodes from a Java class file enter the fetch stage, a hardware bytecode translator converts them to native machine instructions, and the decode and execute stages then run them as if they came from a native executable.]
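For contrast, here is a minimal sketch of the software dispatch loop that such a hardware translator effectively removes from the execution path. The opcode set, encoding, and stack layout below are invented for illustration; they are not real JVM bytecodes and not the Hard-Int design.

    #include <stdint.h>

    /* Hypothetical opcodes -- invented for illustration, not real JVM bytecodes. */
    enum { OP_HALT = 0x00, OP_ICONST = 0x01, OP_IADD = 0x02 };

    /* A software interpreter pays this fetch/decode/dispatch cost for every
       bytecode; a hardware bytecode translator folds that work into the
       processor's fetch stage and emits native instructions directly. */
    int interpret(const uint8_t *code, int32_t *stack)
    {
        int sp = 0;                          /* operand-stack pointer */
        int pc = 0;
        for (;;) {
            uint8_t op = code[pc++];         /* fetch */
            switch (op) {                    /* decode + dispatch */
            case OP_ICONST: stack[sp++] = (int8_t)code[pc++]; break;  /* push immediate */
            case OP_IADD:   sp--; stack[sp - 1] += stack[sp]; break;  /* pop two, push sum */
            case OP_HALT:   return stack[sp - 1];                     /* result on top */
            }
        }
    }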

Hard-Int Performance
- Hard-Int performs consistently better than the software interpreter.
- In JIT mode, it gives a significant performance boost in 4 of 5 applications.

Compiler and Power
[Figure: a data dependence graph (DDG) over operations A-F, scheduled two different ways across cycles 1-4. One schedule has peak power = 3 and energy = 6; the other spreads the work out, giving peak power = 2 with the same energy = 6.]
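A hypothetical numerical example (my own; it ignores the actual dependence edges in the slide's DDG) of how rescheduling changes peak power but not energy, assuming each operation issued in a cycle costs one power unit:

    Schedule 1: cycle 1: A B C   cycle 2: D E    cycle 3: F     cycle 4: (idle)
                per-cycle power = 3, 2, 1, 0  ->  peak power = 3, energy = 3+2+1+0 = 6
    Schedule 2: cycle 1: A B     cycle 2: C D    cycle 3: E F   cycle 4: (idle)
                per-cycle power = 2, 2, 2, 0  ->  peak power = 2, energy = 2+2+2+0 = 6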

Valluri et al., 2001 HPCA workshop
- Quantitative study
- Examined the influence of state-of-the-art optimizations on processor energy and power
- Optimizations studied:
  - Standard -O1 to -O4 of DEC Alpha's cc compiler
  - Four individual optimizations: simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global scheduling

Standard Optimizations on Power

Somebody once said, “Computers are dumb actors and compilers/programmers are the master playwrights.”

A large part of a modern out-of-order processor is hardware that could have been eliminated if a good compiler existed.

Let me get more arrogant: a large part of modern out-of-order processors was designed because computer architects thought compiler writers could not do a good job.

Value Prediction
- It is a slap in your face. (Shen and Lipasti)

Value Locality
- The likelihood that an instruction's computed result, or a similar predictable result, will recur soon
- Observation: a limited set of unique values constitutes the majority of values produced and consumed during execution
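A minimal sketch (my illustration, not the hardware described by Shen and Lipasti or any shipped design) of a last-value predictor that exploits this locality: a small table indexed by the low bits of the instruction's PC remembers the last value each instruction produced, along with a saturating confidence counter.

    #include <stdint.h>
    #include <stdbool.h>

    #define VP_ENTRIES 1024   /* table size -- an arbitrary choice for illustration */

    typedef struct {
        uint64_t last_value;  /* value produced the last time this PC executed */
        uint8_t  confidence;  /* 2-bit saturating counter */
    } vp_entry_t;

    static vp_entry_t table[VP_ENTRIES];

    /* Predict only when confidence is saturated. */
    bool vp_predict(uint64_t pc, uint64_t *pred)
    {
        vp_entry_t *e = &table[pc % VP_ENTRIES];
        *pred = e->last_value;
        return e->confidence == 3;
    }

    /* Train with the actual result once the instruction completes. */
    void vp_train(uint64_t pc, uint64_t actual)
    {
        vp_entry_t *e = &table[pc % VP_ENTRIES];
        if (actual == e->last_value) {
            if (e->confidence < 3) e->confidence++;
        } else {
            e->last_value = actual;
            e->confidence = 0;
        }
    }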

Load Value Locality

Causes of Value Locality
- Data redundancy: many zeros, sparse matrices, white space in files, empty cells in spreadsheets
- Program constants
- Computed branches: the base address of a jump table is a run-time constant
- Virtual function calls: the code that loads the function pointer often loads a constant
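A small C illustration (my example, not from the talk) of the last two causes: the dense switch typically compiles to an indirect jump through a table whose base address never changes, and the repeated indirect call loads the same function pointer on every iteration when all elements share one "vtable".

    #include <stddef.h>

    typedef struct shape { double (*area)(const struct shape *); double a, b; } shape;

    double total_area(const shape *shapes[], size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            /* If every element points to the same area function, the
               function-pointer load here yields the same value each time. */
            sum += shapes[i]->area(shapes[i]);
        return sum;
    }

    int classify(int opcode)
    {
        switch (opcode) {        /* a dense switch usually becomes an indirect
                                    jump through a table whose base address is
                                    a run-time constant */
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        default: return -1;
        }
    }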

Causes of Value Locality (continued)
- Memory alias resolution: the compiler conservatively generates code that may contain stores that alias with loads
- Register spill code: stores followed by loads of the same values
- Convergent algorithms: parts of the algorithm converge before global convergence
- Polling algorithms
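A sketch (illustrative C of my own) of the alias-resolution cause: because dst[i] might overwrite *len, a conservative compiler reloads *len on every iteration, and each reload usually returns the same value, which a value predictor can exploit.

    /* Without restrict, the compiler must assume the store to dst[i] could
       modify *len, so *len is reloaded every iteration -- a highly
       value-local load. */
    void scale(int *dst, const int *src, const int *len)
    {
        for (int i = 0; i < *len; i++)
            dst[i] = 2 * src[i];
    }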

Two Extremist Views
- Anything that can be done in hardware should be done in hardware.
- Anything that can be done in software should be done in software.

What do we need?
- The dumb actor, or
- The defiant actor, who pays very little attention to the script?

Challenging All Compiler Writers
- The last 15 years were the defiant actor's era. What about the next 15?
- TLP, multithreading, parallelizing compilers: it's time for a lot more dumb acting from the architect's side, and it's time for some good scriptwriting from the compiler writer's side.

BACKUP

Compiler Optimizations
- cc: the native C compiler on DEC Alpha running the OSF/1 operating system
- gcc: used to study the effect of individual optimizations

Standard Optimization Levels on cc
- -O0: no optimizations performed
- -O1: local optimizations such as CSE, copy propagation, induction variable elimination (IVE), etc.
- -O2: inline expansion of static procedures and global optimizations such as loop unrolling and instruction scheduling
- -O3: inline expansion of global procedures
- -O4: software pipelining, loop vectorization, etc.
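A small before/after sketch (mine, not from the study) of the kind of local optimization performed at -O1: common subexpression elimination reuses the a*b computation, and copy propagation removes the temporary copy.

    /* Before: a*b is computed twice, and y is just a copy of t. */
    int before(int a, int b, int c)
    {
        int x = a * b + c;
        int t = a * b - c;
        int y = t;
        return x * y;
    }

    /* After CSE + copy propagation: one multiply, no redundant copy. */
    int after(int a, int b, int c)
    {
        int ab = a * b;    /* common subexpression computed once */
        return (ab + c) * (ab - c);
    }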

Standard Optimization Levels on gcc
- -O0: no optimizations performed
- -O1: local optimizations such as CSE, copy propagation, dead-code elimination, etc.
- -O2: aggressive instruction scheduling
- -O3: inlining of procedures
Note:
- Almost the same optimizations appear in each level of cc and gcc.
- In cc and gcc, the optimizations that increase ILP are in levels -O2, -O3, and -O4.
- cc was used wherever possible; gcc was used where specific hooks were required.

Individual Optimizations
- Four gcc optimizations, each applied on top of -O1
- -fschedule-insns: local register allocation followed by basic-block list scheduling
- -fschedule-insns2: postpass scheduling
- -finline-functions: integrates all simple functions into their callers
- -funroll-loops: performs loop unrolling
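As a concrete illustration of the last flag (hand-unrolled here; -funroll-loops performs an equivalent transformation automatically, with details varying by gcc version), unrolling by four reduces branch and loop-counter overhead but lengthens the code and tends to raise IPC:

    /* Original loop. */
    void saxpy(float *y, const float *x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }

    /* Unrolled by 4; the remainder loop handles trip counts
       that are not a multiple of 4. */
    void saxpy_unrolled(float *y, const float *x, float a, int n)
    {
        int i = 0;
        for (; i + 3 < n; i += 4) {
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
        for (; i < n; i++)       /* remainder */
            y[i] += a * x[i];
    }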

Some Observations
- Energy consumption falls when the number of instructions is reduced; i.e., when the total work done is less, energy is less.
- Power dissipation is directly proportional to IPC.
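A back-of-the-envelope model (a simplified first-order view, not the paper's measurement methodology) connects these two observations. Assuming roughly constant energy per instruction and clock frequency f:

    E \approx N_{inst} \cdot E_{per\text{-}inst}, \qquad
    P_{avg} = \frac{E}{T} \approx \frac{N_{inst} \cdot E_{per\text{-}inst}}{N_{inst} / (IPC \cdot f)} = E_{per\text{-}inst} \cdot IPC \cdot f

So energy tracks the dynamic instruction count, while average power scales with IPC.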

Observations (continued)
- Function inlining was found to be good for both power and energy.
- Loop unrolling was found to be good for energy consumption but bad for power dissipation.

MMX/SIMD
- Automatic use of SIMD ISAs is still difficult 10+ years after the introduction of MMX.
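A small C example (mine) of why automatic SIMD use is hard: the first loop is straightforward to vectorize because restrict rules out aliasing, while in the second loop the possible overlap between c and the inputs, plus the conditional update inside the loop, often keeps the compiler scalar unless the programmer restructures the code.

    /* Easy case: independent element-wise work, no aliasing hazards. */
    void add_arrays(float * restrict c, const float * restrict a,
                    const float * restrict b, int n)
    {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* Hard case: c may alias a or b, and the branch adds control flow
       inside the loop body. */
    void clip_accumulate(float *c, const float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i++) {
            float v = a[i] + b[i];
            if (v > 1.0f)
                v = 1.0f;
            c[i] = c[i] + v;
        }
    }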

Standard Optimizations on Power (continued)