[Tim Shattuck, 2006][1] Performance / Watt: The New Server Focus. Improving Performance / Watt for Modern Processors. Tim Shattuck, April 19, 2006. From the paper by James Laudon, Computer Architecture News, Volume 33, Number 4, September 2005.

[Tim Shattuck, 2006][2] At Issue: Power-Hungry Servers. The cost of powering server hardware keeps increasing, and the power it draws wastes limited resources.

[Tim Shattuck, 2006][3] Three Trends. First, power consumption is high relative to the performance gains it buys. Second, hardware costs account for a shrinking percentage of Total Cost of Ownership (TCO). Third, energy costs are rising. These trends are expected to make power the dominant factor in calculating TCO within five years.
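
To make the TCO trend concrete, here is a minimal sketch that computes energy's share of (hardware + energy) cost for one server. Everything in it is an assumption for illustration: the `Server` fields, the `overhead` multiplier for cooling and power delivery, and the example prices are hypothetical, and real TCO also includes staff, space, and other terms that are left out here.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


@dataclass
class Server:
    hardware_cost: float  # purchase price in dollars (hypothetical input)
    power_watts: float    # average power draw in watts (hypothetical input)


def energy_cost(server: Server, years: float, price_per_kwh: float,
                overhead: float = 1.0) -> float:
    """Electricity cost over the service life.

    `overhead` is a hypothetical multiplier for cooling and power delivery
    on top of the server's own draw (1.0 = server only).
    """
    kwh = server.power_watts / 1000.0 * HOURS_PER_YEAR * years * overhead
    return kwh * price_per_kwh


def energy_share(server: Server, years: float, price_per_kwh: float,
                 overhead: float = 1.0) -> float:
    """Energy's fraction of (hardware + energy) cost; other TCO terms
    such as staff and floor space are deliberately left out."""
    energy = energy_cost(server, years, price_per_kwh, overhead)
    return energy / (server.hardware_cost + energy)


if __name__ == "__main__":
    box = Server(hardware_cost=2000.0, power_watts=300.0)  # made-up numbers
    for price in (0.05, 0.10, 0.20):
        share = energy_share(box, years=4, price_per_kwh=price, overhead=2.0)
        print(f"${price:.2f}/kWh -> energy is {share:.0%} of cost")
```

Holding the hardware price fixed while the electricity price rises pushes the energy share up, which is the mechanism behind the slide's prediction.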

[Tim Shattuck, 2006][4] Niagara Optimizations. Simple: clock gating and shorter pipelines. More complex: hardware support for multithreading.

[Tim Shattuck, 2006][5] Simple Optimizations. Clock gating: stop clocking idle parts of the chip so they dissipate no switching power. Shorter, medium-length pipelines: fewer registers and transistors between stages, less power wasted on failed speculation, and room for more cores per chip.
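
Clock gating can be illustrated with the standard switching-power relation P = C_switched · V² · f. The sketch below is a toy model under stated assumptions: each hypothetical `Unit` has a clock-tree and latch capacitance that toggles every cycle it is clocked, plus datapath capacitance that switches only when busy. The unit names and capacitance values are made up, and leakage power, which clock gating does not remove, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    clock_cap: float       # clock-tree + latch capacitance (F); toggles every clocked cycle
    logic_cap: float       # datapath capacitance (F)
    logic_activity: float  # fraction of logic_cap that switches per cycle when busy
    busy: bool             # doing useful work this cycle?


def dynamic_power(units, vdd: float, freq_hz: float, clock_gating: bool) -> float:
    """Switching power P = C_switched * V^2 * f, summed over units.

    Leakage (static) power is not modeled; clock gating does not remove it.
    """
    switched = 0.0
    for u in units:
        if u.busy:
            switched += u.clock_cap + u.logic_activity * u.logic_cap
        elif not clock_gating:
            # Idle but still clocked: latches and clock buffers keep toggling.
            switched += u.clock_cap
        # Idle and gated: no clock edges reach the unit, so no switching power.
    return switched * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical core with an idle floating-point unit.
    core = [
        Unit("alu", clock_cap=1e-9, logic_cap=2e-9, logic_activity=0.25, busy=True),
        Unit("fpu", clock_cap=1e-9, logic_cap=4e-9, logic_activity=0.25, busy=False),
    ]
    for gating in (False, True):
        print(f"gating={gating}: {dynamic_power(core, 1.0, 1e9, gating):.2f} W")
```

The saving equals the clock power of the idle units, so it grows with the fraction of the chip that sits idle in a given cycle.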

[Tim Shattuck, 2006][6] More Optimizations. Hardware multithreading keeps on-chip resources busy and copes with high cache miss rates by running other threads while one waits on memory. It boosts performance / Watt: thread throughput increases while power consumption increases only slightly. The cost is die size, which grows by 4 to 7% per thread.
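
A simple latency-hiding model shows why extra threads keep a scalar pipeline busy. This is an idealized sketch, not Niagara's actual thread scheduler: it assumes each thread alternates a fixed run of work with a fixed cache-miss stall and that switching threads costs nothing. The 20/80-cycle workload is hypothetical, and the default `area_per_thread=0.05` is an assumed point inside the slide's 4 to 7% range.

```python
def pipeline_utilization(threads: int, compute_cycles: float, miss_cycles: float) -> float:
    """Fraction of cycles a single-issue pipeline issues useful work.

    Assumes each thread runs `compute_cycles` of work, then stalls
    `miss_cycles` on a cache miss, and that the core switches to another
    ready thread at zero cost (a fine-grained multithreading idealization).
    """
    busy_fraction_per_thread = compute_cycles / (compute_cycles + miss_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


def throughput_per_area(threads: int, compute_cycles: float, miss_cycles: float,
                        area_per_thread: float = 0.05) -> float:
    """Utilization divided by relative die area.

    `area_per_thread` is the extra area for each thread beyond the first;
    0.05 is an assumed value inside the slide's 4 to 7% range.
    """
    area = 1.0 + area_per_thread * (threads - 1)
    return pipeline_utilization(threads, compute_cycles, miss_cycles) / area


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work between 80-cycle misses.
    for n in (1, 2, 4, 8):
        print(n, pipeline_utilization(n, 20, 80), round(throughput_per_area(n, 20, 80), 3))
```

Utilization rises almost linearly with thread count until the pipeline saturates, while area grows only a few percent per thread; past saturation, additional threads cost area without adding throughput.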

[Tim Shattuck, 2006][7] Cores / Die. Fewer complex cores favor individual thread completion; more simple cores favor aggregate thread throughput. Simpler cores tend to have better performance / Watt ratios.

[Tim Shattuck, 2006][8] Sufficient Cache and Memory Bandwidth. Sufficient bandwidth is necessary to keep the threads busy. In Sun's Niagara, the cores are connected to the L2 cache by a crossbar switch, giving a cache bandwidth of 76.8 GB/s. Four memory controllers connect directly to the DDR2 SDRAM memory (200 MHz), for a raw memory bandwidth of 25.6 GB/s. The controllers can reorder accesses to favor reads over writes.
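
The sketch below does two things. First, it checks the 25.6 GB/s figure, assuming a 16-byte (128-bit) data path per controller; the slide gives only the controller count and the 200 MHz clock, so that width is an assumption. Second, it models one possible read-over-write reordering policy. The `ReadFirstScheduler` class, its `write_high` threshold, and its drain rules are hypothetical illustrations, not Sun's documented controller logic. The policy keeps same-address requests in arrival order so that a reordered read never returns stale or future data.

```python
from collections import deque
from typing import NamedTuple, Optional

# Raw bandwidth check; BYTES_PER_TRANSFER is an assumed 128-bit channel width.
CONTROLLERS = 4
CLOCK_HZ = 200_000_000
TRANSFERS_PER_CLOCK = 2  # DDR: data moves on both clock edges
BYTES_PER_TRANSFER = 16  # assumption, not stated on the slide


def raw_bandwidth_bytes_per_s() -> int:
    return CONTROLLERS * CLOCK_HZ * TRANSFERS_PER_CLOCK * BYTES_PER_TRANSFER


class Request(NamedTuple):
    seq: int   # arrival order, assigned by the scheduler
    kind: str  # "R" for read, "W" for write
    addr: int


class ReadFirstScheduler:
    """Hypothetical read-over-write policy: reads go ahead of buffered
    writes (a thread usually stalls on a read, not on a write). Writes
    drain when the buffer reaches `write_high` or when the oldest read
    depends on an earlier write to the same address."""

    def __init__(self, write_high: int = 4):
        self.reads: deque = deque()
        self.writes: deque = deque()
        self.write_high = write_high
        self._seq = 0

    def enqueue(self, kind: str, addr: int) -> None:
        req = Request(self._seq, kind, addr)
        self._seq += 1
        (self.reads if kind == "R" else self.writes).append(req)

    @staticmethod
    def _older_same_addr(req: Request, queue: deque) -> bool:
        """True if `queue` holds an earlier request to req.addr."""
        return any(q.seq < req.seq and q.addr == req.addr for q in queue)

    def issue(self) -> Optional[Request]:
        """Return the next request to send to DRAM, or None if idle."""
        if not self.writes:
            return self.reads.popleft() if self.reads else None
        if not self.reads:
            return self.writes.popleft()
        read, write = self.reads[0], self.writes[0]
        must_drain = (len(self.writes) >= self.write_high
                      or self._older_same_addr(read, self.writes))
        # Never let a write pass an older read of the same address.
        if must_drain and not self._older_same_addr(write, self.reads):
            return self.writes.popleft()
        return self.reads.popleft()


if __name__ == "__main__":
    print(raw_bandwidth_bytes_per_s() / 1e9, "GB/s")
    sched = ReadFirstScheduler()
    for kind, addr in [("W", 0x10), ("W", 0x20), ("R", 0x30), ("R", 0x40)]:
        sched.enqueue(kind, addr)
    while (req := sched.issue()) is not None:
        print(req.kind, hex(req.addr))
```

In the usage run, both reads are issued before the two earlier writes because none of them touch the same address.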

[Tim Shattuck, 2006][9] Testing. SPECjbb2000: server-side Java business logic. TPC-C and TPC-W: transaction processing benchmarks. XML Test: Sun's multithreaded processing test. Result: scalar processors with moderate pipelines and thread support outperformed superscalar processors.

[Tim Shattuck, 2006][10] Case Studies. Sun's Niagara: 8 scalar cores with 4 threads each, designed to maximize performance / Watt. Intel's Pentium Extreme Edition: 2 superscalar cores with 2 threads each, designed to maximize performance.

[Tim Shattuck, 2006][11] Case Studies (II): Results

| Feature | Niagara | Pentium Extreme Edition |
|---|---|---|
| Clock speed | 1.2 GHz | 3.2 GHz |
| Pipeline depth | 6 stages | 31 stages |
| Number of cores | 8 | 2 |
| Number of threads | 32 | 4 |
| L2 bandwidth | 76.8 GB/s | ~180 GB/s |
| Memory bandwidth | 25.6 GB/s | 6.4 GB/s |
| Transistor count | 279 million | 230 million |
| Power | 72 W | 130 W |
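
The Case Studies (II) results can be turned into per-Watt ratios with a few lines of arithmetic. This sketch uses only the thread counts, memory bandwidths, and power figures from that comparison; the dictionary layout and function names are illustrative. These are hardware capability ratios, not measured benchmark performance.

```python
# Figures transcribed from the Case Studies (II) results.
SPECS = {
    "Niagara": {"threads": 32, "mem_bw_gbs": 25.6, "power_w": 72.0},
    "Pentium Extreme Edition": {"threads": 4, "mem_bw_gbs": 6.4, "power_w": 130.0},
}


def per_watt(chip: str, metric: str) -> float:
    """A resource count divided by the chip's listed power."""
    spec = SPECS[chip]
    return spec[metric] / spec["power_w"]


def advantage(metric: str) -> float:
    """How many times more of `metric` per Watt Niagara provides."""
    return per_watt("Niagara", metric) / per_watt("Pentium Extreme Edition", metric)


if __name__ == "__main__":
    for metric in ("threads", "mem_bw_gbs"):
        print(f"{metric}/W: Niagara {per_watt('Niagara', metric):.3f}, "
              f"Pentium EE {per_watt('Pentium Extreme Edition', metric):.3f}, "
              f"ratio {advantage(metric):.1f}x")
```

By these figures Niagara offers about 14.4 times as many hardware threads per Watt and about 7.2 times the memory bandwidth per Watt. Whether that converts into throughput per Watt depends on the workload keeping those threads busy.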

[Tim Shattuck, 2006][12] Simple Core Limitations. Single-thread performance is lower, and the gap is amplified because a simple core exploits less instruction-level parallelism. Keeping a large number of threads busy may become difficult. Hot locks: threaded applications may not scale well.
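
The hot-lock concern can be quantified with Amdahl's law, a standard model substituted here rather than one taken from the slides. It treats the fraction of work inside a contended critical section as fully serialized and the rest as perfectly parallel. That is optimistic, since real lock contention also adds waiting and cache-line transfer overhead. The serial fractions below are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup on `threads` hardware threads when
    `serial_fraction` of the work runs one-at-a-time (here, inside a
    hot lock's critical section) and the rest parallelizes perfectly."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.05, 0.10):
        print(f"serial {s:.0%}: 4 threads -> {amdahl_speedup(s, 4):.2f}x, "
              f"32 threads -> {amdahl_speedup(s, 32):.2f}x")
```

With 5% of the work serialized, 32 threads reach only about 12.5x, so a 32-thread chip is far more sensitive to hot locks than a 4-thread one.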

[Tim Shattuck, 2006][13] Future Directions. Use multithreading to enhance single-threaded applications, for example run-ahead execution, which allows out-of-order execution with only a modest amount of hardware. Software control of power consumption: dynamic adjustment of voltage and frequency to tune power consumption, and control of the power consumption of non-processing devices (disks, memory systems).
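
Dynamic voltage and frequency scaling pays off because dynamic power is proportional to V² · f. The sketch below applies that relation under two stated assumptions: when only a frequency scale is given, voltage is assumed to scale in proportion to frequency (an approximation valid only over a limited range), and the task is assumed CPU-bound, so its runtime stretches by the inverse of the frequency scale. Leakage power is ignored, and the 100 W, 10 s job is hypothetical.

```python
from typing import Optional


def scaled_dynamic_power(power_w: float, freq_scale: float,
                         volt_scale: Optional[float] = None) -> float:
    """Dynamic power after scaling, from P = alpha * C * V^2 * f.

    If `volt_scale` is omitted, voltage is assumed to scale in proportion
    to frequency. Leakage power is ignored.
    """
    if volt_scale is None:
        volt_scale = freq_scale
    return power_w * volt_scale ** 2 * freq_scale


def task_energy(power_w: float, runtime_s: float, freq_scale: float,
                volt_scale: Optional[float] = None) -> float:
    """Energy (J) for a CPU-bound task whose runtime stretches by 1/freq_scale."""
    stretched_runtime = runtime_s / freq_scale
    return scaled_dynamic_power(power_w, freq_scale, volt_scale) * stretched_runtime


if __name__ == "__main__":
    # Hypothetical 100 W processor running a 10 s CPU-bound job.
    for k in (1.0, 0.9, 0.8):
        print(k, round(scaled_dynamic_power(100, k), 1), "W",
              round(task_energy(100, 10, k), 1), "J")
```

Scaling frequency alone (`volt_scale=1.0`) lowers power but leaves energy per task unchanged, because the job runs proportionally longer; lowering voltage along with frequency is what cuts energy per task, here to 640 J at 80% frequency.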

[Tim Shattuck, 2006][14] Conclusion: Invest in a Niagara today!