Computing Systems Roadmap and its Impact on Software Development
Michael Ernst, BNL. HSF Workshop at SLAC, 20-21 January 2015.

Processor Die Scaling
Package power and total energy consumption limit the number of transistors that can be used
– Adding more cores while operating the chip at the highest frequency would make power consumption prohibitive
– Keeping power within reasonable bounds severely limits the improvement in microprocessor performance
(A first-order power model illustrating the constraint follows below.)
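The underlying constraint can be made explicit with the standard first-order CMOS power model (a textbook formula added here for context, not taken from the slides): dynamic power grows with the active capacitance, the square of the supply voltage, and the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f , \qquad E = P \cdot t
```

With Dennard scaling over, V no longer drops much with each process generation, so extra active transistors or higher frequency push power up against a roughly fixed package budget.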

Optimization
Traditional approach: invest the maximum number of transistors in the 90% case to increase single-thread performance that can be applied broadly
New scaling regime (slower transistors, continued energy improvements)
– It makes no sense to add transistors to a single core, since energy efficiency suffers
– Using the additional transistors to build more cores produces limited benefit: increased performance only for apps with thread parallelism
90/10 optimization no longer applies
– Instead, optimizing with a 10% accelerator for a 10% case, then another for a different 10% case, then another, often produces a system with better overall energy efficiency and performance -> "10x10 optimization"
– The chip operates with 10% of its transistors active and 90% inactive, but with a different 10% active at each point in time
– 20 generations into Moore's Law, the balance has shifted: using only a fraction of the transistors on the chip appears to be the right solution (see the toy model below)
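As a rough illustration of the 10x10 argument, here is a toy energy model of my own; the 5x accelerator efficiency gain and the 10% workload slices are assumptions for illustration, not figures from the talk.

```python
# Toy model contrasting "90/10" (one big general-purpose core) with "10x10"
# (ten specialized accelerators, each covering a 10% slice of the workload).
# The 5x efficiency factor is an illustrative assumption, not data from the talk.

GENERAL_ENERGY_PER_WORK = 1.0   # energy per unit of work on the general core
ACCEL_EFFICIENCY_GAIN = 5.0     # assumed gain of a specialized accelerator
WORK_SLICES = [0.1] * 10        # workload split into ten 10% cases

def energy_general(work_slices):
    """All work runs on the single general-purpose core."""
    return sum(w * GENERAL_ENERGY_PER_WORK for w in work_slices)

def energy_10x10(work_slices):
    """Each 10% slice runs on its own accelerator; only one is active at a time."""
    return sum(w * GENERAL_ENERGY_PER_WORK / ACCEL_EFFICIENCY_GAIN
               for w in work_slices)

if __name__ == "__main__":
    print(f"general core : {energy_general(WORK_SLICES):.2f} energy units")
    print(f"10x10 design : {energy_10x10(WORK_SLICES):.2f} energy units")
```

Under these assumed numbers the 10x10 design spends 5x less energy for the same work, which is the flavor of the argument for covering a workload with many specialized, mostly dark accelerators.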

System-on-a-Chip (TI)

Energy Efficiency is the Driver for Performance
The reality of a finite energy budget for processors must produce a quantitative shift in how chip architects think
– Energy efficiency is the key metric for design
– Energy-proportional computing must be the ultimate goal for H/W architecture and software/application design
While this is recognized in macro-scale computing (large-scale data centers), micro-scale energy-efficient computing inside the microprocessor is even more challenging
– With a fixed energy budget, greater efficiency corresponds directly to higher performance
– The quest for extreme energy efficiency is the ultimate driver for performance
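A small numerical illustration of what "energy-proportional" means (my own example, with invented peak and idle power numbers): a non-proportional node burns a large fraction of peak power even when idle, while a proportional one draws power in line with utilization.

```python
# Energy-proportional vs non-proportional power draw (illustrative numbers only).

PEAK_POWER_W = 200.0   # assumed peak power of a node
IDLE_FRACTION = 0.5    # assumed idle draw of a non-proportional node (50% of peak)

def power_non_proportional(util):
    return PEAK_POWER_W * (IDLE_FRACTION + (1 - IDLE_FRACTION) * util)

def power_proportional(util):
    return PEAK_POWER_W * util

for util in (0.1, 0.3, 0.5, 1.0):
    print(f"util {util:>4.0%}: "
          f"non-proportional {power_non_proportional(util):6.1f} W, "
          f"proportional {power_proportional(util):6.1f} W")
```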

Software Challenges Renewed: Programmability vs. Efficiency
The end of single-thread performance scaling means major software challenges
– The shift to symmetric parallelism caused the greatest software challenge in the history of computing
– Pressure on energy efficiency will lead to extensive use of heterogeneous cores and accelerators
– Need to adopt high-level "productivity" languages built on advanced interpretive and compiler technologies, with increasing use of dynamic translation techniques
Trend: higher-level programming, extensive customization through libraries, and sophisticated automated performance search techniques (i.e. autotuning; a minimal sketch follows below)
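To make "autotuning" concrete, here is a minimal, hypothetical sketch (not from the talk): empirically time a small space of block sizes for a blocked array sum and keep the fastest variant. Real autotuners (e.g. for dense linear algebra or FFTs) search far larger parameter spaces, but the principle is the same.

```python
import time
import numpy as np

def blocked_sum(data, block):
    """Sum an array in chunks of `block` elements (stand-in for a tunable kernel)."""
    total = 0.0
    for start in range(0, data.size, block):
        total += data[start:start + block].sum()
    return total

def autotune(data, candidate_blocks):
    """Time each candidate block size and return the fastest one."""
    best_block, best_time = None, float("inf")
    for block in candidate_blocks:
        t0 = time.perf_counter()
        blocked_sum(data, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block

if __name__ == "__main__":
    data = np.random.rand(1_000_000)
    best = autotune(data, candidate_blocks=[1_000, 10_000, 100_000])
    print(f"selected block size: {best}")
```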

Logic Organization Challenges, Trends, Directions

Things that may Change
Due to energy consumption constraints, H/W support for a single flat address space, a single memory hierarchy, and a steady rate of execution may be dropped
Future systems will place these components under S/W control
– This requires sophisticated S/W tools to manage H/W boundaries and irregularities with greater energy efficiency
– High-performance applications may manage these complexities explicitly
– Architectural features shift the responsibility for distributing computation and data across the compute and storage elements of the microprocessor to software
– This improves energy efficiency but requires advances in applications, compilers, runtime environments and OSs to understand and predict application/workload behavior (a toy example of software-managed data placement follows below)
These advances require radical research breakthroughs and major changes in S/W practices (next slide)
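As a toy example of what explicit software management of data placement can look like (my own illustration, not from the slides): process data in tiles small enough to fit a fast local memory, explicitly staging each tile in before computing on it. The capacity constant is an assumption.

```python
import numpy as np

FAST_MEMORY_ELEMS = 4096  # assumed capacity of a fast local memory / scratchpad

def staged_scale(data, factor):
    """Scale an array tile by tile, explicitly staging each tile into a
    'local' buffer before computing on it (software-managed data placement)."""
    out = np.empty_like(data)
    for start in range(0, data.size, FAST_MEMORY_ELEMS):
        tile = np.array(data[start:start + FAST_MEMORY_ELEMS])  # stage in
        out[start:start + tile.size] = tile * factor             # compute locally
    return out

if __name__ == "__main__":
    x = np.arange(10_000, dtype=np.float64)
    y = staged_scale(x, 2.0)
    assert np.allclose(y, 2.0 * x)
```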

Software Challenges, Trends, Directions

Conclusions - Microprocessors
The great old days of Moore's Law scaling delivered a 1,000-fold performance improvement in 20 years (see the arithmetic below)
The pretty good new days will be more difficult
– Frequency of operation will increase only slowly
– A few large cores plus a large number of small cores operating at low frequency and low voltage
– Aggressive use of accelerators will yield the highest performance
– Efficient data orchestration, with more efficient memory hierarchies and new types of interconnects tailored for locality, will be critical; this depends on sophisticated S/W to place computation and data so as to minimize data movement
– Programming systems will have to comprehend these restrictions and provide tools and environments to harvest the performance
The use of materials and technologies other than Si CMOS for processor design will face challenges that make the ones we are fighting today a warm-up exercise for what lies ahead
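For context, the 1,000-fold figure is consistent with a doubling of performance roughly every two years sustained over two decades:

```latex
2^{\,20\ \text{years} / 2\ \text{years per doubling}} = 2^{10} = 1024 \approx 1000
```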

Data Storage and Data Access
POSIX I/O is becoming a serious impediment to I/O performance and scaling
– It hasn't changed much in 26 years even though the world of storage keeps evolving
– Changing or relaxing it could spur the development of new storage mechanisms that improve application performance, management, reliability, portability, and scalability
ODMS were not successful but provided valuable lessons
Object Storage Technology
– Scales shared storage to extraordinary levels of bandwidth and capacity without sacrificing reliability or simple, low-cost operations
– Key properties include a variable-length ordered sequence of addressable bytes, embedded management of data layout, extensible attributes, and fine-grained device-enforced access control (illustrated by the sketch below)
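A minimal, in-memory sketch of an object-store interface, to make the listed properties concrete. This is a hypothetical illustration of my own, not the API of any particular object storage product: objects are byte sequences addressed by an ID, carry freely extensible attributes, and access is checked by the store rather than by the client.

```python
class ObjectStore:
    """Toy object store: byte-sequence objects, extensible attributes,
    and store-enforced access control (hypothetical interface)."""

    def __init__(self):
        self._objects = {}   # object_id -> bytes
        self._attrs = {}     # object_id -> dict of extensible attributes
        self._acl = {}       # object_id -> set of principals allowed to read

    def put(self, object_id, data: bytes, attrs=None, readers=None):
        self._objects[object_id] = data
        self._attrs[object_id] = dict(attrs or {})
        self._acl[object_id] = set(readers or [])

    def get(self, object_id, principal):
        if principal not in self._acl[object_id]:
            raise PermissionError(f"{principal} may not read {object_id}")
        return self._objects[object_id]

    def set_attr(self, object_id, key, value):
        self._attrs[object_id][key] = value  # attributes are freely extensible

store = ObjectStore()
store.put("run42/event.dat", b"\x00\x01\x02",
          attrs={"dataset": "run42"}, readers={"alice"})
print(store.get("run42/event.dat", principal="alice"))
```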

Resource Provisioning
HEP is facing a transition to more shared and opportunistic resources provided through a variety of interfaces
– Effective use of diverse environments
– Provisioning of a variety of different processing and storage resources across dedicated, specialized, contributed, opportunistic and purchased resources
Compared to the present situation, computing in HEP will become a much less static system
– Resource providers will be continuously flowing in and out of distributed computing environments
– The key issue is the time it takes to register resources and adapt applications; it should be no longer than 10% of the execution time (expressed as a simple check below)
APIs are key for creating new composite applications without the developer needing to know anything about the underlying compute, network, and storage resources; application APIs make SaaS possible
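The 10% rule of thumb from the slide can be written down as a trivial check; the function name and interface are my own.

```python
def worth_provisioning(setup_time_s: float, execution_time_s: float,
                       max_overhead_fraction: float = 0.10) -> bool:
    """Rule of thumb from above: registering a resource and adapting the
    application should cost no more than 10% of the job's execution time."""
    return setup_time_s <= max_overhead_fraction * execution_time_s

# Example: a 30-minute setup is acceptable against a 12-hour workload,
# but not against a 2-hour one.
print(worth_provisioning(setup_time_s=1800, execution_time_s=12 * 3600))  # True
print(worth_provisioning(setup_time_s=1800, execution_time_s=2 * 3600))   # False
```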

Networking
Next-generation advanced networked applications will require network capabilities and services far beyond what networks offer today
– Dynamic network service topologies (overlays) with replication capabilities
– Resource management and optimization algorithms (e.g. available network resources, load on depots)
– Use of multi-constraint path-finding algorithms to optimize solutions (a sketch follows below)
– Network caching to optimize transfers and data access
  – Let the network learn what jobs need (assuming overlapping interest)
  – Dynamic, self-learning caches at major network exchange points
– "Named Data Networks" support content distribution: requests are routed based on content rather than on file names mapped to IP addresses
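As an illustration of multi-constraint path finding (my own toy example with an invented topology, costs, and latencies): find the cheapest path whose total latency stays under a bound, here by exhaustive search over simple paths, which is adequate for small overlay graphs.

```python
# Toy multi-constraint path finding: minimize cost subject to a latency bound.
# Edge annotation: (cost in arbitrary units, latency in ms). Topology is invented.

GRAPH = {
    "A": {"B": (1, 30), "C": (4, 10)},
    "B": {"C": (1, 30), "D": (5, 10)},
    "C": {"D": (1, 40)},
    "D": {},
}

def constrained_cheapest_path(src, dst, max_latency_ms):
    best = (float("inf"), None)  # (cost, path)

    def dfs(node, path, cost, latency):
        nonlocal best
        if latency > max_latency_ms or cost >= best[0]:
            return  # prune: constraint violated or already worse than best
        if node == dst:
            best = (cost, path)
            return
        for nxt, (c, lat) in GRAPH[node].items():
            if nxt not in path:  # simple paths only
                dfs(nxt, path + [nxt], cost + c, latency + lat)

    dfs(src, [src], 0, 0)
    return best

print(constrained_cheapest_path("A", "D", max_latency_ms=80))
# -> (5, ['A', 'C', 'D']): the globally cheapest path A-B-C-D (cost 3) is
#    rejected because its 100 ms latency exceeds the 80 ms bound.
```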