
Directed Reading 2: Key issues for the future of software and hardware for large-scale parallel computing, and the approaches to address these. Submitted by: Kapil Chogga, Cao Jianfeng, Ramaseshan Kannan.

Hardware issues for large-scale parallel computing

Cost, Power and Processor Challenge
- Power consumption is now a critical issue; power and the required cooling also constrain density and floor space.
- For example, the 10-petaflop Opteron-based system was estimated to cost $1.8 billion and to require 179 megawatts to operate. This kind of approach is not feasible (a back-of-the-envelope check follows this slide).
- Use smaller processors: they are more energy efficient [Chandrakasan et al. 1992], avoid the clock-speed limitations of large processors, deliver the highest performance per unit area for parallel codes, and are easier to manage (a defective small core is easier to deal with).
- FPGAs are an option.
- Different or identical processors? Amdahl's Law [Hennessy and Patterson 2007] suggests that heterogeneous many-core systems yield better performance (see the sketch after this slide).

The Memory and Storage Challenge
- This is a major consequence of the power challenge: currently available main memories (DRAM) and disk drives (HDD) consume far too much power. New technologies are needed.

Communication
- Exascale requires higher bandwidth. Higher bandwidth can be achieved by point-to-point connectivity between cores, so new ways to connect cores are required.
- Chip-scale multiprocessors (CMPs) provide greater inter-core bandwidth and lower inter-core latencies.
- Synchronization using transactional memory avoids locks (a sketch follows this slide).
- The problem with tightly coupled designs is that any delay in moving information from one node to another can delay all the nodes; small delays quickly add up to big drops in performance.

Resiliency Challenge
- Resilience is the ability of a system with such a huge number of components to continue operating in the presence of faults.
- An exascale system must be truly autonomic: constantly aware of its status, and optimizing and adapting itself to rapidly changing conditions, including failures of its individual components.
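To make the power claim concrete, here is the arithmetic implied by the quoted figures. This is only a rough extrapolation from the numbers on the slide, not a vendor estimate; it simply scales the quoted efficiency up to an exaflop:

    \frac{10 \times 10^{15}\ \mathrm{FLOP/s}}{179 \times 10^{6}\ \mathrm{W}}
      \approx 5.6 \times 10^{7}\ \mathrm{FLOP/s\ per\ W}

    \frac{10^{18}\ \mathrm{FLOP/s}}{5.6 \times 10^{7}\ \mathrm{FLOP/s\ per\ W}}
      \approx 1.8 \times 10^{10}\ \mathrm{W} \approx 18\ \mathrm{GW}

Eighteen gigawatts is on the order of a dozen large power plants, which is why simply scaling today's components to exascale is ruled out.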
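The Amdahl's Law argument for heterogeneity, in a minimal sketch: if f is the fraction of a code that parallelizes across n cores, the speedup is

    S(n) = \frac{1}{(1 - f) + f/n},
    \qquad
    \lim_{n \to \infty} S(n) = \frac{1}{1 - f}

With f = 0.95, no number of small cores can push the speedup past 20x; the serial 5% dominates. A heterogeneous design spends part of its area and power budget on one fast core to shrink the (1 - f) term while many small, efficient cores handle the f term, which is why it can outperform a homogeneous design of the same size.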
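A minimal sketch of lock-free synchronization with transactional memory, assuming GCC's experimental TM extension (compile with gcc -fgnu-tm); the two-account transfer is an illustrative stand-in for any shared-state update:

    /* Transactional-memory sketch: no explicit locks, so no deadlock.
       Requires GCC built with TM support; compile with: gcc -fgnu-tm */
    #include <stdio.h>

    static long accounts[2] = { 100, 100 };   /* illustrative shared state */

    /* Move `amount` between accounts atomically.  The TM runtime detects
       conflicting concurrent transactions and re-executes them. */
    void transfer(int from, int to, long amount)
    {
        __transaction_atomic {
            accounts[from] -= amount;
            accounts[to]   += amount;
        }
    }

    int main(void)
    {
        transfer(0, 1, 25);
        printf("balances: %ld %ld\n", accounts[0], accounts[1]);
        return 0;
    }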

Software issues for large-scale parallel computing

Handling I/O
- Data input/output is a considerable problem on petascale machines. As a trivial example, imagine a 100K-processor machine in which all processors try to open a file for reading: the resulting file system storm would probably swamp any single-interface storage server. Furthermore, without intelligent file system semantics, 100K copies of exactly the same file could be pushed through the network.
- The amount of data that a petascale machine can generate is staggering; there should be one dedicated I/O node for every 8 compute nodes.
- Obviously, no single file server can currently handle data input in the range of 100 GB/s, so file I/O must be parallelized. A dedicated parallel file system has become a standard component of leadership-class architectures (a minimal MPI-IO sketch follows this slide).

Synchronicity
- The TLB is a cache used to improve the speed of virtual address translation; one major challenge is to avoid TLB thrash.
- Cache pollution occurs when multiple programs attempt to use the same processor core's cache. Cache pollution is harmful, and techniques to avoid it should be developed (a loop-order sketch follows this slide).

Fault Tolerance
- The crash of one component in a browser, such as the Acrobat Reader or Flash Player plugin, should not cause the entire browser, or worse yet, the entire machine, to falter; the same isolation principle applies at petascale.
- Also important is how the petascale OS coordinates its fault response with other parts of the system. The most common and robust method for providing fault tolerance in scientific applications is checkpoint/restart (CPR) (a sketch follows this slide).

Security
- The need for protection grows with the enormous number of users, whose privacy and security must be safeguarded.
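A minimal sketch of collective parallel I/O with MPI-IO, the standard programming interface to such parallel file systems. Instead of every rank hammering a single file server, all ranks write their blocks of one shared file in a single collective call that the MPI library and file system can aggregate. The file name and block size are illustrative assumptions:

    #include <mpi.h>
    #include <stdio.h>

    #define BLOCK 1024              /* doubles per rank, illustrative */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[BLOCK];
        for (int i = 0; i < BLOCK; i++)
            buf[i] = rank;          /* stand-in for real results */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "results.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Collective write: each rank writes its own disjoint block at a
           computed offset; requests are aggregated, not served one by one. */
        MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }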
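A small, concrete instance of both the TLB and the cache problem: C stores arrays row-major, so the two loops below touch exactly the same data but differ drastically in TLB and cache behavior. The matrix size N is an illustrative assumption:

    #include <stdio.h>

    #define N 4096                  /* illustrative: 128 MB of doubles */
    static double a[N][N];

    /* Unit-stride traversal: consecutive accesses stay on the same cache
       line and the same page, so the TLB and cache behave well. */
    double sum_by_rows(void)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column traversal strides N * 8 bytes per access: nearly every access
       touches a different page (TLB thrash) and evicts useful lines
       (cache pollution). */
    double sum_by_cols(void)
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void)
    {
        printf("%f %f\n", sum_by_rows(), sum_by_cols());
        return 0;
    }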
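A minimal sketch of application-level CPR, under two simplifying assumptions: the whole program state fits in one struct, and a write-then-rename makes each checkpoint atomic. The file names and checkpoint interval are illustrative:

    #include <stdio.h>

    /* Illustrative application state: in a real code this would be the
       simulation's arrays, not one small struct. */
    struct state { long step; double value; };

    /* Write to a temp file, then rename(): the rename is atomic, so a
       crash mid-write never corrupts the last good checkpoint. */
    static int save_checkpoint(const struct state *s)
    {
        FILE *f = fopen("ckpt.tmp", "wb");
        if (!f) return -1;
        fwrite(s, sizeof *s, 1, f);
        fclose(f);
        return rename("ckpt.tmp", "ckpt.dat");
    }

    /* Returns 0 and fills *s if a checkpoint exists, -1 for a cold start. */
    static int load_checkpoint(struct state *s)
    {
        FILE *f = fopen("ckpt.dat", "rb");
        if (!f) return -1;
        size_t n = fread(s, sizeof *s, 1, f);
        fclose(f);
        return n == 1 ? 0 : -1;
    }

    int main(void)
    {
        struct state s = { 0, 0.0 };
        if (load_checkpoint(&s) == 0)
            printf("restarting from step %ld\n", s.step);

        for (; s.step < 1000000; s.step++) {
            s.value += 1e-6;             /* stand-in for real work */
            if (s.step % 100000 == 0)
                save_checkpoint(&s);     /* periodic checkpoint */
        }
        printf("done: value = %f\n", s.value);
        return 0;
    }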

References:
- Operating System Issues for Petascale Systems. P. Beckman, K. Iskra, K. Yoshii, et al., Argonne National Laboratory.
- Software Challenges for Extreme Scale Computing: Going from Petascale to Exascale Systems. Michael A. Heroux, Sandia National Laboratories.
- The Landscape of Parallel Computing Research: A View from Berkeley. K. Asanovic et al., UC Berkeley.
- Productive Petascale Computing: Requirements, Hardware, and Software. Irving Wladawsky-Berger.