High Performance Cluster Computing: Architectures and Systems. Hai Jin, Internet and Cluster Computing Center


2 Network RAM
- Introduction
- Remote Memory Paging
- Network Memory File Systems
- Applications of Network RAM in Databases
- Summary

3 Introduction (I)
- Discusses efficient distribution & exploitation of main memory in a cluster
- Network RAM or Network Memory
  - the aggregate main memory of the workstations in a cluster
  - local memory vs. remote memory
- A significant amount of resources is unused in the cluster at any given time
  - 80-90% of workstations will be idle in the evening & late afternoon
  - 1/3 of workstations are completely unused, even at the busiest times of day
  - 70-85% of the Network RAM in a cluster is idle
- Consists of volatile memory but can still offer good reliability
  - uses some form of redundancy, such as replication of data or parity
  - offers an excellent cost-effective alternative to magnetic disk storage

4 Introduction (II)
- Technology trends
  - High performance networks
    - switched high performance local area networks: e.g. Myrinet
    - low latency communication and messaging protocols: e.g. U-Net
    - latency of Active Messages over ATM (20 μs) vs. magnetic disk latency (10 ms)
  - High performance workstations
    - low-cost memories make Network RAM even more attractive
- The I/O-processor performance gap
  - performance improvement rate for microprocessors: % per year
  - disk latency: 10% per year
  - network latency: 20% per year
  - network bandwidth: 45% per year
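
The latency figures quoted above already imply the payoff of paging to remote memory instead of disk; a quick back-of-the-envelope check using the slide's numbers:

```python
# Magnitude of the gap quoted above: Active Message latency over ATM
# (~20 microseconds) versus magnetic disk latency (~10 ms).
net_latency_s = 20e-6    # remote memory access over the network
disk_latency_s = 10e-3   # magnetic disk access
speedup = disk_latency_s / net_latency_s  # how much "closer" remote RAM is
```

So a remote page fetch is roughly 500 times faster than a disk access, which is why Network RAM slots naturally between main memory and disk in the hierarchy.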

5 Memory Hierarchy Figures for a Typical Workstation Cluster
- A typical NOW with 100 workstations
- A network of 20 μs latency & 15 MB/s bandwidth

6 Introduction (III)
- Issues in using Network RAM
  - Using Network RAM consists essentially of locating & effectively managing the unused memory resources in a workstation cluster
    - keep up-to-date information about unused memory
    - exploit memory in a transparent way, so that local users will not notice any performance degradation
    - requires the cooperation of several different systems
- Common uses of Network RAM
  - Remote memory paging
    - performance: between the local RAM and the disk
  - Network memory filesystems
    - can be used to store temporary data by providing Network RAM with a filesystem abstraction
  - Network memory databases
    - Network RAM can be used as a large DB cache and/or a fast non-volatile data buffer to store DB-sensitive data

7 Remote Memory Paging (I)
- 64-bit address spaces & heavy memory usage
  - applications' working sets have increased dramatically
  - sophisticated GUIs, multimedia, AI, VLSI design tools, DB and transaction processing systems
- Network RAM equivalent of remote memory paging
  - (cache) – (main memory) – (network RAM) – (disk)
  - faster than the disk
  - all memory-intensive applications could benefit from Network RAM
  - useful for mobile or portable computers, where storage space is limited

8 Changing the Memory Hierarchy

9 Remote Memory Paging (II)
- Implementation alternatives
  - Main idea of remote memory paging
    - start memory server processes on workstations that are either idle or lightly loaded and have a sufficient amount of unused physical memory
  - A policy for page management
    - which local pages?
    - which remote nodes?
      - the client periodically asks the global resource manager
      - distribute the pages equally among the servers
    - what about server load?
      - negotiation from server to client & migration
      - transfer data to the server's disk

10 Remote Paging System Structure

11 Remote Memory Paging (III)
- Implementation alternatives
  - the client workstation has a remote paging subsystem
    - at user level or in the OS
    - enables it to use network memory as backing store
  - a global resource management process running on a workstation in the cluster
    - holds information about the memory resources & how they are being utilized in the cluster
  - a configuration server (registry server) process
    - responsible for the setup of the network memory cluster
    - is contacted when a machine wants to enter or leave the cluster
    - authenticated according to the system administrator's rules

12 Remote Memory Paging (IV)
- Client remote paging subsystem: 2 main components
  - a mechanism for intercepting & handling page faults
  - a page management policy
    - keeps track of which local pages are stored on which remote nodes
- 2 alternative implementations of a transparent remote paging subsystem on the client workstation
  - using the mechanisms provided by the client's existing OS
  - by modifying the virtual memory manager & its mechanisms in the client's OS kernel

13 Transparent Network RAM Implementation using Existing OS Mechanisms (I)
- User-level memory management
  - each program uses new memory allocation & management calls, usually contained in a library dynamically or statically linked to the user applications
    - requires some program code modifications
  - prototype by E. Anderson
    - a custom malloc library using TCP/IP; performs a 4K page replacement in 1.5 – 8.0 ms (1.3 – 6.6 times faster than disk)
  - mechanism
    - intercepts page faults through the use of segmentation faults
    - a segmentation fault signal comes from the OS memory manager
    - this signal is intercepted by user-level signal handler routines
    - the page management algorithm is called to locate & transfer a remote page into local memory, replacing a local page
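
The fault-handling loop above can be simulated at the policy level. The sketch below is a minimal, hypothetical model of the client's page table: on a "fault" (access to a page not in local RAM) it fetches the page from a stand-in memory server and evicts the least-recently-used local page to make room. It models the policy only; the real prototype traps SIGSEGV rather than using an explicit access call.

```python
from collections import OrderedDict

class RemotePagingClient:
    """Simulates the user-level paging policy: on a 'page fault',
    fetch the page from a memory server and evict a local page.
    All names here are hypothetical illustrations."""
    def __init__(self, local_capacity, server):
        self.local = OrderedDict()      # page_id -> data, kept in LRU order
        self.capacity = local_capacity
        self.server = server            # dict standing in for a remote memory server

    def access(self, page_id):
        if page_id in self.local:       # hit: no fault, just refresh LRU order
            self.local.move_to_end(page_id)
            return self.local[page_id]
        # "page fault": the handler locates the page remotely (or gets a fresh one)
        data = self.server.pop(page_id, b"\0" * 4096)
        if len(self.local) >= self.capacity:
            evicted, edata = self.local.popitem(last=False)  # evict LRU page
            self.server[evicted] = edata                     # send it to the server
        self.local[page_id] = data
        return data
```

A real implementation would mark non-resident pages inaccessible with mprotect() so that touching them raises the segmentation fault that drives this logic.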

14 User Level Memory Management Solution's Structure

15 Transparent Network RAM Implementation using Existing OS Mechanisms (II)
- Swap device driver: the OS-level implementation
  - all modern operating systems provide a virtual memory manager that swaps pages to the devices on its list of swap devices
  - create a custom block device driver & configure the virtual memory manager to use it as a backing store for physical memory
    - when the system runs out of physical memory, it will swap pages to the first device on the list (which may be the remote memory paging device driver)
    - if and when this Network RAM device runs out of space, the virtual memory manager will swap pages to the next device
  - simple in concept & requires only one minor modification to the OS
    - the addition of a device driver

16 Swap Device Driver Solution's Structure

17 Transparent Network RAM Implementation using Existing OS Mechanisms (III)
- Page management policies
  - which memory server should the client select each time to send pages?
    - ask the global resource management process
    - distribute the pages equally among the servers
  - handling of a server's pages when its workstation becomes loaded
    - inform the client & the other servers of its loaded state
    - transfer the pages to the server's disk & deallocate them each time they are referenced
    - transfer all the pages back to the client
  - when all servers are full
    - store the pages on disk
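
The "distribute the pages equally" policy above can be sketched as a simple server-selection rule: always send the next evicted page to the server with the most free frames. The server names and the `free_pages` map are hypothetical stand-ins for the state kept by the global resource manager.

```python
def pick_server(free_pages):
    """Pick the memory server to receive the next evicted page.
    Choosing the server with the most free page frames keeps pages
    spread roughly equally among servers ('free_pages' maps a
    server name to its count of free frames)."""
    return max(free_pages, key=free_pages.get)

# toy example: three workstations offering idle memory
free = {"ws1": 120, "ws2": 340, "ws3": 75}
target = pick_server(free)   # ws2 currently has the most room
free[target] -= 1            # account for the page just sent there
```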

18 Transparent Network RAM Implementation Through OS Kernel Modification (I)
- OS modification
  - the software solution that offers the highest performance without modifying user programs
  - an arduous task
  - not portable between architectures
- Kernel modification provides global memory management in a workstation cluster
  - uses a single unified memory manager as a low-level component of the OS that runs on each workstation
  - can integrate all the Network RAM for use by all higher-level functions, including virtual memory paging, memory-mapped files and filesystem caching

19 Operating System Modification Architecture

20 Transparent Network RAM Implementation Through OS Kernel Modification (II)
- Global memory management
  - Page fault on node P in GMS: four cases
    - the faulted page is in the global memory of another node Q
    - the faulted page is in the global memory of node Q, but P's memory contains only local pages
    - the page is on disk
    - the faulted page is a shared page in the local memory of another node Q
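
A didactic sketch of the lookup behind these cases is below. It only classifies where the faulted page is found; real GMS additionally exchanges a victim page with the supplying node, and the second case (P holding only local pages) changes which victim is chosen rather than where the page comes from. The `global`/`shared_local` structure is a hypothetical simplification.

```python
def classify_fault(page, other_nodes, disk):
    """Classify where node P finds a faulted page under GMS
    (simplified sketch). 'other_nodes' is a list of dicts with
    'global' and 'shared_local' page sets; 'disk' is a set of pages."""
    for q in other_nodes:
        if page in q["global"]:
            return "global memory of another node"
        if page in q["shared_local"]:
            return "shared page in another node's local memory"
    if page in disk:
        return "disk"
    raise KeyError(page)
```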

21 Boosting Performance with Subpages
- Subpages are transferred over the network much faster than whole pages
  - not the case with magnetic disk transfers
- Using subpages allows a large window for the overlap of computation with communication
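
The benefit follows directly from the cluster figures quoted earlier (20 μs latency, 15 MB/s bandwidth); the page and subpage sizes below are assumptions for illustration:

```python
LATENCY = 20e-6      # seconds, network latency from the cluster figures
BANDWIDTH = 15e6     # bytes/second

def transfer_time(nbytes):
    # simple latency + size/bandwidth model of one network transfer
    return LATENCY + nbytes / BANDWIDTH

page_time = transfer_time(4096)      # whole 4 KB page
subpage_time = transfer_time(512)    # one 512-byte subpage
# the faulting thread can resume once the critical subpage arrives,
# roughly 5x sooner, overlapping the rest of the transfer with computation
```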

22 Reliability (I)
- Main disadvantages of remote memory paging
  - security: can be resolved through the use of a registry
  - reliability (fault tolerance)
- Use a reliable device
  - the simplest reliability policy
  - a reliable device such as the disk
    - asynchronous writes, but bandwidth limited
  - can achieve high availability with simple behavior under failure
  - disk + memory overhead (expensive)
- Replication – mirroring
  - uses page replication to remote memory, without using the disk
  - (-) uses a lot of memory space; high network overhead

23 Reliability (II)
- Simple parity
  - extensively used in RAIDs; based on the XOR operation
  - reduces the memory requirement to (1 + 1/S) of the original
  - network bandwidth & computation overhead
  - what if two servers fail?
- Parity logging
  - reduces the network overhead close to (1 + 1/S)
  - difficult problems arise from page overwriting
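
The XOR basis of simple parity is easy to demonstrate: the parity server stores the XOR of the S data pages, and any single lost page is recovered by XOR-ing the parity with the surviving pages. The toy 8-byte pages below are an assumption; real pages would be 4 KB or more.

```python
def xor_pages(pages):
    """XOR a list of equal-sized byte pages together."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)

# S = 4 memory servers, each holding one toy 8-byte page
server_pages = [bytes([s]) * 8 for s in range(4)]
parity = xor_pages(server_pages)   # stored on a dedicated parity server

# if server 2 fails, its page is the XOR of the parity and the survivors
survivors = [p for i, p in enumerate(server_pages) if i != 2]
recovered = xor_pages([parity] + survivors)
```

This is also why two simultaneous server failures are unrecoverable: a single parity page carries only one equation over the S unknowns.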

24 An Example of the Simple Parity Policy (Each server in this example can store 1000 pages)

25 A Parity Logging Example with Four Memory Servers and One Parity Server

26 Remote Paging Prototypes (I)
- The Global Memory Service (GMS)
  - a modified-kernel solution for using Network RAM as a paging device
  - (reliability) writes data asynchronously to disk
  - the prototype was implemented on the DEC OSF/1 OS
    - under real application workloads it shows speedups of 1.5 to 3.5
    - responds effectively to load changes
  - generally a very efficient system for remote memory paging

27 Pros and Cons of GMS

28 Remote Paging Prototypes (II)
- The Remote Memory Pager
  - uses the swap device driver approach
  - the prototype was implemented on the DEC OSF/1 v3.2 OS
  - consists of a swap device driver on the client machine & a user-level memory server program that runs on the servers
  - an effective remote paging system with no need to modify the OS

29 Pros and Cons of the Remote Memory Pager

30 Network Memory File System (I)
- Use the Network RAM as a filesystem cache, or directly as a faster-than-disk storage device for file I/O
- Using network memory as a file cache
  - all filesystems use a portion of the workstation's memory as a filesystem cache
  - two problems
    - multiple cached copies waste local memory
    - no knowledge of the file's existence in other nodes' caches
  - several ways to improve the filesystem's caching performance
    - eliminate the multiple cached copies
    - create a global network memory filesystem cache: 'cooperative caching'

31 Network Memory File System (II)
- Network RamDisks
  - address the disk's I/O performance problem
  - use reliable network memory for directly storing temporary files
  - similar to remote memory paging
  - can be easily implemented without modifying the OS (as a device driver)
  - can be accessed by any filesystem (e.g. NFS)
- A Network RamDisk is a block device that unifies all the idle main memories in a NOW under a disk interface
  - behaves like any normal disk, but is implemented in main memory (RAM)
  - (difference) instead of memory pages, it sends disk blocks to remote memory
    - disk blocks are much smaller than memory pages (512 or 1024 bytes) and on many OSs variable in size (from 512 bytes to 8 Kbytes)
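
The disk-like interface described above can be sketched as a block store whose blocks live in the (simulated) idle memory of remote workstations. This is a hypothetical toy: plain dicts stand in for remote RAM, and blocks are striped round-robin across servers; a real Network RamDisk would be a kernel block device driver speaking a network protocol.

```python
class NetworkRamDisk:
    """Toy sketch of a Network RamDisk: a disk-like block interface
    backed by remote workstation memory instead of platters."""
    def __init__(self, servers, block_size=512):
        self.servers = servers          # list of dicts standing in for remote RAM
        self.block_size = block_size

    def _server_for(self, block_no):
        # stripe blocks round-robin across the memory servers
        return self.servers[block_no % len(self.servers)]

    def write_block(self, block_no, data):
        assert len(data) == self.block_size
        self._server_for(block_no)[block_no] = data

    def read_block(self, block_no):
        # unwritten blocks read back as zeros, like a fresh disk
        return self._server_for(block_no).get(block_no, b"\0" * self.block_size)
```

Because the interface is an ordinary block device, any filesystem (e.g. NFS-exported ones) can be mounted on top of it unchanged.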

32 Applications of Network RAM in Databases (I)
- Transaction-based systems
  - substitute disk accesses with Network RAM accesses, reducing the latency of a transaction
  - a transaction, during its execution, makes a number of disk accesses to read its data, makes some designated calculations on that data, writes its results to the disk and, at the end, commits
    - atomicity & recoverability
- 2 main areas where Network RAM can boost a transaction-based system's performance
  - the initial phase of a transaction, when read requests from the disk are performed
    - through the use of a global filesystem cache
  - speeding up synchronous write operations to reliable storage at transaction commit time
    - transaction-based systems make many small synchronous writes to stable storage

33 Applications of Network RAM in Databases (II)
- Transaction-based systems
  - steps performed in a transaction-based system that uses Network RAM
    - at transaction commit time, the data are synchronously written to remote main memory
    - concurrently, the same data are asynchronously sent to the disk
    - eventually the data have been safely written to the disk
  - in the modified transaction-based system
    - a transaction commits after the second step
    - the data are replicated in the main memories of 2 workstations
  - results
    - the use of Network RAM can deliver up to 2 orders of magnitude higher performance
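
The commit steps above can be sketched as follows. This is a minimal model under stated assumptions: plain dicts stand in for the two remote memory servers and the disk, and a background thread plays the role of the asynchronous write-behind; a real system would use a network protocol and a recovery log.

```python
import threading

def commit(txn_id, data, memory_servers, disk):
    """Sketch of the commit path: write the commit data synchronously
    to two remote memory servers, start the disk write in the
    background, and return without waiting for the disk."""
    # step 1: synchronous, replicated write to remote main memory
    for server in memory_servers[:2]:
        server[txn_id] = data
    # step 2: asynchronous write-behind to the disk
    t = threading.Thread(target=disk.__setitem__, args=(txn_id, data))
    t.start()
    # the transaction is considered committed here
    return t  # caller may join() later to confirm on-disk durability
```

The latency win comes from step 1: two network round-trips to remote RAM replace a synchronous disk write on the commit path, while durability on disk still arrives shortly afterwards.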

34 Summary
- The emergence of high speed interconnection networks has added a new layer to the memory hierarchy (Network RAM)
- Boosts the performance of applications
  - remote memory paging
  - file systems & databases
- Conclusions
  - Using Network RAM results in significant performance improvements
  - Integrating Network RAM into existing systems is easy
    - device driver, loadable filesystem, user-level code
  - The benefits of Network RAM will probably increase with time
    - the gap between memory and disk continues to widen
- Future trends
  - reliability & filesystem interface