1 Lecture: Storage, GPUs
• Topics: disks, RAID, reliability, GPUs (Appendix D, Ch 4)

2 Magnetic Disks
• A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches
• Each platter is composed of concentric tracks (5-30K per surface) and each track is divided into sectors (100-500 per track, each about 512 bytes)
• A movable arm holds the read/write heads for each disk surface and moves them all in tandem – a cylinder of data is accessible at a time
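A rough capacity calculation, with hypothetical numbers purely for illustration: a drive with 4 platters (8 recording surfaces), 20,000 tracks per surface, 300 sectors per track, and 512-byte sectors stores 8 × 20,000 × 300 × 512 B ≈ 24.6 GB.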

3 Disk Latency
• To read/write data, the arm has to be placed on the correct track – this seek time usually takes 5 to 12 ms on average – it can take less if there is spatial locality
• Rotational latency is the time taken to rotate the correct sector under the head – the average (half a rotation) is typically 2 ms or more (2 ms at 15,000 RPM)
• Transfer time is the time taken to move a block of bits off the disk – transfer rates are typically 3-65 MB/second
• A disk controller maintains a disk cache (so spatial locality can be exploited) and sets up the transfer on the bus (controller overhead)
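A worked example with hypothetical numbers: for a 15,000 RPM drive with a 6 ms average seek, a 64 MB/s transfer rate, 0.2 ms controller overhead, and a 4 KB request, rotational latency = 0.5 rotation / (15,000/60 rotations per second) = 2 ms and transfer time = 4 KB / 64 MB/s ≈ 0.06 ms, so the average access time is roughly 6 + 2 + 0.06 + 0.2 ≈ 8.3 ms.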

4 RAID
• Reliability and availability are important metrics for disks
• RAID: redundant array of inexpensive (independent) disks
• Redundancy can deal with one or more failures
• Each sector of a disk records check information that allows the controller to determine whether that sector has an error (in other words, redundancy already exists within a disk)
• When a disk read flags an error, we turn elsewhere for correct data

5 RAID 0 and RAID 1
• RAID 0 has no additional redundancy (a misnomer) – it uses an array of disks and stripes (interleaves) data across the disks to improve parallelism and throughput
• RAID 1 mirrors or shadows every disk – every write happens to two disks
• Reads from the mirror may happen only when the primary disk fails – or, you may read both copies together and accept the quicker response
• Expensive solution: high reliability at twice the cost

6 RAID 3
• Data is bit-interleaved across several disks and a separate disk maintains parity information for a set of bits
• For example, with 8 data disks: bit 0 is on disk 0, bit 1 is on disk 1, …, bit 7 is on disk 7; a ninth disk (disk 8) maintains parity for all 8 bits
• For any read, all 8 data disks must be accessed (as we usually read more than a byte at a time) and for any write, all 9 disks must be accessed as parity has to be re-calculated
• High throughput for a single request, low cost for redundancy (overhead: 12.5%), low task-level parallelism
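A small host-side sketch of how this parity works (buffer sizes and the test pattern below are hypothetical): the parity block is the XOR of the corresponding data blocks, and any single lost block can be rebuilt by XOR-ing the parity with the surviving blocks.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

const int NUM_DATA = 8;   // 8 data disks, as in the example above
const int BLOCK = 16;     // bytes per block (tiny, for illustration only)

// Parity block = XOR of the corresponding bytes of every data block
void compute_parity(uint8_t data[NUM_DATA][BLOCK], uint8_t parity[BLOCK]) {
    memset(parity, 0, BLOCK);
    for (int d = 0; d < NUM_DATA; d++)
        for (int b = 0; b < BLOCK; b++)
            parity[b] ^= data[d][b];
}

// Rebuild a failed disk: XOR the parity with all surviving data blocks
void reconstruct(uint8_t data[NUM_DATA][BLOCK], const uint8_t parity[BLOCK], int failed) {
    memcpy(data[failed], parity, BLOCK);
    for (int d = 0; d < NUM_DATA; d++)
        if (d != failed)
            for (int b = 0; b < BLOCK; b++)
                data[failed][b] ^= data[d][b];
}

int main() {
    uint8_t data[NUM_DATA][BLOCK], parity[BLOCK], saved[BLOCK];
    for (int d = 0; d < NUM_DATA; d++)
        for (int b = 0; b < BLOCK; b++)
            data[d][b] = (uint8_t)(31 * d + b);   // arbitrary test pattern
    compute_parity(data, parity);

    memcpy(saved, data[3], BLOCK);                // pretend disk 3 fails...
    memset(data[3], 0, BLOCK);
    reconstruct(data, parity, 3);                 // ...and rebuild it from the other disks
    printf("disk 3 recovered correctly: %s\n",
           memcmp(saved, data[3], BLOCK) == 0 ? "yes" : "no");
    return 0;
}
```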

7 RAID 4 and RAID 5
• Data is block-interleaved – this allows us to get all our data from a single disk on a read – in case of a disk error, read all 9 disks
• Block interleaving reduces throughput for a single request (as only a single disk drive is servicing the request), but improves task-level parallelism as the other disk drives are free to service other requests
• On a write, we access only the disk that stores the data and the parity disk – parity information can be updated simply by checking if the new data differs from the old data
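A sketch of that small-write parity update (the word width here is an arbitrary illustrative choice): the new parity can be computed from the old data, the new data, and the old parity alone, without reading the other data disks.

```cpp
#include <cstdint>

// Read-modify-write parity update for a single-block write:
// flip exactly those parity bits where the new data differs from the old data.
uint64_t update_parity(uint64_t old_parity, uint64_t old_data, uint64_t new_data) {
    return old_parity ^ (old_data ^ new_data);
}
```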

8 RAID 5
• If we have a single disk for parity, multiple writes cannot happen in parallel (as all writes must update the parity info)
• RAID 5 distributes the parity blocks across all disks to allow simultaneous writes
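One common way to do this distribution is the left-symmetric layout sketched below (other rotations are used in practice); because the parity disk changes from stripe to stripe, writes to different stripes usually update parity on different disks.

```cpp
// Left-symmetric parity placement (one common convention, shown for illustration):
// with num_disks disks, the parity block of stripe s lives on disk (num_disks-1) - (s mod num_disks)
int parity_disk(int stripe, int num_disks) {
    return (num_disks - 1) - (stripe % num_disks);
}
// e.g., with 5 disks: stripe 0 -> disk 4, stripe 1 -> disk 3, ..., stripe 4 -> disk 0, stripe 5 -> disk 4
```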

9 Other Reliability Approaches
• High reliability is also expected of memory systems; many memory systems offer SEC-DED support – single error correct, double error detect; implemented with an 8-bit code for every 64-bit data word on ECC DIMMs
• Some memory systems offer chipkill support – the ability to recover from the complete failure of one memory chip – many implementations exist, some resembling RAID designs
• Caches are typically protected with SEC-DED codes
• Some cores implement various forms of redundancy, e.g., DMR or TMR – dual or triple modular redundancy
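Where the 8-bit figure comes from: a Hamming code over 64 data bits needs r check bits satisfying 2^r ≥ 64 + r + 1, which gives r = 7; one additional overall parity bit enables double-error detection, for 8 check bits in total – hence the 72-bit words stored on ECC DIMMs.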

10 SIMD Processors
• Single instruction, multiple data
• Such processors offer energy efficiency because a single instruction fetch can trigger many data operations
• Such data parallelism may be useful for many image/sound and numerical applications
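One concrete CPU-side illustration, assuming an x86 machine with AVX support: a single vector instruction performs eight floating-point additions.

```cpp
#include <immintrin.h>   // x86 AVX intrinsics; compile with -mavx

// One _mm256_add_ps instruction adds 8 pairs of floats:
// a single instruction fetch triggers 8 data operations.
void add8(const float *a, const float *b, float *c) {
    __m256 va = _mm256_loadu_ps(a);     // load 8 floats
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  // 8 additions in one instruction
    _mm256_storeu_ps(c, vc);            // store 8 results
}
```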

11 GPUs
• Initially developed as graphics accelerators; now viewed as one of the densest compute engines available
• Many on-going efforts to run non-graphics workloads on GPUs, i.e., use them as general-purpose GPUs or GPGPUs
• C/C++-based programming platforms enable wider use of GPGPUs – CUDA from NVidia and OpenCL from an industry consortium
• A heterogeneous system has a regular host CPU and a GPU that handles (say) CUDA code (they can both be on the same chip)
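A minimal CUDA sketch of that host/GPU division of labor (array size and launch configuration are arbitrary illustrative choices): the host CPU allocates device memory, copies the inputs over, launches a kernel on the GPU, and copies the result back.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one element of the vectors.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                       // 1M elements (arbitrary)
    size_t bytes = n * sizeof(float);

    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;                         // device (GPU) copies
    cudaMalloc((void**)&da, bytes); cudaMalloc((void**)&db, bytes); cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                   // each thread block is scheduled on one SIMT core
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[100] = %f\n", hc[100]);            // expect 300.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```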

12 The GPU Architecture
• SIMT – single instruction, multiple thread; a GPU has many SIMT cores
• A large data-parallel operation is partitioned into many thread blocks (one per SIMT core); a thread block is partitioned into many warps (one warp running at a time in the SIMT core); a warp is partitioned across many in-order pipelines (each is called a SIMD lane)
• A SIMT core can have multiple active warps at a time, i.e., the SIMT core stores the registers for each warp; warps can be context-switched at low cost; a warp scheduler keeps track of runnable warps and schedules a new warp if the currently running warp stalls
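To make the partitioning concrete (using the 2^20-thread, 256-threads-per-block launch from the sketch above and NVidia's usual warp size of 32): the launch creates 4,096 thread blocks; each block is 256 / 32 = 8 warps; and each warp's 32 threads execute on the SIMT core's SIMD lanes, over multiple cycles if the core has fewer than 32 lanes.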

13 The GPU Architecture

14 Architecture Features
• Simple in-order pipelines that rely on thread-level parallelism to hide long latencies
• Many registers (~1K) per in-order pipeline (lane) to support many active warps
• When a branch is encountered, some of the lanes proceed along the “then” case depending on their data values; later, the other lanes evaluate the “else” case; such a branch cuts the data-level parallelism by half (branch divergence)
• When a load/store is encountered, the requests from all lanes are coalesced into a few 128 B cache line requests; each request may return at a different time (memory divergence)
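A small kernel sketch of both effects (the threshold and stride below are made-up illustrative parameters): lanes of one warp that disagree on the branch condition execute the two paths one after the other, and the strided load touches far more 128 B lines than the unit-stride access would.

```cpp
// Illustration only: data-dependent branching and strided access inside a kernel.
__global__ void divergence_demo(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Branch divergence: within a warp, lanes whose data straddles the threshold
    // serialize the "then" and "else" paths.
    float v = in[i];
    if (v > 0.5f)
        v = v * 2.0f;
    else
        v = v + 1.0f;

    // Memory divergence / coalescing: with stride == 1 a warp's loads fall into one
    // or two 128 B cache lines; with a large stride every lane touches its own line.
    float neighbor = in[(i * stride) % n];

    out[i] = v + neighbor;
}
```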

15 GPU Memory Hierarchy
• Each SIMT core has a private L1 cache (shared by the warps on that core)
• A large L2 is shared by all SIMT cores; each L2 bank services a subset of all addresses
• Each L2 partition is connected to its own memory controller and memory channel
• The GDDR5 memory system runs at higher frequencies, and uses chips with more banks, wide IO, and better power delivery networks
• A portion of GDDR5 memory is private to the GPU and the rest is accessible to the host CPU (the GPU performs copies)

16 Advanced Courses
• For GPU architectures and programming, see Mary Hall’s CS 6235, Parallel Programming for Many-Core Arch
• Spr’13: CS 7810: Advanced Computer Architecture
  - Mo/We 11:50am-1:10pm
  - Core design, cache hierarchies, networks, memory systems, datacenters, etc.
  - Major course project that evaluates original ideas with simulators (often leads to publications)
  - One assignment
  - Take-home final
