
Slide 1: CIT 470: Advanced Network and System Administration - Disks

Slide 2: Topics
1. Disk interfaces
2. Disk components
3. Performance
4. Reliability
5. Partitions
6. RAID
7. Adding a disk
8. Logical volumes
9. Filesystems
10. Storage management

Slide 3: Volumes
A volume is a chunk of storage as seen by the server:
- A disk
- A partition
- A RAID set
Logical Unit Numbers (LUNs) identify volumes.
A volume is formatted with a filesystem: ext2/ext3, ZFS, FAT, NTFS, or ISO 9660.
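
To see which volumes a Linux host knows about and how each is formatted (an aside, not from the slide; exact output varies by system), two standard tools are:

df -T                  # mounted filesystems with their types
cat /proc/partitions   # every block device the kernel recognizes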

Slide 4: Disk Interfaces
SCSI: standard interface for servers.
IDE: standard interface for PCs.
Fibre Channel (FC-AL): high bandwidth; can carry SCSI or IP.
iSCSI: SCSI over fast (e.g., 10-gigabit) IP network equipment.
USB: fast enough for slow devices on PCs.

Slide 5: SCSI
Small Computer Systems Interface. Fast, reliable, expensive.
A bus, not a simple PC-to-device interface:
- Each device has a target number, 0-7 (or 0-15 on wide SCSI).
- Devices can communicate directly without the CPU.
Many versions:
- Original: SCSI-1 (1979), 5 MB/s.
- Current: SCSI-3 (2001), 320 MB/s.
- Serial Attached SCSI (SAS): up to 128 devices; up to 750 MB/s full duplex.

Slide 6: IDE
Integrated Drive Electronics / AT Attachment. Slower, less reliable, cheap.
Only allows 2 devices per interface. The ATAPI standard added removable devices.
Many versions:
- Original: IDE / ATA (1984).
- Current: Ultra-ATA/133, 133 MB/s.
- Serial ATA (SATA): point-to-point, one device per port; 150 MB/s (SATA-1) and 300 MB/s (SATA-2).

Slide 7: IDE vs. SCSI
SCSI offers better performance and scale:
- Faster bus.
- Faster hard drives (up to 15,000 rpm).
- Lower CPU usage.
- Better handling of multiple requests.
IDE is cheaper and often best for workstations.
Convergence: SATA-2 and SAS are converging on a single standard.

Slide 8: Hard Drive Components (diagram)

Slide 9: Hard Drive Components
Actuator: moves the arm across the disk to read/write data. The arm has multiple read/write heads (often 2 per platter).
Platters: rigid substrate with a thin coating of magnetic material that stores the data. The coating type determines areal density (Gbits/in²).
Spindle motor: spins the platters (up to 15,000 rpm). Speed determines disk latency.
Cache: 8-32 MB of cache memory. Reliability trade-off: write-back vs. write-through caching.

Slide 10: Disk Information: hdparm

# hdparm -i /dev/hde
/dev/hde:
 Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects= , LBA=yes, LBAsects=
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version
 * signifies the current active mode

Slide 11: Disk Performance
Seek time: time to move the head to the desired track (3-8 ms).
Rotational delay: time until the desired block rotates under the head (up to about 8.3 ms at 7,200 rpm, about 4.2 ms on average; see the calculation below).
Latency = seek time + rotational delay.
Throughput: data transfer rate, measured in MB/s.
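
The rotational numbers follow directly from the spindle speed; a quick sanity check in the shell (an aside, not from the slide):

# One revolution at 7,200 rpm takes 60,000 ms / 7,200 = 8.33 ms.
# On average the head waits half a revolution:
echo "scale=2; 60000 / 7200 / 2" | bc   # prints 4.16 (ms)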

Slide 12: Latency vs. Throughput
Which is more important? It depends on the type of load:
- Sequential access favors throughput (e.g., multimedia on a single-user PC).
- Random access favors latency (most servers).
How to improve performance:
- Faster disks (15,000 rpm vs. 7,200 rpm).
- Caching (disk, controller, server OS, client OS).
- More spindles (disks).
- More disk controllers.

Slide 13: Disk Performance: hdparm

# hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads: 1954 MB in 2.00 seconds = 977.00 MB/sec
 Timing buffered disk reads: 268 MB in 3.02 seconds = 88.74 MB/sec

Slide 14: Reliability
MTBF: average time between failures (>100,000 hours).
Real failure curves have three phases:
- Early phase: high failure rate from manufacturing defects.
- Constant-failure-rate phase: MTBF is valid here.
- Wearout phase: high failure rate from wear.
Failures are more likely on traumatic events such as power on/off.
Systems often wear out before reaching the MTBF.

Slide 15: Partitions and the MBR
The MBR holds 4 primary partition entries. One primary partition can be used as an extended partition, which links to an Extended Boot Record (EBR) in the first sector of that partition. Each logical partition is described by its own EBR, which links to the next EBR.

Slide 16: Extended Partitions and EBRs
There is only one extended partition:
- It is one of the primary partitions.
- It contains one or more logical partitions.
- It should contain all disk space not used by the other primary partitions.
EBRs contain two entries:
- The first entry describes a logical partition.
- The second entry points to the next EBR if there are more logical partitions after the current one.
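
To inspect this layout on a real disk (an illustrative sketch; the device name is hypothetical), list the partition table. On Linux, logical partitions are numbered from 5:

fdisk -l /dev/sda       # primary, extended, and logical (sda5 and up) partitions
parted /dev/sda print   # the same table as parted sees it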

Slide 17: Why Partition?
1. Separate the OS from user files, allowing user backups and OS upgrades without problems.
2. Provide a faster swap area for virtual memory.
3. Improve performance by keeping filesystem tables small and keeping frequently used files close together on the disk.
4. Limit the effect of disk-full problems, often caused by log or cache files.
5. Multi-boot systems with multiple OSes.

Slide 18: RAID
Redundant Array of Independent Disks: combine physical disks into a single logical unit.
Can be implemented in hardware or software. Hardware RAID controllers add caching and automate the rebuilding of arrays.
Advantages: capacity, reliability, throughput.

Slide 19: RAID Levels

Level    Min  Description
JBOD     2    Merge disks for capacity; no striping.
RAID 0   2    Striped for performance + capacity.
RAID 1   2    Mirrored for fault tolerance.
RAID 3   3    Striped set with a dedicated (byte-level) parity disk.
RAID 4   3    Like RAID 3, but block- instead of byte-level striping.
RAID 5   3    Striped set with distributed parity.
RAID 6   4    Striped set with dual distributed parity.
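
These levels can be built with Linux software RAID as well as with hardware controllers. A minimal mdadm sketch (not from the slides; device names are hypothetical):

mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1
cat /proc/mdstat            # watch the initial parity build
mkfs -v -t ext3 /dev/md0    # then format the array like any other volume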

Slide 20: Striping
Distribute data across multiple disks to improve I/O by accessing the disks in parallel:
- Independent requests can be serviced in parallel by separate disks.
- A single multi-block request can be serviced by multiple disks.
Performance vs. reliability:
- Performance increases with the number of disks.
- Reliability decreases with the number of disks.

Slide 21: Parity
Store an extra bit with each chunk of data.
Even parity: add 0 if the number of 1s is even; add 1 if it is odd.
Odd parity: add 0 if the number of 1s is odd; add 1 if it is even.
For example:

7-bit data  Even parity bit  Odd parity bit
0000000     0                1
1010001     1                0

Slide 22: Error Detection with Parity
With even parity, every byte must have an even number of 1s.
What if you read a byte with an odd number of 1s?
- It's an error: an odd number of bits were flipped.
What if you read a byte with an even number of 1s?
- It may be correct.
- It may be an error in which an even number of bits are bad.

Slide 23: Error Correction
XOR the data blocks together to compute the parity block.
If a drive fails, XOR the surviving data blocks with the parity block to reconstruct the missing block. (A numeric sketch follows.)
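
Why this works, in a few lines of shell arithmetic (the values are arbitrary):

a=$((0xA5)); b=$((0xC3))
p=$(( a ^ b ))      # parity "block" = XOR of the data blocks
lost=$(( p ^ b ))   # the disk holding a fails: XOR parity with the survivors
echo $a $lost       # prints 165 165: the lost block is recovered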

Slide 24: RAID 0: Striping, No Parity
Performance: throughput = n * single-disk speed.
Reliability: lower; if one disk is lost, the entire set is lost. MTBF = (average MTBF) / (number of disks).
Capacity: n * disk size.

Slide 25: RAID 1: Disk Mirroring
Performance:
- Reads are faster, since a read operation returns after the first read completes.
- Writes are slower, because a write operation returns only after the second write completes.
Reliability:
- The system continues to work after one disk dies.
- Doesn't protect against disk or controller failures that corrupt data instead of killing the disk.
- Doesn't protect against human or software error.
Capacity: n/2 * disk size.

Slide 26: RAID 3: Striping + Dedicated Parity
Reliability: survives the failure of any 1 disk.
Performance: striping increases performance, but the parity disk must be accessed on every write, and the parity calculation decreases write performance. Good for sequential reads (large graphics and video files).
Capacity: (n-1) * disk size.

Slide 27: RAID 4: Stripe + Block Parity Disk
Identical to RAID 3 except that it uses block-level striping instead of byte-level striping.

Slide 28: RAID 5: Stripe + Distributed Parity
Reliability: survives the failure of any 1 disk.
Performance: fast reads (like RAID 0), but slow writes. Like RAID 4, but without the bottleneck of a single parity disk. Altering any data block still requires reading blocks and writing the parity block.
Capacity: (n-1) * disk size.

Slide 29: RAID 6: Striped with Dual Parity
Like RAID 5 but with two parity blocks per stripe. Can survive the failure of two drives at once.

Slide 30: Nested RAID Levels
Many RAID systems can use both physical drives and RAID sets as members of a RAID volume:
- This lets admins combine the advantages of different levels.
- Nested levels are named by the combination of levels, e.g., RAID 01 or RAID 0+1.

Slide 31: RAID 01 (0+1)
A mirror of stripes:
- If a disk fails in one RAID 0 array, the set survives by using the other RAID 0 array.
- Cannot tolerate 2 disk failures unless both are in the same stripe set.

Slide 32: RAID 10 (1+0)
A stripe of mirrors:
- Can tolerate the failure of all but one drive in each RAID 1 set.
- Uses more disk space than RAID 5 but provides higher performance.
- The highest-performance, highest-cost option.

Slide 33: RAID 51 (5+1)
A mirror of RAID 5 sets.
Capacity = (n/2 - 1) * disk size. Minimum disks: 6.

Slide 34: RAID Failures
RAID sets keep working after a single disk failure (except RAID 0); they operate in degraded mode.
The RAID set rebuilds after the bad disk is replaced; rebuilding parity or mirror data can take hours.
Some hardware allows hot swapping, so the server doesn't have to be rebooted to replace a disk.
Some hardware supports a hot spare disk that is used immediately on disk failure for the rebuild.
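
With Linux software RAID, for example, the degraded-mode and rebuild workflow looks like this (an illustrative sketch; device names are hypothetical):

mdadm --detail /dev/md0             # State line shows "degraded"; list shows the faulty member
mdadm /dev/md0 --fail /dev/sdc1     # mark the bad disk faulty (if not already)
mdadm /dev/md0 --remove /dev/sdc1   # pull the failed member
mdadm /dev/md0 --add /dev/sde1      # add a replacement; the rebuild starts automatically
cat /proc/mdstat                    # shows rebuild progress
# Adding a disk with --add to a healthy array makes it a hot spare.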

Slide 35: You Still Need Backups
Human and software errors:
- RAID won't protect you from rm -rf / or copying over the wrong file.
System crash:
- Crashes can interrupt write operations, leaving data updated but parity not.
Correlated disk failures:
- Accidents (power failures, dropping the machine) can impact all disks at once.
- Disks bought at the same time often fail at the same time.
Hardware data corruption:
- If a disk controller writes bad data, all disks will have the bad data.

Slide 36: Logical Volumes
What are logical volumes? They appear to the user as a physical volume, but can span multiple partitions and/or disks.
Why logical volumes?
- Aggregate disks for performance/reliability.
- Grow and shrink logical volumes on the fly.
- Move logical volumes between physical devices.
- Replace volumes without interrupting service.

Slide 37: LVM (diagram)

Slide 38: LVM Components
Logical volume group (LVG): a set of physical volumes (partitions or disks). It may be divided into logical volumes (LVs).
LVs are made up of fixed-size logical extents (LEs); each LE is 4 MB by default. Physical extents are the same size.

Slide 39: Mapping Modes
Linear mapping: LEs are assigned to contiguous areas of PV space.
Striped mapping: LEs are interleaved across PVs to improve performance.

Slide 40: Setting up a LVG and LV
1. Initialize physical volumes:
   pvcreate /dev/hda1
   pvcreate /dev/hdb1
2. Initialize a volume group (use vgextend to add more PVs later):
   vgcreate nku_proj /dev/hda1 /dev/hdb1
3. Create logical volumes:
   lvcreate -n nku1 --size 100G nku_proj
4. Create a filesystem:
   mkfs -v -t ext3 /dev/nku_proj/nku1
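
Each step can be verified as you go with the standard LVM reporting tools (not shown on the slide):

pvs   # physical volumes and which VG they belong to
vgs   # volume group size and free extents
lvs   # logical volumes and their sizes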

Slide 41: Extending a LV
Set an absolute size:
   lvextend -L 120G /dev/nku_proj/nku1
Or set a relative size:
   lvextend -L +20G /dev/nku_proj/nku1
Expand the filesystem without unmounting:
   ext2online -v /dev/nku_proj/nku1
Check the size:
   df -k
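
ext2online is RHEL4-era; on current distributions the same online grow of a mounted ext3/ext4 filesystem is done with resize2fs (a sketch using the same hypothetical volume names):

lvextend -L +20G /dev/nku_proj/nku1
resize2fs /dev/nku_proj/nku1   # grows the filesystem to fill the LV while mounted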

Slide 42: Adding a Disk
1. Install the new hardware; verify the disk is recognized by the BIOS.
2. Boot and verify the device exists in /dev.
3. Partition:
   fdisk /dev/sdb
4. Create a filesystem:
   mkfs -v -t ext3 /dev/sdb1
5. Add an entry to /etc/fstab, then mount:
   /dev/sdb1 /proj ext3 defaults 0 2
   mount -a
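
Two quick ways to confirm the kernel actually sees the new disk before partitioning (an aside; sdb is the hypothetical new device):

dmesg | grep -i sdb    # kernel messages from device detection
cat /proc/partitions   # the kernel's current list of disks and partitions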

Slide 43: Filesystems
ext3fs: the current common Linux filesystem. Journaling eliminates the need for regular fscking.
ext2fs: the older Linux non-fragmenting fast filesystem.

Slide 44: inode
A file consists of an inode plus data blocks.
inodes are static:
- The inode table has a fixed number of inodes.
- Size: 128 bytes.
An inode contains:
- UID of the owner
- GID of the group
- Permissions
- Timestamps
- Reference count
- Block pointers
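
You can inspect these fields for any file with standard tools (not from the slide):

ls -i /etc/passwd   # prints the file's inode number
stat /etc/passwd    # shows size, blocks, permissions, UID/GID, timestamps, link count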

Slide 45: When Don't You Need a Filesystem?
Swap space:
   mkswap -v /dev/sdb1
Server applications that manage raw volumes themselves, e.g., Oracle and VMware Server.

Slide 46: Swap
You can use a swap file instead of a swap partition:
   dd if=/dev/zero of=/swapfile bs=1024k count=512
   mkswap /swapfile
Enable swap:
   swapon /swapfile
   swapon /dev/sda2
Disable swap:
   swapoff /swapfile
   swapoff /dev/sda2
Check swap resource usage:
   cat /proc/swaps

Slide 47: Mounting
To use a filesystem:
   mount /dev/sda1 /mnt
   df /mnt
Automatic mounting: add an entry in /etc/fstab.
Unmount:
   umount /dev/sda1
You cannot unmount a volume that is in use.

Slide 48: fstab

# /etc/fstab: static file system information.
proc       /proc           proc     defaults  0 0
/dev/hdc1  /               ext3     defaults  0 1
/dev/hdc5  /win            vfat     user,rw   0 0
/dev/hdc7  none            swap     sw        0 0
/dev/hdc8  /var            ext3     defaults  0 2
/dev/hdc9  /home           ext3     defaults  0 2
/dev/hda   /media/cdrom0   iso9660  ro,user   0 0
/dev/fd0   /media/floppy0  auto     rw,user   0 0

Slide 49: UUIDs

# /etc/fstab: static file system information.
UUID=fbdfebe2-fbde-42c9-963d-12428b642f1d  /      ext3  defaults  0 1
UUID=a1858e04-78b9-460b-a6cb-3f1dfe3fa16e  /home  ext3  defaults  0 2
UUID=c4f14e27-96cd-420c bd5298e3f76        none   swap  sw        0 0

Universally Unique IDentifiers:
- 128-bit numbers written as 32 hex digits.
- 3.4 × 10^38 possible UUIDs.
Used to identify devices on Linux:
- To find the UUID for a specific device: vol_id -u /dev/sda1
- All devices: ls -l /dev/disk/by-uuid
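
vol_id has since been replaced by blkid on most distributions (an aside; the device name is hypothetical):

blkid /dev/sda1           # prints the UUID, filesystem type, and label
ls -l /dev/disk/by-uuid   # symlinks map each UUID to its device node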

Slide 50: fsck: Check + Repair Filesystems
Sources of filesystem corruption: power failure, system crash.
Types of corruption:
- Unreferenced inodes.
- Bad superblocks.
- Unused data blocks not recorded in the block maps.
- Data blocks listed as free that are used in files.
fsck can fix these and more:
- It asks the user to make more complex decisions.
- It stores unfixable files in lost+found.
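
A typical manual run (a sketch; never fsck a mounted filesystem):

umount /dev/sdb1
fsck -y /dev/sdb1   # -y answers "yes" to repair prompts;
                    # unfixable fragments land in that filesystem's lost+found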

Slide 51: Cost of Storage
Disks are cheap: 1 TB SATA disks cost about $100 in the late 2000s.
Storage is expensive: roughly 20% of the cost is the hard disks; 80% is overhead (servers, power, AC, support, backups).

Slide 52: Storage Management
Group-based storage:
- Ideally, each group has its own fileserver.
- Group activities can interfere with each other: capacity (filling disks) and performance.
Storage needs assessment:
- Ask customers for anticipated storage growth.
- Monitor servers to measure current growth.
Storage SLA:
- Availability.
- Performance (response time).
- Cost and time to add new storage.


