EC2 demystification, server power efficiency, disk drive reliability CSE 490h, Autumn 2008.

1 EC2 demystification, server power efficiency, disk drive reliability CSE 490h, Autumn 2008

3 There’s no magic to an OS
[Diagram labels: Hardware, Machine Platform, OS, Apps]
- How does an app do a file write?

4 [Diagram labels: Hardware, Machine Platform, OS, Apps]
- How does an app do a file write?
- What happens if the app tries to cheat?
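A minimal sketch of the file-write path these two slides ask about, assuming a POSIX-like OS. The app never touches the disk directly; it asks the kernel through system calls, and Python's `os.open`, `os.write`, and `os.close` are thin wrappers over them. The function name `write_via_syscalls` and the file name are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_via_syscalls(path: str, data: bytes) -> int:
    """Write data to path using wrappers over the open/write/close system calls."""
    # open(2): the kernel checks permissions and returns a file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # write(2) may accept fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        return written
    finally:
        # close(2): release the descriptor back to the kernel.
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.txt")  # hypothetical file name
        print(write_via_syscalls(target, b"hello, disk\n"))
```

An app that tries to "cheat" by skipping this interface, for example by issuing a privileged I/O instruction itself, is stopped by the hardware, which traps to the OS instead of executing the instruction.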

5 There’s no magic to a VMM
[Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, OS, Apps]
- How does an app do a file write?

6 [Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, OS, Apps]
- How does an app do a file write?
- What happens when the guest OS attempts a disk write?

7 [Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, OS, Apps]
- How does an app do a file write?
- What happens when the guest OS attempts a disk write?
- What happens if the app tries to cheat?

8 There’s no magic to creating a new VM
[Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, Control Interface (console and network)]

9 [Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, Control Interface (console and network)]

10 There’s no magic to creating a bootable system image
- Original UNIX file system
  - Boot block
    - can boot the system by loading from this block
  - Superblock
    - specifies boundaries of next 3 areas, and contains head of freelists of i-nodes and file blocks
  - i-node area
    - contains descriptors (i-nodes) for each file on the disk; all i-nodes are the same size; head of freelist is in the superblock
  - File contents area
    - fixed-size blocks; head of freelist is in the superblock
  - Swap area
    - holds processes that have been swapped out of memory
- And there are startup scripts for apps, etc.
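The on-disk layout described on this slide can be sketched as a simple calculation: given the sizes recorded in the superblock, the boundaries of the three areas that follow it fall out by addition. This is only an illustration. The `Superblock` fields, the `layout` function, and the assumption that the boot block and superblock each occupy exactly one block are hypothetical and not taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superblock:
    """Hypothetical superblock: sizes (in blocks) of the three areas after it."""
    inode_blocks: int  # i-node area
    data_blocks: int   # file contents area
    swap_blocks: int   # swap area


def layout(sb: Superblock) -> dict:
    """Return {area name: (first block, last block)} for the whole disk."""
    # Assumption: block 0 is the boot block and block 1 is the superblock.
    regions = {"boot": (0, 0), "superblock": (1, 1)}
    start = 2
    for name, size in (
        ("inodes", sb.inode_blocks),
        ("data", sb.data_blocks),
        ("swap", sb.swap_blocks),
    ):
        regions[name] = (start, start + size - 1)
        start += size
    return regions


if __name__ == "__main__":
    for area, (first, last) in layout(Superblock(10, 100, 20)).items():
        print(f"{area:<10} blocks {first}-{last}")
```

Building a bootable image then amounts to writing these areas, in this order, into one file or volume and filling them with the right contents.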

11 [Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, OS, Apps, Control Interface (console and network)]

12 There’s no magic to talking to your VM over the network
[Diagram labels: Hardware, Machine Platform, VMM / Hypervisor, OS, Apps, OS, Apps]
- Suppose your app was a webserver?

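13 [Diagram: packet encapsulation, with each layer's header in front of a payload that holds the layer above]
physical address | payload
IP address | payload
TCP hdr | payload
HTTP hdr | payload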
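The nesting in this diagram can be sketched as repeated wrapping: each layer prepends its own header and treats everything it was handed as an opaque payload. The bracketed header strings below are placeholder labels borrowed from the slide, not real HTTP, TCP, IP, or link-layer formats, and the function names are hypothetical.

```python
# Headers from innermost (application) to outermost (link layer).
# These are placeholder labels, not real wire formats.
LAYERS = [b"[HTTP hdr]", b"[TCP hdr]", b"[IP address]", b"[physical address]"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload on the sender: each layer prepends its header."""
    frame = payload
    for header in LAYERS:
        frame = header + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Unwrap on the receiver: strip headers from the outermost layer inward."""
    for header in reversed(LAYERS):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = encapsulate(b"<html>hi</html>")
    print(wire)
    print(decapsulate(wire))
```

A webserver running in a VM sees only the innermost payload; the layers outside it are what let the frame reach the right machine, the right host address, and the right connection.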

14 Server power efficiency
- It matters
http://www.electronics-cooling.com/articles/2007/feb/a3/

15
- Servers are typically operated at middling utilizations
- Necessary for performance reasons
  - Response time has a “knee” as utilization rises
- Terrible for energy efficiency
  - Only a 2:1 power consumption difference between low utilization and high utilization
- Very different than desktops
  - No one gave a rip about power consumption until recently
- Very different than laptops
  - Operate at peak or at idle, seldom in the middle
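A rough sketch of why middling utilization is both necessary and wasteful. Two assumptions here are not from the slide: power is modeled as rising linearly from half of peak at idle to peak at full load (one reading of the 2:1 figure), and the response-time "knee" is modeled with the textbook M/M/1 queueing formula, service time divided by (1 - utilization). All function names are hypothetical.

```python
def relative_power(utilization: float, idle_fraction: float = 0.5) -> float:
    """Power as a fraction of peak, assuming a linear rise from idle to peak."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization: float, idle_fraction: float = 0.5) -> float:
    """Work per unit of energy, relative to a fully loaded server."""
    return utilization / relative_power(utilization, idle_fraction)


def response_time(utilization: float, service_time: float = 1.0) -> float:
    """Mean M/M/1 response time; it blows up as utilization approaches 1."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    for u in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(
            f"utilization={u:.1f}  "
            f"efficiency={relative_efficiency(u):.2f}  "
            f"response={response_time(u):.1f}x service time"
        )
```

Under these assumptions a server at 30% utilization does less than half as much work per unit of energy as a fully loaded one, while pushing utilization toward 90% multiplies response time roughly tenfold. That tension is the case for hardware whose power draw is proportional to load.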

16 “The Case for Energy-Proportional Computing”

21 Disk drive reliability
- Focus on disks as a commonly replaced component
“Disk failures in the real world”

22 Disk drive reliability
- Typical disk spec sheet MTTF is 1,000,000 hours
  - Corresponds to an annual failure rate of about 1%
- If a datacenter has 20,000 machines and each machine has 4 disks, that would be an average of about 2 disk failures a day
- But it’s worse …
  - Field replacement rates are much higher than the spec sheet MTTF would suggest
    - By a factor of 2-10 for disks less than 5 years old
    - By a factor of 30 for disks between 5 and 8 years old
Why might this be?
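The arithmetic on this slide can be checked with a short sketch. It assumes the usual approximation that the annual failure rate is hours per year divided by MTTF, which holds when the MTTF is far longer than a year. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def annual_failure_rate(mttf_hours: float) -> float:
    """Approximate fraction of disks failing per year (valid for MTTF >> 1 year)."""
    return HOURS_PER_YEAR / mttf_hours


def failures_per_day(machines: int, disks_per_machine: int, afr: float) -> float:
    """Expected disk failures per day across the whole datacenter."""
    return machines * disks_per_machine * afr / 365


if __name__ == "__main__":
    afr = annual_failure_rate(1_000_000)
    spec = failures_per_day(20_000, 4, afr)
    print(f"spec-sheet AFR: {afr:.3%}")
    print(f"spec-sheet failures/day: {spec:.2f}")
    # Field replacement rates for disks under 5 years old: 2x to 10x the spec.
    print(f"field estimate: {2 * spec:.1f} to {10 * spec:.1f} per day")
```

The unrounded rate is about 0.88% a year, or a little under 2 failures a day for 80,000 disks; rounding the rate up to 1% gives about 2.2 a day. Applying the 2-10x field factor for younger disks puts the realistic range at roughly 4 to 19 replacements a day.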

23
  - Failure rates increase annually; the “bathtub curve” doesn’t represent reality
What’s an example of a situation where the “bathtub curve” is realistic?
“Disk failures in the real world”

25
  - Failures are clustered in time
Why might this be?

26
  - Failures aren’t very dependent on average operating temperature
Does this contradict the previous discussion?
"Failure Trends in a Large Disk Drive Population"

27
  - Failures aren’t very dependent on utilization
Except for young disks. Why?
"Failure Trends in a Large Disk Drive Population"

28
  - Scan errors are correlated with impending failure
"Failure Trends in a Large Disk Drive Population"

29
  - But like all SMART (Self-Monitoring, Analysis and Reporting Technology) parameters, scan errors don’t come anywhere close to predicting all failures
"Failure Trends in a Large Disk Drive Population"

