Why Energy Efficient Software
- Power Usage Effectiveness (PUE) = total power used by a datacenter / IT power used by a datacenter
- Total power = IT power + PDU + UPS + HVAC + lighting + other overhead
- IT power = servers, network, storage
- PUE ≥ 2 circa 2006 and before; approaching 1 present day
- Most of the further savings to be had in IT hardware and software
Energy as a Performance Metric
[Plot: Productivity vs. Resources Used]
- Traditional view of the software system design space
- Increase productivity for fixed resources of a system
Energy as a Performance Metric
[Plot: Productivity vs. Resources Used, with Energy as a third dimension]
- Maybe this is a better view of the design space?
- Decrease energy without compromising productivity?
MapReduce Overview
[Diagram: HDFS → Map → (local FS, RAM) → network shuffle → Reduce → HDFS]
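The data flow above can be sketched in miniature as a word count (an analogy in plain Python, not Hadoop code; function names are illustrative):

```python
from collections import defaultdict

def map_phase(records):
    # Map: read input records (from HDFS) and emit (key, value) pairs
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group values by key (local FS + network in Hadoop)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values and write results (to HDFS)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["a b a", "b a"])))
```

Each stage touches a different part of the hardware, which is why the later slides benchmark read, shuffle, and write separately.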
Methodology
- Performance metrics: basket of metrics – job duration, energy, power. Performance variance?
- Parameters: static – cluster size, workload size, configuration parameters; dynamic – task scheduling? Block placement? Speculative execution?
- Workload: exercise all components – sort, HDFS read, HDFS write, shuffle; representative of production workloads – nutch, gridmix, others?
- Energy measurement: wall plug measurement – 1W accuracy, 1 reading per second. Fine grain measurement to correlate energy consumption to hardware components?
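With 1 W accuracy at one reading per second, wall-plug energy is simply the sum of the power samples over the job (a sketch; the sample values are illustrative):

```python
def energy_joules(power_samples_watts, interval_s=1.0):
    # Each sample approximates average power over one interval;
    # energy = sum(P_i * dt), with dt = 1 s for 1 reading per second.
    return sum(power_samples_watts) * interval_s

# e.g. a 3-second job drawing 200 W, 210 W, 205 W at the wall plug
job_energy = energy_joules([200, 210, 205])  # 615.0 J
```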
Scaling to More Workers – Sort
- Terasort format: 100-byte records with 10-byte keys, 10GB of total data
- Out-of-box Hadoop with default config. Reduce energy by adding more workers?
- JouleSort (highly customized system) vs. out-of-box Hadoop with default config: 11k sorted records per joule vs. 87 sorted records per joule
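To put the records-per-joule gap in perspective, simple arithmetic on the figures above gives the implied total energy for the same 10GB sort:

```python
# Back-of-envelope from the numbers above: 10 GB of 100-byte records.
records = 10 * 10**9 // 100           # 100 million records

hadoop_energy = records / 87           # out-of-box Hadoop: ~1.15 MJ
joulesort_energy = records / 11_000    # JouleSort winner: ~9.1 kJ
gap = hadoop_energy / joulesort_energy # ~126x efficiency gap
```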
Scaling to More Workers – Sort
- Terasort format: 100-byte records with 10-byte keys, 10GB of total data
- Out-of-box Hadoop with default config., workers' energy only
- Energy of the master is amortized by additional workers
Scaling to More Workers – Nutch
- Nutch web crawler and indexer, with Hadoop 0.19.1
- Index URLs anchored at www.berkeley.edu, depth 7, 2000 links per page
- Workload has some built-in bottlenecks?
Isolating IO Stages
- HDFS read, shuffle, and HDFS write jobs, modified from the prepackaged sort example
- Read, shuffle, or write 10GB of data in terasort format; does nothing else
- HDFS write seems to be the scaling bottleneck
HDFS Replication
- HDFS read, shuffle, HDFS write, sort jobs; 10GB data, terasort format
- Modify the number of HDFS replicas, default config. for everything else
- Some workloads are affected – HDFS write; some are not – shuffle
HDFS Replication
- Replication 3 – default
- Replication 2
- Reducing HDFS replication to 2 makes HDFS write less of a bottleneck?
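Replication is a per-file setting with a cluster-wide default; in Hadoop the default is controlled by the `dfs.replication` property in `hdfs-site.xml` (a minimal fragment showing only the relevant property):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Existing files keep their replication factor; `hadoop fs -setrep` changes it for files already in HDFS.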
Changing Input Size
- Sort, modified from the prepackaged sort example
- Jobs that handle less than ~1GB of data per node are bottlenecked by overhead
- A somewhat noteworthy result: out-of-box Hadoop is competitive with the JouleSort winner at 100MB?!
HDFS Block Size
- HDFS read, shuffle, HDFS write, sort jobs; 10GB data, terasort format
- Modify the HDFS block size, default config. for everything else
- Some workloads are affected – HDFS read; some are not – shuffle
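Block size is likewise a single configuration knob: the `dfs.block.size` property (in bytes) in `hdfs-site.xml`, for the Hadoop 0.19-era versions used here (128 MB shown as an illustrative value; the default of this era was 64 MB):

```xml
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
</property>
```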
Slow Nodes
- One node on the cluster consistently received fewer blocks
- Removing the slow node leads to a performance improvement
- Clever ways to use the slow node instead of taking it offline?
Predicting IO Energy
- Working example: predict IO energy for a particular task
- Benchmark energy in joules per byte for HDFS read, shuffle, HDFS write
- IO energy = bytes read × (joules per byte, HDFS read) + bytes shuffled × (joules per byte, shuffle) + bytes written × (joules per byte, HDFS write)
- The simple model is effective, but requires prior measurements
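The linear model above is a three-term dot product (a sketch; the per-byte coefficients are illustrative placeholders, to be filled in from the stage benchmarks, not measured values):

```python
def predict_io_energy(bytes_read, bytes_shuffled, bytes_written,
                      j_per_byte_read, j_per_byte_shuffle, j_per_byte_write):
    # IO energy = sum over stages of (bytes moved x measured joules per byte)
    return (bytes_read * j_per_byte_read
            + bytes_shuffled * j_per_byte_shuffle
            + bytes_written * j_per_byte_write)

# Illustrative coefficients (NOT measurements) for a 10GB sort-like job:
energy = predict_io_energy(10e9, 10e9, 10e9,
                           j_per_byte_read=2e-8,
                           j_per_byte_shuffle=3e-8,
                           j_per_byte_write=5e-8)  # 1000.0 J with these inputs
```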
Cluster Provisioning and Configuration
- Working example: find the optimal cluster size for a steady job stream
- Minimize E(N) over the range of N such that D(N) ≤ T
- In general, a multi-dimensional optimization problem to meet job constraints
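For a single knob N, the formulation above reduces to a one-dimensional sweep over measured points (a sketch; the `E` and `D` tables are hypothetical per-cluster-size measurements, not real data):

```python
def optimal_cluster_size(energy, duration, deadline):
    # energy[N], duration[N]: measured E(N) and D(N) per worker count N.
    # Feasible sizes meet the deadline; among those, minimize energy.
    feasible = [n for n in energy if duration[n] <= deadline]
    if not feasible:
        raise ValueError("no cluster size meets the deadline")
    return min(feasible, key=lambda n: energy[n])

# Hypothetical measurements: more workers run faster but draw more power.
E = {5: 900.0, 10: 850.0, 20: 1000.0}   # joules per job
D = {5: 120.0, 10: 70.0, 20: 40.0}      # seconds per job
best = optimal_cluster_size(E, D, deadline=80.0)  # N = 10
```

With more knobs (block size, replication, slot counts), the same feasibility-then-minimize structure applies over a multi-dimensional grid.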
Optimal HDFS Replication
- Working example: reduce HDFS replication from 3 to 2, i.e., keep the off-rack replica only?
- Cost-benefit trade-off between lower energy and higher recovery costs
- Need to quantify the probability of failure/recovery to set a sensible replication level
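One way to frame the trade-off is as an expected-cost comparison (a sketch with hypothetical numbers; real values would come from the measured energy savings and observed failure rates called for above):

```python
def replication_worth_reducing(energy_saved_per_job, jobs_per_day,
                               p_data_loss_per_day, recovery_cost):
    # Reduce replication only if the daily energy saved exceeds the
    # expected daily cost of recovering/recomputing lost data.
    benefit = energy_saved_per_job * jobs_per_day
    expected_recovery = p_data_loss_per_day * recovery_cost
    return benefit > expected_recovery

# Hypothetical inputs in a common unit (e.g. joules per day):
decision = replication_worth_reducing(
    energy_saved_per_job=50.0, jobs_per_day=100,
    p_data_loss_per_day=1e-3, recovery_cost=2e6)
```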
Faster = More Energy Efficient?
[Plot: Power vs. Work rate]
- Constant energy for a fixed workload size, so run as fast as we can
Faster = More Energy Efficient?
[Plot: Power vs. Work rate]
- Reduce energy by using more resources, so run as fast as we can, again
Faster = More Energy Efficient?
[Plot: Power vs. Work rate]
- Caveats: what is meant by "resource"? What is a realistic behavior for R(r)?
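The caveat matters. Under an assumed (not measured) linear power curve with an idle term, E = P(r) × W / R(r), and "run as fast as we can" wins:

```python
def job_energy(resources, work, p_idle=100.0, p_per_unit=20.0):
    # Assumed model: power grows linearly with resources r,
    #   P(r) = p_idle + p_per_unit * r,  and work rate R(r) = r.
    # Energy for a fixed workload W:  E = P(r) * W / R(r).
    power = p_idle + p_per_unit * resources
    rate = resources
    return power * work / rate

# With an idle-power term, more resources -> faster -> lower energy:
energies = [job_energy(r, work=1000.0) for r in (1, 2, 4, 8)]
```

If R(r) grows sublinearly instead (e.g. diminishing returns from added workers), the same formula yields an energy minimum at a finite r, which is exactly the caveat above.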
Faster = More Energy Efficient?
[Plot: Performance vs. Resources Used]
- If work rate ∝ resources used, energy is just another aspect of performance
- All prior performance optimization techniques don't need to be re-invented
- What if work rate is not proportional to resources used? Different hardware? Productivity benchmarks? Hadoop as both terasort and JouleSort winner?