Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids. Cong Liu and Xiao Qin, Auburn University.


1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids
Cong Liu and Xiao Qin, Auburn University

2 Outline
• Introduction and Motivation
• System Model
• Algorithm
• Performance Analysis
• Summary

3 Introduction
Distributed scientific applications in many cases require access to massive data sets. In High Energy Physics (HEP) applications, for example, a handful of experiments have begun producing petabytes of data per year and will continue to do so for decades. Data grids serve as a technology bridge between the need to access extremely large data sets and the goal of achieving high data transfer rates, by providing geographically distributed computing resources and large-scale storage systems.

4 Introduction
• The Google Data Cluster: 31,654 machines; 63,184 CPUs; 126,368 GHz of processing power; two identical buildings containing about 100,000 square feet of data center floor space.

5 Introduction
• Reliability: Computing at high temperatures is more error-prone than computing in an appropriate environment.
• Operational cost: For a single 200-watt server, such as the IBM 1U*300, the energy bill would be $180/year.
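
The operational cost figure can be checked with simple arithmetic. The sketch below assumes the server draws its full 200 W continuously and assumes an electricity price of $0.10 per kWh; neither assumption is stated on the slide.

```python
# Annual energy bill for a server that draws constant power all year.
# ASSUMPTION: the price per kWh is illustrative; the slides do not state one.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_cost(power_watts: float, price_per_kwh: float) -> float:
    """Return the yearly electricity cost in dollars."""
    kwh_per_year = power_watts / 1000.0 * HOURS_PER_YEAR
    return kwh_per_year * price_per_kwh


cost = annual_energy_cost(power_watts=200, price_per_kwh=0.10)
print(f"${cost:.2f} per year")
```

At the assumed price, 1,752 kWh per year costs roughly $175, which is in line with the $180/year quoted on the slide.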

6 Introduction
A key factor in scheduling data-intensive tasks is the location of the input data sets required by the tasks. A straightforward strategy to enhance the performance of data-intensive applications on data grids is to replicate popular data sets to multiple resource sites, which offers higher data access speeds than maintaining the data sets at a single site.

7 Drawbacks of Making Too Many Replicas
It is challenging to maintain consistency among replicas in Data Grids. It is nontrivial to efficiently generate replicas of massive data sets on the fly in Data Grids. A large number of data replicas can increase energy dissipation in storage resources.

8 Reduce Energy Consumption in Data Grids
Minimize electricity cost. Improve system reliability.
How to reduce energy consumption in Data Grids?
• Energy-efficient scheduling algorithms for applications running on data grids.

9 Goals of Scheduling
Make tradeoffs between energy efficiency and high performance for data-intensive applications. Integrate data placement strategies with task scheduling. Consider real-time requirements.
How to achieve the goals? A Distributed Energy-Efficient Scheduler called DEES, with three key components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching.

10 Design Goals of DEES
Maximize the number of tasks completed before their corresponding deadlines. Replicate data and place replicas in an energy-efficient way. Dispatch real-time tasks to peer computing sites, considering three factors: the computational capacities of peer computing sites, the energy consumption introduced by tasks, and data location.

11 Features of DEES
High scalability. DEES requires no full knowledge of the workload conditions of all the computing sites in a data grid; obtaining full knowledge of the state of the grid is a difficult task.

12 Key Ideas
• High-priority tasks are scheduled first in order to meet their deadlines.
• Exploit slack so that low-priority tasks can also have their deadlines guaranteed.
• The dynamic voltage scaling (DVS) technique is used to reduce energy consumption by exploiting available slack and adjusting voltage levels accordingly.

13 Dynamic Voltage Scaling
• An effective technique for reducing energy consumption by adjusting the clock speed and supply voltage dynamically.
• Energy dissipation per CPU cycle is proportional to v^2, where v is the supply voltage.
• Processor energy can be saved by lowering the CPU voltage and running the processor at a slower speed.
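
The quadratic relationship can be illustrated with a few lines of arithmetic. This is a minimal sketch: the proportionality constant `c`, the cycle count, and the two voltages are hypothetical values chosen only for illustration.

```python
# Relative CPU energy under dynamic voltage scaling (DVS).
# Model from the slide: energy per cycle is proportional to v^2.
# ASSUMPTION: the constant c and the example voltages are hypothetical.


def task_energy(cycles: float, voltage: float, c: float = 1.0) -> float:
    """Energy to execute `cycles` CPU cycles at a given supply voltage."""
    return c * cycles * voltage ** 2


full = task_energy(cycles=1e9, voltage=1.2)
scaled = task_energy(cycles=1e9, voltage=0.9)
savings = 1 - scaled / full
print(f"Energy saved by lowering 1.2 V to 0.9 V: {savings:.1%}")
```

Because the cycle count is unchanged, the saving depends only on the voltage ratio squared: (0.9 / 1.2)^2 = 0.5625, so about 44% of the energy is saved, at the price of a longer execution time.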

14 Design Ideas
• Two types of tasks: hard real-time tasks and soft real-time tasks.
• Prioritize hard real-time tasks, but create slack by delaying their execution until the latest possible moment.
• After a schedule is made, the processor voltage is adjusted to the lowest possible level on a task-by-task basis at each scheduling point.
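
The per-task voltage adjustment can be sketched as a search for the lowest voltage level that still meets the deadline. The discrete (voltage, frequency) table below is hypothetical, and the sketch assumes execution time equals cycles divided by clock frequency; the slides do not specify either.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: hypothetical discrete (voltage in V, frequency in Hz) pairs.
LEVELS: List[Tuple[float, float]] = [
    (0.8, 0.6e9),
    (1.0, 0.8e9),
    (1.2, 1.0e9),
]


def lowest_feasible_level(
    cycles: float,
    start: float,
    deadline: float,
    levels: List[Tuple[float, float]] = LEVELS,
) -> Optional[Tuple[float, float]]:
    """Return the lowest-voltage level that finishes by the deadline, or None."""
    for voltage, freq in sorted(levels):  # ascending voltage
        if start + cycles / freq <= deadline:
            return voltage, freq
    return None


# A 1.2e9-cycle task with 1.6 s available: 0.6 GHz is too slow, 0.8 GHz fits.
print(lowest_feasible_level(cycles=1.2e9, start=0.0, deadline=1.6))
```

The more slack a task has, the lower the level the loop can settle on, which is why creating slack and lowering voltage go together.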

15 System Model
Geographically distributed sites are interconnected through a WAN. Each site consists of storage resources, computing resources, and a ticket server.

16 Energy Consumption Model
• Consider the energy consumption of executing tasks, making data replicas, and communicating.
• The total energy consumption of a data grid, E_total, can be expressed as:
E_total = E_comp + E_comm + E_rep
where E_comp is the total energy consumption of computing resources, E_comm is the total energy consumption of communication, and E_rep is the total energy consumption of replicating data.

17 Four Cases of Energy Consumption
Case 1: Local execution and local data
Case 2: Local execution and remote data
Case 3: Remote execution and same remote data
Case 4: Remote execution and different remote data

18 If data is not locally available, then?
Executing a task at a site where its data is located is energy efficient: there is no data transfer and no replication cost. Compared to the local-execution, remote-data scenario, executing the task at a remote site where the data is located is still more energy efficient if the task’s input data set is larger than its execution code size.
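
The comparison between moving data to the task (Case 2) and moving the task to the data (Case 3) can be sketched numerically. The per-byte energy constants and the example sizes are hypothetical, and the sketch assumes communication and replication energy scale linearly with the number of bytes; the slides give no numeric values.

```python
# ASSUMPTIONS: energy per transferred byte and per replicated byte are
# hypothetical constants, and both costs are assumed linear in size.
E_PER_BYTE_TRANSFER = 1e-6  # joules per byte moved over the WAN
E_PER_BYTE_REPLICA = 2e-7   # joules per byte written as a replica


def local_exec_remote_data(e_comp: float, data_bytes: float) -> float:
    """Case 2: fetch and replicate the input data, then run locally."""
    e_comm = data_bytes * E_PER_BYTE_TRANSFER
    e_rep = data_bytes * E_PER_BYTE_REPLICA
    return e_comp + e_comm + e_rep


def remote_exec_at_data_site(e_comp: float, code_bytes: float) -> float:
    """Case 3: ship only the task code to the site that holds the data."""
    e_comm = code_bytes * E_PER_BYTE_TRANSFER
    return e_comp + e_comm


# Example: a 1 GB input data set versus 1 MB of task code.
case2 = local_exec_remote_data(e_comp=50.0, data_bytes=1e9)
case3 = remote_exec_at_data_site(e_comp=50.0, code_bytes=1e6)
print(f"Case 2: {case2:.1f} J, Case 3: {case3:.1f} J")
```

Under these assumptions the computing energy is the same in both cases, so the difference comes entirely from what has to cross the network, which matches the slide's condition that the input data set be larger than the code.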

19 Algorithm Components
• DEES is composed of: ranking, scheduling, and dispatching.
• Goals: maximize the number of tasks meeting deadlines, minimize energy consumption, and improve scalability.

20 Task Grouping
• Task grouping: Tasks requiring the same data are grouped together. The task group whose data resides in the local site, called the local task group, is ranked first. Other task groups are ranked in descending order of the number of tasks in the group.
• Considering real-time requirements: Within each group, tasks are ordered by increasing deadline. Thus, tasks with earlier deadlines are scheduled sooner.
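
The grouping and ordering rules can be sketched as follows. The `Task` record and the `rank_task_groups` helper are hypothetical names introduced for illustration, and the sketch accepts a set of local data identifiers rather than a single one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    data_id: str     # identifier of the data set the task reads
    deadline: float


def rank_task_groups(
    tasks: List[Task], local_data_ids: Set[str]
) -> List[Tuple[str, List[Task]]]:
    """Group tasks by data set, then rank the groups as the slide describes."""
    groups: Dict[str, List[Task]] = defaultdict(list)
    for task in tasks:
        groups[task.data_id].append(task)
    # Within each group, earliest deadline first.
    for members in groups.values():
        members.sort(key=lambda t: t.deadline)
    # Local group first, then larger groups before smaller ones.
    return sorted(
        groups.items(),
        key=lambda item: (item[0] not in local_data_ids, -len(item[1])),
    )


tasks = [
    Task("a", "d1", 5.0),
    Task("b", "d2", 3.0),
    Task("c", "d2", 1.0),
    Task("d", "d3", 2.0),
    Task("e", "d2", 9.0),
]
for data_id, members in rank_task_groups(tasks, local_data_ids={"d1"}):
    print(data_id, [t.name for t in members])
```

Here the local group `d1` comes first even though it is the smallest, followed by `d2` (three tasks) and `d3` (one task).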

21 DEES Scheduling
DEES schedules tasks on a group basis. A local task group is scheduled first. In order to schedule task t_i on site s_u, DEES selects the machine m_k at s_u that can complete t_i within its deadline and provides the minimum completion time. After processing all tasks, remaining unscheduled tasks will be dispatched to remote sites.
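
The machine selection step can be sketched as a search over the machines at a site. The `Machine` fields and the completion-time model (availability time plus work divided by speed) are assumptions made for illustration; the slides do not define how completion time is computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Machine:
    name: str
    available_at: float  # time at which the machine becomes free
    speed: float         # work units processed per second


def select_machine(
    work: float, deadline: float, machines: List[Machine]
) -> Tuple[Optional[Machine], Optional[float]]:
    """Pick the machine with the earliest completion time that meets the deadline."""
    best, best_finish = None, None
    for m in machines:
        finish = m.available_at + work / m.speed  # ASSUMED completion model
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = m, finish
    return best, best_finish


site = [Machine("m1", available_at=0.0, speed=1.0),
        Machine("m2", available_at=2.0, speed=4.0)]
chosen, finish = select_machine(work=8.0, deadline=10.0, machines=site)
print(chosen.name if chosen else None, finish)
```

When no machine can meet the deadline, the function returns `(None, None)`, which corresponds to the task staying unscheduled and being handed to the dispatching step.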

22 Dispatching
Dispatching delivers the tasks within each task group to data sites. For task group g_j whose data site is s_o, scheduling decisions are made by s_o’s scheduler based on its local resource status and the task information of g_j. If s_o cannot schedule all tasks in g_j, then the unscheduled tasks are dispatched to s_o’s immediate neighbors using tickets in a breadth-first manner.
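
The breadth-first spread of unscheduled tasks can be sketched with a queue over the site graph. This is a simplified illustration, not the DEES implementation: an integer `capacity` per site stands in for each local scheduler's deadline check, tickets are not modeled, and neighbors are visited in the order listed rather than by rank.

```python
from collections import deque
from typing import Dict, List, Tuple


def dispatch_breadth_first(
    origin: str,
    neighbors: Dict[str, List[str]],
    capacity: Dict[str, int],   # ASSUMPTION: stand-in for local schedulability
    tasks: List[str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign tasks to sites, visiting sites breadth-first from the data site."""
    assignment: Dict[str, List[str]] = {}
    pending = list(tasks)
    visited = {origin}
    queue = deque([origin])
    while queue and pending:
        site = queue.popleft()
        accepted = pending[: capacity.get(site, 0)]
        pending = pending[len(accepted):]
        if accepted:
            assignment[site] = accepted
        for nxt in neighbors.get(site, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return assignment, pending


neighbors = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": [], "s3": []}
capacity = {"s0": 2, "s1": 1, "s2": 0, "s3": 5}
assignment, leftover = dispatch_breadth_first(
    "s0", neighbors, capacity, ["t1", "t2", "t3", "t4", "t5"]
)
print(assignment, leftover)
```

Each site only needs to know its own neighbors, which is the property that lets the scheduler work without a global view of the grid.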

23 Energy-Aware Ranking
To make tradeoffs between energy efficiency and real-time performance, we propose a ranking system to rank s_o’s neighbors. The rank of a neighbor s_v depends on: n, the number of tasks in g_j that can be scheduled on s_v; ε, a coefficient concerning the task deadline; μ, a coefficient concerning energy saving; the energy consumed to replicate g_j’s data from s_o to s_v; the energy consumed to transfer g_j’s data from s_o to s_v; and the energy consumed to execute these n tasks at s_v.

24 Dispatching: Energy Efficiency vs. Real-Time
The coefficients ε and μ manage the two conflicting goals of saving energy and meeting deadlines. For mission-critical tasks, ε is set to 1 and μ is set to 0, which means the neighbor that can schedule more tasks is given preference. For energy efficiency, ε is set to 0 and μ is set to 1; thus, the neighbor that consumes the least amount of energy is considered first.
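
The effect of the two coefficients can be illustrated with a toy ranking function. The exact DEES ranking expression is not reproduced in this transcript, so the weighted sum below is a hypothetical form chosen only because it reproduces the two extreme settings described above; the normalization by group size and by maximum energy is also an assumption.

```python
from typing import Dict, List, Tuple


def rank_neighbors(
    candidates: Dict[str, Tuple[int, float]],
    group_size: int,
    epsilon: float,
    mu: float,
) -> List[str]:
    """Order neighbor sites from most to least preferred.

    candidates maps site -> (n schedulable tasks, total energy in joules),
    where the energy is replication + transfer + execution energy.
    HYPOTHETICAL score: epsilon * n / group_size + mu * (1 - E / E_max).
    """
    e_max = max(energy for _, energy in candidates.values())

    def score(item: Tuple[str, Tuple[int, float]]) -> float:
        n, energy = item[1]
        return epsilon * n / group_size + mu * (1 - energy / e_max)

    ordered = sorted(candidates.items(), key=score, reverse=True)
    return [site for site, _ in ordered]


candidates = {"s1": (8, 900.0), "s2": (5, 300.0), "s3": (6, 600.0)}
deadline_first = rank_neighbors(candidates, group_size=10, epsilon=1, mu=0)
energy_first = rank_neighbors(candidates, group_size=10, epsilon=0, mu=1)
print(deadline_first, energy_first)
```

With ε = 1 and μ = 0 the site that can schedule the most tasks (`s1`) leads; with ε = 0 and μ = 1 the site with the lowest energy (`s2`) leads. Intermediate values trade one goal against the other.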

25 Simulation Parameters

26 Performance Analysis
DEES is compared with an effective scheduling algorithm, Close-to-Files. Features of the Close-to-Files algorithm:
Good performance, since Close-to-Files takes data locality into account. It schedules a task to its data site to decrease the amount of data transfer.
High scheduling overhead: it is an exhaustive algorithm that searches across all combinations of computing and data sites to find a result with the minimum computation and data transmission cost.

27 Performance Metrics
The Guarantee Ratio, Normalized Average Energy Consumption, and Total Energy Consumption are used as the performance metrics in the evaluation.
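
The slides name the Guarantee Ratio but do not define it. The sketch below assumes the common definition, the fraction of submitted tasks that finish by their deadlines, and uses a hypothetical convention in which an unscheduled task has a finish time of `None`.

```python
from typing import List, Optional


def guarantee_ratio(
    finish_times: List[Optional[float]], deadlines: List[float]
) -> float:
    """Fraction of tasks completed by their deadlines (ASSUMED definition).

    finish_times[i] is None for a task that was never scheduled.
    """
    if not deadlines:
        return 0.0
    met = sum(
        1
        for finish, deadline in zip(finish_times, deadlines)
        if finish is not None and finish <= deadline
    )
    return met / len(deadlines)


# Four tasks: two meet their deadlines, one is late, one was never scheduled.
ratio = guarantee_ratio([3.0, 7.0, None, 10.0], [5.0, 6.0, 4.0, 10.0])
print(ratio)
```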

28 Real-Time Performance
Fig. 5. Guarantee Ratio by ranking coefficients

29 Energy Consumption
Fig. 6. Normalized Average Energy Consumption by ranking coefficients

30 Performance
Fig. 7. Guarantee Ratio by task loads

31 Energy Consumption
Fig. 8. Normalized Average Energy Consumption by task loads

32 Summary
DEES is an energy-efficient algorithm for scheduling real-time tasks with data access requirements on data grids. By reducing the amount of data replication and task transfers, the proposed algorithm effectively saves energy. It is distributed, since it does not need knowledge of the complete state of the grid. Detailed simulations demonstrate that DEES significantly reduces energy consumption while increasing the Guarantee Ratio.

33 Questions
Xiao Qin http://www.eng.auburn.edu/~xqin

