OPTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY
Gregory Levitin, Yuan-Shun Dai
Adviser: Frank, Yeong-Sung Lin
Presented by Sean Chou

AGENDA
Introduction
The model
Algorithm for determining the pmf of the service time
Numerical example
Conclusions

INTRODUCTION
Grid computing is a newly developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration [1]. It is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.

INTRODUCTION
The sharing is controlled by a resource management system (RMS) [2]. When the RMS receives a service request from a user, the task can be divided into a set of execution blocks (EBs) that are executed in parallel. The RMS assigns these EBs to available resources for execution. After the resources finish the assigned jobs, they return the results to the RMS.

INTRODUCTION
The grid service process described above can be approximated by a structure with star topology.

INTRODUCTION
The performance of grid computing is of great concern. The usual measure of grid performance is the task execution time (service time). This index can be significantly improved by an RMS that divides a task into a set of EBs that can be executed in parallel by multiple online resources. Many complicated and time-consuming tasks that could not be implemented before now work well in the grid computing environment.

INTRODUCTION
The service time is a random variable affected by many factors [3]:
1. Many resources are available online, and they have different task processing speeds.
2. Some resources can fail while running their jobs.
3. The communication links in the grid service can fail during data transmission.
4. The choice of which subtasks are grouped into the same EB and run on the same resource can influence the total amount of data transmitted between the RMS and the resource, since different subtasks can use common input data blocks.

INTRODUCTION
Most previous researchers treated performance and reliability as two separate fields and studied them individually. In fact, however, performance and reliability are closely related and affect each other, particularly when grid computing is implemented.

INTRODUCTION
For example, when a task is fully parallelized into n different EBs executed by n resources simultaneously, the performance is high but the reliability can be low, because the failure of any resource leaves the entire task incomplete. Therefore, it is worth having some redundant resources execute the same EB, especially for failure-prone resources. However, too much redundancy, even though it improves reliability, can decrease performance because the task is no longer fully parallelized.
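The two extremes can be compared with the elementary series/parallel reliability formulas. This is a minimal sketch with made-up success probabilities, and it deliberately ignores timing: in the actual model a resource that runs a larger EB runs longer and is therefore less likely to finish, which is what makes the trade-off non-trivial. The function names are hypothetical.

```python
from math import prod

def all_succeed(probs):
    """Fully parallel split: each resource runs a different EB, so every one must succeed."""
    return prod(probs)

def any_succeeds(probs):
    """Full redundancy: each resource runs the same EB, so one success is enough."""
    return 1 - prod(1 - p for p in probs)

# Hypothetical per-resource success probabilities (timing effects ignored).
probs = [0.9, 0.8, 0.95]
print(round(all_succeed(probs), 3))   # 0.684
print(round(any_succeeds(probs), 3))  # 0.999
```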

INTRODUCTION
Performance and reliability should be studied together in grid service analysis. The first model for evaluating the performance (service time) of a grid with star topology that takes service reliability into account was presented in [4].

INTRODUCTION
Optimizing the division of a service task into EBs and the distribution of these EBs among the available grid resources can considerably improve the service performance. This paper presents an algorithm for solving these optimization problems based on the model developed in [4].

THE MODEL
2.1. Service execution by the grid system with star architecture
2.2. Assumptions
2.3. Service execution time
2.4. Service reliability and expected performance

THE MODEL
2.1. Service execution by the grid system with star architecture
Different resources are distributed in the grid system. The considered service can use a given set of resources. All the resources and communication channels from this set are available at the time the service request arrives at the RMS.

THE MODEL
Each resource is directly connected to the RMS by a single communication channel, forming the star topology.

THE MODEL
The service task consists of subtasks that can be executed independently by different resources. Different subtasks may need some common input data blocks for their execution. The subtasks can be grouped into EBs. The input data for any EB consists of the input data blocks necessary for executing all the subtasks belonging to this EB.

THE MODEL
The request for service (task execution) arrives at the RMS, which forms the EBs and assigns them to different resources for processing. Each resource gets no more than one EB for processing. The same EB can be assigned to several resources for parallel execution. If the same EB is processed by several resources, it is completed when the first output is returned to the RMS. The entire task is completed when all of the EBs are completed and their results are returned to the RMS from the resources.

THE MODEL
2.2. Assumptions
Each resource starts processing the assigned EB immediately after it receives all the necessary input data from the RMS through the corresponding communication channel. Each resource sends the output data to the RMS through the same communication channel immediately after it completes the EB. Each resource has a given constant processing speed when it is available. Each resource has a given constant failure rate.

THE MODEL
Each communication channel has a constant data transmission speed (bandwidth) when it is available. Each communication channel has a constant failure rate. The subtasks belonging to an EB are processed in sequence. The subtask processing time is proportional to its computational complexity. The data transmission time is proportional to the amount of data transmitted between the RMS and a resource.

THE MODEL
The failure rates of the communication channels and resources are the same whether they are idle or loaded (hot standby model). The failures of different resources and communication channels are independent. The RMS is fully reliable. The time of task processing by the RMS (forming and assigning the EBs, sending them to the resources, receiving the results and integrating them into the entire task output) is negligible compared with the EBs' processing time.

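THE MODEL
2.3. Service execution time
The entire task consists of m subtasks that can be executed independently. Any EB i consists of a set $s_i$ of subtasks, and the EB's computational complexity is $C_i = \sum_{j \in s_i} c_j$, where $c_j$ is the computational complexity of subtask j.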

THE MODEL
Each subtask j needs a set $B_j$ of data blocks as its input and produces an amount $O_j$ of output data. The set of input data blocks necessary for executing EB i is $\bigcup_{j \in s_i} B_j$, and the amount of data to be transmitted from the RMS to the resource executing this EB is $\sum_{b \in \bigcup_{j \in s_i} B_j} a_b$, where $a_b$ is the amount of data in input block b.

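THE MODEL
The total amount of data (input and output) $D_i$ that should be transmitted between the RMS and a resource executing EB i is
$D_i = \sum_{b \in \bigcup_{j \in s_i} B_j} a_b + \sum_{j \in s_i} O_j$,
where $a_b$ is the amount of data in input block b.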
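A minimal sketch of $C_i$ and $D_i$, assuming plain dictionaries for the subtask complexities $c_j$, the block sets $B_j$, the block sizes $a_b$ and the outputs $O_j$. All names and numbers are hypothetical. The point it illustrates is factor 4 of the introduction: a block shared by subtasks in the same EB is transmitted only once.

```python
def eb_complexity(eb, complexity):
    """C_i: total computational complexity of the subtasks grouped into one EB."""
    return sum(complexity[j] for j in eb)

def eb_data_amount(eb, blocks_needed, block_size, output_size):
    """D_i: input blocks used by any subtask of the EB (each sent once) plus all outputs."""
    input_blocks = set().union(*(blocks_needed[j] for j in eb))
    return sum(block_size[b] for b in input_blocks) + sum(output_size[j] for j in eb)

# Hypothetical data: 3 subtasks sharing input blocks "a", "b", "c".
complexity = {1: 10, 2: 20, 3: 15}
blocks_needed = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
block_size = {"a": 5, "b": 8, "c": 3}
output_size = {1: 2, 2: 1, 3: 4}

print(eb_complexity({1, 2}, complexity))                               # 30
print(eb_data_amount({1, 2}, blocks_needed, block_size, output_size))  # 19
# Running subtasks 1 and 2 as separate EBs would transmit 15 + 12 = 27 units instead.
print(eb_data_amount({1}, blocks_needed, block_size, output_size)
      + eb_data_amount({2}, blocks_needed, block_size, output_size))   # 27
```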

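THE MODEL
The EB execution time is defined as the time from the beginning of input data transmission from the RMS to a resource until the end of output data transmission from the resource to the RMS. Therefore, the random time $t_{ij}$ of EB i completion by resource j can take two possible values: $t_{ij} = \hat{t}_{ij} = C_i / x_j + D_i / w_j$ if resource j and communication channel j do not fail until the EB is completed, and $t_{ij} = \infty$ otherwise. Here $x_j$ is the processing speed of resource j, $w_j$ is the bandwidth of channel j, $C_i$ is the computational complexity of EB i and $D_i$ is the total amount of data it exchanges with the RMS.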
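The two-valued completion time can be sketched directly, assuming the processing time is $C_i / x_j$ and the transmission time is $D_i / w_j$ as the proportionality assumptions imply; the symbol $w_j$ for bandwidth and the function names are our own. `math.inf` stands for "never completed".

```python
import math

def failure_free_time(C_i, D_i, speed_j, bandwidth_j):
    """t-hat_ij: processing time C_i / x_j plus transmission time D_i / w_j."""
    return C_i / speed_j + D_i / bandwidth_j

def completion_time(C_i, D_i, speed_j, bandwidth_j, failed):
    """t_ij: the failure-free time, or infinity if resource j or channel j fails first."""
    return math.inf if failed else failure_free_time(C_i, D_i, speed_j, bandwidth_j)

print(completion_time(30, 19, 5, 9.5, failed=False))  # 8.0
print(completion_time(30, 19, 5, 9.5, failed=True))   # inf
```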

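THE MODEL
EB i can be successfully completed by resource j if this resource and communication link j do not fail before the end of EB execution. For constant failure rates of resource j and communication link j, the probability of EB success is
$p_{ij} = \exp(-(\lambda_j + \mu_j)\hat{t}_{ij})$,
where $\lambda_j$ and $\mu_j$ are the failure rates of resource j and link j, and $\hat{t}_{ij}$ is the completion time of EB i by resource j when no failure occurs.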
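With constant failure rates, survival times are exponential, and independence lets the two survival probabilities multiply into one exponential with the summed rate. A minimal sketch with hypothetical rates (per unit of time):

```python
import math

def success_probability(t_hat, resource_rate, link_rate):
    """p_ij = exp(-(lambda_j + mu_j) * t-hat_ij) for constant failure rates."""
    return math.exp(-(resource_rate + link_rate) * t_hat)

p = success_probability(8.0, 0.01, 0.005)  # hypothetical lambda_j = 0.01, mu_j = 0.005
# Independence: surviving the resource and surviving the link are separate events.
assert math.isclose(p, math.exp(-0.01 * 8.0) * math.exp(-0.005 * 8.0))
print(round(p, 4))  # 0.8869
```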

THE MODEL
Assume that each EB i is assigned to the resources composing set $\omega_i$, such that $\omega_i \cap \omega_j = \emptyset$ for any $i \neq j$. The random time of EB i completion is $T_i = \min_{j \in \omega_i} t_{ij}$. The entire task is completed when all of the EBs (including the slowest one) are completed, so the random task execution time takes the form $Y = \max_{1 \le i \le h} T_i$, where h is the number of EBs.
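One way to sanity-check the min/max structure is to simulate it: each copy of an EB either returns at its failure-free time or never, the EB takes the earliest copy, and the task waits for the slowest EB. This Monte Carlo sketch is our own cross-check, not the paper's method (which is analytical); the instance and names are hypothetical.

```python
import math
import random

def sample_task_time(assignment, t_hat, p, rng):
    """Draw one realization of Y.

    assignment: {EB id: list of resource ids}, i.e. the disjoint sets omega_i
    t_hat[(i, j)]: failure-free completion time of EB i on resource j
    p[(i, j)]: probability that resource j and its channel survive until t_hat[(i, j)]
    """
    eb_times = []
    for i, resources in assignment.items():
        copies = [t_hat[(i, j)] if rng.random() < p[(i, j)] else math.inf
                  for j in resources]
        eb_times.append(min(copies))  # T_i: the first returned copy completes the EB
    return max(eb_times)              # Y: the slowest EB completes the task

def estimate_R(assignment, t_hat, p, y, runs=50_000, seed=1):
    """Monte Carlo estimate of Pr(Y < y)."""
    rng = random.Random(seed)
    return sum(sample_task_time(assignment, t_hat, p, rng) < y for _ in range(runs)) / runs

# Hypothetical instance: EB 1 duplicated on resources A and B, EB 2 on resource C.
assignment = {1: ["A", "B"], 2: ["C"]}
t_hat = {(1, "A"): 8.0, (1, "B"): 10.0, (2, "C"): 6.0}
p = {(1, "A"): 0.9, (1, "B"): 0.8, (2, "C"): 0.95}
print(estimate_R(assignment, t_hat, p, y=math.inf))  # near (1 - 0.1 * 0.2) * 0.95 = 0.931
```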

THE MODEL
2.4. Service reliability and expected performance
To estimate both the service reliability and the performance of a grid system, different measures can be used depending on the application. The system reliability $R(y)$ is defined (according to the performability concept [5,6]) as the probability that the correct output is produced in time less than y: $R(y) = \Pr(Y < y)$.

THE MODEL
The service reliability is defined as the probability that the service produces correct output regardless of the service time. This index can be referred to as $R(\infty)$. The conditional expected service time W, that is, the expected service time given that the service succeeds, is considered a measure of its performance:
$W = \sum_{y < \infty} y \Pr(Y = y) / R(\infty)$.
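Given the pmf of Y as a table of (time, probability) pairs, all three indices are simple sums. A minimal sketch, assuming W is the expected service time conditioned on successful completion; `math.inf` encodes failure, and the pmf values are hypothetical.

```python
import math

def R(pmf, y):
    """R(y) = Pr(Y < y) for a pmf given as {service time: probability}."""
    return sum(q for t, q in pmf.items() if t < y)

def W(pmf):
    """Conditional expected service time E[Y | Y < infinity]."""
    r_inf = R(pmf, math.inf)
    return sum(t * q for t, q in pmf.items() if t < math.inf) / r_inf

# Hypothetical pmf of Y (math.inf stands for "the service fails").
pmf = {8.0: 0.855, 10.0: 0.076, math.inf: 0.069}
print(round(R(pmf, 9.0), 3))       # 0.855
print(round(R(pmf, math.inf), 3))  # 0.931
print(round(W(pmf), 3))            # 8.163
```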

THE MODEL
The service task partition into EBs (represented by the sets $s_i$, $1 \le i \le h$) and the distribution of the EBs among the resources (represented by the sets $\omega_i$, $1 \le i \le h$) determine the service reliability and performance. Two optimization problems, (9) and (10), are formulated over these sets.

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The procedure used to evaluate the service time distribution is based on the universal generating function (u-function) technique. Its high computational efficiency allows it to be used in optimization procedures where a large number of different solutions must be evaluated.

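ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The u-function $u_{i,\{j\}}(z)$ defines the pmf of the total completion time $t_{ij}$ for EB i assigned to resource j. This u-function takes the form
$u_{i,\{j\}}(z) = p_{ij} z^{\hat{t}_{ij}} + (1 - p_{ij}) z^{\infty}$,
where $\hat{t}_{ij}$ is the completion time when no failure occurs and $p_{ij}$ is its probability.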
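A u-function is a polynomial in z whose exponents are the possible values and whose coefficients are their probabilities, so a dictionary {exponent: coefficient} is a natural representation. This sketch (names hypothetical) builds $u_{i,\{j\}}(z)$ for one EB on one resource.

```python
import math

def single_resource_u(t_hat, p):
    """u_{i,{j}}(z) stored as {exponent: coefficient}.

    The EB finishes at t_hat with probability p; otherwise its time is infinite.
    """
    return {t_hat: p, math.inf: 1.0 - p}

u = single_resource_u(8.0, 0.9)
print(u[8.0])  # 0.9
assert math.isclose(sum(u.values()), 1.0)  # the coefficients of a pmf sum to 1
```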

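ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The total completion time of EB i assigned to a pair of resources k and j is equal to the minimum of the completion times for the different resources. To obtain the u-function representing the pmf of this time, the composition operator $\otimes_{\min}$ should be used:
$u_{i,\{j,k\}}(z) = u_{i,\{j\}}(z) \otimes_{\min} u_{i,\{k\}}(z) = p_{ij} p_{ik} z^{\min(\hat{t}_{ij}, \hat{t}_{ik})} + p_{ij}(1 - p_{ik}) z^{\hat{t}_{ij}} + (1 - p_{ij}) p_{ik} z^{\hat{t}_{ik}} + (1 - p_{ij})(1 - p_{ik}) z^{\infty}$.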
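The composition operator multiplies every pair of terms, applies the chosen function to the exponents, and collects like terms. A minimal sketch, assuming the dictionary representation {exponent: coefficient}; `compose` is our own name, and passing Python's built-in `min` gives $\otimes_{\min}$. The two resources' data are hypothetical.

```python
import math
from collections import defaultdict

def compose(u1, u2, op):
    """Compose two u-functions: multiply coefficients and combine exponents with op.

    Terms that end up with the same exponent are collected (their coefficients added).
    """
    out = defaultdict(float)
    for t1, q1 in u1.items():
        for t2, q2 in u2.items():
            out[op(t1, t2)] += q1 * q2
    return dict(out)

# Hypothetical EB duplicated on two resources.
u_j = {8.0: 0.9, math.inf: 0.1}
u_k = {10.0: 0.8, math.inf: 0.2}
u_jk = compose(u_j, u_k, min)
# Expected: {8.0: 0.9, 10.0: 0.08, inf: 0.02}, up to floating-point rounding.
print(u_jk)
```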

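ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The u-function representing the pmf of the completion time of EB i assigned to all of the resources from set $\omega_i$ can be obtained recursively, adding one resource at a time:
$u_{i,\omega \cup \{j\}}(z) = u_{i,\omega}(z) \otimes_{\min} u_{i,\{j\}}(z)$ for $j \notin \omega$.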
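The recursion is a left fold over the resources in $\omega_i$. In this sketch (hypothetical names and data) the fold starts from $z^{\infty}$, the u-function of an empty resource set, which is neutral for min because min(∞, t) = t.

```python
import math
from collections import defaultdict
from functools import reduce

def compose(u1, u2, op):
    """Multiply coefficients, combine exponents with op, collect like terms."""
    out = defaultdict(float)
    for t1, q1 in u1.items():
        for t2, q2 in u2.items():
            out[op(t1, t2)] += q1 * q2
    return dict(out)

def eb_u(t_hat, p, resources):
    """u_{i,omega_i}(z) for one EB i: fold the single-resource u-functions with min.

    t_hat[j], p[j]: failure-free time and success probability of this EB on resource j.
    """
    singles = ({t_hat[j]: p[j], math.inf: 1.0 - p[j]} for j in resources)
    return reduce(lambda acc, u: compose(acc, u, min), singles, {math.inf: 1.0})

u = eb_u({"A": 8.0, "B": 10.0, "C": 12.0}, {"A": 0.9, "B": 0.8, "C": 0.7}, ["A", "B", "C"])
# Expected: {8.0: 0.9, 10.0: 0.08, 12.0: 0.014, inf: 0.006}, up to rounding.
print(u)
```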

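ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
Having the u-functions $u_{i,\omega_i}(z)$ for each EB i ($1 \le i \le h$), one can obtain the u-function representing the pmf of the entire task completion time Y:
$U_1(z) = u_{1,\omega_1}(z)$, $U_i(z) = U_{i-1}(z) \otimes_{\max} u_{i,\omega_i}(z)$ for $2 \le i \le h$.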
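Because Y is the maximum of the EB times, the same composition is applied with `max` instead of `min`. This sketch (hypothetical names and data) folds from $z^{0}$, which is neutral for max since completion times are non-negative.

```python
import math
from collections import defaultdict
from functools import reduce

def compose(u1, u2, op):
    """Multiply coefficients, combine exponents with op, collect like terms."""
    out = defaultdict(float)
    for t1, q1 in u1.items():
        for t2, q2 in u2.items():
            out[op(t1, t2)] += q1 * q2
    return dict(out)

def task_u(eb_us):
    """U_h(z): fold the EB u-functions with max, since Y = max_i T_i."""
    return reduce(lambda acc, u: compose(acc, u, max), eb_us, {0.0: 1.0})

# Hypothetical u-functions of two EBs.
u_1 = {8.0: 0.9, 10.0: 0.08, math.inf: 0.02}
u_2 = {6.0: 0.95, math.inf: 0.05}
U = task_u([u_1, u_2])
# Expected: {8.0: 0.855, 10.0: 0.076, inf: 0.069}, up to rounding.
print(U)
```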

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The final u-function $U_h(z)$ represents the pmf of the random task completion time Y in the form
$U_h(z) = \sum_k Q_k z^{y_k}$, where $Q_k = \Pr(Y = y_k)$.

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
Algorithm for determining service performance/reliability indices for an arbitrary task partition and distribution:
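A minimal end-to-end sketch of such an algorithm, written under the model assumptions above rather than copied from the paper: for each EB it computes $C_i$ and $D_i$, then $\hat{t}_{ij}$ and $p_{ij}$ for each assigned resource, folds them with $\otimes_{\min}$, folds the EBs with $\otimes_{\max}$, and finally reads off $R(y)$, $R(\infty)$ and W. The dictionary layout, the key names (`"x"` speed, `"w"` bandwidth, `"lam"` and `"mu"` failure rates) and the instance data are all hypothetical.

```python
import math
from collections import defaultdict

def compose(u1, u2, op):
    """Multiply coefficients, combine exponents with op, collect like terms."""
    out = defaultdict(float)
    for t1, q1 in u1.items():
        for t2, q2 in u2.items():
            out[op(t1, t2)] += q1 * q2
    return dict(out)

def service_time_pmf(partition, assignment, sub, blocks, res):
    """Return U_h(z) as {service time: probability} for one partition/distribution.

    partition[i]:  set s_i of subtask ids forming EB i
    assignment[i]: list omega_i of resource ids running EB i (pairwise disjoint)
    sub[j]:    {"c": complexity, "B": set of input block ids, "O": output amount}
    blocks[b]: amount of data in input block b
    res[k]:    {"x": speed, "w": bandwidth, "lam": resource rate, "mu": channel rate}
    """
    U = {0.0: 1.0}                    # neutral element for max (times are non-negative)
    for s_i, omega_i in zip(partition, assignment):
        C_i = sum(sub[j]["c"] for j in s_i)
        needed = set().union(*(sub[j]["B"] for j in s_i))
        D_i = sum(blocks[b] for b in needed) + sum(sub[j]["O"] for j in s_i)
        u_i = {math.inf: 1.0}         # neutral element for min (no copy finished)
        for k in omega_i:
            r = res[k]
            t_hat = C_i / r["x"] + D_i / r["w"]
            p = math.exp(-(r["lam"] + r["mu"]) * t_hat)
            u_i = compose(u_i, {t_hat: p, math.inf: 1.0 - p}, min)
        U = compose(U, u_i, max)
    return U

def indices(U, y):
    """Return (R(y), R(inf), W) from the pmf of Y."""
    R_y = sum(q for t, q in U.items() if t < y)
    R_inf = sum(q for t, q in U.items() if t < math.inf)
    W = sum(t * q for t, q in U.items() if t < math.inf) / R_inf if R_inf > 0 else math.inf
    return R_y, R_inf, W

# Hypothetical instance: 3 subtasks, 3 resources, 2 EBs.
sub = {1: {"c": 10, "B": {"a", "b"}, "O": 2},
       2: {"c": 20, "B": {"b", "c"}, "O": 1},
       3: {"c": 15, "B": {"c"}, "O": 4}}
blocks = {"a": 5, "b": 8, "c": 3}
res = {"R1": {"x": 5, "w": 9.5, "lam": 0.01, "mu": 0.005},
       "R2": {"x": 4, "w": 10, "lam": 0.02, "mu": 0.01},
       "R3": {"x": 3, "w": 7, "lam": 0.0, "mu": 0.0}}
U = service_time_pmf([{1, 2}, {3}], [["R1", "R2"], ["R3"]], sub, blocks, res)
print(indices(U, y=9.0))
```

In an optimization loop such as the GA discussed in the numerical example, a function like `service_time_pmf` would be called once per candidate solution, which is why the efficiency of the u-function technique matters.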

NUMERICAL EXAMPLE
Formulations (9) and (10) define a complicated NP-complete partitioning/allocation problem. An exhaustive examination of all possible solutions is not realistic within reasonable time limits.

NUMERICAL EXAMPLE
A heuristic search algorithm is needed that uses only estimates of solution quality and does not require derivative information to determine the next direction of the search. The genetic algorithm (GA) has proven to be an effective optimization tool for a large number of complicated problems in reliability engineering [10,11].

NUMERICAL EXAMPLE
Consider a grid service that uses six resources distributed in the grid system.

NUMERICAL EXAMPLE
The entire service task can be divided into eight independent subtasks.

NUMERICAL EXAMPLE
The amount of data in each input data block is presented in Table 4.

NUMERICAL EXAMPLE
First, the optimal task partition and distribution problem was solved by the GA for formulation (9). The solutions for different allowed service times y are presented in Tables 5 and 6.

NUMERICAL EXAMPLE
Table 5 contains the obtained task partitions into EBs and their distribution among the resources.

NUMERICAL EXAMPLE
Table 6 contains the minimal and maximal possible service times, the service reliability, and the conditional expected service time for each obtained solution.

NUMERICAL EXAMPLE
The functions R(y) for the obtained solutions are presented in Fig. 2. The best solution obtained for a given y provides the greatest reliability for that value of service time, whereas for other values of y it provides lower reliability than the solutions obtained for those values.

CONCLUSIONS
Grid technology is a newly developed method for large-scale distributed systems. It allows effective distribution of computational tasks among the different resources present in the grid. The resource management system (RMS) can divide a service task into subtasks and send them to different resources for parallel execution.

CONCLUSIONS
For any given service task, the service reliability and performance indices depend on the task partition into EBs and on their distribution among the available resources. The suggested optimization algorithm aims to achieve the greatest reliability/performance through optimal task partition and distribution.

CONCLUSIONS
Most previous researchers treated performance and reliability as two separate fields and studied them individually. In fact, however, performance and reliability are closely related and affect each other, particularly when grid computing is implemented. Using a model that evaluates the performance (service time) of a grid with star topology while taking service reliability into account, this paper presents an algorithm for solving the task partition and distribution optimization problems.

Thank you for listening.