ALEAE: Handling Uncertainties in Large-Scale Distributed Systems. Emmanuel Jeannot, LORIA - INRIA - CNRS. ALEAE Kick-off, April 1st 2009.
Managing uncertainties, E. Jeannot, 2/16. Introduction: what is a grid? An infrastructure that is distributed and heterogeneous, but also dynamic and shared. Hence a lot of uncertainties.
Uncertainties: unpredictable behavior, i.e., behavior not as expected. Where does it come from? The infrastructure (hardware), the application (software), and the users.
Uncertainty at the infrastructure level: the hardware that composes a grid can fail, be volatile (be removed or added), or suffer performance degradation (due to shared usage).
Uncertainty at the application level: it is often assumed that the duration of each component of an application is known, that its resource usage is known, and that it does not fail. However, this is not always the case.
Uncertainties due to the users: users submit jobs/requests randomly, and may behave maliciously (intentionally or not), e.g., a DoS attack, or, on a desktop grid, returning wrong answers.
Rationale: just as resource management algorithms cope with heterogeneity and distribution, they must also cope with uncertainty.
Ways to cope with uncertainty. Proactive methods (static): redundancy, duplication. Reactive methods (dynamic): checkpoint-restart, migration. Mixed: provide a static solution and adapt it dynamically.
Functional goals: different kinds of uncertainty call for different desired behaviors. Reliability and fault tolerance: hardware failure, software failure. Robustness: hardware performance degradation, software unpredictability. Correctness: bad usage. Etc.
Multi-criteria approach: the good old metrics are still valid: makespan, load balance, response time, lateness, etc. Most of the time these metrics conflict with the uncertainty-related goals (e.g., reliability). Hence the need for a multi-criteria approach (e.g., makespan/reliability).
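One way to picture the makespan/reliability trade-off is to keep only the schedules that no other schedule beats on both criteria (the Pareto front). The sketch below is illustrative only: the `Schedule` type, the candidate names, and all numbers are assumptions, not data from the project.

```python
from typing import NamedTuple


class Schedule(NamedTuple):
    name: str
    makespan: float     # completion time, lower is better
    reliability: float  # probability of success, higher is better


def dominates(a: Schedule, b: Schedule) -> bool:
    """True if a is at least as good as b on both criteria and strictly better on one."""
    no_worse = a.makespan <= b.makespan and a.reliability >= b.reliability
    better = a.makespan < b.makespan or a.reliability > b.reliability
    return no_worse and better


def pareto_front(schedules: list[Schedule]) -> list[Schedule]:
    """Keep the schedules that are not dominated by any other schedule."""
    return [s for s in schedules if not any(dominates(o, s) for o in schedules)]


# Hypothetical candidates, made up for illustration.
candidates = [
    Schedule("fast", 100.0, 0.90),
    Schedule("safe", 140.0, 0.99),
    Schedule("balanced", 120.0, 0.95),
    Schedule("poor", 150.0, 0.92),
]
front = pareto_front(candidates)
print([s.name for s in front])
```

Here "poor" is dropped because "safe" is both faster and more reliable; the three remaining schedules are incomparable, which is exactly why a single criterion is not enough.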
Open issues (1): gather traces. What is the behavior of users/programs/infrastructure? Ease the extraction of useful information. Ensure generality.
Open issues (2): model the uncertainty. Trace the behavior, analyze it, and provide a model.
Open issues (3): carefully define metrics. Mapping a goal to a metric is not trivial. Example: robustness is an intuitive notion, but there are many metrics (one per paper). Question: what is the relation between these metrics?
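To illustrate why the choice of a robustness metric matters, the sketch below compares two candidate metrics on sampled makespans: the standard deviation of the makespan and the fraction of runs that miss a deadline. Both metrics, the sample values, and the deadline are assumptions chosen for illustration; they are not the metrics retained by the project.

```python
import statistics


def makespan_std(samples: list[float]) -> float:
    """Candidate metric 1: population standard deviation of the makespan (lower = more robust)."""
    return statistics.pstdev(samples)


def deadline_miss_rate(samples: list[float], deadline: float) -> float:
    """Candidate metric 2: fraction of runs finishing after the deadline (lower = more robust)."""
    return sum(1 for m in samples if m > deadline) / len(samples)


# Hypothetical makespans observed over five runs of two schedules.
schedule_a = [100, 100, 100, 100, 160]  # usually fast, one bad outlier
schedule_b = [110, 120, 130, 120, 110]  # slower, but stable
deadline = 115

std_a, std_b = makespan_std(schedule_a), makespan_std(schedule_b)
miss_a = deadline_miss_rate(schedule_a, deadline)
miss_b = deadline_miss_rate(schedule_b, deadline)

print(f"A: std={std_a:.2f}, miss rate={miss_a:.2f}")
print(f"B: std={std_b:.2f}, miss rate={miss_b:.2f}")
```

With these made-up samples the two metrics disagree: schedule B looks more robust by standard deviation, while schedule A looks more robust by deadline miss rate.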
Open issues (4): provide resource management (scheduling) algorithms. Mono-criterion/multi-criteria; static/dynamic/mixed; working well in the worst case or on average; etc.
Open issues (5): static vs. dynamic? Each approach has advantages and drawbacks. Dynamic (e.g., checkpoint-restart): time-costly, but handles almost every case. Static (e.g., duplication): resource-costly, but can provide some guarantees. Which approach is best depends on the problem. Is the mixed approach always possible/profitable?
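The time/resource trade-off can be sketched with a deliberately simple model. Assumptions (not from the slides): each attempt of a task fails independently with probability `p_fail`; a failed attempt costs the full `task_time`; and the dynamic strategy is a plain restart from scratch, which is a simplification of checkpoint-restart.

```python
def duplication(task_time: float, p_fail: float, copies: int) -> dict:
    """Static strategy: run `copies` replicas in parallel, succeed if any one succeeds."""
    return {
        "time": task_time,                     # replicas run concurrently
        "resource_time": copies * task_time,   # every replica consumes a resource
        "success_prob": 1 - p_fail ** copies,  # fails only if all replicas fail
    }


def restart(task_time: float, p_fail: float) -> dict:
    """Dynamic strategy: rerun the task until it succeeds (no checkpoint, for simplicity)."""
    expected_attempts = 1 / (1 - p_fail)       # mean of a geometric distribution
    return {
        "time": expected_attempts * task_time,           # attempts run one after another
        "resource_time": expected_attempts * task_time,  # one resource at a time
        "success_prob": 1.0,                             # eventually succeeds
    }


static = duplication(task_time=10.0, p_fail=0.1, copies=2)
dynamic = restart(task_time=10.0, p_fail=0.1)
print("static :", static)
print("dynamic:", dynamic)
```

Under these assumptions duplication finishes sooner but uses more resource time and only bounds the failure probability, whereas restart always completes but takes longer in expectation.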
Open issues: real-scale experimentation. Provide detection mechanisms for failure, malignity, resource usage, correctness, etc. Program and test solutions: real scale (Grid'5000, DAS-3), simulation, emulation? Validation of the models.
Big picture
Today: kick-off. ALEAE is a two-year INRIA-funded project (20 k€/year). Presentations on each item; technical presentations on sub-items. Work plan: other/next meetings, visits/exchanges, missions, post-doc, synergies between teams. Important: I am moving to INRIA Bordeaux.
Conclusion: grid environments are full of uncertainties. These uncertainties come from different factors. Handling them is difficult (especially with the traditional criteria). Finding the best way to tackle this problem (dynamic/static/mixed) is of crucial interest. The goal of ALEAE is to tackle such issues.