Slide 1: Transparent Cross-Border Migration of Parallel Multi Node Applications. Dominic Battré, Matthias Hovestadt, Odej Kao, Axel Keller, Kerstin Voss. Cracow Grid Workshop 2007

Slide 2: Outline
- Motivation
- The Software Stack
- Cross-Border Migration
- Summary
HPC4U: Highly Predictable Clusters for Internet-Grids, an EC-funded project in FP6. AssessGrid: Advanced Risk Assessment & Management for Trustable Grids, an EC-funded project in FP6.

Slide 3: The Gap between Grid and RMS
(Figure: a user request with an SLA passes through the grid middleware to the local RMS managing machines M1, M2, M3. Reliability? Quality of Service? Best effort vs. guaranteed.)
- User asks for an SLA
- Grid middleware realizes the job by means of the local RMS
- BUT: the RMS offers only best effort
- Need: an SLA-aware RMS that can give guarantees

Slide 4: HPC4U: Highly Predictable Clusters for Internet-Grids
- Objective
  - Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, and acting as an active Grid component
- Key Features
  - System-level checkpointing
  - Job migration
  - Job types: sequential and MPI-parallel
  - Planning-based scheduling

Slide 5: HPC4U: Planning Based Scheduling
(Figure: a machine with queues of new jobs over time.)

                                   queuing systems    planning systems
  planned time frame               present            present and future
  new job requests                 insert in queues   re-planning
  assignment of planned start time no                 all requests
  runtime estimation               not necessary      mandatory
  backfilling                      optional           yes, implicit
  advance reservations             not possible       yes, trivial

Slide 6: HPC4U: Software Stack
(Figure: software stack with the User-/Broker-Interface, CLI, Negotiation, Scheduler, and RMS on top of the Process, Network, and Storage subsystems (SSC), all running on the cluster.)

Slide 7: HPC4U: Checkpointing Cycle
(Figure: the RMS coordinates the Process, Network, and Storage subsystems.)
1. Checkpoint job and halt it
2. Handle in-transit packets
3. Return: "Checkpoint completed!"
4. Take snapshot
5. Link checkpoint to snapshot
6. Resume job
7. Job running again
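The seven-step cycle above can be sketched in a few lines of Python; the subsystem objects and method names are hypothetical stand-ins, not the actual HPC4U interfaces.

```python
# Toy model of the 7-step checkpointing cycle; every name here is
# illustrative, not part of the real HPC4U software stack.
class Subsystem:
    def __init__(self, name):
        self.name = name
        self.log = []

    def do(self, action):
        self.log.append(action)

def checkpoint_cycle(process, network, storage):
    process.do("checkpoint job and halt")      # 1. CP job + halt
    network.do("handle in-transit packets")    # 2. in-transit packets
    # 3. "Checkpoint completed!" is reported back to the RMS here
    storage.do("take snapshot")                # 4. snapshot
    storage.do("link checkpoint to snapshot")  # 5. link to snapshot
    process.do("resume job")                   # 6. resume job
    return "job running again"                 # 7. job running again

p, n, s = Subsystem("process"), Subsystem("network"), Subsystem("storage")
print(checkpoint_cycle(p, n, s))  # job running again
```

The point of the ordering is that the network is drained before the storage snapshot is taken, so checkpoint and snapshot describe one consistent global state.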

Slide 8: Cross-Border Migration: Intra Domain
(Figure: two identical cluster stacks, each with User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, and the Process, Network, and Storage subsystems.)

Slide 9: Cross-Border Migration: Target Retrieval
(Figure: the same two cluster stacks; this slide illustrates retrieval of a migration target.)

Slide 10: Cross-Border Migration: Checkpoint Migration
(Figure: the same two cluster stacks; this slide illustrates transfer of the checkpoint data.)

Slide 11: Cross-Border Migration: Remote Execution
(Figure: the same two cluster stacks; this slide illustrates execution on the remote cluster.)

Slide 12: Cross-Border Migration: Result Migration
(Figure: the same two cluster stacks; this slide illustrates transfer of the results back to the source.)

Slide 13: Cross-Border Migration: Using Globus
(Figure: the cluster stack extended with a WS-AG component and an external Broker.)
- WS-AG implementation based on GT4
- Developed in the EU project AssessGrid
- Source specifies SLA / file staging parameters
- Subset of JSDL (POSIX jobs)
- Resource determination via broker
- Source directly contacts destination
- Destination pulls migration data via GridFTP
- Destination pushes result data back to source
- Source uses WSRF event notification
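The interaction pattern on this slide can be sketched as follows; the Cluster class, its methods, and the gsiftp URLs are hypothetical placeholders, not GT4 or AssessGrid APIs.

```python
# Toy model of the migration flow: negotiate, pull checkpoint, run,
# push results, notify. All names are illustrative placeholders.
class Cluster:
    def __init__(self, name):
        self.name = name
        self.events = []

    def negotiate(self, sla):
        # a real provider would check its schedule before agreeing
        return {"sla": sla, "provider": self.name}

    def pull(self, url):
        # stands in for a GridFTP transfer initiated by the destination
        self.events.append(("pulled", url))

    def push(self, url, target):
        # stands in for pushing result data back to the source
        target.events.append(("received", url))

    def notify(self, target, msg):
        # stands in for a WSRF event notification
        target.events.append(("notified", msg))

def migrate(source, dest, sla, checkpoint_url, result_url):
    if dest.negotiate(sla) is None:   # source contacts destination directly
        return "no agreement"
    dest.pull(checkpoint_url)         # destination pulls the migration data
    dest.push(result_url, source)     # destination pushes results back
    dest.notify(source, "done")       # source learns of completion
    return "migrated"

src, dst = Cluster("source"), Cluster("destination")
print(migrate(src, dst, {"latest_finish": "18:00"},
              "gsiftp://source/ckpt.tar", "gsiftp://dest/result.tar"))
```

Note the asymmetry the slide describes: the destination pulls the checkpoint but pushes the results, while the source only waits on notifications.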

Slide 14: Ongoing Work: Introducing Risk Management
(Figure: the cluster stack extended with a Risk Assessor, a Consultant Service, and Monitoring.)
- Topic of the EU project AssessGrid
- Incorporated in the SLA
- Provider
  - Estimates the risk of agreeing to an SLA
  - Considers the probability of failure in the schedule
  - Assessment based on historical data

Slide 15: Summary: Best Effort is not Enough
Cross-border migration and risk assessment provide new means to increase the reliability of Grid computing.

Slide 16: More Information
- Read the paper
- AssessGrid: www.assessgrid.eu
- HPC4U: www.hpc4u.eu
- OpenCCS: www.openccs.eu
Thanks for your attention!

Slide 17: BACKUP

Slide 18: Scheduling Aspects
- Execution Time
  - Exact start time
  - Earliest start time, latest finish time
- User provides stage-in files by time X
- Provider keeps stage-out files until time Y
- Provisional Reservations
- Job Priorities
- Job Suspension

Slide 19: HPC4U: Planning Based Scheduler
- Space-Sharing
(Figure: CPUs over time on a 16-CPU machine: an l=2h job, an l=3h reservation, and a reservation for a Grid job according to its SLA with l=6h.)
- Run-time estimation
- Start time assignment
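The space-sharing idea on this slide can be illustrated with a small first-fit planner; the integer time grid, the reservation tuples, and the 16-CPU machine are simplifying assumptions, not the OpenCCS implementation.

```python
# Minimal planning-based scheduler sketch: find the earliest planned start
# time at which a job of given length and CPU demand fits next to the
# existing reservations. Assumes integer hours and a 16-CPU machine.
CPUS = 16

def cpus_free(reservations, t):
    """CPUs left at hour t, given reservations as (start, length, cpus)."""
    return CPUS - sum(c for s, l, c in reservations if s <= t < s + l)

def earliest_start(reservations, length, cpus, horizon=48):
    for t in range(horizon):
        if all(cpus_free(reservations, u) >= cpus for u in range(t, t + length)):
            return t
    return None  # does not fit inside the planned time frame

# an l=2h job on 8 CPUs and an l=3h full-machine reservation, as in the figure
resv = [(0, 2, 8), (2, 3, 16)]
print(earliest_start(resv, 6, 16))  # the l=6h Grid job gets planned start 5
```

Because every job receives a planned start time like this, advance reservations come for free and backfilling happens implicitly, exactly the contrast the comparison table on slide 5 draws against queuing systems.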

Slide 20: HPC4U

Slide 21: Motivation: Fault Tolerance
- Commercial Grid users need SLAs
- Providers are cautious on adoption
  - Reason: business-case risk; missed deadlines due to system failures mean penalties to be paid
- Solution: prevention with fault tolerance
- Fault tolerance mechanisms are available, but
  1. Application modification is mandatory
  2. An overall solution (system software, process, storage, file system, network) is required
  3. The combination with Grid migration is missing

Slide 22: HPC4U Objective
- Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, acting as an active Grid component
- Key features
  - Definition and implementation of SLAs
  - Resource reservation for guaranteed QoS
  - Application-transparent fault tolerance

Slide 23: HPC4U: Concept
1. SLA negotiation as an explicit statement of expectations and obligations in a business relationship between provider and customer
2. Reservation of CPU, storage, and network for the desired time interval
3. Job start in a checkpointing environment
4. In case of a system failure: job migration / restart with respect to the SLA
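The four steps can be condensed into a toy control flow; the callables below are illustrative stand-ins for the negotiation, reservation, execution, and migration machinery, not the project's actual components.

```python
# Toy version of the HPC4U concept: negotiate an SLA, reserve resources,
# run under checkpointing, and migrate/restart on failure. Illustrative only.
def run_sla_job(negotiate, reserve, execute, migrate):
    if not negotiate():      # 1. explicit SLA negotiation
        return "rejected"
    reserve()                # 2. reserve CPU, storage, and network
    ok = execute()           # 3. job runs in a checkpointing environment
    if not ok:
        ok = migrate()       # 4. system failure: migrate/restart per SLA
    return "fulfilled" if ok else "violated"

# the first execution attempt fails, but migration rescues the SLA
print(run_sla_job(lambda: True, lambda: None, lambda: False, lambda: True))
```

The sketch makes the SLA angle explicit: migration is not an optimization but the fallback that turns a would-be violation into a fulfilled agreement.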

Slide 24: HPC4U: Project Outcomes

Slide 25: Phases of Operation
(Figure: timeline from negotiation through pre-runtime, stage-in, computation, stage-out, and post-runtime; the lifetime of the SLA spans acceptance (or rejection) of the SLA and the allocation of system resources.)
- Negotiation of the SLA
- Pre-Runtime: configuration of resources, e.g. network, storage, compute nodes
- Runtime: stage-in, computation, stage-out
- Post-Runtime: re-configuration

Slide 26: Phase: Pre-Runtime
- Task of the Pre-Runtime phase
  - Configuration of all allocated resources
  - Goal: fulfill the requirements of the SLA
- Reconfiguration affects all HPC4U elements
  - Resource Management System, e.g. configuration of assigned compute nodes
  - Storage Subsystem, e.g. initialization of a new data partition
  - Network Subsystem, e.g. configuration of the network infrastructure

Slide 27: Phase: Runtime
- Runtime phase = lifetime of the job in the system
  - Adherence to the SLA has to be assured
  - FT mechanisms have to be utilized
- The phase consists of three distinct steps
  - Stage-In: transmission of required input data from the Grid customer to the compute resource
  - Computation: execution of the application
  - Stage-Out: transmission of generated output data from the compute resource back to the Grid customer

Slide 28: Phase: Post-Runtime
- Task of the Post-Runtime phase: re-configuration of all resources
  - e.g. re-configuration of the network
  - e.g. deletion of checkpoint datasets
  - e.g. deletion of temporary data
- Counterpart to the Pre-Runtime phase
  - Allocation of resources ends
  - Update of schedules in RMS and storage
  - Resources are available for new jobs

Slide 29: Motivation: Cross-Border Migration
(Figure: a customer uses Grid middleware to reach HPC4U clusters in different administrative domains.)

Slide 30: PROCESS

Slide 31: Subsystems
- Process Subsystem
  - checkpointing of network
  - cooperative checkpointing protocol (CCP)
- Network Subsystem
  - checkpoint network state
- Storage Subsystem
  - provision of storage
  - provision of snapshot

Slide 32: Metacluster: Checkpointing Subsystem
- Virtualization of resources
- Capture of the full application context: resources, states, process hierarchy
- Non-intrusive
- "Virtual Bubble"

Slide 33: STORAGE

Slide 34: Storage Subsystem
(Figure: the Virtual Storage Manager (VSM) maps a logical space with data layout strategies onto the physical space of storage resources 1 and 2 via the VSM-SR interface; computing nodes access storage through it.)
- Functionalities
  - Negotiates the storage part of the SLA
  - Provides storage capacity at a given QoS level
  - Provides FT mechanisms
- Requirement: manage multiple jobs running on the same storage resource (SR)

Slide 35: Data Container Concept
- Idea: create a storage environment for applications at a desired QoS level, with abstraction of the physical devices
- Components:
(Figure: a job issues file I/O (read, write, open, ...) against a file system on a logical volume; the data container maps block I/O (read, write, ioctl) through block address mapping and data layout policies (e.g., simple striping) onto the physical devices of the storage resource.)

Slide 36: Data Container Properties
Storage part of the SLA:
- Data container section
  - Size
  - File system type
  - Number of nodes that need to access the data container (private/shared)
- Performance section
  - Application I/O profile
  - Benchmark
  - Bandwidth (in MB/s or IO/s), or a default configuration
- Dependability section
  - Data redundancy type (within a cluster)
  - Snapshot needed or not
  - Data replication or not (between clusters)
- Job-specific section
  - Job's time to schedule and time to finish

Slide 37: Fault Tolerance Mechanisms
- RAID: tolerate the failure of one or more disks
- RAIN: tolerate the failure of one or more nodes
- Implementation: hardware or software
- Storage FT mechanisms rely on special data layouts
- Software storage snapshot

Slide 38: Data Container Snapshot
- Provides an instantaneous copy of data containers
- Technique used: Copy-On-Write (COW)
  - creates multiple copies of data without duplicating all the data blocks
- Combined with a checkpoint, it allows application restart from a previous running stage
- Impact on SR performance is taken into account at negotiation time
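The copy-on-write idea can be shown with a dictionary-based toy container; the block addressing and snapshot store are simplifications for illustration, not the real storage subsystem.

```python
# Copy-on-write snapshot sketch: taking the snapshot copies nothing; a block
# is copied aside only the first time it is overwritten afterwards.
class Container:
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshot = None

    def take_snapshot(self):
        self.snapshot = {}  # instantaneous: no data blocks are duplicated

    def write(self, addr, data):
        if self.snapshot is not None and addr in self.blocks \
                and addr not in self.snapshot:
            self.snapshot[addr] = self.blocks[addr]  # copy on first write
        self.blocks[addr] = data

    def restore(self):
        self.blocks.update(self.snapshot)  # roll overwritten blocks back

c = Container({0: "input", 1: "state"})
c.take_snapshot()
c.write(1, "corrupted")   # only block 1 gets copied aside
c.restore()
print(c.blocks[1])  # state
```

This is why the snapshot is "instantaneous" but still costs performance later: every first overwrite of a block after the snapshot pays for an extra copy, which is what the negotiation has to budget for.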

Slide 39: Redundant Data Layout
Single-node job restart after node failure. Characteristics: the job is running on a single node; the data container is private to that node; the data container snapshot resides on the same storage resource.
1. Node failure
2. Restore the job's data from the previous snapshot
3. Start the data container
4. Restore the job's state from the previous checkpoint
5. Job restart

Slide 40: Interfaces with Other Components
(Figure: the storage subsystem's VSM talks to the RMS via the VSM-RMS interface and to data containers on a storage resource (SR) via the VSM-SR interface, over the network (socket, RDMA, ...) with client-server callbacks; wrappers attach either Exanodes or a classical storage array (SR_type1, SR_type2; open source and proprietary variants).)

Slide 41: ASSESSGRID

Slide 42: Motivation AssessGrid
(Figure: a node crashes during execution; thanks to a checkpoint the job restarts. Should the provider accept this job?)

Slide 43: Grid Fabric Layer with Risk Assessor
- Negotiation Manager: Agreement / Agreement Factory WS; checks whether an offer complies with the template; initiation of file transfers
- Scheduler: creates tentative schedules for offers
- Risk Assessor
- Consultant Service: records data
- Monitoring: runtime behavior

Slide 44: Motivation AssessGrid
- Aim of AssessGrid: introduce risk awareness in Grid technology
- Risk awareness incorporated across three layers
  - End-user
  - Broker
  - Service Provider

Slide 45: AssessGrid: Architectural Overview
- End-user
  - Portal
- Broker
  - Risk Assessor
  - Confidence Service
  - Workflow Assessor
- Provider
  - Negotiator
  - Scheduler
  - Risk Assessor
  - Consultant Service

Slide 46: Precautionary Fault Tolerance
- How many spare resources are available at execution time?
- Use of a planning-based scheduler

Slide 47: Estimating Risk for a Job Execution
- Use of a planning-based scheduler
- How much slack time is available for fault tolerance?
- How much effort do I undertake for fault tolerance?
- What is the considered risk of resource failure?
(Figure: the window from earliest start time to latest finish time consists of execution time plus slack time.)
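The slack-time question can be made concrete with a small worked example, assuming the SLA carries an earliest start time, a latest finish time, and a runtime estimate; all numbers below are invented for illustration.

```python
# Slack time = (latest finish - earliest start) - execution time.
# The restart budget shows how much fault-tolerance effort fits into it.
def slack_hours(earliest_start, latest_finish, execution_time):
    return (latest_finish - earliest_start) - execution_time

def restarts_affordable(slack, restart_cost):
    """How many checkpoint restarts fit before the deadline is at risk?"""
    return slack // restart_cost

s = slack_hours(8, 20, 6)            # 12h window for a 6h job -> 6h slack
print(s, restarts_affordable(s, 2))  # 6 3
```

The more slack an SLA leaves, the more failures the provider can absorb via restart from checkpoint, which is exactly why slack feeds into the risk estimate for agreeing to the SLA.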

Slide 48: Risk Assessment
(Figure: risk classes: low risk, middle risk, high risk.)
- Estimate the risk of agreeing to an SLA
  - consider the risk of resource failure
  - estimate the risk for a job execution
  - initiate precautionary FT mechanisms

Slide 49: Risk Management at Job Execution
(Figure: events and a risk assessment feed risk management decisions and actions, taking into account the business model (price, penalty), weekend/holiday/workday, the schedule (SLAs, best effort), and redundancy measures.)

Slide 50: Detection of Bottlenecks
- Consultant Service
  - Analysis of SLA violations
  - Estimated risk for the job
  - Planned FT mechanisms
- Monitoring information
  - Job
  - Resources
- Data mining
  - Find connections between SLA violations
  - Detect weak points in the provider's infrastructure

Slide 51: WS-AG

Slide 52: Components

Slide 53: Implementation with Globus Toolkit 4
- Why Globus?
  - Utility: authentication, authorization, delegation, RFT, MDS, WS-Notification
- Impact
  - Problem 1: GRAM (Grid Resource Allocation and Management)
    - State machine, incl. file staging, delegation of credentials, RSL
    - Cannot use it: written for batch schedulers, not for planning schedulers
  - Problem 2: deviations from the WS-AG spec
    - Different namespaces: WS-A, WS-RF

Slide 54: Implementation with Globus Toolkit 4
- Technical challenges
  - xs:anyType
    - Wrote custom serializers/deserializers
  - Substitution groups
    - Used in ItemConstraint (creation constraints)
    - Cannot be mapped to Java by Axis
    - Replaced by xs:anyType; used as a DOM tree
  - CreationConstraints
    - Namespace prefixes in XPaths are meaningless
    - Need for WSDL and interpretation of xs:all, xs:choice, and friends

Slide 55: Agreement Structure
(Figure: an agreement consists of Context, Terms, and Creation Constraints; the Context holds, e.g., an EPR and a DN such as /C=DE/O=….)

Slide 56: Terms, SDTs
- Conjunction of terms
- Common structure of templates
- WS-AG too powerful/difficult to fully support
- Service Description Term (one)
  - assessgrid:ServiceDescription (extension of abstract ServiceTermType)
  - jsdl:POSIXExecutable (executable, arguments, environment)
  - jsdl:Application (mis-)used for libraries
  - jsdl:Resources
  - jsdl:DataStaging *
  - assessgrid:PoF (upper bound)

Slide 57: Terms, GuaranteeTerms
- No hierarchy, but two meta guarantees
  - ProviderFulfillsAllObligations, e.g. reward: 1000 EUR, penalty: 1000 EUR
  - ConsumerFulfillsAllObligations, e.g. reward: 0 EUR, penalty: 1000 EUR
- The first violation is responsible for the failure
  - If there is no hardware problem, then it is a user fault
- Other guarantees
  - Execution time
    - Any start time (best effort)
    - Exact start time
    - Earliest start time, latest finish time
  - User provides stage-in files by time X
  - Provider keeps stage-out files until time Y
(Slide annotations: "No timely execution", "No stage-out".)

Slide 58: Terms
- The SLA does not contain requirements on fault tolerance mechanisms
  - Covered by the asserted PoF, the penalty, and loss of reputation
- Compulsory assessment intervals are not really useful for us
  - How often do you assess that the job was allocated for the asserted time?
- Preferences are too complicated

Slide 59: CreationConstraints
- Difficult to support
  - Namespaces: //wsag:…/assessgrid:…; prefixes are just strings
- Very difficult to support: structural information
  - xs:group, xs:all, xs:choice, xs:sequence
- Possible but difficult to support: xs:restriction of simple types
  - Check for enumerations (xs:restriction of xs:string)
  - Check for valid dates (xs:restriction of xs:date)
- Everything else is close to impossible
  - {min,max}{In,Ex}clusive
  - totalDigits, fractionDigits, length, … probably useless
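The two facets the slide calls feasible (string enumerations and valid dates) can be checked with plain Python; the value/facet representation here is a hypothetical simplification of xs:restriction, not a schema validator.

```python
# Check the two xs:restriction cases the slide considers supportable:
# enumeration facets on xs:string and well-formed xs:date values.
from datetime import date

def check_enumeration(value, allowed):
    """xs:restriction of xs:string with xs:enumeration facets."""
    return value in allowed

def check_date(value):
    """xs:restriction of xs:date: accept only valid ISO dates."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

print(check_enumeration("mpi", {"mpi", "sequential"}))  # True
print(check_date("2007-10-16"))                          # True
print(check_date("2007-13-40"))                          # False
```

Even these simple checks presuppose that the constraint's XPath has been resolved to a concrete element, which is exactly where the "prefixes are just strings" problem above bites.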

