Transparent Cross-Border Migration of Parallel Multi Node Applications
Dominic Battré, Matthias Hovestadt, Odej Kao, Axel Keller, Kerstin Voss
Cracow Grid Workshop 2007
Outline
- Motivation
- The Software Stack
- Cross-Border Migration
- Summary
The work is embedded in two EC-funded FP6 projects: HPC4U (Highly Predictable Clusters for Internet-Grids) and AssessGrid (Advanced Risk Assessment & Management for Trustable Grids).
The Gap between Grid and RMS
- The user asks for an SLA
- The Grid middleware realizes the job by means of the local RMS
- BUT: today's RMS offer only best effort
- Need: an SLA-aware RMS that can give guarantees
(Figure: a user request with an SLA passes through the Grid middleware to an RMS managing machines M1-M3; the user expects "Reliability? Quality of Service? Guaranteed!", while the RMS answers "Best Effort!")
HPC4U: Highly Predictable Clusters for Internet-Grids
Objective
- Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, and acting as an active Grid component
Key Features
- System-level checkpointing
- Job migration
- Job types: sequential and MPI-parallel
- Planning-based scheduling
HPC4U: Planning-Based Scheduling
(Figure: queuing systems plan only the present with queues of new jobs; planning systems plan the whole machine over a time frame covering present and future.)

                                  Queuing systems      Planning systems
Planned time frame                present              present and future
New job requests                  insert in queues     re-planning
Assignment of planned start time  no                   all requests
Runtime estimation                not necessary        mandatory
Backfilling                       optional             yes, implicit
Advance reservations              not possible         yes, trivial
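A minimal sketch of a planning-based scheduler as characterized in the table above: every request carries a mandatory runtime estimate, every request gets a planned start time, and advance reservations are ordinary plan entries. Names and the data model are illustrative, not the OpenCCS implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class PlanningScheduler {
    record Reservation(String jobId, long start, long end, int nodes) {}

    private final int totalNodes;
    private final List<Reservation> plan = new ArrayList<>();

    PlanningScheduler(int totalNodes) { this.totalNodes = totalNodes; }

    /** Plans a job: earliest start time (>= earliestStart) with enough free nodes
     *  for the whole estimated runtime. Searching from earliestStart rather than
     *  from the queue tail gives the "implicit backfilling" of planning systems. */
    Reservation planJob(String jobId, long earliestStart, long runtime, int nodes) {
        if (nodes > totalNodes) throw new IllegalArgumentException("too many nodes requested");
        long start = earliestStart;
        while (freeNodes(start, start + runtime) < nodes) {
            start = nextPlanEventAfter(start);        // jump to the next change in the plan
        }
        Reservation r = new Reservation(jobId, start, start + runtime, nodes);
        plan.add(r);                                  // a future start time = advance reservation
        return r;
    }

    private int freeNodes(long from, long to) {
        int used = 0;
        for (Reservation r : plan) {
            if (r.start() < to && from < r.end()) used += r.nodes();  // overlapping reservation
        }
        return totalNodes - used;
    }

    private long nextPlanEventAfter(long t) {
        long next = Long.MAX_VALUE;
        for (Reservation r : plan) {
            if (r.start() > t) next = Math.min(next, r.start());
            if (r.end() > t)   next = Math.min(next, r.end());
        }
        return next;  // the while loop above only calls this when a conflicting reservation ends after t
    }
}
```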
HPC4U: Software Stack
(Figure: the cluster software stack - the user-/broker-interface and CLI sit on top of the RMS with its negotiation component, scheduler and SSC, which in turn controls the process, network and storage subsystems.)
HPC4U: Checkpointing Cycle
(Figure: the RMS coordinates the process, network and storage subsystems.)
1. Checkpoint job + halt
2. Capture in-transit packets
3. Return: "Checkpoint completed!"
4. Snapshot
5. Link checkpoint to snapshot
6. Resume job
7. Job running again
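A sketch of the cycle above as a coordinator driven by the RMS; the subsystem interfaces are hypothetical stand-ins for the HPC4U process, network and storage components, not their real APIs.

```java
import java.util.HashMap;
import java.util.Map;

interface ProcessSubsystem { void checkpointAndHalt(String jobId); void resume(String jobId); }
interface NetworkSubsystem { void captureInTransitPackets(String jobId); }
interface StorageSubsystem { String takeSnapshot(String jobId); }

final class CheckpointCycle {
    private final ProcessSubsystem process;
    private final NetworkSubsystem network;
    private final StorageSubsystem storage;
    private final Map<String, String> snapshotOfJob = new HashMap<>(); // RMS bookkeeping

    CheckpointCycle(ProcessSubsystem p, NetworkSubsystem n, StorageSubsystem s) {
        this.process = p; this.network = n; this.storage = s;
    }

    void checkpoint(String jobId) {
        process.checkpointAndHalt(jobId);                // 1: checkpoint job + halt
        network.captureInTransitPackets(jobId);          // 2: in-transit packets
        // 3: "Checkpoint completed!" is reported back to the RMS
        String snapshotId = storage.takeSnapshot(jobId); // 4: storage snapshot
        snapshotOfJob.put(jobId, snapshotId);            // 5: link checkpoint to snapshot
        process.resume(jobId);                           // 6/7: resume job, job running again
    }
}
```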
Cross-Border Migration: Overview
(Each of the following steps was shown on its own slide with the same figure: a source and a destination cluster, each with a user-/broker-interface, CLI, and an RMS containing CRM, negotiation component, scheduler and SSC on top of the process, network and storage subsystems.)
1. Intra-domain migration
2. Target retrieval
3. Checkpoint migration
4. Remote execution
5. Result migration
Cross-Border Migration: Using Globus
- WS-Agreement (WS-AG) implementation based on GT4, developed in the EU project AssessGrid
- The source specifies the SLA and file-staging parameters using a subset of JSDL (POSIX jobs)
- Resource determination via a broker
- The source directly contacts the destination
- The destination pulls the migration data via GridFTP
- The destination pushes the result data back to the source
- The source uses WSRF event notification
(Figure: source and destination clusters connected through the WS-AG interface and a broker.)
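A sketch of the data-movement part of this flow, assuming the destination cluster drives the transfers with the standard GridFTP command-line client globus-url-copy; the class and method names are hypothetical, and the WS-AG negotiation and WSRF notification steps are omitted.

```java
import java.io.IOException;

final class CheckpointMigration {

    /** Destination side: pull the checkpoint/migration data from the source cluster. */
    static void pullCheckpoint(String sourceGridFtpUrl, String localAbsolutePath)
            throws IOException, InterruptedException {
        run("globus-url-copy", sourceGridFtpUrl, "file://" + localAbsolutePath);
    }

    /** Destination side: push the result data back to the source cluster. */
    static void pushResults(String localAbsolutePath, String sourceGridFtpUrl)
            throws IOException, InterruptedException {
        run("globus-url-copy", "file://" + localAbsolutePath, sourceGridFtpUrl);
    }

    private static void run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();   // external GridFTP client
        if (p.waitFor() != 0) {
            throw new IOException("transfer failed: " + String.join(" ", cmd));
        }
    }
}
```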
Ongoing Work: Introducing Risk Management
- Topic of the EU project AssessGrid
- Risk is incorporated in the SLA
- The provider estimates the risk of agreeing to an SLA
- Considers the probability of failure in the schedule
- Assessment is based on historical data
(Figure: the cluster architecture extended with a risk assessor, consultant service and monitoring, connected to the broker via WS-AG.)
Summary: Best Effort is not Enough
Cross-border migration and risk assessment provide new means to increase the reliability of Grid computing.
More Information
- AssessGrid: www.assessgrid.eu
- HPC4U: www.hpc4u.eu
- OpenCCS: www.openccs.eu
Read the paper for details. Thanks for your attention!
BACKUP
Scheduling Aspects
- Execution time: exact start time, or earliest start time and latest finish time
- User provides stage-in files by time X
- Provider keeps stage-out files until time Y
- Provisional reservations
- Job priorities
- Job suspension
HPC4U: Planning-Based Scheduler (Space-Sharing)
(Figure: a space-sharing schedule over CPUs and time showing jobs with runtime estimates such as l=2h and l=3h, a reservation for a Grid job according to its SLA (l=6h), and a second reservation; each request gets a runtime estimation and an assigned start time.)
HPC4U
Motivation: Fault Tolerance
- Commercial Grid users need SLAs, but providers are cautious about adopting them
- Reason: business risk - missed deadlines due to system failures mean penalties to be paid
- Solution: prevention with fault tolerance
- Fault tolerance mechanisms are available, but:
  1. Application modification is mandatory
  2. An overall solution (system software, process, storage, file system, network) is required
  3. Combination with Grid migration is missing
HPC4U
Objective
- Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, acting as an active Grid component
Key features
- Definition and implementation of SLAs
- Resource reservation for guaranteed QoS
- Application-transparent fault tolerance
HPC4U: Concept
1. SLA negotiation as an explicit statement of expectations and obligations in a business relationship between provider and customer
2. Reservation of CPU, storage and network for the desired time interval
3. Job start in a checkpointing environment
4. In case of a system failure: job migration / restart with respect to the SLA
HPC4U: Project Outcomes
Phases of Operation
(Figure: a timeline from negotiation through pre-runtime, runtime (stage-in, computation, stage-out) and post-runtime; the lifetime of the SLA and the allocation of system resources begin with the acceptance (or rejection) of the SLA.)
- Negotiation of the SLA
- Pre-runtime: configuration of resources, e.g. network, storage, compute nodes
- Runtime: stage-in, computation, stage-out
- Post-runtime: re-configuration
Phase: Pre-Runtime
- Task of the pre-runtime phase: configuration of all allocated resources
- Goal: fulfill the requirements of the SLA
- Reconfiguration affects all HPC4U elements:
  - Resource Management System, e.g. configuration of the assigned compute nodes
  - Storage subsystem, e.g. initialization of a new data partition
  - Network subsystem, e.g. configuration of the network infrastructure
Phase: Runtime
- The runtime phase is the lifetime of the job in the system: adherence to the SLA has to be assured and FT mechanisms have to be utilized
- The phase consists of three distinct steps:
  - Stage-in: transmission of the required input data from the Grid customer to the compute resource
  - Computation: execution of the application
  - Stage-out: transmission of the generated output data from the compute resource back to the Grid customer
Phase: Post-Runtime
- Task of the post-runtime phase: re-configuration of all resources
  - e.g. re-configuration of the network
  - e.g. deletion of checkpoint datasets
  - e.g. deletion of temporary data
- Counterpart of the pre-runtime phase: the allocation of resources ends, the schedules in RMS and storage are updated, and the resources become available for new jobs
Motivation: Cross-Border Migration
(Figure: an HPC4U customer, the Grid middleware, and HPC4U clusters in separate administrative domains.)
PROCESS
Subsystems
- Process subsystem: checkpointing; cooperative checkpointing protocol (CCP)
- Network subsystem: checkpointing of the network state
- Storage subsystem: provision of storage and of snapshots
Metacluster: Checkpointing Subsystem
- Virtualization of resources
- Capture of the full application context: resources, states, process hierarchy
- Non-intrusive "virtual bubble"
STORAGE
Storage Subsystem
Functionalities:
- Negotiates the storage part of the SLA
- Provides storage capacity at a given QoS level
- Provides FT mechanisms
- Requirement: manage multiple jobs running on the same storage resource (SR)
(Figure: computing nodes access the Virtual Storage Manager (VSM), which maps a logical space (data layout strategies) onto the physical space of Storage Resource 1 and Storage Resource 2 via the VSM-SR interface.)
Data Container Concept
- Idea: create a storage environment for applications at a desired QoS level, abstracting from the physical devices
- Components: logical volume, file system
(Figure: a job issues file I/O (read, write, open, ...) against the file system; the file system issues block I/O (read, write, ioctl) against the data container's logical space; the block address mapping applies data layout policies (e.g. simple striping) to map the block I/O onto the physical devices of the storage resource.)
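A minimal sketch of a block address mapping with the simple striping layout mentioned above, assuming fixed-size blocks spread round-robin over the physical devices; class and method names are illustrative, not the VSM API.

```java
final class StripingBlockMap {
    private final int deviceCount;

    StripingBlockMap(int deviceCount) { this.deviceCount = deviceCount; }

    /** Maps a logical block of the data container to (device index, physical block). */
    long[] map(long logicalBlock) {
        int device = (int) (logicalBlock % deviceCount);   // round-robin striping
        long physicalBlock = logicalBlock / deviceCount;   // position on that device
        return new long[] { device, physicalBlock };
    }
}
```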
Data Container Properties (the storage part of the SLA)
- Data container section: size, file system type, number of nodes that need to access the data container (private/shared)
- Performance section: application I/O profile (benchmark, bandwidth in MB/s or IO/s), or a default configuration
- Dependability section: data redundancy type (within a cluster), snapshot needed or not, data replication or not (between clusters)
- Job-specific section: the job's time to schedule and time to finish
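The sections above map naturally onto a request data structure. The sketch below is illustrative only; the field names and units are assumptions, not the HPC4U/VSM schema.

```java
final class DataContainerRequest {
    // Data container section
    long sizeGiB;
    String fileSystemType;             // requested file system type
    int accessingNodes;                // 1 = private, >1 = shared
    // Performance section (application I/O profile, or defaults if unset)
    Double bandwidthMBps;              // null -> default configuration
    Double ioPerSecond;                // null -> default configuration
    // Dependability section
    String redundancyType;             // data redundancy within the cluster
    boolean snapshotNeeded;
    boolean replicateBetweenClusters;
    // Job-specific section
    java.time.Instant timeToSchedule;
    java.time.Instant timeToFinish;
}
```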
Fault Tolerance Mechanisms
- RAID: tolerate the failure of one or more disks
- RAIN: tolerate the failure of one or more nodes
- Implementation in hardware or software; software storage FT mechanisms rely on special data layouts
- Software storage snapshot
Data Container Snapshot
- Provides an instantaneous copy of data containers
- Technique used: copy-on-write (COW) - creates multiple copies of data without duplicating all the data blocks
- Together with a checkpoint, it allows an application to restart from a previous running stage
- The impact on SR performance is taken into account at negotiation time
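A minimal copy-on-write sketch: taking a snapshot copies nothing, and only blocks written afterwards are preserved in their old version. This is an illustrative in-memory model, not the HPC4U storage subsystem.

```java
import java.util.HashMap;
import java.util.Map;

final class CowContainer {
    private final Map<Long, byte[]> liveBlocks = new HashMap<>();
    private Map<Long, byte[]> snapshotBlocks;   // old contents of blocks changed since the snapshot

    void takeSnapshot() {
        snapshotBlocks = new HashMap<>();        // instantaneous: no data is copied here
    }

    void write(long blockNo, byte[] data) {
        if (snapshotBlocks != null && !snapshotBlocks.containsKey(blockNo)) {
            byte[] old = liveBlocks.get(blockNo);
            snapshotBlocks.put(blockNo, old == null ? new byte[0] : old.clone()); // copy on first write
        }
        liveBlocks.put(blockNo, data.clone());
    }

    /** Reads a block as it looked when the snapshot was taken (used for restart). */
    byte[] readFromSnapshot(long blockNo) {
        if (snapshotBlocks != null && snapshotBlocks.containsKey(blockNo)) {
            return snapshotBlocks.get(blockNo);  // preserved old version
        }
        return liveBlocks.get(blockNo);          // unchanged block: shared with the live container
    }
}
```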
Redundant Data Layout: Single-Node Job Restart after Node Failure
Characteristics:
- The job is running on a single node
- The data container is private to that node
- The data container snapshot resides on the same storage resource
Restart sequence:
1. Node failure
2. Restore the job's data from the previous snapshot
3. Start the data container
4. Restore the job's state from the previous checkpoint
5. Job restart
Interfaces with Other Components
(Figure: the RMS talks to the Virtual Storage Manager (VSM) over the VSM-RMS interface (client-server with callbacks); the VSM manages data containers on the storage resource (SR) over the VSM-SR interface via the network (socket, RDMA, ...); wrappers connect different SR types, e.g. Exanodes and classical storage arrays, covering both open-source and proprietary back ends.)
ASSESSGRID
Motivation AssessGrid
(Figure: a provider deciding "Accept this job?"; a node crashes during execution and the job is restarted from a checkpoint.)
Grid Fabric Layer with Risk Assessor
- Negotiation Manager: Agreement / AgreementFactory web services; checks whether an offer complies with the template; initiates file transfers
- Scheduler: creates tentative schedules for offers
- Risk Assessor
- Consultant Service: records data
- Monitoring: runtime behavior
Motivation AssessGrid
- Aim of AssessGrid: introduce risk awareness in Grid technology
- Risk awareness is incorporated across three layers: end-user, broker, service provider
AssessGrid: Architectural Overview
- End-user: portal
- Broker: risk assessor, confidence service, workflow assessor
- Provider: negotiator, scheduler, risk assessor, consultant service
Precautionary Fault-Tolerance
- How many spare resources are available at execution time?
- Use of a planning-based scheduler
Estimating Risk for a Job Execution
- Use of a planning-based scheduler
- How much slack time is available for fault tolerance?
- How much effort do I undertake for fault tolerance?
- What is the considered risk of resource failure?
(Figure: the interval between the earliest start time and the latest finish time is divided into execution time and slack time.)
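A sketch of the slack computation implied by the figure and how it might bound the fault-tolerance effort; this is an illustrative calculation, not the AssessGrid algorithm, and the per-restart recovery cost is an assumed input.

```java
import java.time.Duration;
import java.time.Instant;

final class SlackEstimator {
    /** Slack = (latest finish - earliest start) - estimated execution time. */
    static Duration slack(Instant earliestStart, Instant latestFinish, Duration executionTime) {
        return Duration.between(earliestStart, latestFinish).minus(executionTime);
    }

    /** How many restarts from the last checkpoint fit into the slack?
     *  recoveryCost is an assumed per-restart overhead (restore + recompute). */
    static long affordableRestarts(Duration slack, Duration recoveryCost) {
        if (recoveryCost.isZero() || recoveryCost.isNegative()) return Long.MAX_VALUE;
        return Math.max(0, slack.toMillis() / recoveryCost.toMillis());
    }
}
```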
Risk Assessment
- Estimate the risk of agreeing to an SLA: consider the risk of resource failure, estimate the risk for the job execution, initiate precautionary FT mechanisms
(Figure: jobs in the schedule are classified as low, middle, or high risk.)
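One simple way to turn the risk of resource failure into a probability of failure (PoF) for a job is sketched below; it assumes independent, exponentially distributed node failures with a rate taken from historical data, which is an assumption of this sketch and not the AssessGrid risk model. The risk-class thresholds are likewise illustrative.

```java
final class PoFEstimator {
    /** lambdaPerHour: assumed failure rate of one node; hours: execution window;
     *  nodes: number of nodes the job needs. */
    static double probabilityOfFailure(double lambdaPerHour, double hours, int nodes) {
        double pNodeOk = Math.exp(-lambdaPerHour * hours);   // one node survives the window
        return 1.0 - Math.pow(pNodeOk, nodes);               // at least one node fails
    }

    /** Classify into the low / middle / high risk classes shown in the schedule. */
    static String riskClass(double pof) {
        if (pof < 0.05) return "low risk";        // thresholds are illustrative
        if (pof < 0.20) return "middle risk";
        return "high risk";
    }
}
```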
Risk Management at Job Execution
(Figure: events trigger risk assessment; risk-management decisions and actions take into account the risk assessment, the business model (price, penalty), weekend/holiday/workday, the schedule (SLAs, best effort) and redundancy measures.)
Detection of Bottlenecks (Consultant Service)
- Analysis of SLA violations: the estimated risk for the job, the planned FT mechanisms
- Monitoring information: job, resources
- Data mining: find connections between SLA violations, detect weak points in the provider's infrastructure
WS-AG
Components
Implementation with Globus Toolkit 4
Why Globus?
- Utility: authentication, authorization, delegation, RFT, MDS, WS-Notification
- Impact
Problem 1: GRAM (Grid Resource Allocation and Management)
- State machine including file staging, delegation of credentials, RSL
- Cannot use it: written for batch schedulers, not for planning schedulers
Problem 2: Deviations from the WS-AG spec
- Different namespaces for WS-A, WS-RF
Implementation with Globus Toolkit 4: Technical Challenges
- xs:anyType: wrote custom serializers/deserializers
- Substitution groups: used in ItemConstraint (creation constraints); cannot be mapped to Java by Axis; replaced by xs:anyType and used as a DOM tree
- CreationConstraints: namespace prefixes in XPaths are meaningless; need the WSDL and an interpretation for xs:all, xs:choice, and friends
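A sketch of the "use as DOM tree" workaround: once the custom (de)serializers hand over xs:anyType content as a standard org.w3c.dom.Element, it can be inspected manually. The namespace URI and element name below are placeholders, not the real AssessGrid schema.

```java
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class AnyTypeAccess {
    static final String ASSESSGRID_NS = "http://example.org/assessgrid"; // placeholder URI

    /** Returns the text of the first PoF child in the AssessGrid namespace, or null if absent. */
    static String extractPoF(Element serviceDescription) {
        NodeList hits = serviceDescription.getElementsByTagNameNS(ASSESSGRID_NS, "PoF");
        return hits.getLength() > 0 ? hits.item(0).getTextContent() : null;
    }
}
```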
Context
(Example agreement context showing an endpoint reference (EPR) and a distinguished name (DN) such as /C=DE/O=...)
(The agreement structure is Context - Terms - Creation Constraints; this navigation appears on the following slides as well.)
Terms: Service Description Terms (SDTs)
- Conjunction of terms; common structure of templates
- WS-AG is too powerful/difficult to fully support
- One Service Description Term: assessgrid:ServiceDescription (extension of the abstract ServiceTermType), containing:
  - jsdl:POSIXExecutable (executable, arguments, environment)
  - jsdl:Application ((mis-)used for libraries)
  - jsdl:Resources
  - jsdl:DataStaging (zero or more)
  - assessgrid:PoF (upper bound on the probability of failure)
Terms: Guarantee Terms
- No hierarchy, but two meta guarantees:
  - ProviderFulfillsAllObligations, e.g. reward: 1000 EUR, penalty: 1000 EUR (violated by, e.g., no timely execution or no stage-out)
  - ConsumerFulfillsAllObligations, e.g. reward: 0 EUR, penalty: 1000 EUR
- The first violation is responsible for the failure: if there is no hardware problem, it is a user fault
- Other guarantees:
  - Execution time: any start time (best effort), exact start time, or earliest start time and latest finish time
  - User provides stage-in files by time X
  - Provider keeps stage-out files until time Y
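A sketch of the "first violation is responsible" rule: the earliest violated guarantee determines which party's meta guarantee is broken. The types are illustrative, not WS-AG or AssessGrid classes.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class BlameAssignment {
    enum Party { PROVIDER, CONSUMER }
    record Violation(Instant at, Party obligedParty, String guarantee) {}

    /** Returns the party responsible for the SLA failure, if any guarantee was violated. */
    static Optional<Party> responsible(List<Violation> violations) {
        return violations.stream()
                .min(Comparator.comparing(Violation::at))   // the first violation wins
                .map(Violation::obligedParty);              // e.g. missing stage-in by time X -> consumer
    }
}
```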
Terms
- The SLA does not contain requirements on fault tolerance mechanisms; these are covered by the asserted PoF, the penalty and the loss of reputation
- Compulsory assessment intervals are not really useful for us: how often do you assess that the job was allocated for the asserted time?
- Preferences are too complicated
CreationConstraints
- Difficult to support: namespaces in XPaths like //wsag:…/assessgrid:… - the prefixes are just strings
- Very difficult to support: structural information (xs:group, xs:all, xs:choice, xs:sequence)
- Possible but difficult to support: xs:restriction of simple types - check for enumerations (xs:restriction of xs:string), check for valid dates (xs:restriction of xs:date)
- Everything else close to impossible: {min,max}{In,Ex}clusive, totalDigits, fractionDigits, length, … - probably useless
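A sketch of why the prefixes are "just strings": an XPath from a CreationConstraint can only be evaluated once the evaluator binds each prefix to a namespace URI explicitly. The URIs below are placeholders, not the official wsag/assessgrid namespaces.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class ConstraintXPath {
    private static final Map<String, String> PREFIXES = Map.of(
            "wsag", "http://example.org/ws-agreement",      // placeholder URI
            "assessgrid", "http://example.org/assessgrid"); // placeholder URI

    static XPathExpression compile(String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) { return PREFIXES.get(prefix); }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
        });
        return xpath.compile(expr);   // the prefixes in expr now resolve to actual namespaces
    }
}
```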