JRA7 and SAGA Malcolm Illingworth, EPCC OGF19 Chapel Hill 29/01 – 02/02 2007.

1 JRA7 and SAGA Malcolm Illingworth, EPCC OGF19 Chapel Hill 29/01 – 02/02 2007

2 DEISA Objectives
- To deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope
- To enable scientific discovery across a broad spectrum of science and technology. Scientific impact (enabling new science) is the only criterion for success.
- Transparency for users and applications (users should not be aware of complex grid technologies)
- Minimal intrusion on applications

3 JRA7 Objectives
To develop a single way of coordinating and integrating OGSA-based services for distributed resource management in a heterogeneous environment, and to use this to integrate a variety of existing user-level tools to provide the necessary high-level services in:
- authentication, authorisation and accounting;
- job preparation, submission and monitoring;
- data movement for job input and output;
- other areas to be determined by DEISA user requirements.
DESHL: DEISA Services for the Heterogeneous management Layer

4 Current status and future plans
- Started in May 2004
- Decision taken to follow SAGA in mid-2005
- Project finishes in April 2008
- DESHL command line tool deployed and tested at all 11 DEISA sites
- DESHL training included at DEISA user training sessions since July 2005
- Some take-up from outside of DEISA
- Recent focus on usability and robustness
- DESHL 4.1 due for release in April
- Possible inclusion by eDEISA for life sciences portal development (integration with EngineFrame)

5 The Big Picture
Standards-based interfaces to allow user-level tools to interact across heterogeneous sites.
At a local site, a user wants to run a job on the DEISA heterogeneous environment.
[Diagram: User tools and a User Job connect through JRA7 DESHL (DEISA Services for the Heterogeneous management Layer), which offers a Batch Job service, a Data Management service, and an Information service, to two HPC Sites. Each site is labelled with UNICORE, DRM, Data-Mgt, Information, Data, HPC, and Network Resources.]

6 DESHL v4.1 Components
[Diagram labels]
- Client (DESHL): Command Line Tool, SAGA Client Library, Grid Access Library, ARCON Client library
- Server: UNICORE Gateway

7 Command line tool functionality
The precise set of operations is based upon application requirements, but the focus has been on file transfer and job submission.
Data Transfer
- upload/download files between a local workstation and a DEISA site
- delete a file at a DEISA site
- determine if a file exists on a DEISA site
- list the contents of a directory on a DEISA site
- rename a file on a DEISA site
- copy/move a file between DEISA sites
Job Management
- determine the DEISA sites to which a user can submit a batch job
- submit a batch job to a DEISA site
- terminate a batch job at a DEISA site
- view the status of a batch job on a DEISA site
- retrieve job stdout and stderr

8 Client Library
- Provides factory classes for access to remote job services and remote file systems
- Specific implementation classes are specified via a properties file and hidden from the caller
- Changes in implementation should not be visible to the caller
- Remote resources are configured locally via a configuration file
- Jobs are specified to the command line tool (CLT) as SAGA directive scripts
- SAGA directives are translated to a JSDL script
- The JSDL script is submitted to a site via the Grid Library
- The Grid Library returns a Task object for the submitted JSDL script
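
The idea of naming the implementation class in a properties file and hiding it behind a factory can be sketched as follows. This is only an illustration under stated assumptions: `RemoteFileSystem`, `UnicoreFileSystem`, `PropertyDrivenFactory`, and the key `filesystem.impl` are hypothetical names invented for this sketch, not part of DESHL, and the properties are built in memory rather than loaded from a file.

```java
import java.util.Properties;

// Hypothetical stand-in for a SAGA-style interface; not the DESHL API.
interface RemoteFileSystem {
    String describe();
}

// Hypothetical implementation class. Callers never name it directly.
final class UnicoreFileSystem implements RemoteFileSystem {
    public UnicoreFileSystem() { }

    @Override
    public String describe() {
        return "UNICORE-backed file system";
    }
}

final class PropertyDrivenFactory {
    // The property key is an assumption made for this sketch.
    static final String IMPL_KEY = "filesystem.impl";

    private final Properties config;

    PropertyDrivenFactory(Properties config) {
        this.config = config;
    }

    // Looks up the implementation class name and instantiates it reflectively,
    // so swapping implementations only requires a configuration change.
    RemoteFileSystem getInstance() {
        String className = config.getProperty(IMPL_KEY);
        if (className == null) {
            throw new IllegalStateException("Missing property: " + IMPL_KEY);
        }
        try {
            return Class.forName(className)
                    .asSubclass(RemoteFileSystem.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(IMPL_KEY, UnicoreFileSystem.class.getName());
        RemoteFileSystem fs = new PropertyDrivenFactory(config).getInstance();
        System.out.println(fs.describe());
    }
}
```

In a real deployment the `Properties` object would typically be populated with `Properties.load` from the configuration file; the caller's code would stay the same.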

9 SAGA Factory Classes
SAGA interfaces are obtained from factory classes:
DESHLNSDir dir = DESHLClientFactory.getNSDirFactory().getInstance(session);
JobService js = DESHLClientFactory.getJobServiceFactory().getInstance(session);
Caller identities are provided via a Session object containing appropriate context objects.
- TODO: currently have a UnicoreContext interface extending Context; will refactor to a SAGA-compliant attribute-based Context
- TODO: rename DESHLNSDir to NSDir

10 NSDir interface (1)
public interface DESHLNSDir {
    String[] list(String dir)
        throws SAGAException, BadParameterException, DoesNotExistException;
    boolean exists(String name)
        throws SAGAException, BadParameterException;
    boolean isDir(String name)
        throws SAGAException, BadParameterException, DoesNotExistException;
    boolean isFile(String name)
        throws SAGAException, BadParameterException, DoesNotExistException;

11 NSDir Interface (2)
    void copy(String source, String target, int[] copyFlags)
        throws SAGAException, BadParameterException, DoesNotExistException, IncorrectStateException;
    void move(String source, String target, int[] moveFlags)
        throws SAGAException, BadParameterException, DoesNotExistException, IncorrectStateException;
    void remove(String target, int[] removeFlags)
        throws SAGAException, BadParameterException, DoesNotExistException, IncorrectStateException;
    void makeDir(String target, int[] makeDirFlags)
        throws SAGAException, BadParameterException, IncorrectStateException;

12 NSDir Interface (3)
Methods implemented but not currently used (there is no persistence in the CLT application, so they are not currently relevant):
    String getURL() throws SAGAException;
    String getName() throws SAGAException;
    void changeDir(String dir)
        throws SAGAException, BadParameterException, DoesNotExistException;
    int getNumEntries() throws SAGAException;
    String getEntry(int entry)
        throws SAGAException, BadParameterException;

13 Job Service Interface
public interface JobService {
    Job submitJob(JobDefinition jobDef) throws SAGAException;
    String[] list(boolean showAllDetails) throws SAGAException;
    Job getJob(String jobId) throws SAGAException;
    /* not specified by SAGA but very useful */
    public String[] listJobsForSite(String siteName, boolean showAllDetails)
        throws SAGAException;
}

14 JobDefinition
- Contains the job description as a set of SAGA attributes
- JobDefinition interface extends the Attribute interface
- Implementation defines the set of attributes we support
- CLT reads SAGA definitions from a text file to build the job definition
Example simple job submission script:
#!/bin/bash
# Test job script for DESHL using SAGA.
#
# SAGA JobDefinition based directives:
#$ SAGA_FileTransfer = file:///jobs/ >
#$ SAGA_HostList = ssl://
#$ SAGA_JobCmd =
#$ SAGA_JobName = example job script

15 More complex example
…
# SAGA JobDefinition based directives:
#$ SAGA_JobCmd = a.out
#$ SAGA_FileTransfer = file:///unicore/a.out#HOME > a.out
#$ SAGA_HostList = ssl://
#$ SAGA_FileTransfer = file:///TestOutput#HOME < TestOutput
#$ SAGA_JobEnv = account_no=e24-sa
#$ SAGA_JobEnv = stack_limit=200MB
#$ SAGA_Memory =
#$ SAGA_NumTasks = 16
#$ SAGA_NumCpus = 1
#$ SAGA_WallClockSoftLimit = 3600
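
Reading `#$ SAGA_... = value` directives out of a job script could look roughly like the sketch below. It is not the DESHL parser: `SagaDirectiveParser` is a hypothetical class, and the rules it applies (split on the first `=`, keep repeated attributes such as `SAGA_JobEnv` as a list, ignore every other line) are assumptions inferred from the example scripts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for "#$ SAGA_Name = value" directive lines.
final class SagaDirectiveParser {
    private static final String PREFIX = "#$";

    // Returns attribute name -> values, in order of first appearance.
    // Repeated attributes (for example SAGA_JobEnv) accumulate values.
    static Map<String, List<String>> parse(String script) {
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        for (String rawLine : script.split("\\R")) {
            String line = rawLine.trim();
            if (!line.startsWith(PREFIX)) {
                continue; // shebang, ordinary comment, or shell command
            }
            String body = line.substring(PREFIX.length()).trim();
            int eq = body.indexOf('=');
            if (eq < 0) {
                continue; // not a "name = value" directive
            }
            String name = body.substring(0, eq).trim();
            // Split on the first '=' only, so "account_no=e24-sa" stays intact.
            String value = body.substring(eq + 1).trim();
            if (!name.startsWith("SAGA_")) {
                continue;
            }
            attributes.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "#!/bin/bash",
                "# SAGA JobDefinition based directives:",
                "#$ SAGA_JobCmd = a.out",
                "#$ SAGA_JobEnv = account_no=e24-sa",
                "#$ SAGA_JobEnv = stack_limit=200MB",
                "#$ SAGA_NumTasks = 16");
        parse(script).forEach((name, values) ->
                System.out.println(name + " -> " + values));
    }
}
```

Keeping a list per attribute is a design choice for this sketch: it lets multi-valued directives such as `SAGA_JobEnv` and `SAGA_FileTransfer` survive parsing without special cases.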

16 Currently supported attributes
- SAGA_JobCmd
- SAGA_JobArgs
- SAGA_JobEnv
- SAGA_JobName
- SAGA_FileTransfer
- SAGA_HostList (note: only one host can currently be specified, as DEISA does not have a broker)
- SAGA_NumTasks
- SAGA_NumCpus (interpreted as the number of threads per task)
- SAGA_Memory (the host uses the value to calculate stack and heap)
- SAGA_WallClockSoftLimit
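
A check of parsed attributes against this list might be sketched as below. `SagaAttributeValidator` is hypothetical; the attribute names and the one-host restriction come from the slide, while the rule that `SAGA_NumTasks`, `SAGA_NumCpus`, and `SAGA_WallClockSoftLimit` must be non-negative integers is an assumption based on the example values. The host URLs in `main` are invented placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validator; not part of DESHL.
final class SagaAttributeValidator {
    // Attribute names taken from the "currently supported" list.
    static final Set<String> SUPPORTED = Set.of(
            "SAGA_JobCmd", "SAGA_JobArgs", "SAGA_JobEnv", "SAGA_JobName",
            "SAGA_FileTransfer", "SAGA_HostList", "SAGA_NumTasks",
            "SAGA_NumCpus", "SAGA_Memory", "SAGA_WallClockSoftLimit");

    // Assumption: these attributes carry plain non-negative integers.
    static final Set<String> NUMERIC = Set.of(
            "SAGA_NumTasks", "SAGA_NumCpus", "SAGA_WallClockSoftLimit");

    // Returns a list of human-readable problems; empty means acceptable.
    static List<String> validate(Map<String, List<String>> attributes) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : attributes.entrySet()) {
            String name = entry.getKey();
            List<String> values = entry.getValue();
            if (!SUPPORTED.contains(name)) {
                problems.add("Unsupported attribute: " + name);
                continue;
            }
            if (name.equals("SAGA_HostList") && values.size() > 1) {
                problems.add("Only one host can be specified, found " + values.size());
            }
            if (NUMERIC.contains(name)) {
                for (String value : values) {
                    if (!value.matches("\\d+")) {
                        problems.add(name + " is not a non-negative integer: " + value);
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("SAGA_JobCmd", List.of("a.out"));
        // Hypothetical host URLs, used only to trigger the one-host rule.
        attributes.put("SAGA_HostList",
                List.of("ssl://site-a.example.org", "ssl://site-b.example.org"));
        attributes.put("SAGA_NumTasks", List.of("16"));
        validate(attributes).forEach(System.out::println);
    }
}
```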

17 Job Interface
Uses a subset of the SAGA Job interface. Due to the translation steps (SAGA to JSDL to AJO), it is not possible to retrieve the SAGA job definition from the remote host.
public interface Job {
    String getJobId();
    JobState getJobState();
    String getJobStateDetail();
    void terminate();
    /* Not specified by SAGA but required by UNICORE to
     * retrieve output from USPACE and free resources. */
    void cleanUp(File toDir);
}

18 Example job submission
Session session;
…
// get the class factory
JobServiceFactory factory = DESHLClientFactory.getJobServiceFactory();
// get an instance of the job service from the factory
JobService js = factory.getInstance(session);
JobDefBuilder jobDefBuilder = new JobDefBuilder();
... // build up job definition from file or arguments
// get the constructed job definition
JobDef jobDef = jobDefBuilder.create();
// submit the job, return a job instance
Job submittedJob = js.submitJob(jobDef);
// get the job identifier, e.g. to display to the user
String jobID = submittedJob.getJobId();
// get the job instance again from the job identifier
Job remoteJob = js.getJob(jobID);
// get the job's state (getJobState, as declared in the Job interface)
JobState jobState = remoteJob.getJobState();
// retrieve the job output to a specified directory (cleanUp, as declared in the Job interface)
remoteJob.cleanUp(new File("/home/malcolm/joboutputdir"));

19 Example copy operation
Session session;
int[] copyFlags = { NSDirFlags.copyFlags_NoRecursive, NSDirFlags.NoOverwrite };
String source = "ssl://";
String target = "ssl://";
// get an instance of the factory
NSDirFactory factory = DESHLClientFactory.getNSDirFactory();
// get an instance of the NSDir interface from the factory
NSDir dir = factory.getInstance(session);
// verify the source file exists
boolean sourceFileExists = dir.exists(source);
// copy the file to the other site
dir.copy(source, target, copyFlags);
// verify the file turned up at the remote site
boolean targetFileExists = dir.exists(target);

20 Grid Access Library (roctopus)
- Presents a generalised object-oriented model for interacting with a UNICORE grid, not purely for DESHL
- Provides a general interface that can have multiple implementations
- Jobs are submitted to a Site as JSDL scripts; submission returns a Task
- Presents a Task interface to represent executing jobs
- All of this is hidden from the user/application developer
- Authentication/authorisation is by existing UNICORE mechanisms, i.e. long-lived X.509 pairs
[Diagram: Grid, File Storage, and Site, with multiplicities 1, 0..*, 1, 1]

21 Grid Library interface
- Provides dedicated functions for file management and transfer
- Job submission and management via a rich Task interface
- Job submitted as JSDL, Task instance returned
- List of tasks at a remote site can be retrieved and manipulated
Example:
JobDefinition jobDef;
…
XmlJobDefinitionDocument jsdl = JobDefJSDLConverter.jobDefToJSDL( jobDef );
host = new UnicoreLocation( unicoreLocationStr );
Site site = grid.locateSite( host );
final Task task = site.submit( jobSubmission );
task.startASync( new File[] {} );
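
To give a feel for the SAGA-to-JSDL translation step, here is a minimal sketch that emits a small JSDL 1.0 document. It is not `JobDefJSDLConverter`: the class `JsdlSketch` is hypothetical, and the mapping used (job name to `jsdl:JobName`, job command to `jsdl-posix:Executable`, environment entries to `jsdl-posix:Environment`) is an assumption about how such a converter could work. The element names and namespaces are those of JSDL 1.0 and its POSIX application extension.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a job-definition-to-JSDL translation.
final class JsdlSketch {
    static final String JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl";
    static final String POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix";

    // Escapes characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toJsdl(String jobName, String executable, Map<String, String> environment) {
        StringBuilder xml = new StringBuilder();
        xml.append("<jsdl:JobDefinition xmlns:jsdl=\"").append(JSDL_NS)
           .append("\" xmlns:jsdl-posix=\"").append(POSIX_NS).append("\">\n");
        xml.append("  <jsdl:JobDescription>\n");
        xml.append("    <jsdl:JobIdentification>\n");
        xml.append("      <jsdl:JobName>").append(escape(jobName)).append("</jsdl:JobName>\n");
        xml.append("    </jsdl:JobIdentification>\n");
        xml.append("    <jsdl:Application>\n");
        xml.append("      <jsdl-posix:POSIXApplication>\n");
        xml.append("        <jsdl-posix:Executable>").append(escape(executable))
           .append("</jsdl-posix:Executable>\n");
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            xml.append("        <jsdl-posix:Environment name=\"")
               .append(escape(entry.getKey())).append("\">")
               .append(escape(entry.getValue()))
               .append("</jsdl-posix:Environment>\n");
        }
        xml.append("      </jsdl-posix:POSIXApplication>\n");
        xml.append("    </jsdl:Application>\n");
        xml.append("  </jsdl:JobDescription>\n");
        xml.append("</jsdl:JobDefinition>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> environment = new LinkedHashMap<>();
        environment.put("account_no", "e24-sa");
        environment.put("stack_limit", "200MB");
        System.out.print(toJsdl("example job script", "a.out", environment));
    }
}
```

A production converter would more likely build the document with an XML library or generated schema bindings than with string concatenation; the string builder keeps the sketch short.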

22 Current Issues (1)
- SAGA defines job identifiers as [backend url]-[native id], for example [ssh://]-[1234]
- (We escape any characters likely to be a problem on the command line)
- Fine programmatically …
- From a CLT perspective, not user friendly:
$ deshl submit -q ssl://
Your job: , has been successfully submitted.
$ deshl status
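
One way to make a `[backend url]-[native id]` identifier safe to pass on a command line is percent-encoding, sketched below. The actual escaping scheme used by DESHL is not shown in the slides, so this is an assumption; `JobIdCodec` and the host name in `main` are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for composing and escaping SAGA-style job identifiers.
final class JobIdCodec {
    private static final String SEPARATOR = "]-[";

    // Builds an identifier of the form [backend url]-[native id].
    static String compose(String backendUrl, String nativeId) {
        return "[" + backendUrl + SEPARATOR + nativeId + "]";
    }

    // Percent-encodes brackets, slashes, colons and other shell-unfriendly characters.
    static String toShellSafe(String jobId) {
        return URLEncoder.encode(jobId, StandardCharsets.UTF_8);
    }

    static String fromShellSafe(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.UTF_8);
    }

    // Splits an identifier back into { backend url, native id }.
    static String[] split(String jobId) {
        int sep = jobId.lastIndexOf(SEPARATOR);
        if (!jobId.startsWith("[") || !jobId.endsWith("]") || sep < 0) {
            throw new IllegalArgumentException("Not a [backend url]-[native id] identifier: " + jobId);
        }
        String backendUrl = jobId.substring(1, sep);
        String nativeId = jobId.substring(sep + SEPARATOR.length(), jobId.length() - 1);
        return new String[] { backendUrl, nativeId };
    }

    public static void main(String[] args) {
        // Hypothetical backend URL and native id, for illustration only.
        String jobId = compose("ssh://host.example.org", "1234");
        String escaped = toShellSafe(jobId);
        System.out.println(escaped);
        System.out.println(fromShellSafe(escaped));
    }
}
```

The escaped form is safe to paste but hard to read, which is the usability problem the slide describes.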

23 Current Issues (2)
- Could save the job id to a file and use a simpler naming convention
- DESHL allows aliases to be defined for remote sites:
$ deshl submit -q myHost
Your job myHost%2F has been successfully submitted
nsdir.copy(myhosta/home/malcolm/test.dat, myhostb/home/malcolm/test.dat);
- Aliases are currently specified and handled outside of the SAGA standard; we would like to include this as an optional attribute in the context
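
Alias handling of the kind described here could be sketched as a simple lookup that expands the first path segment into a site URL. How DESHL stores and resolves aliases is not shown, so `SiteAliases`, its methods, and the URLs below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alias table mapping short site names to full site URLs.
final class SiteAliases {
    private final Map<String, String> aliasToUrl = new HashMap<>();

    void define(String alias, String url) {
        aliasToUrl.put(alias, url);
    }

    // Expands "alias/path" into "url/path". Locations that already contain
    // a scheme ("://") are returned unchanged.
    String resolve(String location) {
        if (location.contains("://")) {
            return location;
        }
        int slash = location.indexOf('/');
        String alias = slash < 0 ? location : location.substring(0, slash);
        String rest = slash < 0 ? "" : location.substring(slash);
        String base = aliasToUrl.get(alias);
        if (base == null) {
            throw new IllegalArgumentException("Unknown site alias: " + alias);
        }
        return base + rest;
    }

    public static void main(String[] args) {
        SiteAliases aliases = new SiteAliases();
        // Hypothetical site URL, for illustration only.
        aliases.define("myhosta", "ssl://site-a.example.org:4000");
        System.out.println(aliases.resolve("myhosta/home/malcolm/test.dat"));
    }
}
```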

24 Current Issues (3)
Retrieving the job definition:
- Not currently supported …
- The job definition is originally a SAGA script
- It is not possible to retrieve the original SAGA job definition from the remote host, as the host does not receive or understand it; we would need to rely on local persistence
- It may be possible to get the JSDL description and reverse translate it to SAGA
- (Could store the original SAGA script in a local database with the job id)
Debugging / exception reporting:
- A layered architecture can be difficult to debug
- It is sometimes unclear whether a problem is in the middleware or on the remote host; very clear exception reporting is required, or the user will tend to blame the middleware for operational problems on the host
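
The local persistence idea, keeping the original SAGA script on the client keyed by job id, might look like the following sketch. It uses plain files rather than a database, and `LocalJobDefinitionStore`, its file naming, and the job id in `main` are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical client-side store: one file per job id holding the SAGA script.
final class LocalJobDefinitionStore {
    private final Path directory;

    LocalJobDefinitionStore(Path directory) {
        this.directory = directory;
    }

    // Convenience factory for demos and tests.
    static LocalJobDefinitionStore inTempDirectory() {
        try {
            return new LocalJobDefinitionStore(Files.createTempDirectory("job-definitions"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Job ids contain brackets and slashes, so encode them into a safe file name.
    private Path fileFor(String jobId) {
        return directory.resolve(URLEncoder.encode(jobId, StandardCharsets.UTF_8) + ".saga");
    }

    void save(String jobId, String sagaScript) {
        try {
            Files.createDirectories(directory);
            Files.writeString(fileFor(jobId), sagaScript);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the stored script, or empty if this client never saw the job.
    Optional<String> load(String jobId) {
        Path file = fileFor(jobId);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        try {
            return Optional.of(Files.readString(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LocalJobDefinitionStore store = inTempDirectory();
        // Hypothetical job id, for illustration only.
        String jobId = "[ssh://host.example.org]-[1234]";
        store.save(jobId, "#$ SAGA_JobCmd = a.out\n");
        System.out.print(store.load(jobId).orElse("(not found)"));
    }
}
```

A limitation of any such approach is that the definition is only available on the machine that submitted the job.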

25 Questions … ?
