Presentation transcript: "Distributed Processing" - Craig E. Tull, HCG/NERSC/LBNL - (US) ATLAS Grid Software Workshop, BNL, May 7, 2002

1 Distributed Processing
Craig E. Tull <CETull@lbl.gov>, HCG/NERSC/LBNL
(US) ATLAS Grid Software Workshop, BNL - May 7, 2002

2 Distributed Processing
ATLAS distributed processing, PPDG year 2 program
role of MOP, other middleware & third-party tools
objectives: deliverables to users
job description language: status, objectives, options
—EDG JDL

3 Architecture
[EDG architecture diagram; pink boxes: WP1, yellow boxes: WP2]
Grid Application Layer: Job Management, Metadata Management, Object to File Mapper (above Local Application / Local Database)
Collective Services: Grid Scheduler, Replica Manager, Information & Monitoring, Replica Optimization, Replica Catalog Interface
Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, SQL Database Service, Service Index, Authorisation, Authentication and Accounting
Fabric Services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
(Spanning labels: Grid Data Management, Grid Fabric, Local Computing)

4 Jul'01: Pseudocode for ATLAS Short-Term UC01

Naming conventions:
  Logical File Name:  LFN = "lfn://" hostname "/" any_string
  Physical File Name: PFN = "pfn://" hostname "/" path
  Transfer File Name: TFN = "gridftp://" PFN_hostname "/" path

JDL:
  InputData = {LFN[]}
  OutputSE = host.domain.name

Worker Node:
  LFN[] = WP1.LFNList()
  for (i = 0; i < LFN.length; i++) {
      PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
      j = Athena.eventSelectionSrv.determineClosestPF(PFN[])
      localFile = GDMP.makeLocal(PFN[j], OutputSE)
      Athena.eventSelectionSrv.open(localFile)
  }

Replica-catalog calls:
  PFN[] = getPhysicalFileNames(LFN)
  PFN = getBestPhysicalFileName(PFN[], String[] protocols)
  TFN = getTransportFileName(PFN, String protocol)
  filename = getPosixFileName(TFN)

5 Sample Use Case: Simple Grid Job
Submit and run a simple batch job that processes one input file to produce one output file.
The user specifies the job via a JDL file:
  Executable = /usr/local/atlas.sh
  Requirements = TS >= 1GB
  Input.LFN = lfn://atlas.hep/foo.in
  argv1 = TFN(Input.LFN)
  Output.LFN = lfn://atlas.hep/foo.out
  Output.SE = datastore.rl.ac.uk
  argv2 = TFN(Output.LFN)
and where the submitted "job" is:
  #!/bin/sh
  gridcp $1 $HOME/tmp1
  grep higgs $HOME/tmp1 > $HOME/tmp2
  gridcp $HOME/tmp2 $2
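
To make the TFN() placeholders concrete, here is a minimal Python sketch of how argv1 and argv2 might be resolved from the JDL before the script runs, using the replica-catalog calls from the previous slide; the resolve_tfn and build_argv helpers and the catalog object are illustrative assumptions, not a real API.

  # Hypothetical sketch: resolving the JDL's TFN(...) placeholders into
  # the argv handed to atlas.sh. `catalog` stands in for the replica-
  # catalog interface sketched on the previous slide; it is not a real library.

  def resolve_tfn(catalog, lfn, protocols=("gridftp",)):
      """Map a logical file name to a transport file name the job can use."""
      pfns = catalog.getPhysicalFileNames(lfn)            # all replicas
      best = catalog.getBestPhysicalFileName(pfns, list(protocols))
      return catalog.getTransportFileName(best, protocols[0])

  def build_argv(catalog, jdl):
      return [
          jdl["Executable"],                        # /usr/local/atlas.sh
          resolve_tfn(catalog, jdl["Input.LFN"]),   # argv1 = TFN(Input.LFN)
          resolve_tfn(catalog, jdl["Output.LFN"]),  # argv2 = TFN(Output.LFN)
      ]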

6 Steps for Simple Job Example
[Sequence diagram: User, Grid Scheduler, Replica Manager, and Replica Catalogue, with a Compute Element and Storage Element at each of site A and site B. Flow: the User sends the job; the Scheduler gets the LFN-to-SFN mapping from the Replica Catalogue; the Scheduler selects a CE and SE; the input file is copied and the output file allocated; the job starts; on "job done" the output file is copied.]

7 Steps to Execute this Simple Grid Job
The user submits the job to the Grid Scheduler.
The Grid Scheduler asks the Replica Manager for the list of all PFNs for the specified input file.
The Grid Scheduler determines whether it is possible to run the job at a Compute Element that is "local" to one of the PFNs.
—If not, it locates the best CE for the job and creates a new replica of the input file on an SE local to that CE.
The Grid Scheduler then allocates space for the output file and "pins" the input file so that it is not deleted or staged to tape until after the job has completed.
The job is then submitted to the CE's job queue.
When the Grid Scheduler is notified that the job has completed, it tells the Replica Manager to create a copy of the output file at the site specified in the Job Description file.
The Replica Manager then tags this copy of the output file as the "master" and makes the original file a "replica".
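
The control flow in these steps can be summarized in a short Python sketch; every class and method name here (replica_manager, ComputeElement-style objects, and so on) is an assumption for illustration, not the actual WP1/WP2 interface.

  # Hypothetical sketch of the scheduling steps above; none of these
  # names come from the real EDG WP1/WP2 code.

  def run_simple_grid_job(job, replica_manager, compute_elements):
      # Ask the Replica Manager for all physical copies of the input file.
      pfns = replica_manager.get_physical_file_names(job.input_lfn)

      # Prefer a Compute Element that is "local" to one of the replicas.
      ce = next((c for c in compute_elements
                 if any(c.is_local(p) for p in pfns)), None)
      if ce is None:
          # Otherwise pick the best CE and replicate the input next to it.
          ce = max(compute_elements, key=lambda c: c.rank(job))
          replica_manager.replicate(job.input_lfn, ce.close_se)

      # Allocate output space and pin the input until the job completes.
      ce.close_se.allocate(job.output_lfn)
      replica_manager.pin(job.input_lfn)
      try:
          ce.submit_and_wait(job)      # run in the CE's job queue
      finally:
          replica_manager.unpin(job.input_lfn)

      # Copy the output to the requested site and promote that copy to
      # "master", demoting the original to a "replica".
      copy = replica_manager.replicate(job.output_lfn, job.output_se)
      replica_manager.promote_to_master(copy)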

8 WP1: Job Status
SUBMITTED -- The user has submitted the job to the User Interface.
WAITING -- The Resource Broker has received the job.
READY -- A Computing Element matching the job requirements has been selected.
SCHEDULED -- The Computing Element has received the job.
RUNNING -- The job is running on a Computing Element.
CHKPT -- The job has been suspended and checkpointed on a Computing Element.
DONE -- The execution of the job has completed.
ABORTED -- The job has been terminated.
CLEARED -- The user has retrieved all output files successfully. Bookkeeping information is purged some time after the job enters this state.
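
One natural way to encode these states is an enumeration plus a transition table. The sketch below is inferred from the state descriptions above, not taken from WP1 code; in particular the NEXT map, and the rule that ABORTED is reachable from any non-terminal state, are assumptions.

  from enum import Enum

  class JobStatus(Enum):
      """WP1 job states as listed above."""
      SUBMITTED = "submitted"   # accepted by the User Interface
      WAITING   = "waiting"     # received by the Resource Broker
      READY     = "ready"       # matching CE selected
      SCHEDULED = "scheduled"   # received by the CE
      RUNNING   = "running"     # executing on the CE
      CHKPT     = "chkpt"       # suspended and checkpointed
      DONE      = "done"        # execution finished
      ABORTED   = "aborted"     # terminated
      CLEARED   = "cleared"     # all outputs retrieved; bookkeeping purged later

  # Assumed forward transitions; ABORTED (not listed) is assumed reachable
  # from any non-terminal state.
  NEXT = {
      JobStatus.SUBMITTED: {JobStatus.WAITING},
      JobStatus.WAITING:   {JobStatus.READY},
      JobStatus.READY:     {JobStatus.SCHEDULED},
      JobStatus.SCHEDULED: {JobStatus.RUNNING},
      JobStatus.RUNNING:   {JobStatus.CHKPT, JobStatus.DONE},
      JobStatus.CHKPT:     {JobStatus.RUNNING},
      JobStatus.DONE:      {JobStatus.CLEARED},
  }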

9 WP1: Job Submission Service (JSS)
Strictly coupled with a Resource Broker
—deployed for each installed RB
Single interface (non-blocking), used by the RB:
—job_submit(): submit a job to the specified Computing Element, also managing the input and output sandboxes
—job_cancel(): kill a list of jobs, identified by their dg_jobId
Logging and Bookkeeping (LB) Service: stores & manages logging and bookkeeping information generated by the Scheduler & JSS components (an Information and Monitoring service)
—Bookkeeping: currently active jobs - job definition (expressed in JDL), status, resource consumption, user-defined data(?)
—Logging: status of the Grid Scheduler & related components. These data are kept for a longer term and are used mainly for debugging, auditing, and statistical purposes
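
For illustration, the two-call interface described above might look like the following Python sketch; the JobSubmissionService class and these signatures are assumptions based on the slide's description, not the actual WP1 API.

  # Illustrative only: the real WP1 JSS is not a Python API, and these
  # signatures are assumptions based on the description above.

  class JobSubmissionService:
      """Non-blocking interface used by the Resource Broker."""

      def job_submit(self, jdl: str, compute_element: str,
                     input_sandbox: list[str],
                     output_sandbox: list[str]) -> str:
          """Submit a job (described in JDL) to the given CE, staging the
          input sandbox in and registering the output sandbox; returns a
          dg_jobId immediately rather than blocking until completion."""
          raise NotImplementedError

      def job_cancel(self, dg_job_ids: list[str]) -> None:
          """Kill the listed jobs, identified by their dg_jobId."""
          raise NotImplementedError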

10 WP1: Job Description Language (JDL)
Condor classified advertisements (ClassAds) adopted as the Job Description Language (JDL):
—Semi-structured data model: no specific schema is required.
—Symmetry: all entities in the Grid, in particular applications and computing resources, should be expressible in the same language.
—Simplicity: the description language should be simple both syntactically and semantically.
Example:
  Executable = "simula";
  Arguments = "1 2 3";
  StdInput = "simula.config";
  StdOutput = "simula.out";
  StdError = "simula.err";
  InputSandbox = {"/home/joe/simula.config", "/usr/local/bin/simula"};
  OutputSandbox = {"simula.out", "simula.err", "core"};
  InputData = "LF:test367-2";
  ReplicaCatalog = "ldap://pcrc.cern.ch:2010/rc=Replica Catalog, dc=pcrc, dc=cern, dc=ch";
  DataAccessProtocol = {"file", "gridftp"};
  OutputSE = "lxde01.pd.infn.it";
  Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX";
  Rank = other.AverageSI00;
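
To make the ClassAds matchmaking model concrete, here is a minimal Python sketch of a broker evaluating a job's Requirements and Rank against resource ads; it uses plain dicts and callables rather than the real Condor ClassAds library, and all names are illustrative.

  # Minimal matchmaking sketch (assumption: real brokers evaluate ClassAds
  # expressions; here Requirements and Rank are plain Python callables).

  def match(job, resources):
      """Return the resource ad satisfying the job's Requirements,
      preferring the highest Rank, or None if nothing matches."""
      candidates = [r for r in resources if job["Requirements"](r)]
      return max(candidates, key=job["Rank"], default=None)

  job = {
      # other.Architecture == "INTEL" && other.OpSys == "LINUX"
      "Requirements": lambda o: o["Architecture"] == "INTEL" and o["OpSys"] == "LINUX",
      # Rank = other.AverageSI00 (higher SpecInt rating preferred)
      "Rank": lambda o: o["AverageSI00"],
  }

  resources = [
      {"Architecture": "INTEL", "OpSys": "LINUX", "AverageSI00": 380},
      {"Architecture": "INTEL", "OpSys": "LINUX", "AverageSI00": 420},
      {"Architecture": "SPARC", "OpSys": "SOLARIS", "AverageSI00": 500},
  ]

  print(match(job, resources))  # picks the INTEL/LINUX node with AverageSI00 = 420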

11 WP1: Sandbox
Working area (input & output) replicated on each CE to which a Grid job is submitted.
—Very convenient & natural.
My concerns:
—Requires network access (with associated privileges) to all CEs on the Grid. Could be a huge security issue with local administrators.
—Not (yet) coordinated with WP2 services.
—Sandbox contents not customizable to the local (CE/SE/PFN) environment.
—Temptation to abuse (sandboxes are not meant for data files).

12 EDG JDL
job description language: status, objectives, options
Status:
—Working in the EDG testbed
Objectives:
—Provide the WP1 Scheduler enough information to locate the necessary resources (CE, SE, data, software) to execute the job.
Options:

