Int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid. Elisa Heymann, Department of Computer Architecture and Operating Systems

Presentation transcript:

int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid Elisa Heymann Department of Computer Architecture and Operating Systems

Condor Week 2008, May

Outline
- Introduction
- CrossBroker
- Parallel Job Support
- Interactive Job Support
- Conclusions

Introduction
- int.eu.grid environment:
  - gLite (EGEE Grid middleware)
  - Extensions: CrossBroker, Migrating Desktop
- Jobs not handled by gLite:
  - Parallel jobs (MPI): run on more than one resource
  - Interactive jobs: the user interacts with the application during its execution

Batch execution on Grids
[Figure labels: Job, F1, F2, O1, O2, middleware services, Internet, and two remote sites, each running middleware.]

Parallel & Interactive Job Execution
- Use of resources from different sites
- Resource-set search
- Co-allocation & synchronization
- Fast start-up
- Execution in high-occupancy situations
[Figure labels: two remote sites with middleware, middleware services, Internet, two jobs (each with F1, F2), MPI, I/O forwarding.]

Architecture
[Figure labels: CrossBroker (Scheduling Agent, Resource Searcher, Application Launcher, Condor-G, DAGMan); Migrating Desktop; Information Index; Replica Manager; two EGEE/Globus sites, each with a CE and WNs.]

Architecture - CrossBroker
- Scheduling Agent
  - Receives each job and keeps it in a persistent queue
  - Contacts the Resource Searcher and gets a list of available resources
  - Selects resources and passes them to the Application Launcher
- Resource Searcher
  - Given a job description (JobAd), performs the matchmaking between job needs and available resources
  - Uses the Condor ClassAd library, originally designed to match a single job with a single resource
  - A set-matching extension has been developed to match a single job to a group of resources
- Application Launcher
  - Responsible for providing a reliable submission service for parallel applications on the Grid
  - Responsible for file staging at the remote site (executable and input/output files)
  - Uses the services of Condor-G
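The set matching done by the Resource Searcher can be illustrated with a small sketch. This is not the CrossBroker implementation: the function name `match_sets`, the dictionary keys, and the ranking rule (mean AverageSI of the group) are assumptions made for illustration, and the sketch does not try to reproduce the broker's actual rank values. The CE figures are the ones shown on the slides.

```python
from itertools import combinations
from statistics import mean

# CE figures as shown on the slides; the dictionary keys are hypothetical names.
CES = [
    {"name": "zeus.cyf-kr.edu.pl", "free_cpus": 2, "average_si": 2000},
    {"name": "aocegrid.uab.es", "free_cpus": 10, "average_si": 4000},
    {"name": "bee001.ific.uv.es", "free_cpus": 3, "average_si": 1000},
    {"name": "lngrid02.lip.pt", "free_cpus": 2, "average_si": 1000},
]


def match_sets(ces, nodes_needed, max_group_size=2):
    """Return (rank, [CE names]) for every minimal group that covers the job."""
    matches = []
    for size in range(1, max_group_size + 1):
        for group in combinations(ces, size):
            total = sum(ce["free_cpus"] for ce in group)
            if total < nodes_needed:
                continue  # the group cannot host all the tasks
            # Skip groups that still fit after dropping one CE:
            # a smaller group already covers the job.
            if size > 1 and any(
                total - ce["free_cpus"] >= nodes_needed for ce in group
            ):
                continue
            # Assumed ranking rule: mean AverageSI of the group members.
            rank = mean(ce["average_si"] for ce in group)
            matches.append((rank, [ce["name"] for ce in group]))
    # Highest rank first; the sort is stable, so ties keep enumeration order.
    return sorted(matches, key=lambda match: match[0], reverse=True)


if __name__ == "__main__":
    for rank, names in match_sets(CES, nodes_needed=5):
        print(rank, names)
```

For a five-node job the sketch yields one single-CE match and two two-CE groups, which is the shape of result a set-matching step hands to the Scheduling Agent.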

partner’s logo Condor Week 2008, May Parallel Job Support  Support for parallel jobs: Open MPI PACX-MPI MPICH-P4 MPICH-G2  Takes into account sites capabilites  Ability to define starter scripts/process to start the parallel job mpi-start is configured automatically and used by default.

Parallel Job Support
[Figure: the CrossBroker, a startup server, and two CEs each running an MPI subtask: CE3 = bee001.ific.uv.es (FreeCPUs = 3, Disk = 100, AverageSI = 1000) and CE5 = lngrid02.lip.pt (FreeCPUs = 2, Disk = 100, AverageSI = 1000).]
1. Launch a PACX startup server
2. Submit the MPI subtasks
3. mpi-start starts each of the subtasks
4. The subtasks notify the startup server and start running
5. CrossBroker monitors the application

partner’s logo Condor Week 2008, May Parallel Job Support  Job Description Language file: JOBTYPE: Normal: sequential jobs, just one CPU Parallel: more than one CPU SUBJOBTYPE: openmpi pacx-mpi mpich mpich-g2 plain JOBSTARTER (if not defined, mpi-start) JOBSTARTERARGUMENTS

Parallel Job Support

    Type = "Job";
    VirtualOrganisation = "imain";
    JobType = "Parallel";
    SubJobType = "pacx-mpi";
    NodeNumber = 5;
    Executable = "test-app";
    Arguments = "-v";
    InputSandbox = {"test-app", "inputfile"};
    OutputSandbox = {"std.out", "std.err"};
    StdErr = "std.err";
    StdOutput = "std.out";
    Rank = other.GlueHostBenchmarkSI00;
    Requirements = other.GlueCEStateStatus == "Production";

partner’s logo Condor Week 2008, May MPI Across Sites  CrossBroker search and selects sets of resources for the jobs  There is no guarantee that all tasks of the same job will start at the same time 1st choice: select only sites with free resources. The job will run immediately. Unfortunately, free resources are not always available 2nd choice: allocate a resource temporally and wait until all other tasks show up. Timeshare the resource with a backfilling policy to avoid resource idleness

MPI Across Sites
[Figure: the RS queries five CEs; the legend distinguishes MPI-enabled CEs from non-MPI-enabled CEs.]

    CE    Host                 FreeCPUs  Disk  AverageSI
    CE1   zeus.cyf-kr.edu.pl   2         100   2000
    CE2   aocegrid.uab.es      10        100   4000
    CE3   bee001.ific.uv.es    3         100   1000
    CE4   xgrid.icm.edu.pl     6         100   1000
    CE5   lngrid02.lip.pt      2         100   1000

Resulting list of resource groups:

    [Groups with 1 CEs]
    [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq  freeCPUs = 10
    [Groups with 2 CEs]
    [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq  freeCPUs = 2
                bee001.ific.uv.es:2119/jobmanager-pbs-workq   freeCPUs = 3
    [Rank=1000] bee001.ific.uv.es:2119/jobmanager-pbs-workq   freeCPUs = 3
                lngrid02.lip.pt:2129/jobmanager-pbs-workq     freeCPUs = 2

Second listing on the same slide:

    [Groups with 1 CEs]
    [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq  freeCPUs = 10
    [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq  freeCPUs = 2
                bee001.ific.uv.es:2119/jobmanager-pbs-workq   freeCPUs = 3
    [Rank=1000] lngrid02.lip.pt/jobmanager-pbs-workq          freeCPUs = 2
                bee001.ific.uv.es:2119/jobmanager-pbs-workq   freeCPUs = 3

Time Sharing
[Animated figure, shown over several slides. Labels: CrossBroker (Scheduling Agent, Condor-G, Application Launcher); Grid Resource with its LRMS and a Condor GlideIn offering VM1 and VM2.]
1. An MPI job arrives at the CrossBroker.
2. A Condor GlideIn with two virtual machines (VM1, VM2) appears on the grid resource.
3. The MPI task is placed in the GlideIn and waits for the rest of the MPI tasks.
4. Another job arrives and is backfilled while the MPI task waits.
5. All tasks ready!
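The time-sharing sequence can be mimicked with a toy model. Everything here is an assumption made for illustration: the `GlideIn` class, the `co_allocate` function, the site names, and the simplification that each glide-in holds exactly one MPI task in VM1 and at most one backfill job in VM2. The slides do not say what happens to the backfill job once the MPI job starts, so the sketch leaves it in place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlideIn:
    """Toy model of one glide-in with two slots, VM1 and VM2."""
    site: str
    vm1: Optional[str] = None   # holds the MPI task
    vm2: Optional[str] = None   # holds a backfill job while VM1 waits
    mpi_running: bool = False


def co_allocate(glideins, mpi_job, backfill_jobs):
    """Place one MPI task per glide-in; backfill until every task has arrived."""
    events = []
    pending = list(backfill_jobs)
    for index, glidein in enumerate(glideins):
        glidein.vm1 = f"{mpi_job}.{index}"
        events.append(f"{glidein.site}: {glidein.vm1} placed")
        last_task = index == len(glideins) - 1
        if not last_task and pending:
            # The task must wait for its peers, so the slot would sit idle:
            # give the second VM to a backfill job in the meantime.
            glidein.vm2 = pending.pop(0)
            events.append(
                f"{glidein.site}: backfilling {glidein.vm2} while the MPI task waits"
            )
    # Every task now holds a slot, so the whole job can start together.
    for glidein in glideins:
        glidein.mpi_running = True
    events.append("all tasks ready")
    return events


if __name__ == "__main__":
    sites = [GlideIn("site-a"), GlideIn("site-b")]
    for line in co_allocate(sites, "mpi", ["batch-1", "batch-2"]):
        print(line)
```

The point of the model is that the resource is never reserved and left idle: the waiting period of the early MPI task is filled with other work.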

partner’s logo Condor Week 2008, May Interactive Job Support  Scheduling priority Interactive jobs are sent to sites with available machines If there are not available machines, use time sharing  Support for interactivity in all kinds of jobs sequential and all the MPI flavors  CrossBroker injects interactive agents that enable communication between user and job Transparent to the user Full integration with glogin & gVid Condor Bypass supported

partner’s logo Condor Week 2008, May Interactive Job Support  Job Description Language file: INTERACTIVE: true/false. Indicates that the job is interactive and the broker should treat it with higher proirity INTERACTIVEAGENT INTERACTIVEAGENTARGUMENTS These attributes specify the command (and its arguments) used to communicate with the user.

Interactive Job Support

    Type = "Job";
    VirtualOrganisation = "imain";
    JobType = "Parallel";
    SubJobType = "openmpi";
    NodeNumber = 11;
    Interactive = TRUE;
    InteractiveAgent = "glogin";
    InteractiveAgentArguments = "-r -p :23433";
    Executable = "test-app";
    InputSandbox = {"test-app", "inputfile"};
    OutputSandbox = {"std.out", "std.err"};
    StdErr = "std.err";
    StdOutput = "std.out";
    Rank = other.GlueHostBenchmarkSI00;
    Requirements = other.GlueCEStateStatus == "Production";

Interactive Job Support
Particle trajectories in fusion devices
- Increasing the temperature of a gas produces a plasma state
- At this temperature, light atomic nuclei can fuse through an exothermic process:
  - The mass after the fusion process is less than the mass before it
  - Excess mass -> energy

Time Sharing
[Figure labels: CrossBroker (Scheduling Agent, Condor-G, Application Launcher); Grid Resource with its LRMS and a Condor GlideIn whose VM1 and VM2 hold a BATCH job and an INT. JOB.]

Time Sharing
[Figure labels: as on the previous slide, with Agent in place of Condor GlideIn.]
- Startup-time reduction
- Only one layer involved

Conclusions
- CrossBroker supports both parallel and interactive jobs
  - Automatically
  - Interoperable with EGEE
- Glide-in
  - Fast startup of jobs
  - Co-allocation without reservation or wasting resources
- Real applications
  - Visualization of plasma in fusion devices
  - Evolution of pollution clouds in the atmosphere
  - Ultrasound Computing Tomography: reconstruction of a 3D volume
  - FLUIDYNAMICS application

Questions? Elisa Heymann Department of Computer Architecture and Operating Systems