www.cs.wisc.edu/Condor Condor-G: A Quick Introduction. Alan De Smet, Condor Project, University of Wisconsin - Madison.

Presentation transcript:

Condor-G A Quick Introduction Alan De Smet Condor Project University of Wisconsin - Madison

Condor-G
› “I want to hand jobs to someone else, but still manage them locally”
(Images: Earth, from NASA; map of Fermilab)

Condor-G
› Globus, CREAM, remote Condor, NorduGrid, UNICORE, PBS, LSF
› Condor-G only handles the technical side. You still need to get permission to use these resources.
(Diagram: Submit Computer [Condor-G: job1, 2, 3…] → Remote Computer [Globus, Condor, CREAM, etc…])

Condor-G to Globus
(Diagram: Submit Computer [Condor-G: job1, job2, job3, …] → Remote Computer [globus-gatekeeper; Condor, or PBS, or LSF, or …] → Compute Cluster)

Identity and Authorization
› Who are you?
› Are you allowed to use these computers?
› Fermilab uses Kerberos
› Globus uses x509 certificates and proxies
(Image: “Mystery Man” © 2006 srqpix. Used under a Creative Commons license.)

x509 Certificates
› Your x509 certificate is like your online passport.
(Image: “Indian passport” © 2009 Robol Goraya. Used under a Creative Commons license.)

x509 Certificates at Fermilab
› Fermilab will make one based on your Kerberos credentials.
    $ kx509
    $ kxlist -p
    Service kx509/certificate
     issuer= /DC=gov/DC=fnal/O=Fermilab/OU=Certificate Authorities/CN=Kerberized CA HSM
     subject= /DC=gov/DC=fnal/O=Fermilab/OU=People/CN=Alan A. De smet/CN=UID:adesmet
     serial=01C05555
     hash=e7635e83
› Valid for 1 week. No prob, make a new one!

x509 Certificates Elsewhere
› Many groups issue x509 certificates
› Many US research organizations use the DOE Grids Certificate Authority
› Typically renewed yearly
› You can make your own
  ▪ But like a passport from Alanland, no one is likely to accept it.

x509 Proxies
› You frequently need to hand your certificate to remote servers.
› What if the remote server is compromised?
› Having your x509 certificate stolen is bad!
› To limit risk, you make “proxies”: short-lived, limited copies.

x509 VOMS Proxies
› Your proxy can be signed by a “Virtual Organization Membership Service” or VOMS.
› Grants specific permissions at some grid sites.
› A sort of entrance visa for the grid.

Proxy Management Tools
› Basic proxy tools
  ▪ grid-proxy-init
  ▪ grid-proxy-info
  ▪ grid-proxy-destroy
› Or with VOMS support
  ▪ voms-proxy-init
  ▪ voms-proxy-info
  ▪ voms-proxy-destroy

voms-proxy-init
› Creates a proxy
    $ voms-proxy-init
    Enter GRID pass phrase:
    Your identity: /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    Creating proxy Done
    Your proxy is valid until Fri Jul 23 04:45:

voms-proxy-init -valid
› A proxy is only valid for 12 hours by default
› Use -valid hours:minutes to ask for a different lifetime
    $ voms-proxy-init -valid 168:0
    Enter GRID pass phrase:
    Your identity: /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    Creating proxy Done
    Your proxy is valid until Thu Jul 29 16:47:
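
Because -valid takes hours:minutes, a lifetime expressed in days has to be converted first. The following is a small sketch of that arithmetic; the function name days_to_valid is made up for illustration and is not part of the VOMS tools.

```shell
# Convert a whole number of days into the hours:minutes form used by -valid.
# days_to_valid is a hypothetical helper name, not a VOMS command.
days_to_valid() {
    printf '%d:0\n' "$(( $1 * 24 ))"
}

# Possible use (needs the VOMS client tools and a certificate):
# voms-proxy-init -valid "$(days_to_valid 7)"
```

With an argument of 7 this yields the same 168:0 value used on the slide. Whether a site accepts a proxy that long is a policy question the helper does not answer.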

voms-proxy-init -voms
› A proxy doesn’t come with VOMS attributes by default; you need to ask for them.
› -voms

voms-proxy-init -voms
    $ voms-proxy-init -valid 24:0 -voms fermilab:/fermilab
    Enter GRID pass phrase:
    Your identity: /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    Creating temporary proxy Done
    Contacting voms.fnal.gov:15001 [/DC=org/DC=doegrids/OU=Services/CN=http/voms.fnal.gov ] "fermilab" Done
    Creating proxy Done
    Your proxy is valid until Fri Jul 23 16:48:

voms-proxy-info
    $ voms-proxy-info -all
    subject   : /DC=org/DC=doegrids/OU=People/CN=Alan De Smet /CN=proxy
    issuer    : /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    identity  : /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    type      : proxy
    strength  : 1024 bits
    path      : /tmp/x509up_u3014
    timeleft  : 23:59:43
    === VO fermilab extension information ===
    VO        : fermilab
    subject   : /DC=org/DC=doegrids/OU=People/CN=Alan De Smet
    issuer    : /DC=org/DC=doegrids/OU=Services/CN=http/voms.fnal.gov
    attribute : /fermilab/Role=NULL/Capability=NULL
    attribute : /fermilab/nees/Role=NULL/Capability=NULL
    timeleft  : 23:59:43
    uri       : voms.fnal.gov:15001
› You need -all to see the VOMS information.
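
A script may want to confirm that the proxy really carries a given VOMS group before it submits anything. One way to do that, sketched here under the assumption that the output keeps the "attribute : /group/Role=.../Capability=..." layout shown on the slide, is to scan those lines. The helper name has_voms_group is hypothetical. It reads the text on standard input so it can be tried without a live proxy.

```shell
# Succeed if the voms-proxy-info -all text on stdin lists the given group
# in one of its "attribute" lines (the part before /Role= is compared).
# has_voms_group is a hypothetical helper, not part of the VOMS tools.
has_voms_group() {
    awk -v wanted="$1" '
        $1 == "attribute" && $2 == ":" {
            group = $3
            sub(/\/Role=.*/, "", group)   # drop the Role/Capability suffix
            if (group == wanted) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Possible use (needs a VOMS proxy):
# voms-proxy-info -all | has_voms_group /fermilab/nees || echo "missing attribute"
```

Only the group is compared; checking for a particular role would need a stricter match.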

voms-proxy-destroy
    $ voms-proxy-destroy
    $ voms-proxy-info -all
    Couldn't find a valid proxy.

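Resource names (at least for Globus)
› Identify the remote server
› fgitbgkc2.fnal.gov/jobmanager-condor
› fgitbgkc2.fnal.gov/jobmanager-fork
  ▪ Don't abuse fork! It runs the job directly on the gatekeeper machine instead of going through the batch system, so generally don't use it.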

globusrun -a -r
› Very low-level Globus tool.
› We're using it as a basic check
    $ globusrun -a -r fgitbgkc2.fnal.gov/jobmanager-fork
    GRAM Authentication test successful

Run a very simple job
› The program must already be on the remote server!
    $ globus-job-run fgitbgkc2.fnal.gov/jobmanager-fork /bin/hostname
    fgitbgkc2.fnal.gov
    $ globus-job-run fgitbgkc2.fnal.gov/jobmanager-fork /bin/date
    Sun Jul 25 15:11:03 CDT 2010

Running a job by hand
    % globus-job-submit fgitbgkc2.fnal.gov/jobmanager-fork /bin/date
    % globus-job-status
    DONE
    % globus-job-get-output
    Thu Jul 22 16:57:53 CDT 2010
    % globus-job-clean
    WARNING: Cleaning a job means:
     - Kill the job if it still running, and
     - Remove the cached output on the remote resource
    Are you sure you want to cleanup the job now (Y/N) ?
    Y
› Not designed for bulk work
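
In practice globus-job-submit prints a job contact string, and the other globus-job-* commands take that string as their argument; the transcript omits it. Doing the by-hand sequence in a script means polling for the status yourself, roughly as sketched below. The names wait_for_job, STATUS_CMD and POLL_SECONDS are invented for this sketch, and it assumes the status command prints a single word such as DONE or FAILED. The status command is a variable so the loop can be exercised without a grid.

```shell
# Poll a job until its status is DONE or FAILED, or until we give up.
# wait_for_job, STATUS_CMD and POLL_SECONDS are names made up for this sketch.
STATUS_CMD=${STATUS_CMD:-globus-job-status}
POLL_SECONDS=${POLL_SECONDS:-30}

wait_for_job() {
    contact=$1
    max_polls=${2:-120}
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        status=$("$STATUS_CMD" "$contact")
        case $status in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
        esac
        polls=$((polls + 1))
        sleep "$POLL_SECONDS"
    done
    return 2   # still not finished after max_polls checks
}

# Possible use (needs the Globus client tools and a valid proxy):
# contact=$(globus-job-submit fgitbgkc2.fnal.gov/jobmanager-fork /bin/date)
# wait_for_job "$contact" && globus-job-get-output "$contact"
```

Every poll is another request to the gatekeeper, which is one reason this approach is a poor fit for bulk work.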

Old Condor job
    executable = my_program
    output = output.txt
    error = error.txt
    log = log.txt
    notification = never
    universe = vanilla
    queue

New Condor-G job
    executable = my_program
    output = output.txt
    error = error.txt
    log = log.txt
    notification = never
    universe = grid
    grid_resource = gt2 fgitbgkc2.fnal.gov/jobmanager-fork
    queue
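
When the same job goes to several resources, it can be handy to generate the submit description instead of editing it by hand. A minimal sketch follows; the function name grid_submit_description and the file name job.submit are assumptions, and the submit lines themselves mirror the slide.

```shell
# Print a Condor-G submit description for one job.
# grid_submit_description is a hypothetical helper; the gt2 resource string
# and the output file names mirror the slide.
grid_submit_description() {
    executable=$1
    resource=$2
    cat <<EOF
executable = $executable
output = output.txt
error = error.txt
log = log.txt
notification = never
universe = grid
grid_resource = gt2 $resource
queue
EOF
}

# Possible use (needs a Condor installation and a valid proxy):
# grid_submit_description my_program fgitbgkc2.fnal.gov/jobmanager-fork > job.submit
# condor_submit job.submit
```

The only differences from the vanilla job are the universe line and the added grid_resource line.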

Where's my output?
› universe=grid doesn't know which files to bring back, so list them:
    transfer_output_files=a_file,another_file
› Error if a file is missing! As a workaround, create empty copies before submitting:
    touch a_file another_file
  Then add to your submit file:
    transfer_input_files=a_file,another_file
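
Read that way, the workaround is to send empty placeholder files along with the job, so that every listed output file exists on the remote side even if the program never writes it. The steps can be sketched as a small helper that creates the placeholders and prints the two matching submit-file lines. The name transfer_lines is hypothetical, and the sketch assumes file names without commas or spaces.

```shell
# Create empty placeholder files and print the matching transfer lines.
# transfer_lines is a hypothetical helper name.
transfer_lines() {
    list=
    for f in "$@"; do
        touch "$f"                 # creates the file if it does not exist yet
        list=${list:+$list,}$f     # build the comma-separated list
    done
    printf 'transfer_input_files = %s\n' "$list"
    printf 'transfer_output_files = %s\n' "$list"
}

# Possible use: append the lines to a submit file before its "queue" line.
# transfer_lines a_file another_file
```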

Proxy updates
› Jobs taking longer than your proxy's lifespan? Just renew your proxy occasionally. Condor will pick up the new proxy and handle the rest.
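
To decide when a renewal is due, a script can read the remaining lifetime from the "timeleft : HH:MM:SS" line that voms-proxy-info prints. The sketch below assumes that layout; the helper names proxy_seconds_left and proxy_needs_renewal and the four-hour threshold in the usage comment are illustrative choices. Both helpers read the text on standard input, so they can be tried without a live proxy.

```shell
# Print the first "timeleft" value from voms-proxy-info output, in seconds.
# Prints 0 when no such line is present (for example, no valid proxy).
# proxy_seconds_left is a hypothetical helper name.
proxy_seconds_left() {
    awk '$1 == "timeleft" && $2 == ":" {
             split($3, t, ":")
             print t[1] * 3600 + t[2] * 60 + t[3]
             found = 1
             exit
         }
         END { if (!found) print 0 }'
}

# Succeed when fewer than the given number of seconds remain.
proxy_needs_renewal() {
    [ "$(proxy_seconds_left)" -lt "$1" ]
}

# Possible use (voms-proxy-init will prompt for the pass phrase):
# voms-proxy-info -all 2>/dev/null | proxy_needs_renewal 14400 &&
#     voms-proxy-init -valid 24:0 -voms fermilab:/fermilab
```

Because voms-proxy-init asks for a pass phrase, this suits a reminder or an interactive wrapper better than an unattended cron job.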

Scaling Up
› Can manage tens of thousands of jobs
› Can manage complex workflows with DAGMan
(Image: actual workflow for LIGO)

Scaling Up
› Can automatically use multiple grid sites
  ▪ Powerful, but complex; see "Matchmaking in the Grid Universe" in the Condor manual
› Automatic recovery for many problems
› Includes optimizations to reduce network traffic and gatekeeper load