
1 European Condor Week 2006 - CDF experience with Condor glide-ins and GCB - Igor Sfiligoi
Using Condor Glide-Ins and GCB to run on the Grid: The CDF experience
Elliot Lipeles, Matthew Norman, Frank Würthwein (University of California, San Diego)
Subir Sarkar, Igor Sfiligoi (INFN)

2 Traditional use of Condor in CDF
CDF Analysis Farm (CAF): a single-point-of-submission system
● CAF daemons accept and authenticate user jobs and handle output.
● The Condor schedd does the real work.
[Diagram labels: CAF (Portal), Submitter Daemons, Monitoring Daemons, Schedd, Collector, Negotiator, Worker Node (WN), Startd, Legacy protocol]

3 Had to move to the Grid
● HEP is moving to the Grid
● Nobody wants to finance dedicated nodes anymore
● CDF needs to move as well
● Want to preserve the user interface
[Diagram label: “The Grid”]

4 Why not plain Condor-G?
Works, but...
● No central matchmaking
● No control over priorities
● Site selection problem
● Black holes can eat most of the user jobs
[Diagram labels: CAF (Portal), Submitter Daemons, User jobs, Schedd, Globus, Grid Pool, Batch queue, Batch Slot, User Job]

5 CDF decided to go with Condor glide-ins
Glide-ins solve all the problems
[Diagram labels: CAF (Portal), Submitter Daemons, User jobs, Schedd, Collector/Neg, Glidekeeper, Monitor, Glideins, Globus, Grid Pool, Batch queue, Batch Slot, Glidein, User Job, "All jobs done"]

6 What is the Glidekeeper
● Just a simple script
  – Checks the number of jobs and the number of glide-ins
  – Makes sure there is always at least one glide-in per CE whenever there are idle jobs in the queue
● Does not need to be complex
  – Just blindly submits to all the CEs
  – But could be made as complex as needed
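The glidekeeper rule described on this slide can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the CDF script: the function name `glideins_to_submit`, the CE host names, and the counts are all hypothetical, and a real glidekeeper would obtain the counts by querying the Condor queues.

```shell
#!/bin/sh
# Hypothetical sketch of the glidekeeper decision; not the actual CDF script.

# glideins_to_submit IDLE_JOBS GLIDEINS_AT_CE
# Prints 1 when there are idle user jobs and the CE has no glide-in yet,
# otherwise prints 0 ("at least one glide-in per CE when jobs are idle").
glideins_to_submit() {
    if [ "$1" -gt 0 ] && [ "$2" -eq 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Example pass over a made-up CE list; each entry is "host:glideins_present".
idle_jobs=5
for entry in "ce1.example.org:0" "ce2.example.org:2"; do
    ce=${entry%%:*}       # text before the first colon
    count=${entry##*:}    # text after the last colon
    echo "$ce: submit $(glideins_to_submit "$idle_jobs" "$count") glide-in(s)"
done
```

The rule deliberately ignores which CE is "best": every CE gets a glide-in, and matchmaking sorts out the rest once the glide-ins start.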

7 What is a glide-in
condor_config:
    DAEMON_LIST = MASTER,STARTD
    NEGOTIATOR_HOST = $(HEAD_NODE)
    COLLECTOR_HOST = $(HEAD_NODE)
    TMOUT=288000
    MaxJobRetirementTime=$(TMOUT)
    SHUTDOWN_GRACEFUL_TIMEOUT=$(TMOUT)
    # How long will it wait in an unclaimed state
    STARTD_NOCLAIM_SHUTDOWN = 1200
    HEAD_NODE = cdfhead.fnal.gov
glidein_startup.sh:
    validate_node()
    local_config()
    ./condor_master -r $mins -dyn -f
● Glide-ins are properly configured startds
  – Same old binaries used
● CDF uses a startup script to validate the node before starting the startd
  – Prevents job failures
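The slide does not say what `validate_node()` actually checks. As a hedged illustration only, a node check might verify that required tools exist and that the working directory has enough free space. The argument layout, the tool list, and the 1024 KB threshold below are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical validate_node; the real CDF checks are not given in the talk.

# validate_node WORKDIR MIN_FREE_KB TOOL...
# Succeeds only if every TOOL is on PATH and WORKDIR has at least
# MIN_FREE_KB kilobytes of free space.
validate_node() {
    workdir=$1
    min_free_kb=$2
    shift 2
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "validate_node: missing tool: $tool" >&2
            return 1
        fi
    done
    # POSIX df output: the 4th column of the 2nd line is the free space in KB.
    free_kb=$(df -Pk "$workdir" | awk 'NR==2 {print $4}')
    if [ -z "$free_kb" ] || [ "$free_kb" -lt "$min_free_kb" ]; then
        echo "validate_node: not enough free space in $workdir" >&2
        return 1
    fi
    return 0
}

# Example: refuse to start the startd on a node that fails the checks.
if validate_node . 1024 tar sha1sum; then
    echo "node looks usable"
else
    echo "node rejected; the glide-in would exit here"
fi
```

Failing early in the glide-in means a bad node wastes one batch slot briefly instead of failing a user job.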

8 CDF has been using glide-ins for a year now
[Plot: CNAF one-year history. Note over a thousand running jobs for long periods of time.]
[Plot: SDSC six-month history.]
[Site labels: CNAF (Bologna, Italy), the first GlideCAF; SDSC (San Diego, CA, USA); Fermilab (Batavia, IL, USA); IN2P3 (Lyon, France); MIT (Boston, MA, USA)]

9 Problems found along the road
● File delivery
● Firewalls
● Security concerns

10 Problems found along the road
● File delivery
● Firewalls
● Security concerns

11 Condor binaries delivery
glidein.submit:
    Universe = Globus
    GlobusScheduler = mysite/jobmanager-mybatch
    Executable = glidein_startup.sh
    transfer_Input_files = condor.tgz
    Queue
● On EGEE, only the executable is guaranteed to be transferred; other files need gLite tools.
● Works fine on OSG.
● Cannot use the Condor-G transfer mechanism!

12 HTTPd for file transfer
glidein_startup.sh (without HTTP download):
    validate_node()
    local_config()
    tar -xzf condor.tgz
    ./condor_master -r $retmins -dyn -f
glidein_startup.sh (fetching the binaries over HTTP):
    validate_node()
    wget http://cdfhead.fnal.gov/condor.tgz
    sha1sum knownSHA1 condor.tgz
    if [ $? -eq 0 ]; then
        tar -xzf condor.tgz
        local_config()
        ./condor_master -r $retmins -dyn -f
    fi
HTTPd serves files
[Diagram labels: GlideCAF (Portal), Submitter Daemons, Collector/Neg, Main Schedd, Glidekeeper, Glide-in Schedd, HTTPd, steps (1) to (4)]
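The line `sha1sum knownSHA1 condor.tgz` on the slide reads as shorthand for "compare the tarball's SHA-1 against a known value". A minimal runnable sketch of that check follows, assuming GNU `sha1sum` and `wget` are available; the function names `verify_sha1` and `fetch_and_unpack` are hypothetical and not taken from the CDF script.

```shell
#!/bin/sh
# Hypothetical helpers; names and structure are assumptions for illustration.

# verify_sha1 FILE EXPECTED_HEX
# Succeeds only if the SHA-1 digest of FILE equals EXPECTED_HEX.
verify_sha1() {
    actual=$(sha1sum "$1" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# fetch_and_unpack URL EXPECTED_HEX
# Downloads the tarball, unpacks it only if the checksum matches.
fetch_and_unpack() {
    wget -q -O condor.tgz "$1" || return 1
    verify_sha1 condor.tgz "$2" || return 1
    tar -xzf condor.tgz
}

# Local demonstration of the checksum test, with no network access needed.
demo_file=$(mktemp)
printf 'hello' > "$demo_file"
demo_sum=$(sha1sum "$demo_file" | awk '{print $1}')
if verify_sha1 "$demo_file" "$demo_sum"; then
    echo "checksum matches"
fi
rm -f "$demo_file"
```

Because plain HTTP gives no integrity guarantee, the expected digest has to travel with the startup script itself, which is the one file the Grid middleware is trusted to deliver.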

13 Add HTTP Proxy to reduce load
glidein_startup.sh:
    validate_node()
    env http_proxy=proxy1.fnal.gov wget http://cdfhead.fnal.gov/condor.tgz
    sha1sum knownSHA1 condor.tgz
    if [ $? -eq 0 ]; then
        tar -xzf condor.tgz
        local_config()
        ./condor_master -r $retmins -dyn -f
    fi
[Diagram labels: GlideCAF (Portal), Submitter Daemons, Collector/Neg, Main Schedd, Glidekeeper, Glide-in Schedd, HTTPd, HTTP Proxy, steps (1) to (4), "(once)"]

14 Problems found along the road
● File delivery
● Firewalls
● Security concerns

15 Working with remote Grid sites - Dumb
● UDP packets are often lost over the WAN
● Firewalls and NATs make it impossible
[Diagram labels: Collector, Main Schedd, Glide-in Schedd, Globus, WAN, UDP, Firewall/NAT, Available Batch Slot, Startd, User Job]

16 Working with remote Grid sites - Smart
● UDP is fast
● TCP traffic is reliable
● Use GCB to bridge firewalls
[Diagram labels: Collector, Main Schedd, Glide-in Schedd, Globus, WAN, TCP, UDP, Firewall/NAT, GCB, Outgoing TCP, Available Batch Slot, Startd, User Job]

17 Generic Connection Brokering (GCB)
● Condor proxy service
  – Fully integrated in Condor
  – Just a configuration parameter
● For scalability, use multiple instances
  – Point different glide-ins to different GCBs
condor_config:
    DAEMON_LIST = MASTER,STARTD
    NEGOTIATOR_HOST = $(HEAD_NODE)
    COLLECTOR_HOST = $(HEAD_NODE)
    TMOUT=288000
    MaxJobRetirementTime=$(TMOUT)
    SHUTDOWN_GRACEFUL_TIMEOUT=$(TMOUT)
    # How long will it wait in an unclaimed state
    STARTD_NOCLAIM_SHUTDOWN = 1200
    HEAD_NODE = cdfhead.fnal.gov
    # GCB configuration
    NET_REMAP_INAGENT = cdfgcb1.fnal.gov
    NET_REMAP_ENABLE = true
    NET_REMAP_SERVICE = GCB
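Pointing different glide-ins to different GCBs could be done when the startup script writes the local configuration. The sketch below spreads glide-ins over a broker list round-robin; this selection scheme is an assumption, not something the talk describes. The functions `pick_gcb` and `gcb_config` and the second broker name `gcb2.example.org` are hypothetical; only the three `NET_REMAP_*` settings and `cdfgcb1.fnal.gov` come from the slide.

```shell
#!/bin/sh
# Hypothetical sketch: choose a GCB broker per glide-in and emit its config.

# pick_gcb INDEX BROKER...
# Prints the broker at position INDEX modulo the number of brokers.
pick_gcb() {
    index=$1
    shift
    [ "$#" -gt 0 ] || return 1
    shift $(( index % $# ))   # drop the brokers before the chosen one
    echo "$1"
}

# gcb_config BROKER
# Prints the GCB settings shown on the slide for the given broker.
gcb_config() {
    printf '# GCB configuration\n'
    printf 'NET_REMAP_INAGENT = %s\n' "$1"
    printf 'NET_REMAP_ENABLE = true\n'
    printf 'NET_REMAP_SERVICE = GCB\n'
}

# Example: glide-in number 3 with two brokers (the second name is made up).
gcb_config "$(pick_gcb 3 cdfgcb1.fnal.gov gcb2.example.org)"
```

In a startup script the output would be appended to the glide-in's condor_config before `condor_master` is started; the index could be any per-glide-in number, such as a submission counter.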

18 How GCB works
1) Establish a persistent TCP connection to a GCB
2) Advertise itself and the GCB connection
3) Send own address
4) Schedd address forwarded
5) Establish a TCP connection to the schedd
6) A job can be sent to the startd
● GCB must be on a public network
● Just a proxy
[Diagram labels: Collector, Schedd, GCB Broker, Firewall/NAT, Grid Pool, Globus, Startd, User Job, TCP]

19 The use of GCB at CDF
● Using a pre-production version of GCB
  – Waiting for Condor v6.7.20
● North American CAF just opened to users
  – Condor collector and schedds at Fermilab
  – Using 2 GCBs, both at Fermilab
  – Gliding into Fermilab, MIT and San Diego
  – Rest of US and Canada Grid sites to follow

20 Problems found along the road
● File delivery
● Firewalls
● Security concerns

21 Security in the pull model
● Real user known only after glide-in starts
  – Cannot use user proxy when submitting to CE
  – Real user proxy delivered by Condor
    ● User job can use it for further authentication
● All glide-ins use a single service proxy
  – Site admin does not know about the real user
  – All jobs potentially run under the same UID
    ● No system protection between users
● Glide-in and job run under the same UID
  – User can steal service proxy

22 Security problem at a glance
[Diagram labels: CAF (Portal), Submitter Daemons, User jobs, Schedd, Collector/Neg, Glidekeeper, Monitor, Glideins, Globus, Grid Pool, Batch queue, Node, Batch Slot, Glidein, User Job, uidX, ProxyA, ProxyB, ProxyX]

23 Use gLExec to authenticate on the WN
● Authenticate on the WN before starting the user job
  – Can change UID on the fly
● Grid admin knows who is running
[Diagram labels: Schedd, Collector/Neg, Grid Pool, Globus, Batch queue, Batch Slot, Glidein, uidX, ProxyX, gLExec, User Job, uidA, ProxyA]

24 Status of gLExec
● Current gLExec works only on the CE
  – Cannot use it as-is
● Work in progress to make it usable for the glide-ins
  – See later talk
● Condor team promised to make use of gLExec once available

25 Summary
● Glide-ins and GCB took CDF to the Grid
  – Without leaving the Condor environment
● Using the pull model made our life easy
  – No need to select a site in advance
  – Matchmaking done at a global level
  – Policies kept in CDF hands
● No need for any add-on at Grid sites
  – Will use what is found
● gLExec and a local Squid desirable

