Cluster Systems
* Condor
* PBS
* SGE
* LSF
All batch queuing systems!
PBS – Portable Batch System
* Used for dedicated cluster resources
  – Homogeneous clusters with MPI
* Manages thousands of CPUs in near real time
* Schedules large numbers of jobs quickly and efficiently
* Many different implementations
  – PBS Pro (not free, but advanced)
  – Open PBS (free, but old)
  – Torque & Maui (free, stable, advanced)
* Deployments
  – Dedicated clusters in academic and corporate settings
  – PlayStation 3 clusters
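To make the "homogeneous clusters with MPI" use case concrete, here is a minimal sketch that generates a PBS job script. It assumes Torque-style resource syntax (`nodes=N:ppn=M`) and an `mpirun` launcher; the job name, the executable `./my_mpi_app`, and the helper `build_pbs_script` are hypothetical, and directive details vary between PBS implementations.

```python
def build_pbs_script(job_name, nodes, procs_per_node, walltime, command):
    """Return the text of a PBS job script (Torque-style directives assumed)."""
    total_procs = nodes * procs_per_node
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={procs_per_node}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",                 # wall-clock limit, HH:MM:SS
        "#PBS -j oe",                                   # merge stderr into stdout
        "cd $PBS_O_WORKDIR",                            # start in the submission directory
        f"mpirun -np {total_procs} {command}",          # launch one MPI rank per processor
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 nodes x 8 processors for one hour.
script = build_pbs_script("mpi_demo", 4, 8, "01:00:00", "./my_mpi_app")
print(script)
```

The resulting text would typically be saved to a file and handed to `qsub`; the scheduler then holds the job until the requested nodes are free.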
Condor
* Used for dedicated and non-dedicated resources
  – Typically used to “scavenge” CPUs in places where a lot of workstations are available
  – Heterogeneous environments
* Separate Condor tasks
  – Resource Management and Job Management
* User interface is simple; commands that use small config files
* Not good for MPI jobs
* Deployments
  – Campus workstations and desktops
  – Corporate servers
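As an illustration of the "small config files" mentioned above, the sketch below builds a Condor submit description file. The helper `build_submit_description`, the executable name, and the output file names are hypothetical; a real job may need additional settings (for example, requirements on the machine type in a heterogeneous pool).

```python
def build_submit_description(executable, arguments, n_jobs):
    """Return the text of a simple Condor submit description file."""
    lines = [
        "universe = vanilla",             # plain serial jobs, no MPI
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.$(Process).out",    # $(Process) is the per-job index 0..n_jobs-1
        "error = job.$(Process).err",
        "log = job.log",
        f"queue {n_jobs}",                # enqueue n_jobs copies of the job
    ]
    return "\n".join(lines) + "\n"


# Hypothetical parameter sweep: 10 independent runs of one program.
submit_text = build_submit_description("./simulate", "--seed $(Process)", 10)
print(submit_text)
```

The text would normally be written to a file and passed to `condor_submit`, after which Condor matches each queued job to an idle machine.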
Grid tools in Condor
* Condor-G
  – Replicates the Job Management functionality
  – Submission to a grid resource using the Globus Toolkit
  – NOT a grid service, just a way to submit to a grid
* Flocking
  – Allows queued jobs in one Condor cluster to be executed on another Condor cluster
  – Unidirectional flocking (A => B but not B => A)
  – Bidirectional flocking (A <=> B)
* Glidein
  – Dynamically adds machines to a Condor cluster
  – Can be used to create your own personal Condor cluster on the TeraGrid!
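The direction of flocking can be pictured with a toy model. This is not Condor code: the `Pool` class, its slot counts, and the `flock_to` list are hypothetical stand-ins that only show how a one-way link (A => B) differs from a two-way link.

```python
class Pool:
    """Toy stand-in for a Condor cluster with a fixed number of free slots."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots
        self.flock_to = []  # pools this pool may send its queued jobs to

    def place(self, job):
        """Return the name of the pool that runs the job, or None if it stays queued."""
        for pool in [self] + self.flock_to:  # try the local pool first
            if pool.free_slots > 0:
                pool.free_slots -= 1
                return pool.name
        return None


a = Pool("A", free_slots=1)
b = Pool("B", free_slots=2)
a.flock_to.append(b)  # one-way link: A => B only

placements_a = [a.place(f"a-job-{i}") for i in range(3)]  # overflow from A runs on B
placements_b = [b.place(f"b-job-{i}") for i in range(2)]  # B is full and cannot use A

b.flock_to.append(a)  # add the reverse link: now two-way
a.free_slots = 1      # pretend a slot on A has freed up
placement_two_way = b.place("b-job-2")

print(placements_a, placements_b, placement_two_way)
```

With only the A => B link, jobs queued on B stay queued even when A has capacity; adding the reverse link lets B's job run on A.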
BOINC
* Desktop-based Grid Computing - “Volunteer Computing”
  – Centralized Grid system
  – Users encouraged by gaining credits for their computations
  – Can partake in one or many different projects
* Open access for contributing resources, closed access for using the grid
* Allows organizations to gain enormous amounts of computational power at very little cost
* BOINC is really a cluster and a grid system in one!
EGEE
* European Commission funded international Grid system
  – 250 resource locations, 40,000 CPUs, 20 Petabytes of storage
  – Originally a European grid, but expanded to the US and Asia
* Uses the gLite middleware system
  – Uses Globus’ Grid Security Infrastructure (GSI)
  – Specialized elements to utilize underlying hardware
  – Groups organized as Virtual Organizations (VOs); VOMS membership services enable user privileges
* Originally based on the old LHC Grid
  – EGEE-I ended April 2006; continued on as EGEE-II
  – Now part of WLCG
Worldwide LHC Computing Grid
* Large grid to support the massive computational needs of the Large Hadron Collider at CERN
  – Project produces >15 Petabytes of data per year!
* WLCG is really a mashup of other grids
  – EGEE, OSG, GridPP, INFN Grid, NorduGrid
  – Uses specialized upperware to manage these grids
* Multi-tier system for efficiently distributing data to scientists and researchers around the world
* Used mostly for the ATLAS, ALICE, CMS, LHCb, LHCf and TOTEM experiments
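The multi-tier idea can be sketched as a tree in which data fans out from a single top tier to progressively more numerous sites. The site names and the `distribution_plan` helper below are hypothetical; only the tiered shape is taken from the slide, and real WLCG transfers are managed by dedicated middleware.

```python
# Hypothetical site names; each key lists the sites it forwards data to.
TIER_TREE = {
    "tier0": ["tier1-a", "tier1-b"],
    "tier1-a": ["tier2-a1", "tier2-a2"],
    "tier1-b": ["tier2-b1"],
}


def distribution_plan(root, tree):
    """Return (source, destination) transfers in breadth-first order from root."""
    transfers = []
    queue = [root]
    while queue:
        source = queue.pop(0)
        for child in tree.get(source, []):  # leaf sites have no children
            transfers.append((source, child))
            queue.append(child)
    return transfers


plan = distribution_plan("tier0", TIER_TREE)
for source, destination in plan:
    print(f"{source} -> {destination}")
```

Each site receives the data once from its parent tier, so the top tier only serves a handful of direct children rather than every end user.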
What if we could use all of the grids together?
Cyberaide Shell
* There are many different cyberinfrastructure deployments today.
  – How do we make sense of them?
  – How do we use them for our benefit?
* Our idea for Cyberaide Gridshell is to link to these grid deployments
  – Provide an easy, all-in-one interface for many different grids
  – Automate scheduling and resource management
  – Leverage Web 2.0 technologies