1 ATLAS and the Grid ACAT02 Moscow June 2002 RWL Jones Lancaster University

2 RWL Jones, Lancaster University The ATLAS Computing Challenge
- Running conditions at startup:
  - 0.8x10^9 event sample, 1.3 PB/year before data processing
  - Reconstructed events and Monte Carlo data: ~10 PB/year (~3 PB on disk)
  - CPU: ~1.6M SpecInt95, including analysis
- CERN alone can handle only a fraction of these resources
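A quick back-of-envelope check of these figures (a sketch only; the per-event size is inferred from the two numbers above, taking 1 PB as 10^15 bytes, and is not quoted on the slide):

# Back-of-envelope check of the raw-data volume quoted above. The implied
# per-event size is an inference, not an ATLAS-quoted number.
events_per_year = 0.8e9            # startup event sample
raw_volume_bytes = 1.3e15          # 1.3 PB/year before data processing
bytes_per_event = raw_volume_bytes / events_per_year
print(f"implied raw event size: {bytes_per_event / 1e6:.1f} MB")   # ~1.6 MB/event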

3 RWL Jones, Lancaster University The Solution: The Grid
- Note: truly HPC, but requires more
- Not designed for tightly coupled problems, but there are many spin-offs

4 RWL Jones, Lancaster University ATLAS Needs Grid Applications
- The ATLAS OO software framework is Athena, which co-evolves with the LHCb Gaudi framework
- ATLAS is truly intercontinental; in particular, it is present on both sides of the Atlantic
- Opportunity: the practical convergence between US and European Grid projects will come through the transatlantic applications
- Threat: there is an inevitable tendency towards fragmentation/divergence of effort, to be resisted
- Other relevant talks:
  - Nick Brook: co-development with LHCb, especially through the UK GridPP collaboration (or rather, I'll present this later)
  - Alexandre Vaniachine: work for the ATLAS Data Challenges

5 RWL Jones, Lancaster University Data Challenges
- Test bench: the Data Challenges
- Prototype I (May 2002): performance and scalability testing of components of the computing fabric (clusters, disk storage, mass storage system, system installation, system monitoring) using straightforward physics applications. Test job scheduling and data replication software (DataGrid release 1.2).
- Prototype II (Mar 2003): prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors. Performance testing of LHC applications. Distributed application models (DataGrid release 2).
- Prototype III (Mar 2004): full-scale testing of the LHC computing model with fabric management and Grid management software for Tier-0 and Tier-1 centres, with some Tier-2 components (DataGrid release 3).

6 RWL Jones, Lancaster University The Hierarchical View
- [Tier diagram: Online System -> Offline Farm (~20 TIPS) -> CERN Computer Centre (Tier 0, >20 TIPS) -> Regional Centres (Tier 1: UK (RAL), US, French, Italian) -> Tier 2 centres (~1 TIPS each, e.g. a Northern Tier of Sheffield, Manchester, Liverpool and Lancaster at ~0.25 TIPS each) -> institute servers (Tier 3) -> workstations (Tier 4); link labels range from ~PBytes/sec at the detector through ~100 MBytes/sec and ~Gbits/sec or air freight, down to 100-1000 Mbits/sec at the desktop]
- One bunch crossing per 25 ns; 100 triggers per second; each event is ~1 Mbyte
- Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels
- Data for these channels should be cached by the institute server (physics data cache)
- 1 TIPS = 25,000 SpecInt95; PC (1999) = ~15 SpecInt95
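Two small consistency checks on the numbers in this diagram (plain arithmetic; the values are taken straight from the slide):

# Consistency checks on the slide's numbers.
trigger_rate_hz = 100        # triggers per second
event_size_mb = 1.0          # each event is ~1 Mbyte
print(f"rate off the online system: {trigger_rate_hz * event_size_mb:.0f} MB/s")   # ~100 MBytes/sec

tips_in_si95 = 25000         # 1 TIPS = 25,000 SpecInt95
pc_1999_si95 = 15            # a 1999 PC = ~15 SpecInt95
print(f"1 TIPS is roughly {tips_in_si95 / pc_1999_si95:.0f} PCs of 1999 vintage")   # ~1700 PCs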

7 RWL Jones, Lancaster University A More Grid-like Model
- [Diagram: "The LHC Computing Facility" drawn as a cloud rather than a strict hierarchy, with CERN, Tier 1 centres (Germany, USA (FermiLab, Brookhaven), UK, France, Italy, NL, ...), Tier 2 centres, labs (Lab a, Lab b, Lab c, Lab m), universities (Uni b, Uni n, Uni x, Uni y, Lancs) and physics department desktops all attached to the cloud]

8 RWL Jones, Lancaster University Features of the Cloud Model
- All regional facilities have 1/3 of the full reconstructed data
  - Allows more on disk/fast-access space, saves tape
  - Multiple copies mean no need for tape backup
- All regional facilities have all of the analysis data (AOD)
- The resource broker can still keep jobs fairly local
- Centres are regional and NOT national
  - Physicists from other regions should also have access to the computing resources
  - Cost sharing is an issue
- Implications for the Grid middleware on accounting: between experiments, between regions, between analysis groups
- Also, different activities will require different priorities

9 RWL Jones, Lancaster University Resource Estimates

10 RWL Jones, Lancaster University Resource Estimates
- Analysis resources?
  - 20 analysis groups, 20 jobs/group/day = 400 jobs/day
  - Sample size: 10^8 events at 2.5 SI95*s/event => 10^11 SI95*s/day = 1.2x10^6 SI95
  - Additional 20% for activities on smaller samples
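The step the slide leaves implicit is dividing the daily CPU-seconds by the ~86,400 seconds in a day; a worked version follows (reading the slide as each of the 400 daily jobs running over the full 10^8-event sample):

# Worked version of the analysis-resource estimate above.
groups = 20
jobs_per_group_per_day = 20
events_per_sample = 1e8
si95_seconds_per_event = 2.5
seconds_per_day = 86400.0

jobs_per_day = groups * jobs_per_group_per_day                                      # 400 jobs/day
si95_seconds_per_day = jobs_per_day * events_per_sample * si95_seconds_per_event    # 1e11 SI95*s/day
steady_state_si95 = si95_seconds_per_day / seconds_per_day                          # ~1.2e6 SI95

print(f"{jobs_per_day} jobs/day, {si95_seconds_per_day:.1e} SI95*s/day")
print(f"steady-state analysis capacity: {steady_state_si95:.2e} SI95")
print(f"with the extra 20%: {1.2 * steady_state_si95:.2e} SI95")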

11 RWL Jones, Lancaster University Rough Architecture
- [Diagram: the user works through a user interface to the Grid plus the experiment framework; the middleware layer (resource broker, Grid information service) sits between this and the compute + store sites, a data catalogue, job configuration/VDC/metadata, and the installation of software and environment on the sites]

12 RWL Jones, Lancaster University Test Beds
- EDG Test Bed 1
  - Common to all LHC experiments
  - Using/testing EDG test bed 1 release code
  - Already running boxed fast simulation and installed full simulation
- US ATLAS Test Bed
  - Demonstrate the success of the grid computing model for HEP: in data production, in data access, in data analysis
  - Develop & deploy grid middleware and applications; wrap layers around apps; simplify deployment
  - Evolve into a fully functioning, scalable, distributed, tiered grid
- NorduGrid
  - Developing a regional test bed
  - Lightweight Grid user interface, working prototypes, etc.; see the talk by Aleksandr Konstantinov

13 RWL Jones, Lancaster University EDG Release 1.2
- EDG has a strong emphasis on middleware development; applications come second
- ATLAS has been testing the 'stable' releases of the EDG software as they become available, as part of WP8 (ATLAS key contact: Silvia Resconi)
- EDG Release 1.2 is under test by Integration Team people plus the 'Loose Cannons' (experiment-independent people) on the development testbed at CERN
- Standard requirements must be met before the ATLAS Applications people test a release:
  1. The development testbed must consist of at least 3 sites in 3 different countries (e.g. CERN, CNAF, RAL)
  2. There must be a long (> 24 hours) unattended period with a low error rate (< 1% of jobs failed)
- http://pcatl0a.mi.infn.it/~resconi/validation/valid.html

14 RWL Jones, Lancaster University EDG TestBed 1 Status, 28 May 2002 17:03
- [Screenshot: web interface showing the status of ~400 servers at testbed 1 sites]
- 5 main production centres

15 RWL Jones, Lancaster University GridPP Sites in Testbed(s)

16 RWL Jones, Lancaster University NorduGrid Overview
- Launched in spring 2001, with the aim of creating a Grid infrastructure in the Nordic countries
- Partners from Denmark, Norway, Sweden, and Finland
- Initially the Nordic branch of the EU DataGrid (EDG) project testbed
- Independent developments
- Relies on funding from NorduNet2
- http://www.nordugrid.org

17 RWL Jones, Lancaster University US Grid Test Bed Sites
- [Map: Lawrence Berkeley National Laboratory, Brookhaven National Laboratory, Indiana University, Boston University, Argonne National Laboratory, U Michigan, University of Texas at Arlington, Oklahoma University]
- US ATLAS testbed launched February 2001

18 RWL Jones, Lancaster University US Hardware and Deployment
- 8 gatekeepers: ANL, BNL, LBNL, BU, IU, UM, OU, UTA
- Farms: BNL, LBNL, IU, UTA, plus multiple R&D gatekeepers
- Uniform OS through kickstart; running RH 7.2
- First-stage deployment: Pacman, Globus 2.0b, cernlib (installations); simple application package
- Second-stage deployment: Magda, Chimera, GDMP... (Grid data management)
- Third stage: MC production software + VDC
- Many US names mentioned later; thanks also to Craig Tull, Dan Engh, Mark Sosebee

19 RWL Jones, Lancaster University Important Components
- GridView: simple script tool to monitor the status of the test bed (Java version being developed)
- Gripe: unified user accounts
- Magda: MAnager for Grid Data
- Pacman: package management and distribution tool
- Grappa: web portal based on active notebook technology

20 RWL Jones, Lancaster University Grid User Interface
- Several prototype interfaces: GRAPPA, EDG, NorduGrid lightweight UI
- Nothing experiment-specific: GRAT, line mode (and we will always need to retain line mode!)
- Now defining an ATLAS/LHCb joint user interface, GANGA
  - Co-evolution with Grappa
  - Knowledge of the experiment OO architecture needed (Athena/Gaudi)

21 RWL Jones, Lancaster University Interfacing Athena/Gaudi to the GRID
- [Diagram: the Athena/GAUDI application is driven from a GANGA/Grappa GUI; jobOptions/virtual data and algorithms feed the application, which uses GRID services and returns histograms, monitoring and results; the open interfaces are marked '?']

22 RWL Jones, Lancaster University EDG GUI for Job Submission

23 RWL Jones, Lancaster University GRAPPA
- Based on the XCAT Science Portal, a framework for building personal science portals
- A science portal is an application-specific Grid portal
- Active notebook
  - HTML pages to describe the features of the notebook and how to use it
  - HTML forms which can be used to launch parameterizable scripts (transformation)
  - Parameters stored in a sub-notebook (derivation)
- Very flexible
- Jython: access to Java classes (Globus Java CoG kit, XCAT, XMESSAGES)
- Not every user has to write scripts
- Notebooks can be shared among users (import/export capability)
- Shava Smallen, Rob Gardner
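To make the Jython point concrete, here is a rough sketch of the kind of parameterizable script such a notebook could launch. It is not GRAPPA code: the gatekeeper contact, RSL string, paths and parameter names are invented; only the GramJob class from the Globus Java CoG Kit is the real ingredient the slide alludes to.

# Jython sketch: a parameterizable notebook-style script that submits an Athena
# job through the Globus Java CoG Kit. Everything except the GramJob class
# itself is made up for illustration.
from org.globus.gram import GramJob

def submit_atlfast(site_contact, job_options, n_events):
    # Build a Globus RSL string from the notebook parameters
    rsl = ('&(executable=/usr/local/atlas/run_atlfast.sh)'
           '(arguments=%s %d)'
           '(stdout=atlfast.out)(stderr=atlfast.err)') % (job_options, n_events)
    job = GramJob(rsl)           # CoG Kit wrapper around a GRAM job
    job.request(site_contact)    # submit to the remote gatekeeper
    return job

# e.g. submit_atlfast("gatekeeper.example.edu/jobmanager-pbs", "AtlfastOptions.py", 1000)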

24 RWL Jones, Lancaster University GRAPPA/XCAT Science Portal Architecture
- [Diagram: the user's web browser talks, with GSI authentication, to the portal web server (Tomcat server + Java servlets), which drives a Jython interpreter and a notebook database and submits to the Grid]
- The prototype can: submit Athena jobs to Grid computing elements; manage jobOptions and record sessions; support staging and output collection
- Tested on the US ATLAS Grid Testbed

25 RWL Jones, Lancaster University GANGA/Grappa Development Strategy
- Completed existing technology + requirements survey
- Must be Grid-aware but not Grid-dependent
  - Still want to be able to 'pack and go' to a standalone laptop
- Must be component-based
- Interface technologies (standards needed: GGF)
  - Programmatic API (e.g. C, C++, etc.)
  - Scripting as glue a la Stallman (e.g. Python)
  - Others, e.g. SOAP, CORBA, RMI, DCOM, .NET, etc.
- Defining the experiment software services to capture and present the functionality of the Grid service

26 RWL Jones, Lancaster University Possible Designs
- Two ways of implementation:
  - Based on one of the general-purpose grid portals (not tied to a single application/framework):
    - Alice Environment (AliEn)
    - Grid Enabled Web eNvironment for Site-Independent User Job Submission (GENIUS)
    - Grid access portal for physics applications (Grappa)
  - Based on the concept of a Python bus (P. Mato):
    - use the different modules, whichever are required, to provide the full functionality of the interface
    - use Python to glue these modules, i.e., allow interaction and communication between them (see the sketch below)
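A minimal sketch of the Python-bus idea (the class and method names are invented purely to illustrate how Python can glue independent modules; this is not GANGA code):

# Minimal software-bus sketch: independent modules register services on a
# Python bus and reach each other through it, with no compile-time coupling.
# All names here are illustrative.
class PythonBus:
    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def get(self, name):
        return self._services[name]

class JobConfigService:
    def build_options(self, sample):
        return "Atlfast jobOptions for " + sample     # stand-in for real jobOptions handling

class JobSubmissionService:
    def __init__(self, bus):
        self.bus = bus

    def submit(self, sample):
        options = self.bus.get("config").build_options(sample)
        print("submitting job with: " + options)      # a real module would call EDG/Globus here

bus = PythonBus()
bus.register("config", JobConfigService())
bus.register("submit", JobSubmissionService(bus))
bus.get("submit").submit("Z->ee sample")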

27 RWL Jones, Lancaster University Workspaces
- [Diagram: a GAUDI client built around a Python software bus, with GUI, Java, OS, EDG API, PythonROOT and GaudiPython modules plugged into it; local users drive it directly, remote users via HTML pages over the Internet; through the bus it reaches the GRID, a workspaces DB, and the job configuration, bookkeeping and production databases]

28 RWL Jones, Lancaster University Modules Description
- GUI module
  - Provides basic functionality
  - Can be implemented using the wxPython extension module, the Qt/Desktop C++ toolkit, etc.
- [Diagram: the GUI module sits on top of the workspace, data management, job submission, monitoring, job configuration and info services]

29 RWL Jones, Lancaster University Installation Tools
- To use the Grid, deployable software must be deployed on the Grid fabrics, and the deployable run-time environment established (Unix and Windows)
- Installable code and run-time environment/configuration
- Both ATLAS and LHCb use CMT for software management and environment configuration
- CMT knows the package interdependencies and external dependencies; this is the obvious tool to prepare the deployable code and to 'expose' the dependencies to the deployment tool (Christian Arnault, Chas Loomis)
- A Grid-aware tool to deploy the above: PACMAN (Saul Youssef) is a candidate which seems fairly easy to interface with CMT

30 RWL Jones, Lancaster University Installation Issues
- Most Grid projects seem to assume either that code is pre-installed or that it can be dumped each time into the input sandbox
- The only route for installation of software through the Grid seems to be as data in Storage Elements
  - In general these are non-local
  - Hard to introduce directory trees etc. this way (file based)
- How do we advertise installed code?
  - Check it is installed by a preparation task sent to the remote fabric before/with the job
  - Advertise that the software is installed in your information service, for use by the resource broker
  - Probably need both! (a toy preparation task is sketched below)
- The local environment and external packages will always be a problem
  - Points to a virtual machine idea eventually; Java?
- Options? DAR (mixed reports, but CMS are interested); PACKMAN from AliEn; LCFG, OSCAR (not really suitable, more for site management?)
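A toy version of the 'preparation task' idea (a sketch only: the release directory and tag are hypothetical, and in practice such a probe would itself be shipped to the fabric through the Grid job submission):

# Toy preparation task: run on the remote fabric before (or with) the real job
# to check that the required release is installed. Path and tag are hypothetical.
import os
import sys

REQUIRED_RELEASE = "3.0.1"
RELEASE_DIR = "/usr/local/atlas/releases"

def release_installed(tag):
    return os.path.isdir(os.path.join(RELEASE_DIR, tag))

if __name__ == "__main__":
    if release_installed(REQUIRED_RELEASE):
        print("ATLAS release %s present, job can proceed" % REQUIRED_RELEASE)
        sys.exit(0)
    print("ATLAS release %s missing, installation needed" % REQUIRED_RELEASE)
    sys.exit(1)   # the submitter or resource broker reacts to this exit status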

31 RWL Jones, Lancaster University CMT and Deployable Code
- Christian Arnault and Charles Loomis have a beta release of CMT that will produce package rpms, which is a large step along the way
  - Still need to have minimal dependencies/clean code!
  - Need to make the package dependencies explicit
  - Rpm requires root to install into the system database (but not for a private installation)
- Developer and binary installations are being produced; these probably need further refinement
- Work to expose dependencies as PACMAN cache files is ongoing
- Note: there is much work elsewhere on producing rpms of ATLAS code, notably in Copenhagen; this effort has the advantage that the full dependency knowledge held in CMT can be exposed

32 RWL Jones, Lancaster University pacman
- Package manager for the grid, in development by Saul Youssef (Boston U, GriPhyN/iVDGL)
- A single tool to easily manage installation and environment setup for the long list of ATLAS, grid and other software components needed to Grid-enable a site: fetch, install, configure, add to the login environment, update
- Sits on top of (and is compatible with) the many software packaging approaches (rpm, tar.gz, etc.)
- Uses a dependency hierarchy, so one command can drive the installation of a complete environment of many packages
- Packages are organized into caches hosted at various sites
  - 'How to fetch' can be cached rather than the desired object itself
- Includes a web interface (for each cache) as well as command-line tools

33 RWL Jones, Lancaster University
# An encryption package needed by Globus
name = SSLeay
description = Encryption
url = http://www.psy.uq.oz.au/~ftp/Crypto/ssleay
source = http://www.psy.uq.oz.au/~ftp/Crypto/ssleay
systems = { linux-i386: [SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b], \
            linux2    : [SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b], \
            sunos5    : [SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b] }
depends = []
exists = [/usr/local/bin/perl]
inpath = [gcc]
bins = []
paths = []
enviros = []
localdoc = README
daemons = []
install = { root: [./Configure linux-elf, make clean, make depend, make, \
                   make rehash, make test, make install], \
            *   : [./Configure linux-elf, make clean, make depend, make, \
                   make rehash, make test] }

34 RWL Jones, Lancaster University Grid Applications Toolkit
- Horst Severini, Kaushik De, Ed May, Wensheng Deng, Jerry Gieraltowski, ... (US Test Bed)
- Repackaged Athena-Atlfast (OO fast detector simulation) for the grid testbed (building on Julian Phillips and the UK effort)
  - Script 1: can run on any Globus-enabled node (requires transfer of ~17 MB of source)
  - Script 2: runs on machines with the packaged software preinstalled on the grid site
  - Script 3: runs on AFS-enabled sites (the latest version of the software is used)
- Other user toolkit contents:
  - Check the status of grid nodes
  - Submit jobs (without worrying about the underlying middleware or ATLAS software)
  - Uses only basic RSL & globus-url-copy
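A sketch of what a script of this kind boils down to, using only a plain RSL string and the standard Globus command-line tools; the gatekeeper contact, paths and file names are invented, and this is not one of the actual GRAT scripts:

# Illustrative only: submit an Atlfast job with a plain RSL string via globusrun,
# then pull the output back with globus-url-copy. Contact string, paths and file
# names are invented.
import subprocess

SITE = "gatekeeper.example.edu/jobmanager-pbs"    # hypothetical gatekeeper contact

rsl = ('&(executable=/usr/local/atlas/run_atlfast.sh)'
       '(arguments=1000)')

# Submit the job; -o streams the job's stdout back through GASS
subprocess.run(["globusrun", "-o", "-r", SITE, rsl], check=True)

# Retrieve the produced ntuple over GridFTP
subprocess.run(["globus-url-copy",
                "gsiftp://gatekeeper.example.edu/scratch/atlfast.ntuple",
                "file:///tmp/atlfast.ntuple"], check=True)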

35 RWL Jones, Lancaster University Monitoring Tool
- GridView: a simple visualization tool using the Globus Toolkit
  - First native Globus application for the ATLAS grid (March 2001)
  - Collects information using Globus tools; archival information is stored in a MySQL server on a different machine; data are published through a web server on a third machine
- Plans:
  - Java version; better visualization (historical plots, hierarchical MDS information, graphical view of system health)
  - New MDS schemas; optimize archived variables
  - Publishing historical information through GIIS servers??
  - Explore discovery tools; explore scalability to large systems
- Patrick McGuigan
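A rough sketch of that collect-and-archive loop. This is not GridView code: the host names and table layout are invented, and using a command-line ldapsearch against each site GRIS (MDS is an LDAP service, conventionally on port 2135) is my assumption about one way to collect the data.

# Sketch of a GridView-style collector: query Globus MDS (an LDAP service) at
# each site and archive the raw result in MySQL. Hosts, port usage and the
# table schema are illustrative assumptions.
import subprocess
import time
import MySQLdb   # assumes the MySQL-python client library is installed

def query_mds(host, port=2135, base="mds-vo-name=local,o=grid"):
    # Anonymous LDAP search against the site GRIS; returns raw LDIF text
    out = subprocess.run(
        ["ldapsearch", "-x", "-h", host, "-p", str(port), "-b", base],
        capture_output=True, text=True, check=True)
    return out.stdout

def archive(site, ldif):
    db = MySQLdb.connect(host="archive.example.edu", user="gridview", db="gridview")
    cur = db.cursor()
    cur.execute("INSERT INTO mds_snapshots (site, taken_at, ldif) VALUES (%s, %s, %s)",
                (site, time.strftime("%Y-%m-%d %H:%M:%S"), ldif))
    db.commit()
    db.close()

for site in ["gatekeeper1.example.edu", "gatekeeper2.example.edu"]:
    archive(site, query_mds(site))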

36 RWL Jones, Lancaster University

37 MDS Information Listing of available object classes

38 RWL Jones, Lancaster University More Details

39 RWL Jones, Lancaster University MDS Team
- Dantong Yu, Patrick McGuigan, Craig Tull, KD, Dan Engh
- Site monitoring: publish BNL acas information; Glue schema testbed
- Software installation: pacman information provider
- Application monitoring
- Grid monitoring: GridView, Ganglia, ...; hierarchical GIIS server; GriPhyN-PPDG GIIS server

40 RWL Jones, Lancaster University Data Management Architecture
- AMI (ATLAS Metadata Interface): query logical file names (LFNs) and their associated attributes and values
- MAGDA (MAnager for Grid-based Data): manages replication and physical location
- VDC (Virtual Data Catalog): derive and transform LFNs
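A schematic of how these three catalogues fit together from a job's point of view. The class and method names below are invented stand-ins; only the division of labour between AMI, Magda and the VDC comes from the slide.

# Invented stand-ins illustrating the AMI / Magda / VDC division of labour.
class AMI:
    def query(self, selection):
        # metadata query -> list of logical file names (LFNs)
        return ["lfn:dc1.002000.simul.0001.root"]            # placeholder result

class Magda:
    def replicas(self, lfn):
        # LFN -> physical locations of its replicas
        return ["gsiftp://se.example.edu/data/" + lfn.split(":", 1)[1]]

class VDC:
    def derivation(self, lfn):
        # LFN -> the transformation that produced it (virtual data)
        return "atlsim with options dc1.002000"

ami, magda, vdc = AMI(), Magda(), VDC()
for lfn in ami.query("dataset=dc1.002000 AND datatype=simul"):
    print(lfn, "->", magda.replicas(lfn)[0])
    print("  produced by:", vdc.derivation(lfn))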

41 RWL Jones, Lancaster University Managing Data - Magda
- MAnager for Grid-based Data (essentially the 'replica catalogue' tool)
- Designed for managed production and chaotic end-user usage
- Designed for rapid development of components to support users quickly, with components later replaced by Grid Toolkit elements
  - Deploy as an evolving production tool and as a testing ground for Grid Toolkit components
  - GDMP will be incorporated
- Application in the DCs:
  - Logical files can optionally be organized into collections
  - File management in production; replication to BNL; CERN and BNL data access
  - GDMP integration, replication and end-user data access in DC1
- Developments:
  - Interface with AMI (ATLAS Metadata Interface, allows queries on Logical File Name collections by users; Grenoble project)
  - Interfaces to the Virtual Data Catalogue (see Alexandre Vaniachine's talk)
  - Interfacing with the hybrid ROOT/RDBMS event store
  - Athena (ATLAS offline framework) integration; further grid integration
- Info: http://www.usatlas.bnl.gov/magda/info   Engine: http://www.usatlas.bnl.gov/magda/dyShowMain.pl
- T. Wenaus, W. Deng

42 RWL Jones, Lancaster University Magda Architecture & Schema
- A MySQL database is at the core of the system
  - DB access via perl, C++, java and cgi (perl) scripts; C++ and Java APIs auto-generated off the MySQL DB schema
- User interaction via web interface and command line
- Principal components (sketched schematically below):
  - File catalog covering any file types
  - Data repositories organized into sites, each with its locations
  - Computers with repository access: a host can access a set of sites
  - Replication operations organized into tasks
- Interfacing to the Data Catalogue efforts at Grenoble
- Will replace components with standard ones (e.g. GDMP) as they become available for production
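A schematic rendering of those principal components as tables, driven from Python. The table and column names are invented to illustrate the structure; this is not the real Magda schema.

# Invented schema sketch of the catalogue structure described above (not the
# actual Magda schema), expressed as SQL DDL issued through MySQLdb.
import MySQLdb

DDL = [
    """CREATE TABLE sites (
         site_id INT PRIMARY KEY, name VARCHAR(64))""",
    """CREATE TABLE locations (
         location_id INT PRIMARY KEY, site_id INT, path VARCHAR(255))""",
    """CREATE TABLE files (
         lfn VARCHAR(255) PRIMARY KEY, size_bytes BIGINT, file_type VARCHAR(32))""",
    """CREATE TABLE replicas (
         lfn VARCHAR(255), location_id INT)""",
    """CREATE TABLE hosts (
         host VARCHAR(64), site_id INT)""",
    """CREATE TABLE replication_tasks (
         task_id INT PRIMARY KEY, lfn VARCHAR(255),
         src_site INT, dst_site INT, status VARCHAR(16))""",
]

db = MySQLdb.connect(host="localhost", user="magda", db="magda_sketch")
cur = db.cursor()
for statement in DDL:
    cur.execute(statement)   # one table per principal component listed above
db.commit()
db.close()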

43 RWL Jones, Lancaster University Magda Architecture
- DB access via perl, C++, java and cgi (perl) scripts; C++ and Java APIs auto-generated off the MySQL DB schema
- User interaction via web interface and command line

44 RWL Jones, Lancaster University Magda Sites

45 RWL Jones, Lancaster University Conclusion
- The Grid is the only viable solution to the ATLAS computing problem
  - The problems of coherence across the Atlantic are large
  - ATLAS (and CMS etc.) are 'at the sharp end', so we will force the divide to be bridged
- Many applications have been developed, but they need to be refined/merged
  - These revise our requirements; we must use LCG/GGF and any other forum to ensure the middleware projects satisfy the real needs; this is not a test bed!
- The progress so far is impressive and encouraging
  - Good collaborations (especially ATLAS/LHCb)
- The real worry is scaling up to the full system
  - Money! Manpower! Diplomacy?!

