Tony Doyle GridPP – Project Elements AstroGrid Meeting MSSL, 26 Jun 2002.


1 Tony Doyle GridPP – Project Elements AstroGrid Meeting MSSL, 26 Jun 2002

2 Tony Doyle - University of Glasgow GridPP – Project Elements
From Web to Grid…
e-Science = Middleware
LHC Computing Challenge
Infrastructure
–Tiered Computer Centres
–Network
BaBar – a running experiment
Non-technical issues
…Building the Next IT Revolution
UK GridPP
EU DataGrid
–Middleware Development
–Operational Grid
DataGrid Testbed Status: 25 Jun :38:47 GMT
GridPP Testbed
Grid Job Submission
Things Missing, Apparently…
…From Grid to Web

3 Tony Doyle - University of Glasgow GridPP
A £17m 3-year project funded by PPARC, with funding for staff and hardware:
–CERN - LCG (start-up phase): £3.78m
–DataGrid: £5.67m
–Tier-1/A: £3.66m
–Applications: £1.99m
–Operations: £1.88m
EDG - UK Contributions: Architecture, Testbed-1, Network Monitoring, Certificates & Security, Storage Element, R-GMA, LCFG, MDS deployment, GridSite, SlashGrid, Spitfire…
Applications (start-up phase): BaBar, CDF/D0 (SAM), ATLAS/LHCb, CMS, (ALICE), UKQCD

4 Tony Doyle - University of Glasgow GridPP
Build Tier-A/prototype Tier-1 and Tier-2 centres in the UK and join the worldwide effort to develop middleware for the experiments.
–Provide architecture and middleware
–Future LHC experiments: use the Grid with simulated data
–Running US experiments: use the Grid with real data

5 Tony Doyle - University of Glasgow GridPP Project Map - Elements

6 Tony Doyle - University of Glasgow Scale? LHC Computing at a Glance
The investment in LHC computing will be massive
–LHC Review estimated 240 MCHF
–80 MCHF/y afterwards
These facilities will be distributed
–Political as well as sociological and practical reasons
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users

7 Tony Doyle - University of Glasgow Rare Phenomena – Huge Background
The Higgs signal must be extracted from all interactions – a background 9 orders of magnitude larger!

8 Tony Doyle - University of Glasgow Complexity: CPU Requirements
Complex events
–Large number of signals
–Good signals are covered with background
Many events
–10^9 events/experiment/year
–1-25 MB/event raw data
–Several passes required
Need world-wide: 7x10^6 SPECint95 (3x10^8 MIPS)
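The data volumes implied by these figures follow from simple arithmetic – a minimal sketch, using only the numbers quoted on the slide:

```python
# Back-of-envelope check of the slide's figures: 10^9 events per
# experiment per year, at 1-25 MB/event of raw data.
events_per_year = 1e9
mb_per_event_min, mb_per_event_max = 1, 25

# 1 PB = 10^9 MB in decimal units, so the yearly raw-data volume is:
raw_pb_min = events_per_year * mb_per_event_min / 1e9   # 1 PB/year
raw_pb_max = events_per_year * mb_per_event_max / 1e9   # 25 PB/year

# The quoted CPU need, 7x10^6 SPECint95 = 3x10^8 MIPS, implies
# roughly 43 MIPS per SPECint95 unit.
mips_per_specint95 = 3e8 / 7e6

print(raw_pb_min, raw_pb_max, round(mips_per_specint95))
```

So even at the low end a single experiment produces petabytes of raw data per year, before any of the "several passes" of reprocessing.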

9 Tony Doyle - University of Glasgow Physics Analysis
–Tier 0,1 (collaboration wide): Raw Data; ESD (Data or Monte Carlo); Event Tags, Event Selection; Calibration Data
–Tier 2 (analysis groups): Analysis, Skims; Analysis Object Data (AOD)
–Tier 3,4 (physicists): Physics Analysis; Physics Objects
[Diagram arrow: increasing data flow]

10 Tony Doyle - University of Glasgow LHC Computing Challenge
–One bunch crossing per 25 ns; 100 triggers per second; each event is ~1 MByte
–Online System → Offline Farm (~20 TIPS) at ~100 MBytes/sec (~PBytes/sec off the detector)
–Tier 0: CERN Computer Centre (>20 TIPS); ~Gbits/sec or air freight to Tier 1
–Tier 1: Regional Centres (RAL, US, French, Italian), ~Gbits/sec links
–Tier 2: centres of ~1 TIPS each (e.g. ScotGRID++)
–Tier 3: institute servers (~0.25 TIPS); each institute has ~10 physicists working on one or more analysis channels, and data for these channels should be cached by the institute server (physics data cache, Mbits/sec)
–Tier 4: workstations
1 TIPS = 25,000 SpecInt95; a PC (1999) = ~15 SpecInt95
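The headline rates on this slide are mutually consistent, which a short sketch can verify (all figures from the slide):

```python
# One bunch crossing per 25 ns gives the raw collision rate;
# the trigger keeps 100 events/s of ~1 MB each.
crossing_rate_hz = 1 / 25e-9        # 40 MHz of bunch crossings
trigger_rate_hz = 100               # events kept per second
event_size_mb = 1                   # ~1 MByte per event

# Rate into offline computing - matches the slide's "~100 MBytes/sec".
offline_rate_mb_s = trigger_rate_hz * event_size_mb

# TIPS conversion: 1 TIPS = 25,000 SpecInt95 and a 1999 PC is
# ~15 SpecInt95, so one TIPS is on the order of 1,700 such PCs.
pcs_per_tips = 25000 / 15

print(crossing_rate_hz, offline_rate_mb_s, round(pcs_per_tips))
```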

11 Tony Doyle - University of Glasgow Tier-0 - CERN
Commodity processors + IBM (mirrored) EIDE disks
Scale: ~10,000 CPUs, ~5 PBytes
Elements: Compute Element (CE), Storage Element (SE), User Interface (UI), Information Node (IN), Storage Systems

12 Tony Doyle - University of Glasgow UK Tier-1 RAL
New Computing Farm: 4 racks holding 156 dual 1.4GHz Pentium III CPUs. Each box has 1GB of memory, a 40GB internal disk and 100Mb ethernet.
50TByte disk-based Mass Storage Unit after RAID 5 overhead. PCs are clustered on network switches with up to 8x1000Mb ethernet out of each rack.
Tape Robot: upgraded last year, uses 60GB STK 9940 tapes; 45TB current capacity, could hold 330TB.
Scale: 1000 CPUs, 0.5 PBytes

13 Tony Doyle - University of Glasgow Regional Centres: SRIF Infrastructure
Local perspective: consolidate research computing
–Optimisation of number of nodes?
–Relative size dependent on funding dynamics
Global perspective: a very basic Grid skeleton

14 Tony Doyle - University of Glasgow UK Tier-2 ScotGRID
ScotGrid processing nodes at Glasgow
–59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
–2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and dual ethernet
–3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and Mbit/s ethernet
–1TB disk, LTO/Ultrium Tape Library, Cisco ethernet switches
ScotGrid storage at Edinburgh
–IBM X Series 370 PIII Xeon with 512 MB memory, 32 x 512 MB RAM
–70 x 73.4 GB IBM FC Hot-Swap HDD
BaBar UltraGrid system at Edinburgh
–4 UltraSparc 80 machines in a rack, 450 MHz CPUs in each, 4Mb cache, 1 GB memory
–Fast Ethernet and Myrinet switching
CDF equipment at Glasgow
–8 x 700 MHz Xeon IBM xSeries, GB memory, 1 TB disk
Griddev testrig at Glasgow
–4 x 233 MHz Pentium II
2004 Scale: 300 CPUs, 0.1 PBytes

15 Tony Doyle - University of Glasgow Network
Internal networking is currently a hybrid of
–100Mbps to nodes of CPU farms
–1Gb to disk servers
–1Gb to tape servers
UK: academic network SuperJANET4
–2.5Gb backbone upgrading to 20Gb in 2003
EU: SJ4 has 2.5Gb interconnect to Geant
US: new 2.5Gb link to ESnet and Abilene for researchers
UK involved in networking development
–internal with Cisco on QoS
–external with DataTAG
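To see why backbone capacity matters at these data volumes, here is a minimal sketch of idealised transfer times over the link speeds quoted above (decimal units assumed; protocol overhead and contention ignored):

```python
def transfer_time_s(data_bytes, link_bps):
    """Idealised transfer time: payload bits over the raw link rate,
    ignoring protocol overhead and contention."""
    return data_bytes * 8 / link_bps

one_tb = 1e12  # 1 TB of physics data

# 100 Mbps farm-node link vs the 2.5 Gbps SuperJANET4 backbone.
t_node = transfer_time_s(one_tb, 100e6)       # 80,000 s (~22 hours)
t_backbone = transfer_time_s(one_tb, 2.5e9)   # 3,200 s (~53 minutes)

print(t_node, t_backbone)
```

Moving a terabyte thus drops from nearly a day to under an hour between the two link classes, which is why per-tier bandwidth appears throughout the computing model.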

16 Tony Doyle - University of Glasgow Grid Issues – Coordination
The technical part is not the only problem.
Sociological problems? Resource sharing
–Short-term productivity loss but long-term gain
Key? Communication/coordination between people/centres/countries
–This kind of world-wide close coordination across multi-national collaborations has never been done before
We need mechanisms to make sure that all centres are part of a global plan
–In spite of different conditions of funding, internal planning, timescales etc.
The Grid organisation mechanisms should be complementary to, not parallel with or in conflict with, existing experiment organisation
–LCG-DataGRID-eSC-GridPP
–BaBar-CDF-D0-ALICE-ATLAS-CMS-LHCb-UKQCD
Local perspective: build upon existing strong PP links in the UK to build a single Grid for all experiments

17 Tony Doyle - University of Glasgow Experiment Deployment

18 Tony Doyle - University of Glasgow Grid Middleware Development: DataGRID
–EU contract signed by 21 partners
–10 million Euros of EU funding, mainly for personnel
–Project to start early 2001, duration 3 years
–Deliverables: middleware tested with Particle Physics, Earth Observation and Biomedical applications
–Flagship project of the EU IST GRID programme

19 Tony Doyle - University of Glasgow DataGrid Middleware Work Packages
Collect requirements for middleware
–Take into account requirements from application groups
Survey current technology
–For all middleware
Core Services testbed
–Testbed 0: Globus (no EDG middleware)
First Grid testbed release
–Testbed 1: first release of EDG middleware
WP1: workload – job resource specification & scheduling
WP2: data management – data access, migration & replication
WP3: grid monitoring services – monitoring infrastructure, directories & presentation tools
WP4: fabric management – framework for fabric configuration management & automatic software installation
WP5: mass storage management – common interface for Mass Storage Systems
WP7: network services – network services and monitoring

20 Tony Doyle - University of Glasgow DataGrid Architecture
–Local Computing: Local Application, Local Database
–Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
–Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
–Underlying Grid Services: Computing Element Services, Authorization, Authentication and Accounting, Replica Catalog, Storage Element Services, SQL Database Services, Service Index
–Grid Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management

21 Tony Doyle - University of Glasgow Interfaces
The same architecture layers as the previous slide, and the external entities they interface to: Scientists, Application Developers, System Managers, Operating Systems, File Systems, Batch Systems (PBS, LSF), Computing Elements, Storage Elements, Mass Storage Systems (HPSS, Castor), User Accounts, Certificate Authorities.

22 Tony Doyle - University of Glasgow Software Development Infrastructure
CVS Repository
–management of DataGrid source code
–all code available (some mirrored)
Bugzilla
Package Repository
–public access to packaged DataGrid code
Development of Management Tools
–statistics concerning DataGrid code
–auto-building of DataGrid RPMs
–publishing of generated API documentation
–more to come…
Lines of Code, 10 Languages

23 Tony Doyle - University of Glasgow Authentication/Authorization
Authentication (CA Working Group)
–11 national certification authorities
–policies & procedures → mutual trust
–users identified by CAs' certificates
–CAs: CERN, CESNET, CNRS, DataGrid-ES, GridPP, Grid-Ireland, INFN, LIP, NIKHEF, NorduGrid, Russian DataGrid
Authorization (Authorization Working Group)
–Based on Virtual Organizations (VO)
–Management tools for LDAP-based membership lists
–6+1 Virtual Organizations: ALICE, ATLAS, CMS, LHCb, Earth Obs., Biomedical (+ Guidelines)

24 Tony Doyle - University of Glasgow Testbed Elements
–VO Membership
–Certification Authorities
–User Interface
–Resource Broker
–Testbed Sites (5): CC-IN2P3, CERN, CNAF, NIKHEF, RAL
–VO Replica Catalog
–Network Monitoring

25 Tony Doyle - University of Glasgow Typical Testbed Site
–Computing Element: Gatekeeper, Worker Nodes
–Storage Element: Disk Storage, Mass Storage (HPSS, Castor)

26 Tony Doyle - University of Glasgow Software Evaluation
Status legend: ETT = Extensively Tested in Testbed, UT = Unit Testing, IT = Integrated Testing, NI = Not Installed, NFF = Some Non-Functioning Features, MB = Some Minor Bugs, SD = Successfully Deployed.
Components evaluated:
–Workload: Resource Broker, Job Desc. Lang., Info. Index, User Interface, Log. & Book. Svc., Job Sub. Svc., Broker Info. API
–Data management: SpitFire, GDMP, Rep. Cat. API, Globus Rep. Cat.
–Monitoring: Schema, FTree, R-GMA, Archiver Module, GRM/PROVE
–Fabric: LCFG, CCM, Image Install., PBS Info. Prov., LSF Info. Prov.
–Storage: SE Info. Prov., File Elem. Script, Info. Prov. Config., RFIO, MSS Staging
–Security/config: Mkgridmap & daemon, CRL update & daemon, Security RPMs, EDG Globus Config.
–Network: PingER, UDPMon, IPerf, Globus2 Toolkit

27 Tony Doyle - University of Glasgow WP1 – Workload Management (Job Submission)
1. Authentication: grid-proxy-init
2. Job submission to DataGrid: dg-job-submit
3. Monitoring and control: dg-job-status, dg-job-cancel, dg-job-get-output
4. Data publication and replication (WP2): globus-url-copy, GDMP
5. Resource scheduling – use of CERN MSS: JDL, sandboxes, storage elements
Important to implement this for all experiments…
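The five steps above form a simple pipeline; the sketch below lays out that command sequence as it might be scripted. Only the command names come from the slide – the JDL filename and the job-id placeholder are hypothetical, and the commands are printed rather than executed:

```python
# Hypothetical wrapper around the WP1 CLI sequence from the slide.
# 'analysis.jdl' and '<job-id>' are placeholders, not real values.
jdl_file = "analysis.jdl"

steps = [
    ["grid-proxy-init"],                 # 1. authenticate (create a Grid proxy)
    ["dg-job-submit", jdl_file],         # 2. submit the JDL-described job
    ["dg-job-status", "<job-id>"],       # 3. poll the job's status
    ["dg-job-get-output", "<job-id>"],   # 4. fetch the output sandbox
]

# In a real script each argv would be passed to subprocess.run(argv);
# here we only show the command lines.
for argv in steps:
    print(" ".join(argv))
```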

28 Tony Doyle - University of Glasgow WP2 - Spitfire
Request flow through the Spitfire security model:
–The client sends an HTTP + SSL request with a client certificate to the servlet container (SSLServletSocketFactory, TrustManager)
–Security Servlet: is the certificate signed by a trusted CA? Has it been revoked (Revoked Certs repository)?
–Does the user specify a role? If not, find the default; check the role against the Role repository
–The Authorization Module maps the role to a connection id (connection mappings)
–The Translator Servlet requests a connection ID from the Connection Pool and queries the RDBMS

29 Tony Doyle - University of Glasgow WP3 - R-GMA
–Sensor code publishes through the Producer API to a Producer Servlet (user code builds on R-GMA here)
–Producer, Consumer and Schema servlets register via the Registry API with the Registry Servlet
–The Schema Servlet holds the event dictionary (database structures)
–Application code reads through the Consumer API from Consumer Servlets (user code monitors output here)
–An Archiver (Archiver API, DBProducer Servlet, Archiver Servlet) stores events to a database

30 Tony Doyle - University of Glasgow WP4 - LCFG

31 Tony Doyle - University of Glasgow Interface Queue Manager Request Manager Pipe Manager Tape Disk Named Pipe Interface LayerThe Core and the Bottom Layer MSM Handler Named Pipe Pipe Store Network Data Flow Diagram for SE WP5 – Storage Element u A consistent interface to Mass Storage Systems. n MSS s Castor s HPSS s RAID arrays s SRM s DMF s Enstore n Interfaces s GridFTP s GridRFIO s /grid s OGSA

32 Tony Doyle - University of Glasgow EDG TestBed 1 Status 25 Jun :38
Web interface showing the status of the (~400) servers at testbed 1 sites; production centres are highlighted.

33 Tony Doyle - University of Glasgow GridPP Sites in Testbed(s)

34 Tony Doyle - University of Glasgow GridPP Sites in Testbed: Status 25 Jun :38

35 Tony Doyle - University of Glasgow GridPP Sites in Testbed(s) Key
–"Green Dot" - normally present on the GridPP monitoring map
–G - currently running at least one machine with a Globus (or 1.1.4) gatekeeper
–G2.0(b) - currently running at least one machine with a Globus 2.0beta or 2.0 gatekeeper (including EDG Computing Elements and BaBar CEs)
–EDG-CE - running a standard EDG Computing Element (i.e. gatekeeper), usually installed by LCFG
–BaBar-CE - running a BaBarGrid CE (based on the EDG CE but installed by a different procedure)

36 Tony Doyle - University of Glasgow From Grid to Web… using GridSite

37 Tony Doyle - University of Glasgow WP7 – Network Monitoring

38 Tony Doyle - University of Glasgow WP7 - EDG Authorisation: grid-mapfile generation
–mkgridmap reads the VO Directory (ou=People entries such as CN=Franz Elmer and CN=John Smith under o=testbed, dc=eu-datagrid, dc=org) and the Authorization Directory (e.g. CN=Mario Rossi under o=xyz, dc=eu-datagrid, dc=org; ou=People, ou=Testbed1, ou=???)
–Users are authenticated by certificate
–Local user lists and a ban list are applied
–The result is the grid-mapfile
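In essence, mkgridmap turns VO membership lists into grid-mapfile entries, with banned subjects filtered out. The sketch below shows the idea only – the subject DN strings and the local account name are hypothetical, and the real tool reads LDAP directories rather than Python lists:

```python
def make_gridmap(vo_members, ban_list, local_account):
    """Map each non-banned certificate subject DN to a local account,
    one grid-mapfile line per user: "<subject DN>" <local user>."""
    lines = []
    for dn in vo_members:
        if dn in ban_list:
            continue  # banned subjects never reach the grid-mapfile
        lines.append('"%s" %s' % (dn, local_account))
    return "\n".join(lines)

# Hypothetical subject DNs, loosely echoing the names on the slide.
members = [
    "/O=testbed/OU=People/CN=John Smith",
    "/O=testbed/OU=People/CN=Franz Elmer",
    "/O=xyz/OU=People/CN=Mario Rossi",
]
banned = {"/O=xyz/OU=People/CN=Mario Rossi"}

print(make_gridmap(members, banned, "griduser"))
```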

39 Tony Doyle - University of Glasgow Current User Base: UK e-Science Certification Authority – Scale
Grid Support Centre
–GridPP (UKHEP) CA uses primitive technology: it works but takes effort
–201 personal certs issued; 119 other certs issued
GSC will run a CA for UK e-Science
–Uses openCA; the Registration Authority uses the web
–We plan to use it
–Namespace identifies the RA, not the Project
–Authentication, not Authorisation
Through the GSC we have access to the skills of CLRC eSC
–Use the helpdesk to formalise support later in the rollout

40 Tony Doyle - University of Glasgow Documentation
–GridPP Web Site
–EDG User Guide: a biomedical user's point of view
–JDL HowTo: Document.pdf
–GDMP Guide

41 Tony Doyle - University of Glasgow Job Submission
1. Authentication: grid-proxy-init
2. Job submission to DataGrid: dg-job-submit
3. Monitoring and control: dg-job-status, dg-job-cancel, dg-job-get-output
4. Data publication and replication: globus-url-copy, GDMP
5. Resource scheduling: JDL, sandboxes, storage elements
Linux text interfaces implemented; GUIs next…

42 Tony Doyle - University of Glasgow Job Submission Example
dg-job-submit /home/evh/sicb/sicb/bbincl jdl -o /home/evh/logsub/
bbincl jdl:
#
Executable = "script_prod";
Arguments = " ,v235r4dst,v233r2";
StdOutput = "file output";
StdError = "file err";
InputSandbox = {
"/home/evhtbed/scripts/x509up_u149",
"/home/evhtbed/sicb/mcsend",
"/home/evhtbed/sicb/fsize",
"/home/evhtbed/sicb/cdispose.class",
"/home/evhtbed/v235r4dst.tar.gz",
"/home/evhtbed/sicb/sicb/bbincl sh",
"/home/evhtbed/script_prod",
"/home/evhtbed/sicb/sicb dat",
"/home/evhtbed/sicb/sicb dat",
"/home/evhtbed/sicb/sicb dat",
"/home/evhtbed/v233r2.tar.gz"
};
OutputSandbox = {
"job txt",
"D ",
"file output",
"file err",
"job txt",
"job txt"
};

43 Tony Doyle - University of Glasgow GRID JOB SUBMISSION – External User Experience

44 Tony Doyle - University of Glasgow Things Missing, apparently… i.e. not ideal… …but it works

45 Tony Doyle - University of Glasgow GUI - today

46 Tony Doyle - University of Glasgow Testbed GUI (release 1.3)

47 Tony Doyle - University of Glasgow GUI Future? Web Services Access via Grid Certificate

48 Tony Doyle - University of Glasgow GridPP – An Operational Grid
–From Web to Grid… fit into UK e-Science structures
–LHC Computing: particle physicists will use experience in distributed computing to build and exploit the Grid
–Infrastructure: tiered computing down to the physicist desktop; importance of networking
–Existing experiments have immediate requirements
–Non-technical issues = recognising/defining roles (at various levels)
–UK GridPP started 1/9/01; EU DataGrid first middleware ~1/9/01
–Development requires a testbed with feedback: an operational Grid. Status: 25 Jun :38:47 GMT – a day in the life…
–GridPP Testbed is relatively small scale – migration plans required, e.g. for the CA
–Grid jobs are being submitted today… the user feedback loop is important… Grid tools web page development by a VO
–Next stop: Web services…

49 Tony Doyle - University of Glasgow Summary
A vision is only useful if it's shared. Grid success is fundamental for PP.
1. Scale in UK? 0.5 PBytes and 2,000 distributed CPUs in GridPP in Sept
2. Integration – ongoing…
3. Dissemination – external and internal
4. LHC analyses – ongoing feedback mechanism…
5. Other analyses – closely integrated using EDG tools
6. DataGrid – major investment = must be (and is so far) successful
7. LCG – Grid as a Service
8. Interoperability – sticky subject
9. Infrastructure – Tier-A/1 in place, Tier-2s to follow…
10. Finances – (very well) under control
Next steps on Framework VI… CERN = the EU's e-science centre? Co-operation required with other disciplines/industry, esp. AstroGrid

