1 eScience – Grid Computing. Graduate Lecture, 5th November 2012. Robin Middleton – PPD/RAL/STFC (Robin.Middleton@stfc.ac.uk). I am indebted to the EGEE, EGI, LCG and GridPP projects and to colleagues therein for much of the material presented here.

2 eScience Graduate Lecture
– What is eScience, what is the Grid?
– Essential grid components
– Grids in HEP
– The wider picture
– Summary
A high-level look at some aspects of computing for particle physics today.

3 What is eScience?
…also: e-Infrastructure, cyberinfrastructure, e-Research, …
Includes
– grid computing (e.g. WLCG, EGEE, EGI, OSG, TeraGrid, NGS, …): computationally and/or data intensive; highly distributed over a wide area
– digital curation
– digital libraries
– collaborative tools (e.g. Access Grid)
– …other areas
Most UK Research Councils are active in e-Science
– BBSRC
– NERC (e.g. climate studies, NERC DataGrid - http://ndg.nerc.ac.uk/)
– ESRC (e.g. NCeSS - http://www.merc.ac.uk/)
– AHRC (e.g. studies in collaborative performing arts)
– EPSRC (e.g. MyGrid - http://www.mygrid.org.uk/)
– STFC (e.g. GridPP - http://www.gridpp.ac.uk/)

4 eScience – year ~2000
Professor Sir John Taylor, former (1999-2003) Director General of the UK Research Councils, defined eScience thus:
– 'science increasingly done through distributed global collaborations enabled by the internet, using very large data collections, terascale computing resources and high performance visualisation'.
Also quotes from Professor Taylor…
– 'e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.'
– 'e-Science will change the dynamic of the way science is undertaken.'

5 What is Grid Computing?
Grid Computing
– term invented in the 1990s as a metaphor for making computer power as easy to access as the electric power grid (Foster & Kesselman, "The Grid: Blueprint for a New Computing Infrastructure")
– combines computing resources from multiple administrative domains: CPU and storage, loosely coupled
– serves the needs of one or more virtual organisations (e.g. the LHC experiments)
– different from
  Cloud Computing (e.g. Amazon Elastic Compute Cloud - http://aws.amazon.com/ec2/)
  Volunteer Computing (SETI@home, LHC@home - http://boinc.berkeley.edu/projects.php)

6 Essential Grid Components
– Middleware
– Information System
– Workload Management; Portals
– Data Management
  File transfer
  File catalogue
– Security
  Virtual Organisations
  Authentication
  Authorisation
  Accounting

7 Information System
At the heart of the Grid
– hierarchy of BDII (LDAP) servers
– GLUE information schema (http://www.ogf.org/documents/GFD.147.pdf)
LDAP (Lightweight Directory Access Protocol)
– tree structure (the Directory Information Tree, DIT)
– DN: Distinguished Name
Example DIT: o=grid at the root; country entries (c=UK, c=US, c=Spain); below them organisations (e.g. STFC at Chilton) and organisational units (e.g. ou=PPD, ou=ESC)
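As a concrete illustration (not from the original slides), the sketch below queries a top-level BDII with the python-ldap module. The endpoint name, port 2170 and the GLUE 1.x attribute names are assumptions based on typical gLite deployments and may differ at a given site.

# Minimal sketch: query a top-level BDII for computing elements via LDAP.
# Assumptions: python-ldap is installed, the BDII endpoint below is reachable,
# and sites publish the GLUE 1.x schema (attribute names may differ).
import ldap

BDII_URI = "ldap://lcg-bdii.cern.ch:2170"   # assumed/typical endpoint
BASE_DN = "o=grid"                          # root of the DIT, as on the slide

conn = ldap.initialize(BDII_URI)
conn.simple_bind_s()                        # BDII queries are anonymous

# Ask for all GlueCE entries and report their unique IDs and free CPU counts.
results = conn.search_s(
    BASE_DN,
    ldap.SCOPE_SUBTREE,
    "(objectClass=GlueCE)",
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)
for dn, attrs in results:
    print(dn, attrs.get("GlueCEUniqueID"), attrs.get("GlueCEStateFreeCPUs"))

conn.unbind_s()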

8 Workload Management System (WMS)
For example, composed of the following parts:
1. User Interface (UI): access point for the user to the WMS
2. Resource Broker (RB): the broker of Grid resources, responsible for finding the "best" resources to which to submit jobs
3. Job Submission Service (JSS): provides a reliable submission system
4. Information Index (BDII): a server (based on LDAP) which collects information about Grid resources - used by the Resource Broker to rank and select resources
5. Logging and Bookkeeping services (LB): store job information, available for users to query
However, you are much more likely to use a portal to submit work…
Example JDL:
  Executable = "gridTest";
  StdError = "stderr.log";
  StdOutput = "stdout.log";
  InputSandbox = {"/home/robin/test/gridTest"};
  OutputSandbox = {"stderr.log", "stdout.log"};
  InputData = "lfn:testbed0-00019";
  DataAccessProtocol = "gridftp";
  Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX" && other.FreeCpus >= 4;
  Rank = "other.GlueHostBenchmarkSF00";
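As a rough sketch of how such a JDL might be submitted from a script (not part of the original slides), the snippet below shells out to the gLite WMS command-line client. The exact command options varied between gLite releases, so treat them as assumptions.

# Minimal sketch: write a JDL file and submit it through the gLite WMS CLI.
# Assumes a valid VOMS proxy already exists and glite-wms-job-submit is on
# the PATH; option names may differ between gLite releases.
import subprocess

JDL = '''
Executable = "gridTest";
StdOutput = "stdout.log";
StdError = "stderr.log";
InputSandbox = {"/home/robin/test/gridTest"};
OutputSandbox = {"stderr.log", "stdout.log"};
'''

with open("gridTest.jdl", "w") as f:
    f.write(JDL)

# -a delegates the user proxy automatically; -o records the job ID in a file.
subprocess.run(
    ["glite-wms-job-submit", "-a", "-o", "jobid.txt", "gridTest.jdl"],
    check=True,
)
print(open("jobid.txt").read())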

9 Portals - Ganga
– job definition & management
– implemented in Python
– extensible via plug-ins
– used by ATLAS, LHCb & non-HEP communities
– http://ganga.web.cern.ch/ganga/index.php
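A minimal sketch of a Ganga session, assuming Ganga's interactive Python prompt (where Job, Executable and the backend classes are pre-loaded); attribute names follow the standard Ganga tutorial examples and may differ between versions.

# Minimal Ganga sketch (run inside the Ganga IPython prompt, where the GPI
# objects Job, Executable, Local and LCG are already available).
# Assumption: this mirrors the generic Ganga tutorial job, not a specific
# experiment workflow.
j = Job()
j.name = "hello-grid"
j.application = Executable(exe="/bin/echo", args=["Hello from the Grid"])
j.backend = Local()        # swap for LCG() to send the same job to the grid
j.submit()

# Later, inspect status and output:
#   jobs                   # list all jobs
#   j.status               # 'submitted', 'running', 'completed', ...
#   j.peek('stdout')       # look at the job's standard output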

10 Data Management
Storage Element (SE)
– more than one implementation; all are accessed through the SRM (Storage Resource Manager) interface
– DPM - Disk Pool Manager (disk only)
  secure: authentication via GSI, authorisation via VOMS
  full POSIX ACL support with DN (userid) and VOMS groups
  disk pool management (direct socket interface)
  storage name space (a.k.a. storage file catalogue)
  DPM can act as a site-local replica catalogue
  SRMv1, SRMv2.1 and SRMv2.2
  gridFTP, rfio
– dCache (disk & tape) - developed at DESY
– ENSTORE - developed at Fermilab
– CASTOR - developed at CERN
  CERN Advanced STORage manager
  HSM - Hierarchical Storage Manager: disk cache & tape
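To make the SRM/gridFTP access concrete, here is a rough sketch (not from the slides) that shells out to the gLite lcg-utils data-management commands. The VO name, SE host and paths are purely hypothetical, and the exact options varied between releases.

# Minimal sketch: copy a file from a Storage Element to local disk using the
# gLite lcg-utils CLI (which talks SRM + gridFTP under the hood).
# Assumptions: a valid VOMS proxy exists, lcg-cp is installed, and the SURL
# below is purely illustrative.
import subprocess

VO = "dteam"  # hypothetical VO
SURL = "srm://se.example.org/dpm/example.org/home/dteam/demo/data.root"  # illustrative
LOCAL = "file:///tmp/data.root"

subprocess.run(["lcg-cp", "--vo", VO, "-v", SURL, LOCAL], check=True)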

11 File Transfer Service
The File Transfer Service (FTS) is a data-movement fabric service
– multi-VO service; balances usage of site resources according to VO and site policies
– uses the SRM and gridFTP services of a Storage Element (SE)
Why is it needed?
– For the user, it provides reliable point-to-point movement of Storage URLs (SURLs) between Storage Elements
– For the site manager, it provides a reliable and manageable way of serving file-movement requests from their VOs
– For the VO manager, it provides the ability to control requests coming from users (re-ordering, prioritisation, …)
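The slides do not show FTS code; purely to illustrate the idea of per-VO queuing and prioritisation, here is a toy Python model (all names hypothetical, with no relation to the real FTS implementation or API).

# Toy model of an FTS-like transfer queue: requests are SURL->SURL pairs and
# the service drains them according to per-VO shares set by site policy.
# Entirely hypothetical - it illustrates the concept, not the real FTS API.
from collections import deque

class ToyTransferService:
    def __init__(self, vo_shares):
        self.vo_shares = vo_shares            # e.g. {"atlas": 3, "lhcb": 1}
        self.queues = {vo: deque() for vo in vo_shares}

    def submit(self, vo, src_surl, dst_surl):
        self.queues[vo].append((src_surl, dst_surl))

    def drain(self):
        """Yield transfers in proportion to each VO's configured share."""
        while any(self.queues.values()):
            for vo, share in self.vo_shares.items():
                for _ in range(share):
                    if self.queues[vo]:
                        yield vo, self.queues[vo].popleft()

fts = ToyTransferService({"atlas": 3, "lhcb": 1})
fts.submit("atlas", "srm://siteA/f1", "srm://siteB/f1")
fts.submit("lhcb", "srm://siteA/f2", "srm://siteC/f2")
for vo, (src, dst) in fts.drain():
    print(vo, src, "->", dst)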

12 File Catalogue
LFC - the LCG File Catalogue - a file-location service
Glossary
– LFN = Logical File Name; GUID = Globally Unique ID; SURL = Storage URL
– provides a mapping from one or more LFNs to the physical locations of a file
– authentication & authorisation are via a grid certificate
– provides very limited metadata: size, checksum
Experiments usually have a metadata catalogue layered above the LFC
– e.g. AMI - the ATLAS Metadata Interface
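To illustrate the LFN → GUID → replica (SURL) mapping described above, here is a toy in-memory catalogue in Python; it is purely illustrative and unrelated to the real LFC client API.

# Toy illustration of the LFC idea: logical names map to a GUID, and the GUID
# maps to one or more physical replicas (SURLs). Not the real LFC API.
import uuid

class ToyFileCatalogue:
    def __init__(self):
        self.lfn_to_guid = {}      # "lfn:/grid/vo/..." -> guid
        self.guid_to_surls = {}    # guid -> list of replica SURLs
        self.guid_meta = {}        # guid -> {"size": ..., "checksum": ...}

    def register(self, lfn, surl, size, checksum):
        guid = self.lfn_to_guid.setdefault(lfn, str(uuid.uuid4()))
        self.guid_to_surls.setdefault(guid, []).append(surl)
        self.guid_meta[guid] = {"size": size, "checksum": checksum}
        return guid

    def replicas(self, lfn):
        return self.guid_to_surls.get(self.lfn_to_guid.get(lfn), [])

cat = ToyFileCatalogue()
cat.register("lfn:/grid/demo/run1.root", "srm://siteA/path/run1.root", 2000000, "ad0e12")
cat.register("lfn:/grid/demo/run1.root", "srm://siteB/path/run1.root", 2000000, "ad0e12")
print(cat.replicas("lfn:/grid/demo/run1.root"))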

13 Grid Security
Based around X.509 certificates - Public Key Infrastructure (PKI)
– issued by Certificate Authorities
– forms a hierarchy of trust
Glossary
– CA - Certificate Authority
– RA - Registration Authority
– VA - Validation Authority
How it works…
– the user applies for a certificate, with a public key, at an RA
– the RA confirms the user's identity to the CA, which in turn issues the certificate
– the user can then digitally sign a contract using the new certificate
– the user's identity is checked by the contracting party with the VA
– the VA receives information about issued certificates from the CA
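A minimal sketch (not from the slides) of inspecting a user certificate and checking it against a CA certificate with pyOpenSSL. The file paths are hypothetical; on a typical gLite UI the user certificate lives under ~/.globus and CA certificates under /etc/grid-security/certificates, but site layouts vary.

# Minimal sketch: inspect a user certificate and verify it against a CA
# certificate with pyOpenSSL. Paths are hypothetical.
from OpenSSL import crypto

with open("usercert.pem", "rb") as f:
    user_cert = crypto.load_certificate(crypto.FILETYPE_PEM, f.read())
with open("ca.pem", "rb") as f:
    ca_cert = crypto.load_certificate(crypto.FILETYPE_PEM, f.read())

# The Distinguished Name (DN) is what grid services use as your identity.
dn = "/".join("%s=%s" % (k.decode(), v.decode())
              for k, v in user_cert.get_subject().get_components())
print("DN:      /" + dn)
print("Issuer:  ", user_cert.get_issuer().CN)
print("Expired: ", user_cert.has_expired())

# Verify the certificate against the CA (the "hierarchy of trust").
store = crypto.X509Store()
store.add_cert(ca_cert)
crypto.X509StoreContext(store, user_cert).verify_certificate()  # raises on failure
print("Certificate chains back to the trusted CA")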

14 Virtual Organisations
Aggregation of groups (& individuals) sharing the use of (distributed) resources to a common end under an agreed set of policies
– a semi-informal structure orthogonal to normal institutional allegiances
– e.g. a HEP experiment
Grid policies
– acceptable use; grid security; new VO registration
– http://proj-lcg-security.web.cern.ch/proj-lcg-security/security_policy.html
VO-specific environment
– experiment libraries, databases, …
– resource sites declare which VOs they will support

15 Security - The Three As
Authentication
– verifying that you are who you say you are
– your grid certificate is your "passport"
Authorisation
– knowing who you are, validating what you are permitted to do
– e.g. submit analysis jobs as a member of LHCb
– e.g. the VO capability to manage production software
Accounting (auditing)
– local logging of what you have done - your jobs!
– aggregated into a grid-wide repository
– provides usage statistics and an information source in the event of a security incident
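A toy sketch (not from the slides) of how VOMS-style authorisation can be thought of: a service maps the authenticated DN plus VOMS attributes (VO, group, role) onto permitted actions. All names and the policy table are hypothetical.

# Toy VOMS-style authorisation: map (VO, role) attributes onto allowed actions.
# Entirely hypothetical policy table - for illustration only.
VOMS_POLICY = {
    ("lhcb", None): {"submit_analysis_job", "read_vo_data"},
    ("lhcb", "lcgadmin"): {"install_production_software"},
}

def allowed_actions(vo, role=None):
    """Union of what plain VO membership and any extra role permit."""
    actions = set(VOMS_POLICY.get((vo, None), set()))
    if role is not None:
        actions |= VOMS_POLICY.get((vo, role), set())
    return actions

# An authenticated user: identity from the certificate, attributes from VOMS.
user_dn = "/C=UK/O=eScience/OU=RAL/CN=Some User"
print(user_dn, "may:", sorted(allowed_actions("lhcb", role="lcgadmin")))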

16 Grids in HEP
– LCG; EGEE & EGI projects
– GridPP
– The LHC Computing Grid
  Tiers 0, 1, 2
  The LHC OPN
– Experiment computing models
  Typical data access patterns
– Monitoring
  Resource provider's view
  VO view
  End-user view

17 LCG and EGEE → EGI
LCG - the LHC Computing Grid
– distributed production environment for physics data processing
– the world's largest production computing grid
– in 2011: >250,000 CPU cores, 15 PB/yr, 8000 physicists, ~500 institutes
EGEE - Enabling Grids for E-sciencE
– started from the LCG infrastructure
– production grid in 27 countries
– HEP, BioMed, CompChem, Earth Science, …
– EU support

18 GridPP
Integrated within the LCG/EGI framework
UK service operations (LCG/EGI)
– Tier-1 & Tier-2s
HEP experiments
– @ LHC, FNAL, SLAC
– GANGA (LHCb & ATLAS)
Working with the NGS in forming the UK NGI for EGI
Phase 1: 2001-2004 - prototype (Tier-1)
Phase 2: 2004-2008 - "From Prototype to Production" - production (Tier-1 & 2)
Phase 3: 2008-2011 - "From Production to Exploitation" - reconstruction, Monte Carlo, analysis
Phase 4: 2011-2014… - routine operation during LHC running
(Chart: Tier-1 farm usage.)

19 LCG - The LHC Computing Grid
Worldwide LHC Computing Grid - http://lcg.web.cern.ch/lcg/
Framework to deliver distributed computing for the LHC experiments
– middleware / deployment
– (service/data challenges)
– security (operations & policy)
– applications (experiment) software
– distributed analysis
– private optical network
– experiments - resources - MoUs
Coverage
– Europe: EGI
– USA: OSG
– Asia: Naregi, Taipei, China, …
– other…

20 LHC Computing Model
(Diagram: the LHC computing hierarchy - CERN Tier-0 at the centre, national Tier-1 centres (UK, USA, France, Germany, Italy, …), Tier-2s at labs and universities serving regional and physics groups, down to Tier-3 physics-department resources and desktops.)

21 LHCOPN - Optical Private Network
– principal means of distributing LHC data
– primarily links Tier-0 and Tier-1s; some Tier-1 to Tier-1 traffic
– runs over leased lines, with some resilience
– mostly based on 10 Gigabit technology
– reflects the Tier architecture

22 LHC Experiment Computing Models
General (ignoring experiment specifics)
– Tier-0 (@CERN): first-pass reconstruction (including initial calibration); RAW data storage
– Tier-1: re-processing; some centrally organised analysis; custodial copy of RAW data, some ESD, all AOD, some SIMU
– Tier-2: (chaotic) user analysis; simulation; some AOD (depending on local requirements)
Event sizes determine the disk buffers at the experiments & Tier-0
Event dataset formats (RAW, ESD, AOD, etc.); (adaptive) placement near analysis; replicas
Data streams - physics-specific, debug, diagnostic, express, calibration
CPU & storage requirements; simulation
A back-of-the-envelope sizing sketch follows below.
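The numbers below are purely illustrative placeholders (not taken from the slides); the point is only to show how event size, trigger rate and live time translate into the Tier-0 storage and export-bandwidth requirements discussed above.

# Back-of-the-envelope sizing: how event size, trigger rate and live time
# turn into yearly RAW volume and required export bandwidth.
# All input numbers are illustrative assumptions, not experiment figures.
raw_event_size_mb = 1.5        # MB per RAW event (assumed)
trigger_rate_hz = 300          # events per second written out (assumed)
live_seconds_per_year = 5e6    # effective seconds of data taking (assumed)
n_tier1_copies = 1             # custodial copies spread across Tier-1s

raw_per_year_tb = raw_event_size_mb * trigger_rate_hz * live_seconds_per_year / 1e6
export_rate_mb_s = raw_event_size_mb * trigger_rate_hz * n_tier1_copies

print(f"RAW per year:  {raw_per_year_tb:,.0f} TB")
print(f"Export to T1s: {export_rate_mb_s:,.0f} MB/s sustained")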

23 Typical Data Access Patterns
For a typical LHC particle physics experiment, one year of acquisition and analysis of data:
– RAW data: ~1000 TB
– reconstruction passes (Reco-V1, Reco-V2): ~1000 TB each
– ESD versions (ESD-V1.1, V1.2, V2.1, V2.2): ~100 TB each
– AOD: ~10 TB
Access rates (aggregate, average)
– 100 MB/s (2-5 physicists)
– 500 MB/s (5-10 physicists)
– 1000 MB/s (~50 physicists)
– 2000 MB/s (~150 physicists)

24 Monitoring - a resource provider's view

25 Monitoring - Virtual Organisation specifics

26 Monitoring - Dashboards: Virtual Organisation view, e.g. the ATLAS dashboard

27 Monitoring - Dashboards: for the end user, available through the dashboard

28 The Wider Picture
– What some other communities do with grids
– The ESFRI projects
– Virtual instruments
– Digital curation
– Clouds
– Volunteer computing
– Virtualisation

29 What are other communities doing with grids?
Astronomy & astrophysics
– large-scale data acquisition, simulation, data storage/retrieval
Computational chemistry
– use of software packages (incl. commercial) on EGEE
Earth sciences
– seismology, atmospheric modelling, meteorology, flood forecasting, pollution
Fusion (build-up to ITER)
– ion kinetic transport, massive ray tracing, stellarator optimisation
Computer science
– collect data on grid behaviour (Grid Observatory)
High energy physics
– the four LHC experiments, BaBar, D0, CDF, Lattice QCD, Geant4, SixTrack, …
Life sciences
– medical imaging, bioinformatics, drug discovery
– WISDOM - drug discovery for neglected / emergent diseases (malaria, H5N1, …)

30 ESFRI Projects (European Strategy Forum on Research Infrastructures)
Many are starting to look at their e-Science needs
– some at a similar scale to the LHC (petascale)
– many at the project design-study stage
– http://cordis.europa.eu/esfri/
(Image: the Cherenkov Telescope Array, one example.)

31 Virtual Instruments
Integration of scientific instruments into the Grid
– remote operation, monitoring, scheduling, sharing, …
GridCC - Grid-enabled Remote Instrumentation with Distributed Control and Computation
– CR: build workflows to monitor & control remote instruments in real time
– CE, SE, ES, IS & SS: as in a "normal" grid
– monitoring services
– Instrument Element (IE) - interfaces for remote control & monitoring
– CMS run control includes an IE… but not really exploited (yet)!
DORII - Deployment Of Remote Instrumentation Infrastructure
– consolidation of GridCC with EGEE, g-Eclipse, Open MPI, VLab
The Liverpool Telescope - robotic
– not just remote control, but fully autonomous
– scheduler operates on the basis of an observing database
– http://telescope.livjm.ac.uk/

32 Digital Curation
Preservation of digital research data for future use
Issues
– media; data formats; metadata; data-management tools; reading (FORTRAN); …
– the digital curation lifecycle - http://www.dcc.ac.uk/digital-curation/what-digital-curation
Digital Curation Centre - http://www.dcc.ac.uk/
– NOT a repository!
– strategic leadership
– influences national (and international) policy
– expert advice for both users and funders
– maintains a suite of resources and tools
– raises levels of awareness and expertise

33 JADE (1978-86)
New results from old data
– new & improved theoretical calculations & MC models; optimised observables
– better understanding of the Standard Model (top, W, Z)
– re-do measurements with better precision, better systematics
– new measurements, but at (lower) energies not available today
– new phenomena - check at lower energies
Challenges
– rescue data from (very) old media; resurrect old software; data management; implement modern analysis techniques
– but the luminosity files were lost - recovered from an ASCII printout found in an office clean-up
Since 1996
– ~10 publications (as recent as 2009)
– ~10 conference contributions
– a few PhD theses
(ack. S. Bethke)

34 What is HEP doing about it?
ICFA Study Group on Data Preservation and Long Term Analysis in High Energy Physics - https://www.dphep.org/
– 5 workshops so far; intermediate report to ICFA, available at arXiv:0912.0255
– initial recommendations December 2009
– "Blueprint for Data Preservation in High Energy Physics" to follow

35 Grids, Clouds, Supercomputers, …
(Ack: Bob Jones - former EGEE Project Director, October 2009)
Grids
– collaborative environment
– distributed resources (political/sociological)
– commodity hardware (also supercomputers)
– (HEP) data management
– complex interfaces (bug not feature)
Supercomputers
– expensive
– low-latency interconnects
– applications peer reviewed
– parallel/coupled applications
– traditional interfaces (login)
– also SC grids (DEISA, TeraGrid)
Clouds
– proprietary (implementation)
– economies of scale in management
– commodity hardware
– virtualisation for service provision and for encapsulating the application environment
– details of physical resources hidden
– simple interfaces (too simple?)
Volunteer computing
– simple mechanism to access millions of CPUs
– difficult if (much) data is involved
– control of environment - check
– community building - people involved in science
– potential for huge amounts of real work

36 Clouds / Volunteer Computing
Clouds are largely commercial
– pay for use
– interfaces from grids exist
  absorb peak demands (e.g. before a conference!)
  CernVM images exist
Volunteer computing
– LHC@Home
  SixTrack - studies particle-orbit stability in accelerators
  Garfield - studies the behaviour of gas-based detectors

37 Virtualisation
Virtual implementation of a resource, e.g. a hardware platform
– a current buzzword, but not new - IBM launched VM/370 in 1972!
Hardware virtualisation
– one or more virtual machines running an operating system within a host system
– e.g. run Linux (guest) in a virtual machine (VM) with Microsoft Windows (host)
– independent of the hardware platform; migration between (different) platforms
– run multiple instances on one box; provides isolation (e.g. against rogue s/w)
Hardware-assisted virtualisation (see the sketch after this slide)
– not all machine instructions are "virtualisable" (e.g. some privileged instructions)
– h/w assist traps such instructions and provides hardware emulation of them
Implementations
– Xen, VMware, VirtualBox, Microsoft Virtual PC, …
Of interest to HEP?
– the above, plus the opportunity to tailor to experiment needs (e.g. libraries, environment)
– CernVM - CERN-specific Linux environment - http://cernvm.cern.ch/portal/
– CernVM-FS - network filesystem to access experiment-specific software
– security - a certificate to assure the origin/validity of a VM
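A small, Linux-specific sketch (not from the slides) of how to check whether a host CPU advertises the hardware-assist extensions mentioned above; the vmx (Intel VT-x) and svm (AMD-V) flags in /proc/cpuinfo are the standard indicators.

# Linux-only sketch: check /proc/cpuinfo for the hardware-virtualisation
# flags - "vmx" for Intel VT-x, "svm" for AMD-V. If neither is present the
# hypervisor must fall back to software techniques such as binary translation.
def hardware_virt_support(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                if "vmx" in flags:
                    return "Intel VT-x"
                if "svm" in flags:
                    return "AMD-V"
    return None

if __name__ == "__main__":
    support = hardware_virt_support()
    print(support or "No hardware virtualisation extensions found")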

38 Summary
– What eScience is about and what grids are
– Essential components of a grid: middleware, virtual organisations
– Grids in HEP: the LHC Computing Grid
– A look outside HEP: examples of what others are doing

