www.gisela-grid.eu
Grid Initiatives for e-Science virtual communities in Europe and Latin America
Grid Computing - DCC/FCUP: The middleware


1 Grid Initiatives for e-Science virtual communities in Europe and Latin America Grid Computing - DCC/FCUP The middleware

2 Disclaimer
This presentation is based on materials provided and authorized by the EGEE project and is available to download and use according to the terms of the following license:

3 OUTLINE
The EGEE Project
– Objective
– Relationship to other projects
The gLite middleware
– Middleware decomposition
  • Foundation
  • High-level services

4 Part I: The EGEE Project

5 The EGEE project
EGEE
– 1 April 2004 – 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
EGEE-II
– 1 April 2006 – 31 March 2008
– 91 partners in 32 countries
– 13 Federations
EGEE-III
– 1 April 2008 – 31 March 2010
– More than 120 partners
Objectives
– Large-scale, production-quality infrastructure for e-Science
– Attracting new resources and users from industry as well as science
– Improving and maintaining “gLite” Grid middleware
US partners in EGEE-II: Univ. Chicago, Univ. South. California, Univ. Wisconsin, RENCI

6 Main lines of the EGEE project
Infrastructure operation
– Currently includes sites across 39 countries
– Continuous monitoring of grid services and automated site configuration/management
Middleware
– Production-quality middleware distributed under a business-friendly open source licence
User support: a managed process from first contact through to production usage
– Training
– Expertise in grid-enabling applications
– Online helpdesk
– Networking events (User Forum, conferences, etc.)
Interoperability
– Expanding geographical reach and interoperability with related infrastructures
[Figure labels: TWGRID, KnowARC]

7 Applications on EGEE
Applications from an increasing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Book of abstracts:

8 EU projects related to EGEE
[Figure: related EU projects; surviving label: EUGRID]

9 Sustainability: Beyond EGEE-III
Need to prepare for a permanent Grid infrastructure
– Ensure reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Infrastructure managed in collaboration with national grid initiatives
[Figure label: EGI]

10 Part II: The gLite middleware
Programming the Grid with gLite

11 Middleware structure
Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
[Figure: middleware layers]
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
OMII-Europe All-Hands meeting, Bologna, February
Overview paper

12 gLite Services Decomposition
[Figure: service decomposition diagram] 6 High Level Services + CLI & API
Legend: Available; Foreseen in the architecture (only Job Provenance was available at the end of EGEE-II)

13 gLite components
UI: User Interface
CE: Computing Element
SE: Storage Element
WN: Worker Node
WMS: Workload Management System
VOMS: Virtual Organization Membership Service
LB: Logging and Bookkeeping
MonBOX: monitoring
LFC: LCG File Catalog
BDII: Berkeley Database Information Index; stores all information about the resources available in the grid infrastructure

14 Job Workflow in gLite
[Figure: job workflow diagram]
Components: UI, Resource Broker, Job Submission Service, Logging & Bookkeeping, Information Service, LFC Catalog, Storage Element, Computing Element
Labelled interactions: voms-proxy-init, Authorization & Authentication, JDL, Job Submit Event, Input “sandbox”, Input “sandbox” + Broker Info, Expanded JDL, Globus RSL, SE & CE info, DataSets info, Publish, Job Status, Job Query, Output “sandbox”

15 Job Workflow in gLite
[Figure: job workflow diagram, with WMProxy added in front of the Resource Broker components]
Components: UI, WMProxy, Resource Broker, Job Submission Service, Logging & Bookkeeping, Information Index, LFC Catalog, Storage Element, Computing Element
Labelled interactions: voms-proxy-init, Authorization & Authentication, JDL, Job Submit Event, Input “sandbox”, Input “sandbox” + Broker Info, Expanded JDL, Globus RSL, SE & CE info, DataSets info, Publish, Job Status, Job Query, Output “sandbox”
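The workflow starts from a JDL file that names the executable and the input and output sandboxes. As a rough illustration only, the sketch below renders a Python dict as a bracketed, ClassAd-style job description. The helper names (`jdl_value`, `render_jdl`) and the file names are hypothetical, and only plain string, number and list attributes are handled; expressions such as Requirements are deliberately left out.

```python
def jdl_value(value):
    """Render a Python value as a JDL-style literal (simplified)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        # Lists such as the sandboxes are written as {"a", "b"}.
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    # Everything else is treated as a quoted string.
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Render a dict of attributes as a bracketed job description."""
    lines = [f"  {name} = {jdl_value(value)};" for name, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


# Hypothetical job: file names are placeholders, not taken from the slides.
job = {
    "Executable": "run.sh",
    "Arguments": "input.dat",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["run.sh", "input.dat"],
    "OutputSandbox": ["std.out", "std.err"],
}
print(render_jdl(job))
```

The input sandbox travels with the job to the Computing Element, and the files listed in the output sandbox are what the user retrieves once the job has finished.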

16 High Level Services: Workload Management
Resource brokering, workflow management, I/O data management
– Web Service interface: WMProxy
– Task Queue: keeps non-matched jobs
– Information SuperMarket: optimized cache of the information system
– Match Maker: assigns jobs to resources according to user requirements
– Job submission & monitoring
  • Condor-G
  • ICE (to CREAM)
– External interactions:
  • Information System
  • Data Catalogs
  • Logging & Bookkeeping
  • Policy Management system (G-PBox)
CREAM: Computing Resource Execution and Management
ICE: Interface to CREAM Environment
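The interplay between the Match Maker, the cached resource information and the Task Queue can be sketched as follows. This is a toy model under stated assumptions, not the WMS algorithm: the `Resource` and `Job` fields, the two requirements checked, and the "most free CPUs wins" ranking are all invented for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Resource:
    """One cached entry, standing in for the Information SuperMarket."""
    name: str
    free_cpus: int
    max_wall_minutes: int


@dataclass
class Job:
    job_id: str
    cpus: int
    wall_minutes: int


def match(job, resources):
    """Return the best resource that satisfies the job, or None."""
    candidates = [
        r for r in resources
        if r.free_cpus >= job.cpus and r.max_wall_minutes >= job.wall_minutes
    ]
    if not candidates:
        return None
    # Hypothetical rank: prefer the resource with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


def schedule(jobs, resources):
    """Assign matched jobs; keep unmatched jobs in a task queue."""
    assignments = {}
    task_queue = deque()
    for job in jobs:
        best = match(job, resources)
        if best is None:
            task_queue.append(job)  # retried later, when resources change
        else:
            best.free_cpus -= job.cpus
            assignments[job.job_id] = best.name
    return assignments, task_queue
```

Matching against a cache rather than querying the information system for every job is what the slide calls an optimized cache; the price is that the cached view can be slightly stale.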

17 Grid Foundation: Security
Authentication based on an X.509 PKI
– Certificate Authorities (CAs) issue long-lived certificates identifying individuals (much like a passport)
  • Commonly used in web browsers to authenticate to sites
– Trust between CAs and sites is established offline
– To reduce vulnerability, users on the Grid are identified by short-lived proxies of their certificates
Proxies can
– Be delegated to a service so that it can act on the user’s behalf
– Include additional attributes (such as VO information from the VO Membership Service, VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (when they are about to expire)
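The lifetime side of short-lived proxies can be modelled without any cryptography. The sketch below is only a bookkeeping model: the `Proxy` class, the one-hour renewal threshold and the twelve-hour lifetime are assumptions for illustration, and no real certificate is created or verified.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Proxy:
    """Bookkeeping model of a short-lived proxy (no real crypto)."""
    subject: str
    expires_at: datetime
    attributes: list = field(default_factory=list)  # e.g. VO attributes

    def is_valid(self, now):
        return now < self.expires_at

    def remaining(self, now):
        return self.expires_at - now


def needs_renewal(proxy, now, threshold=timedelta(hours=1)):
    """True when the proxy is about to expire (threshold is an assumption)."""
    return proxy.remaining(now) <= threshold


def renew(proxy, now, lifetime=timedelta(hours=12)):
    """Return a fresh proxy for the same subject and attributes."""
    return Proxy(proxy.subject, now + lifetime, list(proxy.attributes))
```

A long-running job would have a service perform this check on its behalf, which is why proxies can be stored in an external proxy store and renewed from there.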

18 Grid Foundation: Security
Local Centre Authorization Service (LCAS) handles authorization requests to the local computing fabric
Local Credential Mapping Service (LCMAPS) provides all local credentials needed for jobs allowed into the fabric
Batch Local ASCII Helper (BLAH)
– The protocol (BLAHP): a set of plain ASCII commands used by Condor-C (and CREAM) to manage jobs on the batch systems
– The daemon (BLAHPD): the helper daemon responsible for converting BLAHP commands into batch system actions, interpreting their results and reporting them in BLAHP format
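The split between deciding whether a job may enter the fabric and deciding which local identity it runs under can be sketched in two small steps. Everything here is illustrative: the `authorize` rule (allowed VOs plus a ban list) and the pool-account mapper are assumptions, not the LCAS or LCMAPS plug-in interfaces.

```python
def authorize(subject, vo, allowed_vos, banned_subjects):
    """Authorization step: may this user enter the local fabric?"""
    return subject not in banned_subjects and vo in allowed_vos


class PoolAccountMapper:
    """Mapping step: hand out one local account per grid subject."""

    def __init__(self, pool_accounts):
        self._free = list(pool_accounts)
        self._leases = {}

    def map_user(self, subject):
        if subject in self._leases:
            return self._leases[subject]  # same user, same account
        if not self._free:
            raise RuntimeError("no free pool accounts")
        account = self._free.pop(0)
        self._leases[subject] = account
        return account


def admit(subject, vo, allowed_vos, banned_subjects, mapper):
    """Return the local account for an authorized user, else None."""
    if not authorize(subject, vo, allowed_vos, banned_subjects):
        return None
    return mapper.map_user(subject)
```

Keeping the two steps separate means a site can change who is allowed in without touching how admitted users are mapped to local accounts.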

19 Grid foundation: Information Systems
Generic Information Provider (GIP)
– Provides LDIF information about a grid service in accordance with the GLUE Schema
BDII: Information system in gLite 3.0 (by LCG)
– LDAP database that is updated by an external process
– More than one DB is used, to separate reads from writes
– A port forwarder is used internally to select the correct DB
[Figure: GIP; labels: Provider, Config File, LDIF File, Plugin, Cache]
LDIF: LDAP Data Interchange Format
LDAP: Lightweight Directory Access Protocol
GLUE: Grid Laboratory Uniform Environment
BDII: Berkeley Database Information Index
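The idea of keeping reads and writes on separate databases, with a forwarder choosing which one clients see, resembles double buffering. The sketch below shows that pattern with plain dicts; the class name, the two-database layout and the example key are assumptions, and no LDAP server or real port forwarding is involved.

```python
class DoubleBufferedDirectory:
    """Two in-memory 'databases': one serves reads, the other is rebuilt."""

    def __init__(self):
        self._databases = [{}, {}]
        self._read_index = 0  # plays the role of the port forwarder target

    def query(self, key):
        """Reads always go to the currently published database."""
        return self._databases[self._read_index].get(key)

    def refresh(self, fresh_entries):
        """Rebuild the idle database, then switch readers over to it."""
        write_index = 1 - self._read_index
        self._databases[write_index] = dict(fresh_entries)
        self._read_index = write_index  # the switch is a single assignment


directory = DoubleBufferedDirectory()
directory.refresh({"ce.example.org": {"free_cpus": 12}})
print(directory.query("ce.example.org"))
```

Readers never observe a half-updated database, because the update happens entirely on the copy that is not being served.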

20 Grid foundation: Information Systems
R-GMA: provides a uniform method to access and publish distributed information and monitoring data
– Used for job and infrastructure monitoring in gLite 3.0
– Work is under way to add authorization
Service Discovery
– Provides a standard set of methods for locating Grid services
– Currently supports R-GMA, BDII and XML files as backends
– Will add a local cache of information
– Used by some DM and WMS components in gLite

21 Grid foundation: Computing Element
Three flavours available now:
– LCG-CE (GT2 GRAM)
  • In production now but will be phased out next year
– gLite-CE (GSI-enabled Condor-C)
  • Already deployed but still needs thorough testing and tuning; this is being done now
– CREAM (WS-I based interface)
  • Deployed on the JRA1 preview test-bed. After a first testing phase it will be certified and deployed together with the gLite-CE
  • Our contribution to the OGF-BES group for a standard WS-I based CE interface
  • CREAM and WMProxy demo at SC06!
BLAH is the interface to the local resource manager (via plug-ins)
– Used by CREAM and gLite-CE
– Information pass-through: passes parameters to the LRMS to help job scheduling
[Figure: Computing Element within a Grid site; labels: WMS, Clients, Information System, BDII, R-GMA, CEMon, glexec + LCAS/LCMAPS, BLAH, LRMS, WN]

22 Grid foundation: Accounting
APEL: uses R-GMA to propagate and display job accounting information for infrastructure monitoring
– Reads LRMS log files provided by LCG-CE and BLAH
– Preparing an update for gLite 3.0 to use the files from BLAH
DGAS: collects, stores and transfers accounting data; compliant with privacy requirements
– Reads LRMS log files provided by LCG-CE and BLAH
– Stores information in a site database (HLR) and optionally in a central HLR; access is granted to user, site and VO administrators
– Not yet certified in gLite 3.0. Deployment plan:
  • DGAS is in certification at INFN
  • It will send records to the GOC via DGAS2APEL
HLR: Home Location Register; manages user and resource accounts

23 Grid foundation: Storage Element
Storage Element
– Common interface: SRMv1, migrating to SRM v2.2
– Various implementations from LCG and other external projects
  • Disk-based: DPM, dCache
  • Tape-based: Castor, dCache
– Support for ACLs in DPM (in future also in Castor and dCache)
  • Synchronization of ACLs between SEs
– Common rfio library for Castor and DPM being added
POSIX-like file access
– Grid File Access Layer (GFAL) by LCG
  • Support for ACLs in the SRM layer (currently in DPM only)
  • Support for SRMv2
– gLite I/O
  • Support for ACLs from the file catalog; interfaced to Hydra for data encryption
  • Not certified in gLite 3.0. To be retired once all of its functionality is also available in GFAL
Hydra: encrypts files and stores them on normal storage elements

24 High Level Services: Catalogues
File Catalogs
– LFC from LCG
  • Interfaced to POOL (Disk Pool Manager – DPM)
  • LFC replication and backup
– Hydra: stores keys for data encryption
  • Interfaced to GFAL
  • Released with gLite 3.1
– AMGA Metadata Catalog: generic metadata catalogue
  • Joint ARDA development. Used mainly by Biomed

25 High Level Services: File transfer
FTS: reliable, scalable and customizable file transfer
– Manages transfers through channels
  • Mono-directional network pipes between two sites
– Web service interface
– Automatic discovery of services
– Support for different user and administrative roles
– Adding support for pre-staging and a new proxy renewal scheme
– Support for SRMv2.2, delegation and VOMS-aware proxy renewal is in certification

26 High Level Services: Workload management
WMS helps the user access computing resources
– Resource brokering, management of job input/output, ...
LCG-RB: GT2 + Condor-G
– To be replaced when the gLite WMS proves to be reliable
gLite WMS: Web service (WMProxy) + Condor-G
– Management of complex workflows (DAGs) and compound jobs
  • Bulk submission and shared input sandboxes
  • Support for input files on different servers (scattered sandboxes)
– Support for shallow resubmission of jobs
– Job File Perusal: file peeking during job execution
– Supports collection of information from CEMon, BDII and R-GMA, and from the DLI and StorageIndex data management interfaces
– Support for parallel jobs (MPI) when the home directory is not shared
– Deployed for the first time in gLite

27 WMS/LB/UI and CE
New WMS deployed and thoroughly debugged
– CMS: 100 collections * 200 jobs/collection, 3 UIs, 33 CEs
  • ~2.5 h to submit jobs (0.5 seconds/job)
  • ~17 hours to transfer jobs to a CE (3 seconds/job, 26K jobs/day)
  • Negligible failure rate due to WMS
– Shallow resubmission
  • Failure rate drops to less than 1% with 3 resubmissions
Stability problems
– Also investigating other deployment scenarios to make it more robust
gLite CE still to be tested and optimized
[Figure labels: ATLAS, CMS]
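The CMS figures can be cross-checked with a few lines of arithmetic. The calculation below uses only the numbers quoted on the slide (100 collections of 200 jobs, 0.5 and 3 seconds per job); it assumes jobs are handled strictly one after another, which is a simplification.

```python
collections = 100
jobs_per_collection = 200
total_jobs = collections * jobs_per_collection  # 20,000 jobs in the test

submit_seconds_per_job = 0.5
transfer_seconds_per_job = 3

submit_hours = total_jobs * submit_seconds_per_job / 3600
transfer_hours = total_jobs * transfer_seconds_per_job / 3600
jobs_per_day = 24 * 3600 / transfer_seconds_per_job

print(f"total jobs:     {total_jobs}")
print(f"submit time:    {submit_hours:.2f} h")
print(f"transfer time:  {transfer_hours:.2f} h")
print(f"jobs per day:   {jobs_per_day:.0f}")
```

With these inputs the submission time comes out near 2.8 hours, the transfer time near 16.7 hours and the daily rate at 28,800 jobs, so the slide's ~2.5 h, ~17 h and 26K jobs/day are consistent with rounded per-job times rather than exact ones.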

28 High Level Services: Workflows
A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
A Collection is a group of jobs with no dependencies
– Basically a collection of JDLs
A Parametric job is a job with one or more JDL attributes whose values vary according to parameters
Compound jobs allow one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
  • Single call to the WMProxy server
  • Single authentication and authorization process
  • Sharing of files between jobs
– Availability of both a single job ID to manage the group as a whole and an ID for each single job in the group
[Figure: DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE]
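Two of these compound job types lend themselves to a short sketch: ordering a DAG so that every job runs after the jobs it depends on, and expanding a parametric job into concrete jobs. The node names come from the slide's figure, but the edges between them are not recoverable from the transcript, so the dependency table below is hypothetical. The `_PARAM_` placeholder and the `expand_parametric` helper are likewise assumptions for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical edges: each node maps to the set of nodes it depends on.
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}

# A valid execution order: every job appears after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)


def expand_parametric(template, start, stop, step=1):
    """Substitute each parameter value into a job template string."""
    return [template.replace("_PARAM_", str(v)) for v in range(start, stop, step)]


print(expand_parametric("input__PARAM_.dat", 0, 3))
```

A Collection is the degenerate case of the first structure: a dependency table in which every set is empty, so any order is valid and all jobs can be submitted at once.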

29 High Level Services: Job Information
Logging and Bookkeeping service
– Tracks jobs during their lifetime (in terms of events)
– LBProxy for fast access
– L&B API and CLI to query jobs
– Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds them back to the WMS to aid planning
Job Provenance: stores long-term job information
– Supports job rerun
– Helps offload the L&B
– Released with gLite
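The reputability idea, keeping recent failure statistics per CE and feeding them back into planning, can be sketched with a sliding window of job outcomes. The class name, the window size and the ranking rule (lowest recent failure rate first) are assumptions; this is not the L&B API.

```python
from collections import defaultdict, deque


class CeReputation:
    """Recent job outcomes per Computing Element (sliding window)."""

    def __init__(self, window=100):
        # Each CE keeps only its most recent `window` outcomes.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, ce, succeeded):
        self._outcomes[ce].append(bool(succeeded))

    def failure_rate(self, ce):
        outcomes = self._outcomes.get(ce)
        if not outcomes:
            return 0.0  # no history yet: treat the CE as untested, not bad
        return outcomes.count(False) / len(outcomes)

    def rank(self, ces):
        """Order candidate CEs from most to least reliable."""
        return sorted(ces, key=self.failure_rate)
```

Because the window is bounded, a CE that has recovered from a bad period regains its rank as new successful jobs push the old failures out.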

30 Highlights: Job Priorities
Applications ask for the ability to differentiate access to fast/slow queues depending on the user's role/group inside the VO
GPBOX is a tool for defining, storing and propagating fine-grained VO policies
– Based on VOMS groups and roles
– Enforcement of policies at sites: sites may accept or reject policies
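A policy keyed on VOMS group and role, which a site may accept or reject, can be sketched as a small lookup. The policy record layout, the queue names and the `select_queue` helper are hypothetical and do not reflect GPBOX's own policy language; only the `/vo/group/Role=name` shape of the attribute string follows the usual VOMS convention.

```python
def parse_fqan(fqan):
    """Split '/vo/group/Role=name' into (group_path, role or None)."""
    groups = []
    role = None
    for part in fqan.split("/"):
        if not part:
            continue
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            continue  # ignored in this sketch
        else:
            groups.append(part)
    if role == "NULL":
        role = None
    return "/" + "/".join(groups), role


def select_queue(fqan, policies, accepted, default="slow"):
    """Pick the queue from the first accepted policy that matches."""
    group, role = parse_fqan(fqan)
    for policy in policies:
        if policy["name"] not in accepted:
            continue  # the site rejected this VO policy
        if policy["group"] == group and policy["role"] in (None, role):
            return policy["queue"]
    return default


# Hypothetical VO policies; a role of None matches any role.
policies = [
    {"name": "prod-fast", "group": "/biomed", "role": "production", "queue": "fast"},
    {"name": "analysis-fast", "group": "/biomed/analysis", "role": None, "queue": "fast"},
]
print(select_queue("/biomed/Role=production", policies, accepted={"prod-fast"}))
```

The `accepted` set models the site's side of the arrangement: the VO defines and propagates policies, but only those the site has accepted influence which queue a job lands in.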

31 Summary
gLite 3 is
– The next-generation middleware for grid computing
– Developed according to a well-defined process
  • Controlled by the EGEE Technical Coordination Group
– Deployed on the EGEE production infrastructure
  • More than 200 sites
– Still under development to provide increased robustness, usability and functionality
  • On the preview testbed: CREAM, Job Provenance, glexec on the WNs, GPBOX
– gLite sources:
