1
Future Developments in the EU DataGrid
The European DataGrid Project Team
2
Overview
Where is the DataGrid project going?
- Future developments in the EDG middleware: what you can expect to see in future releases of the software
- Web Services and the Open Grid Services Architecture: where Grid computing is heading in the coming years
- A quick look at the new EDG 2.0 release
This lecture gives an overview of the upcoming features that will be provided by future releases of the EDG middleware, together with a quick overview of the Web Services based distributed computing architecture.
3
Workload Management (WMS)
The architecture has been revised:
- Increased reliability and flexibility
- Simplification (e.g. minimizing duplication of persistent information)
- Easier to plug in new components that implement new functionality
- Addresses some of the shortcomings that emerged in the early releases
- Better integration with other Grid services (e.g. data management optimization facilities)
- Favours interoperability with other Grid frameworks
Advanced functionality: coordination between EDG and PPDG (a US project) has been established to define a common approach (one of the WP1 modules could be the RB, which could be called by Condor-G). This coordination is particularly important to guarantee interoperability with other Grid frameworks.
4
WMS: New Functionality
- Interactive jobs: jobs with continuous feedback of the standard streams (stdin, stdout, and stderr) on the UI (submitting) machine
- Job checkpointing: save the job state via a "trivial" checkpointing API, so execution can be suspended and resumed
- Job partitioning: decompose the job into smaller sub-jobs (executed in parallel) to reduce the elapsed processing time and optimize the usage of Grid resources
- Job dependencies: program Y cannot start before program X has successfully finished; based on Condor DAGMan (see the sketch after this list)
- C++ and Java APIs, and a GUI: the API provides functions similar to the command line; the GUI offers guided creation of JDL and monitoring of job status over the whole life cycle
Notes: For interactive jobs, the connection between the WN and the UI is kept open by the job (outbound IP connectivity from the WNs is assumed). Remote signal sending and asynchronous interaction with the job are not supported; possible extensions will be evaluated after the first deployment phase (see Condor Bypass / Grid Console). For checkpointed jobs, the intermediate state is the one previously saved. Job partitioning applies when a job has to process a large set of independent elements: it may be worthwhile to decompose the job into smaller sub-jobs, executed in parallel, in order to reduce the overall time needed to process all these elements and to optimize the usage of all available Grid resources. At the end, each sub-job saves a final state, which is then retrieved by a job aggregator responsible for collecting the results of the sub-jobs and producing the overall output. This problem is being addressed in the context of job checkpointing and makes heavy use of the DAGMan mechanism.
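As an illustration of the job-dependency functionality, the sketch below shows what a DAG description in JDL might look like. The Type = "DAG" value does appear in the Release 2.0 JDL later in this lecture, but the Nodes/Dependencies syntax here is an assumption modeled on Condor DAGMan, not confirmed EDG syntax:

    [
      Type = "DAG";                            // DAG support is based on Condor DAGMan
      Nodes = [
        NodeX = [ File = "programX.jdl"; ];    // assumed node syntax, illustration only
        NodeY = [ File = "programY.jdl"; ];
      ];
      Dependencies = { { NodeX, NodeY } };     // Y starts only after X succeeds
    ]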
5
New features: Deployment of the Accounting infrastructure over the Testbed
The DGAS (Data Grid Accounting System) is based upon the following assumptions: a computational economy model implements an economic-brokering mechanism in which Grid resources are chosen by the Broker according to a cost assigned to them. Users pay in order to execute their jobs on the resources, and the owner of the resources earns credits by executing the user jobs. This model has three aims:
- To reach a nearly stable equilibrium able to satisfy the needs of both resource providers and consumers
- To credit resource usage to the resource owner(s) after job execution
- To avoid abuse of Grid resources
6
New features: Advance reservation API and Co-allocation API
- Advance reservation of resources, to realize end-to-end quality of service (QoS) and reduce competition for resources. The approach is based on concepts discussed in the Global Grid Forum. A reservation is a promise from the system that an application will receive a certain level of service from a resource (e.g. a reservation may promise a given percentage of a CPU).
- Co-allocation, allowing the concurrent allocation of multiple resources; these resources can be homogeneous or heterogeneous.
7
Authorization Improvements
The certificate and authorization flow (the slide diagram's labels are omitted here):
1. As before, the user requests a certificate (grid-cert-request) and sends the request to the CA, which sends back the signed certificate.
2. As before, the certificate can be converted to the browser-digestible PKCS12 format (cert.pkcs12).
3. As before, a host/service requests a certificate (grid-cert-request) and sends the request to the CA, which sends back the signed host/service certificate.
4. As before, the trusted CA certificates and their CRLs are retrieved and kept up to date.
5. New: the user registers in the EDG VOMS service.
6. grid-proxy-init generates a proxy certificate (24-hour lifetime) and also fetches authorization information from VOMS; likewise, a service certificate is generated (lifetime may vary), also fetching authorization information from VOMS.
7. When the user contacts a service, they exchange certificates to authenticate each other; the service bases its authorization decision on the extra information from VOMS.
8. The service will use the Local Centre Authorization Service (LCAS) to fetch additional authorization information (e.g. access control, local policy) and use it to map grid credentials to local credentials (like the gridmap file).
9. The service may use its own database for access control information, as Spitfire and SlashGrid do, and compare this information with the attributes from the VOMS package.
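A minimal command-line sketch of the user-side steps above; grid-cert-request and grid-proxy-init are the standard Globus commands named in the flow, while the openssl invocation and the file names are assumptions for illustration:

    # 1. Request a user certificate; the request goes to the CA for signing.
    grid-cert-request

    # 2. Convert the signed certificate to PKCS12 for use in a browser
    #    (illustrative invocation; file names are assumptions).
    openssl pkcs12 -export -in usercert.pem -inkey userkey.pem -out cert.pkcs12

    # 3. Create a short-lived proxy certificate (24-hour lifetime in EDG);
    #    in EDG this step also fetches authorization attributes from VOMS.
    grid-proxy-init -hours 24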
8
Data Management Future: Reptor, the Next-Generation Replica Manager
Reptor, the next-generation replica manager, will comprise all the components and services an intelligent replication service is required to provide. Apart from standard replication, using a file transfer component such as GridFTP and a replica location service such as Giggle, it will provide components for optimization, transactions, consistency, pre- and postprocessing, and GDMP-like subscription mechanisms.
- The optimization component (Optor) tries to find the "best" replica based upon network characteristics, data access costs (such as staging out from an MSS), and the like. Moreover, active replication will be performed based upon data access patterns, to reduce access latencies.
- The transaction module will provide fault tolerance.
- The consistency component will keep replicas consistent with each other, allow controlled modifications to replicas (via a versioning system), and keep the information stored about replicas (replica catalogs) consistent.
- The pre- and postprocessing modules allow the execution of user-defined modules before and after file transfer.
- Finally, a subscription model similar to GDMP will be provided, alongside the replica location (Giggle) and replica metadata (RepMeC) services.
Not all of these services will be available in the first release for Testbed 2.0, but the missing ones will be added consecutively; the ones already in development, with code associated with them, were shown on the slide in dark boxes with the corresponding code names (Giggle, RepMeC, Optor, GDMP).
9
Replica Location Service
- Scalable, distributed replica location service (RLS)
- Local replica catalogs (LRCs) and collective replica location indices (RLIs)
- The LRC holds reliable local state; the RLI holds unreliable collective state with soft-state updates
A scalable, distributed replica location service (RLS) will be available. This service will be able to deal with thousands of sites and millions of entries. It has two main components: local replica catalogs (LRCs), which hold a consistent state of the replicas registered locally, and replica location indices (RLIs), which hold information on which LRCs store the mapping information for a given LFN. The information in the RLIs is not always consistent with the information stored in the LRCs, due to the soft-state update protocol used to communicate LRC information to the RLIs. An LRC may be registered with multiple RLIs, and RLIs may form a hierarchy.
10
Replica Meta Data Catalog
- Code-named RepMeC or RMC
- Built upon the existing Spitfire infrastructure
- Stores all relevant information for the replication services to function:
  - Metadata on data: master copy location and locks, versioning data, user information, access information
  - Metadata on service internals: transaction steps and locks, consistency layers, subscription-based replication info
The metadata catalog stores all the information the replication services need to function; the monitoring data source is basically raw auditing data. The detailed design of the RepMeC schema is still evolving. The service is a Spitfire instance with specialized custom calls for frequently used metadata, and is hence easily extensible. It can be replicated or distributed and will make use of the future developments of Spitfire.
11
Replication Services Architecture
Components on the slide: the User Interface, the Resource Broker, the Replica Metadata Catalog and several Replica Location Indices; at each site, a Replica Manager (with Core, Optimisation and Processing APIs), a Local Replica Catalog, an Optimiser, pre-/post-processing modules, and the Computing and Storage Elements.
This picture shows how the services interoperate using the next-generation components. The idea is to have many local subservices on a site (like the Local Replica Catalog), so that there are not too many dependencies on centrally managed services. The Replica Location Indices and the Replica Metadata Catalog may live on the same site or somewhere else; logically they are detached from any one site.
12
Deployment for R-GMA / MDS
The slide diagram shows, for Site A, the command and information flows between the Computing Element, Storage Element, Information Catalogue and Resource Broker, involving the LDAP information providers, the GIN and GOUT gadgets, the Producer/Consumer/Registry/Schema servlets running under Tomcat, circular buffers, and an LDAP server.
- R-GMA is a modular system, so it can easily be adapted for different uses. The diagram illustrates how the components are currently deployed on the development testbed (1.3).
- At present we are in a transition from LDAP to R-GMA, so we are using the existing EDG LDAP information providers.
- Currently there is only one Information Catalogue, a combination of the Registry servlet and the Schema servlet; work is ongoing to enable the use of multiple Information Catalogues.
- One machine at each site is set up to run Tomcat; by default this is the Computing Element.
- GIN (gadget in) is set up on the Storage and Computing Elements. GIN publishes information from the LDAP information providers via a Circular Buffer Producer, and the existence of these GINs is recorded in the Registry.
- The Resource Broker also runs Tomcat, and it runs GOUT (gadget out), which publishes the information from R-GMA to a local LDAP server.
13
Information Systems Future: Pulse (in release 1.3)
- Pulse is a graphical interface to the R-GMA Registry.
- When a table is selected, a simple query can be issued: a Consumer is created within Pulse that retrieves the information belonging to the selected table from R-GMA.
- More complex queries can be defined by selecting specific attributes of tables; in a composer window, the SQL selection statement corresponding to the current selection can be further edited by hand (an example of such a statement is sketched below).
- The user may choose to have a single result set returned or to have data streamed back.
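As an example of the kind of statement the composer window produces, a query might look like the following; the table and column names here are assumptions in the style of the GLUE schema, not necessarily the actual R-GMA tables:

    -- Illustrative only: table and column names are assumed, not confirmed.
    SELECT UniqueID, FreeCPUs
    FROM GlueCE
    WHERE FreeCPUs > 0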
14
R-GMA & GRM/PROVE
- GRM provides a C API for application monitoring; PROVE is the accompanying visualisation tool.
- The application is instrumented using the C library, which produces statistics about different processes and blocks of code; these are written to a trace file.
- At present the trace file has to be passed back for interpretation at the end of the job.
- The future integration with R-GMA will allow the transfer of the trace information during run time.
15
PROVE
Statistics include:
- execution time for individual processes
- execution time for all processes
- amount of communication for all processes
16
Fabric Management
Future Fabric Management developments:
- Installation and configuration management
- Gridification
- Monitoring and fault tolerance
- Resource management
In this part of the tutorial we present the new functionality expected in the Fabric Management area for the next release.
17
Fabric Installation and Configuration Mgt
Next release, LCFGng (new generation):
- Support for RedHat 7.2
- Integration with monitoring
After that:
- The LCFG configuration language and compiler will be replaced by new EDG developments. This will require changes in existing components wherever the configuration is accessed.
- Configuration can be made available across components in a global configuration tree, together with private namespaces (the current LCFG approach).
18
Fabric Gridification
- Gatekeeper with LCAS (Local Centre Authorization Service): policy-based authorization, a pluggable framework, implemented as a separate daemon
- LCMAPS (Local Credential MAPping Service): maps credentials and roles to local accounts and capabilities; support for AFS and Kerberos tokens; library implementation; enhances gridmapdir; requires a modified Gatekeeper
- Improved error and status handling: getting a useful message back to the user
- Job repository, FLIDS, FABNAT: scheduled for EDG 2.x and later
Notes: The "plain" Globus-provided gatekeeper accepts jobs from the outside world, verifies that the user certificate has been signed by a trusted certification authority, and subsequently calls gss_assist_gridmapfile to obtain local account information. The grid-mapfile has a two-fold role in this process: it defines the authorized users based on their certificate (proxy) subject name AND it gives the local uid to be used for each user (see the sketch after this slide). LCAS plugins are typically: a banned-user list (to quickly dispose of users that abuse the system), CPU budget and quota checks (based on the RMS information), and job type/class/role policies as expressed in more complex GridACLs. The LCAS system is implemented as a stand-alone daemon because many of these plugins are complex systems that are difficult to audit for security leaks (the gatekeeper runs as root). LCMAPS obtains the local credentials for the user's current role: a UNIX uid/gid, AFS tokens, Kerberos tickets, etc. It extends the current gridmapdir functionality with support for roles and role-specific mappings (role-to-uid, role-to-AFS). LCMAPS is implemented as a library, since the process of changing uid/gid and setting credentials requires in-process privileges. The mapping created by LCMAPS is committed to the job repository for later retrieval by, e.g., SlashGrid. The job repository, FLIDS and FABNAT are scheduled for later releases; for the job repository, the current Globus-provided solution is adequate for the time being.
Acronyms: TLS - Transport Level Security (TLSv1 is equal to SSLv3); LCAS - Local Centre Authorization Service; LCMAPS - Local Credential MAPping Service; FLIDS - Fabric-Local Identity Service (local quasi-CA); FABNAT - Fabric Network Address Translation gateway; IPC - Inter-Process Communication; AFS - Andrew File System.
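For reference, the sketch below shows standard grid-mapfile entries of the kind gss_assist_gridmapfile consults (the subject names are invented); LCMAPS extends this static mapping with role-specific mappings:

    # Certificate subject name -> local account (subjects are hypothetical).
    "/O=Grid/O=CERN/OU=cern.ch/CN=Jane Doe"   jdoe
    # gridmapdir-style pool account: the leading dot leases an account
    # from the .atlas pool instead of naming a fixed uid.
    "/O=Grid/O=CERN/OU=cern.ch/CN=John Smith" .atlas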
19
OGSA: A major development in distributing computing resources and services
- A new conceptual framework for distributed computing and services, bringing together aspects of Web Services and Grid computing.
- The Open Grid Services Architecture is based on the definition of a Web Service as a set of related application functions that can be programmatically invoked over the Internet. To invoke a Web Service, applications make use of the service definition information in a Web Services Description Language (WSDL) document.
- Work on the impact and the possible implementation of an OGSA-based Grid is being carried out within the GGF (to define possible architectural frameworks and agree on standards).
OGSA was originally proposed by IBM in 2002. It grounds the architecture of distributed computing upon the idea of enabling a universal, open set of service description protocols and Internet-accessed services, to standardize the fundamental process of machine-to-machine communication in a widely distributed network. It naturally addresses the increasing demand for distributing services all over the Internet, much as hardware resources, databases and information have already spread across the Internet-accessible world. Once services are distributed, one needs information about where to find them, what exactly they provide, and which protocols are required to access them. The overlap, the similarity and the close relationship with the idea of a Grid as it was originally conceived are impressively relevant. In this sense the Grid will surely be affected by the "OGSIfication" of its founding concepts; its architecture will become more and more embedded in the OGSA framework for distributed service access over the Internet.
20
The Web Service architecture
Three primary players (pillars):
1. Providers of the services
2. Directory functions, i.e. the Service Broker
3. Service Requesters
SOAP (Simple Object Access Protocol) interconnects 1, 2 and 3.
WSDL (Web Services Description Language) describes the services; UDDI (Universal Description, Discovery & Integration) is the directory in which they are published (see the WSDL sketch below).
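To make the WSDL pillar concrete, here is a minimal, hypothetical fragment; the service and operation names are invented, and a real document would also define message parts, bindings and service endpoints:

    <!-- Hypothetical sketch: all names are illustrative only. -->
    <definitions name="JobSubmission"
                 xmlns="http://schemas.xmlsoap.org/wsdl/">
      <message name="SubmitRequest"/>
      <portType name="JobSubmissionPortType">
        <operation name="submitJob">
          <input message="SubmitRequest"/>
        </operation>
      </portType>
    </definitions>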
21
OGSA Features vs. Web Services
- Web Services is a conceptual framework for accessing services in order to build dynamic applications over the Internet and have them executed.
- "Dynamic" (in the WS scheme) means that we do not necessarily know the format of all the information that will be involved along the path taken by our application while executing, but we will be able to access this information anyhow; this is done through a query to the UDDI directory.
- OGSA is further concerned with the creation of transient instances of Web Services and with the management of service instances, to address the real issue of dynamically creating and destroying accessible interfaces to the states of distributed activities.
22
OGSA and DataGrid
- Globus has announced that the next major version of their toolkit (version 3) will be based on the OGSA structure; a beta release is foreseen for Spring 2003.
- DataGrid members are participating in the OGSA specifications.
- A mapping between existing DataGrid middleware components and OGSA is being defined, and we are following the evolution of OGSA closely.
23
A quick look at the new EDG 2.0 release
- Workload Management System: revised, improved architecture; new functionality; improved stability and scalability
- Data Management: network usage optimization; revised architecture with a new Replica Manager
- Information Systems: new architecture (R-GMA replacing the LDAP-based MDS); improved stability and reliability
24
New functionalities introduced
- User APIs, including a Java GUI
- "Trivial" job checkpointing service: the user can save the state of the job from time to time (the state is defined by the application), and a job can be restarted from an intermediate (previously saved) job state; presented at the EDG review
- Gangmatching: allows both CE and SE information to be taken into account in the matchmaking, for example to require a job to run on a CE close to an SE with "enough space" (see the sketch below)
- Output data upload and registration: a proper JDL attribute can trigger automatic data upload and registration at job completion
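A rough sketch of how such a gangmatching requirement might be written in JDL; the anyMatch construct and the attribute names here are assumptions for illustration, not the exact Release 2.0 syntax:

    [
      Executable = "analysis.sh";      // illustrative job
      // Hypothetical gangmatching requirement: a CE whose close SE
      // has enough available space (names are assumed).
      Requirements = anyMatch(other.storage.CloseSEs,
                              target.GlueSAStateAvailableSpace > 1000000);
    ]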
25
New functionalities introduced
- Support for parallel MPI jobs: parallel jobs within single CEs (see the sketch below)
- Grid accounting services: economic accounting for Grid users and resources only, where users pay for resource usage while the resources earn virtual credits by executing user jobs; no integration with the brokering module yet
- Support for interactive jobs: jobs running on a CE worker node, with a channel to the submitting (UI) node available for the standard streams (by integrating the Condor Bypass software)
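A minimal sketch of an MPI job description, using the JobType value listed on the next slide; the NodeNumber attribute name and all values are assumptions for illustration:

    [
      JobType    = "MPICH";        // parallel job within a single CE
      NodeNumber = 4;              // assumed attribute for the CPU count
      Executable = "my_mpi_app";   // illustrative executable
      StdOutput  = "std.out";
      StdError   = "std.err";
      OutputSandbox = { "std.out", "std.err" };
    ]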
26
JDL in release EDG 2.0
- Type: "Job" | "DAG" | "Reservation" | "Co-allocation" (only "Job" is supported in Release 2.0)
- JobType: "Normal" | "MPICH" | "Interactive" | "Checkpointable" | "Partitionable" ("Partitionable" is not yet supported in Release 2.0)
- Executable, Arguments, StdInput, StdOutput, StdError, InputSandbox, OutputSandbox
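Putting these attributes together, a minimal Release 2.0 job description might look like this (all values are illustrative):

    [
      Type       = "Job";
      JobType    = "Normal";
      Executable = "/bin/echo";
      Arguments  = "Hello DataGrid";
      StdOutput  = "hello.out";
      StdError   = "hello.err";
      OutputSandbox = { "hello.out", "hello.err" };
    ]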
27
WMS Command Line Interface in EDG 2.0
edg-job-submit [options] <jdl file>
Options:
  --help, --version
  --input, -i <input file path>
  --resource, -r <CE Identifier>
  --hours, -h <hours num>
  --nomsg
  --output, -o <output file path>
  --noint
  --debug
  --logfile <log file path>
  --config, -c <config file path>
  --vo <vo name>
  --config-vo <config-vo file path>
  --chkpt <chkpt file path>
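Two illustrative invocations using options from the list above; the JDL file name, the VO and the CE identifier are assumptions:

    # Submit a job for the wpsix VO, saving the job identifier to a file.
    edg-job-submit --vo wpsix -o jobids.txt hello.jdl

    # Force matchmaking to a specific CE (hostname is hypothetical).
    edg-job-submit -r lxshare0380.cern.ch:2119/jobmanager-pbs-short hello.jdl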
28
DM in Release 2.0, Naming: Everything is a URI
- All file names are valid URIs, i.e. they have a scheme and a scheme-specific part, separated by a colon. URLs are also valid URIs.
- URI jargon: [scheme:]scheme-specific-part[#fragment] or [scheme:][//authority][path][?query][#fragment]
- User's view: the Logical File Name, with scheme lfn, e.g. lfn:some_string. Restriction: being a valid URI excludes certain characters such as space, $, : and %; these may be escaped using % (%20 is the space character).
- Catalog view: the GUID, with scheme guid, e.g. guid:73e16e74-26b0-11d7-b1e0-c5c68d88236a
29
Naming (Cont.)
- Storage Resource Manager view: Site File Names and Site URLs, with scheme srm, e.g. srm://srmhost.domain/path/to/data/file.dat. This is called a SURL. The SFN is just the path part of this URI, i.e. path/to/data/file.dat in this case.
- The RLS stores SURL-GUID mappings; the RMC stores GUID-LFN mappings (see the illustration below).
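Putting the three naming views together for a single (hypothetical) file, reusing the example names from these slides:

    lfn:pkunszt-test1                            user view; the RMC maps GUID <-> LFN
    guid:73e16e74-26b0-11d7-b1e0-c5c68d88236a    catalog view: the immutable GUID
    srm://srmhost.domain/path/to/data/file.dat   SURL at one site; the RLS maps GUID <-> SURL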
30
User’s Command Line Interface
The EDG Replica Manager is the user's interface to all Data Management functions. It is installed on the UI and WN as $EDG_LOCATION/bin/edg-replica-manager, with edg-rm as a short-hand. The command syntax is similar to cvs: edg-rm [options] command [command-options]
31
Options for all commands
General options for all commands:
  -h, --help      print a help screen on all commands, or help on a specific command if one is given
  -v, --verbose   print informative messages while executing
  -i, --insecure  connect insecurely to the services
  --vo=VO         set the VO to 'VO'; necessary for insecure operations, and mandatory until we integrate with VOMS
  --config=file   load configuration from 'file' instead of the default config file at $EDG_LOCATION/etc/edg-replica-manager/edg-replica-manager.conf
32
Bringing a file in the Grid
Make a file available on the Grid:
  copyAndRegisterFile sourceURL [command-options]
Copies the file given by the source and registers it in the RLS; the command returns the GUID of the registered file. Valid source URL schemes are file, gsiftp, http and ftp. For a local file you have to specify the full file path using the file scheme. Short forms: cr, creg.
Options:
  -d destination        specify the host of the destination SE, or even the full destination SURL
  -l logical-filename   specify a logical file name
  -p protocol           specify the protocol to be used (currently only gsiftp)
  -n nstreams           specify the number of streams to be used (default: 8)
Example:
  edg-rm --vo=wpsix cr file:/home/pkunszt/test.dat -l lfn:pkunszt-test1
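A variant of the example above that additionally pins the destination SE with -d (the SE hostname is hypothetical):

    edg-rm --vo=wpsix copyAndRegisterFile file:/home/pkunszt/test.dat \
           -d lxshare0384.cern.ch -l lfn:pkunszt-test2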
33
Outlook
The work is not finished! The last year of the EDG project will be at least as stimulating and challenging as the first two.
- Release 2.0 is planned for next month.
- Advances are planned in all aspects of the EDG middleware.
- The project is following the development of the OGSA paradigm for distributed computing.
34
Interaction with Sister Projects
- CrossGrid: uses the same security certificates; testbed sites install EDG software and extend it for the needs of intensive interactive applications; participates in the EDG testing activities; representatives sit in each project's architecture and management groups.
- DataTAG (EDT): EDT is deploying EDG software to investigate interoperability with US projects (iVDGL, GriPhyN, PPDG); results feed back into EDG software releases (e.g. GLUE-compatible information providers/consumers).
- NorduGrid: uses the same security certificates; involved in the EDG architecture work; contributed good ideas for gatekeeper and MDS configuration; helped develop GDMP and the GSI extensions for the Replica Catalog; involved in the GLUE schema work, security policy and middleware testing; working in WP8 (HEP applications).
- iVDGL/GriPhyN/PPDG: US members sit in the EDG architecture group; looking for common packaging and toolkit-usage solutions.
There are no strict boundaries: there is a large cross-fertilization of ideas, software and people, and DataGrid is learning from the experiences of these projects.