Grid Technology B Different Flavors of Grids CERN Geneva April 1-3 2003 Geoffrey Fox Community Grids Lab Indiana University

1 Grid Technology B Different Flavors of Grids CERN Geneva April 1-3 2003 Geoffrey Fox Community Grids Lab Indiana University gcf@indiana.edu

2 Different Types of Grids Compute and File-oriented Grids “Internet Computing” Grids (Desktop Grids) Peer-to-peer Grids Information Grids: to distinguish between File, database and “Perl Filter” based Grids Semantic Grids Integrated (Hybrid, Complexity) Grids –Bio and Geocomplexity Campus Grids Enterprise Grids

3 Compute and File-oriented Grids Different Grids have different structures. Compute/File oriented Grids are well represented by the “production part of particle physics”, either in –Monte Carlo –Production of Data Summary Tapes This is nearer the “Globus GT2” than the “Web Service” vision of the Grid. Strongly supported of course by EDG (European Data Grid) and the Trillium project in the US (Virtual Data Toolkit). The Physics Analysis phase of particle physics requires more collaboration and is more dynamic.

4 What do HEP experiments want to do on the GRID in the long term? Production: –Simulation (Monte Carlo generators) –Reconstruction (including detector geometry …) –Event Mixing (bit-wise superposition of Signal and Backgrounds) –Reprocessing (Refinement, improved reconstruction data production) –Production (production of AODs and ESDs starting from Raw data) –Very organized activity, generally centrally managed by prod teams Physics analysis: –Searches for specific event signatures or particle types (data access can be very sparse, perhaps on the order of one event out of each million) –Measurement of inclusive and exclusive cross sections for a given physics channel; measurement of relevant kinematical quantities –I/O: not feasible to organize the input data in a convenient fashion unless one constructs new files containing the selected events –The activities are also uncoordinated (not planned in advance) and (often) iterative

5 EDG “Compute/File” Grid Work Packages WP1: Work Load (Resource) Management System WP2: Data (Replication/Caching) Management WP3: Grid Monitoring / Grid Information Systems (general meta-data lookup) WP4: Fabric Management (software etc. on cluster) WP5: Storage Element (Grid Interface to mass storage) WP6: Security WP7: Network Monitoring

6 Compute/File Grid Requirements I Called Data Grid by Globus team Terabytes or petabytes of data –Often read-only data, “published” by experiments Large data storage and computational resources shared by researchers around the world –Distinct administrative domains –Respect local and global policies governing how resources may be used Access raw experimental data Run simulations and analysis to create “derived” data products

7 Compute/File Grid Requirements II Locate data –Record and query for existence of data Data access based on metadata –High-level attributes of data Support high-speed, reliable data movement –E.g., for efficient movement of large experimental data sets Support flexible data access –e.g., databases, hierarchical data formats (HDF), aggregation of small objects Data Filtering –Process data at storage system before transferring

8 Compute/File Grid Requirements III Planning, scheduling and monitoring execution of data requests and computations Management of data replication –Register and query for replicas –Select the best replica for a data transfer Security –Protect data on storage systems –Support secure data transfers –Protect knowledge about existence of data Virtual data –Desired data may be stored on a storage system (“materialized”) or created on demand
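
The replica-management bullets above (register and query for replicas, select the best replica for a transfer) can be sketched as follows. This is only an illustrative sketch, not the Globus Replica Location Service API: the class and function names, the bandwidth and latency fields, and the cost model (latency plus size divided by bandwidth) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (all fields are hypothetical)."""
    site: str
    bandwidth_mbps: float  # assumed measured bandwidth to the requester
    latency_ms: float      # assumed round-trip latency to the requester


class ReplicaCatalog:
    """Toy in-memory catalog mapping logical names to physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica):
        self._replicas.setdefault(logical_name, []).append(replica)

    def query(self, logical_name):
        return list(self._replicas.get(logical_name, []))


def estimated_transfer_seconds(replica, size_mb):
    # Assumed cost model: fixed latency plus size (in megabits) over bandwidth.
    return replica.latency_ms / 1000.0 + (size_mb * 8.0) / replica.bandwidth_mbps


def select_best_replica(catalog, logical_name, size_mb):
    """Return the replica with the lowest estimated transfer time, or None."""
    candidates = catalog.query(logical_name)
    if not candidates:
        return None
    return min(candidates, key=lambda r: estimated_transfer_seconds(r, size_mb))
```

A real planner would presumably also consult the information services for current load and apply security and policy checks before choosing a replica; the sketch ignores both.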

9 Functional View of Compute/File Grid [Diagram] Application; Metadata Service (location based on data attributes); Replica Location Service (location of one or more physical replicas); Information Services (state of grid resources, performance measurements and predictions); Planner (data location, replica selection, selection of compute and storage nodes); Security and Policy; Executor (initiates data transfers and computations); Data Movement; Data Access; Compute Resources; Storage Resources

10 Layered C/F Grid Architecture

11 C/F Grid Architecture I (from the bottom up) Fabric Layer –Storage systems –Compute systems –Networks Connectivity Layer –Communication protocols (e.g., TCP/IP protocol stack) –Authentication and Authorization protocols (e.g., GSI)

12 C/F Grid Architecture II Resource Layer: sharing single resources –Data Access Protocol or Service (e.g., Globus gridftp) –Storage Resource Management (e.g., SRM/DRM/HRM from Lawrence Berkeley Lab) –Data Filtering or Transformation Services (e.g., DataCutter from Ohio State University) –Database Management Services (e.g., local RDBMS) –Compute Resource Management Services (e.g., local supercomputer scheduler) –Resource Monitoring/Auditing Service

13 C/F Grid Architecture III Collective 1 Layer: General Services for Coordinating Multiple Resources –Data Transport Services (e.g., Globus Reliable File Transfer and Multiple File Transfer Service from LBNL) –Data Federation Services –Data filtering or Transformation Service (e.g., Active ProxyG from Ohio State University) –General Data Discovery Services (e.g., Globus Replica Location Service and Globus Metadata Catalog Service) –Storage management/brokering –Compute management/brokering (e.g., Condor from University of Wisconsin, Madison) –Monitoring/auditing service

14 C/F Grid Architecture IV Collective 2 Layer: Services for Coordinating Multiple Resources that are Specific to an Application Domain or a Virtual Organization –Request Interpretation and Planning Services (e.g., Globus Chimera and Pegasus for Physics Applications and Condor DAGMan) –Workflow management service (e.g., Globus Pegasus) –Application-Specific Data Discovery Services (e.g., Earth Systems Grid Metadata Catalog) –Community Authorization service (e.g., Globus CAS) –Consistency Services with varying levels of consistency, including data versioning, subscription, distributed file systems or distributed databases

15 Composing These Services To Provide Higher-Level Functionality For example, a Grid File System might compose: –Fabric layer: storage components, compute elements –Connectivity layer: security and communication protocols –Resource layer: data access protocols or services and storage resource management –Collective layers: transport and discovery services, collective storage management, monitoring and auditing, authorization and consistency services

16 Peer to Peer Network [Diagram: six peers, each combining User, Resource, Service and Routing roles] Peers are Jacks of all Trades linked to “all” peers in community. Typically Integrated Clients, Servers and Resources

17 Peer to Peer (Hybrid) Grid [Diagram: six peers, each combining User, Resource, Service and Routing roles, connected to Services and NB Routing] Dynamic Message or Event Routing from Peers or Servers

18 Peer to Peer Grid: A democratic organization [Diagram labels: Database; Peers; Peer to Peer Grid; User Facing Web Service Interfaces; Service Facing Web Service Interfaces; Event/Message Brokers] Chapter 18 and 19 Grid Book

19 Entropia: Desktop Grid Entropia (chapter 12 of book), United Devices, Parabon, SETI@Home etc. have demonstrated “Internet Computing” or Desktop Grid very successfully. Used to be called peer-to-peer computing but that fell out of favor due to Napster’s bad name. Condor has similar types of utility but Entropia is optimized for –Huge number of clients –Providing a secure “sandbox” for the application to run in, which guarantees that the application will not harm the client

20 Scaling of Entropia Application

21 Entropia Architecture Application Execution on the Entropia System. The end-user submits a computation to Job Management (1). The Job Manager breaks up the computation into many independent “subjobs” (2) and submits the subjobs to the resource scheduler. In the meantime, the available resources of a client are periodically reported to the Node Manager (a), which informs the Subjob Scheduler (b) using the resource descriptions. The Subjob Scheduler matches the computation needs with the available resources (3) and schedules the computation to be executed by the clients (4,5,6). Results of the computation are sent to the Job Manager (7), put together, and handed back to the end-user (8).
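
The submit, split, schedule, execute and gather flow described in this slide can be sketched roughly as below. This is not Entropia code: the function names are hypothetical, the round-robin assignment stands in for the real matching of computation needs to reported resources, and the "clients" run in-process rather than in a sandbox on remote desktops.

```python
def split_job(work_items, subjob_size):
    """Step 2: break a computation into independent subjobs."""
    if subjob_size < 1:
        raise ValueError("subjob_size must be at least 1")
    return [work_items[i:i + subjob_size]
            for i in range(0, len(work_items), subjob_size)]


def schedule(subjobs, clients):
    """Step 3 (simplified): assign subjob indices to clients round-robin.

    A real scheduler would match on reported resource descriptions;
    round-robin is an assumption made to keep the sketch short.
    """
    if not clients:
        raise ValueError("no clients available")
    assignment = {client: [] for client in clients}
    for index in range(len(subjobs)):
        assignment[clients[index % len(clients)]].append(index)
    return assignment


def run_job(work_items, subjob_size, clients, compute):
    """Steps 1-8: submit, split, schedule, execute, and reassemble results."""
    subjobs = split_job(work_items, subjob_size)
    assignment = schedule(subjobs, clients)
    partial = {}
    for client, indices in assignment.items():
        for index in indices:
            # Steps 4-6: each client executes the subjobs assigned to it.
            partial[index] = [compute(item) for item in subjobs[index]]
    # Steps 7-8: the job manager puts results together in the original order.
    results = []
    for index in range(len(subjobs)):
        results.extend(partial[index])
    return results
```

Because the subjobs are independent, the order in which clients finish does not matter; results are reassembled by subjob index.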

22 Information Grids I Actually nearly all Grids consist of composing access to data with processing of that data in some computer program. In Compute/File Grids (Data Grids for Globus), one naturally allows database access from programs, although in some cases the dominant access is to files. In Information Grids, we consider access to databases but of course view files as a special case of databases. The real difference is which tier we are looking at: –Compute/File Grids are looking at “backend resources” –Information Grids are looking at the “middle tier”, because typically data volumes are not large enough to stress typical middle-tier mechanisms

23 Information Grids II Should use Middle tier where possible and adopt hybrid model with control always in middle tier and using backend only where needed –This would require reworking a lot of tools e.g. Condor should schedule services not jobs Most programming models either specify “program view” or “service view” and do not separate –Developments like GT3 will allow changes but it will take a long time before key tools are implemented in hybrid mode Note Bioinformatics and many other Information Grids only require service view –These applications have in UK e-Science started with “Web Service” and not “Globus” view

24 [Diagram labels: Raw (HPC) Resources; Middleware; Database; Portal Services; System Services; Application Service; System Services; Grid Computing Environments; User Services; “Core” Grid; Service View; Program View]

25 OGSA-DAI (Malcolm Atkinson, Edinburgh) UK e-Science Grid Core Programme: Development of Data Access and Integration Services for OGSA http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI –Access to XML Databases –Access to Relational Databases –Distributed Query Processing –XML Schema Support for e-Science

26 DAI Key Services
GridDataService (GDS): Access to data & DB operations
GridDataServiceFactory (GDSF): Makes GDS & GDSF
GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
GridDataTranslationService (GDTS): Translates or Transforms Data
GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport; Relational & XML models supported; Role-based Authorisation; Binary structured files (later)

27 1a. Request to Registry for sources of data about “x” 1b. Registry responds with Factory handle 2a. Request to Factory for access to database 2b. Factory creates GridDataService to manage access 2c. Factory returns handle of GDS to client 3a. Client queries GDS with XPath, SQL, etc. 3b. GDS interacts with database 3c. Results of query returned to client as XML [Diagram labels: Client; Registry; Factory; Grid Data Service; XML / Relational database; SOAP/HTTP; service creation; API interactions]
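
The registry, factory and data-service steps (1a to 3c) follow a common lookup-then-create pattern, sketched below. This is not the OGSA-DAI API, which was a set of Java grid services reached over SOAP/HTTP. The Python classes here are hypothetical stand-ins, and an in-memory SQLite database with made-up table contents replaces the real backend.

```python
import sqlite3
from xml.etree import ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns results as XML."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql, params=()):
        cursor = self._conn.execute(sql, params)          # step 3b
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")      # step 3c


class GridDataServiceFactory:
    """Hypothetical stand-in for a GDSF: creates one GDS per request."""

    def __init__(self, connect):
        self._connect = connect

    def create_service(self):
        return GridDataService(self._connect())           # steps 2b, 2c


class Registry:
    """Hypothetical stand-in for a GDSR: maps topics to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):
        return self._factories[topic]                     # steps 1a, 1b


def demo_connection():
    # Illustrative data only; the table and values are invented.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, energy REAL)")
    conn.execute("INSERT INTO events VALUES (1, 91.2)")
    return conn
```

A client would call `find_factory`, then `create_service`, then `perform`, mirroring steps 1, 2 and 3 of the slide.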

28 Composing Components [Diagram labels: OGSA-DAI Component; Data Transport] Not clear whether this is part of OGSA-DAI or should be composed using “general workflow”

29 [Diagram labels: Client; Grid Data Service; Relational database; Directory / File system; XML database] Interface transparency: one GDS supports multiple database types

30 Software Availability Available now  Phase 1 prototype of GDS, GDSF & GDSR for XML Java implementations for the axis/tomcat platform and the Xindice database Globus-2 Relational database support  BinX Schema v0.2 www.epcc.ed.ac.uk/gridserve/WP5 An XML Schema for describing the structure of binary datafiles – the power of XML for terabyte files Software Q1 2003  Reference implementation 1  Access & Update XML databases Relational databases To be released as Basic Services in Globus Toolkit 3 umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI/products

31 Advanced Components DB Consumer GDS Client GDT Translation GDS:PerformScript

32 Composed Components

33 Futures of OGSA-DAI Allow querying of distributed databases – this is using Grid to federate multiple databases Grid is “intrinsically” federation technology – need to mimic classic database federation ideas in a Grid language Form composite Schema from integration of those of individual databases (OGSA-DAI allows you to query each database web service to find schema) Decide how to deal with very important case where user view is a complex filter run on database query Hardest when need to dynamically assign resource to perform filter Could view as a “simulation Web Service” outside OGSA-DAI DB Filter WSDL Of Filter

34 “The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be processed by automated tools as well as people” From the W3C Semantic Web Activity statement Semantic Grid starts with the Semantic Web which is a “dream” and a project of W3C Digital Brilliance is phase transition coming from “collective effect” in the Grid Spin Glass. The Hosting environment is the “Ether” The Resources are the Spins The forces are the meta-data linking resources Knowledge (The Higgs) will emerge when we get enough meta-data to force phase transition

35 Resource Description Framework

36 Classical Web Semantic Web Richer semantics

37 OWL Web Ontology Language “The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.” The OWL Guide

38 SW Tools Good Tools for recording meta-data (OWL) but not so advanced in looking at their implications

39 Classical Web Classical Grid More computation Semantic Web requires a metadata-enabled Web Where will the metadata come from? How about from the linked rich resources of a virtual organization? A Grid …….

40 Astronomy Sky Survey Data Grid: Grid is metadata based middleware [Diagram labels: Compute Resources; Catalogs; Data Archives; Information Discovery; Metadata delivery; Data Discovery; Data Delivery; Catalog Mediator; Data mediator; Portals and Workbenches; Bulk Data Analysis; Catalog Analysis; Metadata View; Data View; Grid; Security; Caching; Replication; Backup; Scheduling; Knowledge & Resource Management; Standard Metadata format, Data model, Wire format; Catalog/Image Specific Access; Standard APIs and Protocols; Concept space; Derived Collections]

41 An Example of RDF and Dublin Core D-Lib Program - Research in Digital Libraries The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing. Corporation For National Research Initiatives 1995-01-07 – Research; statistical methods Education, research, related topics Library use Studies World Wide Web Home Page text/html en
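
The RDF markup of this example did not survive in the transcript; only the property values remain. As a rough sketch, the same description can be written as subject, predicate, object triples. The Dublin Core element namespace is the standard one, but the resource URI and the assignment of each value to a particular Dublin Core element are assumptions inferred from the order and content of the values.

```python
# Standard Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Assumed resource URI: the subject was lost from the transcript.
RESOURCE = "http://www.dlib.org"

# Each value is from the slide; the element it is attached to is an assumption.
TRIPLES = [
    (RESOURCE, DC + "title", "D-Lib Program - Research in Digital Libraries"),
    (RESOURCE, DC + "description",
     "The D-Lib program supports the community of people with research "
     "interests in digital libraries and electronic publishing."),
    (RESOURCE, DC + "publisher", "Corporation For National Research Initiatives"),
    (RESOURCE, DC + "date", "1995-01-07"),
    (RESOURCE, DC + "subject", "Research; statistical methods"),
    (RESOURCE, DC + "subject", "Education, research, related topics"),
    (RESOURCE, DC + "subject", "Library use Studies"),
    (RESOURCE, DC + "type", "World Wide Web Home Page"),
    (RESOURCE, DC + "format", "text/html"),
    (RESOURCE, DC + "language", "en"),
]


def objects(triples, subject, predicate):
    """Return every object attached to (subject, predicate) in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

For instance, `objects(TRIPLES, RESOURCE, DC + "subject")` collects the three subject entries, which is the kind of simple graph lookup that metadata-driven discovery builds on.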

42 Annotations of results, workflows and database entries could be represented by RDF graphs using controlled vocabularies described in RDF Schema and OWL Personal notes can be XML documents annotated with metadata or RDF graphs linked to results or experimental plans Exporting results as RDF makes them available to be reasoned over RDF graphs can be the “glue” that associates all the components (literature, notes, code, databases, intermediate results, sketches, images, workflows, the person doing the experiment, the lab they are in, the final paper) The provenance trails that keep a record of how a collection of services were orchestrated so they can be replicated or replayed, or act as evidence For example…

43 Represent the syntactic data types of e-Science objects using XML Schema data types Represent domain ontologies for the semantic mediation between database schema, an application’s inputs and outputs, and workflow work items Represent domain ontologies and rules for parameters of machines or algorithms to reason over allowed configurations Use reasoning over execution plans, workflows and other combinations of services to ensure the semantic validity of the composition Use RDF as a common data model for merging results drawn from different resources or instruments Capture the structure of messages that are exchanged between components More meta-data …

44 At the data/computation layer: classification of computational and data resources, performance metrics, job control, management of physical and logical resources At the information layer: schema integration, workflow descriptions, provenance trail At the knowledge layer: problem solving selection, intelligent portals Governance of the Grid, for example access rights to databases, personal profiles and security groupings Charging infrastructure, computational economy, support for negotiation; e.g. through auction model And more meta-data …

45 Classical Web Classical Grid Semantic Web Richer semantics More computation Semantic Grid Source: Norman Paton http://www.semanticgrid.org

46 Summary of Grid Types Compute/File Grid: The “Linux workstation view of distributed system”; needs planning and scheduling of 10,000’s of jobs and efficient movement of data to processors. Desktop Grid: as above but uses huge numbers of “foreign” compute resources. Information Grids: Web service access to meta-data rich data repositories. Hybrid (complexity) Grids: Combination of Information and Compute/File Grids. Peer-to-peer Grid: Unstructured general purpose access to other style grids. Semantic Grid: Enables knowledge discovery in all Grids.

