A Grid Computing Use case Datagrid Jean-Marc Pierson.

1 A Grid Computing Use case Datagrid Jean-Marc Pierson

2 DataGrid: a European effort
- 9.8 million euros
- more than 200 researchers involved
- CERN, ESA, CNRS, INFN…
- objective: share huge amounts of distributed data over the network infrastructure
- developed over the Globus Toolkit
(most figures and material from www.eu-datagrid.org)

3

4 Application domains
- High Energy Physics (HEP), led by CERN, for LHC data
- Biology and Medical Image processing, led by CNRS (France)
- Earth Observations (EO), led by the European Space Agency

5 Data Grid middleware
Five work packages:
- Workload Scheduling and Management
- Data Management
- Grid Monitoring Services
- Mass Storage Management
- Local Fabric Management

6 Workload Scheduling and Management (1)
- the problems:
  - dynamic relocation of data
  - very large numbers of schedulable components in the system (computers and files)
  - large number of simultaneous users submitting work to the system
  - different access policies applied at different sites and in different countries

7 Workload Scheduling and Management (2)
- a need for:
  - planning job decomposition
  - and planning task distribution
- planning based on knowledge of the availability and proximity of computational capacity and the required data
- a need for cost estimation tools (delays, data migration, caching...)
- extension of job description languages (JSL) to express data dependencies
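The planning idea on this slide can be illustrated with a minimal sketch, which is not the DataGrid scheduler. It assumes a deliberately simple cost model (queue delay plus the time to migrate missing input files over a single fixed bandwidth). The names Site, estimate_cost_s and choose_site are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical description of a computing site (illustration only)."""
    name: str
    free_cpus: int
    queue_delay_s: float                       # estimated wait before the job starts
    local_files: set = field(default_factory=set)


def estimate_cost_s(site, required_files, file_sizes_mb, bandwidth_mb_s):
    """Estimated delay in seconds: queue wait plus migration of missing files.

    Assumption: every missing file is transferred at the same bandwidth.
    """
    if site.free_cpus <= 0:
        return math.inf                        # no capacity available at this site
    missing = set(required_files) - site.local_files
    transfer_s = sum(file_sizes_mb[f] for f in missing) / bandwidth_mb_s
    return site.queue_delay_s + transfer_s


def choose_site(sites, required_files, file_sizes_mb, bandwidth_mb_s):
    """Pick the site with the lowest estimated cost."""
    return min(
        sites,
        key=lambda s: estimate_cost_s(s, required_files, file_sizes_mb, bandwidth_mb_s),
    )
```

With such a model, a busy site that already holds the data can win over an idle site once the input files are large enough for the transfer time to dominate the queue delay.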

8 Data management
- goals:
  - to permit secure access to massive amounts of data in a universal global name space
  - to move and replicate data at high speed from one geographical site to another
  - to manage the synchronisation of remote data copies
- tools:
  - dynamic automated wide-area data caching and distribution
  - generic interface to different mass storage systems
  - performance and reliability issues associated with the use of tertiary storage will be addressed
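One way to picture a global name space with replicated data is a catalogue that maps a logical file name to the sites holding a copy. The sketch below is only an illustration under that assumption. ReplicaCatalog, its methods, the "lfn:" naming and the distance function are all hypothetical, and it is not the DataGrid or Globus API.

```python
class ReplicaCatalog:
    """Hypothetical catalogue: logical file name -> sites holding a replica."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, site):
        """Record that `site` holds a copy of `logical_name`."""
        self._replicas.setdefault(logical_name, set()).add(site)

    def locate(self, logical_name):
        """Return all sites holding a replica, in a stable (sorted) order."""
        return sorted(self._replicas.get(logical_name, ()))

    def best_replica(self, logical_name, client_site, distance):
        """Return the replica site closest to the client.

        `distance(a, b)` is supplied by the caller (for example a measured
        network cost); ties are broken by site name.
        """
        sites = self.locate(logical_name)
        if not sites:
            raise KeyError(logical_name)
        return min(sites, key=lambda s: distance(client_site, s))
```

A client then only needs the logical name. Which physical copy is read, and whether a new replica is created nearby, becomes a decision of the middleware.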

9 Monitoring the datagrid
- goals:
  - to enable transparent monitoring of the use of distributed resources at a large scale
  - to assess finely the interplay between computer fabrics, networking and mass storage
- tools:
  - local monitoring of other middlewares
  - local monitoring of the applications themselves
  - developing short-term and long-term monitoring information (real time + archiving)
  - developing effective means of visual presentation of the multivariate data

10 Local fabric management
- goals:
  - information publication concerning resource availability and performance
  - mapping of authentication and resource allocation mechanisms to the local environment
  - self healing: dynamic configuration changes and error recovery strategies
- difficulty to scale well: tens of thousands of components
- tools:
  - automatic fault detection and isolation, automatic reconfiguration of the fabric and re-running of the tasks
  - automatic incorporation of new or updated components
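The "detect, isolate, re-run" loop can be sketched in a few lines. This is a toy illustration, not the DataGrid fabric manager. It assumes a task is a callable that raises RuntimeError when the node it runs on is faulty, and run_with_recovery is a hypothetical name.

```python
def run_with_recovery(task, nodes, max_attempts=3):
    """Run `task(node)` and re-run it elsewhere when a node fails.

    Returns (result, isolated_nodes). A node that raises RuntimeError is
    treated as faulty and removed from the pool (the "isolation" step).
    """
    healthy = list(nodes)
    isolated = []
    for _ in range(max_attempts):
        if not healthy:
            break
        node = healthy[0]
        try:
            return task(node), isolated
        except RuntimeError:
            # Fault detected: isolate the node, then retry on the next one.
            healthy.remove(node)
            isolated.append(node)
    raise RuntimeError("task failed on every node that was tried")
```

A real fabric would also have to decide when an isolated node may rejoin the pool. That step is left out here.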

11 Mass Storage Management
- goals:
  - to introduce standards for handling LHC data so that they can be exchanged
  - to spread the work to other application fields
- tools:
  - uniform interface to the very different systems used at different sites
  - provide interchange of data and metadata between sites
  - develop appropriate resource allocation and information publishing functions
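A uniform interface over different storage systems is commonly expressed as an abstract class with one implementation per backend. The sketch below assumes two toy in-memory backends. MassStorage, DiskStore, TapeStore and copy_between_sites are hypothetical names and do not come from the DataGrid software.

```python
from abc import ABC, abstractmethod


class MassStorage(ABC):
    """Hypothetical uniform interface that every site-specific system implements."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class DiskStore(MassStorage):
    """Toy disk backend: files are directly readable."""

    def __init__(self):
        self._files = {}

    def put(self, name, data):
        self._files[name] = data

    def get(self, name):
        return self._files[name]


class TapeStore(MassStorage):
    """Toy tape backend: a file is staged to a disk cache on first read."""

    def __init__(self):
        self._tape = {}
        self._staged = {}

    def put(self, name, data):
        self._tape[name] = data
        self._staged.pop(name, None)   # drop any stale staged copy

    def get(self, name):
        if name not in self._staged:
            self._staged[name] = self._tape[name]   # simulated staging step
        return self._staged[name]


def copy_between_sites(src: MassStorage, dst: MassStorage, name: str) -> None:
    """Interchange data between sites without knowing either backend type."""
    dst.put(name, src.get(name))
```

The copy function depends only on the interface, so a site can replace its storage system without changing the code that exchanges data with it.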

12 Conclusion
- Globus, and all its services, had to be extended!
- Datagrid: a first effort for handling huge amounts of data
- Collaborative work!
- Some key issues are not really treated:
  - data security is basic
  - cache management does not use data semantics
- Useful for raw data-intensive computation and management, not for semantically strong data: the Medigrid project!

