DKRZ German Climate Computing Center Stephan Kindermann Distributed Data Handling Infrastructures in Climatology and “the Grid”


1 DKRZ German Climate Computing Center Stephan Kindermann Distributed Data Handling Infrastructures in Climatology and “the Grid”

2 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Talk Context: From climatology to grid infrastructures Climatology: Climatology is the study of climate, scientifically defined as weather conditions averaged over a period of time and is a branch of the atmospheric sciences (Wikipedia) We concentrate on the part of climatology dealing with complex global climate models and especially on the aspect of data handling:  Climatology  Global Climate Models  HPC computers (Intro part of talk)  huge amount of model data  data handling infrastructure  grid (Main focus of talk)

3 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Grid infrastructures: from prototypes towards a sustainable infrastructure. Access to distributed heterogeneous data repositories: a national grid project (C3Grid) and a prototype C3Grid/EGEE integration. An emerging worldwide infrastructure to support intercomparison and management of climate model data

4 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Climate Models and HPC

5 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Motivation: Unprecedented environmental change is indisputable – The red areas on these two images show the expansion of seasonal melting of the Greenland ice sheet from 1992 to 2002. – The yellow line shows that temperature increased by 1 °C from 1900 to 2000

6 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 (One) Question: Environmental change because of anthropogenic forcings?! -> Models to understand the Earth system are needed!

7 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 "Science may be described as the art of oversimplification: the art of discerning what we may with advantage omit." [Karl Popper, “The Open Universe”, Hutchinson, London (1982)] But: the Earth system is complex, with many highly coupled subsystems (and often poorly understood coupling effects) -> the need for (complex) coupled General Circulation Models (GCMs) requiring tightly coupled HPC resources

8 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Complex Earth System Models: Components

9 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Example: The COSMOS Earth System Model. Atmosphere GCM: dynamics + physics (ECHAM5), aerosols (HAM, M7). Ocean + ice GCM: dynamics + physics (MPI-OM), biogeochemistry (HAMOCC/DMS). Land model: hydrology (HD), vegetation (JSBACH). COSMOS: Community Earth System Model Initiative (http://cosmos.enes.org)

10 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009  The complexity of models is increasing

11 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Increasing Complexity, increasing computing demands

12 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Complexity is just one dimension..! Disagreement about what terms mean: What is a model? What is a component? What is a coupler? What is a code base?

13 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Thus the need for dedicated HPC resources...

14 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The DKRZ: A national facility for the climate community (providing compute + data services)

15 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The German Climate Computing Centre: DKRZ. DKRZ is unique in Europe as a national service in its combination of HPC, data services, and applications consulting. Non-profit organization (GmbH) with 4 shareholders: MPG (6/11), HH/UniHH (3/11), GKSS (1/11), AWI (1/11); investment costs covered by BMBF (until now). Hamburg „centre of excellence“ for climate-related studies

16 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 A brand new building..

17 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 .. for a brand new supercomputer: 252 x 32-core IBM System p575 Power6 nodes; 8 x 288-port QLogic 4x DDR InfiniBand switches; Power6 cluster and HPSS mover nodes connected to the same InfiniBand switches. Storage capacity: 10 PB/year. Archive capacity: 60 PB. Transfer rates (proposed): 5 GB/s peak, 3 GB/s sustained. Data migration from GPFS to HPSS

18 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Compute power for the next generation of climate model runs: Linpack = 115.9 TFLOPS (76.4% of the 152 TFLOPS peak); 252 nodes = 8064 cores. Aggregate transfer rate*: write 29 GB/s, read 32 GB/s. Single-stream transfer rate: write 1.3 GB/s, read 1.2 GB/s. Metadata operations: 10 k/s – 55 k/s. (* with 12 x p575 I/O servers)

19 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Fine, but ….. centralized HPC centers.... centralized data centers.... And where is the „Grid“ perspective?? [Ma:07]

20 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The climate model data handling problem: modeling centers produce an exponentially growing amount of data, stored in distributed data centers; plus the integration of model data and observation data

21 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Expected growth rate for data archive @ DKRZ We are forced to limit data archiving to ~10 PB/year

22 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Data management for the IPCC Assessment Report, AR4 vs. AR5. AR4: data volume 10s of terabytes (10^12 bytes), downloads ~500 GB/day; 25 models; metadata: CF-1 + IPCC-specific; user community: thousands of users (WG1, domain knowledge). AR5: data volume 1-10 petabytes (10^15 bytes), downloads 10s of TB/day; ~35 models with increased resolution, more experiments, and increased complexity (e.g. biogeochemistry); metadata: CF-1 + IPCC-specific, a richer set of search criteria, model configuration, grid specification from CF (support for native grids); user community: 10s of thousands of users, and a wider range of user groups will require better descriptions of data and attention to ease of use

23 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Network traffic, climate and physics data, and network capacity (foil from ESG-CET). All three data series are normalized to "1" at Jan. 1990. Ignore the units of the quantities being graphed; they are normalized to 1 in 1990. Just look at the long-term trends: all of the "ground truth" measures are growing significantly faster than the projected ESnet capacity

24 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The problem: accessing data stored at distributed data centers all over the world -> move computation to the data -> infrastructural (grid) support components needed

25 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 A typical scientific workflow: (1) find & select in distributed climate data (model data, observation data, scenario data; data description), (2) collect & prepare an analysis dataset, (3) analyse into a result dataset, (4) visualize. E-infrastructure components are needed to support steps 1-4. Data volumes along the "humidity flux" workflow example: several PB, ~3.1 TB (300-500 files), ~10.3 GB (28 files), ~76 MB, ~6 MB, ~66 KB
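To make steps (2)-(4) concrete, here is a minimal Python sketch (not from the talk) of a qflux-style analysis: subset the staged data in time and space, then compute the vertically integrated humidity transport. The file pattern, the CMIP-style variable names ("hus" for specific humidity, "ua" for eastward wind) and the "plev" pressure coordinate in Pa are assumptions for illustration only.

```python
# Hypothetical sketch of a "humidity flux" (qflux) analysis step.
# Assumes CMIP-style variable names (hus, ua), a pressure coordinate "plev" in Pa,
# and placeholder file paths in the collaborative workspace.
import xarray as xr

G = 9.81  # gravitational acceleration (m s^-2)

# (2) Collect & prepare: open the staged files and select a temporal/spatial subset
ds = xr.open_mfdataset("workspace/staged_*.nc", combine="by_coords")
subset = ds.sel(time=slice("1990-01", "1999-12"),
                lat=slice(30, 70), lon=slice(-30, 40))

# (3) Analyse: vertically integrated zonal moisture transport
#     qu = (1/g) * integral( hus * ua dp ) over the selected pressure levels
flux = subset["hus"] * subset["ua"]
qflux = flux.integrate("plev") / G          # kg m^-1 s^-1
qflux_mean = qflux.mean("time")             # climatological mean field

# (4) Visualize / store the (small) result dataset
qflux_mean.to_netcdf("workspace/qflux_mean.nc")
```

The point is the data reduction: the terabytes staged in step (2) shrink to a megabyte-scale result that can be visualized on a desktop.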

26 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 E-Science Infrastructures for Climate Data Handling (1) A National Climate Community Grid: The German Collaborative Climate Community Data and Processing Grid (C3Grid) Project

27 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 C3Grid: overview (architecture diagram). C3Grid data providers: World Data Centers (Climate, Mare, RSAT), research institutes (PIK, GKSS, AWI, MPI-M, IFM-Geomar, DKRZ, DWD), universities (FU Berlin, Uni Köln). The C3Grid data and job management middleware sits on top of D-Grid services (SRM, dCache, ..); providers are attached via a data access interface and a grid data/job interface; ISO discovery metadata feeds an ISO 19139 discovery catalog; result data products plus metadata are delivered into a collaborative grid workspace reached through the portal. (A) marks finding data, (B) marks accessing data

28 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 (A) Finding data: the C3Grid metadata description is based on ISO 19139. Description at the aggregate level (e.g. experiment): aggregate extent description with multiple verticalExtent sections; sub-selection happens in the data request
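As a rough illustration (not taken from the project itself), the verticalExtent sections of such an ISO 19139 record could be read like this in Python; the namespace URIs are the standard ISO ones, while the input file name is a placeholder.

```python
# Minimal sketch: extract verticalExtent sections from an ISO 19139 record.
# Element paths follow the standard gmd/gco schemas; the input file is hypothetical.
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

tree = ET.parse("c3grid_dataset_metadata.xml")

for vext in tree.iter("{http://www.isotc211.org/2005/gmd}EX_VerticalExtent"):
    vmin = vext.find("gmd:minimumValue/gco:Real", NS)
    vmax = vext.find("gmd:maximumValue/gco:Real", NS)
    if vmin is not None and vmax is not None:
        print(f"vertical extent: {vmin.text} .. {vmax.text}")
```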

29 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 (A) Finding Data: The C3Grid Portal

30 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 (B) Accessing Data: Portal

31 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 (B) Accessing data, server side: a generic data request web service interface in front of provider-specific data access interfaces (primary data, databases, pre-processing on local compute resources, staging into the workspace for grid-based data management). It supports geographical + vertical + temporal + content + file format selection, selection of preprocessing tools, and metadata generation; delivered formats include netCDF, GRIB, HDF, XML, .. Implementation examples: DB + archive wrapper (DKRZ, M&D), data warehouse (Pangaea), OGSA-DAI + DB (DWD), .... Initial implementation: WSDL web service; next: WSRF web service
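A data request against such a generic interface essentially bundles the selection dimensions named on the slide. The Python dataclass below is only a hedged sketch of what such a request might carry; the field names, identifiers and values are invented for illustration, not the actual C3Grid WSDL.

```python
# Hypothetical sketch of a generic C3Grid-style data request.
# Field names are illustrative; the real interface is a WSDL/WSRF web service.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataRequest:
    dataset_id: str                      # aggregate-level identifier from discovery metadata
    variables: List[str]                 # content selection, e.g. ["hus", "ua"]
    bbox: tuple                          # geographical selection (west, south, east, north)
    vertical_levels: tuple               # vertical selection (bottom, top)
    time_range: tuple                    # temporal selection (start, end) as ISO dates
    preprocessing: List[str] = field(default_factory=list)  # e.g. ["monthly_mean"]
    output_format: str = "netCDF"        # netCDF, GRIB, HDF, XML, ...

req = DataRequest(
    dataset_id="wdcc:experiment:echam5_sresa2_run1",   # invented identifier
    variables=["hus", "ua"],
    bbox=(-30.0, 30.0, 40.0, 70.0),
    vertical_levels=(85000.0, 30000.0),
    time_range=("1990-01-01", "1999-12-31"),
    preprocessing=["monthly_mean"],
)
# The serialized request would be sent to the provider's generic data request
# web service, which stages the result plus generated metadata into the
# collaborative grid workspace.
print(req)
```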

32 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Workflow processing: the portal hands a JSDL-based workflow description to the workflow scheduler, which executes the steps on compute resources via GT4 WS-GRAM interfaces against the workspace and local resources. Pre-installed software packages are managed with the „modules“ system, and the module information is published to the Grid Resource Information Service (MDS-based); the scheduler controls execution, with decisions based e.g. on module info and data availability. An initial set of fixed workflows is integrated in the portal. Open issues: workflow composition support (interdependency between processing and data); user-defined processing (debugging and substantial user support needed; security!)
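For orientation, a JSDL job description for a single workflow step might look roughly like the output of the following sketch. The namespaces are the published JSDL ones, but the element set is deliberately minimal and the job name, executable and argument are invented; this is not an actual C3Grid workflow document.

```python
# Rough sketch of a JSDL-style job description as used for workflow steps.
# Only a minimal subset of JSDL elements is shown; executable and arguments
# are purely illustrative.
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)

job = ET.Element(f"{{{JSDL}}}JobDefinition")
desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")

ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
ET.SubElement(ident, f"{{{JSDL}}}JobName").text = "qflux-analysis"   # invented name

app = ET.SubElement(desc, f"{{{JSDL}}}Application")
posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
ET.SubElement(posix, f"{{{POSIX}}}Executable").text = "qflux.sh"     # invented executable
ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "workspace/analysis_dataset.nc"

print(ET.tostring(job, encoding="unicode"))
```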

33 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The C3Grid architecture (diagram): the distributed grid infrastructure consists of the C3Grid portal, DIS, DMS, the workflow scheduler, and the resource information service (RIS); the providers' local resources and interfaces expose primary data and metadata, databases, pre-processing, compute resources, and the workspace. C3Grid data / compute providers: World Data Centers (Climate, Mare, RSAT), research institutes (PIK, GKSS, AWI, MPI-M, IFM-Geomar, DKRZ, DWD), universities (FU Berlin, Uni Köln)

34 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 C3Grid Security Infrastructure: Shibboleth + GSI + VOMS / SAML attributes embedded in grid certificates … I omit details in this talk..

35 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 E-Science Infrastructures for Climate Data Handling (2) Climate data handling in an international Grid infrastructure: The C3Grid / EGEE Prototype

36 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The same workflow, find & select (1), collect & prepare (2), analyse (3), visualize (4), spanning the World Data Centers, AWI, GKSS, … and DKRZ. C3Grid contributes community-specific tools and agreements: standardized data description, uniform data access with preprocessing functionality, grid-based data delivery. EGEE contributes an approved international grid infrastructure: mature middleware, secure and consistent data management, an established 24/7 support infrastructure. The two are coupled through the C3Grid middleware

37 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Bridging EGEE and C3 (architecture diagram): the German climate data providers (WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS) publish ISO 19115/19139 metadata, which is harvested via OAI-PMH into the C3 web portal's Lucene index and, via a second OAI-PMH server, into the AMGA metadata catalog on the EGEE side. Data resources are reached through the C3Grid data interface (web service) and staged into the climate data workspace; the EGEE side provides the UI, CE/WN, SE, and the LFC catalog
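The metadata harvesting in this bridge follows the standard OAI-PMH protocol. Below is a minimal, hedged Python sketch of such a harvest; the endpoint URL and the metadataPrefix value are assumptions, while the verbs, parameters and response structure are part of OAI-PMH 2.0.

```python
# Minimal OAI-PMH harvester sketch (ListRecords with resumption tokens).
# The base URL and metadataPrefix are placeholders; verbs, parameters and the
# response namespace are defined by the OAI-PMH 2.0 protocol.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://c3grid.example.org/oai"          # hypothetical endpoint

def harvest(base_url, metadata_prefix="iso19139"):   # prefix value assumed
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            root = ET.parse(resp).getroot()
        for record in root.iter(OAI + "record"):
            yield record                              # ISO 19139 payload sits in <metadata>
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# for rec in harvest(BASE_URL):
#     ...  # e.g. feed the record into the Lucene index or the AMGA catalog
```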

38 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Finding Data

39 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Accessing data (sequence): (1) find & select via the web portal (C3 Lucene index); (2) collect & prepare: (a) request through the web service interface of the C3Grid data interface, (b) retrieve from the data resource (JDBC or archive), (c) stage & provide in the climate data workspace, (d) notify, (e) request from the EGEE UI (web service), (f) transfer & register on the SE/LFC catalog with lcg-tools, (g) register in the AMGA metadata catalog (Java API); metadata is published as ISO 19115/19139

40 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Triggering the qflux workflow (sequence): (a) request via web service, (b) submit the qflux job with gLite from the EGEE UI to CE/WN, (c) retrieve the input data from the SE/LFC catalog with lcg-tools, (3) analyse on the worker node, (d) update the AMGA metadata catalog (Java API), (e) return the graphic, (4) visualize in the web portal; results are published (ISO 19115/19139) and harvested (OAI-PMH)

41 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Talk overview: The context: climate models and HPC; a national climate research facility: the DKRZ. Climate data handling e-/grid-infrastructures. Bridging heterogeneity: access to distributed data repositories; a national grid project: C3Grid; prototype C3Grid/EGEE integration. An emerging infrastructure to support intercomparison and management of climate model data (in the context of CMIP5 and IPCC AR5)

42 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Motivation (1): Different models, different results. Figure: change in mean annual temperature (°C) under SRES A2 as simulated by the CCMa, ECHAM, GFDL, and HADCM models

43 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Motivation (2): Complexity adds uncertainty and new data intercomparison requirements! (Figure: Friedlingstein et al., 2006.) „Carbon cycle feedbacks are likely to play a critical role in determining the atmospheric concentration of CO2 over the coming centuries (Friedlingstein et al. 2006; Denman et al. 2007; Meehl et al. 2007)” – taken from Climate-Carbon Cycle Feedbacks: The Implications for Australian Climate Policy, Andrew Macintosh and Oliver Woldring, CCLP Working Paper Series -> Coupled Carbon Cycle Climate Model Intercomparison Project

44 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The Climate Model Intercomparison Project (CMIP): There are different, highly complex global coupled atmosphere-ocean general circulation models (`climate models'). They provide different results over the next decades and longer timescales, so intercomparisons are necessary to discover why and where different models give different output, or to detect `consensus' aspects. The World Climate Research Programme's Working Group on Coupled Modelling (WGCM) proposed and developed CMIP (now in phase 5). CMIP5 will provide the basis for the next Intergovernmental Panel on Climate Change Assessment Report (AR5), which is scheduled for publication in 2013

45 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Data management for the IPCC Assessment Report, AR4 vs. AR5 (repeated from slide 22). AR4: data volume 10s of terabytes (10^12 bytes), downloads ~500 GB/day; 25 models; metadata: CF-1 + IPCC-specific; user community: thousands of users (WG1, domain knowledge). AR5: data volume 1-10 petabytes (10^15 bytes), downloads 10s of TB/day; ~35 models with increased resolution, more experiments, and increased complexity (e.g. biogeochemistry); metadata: CF-1 + IPCC-specific, a richer set of search criteria, model configuration, grid specification from CF (support for native grids); user community: 10s of thousands of users, and a wider range of user groups will require better descriptions of data and attention to ease of use

46 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 An emerging worldwide infrastructure for climate model data intercomparison. The scene: CMIP5 / IPCC AR5; ESG-CET (Earth System Grid – Center for Enabling Technologies); the IS-ENES and Metafor FP7 programs

47 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The CMIP5 federated architecture (diagram: IPCC core gateways (Tier 1) in GB (BADC), US (PCMDI), DE (WDCC/DKRZ), plus data nodes). Data nodes hold data from individual modeling groups. Gateways provide search and access services to the data and are often co-located with (big) data nodes; roadmap: Curator + ESG in the US, Metafor + IS-ENES in Europe. Core nodes provide the CMIP5-defined core data on rotating disks; roadmap: several in the US, two in Europe (BADC, WDCC), and one in Japan. Federation is a virtual trust relationship among independent management domains that have their own set of services; users authenticate once to gain access to data across multiple systems and organizations

48 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 CMIP5: > 20 modelling centres, > 50 numerical experiments, > 86 simulations (total ensemble members) within the experiments, > 6500 years of simulation. Data to be available from "core nodes" and "modelling nodes" in a global federation. Users need to find & download datasets, and discriminate between models and between simulation characteristics. CMIP5 / IPCC-AR5 timeline: simulations starting in mid-2009; model and simulation documentation needed in 2009 (while models are running); data available: end of 2010; scientific analysis, paper submission and review: early to mid 2012 (current absolute deadline, July); reports: early 2013!

49 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009

50 An emerging worldwide infrastructure for climate model data intercomparison. The scene: CMIP5 / IPCC AR5; ESG-CET (Earth System Grid – Center for Enabling Technologies)

51 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Architecture of the AR5 federation based on ESG (ESG-CET architecture diagram). An AR5 ESG gateway (e.g. at PCMDI) hosts user registration, security services, monitoring services, metadata services, notification services, service startup/shutdown, OPeNDAP/OLFS (aggregation), a product server, publishing (harvester), storage management, a backend analysis and visualization engine, workflow and metrics services, and replica location and replica management services. An ESG node (e.g. at GFDL) hosts access control, HTTP/FTP/GridFTP servers, metrics services, publishing (extraction) with a publication GUI, OPeNDAP/OLFS and OPeNDAP/BS, a backend analysis and visualization engine, a monitoring info provider, and storage management over a disk cache, online data, and a deep archive. Other gateways (e.g. CCES, CCSM) provide centralized security and metrics services as global services; further ESG nodes join the federation, and analysis tools and browsers act as clients of the service APIs (the diagram also marks which components are mandatory for AR5)
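Since the ESG data nodes expose data through OPeNDAP, a client-side analysis tool can in principle open remote datasets by URL and fetch only the slices it needs. A hedged Python sketch (the URL and variable name are invented, and remote access requires a netCDF library built with DAP support):

```python
# Sketch: remote subsetting of an ESG-published dataset via OPeNDAP.
# The URL and variable name are placeholders; only the requested hyperslab
# is transferred over the network, not the full file.
from netCDF4 import Dataset

URL = "http://esg-node.example.org/thredds/dodsC/cmip5/tas_Amon_example.nc"

ds = Dataset(URL)                      # opens the remote dataset via the DAP protocol
tas = ds.variables["tas"]              # e.g. near-surface air temperature
print(tas.dimensions, tas.shape)

# Only this slice is actually fetched from the data node:
subset = tas[0:12, 40:60, 0:30]
print(subset.mean())
ds.close()
```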

52 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Security infrastructure: web-based single sign-on (SSO); authentication based on OpenID, authorization based on an attribute service. Details omitted in this talk..

53 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 That's the basic technology, but ….. … to compare model data we need a common understanding / common language …. Metadata definition in the Metafor FP7 project (EU). (Metafor is cooperating with the US metadata initiative, the Earth System Curator project)

54 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Metafor: Metadata Definition An activity uses software to produce data to be archived in a repository.
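To illustrate the core of this conceptual model (an activity uses software to produce data archived in a repository), here is a toy Python rendering; the class and field names are my own simplification for illustration, not the actual Metafor/CIM schema.

```python
# Toy rendering of the Metafor-style conceptual model:
# an activity uses software to produce data archived in a repository.
# Names are a simplification, not the real metadata classes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Repository:
    name: str
    url: str

@dataclass
class Data:
    identifier: str
    archived_in: Repository

@dataclass
class Software:
    name: str            # e.g. a model component such as "ECHAM5"
    version: str

@dataclass
class Activity:
    name: str            # e.g. a numerical experiment / simulation
    uses: List[Software] = field(default_factory=list)
    produces: List[Data] = field(default_factory=list)

# Illustrative instance (identifiers and URL are placeholders):
archive = Repository("WDCC", "https://wdcc.example.org")
run = Activity(
    name="historical r1i1p1",
    uses=[Software("ECHAM5", "x.y"), Software("MPI-OM", "x.y")],
    produces=[Data("tas_monthly_output", archived_in=archive)],
)
print(run)
```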

55 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Metafor is also defining a common vocabulary..

56 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 An emerging worldwide infrastructure for climate model data intercomparison. The scene: CMIP5 / IPCC AR5; ESG-CET (Earth System Grid – Center for Enabling Technologies); the Metafor FP7 project; European deployment: the IS-ENES FP7 project

57 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Infrastructure for the European Network for Earth System Modelling (IS-ENES). IS-ENES will provide a service for models and model results both to modelling groups and to the users of model results, especially the impact community. Joint research activities will improve: – efficient use of high-performance computers – model evaluation tool sets – access to model results – climate services for the impact community. Networking activities will – increase the cohesion of the European ESM community – advance a coherent European Network for Earth System modelling. A 4-year FP7 project starting March 2009, led by IPSL with 20 partners

58 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 IS-ENES data services (diagram: core data nodes, a large data node, an ancillary data node, supercomputers, and server clusters forming the v.E.R.C., the virtual Earth System Resource Centre). Enhancing the European data services infrastructure: – OGC service infrastructure – access to distributed data and processing resources – integration into the CMIP5 federation

59 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Summary: infrastructure building for climate model data intercomparison. CMIP5 / AR5 poses a big problem: climate model data management. ESG-CET is a technology provider (.. + „grid“). IS-ENES adds a common portal and resource sharing. Metafor contributes a community vocabulary and a common conceptual model. Together: from a community nebula towards a community e-infrastructure!?

60 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Is this only for the climate model community? What about related communities?

61 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Climate impact community (diagram): the international climate model data federation with an IPCC core (Tier 0), Tier 1 gateways at GB (BADC), US (PCMDI), DE (WDCC/DKRZ), and Tier 2 data nodes; on top, an IS-ENES portal and an impact community portal. IS-ENES plan: OGC interfaces, analysis services, .. A long way to go towards standardized interfaces / services..

62 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Climate model data analysis in the (proposed) C3-INAD project (diagram): the international climate model data federation (IPCC AR5) with an IPCC core (Tier 0), Tier 1 gateways at GB (BADC), US (PCMDI), DE (WDCC/DKRZ), and Tier 2 data nodes, including the C3Grid data nodes (WDC RSAT, WDC Mare, DWD, ….); the C3Grid infrastructure (Uni Köln, DKRZ, …) contributes the portal, a virtual workspace, workflow management, and data life-cycle management

63 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Summary (1): Infrastructure building/using experience. (Prototype) grid infrastructures, C3Grid (in the context of D-Grid) and C3Grid/EGEE -> heterogeneous data integration, few users so far. A new infrastructure-building effort for a highly demanding community problem, the CMIP5/IPCC data federation and associated e-infrastructure initiatives -> community-specific e-infrastructure components, lots of users, a „must not fail“ project..!

64 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Summary (2): A social perspective The term scientific cyberinfrastructure refers to a new research environment. BUT: a cyberinfrastructure is NOT only technical. A cyberinfrastructure is also an infrastructure with heterogeneous participants (informatics, domain scientists, technologists, etc.), organizational and political practices and social norms. Therefore developing cyberinfrastructure is a technical and a social endeavor! [Thanks to Sonja Palfner (TU Darmstadt) for the following foils]

65 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 „Speaking of cyberinfrastructure as a machine to be built or technical system to be designed tends to downplay the importance of social, institutional, organizational, legal, cultural, and other non-technical problems developers always face.“ (Edwards et al. 2007: 7)

66 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Example: Monitoring, Modeling and Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures (2008-2011). The project investigates different cases of cyberinfrastructure development: the Long Term Ecological Research Network, the Center for Embedded Networked Sensing, the WATer and Environmental Research Systems Network, and the Earth System Modeling Framework. Objective: to understand how scientists actually create and share data in practice, and how they use it to create new knowledge. (www.si.umich.edu/~pne/mmm.htm) The National Science Foundation (NSF) pays attention to this complexity of cyberinfrastructure developments.

67 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 What can social sciences bring to cyberinfrastructure developments? Reflection on the social challenges and problems within cyberinfrastructures. Making the social, political and cultural dimensions visible. Understanding the larger national and transnational context of cyberinfrastructures in different scientific cultures. Analyzing the conditions for successful cyberinfrastructure projects and „best practices“. Social scientists can „act as honest brokers between designers and users, explaining the contingencies of each to the other and suggesting ways forward“. (Edwards et al. 2007:34)

68 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Thank You !

69 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Appendix – Additional foils …

70 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 High-volume „gridded“ datasets (figure: a data cube spanning time, level, and variable [David Viner, CRU]); self-describing („container“) data formats (netCDF, GRIB, HDF)
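"Self-describing" means the file itself carries its dimensions, variables, and attributes, so tools can interpret it without external documentation. A small Python sketch of inspecting such a container (the file name is a placeholder):

```python
# Sketch: a netCDF file describes its own structure (dimensions, variables,
# attributes). The file name is a placeholder.
from netCDF4 import Dataset

with Dataset("example_gridded_dataset.nc") as ds:
    print("global attributes:", {a: ds.getncattr(a) for a in ds.ncattrs()})
    print("dimensions:", {name: len(dim) for name, dim in ds.dimensions.items()})
    for name, var in ds.variables.items():
        units = getattr(var, "units", "?")
        print(f"{name}: dims={var.dimensions}, shape={var.shape}, units={units}")
```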

71 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009

72 Data access and security (diagram): authentication/authorization (AA) components are attached to the portal, DIS, DMS, workflow scheduler, RIS, and the providers' local resources and interfaces (primary data and metadata, databases, pre-processing, compute resources, workspace). Features: single sign-on, support of users without grid certificates, federated identity management, X.509 grid certificates (EU-GridPMA CA), the Grid Security Infrastructure (GSI), and integration with legacy AA infrastructure (LDAP, DB-based, ..) and legacy data access infrastructure. C3Grid data / compute providers as before: World Data Centers, research institutes, universities

73 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 C3Grid security flow (diagram): the user authenticates at an identity provider of the home organisation or the virtual organisation (WAYF selection); SAML assertions carry „home attributes + VO attributes“; a short-lived credential service (SLCS CA) and a MyProxy delegation service provide X.509 grid proxies; GridShib SAML tools and GridShib for GT apply policy at the grid services and grid resources (GRAM / DataRAM) invoked from the portal and workflow client of the C3Grid middleware, supporting personal / group accounts

74 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The workflow from the user's perspective (diagram): (1) find & select relevant & available datasets (wind speed, temperature, specific humidity) in the distributed climate data, (2) collect & prepare a temporal and spatial subset of the data (analysis dataset), (3) analyse the integrated transport of humidity between selected levels (result dataset), (4) visualize the selected result. Typical user reactions: „I want to control where my job is running!“ „Uniform discovery for these data centers is nice, but I also need data from ….“ „I need version xx of yy and …“ „I want to know exactly what's happening, e.g. I need reproducible results.“ „I don't want to learn a new job description language or get a certificate to do a simple analysis.“ „Debugging?! What went wrong?“ „Data collection is fine, but I don't need a „grid“ to get my results!“

75 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 ESG-CET: Earth System Grid - Center for Enabling Technologies. Will deliver a federation architecture capable of allowing data held at "nodes" to be visible via "gateways". Support for CMIP5 via "modelling-nodes" and "core-nodes", with the former holding all the data from one modelling group and the latter holding the CMIP5-defined "core" data. Expect multiple "core-nodes", with two in Europe (BADC, WDCC), several in the US, and one in Japan. Expect multiple gateways (Metafor+IS-ENES in Europe, Curator+ESG in US). ESG is being led by the U.S. Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory

76 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009

77 Compare!! What??? Disagreement about what terms mean: What is a model? What is a component? What is a coupler? What is a code base? What is a (canonical) dataset (data aggregate)? What is a model configuration? Little or no documentation of the "simulation context" (the whys and wherefores and issues associated with any particular simulation). Need to collect information from modelling groups!

78 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 Metafor Common Metadata for Climate Modelling Digital Repositories http://metaforclimate.eu SEVENTH FRAMEWORK PROGRAMME Research Infrastructures INFRA-2007-1.2.1 - Scientific Digital Repositories METAFOR describes activities, software, and data involved in the simulation of climate so that “models” can be discovered and compared between distributed digital repositories

79 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 "Scientific" words end up in controlled vocabularies, definitions of "other" end up in descriptions, choices end up in values.. and METAFOR is responsible for the CMIP5 metadata questionnaire [from Bryan Lawrence]

80 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009

81 The C3Grid/EGEE bridge revisited (diagram, essentially as on slide 37): ISO 19115/19139 metadata is published and harvested via OAI-PMH into the web portal's C3 Lucene index and into the AMGA/… metadata catalog; the C3Grid data interface (web service) stages data into the climate data workspace; the EGEE side provides UI, SE, CE/WN and the LFC catalog; web service interfaces cover download, preprocessing & analysis as well as download, upload & analysis including republishing. A nice early prototype, but.. Community?? Users??..

82 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 further info: www.c3grid.de kindermann@dkrz.de

83 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 The Climate Model Intercomparison Project CMIP5 (experiment design diagram; pink = core, yellow = tier 1, green = tier 2). Source: "A Summary of the CMIP5 Experiment Design", lead authors Karl E. Taylor, Ronald J. Stouffer and Gerald A. Meehl, 31 December 2008

84 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 A concrete example: "qflux". (1) Find & select relevant & available datasets (wind speed, temperature, specific humidity) in the distributed climate data, (2) collect & prepare a temporal and spatial subset of the data (analysis dataset), (3) analyse the integrated transport of humidity between selected levels (result dataset), (4) visualize the selected result. Locations along the workflow: various data centers & portals, institutional storage & computing facilities, local facilities, personal computer. Data volumes along the workflow: several PB, ~3.1 TB (300-500 files), ~10.3 GB (28 files), ~76 MB, ~6 MB, ~66 KB

85 ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009 A common metadata description, simplified system overview (diagram): a graphical user interface queries metadata that links simulations, datasets, and models; model run output is described by model metadata

