1 nci.org.au @NCInews
Production Petascale Climate Data Replication at NCI – Lustre and our engagement with the Earth System Grid Federation (ESGF)
Joseph Antony, Andrew Howard, Jason Andrade, Ben Evans, Claire Trenham, Jingbo Wang

2 MOTIVATION

3 International Climate Change Research – The CMIP projects
– The UN’s Intergovernmental Panel on Climate Change (IPCC) prepares an assessment report roughly every 6 years.
– This effort requires significant scientific and HPC/HPD resources to back it.
– The most recent of these activities was the Coupled Model Intercomparison Project Phase 5 (CMIP5).
– NCI is a major data node within the ESGF.
– In this talk I will share a ‘view from the coalface’ of replicating ~2PB of data.

5 CMIP DATA VOLUMES

6 CMIP1 through CMIP5 Data Volumes (taken from Dean Williams’ ESGF Internet2 presentation, 2014)

7 ESGF NODE ARCHITECTURE

8 The ESGF Data Archival and Retrieval System
– The ESGF is a federated, peer-to-peer, international data archival and retrieval system.
– It incorporates single sign-on for end users.
– It has publication and version management tools.
– It supports data aggregations and can notify users if datasets have been modified.
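To make the version-management idea concrete, here is a minimal sketch of how a node might decide which replicated datasets have been superseded. It assumes CMIP5-style dataset IDs ending in a `.vYYYYMMDD` version suffix; the functions `split_version` and `modified_datasets` and the `INST`/`MODEL` IDs are hypothetical, and this is not the ESGF publisher or its notification service.

```python
# Hypothetical sketch: flag datasets whose published version supersedes the
# locally replicated copy. Assumes CMIP5-style IDs ending in '.vYYYYMMDD';
# this is NOT the ESGF publisher or notification service code.


def split_version(versioned_id: str) -> tuple[str, int]:
    """Split 'base.id.v20120727' into ('base.id', 20120727)."""
    base, _, version = versioned_id.rpartition(".")
    if not base or not version.startswith("v") or not version[1:].isdigit():
        raise ValueError(f"no '.vYYYYMMDD' suffix in {versioned_id!r}")
    return base, int(version[1:])


def modified_datasets(local: list[str], published: list[str]) -> list[str]:
    """Return published IDs that are newer than the version held locally."""
    held: dict[str, int] = {}
    for vid in local:
        base, ver = split_version(vid)
        held[base] = max(ver, held.get(base, ver))  # keep the newest local copy
    updates = []
    for vid in published:
        base, ver = split_version(vid)
        if base in held and ver > held[base]:
            updates.append(vid)
    return sorted(updates)


# Illustrative IDs only (INST/MODEL are placeholders, not real datasets).
local = ["cmip5.output1.INST.MODEL.historical.mon.atmos.Amon.r1i1p1.v20120101"]
published = ["cmip5.output1.INST.MODEL.historical.mon.atmos.Amon.r1i1p1.v20120727"]
print(modified_datasets(local, published))
```

Comparing on the dataset ID minus its version keeps the check independent of how many versions a node has retained; a real service would also need to handle retracted datasets.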

9 THE END-USER PERSPECTIVE

10 The Last-Mile Problem …
– Data is too large to move onto the desktop for analysis (CMIP3 to CMIP5).
– Users want versioned, curated data so they can jump straight into scientific analysis.
– At NCI, an integrated ecosystem exists for data-intensive science:
  – Data repositories
  – Virtual laboratories
– The International Climate Network Working Group (ICNWG) effort aims to solve the ‘last mile problem’ for networking.

11 ICNWG Activities

14 Okay … so where’s Lustre in all of this, you ask?
– We use Lustre as the distributed filesystem for a set of dedicated WAN data transfer nodes (DTNs).
– But first, a detour …

15 Courtesy Eli Dart, ESnet
1 Gbps = 125 MB/s
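The 1 Gbps = 125 MB/s conversion is what turns link speeds into replication timelines. The sketch below uses decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and assumes the link is fully and continuously utilised, which real transfers rarely achieve; the function names are illustrative.

```python
# Back-of-the-envelope transfer times in decimal units, as on the slide:
# 1 Gbps = 10**9 bit/s = 125 MB/s. Assumes the link is fully and continuously
# utilised, which real transfers rarely achieve.

PB = 10**15  # bytes (decimal petabyte)


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (1 MB = 10**6 B)."""
    return gbps * 1e9 / 8 / 1e6


def transfer_days(volume_bytes: float, gbps: float) -> float:
    """Days needed to move volume_bytes at a sustained rate of gbps."""
    seconds = volume_bytes * 8 / (gbps * 1e9)
    return seconds / 86_400


for rate in (1, 8, 10):
    print(f"{rate:>2} Gbps = {gbps_to_mb_per_s(rate):>5.0f} MB/s; "
          f"~2 PB takes {transfer_days(2 * PB, rate):.1f} days")
```

Even under this idealised assumption, ~2 PB takes about 185 days at 1 Gbps and about 18.5 days at 10 Gbps, which is why sustained throughput matters more than nominal link speed for petascale replication.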

16 Courtesy Eli Dart, ESnet

17 Courtesy Eli Dart, ESnet

18 Courtesy Eli Dart, ESnet

19 Courtesy Eli Dart, ESnet

21 AARNet International Links

22 NCI’s Data Transfer Nodes (DTNs)

23 Canberra (CBR) to Sydney (SYD), and on to the continental US (CONUS) via SXtransport

24 SXtransport – Physical Layout (figure legend: Cable Station, Network Segment)

25 SXtransport – Logical Network Layout

27 What are some of the world’s longest submarine cables, you ask?
– 39,000 km of submarine fibre
– 28,900 km of submarine fibre
– 1,600 km of terrestrial fibre
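Cable lengths like these set a hard floor on latency. The sketch below estimates one-way propagation delay through fibre, assuming a group refractive index of about 1.47 (a typical value for silica fibre, used here as an assumption) and ignoring amplifiers, switching and queueing. These are whole-cable lengths, so the numbers are illustrative rather than measured delays for any particular route.

```python
# Rough one-way propagation delay through optical fibre.
# ASSUMPTION: group refractive index ~1.47, i.e. light travels at ~c/1.47.
# Ignores amplifiers, switching and queueing; illustrative only.

C_VACUUM_M_PER_S = 299_792_458
FIBRE_INDEX = 1.47  # assumed typical value for single-mode fibre


def propagation_delay_ms(length_km: float, index: float = FIBRE_INDEX) -> float:
    """One-way time in milliseconds for light to traverse length_km of fibre."""
    return length_km * 1_000 * index / C_VACUUM_M_PER_S * 1_000


for km in (39_000, 28_900, 1_600):
    print(f"{km:>6,} km of fibre: ~{propagation_delay_ms(km):.0f} ms one way")
```

At roughly 5 ms per 1,000 km, trans-Pacific paths imply round-trip times in the hundreds of milliseconds, and that long round trip is what makes single TCP streams struggle on this route.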

28 Networking Topology for Data Replication (courtesy Mary Hester, ESnet)

29 Initial Transfer Rates from NCI
The graph shows the data rate versus the volume of data transferred. Each line represents the number of parallel data streams needed to reach that performance. The results indicate that a sustained rate of 1 GB/s (8 Gbps) between Australia and the United States is achievable, but only when transfers are configured to run more than 100 parallel streams.
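One plausible reason so many streams are needed is the bandwidth-delay product (BDP): a single TCP stream can keep at most one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window / RTT. The sketch below assumes a 200 ms Australia–US round-trip time and a few hypothetical per-stream window sizes; neither value comes from the slide.

```python
# Why ~100 parallel streams? A single TCP stream can have at most one window
# of unacknowledged data in flight per round trip, so its throughput is capped
# at roughly window / RTT. RTT and window sizes below are ASSUMED values for
# illustration; the slide does not report them.


def bdp_bytes(rate_bytes_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bytes_per_s * rtt_ms // 1_000


def streams_needed(rate_bytes_per_s: int, rtt_ms: int, window_bytes: int) -> int:
    """Minimum number of streams if each is limited to window_bytes in flight."""
    return -(-bdp_bytes(rate_bytes_per_s, rtt_ms) // window_bytes)  # ceiling division


TARGET = 1_000_000_000  # 1 GB/s (8 Gbps), the rate reported on the slide
RTT_MS = 200  # assumed Australia-US round-trip time

print(f"BDP: {bdp_bytes(TARGET, RTT_MS) // 1_000_000} MB")
for window in (2_000_000, 8_000_000, 32_000_000):  # assumed per-stream windows
    n = streams_needed(TARGET, RTT_MS, window)
    print(f"{window // 1_000_000:>2} MB window -> {n} streams")
```

Under these assumptions, about 200 MB must be in flight; with 2 MB per stream that takes 100 streams, while larger windows reduce the count. In practice, packet loss and congestion control lower per-stream throughput further, so this is a lower bound rather than a prediction.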

30 Data Replication and Science DMZs
– So far we have replicated ~1.5PB.
– We are working to improve these rates by adopting a Science DMZ model with dedicated data transfer nodes.

31 Globus Online
– Globus Online is a hosted data-transfer-as-a-service offering run by the University of Chicago.
– It makes large data transfers easy for both instrument owners and end users.

32 Globus Online Architecture

37 Using Dedicated DTNs – January 2015

38 Using Dedicated DTNs – March 2015

39 State of the Union: Numbers from the ICNWG Consortium

40 Conclusion
It is non-trivial to get all the ducks lined up:
– 10GigE WAN networking
– Mellanox tuning for 10GigE and 56 Gbps FDR InfiniBand
– Being NUMA-aware is critical for the GridFTP daemon!
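As an illustration of what NUMA awareness can mean in practice, here is a minimal Linux-only sketch that restricts a process to the CPUs on the NUMA node hosting the network adapter. It relies on the sysfs files `/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`; the helper names are hypothetical, and this is not how NCI configured its GridFTP daemon.

```python
# Hypothetical helper: keep a transfer process (e.g. a GridFTP server) on the
# NUMA node that hosts the network adapter. Linux-only; relies on sysfs files
# that expose a NIC's NUMA node and each node's CPU list.
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse the kernel's cpulist format, e.g. '0-3,8,10-11'."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_local_cpus(iface: str) -> set[int]:
    """CPUs on the NUMA node that the NIC is attached to."""
    node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
    if node < 0:  # -1: the kernel reports no NUMA affinity for this device
        return os.sched_getaffinity(0)
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    return parse_cpulist(cpulist)


def pin_to_nic_node(iface: str) -> None:
    """Restrict the calling process (pid 0) to CPUs local to the NIC."""
    os.sched_setaffinity(0, nic_local_cpus(iface))
```

A wrapper could call `pin_to_nic_node("eth2")` (the interface name is a placeholder) before launching transfers. This sketch sets CPU affinity only; `numactl --cpunodebind=N --membind=N` additionally binds memory allocation to the same node.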

41 THE END

42 VERIFIED, CURATED SCIENTIFIC DATASETS

43 Centralized Quality Control for Data Processing
– Multi-layered QC:
  – Initial Level 1 (L1) QC is done at the data nodes.
  – DKRZ performs L2 QC.
  – Further metadata and variable checking is done to reach L3 QC.
– At every step, end users can see the QC level of their data.
– Replicated data has passed QC Level 3 and receives a DOI.
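The layered QC workflow can be modelled as a small state machine. This is a minimal sketch: a dataset advances one level at a time, and, as the slide implies, only Level 3 data counts as ready for replication and a DOI. The `QCLevel` and `Dataset` names are hypothetical, and this is not the ESGF or DKRZ QC tooling.

```python
# Minimal model of layered QC: levels must be passed in order, and only
# Level 3 data is treated as ready for replication and a DOI.
# Class and field names are hypothetical, not ESGF/DKRZ tooling.
from dataclasses import dataclass
from enum import IntEnum


class QCLevel(IntEnum):
    NONE = 0
    L1 = 1  # initial checks at the data node
    L2 = 2  # checks performed by DKRZ
    L3 = 3  # further metadata and variable checking


@dataclass
class Dataset:
    dataset_id: str
    qc_level: QCLevel = QCLevel.NONE

    def pass_qc(self, level: QCLevel) -> None:
        """Record that the next QC level was passed; levels cannot be skipped."""
        if level != self.qc_level + 1:
            raise ValueError(f"cannot go from {self.qc_level.name} to {level.name}")
        self.qc_level = level

    @property
    def replicable(self) -> bool:
        """Only data that has passed Level 3 is replicated and receives a DOI."""
        return self.qc_level >= QCLevel.L3
```

Because `qc_level` is stored on each dataset, it can be shown to end users at every step, matching the visibility described on the slide.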