
Slide 1: Computing Facilities & Capabilities
Julian Borrill, Computational Research Division, Berkeley Lab & Space Sciences Laboratory, UC Berkeley
US Planck Data Analysis Review, 9–10 May 2006

Slide 2: Computing Issues
- Data Volume
- Data Processing
- Data Storage
- Data Security
- Data Transfer
- Data Format/Layout
It's all about the data.

Slide 3: Data Volume
Planck data volume drives (almost) everything.
- LFI: 22 detectors sampling at 32.5, 45 & 76.8 Hz → 4 × 10^10 samples per year → 0.2 TB time-ordered data + 1.0 TB full detector pointing data
- HFI: 52 detectors sampling at 200 Hz → 3 × 10^11 samples per year → 1.3 TB time-ordered data + 0.2 TB full boresight pointing data
- LevelS (e.g. CTP "Trieste" simulations): 4 LFI detectors sampling at 32.5 Hz → 4 × 10^9 samples per year; 2 scans × 2 beams × 2 samplings × 7 components + 2 noises → 1.0 TB time-ordered data + 0.2 TB full detector pointing data
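A quick back-of-the-envelope check of these totals, as a minimal Python sketch. The per-band LFI detector split (4 @ 32.5 Hz, 6 @ 45 Hz, 12 @ 76.8 Hz) and the 4-byte sample size are assumptions; the slide quotes only the totals.

```python
# Sanity-check the slide's sample-count and TOD-volume figures.
# Detector split per band and 4-byte samples are assumed, not from the slide.

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # ~3.16e7 s
BYTES_PER_SAMPLE = 4                    # assumed single-precision samples

def samples_per_year(detectors_and_rates):
    """Total samples/year for a list of (n_detectors, sampling_rate_hz)."""
    return sum(n * rate for n, rate in detectors_and_rates) * SECONDS_PER_YEAR

lfi = samples_per_year([(4, 32.5), (6, 45.0), (12, 76.8)])   # 22 detectors
hfi = samples_per_year([(52, 200.0)])

print(f"LFI: {lfi:.1e} samples/yr -> {lfi * BYTES_PER_SAMPLE / 1e12:.2f} TB TOD")
print(f"HFI: {hfi:.1e} samples/yr -> {hfi * BYTES_PER_SAMPLE / 1e12:.2f} TB TOD")
# -> roughly 4e10 and 3e11 samples/yr, ~0.2 TB and ~1.3 TB, as on the slide
```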

Slide 4: Data Processing
Operation count scales linearly (& inefficiently) with # analyses, # realizations, # iterations & # samples:
- 100 × 100 × 100 × 100 × 10^11 ~ O(10) Eflop (cf. the '05 "Day in the Life")
NERSC:
- Seaborg: 6080 CPUs, 9 Tf/s
- Jacquard: 712 CPUs, 3 Tf/s (cf. Magique-II)
- Bassi: 888 CPUs, 7 Tf/s
- NERSC-5: O(100) Tf/s, first-byte in 2007
- NERSC-6: O(500) Tf/s, first-byte in 2010
- expect an allocation of O(2 × 10^6) CPU-hours/year => O(4) Eflop/yr (10 GHz CPUs @ 5% efficiency)
USPDC cluster:
- specification & location TBD, first-byte in 2007/8
- O(100) CPUs × 80% utilization × 9000 hours/year => O(0.4) Eflop/yr (5 GHz CPUs @ 3% efficiency)
IPAC: small cluster dedicated to the ERCSC
(The sustained-throughput arithmetic is sketched below.)
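The Eflop/yr figures follow from straightforward arithmetic; here is a minimal sketch of it. Reading "10 GHz @ 5%" as 10 Gflop/s peak per CPU at 5% sustained efficiency is an interpretation, since the slide does not spell out its units.

```python
# Sustained throughput from an allocation, under the interpretation that
# "N GHz @ e%" means N Gflop/s peak per CPU at efficiency e.

def eflop_per_year(cpu_hours, peak_gflops_per_cpu, efficiency):
    """Sustained Eflop/yr from cpu_hours of allocation."""
    flops = cpu_hours * 3600 * peak_gflops_per_cpu * 1e9 * efficiency
    return flops / 1e18

# NERSC: O(2e6) CPU-hours/yr at 10 Gflop/s peak, 5% efficiency
print(f"NERSC: {eflop_per_year(2e6, 10, 0.05):.1f} Eflop/yr")            # ~3.6

# USPDC: 100 CPUs at 80% utilization for 9000 h/yr, 5 Gflop/s peak, 3% eff.
print(f"USPDC: {eflop_per_year(100 * 0.8 * 9000, 5, 0.03):.2f} Eflop/yr") # ~0.39
```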

Slide 5: Processing (diagram)
Compute resources by facility:
- NERSC Seaborg: 9 Tf/s
- NERSC Bassi: 7 Tf/s
- NERSC Jacquard: 3 Tf/s
- NERSC-5 (2007): 100 Tf/s
- NERSC-6 (2010): 500 Tf/s
- USPDC Cluster: 0.5 Tf/s
- ERCSC Cluster: 0.1 Tf/s

Slide 6: Data Storage
- Archive at IPAC: mission data; O(10) TB
- Long-term at NERSC using HPSS: mission + simulation data & derivatives; O(2) PB
- Spinning disk at the USPDC cluster & at NERSC using NGF: current active data subset; O(2–20) TB
- Processor memory at the USPDC cluster & at NERSC: running job(s); O(1–10+) GB/CPU & O(0.1–10) TB total
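A minimal sketch of this tiered hierarchy as a lookup, assuming the upper end of each quoted range; the tier list and the "fastest tier that fits" helper are illustrative only, not project tooling.

```python
# Storage tiers from the slide, fastest/smallest first, capacities in TB.
# Upper end of each quoted range is assumed where a range is given.
TIERS_TB = [
    ("processor memory", 10.0),       # O(0.1-10) TB total
    ("NGF/USPDC spinning disk", 20.0),# O(2-20) TB active subset
    ("NERSC HPSS tape", 2000.0),      # O(2) PB long-term
]

def place(working_set_tb):
    """Fastest tier that can hold a working set of the given size."""
    for tier, capacity in TIERS_TB:
        if working_set_tb <= capacity:
            return tier
    return "does not fit anywhere"

print(place(1.5))    # ~one year of LFI+HFI TOD -> processor memory
print(place(15.0))   # TOD + pointing + simulations -> spinning disk
```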

Slide 7: Processing + Storage (diagram)
- ERCSC Cluster: 0.1 Tf/s, 50 GB
- NERSC Seaborg: 9 Tf/s, 6 TB
- NERSC Jacquard: 3 Tf/s, 2 TB
- NERSC Bassi: 7 Tf/s, 4 TB
- NERSC-5 (2007): 100 Tf/s, 50 TB
- NERSC-6 (2010): 500 Tf/s, 250 TB
- USPDC Cluster: 0.5 Tf/s, 200 GB (plus 2 TB disk)
- NERSC NGF: 20/200 TB
- NERSC HPSS: 2/20 PB
- IPAC Archive: 10 TB

Slide 8: Data Security
UNIX filegroups:
- special account: user planck
- permissions r--/---/--- (readable by the planck account only)
Personal keyfob to access the planck account:
- real-time grid-certification of individuals
- keyfobs issued & managed by IPAC
- single system for IPAC, NERSC & the USPDC cluster
Allows securing of selected data:
- e.g. mission vs simulation
Differentiates access to facilities and to data:
- standard personal account & special planck account
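A minimal sketch of what that permission scheme looks like in practice: files held owner-read-only by the planck account, so access requires authenticating (via keyfob) as planck itself. The file here is a temporary stand-in; only the mode bits come from the slide.

```python
# Illustrate the r--/---/--- scheme on a stand-in for a mission data file.
import os
import stat
import tempfile

fd, path = tempfile.mkstemp(suffix="_tod.dat")   # hypothetical mission file
os.close(fd)

# Owner read-only; no group or world access (mode 0o400 = r--/---/---).
os.chmod(path, stat.S_IRUSR)

print(stat.filemode(os.stat(path).st_mode))      # -> "-r--------"
os.remove(path)
```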

Slide 9: Processing + Storage + Security (diagram)
The previous diagram, with keyfob-protected resources labelled "PLANCK KEYFOB REQUIRED":
- ERCSC Cluster: 0.1 Tf/s, 50 GB
- NERSC Seaborg: 9 Tf/s, 7 TB
- NERSC Jacquard: 3 Tf/s, 2 TB
- NERSC Bassi: 7 Tf/s, 4 TB
- NERSC-5 (2007): 100 Tf/s, 50 TB
- NERSC-6 (2010): 500 Tf/s, 250 TB
- USPDC Cluster: 0.5 Tf/s, 200 GB (plus 2 TB disk)
- NERSC NGF: 20/200 TB
- NERSC HPSS: 2/20 PB
- IPAC Archive: 10 TB

Slide 10: Data Transfer
From DPCs to IPAC:
- transatlantic tests being planned
From IPAC to NERSC:
- 10 Gb/s over Pacific Wave, CENIC + ESnet
- tests planned this summer
From NGF to/from HPSS:
- 1 Gb/s, being upgraded to 10+ Gb/s
From NGF to memory (most real-time critical):
- within NERSC: 8–64 Gb/s depending on system (& support for this)
- offsite depends on location: 10 Gb/s to LBL over a dedicated data link on the Bay Area MAN
- fallback exists: stage data on local scratch space
(A transfer-time sketch follows.)
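For scale, a sketch of the transfer times these rates imply for one year of HFI time-ordered data (1.3 TB, from the Data Volume slide), assuming the full nominal bandwidth is achieved; real throughput would be lower.

```python
# Time to move one year of HFI TOD (1.3 TB) at each quoted line rate.

def transfer_hours(size_tb, rate_gbps):
    """Hours to move size_tb terabytes over a rate_gbps gigabit/s link."""
    bits = size_tb * 1e12 * 8
    return bits / (rate_gbps * 1e9) / 3600

for label, rate in [("NGF<->HPSS today", 1),
                    ("IPAC->NERSC", 10),
                    ("NGF->memory (best case)", 64)]:
    print(f"{label:25s}: {transfer_hours(1.3, rate):6.2f} h")
# -> ~2.9 h at 1 Gb/s, ~0.3 h at 10 Gb/s, ~2.7 min at 64 Gb/s
```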

Slide 11: Processing + Storage + Security + Networks (diagram)
The previous diagram with network links added: 10 Gb/s paths connecting the DPCs, IPAC & NERSC, 8–64 Gb/s NGF-to-memory within NERSC, and several link rates still undetermined (marked "?"). Keyfob-protected resources remain labelled "PLANCK KEYFOB REQUIRED".

Slide 12: Project Columbia Update
Last year we advertised our proposed use of NASA's new Project Columbia (5 × 2048 CPUs, 5 × 12 Tf/s), potentially including a WAN-NGF.
We were successful in pushing for Ames' connection to the Bay Area MAN, providing a 10 Gb/s dedicated data connection.
We were unsuccessful in making much use of Columbia:
- disk read performance varies from poor to atrocious, effectively disabling data analysis (although simulation is possible)
- foreign nationals are not welcome, even if they have passed JPL security screening!
We have provided feedback to Ames and HQ, but for now we are not pursuing this resource.

Slide 13: Data Formats
Once data are on disk they must be read by codes that do not know (or want to know) their format/layout:
- to analyze LFI, HFI, LevelS, WMAP, etc. data sets, both individually and collectively
- to be able to operate on data while they are being read, e.g. weighted co-addition of simulation components
M3 provides a data abstraction layer to make this possible (a toy sketch of the pattern follows).
Investment in M3 has paid huge dividends this year:
- rapid (10 min) ingestion of new data formats, such as the PIOLIB evolution and WMAP
- rapid (1 month) development of an interface to any compressed pointing, allowing on-the-fly interpolation & translation
- immediate inheritance of improvements (new capabilities & optimization/tuning) by the growing number of M3-based codes
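For illustration only, a toy Python sketch of the abstraction-layer pattern M3 embodies: format readers hide behind one interface, and operations such as weighted co-addition compose over it. This is not M3's actual API; every name below is invented.

```python
# Toy data-abstraction layer: analysis code sees only TODReader, never the
# on-disk format, and can operate on data as it is read.
from abc import ABC, abstractmethod

class TODReader(ABC):
    """Format-agnostic source of time-ordered data samples."""
    @abstractmethod
    def read(self, first, count):
        """Return `count` samples starting at sample index `first`."""

class ConstantSource(TODReader):
    """Stand-in for a real format reader (FITS, PIOLIB, ...)."""
    def __init__(self, value):
        self.value = value
    def read(self, first, count):
        return [self.value] * count

class WeightedSum(TODReader):
    """Operate on data while reading: weighted co-addition of components."""
    def __init__(self, sources, weights):
        self.sources, self.weights = sources, weights
    def read(self, first, count):
        chunks = [s.read(first, count) for s in self.sources]
        return [sum(w * x for w, x in zip(self.weights, samples))
                for samples in zip(*chunks)]

# Co-add CMB + noise simulation components on the fly, unit weights.
sky = WeightedSum([ConstantSource(2.5), ConstantSource(0.1)], [1.0, 1.0])
print(sky.read(0, 4))   # -> [2.6, 2.6, 2.6, 2.6]
```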

