DKIST Data Center Alisdair Davey NSO


1 DKIST Data Center
Alisdair Davey (adavey@nso.edu), NSO
Steven Berukoff, Tony Hays, Kevin Reardon, DJ Spiess, Fraser Watson and Scott Wiant
High-Resolution Solar Physics: Past, Present, Future

2 Data Center Roles
The First Light Data Center:
Data Curation
Data Calibration and Processing

3 Data Center Roles: Data Curation
Data curation is the management of data throughout its lifecycle, from creation and initial storage to the time when it is archived for posterity or becomes obsolete and is deleted. Its main purpose is to ensure that data remains reliably retrievable for future research and reuse. For DKIST this means:
Long-term storage management.
Streamlined, automated data and metadata ingest.
Effective query and data retrieval.
A flexible system that accommodates science, instrument and technological changes without the need for frequent major redesign.
A planned 44-year lifetime (two 22-year solar magnetic cycles).

4 Solar Physics Mission Data Sizes
The state of solar data storage and distribution (chart courtesy K. Reardon).

5 Solar Physics Mission Data Sizes
SDO – the 800 lb (362 kg) data gorilla in the room.

6 Solar Physics Mission Data Sizes
[Chart: data volumes of SDO, other solar missions, and DKIST compared.]

7 Data Transfer – Maui -> Boulder
What about 60 TB/day?
Challenge: move data from the telescope to Boulder.
Mean 9 TB/day over a shared 10 Gbps network → roughly 8 hours of transfer per day.
Extensive testing to identify and fix bottlenecks.
Partner engagement: upgrade to 40 Gbps by 2019.
Use mature high-bandwidth tools (Globus).
Leverage several existing networking providers, including U. Hawaii, U. Colorado and Internet2.
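A quick sanity check on those numbers, as a minimal Python sketch; the ~2.5 Gbps effective rate is an assumption about contention on the shared link, not a quoted DKIST figure:

```python
# Back-of-the-envelope transfer-time check for the mean daily volume.
daily_bits = 9e12 * 8  # 9 TB/day (decimal terabytes assumed), in bits

for label, rate_bps in [("full 10 Gbps", 10e9), ("shared ~2.5 Gbps", 2.5e9)]:
    hours = daily_bits / rate_bps / 3600
    print(f"{label}: {hours:.1f} h/day")  # ~2.0 h and ~8.0 h respectively
```

At the full line rate the daily volume would move in about 2 hours; a ~2.5 Gbps effective share of the link gives the ~8 hours quoted above.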

8 Hi-Speed Undersea Cables Connecting Hawaii

9 Hi-Speed Undersea Cables Connecting Hawaii

10 DKIST Data Transportation (Alternate Models)
Never underestimate the bandwidth of a man with a van load of tapes! (Network transfer vs. physical shipment.)

11 Getting DKIST Data to Boulder
West Coast DolphinNet

12 Alternative Route for Getting Data off the Mountain
Grad Student

13 Data Content Management
CURRENT PLAN
Receive FITS files from the summit and from the calibration process.
Parse the FITS files; store metadata in an "Inventory".
Store the serialized FITS files in an Object Store.
Maintain an offsite partial replica.
Retain everything until science-driven QA/C is done (after > 6 months).
OBJECT STORAGE: common adoption.
Commercial cloud storage (S3, Azure, Google Cloud); also Facebook, Spotify, Dropbox.
Industry initiatives (OpenStack Swift).
Ceph implements object storage on a single distributed computer cluster and provides interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalable to the exabyte level.
Why not send it to the cloud?
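As a rough illustration of the ingest step, here is a minimal sketch in which astropy and boto3 stand in for whatever libraries the Data Center actually uses; the header keywords, bucket name and list-based inventory are hypothetical placeholders for the real inventory database:

```python
import astropy.io.fits as fits
import boto3

def ingest(fits_path: str, s3, inventory: list) -> None:
    """Parse a FITS file's headers into an inventory record,
    then store the file itself in an object store."""
    with fits.open(fits_path) as hdul:
        header = hdul[0].header
        inventory.append({                        # stand-in for a DB insert
            "file": fits_path,
            "date_obs": header.get("DATE-OBS"),   # observation time
            "instrume": header.get("INSTRUME"),   # instrument name
        })
    with open(fits_path, "rb") as f:
        s3.put_object(Bucket="dkist-raw",         # hypothetical bucket
                      Key=fits_path, Body=f.read())

s3 = boto3.client("s3")
inventory = []
# ingest("vbi_20200101_000000.fits", s3, inventory)
```

Keeping searchable metadata in a queryable inventory while the bulky pixels sit in flat object storage is what makes the "effective query, data retrieval" goal above tractable.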

14 Why not send it to the cloud?
We can afford to send it to the cloud.

15 Why not send it to the cloud?
We can afford to send it to the cloud. We may even be able to afford the cloud storage…

16 Why not send it to the cloud?
We can afford to send it to the cloud. We may even be able to afford the cloud storage… But we couldn't afford to give the data out!

17 Why not send it to the cloud?
We can afford to send it to the cloud. We may even be able to afford the cloud storage… But we couldn't afford to give the data out! But wait!

18 Why not send it to the cloud?
We can afford to send it to the cloud. We may even be able to afford the cloud storage… But we couldn't afford to give the data out! But wait! Amazon cloud now has modules you can use to charge people to download your data!! That's OK, right?!
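The "module" in question is presumably Amazon S3's Requester Pays feature, which shifts download (egress) charges onto whoever retrieves the data. A minimal boto3 sketch, with a hypothetical bucket and key:

```python
import boto3

s3 = boto3.client("s3")

# Owner side: make downloaders pay the egress charges.
s3.put_bucket_request_payment(
    Bucket="dkist-archive",                    # hypothetical bucket
    RequestPayment={"Payer": "Requester"},
)

# Downloader side: must explicitly accept the charges per request.
obj = s3.get_object(Bucket="dkist-archive",
                    Key="vbi/some_file.fits",  # hypothetical key
                    RequestPayer="requester")
```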

19 The Devils They Know!! Hα and Ca II K data from
ChroTel (KIS) on Tenerife

20 The Devils They Know!! GridFTP Hα and Ca II K data from
ChroTel (KIS) on Tenerife

21 GridFTP GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. GridFTP is an extension of the File Transfer Protocol (FTP) for grid computing. The protocol was defined within the GridFTP working group of the Open Grid Forum. There are multiple implementations of the protocol; the most widely used is that provided by the Globus Toolkit. You will have to register for an account.
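For illustration, submitting a bulk pull with the Globus Python SDK looks roughly like the sketch below; the endpoint UUIDs, paths and token are placeholders, and the one-time OAuth2 login flow is omitted:

```python
import globus_sdk

# Assumes you have registered, completed the Globus login flow,
# and hold a transfer access token (see the globus_sdk docs).
authorizer = globus_sdk.AccessTokenAuthorizer("TRANSFER_TOKEN")
tc = globus_sdk.TransferClient(authorizer=authorizer)

SRC = "source-endpoint-uuid"   # e.g. the data archive's endpoint
DST = "dest-endpoint-uuid"     # e.g. your institution's endpoint

task = globus_sdk.TransferData(tc, SRC, DST, label="DKIST pull",
                               sync_level="checksum")
task.add_item("/archive/vbi/2020/", "/data/dkist/vbi/", recursive=True)
print("task id:", tc.submit_transfer(task)["task_id"])
```

Globus handles retries, integrity checking and parallel GridFTP streams, which is why it beats plain FTP or scp on long, fat pipes.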

22 Critical Science Plans
Average / median project data-set size: 40 TiB / 14 TiB.
Plans as large as ~240 TB.
Room on your desktop?
Data transfer times to you are in days … weeks!
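A worked example of why (the sustained rates are assumptions about a well-connected university, not measured figures): a 40 TiB average project is about 3.5 × 10^14 bits, so at a sustained 1 Gbps it needs roughly 3.5 × 10^5 s ≈ 4 days, and at 100 Mbps more like 40 days; a ~240 TB plan at 1 Gbps is already in the three-week range.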

23 Ground Based Solar Data Calibration / Analysis

24 Data Calibration & Processing
Complex observations, complex hardware:
Ground-based (atmosphere!), high-resolution, high-cadence, small field-of-view observations.
Sophisticated instruments (multiple modes, large data rates), broad bandwidth support.
Multiple external partners / instrument providers; the calibration definition/development process must be coordinated.
Complex facility support: nine optical assemblies (excluding instruments); AO, thermal, enclosure, polarization.
Incomplete prior instrument calibration experience: existing instruments are not the same as the DKIST instruments.
No comparable ground-based data processing systems in solar physics: invent "some" of the wheel, but not all! New for solar physics, but not necessarily new for big data.
Conclusion: this is novel and complex!

25 CRISP Data Pipeline

26 Data Calibration & Processing
Asynchronous, event-driven pipeline:
New data arrives and is identified for calibration.
An automated calibration task is scheduled, then executed or queued.
If it completes, the results are written to the data store.
If not, DC staff are notified to fix the problem or execute the task manually.
Implemented with a Python <-> IDL bridge.
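A toy sketch of that control flow in pure Python; the calibrate/notify logic is a hypothetical stand-in, and the real scheduler (and the Python <-> IDL bridge) is certainly more involved:

```python
import asyncio

async def calibrate(frame: str) -> str:
    """Hypothetical stand-in for an automated calibration task."""
    await asyncio.sleep(0.1)              # pretend to do work
    if "bad" in frame:
        raise RuntimeError("calibration failed")
    return frame + ".calibrated"

async def handle(frame: str, store: list) -> None:
    try:
        store.append(await calibrate(frame))       # success: write to store
    except RuntimeError as err:
        print(f"notify DC staff: {frame}: {err}")  # failure: human follow-up

async def main() -> None:
    store: list = []
    frames = ["vbi_0001.fits", "bad_0002.fits", "vbi_0003.fits"]  # "new data"
    await asyncio.gather(*(handle(f, store) for f in frames))
    print("stored:", store)

asyncio.run(main())
```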

27 Visible Broadband Imager (VBI)
"Straightforward" calibration. VBI will record images from the DKIST telescope at the highest possible spatial and temporal resolution at a number of specified wavelengths in the range 390 nm to 860 nm. VBI will provide high-quality imaging through filters with relatively broad pass-bands to optimize throughput. Its high cadence and short exposure times come at the expense of information in the spectral domain. The VBI design allows exposure times short enough (30 frames/s) to effectively "freeze" atmospheric turbulence and apply speckle-interferometric image-reconstruction techniques.
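Full speckle reconstruction is far beyond a slide, but its simpler cousin, lucky-imaging frame selection, conveys the "freeze the turbulence" idea. The NumPy sketch below uses synthetic data and RMS contrast as a crude sharpness proxy; it is illustrative only and is not the VBI pipeline's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic burst: 100 short-exposure frames, 64x64 pixels each.
burst = rng.normal(loc=1000.0, scale=50.0, size=(100, 64, 64))

def contrast(frame: np.ndarray) -> float:
    """RMS contrast: sharper granulation images tend to score higher."""
    return frame.std() / frame.mean()

# Keep the sharpest 10% of frames and average them.
scores = np.array([contrast(f) for f in burst])
stacked = burst[np.argsort(scores)[-10:]].mean(axis=0)
print("stacked frame shape:", stacked.shape)
```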

28 Calibration - First Light Data Center
Define core calibrations of each detector/instrument; leverage known calibrations from existing/previous instruments.
Instrument Calibration Plans are a joint effort between the DC and instrument partners/providers.
Build and publish community-contributed Python/IDL code, not forgetting the algorithm documentation!
Expect significant iteration and refinement in early operations. Build in revision and change-control processes, plus "sandbox" computing to support ongoing development.
Avoid the pipe dream of totally automated calibrations (at least to begin with); plan for flexible processing (automatic and user-directed).
Create opportunities for others to do big-data analytics.

29 Managing Expectations
But I just want to run dkist_prep on all the data! What? You want me to write the paper too? Space-based solar physicist version.

30 Managing Expectations
But I want you to invert *ALL* the lines for me! Let me guess, ME isn't good enough for you either!! Ground-based solar physicist version.

31 Managing Expectations
DKIST will be ready for science operations (summit and data center) in Jan. 2020!
The community will be enabled to do science with DKIST! We welcome community help!
The Data Center won't do everything that you want it to do right from the start!
There are significant big data challenges, but we think we have them managed!

32 And then … You will have data you can do big solar data analytics with, though you will still be left with a few scientific decisions, such as which inversion routines to run! The First Light Data Center is the beginning, not the end product. dkist_paper, [/apj], [/science], [/nature] etc. to come (thanks to Tom Schad for the suggestion!)

