Published by Zaria Bilton
© Crown copyright Met Office
Met Office Unified Model I/O Server
Paul Selwood
I/O Server motivation
Some History…
- I/O has always been a problem for NWP, and more recently for climate
- ~2003: application-level output buffering
- ~2008: very simple, single-threaded I/O servers added for benchmarking
  - Intercepted low-level "open/write/close"
  - Single threaded
  - Some benefit, but limited
  - Did not address scaling issues (message numbers)
Old UM I/O: Restart Files
Old UM I/O: Diagnostics
Why I/O Server approach?
- Full parallel I/O is difficult with our packing
- "Free" CPUs available
- "Spare" memory available
- Chance to re-work old infrastructure
- Our file format is neither GRIB nor netCDF
Diagnostic flexibility
- Variables (primary and derived)
- Output times
- Temporal processing (e.g. accumulations, extrema, means)
- Spatial processing (sub-domains, spatial means)
- Variable to unit mapping
- Basic output resolution is a 2D field
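
The temporal processing options listed above can be illustrated with a small sketch. This is not Unified Model code: the function name `temporal_process`, the mode strings, and the nested-list representation of a 2D field are all assumptions made for illustration only.

```python
def temporal_process(fields, mode):
    """Reduce a time series of equal-shaped 2D fields point by point.

    `fields` is a list of 2D fields (lists of rows), one per output time.
    `mode` is one of "accumulation", "maximum", "minimum", "mean".
    All names here are hypothetical, not taken from the Unified Model.
    """
    if not fields:
        raise ValueError("need at least one field")
    nrows, ncols = len(fields[0]), len(fields[0][0])

    def reduce_point(values):
        if mode == "accumulation":
            return sum(values)
        if mode == "maximum":
            return max(values)
        if mode == "minimum":
            return min(values)
        if mode == "mean":
            return sum(values) / len(values)
        raise ValueError(f"unknown mode: {mode}")

    # Build the output 2D field one grid point at a time.
    return [
        [reduce_point([f[r][c] for f in fields]) for c in range(ncols)]
        for r in range(nrows)
    ]


# Two time steps of a 1 x 2 field.
steps = [[[1.0, 2.0]], [[3.0, 6.0]]]
accumulated = temporal_process(steps, "accumulation")
averaged = temporal_process(steps, "mean")
```

The result of each reduction is again a single 2D field, which matches the slide's point that the basic output resolution is a 2D field.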
Key design decisions
- Parallelism over output streams
  - Output streams distributed over servers
- Server is threaded
  - "Listener" receives data & puts in queue
  - "Writer" processes queue including packing
  - Ensures asynchronous behaviour
- Shared FIFO queue
  - Preserves instruction order
- Metadata/Data split
  - Data initially stored on compute processes
  - Data of same type combined into large messages
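
The listener/writer split with a shared FIFO queue can be sketched as follows. This is a minimal illustration using Python's standard `threading` and `queue` modules, not the Unified Model implementation; `run_io_server`, `pack`, and `write` are hypothetical names, and the incoming messages are simulated by a plain list rather than MPI receives.

```python
import queue
import threading


def run_io_server(incoming, pack, write):
    """Pass every item of `incoming` through a listener and a writer thread.

    `incoming` stands in for messages arriving from compute processes,
    `pack` for the packing step, and `write` for the low-level file write.
    All three are assumptions made for this sketch.
    """
    fifo = queue.Queue()   # shared FIFO queue: preserves instruction order
    stop = object()        # sentinel telling the writer there is no more work

    def listener():
        # Only receives and enqueues, so it is never held up by packing
        # or by disk writes.
        for item in incoming:
            fifo.put(item)
        fifo.put(stop)

    def writer():
        # Drains the queue in arrival order, doing the expensive work.
        while True:
            item = fifo.get()
            if item is stop:
                break
            write(pack(item))

    threads = [threading.Thread(target=listener),
               threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Simulated instruction stream: the order must survive the hand-off.
written = []
run_io_server(
    [("open", "dump"), ("write", [1.0, 2.0]), ("close", "dump")],
    pack=lambda message: message,   # identity stands in for real packing
    write=written.append,
)
```

Because a single writer drains a single FIFO queue, "open", "write", and "close" reach the file layer in the order they were issued, while the listener stays free to accept new data.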
Parallelism in I/O Servers
- Multiple I/O streams in typical job
- I/O servers spread among nodes
- Can utilise more memory
- Will improve bandwidth to disk
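
One simple way to spread several output streams over several I/O servers is a round-robin mapping. The slides do not say which policy the Unified Model uses, so the sketch below is an assumption; `assign_streams` and the stream names are hypothetical.

```python
def assign_streams(stream_names, n_servers):
    """Map each output stream to one I/O server, round-robin.

    Returns a dict from server index to the list of streams it owns.
    Round-robin is an assumed policy chosen for illustration.
    """
    if n_servers < 1:
        raise ValueError("need at least one I/O server")
    assignment = {server: [] for server in range(n_servers)}
    for index, name in enumerate(stream_names):
        assignment[index % n_servers].append(name)
    return assignment


# Three streams over two servers.
plan = assign_streams(["dump", "diag_a", "diag_b"], 2)
```

Each stream is owned by exactly one server, so writes within a stream stay ordered while different streams can proceed in parallel on different nodes.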
Automatic post-processing
- Model can trigger automatic post-processing
- Requests dealt with by I/O Server
- FIFO queue ensures integrity of data
How data gets output
[Figure: data flow between Compute and I/O processes; labels: Listener, Writer, Thread 0, Thread 1]
I/O Server development
- Initial version: synchronous data transmission
- Asynchronous diagnostic data
- Asynchronous restart data
- Amalgamated data
- Asynchronous metadata
- Load balancing
- Priority messages with I/O Server
Lots of diagnostic output
- Which processes are I/O servers
- "Stall" messages
- Memory log
- Timing log
- Full log of metadata / queue
- All really useful for tuning!
Lots of tuneable parameters…
- Number and spacing of I/O servers
- Memory for I/O servers
- Number of local data copies
- Number of fields to amalgamate
- Load balancing options
- Timing tunings
- Plus standard I/O tunings (write block size) etc.
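
To show what a "number of fields to amalgamate" setting might control, here is a minimal sketch that groups fields of the same type into larger messages. It is an illustration under assumed names (`amalgamate`, `max_fields`, the `(field_type, payload)` tuples), not the Unified Model's actual scheme.

```python
def amalgamate(fields, max_fields):
    """Combine same-type fields into messages of up to `max_fields` each.

    `fields` is an iterable of (field_type, payload) pairs. A message is
    emitted as soon as a type has `max_fields` payloads waiting; partial
    batches are flushed at the end. All names are hypothetical.
    """
    if max_fields < 1:
        raise ValueError("max_fields must be at least 1")
    pending = {}    # field_type -> payloads waiting to be sent
    messages = []   # emitted (field_type, [payloads]) messages
    for field_type, payload in fields:
        batch = pending.setdefault(field_type, [])
        batch.append(payload)
        if len(batch) == max_fields:
            messages.append((field_type, batch))
            pending[field_type] = []   # start a fresh batch for this type
    # Flush whatever is left over.
    for field_type, batch in pending.items():
        if batch:
            messages.append((field_type, batch))
    return messages


stream = [("real", 1), ("real", 2), ("int", 7), ("real", 3)]
batched = amalgamate(stream, max_fields=2)
```

A larger `max_fields` means fewer, bigger messages, which is the trade-off behind the message-number scaling issue mentioned earlier, at the cost of holding more data on the compute side before it is sent.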
Overloaded servers
I/O Servers keeping up!
MPI considerations
- Differing levels of MPI threading support
  - Best with MPI_THREAD_MULTIPLE
  - OK with MPI_THREAD_FUNNELED
- MPI tuning
  - Want metadata to go as quickly as possible
  - Want data transfer to be truly asynchronous
  - Don't want to interfere with model comms (e.g. halo exchange)
  - Currently use 19 environment variables!
Deployment
- July 2011: Operational global forecasts
- January 2012: Operational LAM forecasts
- February 2012: High resolution climate work
- Not currently used in:
  - Operational ensembles
  - Low resolution climate work
  - Most research work
Global Forecast Improvement

        QG 00/12   QG 06/18   QU
Time    777s       559s       257s
%age    19%        28%        27%

Total saving: over 21 node-hours per day
Impact on High Resolution Climate
- N512 resolution AMIP
- 59 GB restart dumps
- Modest diagnostics
- Cray XE6 with up to 9K cores
- All "in-run" output hidden
- Waits for final restart dump
- Most data buffered on client side
Current and Future Developments
- MPI Parallel I/O servers
  - Multiple I/O servers per stream
  - Gives more memory per stream on server
  - Reduced messaging rate per node
  - Parallel packing
  - Potential for parallel I/O
- Read ahead
  - Potential for boundary conditions / forcings
  - Some possibilities for initial condition
Parallel I/O server improvement
[Figure: Before and After panels]
Questions and answers