
Cape Town 2013 Dr Paul Calleja, Director, Cambridge HPC Service – The SKA: the world's largest radio telescope streaming data processor

Cape Town 2013 Overview
- Introduction to Cambridge HPCS
- Overview of the SKA project
- The SKA streaming data processing challenge
- The SKA SDP consortium

Cape Town 2013 Cambridge University
- The University of Cambridge is a world-leading teaching & research institution, consistently ranked within the top 3 universities worldwide
- Annual income of £1200M, 40% of which is research related – one of the largest R&D budgets within the UK HE sector
- 9,000 staff and a large student body
- Cambridge is a major technology centre: 1,535 technology companies in the surrounding science parks, £12B annual revenue, 53,000 staff
- The HPCS has a mandate to provide HPC services to both the University and the wider technology company community

Cape Town 2013 Four domains of activity
- Cambridge HPC Service – driving discovery
- Industrial HPC Service – promoting uptake of HPC by UK industry
- Dell HPC Solution Centre – commodity HPC centre of excellence
- HPC R&D – advancing development and application of HPC

Cape Town 2013 Cambridge HPC vital statistics
- Registered users from 31 departments
- 856 Dell servers, several hundred TF of sustained DP performance
- 128-node Westmere partition (1,536 cores, 16 TF)
- 600-node (9,600-core) 2.6 GHz Sandy Bridge cluster (200 TF) with full non-blocking Mellanox FDR IB – one of the fastest Intel clusters in the UK
- SKA GPU test bed: 128 nodes with 256 NVIDIA K20 GPUs – the fastest GPU system in the UK (250 TF)
- Designed for maximum I/O throughput and message rate: full non-blocking dual-rail Mellanox FDR Connect-IB
- Designed for maximum energy efficiency: No. 2 in the Green500 and the most efficient air-cooled supercomputer in the world
- 4 PB of storage – Lustre parallel file system at 50 GB/s
- Run as a cost centre – charges our users – 20% of income from industry

Cape Town 2013 CORE – Industrial HPC service & consultancy

Cape Town 2013 Dell | Cambridge HPC Solution Centre The Solution Centre is a Dell–Cambridge jointly funded HPC centre of excellence, providing leading-edge commodity open-source HPC solutions.

Cape Town 2013 SA CHPC collaboration
- HPCS has a long-term strategic partnership with CHPC
- HPCS has been working closely with CHPC for the last 6 years
- Technology strategy, system design and procurement
- HPC system stack development
- SKA platform development

Cape Town 2013 Square Kilometre Array – SKA
- Next-generation radio telescope
- Large multi-national project
- 100x more sensitive, orders of magnitude faster to survey
- 5 square km of dish over 3,000 km
- The next big science project
- Currently the world's most ambitious IT project
- First real exascale-ready application
- Largest global big-data challenge

Cape Town 2013 SKA location – a continental-sized radio telescope
- Needs a radio-quiet site
- Very low population density
- Large amount of space
- Two sites: Western Australia and the Karoo Desert, RSA

Cape Town 2013 SKA phase 1 implementation
- Dish Array (SKA1_Mid, incl MeerKAT) – RSA
- Low Frequency Aperture Array (SKA1_Low) – ANZ
- Survey Instrument (SKA1_AIP_Survey, incl ASKAP) – ANZ

Cape Town 2013 SKA phase 2 implementation
- Low Frequency Aperture Array (SKA2_Low) – ANZ
- Mid Frequency Dish Array (SKA2_Mid_Dish) – RSA
- Mid Frequency Aperture Array (SKA2_Mid_AA) – RSA

Cape Town 2013 What is radio astronomy
An astronomical signal (EM wave) from the sky is received by pairs of antennas separated by a baseline B; the signal chain is: detect & amplify → digitise & delay → correlate → process (calibrate, grid, FFT) → integrate → sky image
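The following is not from the original slides: a minimal numpy sketch of the final "grid, FFT" step of that chain, assuming idealised visibilities from a single point source and a naive nearest-cell gridder; all names and parameters are illustrative.

```python
# Illustrative sketch (not SKA code): grid visibilities and FFT to a dirty image.
import numpy as np

N = 256                                  # image/grid size in pixels
rng = np.random.default_rng(42)

# Fake measurement set: random uv baseline coordinates (in grid cells) and the
# visibilities a unit point source at the phase centre would produce.
u = rng.uniform(-N / 2, N / 2, 5000)
v = rng.uniform(-N / 2, N / 2, 5000)
vis = np.ones(len(u), dtype=complex)     # point source at centre -> V(u,v) = 1

# Naive nearest-cell gridding of the visibilities onto a regular uv grid.
grid = np.zeros((N, N), dtype=complex)
iu = (np.round(u).astype(int) + N // 2) % N
iv = (np.round(v).astype(int) + N // 2) % N
np.add.at(grid, (iv, iu), vis)

# FFT the gridded visibilities to obtain the (unnormalised) dirty image.
dirty = np.fft.fftshift(np.abs(np.fft.ifft2(np.fft.ifftshift(grid))))
print("peak pixel:", np.unravel_index(dirty.argmax(), dirty.shape))  # ~ (128, 128)
```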

Cape Town 2013 SKA – key scientific drivers
- Cradle of life
- Cosmic magnetism
- Evolution of galaxies
- Pulsar surveys and gravitational waves
- Exploring the dark ages

Cape Town 2013 SKA is a cosmic time machine

Cape Town 2013 But… most importantly, the SKA will investigate phenomena we have not even imagined yet

Cape Town 2013 SKA timeline
- 2022: Operations SKA 1 / SKA 2
- Construction of full SKA (SKA 2), €2B
- 10% SKA construction (SKA 1), €650M
- 2012: Site selection
- Pre-construction: 1 yr detailed design (€90M PEP), 3 yr production readiness
- System design and refinement of specification
- Initial concepts stage: preliminary ideas and R&D

Cape Town 2013 SKA project structure
- SKA Board
- Director General
- Project Office (OSKAO)
- Work Package Consortium 1 … Work Package Consortium n (locally funded)
- Advisory Committees (Science, Engineering, Finance, Funding …)

Cape Town 2013 Work package breakdown (SPO)
1. System
2. Science
3. Maintenance and support / Operations Plan
4. Site preparation
5. Dishes
6. Aperture arrays
7. Signal transport
8. Data networks
9. Signal processing
10. Science Data Processor – UK (lead), AU (CSIRO…), NL (ASTRON…), South Africa SKA, Industry (Intel, IBM…)
11. Monitor and Control
12. Power

Cape Town 2013 SKA = streaming data processor challenge
- The SDP consortium is led by Paul Alexander, University of Cambridge
- The 3-year design phase has now started (as of November 2013)
- Delivering the SKA ICT infrastructure needs a strong multi-disciplinary team: radio astronomy expertise; HPC expertise (scalable software implementations; management); HPC hardware (heterogeneous processors; interconnects; storage); delivery of data to users (cloud; UI …)
- Building a broad global consortium across 11 countries: UK, USA, AUS, NZ, Canada, NL, Germany, China, France, Spain, South Korea
- Radio astronomy observatories; HPC centres; multi-national ICT companies; sub-contractors

Cape Town 2013 SDP consortium members – management groupings and workshare (%)
- University of Cambridge (Astrophysics & HPCS): 9.15
- Netherlands Institute for Radio Astronomy: 9.25
- International Centre for Radio Astronomy Research: 8.35
- SKA South Africa / CHPC: 8.15
- STFC Laboratories: 4.05
- Non-Imaging Processing Team (University of Manchester, Max-Planck-Institut für Radioastronomie, University of Oxford (Physics)): 6.95
- University of Oxford (OeRC): 4.85
- Chinese Universities Collaboration: 5.85
- New Zealand Universities Collaboration: 3.55
- Canadian Collaboration
- Forschungszentrum Jülich: 2.95
- Centre for High Performance Computing South Africa: 3.95
- iVEC Australia (Pawsey): 1.85
- Centro Nacional de Supercomputación: 2.25
- Fundación Centro de Supercomputación de Castilla y León: 1.85
- Instituto de Telecomunicações: 3.95
- University of Southampton: 2.35
- University College London: 2.35
- University of Melbourne: 1.85
- French Universities Collaboration: 1.85
- Universidad de Chile: 1.85

Cape Town 2013 SDP – strong industrial partnership
- Discussions under way with Dell, NVIDIA, Intel, HP, IBM, SGI, ARM, Microsoft Research, Xyratex, Mellanox, Cray, DDN, NAG, Cambridge Consultants, Parallel Scientific, Amazon, Bull, AMD, Altera, Solarflare, Geomerics, Samsung, Cisco
- Apologies to those I've forgotten to list

Cape Town 2013 SDP work packages

Cape Town 2013 SKA data rates (figure): 16 Tb/s, 4 Pb/s, 24 Tb/s, 1000 Tb/s and 20 Gb/s at successive stages of the system
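The stage each of these rates belongs to is not recoverable from this transcript; the short script below (not from the slides) simply converts the quoted figures into bytes per second and per day to give a feel for the scale.

```python
# Unit conversions for the quoted SKA data rates (stages not identified here).
rates_tb_per_s = [16, 4000, 24, 1000, 0.02]   # 16 Tb/s, 4 Pb/s, 24 Tb/s, 1000 Tb/s, 20 Gb/s

for r in rates_tb_per_s:
    tbytes_per_s = r / 8                       # terabits -> terabytes
    pbytes_per_day = tbytes_per_s * 86400 / 1000
    print(f"{r:>7g} Tb/s = {tbytes_per_s:>7g} TB/s = {pbytes_per_day:>9.1f} PB/day")
```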

Cape Town 2013 SKA conceptual data flow

Cape Town 2013 SKA conceptual data flow

Cape Town 2013 Science data processor pipeline
- Incoming data from the collectors pass through switches, buffer stores and a bulk store into the correlator / beamformer, the UV processor and the image processor, ending in HPC science processing; software complexity grows along the pipeline
- Imaging path: corner turning → coarse delays → fine F-step / correlation → visibility steering → observation buffer → gridding visibilities → imaging → image storage
- Non-imaging path: corner turning → coarse delays → beamforming / de-dispersion → beam steering → observation buffer → time-series searching → search analysis → object/timing storage
- Figures quoted on the slide: 10 Tb/s (SKA 1) to 1000 Tb/s (SKA 2) of incoming data, processing scales of 10 Pflop, 100 Pflop, 200 Pflop, 1 Eflop and 10 Eflop, a 50 PB buffer at 10/1 TB/s, and 1 EB/y (SKA 1) to 10 EB/y (SKA 2) of archived data
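Not from the slides – a small back-of-the-envelope script relating the buffer and archive figures quoted above; the kept-data fraction is an assumption chosen to reproduce the ~1 EB/y figure, not an SKA number.

```python
# Rough arithmetic with the figures quoted on the slide (illustrative only).
ingest_tbps = 10                # SKA1 ingest into the SDP, terabits per second
buffer_pb = 50                  # observation buffer size, petabytes

ingest_tb_per_s = ingest_tbps / 8            # 10 Tb/s = 1.25 TB/s
fill_time_h = (buffer_pb * 1000) / ingest_tb_per_s / 3600
print(f"50 PB buffer fills in ~{fill_time_h:.1f} h at 10 Tb/s")   # ~11.1 h

# If only ~2-3% of the processed stream is kept, the archive grows at roughly
# the 1 EB/y quoted for SKA1 (assumed fraction, not an SKA figure).
kept_fraction = 0.025
archive_eb_per_year = ingest_tb_per_s * kept_fraction * 3600 * 24 * 365 / 1e6
print(f"archive growth ~{archive_eb_per_year:.2f} EB/y")          # ~0.99 EB/y
```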

Cape Town 2013 SDP processing rack – feasibility model
- 42U rack with 20 processing blades and two 56 Gb/s leaf switches uplinking to the rack switches
- Each blade: a capable host processor (multi-core x86, dual Xeon) plus a many-core accelerator (GPGPU, MIC, …?) delivering >10 TFLOP/s over the PCI bus, with four local disks of ≥1 TB each
- Blade specification: 20 TFlop, 2×56 Gb/s comms, 4 TB storage, <1 kW power, capable host (dual Xeon), programmable, significant RAM
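Not on the slide – a quick scaling of the per-blade feasibility figures up to the ~200 Pflop image-processing load quoted in the pipeline slide; everything follows from the numbers above.

```python
# Scale the per-blade feasibility figures to a ~200 Pflop SDP (illustrative).
blade_tflop = 20          # per-blade compute from the blade specification
blade_kw = 1.0            # per-blade power budget (<1 kW, taken as 1 kW)
blades_per_rack = 20

target_pflop = 200
blades = target_pflop * 1000 / blade_tflop        # 10,000 blades
racks = blades / blades_per_rack                  # 500 racks
power_mw = blades * blade_kw / 1000               # ~10 MW for the blades alone

print(f"{blades:.0f} blades, {racks:.0f} racks, ~{power_mw:.0f} MW")
```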

Cape Town 2013 SKA feasibility model

Cape Town 2013 SKA conceptual software stack

Cape Town 2013 SKA Open Architecture Lab – HPC development and prototyping lab for the SKA
- Coordinated out of Cambridge and run jointly by HPCS and CHPC
- Will work closely with COMP to test and design various potential compute, networking, storage and HPC system / application software components
- Rigorous system engineering approach, which describes a formalised design and prototyping loop
- Provides a managed, global lab for the whole of the SDP consortium
- Provides a touchstone and practical place of work for interaction with vendors
- The first major test bed, a Dell / Mellanox / NVIDIA GPU cluster, was deployed in the lab last month and will be used by the consortium to drive design R&D

Cape Town 2013 SKA exascale computing in the desert
- The SKA SDP compute facility will be, at the time of deployment, one of the largest HPC systems in existence
- Operational management of large HPC systems is challenging at the best of times, even when they are housed in well-established research centres with good IT logistics and experienced Linux HPC staff
- The SKA SDP could be housed in a desert location with little surrounding IT infrastructure, poor IT logistics and little prior HPC history at the site
- Potential SKA SDP exascale systems are likely to consist of ~100,000 nodes, occupy 800 cabinets and consume 30 MW – around 5 times the size of today's largest supercomputer, the Cray Titan at Oak Ridge National Laboratory (a quick consistency check follows below)
- SKA SDP HPC operations will be very challenging
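The consistency check referred to above (not from the slides): nodes per cabinet and average power per node follow directly from the quoted figures.

```python
# Sanity-check the quoted exascale system figures (illustrative arithmetic).
nodes = 100_000
cabinets = 800
power_mw = 30

print(nodes / cabinets)                        # 125 nodes per cabinet
print(power_mw * 1e6 / nodes, "W per node")    # 300 W average per node
print(power_mw * 1e6 / cabinets / 1e3, "kW per cabinet")  # 37.5 kW (~40 kW quoted later)
```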

Cape Town 2013 The challenge is tractable Although the operational aspects of the SKA SDP exascale facility are challenging, they are tractable if dealt with systematically and in collaboration with the HPC community.

Cape Town 2013 SKA HPC operations – functional elements
We can describe the operational aspects by functional element:
- Machine room requirements **
- SDP data connectivity requirements
- SDP workflow requirements
- System service level requirements
- System management software requirements **
- Commissioning & acceptance test procedures
- System administration procedures
- User access procedures
- Security procedures
- Maintenance & logistical procedures **
- Refresh procedures
- System staffing & training procedures **

Cape Town 2013 Machine room requirements
- Machine room infrastructure for exascale HPC facilities is challenging: 800 racks, 1,600 m², 30 MW IT load, ~40 kW of heat per rack
- Cooling efficiency and heat density management are vital
- Machine infrastructure at this scale is both costly and time consuming; the power cost alone, at today's prices, is £30M per year (see the arithmetic below)
- A desert location presents particular problems for the data centre: hot ambient temperature and lack of water make compressor-less cooling difficult, very dry air makes humidification difficult, and the remote location makes DC maintenance difficult
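The arithmetic referred to above (not on the slide): the ~£30M/year power bill follows from the 30 MW IT load at a typical industrial electricity price; the unit price and PUE below are assumptions.

```python
# Reproduce the quoted ~£30M/year power bill (assumed unit price and PUE).
it_load_mw = 30
pue = 1.1                    # assumed power usage effectiveness
price_gbp_per_kwh = 0.10     # assumed industrial electricity price

annual_kwh = it_load_mw * 1000 * pue * 24 * 365
annual_cost_m = annual_kwh * price_gbp_per_kwh / 1e6
print(f"~£{annual_cost_m:.0f}M per year")   # ~£29M per year
```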

Cape Town 2013 System management software
- System management software is the vital element in HPC operations
- System management software today does not scale to exascale; a worldwide coordinated effort is under way to develop system management software for exascale
- Elements of the system management software stack: power management, network management, storage management, workflow management, OS / runtime environment, security management, system resilience, system monitoring (sketched below), system data analytics, development tools
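Not part of the slides – a minimal, illustrative sketch of the kind of automated node health monitoring such a stack has to provide at scale; the node names and the ping-based check are hypothetical stand-ins for a real monitoring system.

```python
# Minimal illustrative node health check (stand-in for a real monitoring stack).
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names; a real SDP system would have ~100,000 of these.
nodes = [f"sdp-node{n:05d}" for n in range(1, 101)]

def is_alive(node: str) -> bool:
    """Return True if the node answers a single ping within 1 second."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

with ThreadPoolExecutor(max_workers=32) as pool:
    status = dict(zip(nodes, pool.map(is_alive, nodes)))

failed = [n for n, ok in status.items() if not ok]
print(f"{len(failed)} of {len(nodes)} nodes unreachable")
# At exascale this output feeds automated draining/repair workflows, not a human.
```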

Cape Town 2013 Maintenance logistics
- Current HPC technology MTBF for hardware and system software results in failure rates of ~2 nodes per week on a cluster of ~600 nodes
- SKA exascale systems are expected to contain ~100,000 nodes, so failure rates of ~300 nodes per week could be realistic (see the scaling below); during system commissioning this will be 3–4x higher
- Fixing nodes quickly is vital, otherwise the system will soon degrade into a non-functional state
- The manual engineering processes for fault detection and diagnosis on 600 nodes will not scale to 100,000 nodes; this needs to be automated by the system software layer
- Vendor hardware replacement logistics need to cope with high turn-around rates
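The scaling referred to above (not on the slide): the ~300 nodes/week figure is a linear extrapolation of the quoted failure rate; the commissioning multiplier is taken as the midpoint of the 3–4x range.

```python
# Scale the quoted per-cluster failure rate linearly to an exascale system.
failures_per_week = 2        # observed on a ~600-node cluster
cluster_nodes = 600
ska_nodes = 100_000

rate_per_node = failures_per_week / cluster_nodes
ska_failures = rate_per_node * ska_nodes
print(f"~{ska_failures:.0f} node failures per week")          # ~333 per week

commissioning_factor = 3.5   # slide says 3-4x during commissioning (midpoint assumed)
print(f"~{ska_failures * commissioning_factor:.0f} per week during commissioning")
```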

Cape Town 2013 Staffing levels and training
- Providing functional staffing levels and experience at a remote desert location will be challenging
- It is hard enough finding good HPC staff to run small-scale HPC systems in Cambridge; finding orders of magnitude more staff to run much more complicated systems in a remote desert location will be very challenging
- Operational procedures using a combination of remote system administration staff and DC smart hands will be needed
- HPC training programmes need to be implemented to skill up well in advance

Cape Town 2013 Early Cambridge SKA solution - EDSAC 1 Maurice Wilkes