SKA in WP19 – Iain Emsley, Rahim ‘Raz’ Lakhoo, David Wallom – 3rd Annual Meeting

Science Data Processor Consortium, with members from: Australia, Canada, Chile, Germany, The Netherlands, New Zealand, Portugal, Spain, the UK and the USA

SDP Tasks and CRISP
Compute platform
– Input Handling
Data Layer
– Buffer
– External interfaces
Data delivery and tiered data model
– Tiered data delivery model

SKA WP19 tasks
Data Requirements
Network Requirements
Technology Survey
Testbed

Data Requirements
~3.8 EB of processed data products over the project lifetime of SKA1
11 different data product types
Expected linear growth of data volume and data requests over the project lifetime
Data distributed to a number of regional centres*
Access methods should include IVOA standards
Support both SKA scientists and also provide public interfaces
19 published science use cases
– Range of data volumes and operational methodologies that will affect data volumes and transmission requirements

SKA1-SYS_REQ-2366 Distribution of data products: As limited by resource constraints, it will be possible to deliver science data products to approved off-site facilities, which may be globally distributed.

Network Requirements
‘1 Tbps link from each site would support SKA-1’ (a back-of-envelope check follows below)
– *Not currently available from either site
1.4 Tbps over existing infrastructure has been demonstrated over ‘long’ distances
Liaising with the Signal and Data Transport consortium to connect to NRENs and fibre providers (F2F in Manchester right now)
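
As a rough sanity check on these figures, the sketch below works out how long the ~3.8 EB of SKA1 data products quoted earlier would take to move over a single sustained 1 Tbps link. It is illustrative arithmetic only, not a statement of the actual delivery schedule.

```python
# Back-of-envelope check: time to move 3.8 EB over a sustained 1 Tbps link.
EB = 1e18                       # exabyte in bytes (decimal convention assumed)
total_bytes = 3.8 * EB          # processed data products over the SKA1 lifetime
link_bps = 1e12                 # quoted 1 Tbps link from each site

seconds = total_bytes * 8 / link_bps
print(f"{seconds:.2e} s ≈ {seconds / 86400:.0f} days of continuous transfer")
# ~3.0e7 s, i.e. roughly a year at full line rate, before any protocol overhead
```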

Delivery Tools Selection
Level 3 – User-facing tools
– e.g. CyberSKA
Level 2 – Data distribution and management
– e.g. iRODS, PhEDEx, NGAS
Level 1 – Data transfer (see the sketch below)
– e.g. GridFTP, RFT, FTS
Level 0 – Lower-level tools and protocols
– Networking, messaging, RDMA
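
For the Level 1 tools, a GridFTP transfer typically comes down to invoking globus-url-copy with parallel streams and a tuned TCP buffer. The sketch below shows the shape of such a call; the endpoint URLs, stream count and buffer size are hypothetical placeholders, not SDP settings.

```python
# Minimal sketch of a Level 1 GridFTP transfer driven from Python.
# Endpoint URLs and tuning values are hypothetical placeholders.
import subprocess

SRC = "gsiftp://sdp.example.org/data/products/obs-0001.ms.tar"
DST = "gsiftp://regional-centre.example.net/ingest/obs-0001.ms.tar"

cmd = [
    "globus-url-copy",
    "-vb",             # report transfer performance while copying
    "-p", "8",         # number of parallel TCP streams
    "-tcp-bs", "16M",  # TCP buffer size; tune to the bandwidth-delay product
    SRC,
    DST,
]
subprocess.run(cmd, check=True)
```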

Example Review Criteria
Scalability
– Data demand
– Data size
– High-speed data transfers
– Storage expandability
– Architecture
Data integrity
Interoperability
Data discovery
Management
Fault tolerance
Location awareness
Support and maintenance
Software dependencies
Open and standard protocols
Software license
Open standard security

Test Bed
Adaptec RAID card with internal ports, PCIe 3.0 (x8), Mini-HD SAS, 1GB DDR3 cache
Sixteen 2.5 inch 128GB SATA3/6Gb OCZ Vector MLC NAND SSDs, ~2TB per node (RAID 0)
Mellanox FDR InfiniBand, 112Gbit/s (dual Connect-IB, PCIe 3.0 x16)
Quad-port Intel i350 GbE
GPUs include NVIDIA K10 (PCIe 3.0) and K40 (PCIe 3.0)
Dual 920W redundant PSUs
2U chassis with 2.5 inch drive bays
Dual-socket Intel E5 (3.8GHz turbo), 135W TDP, 20MB cache, 8GT/s QPI, quad memory channels (max. 51.2GB/s)
Supermicro X9DRW-3LN4F+ motherboard
64GiB (8GiB x 8) ECC DDR3 1600MHz CL11, single rank

Test Harness
Aims are re-usability and repeatability
Installs PXE boot images of Debian and CentOS with common settings
– Ubuntu is in the planning stage
Puppet configuration management
– Installs common packages and software such as OSKAR and casacore
Shared storage space for results (see the sketch below)
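
One way a harness like this can keep runs repeatable is to wrap each benchmark so that its output and the node metadata land on the shared results area in a uniform format. The sketch below assumes a hypothetical /shared/results mount and an arbitrary dd write test; it illustrates the idea and is not the actual harness code.

```python
# Sketch of a repeatable benchmark wrapper writing results to shared storage.
# The results path and the example benchmark command are assumptions.
import json
import platform
import subprocess
import time
from pathlib import Path

RESULTS_DIR = Path("/shared/results")   # hypothetical shared storage mount

def run_benchmark(name, cmd):
    start = time.time()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    record = {
        "benchmark": name,
        "command": cmd,
        "node": platform.node(),
        "started": start,
        "elapsed_s": round(time.time() - start, 3),
        "returncode": proc.returncode,
        "stdout_tail": proc.stdout[-4000:],
    }
    out = RESULTS_DIR / f"{name}-{platform.node()}-{int(start)}.json"
    out.write_text(json.dumps(record, indent=2))
    return record

if __name__ == "__main__":
    # Example: a simple direct-I/O write test with dd.
    run_benchmark("dd-write", ["dd", "if=/dev/zero", "of=/tmp/ddtest",
                               "bs=1M", "count=1024", "oflag=direct"])
```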

[Architecture diagram: COMP.INP / DATA.BUF nodes, each with an Intel E5 CPU, NVIDIA K40 GPU, InfiniBand, memory and 16x SSD RAID 0; ~72TB gives a ~10s buffer]

[Architecture diagram: DATA.EXT nodes, each with an Intel E5 CPU, NVIDIA K40 GPU, InfiniBand, memory and 16x SSD RAID 0]

[Architecture diagram: DATA DELIVERY nodes, each with an Intel E5 CPU, NVIDIA K40 GPU, InfiniBand, memory and 16x SSD RAID 0]

Initial Energy Profiling
Initial profiling using the EMPPACK package with external Watts Up Pro meters
Testbed chassis power monitoring shows the same power usage, but with less resolution
Both power readings show approx. 338 watts: basic power device verification

Enhanced Power Monitoring
Use RAPL, IPMI and NVML to give CPU/DRAM, chassis and GPU power readings.
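
A minimal sketch of what combining these interfaces can look like on one node is given below, assuming a Linux host that exposes the powercap RAPL counters, the pynvml bindings for NVML, and ipmitool on the path. It illustrates the approach rather than reproducing the monitoring code used on the testbed.

```python
# Sketch: sample CPU package power (RAPL), GPU power (NVML) and chassis
# sensors (IPMI) over one interval. Assumes Linux powercap, pynvml, ipmitool.
import subprocess
import time

import pynvml

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package 0 energy counter

def rapl_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

def sample(interval=1.0):
    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

    e0 = rapl_energy_uj()
    time.sleep(interval)
    e1 = rapl_energy_uj()

    cpu_watts = (e1 - e0) / 1e6 / interval                  # uJ -> W
    gpu_watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000  # mW -> W
    chassis = subprocess.run(["ipmitool", "sensor"],        # chassis/PSU sensors
                             capture_output=True, text=True).stdout
    pynvml.nvmlShutdown()
    return cpu_watts, gpu_watts, chassis

if __name__ == "__main__":
    cpu_w, gpu_w, _ = sample()
    print(f"CPU package: {cpu_w:.1f} W  GPU: {gpu_w:.1f} W")
```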

Developing Enhanced Application Profiling
Low-overhead profiling tools and methods
Capture power and performance for NICs, CPUs, GPUs (inc. GPUDirect), I/O devices and memory
Method needs to scale from small (e.g. ARM) to large systems (e.g. compute nodes)
Kernel-level tool chains and tracers: I/O profiling/tracing and replay has been done with a 2-5% overhead (see the sketch below)
Aiming for a ‘one stop shop’ for metric gathering
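
In the same low-overhead spirit, per-device I/O throughput can be sampled from the kernel's existing counters rather than traced per request. The sketch below reads /proc/diskstats before and after an interval; the device name is an assumption, and the snippet stands in for, rather than reproduces, the tool chain described above.

```python
# Sketch: low-overhead I/O throughput sampling from /proc/diskstats.
# DEVICE is a placeholder for the block device under test.
import time

DEVICE = "sda"
SECTOR_BYTES = 512        # /proc/diskstats reports 512-byte sectors

def disk_sectors(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5]), int(fields[9])   # sectors read, written
    raise ValueError(f"device {device} not found")

def sample(interval=1.0):
    r0, w0 = disk_sectors(DEVICE)
    time.sleep(interval)
    r1, w1 = disk_sectors(DEVICE)
    read_mbs = (r1 - r0) * SECTOR_BYTES / 1e6 / interval
    write_mbs = (w1 - w0) * SECTOR_BYTES / 1e6 / interval
    print(f"{DEVICE}: read {read_mbs:.1f} MB/s, write {write_mbs:.1f} MB/s")

if __name__ == "__main__":
    sample()
```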

Profiling with Power
[Graphs: average MB/s alongside approx. 230 watts power draw; the time scale on the right-hand graph should be ignored]

Data Write

Data Read

Future Plans
Point-to-point tests between SDP DELIV members on selected tools
Configuration of protocols for lightpath setup to host countries
Publish analysis of delivery tools
Completing all work package L2/L3 requirements decomposition
Profiling LOFAR and MWA algorithms
Hardware FLOP counters to complement hardware power counters
Working with Mellanox on NIC power measurements
GPUDirect configuration and integration in the testbed
Investigating BFS CPU scheduler and kernel tweaks