NW-GRID, HEP and sustainability
Cliff Addison, Computing Services, July 2008

NW-GRID Vision
● Top end: HPCx and CSAR
● Mid range: NW-GRID and local clusters
● Desktop pools: Condor etc.
● Hooks to other Grid consortia: NGS, WRG
● Applications and industry: pharma, meds, bio, social, env, CCPs
● Sensor networks and experimental facilities
● User interfaces: portals, client toolkits, active overlays
● Advanced network technology
● Technology “tuned to the needs of practicing scientists”

Project Aims and Partners
● Aims:
  – Establish, for the region, a world-class activity in the deployment and exploitation of Grid middleware
  – Realise the capabilities of the Grid in leading-edge academic, industrial and business computing applications
  – Leverage 100 posts plus £15M of additional investment
● Project Partners:
  – Daresbury Laboratory: CSED and e-Science Centre
  – Lancaster University: Management School, Physics, e-Science and Computer Science
  – University of Liverpool: Physics and Computing Services
  – University of Manchester: Computing, Computer Science, Chemistry, bioinformatics + systems biology
  – Proudman Oceanographic Laboratory, Liverpool

Project Funding
● North West Development Agency
● £5M over 4 years commencing April 2004
  – So funding has just ended – we’re looking for more!
● £2M capital for systems at the four participating sites, with initial systems in year 1 (Jan 2006) and upgrades in year 3 (Jan 2008)
● £3M for staff – about 15 staff for 3 years
● POL plus institutional contributions from Daresbury and Lancaster
Complemented by a “TeraGrid-competitive” private Gbit/s link among the sites.

Hardware 2008 upgrade
● Lancaster and Liverpool procured upgrades that are now coming on stream.
● Procured systems:
  – now contain dual-core Opterons
  – are slowly being upgraded to quad-core Barcelona
● Lancaster:
  – 67 Sun x2200 nodes ( GB, GB)
  – 24 TB storage (Sun x4500)
● Liverpool:
  – 110 Sun x2200 nodes (73 with 32 GB, 27 with 16 GB)
  – 24 TB storage (Sun x4500) plus 48 TB back-up
  – Complete rework of the existing cluster into two TFlops systems (connected via 10 Gbit/s fibre)

Liverpool Opteron Clusters (figures include the 1Q2008 upgrade)
● High Capability Cluster – Infinipath interconnect, front-end node ulgbc3, Panasas disk subsystem (8 TB):
  – 58 dual-processor, dual-core nodes (140 cores), 2.4 GHz, 8 GB RAM, 200 GB disk
● Gig-Ether Cluster – Gigabit Ethernet interconnect, front-end node ulgbc5, SATA RAID disk subsystem (5.2 TB) plus 24 TB “Thumper” disk subsystem:
  – 44 dual-processor, dual-core nodes (176 cores), 2.2 GHz, 8 GB RAM, 72 GB disk
  – 50 dual-processor, quad-core nodes (400 cores), 2.3 GHz, 32 GB RAM, 500 GB disk
  – 23 dual-processor, quad-core nodes (184 cores), 2.3 GHz, 32 GB RAM, 500 GB disk
  – 37 dual-processor, quad-core nodes (296 cores), 2.3 GHz, 16 GB RAM, 500 GB disk
● The two clusters are linked at 10 Gbit/s
● Totals: 212 nodes, 1196 cores, 37 TB disk

Bipedal Gait Analysis
Researchers used 170,000 core hours to estimate the maximum running speeds of Tyrannosaurus rex, Allosaurus, Dilophosaurus, Velociraptor, Compsognathus, an ostrich, an emu and a human. T. rex might have been too fast for humans to out-run!

NW-GRID for HEP
● Why?
  – Liv-HEP systems are largely limited to 1 GB memory / processor
  – NW-GRID nodes have a minimum of 2 GB / core, with new Barcelona nodes (8 cores per node) on stream by the end of July
  – The future is multi-core (more on that later)
● How?
  – A CE in Physics connects to multiple nodes on the GigE cluster
    ● Effectively these nodes become part of Liv-HEP
    ● Details of how this connection is made are being worked out
    ● Some concerns about network traffic, as the SE is still in Physics
  – Ideally, image nodes for Liv-HEP and re-image for NW-GRID quickly, so there is potential for moving nodes rapidly between NW-GRID and Liv-HEP
    ● Also want to push some serial NW-GRID traffic onto Liv-HEP

Why multi-core??
● Heat
  – A multi-core chip has the same thermal profile as its single-core cousin
● Space
  – Liverpool NW-GRID has ~1200 cores in 8 racks
● Management
  – Tends to be at the node level – the more cores per node, the easier the management

BUT!!

Why not multi-core?
● Exploitation requires a separate thread of execution for each core:
  – Multiple tasks – e.g. N serial processes on N cores
  – Parallel tasks – N MPI processes on N cores
  – Parallel threads – a single program with “loop-level” parallelism (shared-memory parallel)
● Multi-core shifts the serial bottleneck:
  – Memory access
  – Communicating off-node (e.g. NICs)
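As a minimal sketch of the “N MPI processes on N cores” option above (not taken from the slides; the event count and per-event work are hypothetical), each rank below handles an independent slice of a workload and the partial results are combined at the end:

```c
/* Sketch only: one MPI rank per core, each processing an independent
 * slice of "events" -- the pattern named on the slide above.
 * Build/run (typical): mpicc mpi_events.c -o mpi_events
 *                      mpirun -np 8 ./mpi_events                  */
#include <mpi.h>
#include <stdio.h>

#define TOTAL_EVENTS 1000000L   /* hypothetical workload size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block-decompose the events across the ranks (cores). */
    long chunk = TOTAL_EVENTS / size;
    long first = rank * chunk;
    long last  = (rank == size - 1) ? TOTAL_EVENTS : first + chunk;

    double local = 0.0;
    for (long i = first; i < last; i++)
        local += (double)i * 1.0e-6;   /* stand-in for per-event work */

    /* Combine the per-rank partial results on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("processed %ld events, result = %f\n", TOTAL_EVENTS, total);

    MPI_Finalize();
    return 0;
}
```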

Multi-core – good news
● Experience with multi-core nodes at Liverpool suggests:
  – N cores usually deliver better than 0.8 × N times the performance of a single core
● HEP codes often have thousands of cases to run – multi-core just needs a good job scheduler.
● Lots of experience and software exists for developing SMP-parallel versions of codes:
  – OpenMP – a directives-based parallel specification
  – Good performance on 4–8 threads is often fairly easy.
  – Higher levels of performance often require a careful look at load balance over the threads.
● Race conditions are the difficult debugging problem
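A minimal OpenMP sketch of the shared-memory, loop-level approach mentioned above (not from the slides; the loop body and sizes are illustrative): the reduction clause avoids the race condition a naively shared accumulator would create, and the schedule clause is the usual knob for the thread load balance the slide refers to.

```c
/* Sketch only: OpenMP loop-level parallelism with a reduction to avoid
 * a race on the shared sum, and a schedule clause to influence load
 * balance across threads.
 * Build/run (typical): gcc -O2 -fopenmp omp_sum.c -lm -o omp_sum
 *                      OMP_NUM_THREADS=8 ./omp_sum               */
#include <omp.h>
#include <stdio.h>
#include <math.h>

int main(void)
{
    const long n = 10000000L;
    double sum = 0.0;

    /* Without reduction(+:sum), concurrent updates to sum would race. */
    #pragma omp parallel for schedule(dynamic, 10000) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += sin((double)i * 1.0e-6);   /* stand-in for per-iteration work */

    printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```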

Sustainability issues
● Using multi-core to reduce heat and ease management is just the tip of the iceberg!
● How do we fund the refresh of the necessary compute infrastructure?
● How do we cope with the need to reduce watts per flop?
● How do we exploit the new developments taking place?

New Developments
● Heterogeneous architectures
  – Largely driven by watt-per-flop considerations
  – IBM: Opteron and Cell – new petaflops system
    ● PowerXCell 8i processors ~ 100 Gflop/s
  – AMD and Nvidia – multi-core “general purpose” GPUs
    ● AMD: 500 cores, 200 Gflop/s double precision at 150 W for $1,000
    ● Nvidia: 1U node with 4 GPUs (960 cores), ~800 Gflop/s at 700 W for $8,000
    ● Think vector processing
● Homogeneous, but lots of cores
  – Sun Niagara cores, 16 threads per core
    ● (but only one FP unit per core)
    ● Can put multiple chips on a node
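To illustrate the “think vector processing” remark (a sketch, not from the slides; the kernel and sizes are illustrative): loops like the SAXPY below, where every iteration is independent, are the kind of data-parallel work that maps naturally onto hundreds of GPU cores or wide SIMD/vector units.

```c
/* Sketch only: SAXPY (y = a*x + y), a classic data-parallel kernel.
 * Each iteration is independent, so it vectorises well and is the
 * style of loop that GPU-like architectures accelerate. */
#include <stdio.h>

void saxpy(long n, float a, const float *x, float *y)
{
    for (long i = 0; i < n; i++)   /* independent iterations */
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1000000 };
    static float x[N], y[N];       /* static to keep the arrays off the stack */
    for (long i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f (expect 5.0)\n", y[0]);
    return 0;
}
```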

Higher-level developments
● Cloud computing
  – Package code into a virtual appliance (OS + application), then run it anywhere on the cloud (sound familiar?)
  – Charge per CPU-hour
● Storage services (e.g. Amazon S3)
  – Charge per GByte of storage, plus access charges
  – Current commercial systems are not slanted towards scientific computing
● Idea – build the next generation of GridPP around these concepts, and sell the excess…

NW-GRID and sustainability
● Want to cover costs and refresh over 3 years
● Need additional money:
  – Universities view NW-GRID as a strategic resource
  – Push computational modelling by University groups for external contracts
  – Lease hardware to groups with “small” hardware grants
  – Work with system vendors to act as a “buyer’s friend”
  – Direct external contracts

External contracts
● Direct industry use is impeded by licensing models – this is slowly changing.
● Attract the interest of ISVs to new hardware on site (latest processors + memory + interconnect)
● Target priority areas as defined by funding bodies (e.g. NWDA, TSB)
● Involve regional Knowledge Transfer activities (e.g. Centre for Materials Discovery, NWVEC)
● Become a “cloud computing” site – possible for Daresbury.

Conclusions
● After 20 years of saying so, the age of parallel computing is finally upon us!
● Heat and space constraints will greatly limit the choice of future systems.
  – The associated hardware changes will require some major software redesign
● Must virtualise the software stack so that serial code can easily run almost anywhere (build with just enough OS).
  – Lots of work is needed to “virtualise” MPI codes
● How do you back up a petabyte of data?