Heterogeneous Computation Team HybriLIT

Presentation transcript:

HybriLIT HPC Cluster: The Modern Paradigm
Matveev Mikhail, on behalf of the Heterogeneous Computation Team HybriLIT
Laboratory of Information Technologies, JINR
(The GIMM FPEIP complex is intended for the simulation of thermal processes in materials irradiated by heavy ion beams.)
July 5, Dubna, GRID 2016

Content
- Essentials of the HybriLIT cluster configuration;
- The network boot method;
- Upgrades: SLURM, CVMFS, Modules.

Essentials of the HybriLIT cluster configuration
Presently the HybriLIT cluster includes 10 distinct physical computation nodes (gpu, cpu and phi):
- 7 blades with specific GPU accelerator sets, driven by NVIDIA CUDA software;
- 1 blade with 2 Intel Xeon Phi accelerators, driven by Intel MPSS software;
- 1 blade with 1 Phi and 1 GPU accelerator, with mixed NVIDIA CUDA and Intel MPSS software;
- 1 blade with 2 multi-core CPU processors;
- a large (~7 TB) storage area.

The network boot method

Typical diskless boot:
1. The PC powers on, the network stack loads to RAM;
2. The node asks the DHCP server for its own IP and the IP of the TFTP server;
3. The Linux kernel loads to RAM;
4. The Linux initrd loads to RAM (initrd modified for network boot);
5. The network filesystem is mounted;
6. The modified init starts;
7. Linux services start.

Our diskless boot to ramfs:
1. The PC powers on, the network stack loads to RAM;
2. The node asks the DHCP server for its own IP and the IP of the TFTP server;
3. The Linux kernel loads to RAM;
4. The Linux initrd loads to RAM (initrd modified for network boot);
5. A packed filesystem image is downloaded and unpacked on the fly into RAM (nanoramfs/ramfs, see the following slides);
6. The modified init starts;
7. Linux services start;
8. Our cluster services start.
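
As a rough sketch of how such a boot could be wired on the server side, a PXE menu entry might look like this (file paths, kernel parameters and the image URL are purely illustrative assumptions, not the actual HybriLIT configuration):

    # Hypothetical /var/lib/tftpboot/pxelinux.cfg/default entry
    DEFAULT hybrilit-netboot
    LABEL hybrilit-netboot
      KERNEL vmlinuz
      # the modified initrd reads a custom "rootimg=" parameter, downloads the
      # packed filesystem image over the network and unpacks it into RAM
      APPEND initrd=initrd-netboot.img rootimg=http://boot-server/images/node.img.gz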

The network boot method: nanoramfs
We have prepared a small image that contains the commands for network setup. This image is called nanoramfs; it is downloaded while the core operating system loads. (Illustration on the slide: the booted nodes announce themselves to the cluster, e.g. "Hi! My name is blade04. Please load me with gpu tasks!", "Hi! My name is blade03. Have you phi tasks for me?", "Hi! My name is blade02. You can load me with any task.")
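
A minimal sketch of what such an early network-setup stage might execute (the interface name, DHCP client and node name are hypothetical, not taken from the real nanoramfs):

    #!/bin/sh
    # Illustrative only: early network setup, as a nanoramfs-like image might do it
    ip link set eth0 up     # bring up the boot interface
    dhclient eth0           # obtain the IP configuration from the DHCP server
    hostname blade04        # announce the node name to the cluster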

The network boot method: ramfs
The ramfs image is loaded next. It contains the commands for configuring the specific computation elements (cpu, gpu, phi) and the associated software services. After this boot process the computation node becomes available for users' tasks. (Illustration on the slide: the nodes report back, e.g. "Thank you! I can use my gpu accelerators!", "Thank you! I can use my phi accelerators!", "Just wait a few seconds, please.")
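
A hedged sketch of how this stage could enable accelerator-specific services depending on the detected hardware (the detection checks and service names are assumptions, not the actual HybriLIT scripts):

    #!/bin/sh
    # Illustrative only: start the services matching the node's accelerators
    if lspci | grep -qi nvidia; then
        nvidia-smi -pm 1            # enable GPU persistence mode
    fi
    if lspci | grep -qi 'xeon phi'; then
        systemctl start mpss        # Intel MPSS stack for the Phi coprocessors
    fi
    systemctl start slurmd          # finally join the SLURM cluster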

The network boot method: general
The method serves three basic purposes:
- to enable dynamic extension of the cluster in the future, by allowing new computation nodes to be added quickly to the cluster structure;
- to apply software changes or upgrades simultaneously on all computation nodes;
- to allow quick re-setup of nodes after a reboot, including recovery from errors.

SLURM: accommodation of the new blades

SLURM (Simple Linux Utility for Resource Management) partitions:
- interactive (1 blade = 1 mic, 1 gpu, 2 cpu sockets);
- cpu (2 blades = 4 cpu sockets);
- gpu (4 blades = 12 gpu sockets);
- phi (1 blade = 2 mic sockets);
- gpuK80 (3 blades = 8 gpu sockets).

New calculation node (blade) specifications: 2x Intel Xeon E5-2695 v3 (2x 14 cores); 4x NVIDIA Tesla K80 (4x 4992 CUDA cores); 512 GB RAM.

Total cluster resources (the added term corresponds to the new calculation node): CPU cores 224 + 28; GPU cores 57216 + 19968; PHI cores 182.
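
For illustration, the partitions listed above might be described in slurm.conf roughly as follows (node names, GRES counts and time limits are assumptions, not the real HybriLIT settings):

    # Hypothetical slurm.conf fragment
    NodeName=blade[01-03] Sockets=2 CoresPerSocket=14 RealMemory=512000 Gres=gpu:4
    PartitionName=interactive Nodes=blade04      Default=YES MaxTime=01:00:00
    PartitionName=cpu         Nodes=blade[05-06] MaxTime=INFINITE
    PartitionName=gpu         Nodes=blade[07-10] MaxTime=INFINITE
    PartitionName=phi         Nodes=blade11      MaxTime=INFINITE
    PartitionName=gpuK80      Nodes=blade[01-03] MaxTime=INFINITE

A user would then submit to a partition with, e.g., sbatch -p gpuK80 --gres=gpu:2 job.sh.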

CVMFS: network access to the CERN repository
CVMFS (CERN Virtual Machine File System). The operational CVMFS interface at HybriLIT has two main characteristics:
- 32 GB of SSD storage reserved at each node for CVMFS packages;
- a dedicated extension of the MODULES environment.
(Illustration on the slide: task 2447 runs on blade02 using CUDA 7.5 delivered through CVMFS; task 2448 runs on blade04 using ROOT.)
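
On the client side, the per-node SSD cache and the mounted repositories could be configured along these lines (the repository list, paths and proxy address are illustrative; the real HybriLIT settings may differ):

    # /etc/cvmfs/default.local (illustrative values)
    CVMFS_REPOSITORIES=sft.cern.ch,grid.cern.ch
    CVMFS_CACHE_BASE=/ssd/cvmfs-cache     # local SSD cache area on each node
    CVMFS_QUOTA_LIMIT=30000               # cache quota in MB, within the 32 GB reservation
    CVMFS_HTTP_PROXY="http://cvmfs-proxy.example.org:3128"   # hypothetical proxy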

MODULES: newly added modules devoted to CVMFS

[user@hydra] module avail
hlit/opencv/2.4.9    hlit/cuda/5.5     hlit/fairsoft/nov15
hlit/openmpi/1.6.5   hlit/cuda/6.0     hlit/gcc/4.8.4
hlit/openmpi/1.8.1   hlit/cuda/6.5     hlit/gcc/4.9.3
hlit/magma/2.0.0     hlit/cuda/6.5     hlit/java/jdk-1.6.0_45
hlit/scotch/6.0.4    hlit/cuda/7.0     hlit/java/jdk-1.7.0_60
hlit/zeromq/4.1.3    hlit/cuda/7.5     hlit/java/jdk-1.8.0_05
hlit/zyre/1.1.0      hlit/czmq/3.0.2
… and so on for HybriLIT's own modules …
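
In a batch job a user would then load the required modules before running, for example (module names are taken from the listing above; the partition and the executable are illustrative):

    #!/bin/bash
    #SBATCH -p gpu
    #SBATCH --gres=gpu:1
    module load hlit/cuda/7.5        # CUDA toolkit from the hlit module tree
    module load hlit/openmpi/1.8.1   # MPI stack
    mpirun ./my_gpu_application      # hypothetical user executable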

Conclusions
- The developed hardware-software environment of the HybriLIT cluster enables efficient system administration;
- Its features meet the requirements of scalability and high fault tolerance;
- Network access to remote software resources allows users' needs to be met efficiently;
- The acquired expertise opens the way to connecting the resources of remote heterogeneous clusters.

Thank you for your attention!