Willkommen Welcome Bienvenue How we work with users in a small environment Patrik Burkhalter

How we work with users in a small environment Patrik Burkhalter, system administrator of the HPC cluster at Empa. At Empa since 2012; Linux system admin before Empa (mainly web, db and app servers).

Situation at Empa Agenda: Situation at Empa, Cluster support, User support, Enforcement

Situation at Empa At the moment we have two clusters at Empa: Ipazia, the cluster we have had since 2006, and Hypatia, the new cluster we built this year. The computing nodes from the old cluster will be detached from Ipazia and connected to Hypatia step by step.

Situation at Empa Ipazia, the Empa HPC cluster: 102 nodes (Dell), built with the help of ParTec and CSCS, ParaStation cluster middleware from ParTec, Torque resource manager, Maui scheduler, InfiniBand DDR interconnect, Lustre file systems.

Situation at Empa Ipazia hardware. Front end node: PowerEdge, Intel(R) Xeon(R) CPU 2.33GHz (4 cores), 4GB RAM, 1TB shared /home. Computing nodes: Node …: deactivated, old 4-core pizza boxes; Node …: PowerEdge M605, 2 * Quad-Core AMD Opteron(tm), … GB RAM; Node 47…102: PowerEdge M610, 2 * Intel(R) Xeon(R) CPU 2.53GHz, 24GB RAM.

Situation at Empa Hypatia, the new Empa cluster: built from scratch by Empa, 32 nodes in 2 Dell M1000e chassis, Torque resource manager, Maui scheduler, InfiniBand FDR interconnect, Lustre file systems. In-house know-how (we have support for the SAN units). Well documented. In production; nodes from Ipazia are getting migrated to Hypatia soon.

Situation at Empa Hypatia hardware. Front end node: PowerEdge R620, 2 * Intel(R) Xeon(R) CPU E… (16*2 cores with hyper-threading), 32GB RAM. Computing nodes: PowerEdge M620, 2 * Intel(R) Xeon(R) CPU E… (16 cores), 64GB RAM.

Situation at Empa pbstop on Ipazia

Situation at Empa pbstop on Hypatia (new cluster)

Situation at Empa Lustre storage available to both clusters: 25TB for backed-up data (/project), 35TB of speed-optimized space (/scratch, fast due to the number of disks).

Situation at Empa We changed our support model this year from external support to in-house support. Why did we do this? We felt confident that it is possible, we save money on the service contracts, we can now fix (almost) everything ourselves, and we can provide better user support because we have a deeper understanding of the system. How did we minimize the risk of breaking the cluster? We built a new cluster and left the running cluster alone; a lot of users are already using the new cluster; we can migrate the nodes to the new cluster once its stability is proven.

Situation at Empa Ipazia: pizza-box nodes removed, 2 new chassis, 1 new front end, 1 new SSD storage.

Situation at Empa Support team Daniele Passerone (5% FTE) Carlo Pignedoli (5% FTE) Patrik Burkhalter (50% FTE)

Cluster Support Agenda: Situation at Empa, Cluster support, User support, Enforcement

Cluster Support Support we provide: introduction to basic Linux usage (connecting to the system using an SSH client, basic Linux commands, the file system hierarchy); introduction of new users to the cluster; planning of future jobs; reservation of nodes for users; installation, compilation and testing of new software (GNU and Intel compilers, MPI (openmpi/mvapich2), OpenFOAM, Abaqus, and every software package requested by the users); system updates (hardware, OS) and software updates; acquiring and installing new hardware (new nodes, a GPU node, replacing failed hardware).
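To give a concrete flavour of the "installation, compilation and testing of new software" part, a minimal sketch of testing a freshly installed MPI stack through Torque could look like the following (the module names, job sizes and file names are examples, not our actual configuration):

# a tiny MPI test program
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}
EOF

# compile with the GNU toolchain and Open MPI (hypothetical module names)
module load gcc openmpi
mpicc -O2 -o hello_mpi hello_mpi.c

# minimal Torque job script: 2 nodes, 8 cores each
cat > hello_mpi.pbs <<'EOF'
#!/bin/bash
#PBS -N hello_mpi
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
module load gcc openmpi
mpirun ./hello_mpi
EOF
qsub hello_mpi.pbs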

Cluster Support Documentation of the cluster architecture

Cluster Support Documentation of the cluster usage

Cluster Support Lustre file system maintenance and extension. At the moment we are migrating our Lustre file systems workspc and storage to project and scratch while the file systems stay online. project is a completely new file system on new hardware, with SSDs for the metadata target (MDT).

Cluster Support Lustre file system maintenance and extension. scratch is a new file system built out of the old file systems workspc and storage. We deactivate one OST per fs on the old file systems, use `lfs find` to find the files that have stripes on the deactivated OSTs, copy those files to a new location on the same fs, and finally move them back to their original location.

Cluster Support The OST gets disabled temporarily on the I/O node; this makes sure the OST stays readable.
lctl dl | grep ' osc '
lctl --device <device number> deactivate

Cluster Support Migration for files with an access time > 14 days. This copies quickly but is kind of dirty:
TMPDIR="/mnt/storage/tmp"
# the OST name was truncated on the slide; 0003 is the OST used on the following slides
for i in $(lfs find --obd storage-OST0003 --atime +14 /mnt/storage); do
  DIR=$(dirname "$i")
  FILE=$(basename "$i")
  TMPPATH="$TMPDIR/$FILE"
  SRCPATH="$DIR/$FILE"
  # testing above values, continue to next entry if one test fails
  echo -en "$SRCPATH: "
  cp -p "$SRCPATH" "$TMPPATH" || exit 1
  mv "$TMPPATH" "$SRCPATH" || exit 1
  echo done
done

Cluster Support Migration for newer files: lfs_migrate checks whether a file was changed during the migration process, but does not check whether the file is open on another node. Therefore we only touch users who have no running jobs and no running processes on the front end node.
lfs find --obd storage-OST0003 /mnt/storage/pbu | lfs_migrate -y

Cluster Support After the migration, the OST gets deactivated permanently:
lctl conf_param storage-OST0003.osc.active=0

Cluster Support Situation after the migration

Cluster Support Problems we experienced during the migration: a lot of small files are hard to migrate, and users tend to "hoard" data.

Cluster Support We also provide several shell environments for the users to simplify cluster usage. We are using the Modules environment. A module can be loaded with the command `module load <name>/<version>`; the module sets the user's environment variables as defined in the modulefile. We provide modules for each self-compiled app and library. This is particularly handy for users who like to compile their own software. We started to use this approach this year.
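In day-to-day use this looks roughly as follows (the module name and version are examples, not necessarily what is installed on our clusters):

module avail                  # list the modules available on the cluster
module load openmpi/1.6.5     # adjust PATH, MANPATH and friends for this package
module list                   # show the currently loaded modules
module unload openmpi/1.6.5   # undo the environment changes again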

Cluster Support Modules on Ipazia

Cluster Support Modules on Hypatia. New modules get installed on user request.

Cluster Support Example output of a module: a simple module for ffmpeg. We are trying to get rid of LD_LIBRARY_PATH and use RPATH instead; this makes sure that a compiled binary uses the proper libraries independently of the user environment. The module concept was new to our users but was well accepted.
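A sketch of the RPATH idea (the install prefix /share/libs/ffmpeg and the small program are examples): instead of asking every user to export LD_LIBRARY_PATH, the library search path is baked into the binary at link time:

# without RPATH: the binary only runs if the user sets LD_LIBRARY_PATH
gcc -o encode encode.c -L/share/libs/ffmpeg/lib -lavcodec
export LD_LIBRARY_PATH=/share/libs/ffmpeg/lib
./encode

# with RPATH: the run-time search path is embedded in the binary itself
gcc -o encode encode.c -L/share/libs/ffmpeg/lib -lavcodec -Wl,-rpath,/share/libs/ffmpeg/lib
./encode    # finds libavcodec without any environment setup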

User Support Agenda: Situation at Empa, Cluster support, User support, Enforcement

Situation at Empa Users from Empa and Eawag: ~120 users, 40 active users in the last 30 days.
last | awk '{print $1}' | sort | uniq | wc -l

User Support The typical vendor-to-customer relationship does not work at Empa. We cannot provide a Service Level Agreement (SLA); we can only provide support on a best-effort basis. No support during the night or on weekends. Unplanned downtime can happen.

User Support Typical IT user support does not work either. We cannot offer out-of-the-box solutions, we don't like to "just solve the problem now", and we often don't know the solution right away.

User Support Treating the user as a partner works best for us. The user gets treated as an equal. "If you think your users are idiots, only idiots will use it." Linus Torvalds

User as a Partner The user has strong scientific know-how and sometimes just uses the software; the engineer has strong know-how about clusters. This means: a request by a scientist has to be reduced to the point at which the engineer is able to understand it, the problem gets fixed by the engineer, the solution gets communicated to the scientist in detail until the scientist understands the particular situation, and the result gets tested by the user. It is important that each side understands the issue, otherwise potential optimizations of the system get lost.

User as a Partner If a user is experienced, tasks get delegated to the user. This could be: compilation of apps and libraries, testing of a new package, problem analysis. The solution always gets deployed by root to make sure all standards are fulfilled. If it is in the repository of our Linux distribution, it gets installed using the package manager; if it is too old or not available, it gets compiled and installed in /share/apps or /share/libs. There are modules provided to set the user environment (module load <name>/<version>). Our software gets compiled on a computing node and installed on the shared file system.
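As a rough sketch of that convention (the package and version are examples, not an actual installation recipe from Empa): the build runs on a compute node through an interactive job, and the install prefix lives on the shared file system so every node sees the result:

# get an interactive session on a compute node
qsub -I -l nodes=1:ppn=8

# build on the compute node ...
tar xf fftw-3.3.4.tar.gz && cd fftw-3.3.4
./configure --prefix=/share/apps/fftw/3.3.4
make -j 8 && make check

# ... and deploy as root so ownership and standards stay consistent
sudo make install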

User as a Partner Example: Abaqus, a Finite Element Method (FEM) software package used by the mechanical systems engineering department of Empa. The users have a strong background in mechanical engineering and use Abaqus on Windows to engineer parts. We made a wrapper to simplify the job submission.
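The wrapper itself is not shown on the slide; a minimal sketch of the idea, assuming the standard abaqus command line and our Torque batch system (script name, module name and defaults are made up), could look like this:

#!/bin/bash
# submit-abaqus: hypothetical wrapper, not the actual Empa script
# usage: submit-abaqus <jobname.inp> [ncpus]
INPUT=${1:?"usage: submit-abaqus jobname.inp [ncpus]"}
NCPUS=${2:-8}
JOB=$(basename "$INPUT" .inp)

# generate and submit a Torque job script on the fly
qsub <<EOF
#!/bin/bash
#PBS -N $JOB
#PBS -l nodes=1:ppn=$NCPUS
#PBS -l walltime=24:00:00
cd \$PBS_O_WORKDIR
module load abaqus
abaqus job=$JOB input=$INPUT cpus=$NCPUS interactive
EOF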

Enforcement Agenda: Situation at Empa, Cluster support, User support, Enforcement

Enforcement At the moment, we only enforce a few things: obviously, the root password is not given to the users; disk quotas are in place (size and inodes); and the Maui scheduling configuration. Optimization of the scheduling configuration is planned for Hypatia, the new cluster.
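For the disk quotas, a hedged example of how block and inode limits can be set on a Lustre file system (user name, limits and mount point are placeholders, not our actual policy):

# 200 GB soft / 250 GB hard block limit, 1M soft / 1.2M hard inode limit
lfs setquota -u jdoe -b 200G -B 250G -i 1000000 -I 1200000 /mnt/project
# check current usage against the quota
lfs quota -u jdoe /mnt/project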

Enforcement The login screen provides some information to make the user aware of the cluster situation.
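One way to keep that information current (a sketch, not necessarily how it is done at Empa) is to regenerate /etc/motd periodically from the batch system state, for example from cron:

#!/bin/bash
# refresh /etc/motd with a short cluster status summary
{
  echo "Welcome to the Empa HPC cluster - status as of $(date)"
  echo "Nodes currently offline or down:"
  pbsnodes -l
  echo "Jobs in the queue: $(qstat | tail -n +3 | wc -l)"
} > /etc/motd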

Thanks for listening. Any questions or thoughts?