Final Implementation of a High Performance Computing Cluster at Florida Tech
P. FORD, X. FAVE, K. GNANVO, R. HOCH, M. HOHLMANN, D. MITRA
Physics and Space Sciences Dept. & Computer Sciences Dept., Florida Institute of Technology, 150 W. University Blvd, Melbourne, FL

Abstract
The HPC cluster at Florida Tech is two years into development and has met several milestones that effectively finalize its construction and implementation. The system has been upgraded to the latest versions of the Rocks OS and the Condor batch-job manager. In addition to the software upgrades, the cluster has been integrated into the Open Science Grid (OSG) production grid and has become an official USCMS Tier-3 compute element, having processed 125,000 hours of CMS data to date. We have also allowed several faculty members to use our resources alongside our own muon tomography simulations. The hardware has been upgraded with top-of-the-line machines, resulting in 160 available processor cores. We detail the final design and performance of the cluster as well as the core configuration of the system. The concept of Tier-3 sites and our participation in the CMS project are outlined.

Hardware
Phase I of the FLTECH cluster hardware has reached completion. The cluster consists of 20 new professionally built servers, each with 8 Xeon CPU cores and 16 GB of RAM. The network-attached storage (NAS) is a similar machine with ~10 TB of storage in a RAID 6 configuration; user home directories and important research data are stored on the NAS. All of the new hardware is housed in a single 50U rack. The benchmarked performance of the cluster is a quarter trillion floating point operations per second (250 GFLOPS).

Figure 1: New high-end cluster hardware (NAS).
Figure 2: New cluster topology, including all new servers; all new hardware is incorporated into a single 50U rack.

Software
ROCKS cluster operating system - We have upgraded the cluster operating system to Rocks 5.0 and optimized the installation profiles of the compute nodes and the network-attached storage.

Figure 3: Ganglia cluster monitoring.

Condor batch-job manager - Condor has been upgraded to version 7.2, giving the cluster increased security and improved troubleshooting ability. We have redesigned the batch scheduler to disable job preemption, meaning jobs always run to completion and then give up their slot to a higher-priority job. This is an optimization for grid computing, since most batch jobs do not checkpoint (save) their progress periodically. A sketch of such a no-preemption policy is given below.

Figure 4: Machines available to Condor, and running jobs.
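The exact scheduler settings are site-specific, but a policy of this kind is normally expressed through a handful of standard condor_config knobs. The following is a minimal sketch of a no-preemption policy for Condor 7.x; the knob names are standard Condor settings, while the values shown are illustrative assumptions rather than our production configuration.

    # Never evict a running job from its slot (startd policy).
    PREEMPT = False
    KILL = False

    # Do not preempt running jobs on behalf of higher-priority users;
    # they receive the slot only when the current job finishes on its own.
    PREEMPTION_REQUIREMENTS = False
    NEGOTIATOR_CONSIDER_PREEMPTION = False

    # No rank-based preemption: the startd does not prefer one job over another.
    RANK = 0

    # If a slot ever must be vacated (e.g., node maintenance), let the job
    # retire gracefully instead of being killed. Value is in seconds (24 hours).
    MAXJOBRETIREMENTTIME = 86400

With a policy of this kind, a newly matched higher-priority job simply waits for a busy slot to free up, which matches the behavior described above: running jobs always complete and only then hand their slot to the next job.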
CMS Software and Computing: Open Science Grid & CMS Tier 3 Production Site
In the summer of 2008, we moved the cluster registration from the integration test bed to the OSG production grid and began processing real grid workflows. Upon achieving this, we opened our resources to the Worldwide LHC Computing Grid for CMS data processing. Because of our optimizations for grid computing, meeting the requirements for processing CMS jobs was painless and required only a few tweaks to our grid middleware. We are now recognized as a CMS Tier 3 site and have since contributed well over 125,000 resource hours to CMS.

Figure 5: A map of CMS Tier 2 and Tier 3 sites; our site is located on the east coast of Florida. (B. Kim, U. of Florida)
Figure 6: Utilization of CMS Tier 3 sites (job count and wallclock hours) as monitored by the Gratia tool.

In addition to contributing resource hours to the CMS experiment, we have been actively participating in improving the overall operation of the Grid. First, we attended the CMS Tier 3 conference at Fermi National Accelerator Laboratory in Illinois (a Tier 1 site), where we discussed the scope and objectives of a Tier 3 site along with other Tier 3 administrators. On the hardware side, we have performed network filesystem tests with the University of Florida to determine whether a Lustre filesystem mount is an effective way to distribute software and data between sites. A sample of our results from two benchmarks (write direction FIT to UF, read direction UF to FIT):

    dd:     ... MB/s write, 72.47 MB/s read
    bonnie: ... MB/s write, 65.27 MB/s read

Conclusion, Summary & Outlook
Being firmly established on the OSG and contributing computing resources to CMS simulations, our site has become an official Tier 3 CMS site, thus concluding Phase I of the project. We are currently adding a dedicated development node with 64 GB of RAM for running experimental code that can have large memory footprints (such as expectation-maximization algorithms). We will also be expanding the types of CMS jobs that the cluster can process, including data sets recorded by the CMS detector at the Large Hadron Collider. Visit our website to follow this project.

References and Acknowledgments
Rocks Clusters User Guide; Open Science Grid; Condor v7.2 Manual.
Thanks to Bockjoo Kim (UF-USCMS) and the OSG-GOC for their guidance. For further information, contact the authors.
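For scale, two back-of-envelope figures follow directly from the numbers quoted above (these are derived values only, not additional measurements):

    250 GFLOPS / 160 cores ≈ 1.6 GFLOPS per core
    72.47 MB/s × 8 bits/byte ≈ 580 Mbit/s sustained wide-area read throughput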