
LGI: Leiden Grid Infrastructure
Dr. M. F. Somers
Theoretical Chemistry, Leiden Institute of Chemistry, University of Leiden
July 2010

Outline
- Why LGI?
- Design of LGI.
- Current setup & example session.
- Scalability & performance.
- LGI as pilot job framework.
- Possible storage applications.

Why LGI?
- Bundle clusters, supercomputers, grids, a BOINC project & workstations that sit behind firewalls, routers & outside our administrative domain.
- Only a few (high-performance) applications, which are hard to install (some licensed, binary-only or very system specific).
- Easy to set up, administer, manage & scale.
- Easy interface for users.

Design of LGI

- Uses x509 certificates for XML over https & a LAMP stack as project server. NO proxies!
- A project is nothing more than a database & a single server can run multiple projects.
- User management on the project server & on the resources: a global LGI user namespace per project & in the x509 certs.
- Scheduling & QoS possible on the project server. Currently FIFO/FCFS & heartbeat.
- Three general interfaces: web, CLI & Python. Other, more specific interfaces use the API.
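
As an illustration of the XML-over-https design, a small Python client authenticating with its x509 key pair could look roughly like the sketch below. The endpoint path, field names and reply layout are assumptions for the example, not the official LGI API.

    import xml.etree.ElementTree as ET
    import requests

    PROJECT_SERVER = "https://lgi.example.org/LGI"        # hypothetical project server URL
    CERT = ("/home/user/.LGI/usercert.pem",               # user x509 certificate
            "/home/user/.LGI/userkey.pem")                # user private key
    CA_FILE = "/home/user/.LGI/LGI+CA.crt"                # CA the server certificate chains to

    def submit_job(application, job_input):
        """Post a job for 'application' and return the job id from the XML reply."""
        reply = requests.post(PROJECT_SERVER + "/interfaces/submit_job.php",  # assumed script name
                              data={"project": "LGI",
                                    "application": application,
                                    "input": job_input},
                              cert=CERT, verify=CA_FILE)
        reply.raise_for_status()
        return ET.fromstring(reply.text).findtext("job_id")

    print(submit_job("hello_world", "Hello LGI!"))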

Design of LGI
- Each job in the DB has 'input' & 'output' blobs with application-specific content.
- Each job has a repository (a URL on a server) to upload & download big files.
- Each job has an application-specific XML field 'JobSpecifics' for references to the repository & possible restart data in other repositories.
- Each job can be scheduled / submitted to a specific or 'any' resource that reported back support for the application.
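
To make the job layout concrete, the Python sketch below builds the application-specific pieces of a job record. The tag names inside JobSpecifics are made up for the example; its schema is application specific by design.

    import xml.etree.ElementTree as ET

    # Small application-specific 'input' blob that goes straight into the database.
    job_input = "molecule=H2O basis=6-31G* method=B3LYP"

    # Illustrative 'JobSpecifics' field: references to the per-job repository for
    # big files and to restart data kept in another repository.
    specifics = ET.Element("JobSpecifics")
    ET.SubElement(specifics, "repository").text = "https://lgi.example.org/repository/12345/"
    ET.SubElement(specifics, "restart_data").text = "https://lgi.example.org/repository/11999/checkpoint.tgz"

    print(ET.tostring(specifics, encoding="unicode"))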

Design of LGI
- The daemon, written in C++ with libcURL & the STL, runs as non-root. If run as root: automatic sandboxing.
- The daemon caches on disk & several daemons can run concurrently on a resource.
- The daemon monitors the project server for jobs & 'abort' signals / rescheduling of jobs due to lost heartbeats / errors. It reports its capabilities.
- Applications are implemented through back-end scripts: Torque/PBS, LoadLeveler, BOINC, gLite, forking, SGE...
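
Back-end scripts are ordinary executables that the daemon calls inside the job sandbox; a job_run_script for a Torque/PBS resource could be sketched roughly as below. Real LGI back-ends are typically shell scripts, and the file names and PBS options here are only an example.

    #!/usr/bin/env python3
    # Rough sketch of a job_run_script for a Torque/PBS back-end: wrap the staged
    # job in a batch script and hand it to qsub. Paths and options are illustrative.
    import os
    import subprocess

    job_dir = os.getcwd()                                  # daemon runs this script in the job sandbox
    batch_script = os.path.join(job_dir, "LGI_pbs_job.sh")

    with open(batch_script, "w") as f:
        f.write("#!/bin/sh\n")
        f.write("#PBS -l nodes=1,walltime=24:00:00\n")
        f.write("cd %s\n" % job_dir)
        f.write("./application < input > output 2> error\n")

    # qsub prints the batch system job id; the job_check_running_script and
    # job_check_finished_script could read it back later to poll the queue.
    pbs_id = subprocess.check_output(["qsub", batch_script])
    with open(os.path.join(job_dir, "pbs_job_id"), "wb") as f:
        f.write(pbs_id)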

Example daemon config
/home/mark/LGI/certificates/LGI+CA.crt
/home/mark/LGI/daemon/runhere
20 1
LGI
hello_world
/proc/cpuinfo
/home/mark/LGI/daemon/hello_world_scripts/check_system_limits_script
/home/mark/LGI/daemon/hello_world_scripts/job_check_limits_script
/home/mark/LGI/daemon/hello_world_scripts/job_check_running_script
/home/mark/LGI/daemon/hello_world_scripts/job_check_finished_script
/home/mark/LGI/daemon/hello_world_scripts/job_prologue_script
/home/mark/LGI/daemon/hello_world_scripts/job_run_script
/home/mark/LGI/daemon/hello_world_scripts/job_epilogue_script
/home/mark/LGI/daemon/hello_world_scripts/job_abort_script

Current setup in Leiden
- Applications: SA-DVR, SPO-DVR, CPMD, Gaussian, MolPro, ADF, VASP, nonmem, Classical & hello_world.
- Resources: Huygens, Lisa, Boeddha, Nocona, Woodcrest, Harpertown, Nehalem, Gainestown, DAS3, gLite/globus through gridui & gliteui, & our own BOINC project.

Example session
Demonstration of the CLI & web interface: ...students run Gaussian on the grid within minutes...

Performance
Test machine: 2-CPU 3.0 GHz Xeon EM64T Nocona with HT, 4 GB DDRII 400 MHz RAM, 74 GB 10k rpm HD, 100 Mb Ethernet, running SL 4.3.
With 100k jobs in the database & 100 concurrent resources using '-ft 5' & 4 KB RSA keys: load of 30 (84% user, 15% system), MySQL at 15% to 50% CPU with 300 MB RAM & 2.6 GB virtual, DB 250 MB on disk, no swapping, responses around 1 s & no queries >5 s.

Possible
It should be possible to handle >1M jobs & ~10k resources on ~3k euro hardware in a single-server setup under normal conditions. Have not tested this... yet!

Scalability
- Several project servers as slaves, each with a separate MySQL & storage back-end.
or
- Load balance a single 'project server' with a firewall / DNS round robin over several Apache+PHP servers, run a MySQL cluster across the servers & use GlusterFS to scale the repository storage.

LGI as pilot job framework
- Use a resource daemon as pilot job manager to schedule pilot jobs on the grid.
- Each pilot job on the grid is a resource daemon with application binaries & configuration, polling a project server.
- If there is no work for some time, pilot jobs stop automatically, but some pilot jobs are kept alive for fast response on the grid.
- Under development by Willem van Engen & Jan Just Keijser at NIKHEF.
- Now in the testing & reporting phase.
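
The "stop when idle, keep a few alive" behaviour can be pictured with a small wrapper around the resource daemon. The sketch below is an illustration only: the daemon binary name, config file and sandbox naming are assumptions, and the actual NIKHEF pilot scripts may work differently.

    import os
    import subprocess
    import time

    IDLE_LIMIT = 30 * 60                  # stop the pilot after 30 minutes without work
    RUN_DIR = "runhere"                   # daemon run directory holding the job sandboxes

    daemon = subprocess.Popen(["./LGI_daemon", "LGI.cfg"])   # assumed daemon binary & config
    last_busy = time.time()
    while daemon.poll() is None:
        time.sleep(60)
        if any(name.startswith("Job_") for name in os.listdir(RUN_DIR)):
            last_busy = time.time()                          # jobs present: stay alive
        elif time.time() - last_busy > IDLE_LIMIT:
            daemon.terminate()                               # idle too long: give the grid slot back
            break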

Possible storage applications
1) Resource accepts a 'file_storage' job.
2) Resource stores the data from the repository, or from another URL that is part of the input, any way it likes.
3) Resource creates a link / reference to the new storage location. The reference could be a web interface using x509 & https.
4) Resource finishes the job with the reference as job output.
5) Resource prunes storage / keeps track of a storage job by checking whether the job repository still exists.
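
A file_storage back-end could implement steps 2 to 4 roughly as sketched below; the storage root, the https front end and the output format are assumptions for the example.

    import os
    import shutil
    import urllib.request

    STORAGE_ROOT = "/data/lgi_storage"                       # where this resource keeps the files
    STORAGE_URL = "https://storage.example.org/lgi"          # x509-protected web front end (step 3)

    def handle_file_storage(job_id, source_url):
        """Fetch the data named in the job input and return an https reference as job output."""
        # step 2: pull the data from the job repository or from another URL in the input
        target_dir = os.path.join(STORAGE_ROOT, job_id)
        os.makedirs(target_dir, exist_ok=True)
        local_file = os.path.join(target_dir, os.path.basename(source_url))
        with urllib.request.urlopen(source_url) as remote, open(local_file, "wb") as local:
            shutil.copyfileobj(remote, local)
        # steps 3 & 4: the reference to the new location becomes the job output
        return "%s/%s/%s" % (STORAGE_URL, job_id, os.path.basename(source_url))

    # step 5 would be a separate clean-up task that removes directories whose
    # job repository no longer exists on the project server.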

Support
LGI is supported by NWO/NCF & NIKHEF... Thanks...