Ensembl Compute Grid Issues
James Cuff
Informatics Systems Group, Wellcome Trust Sanger Institute

The Problem
Find all the genes and syntenic regions between species:
– 4.3 Gb of DNA, 480,000 sequences (human)
– 16-odd million traces (mouse)
– 7x coverage of mouse (3 Gb)
– [let's not even talk about fugu, zebrafish, mosquito, rat, etc.]
16+ analysis types, which means:
– Automatic submission of jobs
– Dependencies between them
– Track their progress
– Retry failed jobs
– Need access to large file-based databases
– Store output
– It must be easy to include new analysis types
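The tracking requirements above map naturally onto the MySQL/Perl stack described on the next slide. Below is a minimal Perl/DBI sketch of what a job-tracking table and a retry query could look like; the table name, columns, credentials, and retry limit are illustrative assumptions, not the actual Ensembl pipeline schema.

#!/usr/bin/env perl
# Sketch of a job-tracking table with dependency, status and retry
# columns (illustrative schema, not the real pipeline tables).
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=pipeline;host=localhost',
                       'pipe_user', 'secret', { RaiseError => 1 });

# One row per job: what to run, what it depends on, where it stands.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS job (
        job_id     INT AUTO_INCREMENT PRIMARY KEY,
        analysis   VARCHAR(40)  NOT NULL,   -- e.g. 'blastn', 'genewise'
        input_id   VARCHAR(100) NOT NULL,   -- sequence/contig identifier
        depends_on INT NULL,                -- job_id that must finish first
        status     ENUM('NEW','RUNNING','DONE','FAILED') DEFAULT 'NEW',
        retries    INT DEFAULT 0
    )
});

# Re-queue failed jobs that have not exhausted their retry budget.
my $n = $dbh->do(q{
    UPDATE job SET status = 'NEW', retries = retries + 1
    WHERE status = 'FAILED' AND retries < 3
});
print "re-queued $n failed job(s)\n";
$dbh->disconnect;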

e! System overview
– Based on a MySQL relational database and Perl
– Submission/tracking independent of the data
– Reusable analysis components:
  – Standalone objects
  – Database-aware objects
– Simple interfaces for rapid development
– Open Source, Open Standards…
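The "reusable analysis components" idea can be sketched as a small Perl base class: the submission system only ever talks to a fixed three-step interface, so adding a new analysis type means writing one new subclass. The method names below follow the fetch/run/write style associated with the Ensembl pipeline, but the class itself is an illustrative skeleton, not the real code.

package My::Pipeline::Runnable;
# Illustrative skeleton of a reusable analysis component.
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    # input_id identifies the data chunk; db is an optional database
    # handle ("database-aware" objects), absent for standalone ones.
    my $self = { input_id => $args{input_id}, db => $args{db} };
    return bless $self, $class;
}

# Every analysis type implements the same three steps, so the
# submission/tracking layer stays independent of the data.
sub fetch_input  { die "subclass must implement fetch_input" }
sub run          { die "subclass must implement run" }
sub write_output { die "subclass must implement write_output" }

1;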

The Compute
– >1,142 hosts (1,200 CPUs) in one LSF cluster
– 360 Alpha DS10Ls: 1 GB RAM, 60 GB disk, 467 MHz
– 768 Intel RLX blades: 1 GB RAM, 80 GB disk, 800 MHz
– 6 x ES45 and 8 x ES40: 667/1000 MHz, 8–16 GB RAM
– 10+ TB of Fibre Channel storage

Typical CPU usage (last week)
– 768 nodes for >1 day: that is 768+ node-days, or roughly 2 years of CPU time
– Sustaining that level of I/O and CPU is totally non-trivial

Sanger Compute (site network diagram)
– Contingency cluster (backup engines + storage): 8 x ES40, 2 x DS20, SAN-attached tape silos
– SAN backup/mirrors
– Ensembl cluster: 8 x ES40 + 6 x ES40
– Large-scale assembly, sequencing & trace data: 19 x ES40, 4 x DS20
– Front-end compute servers and desktop devices
– Pathogen: 15 x ES40 plus DS10 Alphas
– Oracle cluster: 6 x DS20, 2 x ES40
– Informatics development: 5 x ES40
– PFAM: GS-series server, 128 GB memory; SAN-attached tape libraries
– Extranet web cluster: 2 x ES40, 0.5 TB disk
– Internal router and firewall, with a DMZ facing the 'Internet'
– Mail hub, local FTP, secure login, Aceserver, dial-in hubs
– Ensembl web / BLAST services: 12 x ES40 + 6 TB storage
– Cancer project / X-linked disease: 4 x ES40, 4 TB disk
– High-throughput farm: 768 RLX nodes, plus a GS-series server with 128 GB memory
– Humgen: 8 x ES45
– External access: user X at institute Y?

Whitehead Collaboration (the problem)
– blastn all-by-all comparison:
  – WI and TIGR human BAC ends (800k entries) against human entries in GenBank (5.7 GB)
– Existing pipeline from WI: a Java/JDBC/Oracle pipeline based on XML
– Tight two-week time frame (as always!)
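An all-by-all blastn at this scale only fits on a farm if the query set is split into many independent chunks first. Below is a minimal Perl sketch of that split step, assuming the BAC ends sit in a single well-formed FASTA file; the file names and chunk size are illustrative.

#!/usr/bin/env perl
# Split a large FASTA query file into fixed-size chunks so each chunk
# can run as an independent farm job (assumes well-formed FASTA).
use strict;
use warnings;

my $chunk_size = 2000;                 # sequences per job
my ($n, $chunk, $out) = (0, 0, undef);

open my $in, '<', 'bac_ends.fa' or die "bac_ends.fa: $!";
while (my $line = <$in>) {
    if ($line =~ /^>/ and $n++ % $chunk_size == 0) {
        close $out if $out;
        my $file = sprintf 'chunk_%04d.fa', $chunk++;
        open $out, '>', $file or die "$file: $!";
    }
    print {$out} $line;
}
close $out if $out;
print "wrote $chunk chunks of up to $chunk_size sequences each\n";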

Whitehead Collaboration (the solution?)
– ssh / scp / ftp access to Sanger and WI systems…
– 2 weeks to set up and run:
  – Oracle instance
  – Setting up a user account, familiarisation with the system
  – Oracle dumps, copying DDL and input results
  – Total data size: 21 GB of I/O
  – System failures (and recovery)
  – A great many telephone calls and discussions
– The compute itself took only 2 days in total, on just 360 nodes…

Computational Farms
– NFS/CIFS/AFS (network share) meltdown:
  – Creation of batch scripts (100,000s of jobs, some taking <1 min)
  – Reading NFS-mounted binaries
  – Reading NFS-mounted data files
  – Writing output to NFS-mounted directories
– MySQL / Oracle meltdown:
  – Too many simultaneous connections
  – Queries blocking each other
– LSF mbatchd meltdown (DRM failure in general):
  – Broken code in general, from both developer and sysadmin error
– All of this even when you are supposed to… "know what you are doing…" (and their equivalent of foot-and-mouth)
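One common mitigation for the short-job problem above (a general technique, not something this slide prescribes) is to group many sub-minute tasks into a single farm submission, so the DRM tracks hundreds of jobs instead of 100,000s. A hedged Perl sketch follows; the queue name, task command, and group size are illustrative assumptions.

#!/usr/bin/env perl
# Group short tasks into one shell script per LSF submission.
use strict;
use warnings;

my $group_size = 500;                          # short tasks per farm job
my @tasks = map { "run_analysis.pl chunk_$_.fa" } 0 .. 99_999;

my $job = 0;
while (my @batch = splice @tasks, 0, $group_size) {
    my $script = sprintf 'batch_%04d.sh', $job++;
    open my $fh, '>', $script or die "$script: $!";
    print {$fh} "#!/bin/sh\n", map { "$_\n" } @batch;
    close $fh;
    chmod 0755, $script;
    # One bsub call now covers 500 tasks instead of 500 submissions.
    system('bsub', '-q', 'normal', '-o', "$script.out", "./$script") == 0
        or die "bsub failed for $script";
}
print "submitted $job grouped jobs\n";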

External CPU and Data Collaborations (how would an 'ideal world GRID' help?)
– Rapid data distribution to and from the SI and the external site?
– Zero to little setup time?
– 'Direct' connections to remote Oracle/MySQL instances at Sanger (i.e. via replication)?
– No need for local account [shell] access?
– A single 'system image', i.e. no need to find out where java/perl/binaries live, how the queues work, etc.?

MySQL remote access
– A DS20 Alpha with 250 GB of disk in the DMZ, serving Ensembl data
– From the Cisco firewall logs, 1st Oct 2001 to 1st Oct 2002:
  – 159,251 port 3306 TCP connections
  – corresponding to 1,016 unique hosts
  – 348 hosts with more than 10 connections
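Each of those logged port-3306 connections is a remote MySQL client session against the DMZ machine. For illustration, a minimal Perl/DBI sketch of such a client-side connection; the hostname, database name, and credentials are placeholders, not the real public server details.

#!/usr/bin/env perl
# Client-side view of a remote MySQL (port 3306) connection.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'DBI:mysql:database=ensembl;host=mysql.example.org;port=3306',
    'anonymous', '', { RaiseError => 1 });

# e.g. list the tables the public instance exposes
print "$_\n" for @{ $dbh->selectcol_arrayref('SHOW TABLES') };
$dbh->disconnect;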