Wellcome Trust Sanger Institute Informatics Systems Group Ensembl Compute Grid issues James Cuff Informatics Systems Group Wellcome Trust Sanger Institute.


1 Ensembl Compute Grid issues
James Cuff
Informatics Systems Group, Wellcome Trust Sanger Institute

2 The Problem
Find all the genes and syntenic regions between species
4.3Gb DNA, 480,000 sequences (human)
16-odd million traces (mouse), 7x coverage of mouse (3Gb)
[let's not even talk about fugu, zebrafish, mosquito, rat, etc.]
16+ analysis types
– Automatic submission of jobs
– Dependencies between them
– Track their progress
– Retry failed jobs
– Need access to large file-based databases
– Store output
– It must be easy to include new analysis types
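The requirements on this slide (submit jobs, respect dependencies, track progress, retry failures) can be illustrated with a small sketch. This is not the Ensembl pipeline code, which the talk describes as Perl on MySQL with LSF; it is an in-memory, sequential Python illustration, and the names run_pipeline, jobs, deps and max_retries are hypothetical.

```python
from collections import deque


def run_pipeline(jobs, deps, max_retries=2):
    """Run jobs in dependency order, retrying failed ones.

    jobs: dict mapping job name -> zero-argument callable (raises on failure)
    deps: dict mapping job name -> list of job names that must finish first
    Returns a dict mapping job name -> "done", "failed" or "skipped".
    """
    status = {}
    pending = deque(jobs)
    stalled = 0  # consecutive jobs we had to put back in the queue
    while pending and stalled < len(pending):
        name = pending.popleft()
        needed = deps.get(name, [])
        if any(status.get(d) in ("failed", "skipped") for d in needed):
            # A prerequisite did not complete, so this job cannot run.
            status[name] = "skipped"
            stalled = 0
        elif all(status.get(d) == "done" for d in needed):
            status[name] = "failed"
            for _attempt in range(1 + max_retries):
                try:
                    jobs[name]()
                    status[name] = "done"
                    break
                except Exception:
                    continue  # retry until the attempts are used up
            stalled = 0
        else:
            # Prerequisites not finished yet: try again later.
            pending.append(name)
            stalled += 1
    # Anything still queued has a missing or circular dependency.
    for name in pending:
        status[name] = "skipped"
    return status
```

A real system would persist the status table in a database and hand the callables to a batch scheduler rather than running them inline; the sketch only shows the control flow.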

3 e! - System overview
Based on MySQL relational database and Perl
Submission/tracking independent of the data
Reusable analysis components
– Standalone objects
– Database-aware objects
Simple interfaces for rapid development
Open Source, Open Standards…

4 The Compute
>1142 hosts (1,200 CPUs) in one LSF cluster
360 Alpha DS10Ls: 1GB, 60GB, 467MHz
768 Intel RLX blades: 1GB, 80GB, 800MHz
6 x ES45, 8 x ES40: 667 / 1000 MHz, 8-16GB
10+ TB of Fibre Channel storage

5 Typical CPU usage (last week)
768 nodes for >1 day ~ 2 years of CPU
Sustaining I/O and CPU at this level is totally non-trivial
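The "2 years of CPU" figure can be checked with a quick calculation. The sketch below assumes one CPU per node and exactly one day of runtime; both are simplifications, since the slide says "more than one day".

```python
# Assumption: each of the 768 nodes contributes one CPU for one full day.
nodes = 768
days = 1

cpu_days = nodes * days
cpu_years = cpu_days / 365

print(f"{cpu_days} CPU-days is about {cpu_years:.1f} CPU-years")
# prints: 768 CPU-days is about 2.1 CPU-years
```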

6 Sanger Compute
[Network diagram; labels listed in the order they were extracted]
Contingency cluster, backup engines + storage
8 x ES40 + 2 x DS20
SAN-attached tape silos
SAN backup / mirrors
Ensembl cluster
8 x ES40, 6 x ES40
Large-scale assembly, sequencing & trace data
19 x ES40, 4 x DS20
Front-end compute servers
Desktop devices
Pathogen
15 x ES40
360 DS10 Alpha
Oracle cluster
6 x DS20, 2 x ES40
Informatics Development
5 x ES40
PFAM
SAN-attached tape libraries
GS320 32-way, 128GB mem.
Extranet web cluster
2 x ES40, 0.5TB disk
Internal router
FIREWALL
DMZ
The ‘Internet’
Mail-hub, local ftp, secure login, Aceserver, dial-in hubs
Ensembl web, Blast services
12 ES40 + 6TB storage
Cancer Project, X-linked disease
4 x ES40, 4TB disk
High-throughput farm
768 RLX nodes
GS320 32-way, 128GB mem.
Humgen
8 x ES45
User X at Institute Y ?

7 Whitehead Collaboration (the problem)
blastn all-by-all comparison:
– WI and TIGR Human BAC ends (800k entries) against Human entries in GenBank (5.7GB)
Existing pipeline from WI
Java JDBC / Oracle pipeline based on XML
Tight 2-week time frame (as always!)

8 Whitehead Collaboration (the solution?)
ssh / scp / ftp access to Sanger and WI systems…
2 weeks to run and set up:
– Oracle instance
– Set up user account, familiarisation with system
– Oracle dumps, copy DDL and input results
– Total data size: 21GB I/O
– System failures (recovery)
– A great many telephone / e-mail discussions
Only took 2 days total compute on just 360 nodes…

9 Computational Farms
NFS/CIFS/AFS (network share) meltdown
– Creation of batch scripts (100,000s of jobs, some take < 1 min)
– Reading NFS-mounted binaries
– Reading NFS-mounted data files
– Writing output to NFS-mounted directories
MySQL / Oracle meltdown
– Too many simultaneous connections
– Queries blocking each other
LSF mbatchd meltdown (DRM failure in general)
– Broken code in general, both developer and sysadmin error
Even when you are supposed to… “Know what you are doing…”
(and their equivalent of foot and mouth)
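The slide lists failure modes rather than fixes. As an illustration only, one common way to reduce the number of batch scripts and scheduler submissions when many jobs run for under a minute is to group short commands into larger batches. The sketch below assumes that approach; the function names and the batch size are hypothetical and do not come from the talk.

```python
def batch_commands(commands, batch_size):
    """Split a list of shell command strings into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        commands[i:i + batch_size]
        for i in range(0, len(commands), batch_size)
    ]


def batch_script(batch):
    """Render one batch as the text of a single shell script."""
    return "\n".join(["#!/bin/sh"] + list(batch)) + "\n"


# Hypothetical workload: 100,000 short commands, 60 per submitted script.
commands = [f"./analyse chunk_{n}" for n in range(100_000)]
batches = batch_commands(commands, 60)
print(len(batches))  # prints: 1667
```

With these made-up numbers, 100,000 one-minute jobs become 1,667 scripts of roughly an hour each, so the shared filesystem and the scheduler see far fewer script files and submissions.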

10 External CPU and Data Collaborations (How would an ‘ideal world GRID’ help?)
Rapid data distribution to and from SI and external site?
Zero to little setup time?
‘Direct’ connections to remote Oracle/MySQL instances at Sanger (i.e. via replication)?
No need for local account [shell] access?
Single ‘system image’, e.g. no need to find out where java/perl/binaries live, how the queues work etc.?

11 MySQL – remote access
DS20, 250GB Alpha in DMZ with Ensembl data
From Cisco firewall logs, 1st Oct 2001 to 1st Oct 2002:
– 159,251 port 3306 TCP connections
– Corresponds to 1,016 unique hosts
– 348 hosts with more than 10 connections
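The three figures on this slide are the kind of summary that can be produced by counting connections per source host. The sketch below assumes the firewall log has already been reduced to a list of source host names or addresses, one entry per port 3306 connection; the real Cisco log format is not shown in the talk, and the function name and sample data are hypothetical.

```python
from collections import Counter


def summarise_connections(source_hosts, threshold=10):
    """Summarise a list of source hosts, one entry per connection."""
    per_host = Counter(source_hosts)
    return {
        "connections": sum(per_host.values()),
        "unique_hosts": len(per_host),
        "hosts_over_threshold": sum(
            1 for count in per_host.values() if count > threshold
        ),
    }


# Made-up sample: one busy host, one host exactly at the threshold, one visitor.
sample = ["host-a"] * 11 + ["host-b"] * 10 + ["host-c"]
print(summarise_connections(sample))
# prints: {'connections': 22, 'unique_hosts': 3, 'hosts_over_threshold': 1}
```

For scale, the slide's own totals work out to roughly 157 connections per host on average (159,251 / 1,016).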

