
1 Experiences with a large-memory HP cluster – performance on benchmarks and genome codes
Craig A. Stewart (stewart@iu.edu) – Executive Director, Pervasive Technology Institute; Associate Dean, Research Technologies; Associate Director, CREST
Robert Henschel – Manager, High Performance Applications, Research Technologies/PTI
William K. Barnett – Director, National Center for Genome Analysis Support; Associate Director, Center for Applied Cybersecurity Research, PTI
Thomas G. Doak – Department of Biology
Indiana University

2 License terms
Please cite this presentation as: Stewart, C.A., R. Henschel, W.K. Barnett, T.G. Doak. 2011. Experiences with a large-memory HP cluster – performance on benchmarks and genome codes. Presented at: HP-CAST 17 - HP Consortium for Advanced Scientific and Technical Computing World-Wide User Group Meeting. Renaissance Hotel, 515 Madison Street, Seattle WA, USA, November 12th 2011. http://hdl.handle.net/2022/13879
Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document. Items indicated with a © are under copyright and may not be reused without permission from the holder of copyright, except where license terms noted on a slide permit reuse.
Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work – and to remix – to adapt the work – under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

3 The human genome project was just the start
More sequencing will help us:
– Understand the basic building blocks and mechanisms of life
– Improve crop yields or disease resistance by genetic modification, or add new nutrients to crops
– Understand disease variability by mapping the genetic variation of diseases such as cancer or by studying the human microbiome and how it interacts with us and our conditions
– Create personalized treatments for various illnesses by understanding human genetic variability, or create gene therapies as new treatments
– Really begin to understand genome variability as a population issue as well as having at hand the genome of one particular individual

4 Evolution of sequencers over time

Genome sequencer model | Year introduced | Raw image data per run | Data products | Sequence per run | Read length
Doctoral student (hard working) as sequencer | circa 1980s | n.a. | Several exposed films/day on a good day | 2 Kbp | 100-200 nt
ABI 3730 | 2002 | 0.03 GB | 2 GB/day | 60 Kbp | 800 nt
454 Titanium | 2005 | 39 GB | 9 GB/day | 500 Mbp | 400 nt
Illumina-Solexa G1 | 2006 | 600 GB | 100 GB/day | 50 Gbp | 300 nt
ABI SOLiD 4 | 2007 | 680 GB | 25 GB/day | 70 Gbp | 90 nt
Illumina HiSeq 2000 | 2010 | 600 GB | 150 GB/day | 200 Gbp | 200 nt

5 Cost of sequencing over time

Date | Cost per Mb of DNA sequence | Cost per human genome
March 2002 | $3,898.64 | $70,175,437
April 2004 | $1,135.70 | $20,442,576
April 2006 | $651.81 | $11,732,535
April 2008 | $15.03 | $1,352,982
April 2010 | $0.35 | $31,512
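The two columns are consistent with a simple relationship: cost per genome ≈ cost per megabase × the ~3,000 Mb human genome × sequencing redundancy (coverage). A minimal sketch, assuming 6x coverage for the Sanger-era rows and 30x for the short-read-era rows – those coverage factors are our assumption, chosen because they reproduce the table, not something stated on the slide:

```python
# Rough sanity check of the "cost per human genome" column: it is approximately
# cost-per-Mb x genome size (~3,000 Mb) x sequencing redundancy (coverage).
# The 6x (Sanger era) and 30x (short-read era) coverage factors are assumptions
# chosen because they reproduce the table's numbers; they are not on the slide.

GENOME_MB = 3_000  # approximate human genome size in megabases

rows = [
    ("March 2002", 3898.64, 6),
    ("April 2004", 1135.70, 6),
    ("April 2006", 651.81, 6),
    ("April 2008", 15.03, 30),
    ("April 2010", 0.35, 30),
]

for date, cost_per_mb, coverage in rows:
    est = cost_per_mb * GENOME_MB * coverage
    print(f"{date}: estimated cost per genome ~ ${est:,.0f}")
```

Running it gives roughly $70.2M for March 2002 and roughly $31,500 for April 2010, matching the table.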

6 Mason – an HP ProLiant DL580 G7 16-node cluster
10GE interconnect
– Cisco Nexus 7018
– Compute nodes are oversubscribed 4:1
– This is the same switch that we use for DC and other 10G-connected equipment
Quad-socket nodes
– 8-core Xeon L7555, 1.87 GHz base frequency
– 32 cores per node
– 512 GB of memory per node
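A quick sanity check on what these specs imply for raw floating-point throughput. This is a minimal sketch assuming 4 double-precision FLOPs per core per cycle (typical for this Xeon generation) and the 1.87 GHz base clock; the peak used for the official HPL-efficiency figures later in the deck may be computed slightly differently (for example, at a higher clock bin):

```python
# Back-of-the-envelope theoretical peak for Mason from the specs above.
# Assumption (not from the slide): 4 double-precision FLOPs per core per cycle.

CORES_PER_NODE = 32
CLOCK_GHZ = 1.87          # base frequency of the Xeon L7555
FLOPS_PER_CYCLE = 4       # assumed DP FLOPs per core per cycle
NODES = 16

peak_per_node_gflops = CORES_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE
peak_cluster_tflops = peak_per_node_gflops * NODES / 1000

print(f"Per-node peak: {peak_per_node_gflops:.1f} GFLOP/s")
print(f"Cluster peak:  {peak_cluster_tflops:.2f} TFLOP/s")
```

Under these assumptions each node peaks at roughly 239 GFLOP/s and the 16-node cluster at roughly 3.8 TFLOP/s.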

7 Why 512 GB – a sweet spot in RAM requirements
"Memory is a problem that admits only money as a solution" – David Moffett

Application | Genome / genome size | RAM required (per node)
ABySS | Various plant genomes | 7.8 GB RAM per node (distributed-memory parallel code) [based on McCombie lab at CSHL]
SOAPdenovo | Panda / 2.3 Gbp | 512 GB
SOAPdenovo | Human gut metagenome / est. 4 Gbp | 512 GB
SOAPdenovo | Human genome | 150 GB
Velvet | Honeybee / ~300 Mbp | 128 GB
Velvet | Daphnia / 200 Mbp | > 512 GB [based on runs by Lynch lab at IU]
Velvet | Duckweed / 150 Mbp | > 512 GB [based on McCombie lab at CSHL]

Largest genome that can be assembled on a computer with 512 GB of memory, assuming maximum memory usage and k-mers of 20 bp:

Coverage | Maximum assemblable genome (Gbp) | Percentile of plant genome size distribution | Percentile of animal genome size distribution
20x | 1.3 | 32 | 44
40x | 0.6 | 16 | 15
60x | 0.4 | 7 | 9
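To make the scaling in this table concrete, here is a rough, generic estimator of de Bruijn graph assembly memory. It is not the memory model of SOAPdenovo, Velvet, or ABySS; the bytes-per-k-mer footprint and the error-k-mer term are illustrative assumptions only:

```python
# Minimal sketch: rough de Bruijn graph memory estimate for a single-node
# (shared-memory) assembly. The constants below (bytes per k-mer node and the
# fraction of extra, error-induced k-mers per unit coverage) are illustrative
# assumptions, not values taken from any specific assembler in the table.

def estimate_assembly_ram_gb(genome_bp: float,
                             coverage: float,
                             bytes_per_kmer: float = 60.0,
                             error_kmers_per_x: float = 0.02) -> float:
    """Very rough RAM estimate for a de Bruijn graph assembly.

    genome_bp         -- haploid genome size in base pairs
    coverage          -- sequencing depth (e.g. 40 for 40x)
    bytes_per_kmer    -- assumed graph-node footprint (hash entry, edges, counts)
    error_kmers_per_x -- assumed fraction of spurious k-mers added per 1x coverage
    """
    true_kmers = genome_bp                           # ~one distinct k-mer per genome position
    error_kmers = genome_bp * error_kmers_per_x * coverage
    total_bytes = (true_kmers + error_kmers) * bytes_per_kmer
    return total_bytes / 2**30

# Example: a 2.3 Gbp genome at 40x coverage
print(f"~{estimate_assembly_ram_gb(2.3e9, 40):.0f} GB under these assumptions")
```

The point is the shape of the formula: memory grows with genome size and, through error k-mers, with coverage – which is why the maximum assemblable genome in the table shrinks as coverage rises.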

8 Community trust matters
Right now the codes most trusted by the community require large amounts of memory in a single name space. More side by side testing may lead to more trust in the distributed memory codes, but for now….

Application | Year initially published | Number of citations as of August 2010
de Bruijn graph methods
ABySS | 2009 | 783
EULER | 2001 | 1870
SOAPdenovo | 2010 | 254
Velvet | 2008 | 1420
Overlap/layout/consensus
Arachne 2 | 2003 | 578
Celera Assembler | 2000 | 3579
Newbler | 2005 | 999

9 The National Center for Genome Analysis Support
Dedicated to supporting life science researchers who need computational support for genomics analysis
Initially funded by the National Science Foundation Advances in Biological Informatics (ABI) program, grant no. 1062432
A Cyberinfrastructure Service Center affiliated with the Pervasive Technology Institute at Indiana University (http://pti.iu.edu)
Provides support for genomics analysis software on supercomputers customized for genomics studies, including Mason and systems which are part of XSEDE
Provides distributions of hardened versions of popular codes, particularly genome assembly codes such as:
– de Bruijn graph methods: SOAPdenovo, Velvet, ABySS
– consensus methods: Celera, Newbler, Arachne 2
For more information, see http://ncgas.org

10 Benchmark overview
High Performance Computing Challenge Benchmark (HPCC)
– http://icl.cs.utk.edu/hpcc/
SPEC OpenMP
– http://www.spec.org/omp2001/
SPEC MPI
– http://www.spec.org/mpi2007/

11 High Performance Computing Challenge benchmark
Innovative Computing Laboratory at the University of Tennessee (Jack Dongarra and Piotr Luszczek)
Announced at Supercomputing 2004; version 1.0 available June 2005; current version: 1.4.1
Our results are not yet published, because we are unsatisfied with the 8-node HPCC runs. This is likely due to our oversubscription of the switch (4 to 1).

12 High Performance Computing Challenge benchmark
Raw results, 1 to 16 nodes

# Nodes | # CPUs | # Cores | G-HPL (TFLOP/s) | G-PTRANS (GB/s) | G-FFTE (GFLOP/s) | G-Random (Gup/s) | G-STREAM (GB/s) | EP-STREAM (GB/s) | EP-DGEMM (GFLOP/s) | Random Ring Bandwidth (GB/s) | Random Ring Latency (usec) | % HPL Peak
16 | 64 | 512 | 3.383 | 5.112 | 17.962 | 0.245 | 1084.682 | 2.119 | 7.158 | 0.010 | 229.564 | 82.59
8 | 32 | 256 | 1.608 | 2.648 | 8.938 | 0.153 | 549.184 | 2.145 | 7.136 | 0.011 | 169.079 | 78.51
4 | 16 | 128 | 0.847 | 1.575 | 5.356 | 0.123 | 267.297 | 2.088 | 7.141 | 0.014 | 119.736 | 82.66
2 | 8 | 64 | 0.424 | 3.545 | 10.095 | 0.152 | 137.790 | 2.153 | 7.128 | 0.083 | 71.988 | 82.85
1 | 4 | 32 | 0.222 | 6.463 | 11.542 | 0.225 | 66.936 | 2.092 | 7.157 | 0.324 | 3.483 | 86.78

13 High Performance Computing Challenge benchmark
HPL efficiency from 1 to 16 nodes
– Highlighting our issue at 8 nodes
– However, for a 10GE system, not so bad!
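One way to see the 8-node issue directly is to compute strong-scaling efficiency from the G-HPL column on the previous slide (speedup over the 1-node run divided by node count; note this is a different quantity from the "% HPL Peak" column):

```python
# Strong-scaling view of the G-HPL column from the previous slide, using the
# 1-node run as the baseline. "Scaling efficiency" here is speedup divided by
# node count; it is not the same as the "% HPL Peak" column, which compares
# measured HPL to theoretical peak.

g_hpl_tflops = {1: 0.222, 2: 0.424, 4: 0.847, 8: 1.608, 16: 3.383}

baseline = g_hpl_tflops[1]
for nodes, tflops in sorted(g_hpl_tflops.items()):
    speedup = tflops / baseline
    efficiency = speedup / nodes
    print(f"{nodes:2d} nodes: {tflops:6.3f} TFLOP/s, "
          f"speedup {speedup:5.2f}, scaling efficiency {efficiency:6.1%}")
# The dip at 8 nodes (speedup ~7.2, efficiency ~91%, versus ~95% elsewhere)
# is the anomaly highlighted above.
```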

14 SPEC benchmarks
High Performance Group (HPG) of the Standard Performance Evaluation Corporation (SPEC)
Robust framework for measurement
Industry / education mix
Result review before publication
Fair use policy and its enforcement
Concept of reference machine, base / peak runs, different datasets

15 SPEC OpenMP
Evaluate performance of OpenMP applications (single node)
Benchmark consists of 11 applications, medium and large datasets available
Our results:
– Large and Medium: http://www.spec.org/omp/results/res2011q3/

16 The SPEC OpenMP application suite

310.wupwise_m and 311.wupwise_l | quantum chromodynamics
312.swim_m and 313.swim_l | shallow water modeling
314.mgrid_m and 315.mgrid_l | multi-grid solver in 3D potential field
316.applu_m and 317.applu_l | parabolic/elliptic partial differential equations
318.galgel_m | fluid dynamics analysis of oscillatory instability
330.art_m and 331.art_l | neural network simulation of adaptive resonance theory
320.equake_m and 321.equake_l | finite element simulation of earthquake modeling
332.ammp_m | computational chemistry
328.fma3d_m and 329.fma3d_l | finite-element crash simulation
324.apsi_m and 325.apsi_l | temperature, wind, distribution of pollutants
326.gafort_m and 327.gafort_l | genetic algorithm code

17 SPEC OpenMP Medium

Benchmark | Base Ref Time | HT off: Base Run Time | HT off: Base Ratio | HT on: Base Run Time | HT on: Base Ratio
310.wupwise_m | 6000 | 46.7 | 128583 | 37.3 | 160774
312.swim_m | 6000 | 84.1 | 71337 | 74.2 | 80847
314.mgrid_m | 7300 | 96.9 | 75332 | 87.7 | 83222
316.applu_m | 4000 | 29.9 | 133967 | 26.1 | 153288
318.galgel_m | 5100 | 115.0 | 44387 | 114.0 | 44802
320.equake_m | 2600 | 52.3 | 49691 | 47.9 | 54295
324.apsi_m | 3400 | 44.7 | 76128 | 46.5 | 73134
326.gafort_m | 8700 | 98.2 | 88571 | 109.0 | 79651
328.fma3d_m | 4600 | 88.3 | 52108 | 92.8 | 49543
330.art_m | 6400 | 31.1 | 205935 | 32.9 | 194318
332.ammp_m | 7000 | 152.0 | 45953 | 161.0 | 43469
SPECompMbase2001 | | | 78307 | | 80989
Hyper-Threading beneficial.
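For anyone checking these numbers: the bottom-line SPECompMbase2001 figure is the geometric mean of the eleven per-benchmark base ratios. Recomputing it from the Hyperthreading-OFF column above reproduces the reported 78307:

```python
# The overall SPECompMbase2001 figure is the geometric mean of the eleven
# per-benchmark base ratios. Recomputing it from the Hyperthreading-OFF column
# of the table above reproduces the reported value up to rounding.
from math import prod

ht_off_ratios = [128583, 71337, 75332, 133967, 44387, 49691,
                 76128, 88571, 52108, 205935, 45953]

geo_mean = prod(ht_off_ratios) ** (1 / len(ht_off_ratios))
print(f"Geometric mean of base ratios: {geo_mean:,.0f}")  # ~78,306 vs. reported 78307
```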

18 SPEC OpenMP Medium (results chart)

19 SPEC OpenMP Large

Benchmark | Base Ref Time | HT off: Base Run Time | HT off: Base Ratio | HT on: Base Run Time | HT on: Base Ratio
311.wupwise_l | 9200 | 203 | 723729 | 211 | 697727
313.swim_l | 12500 | 602 | 331955 | 628 | 318338
315.mgrid_l | 13500 | 518 | 416695 | 528 | 409378
317.applu_l | 13500 | 562 | 384230 | 590 | 366221
321.equake_l | 13000 | 575 | 361456 | 542 | 383513
325.apsi_l | 10500 | 271 | 620871 | 286 | 587380
327.gafort_l | 11000 | 391 | 450003 | 359 | 490814
329.fma3d_l | 23500 | 1166 | 322462 | 941 | 399786
331.art_l | 25000 | 290 | 1377258 | 277 | 1445765
SPECompLbase2001 | | | 493152 | | 504788
Hyper-Threading beneficial.

20 SPEC MPI
Evaluate performance of MPI applications on the whole cluster
Benchmark consists of 12 applications, medium and large datasets available
Our results are not yet published, as we are still lacking a 16-node run
– We spent a lot of time on the 8-node run
– We think we know what the source of the problem is; we just have not yet been able to fix it
– The problem is the result of rational configuration choices, and does not impact our primary intended uses of the system

21 SPEC MPI
Scalability study, 1 to 8 nodes – preliminary results, not yet published by SPEC
Endeavor: Intel Xeon X5560, 2.80 GHz, IB, Feb 2009
Atlantis: Intel Xeon X5482, 3.20 GHz, IB, Mar 2009
IU/HP: Intel Xeon L7555, 1.87 GHz, 10 GigE

22 Early users overview
Metagenomics Sequences Analysis
Genome Assembly and Annotation
Genome Informatics for Animals and Plants
Imputation of Genotypes and Sequence Alignment
Daphnia Population Genomics

23 Metagenomics Sequences Analysis
Yuzhen Ye's Lab (IUB School of Informatics)
Environmental sequencing
– Sampling DNA sequences directly from the environment
– Since the sequences consist of DNA fragments from hundreds or even thousands of species, the analysis is far more difficult than traditional sequence analysis that involves only one species
Assembling metagenomic sequences and extracting genes from the assembled dataset
Dynamic programming is used to find the optimal mapping of consecutive contigs out of the assembly
Since the number of contigs is enormous for most metagenomic datasets, a large-memory computing system is required for the dynamic programming algorithm to complete the task in polynomial time

24 Genome Assembly and Annotation
Michael Lynch's Lab (IUB Department of Biology)
Assembles and annotates genomes in the Paramecium aurelia species complex in order to eventually study the evolutionary fates of duplicate genes after whole-genome duplication.
This project has also been performing RNAseq on each genome, which is currently being used to aid genome annotation and will later be used to detect expression differences between paralogs.
The assembler used is based on an overlap-layout-consensus method instead of a de Bruijn graph method (like some of the newer assemblers). It is more memory intensive – it requires performing pairwise alignments between all pairs of reads.
Annotation of the genome assemblies involves programs such as GMAP, GSNAP, PASA, and Augustus. To use these programs, we need to load millions of RNAseq and EST reads and map them back to the genome.

25 Genome Informatics for Animals and Plants
Genome Informatics Lab (Don Gilbert) (IUB Department of Biology)
This project finds genes in animals and plants, using the vast amounts of new gene information coming from next-generation sequencing technology.
These improvements are applied to newly deciphered genomes: an environmental sentinel animal, the waterflea (Daphnia); the agricultural pest insect, the pea aphid; the evolutionarily interesting jewel wasp (Nasonia); and the chocolate bean tree (Th. cacao), which will bring genomics insights to sustainable agriculture of cacao.
Large-memory compute systems are needed for biological genome and gene transcript assembly because assembling genomic DNA or gene RNA sequence reads (billions of fragments) into full genomic or gene sequences requires a minimum of 128 GB of shared memory, and more depending on the data set. These programs build graph matrices of sequence alignments in memory.

26 Imputation of Genotypes and Sequence Alignment
Tatiana Foroud's Lab (IUPUI Department of Medical and Molecular Genetics)
Studies complex disorders using imputation of genotypes, typically for genome-wide association studies, as well as sequence alignment and post-processing of whole-genome and whole-exome sequencing.
Requires analysis of markers in a genetic region (such as a chromosome) in several hundred representative individuals genotyped for the full reference panel of SNPs, with extrapolation of the inferred haplotype structures.
More memory allows the imputation algorithms to evaluate haplotypes across much broader genomic regions, reducing or eliminating the need to partition the chromosomes into segments. This results in imputed genotypes with both increased accuracy and speed, allowing improved evaluation of detailed within-study results as well as communication and collaboration (including meta-analysis) with other researchers using the disease study results.

27 Daphnia Population Genomics
Michael Lynch's Lab (IUB Department of Biology)
This project involves whole-genome shotgun sequencing of over 20 more diploid genomes, with genome sizes >200 megabases each. With each genome sequenced to over 30x coverage, the full project involves both mapping reads to a reference genome and de novo assembly of each individual genome.
The genome assembly of millions of small reads often requires very large amounts of memory, for which we once turned to Dash at SDSC. With Mason now online at IU, we have been able to run our assemblies and analysis programs here at IU.

28 Mason as an example of effective campus bridging
The goal of campus bridging is to make local, regional, and national cyberinfrastructure facilities appear as if they were peripherals to your laptop
Mason is designed for a specific set of tasks that drive a different configuration than XSEDE (the eXtreme Science and Engineering Discovery Environment – http://xsede.org/)
For more information on campus bridging: http://pti.iu.edu/campusbridging/

29 Key points
Increased data volumes and decreased k-mer lengths are driving growing demands for data analysis in genome assembly.
The codes the biological community trusts are the codes they trust. Over time, testing may enable more use of distributed-memory codes. But for now, if we want to serve the biological community most effectively, we need to implement systems that match their research needs now.
In the performance analysis of Mason we found two outcomes of note:
– There is a problem in our switch configuration that we still have not sorted out, which is causing odd HPL results; we will continue to work on that.
– The summary result on hyperthreading is "sometimes it helps, sometimes not."
If we as a community are frustrated by the attention that senior administrators give to placement on the Top500 list, and how that affects system configuration, we need to take more time to publish SPEC and/or HPCC benchmark results.
– Much of the time this may mean "we got what we expected." But more data will make it easier to identify and understand results we don't expect.
By implementing Mason – a lot of memory with some processors attached to it – we have enabled research that would otherwise not be possible.

30 Absolutely Shameless Plugs
XSEDE12: Bridging from the eXtreme to the campus and beyond – July 16-20, 2012 | Chicago
The XSEDE12 conference will be held at the beautiful Intercontinental Chicago (Magnificent Mile) at 505 N. Michigan Ave. The hotel is in the heart of Chicago's most interesting tourist destinations and best shopping.
Watch for the Calls for Participation – coming early January.
And please visit the XSEDE and IU displays in the SC11 Exhibition Hall!

31 Thanks
Thanks for the invitation: Dr. Frank Baetke, Eva-Marie Markert, and HP
Thanks to HP, particularly James Kovach, for partnership efforts over many years, including the implementation of Mason.
Staff of the Research Technologies Division of University Information Technology Services, affiliated with the Pervasive Technology Institute, who led the implementation of Mason and the benchmarking activities at IU: George Turner, Robert Henschel, David Y. Hancock, Matthew R. Link
Our many collaborators in the Pervasive Technology Institute, particularly the co-PIs of NCGAS: Michael Lynch, Matthew Hahn, and Geoffrey C. Fox
Those involved in campus bridging activities: Guy Almes, Von Welch, Patrick Dreher, Jim Pepin, Dave Jent, Stan Ahalt, Bill Barnett, Therese Miller, Malinda Lingwall, Maria Morris, Gabrielle Allen, Jennifer Schopf, Ed Seidel
All of the IU Research Technologies and Pervasive Technology Institute staff who have contributed to the development of IU's advanced cyberinfrastructure and its support
NSF for funding support (Awards 040777, 1059812, 0948142, 1002526, 0829462, 1062432, and OCI-1053575, which supports the Extreme Science and Engineering Discovery Environment)
Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute
Any opinions presented here are those of the presenter and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

32 Thank you! Questions and discussion?

