HPC Solutions for Life Science Research

Presentation on theme: "HPC Solutions for Life Science Research"— Presentation transcript:

1 HPC Solutions for Life Science Research
Glen Otero, Ph.D. Bioinformatics & NGS Solution Architect HPC & Research Computing

2 Accelerating Individual Treatments for Pediatric Cancer
Partnership Accelerating Individual Treatments for Pediatric Cancer

3 Precision Medicine At Work Today Against “Adult” Cancers
Look inside cancer cells, identify specific vulnerabilities, and customize a treatment. Once the molecular mechanisms of disease are aligned with the right targeted drug, we can dramatically improve patient survival. We will do this for children with neuroblastoma.
Examples of Precision Medicine
Leukemia. Gene target: BCR-ABL. 30% increase in survival rate. 41 yrs from identified cause to targeted treatment.
Breast cancer. Gene target: HER2. 20% increase in survival rate. 17 yrs from identified cause to targeted treatment.
Melanoma. Gene target: BRAF. 52% response rate and survival increased by six months. 10 yrs from identified cause to treatment.
Lung cancer. Gene target: EGFR. Partial or complete response in 93% of patients with DNA mutation. 9 yrs from identified cause to treatment.
Medical imaging proof of dramatic tumor reduction after 8 weeks on a personalized treatment regimen.
The Good News: more drugs coming, a knowledge explosion, technology advances.
The Bad News: increasing drug development and (hence) healthcare costs, more drugs coming, knowledge-clinical utility gap.
N Engl J Med Sep 17;361(12):

4 Why Pediatric Cancer? Neuroblastoma (NB) Pediatric Oncology
Pediatric Oncology
Leading cause of disease-related death in children ages 1-14.
A child is diagnosed with cancer every hour.
Cure rates have not improved in the last decade.
Pharmaceutical companies fund 60% of adult oncology research and virtually no pediatric cancer research.
Neuroblastoma (NB)
Worst clinical outcome of any pediatric cancer.
5% chance of cure in children with advanced cancer.
Even with remission, the cancer often returns and is untreatable.
Despite the “modest market”, let’s provide hope to those who have none while establishing a scalable pipeline for precision medicine.
Complexity of cancer: cancer isn’t just one disease, it’s hundreds, so the approach is to look at the biology of a specific tumor and treat it directly.
Layers of benefit:
Affect care delivery: each cancer genome is different, so rather than treating every child with NB the same way, use a molecular understanding of the disease to customize treatment (position an existing drug).
Affect basic research: combine measurements from multiple research efforts into a database to mine for new pediatric drug development.
Dell Cloud: drug development. As molecular features are discovered and matched to known drugs, design clinical trials locally and recruit patients.

5 Dell & TGen collaboration will result in...
More children offered a chance at a cure for cancer.
100% increase in patients treated the first year, and a future global platform to treat 100K+ children and adults.

6 Applying Precision Treatments for Pediatric Cancer
Neuroblastoma: 15 days for personalized treatment
Patient / Physician: diagnosis, treatment, ongoing management
Tumor sample: complete molecular characterization of the diseased tumor
Analytical tool for mapping patient data against a database for recommended treatment
Integration of scientific & clinical evidence for future research
Treatments with a more reasonable chance of a cure
5 business day turnaround, built for scale using current technologies and approaches, but with Dell capabilities overlaid, including a social media aspect.
Workflow:
Patient/physician interaction: educate patients.
Tumor analyzed using next-generation sequencing.
The Dell cloud, with its infrastructure and processes, will act as the backend for all data generated at every level of the workflow. Every application will be optimized to work with the cloud, and data gets integrated and archived.
The way cancer develops varies between individuals. What we weren’t able to do before is look inside the cell and see the differences at the patient level to determine a custom treatment.
2-week time frame (nearly real-time) from biopsy to treatment recommendation, industrialized to scale within a set period of time, because some patients don’t have 6+ weeks to wait. Without error and with a lot of fidelity.
What is the milestone for the sick child? Minimizing trial and error: chances are these kids won’t see a treatment that gives them any reasonable chance of a cure.
Minimizing trial and error
Understanding the individual disease
Accelerating targeted treatment options
Creating a platform to scale to 100k+ patients
Archival Storage Analytics HPCC

7 Genomics Data Processing Pipeline

8 TGen RNA-Seq Data Analysis
QC: Visualize average q-score for each sequencing cycle.
Trim: Trim off bases starting with the under-performing cycles.
Align to Genome: Using Bowtie. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
Map Splice Junctions: Using TopHat. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq.
Assemble Isoforms and Estimate Abundances: Using Cufflinks. Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
Visualize Regions of Interest: Using IGV. The Broad Institute, Development: Jim Robinson, Helga Thorvaldsdóttir, Marc-Danie Nazaire, Documentation: Judy McLaughlin, PI: Jill Mesirov.
Timeline Comparison and Goals
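The QC and trim steps can be sketched in a few lines of Python. This is only an illustration of the idea, not TGen's actual code: the function names, the Phred+33 quality encoding, and the cutoff of Q20 for an "under-performing" cycle are all assumptions that do not appear on the slide.

```python
from statistics import mean

PHRED_OFFSET = 33      # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_MEAN_QUALITY = 20  # assumption: cycles averaging below Q20 under-perform


def mean_quality_per_cycle(quality_lines, offset=PHRED_OFFSET):
    """Return the average Phred score at each sequencing cycle (read position)."""
    if not quality_lines:
        return []
    n_cycles = max(len(q) for q in quality_lines)
    per_cycle = []
    for i in range(n_cycles):
        # Only reads long enough to reach cycle i contribute to its average.
        scores = [ord(q[i]) - offset for q in quality_lines if len(q) > i]
        per_cycle.append(mean(scores))
    return per_cycle


def first_underperforming_cycle(per_cycle_means, threshold=MIN_MEAN_QUALITY):
    """Index of the first cycle whose mean quality is below the threshold."""
    for i, cycle_mean in enumerate(per_cycle_means):
        if cycle_mean < threshold:
            return i
    return len(per_cycle_means)  # no cycle under-performs: keep full length


def trim_reads(reads, cut):
    """Keep only the first `cut` bases (and quality characters) of each read."""
    return [(seq[:cut], qual[:cut]) for seq, qual in reads]


# Toy input: (sequence, quality string) pairs; 'I' is Q40 and '#' is Q2.
reads = [("ACGTAC", "IIII##"), ("TTGACA", "IIIII#")]
means = mean_quality_per_cycle([qual for _, qual in reads])
cut = first_underperforming_cycle(means)
trimmed = trim_reads(reads, cut)
print(means, cut, trimmed)
```

Trimming every read at the same cycle mirrors the slide's wording ("starting with the under-performing cycles"); per-read quality trimming is a different technique and is not shown here.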

9 TGen RNA-Seq Performance
Data analysis time was 7 days. Goal was 5 days. Worked with TGen to parallelize their RNA-Seq pipeline.
Confidential

10 M420s Energy Efficiency
Application: PowerEdge R820 @ 2.7 GHz (4 servers, 128 cores) | PowerEdge 2.3 GHz (32 servers, 512 cores)
HPL: EE 4% higher | Best EE, 15% higher
LU: Best EE, ~6% higher | Similar EE to PowerEdge M620
WRF: EE 12% lower | Best EE, ~13% higher
ANSYS Fluent: Best EE for truck_poly_14m (+16%) | Best EE for truck_111m (+23%)
MILC: EE ~18% lower | Best EE, ~37% higher
NAMD: (Issues when executing program) | Better EE, ~5% higher
* EE is shorthand for energy efficiency. The PowerEdge M620 cluster (2.7 GHz) is used as the baseline for comparison. Higher is better for EE.

11 Why the M420s 48GB on M420 and 64GB on M620
130, 115 & 95 Watt processors
20% more GFLOPS/$ with M620s
64% more GFLOPS/U with M420s
15% more GFLOPS/W with M420s

12 TGen RNA-Seq Performance
Data analysis time was 7 days. Goal was 5 days. Worked with TGen to parallelize their RNA-Seq pipeline. One 10U chassis of 32 x M420s can complete the RNA-Seq pipeline in 4 hours.

13 TGen RNA-Seq Performance
Data analysis time was 7 days. Goal was 5 days. Worked with TGen to parallelize their RNA-Seq pipeline. One 10U chassis of 32 x M420s can complete the RNA-Seq pipeline in 4 hours. 5 days hoped for. 4 hours delivered.
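To put the 7 days, 5 days and 4 hours figures on one scale, the short calculation below converts them to hours and computes the ratios. It assumes the slide's "days" are 24-hour wall-clock days, which the slide does not state; the helper name `speedup` is illustrative.

```python
HOURS_PER_DAY = 24  # assumption: "days" means 24-hour wall-clock days


def speedup(before_hours, after_hours):
    """How many times faster the new run time is than the old one."""
    return before_hours / after_hours


baseline_hours = 7 * HOURS_PER_DAY  # original data analysis time
goal_hours = 5 * HOURS_PER_DAY      # target turnaround
delivered_hours = 4                 # one 10U chassis of 32 x M420s

print(f"vs. baseline: {speedup(baseline_hours, delivered_hours):.0f}x")
print(f"vs. goal:     {speedup(goal_hours, delivered_hours):.0f}x")
```

Under that assumption the parallelized pipeline is about 42 times faster than the original 7-day run and about 30 times faster than the 5-day goal.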

14 The Dell Scalable Unit (DSU) for Life Sciences: SANGER (Sequence Analysis ‘N’ GEnomics Research)
Glen Otero, Ph.D., Life Sciences HPC Solution Architect, Dell | Research Computing Solutions
Mark R. Fernandez, Ph.D., HPC Computer Scientist, Dell | Research Computing Solutions
Jeff Layton, Ph.D., HPC Enterprise Technologist, Dell | Research Computing Solutions

15 SANGER
Challenge: Experiment processing takes too long and delays patient treatment.
Solution: Dell NGS Cluster Appliance. Single rack solution, 9 teraflops of Sandy Bridge processors, Lustre file storage, Intel SW tools.
Benefits: Includes everything you need for NGS: compute, storage, software, networking, infrastructure, installation, deployment, training, service & support.
Infrastructure: Dell PE, PC & F10
Rack diagram labels: 2U Plenum, NSS-HA Pair, NSS User Data, HSS Metadata Pair, HSS OSS Pair, HSS User Data. Actual placement in racks may vary.
Dell NSS (NFS) (up to 180TB)
Dell HSS (Lustre) (up to 360TB)
M420 (Compute) (up to 32 nodes)
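The "9 Teraflops" figure can be roughly reproduced from node count, core count and clock speed. The sketch below is a back-of-the-envelope check, not Dell's sizing method. It assumes 16 cores per M420 node and a 2.3 GHz clock (taken from the "32 servers, 512 cores" and "2.3 GHz" entries on the energy-efficiency slide) and 8 double-precision FLOPs per cycle per core, the usual figure for Sandy Bridge with AVX.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: nodes x cores x GHz x FLOPs/cycle / 1000."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0


NODES = 32              # "up to 32 nodes" of M420 compute
CORES_PER_NODE = 16     # assumption: 512 cores / 32 servers
CLOCK_GHZ = 2.3         # assumption: clock listed on the energy-efficiency slide
DP_FLOPS_PER_CYCLE = 8  # assumption: Sandy Bridge AVX, double precision

peak = peak_tflops(NODES, CORES_PER_NODE, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
print(f"Theoretical peak: {peak:.2f} TFLOPS")
```

Under these assumptions the result is about 9.42 TFLOPS, in line with the 9 teraflops quoted for the single-rack solution.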

16 Genomic Research - Customer Challenges
Data explosion: Data management, access, migration are challenging.
Large data and compute requirements: Processing required measured in days and weeks. Resource requirements inhibit clinical / commercial viability.
Innovation, iteration and refinement: Genomic software and pipelines rapidly evolving. Rich set of open source, shareware, and commercial applications.
Cumbersome infrastructure: Costly, complex deployment, overbearing maintenance. Shared, limited-access resources. Confidentiality and privacy concerns.
Biomedical research facilities
University laboratories and core facilities
Government research labs
Non-profit (private) research institutes
Industrial R&D labs in agriculture, energy, pharmaceuticals, biotechnology, chemical
Bioinformatic software development / genomic data services companies
Systems integrators serving end user customers above with turnkey services and solutions
Vertically-oriented cloud service providers serving end user customers with turnkey service
April 23, 2013: NHGRI symposium commemorates 10th anniversary of the Human Genome Project. NHGRI Director Eric Green, M.D., Ph.D. reminded the audience that generating the first human genome sequence required six to eight years of active sequencing and cost about $1 billion. But advances in DNA sequencing technologies have reduced both the cost and the time required to sequence a human genome to just a few thousand dollars and a few days, respectively.
Sequence, Align, Identify Variants, Visualize, Define Therapy, Archive

17 whole genome pipeline analysis overview
High level description of the pipeline and how things are parallelized.

18 SANGER whole genome analysis info
A variant calling pipeline on public NA12878 data from Illumina's Platinum Genomes project:
Aligner: bwa; variant caller: GATK
Reference genome: GRCh37 (Genome Reference Consortium Human build 37)
Software pipeline framework:
How to install pipeline framework:
How to run the whole genome pipeline:
Config file:
Input files:
ftp.sra.ebi.ac.uk/vol1/fastq/ERR091/ERR091571/ERR091571_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/ERR091/ERR091571/ERR091571_2.fastq.gz
Number of reads: 212 million (~10x coverage)
Separate trials conducted by running 1, 6 and 30 genome analyses simultaneously.
Key points to note here are: 1) genome coverage of the data set is only 10x. Not shabby, but most analyses are 30x. 2) The data is publicly available. 3) The software is open source.

19 SANGER genomics workload performance
12 human genomes (30x) processed per day
Consume kWh per genome
Open, Integrated, Efficiency, Performance, Architected for genomic workloads, Service and Support
Performance metrics above are based on May 2013 internal Dell HPC lab benchmark testing (SW: bwa, GATK, bcbio-nextgen; Ref. Genome: GRCh37; Input reads: 212M, ~10X coverage), confirming:
1) genomes per day throughput on an Active Infrastructure for HPC Life Sciences configuration across 480 cores
2) 38.9 genomes per day throughput compared to 2.01 genomes per day across 16 cores
3) the complete analysis of a genome in 6.8 hours across 96 cores
4) energy consumption of kW when running 30 concurrent genome analyses, resulting in kilowatts/genome ( kW/30 genomes)
5) sustained 8.26 Tflops, 88% of a theoretical maximum of 9.4 Tflops, based on May 2013 internal Dell HPC lab benchmark testing
Actual performance will vary based on configuration, usage and manufacturing variability.
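Two of the benchmark figures can be cross-checked with simple ratios. The sketch below assumes the 38.9 genomes per day figure was measured across the 480 cores mentioned in item 1, which the slide implies but does not state outright. The function names are illustrative.

```python
def throughput_speedup(large_rate, small_rate):
    """Ratio of genomes/day on the large run to genomes/day on the small run."""
    return large_rate / small_rate


def parallel_efficiency(speedup, large_cores, small_cores):
    """Achieved speedup divided by the ideal speedup (the ratio of core counts)."""
    return speedup / (large_cores / small_cores)


def hpl_efficiency(sustained_tflops, peak_tflops):
    """Fraction of theoretical peak that was actually sustained."""
    return sustained_tflops / peak_tflops


# Figures from the slide; pairing 38.9 genomes/day with 480 cores is an assumption.
speedup = throughput_speedup(38.9, 2.01)
efficiency = parallel_efficiency(speedup, large_cores=480, small_cores=16)
hpl = hpl_efficiency(8.26, 9.4)

print(f"Throughput speedup:   {speedup:.2f}x on 30x the cores")
print(f"Scaling efficiency:   {efficiency:.1%}")
print(f"Sustained/peak FLOPS: {hpl:.1%}")
```

Under that assumption, throughput grows about 19.4 times for 30 times the cores (roughly 65% scaling efficiency), and 8.26 of 9.4 Tflops works out to about 88%, matching the slide.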

20 SANGER Realize genomics R&D potential
Best-in-class density and energy efficiency
Open, factory integrated HPC infrastructure
World class support and expertise
Leverage Dell domain expertise and complete solutions
Be in production faster
Reduce time to market
Identify treatments in clinically relevant timeframes
Innovate

21 SANGER available in Design Solution Center for testing

22 Thanks!
Intel: Ketan Paranjape, Kristina Kermanshahche, John O’Neill, Drew Peterson, Ed Kurtzer
Terascala: Ben Rosen, Larry Bazinett
Bright Computing: Matthijs van Leeuwen, Craig Reagan
TGen: James Lowey, Nelson Kick, Jason Corneveaux, Matt Huentelman
Dell: Jeff Layton, Mark Fernandez, Onur Celebioglu, Nishanth Dandapanthula, Will Cottay, Neil Klosterman, Christine Fronczak
Harvard: Brad Chapman, Oliver Hofmann, Rory Kirchner, John Morrissey

23 NGS that drives clinical actions in real time
TGen pediatric cancer trial
TGen Center for Rare Childhood Disorders

24 Dell and TGen take the fight against pediatric cancer personally
Dell and TGen take the fight against pediatric cancer personally... one child at a time.
Glen Otero, Ph.D. Bioinformatics & NGS Solution Architect HPC & Research Computing

