Published by Layton Lamberth
Modified about 1 year ago
© 2014 Cypher Genomics, Inc. Proprietary and Confidential.
The information disclosed in this document, including all designs and related materials, is the valuable property of Cypher Genomics, Inc. and its licensors. Cypher Genomics, Inc. and its licensors, as appropriate, reserve all patent, copyright and other proprietary rights to this document, including all design, manufacturing, reproduction, use, and sales rights, except to the extent said rights are expressly granted to others.

Choosing a Cloud Provider for Your Genomic Applications
Phillip Pham, Director, Technology Development
November 13th, 2014

Biography
Developed Cypher Genomics technology at the Scripps Translational Science Institute
Designed and deployed high performance and parallel computing systems for annotation and genome interpretation
Expertise in genome analysis workflows
Interested in the genetic architecture influencing drug efficacy and response in clinical trials
Bioinformatics: Bioengineering, B.S.
Patents:
  – Pham P, Deshpande S, Systems and Methods for Genomic Variant Annotation, U.S. Patent 13/841,575, filed March. Patent Pending.

Agenda
Cypher Genomics
Problem Space
Simplified Logical Architecture
Pain Points
Evaluating Alternatives
Lessons Learned
Summary

Our Problem Space
Cypher Genomics provides automated Annotation, Interpretation and Biomarker Discovery for Whole Genomes, Exomes, CNV, etc.
  – Sequencing = AGCTTGAGGATCAACTAGTGCATGCTATACCTGC…
  – Alignment & Variant Calling
      Align the appropriate sets of nucleotides to their place in the genome.
        – 150 GB BAM files (compressed)
      Compare the aligned input genome to the reference genome.
      Identify variants (i.e. mutations) in the input genome.
  – Annotation
      Each variant is annotated with 90+ attributes from 50+ reference data sources as well as Cypher’s proprietary data and prediction algorithms.
      Web-based UI for accessing all data, applying analytic filters, etc.
  – Interpretation
      Human-readable PDF report with Cypher Synthesis summary of the most important findings.
        – Includes all variants with Cypher Synthesis and references to supporting evidence

Mantis™ Workflow – Clinical Use Case
1. Patient Sample Sent to Lab
2. Sample is Sequenced, Aligned & Variant Called (150 GB BAM, 125 MB VCF)
3. Next Gen Sequencing Data Uploaded to Cypher
4. Mantis Interprets Genome Data
5. Report Generated (.PDF, 3 MB)
6. Clinical Summary Delivered
Scale:
  – 200M+ Variants Annotated and growing
  – 90+ Annotations per Variant
  – 50+ Reference DBs
  – 40+ TB (and growing)

Automated solutions are essential to provide precision medicine at population scales
YESTERDAY: Nicholas Volker, 2010
  – Medical College of Wisconsin
  – Life-threatening bowel disease
  – Whole exome sequencing (1% of genome)
  – Cost: $15,000
  – Single mutation in XIAP
TODAY: Lilly Grossman, 2013
  – Scripps IDIOM / Cypher Genomics
  – Debilitating neuromuscular disease
  – Whole genome sequencing
  – Automated interpretation
  – Mutations in ADCY5
TOMORROW: Every Patient, 2018
  – Driven by decrease in sequencing cost (e.g. Illumina X10 - $1,000 genome)
  – Whole genome – baseline
  – Updated interpretation over time
  – Genomic clinical decision support

Coral™ Workflow – Pharma Use Case
1. Patient Data Collected
2. Study Samples Sequenced
3. Next Gen Sequencing Data to Cypher
4. Cypher Runs Genomes at Scale
5. Coral Produces Predictive Models
6. Biomarker Identified
Data and Compute usage grow in population studies

Logical Deployment Components
Web Front-End
  – User Interface
  – Apache Web Server
  – Zend Framework Server
Cypher Core Services (CCS)
  – REST API Implementation to Service User Interface, Automated Integration
  – CDH – Hadoop Ecosystem
      HDFS – Holds raw reference data
      HBase – Variant Annotation DB
      MapReduce – Process variant data and annotation information at scale
  – MongoDB
      Consolidated Reference Data
      Demographics Data
  – Vertica
      Analytics DB for real-time, interactive analytic investigation
Cypher Analytic Pipeline (CAP) - Novel Variant Annotation Pipeline
  – Penguin on Demand – HPC with Torque scheduler
  – Custom parallelization algorithm for annotating, running predictive algorithms
  – Final annotation of Variant goes into CCS Annotation DB

Simplified Logical Architecture
[Architecture diagram. Labeled components, grouped as on the slide:]
Front-end Applications: Web Server, Web App Server, REST APIs
Cypher Core Services (Cloud Provider): CCS Master Application Server (Annotation of known Variants, Interactive Analytics Back-End, Interpretation & Reporting), CCS Demographic Data (Mongo), Analytics DB (Vertica), Annotation DB (HBase), Consolidated Reference DB (Mongo), HDFS, Hadoop & Map/Reduce Cluster, Source Reference Data
Cypher Annotation Pipeline, Novel Variants (HPC Provider): CAP Master Application Server (Annotation of Novel Variants), CAP HPC Job Scheduler, HPC Environment (TORQUE)

Pain Points
Downtime
  – Unscheduled system reboots and upgrades. Temporary loss of data and configurations
  – Poor communication
Difficult to manage temporary cluster instantiation
Firewall architecture imposes inflexible rules on our naming conventions
Limits to cluster and storage quota (either programmatically or through the Management Console).
  – Results in requiring us to call the vendor to raise our limits.
Ease of getting to HIPAA Compliance

Evaluating Alternatives
Comparable (or better) Performance per $
Better Uptime / Less Dramatic Impact during Maintenance
Start and stop temporary clusters with ease
Flexible firewall rules
Unlimited storage scaling on demand
Strategic options for utilizing service offerings
HIPAA compliance and business associate agreement

What did we test?
Most Time Consuming Stages of our Processing
  – Reformat Input Files
  – Query Annotations
  – Process Annotations
  – Split Results
  – Organize Results
And
  – Total End-to-End processing time
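
One way to collect per-stage timings like those listed above is a small timing wrapper around each stage. The following is a minimal sketch, not the instrumentation Cypher actually used: the `StageTimer` class, the snake_case stage names, and the empty stage bodies are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager

# Stage names paraphrase the slide; the identifiers themselves are hypothetical.
STAGES = [
    "reformat_input_files",
    "query_annotations",
    "process_annotations",
    "split_results",
    "organize_results",
]


class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Accumulate, so a stage that runs in several batches is summed.
            elapsed = self._clock() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total(self):
        # Sum of the timed stages only; true end-to-end time may also include
        # untimed gaps between stages and should be measured separately.
        return sum(self.seconds.values())


if __name__ == "__main__":
    timer = StageTimer()
    for name in STAGES:
        with timer.stage(name):
            pass  # placeholder for the real work of this stage
    for name, secs in timer.seconds.items():
        print(f"{name}: {secs:.6f} s")
```

Running the same instrumented batch on each candidate configuration would give directly comparable numbers per stage.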

Test Configurations – Hadoop (HBase/MapReduce) Cluster
Current Cloud Vendor
  Test Configuration – “Original Apple”
    – 6 Nodes
    – 8 CPU
    – 32 GB RAM
    – 40 TB Block Storage (Magnetic)
    – No Encryption
  NOTE: The above configuration is smaller than our entire product and is just for performance comparison purposes. You should draw NO conclusions regarding the performance of our product from these comparisons.
New Cloud Vendor
  Configuration 1 – “Pear”
    – 6 Nodes
    – 8 CPU
    – 32 GB RAM
    – 40 TB Block Storage (SSD)
    – All Disks Encrypted
  Configuration 2 – “Melon”
    – 6 Nodes
    – 32 CPUs
    – 60 GB RAM
    – 40 TB Local Disk (Magnetic)
    – All Disks Encrypted

Apples to Pears to Melons… Oh my!
[Six comparison charts, one per slide: Reformat, Query, Process Annots, Split Results, Organize, Overall]

Lessons Learned
Environment set-up and tweaking
  – 80% of time spent here
  – Be prepared to iterate on this as you test larger and larger batches.
Work with your vendor to provide experts in Hadoop, Performance and Capacity Planning to aid in your evaluation.
Work with your vendor to cost out expenditures
  – Don’t forget security, load balancing, auto-scaling services, etc.
  – Look closely at the advantages of paying a portion up-front to get good discounts on monthly costs.
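
The up-front payment trade-off can be checked with simple arithmetic. The sketch below assumes a simplified pricing model in which an up-front payment buys a fixed fractional discount on the monthly rate. The function names and all dollar figures are hypothetical and do not come from the presentation or from any vendor's price list.

```python
import math


def total_cost(monthly_rate, months, upfront=0.0, discount=0.0):
    """Total spend over `months` when `upfront` buys a fractional
    `discount` (0.0 to just under 1.0) on the monthly rate."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return upfront + monthly_rate * (1.0 - discount) * months


def break_even_months(monthly_rate, upfront, discount):
    """Smallest whole number of months at which the up-front plan is
    no more expensive than pay-as-you-go, or None if it never is."""
    monthly_saving = monthly_rate * discount
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront / monthly_saving)


if __name__ == "__main__":
    # Hypothetical figures: $1,000/month list price, $2,000 up-front, 25% discount.
    on_demand = total_cost(1000.0, 12)
    reserved = total_cost(1000.0, 12, upfront=2000.0, discount=0.25)
    print(f"12 months on demand: ${on_demand:,.0f}")
    print(f"12 months with up-front payment: ${reserved:,.0f}")
    print(f"break-even after {break_even_months(1000.0, 2000.0, 0.25)} months")
```

Under these assumed numbers the up-front plan only pays off if the cluster stays in use past the break-even month, which is why expected usage duration matters when costing out a vendor.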

Summary – Key Requirements of a Genomics Cloud Provider
Find a Cloud Provider that will be a Strategic Partner!
  – Resources to aid in your evaluation.
  – Resources to aid in optimizing costs.
Instrument your applications to capture detailed performance stats per logical computational step.
  – Gives the delta between current and new Cloud Provider infrastructure.
During your testing, establish how responsive the Cloud Partner is to issues unrelated to your pilot (e.g. machine upgrades, failures in nodes, networking and/or storage).
Ensure that you can stop clusters that you are not testing so that you aren’t billed more than once.
Make sure they will sign a HIPAA BAA and that you know exactly what is covered.
Pick a Cloud Partner that offers services for automating deployment, scaling, load balancing, security and log management (even if you don’t plan to use them now).
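
The per-step delta mentioned above can be computed once timings exist for both environments. This is a minimal sketch under the assumption that each run is summarized as a dictionary of stage name to seconds; `stage_deltas` and the numbers in the usage example are hypothetical, not measurements from the presentation.

```python
def stage_deltas(current, candidate):
    """Compare per-stage timings (in seconds) from two providers.

    Returns {stage: (seconds_saved, percent_change)} for every stage present
    in both runs. A negative percent_change means the candidate is faster.
    """
    deltas = {}
    for stage, current_secs in current.items():
        if stage not in candidate or current_secs == 0:
            continue  # nothing meaningful to compare for this stage
        candidate_secs = candidate[stage]
        seconds_saved = current_secs - candidate_secs
        percent_change = 100.0 * (candidate_secs - current_secs) / current_secs
        deltas[stage] = (seconds_saved, percent_change)
    return deltas


if __name__ == "__main__":
    # Made-up timings purely to show the output shape.
    current = {"query_annotations": 100.0, "split_results": 50.0}
    candidate = {"query_annotations": 80.0, "split_results": 60.0}
    for stage, (saved, pct) in stage_deltas(current, candidate).items():
        print(f"{stage}: {saved:+.1f} s saved ({pct:+.1f}%)")
```

Reporting both the absolute and the relative change helps separate stages that are proportionally much slower from stages that actually dominate total run time.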

Acknowledgements
Javier Velazquez-Muriel – Bioinformatics Engineer
Patrick Ravenel – CTO & VP of Engineering
Ashley Van Zeeland – CEO & Founder
Ali Torkamani – CSO & Founder
Nicholas Schork – Founder
Eric Topol – Founder

Questions?