Published by Layton Lamberth
Modified about 1 year ago
© 2014 Cypher Genomics, Inc. Proprietary and Confidential
The information disclosed in this document, including all designs and related materials, is the valuable property of Cypher Genomics, Inc. and its licensors. Cypher Genomics, Inc. and its licensors, as appropriate, reserve all patent, copyright and other proprietary rights to this document, including all design, manufacturing, reproduction, use, and sales rights, except to the extent said rights are expressly granted to others.

Choosing a Cloud Provider for Your Genomic Applications
Phillip Pham, Director, Technology Development
November 13th, 2014

Biography
Developed Cypher Genomics technology at the Scripps Translational Science Institute
Designed and deployed high-performance and parallel computing systems for annotation and genome interpretation
Expertise in genome analysis workflows
Interested in the genetic architecture influencing drug efficacy and response in clinical trials
Bioinformatics: Bioengineering, B.S.
Patents:
  – Pham P, Deshpande S, Systems and Methods for Genomic Variant Annotation, U.S. Patent 13/841,575, filed March; patent pending.

Agenda
Cypher Genomics Problem Space
Simplified Logical Architecture
Pain Points
Evaluating Alternatives
Lessons Learned
Summary

Our Problem Space
Cypher Genomics provides automated Annotation, Interpretation and Biomarker Discovery for Whole Genomes, Exomes, CNV, etc.
  – Sequencing = AGCTTGAGGATCAACTAGTGCATGCTATACCTGC…
  – Alignment & Variant Calling
      Align the appropriate sets of nucleotides to their place in the genome.
        – 150 GB BAM files (compressed)
      Compare the aligned input genome to the reference genome.
      Identify variants (i.e. mutations) in the input genome.
  – Annotation
      Each variant is annotated with 90+ attributes from 50+ reference data sources, as well as Cypher’s proprietary data and prediction algorithms.
      Web-based UI for accessing all data, applying analytic filters, etc.
  – Interpretation
      Human-readable PDF report with a Cypher Synthesis summary of the most important findings.
        – Includes all variants with Cypher Synthesis and references to supporting evidence

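The "compare the aligned input genome to the reference genome" step can be illustrated with a toy sketch. It assumes two already-aligned sequences of equal length and reports only single-base substitutions; production variant callers work from BAM files and also handle insertions, deletions and base quality. The function name `find_substitutions` and both sequences are hypothetical and do not come from the talk.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("toy example expects pre-aligned, equal-length sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical 12-base sequences, for illustration only.
reference = "AGCTTGAGGATC"
sample = "AGCTTGCGGATT"
variants = find_substitutions(reference, sample)
print(variants)
```

Each tuple in `variants` is the kind of record that the annotation stage would then enrich with attributes from the reference data sources.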
Mantis™ Workflow – Clinical Use Case
Patient sample sent to lab
Sample is sequenced, aligned and variant called (150 GB BAM, 125 MB VCF)
Next-gen sequencing data uploaded to Cypher
Mantis interprets genome data
Report generated (PDF, 3 MB)
Clinical summary delivered

200M+ variants annotated and growing
90+ annotations per variant
50+ reference DBs
40+ TB (and growing)

Automated solutions are essential to provide precision medicine at population scales
YESTERDAY: Nicholas Volker, 2010
  – Medical College of Wisconsin
  – Life-threatening bowel disease
  – Whole exome sequencing (1% of genome)
  – Cost: $15,000
  – Single mutation in XIAP
TODAY: Lilly Grossman, 2013
  – Scripps IDIOM / Cypher Genomics
  – Debilitating neuromuscular disease
  – Whole genome sequencing
  – Automated interpretation
  – Mutations in ADCY5
TOMORROW: Every Patient, 2018
  – Driven by decrease in sequencing cost (e.g. Illumina X10, $1,000 genome)
  – Whole genome as baseline
  – Updated interpretation over time
  – Genomic clinical decision support

Coral™ Workflow – Pharma Use Case
Patient data collected
Study samples sequenced
Next-gen sequencing data sent to Cypher
Cypher runs genomes at scale
Coral produces predictive models
Biomarker identified

Data and compute usage grow in population studies

Logical Deployment Components
Web Front-End
  – User Interface
  – Apache Web Server
  – Zend Framework Server
Cypher Core Services (CCS)
  – REST API implementation to service the User Interface and automated integration
  – CDH (Hadoop ecosystem)
      HDFS: holds raw reference data
      HBase: Variant Annotation DB
      MapReduce: processes variant data and annotation information at scale
  – MongoDB
      Consolidated reference data
      Demographics data
  – Vertica
      Analytics DB for real-time, interactive analytic investigation
Cypher Analytic Pipeline (CAP): Novel Variant Annotation Pipeline
  – Penguin on Demand: HPC with Torque scheduler
  – Custom parallelization algorithm for annotating and running predictive algorithms
  – Final annotation of each variant goes into the CCS Annotation DB

Simplified Logical Architecture
[Architecture diagram. Labels recovered from the slide:
HPC Provider: Cypher Annotation Pipeline (Novel Variants); CAP Master Application Server (annotation of novel variants); CAP HPC Job Scheduler (TORQUE); HPC Environment.
Cloud Provider: Cypher Core Services; CCS Master Application Server (annotation of known variants, interactive analytics back-end, interpretation and reporting); Consolidated Reference DB (Mongo); Annotation DB (HBase); HDFS; Hadoop & Map/Reduce Cluster; CCS Demographic Data (Mongo); Analytics DB (Vertica); Source Reference Data.
Front-end Applications: Web Server; Web App Server; REST APIs.]

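The split between annotating known variants in CCS and novel variants in CAP, with final annotations written back to the annotation DB, resembles a read-through lookup. The sketch below is one way to express that flow, not the authors' implementation: a plain dict stands in for the HBase annotation DB, `fake_pipeline` stands in for the HPC pipeline, and the variant keys are made-up identifiers.

```python
from typing import Callable, Dict, List

Annotation = Dict[str, str]


def annotate_variant(
    variant_key: str,
    annotation_db: Dict[str, Annotation],
    annotate_novel: Callable[[str], Annotation],
) -> Annotation:
    """Return the stored annotation if the variant is known, else compute and store it."""
    stored = annotation_db.get(variant_key)
    if stored is not None:
        return stored  # known variant: served straight from the annotation DB
    annotation = annotate_novel(variant_key)  # novel variant: run the slower pipeline
    annotation_db[variant_key] = annotation   # final annotation goes back into the DB
    return annotation


# Hypothetical stand-ins for the HPC pipeline and the annotation DB.
pipeline_calls: List[str] = []


def fake_pipeline(variant_key: str) -> Annotation:
    pipeline_calls.append(variant_key)
    return {"variant": variant_key, "source": "pipeline"}


db: Dict[str, Annotation] = {
    "chrX:100:A>G": {"variant": "chrX:100:A>G", "source": "db"},
}

first = annotate_variant("chr1:200:C>T", db, fake_pipeline)   # novel: pipeline runs
second = annotate_variant("chr1:200:C>T", db, fake_pipeline)  # now known: no pipeline run
cached_hit = annotate_variant("chrX:100:A>G", db, fake_pipeline)
```

Under this assumption, the expensive pipeline runs once per distinct novel variant, and later requests for the same variant are served from the database.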
Pain Points
Downtime
  – Unscheduled system reboots and upgrades
  – Temporary loss of data and configurations
  – Poor communication
Difficult to manage temporary cluster instantiation
Firewall architecture imposes inflexible rules on our naming conventions
Limits to cluster and storage quota (either programmatically or through the Management Console)
  – Requires us to call the vendor to raise our limits
Ease of getting to HIPAA compliance

Evaluating Alternatives
Comparable (or better) performance per $
Better uptime / less dramatic impact during maintenance
Start and stop temporary clusters with ease
Flexible firewall rules
Unlimited storage scaling on demand
Strategic options for utilizing service offerings
HIPAA compliance and business associate agreement

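One possible way to make "performance per $" concrete is to compare the cost of processing the same test batch on each cluster. The sketch below assumes cost scales as nodes × hourly price × runtime; the `ClusterConfig` class is illustrative, and every price and runtime in it is a hypothetical placeholder, not a measurement from this evaluation.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    name: str
    nodes: int
    price_per_node_hour: float  # hypothetical price, not from the talk
    runtime_hours: float        # hypothetical end-to-end time for one test batch

    def cost_per_batch(self) -> float:
        """Cost of one test batch: nodes x hourly price x runtime."""
        return self.nodes * self.price_per_node_hour * self.runtime_hours


# Placeholder numbers chosen only to show that a pricier cluster can still win.
configs = [
    ClusterConfig("current", nodes=6, price_per_node_hour=1.0, runtime_hours=10.0),
    ClusterConfig("candidate", nodes=6, price_per_node_hour=2.0, runtime_hours=4.0),
]

for cfg in configs:
    print(f"{cfg.name}: ${cfg.cost_per_batch():.2f} per batch")

cheapest = min(configs, key=ClusterConfig.cost_per_batch)
```

With these placeholder inputs the candidate costs less per batch despite the higher hourly price, because the shorter runtime more than offsets it.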
What did we test?
Most time-consuming stages of our processing
  – Reformat Input Files
  – Query Annotations
  – Process Annotations
  – Split Results
  – Organize Results
And
  – Total end-to-end processing time

Test Configurations – Hadoop (HBase/MapReduce) Cluster
Current Cloud Vendor
  – Test Configuration – “Original Apple”
      6 Nodes
      8 CPU
      32 GB RAM
      40 TB Block Storage (Magnetic)
      No Encryption
New Cloud Vendor
  – Configuration 1 – “Pear”
      6 Nodes
      8 CPU
      32 GB RAM
      40 TB Block Storage (SSD)
      All Disks Encrypted
  – Configuration 2 – “Melon”
      6 Nodes
      32 CPUs
      60 GB RAM
      40 TB Local Disk (Magnetic)
      All Disks Encrypted
NOTE: The above configuration is smaller than our entire product and is just for performance comparison purposes. You should draw NO conclusions regarding the performance of our product from these comparisons.

Apples to Pears to Melons… Oh my!
[Six comparison charts, one per slide: Reformat, Query, Process Annots, Split Results, Organize, Overall]

Lessons Learned
Environment set-up and tweaking
  – 80% of time spent here
  – Be prepared to iterate on this as you test larger and larger batches.
Work with your vendor to provide experts in Hadoop, performance and capacity planning to aid in your evaluation.
Work with your vendor to cost out expenditures
  – Don’t forget security, load balancing, auto-scaling services, etc.
  – Look closely at the advantages of paying a portion up-front to get good discounts on monthly costs.

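The up-front payment trade-off comes down to a break-even calculation: how many months of discounted billing it takes to recover the up-front payment. The sketch below uses a simple linear model with a fixed monthly saving; the function name and all dollar figures are hypothetical placeholders, not vendor prices.

```python
import math


def break_even_months(
    upfront: float, on_demand_monthly: float, discounted_monthly: float
) -> int:
    """First whole month at which the up-front plan is no more expensive overall."""
    monthly_saving = on_demand_monthly - discounted_monthly
    if monthly_saving <= 0:
        raise ValueError("the discounted rate must be lower than the on-demand rate")
    return math.ceil(upfront / monthly_saving)


# Hypothetical figures: $3,000 up-front lowers the bill from $1,000 to $600 per month.
months = break_even_months(
    upfront=3000.0, on_demand_monthly=1000.0, discounted_monthly=600.0
)
print(months)
```

If the cluster is expected to run for fewer months than the break-even point, the up-front plan costs more overall under this model.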
Summary – Key Requirements of a Genomics Cloud Provider
Find a Cloud Provider that will be a Strategic Partner!
  – Resources to aid in your evaluation.
  – Resources to aid in optimizing costs.
Instrument your applications to capture detailed performance stats per logical computational step.
  – Gives the delta between the current and the new Cloud Provider infrastructure.
During your testing, establish how responsive the Cloud Partner is to issues unrelated to your pilot (e.g. machine upgrades, node failures, networking and/or storage).
Ensure that you can stop clusters that you are not testing so that you aren’t billed more than once.
Make sure they will sign a HIPAA BAA and that you know exactly what is covered.
Pick a Cloud Partner that offers services for automating deployment, scaling, load balancing, security and log management (even if you don’t plan to use them now).

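Per-step instrumentation of the kind recommended above can be sketched with a small timing context manager. This is an illustration under stated assumptions rather than the instrumentation used in the evaluation: the stage names echo the tested stages, while `timed_stage`, `stage_deltas` and the placeholder workloads are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_stage(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Accumulate the wall-clock seconds spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def stage_deltas(current: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Per-stage difference (candidate minus current); negative means the candidate is faster."""
    return {
        stage: candidate[stage] - current[stage]
        for stage in current
        if stage in candidate
    }


timings: Dict[str, float] = {}

with timed_stage("reformat_input", timings):
    sum(range(100_000))  # placeholder workload

with timed_stage("query_annotations", timings):
    sorted(range(50_000), reverse=True)  # placeholder workload

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f} s")
```

Running the same instrumented batch on both providers yields two `timings` dictionaries, and `stage_deltas(current, candidate)` then gives the per-stage difference between them.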
Acknowledgements
Javier Velazquez-Muriel – Bioinformatics Engineer
Patrick Ravenel – CTO & VP of Engineering
Ashley Van Zeeland – CEO & Founder
Ali Torkamani – CSO & Founder
Nicholas Schork – Founder
Eric Topol – Founder

Questions?