© 2010 LabKey Software Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server Adam Rauch

1 Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server
Adam Rauch
© 2010 LabKey Software

2 LabKey Software Company Overview
• LabKey Software is a consulting company
  • Spun off from the McIntosh Lab (part owned by FHCRC)
  • Professional software engineers from Amazon, Microsoft, BEA, etc.
• Work in partnership with scientists
  • For-profit fee-for-service contracts
  • Non-profit grant sub-awards
    – Co-investigators with a shared research agenda
  • All development approved by and relevant to FHCRC
• Development & support around LabKey Server
  • Extending the base LabKey Server platform
  • Creating customized lab-specific solutions
  • Hosting LabKey Server
  • Support

3 What Is LabKey Server?
• An open-source, web-based platform for organizing, analyzing & sharing scientific data
• Data integration analysis for assays
  • Proteomics, flow cytometry, plate-based assays, etc.
• Study Data Management
  • Combines demographic, clinical, assay & specimen data
• LabKey Server powers many deployments…
  • CPAS: FHCRC proteomics repository
  • Atlas Science Portal: SCHARP’s HIV vaccine studies
  • AdaptiveTCR: Customer analytics for ImmunoSEQ NGS
  • UW (Katze, Heinecke, et al.), USC, Markey, Harvard, IDRI, TGen, Wisconsin Primate EHR, UC Denver, etc.

4 Dave O’Connor Lab, University of Wisconsin
• Academic research lab
• Focus: understanding SIV using nonhuman primate models & applying NHP methods to human HIV disease research

5 O’Connor Lab SIV/HIV Research
• Host Immune Genetics (Source: modified from Yewdell et al., Nature Reviews Immunology 2003)
• Virus Genetics (Source: Korber et al., British Medical Bulletin 2001)

6 Importance of MHC Class I
Host Immune Genetics
• MHC class I molecules dictate immunity to disease
• High degree of polymorphism within the MHC class I peptide-binding domain
• Specific MHC alleles associated with superior control of HIV infection
Source: modified from Yewdell et al., Nature Reviews Immunology 2003

7 Importance of Viral Variability
Virus Genetics
• HIV has a fast replication cycle and a high mutation rate
• Evolution of the virus causes escape from immune responses
• Specific mutations are associated with resistance to antiretroviral drug therapy
Source: Korber et al., British Medical Bulletin 2001

8 Sequencing in the O’Connor Lab
• 2005 – 2009: Sanger sequencing
  • “Prohibitively expensive” for most experiments
• 2009: Roche/454 GS FLX at UIUC
• 2010: Roche/454 GS Junior in lab
• Roche/454 GS Junior
  • Long-read instrument, critical for genotyping
  • Identical to GS FLX, but 1/8 throughput & lower cost
  • ~100,000 reads per run (~1¢ per read), average ~560 bp read length
  • 115 runs this year
• MID tagging
  • Allows pooling multiple samples (30-100) into a single run
• Galaxy server
  • Open-source sequence analysis tool (Giardine et al., Genome Res 2005)
  • Lab has built custom workflow to match sequences to known MHC alleles
  • Uses BLAT, transitioning to AGILE (Northwestern alignment tool)
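The MID tagging point can be made concrete with a small sketch. It assumes each read begins with a fixed-length tag identifying its sample. The tag sequences, the 10-base tag length, and the sample names below are hypothetical placeholders, not the lab's actual sample sheet, and this is not the lab's Galaxy workflow.

```python
from collections import defaultdict

# Hypothetical tag-to-sample map; real MID tags come from the sequencing kit
# and the lab's own sample sheet.
MID_TO_SAMPLE = {
    "ACGTACGTAC": "animal_A",
    "TGCATGCATG": "animal_B",
    "GATCGATCGA": "animal_C",
}
MID_LENGTH = 10  # assumed tag length for this sketch


def demultiplex(reads):
    """Group reads by their leading MID tag.

    Returns {sample: [reads with the tag trimmed off]}. Reads whose first
    MID_LENGTH bases match no known tag are kept whole under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        sample = MID_TO_SAMPLE.get(tag)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)


def reads_per_sample(reads_per_run, samples_pooled):
    """Average read depth per sample if reads were split evenly across the pool."""
    return reads_per_run // samples_pooled
```

Taking the slide's figures at face value, pooling 30 to 100 samples into a run of ~100,000 reads leaves roughly 1,000 to 3,300 reads per sample if the pool is balanced. At ~1¢ per read, a full run works out to about $1,000.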

9 Roche/454 MHC Workflow
• Total RNA isolation and cDNA synthesis – RNA isolation ~4 hrs; cDNA synthesis ~2 hrs
• Primary PCR amplification – plus SPRI purification, quantification, pooling ~3 hrs
• emPCR – set-up ~1 hr, run ~5.5 hrs
• Breaking and enrichment – ~3 hrs
• Roche/454 GS Junior run – set-up ~1.5 hrs; run time ~10 hrs
• Data processing and analysis – run processing ~2 hrs; analysis time varies
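Adding up the step times above gives a rough sense of turnaround. The sketch below sums the approximate hours quoted on the slide. It assumes the steps run back to back with no waiting, and it leaves out the open-ended analysis time.

```python
# Approximate hours per step, copied from the workflow slide.
# "Analysis time varies" is excluded because no figure is given for it.
WORKFLOW_HOURS = [
    ("RNA isolation", 4.0),
    ("cDNA synthesis", 2.0),
    ("Primary PCR, SPRI purification, quantification, pooling", 3.0),
    ("emPCR set-up", 1.0),
    ("emPCR run", 5.5),
    ("Breaking and enrichment", 3.0),
    ("GS Junior set-up", 1.5),
    ("GS Junior run", 10.0),
    ("Run processing", 2.0),
]


def total_hours(steps):
    """Sum the hours across (step name, hours) pairs."""
    return sum(hours for _, hours in steps)


if __name__ == "__main__":
    print(f"Sample to processed run: ~{total_hours(WORKFLOW_HOURS):g} hours")
```

Under these assumptions the path from RNA isolation to a processed run takes about 32 hours, of which 15.5 hours are unattended instrument time (the emPCR run and the sequencing run).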

10 PROBLEM: DATA MANAGEMENT!
“There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard)”
– Dave O’Connor

11 Problem: Data Management
• As volume has increased, the lab has found it difficult to manage all its sequencing data & metadata:
  • Run metadata
  • Run metrics
  • Sequencing reads and quality scores
  • Sample information and multiplex identifiers (MIDs)
  • Reference sequences for genotyping experiments
  • Genotyping matches
• O’Connor asked LabKey to build a system that can:
  • Store sequencing and genotyping data in a single database that links all the tables, allowing arbitrary queries and reports
  • Provide tools for analysis, querying, visualization and export
  • Automate data workflows for efficiency & consistency
  • Eventually, link sequencing results to their primate EHR system
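The idea of a single database that links all the tables can be illustrated with a toy example. The sketch below uses Python's built-in sqlite3 module as a stand-in. Every table name, column name, and row is a made-up assumption for illustration. The actual LabKey schema (slide 13) is not reproduced in this transcript.

```python
import sqlite3

# Hypothetical tables mirroring the data types listed above: runs, samples,
# MID assignments, reads with quality scores, reference sequences, matches.
SCHEMA = """
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, instrument TEXT);
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, animal TEXT);
CREATE TABLE run_samples (
    run_id INTEGER REFERENCES runs(run_id),
    mid TEXT,
    sample_id INTEGER REFERENCES samples(sample_id),
    PRIMARY KEY (run_id, mid)
);
CREATE TABLE reads (
    read_id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(run_id),
    mid TEXT,
    sequence TEXT,
    quality TEXT
);
CREATE TABLE reference_sequences (
    ref_id INTEGER PRIMARY KEY, allele TEXT, sequence TEXT
);
CREATE TABLE matches (
    read_id INTEGER REFERENCES reads(read_id),
    ref_id INTEGER REFERENCES reference_sequences(ref_id)
);
"""


def build_demo_db():
    """Create an in-memory database populated with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO runs VALUES (1, 'GS Junior')")
    conn.executemany("INSERT INTO samples VALUES (?, ?)",
                     [(1, "animal_A"), (2, "animal_B")])
    conn.executemany("INSERT INTO run_samples VALUES (?, ?, ?)",
                     [(1, "MID1", 1), (1, "MID2", 2)])
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, "MID1", "ACGT", "IIII"),
                      (2, 1, "MID1", "ACGA", "IIII"),
                      (3, 1, "MID2", "TTGA", "IIII")])
    conn.executemany("INSERT INTO reference_sequences VALUES (?, ?, ?)",
                     [(1, "allele_X", "ACGT"), (2, "allele_Y", "TTGA")])
    conn.executemany("INSERT INTO matches VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2)])
    return conn


def allele_counts(conn):
    """Count matched reads per animal and allele by joining across all tables."""
    return conn.execute("""
        SELECT s.animal, rseq.allele, COUNT(*) AS n_reads
        FROM matches m
        JOIN reads r ON r.read_id = m.read_id
        JOIN run_samples rs ON rs.run_id = r.run_id AND rs.mid = r.mid
        JOIN samples s ON s.sample_id = rs.sample_id
        JOIN reference_sequences rseq ON rseq.ref_id = m.ref_id
        GROUP BY s.animal, rseq.allele
        ORDER BY s.animal, rseq.allele
    """).fetchall()
```

Because reads, MID assignments, samples, and matches share keys, a question such as "how many reads from each animal matched each allele?" becomes a single join rather than a manual merge of separate files.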

12 LabKey Sequencing System
[Diagram] Labels: Reads; Quality Scores; Metrics; Sample Information; Reference Sequences; Sequencing and Genotyping Database; Analysis; Reporting; Export; Visualization; External Tools; Galaxy Genotyping Workflow

13 Database Schema

14 Demo

15 Possible Future Directions
• Respond to O’Connor lab’s near-term needs
• Genomics-specific analytics
• Additional export formats
• Tighter integration with Galaxy
• Support for amplicon-designated reads
• Match combining
• Simplify configuration and operation
• Integrate with Wisconsin primate EHR
• Better integration with R / Bioconductor
• Visualization
• Other sequencing platforms: Illumina, PacBio…

16 Acknowledgements
• O’Connor Laboratory
  • David O’Connor
  • Simon Lank
  • Julie Karl
  • Benjamin Bimber
• LabKey Software
  • Mark Igra
  • Brian Connolly
  • Elizabeth Nelson
  • Josh Eckels
  • Matthew Bellew
  • Et al.

17 Questions?

