
Introduction

Over the last two years, the production teams at The Genome Center (GC) have been working to design flexible pipelines that can handle a multitude of diverse projects. With the advent of next-generation sequencing technologies that allow rapid and low-cost data generation, many more samples are needed to fill the production pipelines. The cost of data generation has fallen to a point where new and creative project designs can be considered that were not feasible in the past. Because of this greater freedom in project design, the pipeline needed to handle projects with large numbers of samples needing focused data generation as well as projects with only a few samples needing genome-wide coverage.

The first step in this process was to develop a long-term plan that involved every aspect of The Genome Center. The senior leadership from informatics, technology development, analysis and production meet weekly to monitor the pipelines, discuss new projects and decide how best to take advantage of new technologies. Every project we accept first undergoes a thorough review to make sure that we can meet the expectations of the collaborators in a cost-effective and timely manner. We then work with collaborators to explain the approach and attendant timelines.

The incoming sample management group is the entry point to the pipeline and was designed to handle large numbers of samples. Software tools allow the efficient import of barcoded or non-barcoded samples into the laboratory information management system (LIMS) database. Project tracking and sample progress monitoring tools facilitate communication of progress with each collaborator. Within Sample Management, the resource bank houses all of the samples brought into the center, provides samples for QC, and manages distribution to the appropriate group when project production work begins.
In addition, all of the lab and analytical processes associated with each sample are catalogued in a new “work order” system developed by the LIMS group. The downstream production groups, such as library construction and the data generation groups (3730, 454 and Illumina), work from uniform, SOP-formatted protocols and LIMS tools. These groups process large numbers of samples, using automation to provide more uniform results and to minimize sample processing mistakes and contamination. The protocols also allow for pooling of samples, which reduces the cost of the project by using next-generation sequencing platforms more cost-effectively.

After data generation, projects enter the specified analysis pipeline(s). The Genome Center offers a wide range of analysis capabilities, including single nucleotide variant, indel variant and structural variation detection. We have high-throughput validation schemes available, as well as replication capabilities.

Project Management Work Flow

– The Estimates group is contacted before a project begins.
– The sample submission sheet is sent to, and completed by, the collaborator or internal contact.
– Samples arrive at The Genome Center with the completed sample submission sheet and are delivered to project management.
– The Project Management Team checks for correct paperwork, including: sample submission sheet, signed estimate, IRB, and billing contact and account based upon the funding source.
– A project heading is communicated to the laboratory information management system database, and the project is created.
– A work order is created with information from the submission sheet; the samples are barcoded and checked into the freezer, and the work order is sent to the appropriate groups.
– The resource bank group claims the samples, quantitates them, and creates aliquots.
– Samples are sent to library construction and, once library construction is complete, are forwarded to the appropriate sequencing pipeline.
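The paperwork check in the work flow above amounts to a completeness test against a fixed list. The following is a minimal sketch of that idea, not the LIMS implementation; the function name `missing_paperwork` and the item labels are hypothetical and only mirror the items named in the text.

```python
# Hypothetical labels mirroring the paperwork items named in the work flow.
REQUIRED_PAPERWORK = (
    "sample submission sheet",
    "signed estimate",
    "IRB",
    "billing contact",
    "account",
)


def missing_paperwork(received):
    """Return the required items absent from `received`, in checklist order.

    Matching ignores case and surrounding whitespace.
    """
    have = {item.strip().lower() for item in received}
    return [item for item in REQUIRED_PAPERWORK if item.lower() not in have]


if __name__ == "__main__":
    # A submission that arrived with only two of the five items.
    print(missing_paperwork(["Sample submission sheet", "IRB"]))
```

A project would proceed to creation in the LIMS only when the returned list is empty.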
PROJECT MANAGEMENT, DATA GENERATION, AND ANALYSIS AT THE GENOME CENTER
Elaine Mardis, Lucinda Fulton, and The Genome Center Production Group
The Genome Center, Washington University School of Medicine, St. Louis, Missouri 63108, USA

Number of GAIIx’s = 50
Number of HiSeqs = 3

101 Cycle Paired End Run
Avg. Gb/run: 34
Reads/run: 170 M PF
Run time: 10 days
Avg. error rate R1: 2.07
Avg. error rate R2: 1.55

454 Sequencing Pipeline

– Sample received from Lib Core
– Titration Enrichment Experiment (TERM) emulsification and amplification (7 hour process)
– TERM Recovery & Enrichment (4 hour process)
– Large Volume (LV) Emulsion emulsification and amplification (8 hour process)
– LV Recovery & Enrichment (4 hour process)
– Sequencing Run Preparation (2 hour process)
– Sequencing Run (9 hour process)
– To Analysis

454 Sequencing Capacity

– 8 total 454 instruments
– 120 Titanium runs per month (50.4 Gb per month)
– Average fragment yield per run: 1.2 million reads, 350 average read length, 420 average megabases
– Currently, ~10% of the pipeline is dedicated to HMP projects.
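The 454 capacity figures are related by simple arithmetic: reads per run times average read length gives megabases per run, and that times runs per month gives the monthly yield. The short sketch below reproduces that calculation from the stated averages; the function and constant names are illustrative only.

```python
# Averages as stated in the 454 Sequencing Capacity figures.
READS_PER_RUN = 1.2e6       # average fragment yield per run (reads)
AVG_READ_LENGTH_BP = 350    # average read length (bases)
RUNS_PER_MONTH = 120        # Titanium runs per month


def megabases_per_run(reads, read_length_bp):
    """Bases per run, expressed in megabases."""
    return reads * read_length_bp / 1e6


def gigabases_per_month(mb_per_run, runs):
    """Monthly yield, expressed in gigabases."""
    return mb_per_run * runs / 1e3


mb = megabases_per_run(READS_PER_RUN, AVG_READ_LENGTH_BP)
gb = gigabases_per_month(mb, RUNS_PER_MONTH)
print(f"{mb:.0f} Mb per run, {gb:.1f} Gb per month")
```

The result agrees with the stated 420 megabases per run and 50.4 Gb per month.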
Resource Bank Work Flow

Sterile Environment for Sample Assessment
– Pipettes, tips, water, and empty aliquot tubes are sterilized in a hood with UV
– Countertop in the hood is washed with DNA Away prior to introducing samples
– Disposable lab coats, face masks, and gloves are worn when handling samples
– Filtered pipette tips are utilized
– Water used to generate dilutions is disposed of daily per technician

QC Metrics
– Quantitation Method: Qubit Assay
– Quality Metric: Agarose Gel or Agilent Electropherogram

Sample Tracking via LIMS
– Initial source volume is recorded
– Qubit concentration is recorded
– Agarose gel and/or Agilent trace is saved
– Source tubes are linked to each aliquot created
– All samples (stocks and aliquots) are logged into a freezer system
– Inventory reports can be generated to determine material remaining

Sample Distribution
– Manual aliquots are performed in a sterile environment in the hood
– Automated aliquots are being tested for sterility utilizing the EpMotion

Methods                                Qubit Assessments   Agilent Assessments   Manual Aliquot Generation
Number of samples per person per day   100                 48                    60
Total Samples per Week                 500                 240                   300

Resource Bank Technician Allocation: 1, 0.5
Total Sample Assessment per Week: 500, 150

Library Construction Work Flow

Library Preparation Environment
– Technician work space is wiped down with bleach prior to and immediately following the completion of library construction
– Lab coat, goggles and gloves are worn when handling samples
– Library reagents are aliquoted specifically for one library event to prevent cross contamination

Library Prep Methods
– 454 Library Prep: manual; 125 MIDs exist for samples that need to be pooled prior to data generation
– Illumina Library Prep: manual or automated; 12 indexes exist for samples that need to be pooled prior to data generation

Library QC Metrics
– Initial sample verification: aliquot barcode and volume are confirmed against resource bank information
– Final Library Quantitation: Qubit Assay
– Final Library Quality: Agilent Electropherogram

Library Construction Capacity
(Chart: Total Libraries per week)

Illumina Sequencing Pipeline

Poster designed by Kerri Ochoa

Variant Validation Analysis

The Genome Center has a robust system for variant validation that can be extended from single nucleotide variants (SNV), to small indels (< ~100 bp), to larger structural variations (large indels, translocations, and inversions). Several methods are available, and are chosen on the basis of scale, turnaround time, or other project-specific guidelines. The methods are outlined briefly below.

Targeting Method
– PCR: Specific targeting, scalable from 1 to thousands of sites, applicable to all sequencing platforms, cost effective at small scale (<1000 sites), and rapid turnaround (~5 days from site definition to oligo in hand)
– Capture: Highly efficient design and capture, cost effective at large scale (>1000 sites), applicable to next-generation sequencing platforms

Sequencing Method
– 3730
  Advantages: Rapid turnaround (~1 day from template to sequence); inexpensive at small scale
  Disadvantages: Expensive at larger scale; mixed alleles in a single sequence
– 454
  Advantages: 2-3 day turnaround; cost effective at large scale; digital (clonal) and deep read counts for calculation of variant allele frequencies; orthogonal sequencing to the common Illumina discovery platform
  Disadvantages: Ambiguities in mononucleotide runs; cost per sequence still higher than Illumina; products are pooled, so normalization is necessary
– Illumina
  Advantages: Extremely cost effective at large scale; digital (clonal) and deep read counts for calculation of variant allele frequencies
  Disadvantages: Concatenation and shearing necessary in library construction; does not provide orthogonal validation when Illumina is also the discovery platform
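The scale guidance given for the two targeting methods (PCR cost effective below 1000 sites, capture above) can be expressed as a simple rule. This is a minimal sketch considering scale only; the function name `suggest_targeting` is hypothetical, the treatment of exactly 1000 sites is an assumption because the text does not specify it, and turnaround time and other project-specific guidelines are deliberately left out.

```python
SITE_THRESHOLD = 1000  # scale boundary quoted for both targeting methods


def suggest_targeting(n_sites):
    """Suggest a targeting method from the number of sites alone.

    Assumption: exactly SITE_THRESHOLD sites is treated as large scale,
    since the text only states "<1000" for PCR and ">1000" for capture.
    """
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return "PCR" if n_sites < SITE_THRESHOLD else "Capture"


if __name__ == "__main__":
    for n in (50, 999, 5000):
        print(n, suggest_targeting(n))
```

In practice the choice would also weigh the sequencing platform, since capture is described as applicable to next-generation platforms while PCR applies to all of them.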
