Mutation Analysis Server
Nagarajanlab. © Copyright 2005, Washington University School of Medicine.

Agenda
- Mutation pipeline overview
- High-level design


Mutation pipeline overview
[Flow diagram. Labels: Investigator; Bioinformatics core; Request Primer Design; Primers; Send Samples; Traces; Run traces through Mutation Analysis Server; Verify mutations using Mutation Viewer; Analysis Result]

High-level design of Mutation Analysis Server
- Download data from GSC's FTP site, or ensure that all files sent by the Clinical Genomic core are in the correct folder.
- Check that file names are in the correct format.
- Rename files.
- Filter out bad-quality traces.
- Load information into the database.
- FTP the trace files and reference sequence to the analysis server.

High-level design of Mutation Analysis Server (cont.)
- Make an XML-RPC request for analysis.
- On the XML-RPC server:
  - Rename traces once again.
  - Run the traces through Phred, Phrap and Needle.
  - Do inserts in the database.
- Call mutation names using BCTLoader.
- Calculate the intra-peak area drop using the Trace Peak Analysis software.

Details of Mutation Analysis Server
This section is divided into the following subsections:
- Prepare files
- Filter file
- Load info in database
- XML-RPC
- Base change text loader and Trace Peak Analysis software

Prepare files
- CGC puts files on our server.
- GSC puts files on its FTP server, and we download them to our server.
- The two cores use different naming formats, so we rename the files to a common format.
- Common format: H_BS-1016AA02PCR70429m70430_70429.b1

Prepare files: understanding the format
Example: H_BS-1016AA02PCR70429m70430_70429.b1
- H_BS: prefix that identifies the experiment (refer to table MUTATION_EXPERIMENT).
- 1016: the array (plate) this trace belongs to (refer to table SAMPLE_ARRAY).
- A: just an extra character.
- A02: well position in the array (refer to table SAMPLE_ARRAY).
- PCR: the constant text "PCR".
- 70429: forward sequence ID (ID of the forward primer).
- m: used as a separator.
- 70430: reverse sequence ID (ID of the reverse primer).
- _70429: identifies which primer was used to sequence this trace.
- b1: also tells us that the trace was sequenced using the forward primer; g1 tells us that it was sequenced using the reverse primer.
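The field layout above can be sketched as a single regular expression. This is only an illustration, not the server's actual parser: the names `TRACE_NAME`, `TraceName` and `parse_trace_name` are hypothetical, and the pattern assumes a one-letter row plus two-digit column for the well and a single version digit in the extension.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the common format, e.g.
# H_BS-1016AA02PCR70429m70430_70429.b1
TRACE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z_]+)-(?P<array>\d+)(?P<extra>[A-Za-z])"
    r"(?P<well>[A-Z]\d{2})PCR(?P<fwd>\d+)m(?P<rev>\d+)_(?P<seq>\d+)"
    r"\.(?P<dir>[bg])(?P<version>\d)$",
    re.IGNORECASE,
)


class TraceName(NamedTuple):
    prefix: str         # experiment prefix, e.g. H_BS
    array_id: str       # array (plate) ID
    well: str           # well position, e.g. A02
    fwd_id: str         # forward sequence (primer) ID
    rev_id: str         # reverse sequence (primer) ID
    seq_primer_id: str  # primer used to sequence this trace
    direction: str      # "forward" for b, "reverse" for g
    version: int        # digit in the extension (b1, b2, ...)


def parse_trace_name(name: str) -> TraceName:
    """Split a common-format trace filename into its fields."""
    m = TRACE_NAME.match(name)
    if m is None:
        raise ValueError(f"not in the common format: {name!r}")
    direction = "forward" if m["dir"].lower() == "b" else "reverse"
    return TraceName(
        prefix=m["prefix"],
        array_id=m["array"],
        well=m["well"].upper(),
        fwd_id=m["fwd"],
        rev_id=m["rev"],
        seq_primer_id=m["seq"],
        direction=direction,
        version=int(m["version"]),
    )


trace = parse_trace_name("H_BS-1016AA02PCR70429m70430_70429.b1")
```

Keeping the IDs as strings avoids losing leading zeros; whether the real tables store them as numbers is not stated in the slides.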

Prepare files (rename to correct format)
CGC names files as: H_BS-1016AA02PCR70429m70430_70429.abi
- We just need to work out the direction from the sequence IDs and rename the extension to b1 or g1.
GSC names files as: H_BS-2541nPCR0004067_001a.b1
- First ask the user for their preferred array ID, in case there is a conflict.
- We use the sample ID (2541) and the user's preferred array ID to work out the well position.
- 4067 is the locus ID and 001 is the amplicon number. We use these two to query the Amplicon_Sequence table for the forward and reverse sequence IDs. If we don't find them, we assign two new sequence IDs and insert them into the Amplicon_Sequence table.
- We know whether it is a forward or a reverse sequence from b1 or g1.
Note: Sometimes a file's extension may be b2, g2 or even b3. This just means the amplicon was run 2 or 3 times to get the trace. This number is also known as the version of the trace.
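A minimal sketch of the two renaming inputs follows. It is not the lab's code: the function names are hypothetical, the database lookups (well position, Amplicon_Sequence) are left out, and the meaning of the single letters after the sample ID and amplicon number in GSC names is not given in the slides, so the pattern simply skips them.

```python
import re

# CGC: <anything>PCR<fwd>m<rev>_<sequencing primer>.abi
CGC_NAME = re.compile(
    r"^(?P<stem>.+PCR(?P<fwd>\d+)m(?P<rev>\d+)_(?P<seq>\d+))\.abi$",
    re.IGNORECASE,
)

# GSC: <prefix>-<sample><letter>PCR<locus>_<amplicon><letter>.<b|g><version>
GSC_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z_]+)-(?P<sample>\d+)[A-Za-z]PCR(?P<locus>\d+)"
    r"_(?P<amplicon>\d+)[A-Za-z]\.(?P<dir>[bg])(?P<version>\d)$",
    re.IGNORECASE,
)


def cgc_common_name(name: str) -> str:
    """Rename a CGC .abi file to .b1 or .g1 based on the sequencing primer ID."""
    m = CGC_NAME.match(name)
    if m is None:
        raise ValueError(f"not a CGC trace name: {name!r}")
    if m["seq"] == m["fwd"]:
        ext = "b1"  # sequenced with the forward primer
    elif m["seq"] == m["rev"]:
        ext = "g1"  # sequenced with the reverse primer
    else:
        raise ValueError(f"sequencing primer matches neither primer: {name!r}")
    return f"{m['stem']}.{ext}"


def parse_gsc_name(name: str) -> dict:
    """Extract the fields needed for the database lookups from a GSC name."""
    m = GSC_NAME.match(name)
    if m is None:
        raise ValueError(f"not a GSC trace name: {name!r}")
    return {
        "prefix": m["prefix"],
        "sample_id": m["sample"],
        "locus_id": int(m["locus"]),     # 0004067 -> 4067
        "amplicon": int(m["amplicon"]),  # 001 -> 1
        "direction": "forward" if m["dir"].lower() == "b" else "reverse",
        "version": int(m["version"]),
    }


renamed = cgc_common_name("H_BS-1016AA02PCR70429m70430_70429.abi")
gsc = parse_gsc_name("H_BS-2541nPCR0004067_001a.b1")
```

The locus ID and amplicon number returned by `parse_gsc_name` are the two values that would be used to query Amplicon_Sequence.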

Filter file (common to both CGC and GSC)
- Run all traces through Phred and let it trim what it considers non-trace data.
- Move all forward sequences to one folder and all reverse sequences to another.
- Take the reverse complement of the forward tail and use it to custom trim all traces sequenced with reverse primers.
- Similarly, take the reverse complement of the reverse tail and use it to custom trim all traces sequenced with forward primers.
- To custom trim a trace, find the longest part of the trimming sequence that exactly matches the given end of the actual sequence, and trim it off.
- If the trimming sequence is less than 6 bases, we do not trim the trace.
- All custom trim results are put back in a common folder.
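One possible reading of the custom trim step is sketched below. It assumes that the "given end" is the 3' end of the read, that the match is the longest prefix of the trimming sequence equal to a suffix of the read, and that the 6-base limit applies to the matched part. The function names and the example tail are hypothetical; the slides do not give the real tail sequences.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def custom_trim(read: str, trim_seq: str, min_match: int = 6) -> str:
    """Remove the longest prefix of trim_seq that exactly matches the end of read.

    Matches shorter than min_match bases are ignored and the read is
    returned unchanged.
    """
    read_u, trim_u = read.upper(), trim_seq.upper()
    longest = min(len(read), len(trim_seq))
    # Try the longest candidate first so the first hit is the biggest match.
    for k in range(longest, max(min_match, 1) - 1, -1):
        if read_u.endswith(trim_u[:k]):
            return read[: len(read) - k]
    return read


# Hypothetical forward tail; a reverse-primer read would run into its
# reverse complement at the 3' end.
forward_tail = "TGTAAAACGACGGCCAGT"
trim_seq = reverse_complement(forward_tail)
trimmed = custom_trim("GATTACAGATTACA" + "ACTGGCCG", trim_seq)
untouched = custom_trim("GATTACAACTG", trim_seq)
```

In the first call the read ends with the first 8 bases of the trimming sequence, so those 8 bases are removed. In the second call only 4 bases match, which is below the limit, so the read is left alone.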

Filter file (specific to CGC)
- Query PRIMERGENERALDETAILS_NEW and PRIMERMANUALPICKS to get the lengths of the amplicons along with their respective sequence IDs.
- For each trace, get the percentage of bases that have a Phred score of 25 or more. The percentage is calculated relative to the amplicon length.
- If at least 50% of the bases reach that score, the trace passes; otherwise it is put in the failed folder.

Filter file (specific to GSC)
- For each trace, get the percentage of bases that have a Phred score of 25 or more. The percentage is calculated relative to the number of bases between the left clip and the right clip.
- If the length between the left and right clips is less than 100, the trace fails directly.
- If at least 50% of the bases reach that score, the trace passes; otherwise it is put in the failed folder.
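The CGC and GSC pass/fail rules differ only in the denominator, which the sketch below tries to make explicit. The function names are hypothetical, and for GSC the sketch assumes that only bases between the clips are counted, which the slides do not state.

```python
from typing import Sequence


def percent_high_quality(quals: Sequence[int], denominator: int,
                         min_phred: int = 25) -> float:
    """Percentage of bases with Phred >= min_phred, relative to denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    good = sum(1 for q in quals if q >= min_phred)
    return 100.0 * good / denominator


def passes_cgc(quals: Sequence[int], amplicon_length: int,
               min_percent: float = 50.0) -> bool:
    """CGC rule: the percentage is relative to the amplicon length."""
    return percent_high_quality(quals, amplicon_length) >= min_percent


def passes_gsc(quals: Sequence[int], left_clip: int, right_clip: int,
               min_percent: float = 50.0, min_clip_length: int = 100) -> bool:
    """GSC rule: the percentage is relative to the clipped length.

    Traces with fewer than min_clip_length bases between the clips fail
    immediately.
    """
    clipped = quals[left_clip:right_clip]
    if len(clipped) < min_clip_length:
        return False
    return percent_high_quality(clipped, len(clipped)) >= min_percent


quals = [30] * 60 + [10] * 40  # 60 good bases, 40 poor ones
cgc_short_amplicon = passes_cgc(quals, amplicon_length=100)  # 60%
cgc_long_amplicon = passes_cgc(quals, amplicon_length=200)   # 30%
gsc_ok = passes_gsc(quals, left_clip=0, right_clip=100)
gsc_too_short = passes_gsc(quals, left_clip=0, right_clip=99)
```

The same trace can pass against a 100-base amplicon and fail against a 200-base one, because the denominator is the amplicon length rather than the read length.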

Load info in database
For each trace:
- Pull out the array ID, well position, forward sequence ID, reverse sequence ID, sequencing primer ID and version information from the filename.
- Get the longest continuous region in the trace that has a Phred score of 20 or more, 30 or more, and 40 or more.
- Load all of the above information into the TRACEFILE_ANALYSIS_OLD table.
For each record in the TRACEFILE_ANALYSIS_OLD table:
- Get information for this trace from the views FINALPHRED20PRECENT50NEW, FINALPHRED30PRECENT50NEW, FINALPHRED40PRECENT50NEW, FINALPHRED20PRECENT400NEW, FINALPHRED30PRECENT400NEW, FINALPHRED40PRECENT400NEW and FINALAVERAGESNEW.
- Insert information into SUBMITTEDPRIMERS.
- Check whether the NM_REFSEQID table already has the information; if not, insert it into NM_REFSEQID.
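The "longest continuous region" measurement can be illustrated with a single pass over the per-base quality values. This is a sketch with a hypothetical function name; how ties are broken and how the region is stored in TRACEFILE_ANALYSIS_OLD are not described in the slides, so the first longest run is returned as a (start, length) pair.

```python
from typing import Sequence, Tuple


def longest_run(quals: Sequence[int], min_phred: int) -> Tuple[int, int]:
    """Return (start index, length) of the longest run with Phred >= min_phred.

    Returns (0, 0) when no base reaches the threshold. On ties the
    earliest run wins.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_phred:
            if run_start is None:
                run_start = i  # a new run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # the run is broken
    return best_start, best_len


quals = [10, 25, 35, 45, 41, 22, 8, 31, 33, 5]
# One (start, length) pair per threshold used by the pipeline.
regions = {t: longest_run(quals, t) for t in (20, 30, 40)}
```

For the example values, the region shrinks as the threshold rises: 5 bases at Phred 20, 3 bases at Phred 30 and 2 bases at Phred 40.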

XML-RPC
For each file, collect the following information:
- Sample ID
- List of reference sequence IDs that have this trace's sequence ID (for transcripts)
- For each reference sequence, the reference sequence and gene ID
- Mutation experiment ID, from the trace filename's prefix (H_BS)
- Patient number for this trace
- A new filename in the following format: GeneID-LocusID-UniqueSampleID.FwdPrimerID_RevPrimerID_SequencingPrimerID_version.scf
Group files that have the same value for GeneID-LocusID-UniqueSampleID.
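The renaming and grouping step could look roughly like the sketch below. The `TraceInfo` class, its field names and all ID values are hypothetical placeholders; only the filename layout and the grouping key come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TraceInfo:
    gene_id: str
    locus_id: str
    sample_id: str       # the unique sample ID
    fwd_id: str
    rev_id: str
    seq_primer_id: str
    version: int

    @property
    def group_key(self) -> str:
        """GeneID-LocusID-UniqueSampleID, shared by traces analysed together."""
        return f"{self.gene_id}-{self.locus_id}-{self.sample_id}"

    @property
    def analysis_filename(self) -> str:
        """Filename in the format expected on the analysis server."""
        return (f"{self.group_key}.{self.fwd_id}_{self.rev_id}"
                f"_{self.seq_primer_id}_{self.version}.scf")


def group_traces(traces: Iterable[TraceInfo]) -> Dict[str, List[str]]:
    """Group the new filenames by GeneID-LocusID-UniqueSampleID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for t in traces:
        groups[t.group_key].append(t.analysis_filename)
    return dict(groups)


# Made-up IDs, for illustration only.
traces = [
    TraceInfo("G12", "4067", "S2541", "70429", "70430", "70429", 1),
    TraceInfo("G12", "4067", "S2541", "70429", "70430", "70430", 1),
    TraceInfo("G12", "4067", "S9999", "70429", "70430", "70429", 2),
]
groups = group_traces(traces)
```

The forward and reverse reads of the same sample end up in one group, while a different sample gets its own group.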

XML-RPC (cont.)
- For each group of files and each reference sequence, connect to the XML-RPC server and put the reference sequence and traces in the appropriate folder.
- While uploading traces, rename them to the new filename generated in the previous step.
- Call the XML-RPC server request.
- In a nutshell, the XML-RPC server runs the traces through Phred, Phrap and Needle, then puts the results in the appropriate tables for MV (see the next slide for a diagrammatic representation).
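To show the shape of such a request, here is a self-contained sketch using Python's standard `xmlrpc` modules with a stand-in server on localhost. The method name `run_analysis`, its arguments and its reply are invented for illustration; the slides do not describe the real server's interface, and the stand-in does not run Phred, Phrap or Needle.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def run_analysis(group_key, refseq_id, trace_names):
    """Stand-in for the server-side entry point (hypothetical)."""
    return {
        "group": group_key,
        "refseq": refseq_id,
        "traces": len(trace_names),
        "status": "queued",
    }


def demo_request():
    # Port 0 lets the operating system pick a free port for the stand-in server.
    with SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False) as server:
        server.register_function(run_analysis)
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        host, port = server.server_address
        try:
            # Client side: one request per group and reference sequence.
            with xmlrpc.client.ServerProxy(f"http://{host}:{port}/") as proxy:
                return proxy.run_analysis(
                    "G12-4067-S2541",
                    "REFSEQ_PLACEHOLDER",
                    ["G12-4067-S2541.70429_70430_70429_1.scf",
                     "G12-4067-S2541.70429_70430_70430_1.scf"],
                )
        finally:
            server.shutdown()
            worker.join()


reply = demo_request()
```

In the real pipeline the client would point `ServerProxy` at the analysis server after the FTP upload has finished, rather than at a local stand-in.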

XML-RPC (cont.): diagrammatic representation of what is happening
- A color on a trace file represents a unique combination of patient ID and gene ID.
- Go through all files and group them by gene ID and patient ID.
- For a given gene and patient combination, assemble the traces into contigs and singletons. Contigs are shown in red, singletons in blue and the reference sequence in green.
- Get the parts of the reference sequence for all the contigs and singletons using BLAST: the parts of the reference sequence that match the contigs and singletons are cut out.
- Use these parts with the Infomax software to get all mutation-related information. Insertions, deletions, mutations, etc. are marked on the contigs with respect to the reference sequence.
- Map the coordinates back to the original reference sequence.
- Insert all this information into the appropriate tables.
- Repeat the steps for each unique pair of gene ID and patient ID.
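The coordinate map-back step amounts to adding an offset. The sketch below assumes 1-based positions and a part cut from the forward strand of the reference; the function name is hypothetical, and the slides do not say how reverse-strand matches are handled.

```python
def map_to_reference(local_pos: int, part_start: int) -> int:
    """Convert a 1-based position within a cut-out part of the reference
    into a 1-based position on the original reference sequence.

    part_start is the 1-based position on the original reference where
    the cut-out part begins.
    """
    if local_pos < 1 or part_start < 1:
        raise ValueError("positions are 1-based and must be >= 1")
    return part_start + local_pos - 1


# A mutation at base 35 of a part that starts at reference base 1200.
ref_pos = map_to_reference(35, part_start=1200)
```

With these numbers, base 35 of the part corresponds to base 1234 of the original reference sequence.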

Base change text loader and Trace Peak Analysis software
- After all the analysis is done, call the base change text loader software (BCTLoader).
- BCTLoader queries the CONFIDENCE_BASE_PAIR table, gives each mutation an appropriate name and inserts it into save_mutation.
- The Trace Peak Analysis software calculates the intra-peak area drop for all calls that are of the "Substitution" type.

Post analysis
- Usually, after the analysis, once we have verified that everything ran fine, we generate a high-probability mutation list for the investigators.
- They use this list to give Yes / No / Unsure calls for the mutations listed in the Mutation Viewer.
- After they are done, we generate a comprehensive report for them.
