Sequencing Data Analysis


1 Sequencing Data Analysis
Debashis Sahoo Department of Computer Science CSE291 – H00 – Lecture 17

2 Sanger dideoxy sequencing: basic method
[Figure: single-stranded DNA with 3’ and 5’ ends labeled] a) Anneal the primer

3 An automated sequencer
The output

4 Sequence output
Raw data and computer calls:
GNNTNNTGTGNCGGATACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCACCACCAC
CACCACCACCCCATGGGTATGAATAAGCAAAAGGTTTGTCCTGCTTGTGAATCTGCGGAACTTATTTATGATCCAGAAAG
GGGGGAAATAGTCTGTGCCAAGTGCGGTTATGTAATAGAAGAGAACATAATTGATATGGGTCCTAAGTGGCGTGCTTTTG
ATGCTTCTCAAAGGGAACGCAGGTCTAGAACTGGTGCACCAGAAAGTATTCTTCTTCATGACAAGGGGCTTTCAACTGCA
ATTGGAATTGACAGATCGCTTTCCGGATTAATGAGAGAGAAGATGTACCGTTTGAGGAAGTGGCANTCCANATTANGAGT
TAGTGATGCAGCANANAGGAACCTAGCTTTTGCCCTAAGTGAGTTGGATAGAATTNCTGCTCAGTTAAAACTTCCNNGAC
ATGTAGAGGAAGAAGCTGCAANGCTGNACANAGANGCAGNGNGANAGGGACTTATTNGANGCAGATCTATTGAGAGCGTT
ATGGCGGCANGTGTTTACCCTGCTTGTAGGTTATTAAAAGNTCCCGGGACTCTGGATGAGATTGCTGATATTGCTAGAGC

5 Amplifying DNA in Vitro: The Polymerase Chain Reaction (PCR)
The polymerase chain reaction (PCR) can produce many copies of a specific target segment of DNA. A three-step cycle (heating, cooling, and replication) brings about a chain reaction that produces an exponentially growing population of identical DNA molecules.
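The exponential growth mentioned above can be sketched in a few lines. This is an idealized model, not something taken from the slides: the function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions, and real reactions double imperfectly and eventually plateau.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after `cycles` PCR cycles.

    efficiency=1.0 is the idealized case in which every molecule is
    duplicated in every cycle (copies double each cycle).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One starting molecule, perfect doubling.
    for n in (1, 10, 20, 30):
        print(f"after {n:2d} cycles: {pcr_copies(1, n):,.0f} copies")
```

With perfect doubling, 30 cycles turn a single template into roughly a billion copies, which is why a small sample is enough to start from.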

6 The three main steps of PCR
Step 1: Denature DNA. At 95 °C, the DNA is denatured (i.e., the two strands are separated).
Step 2: Primers anneal. At 40 °C to 65 °C, the primers anneal (bind) to their complementary sequences on the single strands of DNA.
Step 3: DNA polymerase extends the DNA chain. At 72 °C, DNA polymerase extends the DNA chain by adding nucleotides to the 3’ ends of the primers.

7 PCR: Polymerase Chain Reaction
Step 1: denaturation
Step 2: annealing
Step 3: extension

8 PCR
[Figure: PCR tubes and a C1000 Thermal Cycler]

9 Denaturation of DNA
This occurs at 95 °C, mimicking the function of helicase in the cell.

10 Step 2: Annealing or Primer Binding
[Figure labels: Forward Primer, Reverse Primer]
Primers bind to the complementary sequence on the target DNA. Primers are chosen such that one is complementary to one strand at one end of the target sequence and the other is complementary to the other strand at the other end of the target sequence.

11 Step 3: Extension or Primer Extension
DNA polymerase catalyzes the extension of the strand in the 5’ to 3’ direction, starting at the primers and attaching the appropriate complementary nucleotide (A pairs with T, C pairs with G).
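The base-pairing rule used during extension can be illustrated with a small sketch. It assumes exact primer matching and ignores everything about reaction chemistry; the names `reverse_complement` and `extend_primer` are hypothetical helpers, not part of any library.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str):
    """Return the new strand (5' to 3') made by extending `primer` on `template`.

    `template` is written 5' to 3'. The primer anneals where the template
    contains the primer's reverse complement; the polymerase then adds
    nucleotides to the primer's 3' end until it reaches the 5' end of the
    template. Returns None if the primer has no exact binding site.
    """
    template = template.upper()
    site = reverse_complement(primer)
    pos = template.find(site)
    if pos == -1:
        return None
    # The new strand is complementary to everything from the template's
    # 5' end up to and including the primer binding site.
    return reverse_complement(template[: pos + len(site)])


if __name__ == "__main__":
    print(extend_primer("ATGGCCTTAAGC", "GCTTA"))  # new strand starts with the primer
```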

12 The next cycle will begin by denaturing the new DNA strands formed in the previous cycle

13 The Size of the DNA Fragment Produced in PCR is Dependent on the Primers
The PCR reaction will amplify the DNA section between the two primers. If the DNA sequence is known, primers can be developed to amplify any piece of an organism’s DNA.
[Figure labels: Forward primer, Reverse primer, Size of fragment that is amplified]
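As a rough illustration of how the primers determine fragment size, the sketch below locates both primers on a known template and returns the region between them (primers included). It assumes exact matches only and a single binding site per primer; `amplicon` and the toy sequences are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplified region of `template`, or None if a primer does not bind.

    `template` is the top strand, written 5' to 3'. The forward primer matches
    the top strand directly; the reverse primer anneals to the top strand, so
    its reverse complement is what appears in the top-strand sequence.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start : end + len(site)]


if __name__ == "__main__":
    toy_template = "TTTACGTACGGGGCCCCAATGCATCAAA"  # hypothetical sequence
    product = amplicon(toy_template, "ACGTAC", "GATGCA")
    print(product, len(product))
```

Moving either primer along the template changes the length of the returned product, which is the point the slide makes.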

14 FASTA
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
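A FASTA file is simple enough to read by hand: a line starting with ">" names a record, and the following lines are its sequence. The sketch below is a minimal reader under that assumption; `parse_fasta` is an illustrative name, and the sample text is an abridged, made-up record rather than the full slide content.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record name to its sequence, joining wrapped sequence lines."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


if __name__ == "__main__":
    sample = """>SEQUENCE_1
MTEITAAMVK
ELRESTGAGM
>SEQUENCE_2
SATVSEINSE
"""
    for record_name, sequence in parse_fasta(sample.splitlines()).items():
        print(record_name, len(sequence), sequence)
```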

15 FASTQ
@HWUSI-EAS466_0001:1:1:6:1464#0/1
CAAATGTCTATTTTNTCCGTCAATCTGTGAGTGNCA
+HWUSI-EAS466_0001:1:1:6:1464#0/1
ACCTGGTCCTCTTTNAAGACGCGATGTGTCACGNTG
+HWUSI-EAS466_0001:1:1:6:579#0/1
CGAATATCGTGACCNACCGCGGTACAATTGCATNCT
+HWUSI-EAS466_0001:1:1:6:1050#0/1
a``aaaa`Y\T`aaBaa_^_``\`a```[O]__Ba`
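A well-formed FASTQ record has four lines: an "@" header, the bases, a "+" separator line, and one quality character per base. The sketch below reads such records and decodes the quality characters into integer scores. The `offset` parameter is an assumption the caller must supply: 33 is the common Sanger-style encoding, while characters such as "a" and "B" in the slide's quality string suggest an older offset-64 encoding. The record in the demo is made up.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no wrapped sequence lines)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one integer score.
        scores = [ord(ch) - offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    demo = ["@read_1", "ACGTN", "+", "hhhhB"]  # hypothetical record
    for record in parse_fastq(demo, offset=64):
        print(record.name, record.sequence, record.qualities)
```

In the demo the low score falls on the "N" base call, the same pattern visible in the slide, where "B" in the quality string lines up with "N" in the reads.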

16 The different types of BLAST
BLAST = Basic Local Alignment Search Tool
“The most popular data mining tool ever”
BLASTN: DNA sequence vs. DNA sequence database
BLASTP: protein sequence vs. protein sequence database
BLASTX: DNA sequence translated in 6 reading frames vs. protein sequence database
tBLASTX: DNA sequence translated in 6 reading frames vs. DNA sequence database translated in 6 frames
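"Translated in 6 reading frames" means three frames on the given strand plus three on its reverse complement. The sketch below shows that step only, using the standard genetic code; it is not how BLAST itself is implemented, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i : i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return translations keyed '+1'..'+3' (given strand) and '-1'..'-3'."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```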

17 Steps to use BLAST
#1) Paste sequence here
#2) Choose search set (either nucleotide collection or Protein Data Bank)
#3) Select program to use
#4) Push BLAST button

18 An example of aligning text strings
Raw data ???
T C A T G
C A T T G

2 matches, 0 gaps
T C A T G
      | |
C A T T G

3 matches (2 end gaps)
T C A T G
  | | |
  C A T T G

4 matches, 1 insertion
T C A - T G
  | |   | |
  C A T T G

T C A T - G
  | | |   |
  C A T T G
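The match counts in this example can be reproduced with a short sketch. It scores only matches (no gap or mismatch penalties), so the gapped case reduces to the longest common subsequence; real aligners use scoring matrices and gap penalties. Both function names are hypothetical.

```python
def ungapped_matches(a: str, b: str, shift: int = 0) -> int:
    """Count matching positions when `b` is slid `shift` places to the right of `a`."""
    return sum(
        1
        for i, ch in enumerate(b)
        if 0 <= i + shift < len(a) and a[i + shift] == ch
    )


def max_matches_with_gaps(a: str, b: str) -> int:
    """Largest number of matches achievable when gaps are free.

    Dynamic programming over prefixes of `a` and `b`
    (the longest-common-subsequence recurrence).
    """
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    s1, s2 = "TCATG", "CATTG"
    print(ungapped_matches(s1, s2, shift=0))  # 2 matches, 0 gaps
    print(ungapped_matches(s1, s2, shift=1))  # 3 matches, end gaps only
    print(max_matches_with_gaps(s1, s2))      # 4 matches with one internal gap
```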

19 Terminologies of sequence comparison
Sequence identity: exactly the same amino acid or nucleotide in the same position.
Sequence similarity: substitutions with similar chemical properties.
Sequence homology: a general term that indicates evolutionary relatedness among sequences; homology is usually inferred from the percentage of sequence identity.
Pairwise alignment: used to find the best-matching piecewise (local) or global alignment of two query sequences. Pairwise alignments can only be used between two sequences at a time.
Multiple sequence alignment: tries to align all of the sequences in a given query set.
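Percentage identity can be computed directly from two aligned sequences. The sketch below assumes gaps are written as "-" and divides by the full alignment length, so gap columns count as non-identical; other conventions (for example, dividing by the shorter sequence length) give different numbers. `percent_identity` is an illustrative name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # The gapped alignment of TCATG and CATTG: 4 identical columns out of 6.
    print(round(percent_identity("TCA-TG", "-CATTG"), 1))
```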

20 Where are the coding regions?
TCAGCGAAGATGAGATAGTTTTTAAAGGTGGGATTTCCCCACCTTTAAAAAGCGAGAAGTCCCGGTTTTAAAGAGGAGTAAAATCCTCTTTTTCTAGCCCACTCAGGTGGTTTTTTTGGTTTTCGCTCCTTGCCGCATCTTCTGTGCCTTTGATGGCGGCTGGTTGGGGTGAAAGGCTGCATATTCCAGAATTTCAGACAGTAGATTGTTTTTGAAATCTTCCGTTTTATCGTTGACGAACTTAACCATCCTGTTGAAATCATCTTCCTTTGATACACCTTCAGGAAATGCCTTAGGAACTGATGTTTGGCTATCCAAGGCATCTTGCAATATCTGCACGATCTCCGAATTCATTGATCGCCCATTGGCCTTTGCTCTGGCGGCAACTGCGTCACGCATACCGTCAGGCATCCTAACTGTAAATCTCTCAATGAAAGCTGGATCTTCTTTTTCAGTCATCATCTTAAACCATAAAAATTTATACAAAACACACTAGCATCATATTGACATTACCCACAATGACATCATAATGGTGTCAGGCATCAAAATGATGTCATCATGACAAGGGGAAAGTAAATGCAAGATGTTCTCTATACAGGTCGTAAGAACGACAGCTTTCAGCTTCGTCTGCCTGAGCGAATGAAAGAAGAGATCCGTCGCATGGCAGAGATGGACGGCATTTCGATTAATTCTGCAATCGTGCAGCGCCTTGCTAAAAGCTTGCGTGAGGAAAGAGTTAATGGGCAGTAAAAACAGCGAAGCCCGGAAGTGTGGGGACACTAACCGGGCTTCTAATGTCAGTTACCTAGCGGGAAACCAACAATGACCAGTATAGCAATCTTTGAAGCAGTAAACACTATCTCTCTTCCATTCCACGGACAGAAGATCATAACTGCGATGGTGGCGGGTGTGGCGTATGTGGCAATGAAGCCCATCGTGGAAAACATCGGTTTAGACTGGAAGAGCCAGTATGCCAAGCTCGTTAGTCAGCGTGAAAAGTTCGGGTGTGGTGATATCACCATACCTACCAAAGGTGGTGTTCAGCAGATGCTTTGCATCCCTTTGAAGAAACTGAATGGATGGCTCTTCAGCATTAACCCAGCAAAAGTACGTGATGCAGTTCGTGAAGGTTTAATTCGCTATCAAGAAGAGTGTTTTACAGCTTTGCACGATTACTGGAGCAAAGGTGTTGCAACGAATCCCCGGACACCGAAGAAACAGGAAGACAAAAAGTCACGCTATCACGTTCGCGTTATTGTCTATGACAACCTGTTTGGTGGATGCGTTGAATTTCAGGGGCGTGCGGATACGTTTCGGGGGATTGCATCGGGTGTAGCAACCGATATGGGATTTAAGCCAACAGGATTTATCGAGCAGCCTTACGCTGTTGAAAAAATGAGGAAGGTCTACTGATTGGCGTATTGGAAGGCGCAAAAAGAAAAGCCAGCAGATGGGCTGCTGGCATTCATTGGGTATATGAACTTTCGGAGAACATATGAAGTCAATTATCAAGCATTTTGAGTTTAAGTCAAGTGAAGGGCATGTAGTGAGCCTTGAGGCTGCAAGCTTTAAAGGCAAGCCAGTTTTTTTAGCAATTGATTTGGCTAAGGCTCTCGGGTACTCAAATCCGTCA

21 Exon prediction in eukaryotic DNA using GENSCAN: net result is a protein sequence
GENSCAN looks for start and stop codons, promoters, splice sites, and poly-A signals, and provides statistics for coding potential.
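The simplest of the signals listed above, start and stop codons, can be scanned for directly. The sketch below is a naive open-reading-frame search on the given strand only; it is not the model used by the gene-prediction tool on this slide, which also weighs promoters, splice sites, and coding statistics. `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return sorted (start, end) pairs for ATG...stop stretches on the given strand.

    `start` is the index of the A in ATG; `end` is one past the stop codon,
    so dna[start:end] is the full open reading frame. `min_codons` counts
    codons before the stop codon, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i : i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGGCTTAAGG"  # hypothetical sequence
    for begin, end in find_orfs(toy):
        print(begin, end, toy[begin:end])
```

In eukaryotic DNA this kind of scan is only a starting point, because coding regions are split into exons by introns and an ORF scan sees none of that structure.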

22 NGS sequencing pipeline

23 Sequencing steps
Library preparation
Library amplification
Parallel sequencing
Voelkerding KV et al., J Mol Diagn (2010) 12,

24 NGS Applications
Whole genome sequencing
Whole exome sequencing
RNA sequencing
ChIP-seq/ChIP-exo
CLIP-seq
GRO-seq/PRO-seq
Bisulfite-Seq

25 Shyr D, Liu Q. Biol Proced Online. (2013)15,4
Workflow: Patient, Technologies, Data Analysis, Integration and interpretation
Genomics (WGS, WES): point mutation, small indels, copy number variation, structural variation
Transcriptomics (RNA-Seq): differential expression, gene fusion, alternative splicing, RNA editing
Epigenomics (Bisulfite-Seq, ChIP-Seq): methylation, histone modification, transcription factor binding
Integration and interpretation: functional effect of mutation, network and pathway analysis, integrative analysis, further understanding of cancer and clinical applications

