MES7594-01 Genome Informatics I - Lecture IV. NGS basics Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University.

1 MES7594-01 Genome Informatics I - Lecture IV. NGS basics
Sangwoo Kim, Ph.D.
Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine
Genome Informatics I (2015 Spring)

2 Overview
Goal of this lecture
–You will learn the basic technologies and properties of Next Generation Sequencing
Sequencing technologies
–Sanger sequencing
–Next generation sequencing
  Illumina sequencing
  454/Ion Torrent sequencing
  Other sequencing
–Raw data (FASTQ)
  Format/Phred quality
–Practice
  Meet the raw data

3 SEQUENCING TECHNOLOGIES

4 Traditional Sequencing
1. Genomic DNA is fragmented, then cloned into a plasmid vector and used to transform E. coli
2. For each sequencing reaction, a single bacterial colony is picked and its plasmid DNA is isolated
3. Each cycle sequencing reaction takes place within a microliter-scale volume

5 Sanger Sequencing

6 Next Generation Sequencing
No cloning
–The DNA to be sequenced is used to construct a library of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by DNA ligase
Amplification can be done in parallel
–Library fragments are amplified in situ on a solid surface
Sequencing can be done in parallel (in 3 iterative steps)
–a nucleotide addition step
–a detection step
–a wash step

7 Illumina Sequencing

8 Illumina Sequencing

9 Illumina Sequencing

10 Illumina Sequencing https://www.youtube.com/watch?v=HMyCqWhwB8E

11 Ion Torrent Sequencing
1. DNA capture on beads
2. Single bead in a well
3. Add one nucleotide type (A/T/G/C) at a time
4. Detect the pH change
   –Measure the level of change for homopolymer detection
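
The homopolymer step can be illustrated with a small sketch. Each incorporated base releases a hydrogen ion, so when a flow of one nucleotide type extends the strand by several identical bases, the pH change is correspondingly larger. The code below is an idealized toy model, not the instrument's actual base caller: the flow order "TACG", the signal values, and the function name call_bases are all assumptions made for illustration, and real signals are noisier and need normalization.

```python
def call_bases(flow_order, signals):
    """Convert per-flow signal levels into a base sequence (idealized model).

    flow_order: nucleotide types flowed over the well, repeated cyclically
                (hypothetical order, chosen only for this example).
    signals:    one normalized signal per flow; about 1.0 per incorporated base.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # The signal level is taken as the homopolymer length:
        # ~0 means no incorporation, ~2 means two identical bases, and so on.
        count = round(signal)
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals for two rounds of the flow order T, A, C, G
signals = [1.03, 0.02, 2.10, 0.05, 0.0, 0.97, 0.1, 2.9]
print(call_bases("TACG", signals))
```

Because the homopolymer length is inferred from the signal magnitude rather than counted base by base, long runs of the same nucleotide are harder to call accurately than single bases.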

12 Ion Torrent Sequencing

13 Ion Torrent Sequencing

14 Ion Torrent Sequencing

15 PacBio SMRT sequencing
Zero-mode waveguide (ZMW)
http://www.pacificbiosciences.com/products/smrt-technology/

16 Nanopore sequencing https://www.youtube.com/watch?v=3UHw22hBpAk

17 Comparison

20 NGS DATA: raw data (FASTQ)

21 FASTA format
A format for DNA (or protein) sequences
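
As a minimal sketch of how the format is laid out: a FASTA record is a header line starting with ">" followed by one or more lines of sequence. The parser below is an illustration only; the function name parse_fasta and the example records are hypothetical, and production code would normally use an established library.

```python
def parse_fasta(lines):
    """Return a dict mapping each record name to its full sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the name is the first word after '>'
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: may be one of several for the same record
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Hypothetical example: one DNA record split over two lines, one protein record
example = """>seq1 hypothetical example record
ACGTACGTAC
GTACGT
>seq2
MKVLA
"""
records = parse_fasta(example.splitlines())
print(records)
```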

22 FASTQ format (NGS raw data)
A format for NGS reads (FASTA + quality)
Figure labels: one read, sequence, quality
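
A FASTQ file stores each read as four lines: a header starting with "@", the sequence, a separator line starting with "+", and a quality string with one character per base. The sketch below reads records under that four-line assumption (it does not handle multi-line sequences). The function name read_fastq and the two example reads are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    handle = iter(handle)  # accept file objects as well as lists of lines
    while True:
        block = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(block) < 4:
            return  # end of input (an incomplete trailing record is ignored)
        header, seq, plus, qual = block
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], seq, qual


# Hypothetical two-read example
example = io.StringIO("@read1 hypothetical\nNCTCA\n+\n#1:BD\n@read2\nGGAT\n+\nIIII\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, qual)
```

The length check reflects the one-to-one pairing of bases and quality characters, which is what makes the per-base quality interpretation possible.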

23 Practice: first look at NGS data
cd /scratch/2015_GenomeInformatics/public/fastq
ls
less sample1.fastq

24 sequence
@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C @ABBBB?BBBBBBBB?@:?AA@B@?(:4:>? >ABBB

25 Quality
Each base call (the machine's call of a nucleotide: 'A', 'T', 'C', or 'G') has its own quality score
–The quality is the machine's confidence in that call

26 Phred scale quality
@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C @ABBBB?BBBBBBBB?@:?AA@B@?(:4:>? >ABBB

$Q = -10 \log_{10}(e)$, where e is the probability of the base call being wrong

Error probability e:              10%   1%   0.1%   0.01%   ...
Quality score Q:                  10    20   30     40      ...
ASCII character (code = Q + 33):  +     5    ?      I       ...

See the ASCII code table for the character that matches each code.
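
The relationship between error probability, quality score, and quality character can be sketched in Python as follows. This is an illustrative sketch assuming the +33 offset shown on the slide; the function names are hypothetical, and it is not the course's own script.

```python
import math

PHRED_OFFSET = 33  # offset between the quality score and its ASCII code


def error_to_q(e):
    """Quality score Q = -10 * log10(e) for an error probability e."""
    return -10 * math.log10(e)


def q_to_error(q):
    """Inverse of the Phred formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def q_to_char(q):
    """Encode a quality score as its FASTQ character (ASCII code Q + 33)."""
    return chr(round(q) + PHRED_OFFSET)


def char_to_q(c):
    """Decode a FASTQ quality character back to its quality score."""
    return ord(c) - PHRED_OFFSET


# Reproduce the columns of the table: 10%, 1%, 0.1%, 0.01%
for e in (0.1, 0.01, 0.001, 0.0001):
    q = error_to_q(e)
    print(e, round(q), q_to_char(q))
```

Storing Q + 33 as a single printable character is what lets the quality line have exactly one character per base.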

27 Practice
Pick any sequence and find out where it is from
Calculate the probability that a base call with quality 'D' is wrong
(Advanced) Write Python code that transforms Q to e (or vice versa)
–Hint: the function chr(i) converts the integer i to its matching ASCII character, e.g. chr(65) = 'A'
–The function ord(c) converts the character c to its matching ASCII code (an integer), e.g. ord('A') = 65
–math.log10(x), or equivalently math.log(x, 10), calculates the log10 value of x
  You must import the math library on the first line (import math)
–The answer is in public/script/qtoe.py
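
For the quality 'D' question, one way to work through the arithmetic, assuming the +33 offset from the Phred slide, is:

```latex
\begin{align*}
Q &= \mathrm{ord}(\text{`D'}) - 33 = 68 - 33 = 35 \\
e &= 10^{-Q/10} = 10^{-3.5} \approx 3.2 \times 10^{-4} \approx 0.032\%
\end{align*}
```

So a base call with quality 'D' is expected to be wrong roughly 3 times in 10,000.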

