Canadian Bioinformatics Workshops


1 Canadian Bioinformatics Workshops

3 Introduction to next-gen sequencing
Informatics on High Throughput Sequencing Data
Francis Ouellette, July 25th 2008

4 Outline
Sequencing DNA
Next Generation Technologies: Solexa, SOLiD, 454, Helicos
AB’s color space
What next, and things to keep in mind!

5 Adapted from John McPherson, OICR
Biological Research

6 History of DNA Sequencing
Adapted from Eric Green, NIH; Adapted from Messing & Llaca, PNAS (1998)
Milestones, in chronological order:
Miescher: Discovers DNA
Avery: Proposes DNA as ‘Genetic Material’
Watson & Crick: Double Helix Structure of DNA
Holley: Sequences Yeast tRNA(Ala)
Wu: Sequences λ Cohesive End DNA
Sanger: Dideoxy Chain Termination
Gilbert: Chemical Degradation
Messing: M13 Cloning
Hood et al.: Partial Automation
Cycle Sequencing; Improved Sequencing Enzymes; Improved Fluorescent Detection Schemes
Next Generation Sequencing: Improved enzymes and chemistry; Improved image processing
Timeline axis (years): 1870, 1940, 1953, 1965, 1970, 1977, 1980, 1986, 1990, 2002, 2008
Efficiency axis (bp/person/year): 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000

7 Basics of the “old” technology
Clone the DNA.
Generate a ladder of labeled (colored) molecules that differ in length by 1 nucleotide.
Separate the mixture on some matrix.
Detect fluorochrome by laser.
Interpret peaks as a string of DNA.
Strings are 500 to 1,000 letters long.
1 machine generates 57,000 nucleotides/run.
Assemble all strings into a genome.

8 Basics of the “new” technology
Get DNA.
Attach it to something.
Extend and amplify signal with some color scheme.
Detect fluorochrome by microscopy.
Interpret series of spots as short strings of DNA.
Strings are [...] letters long.
Multiple images are interpreted as 0.4 to 1.2 Gb/run (1,200,000,000 letters/day).
Map or align strings to one or many genomes.
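
To put the two throughput figures side by side, here is a minimal arithmetic sketch. The per-run numbers come from the slides; the 3 Gb genome size and the function name `runs_for_coverage` are assumptions made for illustration, and real projects need many-fold coverage rather than the 1x used here.

```python
# Per-run output quoted on the slides.
SANGER_BASES_PER_RUN = 57_000            # "old" technology
NEXTGEN_BASES_PER_RUN = 1_200_000_000    # "new" technology, upper figure

# Assumption for illustration: a genome of 3 Gb.
GENOME_SIZE = 3_000_000_000


def runs_for_coverage(genome_size, bases_per_run, coverage=1):
    """Whole runs needed to produce coverage * genome_size bases (integer ceiling)."""
    needed = genome_size * coverage
    return (needed + bases_per_run - 1) // bases_per_run


if __name__ == "__main__":
    print("old, runs for 1x:", runs_for_coverage(GENOME_SIZE, SANGER_BASES_PER_RUN))
    print("new, runs for 1x:", runs_for_coverage(GENOME_SIZE, NEXTGEN_BASES_PER_RUN))
    print("per-run ratio:", NEXTGEN_BASES_PER_RUN // SANGER_BASES_PER_RUN)
```

The calculation ignores read length, error rates and mappability, so it only shows the scale of the change in raw output per run.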

9 From Debbie Nickerson, Department of Genome Sciences, University of Washington,

10 Differences between the various platforms:
Nanotechnology used.
Resolution of the image analysis.
Chemistry and enzymology.
Signal to noise detection in the software.
Software/images/file size/pipeline.
Cost $$$

11 Next Generation DNA Sequencing Technologies
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”
3 Gb ==

12 Solexa

13 Solexa-based Whole Genome Sequencing
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”

14 Solexa-based Whole Genome Sequencing
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”
Solexa flow cell: ~50M clusters are sequenced per flow cell.

15 Debbie Nickerson, Department of Genome Sciences, University of Washington, http://tinyurl.com/6zbzh4

16 454

17 Roche / 454 : GS FLX Real Time Sequencing by Synthesis
Chemiluminescence detection in pico titer plates
Amplification: emulsion PCR
Pyrosequencing
Up to 400,000 reads / run
On average 250 bases / read (and longer)
Up to 100 Mb / run

18 Roche / 454 : GS FLX
Made for de novo sequencing.
Too expensive for resequencing.
For example, this platform will be used a lot by laboratories sequencing new bacterial genomes.
Baylor Genome Center, involved in the Sea Urchin, Bee and Platypus genomes, has a number of 454 machines.
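
The GS FLX figures above are consistent with each other, and the same arithmetic gives a rough feel for why one run suits a bacterial genome. This is a minimal sketch: the read count and read length are from the slide, while the 5 Mb genome size and the function names are assumptions chosen for illustration.

```python
READS_PER_RUN = 400_000      # from the slide: up to 400,000 reads / run
MEAN_READ_LENGTH = 250       # from the slide: on average 250 bases / read

# Assumption for illustration: a bacterial genome of about 5 Mb.
BACTERIAL_GENOME_SIZE = 5_000_000


def bases_per_run(reads, read_length):
    """Total bases produced by one run."""
    return reads * read_length


def expected_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


if __name__ == "__main__":
    total = bases_per_run(READS_PER_RUN, MEAN_READ_LENGTH)
    print("bases per run:", total)
    print("coverage of assumed 5 Mb genome:",
          expected_coverage(total, BACTERIAL_GENOME_SIZE))
```

Under these assumptions a single run yields the 100 Mb quoted on the slide, which is about 20-fold average coverage of a 5 Mb genome.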

19 Helicos

20 Single Molecule Sequencing
Adapted from: Barak Cohen, Washington University, Bio
[Figure: single DNA molecules with primer on a microscope slide, extended with dNTP-Cy3 and imaged with a super-cooled TIRF microscope. Helicos Biosciences Corp.]

21 Helicos Approximate Data Production per Run at Current Peak Throughput (1 strand/µm²)
                      Single Pass     Dual Pass
                      (7 day run)     (14 day run)
Image Data:           ? TB            60 TB
Diagnostic Images:    350 GB          600 GB
Object Table:         ? TB            6 TB
Sequence Data:        ? GB            ? GB
Log Files:            ? GB            600 GB
Total                 ~4.5 TB         ~7.8 TB
(w/o full image stack)

22 ABI SOLiD

24 File management

25 SOLiD color space
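As a minimal sketch of the idea behind color space, assuming the commonly described two-base encoding: each base is given a 2-bit code (A=0, C=1, G=2, T=3, an assumption of this sketch) and each color is the XOR of two adjacent bases. The function names `encode` and `decode` are hypothetical and are not taken from any vendor software.

```python
# Assumed 2-bit codes for the bases; a color is the XOR of two adjacent codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(seq):
    """Return the first base followed by one color (0-3) per adjacent base pair."""
    seq = seq.upper()
    if not seq:
        return ""
    codes = [BASE_TO_BITS[base] for base in seq]
    colors = [left ^ right for left, right in zip(codes, codes[1:])]
    return seq[0] + "".join(str(color) for color in colors)


def decode(color_read):
    """Rebuild the base sequence from a leading base plus a string of colors."""
    if not color_read:
        return ""
    current = BASE_TO_BITS[color_read[0].upper()]
    bases = [BITS_TO_BASE[current]]
    for color in color_read[1:]:
        current ^= int(color)          # each color tells how to step to the next base
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


if __name__ == "__main__":
    print(encode("ATGGC"))
    print(decode(encode("ATGGC")))
```

Because decoding walks along the read one color at a time, a single wrong color changes every base after it, which is one reason color-space reads are usually aligned in color space rather than translated first.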

39 It’s more complicated!
Get files with quality scores.
Get files with mismatches.
Need to align them to a reference genome.
Multiple tools do this today … and there will be more later.
What do you do? Do it all!
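
To make "align to a reference, allowing mismatches" concrete, here is a deliberately naive sketch. It slides the read along the reference and counts mismatches at each position; it is not how any of the real alignment tools work (they use indexes and quality scores), and the names `hamming` and `best_hit` as well as the `max_mismatches` default are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement, or None.

    Ties are resolved in favor of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    print(best_hit("CGTTAG", reference))   # exact match
    print(best_hit("CGTAAG", reference))   # one mismatch
    print(best_hit("TTTTTT", reference))   # no placement within 2 mismatches
```

The scan costs time proportional to read length times reference length for every read, which is why dedicated tools are needed once a run produces millions of reads.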

40 Things to keep in mind
Everyone is learning: if you don’t know, ask. They probably won’t know either, and you can figure it out together!
The technology is changing: this workshop next year will be totally different!
We can only do so much in two days: you will need to find things, find people who can help you, and teach your friends!

41 Other factors
Changing technology
New and disappearing companies?
Changing price structure
Cost of machine
Cost of operation (reagents/people)
Service from the company
1 machine vs (2 or 3 machines) vs 40 machines.
Changing software and processing

42 Pacific Biosciences (PacBio)

43 Questions? Coffee break!

44 Day 1

