BME 130 – Genomes Lecture 5 Genome assembly I The good old days.

1 BME 130 – Genomes Lecture 5 Genome assembly I The good old days

2 Administrivia. Homework 1: on the website today, due Friday; homework policy. Student-led paper discussion: choose groups and pick a paper. Guest lecture Friday: Bob Kuhn will demo the UCSC genome browser.

3 Genomics in the news: Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. Citation: Gilbert C, Feschotte C (2010) Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. PLoS Biol 8(9): e doi: /journal.pbio

4 Figure 4.10 Genomes 3 (© Garland Science 2007)

5 Figure 4.10 part 1 of 2 Genomes 3 (© Garland Science 2007)

6 Figure 4.10 part 2 of 2 Genomes 3 (© Garland Science 2007)

7 Sequence assembly. De novo: overlap, layout, consensus (figure: reads s1 to s6 at each stage). Reference-guided: reads s1 to s6 aligned against a reference sequence.

8 De novo sequence assembly: overlap (figure: reads s1 to s6). The most CPU- and memory-demanding stage. Phusion: group reads sharing >= 11 k-mers of 17 bases. Phrap: "banded" alignment of reads around k-mer matches; tolerates alignment mismatches at low-quality bases. Celera: k-mer seed-and-extend alignment of reads. Arachne: 24-mer seed-and-extend alignment of reads. newbler: flowgram similarities (?)
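
The k-mer grouping idea on this slide can be sketched in a few lines of Python. This is only an illustration loosely modeled on the slide's one-line description of Phusion (reads sharing >= 11 k-mers of 17 bases), not the actual implementation of any assembler. The function names `kmers` and `candidate_pairs` are made up for this example, and reverse-complement matches and sequencing errors are ignored.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(reads, k=17, min_shared=11):
    """Map each pair of read names to its number of shared distinct k-mers.

    Only pairs sharing at least min_shared k-mers are returned.
    reads is a dict of read name -> sequence (hypothetical input format).
    """
    index = defaultdict(set)              # k-mer -> names of reads containing it
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    shared = defaultdict(int)             # (name_a, name_b) -> shared k-mer count
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy example with short reads, so k and min_shared are scaled down.
toy_reads = {
    "s1": "ACGTTGCAAGGC",
    "s2": "CAAGGCTTAGCC",
    "s3": "TTAGCCATGA",
}
pairs = candidate_pairs(toy_reads, k=4, min_shared=3)
```

Building the k-mer index first means only reads that share at least one k-mer are ever compared, which is the point of seeding: it avoids aligning every read against every other read.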

9 De novo sequence assembly: layout (figure: reads s1 to s6). Generate alignments, then find connected components. Wide range of strategies for the layout stage, many using mate-pair information.
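
The "find connected components" step can be illustrated with a breadth-first search over an overlap graph. This is a minimal sketch: the overlap list below is invented for the example and is not read off the slide's figure, and real layout stages also use alignment offsets and mate-pair information, which are left out here.

```python
from collections import defaultdict, deque


def connected_components(read_names, overlaps):
    """Group reads into components linked by pairwise overlaps.

    read_names: iterable of read names.
    overlaps: iterable of (name_a, name_b) pairs that were found to overlap.
    Returns a list of sorted name lists, one per component.
    """
    graph = defaultdict(set)
    for a, b in overlaps:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in read_names:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical overlaps among six reads; s6 overlaps nothing.
names = ["s1", "s2", "s3", "s4", "s5", "s6"]
found = [("s1", "s2"), ("s2", "s5"), ("s3", "s4")]
groups = connected_components(names, found)
```

Each component is a candidate contig; a read with no overlaps ends up in a component of its own.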

10 De novo sequence assembly: consensus (figure: aligned reads s1 to s6). PHRAP: the consensus base is the base with the highest quality score; the quality score for the position is based on the quality scores of all reads. PCAP/CAP3: sum the quality scores for each base and take the base with the highest sum; quality score for the position: highest sum – all other sums.
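
The PCAP/CAP3 rule on this slide can be worked through for a single alignment column. The sketch below reads "highest sum – all other sums" as the winning sum minus the total of the other bases' sums, which is an assumption about the slide's shorthand. The function name `column_consensus` is made up for this example, and ties are broken alphabetically purely for determinism.

```python
from collections import defaultdict


def column_consensus(column):
    """Call the consensus base for one non-empty alignment column.

    column: list of (base, quality) pairs, one per read covering the position.
    Returns (consensus_base, consensus_quality).
    """
    sums = defaultdict(int)               # base -> summed quality score
    for base, quality in column:
        sums[base] += quality

    # Highest summed quality wins; alphabetical order breaks ties.
    best = max(sorted(sums), key=lambda base: sums[base])
    others = sum(total for base, total in sums.items() if base != best)
    return best, sums[best] - others


# Two reads say A (Q30, Q20), one read says C (Q40).
call = column_consensus([("A", 30), ("A", 20), ("C", 40)])
```

Here A wins with a sum of 50 against 40 for C, so the position is called A with quality 10: a single high-quality read can be outvoted, but the disagreement lowers the confidence of the call.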

11 Reference-guided sequence assembly (figure: reads s1 to s6 aligned to a reference sequence). Advantages: (much) faster; (much) less memory. Disadvantages: indels/rearrangements; lack of a closely related reference; bias towards reference similarity. Pop M et al., "Comparative Genome Assembly" Brief Bioinform Sep;5(3):
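
A toy read-placement routine shows where both the speed and the weaknesses come from. This is a deliberately naive sketch, not how any real comparative assembler works: `place_read` is a hypothetical function that slides each read along the reference without gaps and keeps the position with the fewest mismatches. Each read is handled independently of the others, but because no gaps are allowed, a read spanning an indel relative to the reference may be misplaced or dropped.

```python
def place_read(read, reference, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement of read.

    Returns (None, None) if every placement exceeds max_mismatches.
    The leftmost position wins when several placements tie.
    """
    best_pos = None
    best_mismatches = max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    if best_pos is None:
        return None, None
    return best_pos, best_mismatches


reference = "ACGTTGCAAGGCTTAGCCATGA"
exact = place_read("CAAGGCTT", reference)      # identical to the reference
variant = place_read("CAAGCCTT", reference)    # one substitution
unplaced = place_read("TTTTTTTT", reference)   # nothing similar enough
```

The mismatch threshold is also where the bias towards the reference shows up: reads from regions that differ too much from the reference never get placed at all.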

12 Figure 4.11a Genomes 3 (© Garland Science 2007) Why is this called a sequence gap and not a physical gap?

13 Closing a physical gap means finding a physical clone to sequence that will span the gap

14 Figure 4.11b Genomes 3 (© Garland Science 2007) Genomic DNA is template for this PCR

15 Figure 4.12 Genomes 3 (© Garland Science 2007) Chromosome walking (is slow)

16 Figure 4.13 Genomes 3 (© Garland Science 2007) PCR from clone library Insert 1 connects to who?

17 Figure 4.14 Genomes 3 (© Garland Science 2007)

18 Figure 4.15 Genomes 3 (© Garland Science 2007)

19 Figure 4.15a Genomes 3 (© Garland Science 2007)

20 Figure 4.15b Genomes 3 (© Garland Science 2007)

21 Figure 4.15c Genomes 3 (© Garland Science 2007)

22 Figure 4.15d Genomes 3 (© Garland Science 2007)

23 Figure 4.16 Genomes 3 (© Garland Science 2007) Assembly can be validated by mate-pair information
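
The mate-pair check can be sketched as a test on where the two end reads of one clone land in the assembly. The `Placement` type, the coordinate convention, and the insert-size bounds are all assumptions made for this example. The sketch also assumes that the two reads from opposite ends of an insert should face each other, and it treats mates placed on different contigs as simply not validated.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    contig: str
    start: int    # 0-based leftmost coordinate of the read in the contig
    end: int      # exclusive rightmost coordinate
    strand: str   # "+" or "-"


def mate_pair_ok(a, b, min_insert, max_insert):
    """Return True if two mate placements agree with the expected insert size.

    The mates must lie on the same contig, face each other (left read on "+",
    right read on "-"), and span a distance within [min_insert, max_insert].
    """
    if a.contig != b.contig:
        return False
    left, right = (a, b) if a.start <= b.start else (b, a)
    if left.strand != "+" or right.strand != "-":
        return False
    span = right.end - left.start
    return min_insert <= span <= max_insert


# Hypothetical clone library with inserts of 2 to 4 kb.
good = mate_pair_ok(Placement("c1", 100, 600, "+"),
                    Placement("c1", 2500, 3000, "-"), 2000, 4000)
too_far = mate_pair_ok(Placement("c1", 100, 600, "+"),
                       Placement("c1", 9500, 10000, "-"), 2000, 4000)
same_strand = mate_pair_ok(Placement("c1", 100, 600, "+"),
                           Placement("c1", 2500, 3000, "+"), 2000, 4000)
```

A cluster of pairs that fail this check in the same region is a hint that the assembly is collapsed, expanded, or rearranged there.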

24 Figure 4.16a Genomes 3 (© Garland Science 2007)

25 Figure 4.16b Genomes 3 (© Garland Science 2007)

26 Figure 4.17a Genomes 3 (© Garland Science 2007)

27 Figure 4.17b Genomes 3 (© Garland Science 2007)

28 Figure 4.18 Genomes 3 (© Garland Science 2007)

