BME 130 – Genomes. Lecture 5: Genome assembly I. The good old days
Administrivia. Homework 1: on the website today, due Friday; homework policy. Student-led paper discussion: choose groups and pick a paper. Guest lecture Friday: Bob Kuhn will demo the UCSC genome browser.
Genomics in the news: Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. Citation: Gilbert C, Feschotte C (2010) Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. PLoS Biol 8(9): e1000495. doi:10.1371/journal.pbio.1000495
de novo sequence assembly: overlap. [Figure: reads s1 to s6] This is the most CPU- and memory-demanding stage. Phusion: group reads sharing >= 11 k-mers of 17 bases. Phrap: “banded” alignment of reads around k-mer matches; tolerates alignment mismatches at low-quality bases. Celera: k-mer seed-and-extend alignment of reads. Arachne: 24-mer seed-and-extend alignment of reads. newbler: flowgram similarities (?)
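The k-mer grouping idea on this slide (reads sharing at least 11 k-mers of 17 bases) can be sketched roughly as follows. This is a minimal illustration, not the actual Phusion code: the function names `kmers` and `candidate_overlaps` are hypothetical, reverse complements and sequencing errors are ignored, and the defaults simply mirror the numbers on the slide.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_overlaps(reads, k=17, min_shared=11):
    """Return {(i, j): shared_count} for read pairs sharing >= min_shared k-mers.

    An inverted index (k-mer -> read ids) means only reads that share at
    least one k-mer are ever paired, instead of comparing all reads
    against all reads. Simplification: forward strand only.
    """
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for kmer in kmers(seq, k):
            index[kmer].add(read_id)

    shared = defaultdict(int)
    for read_ids in index.values():
        for i, j in combinations(sorted(read_ids), 2):
            shared[(i, j)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy example with short reads, so k and min_shared are scaled down.
toy_reads = ["ACGTACGGTCA", "GTACGGTCATT", "TTTTTTTTTTT"]
toy_pairs = candidate_overlaps(toy_reads, k=4, min_shared=3)
```

In the toy example only the first two reads are paired; each candidate pair would then go on to a real alignment step, which is where the listed assemblers differ.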
de novo sequence assembly: layout. Generate alignments, then find connected components. [Figure: reads s1 to s6] There is a wide range of strategies for the layout stage, many using mate-pair information.
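The "find connected components" step can be illustrated with a small union-find sketch. It assumes reads are numbered from 0 and that overlaps arrive as pairs of read ids; the function name `connected_components` is hypothetical, and real layout algorithms do considerably more (ordering reads, using mate pairs) than this grouping step.

```python
def connected_components(n_reads, overlaps):
    """Group read ids 0..n_reads-1 into components linked by overlaps.

    overlaps is an iterable of (i, j) pairs meaning read i aligns to read j.
    Returns a sorted list of sorted read-id lists, one per component.
    """
    parent = list(range(n_reads))

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in overlaps:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for read_id in range(n_reads):
        groups.setdefault(find(read_id), []).append(read_id)
    return sorted(groups.values())


# Six reads; read 5 overlaps nothing and stays in its own component.
components = connected_components(6, [(0, 1), (2, 3), (3, 4)])
```

Each component is a set of reads that can be laid out together, so the layout problem breaks into smaller independent pieces.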
de novo sequence assembly: consensus. [Figure: reads s1 to s6] PHRAP: the consensus base is the base with the highest quality score; the quality score for the position is based on the quality scores of all reads. PCAP/CAP3: sum the quality scores for each base and take the base with the highest sum; the quality score for the position is the highest sum minus all other sums.
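The two consensus rules on this slide can be compared on a single alignment column. The sketch below follows the slide's wording only; it is not taken from the PHRAP or CAP3 source code, the function names are hypothetical, and a column is assumed to be a list of `(base, quality)` pairs, one per read covering the position. The slide does not say how PHRAP combines read qualities into a position quality, so that part is left out.

```python
from collections import defaultdict


def consensus_highest_quality(column):
    """Slide's PHRAP rule: take the base from the highest-quality read."""
    base, _quality = max(column, key=lambda call: call[1])
    return base


def consensus_quality_sum(column):
    """Slide's PCAP/CAP3 rule: sum qualities per base, take the largest sum.

    Returns (base, position_quality), where position_quality is the
    winning sum minus the sums of all other bases.
    """
    sums = defaultdict(int)
    for base, quality in column:
        sums[base] += quality
    best_base = max(sums, key=sums.get)
    best_sum = sums[best_base]
    position_quality = best_sum - (sum(sums.values()) - best_sum)
    return best_base, position_quality


# One column where the two rules disagree: a single confident C
# against two moderately confident A calls.
column = [("A", 30), ("A", 20), ("C", 40), ("G", 5)]
phrap_style = consensus_highest_quality(column)
cap3_style = consensus_quality_sum(column)
```

Here the highest-quality rule picks C, while the summed rule picks A with a low position quality of 5, reflecting how close the vote was.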
Reference-guided sequence assembly. [Figure: reads s1 to s6 and a reference sequence] Advantages: (much) faster; (much) less memory. Disadvantages: indels/rearrangements; lack of a closely related reference; bias towards reference similarity. Pop M et al., "Comparative Genome Assembly." Brief Bioinform. 2004 Sep;5(3):237-48.
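As a rough sketch of why the reference-guided approach is cheaper, each read can be placed independently by looking up a seed k-mer in an index of the reference, rather than comparing reads against each other. The names `build_index` and `place_read` and the `max_mismatches` parameter are assumptions for illustration; this is not the method from Pop et al., and it handles substitutions only.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def place_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Each k-mer of the read is used as an exact seed; every seed hit
    proposes a start position, which is checked base by base.
    """
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, []):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


reference = "ACGTTGCATGCCGATAGCTA"
index = build_index(reference, k=4)
placement = place_read("TGCATGAC", reference, index, k=4)  # one substitution
```

Because the comparison is ungapped, a read spanning an insertion or deletion relative to the reference would fail to place or place poorly, which is one concrete form of the indel and reference-bias disadvantages listed on the slide.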