Gao Song 2010/07/14. Outline Overview of Metagenomics Current Assemblers Genovo Assembly.


1 Gao Song 2010/07/14

2 Outline: Overview of Metagenomics; Current Assemblers; Genovo Assembly

3 Overview of Metagenomics

4 Motivation. Metagenomics is: Why do we need metagenomics? Snapshot of bacterial community; cannot be cultivated; <1%

5 Applications. Monitoring the impact of pollutants on ecosystems; discovery of new genes, enzymes…; Global Ocean Sampling Expedition; Human Microbiome Project; JGI sequenced Acid Mine Drainage sample

6 Two Paradigms. Marker Gene Sequencing: 16S rRNA (two ways); other marker genes: RuBisCO, NifH; only composition. Whole Genome Sequencing (WGS): detailed picture of community

7 Complex Communities (figure labels: >1000; ×5000; 200 L; 1 million)

8 Current Assemblers

9 Current Status. Why not assemble reads? ORFome assembler*, three steps: (1) putative ORFs are annotated for each read; (2) ORFs are assembled using EULER; (3) ORF homologs are searched for in the Integrated Microbial Genomes (IMG) database. Existing WGS assemblers: for Sanger reads: Phrap, Celera, Arachne, JAZZ…; for short reads: Velvet, Newbler… * Y. Ye and H. Tang, "An ORFome assembly approach to metagenomics sequences analysis," Journal of Bioinformatics and Computational Biology, vol. 7, no. 3, pp. 455-471, June 2009

10 Genovo: De Novo Assembly for Metagenomes Jonathan Laserson, Vladimir Jojic and Daphne Koller. RECOMB 2010, LNBI 6044, pp. 341-356, 2010

11 Main Idea. Propose a generative model for metagenomic data; optimize it with iterated conditional modes (ICM), applying hill-climbing steps iteratively; design a score for evaluation

12 Model. Initialize contigs: infinitely many contigs, each of infinite length. Partition the reads among contigs using the Chinese Restaurant Process
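
To make the partition step concrete, here is a minimal sketch of a Chinese Restaurant Process draw, with reads as customers and contigs as tables. The function name `crp_partition`, the concentration parameter `alpha`, and the fixed seed are illustrative assumptions and are not taken from the slides or from the Genovo code.

```python
import random

def crp_partition(num_reads, alpha, seed=0):
    """Assign reads to contigs with a Chinese Restaurant Process.

    Read i joins existing contig c with probability proportional to the
    number of reads already in c, or opens a new contig with probability
    proportional to alpha (hypothetical concentration parameter, > 0).
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = contig index of read i
    counts = []       # counts[c] = number of reads currently in contig c
    for _ in range(num_reads):
        # Existing contigs are weighted by size; the last slot is "new contig".
        weights = counts + [alpha]
        contig = rng.choices(range(len(weights)), weights=weights)[0]
        if contig == len(counts):
            counts.append(1)
        else:
            counts[contig] += 1
        assignments.append(contig)
    return assignments, counts

assignments, counts = crp_partition(num_reads=100, alpha=1.0)
print(len(counts), "contigs for", len(assignments), "reads")
```

Because large contigs attract more reads, the number of occupied contigs grows slowly with the number of reads, which is why the prior can allow infinitely many contigs while only a finite number are ever used.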

13 Model. Generate the starting point o_i; generate the length of the read; quality of assembly of each read

14 Algorithm. Using ICM: starting from the initial condition, hill-climbing moves are performed iteratively. Move 1: Consensus Sequence: select the most frequent base
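
The consensus move can be sketched as a per-position majority vote over the reads currently aligned to a contig. This is a simplified illustration that assumes gap-free alignments given as `(offset, sequence)` pairs; the function name `consensus` and the `N` placeholder for uncovered positions are assumptions, not part of the presented algorithm.

```python
from collections import Counter

def consensus(reads_with_offsets):
    """Return the most frequent base at each contig position.

    reads_with_offsets: iterable of (offset, sequence) pairs, where offset
    is the contig position of the read's first base (no indels assumed).
    """
    columns = {}  # contig position -> Counter of bases observed there
    for offset, seq in reads_with_offsets:
        for j, base in enumerate(seq):
            columns.setdefault(offset + j, Counter())[base] += 1
    if not columns:
        return ""
    start, end = min(columns), max(columns)
    # Uncovered positions between reads are reported as "N".
    return "".join(
        columns[p].most_common(1)[0][0] if p in columns else "N"
        for p in range(start, end + 1)
    )

reads = [(0, "ACGT"), (1, "CGTA"), (2, "GAAC")]
print(consensus(reads))  # ACGTAC
```

In the example, position 3 is covered by T, T and A, so the vote picks T and the sequencing error in the third read is outvoted.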

15 Algorithm. Move 2: Read Mapping. For read i, first remove it, then recalculate its contig and alignment: first, compute the alignment for each potential location; then select the location according to its probability. Filtering: only consider locations that share a common 10-mer with the read
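
The 10-mer filter can be sketched as a k-mer index lookup: only contig positions that share at least one exact 10-mer with the read are kept as candidate locations for the more expensive alignment step. The helper names `build_kmer_index` and `candidate_locations`, and the toy contigs, are hypothetical and only illustrate the idea; the alignment scoring and the sampling of a location are not shown.

```python
from collections import defaultdict

def build_kmer_index(contigs, k=10):
    """Map every k-mer to the set of (contig_name, position) where it occurs."""
    index = defaultdict(set)
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add((name, i))
    return index

def candidate_locations(read, index, k=10):
    """Return (contig_name, offset) pairs implied by shared k-mers.

    offset is the contig position where the read's first base would land
    if the shared k-mer were aligned without gaps.
    """
    candidates = set()
    for j in range(len(read) - k + 1):
        for name, i in index.get(read[j:j + k], ()):
            candidates.add((name, i - j))
    return candidates

contigs = {
    "c1": "ACGTACGTTAGCCGATAGGCTA",
    "c2": "TTTTTTTTTTTTTTTGGGGG",
}
index = build_kmer_index(contigs)
read = contigs["c1"][5:18]
print(candidate_locations(read, index))
```

A read with no shared 10-mer yields an empty candidate set, which in this sketch would leave only the option of starting a new contig.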

16 Algorithm. Move 3: update geometric variable -> Global moves: propose indels; center; merge contigs; chimeric reads; disassemble the dangling contigs

17 Evaluation. BLAST; PFAM; designed score: 1st term: quality of assembly; 2nd term: penalty for total length; 3rd term: prefer to merge when V > V0

18 Results. Single genome: using 454 reads; compared with Newbler, Velvet and EULER-SR

19 Results. Metagenome data: score; PFAM

20 Discussion. New idea: apply a mature algorithm to the assembly domain; systematically describe and analyze the problem and algorithm; results are better

21 Discussion. Slow: minutes vs. hours for 300k 454 reads. Main idea: try to extend contigs as long as possible, so they will have more BLAST hits. Why choose 20 for V0? How to deal with branching? Repeats? Model: why can it capture the properties of metagenomic data? How to argue the correctness of the model? The distribution of starting points

22 Thank you

