Analyses of ORFans in microbial and viral genomes Journal club presentation on Mar. 14 Albert Yu.



ORFan definition: an ORF with no detectable sequence similarity to other ORFs in the database considered. Nearly all genomes have ORFans (df %). The more genomes are sequenced, the more ORFans are found. Most are annotated as hypothetical proteins of unknown function (no exp.).

ORFans (continued). More data… real, functional proteins: 3D structure; conserved in closely related species (Ka/Ks). Origin of ORFans? Viral genome, microbial genome? Viral laterally transferred genes (especially phages).

Viral genome Microbial genome

Question: the origin of ORFans. Hypothesis to test: ORFans have been acquired through lateral gene transfer from viruses. Approach: find homologs of these microbial ORFans within the virus sequence database.

Genome-wide quantitative study. BLASTP; 277 microbial genomes; 1456 viral genomes. H(g): the number of genomes having at least one homolog of ORFan g. U(g): uniqueness, the genomic distance between the genomes with ORFan g.
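A minimal sketch of how H(g) could be tallied from BLASTP hits, assuming each hit has already been reduced to an (ORF id, subject genome) pair and that an ORF's self-hit is included, so an ORF found only in its own genome gets H = 1. The function name and input layout are illustrative and not taken from the slides; U(g) is left out because the slides do not give its formula.

```python
from collections import defaultdict

def h_values(hits):
    """Return H(g) for each ORF: the number of distinct genomes with a hit.

    hits: iterable of (orf_id, subject_genome) pairs (hypothetical layout),
    expected to include each ORF's self-hit against its own genome.
    """
    genomes_per_orf = defaultdict(set)
    for orf_id, subject_genome in hits:
        genomes_per_orf[orf_id].add(subject_genome)
    return {orf_id: len(genomes) for orf_id, genomes in genomes_per_orf.items()}

example_hits = [
    ("orfA", "genome1"),                       # self-hit only
    ("orfB", "genome1"), ("orfB", "genome1"),  # self-hit plus a paralog hit
    ("orfC", "genome1"), ("orfC", "genome2"),  # hit in a second genome
]
print(h_values(example_hits))
```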

Classification of ORFans. Singleton: no homolog anywhere (H=1, BLASTP=1). Paralogous: homologs in the same genome only (H=1, BLASTP>1). Orthologous: homologs only within very closely related microbial genomes (H>1, U <= 0.1, cutoff chosen by observation).
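The three classes can be sketched as a small decision function. This assumes that the BLASTP count includes the ORF's own self-hit and that any ORF with H > 1 and U above the cutoff is treated as a non-ORFan; the slides do not state that last case, and the function and parameter names are hypothetical.

```python
def classify_orf(h, blastp_hits, u=None, u_cutoff=0.1):
    """Classify an ORF from its H value, BLASTP hit count, and U value.

    h: number of genomes with at least one homolog (own genome included).
    blastp_hits: number of BLASTP hits, self-hit included (assumption).
    u: uniqueness value; only consulted when h > 1.
    """
    if h == 1:
        # Present in one genome only: singleton unless it has paralogs there.
        return "singleton" if blastp_hits == 1 else "paralogous"
    if u is not None and u <= u_cutoff:
        # Homologs confined to very closely related genomes.
        return "orthologous"
    # Assumption: everything else counts as a non-ORFan.
    return "non-ORFan"

print(classify_orf(1, 1))          # singleton
print(classify_orf(1, 3))          # paralogous
print(classify_orf(4, 6, u=0.05))  # orthologous
print(classify_orf(4, 6, u=0.50))  # non-ORFan
```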

The U-value for all ORFs in prokaryote genomes. In total: ORFs: ORFans: S (singleton): 64324 (7.8%); P (paralogous): 10419 (1.3%); O (orthologous): 35443 (4.3%). Figure labels: 0.64; S or P; O.

ORFans-VH% (OVH): % of ORFans having homologs in viruses (0% ~ 63.8%). Non-ORFans-VH% (NOVH): % of non-ORFans having homologs in viruses (4.1% ~ 18.2%). The strength of the hypothesis = the difference between these two VH% values.
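A minimal sketch of the two percentages for one genome or clade, taking the strength of the hypothesis to be OVH minus NOVH, which is one reading of the slide rather than a stated formula. The function names and the set-based inputs are hypothetical.

```python
def vh_percent(orfs, viral_hits):
    """Percentage of the given ORFs that have at least one viral homolog."""
    orfs = list(orfs)
    if not orfs:
        return 0.0
    with_homolog = sum(1 for orf in orfs if orf in viral_hits)
    return 100.0 * with_homolog / len(orfs)

def hypothesis_strength(orfans, non_orfans, viral_hits):
    """Return (OVH, NOVH, OVH - NOVH); the difference is an assumed measure."""
    ovh = vh_percent(orfans, viral_hits)
    novh = vh_percent(non_orfans, viral_hits)
    return ovh, novh, ovh - novh

# Toy data: ids of ORFs with a viral BLASTP hit.
viral_hits = {"a", "x"}
print(hypothesis_strength(["a", "b", "c", "d"], ["x", "y"], viral_hits))
```

A positive difference would mean ORFans are enriched for viral homologs relative to the rest of the genome; a negative one argues against the hypothesis.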

Percentages of microbial ORFs with homologs in viruses. Figure: red = OVH, blue = NOVH; 24 phylogenetic clades; labels: Bacteria, Archaea, Firmicutes, Gamma-proteobacteria.

The average % of OVH and NOVH in various groups: % vs 9%; 8.5% vs 2.7%; 6.6% vs 0.8%.

Conclusion. Most OVH << NOVH: current evidence supporting the hypothesis is weak. Firmicutes and Gamma-proteobacteria have the highest number of homologs in viruses (viral database is biased). Viral database bias: 1456 viruses, 280 phages (109 Gamma; 102 Firmicutes; 69 others). Sampling?

Viral genome Microbial genome

277 microbial genomes. 1456 viruses, All-virus-DB: ORFs. 280 phages (20%), Phage-DB: ORFs (42%). ORFans, all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr). ORFans, all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB).

Some characteristics of ORFans. Bacterial ORFans are shorter than non-ORFans on average. Bacterial ORFans have significantly lower GC3 content than non-ORFans.

The length of viral ORFans and non-ORFans. Length: non-ORFans > ORFans.

Length: ORFans < non-ORFans. GC3%: ORFans < non-ORFans.
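GC3 is the G+C fraction at third codon positions. A minimal sketch of how it could be computed for one coding sequence, assuming the sequence starts in frame and ignoring any trailing partial codon; the function name is illustrative, not from the slides.

```python
def gc3_percent(cds):
    """Percentage of G or C bases at the third position of each codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon
    third_positions = cds[2:usable:3]  # bases at indices 2, 5, 8, ...
    if not third_positions:
        return 0.0
    gc_count = sum(1 for base in third_positions if base in "GC")
    return 100.0 * gc_count / len(third_positions)

# Codons ATG, GCC, AAA have third bases G, C, A: two of three are G or C.
print(round(gc3_percent("ATGGCCAAA"), 1))
```

Comparing the mean of this value between ORFans and non-ORFans, alongside mean ORF length, would reproduce the kind of comparison summarized on these slides.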

The number of ORFs per genome in 1456 viruses. Focusing on phage: higher %.

Growth in the number of phage ORFans (consistent). Drop to 0? Keep increasing? 38.4%

Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity. Only 280 phage genomes are in the database (low phage sampling).

Fewer than 5 phages. Virus sampling bias between and within groups.

The H-value percentages for all phage ORFs and prokaryotic ORFs. Prokaryotes: 9.1% ORFans, 11.3% ortho. Phages: 38.4% ORFans, 32.4% ortho.

The H-value percentages of phage ORFs

Prophage / prokaryotic homologs / total, by phage ORF category:
Phage non-ORFans: 4397 (61.5%) / 7150 (63.8%) /
Phage ORFans: 589 (44.7%) / 1317 (18.7%) / 7047
All phage ORFs: 4987 (58.9%) / 8467 (46.4%) / 18248
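The percentages in these rows appear to be nested fractions: prokaryotic homologs as a share of the phage ORFs in the category, and prophage hits as a share of those prokaryotic homologs. That reading is an assumption, but a short check on the two rows with all three counts present is consistent with it.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# (prophage hits, prokaryotic homologs, total ORFs), counts from the slide.
rows = {
    "phage ORFans": (589, 1317, 7047),
    "all phage ORFs": (4987, 8467, 18248),
}

for label, (prophage, prokaryotic, total) in rows.items():
    share_with_prok_homolog = pct(prokaryotic, total)
    share_of_those_in_prophage = pct(prophage, prokaryotic)
    print(label, share_with_prok_homolog, share_of_those_in_prophage)
```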