Taxonomic distribution of large DNA viruses in the sea


1 Taxonomic distribution of large DNA viruses in the sea
Adam Monier, Jean-Michel Claverie & Hiroyuki Ogata Genome Biology 2008, 9:R106

2 Virus
A small infectious agent that can replicate only inside the living cells of other organisms. Viruses infect all types of organisms: animals, plants, bacteria and archaea. Found in almost every ecosystem on Earth. The most abundant type of biological entity. Consists of two or three parts: DNA or RNA (genetic information); a capsid protein (protects the genetic material); some also have an envelope.

3 Viruses in marine system
Abundant in the marine system: 10^6 to 10^9 virus-like particles per milliliter of sea water. Infect marine organisms from oxygen-producing phytoplankton to whales. Regulate the population of many sea organisms and are important effectors of global biogeochemical fluxes. Hold a great genetic diversity. May significantly contribute to the evolution of microorganisms in marine ecosystems.

4 Goal of the paper
A quantitative description of the marine virosphere: the determination of the relative abundance of virus families, and the assessment of the level of their genetic diversity.

5 Data set
The first phase of the Sorcerer II Global Ocean Sampling (GOS) Expedition. The GOS data comprise a large environmental shotgun sequence collection, with 7.7 million sequencing reads assembled into contigs totaling 4.9 billion bp. At least 3% of the predicted proteins contained within the GOS data are of viral origin. Most DNA samples were extracted from the μm-sized fraction.

6 Methods for determining taxonomic distribution
‘Binning’ is the first step to analyze microbial populations in metagenomic sequences. Drawbacks of the use of homology search programs: BLAST scores are highly sensitive to alignment sizes and to insertions/deletions; it is difficult to infer evolutionary distances among high-scoring hits only from the BLAST scores.

7 Phylogenetic analysis
Phylogenetic analysis is the process used to determine the evolutionary relationships between organisms. The results of an analysis can be drawn in a hierarchical diagram called a phylogenetic tree. Branches are based on the hypothesized evolutionary relationships between organisms. Each member in a branch is assumed to be descended from a common ancestor.

8 B-family DNA polymerase (PolB)
A DNA polymerase is an enzyme that catalyzes the polymerization of deoxyribonucleotides into a DNA strand during the process of replication. B-family DNA polymerase (PolB) sequences are conserved in all known members of nucleocytoplasmic large DNA viruses. The presence of PolB homologs in bacteria is limited. PolB sequences have strong sequence conservation and an apparently low frequency of recent horizontal transfer. PolB is therefore a useful marker to examine the taxonomic distribution of large DNA viruses in a metagenomic sequence collection.

9 Defect of normal phylogenetic methods
Environmental shotgun sequences are short, vary widely in size, and correspond to different parts of a selected marker gene. Normal phylogenetic analysis does not provide an appropriate alignment for them.

10 Phylogenetic mapping
A new phylogeny-based method developed by the authors. Analyzes individual metagenomic sequences one by one. Determines their phylogenetic positions using a reference multiple sequence alignment (MSA) and a reference tree.

11 This paper…
Examines the taxonomic richness and the relative abundance of different large DNA viruses in marine environments. Analyzes the GOS data set by phylogenetic mapping. Uses PolB sequences as the reference.

12 Results
1. Phylogenetic mapping. 2. Validation of the mapping results using long PolB fragments. 3. Comparison of the abundance of viral PolB genes with the bacterial ones. 4. Geographic distributions of viral PolBs. 5. Examination of additional ORFs.

13 1. Phylogenetic mapping
Step1: calculation of PolB fragments. Step2: generation of a reference MSA and a maximum likelihood tree. Step3: examination of PolB fragments’ phylogenetic position.

14 Step1: Calculation of PolB fragments
Searched the GOS data set for PolB-like sequences using the Pfam hidden Markov profile (PF00136). This yielded a set of 1,947 sequences, the ‘PolB fragments’.

15 Step2: Reference MSA and Maximum likelihood tree
PolB homologs from known organisms were selected to achieve the widest possible taxonomic/paralog coverage for the analysis of the GOS metagenomic data. Built a reference MSA corresponding to the polymerase domains of these PolB homologs (101 sequences). Generated a maximum likelihood tree from the reference MSA.

16 Cont.

17 Step3: Examination of PolB fragments’ phylogenetic position
Reduce the reference MSA (51 representatives) and the reference tree (99 branches), conserving the original topology of the full reference tree. Align each of the PolB fragments on the reference MSA using the T-Coffee profile method. Compute the likelihoods for all 99 possible branching positions with ProtML. Assess the statistical significance of the best tree by the RELL bootstrap method.
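The last two steps can be illustrated with a minimal Python sketch of RELL (resampling of estimated log-likelihoods). It assumes the per-site log-likelihoods of each candidate tree are already available; the function names and the toy numbers are hypothetical and are not taken from ProtML or from the paper. The helper also checks that 99 branches is what an unrooted, fully bifurcating tree with 51 leaves has (2n - 3).

```python
import random

def n_branches_unrooted(n_leaves):
    """Number of branches in an unrooted, fully bifurcating tree."""
    return 2 * n_leaves - 3

def rell_support(site_lnl, n_replicates=1000, seed=0):
    """RELL bootstrap: site_lnl[t][s] is the log-likelihood of alignment
    site s under candidate tree t (one tree per branching position).
    Returns, per tree, the fraction of replicates in which it scores best."""
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_replicates):
        # Resample alignment sites with replacement; likelihoods are reused,
        # not re-estimated, which is what makes RELL cheap.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sample) for row in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_replicates for w in wins]

# Toy, made-up data: 3 candidate positions, 4 sites; tree 0 is best at every site.
toy_lnl = [
    [-1.0, -1.2, -0.9, -1.1],
    [-1.5, -1.6, -1.4, -1.7],
    [-2.0, -2.1, -1.9, -2.2],
]
support = rell_support(toy_lnl, n_replicates=200)
```

In the mapping procedure there would be one row per candidate branching position (99 here), and a fragment's placement would be kept only if the best row's support passes the chosen threshold.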

18 Taxonomic distribution of the GOS PolB fragments
Assigned the best branching position for 1,423 PolB fragments; 1,224 (86%) were mapped on viral branches. 869 were supported by RELL (bootstrap value ≥ 75%); of these, 811 were on viral branches. Viral groups: phages, chloroviruses, mimiviruses.
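As a rough illustration of how such a tally could be produced, the Python sketch below filters mapped fragments by RELL support and counts them per group, then re-derives the percentages from the counts quoted on the slide. The record layout, group labels and toy records are assumptions made for the example, not the authors' data format.

```python
from collections import Counter

def tally_supported(mappings, threshold=0.75):
    """mappings: iterable of (fragment_id, group, rell_support) tuples.
    Counts, per group, the fragments whose support meets the threshold."""
    return Counter(group for _, group, support in mappings
                   if support >= threshold)

# Toy, made-up records (hypothetical identifiers and values).
toy = [
    ("frag1", "phage", 0.98),
    ("frag2", "mimivirus", 0.80),
    ("frag3", "bacteria", 0.60),   # below threshold, dropped
    ("frag4", "phage", 0.75),      # exactly at threshold, kept
]
counts = tally_supported(toy)

# Percentages recomputed from the counts quoted on the slide.
viral_share_all = 1224 / 1423        # about 0.86
viral_share_supported = 811 / 869    # about 0.93
```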

19 2. Validation of the mapping results using long PolB fragments
Examined the phylogenetic mapping result and the sequence diversity of the PolB fragments classified in large eukaryotic virus groups (NCLDVs). Built a single alignment of the selected long PolB fragments together with the reference PolB sequences from large eukaryotic virus groups.

20 Cont.

21 3. Comparison of the abundance of viral PolB genes with the bacterial ones
Read coverage was used to measure the abundance of the cognate DNA molecules. Compute the read coverage of each contig harboring a PolB fragment. Obtain the median of the read coverage values for each branch.
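The per-branch summary can be sketched in a few lines of Python, taking read coverage as the number of reads contributing to a contig's consensus. The tuple layout, branch names and numbers below are made up for illustration.

```python
from collections import defaultdict
from statistics import median

def median_coverage_by_branch(contigs):
    """contigs: iterable of (contig_id, branch, n_reads), where n_reads is
    the number of reads contributing to the contig consensus.
    Returns {branch: median read coverage of the contigs mapped to it}."""
    by_branch = defaultdict(list)
    for _, branch, n_reads in contigs:
        by_branch[branch].append(n_reads)
    return {branch: median(values) for branch, values in by_branch.items()}

# Toy, made-up contigs: many low-coverage viral contigs, few deep bacterial ones.
toy_contigs = [
    ("c1", "viral_A", 2),
    ("c2", "viral_A", 3),
    ("c3", "viral_A", 2),
    ("c4", "bacterial_B", 40),
    ("c5", "bacterial_B", 55),
]
medians = median_coverage_by_branch(toy_contigs)
```

The median is less sensitive than the mean to a single very deep contig, which matters when a branch mixes many shallow contigs with a few heavily covered ones.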

22 Viral PolBs are more diverse than bacterial PolBs
Viral branches: a large number of mapped contigs exhibiting a low coverage. Bacterial branches: a lower number of mapped contigs with a larger read coverage. Virus populations are numerous and very diverse.

23 4. Geographic distributions of viral PolBs
Compare the relative abundance of the predicted viral PolB fragments and the associated metadata across different GOS sampling sites. GOS metadata provide physicochemical and biological parameters associated with each sampling site, such as water temperature, salinity, chlorophyll a concentration, and the sample's water depth. These data offer additional dimensions to analyze the viral PolB fragments identified by our phylogenetic mapping.

24 Geographic localization

25 5. Examination of additional ORFs
Searched the putative viral contigs against NRDB by BLASTX. ‘Virus-specific’ genes were found next to the PolB homologs: an OtV5 putative major capsid gene [chlorovirus group branch]; regA (translation repressor of early genes) or uvsX (recA-like recombination and DNA repair protein genes) [cyanophage P-SSM4 branch].

26 Prediction of ‘new’ viral genes
An ORF similar to RimK (a protein involved in post-translational modification of the ribosomal protein S6) was found on the cyanophage P-SSM4 branch. No rimK homolog had previously been found in a viral genome. Used this viral RimK homolog as a TBLASTN query to screen the entire GOS data set.

27 GOS contigs with putative RimK sequences
Identify more than 100 contigs harboring RimK homologs with higher similarities than those exhibited by cellular homologs in NRDB. Many of these contigs have additional ORFs usually specific to phages.

28 Maximum likelihood tree of RimK sequences
The RimK homologs are closely related to each other and distantly related to bacterial RimK. This suggests the existence of phages carrying rimK homologs in marine environments: a ‘new’ viral gene.

29 Conclusion
The phylogenetic mapping approach provided a comprehensive picture of the taxonomic distribution of large viruses enclosed in the GOS metagenomic data. The highest genetic richness corresponded to phages. The Mimiviridae represent a major and ubiquitous component of large eukaryotic DNA viruses in diverse marine environments. Prediction of ‘new’ viral genes.

30 Thank you!

31 Pfam
Pfam is a large collection of protein families, represented by multiple sequence alignments and hidden Markov models (HMMs).

32 T-Coffee
A multiple sequence alignment program. It compares all the sequences two by two, producing a global alignment and a series of local alignments, then combines all these alignments into a multiple alignment. It also allows you to combine results obtained with several alignment methods: T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods.

33 ProtML Maximum Likelihood Inference of Protein Phylogeny
Developed by Felsenstein. Implements the maximum likelihood method for protein amino acid sequences. It uses either the Jones-Taylor-Thornton or the Dayhoff probability model of change between amino acids. Uses a Hidden Markov Model (HMM) method of inferring different rates of evolution at different amino acid positions.

34 Read coverage
Read coverage of a contig is the number of reads that contribute to the contig consensus.

