First release of HOGENOM, a database of homologous genes from complete genome Equipe Bioinformatique et Génomique Evolutive Laboratoire de Biométrie et.

1 First release of HOGENOM, a database of homologous genes from complete genome
Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard - Lyon 1
Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy
POSTER JO 60

2 Homologous Genes Databases
Research fields:
– Proteome/genome comparative analysis
– Phylogenetic studies
– Orthology/paralogy relationship assignments
Development of generic databases, specialised databases:
– HOVERGEN: families of homologous vertebrate genes
– HOBACGEN: families of homologous bacterial genes
– NureBase, RTKdb, Hoppsigen, Mitalib, Polymorphix..

3 The HoGenom database: Homologous Genes Families from fully Sequenced Organisms
European project TEMBLOR
Contents:
– Nucleic and protein sequences
– Sequence annotations
– Taxonomic data
– Protein multiple alignments
– Phylogenetic trees

4 The HoGenom database: Building of Database
Data selection (European Bioinformatics Institute)
– Protein sequences: SwissProt, TrEMBL, TrEMBL-new
– Proteome sets: Human, Mouse, Rat, etc.
Diagram labels: 1 sequence → many species; 1 sequence → 1 species

5 The HoGenom database: Building of Database
Similarity search
– Filtering (SEG)
– BLASTP, BLOSUM62, E ≤ 10^-4
– Local pairwise alignments
– Parallelised calculations at IN2P3
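To make the E ≤ 10^-4 cutoff concrete, here is a minimal sketch of how BLASTP hits could be filtered after the search. It assumes the hits are available in the common 12-column tab-separated BLAST layout (query, subject, % identity, alignment length, ..., E-value in column 11, bit score in column 12). The slide does not say which output format the HoGenom pipeline actually used, and the names `Hit` and `significant_hits` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

E_VALUE_CUTOFF = 1e-4  # threshold stated on the slide: E <= 10^-4


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float


def significant_hits(lines: Iterable[str],
                     cutoff: float = E_VALUE_CUTOFF) -> Iterator[Hit]:
    """Yield hits with E-value <= cutoff.

    Assumption: each line is 12-column tab-separated BLAST output;
    blank lines and '#' comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hit = Hit(
            query=fields[0],
            subject=fields[1],
            percent_identity=float(fields[2]),
            alignment_length=int(fields[3]),
            evalue=float(fields[10]),
        )
        if hit.evalue <= cutoff:
            yield hit
```

In use, `significant_hits(open("hits.tsv"))` would stream the retained hits from a file; the file name is only an illustration.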

6 The HoGenom database: Building of Database
Clustering into families
– Criteria: HSP ≥ 80 % length, similarity ≥ 50 %
– 1: Clustering of complete sequences into families
– 2: Including partial sequences in the families defined previously
Diagram: A B, A C → Cluster A, B, C → Protein family
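The first clustering step can be sketched as a transitive (single-linkage) grouping: if A matches B and A matches C under both thresholds, then A, B and C end up in one family, as the diagram suggests. This is only an illustration under assumptions. The slide does not name the clustering algorithm, does not say which sequence length the 80 % refers to, and does not describe how partial sequences are added in step 2 (not shown here). The function name and the input layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

MIN_COVERAGE = 0.80    # HSP covers >= 80 % of the length (slide threshold)
MIN_SIMILARITY = 0.50  # similarity >= 50 % (slide threshold)

# Hypothetical input record: (seq_a, seq_b, coverage, similarity),
# with coverage and similarity given as fractions in [0, 1].
Pair = Tuple[str, str, float, float]


def cluster_families(sequences: Sequence[str],
                     pairs: Sequence[Pair]) -> List[List[str]]:
    """Group sequences into families with a union-find structure.

    Assumption: family membership is transitive (single linkage).
    """
    parent: Dict[str, str] = {s: s for s in sequences}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage, similarity in pairs:
        if coverage >= MIN_COVERAGE and similarity >= MIN_SIMILARITY:
            parent[find(a)] = find(b)  # merge the two families

    families: Dict[str, List[str]] = {}
    for s in sequences:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())
```

With pairs A/B and A/C passing both thresholds and C/D failing the coverage test, the sketch returns one family containing A, B and C, and leaves D on its own.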

7 The HoGenom database: Building of Database
Alignments and trees
– Protein family (sequences A to G)
– Multiple alignment: CLUSTAL W, default parameters
– Phylogenetic tree: BIONJ (neighbor joining), observed divergence
– Partial sequences: distance matrix with missing values
– Rooting: mid-point
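As an illustration of the distance step, the sketch below computes observed divergence (the fraction of differing residues) between aligned sequences, and records a missing value when two sequences share no aligned column, which can happen with partial sequences. Treating non-overlapping pairs as missing is an assumption; the slide only states that the matrix may contain missing values. BIONJ itself and mid-point rooting are not implemented here, and the function names are hypothetical.

```python
from typing import Dict, Optional

GAP_CHARS = "-."  # characters treated as alignment gaps (assumption)


def observed_divergence(seq_a: str, seq_b: str) -> Optional[float]:
    """Fraction of differing residues over columns where both sequences
    have a residue. Returns None (missing value) if no column is shared."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else None


def distance_matrix(
    alignment: Dict[str, str],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Pairwise observed divergence for every pair of aligned sequences."""
    names = list(alignment)
    return {
        a: {b: observed_divergence(alignment[a], alignment[b]) for b in names}
        for a in names
    }
```

A tree-building method that accepts incomplete matrices would then consume this output; entries equal to `None` mark the pairs that could not be compared.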

8 The HoGenom database: Contents
– 117 organisms (chart labels: 10, 16, 91)
– 423 577 proteins, 527 925 CDS
– 41 907 families
– Chart percentages: 31%, 9%, 60%
Species listed:
– Arabidopsis thaliana (plant)
– Caenorhabditis elegans (nematode)
– Drosophila melanogaster (fly)
– Encephalitozoon cuniculi (microsporidia)
– Guillardia theta (alga)
– Homo sapiens (man)
– Mus musculus (mouse)
– Rattus norvegicus (rat)
– Saccharomyces cerevisiae (yeast)
– Schizosaccharomyces pombe (fungus)

9 Querying the databases
– WWW Query: query on sequences and families according to multiple criteria
– Cross Taxa: query on families according to complex taxonomic criteria
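A taxonomic criterion of the Cross Taxa kind might look like "families with members in all of these taxa and in none of those". The sketch below shows that idea on a plain dictionary. It is not the HoGenom query interface: the function name, the family identifiers and the flat taxon labels are all hypothetical, and a real implementation would presumably work on the taxonomic tree rather than on flat labels.

```python
from typing import Dict, Iterable, List, Set


def cross_taxa_query(family_taxa: Dict[str, Set[str]],
                     present: Iterable[str],
                     absent: Iterable[str] = ()) -> List[str]:
    """Return families containing every taxon in `present`
    and no taxon in `absent` (hypothetical data layout)."""
    required = set(present)
    excluded = set(absent)
    return sorted(
        family
        for family, taxa in family_taxa.items()
        if required <= taxa and not (excluded & taxa)
    )


# Made-up example data, for illustration only.
example = {
    "FAM1": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
    "FAM2": {"Homo sapiens", "Mus musculus"},
    "FAM3": {"Arabidopsis thaliana"},
}
mammal_only = cross_taxa_query(
    example,
    present=["Homo sapiens", "Mus musculus"],
    absent=["Saccharomyces cerevisiae"],
)
```

Here `mammal_only` holds the single family that has human and mouse members but no yeast member.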

10 POSTER JO-60: to be continued…

