The Integrated Microbial Genomes (IMG) Systems
Nikos Kyrpides
Data Analysis, Data Integration, Comparative Analysis
What is the Matrix? A data management system for comparative analysis of biological data.
IMG: genes, genomes, functions, metadata, clusters, SNPs, proteomics, regulons, transcriptomes.
Integrated Microbial Genomes (IMG): http://img.jgi.doe.gov/
[It’s easier to analyze 1000 genomes than a single one]
What is IMG: a data management system for comparative analysis and annotation of all publicly available genomes from the three domains of life, in a uniquely integrated context.
Mission: to become the home of microbial genome and metagenome analysis.
Background: launched in March 2005; 3 releases per year; >5,000 unique visitors per month; >300 citations.
Current status: 10,671 genomes and 24 million genes (Bacteria: 5,709; Archaea: 201; Eukarya: 183; Plasmids: 1,190; Viruses: 2,809; Gfragments: 579).
Users can search, browse, compare, and export data.
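The per-category counts on the status slide add up exactly to the 10,671 genomes quoted. A few lines of Python confirm this; the script only re-adds the slide's numbers and is not an IMG interface.

```python
# Genome counts per category, as listed on the IMG status slide.
genome_counts = {
    "Bacteria": 5709,
    "Archaea": 201,
    "Eukarya": 183,
    "Plasmids": 1190,
    "Viruses": 2809,
    "Gfragments": 579,
}

total = sum(genome_counts.values())
print(f"Total genomes: {total:,}")  # Total genomes: 10,671

# Each category's share of the total, largest first.
for category, count in sorted(genome_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<11}{count:>6,}  {100 * count / total:5.1f}%")
```

Bacteria make up roughly half of the collection, and viruses and plasmids together account for most of the rest.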
http://img.jgi.doe.gov/
Users can search, browse, compare, and export data. Users can also submit and annotate data.
Data Model Abstraction Example: IMG Operations (dimensions: genes, genomes, functions/pathways)
1. Genes present in G1 and absent from G2, G3, G4, and G5. Gene occurrence profile across genomes (columns G1, G2, G3, G4, G5):
g3: + + + + +
g2: + + - + +
g1: + - - - -
2. Pathways shared by genomes, from gene occurrence profiles across pathways.
The dimensional modeling approach helps data exploration: operations 1 and 2 are examples of slice and dice, which reduce the data to the subset relevant to the question.
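The first operation on the slide (genes present in G1 and absent from G2 through G5) is a filter over the gene occurrence matrix: keep the genes whose profile is present in every chosen genome and absent from all excluded ones. A minimal Python sketch on the slide's toy matrix follows; the helper `genes_matching` is hypothetical and is not IMG's actual phylogenetic profiler implementation.

```python
# Toy gene occurrence matrix from the slide (True = gene present in that genome).
genomes = ["G1", "G2", "G3", "G4", "G5"]
profiles = {
    "g1": [True, False, False, False, False],
    "g2": [True, True, False, True, True],
    "g3": [True, True, True, True, True],
}


def genes_matching(profiles, genomes, present, absent):
    """Hypothetical helper: return genes found in every genome in `present`
    and in none of the genomes in `absent`."""
    col = {g: i for i, g in enumerate(genomes)}
    return [
        gene
        for gene, row in profiles.items()
        if all(row[col[g]] for g in present) and not any(row[col[g]] for g in absent)
    ]


# Genes present in G1 and absent from G2, G3, G4 and G5.
print(genes_matching(profiles, genomes, present=["G1"], absent=["G2", "G3", "G4", "G5"]))
# ['g1']
```

Changing the `present` and `absent` lists slices the same matrix along other genome combinations; for example, requiring all five genomes returns the gene shared by every genome.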
IMG Data Integration: genes (24.2M), genomes (10,671), and functions.
Functions: COG, GO, Pfam, TIGRfam, InterPro, KEGG, BioCyc, SEED, protein product, MyIMG, IMG Terms, IMG Pathways, IMG Networks.
Groupings: phylogenetic, phenotypic, ecotypic, disease, geographical, isolation.
Genes: RNAs, proteins; sequence clusters, positional clusters, regulatory clusters, fusions, operons, expression.
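Because each gene is linked both to its genome and to its functional annotations, summaries such as a function-by-genome count table reduce to a grouping over those links. A minimal Python sketch under that assumption; the gene records, genome names, and function labels (`funcA`, `funcB`) are hypothetical placeholders, not IMG data or an IMG API.

```python
from collections import Counter

# Hypothetical gene records: (gene_id, genome, function).
genes = [
    ("gene1", "G1", "funcA"),
    ("gene2", "G1", "funcB"),
    ("gene3", "G2", "funcA"),
    ("gene4", "G2", "funcA"),
    ("gene5", "G3", "funcB"),
]


def function_profile(genes, functions, genomes):
    """Count the genes assigned to each function in each genome."""
    counts = Counter((genome, function) for _, genome, function in genes)
    # Counter returns 0 for (genome, function) pairs that never occur.
    return {f: [counts[(g, f)] for g in genomes] for f in functions}


profile = function_profile(genes, ["funcA", "funcB"], ["G1", "G2", "G3"])
for function, row in profile.items():
    print(function, row)
# funcA [1, 2, 0]
# funcB [1, 0, 1]
```

The same grouping works for any of the integrated function sources, since only the function label attached to each gene changes.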
IMG Toolkit Chromosome Map Function Profile Gene Synteny Abundance Profiles Functional Categories Projects IMG Pathway Metadata Search Phylogenetic Genome Clustering Compare Annotations KEGG Maps Distribution Chromosomal Artemis VISTA Recruitment Plot Fragment
Challenges and Opportunities: annotation quality (genes, functions); metadata; data analysis (new data types and tools); integration; scaling (number of genes and genomes).
Metadata Curation (K. Liolios, www.genomesonline.org)
Metadata types: organism information, genome project information, sequencing information, environmental metadata, host metadata, organism metadata.
Metagenome Classification Genomes vs Metagenomes
Finding unique genes: Burkholderia mallei is an obligate parasite of horses; Burkholderia pseudomallei causes human disease in tropical areas (melioidosis).
The phylogenetic profiler finds 548 unique genes in B. mallei. However, 497 of them do in fact exist in B. pseudomallei but were not called as genes there. These differences in gene models reveal an 89.2% error rate among the unique genes.
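The error rate here is the fraction of reported unique genes that turn out to have an uncalled counterpart in the comparison genome. Applied to the counts as quoted (497 of 548), the ratio comes to about 90.7%, a little above the stated 89.2%, so the slide's figure may rest on a slightly different count. The function name below is hypothetical.

```python
def false_unique_rate(reported_unique, found_in_other):
    """Fraction of genes reported as unique that actually have a counterpart
    in the comparison genome (present there, but not called as a gene)."""
    return found_in_other / reported_unique


# Counts as quoted for B. mallei vs. B. pseudomallei.
rate = false_unique_rate(reported_unique=548, found_in_other=497)
print(f"{rate:.1%}")  # 90.7%
```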
Program Informatics Production Challenges: annotation quality; data management (IMG); single cells; OMICS data; scaling (number of genes and genomes).
MGM Workshop Attendees (http://www.jgi.doe.gov/meetings/mgm/index.html), April 20, 2012: 545 attendees from 48 countries
Europe (79): Belgium 2, Czech Republic 1, Denmark 16, Estonia 1, Finland 6, France 1, Germany 13, Greece 4, Ireland 4, Italy 1, Hungary 1, Netherlands 4, Norway 1, Russia 4, Portugal 1, Poland 1, Spain 3, Sweden 4, Switzerland 1, UK 10
Asia (70): China 20, Hong Kong 5, India 18, Israel 4, Japan 3, Korea 4, Malaysia 3, Philippines 1, Saudi Arabia 4, Singapore 2, Taiwan 3, Thailand 2, Turkey 1
North America (356): Canada 33, Mexico 5, USA 318
South America (21): Argentina 4, Brazil 7, Chile 1, Colombia 5, Ecuador 2, Peru 1, Uruguay 1
Africa (7): Algeria 1, Egypt 5, Ethiopia 1
Oceania (12): Australia 10, New Zealand 2
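The continental subtotals are consistent with the headline of 545 attendees from 48 countries. The check below counts the country entries listed under each continent, with Hong Kong and Taiwan as separate entries as on the slide.

```python
# (attendees, number of country entries) per continent, from the attendee slide.
by_continent = {
    "Europe": (79, 20),
    "Asia": (70, 13),
    "North America": (356, 3),
    "South America": (21, 7),
    "Africa": (7, 3),
    "Oceania": (12, 2),
}

attendees = sum(n for n, _ in by_continent.values())
countries = sum(c for _, c in by_continent.values())
print(f"{attendees} attendees from {countries} countries")  # 545 attendees from 48 countries

# North America's share of all attendees.
share = by_continent["North America"][0] / attendees
print(f"{share:.0%}")  # 65%
```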