
1 Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs S. Montani 1, G. Leonardi 1, S. Ghignone 2, L. Lanfranco 2 1 Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy 2 Dipartimento di Biologia Vegetale, University of Turin, Italy

2 Arbuscular mycorrhizal fungi (AMFs)
  Obligate symbionts in strict association with the roots of land plants
  In soil: positive impact on plant health and productivity
  Often in further symbiosis with bacteria
  Tripartite system: (i) endobacterium, (ii) AMF, (iii) plant roots
  [Figure labels: AMF spore, AMF hypha, endobacteria]

3 Studying the tripartite system
  Potentially strong practical impact: symbiotic consortia may lead to
    new metabolic pathways
    the appearance of interesting molecules for sustainable agriculture and (possibly) for industrial biotechnological applications
  Comparative genomics approach to infer
    phylogenetic relationships
    genome evolution
    metabolic functions of a given organism (even with few available data)
  Key part of the study: genomic data of the endobacteria and of the AMF-endobacteria interaction

4 A computational environment for AMF-endobacteria interaction
  Genomic study of the system formed by the AMF Gigaspora margarita (isolate BEG34) and its endobacterium Candidatus Glomeribacter gigasporarum
  BIOBITS project, Regione Piemonte - Converging Technologies
  Modular architecture
    Database
    Synteny and visualization tools
    BIOBITS research tools
  Generic Model Organism Database (GMOD) project: open source tools for creating and managing genome-scale biological databases

5 Architecture of the system Flexible retrieval

6 Data storage
  CHADO DB
    Bacterial genomes, known annotations, proteins and metabolic pathways, and newly discovered annotations
    Manually loaded with the genomes of Candidatus Glomeribacter's relatives
  Import modules and RRE - Queries
    Information retrieved from biological databases accessible through the Internet (e.g. GenBank)

7 Data visualization
  GMOD customizable modules for comparative genomics
    CMap: views comparisons of genetic and physical maps
    GBrowse_syn: a synteny browser that displays multiple genomes, with a central reference species
    SyBil: a system for comparative genomics visualizations

8 New applications (BIOBITS research tools)
  BioMart-based tools
    reorganize the information into a data warehouse
    analyze the data by means of clustering and data mining techniques
  Flexible retrieval tool
    Case-based reasoning paradigm

9 Case-based retrieval
  Retrieve past cases similar to the current one
  Reuse past successful solutions after revising them, if necessary
  Retain the current case
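The retrieve and retain steps listed above can be sketched as follows. This is a minimal illustration, not the BIOBITS tool itself: the `Case` layout, the mean-absolute-difference distance, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """Hypothetical case: a name plus one similarity percentage per aligned position."""
    name: str
    similarity: List[float]


def distance(a: Case, b: Case) -> float:
    """Mean absolute difference over the positions both cases share (assumed metric)."""
    n = min(len(a.similarity), len(b.similarity))
    if n == 0:
        raise ValueError("cannot compare empty cases")
    return sum(abs(x - y) for x, y in zip(a.similarity, b.similarity)) / n


def retrieve(case_base: List[Case], query: Case, k: int = 1) -> List[Case]:
    """Retrieve: return the k stored cases closest to the query."""
    return sorted(case_base, key=lambda c: distance(c, query))[:k]


def retain(case_base: List[Case], case: Case) -> None:
    """Retain: store the current case so later queries can find it."""
    case_base.append(case)
```

The reuse and revise steps are left out because they depend on what a "solution" is in the application; only retrieval is addressed in this presentation.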

10 Case representation
  Sequence of nucleotides, properly aligned with the same reference organism
  Percentage of similarity with the aligned nucleotide in the reference organism
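One way to obtain such a similarity series is sketched below. The slides do not say how the percentage is computed, so the windowed percent identity, the window size, and the function name `window_identity` are all assumptions; the input is taken to be two already-aligned strings of equal length, with `-` marking gaps.

```python
from typing import List


def window_identity(seq: str, ref: str, window: int = 4) -> List[float]:
    """Percent identity of seq against ref, one value per fixed-size window.

    Both strings must come from the same alignment (equal length).
    A gap character '-' never counts as a match.
    """
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    values = []
    for start in range(0, len(seq), window):
        s = seq[start:start + window]
        r = ref[start:start + window]
        matches = sum(1 for a, b in zip(s, r) if a == b and a != "-")
        values.append(100.0 * matches / len(s))
    return values
```

For instance, `window_identity("ACGTACGA", "ACGTTCGA", window=4)` yields `[100.0, 75.0]`: the first window matches fully and the second differs in one of four positions.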

11 Case representation

12 Flexible retrieval
  Abstracting the data at different levels in a taxonomy
  “Bird’s eye” view of similarity
  Example: DCW region (cell division)
    About 10 genes
    The region is conserved in relatives, while a single gene may not be

13 Flexible retrieval
  Abstracting the data at different “state” granularity levels
  Similar to the (state) Temporal Abstraction technique: from points to intervals sharing a common persistent behavior
  Each state is specialized into further subdivisions
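The points-to-intervals step can be illustrated with a short sketch. The thresholds, the two-level taxonomy, and the symbols `Hm`, `M` and `L` are hypothetical; only `Hv` and `H` appear in the slides, and their exact meaning is assumed here to be "very high" and "high" similarity.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed fine-grained states, checked from the highest threshold down.
FINE_THRESHOLDS = [(90.0, "Hv"), (70.0, "Hm"), (40.0, "M"), (0.0, "L")]
# Assumed taxonomy: each fine state maps to its coarser parent state.
PARENT = {"Hv": "H", "Hm": "H", "M": "M", "L": "L"}


def fine_state(value: float) -> str:
    """Map one similarity percentage to a fine-grained state symbol."""
    for threshold, symbol in FINE_THRESHOLDS:
        if value >= threshold:
            return symbol
    raise ValueError("similarity must be non-negative")


def to_intervals(symbols: List[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of equal symbols into (start, end, symbol) intervals."""
    intervals = []
    position = 0
    for symbol, run in groupby(symbols):
        length = len(list(run))
        intervals.append((position, position + length - 1, symbol))
        position += length
    return intervals


def abstract(values: List[float], coarse: bool = False) -> List[Tuple[int, int, str]]:
    """Abstract a series of points into state intervals at the chosen level."""
    symbols = [fine_state(v) for v in values]
    if coarse:
        symbols = [PARENT[s] for s in symbols]
    return to_intervals(symbols)
```

With the values `[95, 92, 75, 30]`, the fine level gives three intervals (`Hv`, `Hm`, `L`) while the coarse level gives two (`H`, `L`), which is the coarser overall view that a higher abstraction level is meant to provide.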

14 Efficient retrieval
  Multi-dimensional index structures
  Queries at any level of detail
  Interactivity

15 Query answering
  Query: similarity string at any detail level (Hv..Hv)
  Query generalization to find the index root: Hv..Hv -> H..H -> H
  Index navigation backwards with respect to the query generalization steps
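The generalize-then-navigate idea can be sketched with a nested dictionary standing in for the index. This is a toy illustration and not the multi-dimensional index structure the presentation refers to; the `PARENT` taxonomy, the symbol `Hm`, and every function name are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed taxonomy: fine state -> coarser parent state.
PARENT = {"Hv": "H", "Hm": "H", "M": "M", "L": "L"}
CASES_KEY = "_cases"  # reserved key holding case names stored at a node


def merge_runs(symbols: Tuple[str, ...]) -> Tuple[str, ...]:
    """Collapse adjacent equal symbols, e.g. (H, H) -> (H,)."""
    merged: List[str] = []
    for s in symbols:
        if not merged or merged[-1] != s:
            merged.append(s)
    return tuple(merged)


def generalization_chain(query: List[str]) -> List[Tuple[str, ...]]:
    """Generalization steps, most specific first, e.g. Hv..Hv -> H..H -> H."""
    chain = [tuple(query)]
    lifted = tuple(PARENT.get(s, s) for s in chain[-1])
    if lifted != chain[-1]:
        chain.append(lifted)
    merged = merge_runs(lifted)
    if merged != chain[-1]:
        chain.append(merged)
    return chain


def build_index(cases: Dict[str, List[str]]) -> dict:
    """Store each case under its chain, from the most general key downwards."""
    index: dict = {}
    for name, symbols in cases.items():
        node = index
        for key in reversed(generalization_chain(symbols)):
            node = node.setdefault(key, {})
        node.setdefault(CASES_KEY, []).append(name)
    return index


def collect(node: dict) -> List[str]:
    """Gather every case stored at or below a node."""
    found = list(node.get(CASES_KEY, []))
    for key, child in node.items():
        if key != CASES_KEY:
            found.extend(collect(child))
    return found


def lookup(index: dict, query: List[str]) -> List[str]:
    """Generalize the query to the root, then walk the steps back down."""
    node = index
    for key in reversed(generalization_chain(query)):
        if key not in node:
            return []
        node = node[key]
    return collect(node)
```

A query posed at a coarser level, such as `["H", "H"]`, stops higher in the tree and returns every case beneath that node, which is how the sketch supports queries at any level of detail.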

16 Computation time
  Efficient retrieval is particularly critical in very large databases (bacterial genome DBs are growing very fast)
  Existing implementation in the haemodialysis domain
    1475 real haemodialysis patient cases
    Index-based TA is fast (41 msec on an Intel Core 2 Duo T9400 processor running at 2.53 GHz, equipped with 4 GB of DDR2 RAM)

17 Conclusions
  Modular architecture for in-silico comparative genomics studies of AMF-endobacteria interaction
  Flexible genome retrieval tool
    Flexible query definition, at different levels of abstraction
    Efficient index-based retrieval
    Interactive query refinement/generalization

18 Future work
  Complete tool implementation
  Experiments on RefSeq NCBI data
  Tool usability
  New applications published as new GMOD modules

