
Making Sense of Life Sciences Data Nigel Martin 21st May 2008.


1 Making Sense of Life Sciences Data Nigel Martin 21st May 2008

2 Life Sciences Informatics
The development and use of computational methods for the acquisition, management, analysis and interpretation of biological and medical information, to determine biological functions and mechanisms as well as their applications in user communities. This biological and medical information is encoded in the vast amounts of data now generated in the life sciences, e.g. DNA data.


4 CACCTG … Homo sapiens

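5 Gene expression: a gene in the genome (made of DNA) is the permanent copy; it is expressed as RNA, a temporary copy, which yields the protein, the product. The protein's FUNCTION (its job) is carried out in biological processes.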
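The DNA → RNA → protein flow on this slide can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of DNA (so transcription is modelled simply as replacing T with U), uses the standard genetic code, and ignores splicing, regulation and everything else that happens in real cells. The names `transcribe`, `translate` and `CODON_TABLE` are illustrative only.

```python
# Standard genetic code, indexed in the conventional U, C, A, G order
# for the first, second and third codon positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Make the RNA 'temporary copy' of a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate complete codons from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Treating the fragment CACCTG from slide 4 as if it were in frame (purely for illustration), `translate(transcribe("CACCTG"))` returns `"HL"`: histidine (CAC) followed by leucine (CUG).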

6 Life Sciences Data is Complex
The primary data of DNA and protein sequences are held in large repositories such as the EMBL Nucleotide Sequence Database. At the time of this talk, the latest release contained 114,475,051 sequences comprising 215,540,553,360 nucleotides. But life sciences data comprises much more than sequence data…
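Release statistics like these are simple aggregates: a count of entries and a sum of their lengths. The sketch below computes both from FASTA-formatted text. FASTA is an assumption made for brevity; EMBL releases use their own flat-file format, which a real tool would parse with a dedicated library. The function name `release_statistics` is hypothetical.

```python
def release_statistics(fasta_text: str) -> tuple[int, int]:
    """Count records and residues in FASTA text (header lines start with '>')."""
    n_sequences = 0
    n_residues = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_sequences += 1  # each header line starts a new record
        else:
            n_residues += len(line)  # sequence lines may wrap across several lines
    return n_sequences, n_residues
```

Dividing the slide's totals, 215,540,553,360 / 114,475,051, gives a mean entry length of roughly 1,883 nucleotides.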

7 e.g. CATH protein structure classification
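CATH classifies protein domains in a four-level hierarchy (Class, Architecture, Topology, Homologous superfamily), written as a dotted code such as 3.40.50.300. The sketch below splits such codes into their levels and groups domains at a chosen level. The function names are hypothetical, and the domain identifiers and codes in the test are made up for illustration.

```python
from collections import defaultdict

# The four CATH levels, from most general to most specific.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code: str) -> dict[str, str]:
    """Map each CATH level to the code prefix that identifies it."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected a four-part CATH code, got {code!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def group_by_level(domains: dict[str, str], level: str) -> dict[str, list[str]]:
    """Group domain identifiers by their CATH node at the given level."""
    groups = defaultdict(list)
    for domain, code in sorted(domains.items()):
        groups[parse_cath_code(code)[level]].append(domain)
    return dict(groups)
```

Grouping at the topology level puts two domains coded 3.40.50.300 and 3.40.50.720 in the same fold group, 3.40.50, even though they belong to different homologous superfamilies.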

8 Life Sciences Data is Complex e.g. herpesvirus evolutionary tree

9 Life Sciences Data is Complex e.g. KEGG metabolic pathway

10 Life Sciences Data is Complex e.g. PubMed medical abstract
Toxicol Appl Pharmacol Dec 1;201(2):
cDNA microarray analysis of rat alveolar epithelial cells following exposure to organic extract of diesel exhaust particles.
Koike E, Hirano S, Furuyama A, Kobayashi T.
Particulate Matter (PM2.5) and Diesel Exhaust Particles (DEP) Research Project, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan.
Diesel exhaust particles (DEP) induce pulmonary diseases including asthma and chronic bronchitis. Comprehensive evaluation is required to know the mechanisms underlying the effects of air pollutants including DEP on lung diseases. Using a cDNA microarray, we examined changes in gene expression in SV40T2 cells, a rat alveolar type II epithelial cell line, following exposure to an organic extract of DEP. We identified candidate sensitive genes that were up- or down-regulated in response to DEP. The cDNA microarray analysis revealed that a 6-h exposure to the DEP extract (30 μg/ml) increased (>2-fold) the expression of 51 genes associated with drug metabolism, antioxidation, cell cycle/proliferation/apoptosis, coagulation/fibrinolysis, and expressed sequence tags (ESTs), and decreased (<0.5-fold) that of 20 genes. In the present study, heme oxygenase-1 (HO-1), an antioxidative enzyme, showed the maximum increase in gene expression; and type II transglutaminase (TGM-2), a regulator of coagulation, showed the most prominent decrease among the genes. We confirmed the change in the HO-1 protein level by Western blot analysis and that in the enzyme activity of TGM-2. The organic extract of DEP increased the expression of HO-1 protein and decreased the enzyme activity of TGM-2. Furthermore, these effects of DEP on either HO-1 or TGM-2 were reduced by N-acetyl-L-cysteine (NAC), thus suggesting that oxidative stress caused by this organic fraction of DEP may have induced these cellular responses. Therefore, an increase in HO-1 and a decrease in TGM-2 might be good markers of the biological response to organic compounds of airborne particulate substances.
PMID: [PubMed - in process]
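The abstract's selection rule (up-regulated if expression rises more than 2-fold, down-regulated if it falls below 0.5-fold) can be sketched as a simple ratio filter. This is an illustration only: the study's actual normalisation and replicate handling are not described here, and the gene identifiers and intensities in the test are hypothetical.

```python
def classify_fold_changes(treated: dict[str, float], control: dict[str, float],
                          up: float = 2.0, down: float = 0.5) -> tuple[list[str], list[str]]:
    """Split genes into up- and down-regulated lists by treated/control ratio.

    Uses strict inequalities, matching ">2-fold" and "<0.5-fold" in the abstract.
    """
    up_genes, down_genes = [], []
    for gene, control_level in control.items():
        ratio = treated[gene] / control_level
        if ratio > up:
            up_genes.append(gene)
        elif ratio < down:
            down_genes.append(gene)
    return sorted(up_genes), sorted(down_genes)
```

Note that a gene whose expression exactly doubles is not counted as up-regulated under this rule.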

11 Life Sciences Data is Complex e.g. Gene Ontology
- biological_process
- cellular_component
- molecular_function
  - antioxidant activity (478)
  - binding
  - catalytic activity
  - chaperone regulator activity (14)
  - enzyme regulator activity (2087)
  - molecular_function unknown
  - motor activity (522)
  - nutrient reservoir activity (36)
  - signal transducer activity (8356)
  - structural molecule activity (3428)
  - transcription regulator activity (8552)
    - negative regulator of basal transcription activity (15)
    - RNA polymerase I transcription factor activity (31)
    - RNA polymerase II transcription factor activity (982)
    - RNA polymerase III transcription factor activity (41)
    - transcription antiterminator activity (16)
    - transcription cofactor activity (731)
    - transcription factor activity (5510)
    - transcription initiation factor activity (82)
    - transcription initiation factor antagonist activity (9)
    - transcription termination factor activity (38)
    - transcriptional activator activity (499)
    - transcriptional elongation regulator activity (97)
    - transcriptional repressor activity (507)
    - two-component response regulator activity (394)
  - translation regulator activity (687)
  - transporter activity (9054)
  - triplet codon-amino acid adaptor activity (555)
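GO terms form a directed acyclic graph, and under GO's "true path rule" a gene product annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below propagates direct annotations upwards and counts distinct gene products per term. The term fragment is modelled on the slide, but the gene identifiers and annotations are hypothetical, and the slide does not say whether its counts were computed this way.

```python
from collections import defaultdict


def ancestors_inclusive(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return the term plus every ancestor reachable via parent links (DAG walk)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_counts(parents: dict[str, set[str]],
                     direct: dict[str, set[str]]) -> dict[str, int]:
    """Count distinct genes annotated to each term directly or via a descendant."""
    genes_per_term = defaultdict(set)
    for term, genes in direct.items():
        for ancestor in ancestors_inclusive(term, parents):
            genes_per_term[ancestor] |= genes  # a set, so shared genes count once
    return {term: len(genes) for term, genes in genes_per_term.items()}
```

Because genes are collected in sets, a gene annotated to two sibling terms is counted only once at their common parent, so a parent's count can be smaller than the sum of its children's counts.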

12 Life Sciences Informatics in Birkbeck Comp Sci
Example research areas:
- Evolutionary analysis: reconstruction of evolutionary events from genomic and related data
- Integration of life sciences data: data and knowledge management techniques to support the integration, analysis, mining and visualisation of life sciences data
- Medical informatics: data integration, semantic modelling, fuzzy inferencing and data mining techniques to support virtual integration of medical records
For full details of topics, people, projects, publications…

13 Evolutionary Analysis
Annotating evolutionary trees: mathematical models and algorithms addressing problems such as:
- Given an evolutionary species tree and a set of trees built on the same extant species according to similarity between individual gene families, find a mapping of the individual gene trees onto the species tree that accounts for the differences between them through gene duplications and losses.
- Given an evolutionary species tree and patterns of presence/absence of genes in the extant species, compute evolutionary scenarios of gene gain, horizontal transfer and loss events that account for the patterns.
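The first problem is classically tackled by LCA-mapping reconciliation: each internal gene-tree node is mapped to the lowest common ancestor (LCA), in the species tree, of its children's mappings; a node is a duplication if it maps to the same species node as one of its children, and losses are counted from depth gaps along each edge. The slide does not say which models were used, so this is a minimal sketch of that standard technique, assuming binary gene trees written as nested tuples whose leaves are labelled with species names. All function names are hypothetical.

```python
from typing import Union

# A tree is either a leaf label (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def index_species_tree(tree: Tree, depth: int = 0, clades=None):
    """Return (clade, clade->depth map); a clade is the frozenset of leaves below a node."""
    if clades is None:
        clades = {}
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(
            *(index_species_tree(child, depth + 1, clades)[0] for child in tree)
        )
    clades[clade] = depth
    return clade, clades


def lca(clades: dict, a: frozenset, b: frozenset) -> frozenset:
    """Smallest species clade containing both a and b (such clades are nested)."""
    needed = a | b
    return min((c for c in clades if needed <= c), key=len)


def reconcile(gene_tree: Tree, clades: dict):
    """Return (species clade mapped to, duplications, losses) for a binary gene tree."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0, 0
    left, right = gene_tree
    m_left, dup_left, loss_left = reconcile(left, clades)
    m_right, dup_right, loss_right = reconcile(right, clades)
    mapped = lca(clades, m_left, m_right)
    is_duplication = mapped in (m_left, m_right)
    losses = loss_left + loss_right
    for child in (m_left, m_right):
        gap = clades[child] - clades[mapped]  # species-tree depth skipped on this edge
        losses += gap if is_duplication else gap - 1
    return mapped, dup_left + dup_right + int(is_duplication), losses
```

For the species tree ((A,B),C), the gene tree ((A,C),B) is explained by one duplication and three losses, while a gene tree with the same shape as the species tree needs neither.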

14 Evolutionary Analysis
Applied to the analysis of evolutionary gains and losses of function in herpesvirus genomes.
Figures: reconstructed history of HPF161; host–virus interaction.
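The second problem on slide 13 (explaining gene presence/absence patterns on a species tree) can be approached, in its simplest form, with Fitch parsimony on a binary character: a bottom-up pass gives the minimum number of gain-or-loss events. This is a swapped-in textbook technique, not necessarily the one used in this work; it weights gains and losses equally, does not tell them apart without a further traceback, and does not model horizontal transfer. The tree and species names V1 to V4 are hypothetical.

```python
from typing import Union

# A tree is either a leaf label (str) or a 2-tuple of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, present: dict[str, int]) -> tuple[set[int], int]:
    """Bottom-up Fitch pass on a binary tree with presence (1) / absence (0) at leaves.

    Returns the candidate states for the subtree's root and the minimum
    number of state changes (gains or losses) needed inside the subtree.
    """
    if isinstance(tree, str):
        return {present[tree]}, 0
    (left_states, left_cost), (right_states, right_cost) = [
        fitch(child, present) for child in tree
    ]
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: one change is needed on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1
```

For a gene present only in V1 and V2 on the tree (((V1,V2),V3),V4), a single event suffices; a scattered pattern such as presence in V1 and V3 only needs at least two.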

15 Integration of Life Sciences Data
Integrating transcriptomics and structural data to reveal protein functions: BioMap
- A data warehouse to support analysis and mining, integrating data including microarray gene expression data, protein structure data, CATH structural classification data, and functional data including Gene Ontology and KEGG (Gene, Orthology, Genome, Pathway…)
Creation of a pilot Grid for proteomics resources: ISpider
- An integrated platform of proteomics resources supporting techniques for distributed querying, workflows and data analysis tasks in a Grid
- Research approach based on semantic mapping services using the techniques developed in the AutoMed project
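The warehouse idea behind BioMap, bringing expression, structural classification and functional annotation together so they can be queried jointly, can be illustrated with a toy join on a shared gene identifier. BioMap's actual schema is not described here; the record layout, the log2-ratio measure, the threshold and all identifiers below are hypothetical.

```python
from collections import defaultdict


def build_warehouse(expression: dict[str, float],
                    cath: dict[str, list[str]],
                    go: dict[str, list[str]]) -> list[dict]:
    """Left-join CATH and GO data onto expression records by gene identifier."""
    rows = []
    for gene in sorted(expression):
        rows.append({
            "gene": gene,
            "log2_ratio": expression[gene],
            "cath_superfamilies": sorted(cath.get(gene, [])),  # empty if unclassified
            "go_terms": sorted(go.get(gene, [])),
        })
    return rows


def superfamilies_of_changed_genes(rows: list[dict],
                                   threshold: float = 1.0) -> dict[str, list[str]]:
    """Map CATH superfamilies to genes whose |log2 ratio| is at least threshold."""
    hits = defaultdict(list)
    for row in rows:
        if abs(row["log2_ratio"]) >= threshold:
            for superfamily in row["cath_superfamilies"]:
                hits[superfamily].append(row["gene"])
    return dict(hits)
```

A cross-source question such as "which structural superfamilies are over-represented among strongly changed genes?" then becomes a single pass over the integrated rows rather than three separate lookups.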

16 Integrated Proteomics Informatics Platform - Architecture
- ISPIDER Proteomics Clients: Vanilla Query Client; 2D Gel Visualisation Client (+ Aspergil. Extensions, + Phosph. Extensions); PPI Validation + Analysis Client; Protein ID Client
- ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler; Instance Ident/Mapping Services; Proteomic Ontologies/Vocabularies; Source Selection Services; Data Cleaning Services
- Existing E-Science Infrastructure: myGrid Ontology Services; myGrid DQP; DAS; AutoMed; myGrid Workflows
- Public Proteomic Resources (existing and ISPIDER resources, exposed as Web services): PS WS, PF WS, TR WS, GS WS, FA WS, PPI WS, PID WS, PRIDE WS, PEDRo WS, Phos WS
Work packages WP1 to WP6 are marked on the diagram.
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

17 Medical Informatics
ASsociation Studies assisted by Inference and Semantic Technologies (ASSIST)
10 EU partners from the UK, Greece, Belgium, Germany and Spain
The main objectives of ASSIST are to:
- allow researchers to combine phenotypic and genotypic data
- unify multiple patient record repositories
- automate the process of evaluating medical hypotheses
- provide an inference engine capable of statistically evaluating medical data
- offer expressive, graphical tools for medical researchers to pose their queries.
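Statistically evaluating a genotype/phenotype hypothesis in an association study often starts from a 2x2 contingency table. ASSIST's actual statistical methods are not described here; this sketch assumes a Pearson chi-square test without continuity correction, plus an odds ratio, and the counts in the test are hypothetical.

```python
def association_test(table: tuple[tuple[int, int], tuple[int, int]]) -> tuple[float, float]:
    """Pearson chi-square statistic and odds ratio for a 2x2 table.

    table = ((carriers_with_phenotype, carriers_without),
             (non_carriers_with_phenotype, non_carriers_without))
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, odds_ratio


# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_1DF_05 = 3.841
```

With 30 of 50 carriers and 10 of 50 non-carriers showing the phenotype, the statistic is about 16.67 (well above 3.841) and the odds ratio is 6, so the hypothesis of no association would be rejected at the 5% level.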

18 Medical Informatics
ASSIST query processing builds on AutoMed technology, with integrated ontology and inference-rule capabilities.

19 Making Sense of Life Sciences Data: on-going and future research
Some areas of on-going and future research:
- automated reasoning using ontologies and wider domain knowledge
- evolutionary reconstruction exploiting domain knowledge
- analysis and mining of heterogeneous distributed resources
- metrics for data integration quality
The overarching motivation is the potential to make scientific discoveries that can improve quality of life.

20 Some Collaborators; Funding; Further Information
