Integrating Ontological Prior Knowledge into Relational Learning for Protein Function Prediction


1 Integrating Ontological Prior Knowledge into Relational Learning for Protein Function Prediction. Stefan Reckow (Max Planck Institute of Psychiatry); Volker Tresp (Siemens, Corporate Technology)

2 Proteins and Protein Ontologies

3 Protein and Protein Functions. Motivation: proteins are the molecular machines in any organism; understanding protein function is essential for all areas of the biosciences; there are diverse sources of knowledge about proteins. Challenges: experimental determination of functions is difficult and expensive; homologies can be misleading; most proteins have several functions.

4 Protein function prediction. What function does this protein have? Example annotations, from general to specific: catalytic activity (catalyzes a reaction); isomerase activity; intramolecular oxidoreductase activity; intramolecular oxidoreductase activity, interconverting aldoses and ketoses; triose-phosphate isomerase activity (catalyzes a very specific reaction).

5 "Function" Ontologies. Ontologies are a way of bringing order to the functions of proteins. An ontology is a description of the concepts of a domain and their relationships. Hierarchical representation (subclass relationship): a tree or a directed acyclic graph. [Figure: example function hierarchy with the nodes function, energy, glycolysis, fermentation, respiration, aerobic, anaerobic, transcription, cell fate, cell growth, cell death.]
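The distinction between a tree and a directed acyclic graph can be made concrete with a small sketch. The edges below are an assumption: the slide's figure did not survive, so the hierarchy is guessed from the node labels, and the extra edge in `FUNCTION_DAG` is purely illustrative (it only shows a concept with two parents and is not a claim about biology). The names `FUNCTION_TREE`, `FUNCTION_DAG`, `is_acyclic` and `is_tree` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Assumed toy hierarchy, stored as child -> list of parents (subclass-of edges).
FUNCTION_TREE = {
    "function": [],
    "energy": ["function"],
    "transcription": ["function"],
    "cell fate": ["function"],
    "glycolysis": ["energy"],
    "fermentation": ["energy"],
    "respiration": ["energy"],
    "aerobic": ["respiration"],
    "anaerobic": ["respiration"],
    "cell growth": ["cell fate"],
    "cell death": ["cell fate"],
}

# Illustrative variant: one concept gets a second parent, so it is no longer a tree.
FUNCTION_DAG = {**FUNCTION_TREE, "fermentation": ["energy", "anaerobic"]}


def is_acyclic(parents):
    """Return True if the subclass-of graph contains no cycle."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def is_tree(parents):
    """Return True if the graph is acyclic, has one root, and no concept has two parents."""
    roots = [concept for concept, ps in parents.items() if not ps]
    single_parent = all(len(ps) <= 1 for ps in parents.values())
    return is_acyclic(parents) and len(roots) == 1 and single_parent
```

Storing parents rather than children keeps the later question "which more general concepts does this annotation imply?" a simple upward walk.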

6 "Complex" Ontology. Complex: a structure formed by a group of two or more proteins to perform certain functions concertedly. [Figure: excerpt of the complex hierarchy with the nodes Complex, Cytoskeleton, Actin filaments, Microtubules, 10 nm filaments, Intermediate filaments, Septin filaments, Proteasome, Intracellular transport, Clathrin, Golgi transport.]

7 Ontologies as Great Source of Prior Knowledge in Machine Learning. A considerable amount of community effort is invested in designing ontologies. Typically this prior knowledge is deterministic (logical constraints). Machine learning should be able to exploit this knowledge. Interactions between proteins are important information for predicting function, which motivates statistical relational learning.

8 Statistical Relational Learning with the IHRM

9 Statistical Relational Learning (SRL). SRL generalizes standard machine learning to domains where relations between entities (and not just entity attributes) play a significant role. Examples: PRM, DAPER, MLN, RMN, RDN. The IHRM is an easily applicable general model that performs a cluster analysis of relational domains and requires no structural learning. References: Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel. Infinite hidden relational models. In Proc. 22nd UAI, 2006. C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proc. AAAI, 2006.

10 Standard Latent Model for Protein Mixture Models. [Figure: Protein1, Protein2.] In a Bayesian approach, we can permit an infinite number of states in the latent variables and obtain a Dirichlet Process Mixture Model (DPM). Advantage: the model only uses a finite number of those states; thus no time-consuming structural optimization is required.
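Why an infinite number of latent states still yields a finite number of used clusters can be illustrated with the Chinese restaurant process, one standard view of the Dirichlet process prior over cluster assignments. This is a minimal sketch of the prior only, not the authors' inference procedure; the function name `crp_assignments` and the concentration parameter `alpha` are assumptions introduced for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster assignments for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to the number
    of items already in k, or opens a new cluster with probability proportional
    to alpha. Returns (assignments, counts).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []
    for _ in range(n_items):
        weights = counts + [alpha]           # last entry stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)                 # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


# Usage: the number of occupied clusters can never exceed the number of items.
assignments, counts = crp_assignments(n_items=1000, alpha=2.0)
print(len(counts), "clusters used for", len(assignments), "proteins")
```

The number of clusters is an outcome of sampling rather than a hyperparameter to be tuned, which is the point the slide makes about avoiding structural optimization.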

11 Infinite Hidden Relational Model (IHRM). [Figure: Protein1, Protein2 and Protein3 linked by "interact" relations.] Permits us to include protein-protein interactions in the model.

12 Ground Network. [Figure: latent variables Z1, Z2 and Z3, each with the attribute nodes motif, complex and function, and "interact" relations between them.]
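The ground network can be read as a generative process: each protein has a latent cluster variable, its attributes depend on that variable, and whether two proteins interact depends on the cluster variables of both. The sketch below makes strong simplifying assumptions that are not in the slides: the number of clusters is truncated to a fixed finite K, only one attribute (function) is generated, and all parameters (`pi`, `theta_function`, `phi`) are hypothetical fixed values rather than quantities with priors.

```python
import random


def sample_ground_network(n_proteins, pi, theta_function, phi, seed=0):
    """Sample latent clusters, one attribute per protein, and pairwise interactions.

    pi[k]                 probability of cluster k
    theta_function[k][f]  probability of function f given cluster k
    phi[k][l]             probability that proteins in clusters k and l interact
    """
    rng = random.Random(seed)
    n_clusters = len(pi)
    # Latent cluster Z_i for every protein.
    z = [rng.choices(range(n_clusters), weights=pi)[0] for _ in range(n_proteins)]
    # Attribute "function" depends only on the protein's own cluster.
    function = [
        rng.choices(range(len(theta_function[k])), weights=theta_function[k])[0]
        for k in z
    ]
    # Relation "interact" depends on the clusters of both proteins.
    interact = {}
    for i in range(n_proteins):
        for j in range(i + 1, n_proteins):
            interact[(i, j)] = rng.random() < phi[z[i]][z[j]]
    return z, function, interact


# Usage with two hypothetical clusters that interact mostly within themselves.
z, function, interact = sample_ground_network(
    n_proteins=5,
    pi=[0.5, 0.5],
    theta_function=[[0.9, 0.1], [0.2, 0.8]],
    phi=[[0.8, 0.1], [0.1, 0.7]],
)
```

Because interactions are conditioned on two latent variables at once, information about one protein's attributes can flow to its interaction partners through the cluster assignments.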

13 Experimental Results: KDD Cup 2001. Yeast genome data; 1243 genes/proteins: 862 (training) / 381 (test). Attributes: Chromosome; Motif (351) [1-6]: a gene might contain one or more characteristic motifs (information about the amino acid sequence of the protein); Essential; Structural class (24) [1-2]: the protein coded by the gene might belong to one or more structural categories; Phenotype (11) [1-6]: observed phenotypes in the organism; Interaction; Complex (56) [1-3]: the expression of the gene can complex with others to form a larger protein; Function (14) [1-4]: cell growth, cell organization, transport, ... Genes were anonymous.

14 Results. Comparison with supervised models (model: accuracy): IHRM: 93.16; Krogel et al.: 93.63; SVM: 93.48. [Figure: ROC curve.]

15 IHRM Result. [Figure: interaction network. Node: gene; link: interaction; color: cluster.]

16 Integrating Ontological Prior Knowledge into the IHRM

17 Integration of ontologies: deductive closure
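Deductive closure here means that every annotation of a protein is extended with all more general concepts the ontology implies, before any learning takes place. The following is a minimal sketch under assumptions: the toy hierarchy is guessed from concept names mentioned in the slides, the parent-map representation and the name `deductive_closure` are hypothetical, and top-level categories are modeled as concepts without parents.

```python
# Assumed toy excerpt of a "complex" ontology, stored as child -> list of parents.
COMPLEX_PARENTS = {
    "cytoskeleton": [],
    "intracellular transport": [],
    "actin filaments": ["cytoskeleton"],
    "microtubules": ["cytoskeleton"],
    "clathrin": ["intracellular transport"],
}


def deductive_closure(annotations, parents):
    """Return the annotations plus every ancestor concept they imply."""
    closed = set()
    stack = list(annotations)
    while stack:
        concept = stack.pop()
        if concept not in closed:
            closed.add(concept)
            # Walk upward: a member of a subclass is also a member of its superclasses.
            stack.extend(parents.get(concept, []))
    return closed


# Usage: one specific annotation becomes two after closure.
print(sorted(deductive_closure({"actin filaments"}, COMPLEX_PARENTS)))
# prints ['actin filaments', 'cytoskeleton']
```

Applying the closure as a preprocessing step leaves the learning algorithm unchanged; the ontology's logical constraints simply appear as additional observed annotations.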

18 Integration of ontologies. [Figure: latent variable Zi with the nodes motif, function, signal peptidase, actin filaments, microtubules, cytoskeleton and translocon complex, labeled as independent concepts and dependent concepts.]

19 Experiments: Including "Complex" Ontology. Data collected from CYGD of MIPS; 1000 genes/proteins: 800 (training) / 200 (test). Attributes: chromosome, motif, essential, structural class, phenotype, interaction, complex, function. Interactions from DIP. Usage of ontological knowledge on complex: five levels of hierarchy; in our model 258 nodes (concepts) using 66 top-level categories; every protein has at least one complex annotation. After including ontological constraints: about three annotations per protein on average.

20 Results (AUC). 800 (training) / 200 (test): w/o ontology 0.895, with ontology 0.928. 200 (training) / 200 (test): w/o ontology 0.832, with ontology 0.894.
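For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below shows that definition only; it is not the authors' evaluation code, and the function name `auc` and the example scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: predicted scores, higher means "more likely positive"
    labels: 1 for positive examples, 0 for negative examples
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Usage: three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # prints 0.75
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large test sets.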

21 Results: explicit modeling of dependencies

22 Results. Proteins acting in cell division control: the "Septins" have several roles throughout the cell cycle and carry out essential functions in cytokinesis. The three highlighted proteins fit into this cluster ("cell fate" and "cell type differentiation"). Proteins concerned with secretion and transportation: the "Golgi apparatus" works together with the "endoplasmic reticulum (ER)" as the transport and delivery system of the cell. "SNARE" proteins help to direct material to the correct destination. The test proteins are also annotated with "cellular transport". Grey: in test set.

23 Results: sampling convergence

24 Results: distribution of proteins in the clusters

25 Results. Tasks occurring during DNA replication: the former singleton "DNA polymerase", as a main actor in replication, is assigned to the correct cluster here. Cellular transport cluster: the former singleton "Clathrin light chain", as a major constituent of coated vesicles (a component for transport), fits into this cluster quite well. Grey: former singletons.

26 Conclusion. Application of the IHRM to function prediction: competitive with supervised learning methods; insights into the solution. Advantages of integrating ontological knowledge: improvement of the clustering structure; robustness (stable results with varying parameterization); deductive closure prior to learning is a general, powerful principle. Future challenges: usage of several or more complex ontologies; further analysis of dependent vs. independent concepts. Acknowledgements: Karsten Borgwardt (MPIs Tübingen); Hans-Peter Kriegel (LMU).

