1 Gene Function Prediction by Mining Biomedical Literature
Pooja Jain, Master in Bioinformatics
Supervisor: Mário Jorge Costa Gaspar da Silva
Co-Supervisor: Jörg Dieter Becker
23-06-2015, DI FC UL

2 Introduction: Context
The central problems of the post-genomic era:
- Data management
- Annotation of the data
Annotation is crucial.
Main source of annotations: published literature

3 Introduction: Motivation & Objective
The main motivations were:
- The lack of functional annotations
- The need to reduce the time and effort of manual annotation
Objective: prediction of gene functions from biomedical literature

4 Outline of Presentation
- Introduction
- Concepts
- ProFAL
- APEG
- Results
- Conclusions & Future Directions

5 Some Concepts
- Text Mining
- Ontology
- Information Modeling

7 Related Work: ProFAL
[Diagram of the ProFAL pipeline. Labels: Biological Database, Literature Database, GOA, GO Terms, FiGO, Retrieval, Relevant Documents, Extraction, Annotations, Validation, Validated Annotations]

9 APEG (Arabidopsis Pollen Expressed Genes)
What is APEG?
- Repository of Arabidopsis pollen-expressed genes
- Web interface for different user types
What are its contents?
- Results from expression studies
- Cross-references to GenBank, SwissProt and TAIR
- Cross-references to relevant literature
- Automatically extracted knowledge

10 ProFAL APEG: Class Model
[Figure: class model diagram]

11 Population of APEG
[Flow diagram. Labels in order of appearance: Genome Chip, Get probe set identifier, Search at TAIR, Get TAIR Id, Get SwissProt Id, Get GenBank Id, Search at Pfam, Get Family, Input for APEG]

12 Document Retrieval
[Flow diagram. Labels in order of appearance: TAIR Id, SwissProt Id, GenBank Id, Search at PubMed, Get PubMed Id, Get Abstract]
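The slide does not say how PubMed was queried. As a minimal sketch, assuming the public NCBI E-utilities endpoints are used, the two steps "Search at PubMed / Get PubMed Id" and "Get Abstract" could be expressed as two request URLs. The function names and the gene identifier below are illustrative placeholders, not taken from the presentation; the PubMed Id is the one listed in the Key References slide.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service (assumed to be the access route).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(identifier, retmax=20):
    """Build a URL that searches PubMed for one gene/protein identifier."""
    query = urlencode({"db": "pubmed", "term": identifier, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_abstract_url(pubmed_ids):
    """Build a URL that fetches plain-text abstracts for a list of PubMed Ids."""
    query = urlencode({
        "db": "pubmed",
        "id": ",".join(pubmed_ids),
        "rettype": "abstract",
        "retmode": "text",
    })
    return f"{EUTILS}/efetch.fcgi?{query}"


# "At1g01010" is a placeholder identifier in TAIR locus format.
search_url = esearch_url("At1g01010")
fetch_url = efetch_abstract_url(["14980013"])
```

The sketch only builds the URLs; sending the requests and parsing the responses is left out so that it runs without network access.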

13 Annotation Extraction
GO term from GOA: lateral root morphogenesis
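To illustrate what extracting a GO term from an abstract means, here is a deliberately naive sketch based on case-insensitive exact matching of term names. This is not the method used by FiGO or ProFAL, which the slides do not detail; the function name, the example sentence and the second term are made up for illustration.

```python
import re


def find_go_terms(text, go_term_names):
    """Return the GO term names that occur verbatim (case-insensitive) in text."""
    hits = []
    for name in go_term_names:
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(name)
    return hits


# Made-up sentence standing in for a PubMed abstract.
abstract = "The mutant shows defects in Lateral Root Morphogenesis."
candidate_terms = ["lateral root morphogenesis", "pollen tube growth"]
found = find_go_terms(abstract, candidate_terms)
```

Exact matching misses paraphrases and cannot handle negation, which is consistent with the kinds of errors reported later in the results analysis.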

24 Results: Evaluation and Validation
[Diagram. Labels: Document Retrieval, Automatic Extraction (by ProFAL) giving the Observed annotations, Manual Extraction (by curator) giving the Expected annotations, Comparison, Inspection]
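The comparison of observed (automatic) and expected (curated) annotations can be sketched with set operations. This is an assumption about how the comparison step works, and the gene and term labels are placeholders rather than data from the study.

```python
def compare_annotations(observed, expected):
    """Split annotations into true positives, false positives and false negatives.

    Each annotation is a (gene, GO term) pair.
    """
    observed, expected = set(observed), set(expected)
    return {
        "TP": observed & expected,   # extracted and confirmed by the curator
        "FP": observed - expected,   # extracted but not confirmed
        "FN": expected - observed,   # curated but missed by the extraction
    }


# Placeholder annotations, for illustration only.
observed = [("geneA", "term1"), ("geneA", "term2"), ("geneB", "term3")]
expected = [("geneA", "term1"), ("geneB", "term4")]
outcome = compare_annotations(observed, expected)
```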

25 Results: Document Retrieval
55 distinct documents were associated with 71 of the 147 genes (48%), using 117 distinct citations.
SP = SwissProt, GB = GenBank
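The 48% figure follows directly from the two gene counts on the slide, as this short check shows (variable names are illustrative):

```python
genes_total = 147           # genes in the study, from the slide
genes_with_documents = 71   # genes with at least one retrieved document

coverage = genes_with_documents / genes_total
coverage_percent = round(coverage * 100)
print(f"{coverage_percent}% of genes have at least one document")
```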

26 Results: Annotation Extraction

27 Results: Observations
- Documents were retrieved for 48% of the genes.
- Precision and recall were low.
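For reference, precision and recall follow their standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The transcript does not preserve the actual counts, so the numbers in this sketch are made up purely to show the calculation.

```python
def precision(tp, fp):
    """Fraction of extracted annotations that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of curated annotations that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up counts: many false positives drive precision down.
example_precision = precision(tp=3, fp=9)
example_recall = recall(tp=3, fn=1)
```

With these example counts, a large number of false positives lowers precision even when most curated annotations are found.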

28 Results Analysis
The main reason was a high number of false positives (FP).
FP annotations were derived from:
- GO terms outside the Molecular Function ontology
- Obsolete and non-existent GO terms
- Incoherent evidence text
- Evidence texts containing numbers, abbreviations and negation

29 Improved Results
Improvements implemented:
- Use of GO terms from the Molecular Function ontology only
- Exclusion of obsolete and non-existent GO terms
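The two improvements amount to a filter over candidate GO terms. The sketch below assumes each ontology entry carries a `namespace` and an `is_obsolete` field, mirroring the tags of the GO OBO file format; the function name, the dictionary layout and the toy identifiers are hypothetical and are not real GO accessions.

```python
def keep_usable_terms(candidate_ids, ontology):
    """Keep only existing, non-obsolete Molecular Function terms."""
    kept = []
    for go_id in candidate_ids:
        term = ontology.get(go_id)
        if term is None:
            continue  # non-existent term
        if term["is_obsolete"]:
            continue  # obsolete term
        if term["namespace"] != "molecular_function":
            continue  # biological process or cellular component
        kept.append(go_id)
    return kept


# Toy ontology with placeholder identifiers.
toy_ontology = {
    "toy:1": {"namespace": "molecular_function", "is_obsolete": False},
    "toy:2": {"namespace": "biological_process", "is_obsolete": False},
    "toy:3": {"namespace": "molecular_function", "is_obsolete": True},
}
usable = keep_usable_terms(["toy:1", "toy:2", "toy:3", "toy:4"], toy_ontology)
```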

30 Discussion
- Specific annotations
- Existing vs. extracted annotations: true positive (TP) annotations for 20 of the 31 genes
- Probable functions: 21 functions for 8 of the 31 genes

32 Conclusions
- APEG database system
- Improvements in ProFAL
- In my opinion, text mining is useful for biologists

33 Future Directions
- Improvement in document retrieval
- Integration of an NLP technique
- Usability study of the proposed approach
- Validation of the approach with a larger set of genes

34 Key References
- Couto, F., Silva, M. & Coutinho, P. (2004). FiGO: Finding GO terms in unstructured text. EMBO BioCreative Workshop - Handouts, Granada, Spain.
- Becker, J.D., Boavida, L., Carneiro, J., Haury, M. & Feijó, J.A. (2003). Transcriptional profiling of Arabidopsis tissues reveals the unique characteristics of the pollen transcriptome. Plant Physiology, 133, 713-725.
- Couto, F., Silva, M. & Coutinho, P. (2003). ProFAL: PROtein Functional Annotation through Literature. In E. Pimentel, N.R. Brisaboa & J. Gomez, eds., VIII Conference on Software Engineering and Databases (JISBD), 747-756, Alicante, Spain.
- Shatkay, H. & Feldman, R. (2003). Mining the biomedical literature in the genomic era: an overview. Journal of Computational Biology, 10, 821-855, PMID: 14980013.
- Mack, R. & Hehenberger, M. (2002). Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discovery Today, 7, S89-S98.
- The Gene Ontology Consortium (2001). Creating the Gene Ontology Resource: Design and Implementation. Genome Research, 11, 1425-1433.

35 Thank you for your attention
http://xldb.fc.ul.pt/apeg/

