Data Integration and Extraction over Molecular Biological Data. Cui Tao. Supported by NSF.


1 Data Integration and Extraction over Molecular Biological Data. Cui Tao. Supported by NSF.

2 Motivation. Online biological data: highly diverse in granularity and variety; various formats; different terminologies, ID systems, and units.

3 How to Build a Gene Extraction Ontology? Concepts; relationship sets; constraints; data frames.

4 How to Build a Gene Extraction Ontology? Sequence patterns: (G*A*U*C*)* for RNA; (G*A*T*C*)* for DNA.
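The two patterns on this slide can be tried directly as regular expressions. The following is a minimal sketch, not the author's implementation: the function name `classify` and the result labels are hypothetical, and it assumes sequences are written in uppercase with no whitespace. Each pattern accepts any string over its four-letter alphabet, so it is equivalent to `[GATC]*` or `[GAUC]*`; the character-class form is the safer choice for long inputs because the nested quantifiers can backtrack heavily on strings that do not match.

```python
import re

# Patterns copied from the slide: any string over {G, A, T, C} or {G, A, U, C}.
DNA_PATTERN = re.compile(r"(G*A*T*C*)*")
RNA_PATTERN = re.compile(r"(G*A*U*C*)*")


def classify(sequence):
    """Label a candidate string as DNA, RNA, ambiguous, neither, or empty.

    The labels are illustrative only; they are not part of the presentation.
    """
    if not sequence:
        return "empty"
    is_dna = DNA_PATTERN.fullmatch(sequence) is not None
    is_rna = RNA_PATTERN.fullmatch(sequence) is not None
    if is_dna and is_rna:
        # Contains only G, A, C, so both patterns accept it.
        return "ambiguous"
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "neither"


if __name__ == "__main__":
    for candidate in ["GATTACA", "GAUUACA", "GACC", "GAXT"]:
        print(candidate, classify(candidate))
```

A string with neither T nor U is accepted by both patterns, which is why a recognizer built only on these expressions would need surrounding context (for example, a nearby keyword) to decide between the two.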

5 Knowledge Sources. Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component). All Species Toolkit: 1,231,935 species names. Protein databases: thousands of protein names.

6 Extraction Rules. Statistical NLP; machine learning: Naïve Bayes, Hidden Markov Models, decision trees.

7 Integration


13 Integration: Information Hidden behind Links


17 Query-based Extraction. Query the gene extraction ontology; find applicable resources; fill out forms; extract information.

18 Query-based Extraction. Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.” Concepts: Gene Name, Gene Sequence, Gene Mutant, Protein Function, Mutant Function.
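To illustrate how the example query could be related to the five concepts listed on this slide, here is a minimal sketch that uses one keyword pattern per concept. The slides do not describe the actual matching mechanism, so everything here is an assumption made for illustration: the `CONCEPT_PATTERNS` table, the regular expressions in it, and the function name `match_concepts` are all hypothetical.

```python
import re

# Hypothetical keyword recognizers, one per concept named on the slide.
# A capturing group, where present, pulls out the value mentioned in the query.
CONCEPT_PATTERNS = {
    "Gene Name": r"\b(\w+)\s+gene\b",
    "Gene Sequence": r"\bsequence\b",
    "Protein Function": r"\bprotein(?:'s)?\s+function\b",
    "Gene Mutant": r"\bmutant\b",
    "Mutant Function": r"\bmutant\s+that\s+(\w+)\b",
}

QUERY = ("Find the alfR gene, its sequence, its protein's function, "
         "and any mutant that inhibits this gene.")


def match_concepts(query):
    """Return {concept: captured value or None} for each concept the query mentions."""
    found = {}
    for concept, pattern in CONCEPT_PATTERNS.items():
        match = re.search(pattern, query, flags=re.IGNORECASE)
        if match:
            # Patterns without a capturing group only signal that the concept is requested.
            found[concept] = match.group(1) if match.groups() else None
    return found


if __name__ == "__main__":
    for concept, value in match_concepts(QUERY).items():
        print(concept, "->", value)
```

In this sketch the concepts with a captured value ("alfR", "inhibits") act as constraints on the search, while those mapped to `None` mark the information the user wants returned.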


23 Contribution. Provides a way to automatically integrate online biological data from different sources. Provides an approach that can find the proper online resources, fill out online forms, and extract data according to the user's query.

