1 CIS607, Fall 2004 Semantic Information Integration Attendees: Vikash Agarwal, Julian M Catchen Kevin A Huck, Kushal M Koolwal, Paea J Le Pendu Xiangkui.


1 CIS607, Fall 2004: Semantic Information Integration
Attendees: Vikash Agarwal, Julian M Catchen, Kevin A Huck, Kushal M Koolwal, Paea J Le Pendu, Xiangkui Yao, Xiaofang Zhang
Instructor/Organizer: Dejing Dou
Week 10 (Dec. 1)

2 Outline
Personal Information Management (PIM)
Semantic Integration in PIM
Medical Informatics and Bioinformatics
Semantic Integration in Biomedical Informatics

3 Personal Information
Homepages (HTML, XML)
Personal Emails (Text)
Spreadsheets (e.g. Microsoft Excel)
Contact Lists (Text)
Calendar
Publications and Presentations (Word, LaTeX, PowerPoint)
Personal Databases (SQL)
……

4 Personal Information Management (PIM)
How to organize personal information resources
  – They are currently organized by application and location.
How to integrate and share the data
  – Mostly manually (e.g. copy and paste).
How to search (query)
  – E.g. Prof. Wang wants to know which papers his students presented at conferences and the travel expenses paid from grants.
Good news: The development of the Internet, the Web, and wireless communication makes personal information accessible from desktops, laptops, palm devices, and cellphones.
The problems: Different formats and data structures, and different contents depending on the application.

5 Association (Relationship)-based PIM
Organize the personal information resources based on their associations (relationships).
  – Emails ↔ Contact Lists
  – Homepage ↔ Publications
  – Calendar ↔ Spreadsheets
Use a domain ontology to define those concepts, and store the associations (relationships) as mappings.
Develop an integration engine to process data and queries based on the domain ontology and mappings.
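The idea of storing associations as mappings can be illustrated with a minimal sketch. It is not the system described in the slides: the `Resource` and `AssociationStore` classes, the field names (`sender`, `address`), and the sample records are all hypothetical, and a mapping is simplified here to "two resource kinds share a value under these two keys".

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    kind: str   # ontology concept (hypothetical names), e.g. "Email", "Contact"
    data: dict  # the resource's own fields


@dataclass
class AssociationStore:
    # (source kind, target kind) -> (source key, target key)
    mappings: dict = field(default_factory=dict)
    resources: list = field(default_factory=list)

    def add_mapping(self, src_kind, src_key, dst_kind, dst_key):
        """Record that src_kind.src_key corresponds to dst_kind.dst_key."""
        self.mappings[(src_kind, dst_kind)] = (src_key, dst_key)

    def related(self, resource, dst_kind):
        """Return resources of dst_kind associated with `resource`."""
        keys = self.mappings.get((resource.kind, dst_kind))
        if keys is None:
            return []  # no association declared in this direction
        src_key, dst_key = keys
        value = resource.data.get(src_key)
        return [r for r in self.resources
                if r.kind == dst_kind and r.data.get(dst_key) == value]


# Made-up sample data for the Emails <-> Contact Lists association.
store = AssociationStore()
store.add_mapping("Email", "sender", "Contact", "address")
email = Resource("Email", {"sender": "a@example.org", "subject": "Draft"})
contact = Resource("Contact", {"address": "a@example.org", "name": "A. Student"})
other = Resource("Contact", {"address": "b@example.org", "name": "B. Student"})
store.resources.extend([email, contact, other])

matches = store.related(email, "Contact")
```

In this sketch the mapping is directional, so looking up emails from a contact would need a second `add_mapping` call; a real integration engine would presumably derive both directions from the ontology.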

6 Association (Relationship)-based PIM (cont'd)
[Architecture diagram. Labels: User; Integration Engine; Domain ontology; Person; Information Resources (Data): Homepage, Contacts, SpreadSheet, Publications, Calendar, Emails, Personal DBs (SQL)]

7 Main Topics in Association-based PIM
How to integrate structured and unstructured data
  – Databases and spreadsheets are structured; XML and LaTeX are semi-structured.
  – Emails, HTML, contacts, and Word documents are unstructured text.
How to define the domain ontology. The concepts of different resources use different hierarchies.
How to express the mappings (rules) between different information resources.
How the integration engine can use those mappings to integrate data and answer queries.
  – Emails ↔ Contact Lists
  – Homepage ↔ Publications ↔ Personal Databases
  – Calendar ↔ Spreadsheets

8 Bioinformatics and Medical Informatics
What it is
The analysis of biological and medical information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological and medical research.
What it can do
  – In genomics, bioinformatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
  – In neuroscience, medical informatics can analyze EEG and MRI data to study the functions of neurons and the human brain.
  – In pharmacy, medical informatics can help study drug use and drug interactions.
  – In clinical studies, medical informatics (e.g. expert systems) can help study diseases and the treatment of patients.

9 Good news and problems
Good news
  – Most biomedical data has been stored in databases, so it is structured data.
  – Statistics-based data mining techniques have been used successfully to find patterns in the data.
Problems in biomedical data integration
  – Most biomedical databases were developed locally and are application-oriented; there is little agreement among their schemas.
  – It is difficult for other people, especially people without biomedical knowledge, to understand the schemas.
  – Database schemas are not expressive enough to capture the meaning ("semantics") of the data and of patterns in the data.

10 Integrating Neuronal Databases
Cooperation with the Yale Medical Informatics Center to integrate two web-based neuronal databases, Senselab (Yale) and CNDB (Cornell).
  – Senselab: model and structure information for a particular class of neurons.
  – CNDB: experimental data for individual neurons measured on a particular day.
Researchers at Senselab have marked up their data and database schema with EDSP [Marenco et al. 03], an XML specification.
Cornell's researchers have also marked up their data and database schema, using another XML dialect.
[Figures: structure image; experimental EEG (electroencephalography) data]

11 Integrating Neuronal Databases (cont'd)
Get the database schemas from the XML files and transform them into class and property definitions.
Find the mapping between the two neuronal database schemas with the help of domain experts (neuroscientists).
Merge the two database schemas with bridging axioms, e.g.:
    (forall (n - neuron)
      (if (@cndb:funct_area n hippocampal.CA1)
          (@senselab:Neurons @senselab:Hippocampus n)))
We have developed some initial semi-automatic tools and GUIs to help domain experts, such as neuroscientists, map and merge two neuronal database schemas.
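To make the bridging axiom above concrete, here is a minimal Python sketch of applying it as a forward rule over a set of facts. The tuple encoding of facts, the neuron identifiers `n1` and `n2`, and the area `cortex.V1` are assumptions made for illustration; the slides do not say how the actual inference engine represents or applies axioms.

```python
# Facts are (predicate, arg1, arg2) tuples; this encoding is an assumption.
# Neuron ids and the second area are made-up sample data.
CNDB_FACTS = {
    ("cndb:funct_area", "n1", "hippocampal.CA1"),
    ("cndb:funct_area", "n2", "cortex.V1"),
}


def apply_bridging_axiom(facts):
    """Derive Senselab facts from CNDB facts using the single axiom:
    if funct_area(n, hippocampal.CA1) then Neurons(Hippocampus, n)."""
    derived = set()
    for predicate, neuron, area in facts:
        if predicate == "cndb:funct_area" and area == "hippocampal.CA1":
            derived.add(("senselab:Neurons", "senselab:Hippocampus", neuron))
    return derived


senselab_facts = apply_bridging_axiom(CNDB_FACTS)
```

Only `n1` satisfies the antecedent, so it is the only neuron for which a Senselab fact is derived; `n2` is left untranslated because no axiom covers its area.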

12 Interactive Axiom Composition by Domain Experts
Ontology mapping: similarity matching using dictionaries, e.g. Protein vs. Enzyme.
Axiom production: Allow domain experts to give concrete examples of how two symbols in different ontologies (database schemas) are related. Generalize the examples into usable bridging axioms, a machine learning approach to generating mapping rules.
Pattern reuse: Because a large number of correspondences can usually be sorted into a small set of patterns, allow domain experts to note and reuse these patterns.
Consistency testing: Detect contradictions among the generated bridging axioms; display the bugs to domain experts and allow the axioms to be edited.
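Dictionary-based similarity matching could look roughly like the sketch below. It is only an illustration: the `RELATED_TERMS` dictionary is a tiny hand-written stand-in for a real dictionary or thesaurus, and the function names and sample symbols are hypothetical. The output is a list of candidate correspondences for a domain expert to confirm, not a set of finished bridging axioms.

```python
# Toy stand-in for a dictionary/thesaurus lookup (hypothetical content).
RELATED_TERMS = {
    "enzyme": {"protein"},
    "neuron": {"nerve cell"},
}


def are_related(a, b):
    """True if the two symbol names match case-insensitively or are
    linked in either direction by the dictionary."""
    a, b = a.lower(), b.lower()
    return (a == b
            or b in RELATED_TERMS.get(a, set())
            or a in RELATED_TERMS.get(b, set()))


def suggest_mappings(symbols_a, symbols_b):
    """Return candidate (symbol_a, symbol_b) pairs for expert review."""
    return [(a, b) for a in symbols_a for b in symbols_b if are_related(a, b)]


candidates = suggest_mappings(["Protein", "Gene"], ["Enzyme", "gene", "Tissue"])
```

Here `Protein` is paired with `Enzyme` through the dictionary and `Gene` with `gene` through name equality, while `Tissue` has no counterpart and is left for the expert to handle.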

13 The mappings between EEG and MRI data
[Figures: EEG data acquisition; magnetic resonance imaging (MRI)]

14 Ontology-based Data Analysis (Mining)
You can consider it an expert system. At least useful for training purposes.
[Diagram. Labels: Data M; Data R; O_M; O_R; Inference Engine; EEG, MRI … data; Computational tools]
What are the features (patterns) of the processed data?
What can the patterns tell us (e.g. any function or disease of the brain)?

15 Ontology-based Genome DB Mediation
Integrating databases with the domain ontology. The system can process meaningful queries and data based on the mapping rules.
[Diagram. Labels: Query based on domain ontology; Domain Ontology (includes GO); Onto 1, Onto 2, Onto 3; DB 1, DB 2, DB 3, ……; example databases: ZFIN, another Zebrafish Lab DB, Human DB]
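A mediator of this kind can be sketched in a few lines of Python, under strong simplifying assumptions: each database is a list of records, and each mapping rule is just a rename from a domain-ontology field to a local field. The source names `db1` and `db2`, all field names, and the sample rows are hypothetical and do not reflect the schemas of ZFIN or any other real database.

```python
# Made-up sources with different local schemas.
SOURCES = {
    "db1": [{"gene_symbol": "geneA", "organism": "zebrafish"}],
    "db2": [{"sym": "geneA", "species": "zebrafish"},
            {"sym": "geneB", "species": "zebrafish"}],
}

# Mapping rules: domain-ontology field -> local field, per source.
MAPPINGS = {
    "db1": {"gene": "gene_symbol", "species": "organism"},
    "db2": {"gene": "sym", "species": "species"},
}


def query(domain_field, value):
    """Answer a query phrased in domain-ontology terms by rewriting it
    for each source and translating matches back to domain fields."""
    results = []
    for name, rows in SOURCES.items():
        mapping = MAPPINGS[name]
        local_field = mapping[domain_field]
        for row in rows:
            if row.get(local_field) == value:
                unified = {d: row.get(l) for d, l in mapping.items()}
                results.append({"source": name, **unified})
    return results


hits = query("gene", "geneA")
```

The caller only ever sees the domain fields `gene` and `species`; the per-source names stay inside the mapping rules, which is the point of mediating through a shared ontology.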

16 Genotypes + Environment => Phenotypes
[Diagram. Labels: Data G, O_G; Data P, O_P; Data E, O_E]
Genotype: the features (makeup) of the gene.
Phenotype: the observable characteristics produced by the genotype interacting with the environment.
GO (Gene Ontology): Cellular Component, Molecular Functions, Biological Process.
Environment features: temperature, pressure, light, ……
Too many features.

