SDM center Supporting Heterogeneous Data Access in Genomics Terence Critchlow Center for Applied Scientific Computing Lawrence Livermore National Laboratory.


1 SDM center: Supporting Heterogeneous Data Access in Genomics
Terence Critchlow, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
March 2002

2 Outline
- Motivation
- Approach
- Specific use cases
- Introduction to others

3 Motivated by current state of the art in genomics data access
The user is required to perform all data management tasks. Different users end up doing the same thing.
[Diagram: user applications work directly against the source-specific schemas of dbEST, SCoP, SWISS-PROT, and PDB. Each application must handle four tasks itself: access the data, parse input/output, map similar concepts, and transform data format.]

4 What is the ideal environment?
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
[Diagram: user applications sit on top of a single layer that handles the four tasks: access the data, parse input/output, map similar concepts, and transform data format.]

5 What is a realistic environment?
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
[Diagram: same as slide 4, with "the ideal" replaced by "a realistic".]

6 SDM Center Data Integration Infrastructure
[Architecture diagram. Labels: Matt; GUI (x2); Query Dispatch and Collection (QDaC); Model-Based Mediator; Semantic Wrapper (x3); Metadata Registry; XWrap; DF; XPath Wrapper (x5); VIPAR Wrapper; External Tools; Medline; PDB.]
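The slide names a Query Dispatch and Collection (QDaC) component sitting above a set of per-source wrappers, but the transcript does not show how they interact. As a rough illustration only, the sketch below assumes that "dispatch and collection" means sending one query to every registered wrapper and merging what comes back. The names `Record`, `Wrapper`, `make_static_wrapper`, and `dispatch_and_collect` are hypothetical, and the in-memory lists merely stand in for sources such as Medline and PDB; none of this is taken from the SDM Center code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """Hypothetical result type: one hit returned by one source."""
    source: str
    payload: str


# In this sketch a "wrapper" is any function mapping a query string to records.
Wrapper = Callable[[str], List[Record]]


def make_static_wrapper(source: str, rows: List[str]) -> Wrapper:
    """Build a toy wrapper over an in-memory list (stand-in for a real source)."""
    def wrapper(query: str) -> List[Record]:
        # Case-insensitive substring match plays the role of the source's search.
        return [Record(source, row) for row in rows if query.lower() in row.lower()]
    return wrapper


def dispatch_and_collect(query: str, wrappers: Dict[str, Wrapper]) -> List[Record]:
    """Send the same query to every registered wrapper and merge the results."""
    collected: List[Record] = []
    for name in sorted(wrappers):  # sorted for a deterministic result order
        collected.extend(wrappers[name](query))
    return collected


# Toy registry; the row contents are invented for the example.
registry = {
    "Medline": make_static_wrapper("Medline", ["kinase review", "zinc finger survey"]),
    "PDB": make_static_wrapper("PDB", ["kinase structure", "protease structure"]),
}
results = dispatch_and_collect("kinase", registry)
for record in results:
    print(record.source, "->", record.payload)
```

Keeping each wrapper behind the same call signature is what lets the dispatcher stay unaware of source-specific formats, which is the separation the diagram appears to draw between QDaC and the wrappers.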

7 Use case 1: Find everything related to a sequence
The more sources queried, the more valuable the results. Unfortunately, Matt cannot query all of the relevant data sources.
Provide access to many more sources than Matt currently has.
[Diagram: Matt submits the sequence MILLAFSSGRRLDFVHRSGVF FFQTLLWILCATVCGTEQYFN to Blast, which runs against many sources.]

8 Use case 1: Find everything related to a sequence
Additional Desired Capabilities
- Handle hundreds of sequences
- Search using other tools
- Preprocess sequence(s)
- Use results as input to other tools and queries
[Diagram: Matt, Blast, and many sources.]

9 Use case 2: Identifying model sequences
[Workflow diagram. Labels, in the order they appear in the transcript: Matt; MILLAFSSGRRLDFVHRSGVF FFQTLLWILCATVCGTEQYFN; Hundreds of sequences; Clusfavor; Gene name / accession #; Genbank; Sequence; Blast against HTGS; Model builder; Homologs; Filter; Subseq to 2000bp; Accession #; Transfac; Sequence; Model sequence.]
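The slide for use case 2 describes a multi-step workflow in which the output of one tool becomes the input of the next. The transcript does not preserve the arrows, so the sketch below shows only the general chaining pattern, with two toy steps. Everything in it is an assumption made for illustration: `run_pipeline`, `lookup_sequences`, `trim_to_2000bp`, and the `FAKE_GENBANK` table are hypothetical, no real tool is called, and reading "Subseq to 2000bp" as "keep at most the first 2000 bases" is a guess.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

# A step takes a list of strings (accessions or sequences) and returns a new list.
Step = Callable[[List[str]], List[str]]


def run_pipeline(steps: Sequence[Step], items: List[str]) -> List[str]:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, items)


# Hypothetical accession -> sequence table standing in for a Genbank lookup.
FAKE_GENBANK: Dict[str, str] = {
    "ACC1": "ACGT" * 600,  # 2400 bases
    "ACC2": "GGCC" * 10,   # 40 bases
}


def lookup_sequences(accessions: List[str]) -> List[str]:
    """Toy lookup step: return sequences for known accessions, skip unknown ones."""
    return [FAKE_GENBANK[a] for a in accessions if a in FAKE_GENBANK]


def trim_to_2000bp(sequences: List[str]) -> List[str]:
    """Toy filter step (assumed meaning): keep at most the first 2000 bases."""
    return [s[:2000] for s in sequences]


trimmed = run_pipeline([lookup_sequences, trim_to_2000bp], ["ACC1", "ACC2", "ACC3"])
print([len(s) for s in trimmed])
```

Because every step shares one signature, further steps (for example a homolog search) could be appended to the list without changing `run_pipeline`.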

10 Summary
- Matt’s current research objectives focus on Use Case 2
  - That is our current target
- Details of current status in following talks
  - Context-sensitive Service Composition for Support of Scientific Workflows (Mladen A. Vouk)
  - XWRAPComposer: A wrapper generation system for Integrating Bioinformatics Data Sources (Ling Liu)
  - Constructing Workflows by Integrating Interactive Information Sources (Amarnath Gupta)

11 Questions?

12 People
LLNL
- Terence Critchlow (lead)
Georgia Tech
- Calton Pu
- Ling Liu
- David Buttler
- Dan Rocco
- Henrique Paques
- Wei Han
SDSC
- Bertram Ludaescher
- Amarnath Gupta
- Ilkay Altintas
Agent Technology
- Tom Potok (ORNL)
- Mladen Vouk (NCSU)
Target Users
- Matt Coleman (LLNL)
- Allen Christian (LLNL)
- Phil Bourne (PDB)

13 This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.

