GeneConnect Use Cases and Design August 3, 2006. GeneConnect Database IDs are linked by Direct Annotation, Inferred Annotation, or Sequence Alignment.


1 GeneConnect Use Cases and Design August 3, 2006

2 GeneConnect Database IDs are linked by Direct Annotation, Inferred Annotation, or Sequence Alignment
[Diagram: database nodes grouped by level]
Gene: Ensembl Gene, UniGene, Entrez Gene
mRNA: GenBank mRNA (no RefSeq), Ensembl Transcript, RefSeq mRNA
Protein: Ensembl Protein, GenBank Protein (no RefSeq), RefSeq Protein, UniProtKB

3 GeneConnect UML Model Genomic Identifier Standard CDEs

4 Basic Genomic ID Search
Find all of the other gene IDs (UniGene, Ensembl Gene) that correspond to Entrez Gene ID A1.
Find the Ensembl Gene and Ensembl Transcript IDs that correspond to Entrez Gene ID A1.
Entrez Gene ID | Ensembl Gene ID | Ensembl Transcript ID
A1 | B1 | C1
A1 | B1 | C2
A1 | B2 | C3

5 Basic Genomic ID Search
Search on one or more attributes within a gene, mRNA, or protein class and return the results of that search as a list of objects of the same class.
Traverse the model to get data from the other classes.
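The basic search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IdLink` record and the `find_links` helper are hypothetical names, not part of the GeneConnect model, and A1, B1, C1 and so on are the placeholder IDs from the Entrez Gene A1 example rather than real accessions.

```python
from typing import NamedTuple


class IdLink(NamedTuple):
    """One row linking IDs across three databases (hypothetical record type)."""
    entrez_gene: str
    ensembl_gene: str
    ensembl_transcript: str


# Placeholder rows taken from the example table; not real accessions.
LINKS = [
    IdLink("A1", "B1", "C1"),
    IdLink("A1", "B1", "C2"),
    IdLink("A1", "B2", "C3"),
]


def find_links(links, **criteria):
    """Return the rows whose named fields equal the given values."""
    return [
        row for row in links
        if all(getattr(row, field) == value for field, value in criteria.items())
    ]


# Search on one attribute, then read the other columns of the matching rows.
rows = find_links(LINKS, entrez_gene="A1")
ensembl_genes = sorted({row.ensembl_gene for row in rows})
ensembl_transcripts = sorted({row.ensembl_transcript for row in rows})
```

Searching on a different attribute, for example `find_links(LINKS, ensembl_gene="B2")`, works the same way.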

6 GeneConnect UML Model
Limit result set by confidence score, ONT, and link type

7 Limit Query Based on Confidence
Find the Ensembl Gene and Ensembl Transcript IDs that correspond to Entrez Gene ID A1 and where the result set has a confidence score > 0.5.
Entrez Gene ID | Ensembl Gene ID | Ensembl Transcript ID | Confidence
A1 | B1 | C1 | 0.7
A1 | B1 | C2 | 0.2
A1 | B2 | C3 | 0.1

8 Limit Query Based on Confidence
Search on one or more attributes within a gene, mRNA, or protein class with a given or higher confidence score (from GenomicIdentifierSet).
Traverse the model to get data from the other classes.
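A confidence filter of this kind might look like the sketch below. The `IdSetRow` class is a hypothetical stand-in for one result row, not the GenomicIdentifierSet class itself, and the strict `>` comparison follows the "confidence score > 0.5" example; an inclusive threshold would use `>=` instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdSetRow:
    """Hypothetical result row carrying a confidence score."""
    entrez_gene: str
    ensembl_gene: str
    ensembl_transcript: str
    confidence: float


# Placeholder rows from the confidence example; not real accessions.
ID_SETS = [
    IdSetRow("A1", "B1", "C1", 0.7),
    IdSetRow("A1", "B1", "C2", 0.2),
    IdSetRow("A1", "B2", "C3", 0.1),
]


def above_confidence(id_sets, entrez_gene, threshold):
    """Keep rows for one Entrez Gene ID whose confidence exceeds the threshold."""
    return [
        row for row in id_sets
        if row.entrez_gene == entrez_gene and row.confidence > threshold
    ]


confident = above_confidence(ID_SETS, "A1", 0.5)
```

With the example data, only the A1, B1, C1 row (confidence 0.7) passes the 0.5 threshold.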

9 Limit Query Based on Order of Node Traversal (ONT)
Find the Ensembl Gene and Ensembl Transcript IDs that correspond to Entrez Gene ID A1 and where the ONT is Entrez Gene -> Ensembl Gene -> Ensembl Transcript.
Entrez Gene ID | Ensembl Gene ID | Ensembl Transcript ID | Confidence | ONT
A1 | B1 | C1 | 0.7 | EnzG -> EnsG -> EnsT
A1 | B1 | C2 | 0.2 | EnzG -> RefSeqT -> EnsT -> EnsG
A1 | B2 | C3 | 0.1 | EnzG -> RefSeqT -> EnsT -> EnsG

10 Limit Query Based on Order of Node Traversal (ONT)
Search on one or more attributes within a gene, mRNA, or protein class with a given ONT.
Traverse the model to get data from the other classes.
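Filtering by ONT amounts to comparing each row's traversal path with the requested one. The sketch below assumes the ONT is stored as an ordered tuple of node abbreviations (EnzG, EnsG, EnsT, RefSeqT, as in the ONT example); `parse_ont` and `with_ont` are hypothetical helper names.

```python
# (Entrez Gene, Ensembl Gene, Ensembl Transcript, confidence, ONT)
# Placeholder rows from the ONT example; not real accessions.
ROWS = [
    ("A1", "B1", "C1", 0.7, ("EnzG", "EnsG", "EnsT")),
    ("A1", "B1", "C2", 0.2, ("EnzG", "RefSeqT", "EnsT", "EnsG")),
    ("A1", "B2", "C3", 0.1, ("EnzG", "RefSeqT", "EnsT", "EnsG")),
]


def parse_ont(text):
    """Turn 'EnzG -> EnsG->EnsT' into ('EnzG', 'EnsG', 'EnsT')."""
    return tuple(part.strip() for part in text.split("->"))


def with_ont(rows, ont):
    """Keep rows whose traversal order matches the requested ONT exactly."""
    wanted = parse_ont(ont)
    return [row for row in rows if row[4] == wanted]


matches = with_ont(ROWS, "EnzG -> EnsG->EnsT")
```

Because the comparison is on the whole ordered tuple, two paths that visit the same nodes in a different order count as different ONTs.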

11 Limit Query By Node Traversal
Find the Ensembl Gene and Ensembl Transcript IDs that correspond to Entrez Gene ID A1 but use only Ensembl Gene and Ensembl Transcript for traversal.
Entrez Gene ID | Ensembl Gene ID | Ensembl Transcript ID | Confidence | ONT
A1 | B1 | C1 | 1.0 | EnzG -> EnsG -> EnsT

12 Limit Query By Node Traversal
Search on one or more attributes within a gene, mRNA, or protein class with a given set of nodes for traversal.
Traverse the model to get data from the other classes.
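One way to picture a traversal restricted to a set of nodes is a path search over a graph of ID links. Everything in this sketch is an assumption made for illustration: the directed edge list, the RefSeq mRNA placeholders R1 and R2, and the `paths_to` helper are hypothetical, and confidence scoring is left out.

```python
from collections import defaultdict

# Directed links between (database, ID) nodes. R1 and R2 are made-up
# RefSeq mRNA placeholders; the other IDs come from the example tables.
EDGES = [
    (("EnzG", "A1"), ("EnsG", "B1")),
    (("EnsG", "B1"), ("EnsT", "C1")),
    (("EnzG", "A1"), ("RefSeqT", "R1")),
    (("RefSeqT", "R1"), ("EnsT", "C2")),
    (("EnsT", "C2"), ("EnsG", "B1")),
    (("EnzG", "A1"), ("RefSeqT", "R2")),
    (("RefSeqT", "R2"), ("EnsT", "C3")),
    (("EnsT", "C3"), ("EnsG", "B2")),
]

GRAPH = defaultdict(list)
for source, destination in EDGES:
    GRAPH[source].append(destination)


def paths_to(graph, start, target_db, allowed_dbs):
    """Return simple paths from start to any node of target_db,
    stepping only onto nodes whose database is in allowed_dbs."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node[0] == target_db:
            found.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt[0] in allowed_dbs and nxt not in path:
                stack.append(path + [nxt])
    return found


restricted = paths_to(GRAPH, ("EnzG", "A1"), "EnsT", {"EnzG", "EnsG", "EnsT"})
unrestricted = paths_to(
    GRAPH, ("EnzG", "A1"), "EnsT", {"EnzG", "EnsG", "EnsT", "RefSeqT"}
)
```

With RefSeq mRNA excluded, only the path through Ensembl Gene B1 to Ensembl Transcript C1 remains; allowing it brings back the paths to C2 and C3.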

13 GeneConnect UML Model
Limit result set by ID frequency

14 Limit Query by ID Frequency
Find the Ensembl Gene and Ensembl Transcript IDs that correspond to Entrez Gene ID A1 and that have a frequency of at least 0.5.
Entrez Gene ID | Ensembl Gene ID | Ensembl Transcript ID | Confidence
A1 | B1 | C1 | 0.7
A1 | B1 | C2 | 0.2
A1 | B2 | C3 | 0.1
Genomic ID | Frequency
A1 | 1
B1 | 0.67
B2 | 0.33
C1 | 0.33
C2 | 0.33
C3 | 0.33

15 Limit Query by ID Frequency
Search on one or more attributes within a gene, mRNA, or protein class with a given set of minimum ID frequencies.
Traverse the model to get data from the other classes.
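The frequencies in the example are consistent with counting, for each ID, the fraction of result rows in which it appears. That reading is an assumption, as are the helper names `id_frequencies` and `frequent_ids` in this sketch.

```python
from collections import Counter

# Placeholder result rows from the frequency example; not real accessions.
ROWS = [
    ("A1", "B1", "C1"),
    ("A1", "B1", "C2"),
    ("A1", "B2", "C3"),
]


def id_frequencies(rows):
    """Map each ID to the fraction of rows in which it appears."""
    counts = Counter(identifier for row in rows for identifier in set(row))
    return {identifier: count / len(rows) for identifier, count in counts.items()}


def frequent_ids(rows, minimum):
    """Return the IDs whose frequency is at least the given minimum."""
    return sorted(
        identifier
        for identifier, frequency in id_frequencies(rows).items()
        if frequency >= minimum
    )


frequencies = {k: round(v, 2) for k, v in id_frequencies(ROWS).items()}
kept = frequent_ids(ROWS, 0.5)
```

Under this reading, a minimum frequency of 0.5 keeps A1 and B1 and drops B2 and all three transcript IDs.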

16 GC Architecture Diagram
[Diagram; only the labels are recoverable]
Clients: Web browser, Java Apps
Interfaces: Web Server, API, caCORE API, caGRID API (HTTP request, Objects)
Server components: GeneConnect Server, Job Manager, AnnotationParser Library, Data Downloader Thread, Data file queue, Data Transformer Thread, Parsed data file queue, Database Loader Thread, GeneConnect Database
External components: Public Data Sources (UniGene, Ensembl), XMLRPC Server (for BLAST), External Parsers
Flow labels: Spawn new thread; Download Data File using FTP, HTTP API; Push Downloaded file in queue; Consume downloaded File; Transformed data file; Consume parsed file; Write data to GeneConnect database; Correlate Genomic Identifiers
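The downloader, transformer, and loader threads and the two file queues named in the diagram suggest a producer-consumer pipeline. The sketch below is a minimal Python illustration of that pattern, not the GeneConnect server code: all function names are hypothetical, the download and parse steps are string stand-ins, and a plain list plays the role of the database.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream on each queue


def downloader(sources, downloaded):
    """Stand-in for fetching each data file over FTP or HTTP."""
    for name in sources:
        downloaded.put(f"{name}.raw")
    downloaded.put(SENTINEL)


def transformer(downloaded, parsed):
    """Consume downloaded files and emit parsed files."""
    while (item := downloaded.get()) is not SENTINEL:
        parsed.put(item.replace(".raw", ".parsed"))  # stand-in for a parser run
    parsed.put(SENTINEL)


def loader(parsed, database):
    """Consume parsed files and write them to the (stand-in) database."""
    while (item := parsed.get()) is not SENTINEL:
        database.append(item)


def run_pipeline(sources):
    downloaded, parsed, database = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=downloader, args=(sources, downloaded)),
        threading.Thread(target=transformer, args=(downloaded, parsed)),
        threading.Thread(target=loader, args=(parsed, database)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return database


loaded = run_pipeline(["unigene", "ensembl"])
```

With one thread per stage and FIFO queues, files reach the loader in the order they were downloaded; the sentinel value lets each stage shut down cleanly once its input is exhausted.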

17 Design principles
Extensible annotation server, reused from the caFunctionExpress code base.
Ability to add new parsers without making any code change to the framework.
Parsers can be written in any language and plugged into the framework.
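One common way to let parsers be written in any language is to treat each parser as an external command and look it up in a registry, so adding a parser is a configuration change rather than a framework change. The sketch below illustrates that idea only; the registry, `register_parser`, and `run_parser` are hypothetical and are not taken from the caFunctionExpress or GeneConnect code.

```python
import subprocess
import sys

# Hypothetical registry: data source name -> command line of an external parser.
PARSERS = {}


def register_parser(name, command):
    """Register an external parser without touching the framework code."""
    PARSERS[name] = list(command)


def run_parser(name, input_path):
    """Run the registered parser on a file path and return its standard output."""
    result = subprocess.run(
        PARSERS[name] + [input_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Demo parser: a one-line script that just echoes its argument in upper case.
# A real parser could be any executable that accepts a file path.
register_parser(
    "demo",
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
)
output = run_parser("demo", "demo.txt")
```

Because the framework only sees a command line and its output, the parser itself could equally be a Perl script, a shell script, or a compiled binary.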

18 Query Interface
caCORE-like API
caGrid API
- caCORE APIs will be modified/extended to implement the business logic specific to GeneConnect.
Web Interface
- Calls the caCORE APIs internally to get the results of the user query.

