The Refgene Database.

1 The Refgene Database

2 Currently we use Google Spreadsheets to Track Reference Genes

3 Currently we use Google Spreadsheets to Track Reference Genes

4 Problems with Google Spreadsheets
- Occasional errors picking a gene previously selected
- Every group has their own spreadsheet
  - Makes cross-access difficult
  - They are not integrated in any real way
- They are hard to maintain
  - Every month they are updated by hand
  - Hand editing leads to mistakes

5 It would be Easier to Have a Database to Keep Track of Reference Genes
- Data from all reference genomes would be integrated and searchable
- They could be automatically updated
- New genes and their homologs in every species could populate automatically from Kara’s homology compute
- MODs could provide database reports to update stats on annotation progress
- Stats for pubs and grant updates could be retrieved easily

6 Proposed Interfaces
Refgene interface:
- Used to set new target genes each month
- As genes are targeted, homologs are automatically populated from P-POD clusters
MOD interface:
- Used by MOD curators
- Allows MOD curators to hand-modify individual records

7 Fields Required
- Homolog gene identifiers for a MOD
- Date that the gene was deemed comprehensively annotated
- Date the gene was chosen as a reference gene
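The required fields on this slide could be held in a record like the one below. This is only a sketch: the class name `HomologRecord`, its attribute names, and the use of `None` to mean "no homolog" or "not yet comprehensively annotated" are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HomologRecord:
    """One homolog of a reference gene in one MOD (hypothetical layout)."""
    target_gene: str                            # reference (target) gene symbol
    mod: str                                    # MOD that owns the homolog, e.g. "ZFIN"
    homolog_id: Optional[str]                   # MOD gene identifier; None = no homolog
    date_chosen: date                           # date the gene was chosen as a reference gene
    date_comprehensive: Optional[date] = None   # date deemed comprehensively annotated

    @property
    def is_comprehensive(self) -> bool:
        # A record counts as comprehensive once a date has been recorded.
        return self.date_comprehensive is not None


# The chosen date here is made up for the example.
record = HomologRecord("POLA", "ZFIN", "pola", date_chosen=date(2006, 1, 1))
print(record.is_comprehensive)
```

Keeping the two dates as separate optional fields lets reports distinguish "targeted but not yet curated" from "comprehensively annotated" without a separate status column.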

8 Functionality that should be on the Ref. Gen. Curation Home Page
- Log in
- Enter new curation target genes
- Homologs:
  - Upload orthologs for those genes
  - Manually add orthologs
  - Enter 'curation status'
- Generate reports

9 The Database Should also Accept Automated Loads from MODs
- Curation status
- Curation target genes
[Example load table: a "MOD ID" column with three SGD entries]

10 1. Curator log in - 1
Once logged in, the curator name and MOD can be assumed anywhere that data is needed. Login also keeps random folks from editing. Should it also restrict editing to homologs from the species you curate? We all still need to be able to add new target genes regardless of our MOD, however.
Page mockup: Curator Login To Curation Central
  Login:
  Password:
  [Login]
Reference Genomes Snapshot
  Target genes: 275
  Species          Homologs   No homolog   Comprehensive
  A.thaliana       220        10           190
  C.elegans        200        30           189
  D.rerio          175        35           170
  D.discoideum     250        7
  D.melanogaster   185        19           160
  E.coli           100        25           98
  G.gallus         120
  H.sapiens        274        2
  Etc…
  86%

11 1. Curator log in - 2
Page mockup: Curation Targets
Logged in: doughowe | ZFIN
  Gene Name   Completion target date   homologs              Report
              [can edit]               View/enter homologs   View Curation Report
Upload New Targets
  Target Completion Date (MM-YYYY):
  Targets TAB file: [Browse] [Upload]
Upload orthologs
  Select month: [pull-down list]
Search Targets [view all]
  Gene Symbol or Entrez Gene ID: [Search]
  Note: A search result takes you to the homolog Add/Update page shown on the bottom of slide 13 for the specific target gene located in the search. If the search locates more than one gene, it shows a list of these genes, each linking to the homolog View/Add/Update page for that gene.
Reports:
  - TO DO List (not comprehensive)
  - ISS could be added
  - Potential outliers
View reports
  by gene: enter gene Symbol or MOD ID
  by organism: [drop down list] [GO]
  Note: Access to reports could be done a couple of ways.

12 2. Upload New Targets - 1
- Check that all required data is included
- Check that no genes on the new targets list have been a target or called as homologs previously. If so, reject the load and alert the curator to any/all duplications so they can select new targets and try again.
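The two checks on this slide might look something like the following. It is a minimal sketch under stated assumptions: the function name `check_new_targets`, the row keys `gene` and `target_date`, and the choice of "gene identifier plus completion target date" as the required data are all hypothetical, and identifiers are compared case-insensitively for simplicity.

```python
def check_new_targets(rows, known_targets, known_homologs):
    """Return a list of problems; an empty list means the load can be accepted.

    rows: list of dicts with hypothetical keys "gene" and "target_date".
    known_targets / known_homologs: identifiers already in the database.
    """
    taken = {g.upper() for g in known_targets} | {g.upper() for g in known_homologs}
    seen_in_load = set()
    problems = []
    for number, row in enumerate(rows, start=1):
        gene = (row.get("gene") or "").strip()
        target_date = (row.get("target_date") or "").strip()
        # Check 1: all required data is included.
        if not gene or not target_date:
            problems.append(f"row {number}: missing gene identifier or completion target date")
            continue
        key = gene.upper()
        # Check 2: the gene was not previously a target or a homolog.
        if key in taken:
            problems.append(f"row {number}: {gene} has already been selected")
        elif key in seen_in_load:
            problems.append(f"row {number}: {gene} is listed twice in this load")
        seen_in_load.add(key)
    return problems


load = [
    {"gene": "RNR1", "target_date": "08-2006"},
    {"gene": "CDC2", "target_date": "08-2006"},
]
print(check_new_targets(load, known_targets={"CDC2"}, known_homologs=set()))
```

Because the function collects every problem before returning, the curator can be alerted to all duplications at once and the whole load rejected if the list is non-empty.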

13 2. Upload New Targets - 2
Option 1: enter targets in an editable table
  Gene Name             MOD ID            Completion target date
  [edit/autofill] (A)   [edit/autofill]
  [Upload]
Option 2: upload a TAB file
  Upload New Targets
  Target Completion Date (MM-YYYY):
  Upload Targets TAB file: [Browse] [Upload]
  Curator enters the target date here, and that info is applied to the new table load file
  Gene Name   MOD ID   Completion target date
  RNR1        SGD
  CDC2        SGD
  ADE4        SGD
  POL6        SGD
  [Upload]
Notes:
(A) We need to be able to enter either an ID or a gene name (in the same or in a different column)
(B) Need a check for genes already selected
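Option 2 above, where the curator types one target date and it is applied to every row of the uploaded TAB file, could be sketched as follows. The column order (gene name, then MOD ID), the function name `read_targets_tab`, and the output keys are assumptions; per note (A), a row may carry a gene name, a MOD ID, or both. The ID values in the example are placeholders.

```python
import csv
import io
import re


def read_targets_tab(text, target_date):
    """Parse tab-separated target rows and stamp each with the curator's date.

    Assumed layout: column 1 = gene name, column 2 = MOD ID (either may be empty).
    """
    # The form asks for MM-YYYY, so reject anything else up front.
    if not re.fullmatch(r"(0[1-9]|1[0-2])-\d{4}", target_date):
        raise ValueError("target date must be MM-YYYY")
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        fields = [f.strip() for f in fields]
        if not any(fields):
            continue  # skip blank lines
        rows.append({
            "gene_name": fields[0],
            "mod_id": fields[1] if len(fields) > 1 else "",
            "target_date": target_date,
        })
    return rows


# Placeholder IDs; the second row has only an ID, the third only a name.
sample = "RNR1\tID1\n\tID2\nADE4\n"
for row in read_targets_tab(sample, "08-2006"):
    print(row)
```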

14 2. Upload New Targets - 3
  Gene Name   MOD ID   Completion target date
  RNR1        SGD
  CDC2
  ADE4
  POL6
  etc…
  [Upload]
Message on success: Your upload was successful!
Message on error: Error: CDC2 (SGD ) has already been selected. Please go back and replace this entry.

15 3. Upload orthologs - 1
Logged in: doughowe | ZFIN
Upload New Targets
  Target Completion Date (MM-YYYY):
  Targets TAB file: [Browse] [Upload]
Upload orthologs
  Select month: [pull-down list] [GO]
Search Targets [view all]
  Gene Symbol or Entrez Gene ID: [Search]
Reports:
  - TO DO List (not comprehensive)
  - ISS could be added
  - Potential outliers
View reports
  by gene: enter gene Symbol or MOD ID
  by organism: [drop down list] [GO]

16 3. Orthologs
Upload orthologs: loads calculated data from P-POD and other available methods
Select month: [GO]
  S. cerevisiae   E. coli             H. sapiens    D. discoideum    D. rerio       D. melanogaster   M. musculus     etc
  RNR1|SGD        nrdA|P00832         RNR1|P34234   rnrA|DDB098908   rnra|DR23423   rnrA|DM8787       RNR1|MGI:2332
  CDC2|SGD        no ortholog found   PK12          cdc22
  ADE4|SGD        adeS|P32434
  POL6|SGD        polR|P3434
  etc…

17 Clicking View/Enter homologs goes to new Target gene-specific page to edit/add homologs
  Target Gene Symbol   Completion target date   Homologues            Report
  POLA P09884          Aug 2006                 View/enter homologs   View Curation Report
  ADH1A P07327         Nov 2007
  GOT1 P17174          Jan 2008
  GOT2 P00505
View/Enter/Edit Homolog page: Target Gene: POLA (H.Sap)
Homology Determination Methods: Published homolog, Manual Analysis, HomoloGene, InParanoid, TreeFam, OrthoMCL, P-POD
  Organism   Gene Symbol   Comprehensive curation (date)   Curator
  D.mel      pol1a         1/20/2008                       [Curator name] [note]
  D.rer      pola          -
  M.mus      No homolog    4/10/2007
  C.ele
Entry row: Species PD | curator enters symbol | curation date | curator name PD | [note] | [Add]
This is an interface to support adding individual homologs to specific target genes as well as editing previously added homologs. More detail on the next slide.

18 View/Enter/Edit Homolog page: Target Gene: POLA (H.Sap)
Homology Determination Methods: Published homolog, Manual Curation, HomoloGene, InParanoid, TreeFam, OrthoMCL, P-POD
Entry row: Organism | Gene (curator enters symbol/ID) | Comprehensive curation (date) (curation date) | Curator (curator name) | [note] | [Submit]
Organism list: H. sapiens, C. elegans, D. discoideum, D. melanogaster, E. coli, M. musculus, R. norvegicus, S. pombe, Etc…
Notes on the fields:
- Homology Determination Methods: A checkbox set to assert which homology determination methods were used to support the homology call. The full set may be larger than shown here.
- Organism: pull down menu, OR organism already chosen based upon the curator log in name.
- Gene: Curators enter gene symbols OR MOD object IDs. TAIR gene symbols are not unique.
- Curator: pull down menu listing curators. If user login is supported, this column could probably be dropped and the curator assumed behind the scenes from the login. The currently logged in user is displayed at the top of the page like this: Logged In: dhowe|ZFIN.
- Notes: used to describe anything specific about the orthology call.
- Submit: Clicking submit adds a new homolog to the DB after checking that the gene has not already been added as a target gene or homolog.

19 4. Reports
View curation report: POLA
Options: select all/unselect all
  Organism                    Homologue       Date comprehensively curated
  Arabidopsis thaliana        AT541
  Caenorhabditis elegans      CE1121
  Danio rerio                 CD2121 CD2122
  Dictyostelium discoideum    no homolog
  Drosophila melanogaster     DM0022
  Escherichia coli
  Gallus gallus               GGAA11
  Homo sapiens                P09884
  Mus musculus                MGI:
  Rattus norvegicus           RGD:
  Saccharomyces cerevisiae
  Schizosaccharomyces pombe
Options: select all/unselect all
View annotations: * all * non-IEA * experimental
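The three "View annotations" choices on the report page could be implemented as a filter over GO evidence codes. The sketch below assumes each annotation is a dict with an `evidence` key, which is a hypothetical layout. The experimental set uses the GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); any further codes the project wants to count as experimental would need to be added.

```python
# GO experimental evidence codes; IEA is "Inferred from Electronic Annotation".
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_annotations(annotations, view="all"):
    """Return the annotations matching one of the report's three views."""
    if view == "all":
        return list(annotations)
    if view == "non-IEA":
        return [a for a in annotations if a["evidence"] != "IEA"]
    if view == "experimental":
        return [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]
    raise ValueError(f"unknown view: {view}")


# Made-up annotations for one homolog; term names are placeholders.
annotations = [
    {"gene": "DM0022", "term": "term-a", "evidence": "IEA"},
    {"gene": "DM0022", "term": "term-b", "evidence": "ISS"},
    {"gene": "DM0022", "term": "term-c", "evidence": "IDA"},
]
for view in ("all", "non-IEA", "experimental"):
    print(view, len(filter_annotations(annotations, view)))
```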

