Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI 2006-04846.


1 Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI 2006-04846

2 1. Some of what we’ve been doing: confirmation of predicted/hypothetical proteins in chicken. 2. Something of more interest to almost everyone in here, for analyzing your data.

3 Educate researchers who need to use GO. University of Delaware, 12-13 November, 2007. Currently working with researchers from the Universities of Delaware and Maryland to provide the GO annotations necessary to facilitate publication of array data. First residential workshop at MSU, May 20-22, 2008.

4 Avian Genome Conference 18-20 May, 2008 GO Annotation Jamboree 21-22 May, 2008 agbase@cse.msstate.edu

5

6 “Hypothetical” and “predicted” proteins. Tissues: naive and activated purified CD4+ T cells; transformed CD4+ T cells; spleen; brain tissues; bursal B and stromal cells; muscle; and serum. Method: a database of all predicted proteins from chicken build 2.1, DFF-2D LC MS2, and our computational pipeline. We experimentally confirmed 7,809 chicken predicted proteins; 52% were expressed in more than one tissue. 6,027 (77%) of these proteins mapped to human and mouse orthologs, and we assigned standardized nomenclature to 5,326 (64%). We made 8,213 GO associations to 21% of the identified chicken proteins, using the ISS evidence code to transfer function between human-chicken and human-mouse orthologs; this increased the current chicken GO annotations by 8% and doubled the number of manually curated chicken annotations. The data are in the PRIDE and NCBI databases and are being used at NCBI to promote XP (computational model) accessions to NP (confirmed product) accessions, i.e. the words “hypothetical” and “predicted” are removed. We also add experimentally derived cell component GO annotations.

7 Tissue distribution of expressed ‘predicted’ proteins. Pie chart categories: in one tissue, in two tissues, in three tissues, in four tissues, in five tissues, in six tissues, in seven tissues, in all eight tissues. Slice values, in the order extracted from the chart: 48% (3,779); 1% (61); 4% (313); 7% (561); 26% (2,020); 14% (1,073); 0% (0); 0% (2). Bar chart: number of proteins (0 to 6,000) by tissue type (Spleen, UA01, Stroma, T cells, B-cells, Serum, Muscle, Brain), split into tissue-specific proteins and proteins identified in other tissues.

8 Chicken: human/mouse orthologs (1:1). Venn diagram of mouse orthologs and human orthologs, with region counts 236, 5,685 and 106. No human or mouse orthologs: 1,784.

9

10

11 Cumulative external visits to AgBase (chart; y-axis: visits, 0 to 10,000; x-axis: month, 2005 to 2007).

12

13

14 Summary of GO annotations for the last 12 months. 11,716 GO annotations for chicken and cow: 214 cow gene products GO annotated (1,521 GO annotations); 1,762 chicken gene products GO annotated (10,194 GO annotations). In addition, orthology with human and mouse genes was used to GO annotate 7,809 computationally ‘predicted’ chicken proteins (8,213 GO annotations).

15 Annotation metrics

16

17 Database distribution of AgBase GO annotations (charts for chicken, Dec '07, and cow, Dec '07): AgBase Community file; GO Consortium file.

18 GO Annotation of Arrays

19 Functional annotation using Gene Ontology. Nomenclature (species’ genome nomenclature committees). Other annotations using other bio-ontologies, e.g. Anatomy Ontology. Structural annotation, including Sequence Ontology. Genomic annotation.

20 Quality improvement of annotations: pre-annotation vs. re-annotation.

21 GO annotation of arrays (flowchart). Inputs: array IDs; structural mapping; gene product IDs (‘known’ genes from public databases; ‘predicted’ genes from genome sequencing). Decisions (YES/NO): Is functional literature available? Are strict mammalian orthologs available? Annotation routes: GO annotation of literature; GO annotation from orthologs (ISO); electronic GO annotation using InterPro data (IEA). Outputs: collate GO annotations; submit to EBI-GOA, GOC; link to array IDs (updateable).
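The decision flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the AgBase pipeline: the `GeneProduct` record, its field names, and the function names are hypothetical, and the branch order (literature first, then orthologs, then InterPro-based IEA) is an assumption, since the transcript lists the boxes and decisions but not the arrows between them.

```python
from dataclasses import dataclass


@dataclass
class GeneProduct:
    """Hypothetical record for one gene product mapped from an array ID."""
    product_id: str
    has_functional_literature: bool
    has_strict_mammalian_ortholog: bool


def annotation_route(gp: GeneProduct) -> str:
    """Pick an annotation route. The branch order is an assumption."""
    if gp.has_functional_literature:
        return "literature"  # GO annotation of literature
    if gp.has_strict_mammalian_ortholog:
        return "ISO"         # GO annotation from orthologs
    return "IEA"             # electronic annotation using InterPro data


def collate(products):
    """Group gene product IDs by the route chosen for each of them."""
    routes = {}
    for gp in products:
        routes.setdefault(annotation_route(gp), []).append(gp.product_id)
    return routes
```

Calling `collate` on a list of such records returns a dictionary keyed by route, which stands in for the "collate GO annotations" step before submission.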

22 AgBase: annotating arrays. 1. Del-Mar 14K Chicken Integrated Systems microarray (GPL1731). 14,053 chicken genes represented; 9,587 contigs GO annotated (CC: 3,514; MF: 6,640; BP: 4,623); 3,101 singletons GO annotated (CC: 487; MF: 881; BP: 646). Many singletons map to chicken ESTs with no associated GO.

23 Figure 1A: Biological Process associated with Del-Mar 14K array. Categories: metabolic process; transport; cell communication; development; immune response; cell death; cell differentiation; response to stress; sensory perception; cell motility; regulation of biological process; cellular organization and biogenesis; behavior; response to chemical stimulus; process unknown.

24 Relative amount of GO BP associated with Del-Mar 14K array compared to total chicken GO (bar chart; value axis: array GO/total chicken GO, -6.0 to 6.0; category axis: GO Biological Processes). Processes shown: development; immune response; cell death; response to stress; process unknown; cell motility; cell differentiation; behavior; transport; regulation of biological process; sensory perception; response to chemical stimulus; secretion; cellular organization and biogenesis; response to stimulus; metabolic process; cell communication.

25 AgBase: annotating arrays. 2. TAMU Agilent 44K chicken array. Approx. 44,000 chicken genes represented; added GO annotation for 8,731 chicken gene products. Many of the array IDs with no associated GO annotation map to chicken EST sequences.

26 AgBase: annotating arrays. 3. FHCRC Chicken 13K v2.0 (GPL1836). 13,007 chicken genes represented; 2,491 array IDs mapped to chicken gene products and GO annotated; 628 mapped to chicken gene products with no GO; approx. 2,000 array IDs mapped to human or mouse gene products with GO annotation.

27 GO Annotation Quality Score: “GAQ”. GAQ is based on the number of annotations, DAG depth, and GO evidence code. Calculate an overall GAQ score for any dataset (e.g. an array); calculate GAQ for subsets (e.g. biological processes studied using arrays).
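To make the three ingredients concrete, here is a minimal sketch of one way such a score could be combined. The slide names only the inputs (number of annotations, DAG depth, evidence code), so the formula used here (sum over annotations of depth times an evidence-code weight) is an assumption, and the weights, the term labels, and the function names are hypothetical placeholders rather than the values used by AgBase.

```python
# Hypothetical evidence-code weights, for illustration only.
# The slides do not state the real ranking of evidence codes.
EVIDENCE_WEIGHT = {"IEA": 1, "ISS": 2, "IDA": 3}


def gaq_score(annotations):
    """Sum depth * evidence weight over (term, dag_depth, evidence_code) tuples.

    More annotations, deeper terms, and stronger evidence all raise the score.
    The combining formula is an assumption, not taken from the slides.
    """
    return sum(depth * EVIDENCE_WEIGHT[code] for _term, depth, code in annotations)


def mean_gaq(annotations_by_product):
    """Average the per-product scores over a dataset such as an array."""
    scores = [gaq_score(anns) for anns in annotations_by_product.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Two made-up gene products with made-up term labels and depths.
example = {
    "product_a": [("term_x", 4, "IEA")],
    "product_b": [("term_y", 6, "IDA"), ("term_z", 3, "ISS")],
}
```

Restricting `annotations_by_product` to the gene products in one biological process gives the subset score mentioned on the slide.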

28 “Gene Ontology”, “Biological Process”. Evidence codes: IEA, inferred from electronic annotation; ISS, inferred from sequence similarity; IMP, inferred from mutant phenotype; IGI, inferred from genetic interaction; IPI, inferred from physical interaction; IDA, inferred from direct assay; IEP, inferred from expression pattern; TAS, traceable author statement; NAS, non-traceable author statement; ND, no biological data available; RCA, inferred from reviewed computational analysis; IC, inferred by curator. Your Favorite Gene (low GAQ score) → Your NEW Favorite Gene (high GAQ score).

29 Quantification of re-annotation. Metrics: Granularity (# previous annotations, # re-annotations); Specificity (# chicken annotations, # human/mouse annotations). Quality: Gene Annotation Quality (GAQ) score.

30 Granularity and specificity (bar chart; number of annotations, 0 to 4,500, by annotation type: Whole Array, Chicken, Human/Mouse; pre-annotation vs. re-annotation). Chart labels: 300% increase; 50% increase; 700% increase. 13% of previous annotations to other species were corrected to chicken-specific annotations. Bart van den Berg, CVM MSU / Sue Lamont and Huaijun Zhu.

31 GAQ score summary
Metric                      Pre-annotation  Re-annotation  Fold difference
Depth                       87,250          231,184        2.7
Confidence score total      39,355          108,537        2.8
Total # proteins (Breadth)  886             4,240          4.8
Total GAQ score             207,869         579,599        2.8
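The fold differences in the GAQ score summary can be reproduced as the re-annotation value divided by the pre-annotation value. The short sketch below assumes that reading of the summary figures; the dictionary layout and the function name are illustrative only, and the slide reports the ratios to one decimal place.

```python
# (pre-annotation, re-annotation) values read from the GAQ score summary.
SUMMARY = {
    "Depth": (87_250, 231_184),
    "Confidence score total": (39_355, 108_537),
    "Total # proteins (Breadth)": (886, 4_240),
    "Total GAQ score": (207_869, 579_599),
}


def fold_difference(pre, post):
    """Ratio of the re-annotation value to the pre-annotation value."""
    return post / pre


for metric, (pre, post) in SUMMARY.items():
    print(f"{metric}: {fold_difference(pre, post):.2f}")
```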

32 Quality improvement of annotations: pre-annotation vs. re-annotation.

33 GO biological process annotations (bar chart; value axis: relative difference, microarray GO / total chicken GO, -6 to 6; category axis: GO term). Values and terms, each listed in the order extracted from the chart: cell communication, -4.88; metabolic process, -3.61; catabolic process, -1.80; transport, -0.75; regulation of biological process, -0.04; macromolecule metabolic process, 0.18; biological_process, 0.33; cell motility, 0.46; response to stimulus, 1.04; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process, 1.06; cell differentiation, 1.26; cell death, 1.64; multicellular organismal development, 5.12.

34 Modeling using the GO. Functional Understanding: Implied; Derived. Physiology (= Cellular Component + Biological Process + Molecular Function). Network Modeling; Gene Ontology; (interactions).

35 Hypothesis-driven GO-based data interrogation Buza, J. J. and S.C. Burgess. Modeling the proteome of a Marek's disease transformed cell line: a natural animal model for CD30 over-expressing lymphomas. Proteomics, 2007. 7:1316-26.

36 Avian Genome Conference 18-20 May, 2008 GO Annotation Jamboree 21-22 May, 2008 agbase@cse.msstate.edu

