Presentation on theme: "The Consensus CoDing Sequence (CCDS) Database"— Presentation transcript:
1 The Consensus CoDing Sequence (CCDS) Database
Kim D. Pruitt
Mouse Genome Annotation Summit Meeting
March 12-13, 2008
2 Why is the CCDS project needed?
The Problem: Annotation of the genome sequence is essential, but beware of different interpretations!
- The availability of the human and mouse genome sequences has had a significant impact on disease and health research.
- Most scientists rely on annotation information when designing, interpreting, and evaluating research results.
- Inconsistencies in annotation results among the main public resources hamper use of this important data.
- Researchers may not realize that a different annotation result is available elsewhere, possibly leading to erroneous or incomplete interpretations.
3 CCDS - A collaborative project
- Initiated by the main public annotation/browser groups to address concerns by the scientific community about inconsistencies in the human and mouse genome annotation.
- Built by consensus among the collaborating members, which include:
  - European Bioinformatics Institute (EBI)
  - National Center for Biotechnology Information (NCBI)
  - University of California, Santa Cruz (UCSC)
  - Sanger Institute (WTSI)
4 What is the CCDS project?
Project Goals:
- Identify a core set of protein-coding genes that are consistently annotated and of high quality
- Support convergence toward a standard set of gene annotations
Scope:
- Human and mouse protein-coding regions
Update frequency:
- Variable
- Depends on frequency of genome annotation updates
5 Process flow - calculating updates
[Flow diagram; labels recovered from the slide:]
- Annotation sources: NCBI (computational), RefSeq (manual), Ensembl (computational), Havana (manual)
- Ensembl merged annotation
- Compare CDS (annotation + sequence): Identical, Similar, Novel
- QA
- Existing CCDS: Retain, Lost
- New match: New CCDS ID
- Out of scope
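The comparison step in the process flow can be sketched roughly as follows. This is a minimal illustration, not the collaboration's actual pipeline: the `CDS` type and `compare_cds` function are hypothetical names, and the slide does not define "similar" or "novel", so the sketch assumes "similar" means overlapping coding exons on the same strand with non-identical coordinates, and "novel" means no overlap. It compares coordinates only, whereas the slide indicates that sequence is compared as well.

```python
from typing import NamedTuple, Tuple


class CDS(NamedTuple):
    """Hypothetical container for one annotated coding sequence."""
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # sorted (start, end) coding intervals, inclusive


def compare_cds(a: CDS, b: CDS) -> str:
    """Classify b relative to a as 'identical', 'similar', or 'novel'.

    Assumed definitions (not stated on the slide):
      identical: same chromosome, strand, and exon coordinates
      similar:   same chromosome and strand, some exon overlap
      novel:     anything else
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return "novel"
    if a.exons == b.exons:
        return "identical"
    overlaps = any(
        s1 <= e2 and s2 <= e1
        for s1, e1 in a.exons
        for s2, e2 in b.exons
    )
    return "similar" if overlaps else "novel"


if __name__ == "__main__":
    ref = CDS("chr1", "+", ((100, 200), (300, 400)))
    alt = CDS("chr1", "+", ((100, 200), (310, 400)))
    print(compare_cds(ref, ref), compare_cds(ref, alt))
```

Only CDSs classified as identical would be candidates for CCDS status under the conservative rule described in the deck; the other classes would need curator attention.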
6 Assessing Quality
CCDS status is conservatively applied:
- Annotated CDS coordinates are identical
- Annotation is of high quality and passes QA tests, or curator review
- Existing CCDS proteins can be flagged for review by the collaborating members
- Updates and removals are by consensus agreement
Quality assessment tests include:
- Consensus splice sites ("GY..AG" or "AT..AC")
- Valid start and stop codons with no internal stops
- NMD (nonsense-mediated decay)
- Low complexity
- Repeat-containing
- Insufficient protein homology
- Genome conservation
- Putative pseudogene
QA test results are reviewed by curators:
- Overrides are set to retain supported CDSs
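Two of the QA tests listed above (consensus splice sites, and valid start and stop codons with no internal stops) are simple enough to sketch in code. The function names are hypothetical, and the sketch makes assumptions the slide does not state: a "valid start" is taken to mean ATG, the CDS length is expected to be a multiple of three, and sequences are given on the coding strand. It is an illustration of the checks, not the collaboration's QA implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_codon_problems(cds: str) -> list:
    """Return a list of codon-level problems found in a spliced CDS sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")  # assumed check
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":  # assumption: only ATG counts as valid
        problems.append("no ATG start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


def has_consensus_splice_sites(intron: str) -> bool:
    """True if the intron starts/ends with GY..AG (Y = C or T) or AT..AC."""
    intron = intron.upper()
    if len(intron) < 4:
        return False
    donor, acceptor = intron[:2], intron[-2:]
    return (donor in ("GT", "GC") and acceptor == "AG") or (
        donor == "AT" and acceptor == "AC"
    )


if __name__ == "__main__":
    print(cds_codon_problems("ATGAAATAA"))
    print(has_consensus_splice_sites("GTAAGTTTTCAG"))
```

As the slide notes, a failed test is reviewed by curators and can be overridden, so results like these would flag a CDS for review rather than reject it outright.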
8 Curation - how are updates curated and coordinated?
- Any member of the collaboration can flag a CCDS for review, to:
  - Update the CDS definition (alter the N-terminus extent or an internal splice site)
  - Withdraw the CCDS ID (insufficiently supported, or non-protein coding)
- NCBI provides a collaboration web site to coordinate this review
- All collaborators must agree with a change to finalize a decision
- Withdrawal of a CCDS may happen between genome annotation updates
- An update to a CCDS is indicated by:
  - Status change: a status of ‘pending update’ is reported when there is collaborative agreement that a change is needed
  - Version change: the CCDS version number is incremented once the change is reflected in public annotation. This occurs only after a genome annotation update and CCDS analysis have taken place.
- CCDS curation is fully integrated with RefSeq curation
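The status and version rules described above can be modeled as a small state machine. This is a sketch under assumptions: the `CcdsRecord` class, its method names, the placeholder identifier, and the "public" status label are all hypothetical; only the ‘pending update’ status, the unanimous-agreement rule, and the version increment after a genome annotation update come from the slide.

```python
from dataclasses import dataclass

# Collaborating members as listed in the deck.
COLLABORATORS = frozenset({"EBI", "NCBI", "UCSC", "WTSI"})


@dataclass
class CcdsRecord:
    ccds_id: str              # placeholder identifier, not a real CCDS ID
    version: int = 1
    status: str = "public"    # hypothetical label for the normal state

    def propose_update(self, approvals) -> bool:
        """Mark the record 'pending update' only if every collaborator agrees."""
        if COLLABORATORS <= set(approvals):
            self.status = "pending update"
            return True
        return False

    def on_annotation_update(self) -> None:
        """Apply a pending change once a genome annotation update has run."""
        if self.status == "pending update":
            self.version += 1
            self.status = "public"


if __name__ == "__main__":
    rec = CcdsRecord("CCDS_EXAMPLE")
    rec.propose_update({"EBI", "NCBI", "UCSC", "WTSI"})
    rec.on_annotation_update()
    print(rec)
```

Keeping the status change separate from the version change mirrors the slide: agreement can be reached at any time, but the version only moves when the change appears in public annotation.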
21 1 vs. 2 vs. ‘n’ genes
Current RefSeq representation of the region:
- two protein-coding loci
- one non-coding locus for the non-coding transcript product (a read-through transcript)
Chkb (CCDS ), Cpt1b (CCDS ), Chkb-cpt1b (PMID: )
22 Translation start site
Curation Considerations:
- Publication reports (CDS begins at ‘n’)
- Other cDNA sequencing reveals the ORF can be extended further upstream
Evaluate:
- Genome conservation
- Literature reports for the protein
- Putative Kozak signals
- Presence of in-frame upstream stop codon
- INSDC submissions from an experimental lab source that do have the longer ORF extent annotated
Consult with an expert
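Two of the evaluation criteria above, an in-frame upstream stop codon and a putative Kozak signal, can be checked directly on a transcript sequence. The sketch below uses hypothetical function names and assumes a commonly used simplification of the Kozak context (a purine at position -3 and a G at position +4 relative to the A of the ATG); the slide itself does not specify how Kozak signals are scored. An in-frame stop upstream of a candidate start limits how far the ORF could be extended in that direction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stop(mrna: str, atg_pos: int):
    """Return the 0-based index of the nearest in-frame stop codon upstream
    of the start codon at atg_pos, or None if there is none."""
    mrna = mrna.upper()
    for i in range(atg_pos - 3, -1, -3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def kozak_strength(mrna: str, atg_pos: int) -> str:
    """Rate the start-codon context as 'weak', 'adequate', or 'strong'.

    Assumed scoring: one point for a purine at -3, one for G at +4.
    """
    mrna = mrna.upper()
    minus3 = mrna[atg_pos - 3] if atg_pos >= 3 else None
    plus4 = mrna[atg_pos + 3] if atg_pos + 3 < len(mrna) else None
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


if __name__ == "__main__":
    seq = "TAGGCCACCATGGCG"
    start = seq.find("ATG")
    print(upstream_inframe_stop(seq, start), kozak_strength(seq, start))
```

These checks only cover sequence-level evidence; conservation, literature reports, and INSDC submissions from the slide's list still require curator review.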