
1 ESTminer CHADO adaptor
The University of Georgia
Alan Gingle, agingle@uga.edu
Yecheng Huang, yhuang@uga.edu
http://cggc.agtec.uga.edu/
Nov 1, 2004

2 Introduction
The purpose of this presentation is to draft an EST CHADO schema that is open for community comment. Examples are used to demonstrate our approach to applying CHADO to EST data.
Contents:
- ESTMiner_CHADO schema overview
- Controlled vocabulary: ontology and definition
- Feature, and its properties, relationships and location
- Appendix (example used in slides, minor tables)

3 ESTminer CHADO schema overview
The major part of CHADO that is relevant to the ESTMiner project.

4 EST Controlled vocabulary I - Ontology
Terms in the ontology diagram, written as cvterm_id: name
1: Read3', 2: Sequence, 3: Contig, 4: Cluster, 5: ESTName, 6: Library, 7: GB_ACC_#, 8: Scr1o
9: Scr1e, 10: Scr2o, 11: Scr2e, 12: QUAL16o, 13: QUAL16e, 14: QUAL20o, 15: QUAL20e, 16: GB_Access
17: Identity_threshold, 18: Length_threshold, 19: Library_name, 20: stage, 21: cultivar, 22: cell_type, 23: organ, 24: strain
25: Organism, 26: imo, 27: ipo, …
26: numofSeq, 27: numofcontig, …

5 EST Controlled vocabulary II - Definition
cvterm_id | name               | definition
1         | Read3              | 3' read
2         | sequence           | EST Sequence
3         | Contig             | EST Contig
4         | Cluster            | EST Cluster
5         | Name               | EST Name
6         | Lib                | Library
7         | GB_Access          | GenBank Access Number
8         | Scr1o              | Screen offset 1
9         | Scr1e              | Screen end 1
10        | Scr2o              | Screen offset 2
11        | Scr2e              | Screen end 2
12        | QUAL16o            | Quality16 offset
13        | QUAL16e            | Quality16 end
14        | QUAL20o            | Quality20 offset
15        | QUAL20e            | Quality20 end
16        |                    |
17        | Identity_threshold |
18        | Length_threshold   |
19        | Library_name       |
20        | stage              |
21        | cultivar           |
22        | Cell_type          |
23        | Organ              |
24        | strain             |
25        | Organism           | Organism and species
26        | imo                | Is member of
27        | ipo                | Is part of
28        | numofseq           | Number of seq
29        | numofcontig        | Number of contig
…         | …                  | …

insert into cv (cv_id, name, definition) values (1, 'CGGC_UGA', 'University of Georgia, Comparative Grass Genomic Center');
insert into cvterm (cvterm_id, cv_id, name, definition, dbxref_id) values (1, 1, 'Read5', '5'' read', 1);
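The vocabulary setup above can be tried out without a CHADO installation. The following is a minimal sketch, assuming simplified stand-ins for the cv and cvterm tables in an in-memory SQLite database (the real CHADO tables carry more columns and constraints, such as dbxref_id, which is left out here). The helper cvterm_id() is a hypothetical name introduced only for illustration. Bound parameters are used so that the apostrophe in "3' read" needs no manual escaping.

```python
import sqlite3

# Simplified stand-ins for CHADO's cv and cvterm tables (assumption:
# the real tables have additional columns such as dbxref_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    definition TEXT
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv (cv_id),
    name TEXT NOT NULL,
    definition TEXT,
    UNIQUE (cv_id, name)
);
""")

conn.execute(
    "INSERT INTO cv (cv_id, name, definition) VALUES (?, ?, ?)",
    (1, "CGGC_UGA", "University of Georgia, Comparative Grass Genomic Center"),
)

# A few terms copied from the definition table on this slide.
terms = [
    (1, "Read3", "3' read"),
    (2, "sequence", "EST Sequence"),
    (3, "Contig", "EST Contig"),
    (4, "Cluster", "EST Cluster"),
    (26, "imo", "Is member of"),
]
conn.executemany(
    "INSERT INTO cvterm (cvterm_id, cv_id, name, definition) VALUES (?, 1, ?, ?)",
    terms,
)


def cvterm_id(name):
    """Return the cvterm_id of a term in the CGGC_UGA vocabulary, or None."""
    row = conn.execute(
        "SELECT t.cvterm_id FROM cvterm t JOIN cv ON cv.cv_id = t.cv_id "
        "WHERE cv.name = ? AND t.name = ?",
        ("CGGC_UGA", name),
    ).fetchone()
    return row[0] if row else None


print(cvterm_id("Contig"))
```

Looking terms up by name rather than hard-coding numeric ids keeps later inserts readable, at the cost of one extra query per term.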

6 EST Feature
insert into feature (feature_id, uniquename, residues, seqlen, type_id, …) values (1, 'IP1_1_F11.g1_A002', 'TGAG…CATTT', 788, 1, …);

feature_id | uniquename        | residues           | seqlen | type_id
1          | IP1_1_F11.g1_A002 | TGAG…CATTT         | 788    | 1
2          | IP1               |                    |        | 6
3          | Q20_1             | TTT...TGGA         | 579    | 1
4          | Q16_1             | TTT…TTCCGAT        | 618    | 1
5          | CTGSB_100848      | Consensus residues | …      | 3
6          | CLSB_1540         |                    | …      | 4

See the example in the appendix.

7 EST Feature and Properties
feature table
feature_id | uniquename        | type_id
1          | IP1_1_F11.g1_A002 | 1 (sequence)
2          | IP1               | 6 (Library)
5          | CTGSB_100848      | 3 (contig)
6          | CLSB_1540         | 4 (cluster)
…

feature_property table: IP1_1_F11.g1_A002
featureprop_id | feature_id | type_id        | value
1              | 1          | 2 (sequence)   |
2              | 1          | 5 (ESTname)    | IP1_1_F11.g1_A002
3              | 1          | 12 (QUAL16o)   | 11
4              | 1          | 13 (QUAL16e)   | 628
5              | 1          | 14 (QUAL20o)   | 11
6              | 1          | 15 (QUAL20e)   | 589
7              | 1          | 16 (GB_Access) | BG946868

feature_property table: IP1
featureprop_id | feature_id | type_id           | value
8              | 2          | 19 (Library_name) | IP1
9              | 2          | 20                |
10             | 2          | 21 (cultivar)     | BTx623
11             | 2          | 22 (cell_type)    | N/A
12             | 2          | 23 (organ)        | Developing preanthesis pannicles
13             | 2          | 24 (strain)       | N/A
14             | 2          | 25 (Organism)     | Sorghum Bicolor L.

feature_property table: CTGSB_100848
featureprop_id | feature_id | type_id       | value
15             | 5          | 28 (numofseq) | 2

feature_property table: CLSB_1540
featureprop_id | feature_id | type_id             | value
16             | 6          | 17 (Iden_threshold) | 95
17             | 6          | 18 (Len_threshold)  | 20
18             | 6          | 28 (numofcontig)    | 1
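Reading properties back requires joining the property rows to both the feature and the vocabulary term. The sketch below illustrates that join, assuming cut-down versions of the feature, featureprop and cvterm tables in SQLite (column sets are reduced for illustration, and properties() is a hypothetical helper name). The rows are the quality and accession properties listed for IP1_1_F11.g1_A002 on this slide; values are stored as text, as they appear in the value column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table definitions (assumption: real CHADO tables have more columns).
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value TEXT
);
""")

conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [
    (12, "QUAL16o"), (13, "QUAL16e"),
    (14, "QUAL20o"), (15, "QUAL20e"),
    (16, "GB_Access"),
])
conn.execute("INSERT INTO feature VALUES (1, 'IP1_1_F11.g1_A002', 1)")
conn.executemany("INSERT INTO featureprop VALUES (?, ?, ?, ?)", [
    (3, 1, 12, "11"),
    (4, 1, 13, "628"),
    (5, 1, 14, "11"),
    (6, 1, 15, "589"),
    (7, 1, 16, "BG946868"),
])


def properties(uniquename):
    """Return {term name: value} for every property of the named feature."""
    rows = conn.execute(
        "SELECT t.name, p.value FROM featureprop p "
        "JOIN feature f ON f.feature_id = p.feature_id "
        "JOIN cvterm t ON t.cvterm_id = p.type_id "
        "WHERE f.uniquename = ? ORDER BY p.featureprop_id",
        (uniquename,),
    ).fetchall()
    return dict(rows)


print(properties("IP1_1_F11.g1_A002"))
```

Because every property is a (feature, term, value) row, a new kind of annotation only needs a new cvterm entry rather than a schema change.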

8 EST Feature Relationship
feature_relationship table
feature_relationship_id | 3                 | 4           | 5
subject_id              | 1                 | 1           | 1
object_id               | 5 (contig)        | 2 (library) |
type_id                 | 26 (is member of) | 26          |
rank                    |                   |             |

feature table
feature_id | uniquename        | type_id
1          | IP1_1_F11.g1_A002 | 1 (sequence)
2          | IP1               | 2 (library)
5          | CTGSB_100848      | 3 (contig)
6          | CLSB_1540         | 4 (cluster)
…

feature_id 1 (sequence) -- member of --> feature_id 5 (contig) -- member of --> feature_id 6 (cluster)
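The member-of chain from sequence to contig to cluster can be walked with a recursive query. This is a sketch under stated assumptions: the tables are reduced SQLite stand-ins, relationship ids are auto-assigned rather than taken from the slide, the contig-to-cluster row is inferred from the diagram, and containers() is a hypothetical helper name. Type 26 is the "is member of" term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table definitions (assumption: real CHADO tables have more columns).
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO feature VALUES (?, ?)", [
    (1, "IP1_1_F11.g1_A002"),
    (5, "CTGSB_100848"),
    (6, "CLSB_1540"),
])
# subject "is member of" object; ids are auto-assigned by SQLite.
conn.executemany(
    "INSERT INTO feature_relationship (subject_id, object_id, type_id) "
    "VALUES (?, ?, 26)",
    [(1, 5), (5, 6)],
)


def containers(feature_id, type_id=26):
    """Follow relationships of one type upward and return uniquenames, nearest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain (feature_id, depth) AS (
            SELECT object_id, 1 FROM feature_relationship
            WHERE subject_id = ? AND type_id = ?
            UNION ALL
            SELECT r.object_id, chain.depth + 1
            FROM feature_relationship r
            JOIN chain ON r.subject_id = chain.feature_id
            WHERE r.type_id = ?
        )
        SELECT f.uniquename FROM chain
        JOIN feature f ON f.feature_id = chain.feature_id
        ORDER BY chain.depth
        """,
        (feature_id, type_id, type_id),
    ).fetchall()
    return [name for (name,) in rows]


print(containers(1))
```

The recursion assumes the relationships form no cycles, which holds for a sequence, contig, cluster hierarchy.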

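9 EST Feature Location
feature table
feature_id | uniquename        | type_id
1          | IP1_1_F11.g1_A002 | 1 (sequence)
3          | Q20_1             | 1 (sequence)
4          | Q16_1             | 1 (sequence)
…

featureloc table
featureloc_id | feature_id | srcfeature_id | fmin | fmax
1             | 3          | 1             | 11   | 628
2             | 4          | 1             | 11   | 589
…

Diagram: feature_id 1 spans positions 1 to 788; feature_id 3 and feature_id 4 are located on it, both starting at position 11, with end positions 628 and 589.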
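One detail worth checking when loading these locations is the coordinate convention. CHADO documents featureloc coordinates as interbase (zero-based, half-open), whereas the START and END values in the appendix read as 1-based inclusive positions. The sketch below is an illustration under that assumption, not the authors' loader: it converts the appendix values for the Q20 and Q16 regions (11 to 589 and 11 to 628) and checks that fmax - fmin then equals the seqlen values given on the EST Feature slide. The tables are reduced SQLite stand-ins, and to_interbase() and spans() are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table definitions (assumption: real CHADO tables have more columns).
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    seqlen INTEGER
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    fmin INTEGER NOT NULL,
    fmax INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (1, "IP1_1_F11.g1_A002", 788),
    (3, "Q20_1", 579),
    (4, "Q16_1", 618),
])


def to_interbase(start, end):
    """Convert 1-based inclusive positions to interbase (fmin, fmax)."""
    return start - 1, end


# START/END values from the appendix sequence example.
regions = {3: (11, 589), 4: (11, 628)}
for feature_id, (start, end) in regions.items():
    fmin, fmax = to_interbase(start, end)
    conn.execute(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax) "
        "VALUES (?, 1, ?, ?)",
        (feature_id, fmin, fmax),
    )


def spans():
    """Return {uniquename: (fmax - fmin, seqlen)} for every located feature."""
    rows = conn.execute(
        "SELECT f.uniquename, l.fmax - l.fmin, f.seqlen "
        "FROM featureloc l JOIN feature f ON f.feature_id = l.feature_id "
        "ORDER BY f.feature_id"
    ).fetchall()
    return {name: (span, seqlen) for name, span, seqlen in rows}


print(spans())
```

If the positions are instead stored unchanged as fmin and fmax, the region length has to be computed as fmax - fmin + 1 to match the same seqlen values.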

10 Appendix: Example of EST Library
IP1
STAGE: N/A
FULL_NAME: Immature pannicle 1
CULTIVAR: BTx623
CELL_TYPE: N/A
STRAIN: N/A
ORGANISM: Sorghum bicolor L.
BOTANICAL_NAME: S. bicolor
ORGAN: Developing preanthesis pannicles
CELL_LINE: N/A
COMMENT_FOR_EST: Sequences have been trimmed to exclude PolyA, vector and regions below Phred quality 16. The threshold for high quality sequence is 20. Three-prime sequences, which are obtained with PolyTMix or T7 sequencing primer, are presented as the reverse complement.
PUBLISH: Y
HOST: N/A
SEX: N/A
RE_2: EcoRI
TISSUE: N/A
RE_1: XhoI
LIB_NAME: IP1
VECTOR: pBluescript II SK(-) from Lambda Zap II
V_TYPE: Plasmid
DESCR: The library was made from poly-A RNA in the cloning vector lambda ZAP II. Clones to be sequenced were prepared by mass excision.

11 Appendix: Example of EST Sequence
Sequence Name: IP1_1_F11.g1_A002
GenBank Access Number: BG946868
Residues (each line is prefixed with the position of its first base):
1 TGAGTTTTTTTTTTTTTTTTTTGTTCTTAATTATTCAATTCATTCATGATACTACTGTCTGCTATTTCCACAGTAAATGTTCATATTACATAGGAGCCAC
101 TGGCTCCTCCGGATTCCTTAAAAAAAATGTCCATATTACAATTGGATTTATGATACTACACAGGTTCGCGAAATCGAGCAGGTTAGAAAAGCTTCCACTT
201 GCTGACCTCACTAAAAGTGAAACACAGTTCCGGGAAGTTCATACAGTTTTCCCATATAGATCAATTGATCCTATCTGAAACCTTGGATTAGAATGAGATT
301 CTCTTACGCGTAGAAACCTAAACCGGAAAGCATTTGCTTTATATCTCTTATCCACTGTAAATGTTTTTCTAAGGAAACGGCTCTCAAACATTTCAGAATT
401 CCGAGCATCAAGTAGATTCCAGGTGGAACCTGCATCTGTGCTCCCTTCAAGAACCCAGTCCATTGGATCCCTCTCTGGAGCATCATTAGCTGACATCAAA
501 TCATATGACTCCAACTCACAACTTTTGCCAAGCTTGCATTGTATAAATCAGCCAACATCCTTTGGCTCCATCAGGCTCTTCCCATTTGGAAGAATGGATGC
601 CGTCAAAAGCTGCTGTTGCAATTCCGATTGGGAGCTGTTCCCTGCTTGCAAGGACTGAACCTGAGCATACTCTGTTCCCCTCTGGGAAATGGTTGCCCTC
701 TGTGAAAGAGGTATTANNTCTATAATACTCATATCTCATTACTGCATCCAGTGCTACTGGTAACGCTNAGGATGAGTGGATTGCATTT
Length of Sequence: 788
Legend:
Screened Vector
Phred Quality 20+ START: 11 END: 589
Phred Quality 16+ START: 11 END: 628
Phred Quality Below 16

12 Appendix: Example of EST Cluster and Contig
95-20-CLSB_1540
Identity Threshold: 95
Length Threshold: 20
Cluster Name: CLSB_1540
Number of Contigs: 1
  CTGSB_100848
  Contig Name: CTGSB_100848
  Number of Sequences: 2
    IP1_1_F11.g1_A002
    P1_48_H11.g1_A002

13 Appendix: Example of EST Database
insert into db (db_id, name …) values (1, 'CGGC_UGA', …);
insert into dbxref (dbxref_id, db_id, …) values (1, 1…);
insert into dbxrefprop (dbxrefprop_id, dbxref_id, …) values (1, 1…);

14 Appendix: Example of Analysis
analysis table
analysis_id | name    | description | program | algorithm
1           | CGGC_01 | …           | blast   | cagt_miner
…           | …       | …           | …       | …

analysisfeature table
analysisfeature_id | analysis_id | feature_id
1                  | 1           | 5
2                  | 1           | 6
…                  | …           | …
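The analysisfeature table links an analysis run to the features it produced, so listing the contigs and clusters from one run is a two-table join. A minimal sketch, assuming reduced SQLite versions of the tables (the real CHADO analysis table has further columns such as the program version) and a hypothetical helper features_for_analysis(); the rows are the ones shown in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table definitions (assumption: real CHADO tables have more columns).
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    name TEXT,
    program TEXT NOT NULL,
    algorithm TEXT
);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis (analysis_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id)
);
""")

conn.executemany("INSERT INTO feature VALUES (?, ?)", [
    (5, "CTGSB_100848"),
    (6, "CLSB_1540"),
])
conn.execute("INSERT INTO analysis VALUES (1, 'CGGC_01', 'blast', 'cagt_miner')")
conn.executemany("INSERT INTO analysisfeature VALUES (?, ?, ?)", [
    (1, 1, 5),
    (2, 1, 6),
])


def features_for_analysis(analysis_name):
    """Return the uniquenames of all features linked to the named analysis."""
    rows = conn.execute(
        "SELECT f.uniquename FROM analysisfeature af "
        "JOIN analysis a ON a.analysis_id = af.analysis_id "
        "JOIN feature f ON f.feature_id = af.feature_id "
        "WHERE a.name = ? ORDER BY af.analysisfeature_id",
        (analysis_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(features_for_analysis("CGGC_01"))
```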

