1 Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase
Eric Just
dictyBase http://dictybase.org
Center for Genetic Medicine
Northwestern University

2 WHY?
dictyBase based on SGD
Increase flexibility in feature storage
Want to use CHADO for feature data, but ‘dicty’ SGD schema for the rest
‘dicty’ SGD (Oracle) needs to link to CHADO
Eric Just - dictyBase – Northwestern University

3 Schema porting
SQL Fairy did most of this, but:
  Had to tweak Oracle Producer
  Object name limited to 30 characters, systematically truncate names
  Unique/primary keys on CLOBs (text) not allowed, changed to varchar2(4000)
  ‘SYNONYM’ reserved name in Oracle, changed name to ‘SYNONYM_’
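The slide does not say how names were truncated. As a minimal sketch of one way to do it, assuming a simple "cut to 30 characters, then add a numeric suffix on collision" rule: the helper oracle_name() and the example constraint names below are hypothetical and are not taken from SQL Fairy or the talk.

```perl
use strict;
use warnings;

# Oracle identifier limit mentioned on the slide.
my $MAX_LEN = 30;

# Names already handed out, so collisions after truncation are detected.
my %used;

# Hypothetical helper: shorten $name to at most $MAX_LEN characters and,
# if the shortened form is already taken, replace its tail with _1, _2, ...
sub oracle_name {
    my ($name) = @_;
    my $short = substr( $name, 0, $MAX_LEN );
    my $n = 1;
    while ( $used{ lc $short } ) {
        my $suffix = "_$n";
        $short = substr( $name, 0, $MAX_LEN - length($suffix) ) . $suffix;
        $n++;
    }
    $used{ lc $short } = 1;
    return $short;
}

# Illustrative names only: both are 31 characters long and share
# the same first 30 characters.
for my $name (qw(
    feature_relationshipprop_pub_c1
    feature_relationshipprop_pub_c2
)) {
    printf "%s -> %s\n", $name, oracle_name($name);
}
```

Each distinct name should be passed through the helper once, because the helper remembers every name it has returned and a second call with the same input yields a suffixed name.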

4 Class::DBI
Class::DBI provides nice ‘table level’ abstraction
CRUD, follow references WITHOUT WRITING SQL
Excellent tool for portability
GMOD ships with Class::DBI configured for CHADO
Had to fix/customize Oracle Driver

5 AutoDBI
Package which loads Class::DBI classes for each table
Keep class name Chado::Synonym but call set_up_table( 'synonym_' )
Made ‘residues’ a ‘lazy’ column of Chado::Feature
No other significant porting needed

6 Data Migration
[Diagram: ‘dicty’ SGD, GFF3, CHADO]
Export chromosome sequences and locations in GFF3
Load GFF3 into CHADO schema
Update references to features with new tables and IDs

7 GBrowse porting
‘rows’ method does not exist in Oracle DBI Driver

    $sth->execute or Bio::Root::Root->throw();

    Before:  if ($sth->rows() == 0) {…}
    After:   my $rows_returned = @{$sth->fetchall_arrayref()};
             if ( $rows_returned == 0) {…}

Oracle fetchrow_hashref() is case sensitive

    Before:  $sth->fetchrow_hashref()
    After:   $sth->fetchrow_hashref("NAME_lc")

8 GBrowse porting - Queries
Oracle does not allow a column named in a ‘using’ clause to be referenced with a table qualifier, as f.feature_id is in the ‘where’ clause, so the join is rewritten with ‘on’

    Before:
        select f.feature_id, f.name, fl.fmin, fl.fmax
        from feature f join featureloc fl using (feature_id)
        where f.feature_id = 221659 and fl.rank = 0;

    After:
        select f.feature_id, f.name, fl.fmin, fl.fmax
        from feature f join featureloc fl on f.feature_id = fl.feature_id
        where f.feature_id = 221659 and fl.rank = 0;

‘substring’ becomes ‘substr’
Any SQL containing synonym table must be modified
Any procedural SQL must be reproduced; in some cases this can be avoided
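The two mechanical renames on this slide (‘substring’ to ‘substr’, and the synonym table to synonym_) could in principle be applied to SQL strings before they are prepared. The following is only a sketch under that assumption. The function to_oracle_sql() is hypothetical and is not part of GBrowse or the talk, and it only handles the comma-separated substring(str, pos, len) form. A regex rewrite like this would also touch matching words inside string literals, so hand-checking each query remains necessary.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the two simple renames from the slide
# to a SQL string written for the PostgreSQL CHADO schema.
sub to_oracle_sql {
    my ($sql) = @_;

    # substring(str, pos, len) -> substr(str, pos, len)
    $sql =~ s/\bsubstring\s*\(/substr(/gi;

    # The table 'synonym' was renamed to 'synonym_' in the Oracle port.
    # \b keeps names such as synonym_id and feature_synonym untouched,
    # because '_' counts as a word character.
    $sql =~ s/\bsynonym\b/synonym_/gi;

    return $sql;
}

print to_oracle_sql(
    'select s.name from synonym s join feature_synonym fs '
  . 'on s.synonym_id = fs.synonym_id'
), "\n";
```

Queries that use a ‘using’ clause or procedural SQL are outside the scope of this sketch and still need to be rewritten by hand.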

9 Tuning
Added is_deleted flag to feature table
Added some audit columns
Added audit table and triggers
Created indexes heuristically
Added hints to some difficult queries

10 Integrating into dictyBase I
[Diagram: dictyBase Presentation Layer; dictyBase Object Model; Various middleware and presentation objects; ‘Dbtable’ database abstraction layer; ‘dicty’ SGD]

11 Integrating into dictyBase II
[Diagram: dictyBase Presentation Layer; dictyBase Object Model; Various middleware and presentation objects; ‘Dbtable’ layer; Class::DBI layer; ‘dicty’ SGD; CHADO]

12 dictyBase Objects
Retrieve, insert, update, delete
Interface ignorant of schema
No presentation in data classes
Easy to use interfaces
Tuned with lazy evaluation in most accessors
75 – 80% unit test coverage

13 Use BioPerl
Use Bio::Seq to represent sequences
Use Bio::SeqFeature objects to represent transcript and alignment locations
Harness the power of BioPerl for sequence tasks, file generation
NOTE: BioPerl only used for sequence and location

14 Class Diagram
[Diagram: classes Feature, Aligned, mRNA, Contig, Chromosome; methods getOverlappingFeatures(), getOverlappingAlignments(); BioPerl classes Bio::SeqFeature::Gene::Transcript, Bio::SeqFeature::Generic, Bio::Seq]

15 Object use case: Add an Exon, dbxref, and Description

    #!perl
    use dicty::Feature;
    use Bio::SeqFeature::Gene::Exon;

    # Look up the transcript by its feature number
    my $transcript = dicty::Feature->new( -feature_no => 218420 );

    # Set the description and attach an external ID (dbxref)
    $transcript->description( 'Gene model derived from AU12345' );
    $transcript->add_external_id( -source => 'GenBank Accession Number', -id => 'AU12345' );

    # Get the BioPerl representation and change the start of the third exon
    my $bioperl = $transcript->bioperl();
    [$bioperl->exons()]->[2]->start( 281050 );

    # Add a new coding exon on the reverse strand
    my $exon = Bio::SeqFeature::Gene::Exon->new( -start => 280921, -end => 280959, -strand => -1 );
    $exon->is_coding(1);
    $bioperl->add_exon($exon);

    # Save the changes
    $transcript->update();

16 Using Apollo
Request segment through SOAP message over HTTP
Object layer generates GenBank file
Send GenBank file via SOAP message
Modify in Apollo, send changed gene models back via SOAP
Adaptor changes gene models and updates the database
[Diagram: GenBank file; Object layer; Chado]

17 New Curation Tools
Gene and Feature curation had to be rewritten
‘Gene centric’ curation
Added more evidence qualifiers
Presentation classes that manipulate Object Layer

18 Where Are We Going
Utilize the flexibility: new feature types, feature relations, and SO
Contribute back to GMOD
Gradually port different areas into CHADO
Provide feedback and testing ground for database independence

19 Acknowledgments
dictyBase
  PIs: Rex Chisholm, PhD; Warren Kibbe, PhD
  Programmer: Sohel Merchant
  Curators: Petra Fey; Pascale Gaudet, PhD; Karen Pilcher
Bioinformatics Core at Northwestern
Funding
  NIH (NIGMS and NHGRI)
Other Groups
  SGD
  GMOD: CHADO, GBrowse, Apollo
  BioPerl

