The GMOD Project Lincoln Stein Cold Spring Harbor Laboratory
Test Subject: Michael Caudy
o Drosophila neurobiologist
o Proneural differentiation
o notch pathway
o HLH transcriptional activators/repressors
o achaete/scute complex
o No computer science training
o Took my “bioinformatics for biologists” course
“Simple” Problem
o Discover the transcription factor binding site code controlling proneural differentiation.
Regular Expression Search
o Using achaete promoter as exemplar, search for combinations of known binding sites in particular architectures
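The search described on this slide can be sketched with Python's standard `re` module. This is a minimal illustration, not the method actually used for the achaete promoter: the slide does not list the sites or spacings, so `SITE_A` (the generic E-box consensus CANNTG), `SITE_B` (a placeholder second site), the gap limits, and the function name `find_clusters` are all assumptions made for the example.

```python
import re

# Assumed motifs, for illustration only; the slide does not name the real ones.
SITE_A = "CA[ACGT]{2}TG"   # generic E-box consensus CANNTG
SITE_B = "CAC[ACGT]AG"     # hypothetical second binding site


def find_clusters(seq, site_a=SITE_A, site_b=SITE_B, min_gap=0, max_gap=20):
    """Return (start, end, text) for each site_a ... site_b pair.

    The "architecture" here is simply site_a followed by site_b with
    min_gap..max_gap bases in between. The whole pattern sits inside a
    lookahead so that overlapping clusters are also reported.
    """
    pattern = re.compile(
        rf"(?=(({site_a})[ACGT]{{{min_gap},{max_gap}}}?({site_b})))"
    )
    return [
        (m.start(1), m.end(1), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Made-up sequence, not real promoter data.
    for hit in find_clusters("tttCAGCTGaaaCACGAGttt"):
        print(hit)
```

Tightening `max_gap` or swapping the order of the two sites changes the architecture being searched for; a real search would presumably combine several such patterns.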
Mike’s Got Lots of Data
o 90-11,000 TF binding site clusters
o 100s-1000s of genes
o millions of interactions
o Which genes are involved in neural differentiation?
o Which have interactions with the pathway?
o Which have suggestive mutant phenotypes?
Mike Needs a Database
o Database management system for proneural differentiation genes.
o Visualization/exploration tools for relationship of genes to putative TF clusters.
o Literature citations
o Link out to FlyBase, Genbank & other DBs.
o Add notes and other annotations.
Try to do it with Filemaker
o “Cluster-centric” vs “gene-centric”?
o Data import from FlyBase?
o Storing images?
o Maintaining relationships between genes & clusters?
o Updates?
Mike Needs a MOD
o Model Organism Database
o Repository for reagents
  o Stocks, vectors, clones
o Genetic & physical maps
o Large-scale data sets
  o Genome
  o EST sets, microarray results, 2-cell hybrid interactions
o Literature
o Ontologies & Nomenclature
o Meetings, announcements
How WormBase Works
[Diagram components: ACeDB; Images, Movies; MySQL Genomic Data; Database access library; Perl scripts; Web server; You]
Can Mike reuse WormBase to manage his data? No!
Sorry Mike
o WormBase website difficult to install
o Data model nematode-centric
o Data entry tools very process-specific
o Customization difficult
o Software documentation uneven
o Standard operating procedure documentation uneven
MOD Redux
o SGD, MGD, FlyBase, TAIR, RGD…
o The same basic idea as WormBase
o Implementation entirely different
o Wheel reinvented many times
o Little software sharing
o This madness must stop!
The GMOD Project
o Portable, open source software to support model organism databases
o Multiple MODs involved
  o Worm, fly, yeast, mouse, arabidopsis, rat, monocot, [fugu], [E. coli]
o Funded by NIH as of June 2002
o Programmers, coordinator, quarterly meetings
http://www.gmod.org
The GMOD Pyramid
[Diagram layers: Open Source DBMS & Middleware; Modular Schema; Modular Applications]
A MOD Construction Set
[Diagram of three layers]
Database Layer: genome, genetic maps, literature
Middleware Layer: genomes, maps, citations; Bioperl, BioJava, BioPython
Application Layer: genome browser, genome editor, map browser, map editor, citation browser, citation editor, annotation pipeline
Chado – Modular Schema
o Common schema for use by FlyBase and WormBase
o Ontology Driven
o Small number of generic tables e.g. “feature”
o Controlled vocabulary names object types and relationships among them:
  o “achaete protein is a HLH activator”
  o “m8 protein inhibits achaete transcription”
o Evidence-Savvy
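The idea of a few generic tables typed by a controlled vocabulary can be sketched with Python's built-in `sqlite3` module. This is a heavily simplified illustration, not the Chado schema itself: the three tables borrow Chado's table names (`feature`, `cvterm`, `feature_relationship`) but keep only a handful of columns, and the term names, helper functions, and the way the two example statements are mapped onto rows are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL          -- controlled vocabulary term
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id)
);
"""


def term_id(conn, name):
    """Look up a vocabulary term, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_feature(conn, name, type_name):
    """Every object goes into the one generic table; the term says what it is."""
    cur = conn.execute(
        "INSERT INTO feature (name, type_id) VALUES (?, ?)",
        (name, term_id(conn, type_name)),
    )
    return cur.lastrowid


def relate(conn, subject_id, relation, object_id):
    """Relationships are rows too, typed by a vocabulary term."""
    conn.execute(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        (subject_id, term_id(conn, relation), object_id),
    )


def describe(conn):
    """Render each stored relationship as a subject-relation-object sentence."""
    rows = conn.execute(
        """
        SELECT s.name, t.name, o.name
        FROM feature_relationship r
        JOIN feature s ON s.feature_id = r.subject_id
        JOIN cvterm  t ON t.cvterm_id  = r.type_id
        JOIN feature o ON o.feature_id = r.object_id
        ORDER BY s.name
        """
    )
    return [" ".join(row) for row in rows]


def build_demo():
    """Load the two statements from the slide into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    achaete_gene = add_feature(conn, "achaete", "gene")
    add_feature(conn, "achaete protein", "HLH activator")
    m8 = add_feature(conn, "m8 protein", "protein")
    relate(conn, m8, "inhibits transcription of", achaete_gene)
    return conn


if __name__ == "__main__":
    print(describe(build_demo()))
```

Adding a new kind of object or relationship in this sketch means adding a vocabulary term rather than a new table, which is the property the slide attributes to the generic schema.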
Apollo Data adapters
o Parser -> data models -> display
o Existing data adapters
  o GAME XML
  o GFF
  o Ensembl CGI server
  o DAS
o Write your own data adapter!
  o Extend AbstractDataAdapter class
o Display options defined in config file
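The parser -> data models -> display flow can be illustrated with a small language-neutral sketch in Python. Apollo itself is a Java application and none of the names below come from its API: `DataAdapter`, `CsvAdapter`, `Annotation`, and `render` are hypothetical stand-ins that only show the shape of the pattern, in which a new input format is supported by subclassing an abstract adapter while the display code stays unchanged.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """The shared data model that every adapter produces."""
    seqid: str
    start: int
    end: int
    name: str


class DataAdapter(ABC):
    """Hypothetical abstract adapter; not Apollo's AbstractDataAdapter."""

    @abstractmethod
    def parse(self, text: str) -> List[Annotation]:
        """Turn format-specific text into shared Annotation objects."""


class CsvAdapter(DataAdapter):
    """Example adapter for a made-up 'seqid,start,end,name' format."""

    def parse(self, text: str) -> List[Annotation]:
        annotations = []
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            seqid, start, end, name = [f.strip() for f in line.split(",")]
            annotations.append(Annotation(seqid, int(start), int(end), name))
        return annotations


def render(annotations: List[Annotation]) -> List[str]:
    """Stand-in for the display step; it only sees the data model."""
    return [f"{a.name} {a.seqid}:{a.start}-{a.end}" for a in annotations]


if __name__ == "__main__":
    print(render(CsvAdapter().parse("scaffold_1, 100, 250, geneA\n")))
```

Because `render` depends only on `Annotation`, adding another adapter subclass would not require touching the display side.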
Who is Using Apollo?
o BDGP
  o Reannotated Drosophila genome
o Bristol-Myers Squibb
  o Launching Apollo from web browser via mime types
o GNF
  o JDBC adapter layer over BioSQL
o Biogen
  o View human genome alignment between public and Biogen internal database
  o Connected BLAT pipeline to Apollo
o HGMP-RC Fugu Genomics group
  o Displaying annotations on fugu scaffolds
Extensively Customizable
o End-user
  o Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie)
o Data provider
  o Change fonts, colors, text.
  o Change overview – genetic map, contigs, coverage, karyotype.
  o Define new tracks using simple config file.
  o Tinker with track appearance to heart’s content.
Adding a New Track
(a) Create a GFF file named “deletions.gff”
    Chr1 targeted deletion 1293224 1294901... Deletion d101k2
    Chr1 targeted deletion 8239811 8241116... Deletion d680k2
    Chr2 targeted deletion 5866382 5866500... Deletion d007k2
(b) Run the load_gff.pl script
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
    [Knockout]
    feature = deletion
    glyph = span
    fgcolor = red
    key = Knockouts
    link = http://example.org/cgi-bin/knockout_details?$name
    citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
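Before loading a file like “deletions.gff”, it can help to check that each line parses into the expected columns. The sketch below is a standalone Python check, not part of load_gff.pl or GBrowse. It assumes tab-separated, nine-column GFF lines, and it assumes that the “...” shown on the slide stands for “.” placeholders in the score, strand, and phase columns; the `Feature` type and both function names are made up for the example.

```python
from collections import Counter
from typing import List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    group: str


def parse_gff_line(line: str) -> Feature:
    """Split one tab-separated GFF line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
    seqid, source, ftype, start, end, _score, _strand, _phase, group = cols
    return Feature(seqid, source, ftype, int(start), int(end), group)


def load_features(text: str) -> List[Feature]:
    """Parse every non-blank, non-comment line of a GFF document."""
    return [
        parse_gff_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


# The three slide records, with "." assumed for score, strand and phase.
SAMPLE = "\n".join([
    "Chr1\ttargeted\tdeletion\t1293224\t1294901\t.\t.\t.\tDeletion d101k2",
    "Chr1\ttargeted\tdeletion\t8239811\t8241116\t.\t.\t.\tDeletion d680k2",
    "Chr2\ttargeted\tdeletion\t5866382\t5866500\t.\t.\t.\tDeletion d007k2",
])

if __name__ == "__main__":
    features = load_features(SAMPLE)
    print(len(features), "features")
    print(Counter(f.type for f in features))
```

The third column (`deletion` here) appears to be what the `feature = deletion` line in the track stanza selects on, so a quick count by type shows whether the new track should have anything to draw.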
Who is Using GBrowse?
o GMOD Members
  o WormBase, FlyBase, RatDB
o HGMP-RC Fugu genomics group
o KEGG (multiple microorganisms)
o Ingenium AG (mouse)
o Bristol-Myers Squibb (drosophila)
o Texas A&M University (salmonella)
o McGill University (human chr7)
o Institute of Systems Biology (human)
Essential Pieces in Progress
o Generic MOD web site
o Strain & phenotype curation tools
o Pathway tools and browsers
o Tree (e.g. phylogenetic) tools & browsers
o Biopipe – genome annotation pipeline
Find out more about GMOD
o Go to www.gmod.org
o Examine software matrix
o Find a project you’re interested in
o Contact project leader
o Or contact Scott Cain: firstname.lastname@example.org
o Or mail email@example.com
Credits
CSHL: Adrian Arva, Shuly Avraham, Scott Cain, Ken Clark, Allen Day, Xiaokang Pan
BDGP: Nomi Harris, Suzanna Lewis, Chris Mungall, John Richter, ShengQiang Shu, Colin Weil
EBI: Michele Clamp, Stephen Searle
Carnegie Institute: Sue Rhee, Danny Yoo
Harvard: David Emmert, Stan Letovsky
Cornell Medical School: Michael Caudy
http://www.gmod.org