PepcDB Reporting at CESG: More Trials and Fewer Tribulations PPCW Bottlenecks Meeting 20 March 2007 Craig A. Bingman (U54 GM074901-01.

1 PepcDB Reporting at CESG: More Trials and Fewer Tribulations
PPCW Bottlenecks Meeting, 20 March 2007
Craig A. Bingman
(U54 GM074901-01, P50 GM064598; JLM, P.I.)

2 CESG Bioinformatics
George N. Phillips Jr.: Faculty Executive
Craig Bingman: Section leader
Xiaokang Pan: PepcDB, domains
Gary Wesenberg: Scoring, RT, PDB
Bryan Ramirez: System administrator
Tony Kamenick: Assistant sysadmin
Sesame
John L. Markley: CESG P.I.
Zsolt Zolnai: Sesame Project Management
John Primm: Project Manager
David Aceti: QA, Sesame “Lab Master”
All CESG Team Members

3 TargetDB vs. PepcDB
TargetDB was conceived early/pre-PSI-1 as a mechanism for avoiding duplication of effort between structural genomics centers.
  – Asynchronous communication between centers and NIH.
TargetDB communicates only the project status of each target.
TargetDB is single-threaded.
TargetDB was not meant to communicate information to the outside scientific community.
PepcDB was conceived as a mechanism for communicating scientific details between centers and the outside world.
  – Asynchronous communication with the outside world.
PepcDB communicates target status, protocols, and the timeline of efforts.
PepcDB is multi-threaded.
PepcDB is a contractual obligation for all PSI-2 centers.
Along with the structures deposited in the PDB and the materials repository, PepcDB will be one of the enduring legacies of PSI.
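
To make the single- vs. multi-threaded distinction concrete, here is a minimal Python sketch under stated assumptions. The stage names, the `Target`/`Trial` classes, and the rule that a single-threaded view reports only the furthest stage reached are hypothetical illustrations, not TargetDB or PepcDB definitions.

```python
from dataclasses import dataclass, field

# Hypothetical pipeline stages, in forward order, for illustration only.
STAGES = ["selected", "cloned", "expressed", "purified", "crystallized", "deposited"]


@dataclass
class Trial:
    """One thread of work on a target (e.g. an original attempt or a salvage)."""
    trial_id: str
    stages_reached: list[str] = field(default_factory=list)


@dataclass
class Target:
    target_id: str
    trials: list[Trial] = field(default_factory=list)


def single_threaded_status(target: Target) -> str:
    """Collapse all trials into one status: the furthest stage any trial reached."""
    reached = [stage for trial in target.trials for stage in trial.stages_reached]
    return max(reached, key=STAGES.index) if reached else "none"


def multi_threaded_report(target: Target) -> dict[str, list[str]]:
    """Keep every trial as its own thread, preserving where each one stopped."""
    return {trial.trial_id: list(trial.stages_reached) for trial in target.trials}


target = Target("T1", [
    Trial("T1-a", ["selected", "cloned", "expressed"]),              # stalled
    Trial("T1-b", ["selected", "cloned", "expressed", "purified"]),  # salvage
])
print(single_threaded_status(target))  # purified
print(multi_threaded_report(target))
```

In the collapsed view the stalled first trial disappears entirely; the per-trial view keeps both threads, including where each one stopped.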

4 CESG PepcDB, Past and Present
Year 2-3 data
  – Successful implementation of Sesame (hierarchical relationships between db items).
  – TargetDB-centric, single-threaded view.
  – Targets were constrained to exist in one workgroup from selection to structure solution.
  – Protocols were primitive.
Year 4-5 data
  – Protocols became more descriptive.
  – Protocols described multiple pipeline stages.
  – Targets moved through multiple workgroups.
  – The pipeline was assumed to move unidirectionally, Selection -> Deposition.
PSI-2 data
  – Atomic protocols, each describing a single pipeline stage.
  – The pipeline is multipass and multithreaded, characterized by extensive salvage.
  – Targets move back to vector selection, from initial selection, PCR and entry vector.
  – The pipeline is non-deterministic, adaptive, and dynamic to maximize success.

5 Failure of CESG PepcDB, Mark 1
The codebase had grown by accretion, not design.
The code assumed linear, forward progression through pipeline stages.
More than half of the code was devoted to trapping and handling data-entry errors.
A global reset was required to handle new pipeline practices, dominated by multipath cloning strategies, multipath expression strategies, and salvage-intensive operation.
A new conceptualization of our PepcDB reporting was required.
Core concept: a well-formed PepcDB = a finite, directed, acyclic graph.
  – Database items = nodes
  – Directed links = edges
Data in Sesame needed to be corrected.
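
The core concept lends itself to a mechanical check. The sketch below is a hypothetical validator, not CESG code: it takes database item IDs as nodes and (parent, child) links as edges, and uses Kahn's topological-sort algorithm to decide whether the graph is acyclic. Treating a link to an unknown item as an error is an added assumption.

```python
from collections import defaultdict, deque


def is_well_formed(nodes, edges):
    """Return True if the directed graph (nodes, edges) is acyclic.

    nodes: iterable of database item IDs.
    edges: iterable of (parent, child) links.
    Kahn's algorithm repeatedly removes nodes with no remaining incoming
    edges; any node never removed lies on, or downstream of, a cycle.
    """
    nodes = set(nodes)
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        # Assumption: a link to an unknown item is itself a data-entry error.
        if parent not in nodes or child not in nodes:
            raise ValueError(f"dangling link {parent!r} -> {child!r}")
        children[parent].append(child)
        indegree[child] += 1

    ready = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)


print(is_well_formed("ABCD", [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]))  # True
print(is_well_formed("ABD", [("A", "B"), ("B", "D"), ("D", "A")]))  # False
```

Kahn's algorithm runs in time linear in the number of items plus links, so it scales to the full set of work units.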

6 Visualization Tool for Graphs
dot, a language for describing graphs
dot has a very simple syntax:
    digraph G { A -> B -> D; A -> C; }
dot has powerful layout heuristics (e.g. edge-crossing minimization) for displaying hierarchical graphs.
Bindings are available for Perl, Python, Java, and other languages.
CESG has used the Perl interface to dot/Graphviz to produce plots of linkages between database items.
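
As a hedged illustration of producing such plots (CESG used Perl; this sketch uses plain Python with no Graphviz binding), the functions below turn (parent, child) links into dot source text. The function names and the quoting helper are hypothetical.

```python
def quote_id(node_id):
    """Wrap an ID in double quotes, escaping embedded quotes as dot requires."""
    return '"' + str(node_id).replace('"', '\\"') + '"'


def to_dot(edges, name="G"):
    """Return dot source for a digraph built from (parent, child) links."""
    lines = [f"digraph {name} {{"]
    for parent, child in edges:
        # Quoting keeps IDs with dashes or spaces valid as dot node names.
        lines.append(f"  {quote_id(parent)} -> {quote_id(child)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("A", "B"), ("B", "D"), ("A", "C")]))
```

Saving the output to a file such as `links.dot` and running `dot -Tpng links.dot -o links.png` renders it with the Graphviz command-line tool.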

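7
    digraph G { A -> B; A -> C -> D; }    (a tree: every node has at most one parent)
    digraph G { A -> B -> D; A -> C -> D; }    (acyclic: D has two parents)
    digraph G { A -> B -> D; D -> A; }    (cyclic: A -> B -> D -> A, so not well-formed)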
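
In dot, an edge chain such as `A -> B -> D` is shorthand for the two edges `A -> B` and `B -> D`. The sketch below expands chains into edge pairs for exactly the subset of dot used on this slide (bare node IDs, `->` chains, `;` separators). It is a hypothetical helper, not a general dot parser, which would also have to handle attributes, quoting, and subgraphs.

```python
def parse_edges(dot_source):
    """Expand edge statements in a minimal dot digraph into (tail, head) pairs.

    Supports only bare node IDs, '->' chains, and ';' separators.
    """
    body = dot_source[dot_source.index("{") + 1 : dot_source.rindex("}")]
    edges = []
    for statement in body.split(";"):
        nodes = [part.strip() for part in statement.split("->")]
        if len(nodes) < 2:
            continue  # empty statement or a lone node declaration
        # A chain A -> B -> D contributes the consecutive pairs (A, B), (B, D).
        edges.extend(zip(nodes, nodes[1:]))
    return edges


for graph in (
    "digraph G { A -> B; A -> C -> D; }",
    "digraph G { A -> B -> D; A -> C -> D; }",
    "digraph G { A -> B -> D; D -> A; }",
):
    print(parse_edges(graph))
```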



10 CESG PepcDB Stats
    Protocols              68
    Targets              7553
    Trials              14044
    Protocol Instances  57195
Each target has on average about two trials.
Each trial has on average about four protocol instances.
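
The rounded averages on this slide can be checked against the counts in the table:

```latex
\frac{14044\ \text{trials}}{7553\ \text{targets}} \approx 1.86,
\qquad
\frac{57195\ \text{protocol instances}}{14044\ \text{trials}} \approx 4.07,
\qquad
\frac{57195\ \text{protocol instances}}{7553\ \text{targets}} \approx 7.57 \approx 1.86 \times 4.07
```

So each target accumulates roughly seven to eight protocol instances, drawn from only 68 distinct protocols.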

11 PSI PepcDB Toolkit
Project database capable of establishing hierarchical relationships between units of work.
Establish a master database that manages unique keys for units of work.
Implement barcodes (e.g. ZPL) that extend the database to physical items.
Implement atomic protocols and associated actions.
Develop a tool set for visualizing data.
Develop code capable of assembling lists of parent-child units of work, protocols, and actions.
Rehearse data entry before new techniques are implemented in the pipeline.
Reach project-wide agreement on the definition of actions and how to link units of work.
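
One item on this list, assembling parent-child lists of units of work and protocols, can be sketched as follows. This is a hypothetical Python illustration, not the Sesame implementation: it assumes each work unit records a single parent and the protocol that produced it, and that roots (for example, a selected target) have no entry. The IDs and protocol names are invented for the example.

```python
def lineage(unit, parent_of):
    """Return [(work_unit, protocol), ...] from the root down to `unit`.

    parent_of maps a work-unit key to (parent_key, protocol_name); a key
    with no entry is a root, reported with protocol None. A repeated key
    means the links contain a cycle, i.e. they are not a well-formed DAG.
    """
    steps = []
    seen = set()
    current = unit
    while True:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        parent, protocol = parent_of.get(current, (None, None))
        steps.append((current, protocol))
        if parent is None:
            break
        current = parent
    steps.reverse()  # root first, as a reader would follow the pipeline
    return steps


# Hypothetical links: target -> PCR product -> entry vector -> expression.
links = {
    "PCR-1": ("T-1", "pcr"),
    "EV-1": ("PCR-1", "entry_vector"),
    "EX-1": ("EV-1", "expression"),
}
print(lineage("EX-1", links))
```

Under this single-parent assumption, a salvage attempt appears as a new child of an earlier unit, i.e. a sibling branch, so the record graph stays acyclic even when a target goes back to an earlier pipeline stage.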

12 Future
Push towards zero errors in PSI-2 PepcDB.
Continue correcting PSI-1 data.
Implement data visualization tools in Sesame.
Expand the scope of data reported to PepcDB.
  – Report all crystallization trials (year 5 to now).
  – Consolidate and report data for new tags (elemental analysis, mass, etc.).
Switch over to Sesame for PepcDB report generation.
