Worldwide Protein Data Bank www.wwpdb.org Common D&A Project Sequence Processing Modular Demo May 6, 2010 Project Deliverable.


1 Worldwide Protein Data Bank www.wwpdb.org Common D&A Project Sequence Processing Modular Demo May 6, 2010 Project Deliverable

2 Annotation Pipeline
Diagram labels:
• Sequence Processing Module (Delivered May 6, 2010)
• Ligand Processing
• Corrections (water trans, pro-chiral ck)
• Calculated annotations (Bio Assem)
• Geometry CK Validation
• Release Processing
• User Interface
• WFE/API
• Progress Tracking/Status: Requirements, Design
• 4.1, 4.2, 4.3, 4.4, 4.5

3 Technical Deliverable Details
Worldwide Protein Data Bank Common D&A Project March 2010 Project Team Meeting
• Master Format. Finalization of Physical Data Exchange
• Extended API
• Tracking DB creation/support
• Extended Work Flow Engine (WFE)
• Work Flow Manager (WFM)
• Work Flow Manager User Interface (WFM UI)
• Annotator graphical interface for sequence module
• Integration of all components creating the Sequence Processing “module”

4 Key Requirements Met
• Complete and “correct” entries processed automatically
• Sequence mutation – editing and visualization supported
• Sequence mismatch – editing and visualization supported
• Processing of very large structures, e.g. ribosome
• Polymer processing, individual and in complex
• Short peptide complex cross reference
• Sequence matches sortable by % match
• Annotator-triggered global ALA/GLY substitutions
• Self-reference supported for cases with no UniProt match

5 Future Enhancement List
• Automation of “gap” recognition and processing*
• Implementation of UniProt isoform and variant searches for mismatched proteins*
• Validation and checks within the Sequence Editor
• Modified residues – support one-to-many sequence alignments (e.g. chromophore)
• Chimera processing
• Concanavalin A example (alternate splicing)
*PDBe code to be packaged for module integration

6 Sequence Module Processing
T1 - Initialization
• Update workflow status
• Verify required data inputs: model file, taxonomy assignments
T2 - Reference Sequence Search
• Determine unique polymers
• Run sequence database search
• Update reference sequence data files
T3 - Assessment
• Check author/coordinate sequence conflicts
• Check sequence database assignments

7 Sequence Module Processing
T3 Assessment Succeeds
• Check author/coordinate sequence conflicts
• Check sequence database assignments
T4 - Update
• Apply residue mapping
• Apply database references
• Create new version of model file
T5 - End
• Update workflow status
T3 Assessment Fails
• Check author/coordinate sequence conflicts
• Check sequence database assignments
T6 - Sequence Editor
• Interactive residue-level modifications
• Reference database selections or self-reference
• Reset taxonomy
• Run sequence database search by entity
• Add reference sequence by ID
• Export residue mapping and reference assignments
T4 - Update
• Apply residue mapping
• Apply database references
• Create new version of model file
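
The task flow on slides 6 and 7 can be read as a small state machine. The sketch below is an illustration only, not the project's Work Flow Engine: the function name, the two callables, and the boolean assessment result are hypothetical, and it assumes that T5 follows T4 on both branches (the slide shows T5 only on the success branch).

```python
from typing import Callable, List


def run_sequence_module(assess: Callable[[], bool],
                        edit: Callable[[], None]) -> List[str]:
    """Return the ordered task IDs visited while processing one entry.

    assess: hypothetical stand-in for T3; True means the assessment succeeds.
    edit:   hypothetical stand-in for the interactive T6 Sequence Editor.
    """
    visited = ["T1", "T2"]      # initialization, reference sequence search
    visited.append("T3")        # assessment of conflicts and assignments
    if not assess():
        visited.append("T6")    # assessment failed: annotator edits interactively
        edit()
    visited.append("T4")        # apply mapping and references, new model version
    visited.append("T5")        # assumed on both branches: update workflow status
    return visited
```

A complete and correct entry passes straight from T3 to T4, which matches the "processed automatically" requirement; only failed assessments reach the editor.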

8 Under the covers…
DP File System
• Archival Storage
  – Deposition Data Set ID 1
  – Deposition Data Set ID 2
  – Deposition Data Set ID N
• Workflow Storage
  – Deposition Data Set ID 1
    Workflow Instance: WF Inst ID 1, WF Inst ID 2
    WF Shared Storage: WF Namespace A, WF Namespace B
  – Deposition Data Set ID 2
    Workflow Instance: WF Inst ID 3, WF Inst ID 4
    WF Shared Storage: WF Namespace A, WF Namespace B
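
One way to picture this layout is as a set of path-building helpers. This is a sketch under assumptions: the directory names `archive`, `workflow`, `instance`, and `shared` come from the Data Management slides, while the function names, the `top` root argument, and the idea that a namespace is a subdirectory of the shared area are hypothetical.

```python
from pathlib import PurePosixPath


def archive_dir(top: str, dep_id: str) -> PurePosixPath:
    """Archival storage for one deposition data set."""
    return PurePosixPath(top) / "archive" / dep_id


def instance_dir(top: str, dep_id: str, wf_inst_id: str) -> PurePosixPath:
    """Private working area of one workflow instance."""
    return PurePosixPath(top) / "workflow" / dep_id / "instance" / wf_inst_id


def shared_dir(top: str, dep_id: str, namespace: str) -> PurePosixPath:
    """Shared workflow storage; namespace-as-subdirectory is an assumption."""
    return PurePosixPath(top) / "workflow" / dep_id / "shared" / namespace
```

For example, `instance_dir("/data", "D_000001", "W_000005")` yields `/data/workflow/D_000001/instance/W_000005`.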

9 Data Management
File Prior to Seq. Processing
Top
  archive
    D_000001
      D_000001_model_P1.cif.V1
  workflow
    D_000001
      instance
      shared
File After Seq. Processing Database Search
Top
  archive
    D_000001
      D_000001_model_P1.cif.V1
      D_000001_seqdb-match_P1.cif.V1
      D_000001_seqdb-match_P2.cif.V1
      D_000001_seqdb-match_P3.cif.V1
  workflow
    D_000001
      instance
        W_000005
          D_000001_model_P1.cif.V1
          D_000001_seqdb-match_P1.cif.V1
          D_000001_seqdb-match_P2.cif.V1
          D_000001_seqdb-match_P3.cif.V1
      shared
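
The file names on this slide follow a visible pattern, for example `D_000001_seqdb-match_P2.cif.V1`. The sketch below splits such a name into its parts with a regular expression. The field names and their interpretation (`P` as a part number, `V` as a version number) are assumptions read off the examples, not a documented specification.

```python
import re
from typing import NamedTuple


class ArchiveFileName(NamedTuple):
    deposition_id: str   # e.g. "D_000001"
    content_type: str    # e.g. "model", "seqdb-match"
    part: int            # number after "P" (assumed: part number)
    file_format: str     # e.g. "cif"
    version: int         # number after "V" (assumed: version number)


_PATTERN = re.compile(
    r"^(?P<dep>D_\d+)_(?P<content>[A-Za-z0-9-]+)_P(?P<part>\d+)"
    r"\.(?P<fmt>[A-Za-z0-9]+)\.V(?P<ver>\d+)$"
)


def parse_file_name(name: str) -> ArchiveFileName:
    """Split a file name of the form shown on the slide into its fields."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return ArchiveFileName(m["dep"], m["content"], int(m["part"]),
                           m["fmt"], int(m["ver"]))
```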

10 Data Management
File System After Seq. Processing Editor Task: New results returned to archival storage …
Top
  archive
    D_000001
      D_000001_model_P1.cif.V1
      D_000001_seqdb-match_P1.cif.V1
      D_000001_seqdb-match_P2.cif.V1
      D_000001_seqdb-match_P3.cif.V1
      D_000001_seqdb-assign_P1.cif.V1
  workflow
    D_000001
      instance
        W_000005
          D_000001_model_P1.cif.V1
          D_000001_seqdb-match_P1.cif.V1
          D_000001_seqdb-match_P2.cif.V1
          D_000001_seqdb-match_P3.cif.V1
        W_000006
          D_000001_model_P1.cif.V1
          D_000001_seqdb-match_P1.cif.V1
          D_000001_seqdb-match_P2.cif.V1
          D_000001_seqdb-match_P3.cif.V1
          D_000001_seqdb-assign_P1.cif.V1
      shared
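
The "new results returned to archival storage" step could be sketched as copying any file that exists in the workflow instance directory but not yet in the archive. This is a guess at the behavior based only on the before/after listings; the function name and the copy-if-missing rule are hypothetical, and the real system may version or validate files differently.

```python
import shutil
from pathlib import Path
from typing import List


def return_new_results(instance_dir: Path, archive_dir: Path) -> List[str]:
    """Copy files present in the workflow instance but missing from the archive.

    Returns the names of the files that were copied, in sorted order.
    """
    copied = []
    for src in sorted(instance_dir.iterdir()):
        dst = archive_dir / src.name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)   # preserves timestamps along with content
            copied.append(src.name)
    return copied
```

Applied to the listing above, only `D_000001_seqdb-assign_P1.cif.V1` would be copied from `W_000006`, since the model and match files are already archived.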

11 System Architecture
Diagram labels:
• Workflow Manager User Interface
• Workflow engine
• Session ID + workflowID
• Domain data archive (local)
• API
• Start/Stop
• Launch module UIs
• Depositions
• Remote data – Snap Mirror share
• Applications
• Status
• Data
• View system activity – Tracking DB
• Tasks
• Tracking DB

12 THE DEMO
• A brief walk-through of the WFM
• The System at Work
  – Selection of a raw file within the WFM
  – Trigger Sequence Processing Interface
• Processing options
  – Tracking by the WFM of the task status
• Blessing of the output

13 System Extensibility: Set up for adding New Functionality
Diagram labels:
• ProcessRunner
• ActionRegistry
• actions.xml
• Plugin Modules
• FileUtils
• PdbxUtils
• FormatUtils
• UtilsBase
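
The slide names the components but not how they interact. A minimal sketch of one plausible arrangement follows: a registry loaded from an XML file maps action names to plugin methods, and a runner dispatches through it. Only the component names come from the slide; the XML schema, method names, constructor arguments, and the role of `UtilsBase` as a shared base class are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

# Hypothetical actions.xml content; the real schema is not shown on the slide.
ACTIONS_XML = """
<actions>
  <action name="copy-file" plugin="FileUtils" method="copy"/>
  <action name="check-format" plugin="FormatUtils" method="check"/>
</actions>
"""


class UtilsBase:
    """Assumed common base class for plugin modules."""
    def __init__(self, verbose: bool = False):
        self.verbose = verbose


class FileUtils(UtilsBase):
    def copy(self, src: str, dst: str) -> str:   # hypothetical method
        return f"copied {src} to {dst}"


class FormatUtils(UtilsBase):
    def check(self, src: str) -> str:            # hypothetical method
        return f"checked {src}"


PLUGINS = {cls.__name__: cls for cls in (FileUtils, FormatUtils)}


class ActionRegistry:
    """Maps action names to (plugin class name, method name) pairs."""
    def __init__(self, xml_text: str):
        self._actions: Dict[str, Tuple[str, str]] = {}
        for node in ET.fromstring(xml_text).findall("action"):
            self._actions[node.get("name")] = (node.get("plugin"),
                                               node.get("method"))

    def resolve(self, name: str) -> Callable:
        plugin_name, method_name = self._actions[name]
        return getattr(PLUGINS[plugin_name](), method_name)


class ProcessRunner:
    """Runs a named action by looking it up in the registry."""
    def __init__(self, registry: ActionRegistry):
        self.registry = registry

    def run(self, action: str, **kwargs):
        return self.registry.resolve(action)(**kwargs)
```

Under this arrangement, adding new functionality means writing one plugin class and adding one entry to the XML file, with no change to the runner.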

14 Next Steps
• Sequence Processing Module
  – Sequence Processing Module to go into targeted testing
  – Modifications to be adopted as prioritized by the team and approved by the PIs
  – User Manual development
• Ligand Processing
  – Finalize requirements
  – Develop design
  – Develop module, with delivery target end of August

15 Process Overview - Ligand Processing
Step 1.0 Deposition Format Check
Step 2.0 Sequence Processing
Step 3.0 Ligand Processing
Step 4.0 Calculation of Derived data
Step 5.0 Corrections: water trans, pro-chiral ck
Step 6.0 Calculated Annotation - Biological Assembly
Step 7.0 Geometry Ck Validation
Step 8.0 Release processing: Generate Files
Step 9 Send to Authors
WFE, API, WFM Graphical User Interface

16 Ligand Processing – Functional Requirements
• Annotator exchange – experience with, and analysis of, existing work flows
• Draft of new TO BE process – Level 1
• Annotator Team elaborated Levels 2 and 3
• Annotator Team created decision trees and SIPOCs for all process steps
• Annotators documented key Use Cases
• Annotator Team mapped existing functional software components to the proposed workflow components
• Annotator Team created interface mock-ups for interactive components

17 Ligand Processing – Technical Requirements and Design
• Create plan, identify resources
• Tech Team to review the requirements
• Review functional software components
• Capture technical requirements
• Complete the draft design for the Ligand Processing module
• Develop module

18 Project Team

