Worldwide Protein Data Bank www.wwpdb.org wwPDB Common D&A Project January 28, 2010 Steering Committee Project Update.

1 Worldwide Protein Data Bank www.wwpdb.org wwPDB Common D&A Project January 28, 2010 Steering Committee Project Update

2 Worldwide Protein Data Bank Common D&A Project January 2010 Update
Update report
• Status of D&A initial production deliverable:
  – Sequence Editor tool development
  – Integration within existing pipelines
• Status of WF infrastructure initial implementation:
  – Sequence Processing components (external search, internal analysis, etc.) integrated by WF engine and manager into the “new” Sequence Processing Module
  – Integration of Sequence Processing Module into existing pipeline
RECONSIDER Timeline Estimate and Strategy
• Next Phase
  – Ligand Processing: Planning

3 Overview of deliverable status for: Sequence Editor tool
Deliverable timelines have been extended to enable a full response to user testing input (expanded requirements) and to ensure development to the agreed-upon design.
• Completion of Interface with additional prioritized requirements: projected Feb 15
• Integration within current production pipelines
  – Initial implementation of Master Format and format conversion support
• In use by annotators by Feb 25

4 Sequence Editor Tool Technologies and Standards
• Model View Controller (MVC) design
  – Separates data/application from presentation as much as possible
• Client/Server protocol
  – AJAX using JSON protocol
  – REST style service definitions
• Server
  – Apache with embedded WSGI (mod_wsgi)
• Application
  – Python with C++ extensions (Boost.Python)
All the good acronyms!
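
As an illustration of how the pieces named on this slide fit together, here is a minimal sketch of a WSGI application that serves a REST style resource as JSON, with the data lookup kept apart from the HTTP handling in the spirit of MVC. It is not the project's code: the `/sequences/<id>` URL scheme, the `SEQUENCES` store, the entry ID, and the sequence string are all assumptions made up for the example. Only the callable name `application` reflects a real convention, since that is what mod_wsgi looks for by default.

```python
import json
from wsgiref.util import setup_testing_defaults

# Model: hypothetical in-memory store standing in for real sequence data files.
SEQUENCES = {"1abc": {"entity_1": "MKTAYIAKQR"}}


def get_sequence(entry_id):
    """Controller logic: look up one entry and return (status, payload)."""
    record = SEQUENCES.get(entry_id)
    if record is None:
        return "404 Not Found", {"error": "unknown entry", "id": entry_id}
    return "200 OK", {"id": entry_id, "sequences": record}


def application(environ, start_response):
    """WSGI entry point: route GET /sequences/<id> and answer in JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if is_get and len(parts) == 2 and parts[0] == "sequences":
        status, payload = get_sequence(parts[1])
    else:
        status, payload = "404 Not Found", {"error": "no such resource"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the app in-process (no server needed) and decode the reply."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a default GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

A browser-side AJAX call would request the same URL and render the returned JSON, which is what keeps presentation separate from the data and application layers.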

5 Sequence Editor Tool Architecture for Current and Future Deployment
[Architecture diagram. Labels: Sequence Data Store; Current DP Pipeline; WFE/WFM; Sequence Editor Tool; Annotated Sequence Data; Future Workflow DP Pipeline. Data formats shown: PDB/FASTA, PDBx/PreBlast, PDB/PDBx.]

6 Accomplishments
• Annotator graphical interface for Sequence Editing
  – Prototype evaluation and prioritization of additional requirements by Annotators at all sites completed Jan 12
  – Expanded functionality development expected to be completed and available for user testing Feb 15, including:
    • Capability to incrementally undo a process step (UNDO)
    • Summarization of sequence conflicts
    • Global editing features
• Integration of this Sequence Editor tool (interface) into the existing data processing pipelines (Feb 26)
  – Input accepts existing sequence data files at PDBe and RCSB (e.g. PDBx + Blast report or PDB + FASTA)
  – Output via an intermediate file, to be integrated via MAXIT
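
The incremental UNDO capability listed on this slide can be sketched with a simple history stack. The slides do not describe how the Sequence Editor implements it, so the class name, method names, and the single edit operation below are hypothetical and chosen only to show the idea of undoing one step at a time.

```python
class SequenceEditSession:
    """Hypothetical editing session that records a snapshot before each edit."""

    def __init__(self, sequence):
        self.sequence = sequence
        self._history = []  # earlier versions of the sequence, oldest first

    def replace_residue(self, position, new_residue):
        """Replace the residue at a 0-based position, remembering the old state."""
        if not 0 <= position < len(self.sequence):
            raise IndexError("position out of range")
        self._history.append(self.sequence)
        self.sequence = (self.sequence[:position] + new_residue
                         + self.sequence[position + 1:])

    def undo(self):
        """Revert the most recent edit; return False if nothing is left to undo."""
        if not self._history:
            return False
        self.sequence = self._history.pop()
        return True
```

Each call to `undo()` steps back exactly one edit, so repeated calls walk the session back toward the sequence it started with.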

7 Accomplishments
• Master Format implementation (for current data model)
  – PDB to Master Format translation working with MAXIT
    • Final test at PDBe
  – Validation and testing at all sites
  – PDBj creation of new tool for Master Format validation with extended diagnostics
  – Issues with Master Format will be ongoing, with evolution of the PDB format, hybrid methods, etc.

8 Sequence Editor Tool Development Lessons Learned
• Iterative development and active Annotator involvement are essential, and take time.
• Addressing integration issues with existing systems in terms of modularity, process ordering and data availability poses significant challenges.
• An agile process of development and planning supports adaptation to evolving requirements.
• We will need to further consider the most efficient level of granularity for the deployment of new functionality in existing systems in future planning.

9 Design Convergence Accomplishments
Master Format, API, WFM, WFE, UI
Distributed development on a complex project is challenging
Tag team development of WFE and APIs
• Straw men articulation: flesh out WFE/API requirements for representative Use Cases
• WFE pseudo code developed against straw men
• API integration layer will be developed against this pseudo code
• WFE will then be implemented against the API

10 Accomplishments: WF infrastructure - Integration of Sequence Processing
• Tracking and Status DB developed and installed at RCSB and PDBe for development purposes
• Work Flow Manager (WFM)
  – Prototype user testing ongoing
  – Requirements refined and prototype updated
  – Infrastructure complete, to be deployed for testing this week
• Work Flow Manager User Interface (WFM UI)
  – User prototype created, input received and prototype enhanced
  – Initial Level 1 annotator interface signed off by annotators
  – Level 2/3/4 interfaces prototyped and under review
  – Level 3/4 under further development

11 PDBe resource
• Workflow XML
  – Luana/Tom: 1 day total to complete annotator requirements
• WFE component supporting Sequence Processing
  – Tom: 1-2 days per week ongoing, estimating 5-6 days (3 actual weeks) to complete after all APIs are in place
• WFM
  – Luana: currently full time; work is being prioritised to define the subset of requirements to be delivered in March
• Web resources: interfaces and WFM
  – External services: technology requirements have been defined. Timeline TBD. Critical Path.
• Other resources
  – Wim: Python expertise
  – Swanand: Python expertise (after 13th Feb), fall-back

12 RCSB Resources
• Web Tools
  – Currently supporting development and alpha-testing sites
  – Will add production site for Feb deployment
• Database Support
  – MySQL database server for status and tracking database
• Application Support
  – Project SVN code repository
  – JIRA issue tracking system
  – Project documentation and information site (Drupal)
  – Automated build system for API and application tools
• People
  – Vladimir: API and build system (Python/C++)
  – Li: DB system and status and tracking API (Python/SQL)
  – Rahip: Sequence Editor Tool (JavaScript/CSS)
  – Zukang/Raul/John: DP applications (C++/Python)

13 Updated Timeline Summary
Sequence Processing
1. Sequence Editor Tool
  – Completion of Interface with prioritized additional requirements and beginning of final user testing: projected Feb 15
  – Integration with current pipelines using Master Format; in test by annotators by Feb 25
  – In production: best estimate early March
2. Integration of Sequence Processing components with new architecture (WFE/API and WFM)
  – User testing: April
3. Integration of module into Pipeline
  – Plan by end of March

14 Competing/Complementary Priorities
• Address ongoing data quality issues and remediation
• Three Validation task forces
  – Implementation of recommendations
• New PDB Format: within the next 6 months?
• De-programming Kim
  – For Ligand Processing: timeline end of March to early April
Other strategic considerations
• Stakeholders
  – Stress testing of new solutions against expectations and existing solutions must be managed and will take some time.

15 Next Phase - Timeline
Ligand Processing
• Requirements
  – Plans in place for Annotator exchange
  – March: requirements consolidation, initial design plan
  – March: create overview plan and initial timeline
• Kick off development
• Deployment
  – Strategy to be defined based on current and ongoing lessons learned

16 Things that have kept us up at night
• These are cornerstone deliverables requiring intense study and design consideration, beyond the proof of concept.
  – Organization of data, communication protocols, etc.
  – Clear consensus on design features has required an evolution of understanding, requiring hands-on work
• Ramp up of skill sets: Python, mmCIF (PDBe)
• EBI External services: web-service setup
• Site-specific integration challenges
• Resource issues

17 BACK UP SLIDES

18 Data and Application API Design
• Unified Python language implementation
• Provides all access to data and applications for the workflow manager and workflow engine
• Subcomponents of the API provide access to:
  – Data objects and data values
  – Applications and tools
  – Tracking and status information
  – Site level configuration information
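
One way to picture the four subcomponents listed on this slide is as a single facade object through which a workflow engine or manager reaches everything. The sketch below is only that, a picture: every class, method, and value name is hypothetical, and the in-memory dictionaries stand in for whatever storage the real API uses.

```python
class DataAccess:
    """Data objects and data values (hypothetical in-memory store)."""

    def __init__(self):
        self._values = {}

    def set_value(self, entry_id, key, value):
        self._values[(entry_id, key)] = value

    def get_value(self, entry_id, key):
        return self._values[(entry_id, key)]


class ApplicationAccess:
    """Applications and tools, registered by name."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func):
        self._tools[name] = func

    def run(self, name, *args):
        return self._tools[name](*args)


class StatusAccess:
    """Tracking and status information per entry."""

    def __init__(self):
        self._status = {}

    def set_status(self, entry_id, status):
        self._status[entry_id] = status

    def get_status(self, entry_id):
        return self._status.get(entry_id, "unknown")


class SiteConfig:
    """Site level configuration information."""

    def __init__(self, settings):
        self._settings = dict(settings)

    def get(self, key):
        return self._settings[key]


class UnifiedApi:
    """Single access point bundling the four subcomponents."""

    def __init__(self, site_settings):
        self.data = DataAccess()
        self.apps = ApplicationAccess()
        self.status = StatusAccess()
        self.config = SiteConfig(site_settings)


# Usage: a workflow step reads data, runs a tool, and records status.
api = UnifiedApi({"site": "example-site"})
api.data.set_value("entry-1", "sequence", "mkta")
api.apps.register("uppercase", str.upper)
result = api.apps.run("uppercase", api.data.get_value("entry-1", "sequence"))
api.status.set_status("entry-1", "sequence processed")
```

Routing every call through one object is what lets the workflow components stay unaware of site-specific details, which live behind `config`.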

19 Deliverable update: WFM Design
Functional Architectural design
• Will present progress and tracking information
• Will start/stop and restart the workflow engine in executing data processing tasks
• Will work in a fully distributed web-based mode
• Will provide a launch point for tasks requiring interactive or graphical interactions. Two modes defined:
  – Immediate mode: all processing occurs in a single session (simple case)
  – Deferred mode: requests for input are registered with the workflow manager for later processing by annotator
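
The difference between the two modes can be sketched as follows. This is an assumption-laden illustration, not the WFM design itself: the class, the `Mode` enum, and the callback-based way of asking the annotator are all invented for the example.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    IMMEDIATE = "immediate"
    DEFERRED = "deferred"


class WorkflowManagerSketch:
    """Hypothetical manager that handles interactive input in one of two modes."""

    def __init__(self, mode):
        self.mode = mode
        self.pending = deque()  # registered requests awaiting an annotator
        self.results = {}

    def request_input(self, task_id, ask_annotator):
        """Return True if the input was handled now, False if it was deferred."""
        if self.mode is Mode.IMMEDIATE:
            # Immediate mode: resolve the request within the current session.
            self.results[task_id] = ask_annotator(task_id)
            return True
        # Deferred mode: register the request for later processing.
        self.pending.append((task_id, ask_annotator))
        return False

    def process_pending(self):
        """Work through deferred requests in order; return how many were handled."""
        handled = 0
        while self.pending:
            task_id, ask_annotator = self.pending.popleft()
            self.results[task_id] = ask_annotator(task_id)
            handled += 1
        return handled
```

In deferred mode the workflow can move on to other entries while requests queue up, and an annotator clears the queue later by triggering `process_pending()`.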

20 Process Overview With GO BACK functionality

