Memops Data modelling and automatic code generation Edinburgh 9 September 2008.

1 Memops Data modelling and automatic code generation Edinburgh 9 September 2008

2 Memops - main points ■Code generation framework ■Data access subroutine libraries ■Fully automatic code generation from model ■Several programming languages in parallel ■Precise, detailed, validated data

3 Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

4 The CCPN Project ■Collaborative Computing Project for NMR ■Since 1999 ■Unifying platform for NMR software similar to CCP4 for X-ray crystallography ■Community-based, open-source, software development ■Code generation, data model, applications, meetings

5 NMR Structural Biology Pipeline Sample Preparation NMR Machine Structure Calculation Data Processing Spectrum Analysis Repository Database Slow, complex, interactive

6 Native Anarchy Convert Task1 Task2 Convert Task2 Task1 Convert Task3 Convert Task3 Convert Task3

7 With Data Standard Data Standard Convert Task1 Convert Task2 Task1 Convert Task1 Convert Task3 Convert Task3 Convert Task3

8 Data standard - objectives ●Lossless data transfer between programs - different approaches and architectures ●All data needed for pipeline software ■Creating data, not analysing end results ■Intermediate results needed ■Comprehensive, detailed, complex ●Completeness, integrity of changing data ●Precisely defined standard ■A single central description ■Validation directly against standard

9 ■Standard API, no stable format ●easier to maintain as model changes ■Abstract data model ●Exact correspondence to APIs ■API implementations for several languages ■Transparent access to XML or DB storage ■Complete validation of model rules and constraints CCPN approach

10 Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

11 ■Model will change over time ●Several parallel implementations ●Synchronisation between APIs and model ●Maintenance and debugging ●Resources are limited ■Automatic Code Generation ●Write and debug once and for all ●Any domain, from Astrophysics to Zoology ●Quick and simple to extend model ■E.g. Application-specific packages Automatic Code generation

12 Code Generation Framework Domain Experts MEMOPS framework Software Developers User Documentation Application Deposition APIs Python Java C Storage SQL XML Handcoded (< 1%) UML Model Package 1 Package 2 Package 3 Autogeneration Wrappers

13 Code Generation ObjectDomain UML data edit UML MetaModel In-Memory Model Python objects On-disk model XML file API code Schemas Mappings etc. Autogeneration CCPN code Off-the-shelf files CCPN generated Legend: Export

14 API generator ModelTraverse TextWriter ApiGen PyLanguage PyFileApiGen FileApiGen PyApiGen PyType ■Written in Python ■Modular ■Different generators share code
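The idea of generating API code from a model can be illustrated with a minimal sketch. It is not the Memops generator: `ModelAttribute`, `ModelClass`, and `generate_class` are hypothetical names invented for this example, and the emitted class is far simpler than a real generated API. It only shows the principle of walking a model description and writing out source text with type-checked accessors.

```python
from dataclasses import dataclass, field


@dataclass
class ModelAttribute:
    """Hypothetical description of one attribute in the abstract model."""
    name: str
    type_name: str  # name of a Python builtin type, e.g. "int" or "str"


@dataclass
class ModelClass:
    """Hypothetical description of one class in the abstract model."""
    name: str
    attributes: list = field(default_factory=list)


def generate_class(model_class):
    """Return Python source text for a class with checked getters/setters."""
    lines = [f"class {model_class.name}:", "    def __init__(self):"]
    if model_class.attributes:
        for attr in model_class.attributes:
            lines.append(f"        self._{attr.name} = None")
    else:
        lines.append("        pass")
    for attr in model_class.attributes:
        cap = attr.name[0].upper() + attr.name[1:]
        lines += [
            "",
            f"    def get{cap}(self):",
            f"        return self._{attr.name}",
            "",
            f"    def set{cap}(self, value):",
            f"        if not isinstance(value, {attr.type_name}):",
            f"            raise TypeError('{attr.name} must be {attr.type_name}')",
            f"        self._{attr.name} = value",
        ]
    return "\n".join(lines) + "\n"


# Describe a tiny model, generate source text, and load it as a real class.
atom_model = ModelClass("Atom", [ModelAttribute("elementName", "str")])
source = generate_class(atom_model)
namespace = {}
exec(source, namespace)  # compile the generated text
Atom = namespace["Atom"]
atom = Atom()
atom.setElementName("C")
```

Because the accessors are produced from the model description, changing the model and regenerating keeps code and model in step, which is the maintenance argument made on the earlier slides.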

15 Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

16 Model features ■Packages to subdivide model, code, and data files ■Objects. Unique context, compare-by-identity ■Complex data types. Different contexts, compare-by-value ■Simple data types, PositiveInt, enumerations, … ■Attributes and links: ●Cardinality, frozen/modifiable, derived ●Unique/ordered collections (sets, lists, unique lists) ■Ad-hoc constraints on attributes, simple and complex datatypes, and objects.
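A few of these model features can be sketched in plain Python. This is an illustration under assumptions, not CCPN code: `check_positive_int`, `Residue`, `Coordinate`, and `UniqueList` are hypothetical names. It shows a constrained simple data type, an object compared by identity, a complex data type compared by value, and an ordered collection that rejects duplicates.

```python
from dataclasses import dataclass


def check_positive_int(value):
    """Validate a PositiveInt-style value; return it unchanged if valid."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"expected a positive value, got {value}")
    return value


class Residue:
    """Model object: no __eq__ override, so comparison is by identity."""

    def __init__(self, seqCode):
        self.seqCode = check_positive_int(seqCode)


@dataclass(frozen=True)
class Coordinate:
    """Complex data type: the generated __eq__ compares field values."""
    x: float
    y: float
    z: float


class UniqueList:
    """Ordered collection that refuses duplicates (a 'unique list')."""

    def __init__(self):
        self._items = []

    def add(self, item):
        if item in self._items:
            raise ValueError("duplicate item in unique list")
        self._items.append(item)

    def values(self):
        # Return a tuple so callers cannot modify the internal list.
        return tuple(self._items)
```

Two `Residue` objects with the same `seqCode` are still different objects, while two `Coordinate` values with equal fields compare equal.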

17 Molstructure model package

18 CCPN APIs ■ Application Programming Interface ●Object oriented ●Data accessed in memory as if stored in the data model ■Implementations come with: ●Integrated, transparent I/O (file or database) ●Complete validity checking ●Protection against casual change (data encapsulation) ●Versioning and backwards compatibility ●Event notifier system ●Slot for application-specific data
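Three of the listed features (validity checking, encapsulation, and event notification) can be sketched together. The names here are assumptions made for illustration: `register_notifier`, the `Peak` class, and the `applicationData` attribute are hypothetical and do not reproduce the CCPN API signatures.

```python
from collections import defaultdict

# Hypothetical registry: (class name, function name) -> list of callbacks.
_notifiers = defaultdict(list)


def register_notifier(class_name, func_name, callback):
    """Ask to be called whenever class_name.func_name changes an object."""
    _notifiers[(class_name, func_name)].append(callback)


class Peak:
    def __init__(self, height):
        self._height = None          # private: change only through setHeight
        self.applicationData = {}    # slot for application-specific data
        self.setHeight(height)

    def getHeight(self):
        return self._height

    def setHeight(self, value):
        # Validity check happens before the stored value is touched.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("height must be a number")
        self._height = float(value)
        # Tell every registered listener that this object changed.
        for callback in _notifiers[("Peak", "setHeight")]:
            callback(self)


seen = []
register_notifier("Peak", "setHeight", lambda peak: seen.append(peak.getHeight()))
peak = Peak(10)
peak.setHeight(12.5)
```

A graphical application could register a redraw function in the same way, so that displays stay current without the science code knowing about them.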

19 Science code User Interface Utility functions Python+XML at runtime Python API XML I/O code XML I/O mappings Data Storage XML files User application Data get, set. Validity check Generic XML read/write User data in CCPN XML format What to do for which element CCPN code Off-the-shelf Application code files CCPN generated Legend: XML parser

20 Java+DB at runtime CCPN code Off-the-shelf Application code files CCPN generated Legend: HQL Science code User Interface Utility functions Java API Hibernate Hibernate mappings Database Presentation layer Database Schema Hibernate Optional Custom queries (Hibernate Query Language)

21 Now Available ■Version 2.0 just released ■Python+XML, Java+XML, C+XML, Java+DB (with Hibernate) ■Available under the GPL license from Sourceforge or www.ccpn.ac.uk ■CCPN Data Standard: ●NMR, Macromolecules, LIMS ●46 packages ●552 classes and data types ●Python+XML implementation 800,000+ lines of code

22 Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

23 CcpNmr Suite ■Analysis ●Interactive NMR analysis ■FormatConverter ●Convert between 30+ NMR and structure formats ■Built on top of CCPN model (Python+XML) ■Version 2.0 released ■Widely used in macromolecular NMR

24 CcpNmr Analysis

25 ExtendNMR NMR pipeline ■Integrated macromolecular NMR pipeline - from sample to structure ■Pre-existing programs from 8 groups ■In-memory conversion to internal data structures ■Integrated versions released: ●ARIA (NMR structure generation) ●Bruker TOPSPIN, manufacturer's processing/analysis package

26 BIOXDM ■Software pipeline for on-synchrotron crystallography ●Exploit new technology (  goniometers) ●Experiment optimisation, acquisition, and on-line processing ■Independent data model, with Memops machinery ■Java+DB implementation for runtime concurrent access

27 EUROCarbDB ■Distributed deposition database ●Glycobiology and glycomics ●NMR, MS, HPLC and topology ■Java. Database storage using Hibernate ■CCPN model Java+DB implementation slots in as-is

28 Funding acknowledgements ■BBSRC CCPN grants ■European Union grants ●EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts ■Industry support ●AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline ●Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’

29 People ■Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing) ■Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova ■Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett ■Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004- 01195

30 END

31 Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control

32 ARIA – structure generation from NMR data Custom conversion ARIA Data Model CCPN Data Model CCPN XML Application ARIA XML ■ARIA imports ●Peak Lists ●Constraints ●Sequences ●Chemical shifts ■ARIA exports ●Peak Assignments ●Filtered Constraints ●Violations ●Structures

33 API functions ■‘get’ and ‘set’ (Attributes and links) ■‘add’ and ‘remove’ (Collection attributes and links) ■‘sorted’ (Unordered collection links) ■‘findFirst’ and ‘findAll’ (Collection links) ●Simple filtering (attribute == value) ■create and ‘new’ (Objects) ●Normal and ‘factory function’ object creation ■delete (Objects) ●‘Delete’ function – cascades to objects rendered invalid by deletion ■checkValid, checkAllValid (Objects) ■API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
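The function families above can be sketched on a toy parent-child pair. This is a hand-written illustration, not generated CCPN code: the `Molecule` and `Atom` classes and the exact method names (`newAtom`, `findFirstAtom`, `findAllAtoms`) are assumptions chosen to mirror the slide. It shows factory creation, simple `attribute == value` filtering, two-way links, and a delete that cascades to children.

```python
class Molecule:
    def __init__(self, name):
        self.name = name
        self._atoms = []        # ordered collection of child objects
        self.isDeleted = False

    def newAtom(self, **attrs):
        """Factory function: create an Atom already linked to this parent."""
        return Atom(self, **attrs)

    def getAtoms(self):
        # Return a copy so callers cannot bypass the API.
        return tuple(self._atoms)

    def findAllAtoms(self, **conditions):
        """Simple filtering: keep atoms where every attribute == value."""
        return [atom for atom in self._atoms
                if all(getattr(atom, key) == value
                       for key, value in conditions.items())]

    def findFirstAtom(self, **conditions):
        matches = self.findAllAtoms(**conditions)
        return matches[0] if matches else None

    def delete(self):
        # Cascade: children cannot exist without their parent.
        for atom in list(self._atoms):
            atom.delete()
        self.isDeleted = True


class Atom:
    def __init__(self, molecule, name, elementName):
        self.molecule = molecule          # link child -> parent
        self.name = name
        self.elementName = elementName
        self.isDeleted = False
        molecule._atoms.append(self)      # reverse link parent -> child

    def delete(self):
        self.molecule._atoms.remove(self)
        self.isDeleted = True


mol = Molecule("alanine")
ca = mol.newAtom(name="CA", elementName="C")
n = mol.newAtom(name="N", elementName="N")
first_nitrogen = mol.findFirstAtom(elementName="N")
carbons = mol.findAllAtoms(elementName="C")
```

Note how `Atom.__init__` reaches into the parent's private list to keep both directions of the link consistent; this is the kind of strong coupling between API classes that the slide refers to.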

34 FormatConverter - The NMR Translator CCPN Data Model Peaks Chemical shifts Acquisition parameters XEasy NmrView XEasy NmrView Bruker Varian ... Generic peak converter Generic chemical shift converter Generic acquisition parameters converter Processing parameters XEasy NmrView NMRPipe Azara ... NmrView Format specific readers Data model entry Format specific writers Chemical shifts Peaks

35 ExtendNMR: ARIA ■Structure generation from macromolecular NMR data, ambiguous distance constraints ■One of two leading programs ■Python and scripts, with CNS dynamics engine ■All input and output integrated to CCPN standard

36 ARIA: CCPN object selection

37 ExtendNMR: Bruker TOPSPIN ■NMR processing program of major NMR instrument company ■Java. In-memory conversion to CCPN Java+XML implementation ■CCPN output in current TOPSPIN release, expanded in upcoming release.

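38 Data Model v. Data Format [Figure: the same Atom/Bond model shown as an abstract UML model, as relational database tables, and as XML] ■Abstract model (UML): Atom (+elementName: String = C) and Bond (+bondOrder: Float = 1.0), linked as +bonds (*) and +atoms (2) ■Relational Database: tables Atom (Atom_ID, elementName), Bond (Bond_ID, bondOrder), Atom_Bond_Connect (Bond_ID, Atom_ID)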
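The distinction between one abstract model and several storage formats can be made concrete with a short sketch using the Python standard library. The table and column names follow the slide (Atom, Bond, Atom_Bond_Connect); the XML element layout, the `Molecule` root element, and the helper names `to_relational` and `to_xml` are assumptions made for this example, not the CCPN XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One in-memory description of the data: two carbon atoms joined by a bond.
atoms = [{"id": 1, "elementName": "C"}, {"id": 2, "elementName": "C"}]
bonds = [{"id": 1, "bondOrder": 1.0, "atomIds": (1, 2)}]


def to_relational(atoms, bonds):
    """Store the data as three tables; the link needs its own join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Atom (Atom_ID INTEGER PRIMARY KEY, elementName TEXT)")
    conn.execute("CREATE TABLE Bond (Bond_ID INTEGER PRIMARY KEY, bondOrder REAL)")
    conn.execute("CREATE TABLE Atom_Bond_Connect (Bond_ID INTEGER, Atom_ID INTEGER)")
    conn.executemany("INSERT INTO Atom VALUES (?, ?)",
                     [(a["id"], a["elementName"]) for a in atoms])
    conn.executemany("INSERT INTO Bond VALUES (?, ?)",
                     [(b["id"], b["bondOrder"]) for b in bonds])
    conn.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                     [(b["id"], atom_id) for b in bonds for atom_id in b["atomIds"]])
    return conn


def to_xml(atoms, bonds):
    """Store the same data as XML; the link becomes an attribute of Bond."""
    root = ET.Element("Molecule")  # hypothetical root element
    for a in atoms:
        ET.SubElement(root, "Atom", id=str(a["id"]), elementName=a["elementName"])
    for b in bonds:
        ET.SubElement(root, "Bond", id=str(b["id"]),
                      bondOrder=str(b["bondOrder"]),
                      atoms=" ".join(str(i) for i in b["atomIds"]))
    return ET.tostring(root, encoding="unicode")


conn = to_relational(atoms, bonds)
xml_text = to_xml(atoms, bonds)
```

The data are identical in both cases, but the Atom-Bond link is represented differently: as rows in a join table in the database, and as an attribute on the Bond element in the XML.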

39 Packages

40 ■Partition model, code, and data ■Import each other ■Can be omitted ■All import Implementation and AccessControl ■Each have a TopObject ■No links between data from rival TopObjects (different extents of data)

41 Root and TopObjects

42 TopObjects ■One in every package ●Ultimate parent to all objects in package ■Have globally unique identifier (‘guid’) ■currentXyz links from root ■Links can constrain links between descendants ■In file implementations: ●Hold links to storage and backup locations ●Live in Implementation as almost empty shell

43 Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control

44 CcpNmr Analysis ■NMR Assignment Program ●Inspired by ANSIG and Sparky ●Demonstrates CCPN approach ●Modern interface and scripting ●Scalable and extensible ■Operating Systems ●Linux, Sun, SGI, OSX, Windows ■Languages ●Python ■Data model interaction ■Tk Graphical interface ■Scripting ●C ■OpenGL/Tk contours ■Structure display ■Mathematical operations

45 Implementation Package ■Model and Code: ●Supertypes that define all objects ■Objects ■DataTypes ■DataObjTypes ●Basic data types ■Data – how to access the real data: ●Data location pointers ●Current-package pointers ●Implementation data are not part of the data set, and are not in the database. ●Represent view or session?

46 Data Location

47 Objects and their Supertypes

48 Simple Data Types Boolean DataType Int DataType Float DataType String DataType Line DataType Text DataType Long DataType Double DataType Word DataType PositiveInt DataType SingleLine DataType NonNegativeInt DataType Dict DataType DateTime DataType StringKeyDict DataType Any DataType Token DataType NonNegativeFloat DataType FloatRatio DataType PositiveFloat DataType SpacelessString DataType LongWord DataType PositiveDouble DataType NonNegativeDouble DataType UrlProtocol DataType

49 Complex Data Types

