1 Data Design Implementation and support for Build 2b November 30, 2011 Steve Hughes.

1 Data Design: Implementation and Support for Build 2b. November 30, 2011. Steve Hughes

2 Topics: Overview; Key Requirements and Drivers; Build 2b Deliverables; Build 2b Deployment; Issues; Next Steps

3 PDS4 Architecture 3

4 Data Architecture Concepts Tagged Data Object (Information Object) Label Schema Used to Create Describes Extracted/Specialized Information Model Data Object Data Element Class has Planetary Science Data Dictionary Expressed As Product Validates

5 Topics: Overview; Key Requirements and Drivers; Build 2b Deliverables; Build 2b Deployment; Issues; Next Steps

6 DRIVERS FOR PDS4 Build 2a
RECOMMENDATION TO MC (2009): Replace the PDS3 ad hoc information model with a PDS4 information model that is managed using modern tools.
IMPLEMENTATION: The PDS4 Information Model has been designed and managed using the Protégé Ontology Modeling Tool.
RECOMMENDATION TO MC (2009): Replace ad hoc PDS3 product definitions with PDS4 products that are defined in the model.
IMPLEMENTATION: The PDS4 Products and their components are defined using the modeling tool. The modeling tool provides rigorous definitions. The Product definition is based on the Open Archival Information System (OAIS) Reference Model, an ISO standard.
RECOMMENDATION TO MC (2009): Require data product formats to be derivations from a core set. Support transformation from the core set.
IMPLEMENTATION: Four fundamental data structures have been defined. Additional data structures are subclasses of the four fundamental structures. Software written for the fundamental structures is inherited by the subclasses.

7 DRIVERS FOR PDS4 Build 2a
RECOMMENDATION TO MC (2009): Replace the “homegrown” PDS data dictionary structure with an international standard.
IMPLEMENTATION: The PDS4 Data Dictionary structure is based on the ISO/IEC 11179 specification.
RECOMMENDATION TO MC (2009): Adopt a modern data language/grammar (XML) where possible for all tool implementations.
IMPLEMENTATION: The PDS4 Information Model is implemented in XML.

8 DRIVERS FOR PDS4 Build 2a
REQUIREMENT: 1.3.X – Provide Data Dictionary
IMPLEMENTATION: The PDS4 data dictionary database was developed and is compliant with the ISO/IEC 11179 specification. It is used to produce both data dictionary documents and data dictionary products for the registry and data dictionary service.
REQUIREMENT: 1.4.1 PDS will define a standard for organizing, formatting, and documenting planetary science data
IMPLEMENTATION: The PDS4 Information Model defines the archive organization, data formats, and product labeling standards. The PDS4 Standards Reference documents additional requirements.
REQUIREMENT: 1.4.2 PDS will maintain a dictionary of terms, values, and relationships for standardized description of planetary science data
IMPLEMENTATION: The PDS4 Data Dictionary defines the attributes, classes, and relationships for describing planetary science data.
REQUIREMENT: 1.4.3 PDS will define a standard grammar for describing planetary science data
IMPLEMENTATION: XML and XML Schema 1.1 have been adopted for the PDS4 implementation.

9 DRIVERS FOR PDS4 Build 2a
REQUIREMENT: 1.4.4 PDS will establish minimum content requirements for a data set (primary and ancillary data)
IMPLEMENTATION: The PDS4 Information Model defines observational and ancillary product types. These products are collected into PDS4 Collections and Archive Bundles.
REQUIREMENT: 1.4.5 PDS will, for each mission or other major data provider, produce a list of the minimum components required for archival data
IMPLEMENTATION: The PDS4 Information Model defines the archive bundle and its product collections. The archive bundle and its collections are customized for each mission.
REQUIREMENT: 3.1.2 PDS will develop and maintain online interfaces for discipline-specific searching
IMPLEMENTATION: The PDS4 Information Model and Data Dictionary define the information that is needed for search.
REQUIREMENT: 2.3.1 PDS will develop and publish procedures for determining syntactic and semantic compliance with its standards
IMPLEMENTATION: The adoption of XML and XML Schema 1.1 provides syntactic and semantic standards. They provide utilities and tools for validation.

10 Topics: Overview; Key Requirements and Drivers; Build 2a Deliverables; Build 2b Deployment; Issues; Next Steps

11 Build 2a Scope
- Begin supporting PDS4 label design for LADEE and MAVEN; begin planning/testing migration
- Support the Policy on Acceptable PDS4 Data Formats
- Support transition of the central catalog to the registry infrastructure
- Deploy early PDS4 software tools and services

12 Build 2a Deliverables 12 Document/ArtifactProcesses 1 Introduction Data Provider 2 Concepts Document Standards Development 3 Glossary 4 Jumpstart Guide 5 Data Provider’s Handbook 6 Standards Reference 7 Data Dictionary 8 Example Products 10 Generic Schemas 11 Information Model

13 PDS4 Documents in Context Concepts Document Big Picture Standards Reference Requirements User Friendly XML Schemas Blueprints PDS4 Product Labels Deliverables Data Dictionary Definitions PDS4 Information Model Specification Requirements Engineering Specification Informative Data Provider’s Handbook Cookbook derive generates references creates / validates instruct generates references Registry Configuration File Object Descriptions configures generates Registry Product Tracking and Cataloging generates Introduction to PDS4 Documentation Jumpstart Glossary Data Dictionary Tutorial Complete Some TBD Legend

14 Data Format Deliverables vis-à-vis Policy
PDS shall accept the following PDS4 data formats:
POLICY: Fixed-width binary and ASCII tables that are composed of identically structured records
DELIVERABLE: Table_Base - The Table Base class defines a heterogeneous repeating record of scalars. Table_Character and Table_Binary are defined as types of Table_Base.
POLICY: N-dimensional arrays of homogeneous binary elements (N<=16)
DELIVERABLE: Array_Base - The Array Base class defines a homogeneous N-dimensional array of scalars.
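As a rough illustration of the "identically structured records" idea behind Table_Base, the following Python sketch reads a fixed-width binary table with the standard `struct` module. The record layout (`RECORD_FORMAT`) and the function name are assumptions made for this example; they are not taken from the PDS4 Information Model or its schemas.

```python
import struct

# Hypothetical record layout (an assumption, not a PDS4 definition):
# a 4-byte big-endian unsigned integer, an 8-byte big-endian float,
# and a 4-character ASCII flag. Every record has the same structure.
RECORD_FORMAT = ">Id4s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def read_fixed_width_table(data: bytes) -> list[tuple[int, float, str]]:
    """Split a byte string into identically structured records."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("data length is not a whole number of records")
    rows = []
    for sample_id, value, flag in struct.iter_unpack(RECORD_FORMAT, data):
        # Trailing spaces pad the flag out to its fixed width.
        rows.append((sample_id, value, flag.decode("ascii").rstrip()))
    return rows


# Build two records in memory and read them back.
raw = struct.pack(RECORD_FORMAT, 1, 2.5, b"OK  ")
raw += struct.pack(RECORD_FORMAT, 2, -1.0, b"BAD ")
table = read_fixed_width_table(raw)
```

Because every record has the same size, a reader can locate record k at byte offset k * RECORD_SIZE without scanning the preceding records.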

15 Data Format Deliverables vis-à-vis Policy
POLICY: Variable-width character 'spreadsheets' that are composed of repeating, M-field, stream-delimited records where the fields themselves are (separately) delimited and may have variable widths (M>0)
DELIVERABLE: Delimited_Table - The Delimited_Table class defines a simple table (spreadsheet) with delimited fields and records. It is defined as a type of Parsable_Byte_Stream.
POLICY: NAIF/SPICE files
DELIVERABLE: The SPICE_Kernel_Binary and SPICE_Kernel_Text classes describe SPICE files.
POLICY: PDS shall accept ASCII text and PDF/A formats for PDS4 documentation. PDS shall accept JPEG, GIF, and TIFF images for figures accompanying documents. PDS shall accept any of the approved structures and formats for browse products.
DELIVERABLE: Product_Document - A Product Document is a product consisting of a single logical document comprised of one or more document formats. ASCII Text and PDF/A are currently allowed as document formats. JPEG, GIF, TIFF, and PNG are allowed as non-science image formats.
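The delimited "spreadsheet" structure described above can be sketched with Python's standard `csv` module. This is only an illustration under assumed delimiters (comma between fields, carriage return and line feed between records); the function name and sample data are hypothetical and do not come from the PDS4 standards.

```python
import csv
import io


def read_delimited_table(text: str, field_delimiter: str = ",") -> list[list[str]]:
    """Parse delimited records into lists of variable-width fields."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=field_delimiter)
    rows = [row for row in reader if row]
    # Every record is expected to carry the same number of fields (M > 0).
    field_counts = {len(row) for row in rows}
    if len(field_counts) > 1:
        raise ValueError("records do not all have the same number of fields")
    return rows


# Fields have different widths, but each record has M = 3 fields.
sample = "id,target,exposure\r\n1,Mars,0.25\r\n2,Phobos,12\r\n"
rows = read_delimited_table(sample)
```

Unlike the fixed-width case, a reader has to scan for delimiters to find field and record boundaries, which is why this structure is described as a parsable byte stream.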

16 The Deliverables from 10K

17 PDS4 Model

18 PDS4 Products

19 PDS4 Data Formats 19 Base Extensions/ Restrictions

20 PDS4 Observational Product Identification_Area Cross_Reference_Area Observation_Area File_Area Digital_Object Subject_Area Bibliographic_Reference Mission_Area Node_Area Observing_System Reference_Entry [0..1] [1] [1..*] [0..*] [0..*] [1..*] [0..*] [1] Data_Standards [1]

21 Data Standards Development Process
Domain expertise was captured in the PDS4 Information Model as an ontology. The model represents a consensus of the domain experts. The model is the single source for the PDS4 Data Standards, for example the generated XML Schemas.
[Diagram labels: Domain Knowledge; Information Modeling Tool; PDS4 Information Model; Filter and Translator; XML Schema (Generic)]
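The "single source" idea on this slide, where schemas are generated from the model rather than written by hand, can be sketched in a few lines of Python. The `MODEL` dictionary, the class name `Example_Class`, and the `generate_schema` function are all hypothetical stand-ins; the actual PDS4 filter and translator are not described in this presentation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, heavily simplified model: class name -> list of
# (attribute name, XSD type). This is an assumption for illustration.
MODEL = {
    "Example_Class": [("name", "xs:string"), ("records", "xs:integer")],
}


def generate_schema(model: dict[str, list[tuple[str, str]]]) -> str:
    """Translate the in-memory model into an XML Schema document."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    for class_name, attributes in model.items():
        complex_type = ET.SubElement(schema, f"{{{XS}}}complexType", name=class_name)
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for attr_name, xsd_type in attributes:
            ET.SubElement(sequence, f"{{{XS}}}element", name=attr_name, type=xsd_type)
    return ET.tostring(schema, encoding="unicode")


schema_text = generate_schema(MODEL)
```

Because the schema text is derived from `MODEL`, a change to the model propagates to every generated artifact the next time the generator runs, which is the consistency property the slide claims for the real toolchain.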

22 Topics: Overview; Key Requirements and Drivers; Build 2b Deliverables; Build 2b Deployment; Issues; Next Steps

23 Build 2b Deployment
- Resolve Build 2a liens (to be discussed) and generate a Build 2b deployment
- Generate a release of the information model, companion documents, and supporting tutorial material
- Generate new schemas
- Generate registry configuration information
- Post key documents to the PDS website

24 Topics: Overview; Key Requirements and Drivers; Build 2b Deliverables; Build 2b Deployment; Issues; Next Steps

25 Chart of Review Comments Total: 1173

26 Total: 1935

27 Build 2a Identified Liens
LIEN: Need to finalize and freeze the information model for Build 2b incorporating high priority changes identified in Build 2a.
BRIEF EXPLANATION: Address issues found with the information model, focusing primarily on the core components of the product labels and the aggregate products, collections and bundles.
LIEN: Need capabilities to support local data dictionary validation and the creation of schema and human-readable definition lists.
BRIEF EXPLANATION: There is a lack of instructions for creating, validating, and using local keywords and classes (this includes lack of support for generating human-readable definition lists for peer review).

28 Build 2a Identified Liens
LIEN: Need to baseline the current documentation; need to provide additional information/changes.
BRIEF EXPLANATION: Documents are still overlapping, not up to date, inconsistent in areas, and have gaps.
LIEN: Need to finalize and freeze the XML Schema for Build 2b incorporating the extension schemas currently under testing by the DDWG.
BRIEF EXPLANATION: Newer “extension” style schemas are not yet mature enough to be used by an external data provider. They seem to be preferred over the older but stable “flat” schemas that were available for the node exercises. Both are currently produced and produce similar labels.

29 Topics: Overview; Key Requirements and Drivers; Build 2b Deliverables; Build 2b Deployment; Issues; Next Steps

30 Build 2b Actions – Jan ‘12
- Finalize and freeze the information model for Build 2b incorporating high priority changes identified in Build 2a.
- Use existing capabilities to support local data dictionary validation and the creation of schema and human-readable definition lists.
- Baseline the current documentation.
- Add any additional information/changes to an online resource (e.g., wiki).
- Finalize and freeze the XML Schema for Build 2b incorporating the extension schemas currently under testing by the DDWG.

31 Conclusion The PDS4 Information Model represents the DDWG consensus. A large number of decisions resulting from much discussion were captured in the model. All had a say; not everyone always got their way. On the scheduled date the model will be frozen and the PDS4 Data Standards will be generated and deployed. The schemas, the dictionary, and all other generated artifacts will be consistent with the model. The current consensus, as reflected in the model, will be operational.

32 Acknowledgements*
Ed Bell, Richard Chen, Dan Crichton, Amy Culver, Patty Garcia, Ed Grayzeck, Ed Guinness, Mitch Gordon, Sean Hardman, Lyle Huber, Steve Hughes, Chris Isbell, Steve Joy, Ronald Joyner, Debra Kazden, Todd King, Joe Mafi, Mike Martin, Thomas Morgan, Lynn Neakrase, Paul Ramirez, Anne Raugh, Mark Rose, Elizabeth Rye, Boris Semenov, Dick Simpson, Susie Slavney, Peter Allan, David Heather, Michel Gangloff, Santa Martinez, Thomas Roatsch, Alain Sarkissian
* Anyone who sat through a DDWG 2-hour telecon or provided useful input.

33 Thank You Questions and Answers 33

34 Backup 34

35 Too Many {objects, classes, schemas, …} Abstract (vacuous) classes are used for organizational purposes. These are not included in the schemas and many are being deleted. Subclasses of the four fundamental structures are used to partition the set of allowed structures, for example the Array_2D_Image subclass of Array_Base. Question to be answered: does the PDS want to provide software specific to Array_2D_Image? All Array_Base software works for any Array_2D_Image.
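The claim that all Array_Base software works for any Array_2D_Image is ordinary subclass substitution. The Python sketch below illustrates it with stand-in classes; `ArrayBase`, `Array2DImage`, `element`, and `total` are names invented for this example, and the row-major element order is an assumption, not a statement about the PDS4 storage rules.

```python
class ArrayBase:
    """Stand-in for Array_Base: a homogeneous N-dimensional array of scalars."""

    MAX_AXES = 16  # the policy quoted earlier in the slides allows N <= 16

    def __init__(self, shape, values):
        if not 1 <= len(shape) <= self.MAX_AXES:
            raise ValueError("number of axes must be between 1 and 16")
        expected = 1
        for axis_length in shape:
            expected *= axis_length
        if len(values) != expected:
            raise ValueError("value count does not match the shape")
        self.shape = tuple(shape)
        self.values = list(values)

    def element(self, *index):
        """Look up one element, assuming row-major order (last axis fastest)."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, axis_length in zip(index, self.shape):
            if not 0 <= i < axis_length:
                raise IndexError("index out of range")
            offset = offset * axis_length + i
        return self.values[offset]


class Array2DImage(ArrayBase):
    """Stand-in for Array_2D_Image: an ArrayBase restricted to two axes."""

    def __init__(self, lines, samples, values):
        super().__init__((lines, samples), values)


def total(array: ArrayBase):
    """Generic software written once against the base class."""
    return sum(array.values)


image = Array2DImage(2, 3, [1, 2, 3, 4, 5, 6])
image_total = total(image)         # base-class software accepts the subclass
last_pixel = image.element(1, 2)   # inherited lookup, no image-specific code
```

The subclass adds only a restriction (exactly two axes), so nothing written for `ArrayBase` needs to change; image-specific software would be an optional addition rather than a requirement.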

36 Too Many {objects, classes, schemas, …} Subclasses of a product component are used to provide specificity, for example, the subclass Bundle_Member_Entry. There are three methods: change the name, change the namespace (new file), or use optional attributes. Some specific subclasses are used for special purposes, for example Table_Field_Checksum in an Inventory. Consider using Schematron Assert statements to validate.
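A Schematron assert pairs a context (which elements to visit) with a test that must hold and a message to report when it does not. The Python sketch below mimics that shape with the standard library only; it is an analogy, not Schematron itself, and the label fragment, element names, and rules are hypothetical rather than taken from the PDS4 schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical label fragment; element names are illustrative only.
LABEL = """
<Inventory>
  <Field><name>checksum</name><length>32</length></Field>
  <Field><name>member_status</name><length>1</length></Field>
</Inventory>
""".strip()


def has_name(field):
    return bool(field.findtext("name"))


def has_positive_length(field):
    text = field.findtext("length") or ""
    return text.isdigit() and int(text) > 0


# Each rule: (context element name, test function, failure message).
RULES = [
    ("Field", has_name, "Field must have a name"),
    ("Field", has_positive_length, "Field length must be a positive integer"),
]


def check(root, rules):
    """Return the message of every rule that fails on some context element."""
    failures = []
    for context, test, message in rules:
        for element in root.iter(context):
            if not test(element):
                failures.append(message)
    return failures


failures = check(ET.fromstring(LABEL), RULES)
```

Rules of this kind constrain content without adding a new class for each special case, which is the trade-off the slide raises against subclasses such as Table_Field_Checksum.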

37 Too Many {objects, classes, schemas, …} Some classes result from the process of normalization, for example array_axis and array_element.
Emperor Joseph II: …And there are simply too many notes, that's all. Just cut a few and it will be perfect.
Mozart: Which few did you have in mind, Majesty?

38 Action Item Flowchart

39 By the numbers
Fundamental Data Structures – 4
Lines of Schema Code: Flat 18K, Master 4k-6k
Classes dropped (Master) – nn
SimpleTypes dropped (Master) – 200
Actionable items closed – 1.5K
Actionable items open – < 50
Issues from reviews – 1k+

40 Totals
                Internal  IPDA  External  Readiness  Total
Narrative              1    14        18         15     48
Documentation        143   152       250         87    632
Actionable            11     5        16         31     63
Discussion            13    76        42         43    174
Research               8     5        33         44     90
Kudo                  34    24        29          1     88
System/Tools           4     6         3         22     35
Discipline             1     4        14         10     29
Process                0    13         0          1     14
Total                215   299       405        254   1173

41 Post Build 2b – Summer ‘12
- Develop discipline level classes for the next phase of data set migration.
- Refine the document suite and its organization.
- Support development of tools scheduled for the next build.
- Support development of data dictionary and local data dictionary services.

42 Capability Matrix 42

43 Capability Matrix 43

44 Capability Matrix 44

45 Capability Matrix 45

