1 Superset Me—Not: Why the JPTS Is Sufficient if You Use Appropriate Layer Validation Alexander (“Sasha”) Schwarzman American Geophysical Union (AGU) JATS-Con November 2, 2010

2 Summary
We built a superset of the NLM Journal Publishing Tag Set (JPTS) in order to enforce business rules, data types, and house style. Having done that, we realized that a JPTS subset could have been sufficient to meet AGU's needs if it were used in conjunction with an appropriate layer validation technology, such as Schematron.

3 Contents
Why we built a JPTS superset
DTD vs. Schematron
– Attribute values
– Number of element occurrences
– Element position & sequence
– References
Lessons learned

4 Why we built a JPTS superset
No generic book model
Lack of familiarity with Schematron
Lack of mature tool support (running SVRL not a viable option in Production environment)
Lack of expertise on integrating Schematron with validation against relational DB
JATS v2.3: no Compound Keywords, not all content models parameterized

5 DTD vs. Schematron: Attribute values
Requirement: Article type is required and can be one of three types: a regular article (rga), a correction (cor), or an editorial (edt)
[Slide compares the strict DTD and JPTS declarations]

6 DTD vs. Schematron: Attribute values (cont’d)
XML instance (contains non-allowed article type)
Schematron assertion: '[attribute value]' not allowed, must be 'rga', 'cor', or 'edt'
Schematron message: 'xxx' not allowed, must be 'rga', 'cor', or 'edt'
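
The slide's Schematron source did not survive in this transcript, so the following is only a minimal Python sketch of the same kind of check, not the author's rule. It assumes the article type lives in an `article-type` attribute on the root element (as in JPTS); the function name `check_article_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Allowed codes are taken from the requirement on the slide.
ALLOWED_ARTICLE_TYPES = ("rga", "cor", "edt")


def check_article_type(xml_text):
    """Return a list of error messages; an empty list means the check passed."""
    root = ET.fromstring(xml_text)
    # Assumption: the article type is the 'article-type' attribute of the root.
    value = root.get("article-type")
    if value is None:
        return ["article type is required"]
    if value not in ALLOWED_ARTICLE_TYPES:
        return [f"'{value}' not allowed, must be 'rga', 'cor', or 'edt'"]
    return []


print(check_article_type('<article article-type="xxx"/>'))
```

A loose DTD can declare the attribute as plain character data while a rule layer like this one carries the enumeration, which is the trade-off the slide illustrates.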

7 DTD vs. Schematron: Number of element occurrences
Requirement: Acknowledgments, if present, must contain exactly one paragraph, except for two journals (journal codes ‘ja’ and ‘rg’) where Acknowledgments must contain two paragraphs
[Slide compares the strict DTD and JPTS declarations]

8 DTD vs. Schematron: Number of occurrences (cont’d)
XML instance (wrong number of paragraphs): journal code 'jb'; Acknowledgments contains two paragraphs ('Blah' and 'Blah-blah')

9 DTD vs. Schematron: Number of occurrences (cont’d)
Schematron assertions:
'[element]' in '[journal code]' must contain exactly two paragraphs
'[element]' in '[journal code]' must contain only one paragraph

10 DTD vs. Schematron: Number of occurrences (cont’d)
Schematron message: 'ack' in 'jb' must contain only one paragraph
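
As an illustration of a count rule that depends on data elsewhere in the document, here is a hedged Python sketch (the original Schematron is not preserved in the transcript). It assumes the journal code is the text of a `journal-id` element and that Acknowledgments are an `ack` element with `p` children; both are assumptions about the markup, and `check_ack` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Journal codes whose Acknowledgments need two paragraphs (from the slide).
TWO_PARAGRAPH_JOURNALS = {"ja", "rg"}


def check_ack(xml_text):
    """Check the paragraph count of every <ack> against the journal code."""
    root = ET.fromstring(xml_text)
    # Assumption: the journal code is stored in the first <journal-id>.
    journal = root.findtext(".//journal-id", default="").strip()
    errors = []
    for ack in root.iter("ack"):
        count = len(ack.findall("p"))  # direct <p> children only
        if journal in TWO_PARAGRAPH_JOURNALS:
            if count != 2:
                errors.append(
                    f"'ack' in '{journal}' must contain exactly two paragraphs")
        elif count != 1:
            errors.append(
                f"'ack' in '{journal}' must contain only one paragraph")
    return errors


SAMPLE = """<article>
  <front><journal-meta><journal-id>jb</journal-id></journal-meta></front>
  <back><ack><p>Blah</p><p>Blah-blah</p></ack></back>
</article>"""

print(check_ack(SAMPLE))
```

A DTD content model cannot express "one paragraph unless the journal is ja or rg" because the count depends on a value far from the element being constrained; a rule layer can.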

11 DTD vs. Schematron: Element position & sequence
Requirement: If a journal has subject grouping (ToC category, subset) and the article belongs to a special collection (special section, theme), then the subject grouping information must precede the special collection information
[Slide compares the strict DTD and JPTS declarations]

12 DTD vs. Schematron: Element position & sequence (cont’d)
XML instance (wrong sequence of subject groups): 'New Methods and Applications of Earthquake Early Warning' (special collection) precedes 'Solid Earth' (subject grouping)

13 DTD vs. Schematron: Element position & sequence (cont’d)
Schematron assertion: '[…]' must appear after a ToC Category or a Subset when either is present
Schematron message: […] must appear after a ToC Category or a Subset when either is present
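
The ordering requirement can be sketched in Python as follows. This is an illustration only: the original Schematron is missing from the transcript, and the `subj-group-type` values used here (`toc-category`, `subset`, `special-collection`) are hypothetical placeholders, not AGU's actual values.

```python
import xml.etree.ElementTree as ET

# Hypothetical type values, chosen only for this illustration.
GROUPING_TYPES = {"toc-category", "subset"}
COLLECTION_TYPE = "special-collection"

MESSAGE = ("special collection must appear after a ToC Category "
           "or a Subset when either is present")


def check_subject_order(xml_text):
    """Flag a special collection that precedes any subject grouping."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order.
    types = [g.get("subj-group-type") for g in root.iter("subj-group")]
    grouping_positions = [i for i, t in enumerate(types) if t in GROUPING_TYPES]
    errors = []
    for i, t in enumerate(types):
        if t == COLLECTION_TYPE and any(p > i for p in grouping_positions):
            errors.append(MESSAGE)
    return errors


WRONG_ORDER = """<article-categories>
  <subj-group subj-group-type="special-collection">
    <subject>New Methods and Applications of Earthquake Early Warning</subject>
  </subj-group>
  <subj-group subj-group-type="toc-category">
    <subject>Solid Earth</subject>
  </subj-group>
</article-categories>"""

print(check_subject_order(WRONG_ORDER))
```

The rule only fires when both kinds of group are present, matching the "when either is present" wording of the message.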

14 DTD vs. Schematron: References
Validating references is a challenge: variety vs. the need to enforce editorial style
Strict DTD:
– Fixed element order, no mixed content
– Punctuation, spacing, face markup generated on output
JPTS:
– Lots of elements, any order, mixed content
– Punctuation, spacing, face markup included

15 DTD vs. Schematron: References (cont’d)
Strict DTD

16 DTD vs. Schematron: References (cont’d)
JPTS

17 DTD vs. Schematron: References (cont’d)
Example: Mood, A. M., and F. A. Graybill (1963), Introduction to the Theory of Statistics, 2nd ed., 295 pp., McGraw-Hill, New York.

18 DTD vs. Schematron: References (cont’d)
XML instance (strict DTD): Mood A. M. Graybill F. A Introduction to the Theory of Statistics 2nd 295 pp McGraw-Hill New York

19 DTD vs. Schematron: References (cont’d)
XML instance (JPTS): Mood, A. M., and F. A. Graybill (1963), Introduction to the Theory of Statistics, 2nd ed., 295 pp., McGraw-Hill, New York.

20 DTD vs. Schematron: References (cont’d)
Schematron can check that all required elements are present and are in the correct sequence (note the required elements and that edition, if present, follows source):

21 DTD vs. Schematron: References (cont’d)
Schematron can check that all required elements are present (message: required element [name] missing) and that the elements are in the correct sequence:

22 DTD vs. Schematron: References (cont’d)
XML instance (JPTS) (edition is in the wrong place): Mood, A. M., and F. A. Graybill (1963), 2nd ed., Introduction to the Theory …, 295 pp., McGraw-Hill, New York.

23 DTD vs. Schematron: References (cont’d)
This Schematron uses the positional predicate [1] to check that year is immediately followed by source:
Schematron assertion: '[element]' must be followed by 'source', not by '[following element]'
Schematron message: 'year' must be immediately followed by 'source', not by 'edition'
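
In XPath, a positional predicate such as `following-sibling::*[1]` selects the nearest following sibling element. The Python sketch below mimics that "next sibling" test; it is an illustration under assumptions, not the slide's rule. The wrapper element name `citation` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_year_followed_by_source(citation_xml):
    """Require that each <year> is immediately followed by <source>."""
    citation = ET.fromstring(citation_xml)
    children = list(citation)  # child elements only; text is ignored
    errors = []
    for i, child in enumerate(children):
        if child.tag != "year":
            continue
        if i + 1 == len(children):
            errors.append("'year' must be immediately followed by 'source'")
        elif children[i + 1].tag != "source":
            errors.append(
                "'year' must be immediately followed by 'source', "
                f"not by '{children[i + 1].tag}'")
    return errors


# Mixed content: punctuation sits between the elements, as in JPTS.
WRONG = ("<citation><string-name>Mood, A. M.</string-name> "
         "(<year>1963</year>), <edition>2nd</edition> ed., "
         "<source>Introduction to the Theory …</source>.</citation>")

print(check_year_followed_by_source(WRONG))
```

Because only child elements are inspected, the punctuation and spacing carried in the mixed content do not interfere with the sequence test.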

24 DTD vs. Schematron: References (cont’d)
But how to check the sequence of required elements when there might be optional elements interspersed between them?
This Schematron checks that required publisher-name is preceded by required source, regardless of any optional elements that may occur in between:
Schematron assertion: '[element]' must be preceded by 'source'
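
The "preceded by, with anything in between" test corresponds to looking at all preceding siblings rather than only the nearest one. A hedged Python sketch of that idea follows; the wrapper element name `citation` and the function name are hypothetical, and this is not the slide's original rule.

```python
import xml.etree.ElementTree as ET


def check_publisher_preceded_by_source(citation_xml):
    """Require a <source> somewhere before each <publisher-name>."""
    citation = ET.fromstring(citation_xml)
    seen_source = False
    errors = []
    for child in citation:  # child elements in document order
        if child.tag == "source":
            seen_source = True
        elif child.tag == "publisher-name" and not seen_source:
            errors.append("'publisher-name' must be preceded by 'source'")
    return errors


# Optional elements (edition, size) sit between source and publisher-name.
GOOD = ("<citation><source>Title</source>, <edition>2nd</edition> ed., "
        "<size>295 pp.</size>, <publisher-name>McGraw-Hill</publisher-name>"
        "</citation>")
BAD = ("<citation><publisher-name>McGraw-Hill</publisher-name>, "
       "<source>Title</source></citation>")

print(check_publisher_preceded_by_source(GOOD))
print(check_publisher_preceded_by_source(BAD))
```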

25 DTD vs. Schematron: References (cont’d)
Rick Jelliffe’s approach combines the flexibility of JPTS with the benefits of a DTD-like fixed element order:
– Each citation is rewritten as a string of its child element names
– Content model represented as a regular expression
– Schematron checks the string of names against the regex
– Schematron generates an error message if content does not match the model

26 DTD vs. Schematron: References (cont’d)
An XML file, e.g., citation-models.xml, specifies structured citation models:
... ((string-name | person-group), year, source, edition, (string-name | person-group)?, size?, elocation-id?, publisher-name, publisher-loc) ...
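
The name-string technique can be sketched in Python as a stand-in for the XSLT 2.0 and Schematron implementation the slides describe. The content model below is copied from the slide; the translation function `model_to_regex`, the wrapper element name `citation`, and the space-terminated name encoding are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model as given on the slide.
BOOK_MODEL = ("((string-name | person-group), year, source, edition, "
              "(string-name | person-group)?, size?, elocation-id?, "
              "publisher-name, publisher-loc)")


def model_to_regex(model):
    """Translate a DTD-style content model into a regex over element names."""
    compact = re.sub(r"\s+", "", model)
    # Each name becomes a group that matches the name plus a trailing space.
    with_names = re.sub(r"[A-Za-z][\w.-]*",
                        lambda m: "(?:" + re.escape(m.group(0)) + " )",
                        compact)
    # Commas mean "followed by", which is plain concatenation in a regex.
    return re.compile(with_names.replace(",", ""))


def matches_model(citation_xml, model=BOOK_MODEL):
    """True if the citation's child element names fit the content model."""
    citation = ET.fromstring(citation_xml)
    names = "".join(child.tag + " " for child in citation)
    return model_to_regex(model).fullmatch(names) is not None


GOOD = ("<citation><person-group>Mood, A. M., and F. A. Graybill</person-group> "
        "(<year>1963</year>), <source>Title</source>, <edition>2nd</edition> ed., "
        "<size>295 pp.</size>, <publisher-name>McGraw-Hill</publisher-name>, "
        "<publisher-loc>New York</publisher-loc>.</citation>")
BAD = GOOD.replace("<source>Title</source>, <edition>2nd</edition> ed.",
                   "<edition>2nd</edition> ed., <source>Title</source>")

print(matches_model(GOOD), matches_model(BAD))
```

Terminating every name with a space keeps one name from matching as the prefix of another, and the mixed-content text between elements is ignored because only the child element names are matched.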

27 DTD vs. Schematron: References (cont’d)
Advantages:
– The document is still DTD-valid
– Mixed content is permitted
– Type-sensitive handling of references is possible
Caveat: requires XSLT 2.0

28 Lessons learned
AGU Tag Set + Schematron (200+ checks)
– Ensures data quality
– Ensures markup integrity
– Provides control over production processes
AGU Tag Set is a superset of JPTS
– Based on JPTS
– Uses the same modularization principles
– Can be easily mapped to JPTS
Were we to do this again, we would develop a JPTS subset and a Schematron schema

29 Lessons learned (cont’d)
Appropriate layer validation
– Even the most “Prussian” DTD can’t enforce all business rules, data types, and house style
– Rules-based checking needed anyway
– May as well use “Californian” JPTS (de facto industry standard) adopted by publishers, conversion & composition vendors, archives, etc.
Paradigm shift: the crux of validation shifts from XML parser to Schematron engine

30 Lessons learned (cont’d)
This shift is not without costs:
– Content may be valid to JPTS but make no sense
– Dependency on Schematron for semantic integrity
– Constraints on business partners: must be Schematron-capable and have tools
– Schematron does not “fix” problems; people do. Processes and procedures must be well-defined

31 Lessons learned (cont’d)
Writing a simple Schematron is easy; building a complex and efficient one is not:
– Elicit, document, convey, and clarify the requirements
– Ensure Schematron fits into your workflow
– Modularize Schematron
– Ensure that individual Schematron rules aren’t in conflict
– Optimize Schematron performance
– Employ XSLT 2.0
– Test, test, test
– Cultivate Schematron & XSLT 2.0 expertise in-house

32 Conclusion
What about content that is not like a journal article, e.g., generic (non-NCBI) books and their parts/chapters?
When this deficiency is addressed, the NLM Archiving and Interchange Tag Suite could truly say: “Superset Me—Not!”

