OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer January 20, 2014 9:00 – 10:00 AM PST.

1 OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer January 20, 2014 9:00 – 10:00 AM PST

2 Agenda
Time        Topic                                               Presenter
9:00-9:05   Call to Order & Roll Call                           Zack Schmidt
9:05-9:10   Approval of Minutes                                 All
            https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf
            TC Process and Administration (deferred)            Chet Ensign
9:10-9:20   Outreach Subcommittee - All                         Jennifer Alpert
9:20-9:50   Tech presentation – Content Classification Layer    Z. Schmidt/Aliaa
9:50-9:55   New Business                                        All
9:55-10:00  Next meeting agenda / Date                          Z. Schmidt

3 Roll Call
Name                      Company            Voting Status  Present?
Jennifer Alpert Palchak   CareLex            Voter          y
Aliaa Badr                CareLex            Voter          y
Oleksiy (Alex) Palinkash  CareLex            Voter          y
Troy Jacobson             Forte Research     Voter          y
Lou Chappuie              Individual         Voter          y
Lisa Mulcahy              Individual         Non-Voter      y
Robert Gehrke             Mayo Clinic        Voter          n
Rich Lustig               Oracle             Non-Voter      y
Michael Agard             Paragon Solutions  Non-Voter      y
Christopher McSpiritt     Paragon Solutions  Non-Voter      y
Jamie O’Keefe             Paragon Solutions  Non-Voter      n
Fran Ross                 Paragon Solutions  Non-Voter      y
Peter Alterman            SAFE-BioPharma     Voter          y
Catherine Schmidt         SterlingBio        Voter          y
Zack Schmidt              SureClinical       Voter          y
Trish Whetzel, PhD        SureClinical       Non-Voter      y
Peter Junge               Beijing Sursen     Observer       n
Laura Hilty               Forte Research     Observer       n
Tony O’Hare               Forte Research     Observer       n
Eldin Rammell             Rammell Consulting Observer       n
Robin Cover               OASIS staff        Non-Voter      n
Chet Ensign               OASIS staff        Non-Voter      n

4 Meeting Etiquette
– Announce your name prior to making comments or suggestions
– Keep your phone on mute when not speaking (#6)
– Do not put your phone on hold
  – Hang up and dial in again when finished with your other call
  – Hold = Elevator Music = very frustrated speakers and participants
– Meetings will be recorded and posted
  – Another reason to keep your phone on mute when not speaking!
– Use the join.me “Chat” feature for questions / comments / votes
– We will follow Robert’s Rules of Order
NOTE: This meeting is being recorded and minutes will be posted on the TC page after the meeting.
From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute

5 Outreach Subcommittee
Status
– New Members:
  – Oracle – Joined
  – In Progress: EMC, Kaiser Permanente, Shire, Medtronics
Activities / Milestones

6 Tech Discussion
Status
Timeline
– In parallel with other Tech work from charter

7 Content Classification System Discussion
Classification System Components:
– Classification Categories
  – Taxonomy, hierarchy
– Metadata (‘Tags’)
  – Characterizes content
– Content Model
  – Published set of classifications, metadata for a domain (e.g., eTMF)

8 Classification Categories Component
– Hierarchy of categories: categories, subcategories, content types
– Defined relationships with rules: Parent-Child
– All categories and content types are required to have unique names and machine codes
– Each content type is associated with Metadata Properties (includes core and domain-specific)
– Content items are linked to content types
– Unique classification and term codes based on Universal Decimal Classification System (UDC) numbering, widely used in libraries worldwide. Human and machine readable; infinitely expandable
– Can be described, edited and validated using an OWL editor (such as the open source editor Protégé)
– Supports any simple text vocabulary, including TMF Ref Model and other vocabularies
– W3C OWL2 and RDF/XML supported
Figure: Study Digital Content Classification Categories Hierarchy
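The parent-child rules and the uniqueness requirement on this slide can be illustrated with a small in-memory registry. This is only a sketch: the class names `Node` and `Classification`, the dotted code strings, and the sample category names are assumptions made for illustration and do not come from the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """A category, subcategory, or content type (hypothetical structure)."""
    code: str                           # unique machine code, e.g. "100.10"
    name: str                           # unique human-readable name
    parent: Optional["Node"] = None     # parent-child relationship
    children: List["Node"] = field(default_factory=list)


class Classification:
    """Registry that rejects duplicate names and duplicate codes."""

    def __init__(self) -> None:
        self.by_code: Dict[str, Node] = {}
        self.by_name: Dict[str, Node] = {}

    def add(self, code: str, name: str, parent_code: Optional[str] = None) -> Node:
        if code in self.by_code:
            raise ValueError(f"duplicate code: {code}")
        if name in self.by_name:
            raise ValueError(f"duplicate name: {name}")
        parent = self.by_code[parent_code] if parent_code is not None else None
        node = Node(code, name, parent)
        if parent is not None:
            parent.children.append(node)
        self.by_code[code] = node
        self.by_name[name] = node
        return node

    def path(self, code: str) -> List[str]:
        """Names from the top-level category down to the given node."""
        names: List[str] = []
        node = self.by_code[code]
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


tree = Classification()
tree.add("100", "Example Category")
tree.add("100.10", "Example Subcategory", parent_code="100")
tree.add("100.10.10", "Example Content Type", parent_code="100.10")
print(" > ".join(tree.path("100.10.10")))
```

Keeping two indexes (by code and by name) makes both uniqueness checks a single dictionary lookup.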

9 Metadata Component
– Used to tag or index digital content items
Metadata Classes:
– Core: comprised of four areas: File Properties, Classification, Audit Trail, Business Process
– Domain-specific: metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI
– Org-specific: metadata that meets an organization's needs; not standards-based
– General: obtained from public standards-based vocabulary terminology resources like Dublin Core
Annotation Properties
– Metadata about classification categories and metadata: Core, Org-Specific metadata
Figure: Core Metadata Example – File Properties

10 Content Model Component
– Contains the classification hierarchy and metadata in machine-readable format
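As a rough idea of what a machine-readable content model could look like, the sketch below emits a tiny RDF/XML document with OWL classes using only the Python standard library. The `http://example.org/etmf#` namespace, the `build_model` helper, and the class names are hypothetical; the actual content model published by the TC may be structured differently.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"  # hypothetical namespace for illustration

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def build_model(classes):
    """Serialize (local_name, label, parent_local_name or None) tuples as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for local_name, label, parent in classes:
        cls = ET.SubElement(root, f"{{{OWL}}}Class",
                            {f"{{{RDF}}}about": BASE + local_name})
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
        if parent is not None:
            # Parent-child relationship expressed as rdfs:subClassOf.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


print(build_model([
    ("Cat100", "Example Category", None),
    ("Cat100_10", "Example Subcategory", "Cat100"),
]))
```

A file of this shape can be opened in an OWL editor such as Protégé for inspection.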

11 Classification System – Term Sources
Term Sourcing Concepts: terms adopted by standards bodies should be used first in the eTMF model
Primary Term Sources for eTMF Classification System:
– Internet Standards Dev Orgs: W3C, IETF, ISO, etc.
  » Required for interoperability of machine code
– NIH NCIthesaurus: term database for FDA, CDISC, HL7, other orgs
  » Required for interoperability of clinical / health sciences data
Secondary Term Sources for eTMF Classification System:
– Industry sources: widely used terms in enterprise content mgmt software, TMF RM
*Spec, Table 6, p21

12 Classification Categories Component
Classification Categories Hierarchy and Numbering [1]:
– Classification hierarchy and numbering is based on the UDC library numbering standard and XML naming
– Digital dot notation: designed for human and machine readability
– Each number is also a unique code for naming and ordering in the hierarchy
– Primary Categories (PC): three digit. eTMF: 100-200
– Subcategories (SC): two digit: 10-99
– Content Types (CT): two digit: 10-99
– Maximum number of Sub-Category divisions is 5, excluding the 3 digits for the Primary Category
[1] Per spec section 2.1.1; 6.0
Hierarchy Numbering/Naming Considerations:
– Flexible, standards-based approach (W3C XML compliant naming*)
– Ability to add multiple hierarchy divisions / levels
– Proposed: 5 divisions = $100 \times 90^{5} \approx 5.9 \times 10^{11}$ Content Types
– Uniqueness of numbers: usable as machine code identifiers
– Machine readable, human readable
– No sorting issues, no need for leading zeros*, no special chars
*Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
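The capacity figure on this slide follows from 100 primary categories and 90 possible values (10-99) in each of 5 two-digit divisions. The sketch below reproduces that arithmetic and adds a simple format check. The dotted layout such as `100.10.10` and the regular expression are assumptions based on the "digital dot notation" bullet, not a format quoted from the spec.

```python
import re

PRIMARY_COUNT = 100        # factor used on the slide for primary categories
VALUES_PER_DIVISION = 90   # two-digit values 10-99
MAX_DIVISIONS = 5          # two-digit divisions after the primary category


def capacity(divisions: int = MAX_DIVISIONS) -> int:
    """Number of distinct codes with the given number of two-digit divisions."""
    return PRIMARY_COUNT * VALUES_PER_DIVISION ** divisions


# Assumed layout: three-digit primary category, then up to five ".NN" groups,
# each in the range 10-99 (so no group ever needs a leading zero).
CODE_PATTERN = re.compile(r"[1-9]\d{2}(?:\.[1-9]\d){0,5}")


def is_valid_code(code: str) -> bool:
    return CODE_PATTERN.fullmatch(code) is not None


print(f"{capacity():.2e}")          # total codes with 5 divisions
print(is_valid_code("100.10.99"))   # well-formed under the assumed layout
print(is_valid_code("100.09"))      # rejected: group outside 10-99
```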

13 Classification Categories Component
Numbering and Naming Scheme
Numbering
– Primary Categories and Sub-Categories: Category Code number
– Content Type: Content Type ID
Naming Primary Categories and Sub-Categories
– Simple text-based names
– Unique name, 64 char limit
– Abbreviation: 16 char limit suggested
– Compatible with W3C XML naming standards: no special characters: ( ) ? / % # @ !
Example: Classification Categories Hierarchy, Naming, Numbering
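The naming limits above lend themselves to an automated check. The following is a minimal sketch: the function name `name_problems` is hypothetical, and the forbidden set contains only the characters listed on this slide, which may not be the complete list in the spec.

```python
from typing import List

MAX_NAME_LENGTH = 64            # limit stated on the slide
MAX_ABBREVIATION_LENGTH = 16    # suggested limit stated on the slide
FORBIDDEN_CHARS = set("()?/%#@!")  # characters listed on the slide only


def name_problems(name: str, abbreviation: str = "") -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems: List[str] = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    if len(abbreviation) > MAX_ABBREVIATION_LENGTH:
        problems.append(f"abbreviation exceeds {MAX_ABBREVIATION_LENGTH} characters")
    return problems


print(name_problems("Site Management", "SiteMgmt"))
print(name_problems("Budget (Draft)"))
```

Uniqueness of names is not checked here because it depends on the full set of existing categories.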

14 Classification Categories Component
Modifying Classification Category Entities – General Editing Rules
Domain Specific
– Classifications cannot be deleted, only reserved/unreserved
– Modifications allowed to some annotation properties (see spec)
– Codes (Category Codes, CT Type ID) cannot be generated
Organization Specific
– Classifications can be deleted
– Modifications allowed for classification metadata, annotations
– Codes (Category Codes, CT Type ID) can be generated
Classification Category, Content Type Editing Rules*
Type                   Import Terms  Generate Code  Add/Modify  Delete/Reserve
Domain Specific        Yes           No             No/Yes**    Reserve/Unreserve
Organization Specific  Yes           Yes            Yes/Yes     Delete
*Spec, Table 6, p21
**Annotation metadata
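One way to make the editing rules enforceable in a tool is to encode the table as a lookup. This is a sketch under assumptions: the scope and action names are invented for illustration, "modify" for domain classifications is limited to annotation metadata as footnoted on the slide, and treating "reserve" as unavailable for organization-specific entries is an interpretation of the table rather than a statement from the spec.

```python
from typing import Dict

# Hypothetical encoding of the editing-rules table on this slide.
RULES: Dict[str, Dict[str, bool]] = {
    "domain": {
        "import_terms": True,
        "generate_code": False,
        "add": False,
        "modify": True,     # annotation metadata only, per the slide footnote
        "delete": False,
        "reserve": True,    # reserve/unreserve instead of delete
    },
    "organization": {
        "import_terms": True,
        "generate_code": True,
        "add": True,
        "modify": True,
        "delete": True,
        "reserve": False,   # interpretation: the table lists only Delete here
    },
}


def is_allowed(scope: str, action: str) -> bool:
    """Look up whether an action is permitted for a classification scope."""
    try:
        return RULES[scope][action]
    except KeyError:
        raise ValueError(f"unknown scope or action: {scope!r}, {action!r}") from None


print(is_allowed("domain", "delete"))
print(is_allowed("organization", "generate_code"))
```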

15 Classification Editing Tool – Free, Open Source
Protégé (from Stanford University: http://protege.stanford.edu/)
Protégé Editor:
– Edit Classification Taxonomy and Metadata Terms
– Validate Taxonomy and Term name compliance
– Create valid RDF/XML Ontology
*Spec, Table 6, p21

16 Classification Categories - Summary
Proposed Classification System has the following properties:
– Based on Naming and Numbering that is W3C XML compliant
  – No special characters: ( ) & # @ / … etc.
  – No leading zeros in classification numbers
– Based on Universal Decimal Classification (UDC) system for content classification:
  – 100-199: eTMF Domain
  – UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable
    http://en.wikipedia.org/wiki/Universal_Decimal_Classification
– Flexible and customizable for organizations, yet interoperable
  – Domain classifications: standardized; Organization-specific classifications: editable
– Defined set of rules for editing, modifying taxonomy
– Any organization can modify/edit the taxonomy using open source editors like Protégé
*Spec, Table 6, p21
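The "sortable" and "no leading zeros" properties can be checked with a short experiment. Assuming dotted codes with a three-digit primary category and two-digit divisions in the range 10-99 (the dotted strings here are illustrative, not taken from the spec), a plain string sort gives the same order as a numeric sort, because every component has a fixed width.

```python
from typing import List, Tuple


def code_key(code: str) -> Tuple[int, ...]:
    """Numeric sort key: '100.10.10' -> (100, 10, 10)."""
    return tuple(int(part) for part in code.split("."))


def sorts_consistently(codes: List[str]) -> bool:
    """True when plain string order equals numeric hierarchy order."""
    return sorted(codes) == sorted(codes, key=code_key)


fixed_width = ["100.20", "100.10.10", "101", "100.10", "100"]
print(sorted(fixed_width))
print(sorts_consistently(fixed_width))

# A single-digit division would break string ordering, which is why the
# 10-99 range removes the need for leading zeros.
print(sorts_consistently(["100.9", "100.10"]))
```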

17 Appendix

18 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Objectives: Classification, Subclassification concept
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone; conveys concept
– Conveys hierarchy
– No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority: source terms from standards bodies
*Spec, Table 6, p21

19 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Classification, Subclassification term concept:
Term options: Category, SubCategory (Proposed Term)
  Source: NIH NCIthesaurus
  Definition: Category: ‘This term is used informally to mean a class of things’ (NCI code: C25372). Subcategory: ‘A subdivision that has common differentiating characteristics within a larger category.’ (NCI code: C25692)
Term options: Class, SubClass
  Source: W3C OWL
  Definition: Class: ‘Resources may be divided into groups called classes’. SubClass: ‘Subclasses are classes; if a class C is a subclass of a class C', then all instances of C will also be instances of C'.’ (W3C RDF Class def)
Term options: TMF Zone, Section
  Source: TMF Ref Model
  Definition: TMF Zone = Primary Classification (no published def found online). Section = SubClassification (no published def found online)
*Spec, Table 6, p21

20 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Classification, Subclassification term concept:
Term options: Category, SubCategory (Proposed Term)
  Source: NIH NCIthesaurus
  + Everyone knows it
  + Describes hierarchy
  + In use by standards body (NIH NCI Thesaurus)
  + Generic
Term options: Class, SubClass
  Source: W3C OWL
  + Describes hierarchy
  + In use by standards body
  + Generic
  - Could be a reserved word for some development tools
Term options: TMF Zone, Section
  Source: TMF Ref Model
  + In use by TMF RM users
  - Doesn’t convey hierarchy
  - Not in use by standards body
  - Not generic
*Spec, Table 6, p21

21 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Objectives: Content Type concept
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone; conveys concept
– No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority: source terms from standards bodies
*Spec, Table 6, p21

22 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Content Type term concept:
Term: Content Type (Proposed Term)
  Source: W3C & CareLex, Oracle
  Definition:
    W3C: ‘Specifies the nature of a linked resource’ (W3C, [RFC2045] and [RFC2046])
    CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material.
    Oracle: Content types are used to define the metadata that you can associate with content.
Term: Artifact
  Source: TMF Ref Model
  Definition: ‘A collection of documents’ (Wikipedia; not published)
*Spec, Table 6, p21

23 Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
Content Type term concept:
Term: Content Type (Proposed Term)
  Source: W3C
  + Widely used in internet SW
  + ECM SW use: Microsoft, Oracle, Alfresco, etc.
  + In use by standards body (W3C)
  + Generic
Term: Artifact
  Source: TMF Ref Model
  + In use by TMF RM users
  - Not in use by standards body
  - Not generic
  - Doesn’t convey concept of metadata
*Spec, Table 6, p21

24 Draft Agenda: Next Meeting
– Roll call
– Reports
  – Outreach
  – Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2)
– New business

