Presentation on theme: "SNOMED CT - in Release Format 2 ‘RF2’"— Presentation transcript:
2 SNOMED CT - in Release Format 2 ‘RF2’: Implications for different constituencies. Presented 22nd July 2013, Tom Seabury
3 Objectives: To share a sense of how SNOMED CT in RF2 can be fed into the terminology subsystems of an EHR system. To give an RF2 comparison to the RF1 approaches for the same tasks.
4 Sources: A comprehensive exposition on RF2 is included in the SNOMED CT Technical Implementation Guide, as the section ‘Release Format 2 Update Guide’. Information relating specifically to the UKTC RF2 distribution of its content is from the UKTC team.
4 The impossible 40 minute Webinar? Technical subjects: New things, redundant things; Policy and tools; Major differences; ‘Concept Enumeration’; Change log, history mechanism; RF2 release types; Content : Files : Folder structures; Implementation Schemas; Preferred terms, preferred FSNs
5 Webinar Topics: Basics (assumed knowledge from prior Webinars); RF2 described & illustrated; Policy and plans (UKTC plans); RF2 compared to the pre-existing RF1; How do I …; Tools
6 Assumed prior knowledge or experience. RF1: Awareness -or- Experience of use -or- Technical expertise. SNOMED CT basics: The concept : descriptions : relationships scheme; The existence of an inactivation (‘history’) mechanism; Subsets (Refsets in RF2); UK Extension content, its packing and distribution structures.
7 Unfamiliar words or concepts? During the webinar, please do ask for clarification if you encounter unfamiliar words or uses, or seemingly contradictory statements. e.g. ‘Component’ has a specific meaning: any of Concept, Description, Relationship, Refset. e.g. ‘stated relationship’: a relationship as stated by one of SNOMED CT’s authors, not one subsequently inferred from other relationships by some classifier tool.
8 For Clinical users. For clinical users of systems exploiting SNOMED CT, RF2 has: no direct impacts; nothing will be immediately apparent.
9 For informaticians (clinical or otherwise). Clinical / Informatician users: those who configure systems exploiting SNOMED CT. RF2 is: relevant to what tools are used, e.g. Refset editor vs. Subset editor; e.g. detection of inactive content in templates, queries or results.
10 For Developers and system developers. Software & System Developers: RF2 will need to be understood: its format; the potential inherent in the available content; which are the appropriate parts of SNOMED CT content to be correctly ingested and transformed. Any new Refset types (not needing revision to the RF2 standard) will need to be tracked and impact assessed.
11 For Content Developers: UKTC; other developers of SNOMED CT content, i.e. owners of their own SNOMED CT namespace. RF2: Will be conformant to RF2 in the development and maintenance of content and of metadata. Will be conformant to RF2 for the distribution of content. Can consider whether to embed RF2 techniques natively into authoring tools, or whether to rely on data transformations.
13 SNOMED CT Content, Release Formats. SNOMED CT is a large set of reference data. The UK Edition is distributed by the UKTC (via TRUD). Distribution formats are standardised. In addition: local choices for ways to pack content are in use.
14 SNOMED CT UK Edition. The whole of the SNOMED CT UK Edition is distributed via TRUD. A definition of ‘UK Edition’: the International Release, the UK Clinical Extension and the UK Drug Extension.
15 About the UK Edition. Q) How alike is the UK RF2 to that of any other nation? A) The UK Edition differs from releases in other countries in: the specifics of the packaging structures used (based on the International release); the Extensions in the UK Edition, which are UK specific. Q) How are the specifics of the UK Edition documented? A) The release notes which are distributed with the data describe a range of details which are specific to the build of the data. A) The file naming conventions of the standard should be used to understand what each of the distributed files contains, in outline.
16 Data. Content of SNOMED CT is distributed as a collection of SNOMED CT data files of different types. Files contain: the core SNOMED CT tables; sets of data components (in Refsets); sets of metadata components (in Refsets). Refsets (RF2) or Subsets (RF1): sets of things or, more exactly, collections of references to things, e.g. a set of concepts cherry-picked from the whole of SNOMED CT. ‘Tables’ and ‘Files’ are used partly interchangeably.
17 SNOMED CT Content, Release Formats. RF2 (and its predecessor RF1) are standardised Release Formats for the content of SNOMED CT. These are: 99% product and platform neutral (exception: DOS EOL characters); exclusively for use with SNOMED CT; formalised in IHTSDO documentation as SNOMED CT Standards; independent of the content which they distribute*
18 SNOMED CT Content, Release Formats. An RF2 (Full) release contains: all of the past states of all the things which were ever in SNOMED CT UK Edition. By contrast, an RF1 release contains: all the things which were ever in SNOMED CT UK Edition, in their current state. (RF2 Snapshot is similar to RF1: it has only the current status of any component. RF2 Snapshot is, however, different from RF1 in many ways described later.)
19 The UKTC use of RF1 and RF2. UK: UKTC relied exclusively on RF1 until October 2012. UK RF1 and RF2 will co-exist for no less than three years from October 2012; the UK RF1 distribution is currently the definitive version. International: Deprecation of RF1 by IHTSDO is being considered; IHTSDO wish to distribute the international core content exclusively in RF2.
21 SNOMED CT content in RF2. RF2 is being used by UKTC to distribute: Core content (Concepts, Descriptions, Relationships); Sets (formerly ‘Subsets’, now ‘Refsets’), e.g. the realm description Refset and the ‘Non-Human’ concepts set; Cross-maps, e.g. to ICD-10.
22 What structures and standards? RF2 standardises: data types; attributes used, and their meaning; file types and naming (carrying numerous fragments of information). It is used to represent: core components; reference sets (aka ‘Refsets’); essential functionality (such as language specificity, historical status changes and associations).
23 Concurrent to RF2 … Introduction of ‘Module’, the ‘Active’ field and the Module Dependency Refset. ‘Active’ field: Each component in RF2 has an associated active field, with values of true ('1') or false ('0'). Use it to filter out inactive content where appropriate. NB It is not always most appropriate to filter out inactive descriptions or concepts.
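A minimal Python sketch of the ‘Active’ filter described on this slide. The sample rows, their identifiers and the placeholder moduleId / definitionStatusId values are invented for illustration and are not taken from any release; only the column layout and the '1' / '0' convention come from the RF2 description.

```python
import csv
import io

# Invented three-row extract laid out like an RF2 Concept file.
# "MODULE" and "STATUS" are placeholders, not real SCTIDs.
SAMPLE = (
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\r\n"
    "101\t20020131\t1\tMODULE\tSTATUS\r\n"
    "102\t20020131\t0\tMODULE\tSTATUS\r\n"
    "103\t20130401\t1\tMODULE\tSTATUS\r\n"
)


def active_rows(text):
    """Return only the rows whose 'active' field is '1'."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in terms as plain text
    )
    return [row for row in reader if row["active"] == "1"]


print([row["id"] for row in active_rows(SAMPLE)])
```

As the slide notes, this filter is a choice, not a default: reporting over historical records, for example, may need the inactive rows kept.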
24 Language of release formats: ‘State Valid’ (date stamped records); ‘Refset’ (Cf. Subset); ‘Concept Enumeration’ (self referencing); ‘Delta, Snapshot, Full’ (release types in RF2); ‘Module’ (other new & existing metadata); ‘Extensibility’ (distribute anything)
25 ‘State Valid’ illustrated. Figure callouts: The log is ordered here in reverse chronological order (SNOMED CT files have no defined ordering of rows). This is the first ever entry. And it is subsequently inactivated. Red text signifies what has changed between entries; this is for illustration only, the data is NOT colour coded. Ownership changes, so a new ‘Module’ association is recorded. Modelling is improved and it becomes ‘fully defined’.
26 Refset patterns (RF2); RF1 Subset patterns; UK map pattern. TIG lists some Refset types as ‘Not currently provided - for future use’. UK Cross-maps: ICD-10
27 Reference Set names. The labels for Refsets can be more verbose: text is added to indicate the Refset type, e.g. ‘Family history simple reference set (foundation metadata concept)’. ‘Family history’: implied purpose; purpose clarified in UK release documentation and on the Subset register. ‘simple reference set (foundation metadata concept)’: explicit on formatting.
28 RF2 has a self-referencing nature. Compared to RF1: in RF1, understanding the meaning of a value in a field in a table may need access to, and review of, documentation. RF2 relies on ‘Concept Enumeration’, which is not used in RF1. The meaning of each metadata value is also included as a concept in SNOMED CT. Cf. system tables in an RDBMS.
29 RF2 Concept Enumerations Vs. RF1 arbitrary integers. RF2: Concept enumerations are used across all release files. RF2 uses concepts in a metadata hierarchy to represent an enumerated value set, rather than using arbitrary integer values (as in RF1). Take the SCTID data type
32 Zips, Files and folders. This International release is not the baseline release, so a Delta is legitimate to include.
33 Choices of: Which of these am I likely to need? The choice is between a current Snapshot and a current Full. But what is mandated of UKTC? The full view is required to support some SNOMED CT use cases but many requirements can be adequately met by providing access to a current Snapshot view. However: ‘A SNOMED CT-enabled terminology server must be able to import data from a full release because this is the only Release Type that is required to be produced by all Extension developers’ (TIG p237)
37 UKTC distribution structures. UKTC has always added further structure beyond that mandated by the standard, e.g. TRUD Packs and Subpacks; File:content strategy, e.g. which extension in what file, which sets in what file. No changes for RF2 introduction: replicated RF1 file and folder structures. UK RF1 <> UK RF2
38 What does RF2 look like? 197 files but Don’t Panic! This is the RF2 UK build which was used in the Authors’ Inspection Tests. The tool used to count and display files in directories is JDiskReport.
39 RF2 Zipped structure, Unzipped structure. What is found where? Folder structures; file names; Data; Metadata. Continuity of access: per RF1, 2 full releases and each of the ‘incremental releases’ between them.
41 Illustration of Refset content & metadata. As an aside: the datatype of the id column in RF2 tables may be either a UUID or a componentId; it is not one datatype.
42 Policy and plans: UKTC policy, IHTSDO policy; Transformability; Beyond the UK
43 UKTC policy – released data. UKTC enjoys some latitude in its cut-over from RF1 to RF2. The planned period of concurrent running of RF1 plus RF2 (RF2 currently has tech preview status) will terminate in October 2015. UK Edition in RF2: Status: ‘Technical Preview’; UK RF2 baseline release July 2013; Scope: including UK cross-maps.
44 UKTC policy – tooling. UKTC currently performs a conversion between RF1 and RF2 using tools and configuration data which are not themselves distributed. UKTC has no current plans to distribute these tools and configuration data. (UKTC continues to author terminology content in tools which are not tied to any particular release format.)
45 IHTSDO stated policy. IHTSDO: RF2 was ‘Developed in response to extensive feedback on’ RF1. The RF1 format was replaced by RF2 in January 2012; RF1 ‘is being maintained for a transitional period’ (SNOMED CT® Technical Implementation Guide, January 2013). UKTC policy reflects this, but is different.
46 Transformability. Content is being transformed from RF1 to RF2 by UKTC. Content is being transformed from RF2 to RF1 by IHTSDO. These transformations: require a fraction of prepared, different metadata for each format; require tables of equivalence for some metadata, such as versioning of membership of Refsets and Subsets; for the UK Edition, are subject to a set of documented deviations provided by UKTC within the RF2 release note.
47 Forward Compatibility: today’s tools, tomorrow’s data (RF2 distributed). UKTC distribution in both RF1 and RF2; or UKTC distribution in RF2 exclusively, +/- metadata which is specific to and essential for RF1, +/- tools to allow you to generate RF1 (should you need to).
48 Back Compatibility: tomorrow’s tools, today’s data (RF1 distributed). UKTC, UK Edition: RF2 metadata present (from RF2 files); RF1 metadata present. Providing back compatibility.
49 Multi-region products. Systems developed for use not only within but also beyond the UK. The standard prescribes each of: content semantics, syntax and data types; metadata semantics & syntax; file naming. Local variation will exist for: the ‘skeleton’ of folders; Extensions; Refset packing; distribution tooling and release schedules.
50 SNOMED CT Release Formats – stability (Jan 2013). Stability: ‘The RF2 format is likely to be stable for at least a five year period, without addition or deletion of fields’. Stable Extensibility: the Refset mechanism permits (without change to the core standard) new Refset types to be used (extensibility). TIG page 138, Jan 2013. Cf. CEN ISO five year review periods. NOTE THAT the conformant patterns are neutral for the DISTRIBUTION format, but can still have impacts for all implementation Schemas.
51 Tools. Over time, as RF2 becomes the primary distribution format in the UK, tools will be developed to make data in this format easier to process. This will include: Refset development; mapping tools; a concept editing environment.
53 Contrast. RF2 supports things which are unavailable from RF1: Refset extensibility (a constrained set of novel types can be added). RF2 supports things which are differently available from RF1: Component history. Extensive documentation of the value and benefits is made by IHTSDO :
54 Contrast: Identification of the origin of a component. RF1: NamespaceID (embedded into the component ID). RF2: ModuleID (a newly added metadata item).
55 In RF1 but not in RF2. PartitionIDs not in RF2: 03 A Subset; 04 A Cross Map Set; 05 A Cross Map Target
56 In RF1 but not in RF2: CTV3ID and SNOMEDID; Single FULLYSPECIFIEDNAME (to Refsets); ISPRIMITIVE (to Refset); REFINABILITY (field in RELATIONSHIP file, to Refset)
57 Content available only in RF2: Non-human Refset; Metadata: Module
60 Implementation Schema (2). Does HSCIC recommend that RF2 (or RF1) is used as the implementation schema? No. Could RF2 be used as the implementation schema? Perhaps, but it is principally for distribution.
61 Populating an implementation schema. Combine files of like types: Concept (x3), Description (x3), Relationship (x3). Apply parts of the data as distributed in Refsets: historical relationships in addition to the Relationship table; UK language preferences given precedence over international; UK Preferred terms applied; content Refsets; all other relevant data.
62 Release format Vs. Implementation Format. Distribution: normalised, no data duplication; extensive; distributed in a normalised format. Implementation: partly denormalised (for performance); re-indexed; filtered; partitioned. Denormalisation, filtering, sorting, partitioning and indexing are expected to be required to deliver the required technical performance and content acceptability.
63 Release format Vs. Implementation Format. Distribution: inclusion of all data. Implementation: removal of unnecessary data (for the given application). Most solutions are likely to be record-entry centric, hence mostly it will be the active components which are actually relevant.
64 Populating your implementation schema: Release Format 2 Snapshot; Release Format 1; Release Format 2 Full
65 ‘operating on’ SNOMED CT reference data UK Edition
66 ‘operating on’ SNOMED CT reference data UK Edition
67 RF2 operations on reference data; RF1 operations on reference data.
Current status data table style (< 99% of implementations): rip out & replace existing reference data. Log style reference data database? Almost no-one: merely append new reference data.
Both formats: De-normalisation; Combination; Substitute: own file and folder skeleton.
RF2: Data-reconciliation. Core Tables … Update descriptions table with UK description preferences (unpack these from Refsets). ? Substitute: own scheme for metadata, e.g. Refsets > local value sets; own scheme for component status; own scheme for component history. ? Add back: own interface terms. Detect and accommodate any new Refset types found.
RF1: Data-reconciliation. Core Tables: rip and replace. Update descriptions table with UK description preferences (unpack these from Subsets). ? Substitute: own scheme for metadata, e.g. Subsets > local value sets; own scheme for component status; own scheme for component history. ? Add back: own interface terms.
68 No import tooling? You may wish to just ‘get at’ a Refset out of the raw data. How to? Tools you will rely on initially: File & Refset manifest -or- lookup tables between Refset names and their identifiers: cut & paste (or search the Descriptions Table). Search in the Reference Set Descriptor Reference Set to identify the file pattern (or alternatively seek the Refset supertype in the Metadata hierarchy). Search within the files of the given pattern (if data for one Refset has been partitioned across multiple files: recombine it). Filter the results for only the active content. Looks reasonable. Option (1): find it where the manifest tells you it is. Option (2): as above, reverse engineer the pattern from the descriptor (via the 'descriptor template', though this only seems to sit in the documentation). Option (3): a bit like 2, but find the Refset's type/pattern by discovering its immediate supertype (though there are known failing cases).
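The last two steps on this slide (recombine a Refset partitioned across files, then filter for active content) can be sketched as below. This is a hedged illustration only: it assumes the files have already been parsed into lists of row dictionaries from a Snapshot release, and the function name, refsetId values and member rows are all hypothetical.

```python
def refset_members(files, refset_id):
    """Collect the active referencedComponentIds of one Refset.

    `files` is a list of already-parsed files, each a list of row dicts,
    so a Refset partitioned across several files is recombined here.
    """
    members = set()
    for rows in files:
        for row in rows:
            if row["refsetId"] == refset_id and row["active"] == "1":
                members.add(row["referencedComponentId"])
    return sorted(members)


# Hypothetical Snapshot rows: Refset "R1" is split across two files.
file_a = [
    {"refsetId": "R1", "active": "1", "referencedComponentId": "c1"},
    {"refsetId": "R2", "active": "1", "referencedComponentId": "c9"},
]
file_b = [
    {"refsetId": "R1", "active": "0", "referencedComponentId": "c2"},
    {"refsetId": "R1", "active": "1", "referencedComponentId": "c3"},
]

print(refset_members([file_a, file_b], "R1"))
```

Against a Full release this would not be enough on its own, because superseded rows for the same member would also be present.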
70 RF2 :: RF1 Content : Files : Folders. So you need to know what content is in which file: (1) in (the familiar) RF1 locations; (2) in RF2 locations.
71 RF2 files. The following files are included in an RF2 release: Concept file; Relationship file; Description file; Reference set files; Identifier file (these may exist, but be empty files).
72 RF2 files: Reference set files. Primary grouping of Refsets is driven by their data format (i.e. not their common field of use). A second axis of grouping can be by utility / area of application. Field-of-use clustering of files and data can lead to the same Refset being distributed more than once in a given release.
73 RF2 Identifier file. alternateIdentifier: A field in the RF2 Identifier file containing the representation of an Identifier in another code system which is irrevocably linked to a SNOMED CT identifier. IdentifierSchemeId: A field in the RF2 Identifier file containing a SNOMED CT identifier which identifies the alternate code system. NOT THE SAME AS A SEMANTIC MAPPING RELATIONSHIP. ONLY USED FOR TRUE LOWER LEVEL TECHNICAL LOGICAL IDENTIFIER EQUIVALENCE. The UK Extension as released (Technical Preview) contained no file of this type, rather than distributing an empty Identifier file.
74 Difference in files included.
RF1. Possible file types are: Concepts; Descriptions; Relationships; ComponentHistory; References; Subsets; SubsetMembers; CrossMapSets; CrossMaps; CrossMapTargets; TextDefinitions; Canonical; DualKeyIndex; WordKeyIndex; StatedRelationships. An RF1 release contains no less than 11 files.
RF2. Possible types are: Concept; Description; Relationship; Identifier; Refset (all subtypes). An RF2 release contains no less than 14 files; there is no upper limit.
RF2 Refset file types. Fixed patterns: reference set descriptor; module dependency; description format. Extendable patterns (addition of fields): attribute value type; simple map; language type; query specification type; annotation type; association type. Extension example: CTV3 map: | Simple map | (S). Any number and combinations of (C) (I) (S) additional fields, e.g. | Complex map type | (IISSSC).
75 Differences for a recipient of RF2 (Vs. RF1) (2). Choosing a storage structure for Refsets is a different challenge to that for RF1 Subsets. Extensibility of Refsets in RF2 dictates that each of the finite number of Refset patterns must be accommodated in part of the storage schema. These different Refset patterns may each be held in a different data table structured for that particular Refset pattern. The extensibility of RF2, however, allows the addition of new Refset patterns; these conform to the standard and are not tied to a revision of the standard. Consequently
76 What sets can be together in one distribution file? RF1: same Subset type. RF2: same Refset pattern.
78 Distribution of sets within Refset distribution files. RF1 (Sub)sets: UKTC convention: one file per Subset. RF2 (Ref)sets: UKTC convention: one file per collection of Refsets (perhaps by Refset pattern).
79 Files. UTF-8 encoded, tab delimited text files. Contain a column header row, providing field names for each column within the file. Lower camel case is used for the field names (e.g. moduleId, effectiveTime). Use DOS style line termination: each line is terminated with a carriage return character followed by a line feed character. Should have a last line that ends with a line terminator (CR/LF) before the end of file.
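The file conventions on this slide (UTF-8, tab delimited, header row, CR/LF termination including the last line) can be checked while reading. The following Python sketch is illustrative only; the function name and the sample bytes are assumptions, and a production importer would stream the file and not hold it in memory.

```python
def parse_rf2(data: bytes):
    """Parse the raw bytes of an RF2-style file into (header, rows)."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    if not text.endswith("\r\n"):
        raise ValueError("last line is not terminated with CR/LF")
    lines = text.split("\r\n")[:-1]  # drop the empty piece after the final CR/LF
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return header, rows


# Invented two-row sample; identifiers are placeholders.
sample = (
    "id\teffectiveTime\tactive\tmoduleId\r\n"
    "101\t20130401\t1\tMODULE\r\n"
    "102\t20130401\t0\tMODULE\r\n"
).encode("utf-8")

header, rows = parse_rf2(sample)
print(header)
print(rows[1]["active"])
```

Splitting on the tab character directly, with no CSV quoting rules, is deliberate here: quote characters inside a term are then kept as ordinary text.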
80 Both Release Formats represent: the core components of SNOMED CT (Concepts, Descriptions, Relationships); additional derivatives that provide standard representations of: value-sets consisting of a specified set of concepts or relationships; cross mapping tables to other codes and classifications. From TIG Jan 2013.
81 Both Release Formats: Are provided in tab-delimited text files. Represent character content in accordance with the Unicode UTF-8 specification. Use SNOMED CT Identifiers as the permanent Identifier of released core components. Support extensions to the International Release using namespaces allocated to licensees to denote the provenance of added components and to ensure Identifier uniqueness.
82 RF2 - History of each component. In RF2 all changes in components are represented by adding a row (same component ID) with: a new effective time; any necessary change in the component values. This applies to changes which get into the ‘release’ data; there is not one row for each and every change made by SNOMED CT authors in between releases. Paraphrasing TIG Jan 2013.
83 Representation of historical relationships. In RF1, the concept was moved into a special "Inactive concept" hierarchy; this is no longer done in RF2. In RF1 (only) the following relationships are used for concepts in the “inactive concept” hierarchy: | MAY BE A |, | MOVED FROM |, | MOVED TO |, | REPLACED BY |, | SAME AS |, | WAS A |. The equivalent in RF2 is achieved by the concept and its Relationships being made inactive: the concept is inactivated "in place", with its last location described in the history of its inactive Relationships. Here it is not easy to distinguish between the idea of an inactive relationship and, in RF1, those relationships of ‘CHARACTERISTIC’ = 2, i.e. ‘Historic’.
84 Reconstructing a table of all past and current relationships. RF2: Historical (non-current) relationships are not retained in the Relationships file but are distributed as a set of Refsets. An Implementation Schema which needs to collate all past & present relationships can be created by unpacking all the Refsets which contain historic relationships. This can span the International Release and the local Extensions (totalling 3), multiplied by the number of different relationship types (say 8?), so 24 Refsets to be combined and appended to a table of active relationships.
85 Refsets and Active field values. Refsets as distributed in RF2 contain components which are both active and inactive, according to the value in the ‘Active’ field. For a Full release it is possible, using the applicable date range for each row, to identify the members of a Refset at any past time.
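One way to read ‘the applicable date range for each row’ is: for each member id, take the row with the latest effectiveTime that is not after the date of interest, and keep it only if it is active. The Python sketch below works on that assumption; the function name and all row values are invented for illustration.

```python
def members_at(full_rows, refset_id, as_of):
    """Referenced components that were active members of a Refset on `as_of`.

    `as_of` and effectiveTime are yyyymmdd strings, so plain string
    comparison orders them chronologically.
    """
    latest = {}
    for row in full_rows:
        if row["refsetId"] != refset_id or row["effectiveTime"] > as_of:
            continue  # other Refset, or a state that did not exist yet
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return sorted(
        row["referencedComponentId"]
        for row in latest.values()
        if row["active"] == "1"
    )


# Invented Full-release rows: member m1 is added and later inactivated,
# member m2 is added in the later release.
FULL = [
    {"id": "m1", "effectiveTime": "20120401", "active": "1",
     "refsetId": "R1", "referencedComponentId": "c1"},
    {"id": "m1", "effectiveTime": "20130401", "active": "0",
     "refsetId": "R1", "referencedComponentId": "c1"},
    {"id": "m2", "effectiveTime": "20130401", "active": "1",
     "refsetId": "R1", "referencedComponentId": "c2"},
]

print(members_at(FULL, "R1", "20121001"))
print(members_at(FULL, "R1", "20130401"))
```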
86 Release Types. "Full" release: each file containing every version of every component ever released. "Snapshot" release: containing only the most recent version of every component ever released (both active and inactive components). A single snapshot provides access to a single release version and this ‘closely matches’ the view provided by the original SNOMED CT release format (RF1). "Delta" release: containing only component versions created since the last release. Each component version represents a new component or a change in an existing component. From TIG Jan 2013; the statement about a single snapshot is on page 236.
87 Combinations of release types. First ever Full release (‘Baseline’) + every subsequent Delta = current Full release. Snapshot + Deltas = incomplete Full (NB: Snapshot + Delta gives an incomplete history). Delta alone is valueless. If your system has transaction tracking for the reference terminology itself, you may prefer to append Deltas rather than to Rip & Replace the Full release at each release. If you rely on Snapshot releases, then you may need to Rip & Replace the entire snapshot at each release (being aware that you may lose past versions of Refsets which may still be current). Application of an incomplete set of Deltas can be misleading. For a full set of all foreseeable options see the SNOMED CT Technical Implementation Guide (at Jan 2013 this is covered in the section ‘Importing and maintaining a Full view’ and onwards).
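The ‘Baseline + every subsequent Delta = current Full’ arithmetic can be sketched as an append-only merge. This Python example assumes each component version is uniquely identified by its (id, effectiveTime) pair, which is how the ‘add a row with a new effective time’ rule is read here; the function name and rows are invented for illustration.

```python
def apply_deltas(full_rows, deltas):
    """Append Delta rows to a Full view, skipping versions already present."""
    seen = {(row["id"], row["effectiveTime"]) for row in full_rows}
    result = list(full_rows)
    for delta_rows in deltas:  # deltas supplied in release order
        for row in delta_rows:
            key = (row["id"], row["effectiveTime"])
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Invented data: one baseline row, then two Deltas.
baseline = [{"id": "101", "effectiveTime": "20130401", "active": "1"}]
delta_1 = [{"id": "102", "effectiveTime": "20131001", "active": "1"}]
delta_2 = [{"id": "101", "effectiveTime": "20140401", "active": "0"}]

current_full = apply_deltas(baseline, [delta_1, delta_2])
print(len(current_full))
```

Because already-seen versions are skipped, re-applying a Delta does no harm, but a skipped Delta leaves a silent gap, which is the ‘incomplete set of Deltas can be misleading’ warning on this slide.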
88 Full release data for Refsets. Figure labels: History; History; History; Snapshot. This UID is unique to each unique pairing of … ‘active’ means that the row is active in this Refset; it is not a surrogate for, or a repeat of, the concept’s own active status. But … it is not permissible to distribute a row as an active Refset member if the component itself is not active at that time.
89 Refset distribution files. Any RF2 file containing Refsets can only contain one type of Refset, e.g. a file which holds exclusively ‘ssRefset’, having two additional columns, both holding String values. The name indicates the attributes held in the file, from any number of: Component; String; Integer.
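The naming rule on this slide can be turned into a small decoder. The Python sketch below assumes file names of the form `der2_<pattern>Refset_…` (optionally prefixed with `x`, as in the technical preview file name quoted later in this deck), where the pattern letters c, i and s stand for Component, Integer and String. The example file names are hypothetical.

```python
import re

KIND = {"c": "Component", "i": "Integer", "s": "String"}


def additional_columns(file_name):
    """Decode the datatypes of the additional columns from a Refset file name."""
    match = re.match(r"x?der2_([cis]*)Refset_", file_name)
    if match is None:
        raise ValueError("not recognised as a Refset file name: " + file_name)
    return [KIND[letter] for letter in match.group(1)]


# Hypothetical file names, for illustration only.
print(additional_columns("der2_ssRefset_ExampleFull_INT_20130131.txt"))
print(additional_columns("der2_Refset_SimpleExampleFull_INT_20130131.txt"))
```

A decoder like this only says how many extra columns there are and of what type; what each column means still has to come from the Reference Set Descriptor.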
90 Preferred Terms. RF2 does not have a Description type value “Preferred Term”, only types of “Fully specified name” and “Synonym”, where the latter may be refined either to a “Preferred term” or to a “Synonym” within a language reference set. As a result of this change, in RF2 the preference for particular Descriptions in a language or dialect will be represented in the language reference set, and not in the descriptions table.
91 Preferred Terms in RF2. (The RF1 release files contain, within the core tables, identification of just one Preferred Term and one Fully Specified Name per concept.) The International Edition in RF2 does not identify one Preferred Term per concept. To identify a Preferred Term from RF2 data it is essential to combine information from a Language Refset along with data in the core tables.
92 Preferred Terms in RF2. UK Edition in RF2 identifies the UK Preferred Terms via: Descriptions.Description.type=Synonym; RefSet.Acceptability=Preferred (RefsetID ) (Refset file name = xder2_cRefset_NHSRealmDescriptionLanguageFull_GB _yyyymmdd.txt) (Path….\SNOMEDRF2\1.0.0\NHS_SNOMEDRF2\SnomedCT_GB _ \RF2Release\Full\Refset\Content\NHSRealmDescription). There are no restrictions against the identification of alternative preferred terms in Refset(s) and using these as an alternative to the UKTC provided one. NB existing UK documentation states: “Although supporting a number of description re-prioritisations (Realm-specific promotions of descriptions to ‘preferred term’ description-type) the present NHS Realm Description Subset is best thought of as a mechanism to satisfy the ‘one and only one fully-specified name & preferred term’ schema constraints for the UK data”
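The combination described here (Description type = Synonym, Language Refset acceptability = Preferred) amounts to a join between the Descriptions table and a Language Refset. The Python sketch below is an illustration under assumptions: it works on Snapshot rows held as dictionaries; the two metadata identifiers are, to the best of the editor's knowledge, the RF2 concept enumerations for | Synonym | and | Preferred | and should be checked against the release in use; the refsetId `UK_LANG_REFSET` is a placeholder, because the slide does not give the real one.

```python
# Concept enumeration values (verify against your release before relying on them).
SYNONYM = "900000000000013009"    # typeId: | Synonym |
PREFERRED = "900000000000548007"  # acceptabilityId: | Preferred |


def preferred_terms(descriptions, language_members, refset_id):
    """Map conceptId -> preferred term for one Language Refset (Snapshot rows)."""
    preferred_ids = {
        m["referencedComponentId"]
        for m in language_members
        if m["active"] == "1"
        and m["refsetId"] == refset_id
        and m["acceptabilityId"] == PREFERRED
    }
    return {
        d["conceptId"]: d["term"]
        for d in descriptions
        if d["active"] == "1" and d["typeId"] == SYNONYM and d["id"] in preferred_ids
    }


# Invented rows; "UK_LANG_REFSET" and "OTHER" are placeholders.
descriptions = [
    {"id": "d1", "active": "1", "conceptId": "c1", "typeId": SYNONYM, "term": "Heart attack"},
    {"id": "d2", "active": "1", "conceptId": "c1", "typeId": SYNONYM, "term": "Myocardial infarction"},
]
language_members = [
    {"active": "1", "refsetId": "UK_LANG_REFSET",
     "referencedComponentId": "d2", "acceptabilityId": PREFERRED},
    {"active": "1", "refsetId": "UK_LANG_REFSET",
     "referencedComponentId": "d1", "acceptabilityId": "OTHER"},
]

print(preferred_terms(descriptions, language_members, "UK_LANG_REFSET"))
```

Passing a different refsetId is all it takes to apply an alternative set of preferences, which matches the slide's point that nothing restricts the use of other Refsets for this purpose.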
94 What needs further exploration? Technical subjects: New things, redundant things; Policy and tools; Major differences; ‘Concept Enumeration’; Change log, history mechanism; RF2 release types; Content : Files : Folder structures; Implementation Schemas; Preferred terms, preferred FSNs
95 How did we do? Speak to us. Routes by which you might wish to engage: Person to person; orientation (via: ); NHS Networks / SNOMED CT (useful even if download speeds are slow); UKTC Implementation Forum (open to all, join via: ); Helpdesk
96 Q&A. Q) Has RF2 any impact on dm+d? A) No, dm+d is unaffected. No further questions were received during the Webinar.