SNOMED CT - in Release Format 2 ‘RF2’

1 SNOMED CT - in Release Format 2 ‘RF2’
Implications for different constituencies. Presented 22nd July 2013 by Tom Seabury

2 Objectives: To share a sense of how SNOMED CT in RF2 can be fed into the terminology subsystems of an EHR system. To compare RF2 with the RF1 approaches to the same tasks.

3 Sources: A comprehensive exposition of RF2 is included in the SNOMED CT Technical Implementation Guide (TIG), as the section ‘Release Format 2 Update Guide’. Information relating specifically to the UKTC RF2 distribution of its content is from the UKTC team.

4 The impossible 40-minute webinar?
Technical subjects; new things, redundant things; policy and tools; major differences; ‘Concept Enumeration’; change log, history mechanism; RF2 release types; content : files : folder structures; implementation schemas; preferred terms, preferred FSNs.

5 Webinar topics
Basics (assumed knowledge from prior webinars); RF2 described & illustrated; policy and plans (UKTC plans); RF2 compared to the pre-existing RF1; How do I …; tools.

6 Assumed prior knowledge or experience
RF1: awareness, or experience of use, or technical expertise. SNOMED CT basics: the concept : descriptions : relationships scheme; the existence of an inactivation (‘history’) mechanism; subsets (Refsets in RF2); UK Extension content, its packing and distribution structures.

7 Unfamiliar words or concepts?
During the webinar, please do ask for clarification if you encounter unfamiliar words or usages, or seemingly contradictory statements. For example, ‘component’ has a specific meaning: any of concept, description, relationship or Refset. Likewise, a ‘stated relationship’ is a relationship as stated by one of SNOMED CT’s authors, not one subsequently inferred from other relationships by a classifier tool.

8 For clinical users: for clinical users of systems exploiting SNOMED CT, RF2 has no direct impact; nothing will be immediately apparent.

9 For informaticians (clinical or otherwise)
For clinical and informatician users, i.e. those who configure systems exploiting SNOMED CT, RF2 is relevant to which tools are used (e.g. a Refset editor vs. a Subset editor) and to the detection of inactive content in templates, queries or results.

10 For Developers and system developers
For software and system developers, RF2 will need to be understood: its format; the potential inherent in the available content; and which parts of SNOMED CT content are appropriate to ingest and transform correctly. Any new Refset types (which do not need a revision of the RF2 standard) will need to be tracked and their impact assessed.

11 For Content Developers
UKTC and other developers of SNOMED CT content (i.e. owners of their own SNOMED CT namespace), with respect to RF2: will be conformant to RF2 in the development and maintenance of content and metadata; will be conformant to RF2 for the distribution of content; and can consider whether to embed RF2 techniques natively in authoring tools or to rely on data transformations.

12 Basics revisited

13 SNOMED CT Content, Release Formats
SNOMED CT is a large set of reference data. The UK Edition is distributed by the UKTC (via TRUD). Distribution formats are standardised; in addition, local choices for ways to pack content are in use.

14 SNOMED CT UK Edition: The whole of the SNOMED CT UK Edition is distributed via TRUD. Definition: the ‘UK Edition’ comprises the International Release, the UK Clinical Extension and the UK Drug Extension.

15 About the UK Edition. Q) How alike is the UK RF2 release to that of any other nation?
A) The UK Edition differs from releases in other countries in the specifics of the packaging structures used (based on the International Release), and in that its Extensions are UK-specific. Q) How are the specifics of the UK Edition documented? A) The release notes distributed with the data describe a range of details specific to the build of the data. A) The file naming conventions of the standard should be used to understand, in outline, what each distributed file contains.

16 Data
Content of SNOMED CT is distributed as a collection of SNOMED CT data files of different types. Files contain: the core SNOMED CT tables; sets of data components (in Refsets); sets of metadata components (in Refsets). Refsets (RF2) or Subsets (RF1) are sets of things or, more exactly, collections of references to things, e.g. a set of concepts cherry-picked from the whole of SNOMED CT. The terms ‘tables’ and ‘files’ are used partly interchangeably.

17 SNOMED CT Content, Release Formats
RF2 (and its predecessor RF1) are standardised release formats for the content of SNOMED CT. They are: 99% product- and platform-neutral (the exception being DOS end-of-line characters); exclusively for use with SNOMED CT; formalised in IHTSDO documentation as SNOMED CT standards; and independent of the content which they distribute*.

18 SNOMED CT Content, Release Formats
An RF2 Full release contains all of the past states of everything that was ever in the SNOMED CT UK Edition. By contrast, an RF1 release contains everything that was ever in the SNOMED CT UK Edition, in its current state. (An RF2 Snapshot is similar to RF1 in that it holds only the current state of each component; however, an RF2 Snapshot differs from RF1 in many ways, described later.)
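To make the Full/Snapshot contrast concrete, here is a minimal Python sketch (an illustration, not UKTC or IHTSDO tooling) that derives a Snapshot view from Full-release rows by keeping the latest row per component id. It assumes the rows have already been read into dicts keyed by RF2 column names; the component ids are made up.

```python
def snapshot_from_full(full_rows):
    """Derive a Snapshot view from Full-release rows.

    Each row is a dict keyed by RF2 column names. effectiveTime is YYYYMMDD,
    so comparing the strings compares the dates.
    """
    latest = {}
    for row in full_rows:
        current = latest.get(row["id"])
        if current is None or row["effectiveTime"] > current["effectiveTime"]:
            latest[row["id"]] = row
    # Inactive components are kept: a Snapshot holds active and inactive ones
    return sorted(latest.values(), key=lambda row: row["id"])


# Made-up component ids: three states of "101", one state of "202"
full = [
    {"id": "101", "effectiveTime": "20020131", "active": "1"},
    {"id": "101", "effectiveTime": "20090731", "active": "1"},
    {"id": "101", "effectiveTime": "20120131", "active": "0"},
    {"id": "202", "effectiveTime": "20020131", "active": "1"},
]
snapshot = snapshot_from_full(full)
```

Because RF2 files have no defined row order, the sketch compares effectiveTime values rather than relying on position in the file.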

19 The UKTC use of RF1 and RF2
UK: UKTC relied exclusively on RF1 until October 2012. UK RF1 and RF2 will co-exist for no less than three years from October 2012; the UK RF1 distribution is currently the definitive version. International: deprecation of RF1 by IHTSDO is being considered; IHTSDO wishes to distribute the international core content exclusively in RF2.

20 RF2 described and illustrated

21 SNOMED CT content in RF2
RF2 is being used by UKTC to distribute: core content (concepts, descriptions, relationships); sets (formerly ‘subsets’, now ‘Refsets’), such as the realm description Refset and the ‘Non-Human’ concepts set; and cross-maps, e.g. to ICD-10.

22 What structures and standards?
RF2 standardises: data types; the attributes used, and their meaning; file types and naming (carrying numerous fragments of information). It is used to represent: core components; reference sets (aka ‘Refsets’); essential functionality (such as language specificity, historical status changes and associations).

23 Concurrent with RF2 … introduction of ‘Module’ and the Module Dependency Refset; the ‘Active’ field
‘Active’ field: each component in RF2 has an associated active field, with values true (‘1’) or false (‘0’). Use it to filter out inactive content where appropriate. NB: it is not always appropriate to filter out inactive descriptions or concepts.
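A minimal Python sketch of the point above, assuming Snapshot rows already read into dicts keyed by RF2 column names (ids made up): filtering on the active field suits a picklist for new data entry, while resolving a code stored in an existing record may need the inactive component too.

```python
def active_only(rows):
    """Keep rows whose RF2 'active' field is '1' (true)."""
    return [row for row in rows if row["active"] == "1"]


def lookup_any_status(rows, component_id):
    """Find a component whatever its status, e.g. to display a code in an old record."""
    for row in rows:
        if row["id"] == component_id:
            return row
    return None


concepts = [
    {"id": "101", "active": "1"},
    {"id": "202", "active": "0"},  # inactive, but may still appear in existing records
]
picklist = active_only(concepts)             # for new data entry
legacy = lookup_any_status(concepts, "202")  # for reading existing records
```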

24 Language of release formats
‘State Valid’: date-stamped records. ‘Refset’: cf. Subset. ‘Concept Enumeration’: self-referencing. ‘Delta, Snapshot, Full’: release types in RF2. ‘Module’: other new & existing metadata. ‘Extensibility’: distribute anything.

25 ‘State Valid’ illustrated
The log is ordered here in reverse chronological order (SNOMED CT files have no defined ordering of rows). Callouts: this is the first-ever entry; it is subsequently inactivated; ownership changes, so a new ‘Module’ association is recorded; modelling is improved and it becomes ‘fully defined’. Red text signifies what has changed between entries; this is for illustration only, as the data itself is NOT colour coded.

26 Refset patterns (RF2) vs. RF1 Subset patterns
UK map pattern. The TIG lists some Refset types as ‘Not currently provided - for future use’. UK cross-maps: ICD-10.

27 Reference Set names: the labels for Refsets can be more verbose
Text is added to indicate the Refset type, e.g. ‘Family history simple reference set (foundation metadata concept)’ rather than ‘Family history’. RF1: implied purpose, clarified in UK release documentation on the Subset register. RF2: explicit on formatting.

28 RF2 has a self-referencing nature
Compared to RF1: in RF1, understanding the meaning of a value in a table field may require access to, and review of, documentation. RF2 relies on ‘Concept Enumeration’, which is not used in RF1: the meaning of each metadata value is itself included as a concept in SNOMED CT (cf. system tables in an RDBMS).

29 RF2 Concept Enumerations Vs. RF1 arbitrary integers
RF2 uses concept enumerations across all release files: concepts in a metadata hierarchy represent an enumerated value set, rather than arbitrary integer values (as in RF1). Enumerated fields therefore take the SCTID data type.
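Because enumerated values are themselves concepts, their meaning can be looked up in the same Descriptions table as any other concept. A minimal Python sketch: FSN_TYPE_ID and the two definition-status concept ids are believed to be the International Release metadata values (verify them against your release); the sample rows are hand-written, not taken from a release file.

```python
FSN_TYPE_ID = "900000000000003001"  # |Fully specified name| description type


def fsn_index(description_rows):
    """Map conceptId -> FSN from active FSN descriptions (last one wins if several)."""
    return {
        d["conceptId"]: d["term"]
        for d in description_rows
        if d["active"] == "1" and d["typeId"] == FSN_TYPE_ID
    }


def decode(row, field, fsns):
    """Resolve an enumerated field value (an SCTID) to its FSN, or return it unchanged."""
    value = row[field]
    return fsns.get(value, value)


# Hand-written sample rows (other description columns omitted for brevity)
descriptions = [
    {"conceptId": "900000000000074008", "typeId": FSN_TYPE_ID, "active": "1",
     "term": "Primitive (core metadata concept)"},
    {"conceptId": "900000000000073002", "typeId": FSN_TYPE_ID, "active": "1",
     "term": "Defined (core metadata concept)"},
]
concept = {"id": "101", "definitionStatusId": "900000000000074008"}  # made-up concept id
fsns = fsn_index(descriptions)
meaning = decode(concept, "definitionStatusId", fsns)
```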

30 RF2 Concept Enumerations (and other Metadata)
The metadata hierarchy (credit: SNOB Browser)

31 Zips, Files and folders

32 Zips, files and folders: this International release is not the baseline release, so it is legitimate to include a Delta.

33 Choices: which of these am I likely to need?
The choice is between a current Snapshot and a current Full release. But what is mandated of UKTC? The Full view is required to support some SNOMED CT use cases, but many requirements can be adequately met by providing access to a current Snapshot view. However: ‘A SNOMED CT-enabled terminology server must be able to import data from a full release because this is the only Release Type that is required to be produced by all Extension developers’ (TIG p237).

34 Metadata values illustrated – Core ones

35 Metadata values illustrated - Module

36 Metadata values – carried over from RF1

37 UKTC distribution structures
UKTC has always added further structure beyond that mandated by the standard, e.g. TRUD packs and subpacks, and a file:content strategy (e.g. which extension in which file, which sets in which file). There are no changes for the RF2 introduction: the RF1 file and folder structures are replicated (UK RF1 <> UK RF2).

38 What does RF2 look like? 197 files, but Don’t Panic!
This is the RF2 UK build which was used in the Authors’ Inspection Tests. The tool used to count and display files in directories is JDiskReport.

39 RF2 zipped structure vs. unzipped structure: what is found where?
Folder structures and file names show what is found where: data and metadata. Continuity of access: as per RF1, 2 full releases and each of the ‘incremental releases’ between them.

40 Familiar UK Folder names?
UK Drug Extension

41 Illustration of Refset content & metadata
As an aside: the data type of the id column in RF2 tables may be either a UUID (Refset members) or a componentId (an SCTID, in the core component files); it is not one data type.
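An implementation schema that stores RF2 rows needs to allow for both kinds of id. A minimal Python sketch that tells them apart; the 6 to 18 digit length range is the SCTID rule, and the SCTID check digit is deliberately not verified here.

```python
import uuid


def id_kind(value):
    """Classify an RF2 'id' value as a Refset member UUID or an SCTID."""
    try:
        uuid.UUID(value)
        return "uuid"
    except ValueError:
        pass
    # SCTIDs are 6 to 18 decimal digits (check digit not verified in this sketch)
    if value.isascii() and value.isdigit() and 6 <= len(value) <= 18:
        return "sctid"
    raise ValueError(f"unrecognised RF2 id: {value!r}")


member_kind = id_kind("ce53f7cf-5d34-4e0b-8a8e-1c3a47f2b6a1")  # made-up UUID
concept_kind = id_kind("138875005")  # the SNOMED CT root concept
```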

42 Policy and plans: UKTC policy, IHTSDO policy, transformability, beyond the UK

43 UKTC policy – released data
UKTC enjoys some latitude in its cut-over from RF1 to RF2. The planned period of concurrent running of RF1 plus RF2 (RF2 currently has technical preview status) will terminate in October 2015. UK Edition in RF2: status ‘Technical Preview’; UK RF2 baseline release July 2013; scope including UK cross-maps.

44 UKTC policy: tooling. UKTC currently converts between RF1 and RF2 using tools and configuration data which are not themselves distributed, and has no current plans to distribute them. (UKTC continues to author terminology content in tools which are not tied to any particular release format.)

45 IHTSDO stated policy
RF2 was ‘Developed in response to extensive feedback on’ RF1. The RF1 format was replaced by RF2 in January 2012; RF1 ‘is being maintained for a transitional period’ (SNOMED CT® Technical Implementation Guide, January 2013). UKTC policy reflects this, but is different.

46 Transformability
Content is being transformed from RF1 to RF2 by UKTC, and from RF2 to RF1 by IHTSDO. These transformations: require a fraction of prepared metadata that differs for each format; require tables of equivalence for some metadata, such as versioning of membership of Refsets and Subsets; and, for the UK Edition, are subject to a set of documented deviations provided by UKTC within the RF2 release note.

47 Forward Compatibility
Today’s tools, tomorrow’s data (RF2 distributed): from UKTC distribution in both RF1 and RF2, to UKTC distribution in RF2 exclusively, +/- metadata which is specific to and essential for RF1, +/- tools to allow you to generate RF1 (should you need to).

48 Backward compatibility
Tomorrow’s tools, today’s data (RF1 distributed). UKTC, UK Edition: RF2 metadata present (from RF2 files); RF1 metadata present; providing backward compatibility.

49 Multi-region products
For systems developed for use not only within but also beyond the UK: the standard prescribes content semantics, syntax and data types; metadata semantics and syntax; and file naming. Local variation will exist for: the ‘skeleton’ of folders; Extensions; Refset packing; distribution tooling and release schedules.

50 SNOMED CT Release Formats – stability
Stability (Jan 2013): ‘The RF2 format is likely to be stable for at least a five year period, without addition or deletion of fields’ (cf. CEN/ISO five-year review periods). Stable extensibility: the Refset mechanism permits new Refset types to be used without change to the core standard (TIG page 138, Jan 2013). NOTE THAT the conformant patterns are neutral for the DISTRIBUTION format, but can still have impacts on all implementation schemas.

51 Tools: Over time, as RF2 becomes the primary distribution format in the UK, tools will be developed to make data in this format easier to process. These will include tools for Refset development, mapping tools and a concept editing environment.

52 Similarities & differences RF2 and RF1

53 Contrast: RF2 supports things which are unavailable from RF1:
Refset extensibility: a constrained set of novel types can be added. RF2 supports things which are available differently from RF1: component history. Extensive documentation of the value and benefits is provided by IHTSDO:

54 Contrast Identification of the origin of a component
RF1: NamespaceID (embedded in the component ID). RF2: ModuleID (a newly added metadata item).
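The RF1 approach reads provenance from the identifier itself, whereas RF2 records the module in the moduleId field. A minimal Python sketch of the SCTID layout (item identifier, optional 7-digit namespace, 2-digit partition identifier, check digit); the partition table lists only the values relevant to this deck, and the check digit is returned but not verified.

```python
# Partition identifiers; 03-05 are used only in RF1
PARTITIONS = {
    "00": "concept", "01": "description", "02": "relationship",
    "03": "subset", "04": "cross map set", "05": "cross map target",
    "10": "concept", "11": "description", "12": "relationship",
}


def parse_sctid(sctid):
    """Split an SCTID into item id, namespace, partition and check digit."""
    if not (sctid.isascii() and sctid.isdigit() and 6 <= len(sctid) <= 18):
        raise ValueError(f"not an SCTID: {sctid!r}")
    partition = sctid[-3:-1]
    if partition[0] == "1":
        # Long format: a 7-digit namespace sits in front of the partition id
        if len(sctid) < 11:
            raise ValueError(f"too short for a long-format SCTID: {sctid!r}")
        namespace, item = sctid[-10:-3], sctid[:-10]
    else:
        # Short format (International Release): no namespace
        namespace, item = None, sctid[:-3]
    return {
        "item": item,
        "namespace": namespace,
        "partition": partition,
        "kind": PARTITIONS.get(partition, "unknown"),
        "check": sctid[-1],
    }


international = parse_sctid("138875005")     # root concept, short format
extension = parse_sctid("1234561000000101")  # made-up long-format id
```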

55 In RF1 but not in RF2: PartitionIDs not used in RF2
03 (a Subset), 04 (a Cross Map Set), 05 (a Cross Map Target)

56 In RF1 but not in RF2
CTV3ID and SNOMEDID (moved to Refsets); the single FULLYSPECIFIEDNAME field; ISPRIMITIVE (replaced in RF2 by the definitionStatusId field of the Concept file); REFINABILITY (a field in the RELATIONSHIP file, moved to a Refset).

57 Content available only in RF2
Non-human Refset. Metadata: Module.

58 Implementation Schemas

59 Implementation Schemas (1)

60 Implementation Schema (2)
Does HSCIC recommend that RF2 (or RF1) be used as the implementation schema? No. Could RF2 be used as the implementation schema? Perhaps, but it is principally a distribution format.

61 Populating an implementation schema
Combine files of like types: Concept (x3), Description (x3), Relationship (x3). Apply parts of the data as distributed in Refsets: historical relationships, in addition to the Relationship table; UK language preferences, given precedence over international ones (UK Preferred Terms applied); content Refsets; all other relevant data.
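Combining files of like types can be sketched in Python as below. It assumes the three files per type are the International Release, UK Clinical Extension and UK Drug Extension versions (an inference from the UK Edition definition, not stated on this slide). io.StringIO stands in for opened files; real files would be opened with encoding="utf-8" and newline="". The moduleId and definitionStatusId values are placeholders.

```python
import csv
import io


def read_like_files(streams):
    """Yield rows (as dicts) from several RF2 files of the same type, in turn.

    All files must share an identical header row.
    """
    expected = None
    for stream in streams:
        # RF2 fields are never quoted, so quote characters are read literally
        reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        if expected is None:
            expected = header
        elif header != expected:
            raise ValueError(f"header mismatch: {header} != {expected}")
        for fields in reader:
            yield dict(zip(header, fields))


# Placeholder values M1..M3 and D1 stand in for real module and status SCTIDs
HEADER = "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\r\n"
international = io.StringIO(HEADER + "101\t20130131\t1\tM1\tD1\r\n")
uk_clinical = io.StringIO(HEADER + "202\t20130401\t1\tM2\tD1\r\n")
uk_drug = io.StringIO(HEADER + "303\t20130401\t1\tM3\tD1\r\n")
concepts = list(read_like_files([international, uk_clinical, uk_drug]))
```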

62 Release format Vs. Implementation Format
Distribution: normalised, no data duplication; extensive. Implementation: partly denormalised (for performance); re-indexed; filtered; partitioned. Denormalisation, filtering, sorting, partitioning and indexing are expected to be required to deliver the necessary technical performance and content acceptability.

63 Release format Vs. Implementation Format
Distribution: inclusion of all data. Implementation: removal of data unnecessary for the given application. Most solutions are likely to be record-entry centric, so it will mostly be the active components that are actually relevant.

64 Populating your implementation schema
Release Format 2 Snapshot; Release Format 1; Release Format 2 Full

65 ‘operating on’ SNOMED CT reference data
UK Edition

66 ‘operating on’ SNOMED CT reference data
UK Edition

67 RF2 operations on reference data vs. RF1 operations on reference data
Implementation styles: current status data table style (< 99% of implementations): rip out & replace existing reference data. Log-style reference data database? Almost no-one: merely append new reference data.
Common to RF2 and RF1: de-normalisation; combination; substitute your own file and folder skeleton.
RF2 data reconciliation: core tables …; update the descriptions table with UK description preferences (unpack these from Refsets); ? substitute: own scheme for metadata (e.g. Refsets > local value sets), own scheme for component status, own scheme for component history; ? add back: own interface terms; detect and accommodate any new Refset types found.
RF1 data reconciliation: core tables: rip and replace; update the descriptions table with UK description preferences (unpack these from Subsets); ? substitute: own scheme for metadata (e.g. Subsets > local value sets), own scheme for component status, own scheme for component history; ? add back: own interface terms.

68 No import tooling? You may wish just to ‘get at’ a Refset out of the raw data.
How? Initially, the tools you will rely on are a file & Refset manifest, or lookup tables between Refset names and their identifiers (cut & paste, or search the Descriptions table). Then: search the Reference Set Descriptor Reference Set to identify the file pattern (or, alternatively, seek the Refset’s supertype in the metadata hierarchy); search within the files of that pattern (if data for one Refset has been partitioned across multiple files, recombine it); and filter the results for only the active content. In short: option (1), find the Refset where the manifest says it is; option (2), as above, reverse-engineer the pattern from the descriptor (via the ‘descriptor template’, though this only seems to sit in the documentation); option (3), like (2), but find the Refset’s type/pattern by discovering its immediate supertype (though there are known failing cases).
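Once the file pattern is known, the last two steps (recombine partitioned data, keep active content) can be sketched in Python. This assumes Snapshot files, where each member appears once; with Full files, each member’s latest row would have to be selected first, or removed members would reappear. The Refset ids R1 and R2 are hypothetical placeholders.

```python
def extract_refset(files, refset_id):
    """Active members of one Refset, gathered from Snapshot rows of several files.

    `files` is a list of row lists (dicts keyed by RF2 column names), because
    one Refset's data may be partitioned across several files.
    """
    members = []
    for rows in files:
        for row in rows:
            if row["refsetId"] == refset_id and row["active"] == "1":
                members.append(row["referencedComponentId"])
    return sorted(members)


# Hypothetical ids: Refset "R1" is split across two files of the same pattern
file_a = [
    {"refsetId": "R1", "active": "1", "referencedComponentId": "101"},
    {"refsetId": "R2", "active": "1", "referencedComponentId": "202"},
]
file_b = [
    {"refsetId": "R1", "active": "0", "referencedComponentId": "303"},
    {"refsetId": "R1", "active": "1", "referencedComponentId": "404"},
]
r1 = extract_refset([file_a, file_b], "R1")
```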

69 RF2 :: RF1

70 RF2 :: RF1 Content : Files : Folders
So you need to know what content is in which file: (1) in the (familiar) RF1 locations; (2) in the RF2 locations.

71 RF2 files The following files are included in an RF2 release:
Concept file; Relationship file; Description file; Reference set files; Identifier file (these may exist, but be empty files).

72 RF2 files Reference set files
The primary grouping of Refsets is driven by their data format (i.e. not by their common field of use). A second axis of grouping can be by utility / area of application. Field-of-use clustering of files and data can lead to the same Refset being distributed more than once in a given release.

73 RF2 Identifier file
alternateIdentifier: a field in the RF2 Identifier file containing the representation of an identifier in another code system which is irrevocably linked to a SNOMED CT identifier. identifierSchemeId: a field in the RF2 Identifier file containing a SNOMED CT identifier which identifies the alternate code system. NOT THE SAME AS A SEMANTIC MAPPING RELATIONSHIP: ONLY USED FOR TRUE LOWER-LEVEL TECHNICAL, LOGICAL IDENTIFIER EQUIVALENCE. The UK Extension as released (Technical Preview) contained no file of this type, rather than distributing an empty Identifier file.

74 Difference in files included
RF1 possible file types: • Concepts • Descriptions • Relationships • ComponentHistory • References • Subsets • SubsetMembers • CrossMapSets • CrossMaps • CrossMapTargets • TextDefinitions • Canonical • DualKeyIndex • WordKeyIndex • StatedRelationships. An RF1 release contains no less than 11 files.
RF2 possible file types: • Concept • Description • Relationship • Identifier • Refset (all subtypes). An RF2 release contains no less than 14 files, with no upper limit.
RF2 Refset file types. Fixed patterns: reference set descriptor; module dependency; description format. Extendable patterns (addition of fields): attribute value type; simple map; language type; query specification type; annotation type; association type.
Extension example: the CTV3 map is a | Simple map | (S). Any number and combination of (C), (I) and (S) additional fields may be used, e.g. | Complex map type | (IISSSC).
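The pattern letters are carried in the Refset file name, as in a name beginning der2_iissscRefset_. A minimal Python sketch that reads them back; the regular expression is an assumption based on the convention described here (pattern letters immediately before "Refset", with an optional leading "x" as seen in the UK file names), not a complete parser of RF2 file names.

```python
import re

FIELD_TYPES = {"c": "component", "i": "integer", "s": "string"}
# Assumed shape: optional "x", then "der2_", the pattern letters, then "Refset_"
REFSET_FILE = re.compile(r"^x?der2_([cis]*)Refset_")


def refset_additional_fields(filename):
    """Return the types of a Refset file's additional fields, from its name."""
    match = REFSET_FILE.match(filename)
    if match is None:
        raise ValueError(f"not an RF2 Refset file name: {filename!r}")
    return [FIELD_TYPES[letter] for letter in match.group(1)]


complex_map = refset_additional_fields("der2_iissscRefset_ComplexMapFull_INT_20130131.txt")
simple = refset_additional_fields("der2_Refset_SimpleFull_INT_20130131.txt")
```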

75 Differences for a recipient of RF2 (Vs. RF1) (2)
Choosing a storage structure for Refsets is a different challenge from that for RF1 Subsets. The extensibility of Refsets in RF2 dictates that each of the finite number of Refset patterns must be accommodated in part of the storage schema; the different Refset patterns may each be held in a different data table structured for that particular pattern. However, the extensibility of RF2 also allows the addition of new Refset patterns; these conform to the standard and are not tied to a revision of it. Consequently
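One way to meet this, sketched in Python under stated assumptions: a hypothetical in-memory store keeps one table per Refset pattern, identifies a pattern by the names of its additional columns (after the six columns common to every Refset file), and flags any pattern it has not seen before for impact assessment. Row values are placeholders.

```python
CORE_COLUMNS = ["id", "effectiveTime", "active", "moduleId", "refsetId", "referencedComponentId"]


class RefsetStore:
    """Hypothetical in-memory store: one table per Refset pattern."""

    def __init__(self, known_patterns):
        self.tables = {tuple(p): [] for p in known_patterns}
        self.new_patterns = []

    def load(self, header, rows):
        if header[:len(CORE_COLUMNS)] != CORE_COLUMNS:
            raise ValueError("not an RF2 Refset header")
        pattern = tuple(header[len(CORE_COLUMNS):])
        if pattern not in self.tables:
            # Valid RF2 with no revision of the standard, but new to this schema:
            # flag it for impact assessment and give it its own table
            self.new_patterns.append(pattern)
            self.tables[pattern] = []
        self.tables[pattern].extend(dict(zip(header, fields)) for fields in rows)


store = RefsetStore(known_patterns=[[], ["acceptabilityId"]])
store.load(CORE_COLUMNS + ["acceptabilityId"], [["u1", "20130401", "1", "M", "R", "D", "P"]])
store.load(CORE_COLUMNS + ["targetComponentId"], [["u2", "20130401", "1", "M", "R", "C", "T"]])
```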

76 What sets can be together in one distribution file?
RF1: sets of the same Subset type. RF2: sets of the same Refset pattern.

77 Distribution folder structure for sets

78 Distribution of sets within Refset distribution files
RF1 (Sub)sets, UKTC convention: one file per Subset. RF2 (Ref)sets, UKTC convention: one file per collection of Refsets (perhaps by Refset pattern).

79 Files
RF2 files are UTF-8 encoded, tab-delimited text files which: contain a column header row, providing field names for each column within the file (lower camel case is used for the field names, e.g. moduleId, effectiveTime); use DOS-style line termination, each line being terminated with a carriage return character followed by a line feed character; and should have a last line that ends with a line terminator (CR/LF) before the end of file.
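These layout rules can be checked while parsing. A minimal Python sketch (not an official validator) that decodes UTF-8, insists on CR/LF termination including after the last line, and checks that every row has as many tab-separated fields as the header.

```python
def parse_rf2(data: bytes):
    """Parse one RF2 file's bytes into (header, rows), enforcing the layout rules."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    if not text.endswith("\r\n"):
        raise ValueError("last line must end with CR/LF")
    lines = text.split("\r\n")[:-1]  # drop the empty piece after the final CR/LF
    header = lines[0].split("\t")
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"line {number}: {len(fields)} fields, expected {len(header)}")
        rows.append(dict(zip(header, fields)))
    return header, rows


sample = "id\teffectiveTime\tactive\r\n101\t20130131\t1\r\n".encode("utf-8")
header, rows = parse_rf2(sample)
```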

80 Both Release Formats represent:
The core components of SNOMED CT: concepts, descriptions, relationships. Additional derivatives that provide standard representations of: value sets consisting of a specified set of concepts or relationships; cross-mapping tables to other codes and classifications. (From TIG Jan 2013)

81 Both Release Formats are provided in:
Tab-delimited text files which: represent character content in accordance with the Unicode UTF-8 specification; use SNOMED CT Identifiers as the permanent identifiers of released core components; and support extensions to the International Release, using namespaces allocated to licensees to denote the provenance of added components and to ensure identifier uniqueness.

82 RF2 - History of each component
In RF2, every change to a component is represented by adding a row (with the same component ID) having a new effective time and any necessary change in the component values. This applies to changes which get into the ‘release’ data: there is not one row for each and every change made by SNOMED CT authors between releases. (Paraphrasing TIG Jan 2013)

83 Representation of historical relationships
In RF1, an inactivated concept was moved into a special ‘Inactive concept’ hierarchy; this is no longer done in RF2. In RF1 (only), the following relationships are used for concepts in the ‘inactive concept’ hierarchy: | MAY BE A |, | MOVED FROM |, | MOVED TO |, | REPLACED BY |, | SAME AS |, | WAS A |. The equivalent in RF2 is achieved by turning the concept and its relationships inactive: the concept is inactivated ‘in place’, with its last location described in the history of its inactive relationships. Here it is not easy to distinguish between the idea of an inactive relationship and, in RF1, those relationships with CHARACTERISTICTYPE = 2, i.e. ‘Historical’.

84 Reconstructing a table of all past and current relationships
In RF2, historical (non-current) relationships are not retained in the Relationships file but are distributed as a set of Refsets. Any implementation schema which needs to collate all past and present relationships can be created by unpacking all the Refsets which contain historical relationships. These can span the International Release and the local Extensions (3 sources in total), crossed with the number of different relationship types (say 8?): so up to 3 × 8 = 24 Refsets to be combined and appended to a table of active relationships.
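The collation step can be sketched in Python as follows. It assumes Snapshot input and the association Refset columns referencedComponentId and targetComponentId; which Refsets hold historical relationships in a given release should be checked in the release documentation. Ids are made up, apart from 116680003 |Is a|, and REPLACED_BY is a placeholder for an association Refset id.

```python
def all_relationships(relationship_rows, association_rows):
    """Collate active relationships with historical ones from association Refsets.

    Both inputs are Snapshot rows (dicts keyed by RF2 column names). Each result
    is a (source, type, target) triple; for historical rows the association
    Refset's own id stands in for the relationship type.
    """
    triples = [
        (r["sourceId"], r["typeId"], r["destinationId"])
        for r in relationship_rows
        if r["active"] == "1"
    ]
    for a in association_rows:
        if a["active"] == "1":
            triples.append((a["referencedComponentId"], a["refsetId"], a["targetComponentId"]))
    return triples


relationships = [
    {"sourceId": "101", "typeId": "116680003", "destinationId": "202", "active": "1"},
    {"sourceId": "303", "typeId": "116680003", "destinationId": "202", "active": "0"},
]
associations = [  # "REPLACED_BY" is a placeholder for an association Refset id
    {"refsetId": "REPLACED_BY", "referencedComponentId": "303",
     "targetComponentId": "101", "active": "1"},
]
combined = all_relationships(relationships, associations)
```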

85 Refsets and Active field values
Refsets as distributed in RF2 contain rows which are both active and inactive, according to the value in the ‘active’ field. With a Full release it is possible, using the applicable date range of each row, to identify the members of a Refset at any past time.
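A minimal Python sketch of such a point-in-time query, assuming Full-release rows read into dicts keyed by RF2 column names: for each member row id, the latest row on or before the chosen date decides whether it was a member then. The ids and Refset id R1 are made up.

```python
def members_at(full_rows, refset_id, date):
    """Members of one Refset as they stood on `date` (YYYYMMDD), from Full rows."""
    latest = {}
    for row in full_rows:
        if row["refsetId"] != refset_id or row["effectiveTime"] > date:
            continue
        prev = latest.get(row["id"])
        if prev is None or row["effectiveTime"] > prev["effectiveTime"]:
            latest[row["id"]] = row
    # Only rows that were active at that date count as members
    return sorted(r["referencedComponentId"] for r in latest.values() if r["active"] == "1")


full = [
    {"id": "u1", "effectiveTime": "20110131", "active": "1", "refsetId": "R1", "referencedComponentId": "101"},
    {"id": "u1", "effectiveTime": "20120731", "active": "0", "refsetId": "R1", "referencedComponentId": "101"},
    {"id": "u2", "effectiveTime": "20120131", "active": "1", "refsetId": "R1", "referencedComponentId": "202"},
]
```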

86 Release Types
‘Full’ release: each file contains every version of every component ever released. ‘Snapshot’ release: contains only the most recent version of every component ever released (both active and inactive components). A single snapshot provides access to a single release version, and this ‘closely matches’ the view provided by the original SNOMED CT release format (RF1). ‘Delta’ release: contains only component versions created since the last release; each component version represents a new component or a change in an existing component. (From TIG Jan 2013; the statement about a single snapshot is on page 236.)

87 Combinations of release types
First-ever Full release (‘Baseline’) + every subsequent Delta = current Full release. Snapshot + Deltas = an incomplete Full. A Delta alone is valueless. If your system has transaction tracking for the reference terminology itself, you may prefer to append Deltas rather than to rip & replace the Full release at each release. If you rely on Snapshot releases, you may need to rip & replace the entire Snapshot at each release (being aware that you may lose past versions of Refsets which may still be current). Applying an incomplete set of Deltas can be misleading. For a full set of all foreseeable options, see the SNOMED CT Technical Implementation Guide (at Jan 2013 this is covered in the section ‘Importing and maintaining a Full view’ and onwards). NB: Snapshot + Delta gives an incomplete history.
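‘Baseline + every subsequent Delta = current Full’ can be sketched in Python as an append of Delta rows, identified by (id, effectiveTime). This sketch does not detect a missing Delta (the incomplete-set case); that would need a record of which releases have been applied, which is outside its scope. Ids and dates are made up.

```python
def apply_deltas(full_rows, deltas):
    """Append each Delta's rows, in release order, to a Full set of rows.

    Rows are identified by (id, effectiveTime), so re-applying a Delta that
    was already loaded adds nothing.
    """
    merged = list(full_rows)
    seen = {(r["id"], r["effectiveTime"]) for r in merged}
    for delta in deltas:
        for row in delta:
            key = (row["id"], row["effectiveTime"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


baseline = [{"id": "101", "effectiveTime": "20120131", "active": "1"}]
delta_jul = [{"id": "101", "effectiveTime": "20120731", "active": "0"}]
delta_jan = [{"id": "202", "effectiveTime": "20130131", "active": "1"}]
# delta_jul is deliberately passed twice to show that re-applying is harmless
current_full = apply_deltas(baseline, [delta_jul, delta_jan, delta_jul])
```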

88 Full release data for Refsets
Rows labelled: History, History, History, Snapshot. This UID is unique to each unique pairing of … ‘active’ means that the row is active in this Refset; it is not a surrogate or repeat of the concept’s own active status. But … it is not permissible to distribute a component as an active Refset member if the component itself is not active at that time.

89 Refset distribution files
Any RF2 file containing Refsets can contain only one Refset pattern, e.g. a file which holds exclusively ‘ssRefset’ members, having two additional columns, both holding String values. The file name indicates the additional attributes held in the file, from any number of Component (c), String (s) and Integer (i) fields.

90 Preferred Terms: RF2 does not have a Description type value “Preferred Term”, only the types “Fully specified name” and “Synonym”, where the latter may be refined either to a “Preferred term” or to a “Synonym” within a language reference set. As a result of this change, in RF2 the preference for particular Descriptions in a language or dialect is represented in the language reference set, and not in the descriptions table.

91 Preferred Terms in RF2: (In RF1, the core tables identify just one Preferred Term and one Fully Specified Name per concept.) The International Edition in RF2 does not identify one Preferred Term per concept. To identify a Preferred Term from RF2 data, it is essential to combine information from a Language Refset with data in the core tables.

92 Preferred Terms in RF2: the UK Edition in RF2 identifies the UK Preferred Terms via Descriptions.Description.type=Synonym and RefSet.Acceptability=Preferred (RefsetID ) (Refset file name = xder2_cRefset_NHSRealmDescriptionLanguageFull_GB _yyyymmdd.txt) (Path….\SNOMEDRF2\1.0.0\NHS_SNOMEDRF2\SnomedCT_GB _ \RF2Release\Full\Refset\Content\NHSRealmDescription). There are no restrictions against identifying alternative preferred terms in Refset(s) and using these as an alternative to the UKTC-provided one. NB: existing UK documentation states: “Although supporting a number of description re-prioritisations (Realm-specific promotions of descriptions to ‘preferred term’ description-type) the present NHS Realm Description Subset is best thought of as a mechanism to satisfy the ‘one and only one fully-specified name & preferred term’ schema constraints for the UK data”.
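The join described above can be sketched in Python, assuming Snapshot rows read into dicts keyed by RF2 column names. The Synonym, Preferred and Acceptable concept ids are believed to be the International metadata values (verify against your release); the UK language Refset id is not given here, so UK_LANG is a hypothetical placeholder, as are the description ids. If two synonyms of one concept were both marked preferred, this sketch would silently keep whichever came last.

```python
SYNONYM = "900000000000013009"    # |Synonym| description type
PREFERRED = "900000000000548007"  # |Preferred| acceptability


def preferred_terms(descriptions, language_rows, language_refset_id):
    """Map conceptId -> preferred term, per one language Refset (Snapshot rows)."""
    preferred_ids = {
        row["referencedComponentId"]
        for row in language_rows
        if row["refsetId"] == language_refset_id
        and row["active"] == "1"
        and row["acceptabilityId"] == PREFERRED
    }
    return {
        d["conceptId"]: d["term"]
        for d in descriptions
        if d["active"] == "1" and d["typeId"] == SYNONYM and d["id"] in preferred_ids
    }


# Hypothetical ids; "UK_LANG" stands in for the NHS realm language Refset id
descriptions = [
    {"id": "d1", "conceptId": "101", "typeId": SYNONYM, "active": "1", "term": "Heart attack"},
    {"id": "d2", "conceptId": "101", "typeId": SYNONYM, "active": "1", "term": "Myocardial infarction"},
]
language = [
    {"refsetId": "UK_LANG", "referencedComponentId": "d2", "active": "1", "acceptabilityId": PREFERRED},
    {"refsetId": "UK_LANG", "referencedComponentId": "d1", "active": "1",
     "acceptabilityId": "900000000000549004"},  # |Acceptable|
]
pts = preferred_terms(descriptions, language, "UK_LANG")
```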

93 Correspondence : enumerated values

94 What needs further exploration?
Technical subjects; new things, redundant things; policy and tools; major differences; ‘Concept Enumeration’; change log, history mechanism; RF2 release types; content : files : folder structures; implementation schemas; preferred terms, preferred FSNs.

95 How did we do? Speak to us. Routes by which you might wish to engage:
Person to person; orientation (via: ); NHS Networks / SNOMED CT (useful even if download speeds are slow); UKTC Implementation Forum (open to all, join via: ); Helpdesk.

96 Q&A: Q) Does RF2 have any impact on dm+d? A) No, dm+d is unaffected.
No further questions were received during the webinar.

