
V3NLP Framework dbAnnotation Database Schema (created 12/2011) (revised 10/09/2012) (revised 10/18/2012) (revised 10/23/2012) (revised 10/25/2012)


2 Tables (original)
document: document_id BIGINT, referenceSystem VARCHAR(120), referenceLocator VARCHAR(120)
documentAnnotations: documentAnnotation_id BIGINT, document_id BIGINT, annotation_id BIGINT
annotation: annotation_id BIGINT, entityDefinition_id BIGINT
entityDefinition: entityDefinition_id BIGINT, Name VARCHAR(120), provenance VARCHAR(120)
span: span_id BIGINT, documentAnnotation_id BIGINT, filter VARCHAR(50), startOffset INTEGER, endOffset INTEGER
feature: feature_id BIGINT, annotation_id BIGINT, entityDefinition_id BIGINT
featureElementText: featureElement_id BIGINT, feature_id BIGINT, value VARCHAR(6300)
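To make the normalized layout concrete, here is a minimal sketch that renders the original tables as SQLite DDL through Python's built-in sqlite3 module and walks from a document to its annotations. SQLite is only a stand-in for whatever database dbAnnotation actually runs on; the REFERENCES clauses are assumptions (the slides list columns, not keys), and the sample rows and the helper names build_db and annotations_for_document are hypothetical.

```python
import sqlite3

# Original dbAnnotation tables as SQLite DDL. SQLite ignores VARCHAR lengths
# and gives BIGINT integer affinity. REFERENCES clauses are assumptions; SQLite
# does not enforce them unless PRAGMA foreign_keys is turned on.
# "filter" is quoted because FILTER is an SQLite keyword.
DDL = """
CREATE TABLE document (document_id BIGINT PRIMARY KEY,
    referenceSystem VARCHAR(120), referenceLocator VARCHAR(120));
CREATE TABLE entityDefinition (entityDefinition_id BIGINT PRIMARY KEY,
    Name VARCHAR(120), provenance VARCHAR(120));
CREATE TABLE annotation (annotation_id BIGINT PRIMARY KEY,
    entityDefinition_id BIGINT REFERENCES entityDefinition(entityDefinition_id));
CREATE TABLE documentAnnotations (documentAnnotation_id BIGINT PRIMARY KEY,
    document_id BIGINT REFERENCES document(document_id),
    annotation_id BIGINT REFERENCES annotation(annotation_id));
CREATE TABLE span (span_id BIGINT PRIMARY KEY,
    documentAnnotation_id BIGINT REFERENCES documentAnnotations(documentAnnotation_id),
    "filter" VARCHAR(50), startOffset INTEGER, endOffset INTEGER);
CREATE TABLE feature (feature_id BIGINT PRIMARY KEY,
    annotation_id BIGINT REFERENCES annotation(annotation_id),
    entityDefinition_id BIGINT REFERENCES entityDefinition(entityDefinition_id));
CREATE TABLE featureElementText (featureElement_id BIGINT PRIMARY KEY,
    feature_id BIGINT REFERENCES feature(feature_id), value VARCHAR(6300));
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Hypothetical sample data: one document carrying one "Symptom" annotation
    # with a single span and one text-valued feature.
    conn.executescript("""
    INSERT INTO document VALUES (1, NULL, '/notes/0001.txt');
    INSERT INTO entityDefinition VALUES (100, 'Symptom', 'example');
    INSERT INTO entityDefinition VALUES (101, 'negation', 'example');
    INSERT INTO annotation VALUES (10, 100);
    INSERT INTO documentAnnotations VALUES (1000, 1, 10);
    INSERT INTO span VALUES (500, 1000, NULL, 17, 25);
    INSERT INTO feature VALUES (20, 10, 101);
    INSERT INTO featureElementText VALUES (30, 20, 'negated');
    """)
    return conn

def annotations_for_document(conn, document_id):
    # document -> documentAnnotations -> annotation -> entityDefinition,
    # plus the span offsets and any text feature value.
    return conn.execute("""
        SELECT e.Name, s.startOffset, s.endOffset, fe.value
        FROM documentAnnotations da
        JOIN annotation a ON a.annotation_id = da.annotation_id
        JOIN entityDefinition e ON e.entityDefinition_id = a.entityDefinition_id
        JOIN span s ON s.documentAnnotation_id = da.documentAnnotation_id
        LEFT JOIN feature f ON f.annotation_id = a.annotation_id
        LEFT JOIN featureElementText fe ON fe.feature_id = f.feature_id
        WHERE da.document_id = ?
    """, (document_id,)).fetchall()

conn = build_db()
print(annotations_for_document(conn, 1))  # [('Symptom', 17, 25, 'negated')]
```

The join chain shows why documentAnnotations sits in the middle: spans hang off the document/annotation link rather than off the annotation itself.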

3 Annotation Notes
There is a one-to-one relationship between rows in the documentAnnotations table and the annotation table. It is recognized that these two tables should be folded into one; the reason they are not is as follows.
Other resources, including Annotation Admin and ChartReader, use a schema similar to this one, and dbAnnotation's schema is meant to be isomorphic with theirs. These two tables mirror external tables set up for other tools within VINCI's data.
ChartReader and Annotation Admin allow an annotation to span multiple documents. In that case a single annotation_id would be associated with more than one documentAnnotation_id. The dbAnnotation schema does not handle this pathological case, which is why it has a one-to-one relationship where the other schemas have an n-to-1 relationship.
documentAnnotations: documentAnnotation_id BIGINT, document_id BIGINT, annotation_id BIGINT
annotation [see notes]: annotation_id BIGINT, entityDefinition_id BIGINT
Cardinality: documentAnnotations 1 : 1 annotation
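One way to make the one-to-one rule explicit is a UNIQUE constraint on documentAnnotations.annotation_id, so the cross-document case that ChartReader and Annotation Admin allow is rejected instead of silently stored. This is a sketch under that assumption (the slides state the rule, not how it is enforced), using SQLite via Python's sqlite3 as a stand-in; link_annotation is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: UNIQUE on annotation_id is how the 1:1 rule would be enforced.
conn.executescript("""
CREATE TABLE annotation (
    annotation_id       BIGINT PRIMARY KEY,
    entityDefinition_id BIGINT
);
CREATE TABLE documentAnnotations (
    documentAnnotation_id BIGINT PRIMARY KEY,
    document_id           BIGINT,
    annotation_id         BIGINT UNIQUE REFERENCES annotation(annotation_id)
);
""")
conn.execute("INSERT INTO annotation VALUES (10, 100)")
conn.execute("INSERT INTO documentAnnotations VALUES (1000, 1, 10)")

def link_annotation(conn, doc_ann_id, document_id, annotation_id):
    """Attach an annotation to a document. Returns False when the annotation
    is already attached elsewhere, i.e. the cross-document (n-to-1) case
    that dbAnnotation does not support."""
    try:
        conn.execute("INSERT INTO documentAnnotations VALUES (?, ?, ?)",
                     (doc_ann_id, document_id, annotation_id))
        return True
    except sqlite3.IntegrityError:
        return False

# A second document trying to reuse annotation 10 is rejected.
print(link_annotation(conn, 1001, 2, 10))  # False
```

Folding the two tables into one would make the constraint unnecessary, but it would break the isomorphism with the external schemas described above.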

4 Additional Tables (revised)
Corpus [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(20), documentName VARCHAR(120), documentTitle VARCHAR(120), patient_id VARCHAR(20), tiu_id VARCHAR(20)
annotationConceptIndex [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(20), tiu_id VARCHAR(20), patient_id VARCHAR(20), documentTitle VARCHAR(120), annotation_id BIGINT, startOffset INTEGER, endOffset INTEGER, annotation_name VARCHAR(60), content VARCHAR(2100), negationStatus VARCHAR(20), sectionName VARCHAR(40), conceptNames VARCHAR(160), cuis VARCHAR(12), semanticTypes VARCHAR(20), semanticGroups VARCHAR(20), featureNames VARCHAR(2100), featureValues VARCHAR(2100)

5 Corpus Notes
This table is needed to track the same document through the same software multiple times, for example when the software is revised.
documentName is equivalent to referenceLocator in the document table, but it is only ever filled out with the full path to the document's location. (referenceLocator might instead be filled out with the query that created the record.)
tiu_id is the record id from the table the document came from (TIU_NOTES). It may differ from the document name.
patient_id is the link to groups of documents. It is not propagated to the normalized tables, to keep a firewall between potentially de-identified records and patient-sensitive data.
documentTitle is a slot for the document title, if known.
Corpus [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(20), documentName VARCHAR(120), documentTitle VARCHAR(120), patient_id VARCHAR(20), tiu_id VARCHAR(20)
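The Corpus notes imply two access patterns: following one source record across software runs, and grouping documents by patient. The sketch below shows both as queries against an in-memory SQLite table (Python sqlite3 as a stand-in). It assumes each run writes its own document_id and that tiu_id identifies the source record; the run labels, paths, ids and helper names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Corpus (
    corpus_id     BIGINT,
    document_id   BIGINT,
    run_id        VARCHAR(20),
    documentName  VARCHAR(120),
    documentTitle VARCHAR(120),
    patient_id    VARCHAR(20),
    tiu_id        VARCHAR(20)
)""")
# Made-up rows: TIU-77 was processed by two runs (before and after a
# software revision); run labels embed yyyy-mm-dd so they sort by date.
rows = [
    (1, 1, "run-2012-10-09", "/notes/0001.txt", None, "P1", "TIU-77"),
    (1, 2, "run-2012-10-25", "/notes/0001.txt", None, "P1", "TIU-77"),
    (1, 3, "run-2012-10-25", "/notes/0002.txt", None, "P1", "TIU-78"),
    (1, 4, "run-2012-10-25", "/notes/0003.txt", "Progress Note", "P2", "TIU-90"),
]
conn.executemany("INSERT INTO Corpus VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

def runs_for_record(conn, tiu_id):
    # Every run that processed this source record, oldest run label first.
    return conn.execute(
        "SELECT run_id, document_id FROM Corpus WHERE tiu_id = ? ORDER BY run_id",
        (tiu_id,)).fetchall()

def records_for_patient(conn, patient_id):
    # patient_id is not in the normalized tables, so patient-level
    # grouping has to start from Corpus.
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT tiu_id FROM Corpus WHERE patient_id = ? ORDER BY tiu_id",
        (patient_id,))]

print(runs_for_record(conn, "TIU-77"))
print(records_for_patient(conn, "P1"))
```

Comparing the document_ids returned for one record is then a starting point for diffing annotations produced by two versions of the software.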

6 annotationConceptIndex Notes
This table is a flattened view of the corpus for information retrieval purposes: one row per annotation, in a single table for querying.
It is just one of a number of indexes/views that could be built from the normalized tables.
It includes the patient and TIU ids (patient_id, tiu_id).
There is a one-to-one relationship between corpus, document and run id.
The (normalized) text between the offsets is stored in the content field.
Annotation names contain labels that are kinds of concepts, for example Symptom.
It includes slots for documentTitle and sectionName.
Concept attributes are represented as explicit fields: conceptNames, cuis, semanticTypes and semanticGroups. These are pipe-delimited fields.
featureNames is a pipe-delimited string in which each field is a feature name; it is a catch-all for other attributes.
featureValues is a pipe-delimited string in which each field is a feature value; it is likewise a catch-all for other attributes.
There is a one-to-one correspondence between the fields of featureNames and featureValues.
annotationConceptIndex [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(20), tiu_id VARCHAR(20), patient_id VARCHAR(20), documentTitle VARCHAR(120), annotation_id BIGINT, startOffset INTEGER, endOffset INTEGER, annotation_name VARCHAR(60), content VARCHAR(2100), negationStatus VARCHAR(20), sectionName VARCHAR(40), conceptNames VARCHAR(160), cuis VARCHAR(160), semanticTypes VARCHAR(160), semanticGroups VARCHAR(160), featureNames VARCHAR(2100), featureValues VARCHAR(2100)
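Because featureNames and featureValues correspond field for field, a consumer of annotationConceptIndex can rebuild the name/value pairs by splitting both on "|" and zipping them. A minimal sketch, assuming "|" never appears inside a name or value (the slides do not mention escaping); the function names and the sample feature names are hypothetical.

```python
def parse_features(feature_names, feature_values):
    """Pair the pipe-delimited featureNames and featureValues columns of one
    annotationConceptIndex row. Returns (name, value) tuples in column order;
    a list rather than a dict so repeated feature names are not collapsed."""
    names = feature_names.split("|") if feature_names else []
    values = feature_values.split("|") if feature_values else []
    # The schema promises a one-to-one correspondence; fail loudly if not.
    if len(names) != len(values):
        raise ValueError(f"{len(names)} feature names but {len(values)} values")
    return list(zip(names, values))

def split_concept_field(field):
    # conceptNames, cuis, semanticTypes and semanticGroups are pipe delimited too.
    return field.split("|") if field else []

print(parse_features("negation|temporality", "negated|historical"))
# [('negation', 'negated'), ('temporality', 'historical')]
```

An empty or NULL pair of columns yields an empty list, and a count mismatch raises ValueError instead of pairing the wrong name with a value.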

7 View to be created from dbAnnotation to annotations-dbd
The annotations-dbd schema is an agreed-upon schema for interoperability between several systems at the Salt Lake City VA, including annotationAdmin and ChartReader. When the need arises, a database view can be created to make dbAnnotation look like the annotations-dbd tables, preserving interoperability between systems.
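A view of this kind mostly renames columns. The sketch below, run through Python's sqlite3 as a stand-in DBMS, assumes the column pairings listed on the compatibility slide apply row for row (annotation_id to id, entityDefinition_id to resource_id, span_id to id). The dbd_annotation and dbd_span view names are hypothetical, and the offset columns keep their dbAnnotation names because the slides do not give the annotations-dbd names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotation (annotation_id BIGINT PRIMARY KEY,
                         entityDefinition_id BIGINT);
CREATE TABLE span (span_id BIGINT PRIMARY KEY, documentAnnotation_id BIGINT,
                   "filter" VARCHAR(50), startOffset INTEGER, endOffset INTEGER);

-- Hypothetical compatibility views: same rows, annotations-dbd column names.
CREATE VIEW dbd_annotation AS
    SELECT annotation_id AS id, entityDefinition_id AS resource_id
    FROM annotation;
CREATE VIEW dbd_span AS
    SELECT span_id AS id, "filter", startOffset, endOffset
    FROM span;

INSERT INTO annotation VALUES (10, 100);
INSERT INTO span VALUES (500, 1000, NULL, 17, 25);
""")

print(conn.execute("SELECT id, resource_id FROM dbd_annotation").fetchall())  # [(10, 100)]
```

Since a view stores no data, the normalized dbAnnotation tables remain the single source of truth and the annotations-dbd shape is only produced when it is queried.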

8 Tables (revised)
document: document_id BIGINT, referenceSystem VARCHAR(120), referenceLocator VARCHAR(120)
documentAnnotations: documentAnnotation_id BIGINT, document_id BIGINT, annotation_id BIGINT
annotation [see notes]: annotation_id BIGINT, entityDefinition_id BIGINT
entityDefinition: entityDefinition_id BIGINT, Name VARCHAR(120), provenance VARCHAR(120)
span: span_id BIGINT, documentAnnotation_id BIGINT, filter VARCHAR(50), startOffset INTEGER, endOffset INTEGER
feature: feature_id BIGINT, annotation_id BIGINT, entityDefinition_id BIGINT
featureElementText: featureElement_id BIGINT, feature_id BIGINT, value VARCHAR(6300)
Corpus [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(120), documentName VARCHAR(120), documentTitle VARCHAR(120), patient_id VARCHAR(20), tiu_id VARCHAR(20)
annotationConceptIndex [see notes]: corpus_id BIGINT, document_id BIGINT, run_id VARCHAR(20), tiu_id VARCHAR(20), patient_id VARCHAR(20), documentTitle VARCHAR(120), annotation_id BIGINT, startOffset INTEGER, endOffset INTEGER, annotation_name VARCHAR(60), content VARCHAR(2100), negationStatus VARCHAR(20), sectionName VARCHAR(40), conceptNames VARCHAR(160), cuis VARCHAR(160), semanticTypes VARCHAR(160), semanticGroups VARCHAR(160), featureNames VARCHAR(2100), featureValues VARCHAR(2100)

9 annotations-dbd Schema

10 Compatibility with the annotations-dbd schema
annotations-dbd table (fields) | dbAnnotation table (fields) | Compatibility notes
analyte_reference (id) | document (document_id, run_id) | Both have reference_system and reference_locator fields; V3NLP tools do not fill these fields out. The annotations-dbd schema does not have a run_id.
annotation_analyte_reference (analyte_reference_id) | documentAnnotations (documentAnnotation_id) | Both have the field filter; V3NLP tools do not fill this field out.
span (id) | span (span_id) | Offsets are long in annotations-dbd but int in the dbAnnotation schema.
annotation (id, resource_id) | annotation (annotation_id, entityDefinition_id)
reference (id, uri) | entityDefinition (entityDefinition_id, provenance)
feature (id, resource_id) | feature (feature_id, entityDefinition_id) | 1. annotations-dbd contains a parent id field that is not replicated in the dbAnnotation schema. 2. The annotations-dbd feature table can reference other features; V3NLP tools have not implemented this relationship.
feature_element_text (id, text_value) | featureElementText (featureElement_id, value) | The feature_id, resource_id pair is redundant and is not replicated in dbAnnotation.
feature_element_numeric | [TBD]
feature_element_blob | [TBD]
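The span row is the one place where moving data from annotations-dbd into dbAnnotation can lose information: a long offset may not fit an INTEGER column. A minimal guard, assuming INTEGER means a signed 32-bit column as it does in most SQL databases (the slides do not name the DBMS); the function names are hypothetical.

```python
# Assumption: dbAnnotation INTEGER columns are signed 32-bit.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def to_db_annotation_offset(offset):
    """Range-check one annotations-dbd (long) offset before storing it in a
    dbAnnotation INTEGER column."""
    if not INT32_MIN <= offset <= INT32_MAX:
        raise OverflowError(f"offset {offset} does not fit a 32-bit INTEGER column")
    return offset

def convert_span(start, end):
    """Convert an annotations-dbd span to dbAnnotation offsets."""
    if start > end:
        raise ValueError("startOffset must not exceed endOffset")
    return to_db_annotation_offset(start), to_db_annotation_offset(end)

print(convert_span(17, 25))  # (17, 25)
```

The opposite direction, dbAnnotation to annotations-dbd, needs no check, since every int value fits in a long.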

