Documenting Data Quality Ted Habermann, NOAA National Geophysical Data Center.

1 Documenting Data Quality Ted Habermann, NOAA National Geophysical Data Center

2 Data Quality - Documents

3 Data Quality - Granules

4 Data Quality - Standards

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

MI_Metadata

DQ_DataQuality
+ scope : DQ_Scope
Association roles: report 0..*, standAloneReport 0..1

<<Abstract>> DQ_Element

DQ_MeasureReference

DQ_Evaluation

DQ_Result
+ resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_CoverageResult

DQ_DescriptiveResult

DQ_StandaloneReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile

<<Union>> MD_ScopeDescription
+ attributes : Set
+ features : Set
+ featureInstances : Set
+ attributeInstances : Set
+ dataset : CharacterString
+ other : CharacterString

LI_Lineage

5 GOES-R Data Quality - Documents
Level 2+ Volcanic Ash: Detection and Height
L2+ Volcanic Ash Science Description
L2+ Volcanic Ash Algorithm Description
L2+ Volcanic Ash Source Information
L2+ Volcanic Ash Applicable ATBDs
L2+ Volcanic Ash Quality Algorithms
L2+ Volcanic Ash Source Data Inputs
L2+ Volcanic Ash Production Notes
L2+ Volcanic Ash Data Fields (TBR-16)
L2+ Volcanic Ash Metadata Description and Definition
L2+ Volcanic Ash Expected Periodicity

6 Documents = Standards
Science Description - MD_DataIdentification/abstract
Algorithm Description - LE_Algorithm/description
Source Information - MD_DistributionInformation
Applicable ATBDs - LE_Algorithm/citation
Quality Algorithms - DQ_DataQuality/DQ_MeasureReference
Source Data Inputs - LI_Lineage/source
Production Notes - processStep/description
Data Fields - MD_ContentInfo
Metadata Description and Definition - seems redundant
Expected Periodicity - resourceMaintenance

7 Documentation Objects = Standards
NESDIS Documentation Object Mapping
Metadata Document
System Description Document
System Maintenance Manual
Interface Control Document
Algorithm Theoretical Basis Document

8 Multiple Dialects of the Same Content Documents CI_Citation XSLT Translation XML Reference Granules/Catalogs Standards

9 DQ_DataQuality - 19157

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

MI_Metadata

DQ_DataQuality
+ scope : DQ_Scope
Association roles: report 0..*, standAloneReport 0..1

<<Abstract>> DQ_Element

DQ_MeasureReference

DQ_Evaluation

DQ_Result
+ resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_CoverageResult

DQ_DescriptiveResult

DQ_StandaloneReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile

<<Union>> MD_ScopeDescription
+ attributes : Set
+ features : Set
+ featureInstances : Set
+ attributeInstances : Set
+ dataset : CharacterString
+ other : CharacterString

LI_Lineage

10 ISO Lineage Model Source Step Product Processing and Algorithm Descriptions

11 LI_Lineage

12 UML.1

Attributes: role [how many] : object type
how many = [minimum..maximum]
minimum = 0: optional
minimum = 1: required
* = any number
how many = blank: required, one
how many = [1..*] : required, any number
how many = [1..2] : required, one or two
how many = [0..1] : optional, zero or one
how many = [0..*] : optional, any number

Type: package abbreviation_type
UML package abbreviation = XML namespace = Document section

Role: what this object does for me
contact : CI_ResponsibleParty
description : CharacterString

Operations: generally not used in ISO UML
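The "how many" rules on this slide can be sketched as a small parser. This is an illustrative sketch only: the function names `parse_multiplicity` and `describe` are hypothetical and not part of any ISO tooling, and the wording returned by `describe` is a simplification of the slide's phrasing.

```python
import re
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for a UML multiplicity string.

    A maximum of None stands for "*" (any number). A blank string
    means the attribute is required and occurs exactly once.
    """
    text = text.strip()
    if not text:
        return 1, 1
    match = re.fullmatch(r"\[(\d+)\.\.(\d+|\*)\]", text)
    if match is None:
        raise ValueError(f"not a multiplicity: {text!r}")
    low = int(match.group(1))
    high = None if match.group(2) == "*" else int(match.group(2))
    return low, high


def describe(text: str) -> str:
    """Summarize a multiplicity as optional/required plus an upper bound."""
    low, high = parse_multiplicity(text)
    need = "optional" if low == 0 else "required"
    count = "any number" if high is None else f"at most {high}"
    return f"{need}, {count}"


if __name__ == "__main__":
    for example in ["", "[1..*]", "[1..2]", "[0..1]", "[0..*]"]:
        print(repr(example), "->", describe(example))
```

A strict parser like this also catches malformed multiplicities such as `[0.*]`, which are easy to introduce when diagrams are transcribed by hand.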

13 UML.2

LI_Lineage = the LI_Lineage class is in the Lineage (LI) Package
statement [0..1] : CharacterString = LI_Lineage can have up to one statement which is a CharacterString
source [0..*] : LI_Source = LI_Lineage can have any number of sources which are LI_Sources
processStep [0..*] : LE_ProcessStep = LI_Lineage can have any number of processSteps which are LE_ProcessSteps

14 Volcanic Ash Detection Sources

15 Volcanic Ash Detection Processing

16 ISO Lineage DQ_Lineage (19115-2)

MI_Metadata

LI_Lineage
+ statement [0..1] : CharacterString
Constraint: if (count(source) + count(processStep) = 0) and (DQ_DataQuality.scope.level = 'dataset' or 'series') then statement is mandatory

LE_Source
+ description [0..1] : CharacterString
+ scaleDenominator [0..1] : MD_RepresentativeFraction
+ sourceReferenceSystem [0..1] : MD_ReferenceSystem
+ sourceCitation [0..1] : CI_Citation
+ sourceExtent [0..*] : EX_Extent
+ processedLevel [0..1] : MD_Identifier
+ resolution [0..1] : LE_NominalResolution
+ sourceMetadata [0..*] : MD_Reference

LE_ProcessStep
+ description : CharacterString
+ rationale [0..1] : CharacterString
+ dateTime [0..1] : DateTime
+ processor [0..*] : CI_ResponsibleParty
+ extent [0..*] : EX_Extent
+ reference [0..*] : CI_Citation

LE_Processing
+ identifier : MD_Identifier
+ softwareReference [0..*] : CI_Citation
+ procedureDescription [0..1] : CharacterString
+ documentation [0..*] : CI_Citation
+ runTimeParameters [0..1] : CharacterString

LE_Algorithm
+ citation : CI_Citation
+ description : CharacterString

LE_ProcessStepReport
+ name : CharacterString
+ description [0..1] : CharacterString
+ fileType [0..1] : CharacterString

Association roles: lineage 0..1; source 0..*; processStep 0..*; output, source 0..*; processingInformation 0..*; algorithm 0..*; report 0..*; sourceStep 0..*
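The constraint on LI_Lineage.statement can be sketched as a check. This is a minimal sketch under stated assumptions: the `Lineage` dataclass and both function names are hypothetical, and sources and process steps are reduced to plain strings rather than full LE_Source and LE_ProcessStep objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lineage:
    """Hypothetical stand-in for LI_Lineage with only the fields the rule needs."""
    statement: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    process_steps: List[str] = field(default_factory=list)


def statement_is_mandatory(lineage: Lineage, scope_level: str) -> bool:
    """True when there are no sources or steps and the scope is dataset or series."""
    no_detail = len(lineage.sources) + len(lineage.process_steps) == 0
    return no_detail and scope_level in ("dataset", "series")


def lineage_is_valid(lineage: Lineage, scope_level: str) -> bool:
    """Apply the slide's rule: a mandatory statement must be present and non-empty."""
    if statement_is_mandatory(lineage, scope_level):
        return bool(lineage.statement)
    return True


if __name__ == "__main__":
    print(lineage_is_valid(Lineage(), "dataset"))
    print(lineage_is_valid(Lineage(sources=["clear_sky_masks"]), "dataset"))
```

In words: a dataset-level or series-level lineage needs at least a statement, a source, or a process step.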

17 Granule Lineage - 1 Brief Text Brief Text (PUG) Citation Citation Product Ancillary Data Auxiliary Data Lookup Table Product

18 Granule Lineage - 2 Source:clear_sky_masks Could boil down to

19 Volcanic Ash Detection Lineage in the Granule Option 1: one identifier: Option 2: lineage group with filenames as unique identifiers: Option 3: lineage group with uniqueIdentifiers: includes processingInformation / algorithm / output

20 Database and XML Keys Citation ID Title Date Friend_ID Location_ID Citation ID Title Date Friend_ID Location_ID Person ID Name EMail Person ID Name EMail OnlineResource ID Name URL OnlineResource ID Name URL XML …

21 XML Attributes: Objects and References

ISO XML consists of tags, elements (with or without content), and attributes. An attribute is a name/value pair that exists within a start-tag or empty-element tag. Attributes provide additional information about an element that is not part of the data. Attribute values must be enclosed in either single or double quotes. This example shows a step element with one attribute, number, with a value of "3": <step number="3">Connect A to B.</step>

Many of the XML attributes used in the ISO Standards fall into two groups, identifiers and references:
Identifiers: id and uuid
References: uuidref and xlink:href
Objects that start with upper case letters have identifiers (id and uuid)
Roles that start with lower case letters have references (uuidref and xlink:href)

object: CI_ResponsibleParty id="JaneDoe"
object: CI_ResponsibleParty id="JohnDoe"
role: friend xlink:href="#JohnDoe"
role: friend xlink:href="#JaneDoe"
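The identifier/reference pattern can be sketched with Python's standard `xml.etree.ElementTree`. The XML below is a simplified illustration rather than schema-valid ISO XML: the `people` wrapper element, the missing ISO namespaces, and the assumption that Jane's friend is John (and vice versa) are all invented for the example, as is the function name `resolve_friends`.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical document: objects carry id, roles carry xlink:href.
DOC = f"""
<people xmlns:xlink="{XLINK}">
  <CI_ResponsibleParty id="JaneDoe">
    <friend xlink:href="#JohnDoe"/>
  </CI_ResponsibleParty>
  <CI_ResponsibleParty id="JohnDoe">
    <friend xlink:href="#JaneDoe"/>
  </CI_ResponsibleParty>
</people>
"""


def resolve_friends(xml_text: str) -> dict:
    """Map each object's id to the id of the object its friend role points at."""
    root = ET.fromstring(xml_text)
    # Index every element that declares an id so references can be looked up.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    resolved = {}
    for party in root.iter("CI_ResponsibleParty"):
        for friend in party.findall("friend"):
            href = friend.get(f"{{{XLINK}}}href") or ""
            target = by_id.get(href.lstrip("#"))
            resolved[party.get("id")] = None if target is None else target.get("id")
    return resolved


if __name__ == "__main__":
    print(resolve_friends(DOC))
```

The object is written once with an `id`, and every role that needs it points at it with `xlink:href` instead of repeating the content.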

22 ISO Lineage Model - 2

Source A: step ps1
Source B: step ps1
Source C: step ps2
Source D: step ps2
Source E: step ps3

Step ps1: source A, source B, output C
Step ps2: source C, source D, output E
Step ps3: source E

Product
Processing and Algorithm Descriptions
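The chain of steps on this slide can be sketched as a small lookup that traces a product back to its original inputs. This is an illustrative sketch: the dictionary layout and the function name `original_inputs` are hypothetical, and naming the output of step ps3 "Product" is an assumption based on the diagram's flow.

```python
# Each step lists the sources it reads and the single output it writes.
STEPS = {
    "ps1": {"sources": ["A", "B"], "output": "C"},
    "ps2": {"sources": ["C", "D"], "output": "E"},
    "ps3": {"sources": ["E"], "output": "Product"},  # output name assumed
}


def original_inputs(target: str, steps: dict = STEPS) -> list:
    """Return the sources feeding `target` that no step produces."""
    produced_by = {step["output"]: step for step in steps.values()}
    if target not in produced_by:
        # Nothing produces this item, so it is an original input itself.
        return [target]
    inputs = []
    for source in produced_by[target]["sources"]:
        for item in original_inputs(source, steps):
            if item not in inputs:
                inputs.append(item)
    return inputs


if __name__ == "__main__":
    print(original_inputs("Product"))
```

Sources C and E are intermediate: each is the output of one step and the source of the next, which is why identifiers and references matter for linking them.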

23 ISO Lineage DQ_Lineage (19115-2)

MI_Metadata

LI_Lineage
+ statement [0..1] : CharacterString
Constraint: if (count(source) + count(processStep) = 0) and (DQ_DataQuality.scope.level = 'dataset' or 'series') then statement is mandatory

LE_Source
+ description [0..1] : CharacterString
+ scaleDenominator [0..1] : MD_RepresentativeFraction
+ sourceReferenceSystem [0..1] : MD_ReferenceSystem
+ sourceCitation [0..1] : CI_Citation
+ sourceExtent [0..*] : EX_Extent
+ processedLevel [0..1] : MD_Identifier
+ resolution [0..1] : LE_NominalResolution
+ sourceMetadata [0..*] : MD_Reference

LE_ProcessStep
+ description : CharacterString
+ rationale [0..1] : CharacterString
+ dateTime [0..1] : DateTime
+ processor [0..*] : CI_ResponsibleParty
+ extent [0..*] : EX_Extent
+ reference [0..*] : CI_Citation

LE_Processing
+ identifier : MD_Identifier
+ softwareReference [0..*] : CI_Citation
+ procedureDescription [0..1] : CharacterString
+ documentation [0..*] : CI_Citation
+ runTimeParameters [0..1] : CharacterString

LE_Algorithm
+ citation : CI_Citation
+ description : CharacterString

LE_ProcessStepReport
+ name : CharacterString
+ description [0..1] : CharacterString
+ fileType [0..1] : CharacterString

Association roles: lineage 0..1; source 0..*; processStep 0..*; output, source 0..*; processingInformation 0..*; algorithm 0..*; report 0..*; sourceStep 0..*

References

24 XML Attributes: Objects and References

25 DQ_Scope

26 UML.1

Attributes: role [how many] : object type
how many = [minimum..maximum]
minimum = 0: optional
minimum = 1: required
* = any number
how many = blank: required, one
how many = [1..*] : required, any number
how many = [1..2] : required, one or two
how many = [0..1] : optional, zero or one
how many = [0..*] : optional, any number

Type: package abbreviation_type
UML package abbreviation = XML namespace = Document section

Role: what this object does for me
contact : CI_ResponsibleParty
description : CharacterString

Operations: generally not used in ISO UML

27 UML.2

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

<<DataType>> DQ_Scope = the DQ_Scope is a DataType in the Data Quality (DQ) Package
level : MD_ScopeCode = a DQ_Scope must have one level which is a MD_ScopeCode
extent [0..1] : EX_Extent = a DQ_Scope can have up to 1 extent which is an EX_Extent
levelDescription [0..*] : MD_ScopeDescription = a DQ_Scope can have any number of levelDescriptions which are MD_ScopeDescriptions

28 DQ_Scope

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile

<<Union>> MD_ScopeDescription
+ attributes : Set
+ features : Set
+ featureInstances : Set
+ attributeInstances : Set
+ dataset : CharacterString
+ other : CharacterString

29 Abstract Dessert

30 EX_Extent

<<DataType>> EX_Extent
+ description [0..1] : CharacterString
Constraint: count(description + geographicElement + temporalElement + verticalElement) > 0

<<Abstract>> EX_GeographicExtent
+ extentTypeCode [0..1] : Boolean = "1"

EX_BoundingPolygon
+ polygon [0..1] : GM_Object

EX_GeographicBoundingBox
+ westBoundLongitude : Decimal
+ eastBoundLongitude : Decimal
+ southBoundLatitude : Decimal
+ northBoundLatitude : Decimal

EX_GeographicDescription
+ geographicIdentifier : MD_Identifier

EX_VerticalExtent
+ minimumValue : Real
+ maximumValue : Real

EX_TemporalExtent
+ extent : TM_Primitive

EX_SpatialTemporalExtent
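The EX_Extent constraint says an extent must carry at least one of a description, a geographic element, a temporal element, or a vertical element. A minimal sketch of that check follows. The `Extent` dataclass and `extent_is_valid` are hypothetical names, and the element lists hold placeholder values (strings or tuples) rather than real EX_GeographicExtent, EX_TemporalExtent, or EX_VerticalExtent objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Extent:
    """Hypothetical stand-in for EX_Extent with its four optional parts."""
    description: Optional[str] = None
    geographic_elements: List[object] = field(default_factory=list)
    temporal_elements: List[object] = field(default_factory=list)
    vertical_elements: List[object] = field(default_factory=list)


def extent_is_valid(extent: Extent) -> bool:
    """Apply count(description + geographic + temporal + vertical) > 0."""
    count = (
        (1 if extent.description else 0)
        + len(extent.geographic_elements)
        + len(extent.temporal_elements)
        + len(extent.vertical_elements)
    )
    return count > 0


if __name__ == "__main__":
    print(extent_is_valid(Extent()))
    print(extent_is_valid(Extent(description="Full disk")))
```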

31 DQ_StandAloneReport

32 MI_Metadata StandAloneReport

DQ_DataQuality
+ scope : DQ_Scope
Association role: standAloneReport 0..1

DQ_StandaloneReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

Global or Variable Attribute:

33 Level 2 Metadata (NcML) getNcML NcML Schema Type Definition

34 netCDF Variable Type Attributes for variable = memberName in granule = granuleIdentifier Level 2 Metadata (ISO) RecordType Record

35 DQ_DataQuality - 19115

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

MD_Metadata

DQ_DataQuality
+ scope : DQ_Scope
Association roles: report 0..*, lineage 0..1
Constraints:
"report" or "lineage" role is mandatory if scope.DQ_Scope.level = 'dataset'
"levelDescription" is mandatory if "level" notEqual 'dataset' or 'series'

LI_Lineage

<<Abstract>> DQ_Element
+ nameOfMeasure [0..*] : CharacterString
+ measureIdentification [0..1] : MD_Identifier
+ measureDescription [0..1] : CharacterString
+ evaluationMethodType [0..1] : DQ_EvaluationMethodTypeCode
+ evaluationMethodDescription [0..1] : CharacterString
+ evaluationProcedure [0..1] : CI_Citation
+ dateTime [0..*] : DateTime
+ result [1..2] : DQ_Result

<<Abstract>> DQ_Result

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_CoverageResult

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile
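The two constraints on DQ_DataQuality in this slide can be sketched as one validation function. This is an illustrative sketch: the function name `data_quality_problems`, its parameters, and the message strings are hypothetical, and reports, lineage, and level descriptions are passed as plain Python values.

```python
from typing import List, Optional


def data_quality_problems(
    level: str,
    level_descriptions: List[str],
    reports: List[str],
    lineage: Optional[str],
) -> List[str]:
    """Return a list of violated constraints; an empty list means none were found."""
    problems = []
    # Rule 1: a dataset-level DQ_DataQuality needs a report or a lineage.
    if level == "dataset" and not reports and lineage is None:
        problems.append("report or lineage is mandatory when level is 'dataset'")
    # Rule 2: any level other than dataset or series needs a levelDescription.
    if level not in ("dataset", "series") and not level_descriptions:
        problems.append("levelDescription is mandatory for level %r" % level)
    return problems


if __name__ == "__main__":
    print(data_quality_problems("dataset", [], [], None))
    print(data_quality_problems("attribute", [], ["completeness"], None))
```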

36 DQ_DataQuality

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

MD_Metadata

DQ_DataQuality
+ scope : DQ_Scope
Association roles: report 0..*, standAloneReport 0..1
Constraints:
"report" or "lineage" role is mandatory if scope.DQ_Scope.level = 'dataset'
"levelDescription" is mandatory if "level" notEqual 'dataset' or 'series'

<<Abstract>> DQ_Element
+ nameOfMeasure [0..*] : CharacterString
+ measureIdentification [0..1] : MD_Identifier
+ measureDescription [0..1] : CharacterString
+ evaluationMethodType [0..1] : DQ_EvaluationMethodTypeCode
+ evaluationMethodDescription [0..1] : CharacterString
+ evaluationProcedure [0..1] : CI_Citation
+ dateTime [0..*] : DateTime
+ result [1..2] : DQ_Result

<<Abstract>> DQ_Result

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_CoverageResult

DQ_StandaloneReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile

37 DQ_DataQuality - 19157

<<DataType>> DQ_Scope
+ level : MD_ScopeCode
+ extent [0..1] : EX_Extent
+ levelDescription [0..*] : MD_ScopeDescription

MI_Metadata

DQ_DataQuality
+ scope : DQ_Scope
Association roles: report 0..*, standAloneReport 0..1

<<Abstract>> DQ_Element

DQ_MeasureReference

DQ_Evaluation

DQ_Result
+ resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_CoverageResult

DQ_DescriptiveResult

DQ_StandaloneReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

<<CodeList>> MD_ScopeCode
+ attribute + feature + attributeType + featureType + collectionHardware + propertyType + collectionSession + fieldSession + dataset + software + series + service + nonGeographicDataset + model + dimensionGroup + tile

38 DQ_Element

<<Abstract>> DQ_Element
+ nameOfMeasure [0..*] : CharacterString
+ measureIdentification [0..1] : MD_Identifier
+ measureDescription [0..1] : CharacterString
+ evaluationMethodType [0..1] : DQ_EvaluationMethodTypeCode
+ evaluationMethodDescription [0..1] : CharacterString
+ evaluationProcedure [0..1] : CI_Citation
+ dateTime [0..*] : DateTime
+ result [1..2] : DQ_Result

39 DQ_MeasureReference
+ measureIdentification : MD_Identifier [0..1]
+ nameOfMeasure : CharacterString [0..*]
+ measureDescription : CharacterString [0..1]

40 MD_Identifier

41 Measure Registry / Database

42 Data Quality - Granules

43 DQ_Element

<<Abstract>> DQ_Element
+ nameOfMeasure [0..*] : CharacterString
+ measureIdentification [0..1] : MD_Identifier
+ measureDescription [0..1] : CharacterString
+ evaluationMethodType [0..1] : DQ_EvaluationMethodTypeCode
+ evaluationMethodDescription [0..1] : CharacterString
+ evaluationProcedure [0..1] : CI_Citation
+ dateTime [0..*] : DateTime
+ result [1..2] : DQ_Result

<<Abstract>> DQ_Element
+ measure [0..*] : DQ_MeasureReference
+ evaluation [0..1] : DQ_Evaluation
+ result [1..2] : DQ_Result

DQ_MeasureReference
+ measureIdentification : MD_Identifier [0..1]
+ nameOfMeasure : CharacterString [0..*]
+ measureDescription : CharacterString [0..1]
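The two DQ_Element shapes on this slide differ mainly in grouping: the measure attributes move into a DQ_MeasureReference and the evaluation attributes into a DQ_Evaluation. A minimal sketch of that regrouping follows, assuming an element is held as a plain dictionary keyed by attribute name. The function name `regroup_element` is hypothetical, the split of evaluation attributes follows the DQ_Evaluation slide later in the deck, and the sketch builds a single measure reference for simplicity.

```python
MEASURE_KEYS = ("measureIdentification", "nameOfMeasure", "measureDescription")
EVALUATION_KEYS = (
    "dateTime",
    "evaluationMethodDescription",
    "evaluationProcedure",
    "evaluationMethodType",
)


def regroup_element(flat: dict) -> dict:
    """Regroup a flat DQ_Element dictionary into measure / evaluation / result."""
    measure = {key: flat[key] for key in MEASURE_KEYS if key in flat}
    evaluation = {key: flat[key] for key in EVALUATION_KEYS if key in flat}
    grouped = {"result": flat["result"]}  # result is required in both shapes
    if measure:
        grouped["measure"] = measure
    if evaluation:
        grouped["evaluation"] = evaluation
    return grouped


if __name__ == "__main__":
    flat_element = {
        "nameOfMeasure": ["ash detection confidence"],  # invented sample value
        "evaluationMethodType": "directInternal",
        "result": ["pass"],
    }
    print(regroup_element(flat_element))
```

Grouping the measure attributes separately is what makes it possible to point at a shared measure definition instead of repeating it in every element.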

44 DQ_Result

DQ_Result
+ resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_DescriptiveResult
+ statement : CharacterString

QE_CoverageResult
+ resultFile : MX_DataFile
+ resultFormat : MD_Format
+ resultContentDescription : MD_CoverageDescription
+ resultSpatialRepresentation : MD_SpatialRepresentation
+ spatialRepresentationType : MD_SpatialRepresentationTypeCode

45

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ errorStatistic [0..1] : CharacterString
+ value [1..*] : Record

DQ_DescriptiveResult
+ statement : CharacterString

QE_CoverageResult
+ spatialRepresentationType : MD_SpatialRepresentationTypeCode

46 MI_CoverageDescription Revisions

MD_Metadata

MD_CoverageDescription
+ attributeDescription : RecordType
+ contentType [1..*] : MD_CoverageContentTypeCode
+ processingLevelCode [0..1] : MD_Identifier

<<CodeList>> MD_CoverageContentTypeCode
+ image + thematicClassification + physicalMeasurement + referenceInformation + qualityInformation + auxilliaryData + modelResult

MD_RangeDimension
+ sequenceIdentifier [0..1] : MemberName
+ name [0..*] : MD_Identifier
+ description [0..1] : CharacterString

MD_SampleDimension
+ minValue [0..1] : Real
+ maxValue [0..1] : Real
+ units [0..1] : UnitOfMeasure
+ scaleFactor [0..1] : Real
+ offset [0..1] : Real
+ numberOfValues [0..1] : Integer
+ meanValue [0..1] : Real
+ standardDeviation [0..1] : Real
+ otherAttributeType [0..1] : RecordType
+ otherAttribute [0..1] : Record

MD_Band
+ peakResponse [0..1] : Real
+ bitsPerValue [0..1] : Integer
+ toneGradation [0..1] : Integer

MI_RangeElementDescription
+ name : CharacterString
+ definition : CharacterString
+ rangeElement [1..*] : Record

Association roles: contentInfo 0..*; dimension 0..*; rangeElementDescription 0..*; rangeElementDescription 0..*

Notes: minValue, maxValue and units must have units of length. RangeElement, otherAttributeType, and otherAttribute have cardinality [0..0]

47

48 DQ_Evaluation
+ dateTime : DateTime [0..*]
+ evaluationMethodDescription : CharacterString [0..1]
+ evaluationProcedure : CI_Citation [0..1]
+ referenceDoc : CI_Citation [0..*]
+ evaluationMethodType : DQ_EvaluationMethodTypeCode [0..1]

DQ_DataEvaluation

DQ_Aggregation
+ sourceQualityResult : CharacterString [2..*]

DQ_FullInspection

DQ_SamplebasedInspection
+ samplingScheme : CharacterString
+ lotDescription : CharacterString
+ samplingRatio : CharacterString

DQ_IndirectEvaluation

<<CodeList>> DQ_EvaluationMethodTypeCode
+ directInternal + directExternal + indirect

49 DQ_Evaluation

50 DQ_Result

