Reducing Metadata Objects Dan Gillman November 14, 2014.


1 Reducing Metadata Objects
Dan Gillman
November 14, 2014

2 Focus
Metadata describing data
Conforming to a standard may imply
• Creating too many objects
• Lack of meaningful roles
• Generating nightmares for
  – Discovery
  – Efficiency
  – Management
  – Semantic interoperability

3 Focus
Can this be helped?
Problems
• Illustrated by ISO/IEC 11179
Potential solution
• Incorporated into DDI-4

4 Preliminaries
Metadata
• Definition:
  – Data used to describe some objects
• Metadata are data first
  – No data are always metadata
• A “relative” concept
  – The descriptive relationship is key

5 Preliminaries
Re-use
• The power of metadata management
• Write once, link many
• Similar to normalizing database schemas
• Allows for
  – Sharing meanings
  – Comparison
  – Targeted search
  – Efficient storage / retrieval
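
A minimal sketch of "write once, link many", assuming a toy Python model: the classes ValueDomain and Variable and the survey names are hypothetical, not taken from 11179 or DDI. Several variables hold a reference to one shared value domain rather than a private copy, much as rows in a normalized schema point at one lookup table through a foreign key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    # Hypothetical minimal value domain: a label plus its permissible codes
    name: str
    codes: tuple[str, ...]


@dataclass(frozen=True)
class Variable:
    name: str
    value_domain: ValueDomain  # a link to the shared object, not a copy


# Write once ...
sex_codes = ValueDomain("sex", ("m", "f"))

# ... link many: each variable references the same ValueDomain object
variables = [Variable(f"sex_{survey}", sex_codes) for survey in ("survey_a", "survey_b", "survey_c")]

# Targeted search: every variable that re-uses this value domain
users = [v.name for v in variables if v.value_domain is sex_codes]
print(users)  # ['sex_survey_a', 'sex_survey_b', 'sex_survey_c']
```

Because the link is by reference, finding every user of a value domain is a single identity test, and the codes are stored only once.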

6 Preliminaries
Problem
• Dependencies
• Many-to-One relationships
• Let B’ be a new version of B
• But A can’t be related to both
[Diagram: A associated with B, and A associated with B’; each association labelled with multiplicities 1 and 0..*]
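
The dependency problem can be sketched as follows, assuming the multiplicity 1 sits at the B end (each A refers to exactly one B, while one B may serve many As). The class names A and B simply mirror the slide; versions are plain integers for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class B:
    name: str
    version: int


@dataclass(frozen=True)
class A:
    name: str
    version: int
    b: B  # assumed multiplicity 1: each A refers to exactly one B


b = B("B", 1)
a = A("A", 1, b)

# B' is a new version of B, i.e. a distinct object
b_prime = replace(b, version=2)

# A can hold only one B, so relating A to B' forces a new version of A as well,
# while the old A stays tied to the old B
a_prime = replace(a, version=a.version + 1, b=b_prime)
```

Relating A to B' therefore does not update A; it produces a new A, and every object that depends on A faces the same choice.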

7 ISO/IEC 11179
About – description of data
Title – Metadata registries
Mechanism – organize semantics
6-part standard
• Framework (1)
• Classification (2)
• Metamodel (3)
• Definitions (4)
• Naming (5)
• Registration (6)

8 ISO/IEC 11179
Basic model
[Diagram: Data Element Concept and Conceptual Domain at the conceptual level; Data Element and Value Domain at the representational level; associations labelled with multiplicities 0..* and 1]

9 ISO/IEC 11179
Plus
[Diagram: Data Element Concept associated with Property and with Object Class; associations labelled with multiplicities 0..* and 0..1]
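
One way to read the basic model and its Object Class/Property extension is as a set of immutable records. This is a sketch under assumptions: the Python class and field names are hypothetical, and the multiplicities follow the usual reading of the 11179 basic model (a Data Element has one Data Element Concept and one Value Domain; each of those points to one Conceptual Domain). The m/f codes for "sex of patient" are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectClass:
    name: str


@dataclass(frozen=True)
class Property:
    name: str


@dataclass(frozen=True)
class ConceptualDomain:
    # Conceptual level: value meanings only, no codes
    meanings: tuple[str, ...]


@dataclass(frozen=True)
class DataElementConcept:
    # Conceptual level: object class + property (0..1 each, per the diagram)
    object_class: Optional[ObjectClass]
    prop: Optional[Property]
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class ValueDomain:
    # Representational level: (code, meaning) pairs drawn from one conceptual domain
    permissible_values: tuple[tuple[str, str], ...]
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class DataElement:
    # Representational level: exactly one concept plus exactly one value domain
    concept: DataElementConcept
    value_domain: ValueDomain


# "Sex of patient", with an assumed m/f coding
sex_cd = ConceptualDomain(("male", "female"))
sex_of_patient = DataElementConcept(ObjectClass("patient"), Property("sex"), sex_cd)
vd1 = ValueDomain((("m", "male"), ("f", "female")), sex_cd)
de1 = DataElement(sex_of_patient, vd1)
```

Under this reading, swapping the object class, the property or any permissible value means building a new record, and everything that points at it must be rebuilt too.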

10 ISO/IEC 11179
New Object Class or Property
• Implies new Data Element Concept
  – Implies new Data Element
Change in Permissible Values
• Implies new Value Domain
  – Implies new Data Element
Similarly for change in Value Meanings
• Implies new Permissible Values
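
The cascade on this slide is a small dependency graph. The sketch below assumes the edges are exactly those listed on the slide (the dictionary IMPLIES_NEW and the function cascade are hypothetical names) and walks the graph breadth-first to list every object type that has to be created anew after one change.

```python
from collections import deque

# "Change in X implies a new Y" edges, transcribed from the slide
IMPLIES_NEW = {
    "Object Class": ["Data Element Concept"],
    "Property": ["Data Element Concept"],
    "Data Element Concept": ["Data Element"],
    "Value Meanings": ["Permissible Values"],
    "Permissible Values": ["Value Domain"],
    "Value Domain": ["Data Element"],
}


def cascade(changed: str) -> list[str]:
    """Breadth-first walk: every object type that must be created anew after `changed` changes."""
    order: list[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        for nxt in IMPLIES_NEW.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(cascade("Value Meanings"))
# ['Permissible Values', 'Value Domain', 'Data Element']
```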

11 Problems
11179
• One kind of data element
  – No abstract vs. application
• One kind of value domain
  – Processing codes not separated
Processing steps
• Sentinel values
  – Missing, etc.
• Software and application dependent

12 Problems
Dimensional data
• Tables
  – Many cells
  – Each cell its own data element?
  – No means to differentiate cells
• Time series
  – Similar problem

13 Data Documentation Initiative (DDI)
Social science data libraries and archives
Since 1995
Consortium based since 2005
• DDI Alliance
• University of Michigan

14 DDI
2 development threads
• Codebook
  – From earlier work
  – Latest version 2.5
• Lifecycle
  – Includes processing
  – Latest version 3.2
• Both rendered in XML Schema
• Complex to read and use

15 DDI Modernization (DDI-4)
• Upgrade for Lifecycle
• Rendered in UML
• Built in sections
• Following the Generic Statistical Information Model
  – Built under the UNECE Statistical Division
  – DDI is a profile (ISO/IEC TR 10000-1)

16 DDI Variables
Differ from the 11179 Data Element
• Types
  – Conceptual
      No object class
      Only has a Conceptual Domain
  – Represented
      Inherits from Conceptual Variable
      Has an object class (called Unit Type), e.g., People, Establishment
      Has a Value Domain
        Substantive – subject matter related

17 DDI Variables
  – Instance
      Inherits from Represented Variable
      Has a Universe – a specialized Object Class, e.g., Patients, Hospitals
      Has a second Value Domain
        Sentinel – processing related
• No DEC – implied
• Specificity cascade
  – For 11179 Property (DDI Variable)
  – For 11179 Object Class (DDI Unit Type)
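
A sketch of the three variable types, assuming that the DDI "inherits from" relation can be modelled as a reference to a parent object rather than as Python class inheritance; the field names are hypothetical and do not reproduce the DDI-4 UML. The m/f codes and the SAS sentinel codes .d and .r are borrowed from the 11179 example later in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    name: str
    conceptual_domain: tuple[str, ...]  # value meanings only; no unit type


@dataclass(frozen=True)
class RepresentedVariable:
    parent: ConceptualVariable          # inherits name and conceptual domain
    unit_type: str                      # DDI Unit Type, the analogue of an 11179 Object Class
    substantive_vd: tuple[str, ...]     # subject-matter codes

    @property
    def conceptual_domain(self) -> tuple[str, ...]:
        return self.parent.conceptual_domain


@dataclass(frozen=True)
class InstanceVariable:
    parent: RepresentedVariable         # inherits unit type and substantive value domain
    universe: str                       # a specialization of the unit type
    sentinel_vd: tuple[str, ...]        # processing codes, e.g. missing values

    def all_codes(self) -> tuple[str, ...]:
        # The two value domains stay separate objects; they are combined only when needed
        return self.parent.substantive_vd + self.sentinel_vd


sex = ConceptualVariable("sex", ("male", "female"))
rv1 = RepresentedVariable(sex, "Person", ("m", "f"))
iv_sas = InstanceVariable(rv1, "Patient", (".d", ".r"))
print(iv_sas.all_codes())  # ('m', 'f', '.d', '.r')
```

Modelling inheritance as a link keeps one Represented Variable shared by several Instance Variables, which is where the saving in objects comes from.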

18 DDI Variables
• Value Domain growth
  – Due to changing codes
  – 11179: Substantive × Sentinel
  – DDI: Substantive + Sentinel
• Data Element growth
  – About the same
  – DDI is much more specific
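
The growth rule on this slide can be checked with two one-line functions. This is a sketch, assuming one sentinel value domain per application and counting, for 11179, only the value domains that carry sentinel codes; the function names are hypothetical.

```python
def combined_vds_11179(substantive: int, sentinel: int) -> int:
    # Sentinel codes live inside the value domain, so every pairing needs its own VD
    return substantive * sentinel


def vds_ddi(substantive: int, sentinel: int) -> int:
    # Substantive and sentinel value domains are separate objects, so counts add
    return substantive + sentinel


# The running example: 2 substantive codings, 3 applications (one sentinel set each)
print(combined_vds_11179(2, 3), vds_ddi(2, 3))  # 6 5
# A fourth application: 11179 gains 2 VDs, DDI gains 1
print(combined_vds_11179(2, 4) - combined_vds_11179(2, 3), vds_ddi(2, 4) - vds_ddi(2, 3))  # 2 1
```

Adding the two purely substantive value domains to the 11179 figure gives the 8 VDs counted in the 11179 example, against 5 in the DDI example.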

19 DDI Variables
[Diagram: Conceptual, Represented and Instance variables with Conceptual Domain, Substantive Value Domain and Sentinel Value Domain]

20 DDI Variables
[Diagram: Conceptual, Represented and Instance variables with Unit Type and Universe]

21 Example DDI
Sex of a patient
• Conceptual variable (CV) = sex
  – CD = {male, female}
• Represented variables (RV1 and RV2)
  – Inherit from CV
  – Unit type = Person
  – VD1 = {, } for RV1
  – VD2 = {, } for RV2

22 Example DDI
• For 3 applications: SAS, SPSS, Excel
  – Sentinel CD = {Don’t know, Refused}
  – Universe = Patient (specialization of Person)
• Instance variable (IV) – for SAS
  – Two – inherit from RV1 or RV2
  – SenVD = {, }

23 Example DDI
• Instance variable (IV) – for SPSS
  – Two – inherit from RV1 or RV2
  – SenVD = {, – }
• Instance variable (IV) – for Excel
  – Two – inherit from RV1 or RV2
  – User-defined sentinel codes
  – SenVD = {, }

24 Example DDI
• Total objects (18)
  – 1 Unit Type
  – 1 Universe
  – 1 CV
  – 2 RV
  – 6 IV
  – 2 CD (substantive & sentinel)
  – 5 VD
• Including much inheritance
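
The DDI total can be reproduced from the example's parameters. A sketch, assuming one Instance Variable per (Represented Variable, application) pair and one sentinel value domain per application, as the slides describe; the variable names are hypothetical.

```python
applications = ["SAS", "SPSS", "Excel"]
represented = ["RV1", "RV2"]  # one per substantive value domain

ddi_objects = {
    "Unit Type": 1,                                   # Person
    "Universe": 1,                                    # Patient
    "CV": 1,                                          # sex
    "RV": len(represented),                           # RV1, RV2
    "IV": len(represented) * len(applications),       # one IV per (RV, application) pair
    "CD": 2,                                          # substantive and sentinel
    "VD": len(represented) + len(applications),       # VD1, VD2 + one sentinel VD per application
}
print(sum(ddi_objects.values()))  # 18
```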

25 Example 11179
Sex of patient
• Object class = patient
• Property = sex
• DEC = sex of patient
• CD = {male, female}
• VD1 = {, }
• VD2 = {, }
• Two DEs, one for each VD

26 Example 11179
• 2 more abstract DEs
• Correspond to CV in DDI
Sex of patient
• Object class = person
• Property = sex
• DEC = sex of person
• CD = {male, female}
• Need VD1 and VD2, too

27 Example 11179
• DEs for processing?
  – Missing sentinels for each application
  – Need 6 VDs, one CD, 6 DEs
• CD = {male, female, don’t know, refused}
• VD3 = {m, f, .d, .r} (SAS)
• VD4 = {0, 1, .d, .r} (SAS)
• VD5 = {m, f, -998, -999}
• Etc.
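
In 11179 the sentinel codes must sit inside each value domain, so the processing value domains are the Cartesian product of substantive codings and applications. A sketch using itertools.product: the SAS codes .d and .r are from the slide, while the SPSS and Excel codes are hypothetical stand-ins.

```python
from itertools import product

substantive = {"m/f": ("m", "f"), "0/1": ("0", "1")}
sentinels = {
    "SAS": (".d", ".r"),          # codes shown on the slide for SAS
    "SPSS": ("-998", "-999"),      # hypothetical stand-in
    "Excel": ("DK", "REF"),        # hypothetical stand-in for user-defined codes
}

# 11179: each (substantive coding, application) pair needs its own value domain
combined_vds = {
    (coding, app): codes + missing
    for (coding, codes), (app, missing) in product(substantive.items(), sentinels.items())
}
print(len(combined_vds))              # 6
print(combined_vds[("0/1", "SAS")])   # ('0', '1', '.d', '.r')
```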

28 Example 11179
• Total objects (25)
  – 2 Object Class
  – 1 Property
  – 2 DEC
  – 2 CD
  – 8 VD
  – 10 DE
• Little inheritance
• Each new application → twice the VDs
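
The same kind of tally reproduces the 11179 total. A sketch, assuming the counts on the previous slides (two DECs with one Data Element per DEC and substantive value domain, plus one processing Data Element per combined value domain); the names are hypothetical.

```python
substantive_vds = 2   # VD1, VD2
applications = 3      # SAS, SPSS, Excel
decs = 2              # sex of patient, sex of person

iso11179_objects = {
    "Object Class": 2,                                              # patient, person
    "Property": 1,                                                  # sex
    "DEC": decs,
    "CD": 2,                                                        # with and without sentinel meanings
    "VD": substantive_vds + substantive_vds * applications,         # 2 + 6
    "DE": decs * substantive_vds + substantive_vds * applications,  # 4 + 6
}
print(sum(iso11179_objects.values()))  # 25
```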

29 Example 11179
• Less specificity
• More objects
• Lack of constructs

30 Contact Information
Dan Gillman
Information Scientist
Office of Survey Methods Research
www.bls.gov/osmr
202-691-7523
Gillman.Daniel@bls.gov

