Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore.

1 Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore

2 Mapping Terminology to Use Cases

- Consider a hydrologist who needs to:
  - Acquire data sets needed for research
  - Execute an analysis
  - Save the research results
  - Enable another hydrologist to re-execute the analysis
- Embed the goal of data discovery, access, analysis and management in the larger context of reproducible data-driven research:
  - Where did the data come from?
  - How was the data created?
  - How was the data managed?

3 Concepts

- There is a duality between:
  - Procedures that generate data objects
  - Data objects generated by a procedure
- Terminology is needed that describes:
  - Operations executed by a researcher to create data objects
  - Operations executed by a repository to manage data objects

4 Eco-Hydrology

[Figure: RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state. Diagram labels: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Slope; Aspect; Streams (NHD); Roads (DOT); Strata; Hillslope; Patch; Basin; Stream network; Nested watershed structure; Land Use; Leaf Area Index; Phenology; Soil Data; NLCD (EPA); Landsat TM; MODIS; USDA; Soil and vegetation parameter files; Worldfile; Flowtable; RHESSys.]

5 Researcher operations vs Repository operations

- Researcher operations
  - Pick the location of a stream gauge and a date
  - Access USGS data sets to determine the watershed that surrounds the stream gauge
  - Access USDA for soils data for the watershed
  - Access NASA for Landsat data
  - Access NOAA for precipitation data
  - Access USDOT for roads and dams
  - Project each data set to the region of interest
  - Generate the appropriate environment variables
  - Conduct the watershed analysis
  - Store the workflow, the input files, and the results
- Data repository management operations
  - Authenticate the user
  - Authorize the deposition
  - Add a retention period
  - Extract descriptive metadata
  - Record provenance information
  - Log the event
  - Create derived data products (image thumbnails)
  - Add access controls (collection sticky bits)
  - Verify checksum
  - Version
  - Replicate
  - Index
  - Choose a storage location
  - Choose the physical path name

6 Concepts needed for Reproducible research

- Data: Bits (0s and 1s)
- Digital object: Named bits
- Data object: Named bits plus representation object
- Representation object: Context containing provenance, description, structural, and administrative information
- Operations: Data manipulation function
- Workflow: Set of chained operations
- Workflow object: Text file listing the chained operations
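The definitions on this slide nest inside one another, which a few type declarations can make concrete. The following is a minimal sketch only: the class names, field names, and the two sample operations are hypothetical illustrations, not terminology or code from the presentation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepresentationObject:
    """Context: provenance, description, structural and administrative info."""
    provenance: str = ""
    description: str = ""
    structural: str = ""
    administrative: str = ""


@dataclass
class DigitalObject:
    """Named bits."""
    name: str
    bits: bytes


@dataclass
class DataObject(DigitalObject):
    """Named bits plus a representation object."""
    representation: RepresentationObject = field(default_factory=RepresentationObject)


# An operation is modelled here as a function from bits to bits (an assumption).
Operation = Callable[[bytes], bytes]


@dataclass
class Workflow:
    """A set of chained operations."""
    operations: List[Operation]

    def run(self, data: bytes) -> bytes:
        # Feed the output of each operation into the next one.
        for op in self.operations:
            data = op(data)
        return data

    def to_workflow_object(self) -> str:
        # Workflow object: text listing the chained operations, one per line.
        return "\n".join(op.__name__ for op in self.operations)


def strip_whitespace(bits: bytes) -> bytes:
    return bits.strip()


def to_upper(bits: bytes) -> bytes:
    return bits.upper()


wf = Workflow([strip_whitespace, to_upper])
result = DataObject(
    name="result.txt",
    bits=wf.run(b"  flow  "),
    representation=RepresentationObject(provenance=wf.to_workflow_object()),
)
```

Storing the workflow text in the result's provenance field is one possible way to let another researcher re-execute the analysis; the slide does not prescribe where the workflow object is kept.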

7 Definition of operation

- X.1255: An operation on a digital entity involves the following elements:
  - EntityID: the identifier of the digital entity requesting invocation of the operation;
  - TargetEntityID: the identifier of the digital entity to be operated upon;
  - OperationID: the identifier that specifies the operation to be performed;
  - Input: a sequence of bits containing the input to the operation, including any parameters, content or other information; and
  - Output: a sequence of bits containing the output of the operation, including any content or other information.
- Challenge is how to characterize the response of the data management system to a requested operation. The repository may authenticate and authorize, modify state information, log information, add retention, …
  - Pre-process workflow that controls the input (access control, error checking, logging)
  - Operation
  - Post-process workflow that controls the output (changes to state information, audits)
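The pre-process / operation / post-process split can be sketched as a request wrapped by two hooks. This is an illustration under stated assumptions: the five request elements follow the list on the slide, while the `Repository` class, its access-control set, and its audit log are hypothetical and do not reproduce any real repository API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class OperationRequest:
    entity_id: str          # EntityID: who requests the operation
    target_entity_id: str   # TargetEntityID: what is operated upon
    operation_id: str       # OperationID: which operation to perform
    input: bytes            # Input: parameters or content as bits


class Repository:
    """Hypothetical repository that wraps each operation in pre/post workflows."""

    def __init__(self, objects: Dict[str, bytes], acl: Set[Tuple[str, str]]):
        self.objects = dict(objects)        # target id -> stored bits
        self.acl = set(acl)                 # allowed (entity_id, operation_id) pairs
        self.audit_log: List[tuple] = []
        self.operations: Dict[str, Callable[[bytes, bytes], bytes]] = {
            "read": lambda bits, _inp: bits,
        }

    def pre_process(self, req: OperationRequest) -> None:
        # Controls the input: error checking, access control, logging.
        if req.operation_id not in self.operations:
            raise ValueError(f"unknown operation {req.operation_id!r}")
        if req.target_entity_id not in self.objects:
            raise KeyError(req.target_entity_id)
        if (req.entity_id, req.operation_id) not in self.acl:
            raise PermissionError(f"{req.entity_id} may not {req.operation_id}")
        self.audit_log.append(("requested", req.entity_id, req.operation_id))

    def post_process(self, req: OperationRequest, output: bytes) -> None:
        # Controls the output: here only an audit record of the result size.
        self.audit_log.append(("completed", req.operation_id, len(output)))

    def invoke(self, req: OperationRequest) -> bytes:
        self.pre_process(req)
        operation = self.operations[req.operation_id]
        output = operation(self.objects[req.target_entity_id], req.input)
        self.post_process(req, output)
        return output  # Output: a sequence of bits


repo = Repository({"obj-1": b"streamflow"}, {("alice", "read")})
output = repo.invoke(OperationRequest("alice", "obj-1", "read", b""))
```

A request from an entity that is not in the access-control set is rejected in `pre_process`, before the operation itself runs, so no "completed" record is written for it.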

8 Data Access Steps

- Access a known repository. The researcher has an explicit repository in mind for each data set
- Query the repository for data sets that satisfy spatial/temporal relationships
- Either
  - get a list of identifiers, retrieve the data sets, and apply a data subsetting algorithm locally
  - or apply the data subsetting algorithm at the remote repository
- Name the local data subset for processing within the research workflow. This can be
  - a local collection name
  - or a global persistent identifier
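The two subsetting options in the steps above can be compared side by side with an in-memory stand-in for a repository. Everything named here (`InMemoryRepository`, `query`, `retrieve`, `subset_remote`, the bounding-box convention, the sample data) is a hypothetical sketch, not the interface of any actual data service.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)
Record = Tuple[float, float, float]         # (lon, lat, value)


@dataclass
class DataSet:
    identifier: str
    bbox: BBox
    start: date
    end: date
    records: List[Record]


def overlaps(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def subset(records: List[Record], region: BBox) -> List[Record]:
    # The "data subsetting algorithm": keep records inside the region.
    return [r for r in records
            if region[0] <= r[0] <= region[2] and region[1] <= r[1] <= region[3]]


class InMemoryRepository:
    def __init__(self, data_sets: List[DataSet]):
        self._by_id: Dict[str, DataSet] = {d.identifier: d for d in data_sets}

    def query(self, region: BBox, start: date, end: date) -> List[str]:
        # Identifiers of data sets that satisfy the spatial/temporal constraints.
        return sorted(d.identifier for d in self._by_id.values()
                      if overlaps(d.bbox, region) and d.start <= end and start <= d.end)

    def retrieve(self, identifier: str) -> DataSet:
        return self._by_id[identifier]

    def subset_remote(self, identifier: str, region: BBox) -> List[Record]:
        # Subsetting applied at the repository; only the subset is returned.
        return subset(self._by_id[identifier].records, region)


repo = InMemoryRepository([
    DataSet("ds-1", (-80.0, 35.0, -78.0, 37.0), date(2010, 1, 1), date(2010, 12, 31),
            [(-79.0, 35.9, 1.2), (-78.1, 36.8, 0.7)]),
    DataSet("ds-2", (-120.0, 40.0, -118.0, 42.0), date(2010, 1, 1), date(2010, 12, 31),
            [(-119.0, 41.0, 3.4)]),
])
region = (-79.5, 35.5, -78.5, 36.5)
ids = repo.query(region, date(2010, 6, 1), date(2010, 6, 30))

# Option 1: retrieve whole data sets, then subset locally.
local_subset = [r for i in ids for r in subset(repo.retrieve(i).records, region)]
# Option 2: ask the repository to subset before returning data.
remote_subset = [r for i in ids for r in repo.subset_remote(i, region)]

# Name the subset with a local collection name for use in the research workflow.
local_collection = {"watershed_subset": local_subset}
```

Both options produce the same subset; they differ in how much data crosses the network and in where the subsetting code has to run.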

9 Interactions with collections: Remote metadata catalog and Remote data repository

[Figure: two access models, each drawn between a user with a local data repository (data collection, data sets) and a remote service.]

- DataONE Model:
  - User queries remote MD repository using spatial/temporal parameters
  - Repository sends identifiers & MD for files that satisfy spatial/temporal requirements
  - User retrieves files using the identifiers
  - Components: remote MD catalog (related metadata for data sets), remote data repository (data collection, data sets), local data repository
- OPeNDAP Model:
  - User queries remote data repository using spatial/temporal parameters for desired physical variables
  - Desired data sets are generated by remote data repository and returned to user
  - Components: remote data repository (data collection, data sets), local data repository

10 Policy-Based Data Management

Policy: Assertion or assurance that is enforced about a collection or a dataset

- Purpose - reason a collection is assembled
- Properties - attributes needed to ensure the purpose
- Policies - enforce and maintain collection properties
- Procedures - functions that implement the policies
- Persistent state information - results of applying procedures
- Property assessment criteria - validation that state information conforms to the desired purpose
- Federation - controlled sharing of logical name spaces
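One way to read the chain from property to policy to procedure to persistent state to assessment is to follow a single property, integrity, through it. The sketch below is an assumption-laden illustration: the `Collection` class and function names are hypothetical, SHA-256 is chosen arbitrarily as the checksum, and `DATA_CHECKSUM` is borrowed only as a label for the stored state attribute.

```python
import hashlib
from typing import Dict, List


class Collection:
    """Hypothetical collection holding files and persistent state information."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.state: Dict[str, Dict[str, str]] = {}   # name -> state attributes


def checksum_procedure(bits: bytes) -> str:
    # Procedure: the function that implements the policy (SHA-256 is an assumption).
    return hashlib.sha256(bits).hexdigest()


def deposit_with_checksum_policy(collection: Collection, name: str, bits: bytes) -> None:
    # Policy: every deposited file gets a recorded checksum.
    collection.files[name] = bits
    # Persistent state information: the result of applying the procedure.
    collection.state[name] = {"DATA_CHECKSUM": checksum_procedure(bits)}


def assess_integrity(collection: Collection) -> List[str]:
    # Property assessment: names whose current bits no longer match recorded state.
    return sorted(
        name for name, bits in collection.files.items()
        if collection.state.get(name, {}).get("DATA_CHECKSUM") != checksum_procedure(bits)
    )


collection = Collection()
deposit_with_checksum_policy(collection, "gauge.csv", b"date,flow\n2010-06-01,1.2\n")
clean_report = assess_integrity(collection)

# Simulate silent corruption of the stored bits, then re-run the assessment.
collection.files["gauge.csv"] = b"date,flow\n2010-06-01,9.9\n"
corrupt_report = assess_integrity(collection)
```

The assessment reads only the persistent state, so it can be run periodically without repeating the original deposit.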

11 Policy Concept Graph

[Figure: concept graph relating collections, policies, procedures, and state information. Node labels: Collection; Purpose; Property; Completeness; Correctness; Consensus; Consistency; Attribute; Policy; Procedure; Client Action; Periodic Assessment Criteria; Policy Enforcement Point; Workflow; Function; Operation; Persistent State Information; Digital Object; Integrity; Authenticity; Access control; Replication Policy; Checksum Policy; Quota Policy; Data Type Policy; GetUserACL; SetDataType; SetQuota; DataObjRepl; SysChksumDataObj; DATA_ID; DATA_REPL_NUM; DATA_CHECKSUM. Edge labels: Defines; Has; HasFeature; Controls; Updates; Invokes; Chains; SubType; Isa.]
