State and Local Agency Digital Geospatial Data Preservation The North Carolina Experience Steve Morris NCSU Libraries Earth Sciences Information Partners.


1 State and Local Agency Digital Geospatial Data Preservation The North Carolina Experience Steve Morris NCSU Libraries Earth Sciences Information Partners (ESIP) Workshop July 8, 2009

2 NC Geospatial Data Archiving Project (NCGDAP)
One of eight initial collection-building projects in the Library of Congress NDIIPP (National Digital Information Infrastructure and Preservation Program)
Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA)
Focus:
- State and local government geospatial data in NC
- Repository development as a catalyst for discussion
- Goal: engage the spatial data infrastructure in data archiving
Initial 3-year project extended to Dec. 2009

3 NCGDAP Data Types – Raster
- Digital orthophotography
- Satellite imagery
- Static data

4 NCGDAP Data Types – Vector Data
- Point, line, and polygon
- Attached attribute data
- Often updated

5 Downtown Raleigh, NC, near the State Capitol (2005 Wake County orthoimagery)
- Imagery = durable: static, simple structure, mostly open formats
- Vector data = volatile: frequent updates, complex structure, mostly proprietary (commercial) formats

6 NCGDAP Data Types – Spatial Databases
- Vector and raster data
- Relationships
- Behaviors
- Annotation
- Data models

7 Dynamic content  Constantly updated information  Data versioning Digital object complexity  Spatially-enabled databases  Complicated, multi-component formats  Proprietary formats Geospatial Data: Compelling Issues

8 Ingest Challenges: General
- Data consist of multi-file, multi-format objects
- Ancillary data files can be shared by datasets
- Some format conversions involve one-to-many relationships
- Compressed archive files are common and behave unpredictably
- And all the usual challenges: format validation, validity checking, threat scanning,…
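Since compressed archives "behave unpredictably", an ingest step might inspect each ZIP before unpacking it. The following minimal sketch uses Python's standard `zipfile` module to flag corrupt members, unsafe member paths, and nested archives. The particular checks and the list of archive suffixes are assumptions chosen for illustration, not a complete validation or threat-scanning procedure.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed list of suffixes treated as "an archive inside an archive".
ARCHIVE_SUFFIXES = {".zip", ".gz", ".tgz", ".tar", ".7z", ".rar"}


def inspect_zip(path):
    """Return a list of human-readable problems found in a ZIP file."""
    problems = []
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # name of first member with a bad CRC, or None
            if bad is not None:
                problems.append(f"corrupt member: {bad}")
            for name in zf.namelist():
                member = PurePosixPath(name)
                # Absolute paths or ".." could write outside the unpack folder.
                if member.is_absolute() or ".." in member.parts:
                    problems.append(f"unsafe path: {name}")
                if member.suffix.lower() in ARCHIVE_SUFFIXES:
                    problems.append(f"nested archive: {name}")
    except zipfile.BadZipFile:
        problems.append("not a readable zip file")
    return problems
```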

9 Where is the Dataset?

10 Here’s One!
Files: multi-file dataset, georeferencing, metadata file, symbolization file, additional documentation, license, disclaimer
More metadata: FGDC; acquisition, transfer, and ingest metadata; archive rights; archive processes; collection metadata; series metadata
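The slide's point is that an archival "dataset" is really a bundle of files plus several layers of metadata. A minimal, hypothetical Python data structure for such an item is sketched below, with a check that reports which parts are still missing. The role names and which ones count as required are assumptions loosely drawn from the slide's lists, not a project schema.

```python
from dataclasses import dataclass, field

# Hypothetical requirements, loosely based on the slide's "Files" and
# "More metadata" lists.
REQUIRED_FILE_ROLES = {"data", "georeferencing", "metadata"}
REQUIRED_ARCHIVE_METADATA = {"acquisition", "transfer", "ingest", "rights"}


@dataclass
class ArchivalItem:
    name: str
    files: dict = field(default_factory=dict)     # role -> list of file names
    metadata: dict = field(default_factory=dict)  # kind -> metadata record

    def missing(self):
        """Return the sorted file roles and metadata kinds still absent."""
        gaps = {f"file:{r}" for r in REQUIRED_FILE_ROLES if not self.files.get(r)}
        gaps |= {f"metadata:{k}" for k in REQUIRED_ARCHIVE_METADATA
                 if k not in self.metadata}
        return sorted(gaps)
```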

11 Ingest Challenges: Metadata
Metadata is encoded in a variety of ways
- The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this is addressed in the ISO 19115/19139 North American Profile implementation
- XML (varied schemas), TXT, HTML
Metadata is missing
- Only about 25% of local agencies use FGDC
Metadata is wrong
- Metadata is commonly asynchronous with the data
Inconsistent use of dataset naming, etc.
- e.g., “Streets” vs. “Wake County Streets”
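Because incoming records may be FGDC XML, other XML schemas, TXT, or HTML, an ingest workflow probably has to sniff the encoding before it can normalize anything. This is a rough sketch using only Python's standard library. It assumes FGDC XML has a `metadata` root with an `idinfo` child and that ISO 19139 records use a `gmd:MD_Metadata` root in the `http://www.isotc211.org/2005/gmd` namespace. The category labels are invented for illustration.

```python
import xml.etree.ElementTree as ET

ISO_GMD_NS = "http://www.isotc211.org/2005/gmd"


def sniff_metadata(text):
    """Guess the encoding of a metadata record.

    Returns one of: 'html', 'iso19139', 'fgdc-xml', 'other-xml',
    'malformed-xml', or 'text' (labels are illustrative).
    """
    stripped = text.lstrip()
    head = stripped[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    if not stripped.startswith("<"):
        return "text"
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError:
        return "malformed-xml"
    if root.tag == f"{{{ISO_GMD_NS}}}MD_Metadata":
        return "iso19139"
    if root.tag == "metadata" and root.find("idinfo") is not None:
        return "fgdc-xml"
    return "other-xml"
```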

12 NCGDAP Metadata Summary
Existing geospatial metadata often needs:
- Remediation – to fix errors or omissions
- Normalization – to adhere to a standard structure
- Synchronization – so that the data at hand matches the metadata
If there is no metadata:
- Minimal metadata can be built using templates and auto-extraction
- Key information such as data quality, lineage, and data dictionaries is lost
Automating metadata for repository ingest:
- Raster data is easy – large sets of consistently structured files
- Vector data is hard – each dataset is a different story
Many additional administrative and technical metadata elements are not accommodated by FGDC
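The "templates and auto-extraction" route works best for raster sets with consistent file names. This is a minimal sketch of that idea. The tile naming convention is hypothetical, as are the template fields. The output uses a few CSDGM element short names (`idinfo/citation/citeinfo`, `descript/abstract`, `distinfo/stdorder/digform/digtinfo/formname`), but it is not a complete or valid CSDGM record, and it cannot recover data quality or lineage.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naming convention for a consistently structured ortho set,
# e.g. "wake_2005_tile0421.tif".
TILE_NAME = re.compile(
    r"(?P<county>[a-z]+)_(?P<year>\d{4})_tile(?P<tile>\d+)\.(?P<ext>tif|sid|jp2)$"
)
FORMAT_NAMES = {"tif": "GeoTIFF", "sid": "MrSID", "jp2": "JPEG 2000"}


def minimal_fgdc(filename, template):
    """Build a minimal FGDC-style XML string from a template plus values
    parsed from the file name; return None when the name does not follow
    the convention (the hard case that needs a human)."""
    m = TILE_NAME.match(filename.lower())
    if m is None:
        return None
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")
    cite = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(cite, "origin").text = template["origin"]
    ET.SubElement(cite, "pubdate").text = m["year"]
    ET.SubElement(cite, "title").text = (
        f"{m['county'].title()} County {m['year']} orthoimagery, tile {m['tile']}"
    )
    ET.SubElement(ET.SubElement(idinfo, "descript"), "abstract").text = template["abstract"]
    digform = ET.SubElement(
        ET.SubElement(ET.SubElement(root, "distinfo"), "stdorder"), "digform"
    )
    ET.SubElement(ET.SubElement(digform, "digtinfo"), "formname").text = FORMAT_NAMES[m["ext"]]
    return ET.tostring(root, encoding="unicode")
```

Vector files like "streets_final_v3.shp" fall through to `None`, which mirrors the slide's contrast between easy raster and hard vector cases.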

13 Extended Curation: Feedback and Outreach
Diagram: ingest stages (Data Receipt, Format Processing, Metadata Processing, Ingest Processes) and feedback/outreach targets (Content Producers, Industry, Standards Organizations)

14 Spatial Data Infrastructure and Archiving
Metadata standards and outreach
- Metadata quality, best practices
Inventories
- Reduce “contact fatigue”; shareable information store
Content exchange networks
- Leverage more compelling business reasons to put data in motion
- Automate process; add technical & administrative metadata
Framework data communities
- Snapshot frequency, schemas, format strategies

15

16 Content Packaging Issues
- Geospatial datasets are typically complex, multi-file objects
- Data are often accompanied by ancillary data, which must be associated with the data item
- Rights information and licenses must be associated with the item
- Various implementations exist in different domains (METS, IMS-CP, XFDU, etc.)
- Simpler .zip-based packages are also used (MEF, KMZ, etc.)
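For a simple .zip-based package, one way to keep data, ancillary files, licenses, and rights together is to write them into one archive alongside a manifest. This is a minimal sketch of that idea. It assumes a JSON `manifest.json` holding per-file SHA-256 checksums and a rights statement. This layout is invented for illustration and does not follow the MEF, KMZ, METS, IMS-CP, or XFDU specifications.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_package(package_path, members, rights):
    """Write every component file into one ZIP plus a JSON manifest with
    each member's SHA-256 and the item's rights statement.

    members: paths of all files belonging to one dataset (data files,
    ancillary files, license, disclaimer, ...). Names must be unique.
    """
    manifest = {"rights": rights, "files": {}}
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            p = Path(path)
            data = p.read_bytes()
            manifest["files"][p.name] = hashlib.sha256(data).hexdigest()
            zf.writestr(p.name, data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify_package(package_path):
    """Return sorted names whose stored bytes no longer match the manifest."""
    with zipfile.ZipFile(package_path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return sorted(
            name for name, digest in manifest["files"].items()
            if hashlib.sha256(zf.read(name)).hexdigest() != digest
        )
```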

17 Spatial Database Approaches
- Manage the database forward over time
- Extract data layers to a preservable form
- Set aside an archival snapshot of the database
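The third approach, setting aside an archival snapshot, can be illustrated with SQLite, which several spatial containers (for example, an OGC GeoPackage) are built on. The sketch below uses SQLite as a stand-in for a spatial database and calls Python's `sqlite3` `Connection.backup` to copy a live database into a separate, consistent file. The date-stamped file naming is a hypothetical convention.

```python
import sqlite3
from datetime import date


def archival_snapshot(live, snapshot_path):
    """Copy a live SQLite database into a separate file using SQLite's
    online backup API, so the snapshot stays fixed while the live
    database continues to be edited."""
    snapshot = sqlite3.connect(snapshot_path)
    try:
        live.backup(snapshot)
    finally:
        snapshot.close()


def snapshot_name(dataset, on=None):
    """Hypothetical naming scheme: <dataset>_<YYYY-MM-DD>.sqlite"""
    on = on or date.today()
    return f"{dataset}_{on.isoformat()}.sqlite"
```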

18 GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
Partners (NC, KY, UT, Library of Congress, NCSU):
- State geospatial organizations
- State Archives
State-to-state and geo-to-Archives collaboration
- Organizational and technical diversity across states
Archives as part of spatial data infrastructure
- Selection and appraisal processes
- Retention schedule development
- Data transfer to archives
- Development of enhanced business cases

19 NCGDAP Learning Outcomes
- Preservation of GIS projects is needed to support re-creation of past work
- Preservation of data representations is needed to document decision-making processes
- Validation, remediation, and conversion of data and metadata is expensive: push for improvements upstream
- Some repositories handle “items”, which can result in “atomization” of data
- For vendors, frame data preservation as a “customer problem”; the business case must be built

20 Thank You!
Steve Morris
Head, Digital Library Initiatives
North Carolina State University Libraries
steven_morris@ncsu.edu
North Carolina Geospatial Data Archiving Project: http://www.lib.ncsu.edu/ncgdap
GeoMAPP: http://www.geomapp.net

21 Draft of Utah’s GIS to Archives Data Flow
- AGRC exports data from SGID and splits out datasets by series. Metadata is occasionally incomplete.
- Local governments supply GIS datasets on CD/DVD to AGRC. Metadata is often missing.
- All metadata is completed to FGDC standards.
- AGRC creates geoPDF files of individual datasets, plus ZIP files of the native format. One ZIP file would contain all the pieces belonging to one shapefile or, alternatively, a geodatabase. Geodatabases would not be one big database holding everything (multiple series and years); instead, the native files would be composed of a single downloadable file per series per year.
- AGRC copies these files to the Archives’ FTP server.
- Metadata is harvested to populate the Archives’ finding aids.
Example FTP site structure:
ftp.archives-agrc.utah.gov/Archives
  Biota (Dublin Core metadata)
  Boundaries (Dublin Core metadata)
    MunicipalityRecords-Series-26846 (Dublin Core metadata)
      2000
        MunicipalBoundaries.zip (FGDC metadata)
        MunicipalBoundaries.pdf (FGDC metadata)
      2001
      2002
      2003
    CountyBoundaries-Series-26845 (Dublin Core metadata)
      2003
      2004
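The draft layout places one downloadable file per series per year under a theme folder. This sketch generates paths in that pattern, delivering each dataset-year as both a native-format ZIP and a geoPDF. The function names are hypothetical. Dataset file names for series other than MunicipalityRecords are assumptions, because the slide shows only the year folders for those series.

```python
import re
from pathlib import PurePosixPath


def archive_path(theme, series_name, series_id, year, dataset, ext):
    """Path for one file in the per-series, per-year layout, e.g.
    Boundaries/MunicipalityRecords-Series-26846/2000/MunicipalBoundaries.zip
    """
    if not re.fullmatch(r"\d{4}", str(year)):
        raise ValueError(f"expected a four-digit year, got {year!r}")
    return PurePosixPath(
        theme, f"{series_name}-Series-{series_id}", str(year), f"{dataset}.{ext}"
    )


def deliverables(theme, series_name, series_id, year, dataset):
    """Each dataset-year is delivered as a native-format ZIP and a geoPDF."""
    return [archive_path(theme, series_name, series_id, year, dataset, ext)
            for ext in ("zip", "pdf")]
```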

22 Kentucky Metadata Workflow into DSpace and iRODS Environment
- Metadata & content entered by agencies using a template and modified by the Archivist
- Single-item & batch ingest into DSpace by the Archivist
- Database with Dublin Core descriptive and administrative metadata
- Content files
- iRODS distributed storage layer (KDLA, UNC, other)
- Batch metadata extraction using iRODS rules
- Preservation metadata from iRODS rules
- Database with administrative & preservation metadata
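For the batch-ingest step, DSpace offers a Simple Archive Format. In that format, each item is a directory holding a `dublin_core.xml` file of `dcvalue` elements, a `contents` file that lists the bitstreams, and the files themselves. The slide does not say whether Kentucky used this format. The sketch below only shows how one item directory could be laid out, and it does not cover the iRODS side.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir, dc_fields, files):
    """Lay out one item in DSpace Simple Archive Format.

    dc_fields: list of (element, qualifier, value); use "none" when there
    is no qualifier. files: content files to copy into the item directory.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier).text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="UTF-8", xml_declaration=True
    )
    names = []
    for f in files:
        src = Path(f)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    # The 'contents' file lists the item's bitstreams, one per line.
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir
```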

23 Source Metadata Translation
Hub-and-spoke model à la ECHO DEPository:
- Repository agnostic
- Modular conversion hub
- Facilitates repository software migration & inter-archive exchange
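The advantage of a hub-and-spoke model is that each format needs only a reader into the hub and a writer out of it. With N sources and M targets, that means N + M converters instead of N × M direct crosswalks. This is a toy sketch in which flat dictionaries stand in for FGDC and Dublin Core records. The three-field hub and the field mappings are simplifying assumptions, not the ECHO DEPository design.

```python
# Registries of spokes: readers map a source format into the hub,
# writers map the hub out to a target format.
READERS = {}
WRITERS = {}


def reader(fmt):
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


def writer(fmt):
    def register(fn):
        WRITERS[fmt] = fn
        return fn
    return register


@reader("fgdc")
def fgdc_to_hub(rec):
    return {"title": rec["title"], "creator": rec["origin"], "date": rec["pubdate"]}


@reader("dc")
def dc_to_hub(rec):
    return {"title": rec["title"], "creator": rec["creator"], "date": rec["date"]}


@writer("fgdc")
def hub_to_fgdc(hub):
    return {"title": hub["title"], "origin": hub["creator"], "pubdate": hub["date"]}


@writer("dc")
def hub_to_dc(hub):
    return {"title": hub["title"], "creator": hub["creator"], "date": hub["date"]}


def translate(record, source, target):
    """Convert source -> hub -> target."""
    return WRITERS[target](READERS[source](record))
```

Adding a new repository format means writing one reader and one writer; no existing converter has to change.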

24 GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC, with the Library of Congress
Partners:
- State geospatial organizations of Kentucky and Utah
- State Archives of Kentucky and Utah
- NCSU Libraries in a catalytic/advisory role
State-to-state and geo-to-Archives collaboration
2-year project: Nov. 2007–Dec. 2009
Archives as part of Spatial Data Infrastructure

25 GeoMAPP: Project Components
- Introduce GIS organizations and State Archives to each other
- Archival selection and appraisal processes
- Retention schedule development
- Data transfer to archives
- Development of enhanced business case

26 Repository Goal  Capture at-risk data  Explore technical and organizational challenges Project End Goal  Data Producers: Improved temporal data management practices  Archives: More efficient means of acquiring and preserving data; Progress towards best practices NC Geospatial Data Archiving Project (NCGDAP) Temporal data management vs. long-term preservation

27  Data capture  Backups are common, but not long-term archives  Producer focus on current data  Shift to web services-based access  Inadequate or non-existent metadata  Consistent NC survey statistics: Only 40% of data producers create and maintain metadata  Existing metadata often needs to be normalized, synchronized with the data, and remediated Geospatial Data Preservation Challenges Loss of memory about the data is also a problem

28 Ongoing Challenges
When to automate and when not to
- Learn first from human intervention
- Minimize the risk of error related to human intervention
Accepting that the ingest packages used will evolve over time (implications for the archive?)
Handling post-ingest migrations

29 Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties and the 25 largest NC municipalities

30 Some Key Metadata Decisions
- Capture “transfer set” metadata
- Normalize, synchronize, and remediate existing metadata, and retain the original metadata record
- Treat contact information as archival
- Update metadata with format conversions
- Use the ESRI Profile of FGDC
  - Adds technical and administrative elements
  - Has an XML schema
  - ArcCatalog tool support
- Use a simple rights encoding scheme
- Record metadata in a workflow management database
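Several of these decisions apply to a workflow database. These include retaining the original record, updating metadata after conversions, capturing transfer-set metadata, and using a simple rights scheme. This is a minimal SQLite sketch in which metadata versions are only appended and never overwritten. The table layout, the `kind` labels, and the three-value rights vocabulary are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- hypothetical simple rights vocabulary
    rights TEXT NOT NULL CHECK (rights IN ('public', 'restricted', 'unknown'))
);
CREATE TABLE IF NOT EXISTS metadata_version (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),
    kind TEXT NOT NULL,    -- e.g. 'original', 'normalized', 'transfer-set'
    record TEXT NOT NULL,  -- the metadata record itself (XML or other text)
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_workflow_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_item(db, name, rights):
    cur = db.execute("INSERT INTO item (name, rights) VALUES (?, ?)", (name, rights))
    db.commit()
    return cur.lastrowid


def add_metadata(db, item_id, kind, record):
    """Append a metadata version; earlier versions (including the original)
    are kept rather than overwritten."""
    db.execute(
        "INSERT INTO metadata_version (item_id, kind, record) VALUES (?, ?, ?)",
        (item_id, kind, record),
    )
    db.commit()


def metadata_history(db, item_id):
    rows = db.execute(
        "SELECT kind FROM metadata_version WHERE item_id = ? ORDER BY id", (item_id,)
    )
    return [kind for (kind,) in rows]
```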

31 SIP Item Creation: Workflow
NCSU Libraries, 27 March 2006, Digital Preservation in State Government - Wilmington
- Submission Information Package grouping – ontology logic based on defined multi-file complex format components and directory structure
- Repository-agnostic item grouping
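Grouping files into SIP items from format component definitions and directory structure could look roughly like this. Each file is matched against a table of known multi-file formats and grouped by (folder, base name, format), and the item is then checked for its required parts. The component table is a small hypothetical example, not the project's actual ontology, and files shared across datasets are not handled.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical component definitions for two multi-file formats.
FORMATS = {
    "shapefile": {
        "parts": {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".shp.xml"},
        "required": {".shp", ".shx", ".dbf"},
    },
    "tiff": {
        "parts": {".tif", ".tfw", ".tif.xml", ".tif.aux.xml"},
        "required": {".tif"},
    },
}


def split_name(filename, extensions):
    """Split 'streets.shp.xml' into ('streets', '.shp.xml'), longest extension first."""
    lower = filename.lower()
    for ext in sorted(extensions, key=len, reverse=True):
        if lower.endswith(ext) and len(filename) > len(ext):
            return filename[: -len(ext)], ext
    return None


def group_items(paths):
    """Group paths into candidate items keyed by (folder, stem, format).

    Returns (items, leftovers). Each item has its sorted file list and a
    'complete' flag; leftovers are files matching no known format.
    """
    found = defaultdict(dict)  # (folder, stem, format) -> {extension: path}
    leftovers = []
    for path in paths:
        p = PurePosixPath(path)
        for fmt, spec in FORMATS.items():
            split = split_name(p.name, spec["parts"])
            if split is not None:
                stem, ext = split
                found[(str(p.parent), stem, fmt)][ext] = path
                break
        else:
            leftovers.append(path)
    items = {
        key: {
            "files": sorted(parts.values()),
            "complete": FORMATS[key[2]]["required"].issubset(parts),
        }
        for key, parts in found.items()
    }
    return items, sorted(leftovers)
```

Files that match no definition, such as a stray readme, come back as leftovers for a person to place. This reflects the slide's broader lesson about learning first from human intervention.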

32 Metadata Overview
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
- Version one (1994) mandated for use by federal agencies
- Descriptive metadata, plus some administrative and technical
- Extensive use at state level, spotty use at local level
- Problem: content standard without an encoding spec
- FGDC profiles: ESRI, NBII, Remote Sensing, etc.
ISO standards
- ISO 19115: Geographic information – Metadata (2003)
- ISO 19139: Geographic information – Metadata – XML schema implementation (2007)
- North American Profile of ISO to replace FGDC CSDGM
