Are Geodatabases a Suitable Long-Term Archival Format? Jeff Essic, Matt Sumner North Carolina State University Libraries 2009 ESRI International Users.

1 Are Geodatabases a Suitable Long-Term Archival Format?
Jeff Essic, Matt Sumner
North Carolina State University Libraries
2009 ESRI International Users Conference, July 14, 2009

2 NC Geospatial Data Archiving Project (NCGDAP)
- Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
- Focus on state and local geospatial content in North Carolina (state demonstration)
- Website: http://www.lib.ncsu.edu/ncgdap

3 Geospatial Data Preservation Challenge: Vector Data Formats
- No widely-supported, open vector formats for geospatial data
  - Spatial Data Transfer Standard (SDTS) not widely supported
  - Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”
- Spatial Databases
  - The whole is more than the sum of the parts, and the whole is very difficult to preserve
  - Can export individual data layers for curation, but relationships and other context are lost

4 Challenge: Other Data Types
- Cartographic Representation
  - Software Project Files, PDFs, GeoPDFs, WMS images
- Web 2.0 content
  - Street views, Mashups
- Oblique Imagery
- 3D Models

5 Different Ways to Approach Preservation
- Technical solutions: How do we preserve acquired content over the long term?
- Cultural/Organizational solutions: How do we make the data more preservable, and more prone to be preserved, from point of production?
- Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata

6 Question: Frequency of Capture?
- Content Exchange – Getting Data in Motion
- Repository Development
- Repository of Temporal Data Snapshots

7 Repository Development
- Downloading or acquiring “low hanging fruit”
- Tapping into current data flows
- Developing our own metadata when necessary
- Converting and preserving vector data in shapefile format

8 Data Preservation Like Fruit Desiccation?
Complex data representations can be made more preservable (yet less useful) through simplification.
- Conversion of various formats to shapefile (shp)
- Image outputs (web services, PDF maps, map image files)
- Open GeoPDF standard
  - Analogous to paper maps
  - Combines data, symbology, annotation
  - More data intelligence than simple image
- PDF content retained in addition to, NOT instead of, data

9 Archival and Long Term Access Working Group
- Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data
- Federal, state, regional, and local agency representation
- Key focus
  - Best practices for data snapshots and retention
  - State Archives processes: appraisal, selection, retention schedules, etc.
- Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.

10 Archival and Long Term Access Working Group
- Final Report approved by NC GICC in November 2008 (http://www.ncgicc.org/)
- Best Practices for:
  - Archiving Schedule
  - Inventory
  - Storage Medium
  - Formats
  - Naming
  - Metadata
  - Distribution
  - Periodic Review
  - Data Integrity
  - Publicity
- Wake County adopted, providing archived data online: http://www.wakegov.com/gis/download_data.htm

11 NDIIPP Multi-State Geospatial Project
- Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
- Partners:
  - Leading state geospatial organizations of Kentucky and Utah
  - State Archives of Kentucky and Utah
  - NCSU Libraries in catalytic/advisory role
- State-to-state and geo-to-Archives collaboration
- Archives as part of statewide Spatial Data Infrastructure

12 Geodatabase Curation Study: Overview
- Three types of Geodatabases: Personal, File, SDE
- Curation/Conversion options:
  - Archive GDB object
  - Export to: XML, shapefiles, GML Simple Features (open published formats)
- Consideration given to objects and export files created in older ArcGIS versions: will they be compatible with newer versions?

13 Caveats
- Only tested what appeared to be the most reasonable and logical conversion options. Numerous other possibilities not tested.
- Some conversions required running overnight.
- Limited time for testing multiple datasets and scenarios.
- Didn’t explore GDBs with rasters.
- Very limited geodatabase experience or expertise.

14 Personal Geodatabase
- Not ideal archival object
  - Very proprietary – ArcGIS / MS Access formats
  - ESRI now recommends using File GDB instead
    http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases
- Archive export formats: XML, shapefiles

15 File Geodatabase
- Potential archival object
  - Kentucky KYGEONET
  - ESRI working on low-level (non ArcObjects based) API
    http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/
    http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578
- Folder/File structure
  - Can see “under the hood”
  - Requires knowledge of component parts
- Archive export formats: XML, shapefiles, GML
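Because a file geodatabase is an ordinary folder of component files, an archive can record what it holds without any GIS software. The following is a minimal sketch, not part of the original study: the function name `inventory_folder` is hypothetical, and SHA-256 is an assumed choice of checksum. It reports a file count and total size (the same kind of "size / files" figures used in the test slides) plus a checksum per component file.

```python
import hashlib
import os


def inventory_folder(root):
    """Return (file_count, total_bytes, {relative_path: sha256_hex}) for a folder tree."""
    checksums = {}
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in chunks so large component files do not fill memory.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            rel_path = os.path.relpath(full_path, root)
            checksums[rel_path] = digest.hexdigest()
            total_bytes += os.path.getsize(full_path)
    return len(checksums), total_bytes, checksums
```

Calling `inventory_folder` on a geodatabase folder at snapshot time and again at a later review would show whether any component file has been added, removed, or altered. It says nothing about whether the software can still open the geodatabase.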

16 File Geodatabase KYGEONET: “Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.” http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf

17 SDE Geodatabase
- Stored in RDBMS, so can’t be archived as a stand-alone object unless exported
- Supports Historical Archiving
- Commonly used among local govts. for enterprise data management
- Archive export formats: XML, fGDB, shapefiles

18 Questions for Testing
- Will pGDB XML export files round-trip between 9.1 and 9.3.1?
- Will fGDB XML export files round-trip between 9.2 and 9.3.1?
- Will fGDB GML round-trip within 9.3.1?
- Do GDBs have added value that is not represented in shapefile exports?
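One way to make "did it round-trip?" checkable is to summarize each layer before export and after re-import, then compare the summaries. The sketch below is illustrative only and is not the procedure used in this study. The summary structure (layer name mapped to a feature count and a field list) and the function name `compare_summaries` are assumptions, and producing the summaries from a real geodatabase is left to whatever GIS tooling is available.

```python
def compare_summaries(before, after):
    """List differences between two {layer: {"count": int, "fields": [str]}} summaries."""
    problems = []
    for layer in sorted(set(before) | set(after)):
        if layer not in after:
            problems.append(f"{layer}: missing after round trip")
            continue
        if layer not in before:
            problems.append(f"{layer}: unexpected new layer")
            continue
        if before[layer]["count"] != after[layer]["count"]:
            problems.append(
                f"{layer}: feature count {before[layer]['count']} -> {after[layer]['count']}"
            )
        # Fields present in the source but absent after re-import.
        lost = sorted(set(before[layer]["fields"]) - set(after[layer]["fields"]))
        if lost:
            problems.append(f"{layer}: fields lost: {', '.join(lost)}")
    return problems
```

An empty result means the two summaries agree. It does not show that domains, relationship classes, or topology survived, which would need separate checks.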

19 Personal and File GDB Export
[Screenshots: Export to XML; Export to shapefiles; Export to XML interface]

20 Personal GDB Tests
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
| Step | Result | Initial size | Compressed size | Ratio | Notes |
|---|---|---|---|---|---|
| Original pGDB | | 728 MB | 309 MB | 1:2.36 | |
| Export to XML using 9.1 / Binary | Success | 2.8 GB (about 4× the source size) | 269 MB | 1:10.7 | |
| XML Import to pGDB using 9.1 | Success | 736 MB | | | Attribute text for sub-domains and relationships preserved |
| XML Import to pGDB using 9.2 | FAILED (size reached 394 MB) | | | | |
| XML Import to pGDB using 9.3.1 | FAILED (size reached 788 MB) | | | | |
| pGDB Export to shapefiles using 9.3.1 | Success | 523 MB / 448 files | | | Attribute text for sub-domains and relationship classes is lost; codes and IDs retained |
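The ratio figures on this slide can be reproduced from the reported sizes. The short sketch below does the arithmetic; the helper names are hypothetical, and treating 1 GB as 1024 MB is an assumption (the slide does not state the convention, but it is the one under which the reported 1:10.7 is obtained).

```python
MB_PER_GB = 1024  # assumption: binary gigabytes; the slide does not state the unit convention


def format_ratio(initial_mb, compressed_mb):
    """Return the compression ratio as '1:x' with three significant figures."""
    return f"1:{initial_mb / compressed_mb:.3g}"


def size_multiple(export_mb, source_mb):
    """Return how many times larger an export is than its source."""
    return export_mb / source_mb


# Sizes as reported for the Richmond VA personal geodatabase.
original_mb, original_compressed_mb = 728, 309
xml_mb, xml_compressed_mb = 2.8 * MB_PER_GB, 269

print(format_ratio(original_mb, original_compressed_mb))  # 1:2.36
print(format_ratio(xml_mb, xml_compressed_mb))            # 1:10.7
print(round(size_multiple(xml_mb, original_mb), 1))       # 3.9
```

The XML export is roughly four times the size of the source but compresses much better, so the compressed XML (269 MB) ends up smaller than the compressed pGDB (309 MB).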

21 pGDB Import of 9.1 XML
[Screenshots: 9.3.1 failure message; 9.2 failure message; import in progress]

22 pGDB Export to Shapefiles
- Sub-domain attribute text is lost in the conversion to shapefile
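Since the codes and IDs survive the shapefile export while their descriptive text does not, one possible mitigation is to store a code-to-description lookup table next to the exported shapefiles. This is a sketch of that idea under stated assumptions, not a step performed in the study: the function names, the CSV layout, and the sample domain are all hypothetical, and extracting the domain definitions from the geodatabase itself is not shown.

```python
import csv


def write_domain_lookup(domains, csv_path):
    """Write {domain_name: {code: description}} as rows of domain, code, description."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["domain", "code", "description"])
        for domain_name in sorted(domains):
            entries = sorted(domains[domain_name].items(), key=lambda item: str(item[0]))
            for code, description in entries:
                writer.writerow([domain_name, code, description])


def read_domain_lookup(csv_path):
    """Read the lookup back; codes and descriptions are returned as strings."""
    domains = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            domains.setdefault(row["domain"], {})[row["code"]] = row["description"]
    return domains


# Hypothetical domain used only to illustrate the file layout.
example_domains = {"SurfaceType": {"1": "Asphalt", "2": "Gravel"}}
```

A future user could then join the retained codes in the shapefile attribute table against this CSV to recover the descriptive text.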

23 pGDB Upgrade to 9.3.1
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
| Step | Result | Initial size | Compressed size | Ratio | Notes |
|---|---|---|---|---|---|
| Original pGDB | | 728 MB | 309 MB | 1:2.36 | |
| Upgraded to 9.3.1 pGDB | Success | 728 MB | | | Upgrade using “Properties/Upgrade Geodatabase” |
| Export to XML | Success | 1.25 GB | | | |
| XML Import to pGDB using 9.3.1 | Success | 738 MB | | | Functionality and content intact |

24 pGDB Conversion to fGDB
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
| Step | Result | Initial size | Compressed size | Ratio | Notes |
|---|---|---|---|---|---|
| Original pGDB | | 728 MB | 309 MB | 1:2.36 | |
| Import to 9.3.1 fGDB | Success | 274 MB / 322 files | | | Sub-domain attributes preserved; relationship classes were lost |

25 File GDB Tests
Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009
| Step | Result | Initial size | Compressed size | Ratio | Notes |
|---|---|---|---|---|---|
| Original fGDB | | 224 MB / 64 files | 80.9 MB | 1:2.77 | |
| Export to XML using 9.2 / Binary | Success | 1.11 GB (about 5× the source size) | 137 MB | 1:8.3 | |
| XML Import to fGDB using 9.3.1 | Success | 223 MB / 61 files | | | |
| fGDB Export to shapefiles using 9.3.1 | Success | 427 MB / 63 files | | | No sub-domain attributes or relationship classes to test, but it’s documented that significant fGDB functionality and tabular data may be lost. |

26 GML Export
- GML “Simple Features Profile” now supported by 9.3
- ArcToolbox/Data Interoperability Tools: GML support available out-of-the-box to all users

27 File GDB/GML Test
Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009
| Step | Result | Initial size | Compressed size | Ratio |
|---|---|---|---|---|
| Original fGDB | | 224 MB / 64 files | 80.9 MB | 1:2.77 |
| Export to GML using 9.3.1 | | 456 MB | 60.1 MB | 1:7.59 |
| GML Import to fGDB using 9.3.1 | FAILED (reached 111 MB / 46 files) | | | |

28

29 Conclusions
- For archival, pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported to a fGDB.
- Stand-alone fGDB may be a safe archival format, following KYGEONET’s lead.
  - Risk: format newness and unknown future
  - Will feel safer after ESRI release of API.

30 Future Study Needs
- Round-trip fGDB via XML: are complex functions, properties, and relationships preserved?
- SDE export options: best practices to preserve as much as possible via XML, fGDB, and/or shapefiles?
- What’s the problem with the GML import?

31 Slide Presentation
http://www.lib.ncsu.edu/ncgdap/presentations.html
Jeff Essic, Matt Sumner
Data Services Librarians, NCSU Libraries
jeff_essic@ncsu.edu, matt_sumner@ncsu.edu

