Presentation on theme: "Spatial Database Engine"— Presentation transcript:
1 Spatial Database Engine. Keith T. Weber, GISP, GIS Director, Idaho State University
2 Today’s Topics: What is SDE? Why use SDE? SDE Data Structure. How is data stored within SDE? DEMO: Meet ArcSDE Professional GDB. Enterprise workflow: Versioning and Replication.
3 What is SDE? SDE: A spatial database engine that works on an RDBMS. Helps to serve geospatial data to clients via a network. Is SDE a database? Does SDE store data or just manage data that is stored elsewhere?
4 Why use SDE? Advantages: Protection against data loss/integrity degradation through versioning; centralized data management; Enterprise GIS; geo-spatial data is immediately usable. Define enterprise?
5 Why use SDE? (cont’d) Disadvantages: Data management role; RDBMS administration; capital expenditure. Cost: Return on Investment? Unknown now.
6 To Use SDE…or Not To Use SDE… What will help make this decision? ROI; TCO; Is this the correct technology for the problem? Let students brainstorm.
7 ArcGIS Data Structures: GDB. Vector Objects: Shape files, Coverages. Raster Objects: Grids, Images.
8 The GDB: Can store tables (just information), vector feature classes, and raster layers.
9 Layers and Layer Files: All GIS datasets are considered LAYERs in ArcMap. A LAYER FILE is a file that you save in ArcMap to retain customized settings. This file refers to the LAYER (shape file, coverage, grid, or feature class). It displays the data with your saved visualization settings, textual annotation, etc.
10 Workspaces
Arc/Info: Info folder; Geodata sets (coverages, grids, TINs). Collection of ArcView shape files. Geodatabases.
Hold on a minute…what in the world is a workspace? A workspace is a folder on your workstation where you store your GIS coverages and files. Each workspace needs an Info folder.
You can instruct Arc/Info (AI) to create a workspace, or AI will create a workspace for you when you set your workspace to an existing folder. How do you do this…simple: type &workspace c:\winnt\profiles\…etc [NOTE THE &]
If you are going to do all this typing, it certainly makes sense to use AMLs. It will also be helpful to use AMLs to set up some routine configurations each time you run AI. To help you out and get you going, I have created two simple AMLs that you should download from the Server and copy to your Personal profile folder. The personal folder will be your ROOT or HOME WORKSPACE.
Your workspace will contain coverages. Coverages differ from themes in that they are a set of files stored in a coverage folder (the name of the coverage) and the workspace’s Info folder.
11 Coverages
Let’s explore a coverage. Each coverage will contain a number of files. These are:
Tic: The location of registration tics
Bnd: The map extent of the coverage…boundary information
Arc: Arcs, line work
Arx: Arc index file, topological data
Lab: Labels and label points
AAT: Arc Attribute Table
PAT: Point or Polygon Attribute Table
LOG: Log file, a record of the coverage’s history
Other topological database files exist as well.
In this class, and actually whenever you work with AI, you will work directly with the TIC and BND files somewhat, but will use the AAT and PAT files most. The LOG file is also important, but mainly as a reference.
These files are stored in Info format. To access these you can use the Info module of AI, and we will learn how to use it. You can also convert the files to dBase files for viewing and editing.
12 GeoDatabases: Personal; File-based; ArcSDE Personal; ArcSDE Professional (or Enterprise)
13 Personal Geodatabases: Uses the MS Access Jet Database engine. Note: Do not open/edit these with MS Access. Limitations: 2GB (Access); only vector feature classes are actually stored inside the Access database; 4 users but only one editor; does not support versioning.
14 File-based Geodatabase (fGDB): Stores vector and raster layers in the file/folder structure. Limitations: Multi-user (max = 10); 1 editor (no versioning); max size is 1 TB. RDBMS
15 ArcSDE Personal: Uses MS SQL Server Express. Limitations: 4 GB; supports versioning/replication but only one editor.
16 ArcSDE Professional Geodatabases: Uses DB2, Oracle, Informix, SQL Server, etc. No software size limits and unlimited number of users. Can accommodate vector and raster data.
17 Given all these differences, there are really many similarities
18 Geospatial Data Storage (Vector)
Geo-spatial data are stored as feature classes. Non-spatial data are stored as stand-alone tables.
Vector data is handled by DB2’s Spatial Extender. SDE is a broker. This schema shows paths for vector data storage as an example. Effectively, you must know and understand that the data is stored as a feature class. It is no longer a coverage or a shape file. Raster data is also accommodated in SDE.
19 Geo-spatial Data Storage (Raster)
Two methods: Stand-alone raster data set; Mosaic.
ArcSDE is not the best solution to store raster GIS data for the Enterprise: size considerations; performance issues.
Raster data is handled by SDE.
Stand-alone: Each raster grid/image imported becomes its own raster layer in SDE.
Embedded: Each raster grid/image imported has most of its data stored separately within the RDBMS tables; some parts are stored collectively, sort of like a grouping in ArcMap. The data is displayed in one piece. Data returned to the user is the index value for the raster layer (1 (the first one added to the catalog) through n).
Mosaic: All raster grids/images imported are mosaicked together into one piece. The data is stored as one unit and displayed as one unit. Building pyramids and calculating statistics is very important. The number of “stats” records and pyramids records changes too.
BRAINSTORM BENEFITS AND ADVANTAGES…METHODS VARY FOR DIFFERENT DATA SETS…
20 Internal Data Storage Within the DB2 RDBMS: All data is stored within table spaces, referred to by Configuration Keyword. A Configuration Keyword points to a set of two table spaces: Attribute table space; Coords table space. Table spaces are invisible to the user or client.
21 Loading Vector Data into a GDB. PART 1: Stand-alone feature classes. Log into SDE at a manager level. In Catalog, go to SDE and choose import. Do singles…you will see why (field names).
22 The Spatial Index Grid: Uniform grid of square tiles, like the grid reference on a street map. Each feature (lakes) is referenced by one or more tiles. The envelope of a feature determines the tiles occupied. The Spatial Index Key records occurrences of features in tiles (Reference ID, Grid X, Grid Y). Empty tiles are not stored.
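The grid idea on this slide can be sketched in a few lines. This is only an illustration of the concept, not ArcSDE's actual implementation; the function names, the tile size, and the lake envelopes are all hypothetical.

```python
import math

def tiles_for_envelope(xmin, ymin, xmax, ymax, tile_size):
    """Return the (grid_x, grid_y) tiles touched by a feature's envelope."""
    gx0, gx1 = math.floor(xmin / tile_size), math.floor(xmax / tile_size)
    gy0, gy1 = math.floor(ymin / tile_size), math.floor(ymax / tile_size)
    return [(gx, gy) for gx in range(gx0, gx1 + 1) for gy in range(gy0, gy1 + 1)]

def build_index(envelopes, tile_size):
    """Map each occupied tile to the IDs of the features whose envelope touches it.

    Tiles that no envelope touches never get a key, so empty tiles are not stored.
    """
    index = {}
    for feature_id, envelope in envelopes.items():
        for tile in tiles_for_envelope(*envelope, tile_size):
            index.setdefault(tile, []).append(feature_id)
    return index

# Hypothetical lake envelopes: feature ID -> (xmin, ymin, xmax, ymax)
lakes = {1: (120.0, 80.0, 260.0, 140.0), 2: (10.0, 10.0, 40.0, 30.0)}
index = build_index(lakes, tile_size=100.0)

for (grid_x, grid_y), feature_ids in sorted(index.items()):
    print(grid_x, grid_y, feature_ids)
```

A query for features near a point then only has to look at the tile containing that point rather than scanning every feature.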
23 Loading Vector Data into ArcSDE. PART 2: Feature classes within a Feature Data Set
First, you need a Feature Data Set. What is a Feature Data Set? What is the precision of the source data? What will the future data be like for this feature class?
The data storage precision value is determined during import. Rule of thumb: Doubling precision results in 10% more table space usage (HD usage).
To change the precision of the data storage: click Change Settings; Spatial Reference tab; XY Domain; lower the domain (by default ArcGIS will set the precision at its highest value, which is determined by the full spatial extent (min, max X,Y) of the data set to be imported).
Also mention the item names import…query import…and keywords.
24 A Feature Data Set is: Required to implement Full Topology! What?!
25 Full Topology: “The spatial relationship among feature classes participating in a topology layer.” Must belong to a feature dataset. Feature classes share a geographic reference system and spatial domain. More realistic representation of data. AKA Shared Topology or Advanced Topology.
26 A Feature Data Set then… Is an organizational tool used to ensure that all feature classes within it use a common: Geospatial reference system; Spatial domain.
27 Understanding the Spatial Domain. Low-precision GDB: Based upon LONG INTEGER (32-bit). What is the domain range of a LONG? Both X and Y coords can be stored in 4B space. High-precision GDB: Based upon 64-bit integer. Covers a geographic reference system’s “Horizon”.
28 Fitting the World into a LONG: If we express the X,Y coordinates in the familiar Latitude/Longitude system by whole degrees, we would use: Latitude (180 units); Longitude (360 units). This is only a tiny fraction of a percent of the 4B space. Calc shown is for Longitude.
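As a quick check of the numbers behind this slide, the range of a signed 32-bit LONG and the share of it used by whole-degree longitude can be computed directly. The variable names are illustrative only.

```python
# Range of a signed 32-bit LONG INTEGER
LONG_MIN = -2**31        # -2,147,483,648
LONG_MAX = 2**31 - 1     #  2,147,483,647
LONG_VALUES = 2**32      #  4,294,967,296 distinct values (the "4B space")

# Whole-degree coordinates use very few of those values
longitude_units = 360
latitude_units = 180

lon_share_percent = longitude_units / LONG_VALUES * 100
lat_share_percent = latitude_units / LONG_VALUES * 100

print(f"LONG range: {LONG_MIN:,} to {LONG_MAX:,}")
print(f"Longitude uses {lon_share_percent:.7f}% of the space")
print(f"Latitude uses {lat_share_percent:.7f}% of the space")
```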
29 Problems with this approach: Resolution to 1 degree is terrible. Wastes the capacity of LONG INTEGER.
30 What if we use Decimal Degrees? Hold on! Decimals cannot be stored in an INTEGER data type. Let’s just shift the decimal place to the right by multiplying the coordinate by a scaling factor, e.g., 10 preserves one decimal place, 100 preserves two decimal places, etc.
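The scaling-factor trick can be sketched as a pair of helper functions. This is a minimal illustration of the idea, not the geodatabase's actual storage code; `to_stored` and `from_stored` are hypothetical names and the sample longitude is arbitrary.

```python
LONG_MIN = -2**31
LONG_MAX = 2**31 - 1

def to_stored(coord, scale):
    """Multiply by the scaling factor and round so the value fits an integer column."""
    value = round(coord * scale)
    if not LONG_MIN <= value <= LONG_MAX:
        raise OverflowError(f"{coord} at scale {scale} does not fit in a 32-bit LONG")
    return value

def from_stored(value, scale):
    """Divide by the same scaling factor to recover decimal degrees."""
    return value / scale

longitude = -112.43                      # arbitrary sample value
one_place = to_stored(longitude, 10)     # keeps one decimal place
two_places = to_stored(longitude, 100)   # keeps two decimal places

print(one_place, from_stored(one_place, 10))
print(two_places, from_stored(two_places, 100))
```

Any digits beyond what the scaling factor preserves are lost in the rounding step, which is why the choice of factor sets the resolution of the stored data.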
31 Fitting the World into a LONG (revisited). By using a scaling factor of 1M, the world would fit nicely into a 360M space (there’s plenty left over!). What is the spatial resolution of 1/1Mth of a degree? Approximately 1/10th of a meter! In Idaho, there is approximately 10,000m per 7.5’ quad (along the X). That is 1,333 meters per minute, or about 80,000 meters per degree. That is about 0.08 m per 1 millionth of a degree. That then is roughly 1/10th of a meter!
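The arithmetic on this slide can be worked through step by step. The sketch below starts from the slide's own approximation of 10,000 m per 7.5' quad along X in Idaho; the true ground distance per degree of longitude varies with latitude, so treat the results as rough figures. All names are illustrative.

```python
METERS_PER_QUAD = 10_000      # slide's approximation, along X in Idaho
QUAD_WIDTH_MINUTES = 7.5      # a 7.5-minute quadrangle
LONG_VALUES = 2**32           # size of the 32-bit space

meters_per_minute = METERS_PER_QUAD / QUAD_WIDTH_MINUTES   # about 1,333 m
meters_per_degree = meters_per_minute * 60                 # about 80,000 m

def resolution_m(scale):
    """Ground distance represented by one stored integer step."""
    return meters_per_degree / scale

def stored_range(scale):
    """Number of integer steps needed to span 360 degrees of longitude."""
    return 360 * scale

for scale in (1, 1_000, 1_000_000):
    fits = stored_range(scale) <= LONG_VALUES
    print(f"scale {scale:>9,}: {resolution_m(scale):.4f} m per step, "
          f"{stored_range(scale):,} steps, fits in LONG: {fits}")
```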
32 More about the High-Precision GDB: Can be pGDB, fGDB, or SDE GDB. Uses a 64-bit integer to encapsulate the spatial horizon. What? 64-bit numbers have a range of 18,446,744,073,709,551,616. That’s 18 quintillion!
33 The Spatial Horizon? Essentially, it’s a spatial domain large enough to contain the entire earth at high precision.
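For scale, the 64-bit range quoted on this slide can be compared with the 32-bit LONG space in a couple of lines (names are illustrative):

```python
INT32_VALUES = 2**32
INT64_VALUES = 2**64

# How many 32-bit spaces fit inside the 64-bit space
ratio = INT64_VALUES // INT32_VALUES

print(f"32-bit values: {INT32_VALUES:,}")
print(f"64-bit values: {INT64_VALUES:,}")
print(f"The 64-bit space is {ratio:,} times larger")
```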
34 Applying this to ArcGIS: Rule #1, use the high-precision GDB model whenever possible. Why not always? Long is long paper. Precision may not be the best word…in actuality, resolution would have been better.
35 Hints and Tips: Optimize the spatial domain by using a high-precision GDB feature dataset. If not, set up your low-precision feature dataset to: allow for spatial growth; allow for improved instrumentation. I would choose a precision of 1000. Make the min/max X,Y’s fit EVENLY around the study area or AOC.
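To see how much room a given precision leaves, here is a rough sketch. It assumes the stored value is (coordinate minus domain minimum) times precision and must fit in a non-negative 32-bit LONG; ArcGIS's exact limits may differ slightly. The study-area coordinates and function names are hypothetical.

```python
LONG_MAX = 2**31 - 1

def domain_extent(precision):
    """Width of the XY domain, in coordinate units, at a given precision.

    Assumption: stored = (coord - domain_min) * precision, with stored <= LONG_MAX.
    """
    return LONG_MAX / precision

def centered_domain(xmin, ymin, xmax, ymax, precision):
    """Fit the domain evenly around the study area so growth is possible on all sides."""
    half = domain_extent(precision) / 2
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return (cx - half, cy - half, cx + half, cy + half)

# Hypothetical study area in projected meters
study_area = (300_000.0, 4_600_000.0, 600_000.0, 5_000_000.0)
min_x, min_y, max_x, max_y = centered_domain(*study_area, precision=1000)

print(f"Extent at precision 1000: {domain_extent(1000):,.3f} m")
print(f"Domain X range: {min_x:,.1f} to {max_x:,.1f}")
```

With coordinates in meters, a precision of 1000 stores positions to the millimeter and still leaves a domain a little over 2,000 km wide.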
36 ArcSDE Professional Demo: Import a vector data set into ArcSDE.
38 Think about it… Object-relational databases have native geospatial support. ArcGIS for Server can make geospatial data available to the Enterprise. Do we need an ArcSDE middleware? ArcGIS Spatial Data Server. Spend some time discussing…
40 Geodatabases in an Enterprise Workflow. Keith T. Weber, GISP, GIS Director, ISU GIS Training and Research Center
41 Presentation and Discussion: Understanding and managing workflow
42 Let’s Get Started. Adjectives: GIS is… Data-driven; Powerful; Dynamic. GIS is many things…many adjectives can be used to describe it. For our workshop today, however, there is one property of GIS that we will concentrate on, and that is “GIS is Dynamic”.
43 GIS Data Life Cycle: Create Data; Edition; Change Happens!; Backup; Edit; Validate; Update Metadata; Change Happens!
This cycle is not new… it is, in fact, old…since the beginning of GIS this is how things were done.
Let’s think about a roads layer. Create by digitizing…that is edition number one when it is done (fix the overshoots and dangles, and populate the database).
Once we recognize that things have changed and our layer is outdated, we plan for how we will fix this. This is very task oriented… so we back up our data (copy it) and then proceed with editing it. Hopefully we validate our edits and update our metadata as well.
Now we have a new edition of the roads layer and all is right with the world.
There is nothing wrong with this cycle, IF the rate at which “change happens” and the demand/expectation for a new edition is not too frequent. For instance… 1 revolution per year or per quarter is not too bad. But what if the demand and expectation for a new edition…and the need for a new edition…requires 1 revolution per week or per day?
47 GIS Grows Up! RDBMS: Keep the benefit of network connectivity. Eliminate the problem of “MY” version. Eliminate the bottleneck. And, change the cycle of events.
48 GIS Data Life Cycle: Create Data; Edition; Change Happens!; Version and/or Replicate; Edit; Validate: Synchronize or reconcile and post; Update Metadata; Change Happens!
Two things have changed:
Wording/terminology: We don’t back up today; instead we version and replicate. We don’t just check things over as our validation; today we also need to synchronize or reconcile and post.
Colors: this is important as it symbolizes a distributed workflow within a team environment… a team that is part of the enterprise. Blue = manager, orange = technician/specialist, and green = metadata librarian.
49 Backup vs. Versioning: Backups and archiving are still critical steps for the enterprise, BUT not part of the GIS Life Cycle any longer.
50 In the Beginning… Backups were made in case we really messed up. Edits were made to the original. Copies of the “clean” new edition were distributed.
51 Today… The original [parent] is versioned [a child is born]. Edits are made to the child, not the parent. “Clean” edits are copied [synchronized or posted] to the parent.
52 Benefits Of This Approach: Brainstorm!!! Minimize downtime. Processes completed within the RDBMS.
53 The Role of Backups: Data retention and deletion. Legal requirements.
54 GIS Data Life Cycle…Today: Create Data; Edition; Change Happens!; Version and/or Replicate; Edit; Validate: Synchronize or reconcile and post; Update Metadata. This process is enterprise enabled. The manager, technician, and librarian do not have to be in the same office. Indeed, this cycle is “outsourcing-ready”…
69 The State Tree. Tree trunk: Default (state 0). Branches: Arthur’s Court sub-division; [Another] sub-division.
70 Multiple Versions: Multiple versions are allowed. Versions can be based upon location (north edits, south edits), projects (sub-divisions), or other logic decided upon by the GIS Manager. Batch reconcile and post are supported.
71 The Day of Reconciliation: Arthur’s Court sub-division edits have been completed. Time to reconcile. This process looks for conflicts. Once all conflicts have been resolved…reconciliation is complete.
72 Post: To roll up the edits back to the “trunk of the state tree”, we Post.
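The parent/child, reconcile, and post steps can be illustrated with a toy model. This is not how ArcSDE stores versions (it uses state and delta tables inside the RDBMS); plain dictionaries stand in for the parent and child versions, and every name and record below is hypothetical.

```python
def find_conflicts(parent, snapshot, child):
    """Reconcile: list the rows changed differently in BOTH parent and child since branching."""
    conflicts = []
    for key in set(parent) | set(snapshot) | set(child):
        base = snapshot.get(key)
        in_parent = parent.get(key)
        in_child = child.get(key)
        if in_parent != base and in_child != base and in_parent != in_child:
            conflicts.append(key)
    return sorted(conflicts)

def post(parent, snapshot, child):
    """Post: copy the child's edits to the parent once no conflicts remain."""
    if find_conflicts(parent, snapshot, child):
        raise ValueError("resolve conflicts before posting")
    for key in set(snapshot) | set(child):
        edited = child.get(key)
        if edited == snapshot.get(key):
            continue                      # row untouched in the child
        if edited is None:
            parent.pop(key, None)         # row deleted in the child
        else:
            parent[key] = edited          # row added or updated in the child
    return parent

# The parent (Default) is versioned: a child is born
default = {1: "Main St", 2: "Oak Ave"}
snapshot = dict(default)                  # parent as it looked when the child was created
child = dict(default)

child[3] = "Arthur's Court"               # edits are made to the child, not the parent
child[2] = "Oak Avenue"
default[1] = "Main Street"                # meanwhile, someone else posts a change to Default

conflicts = find_conflicts(default, snapshot, child)
post(default, snapshot, child)

# A second child, branched from the same snapshot, edits a row Default has since changed
late_child = dict(snapshot)
late_child[1] = "Main Str."
late_conflicts = find_conflicts(default, snapshot, late_child)
```

In the first case the child and the parent touched different rows, so reconcile finds nothing to resolve and the post goes through; in the second, both changed row 1, so the conflict must be resolved first.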
73 Considerations: Performance can degrade with active databases. Workflow itself can generate unnecessary versions. Delta tables will become large over time. DBMS statistics may need to be refreshed or reviewed by the DB Admin.
74 The Cure: For many of these ArcGIS-centric performance issues, the cure is compressing the database. Moves common rows from delta tables into base tables. Reduces the depth of the state tree by removing states no longer needed.
75 Compression Example: Active editing sessions are shown in yellow. Versions with no deltas since the last reconcile/post are shown as hollow. The GIS Manager compresses, saying, “I do not need these versions any longer.” They are eliminated.
77 Hands-On Exercise: Practice both replication and versioning.
78 Your Assignment: Complete the exercise handouts: Connecting to and using SDE on DB2; Practice both replication and versioning. Read the PDFs in the SDE exercise folder. Visit the URL link for Spatial Data Server and explore this topic.
79 Key Concepts: SDE is an engine layer residing between a spatially-enabled RDBMS and the GIS desktop. SDE enables Enterprise GIS. SDE reduces data management responsibilities. Understand Enterprise workflow.