# Introduction to Geographic Information Systems, Fall 2013 (INF 385T-28620): Dr. David Arctur, Research Fellow, Adjunct Faculty, University of Texas at Austin


Lecture 8, October 17, 2013: Geocoding

### Outline

- Geocoding overview
- Polygon geocoding
- Linear (street) geocoding
- Problems and solutions
- Geocoding layer sources
- Geocoding in ArcGIS

INF385T(28620) – Fall 2013 – Lecture 8

### Overview

- Geocoding is the process of creating geometric representations of locations (such as points) from descriptions of locations (such as street addresses).
- It uses a computer program called a geocoding engine, which employs code tables and rules to standardize address components.

### Examples

- City's economic development department: maps technology businesses by street address to find technology-rich areas in the city.
- Hospital: maps patients to decide where to open a satellite clinic.
- Emergency dispatch: maps callers' addresses to determine who should respond to an emergency.
- Retail store chain: maps store and customer locations and compares them with mapped competitor locations.
- Others?

### Tabular data

- Text file or database
- Street addresses
- ZIP Codes

### Geocoding reference layers

- Street centerlines
- ZIP Code polygons

## Polygon geocoding

### ZIP Code geocoding

A method for mapping data whose geocode is a polygon (here, a ZIP Code):

1. Assign each record to its polygon.
2. Count the records for each polygon.
3. Join the table of counts to the corresponding polygon layer.
4. Symbolize using a choropleth map or graduated point symbols.
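The four steps above can be sketched in plain Python. This is a minimal illustration, not an ArcGIS workflow: it assumes each record already carries its ZIP Code and that the polygon layer's attribute table is available as a list of rows with a `zip` field (both hypothetical structures).

```python
from collections import Counter

def count_by_zip(records):
    """Steps 1-2: assign each record to its ZIP Code polygon and count records per polygon."""
    return Counter(record["zip"] for record in records)

def join_counts(polygon_rows, counts):
    """Step 3: join the counts to the polygon attribute rows (0 where no records fell)."""
    return [{**row, "count": counts.get(row["zip"], 0)} for row in polygon_rows]

# Hypothetical attendee records and ZIP Code polygon attributes.
attendees = [
    {"name": "A", "zip": "15213"},
    {"name": "B", "zip": "15213"},
    {"name": "C", "zip": "15217"},
]
zip_polygons = [{"zip": "15213"}, {"zip": "15217"}, {"zip": "15232"}]

joined = join_counts(zip_polygons, count_by_zip(attendees))
for row in joined:
    print(row["zip"], row["count"])
```

Keeping ZIP Codes as strings preserves leading zeros, and filling 0 for polygons with no records keeps them in the choropleth (step 4) instead of leaving them blank.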

### ZIP Code geocoding example

The example slides (figures) show the workflow in three steps:

1. Points are created at ZIP Code centroids.
2. The points (attendees) are spatially joined to the ZIP Code polygons.
3. A choropleth map is created.
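The spatial join in step 2 comes down to a point-in-polygon test. A minimal sketch using the even-odd (ray casting) rule, assuming simple polygons given as lists of (x, y) vertices in a planar coordinate system; the zone names and coordinates are made up for illustration.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: count crossings of a ray from (x, y) toward +x with the ring's edges."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_join(points, zones):
    """Map each point id to the first zone whose polygon contains it (None if none does)."""
    return {
        pid: next((zone for zone, ring in zones.items() if point_in_polygon(x, y, ring)), None)
        for pid, (x, y) in points.items()
    }

# Hypothetical square "ZIP Code" zones and attendee points.
zones = {
    "15213": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "15217": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
attendees = {"a1": (5, 5), "a2": (15, 5), "a3": (25, 5)}
print(spatial_join(attendees, zones))  # {'a1': '15213', 'a2': '15217', 'a3': None}
```

Points lying exactly on a shared boundary are assigned inconsistently by this simple rule; a GIS spatial join handles such cases with its own tolerance settings.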

## Linear (street) geocoding

### Linear geocoding (streets)

- TIGER (Census Bureau) street maps
- Four street address numbers per street segment: a low-to-high range for each side of the street

Example (figure): an Oak Street segment numbered 100 to 198 on one side and 101 to 199 on the other.

### Address components

Each component, as it appears in the example address 125 Oak St E, Apt. 2, Pittsburgh, PA 15213:

| Component | Example value |
|---|---|
| Number | 125 |
| Street name | Oak |
| Street type | St |
| Direction, suffix | E |
| Direction, prefix | E (as in 125 E Oak St, Apt. 2, Pittsburgh, PA 15213) |
| Unit number | Apt. 2 |
| Zone, city | Pittsburgh |
| Zone, ZIP Code | 15213 |

Items for a single-number street address:

| Address | Unit | City | ZIP Code |
|---|---|---|---|
| 125 Oak St E | Apt. 2 | Pittsburgh | 15213 |

### Street intersections

- Put intersections in the address field:
  - Forbes AV & Craig ST
  - Grant ST & 5th AV E
  - North Star RD & Duncan AV
- Do not include street numbers (incorrect: 3999 Forbes Ave & 100 Craig ST).
- Connectors: any unusual character (e.g., &, @, |) can be used; just be consistent.
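The intersection conventions above can be checked with a small parser. This is a sketch under the assumption that the connector set is exactly `&`, `@`, and `|` (taken from the slide's examples); the function name `split_intersection` is hypothetical.

```python
import re

# Assumed connector set, taken from the slide's examples; pick one and use it consistently.
CONNECTOR_PATTERN = r"[&@|]"

def split_intersection(text):
    """Return the two street names of an intersection, or None if the text is not one."""
    parts = [part.strip() for part in re.split(CONNECTOR_PATTERN, text)]
    if len(parts) != 2 or not all(parts):
        return None
    # Intersections should not carry house numbers, e.g. "3999 Forbes Ave & 100 Craig ST".
    if any(re.match(r"\d+\s", part) for part in parts):
        return None
    return parts[0], parts[1]

print(split_intersection("Forbes AV & Craig ST"))            # ('Forbes AV', 'Craig ST')
print(split_intersection("3999 Forbes Ave & 100 Craig ST"))  # None
```

Note that "5th AV E" passes the house-number check because the digit is followed by letters, not a space.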

### Geocoding flowchart

1. Input address.
2. Parse the address.
3. Generate the Soundex key.
4. Find candidates by address range and Soundex key. If there are no matches, output "No match".
5. Score the matches.
6. If the best match scores 90 or higher, output the address; otherwise, output "No match".
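The flowchart's control flow can be sketched as one driver function. This is not a real geocoding engine: the parse, Soundex, candidate-search, and scoring steps are passed in as functions (hypothetical stand-ins), so only the branching and the 90-point threshold are shown.

```python
MIN_SCORE = 90  # threshold from the flowchart

def geocode(address, parse, soundex, find_candidates, score):
    """Run one address through the flowchart; return the best candidate or None ("No match")."""
    parsed = parse(address)                               # Parse address
    key = soundex(parsed["street"])                       # Generate Soundex key
    candidates = find_candidates(parsed["number"], key)   # Find candidates: range & Soundex key
    if not candidates:
        return None                                       # Matches? No -> No match
    scored = [(score(parsed, c), c) for c in candidates]  # Score matches
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= MIN_SCORE else None      # Best match >= 90?

# Hypothetical stand-ins for the real steps, just to exercise the control flow.
scores = {"segment 4346": 85, "segment 4357": 100}
result = geocode(
    "125 East Oak Street 15213",
    parse=lambda a: {"number": 125, "street": "Oak"},
    soundex=lambda name: "O200",
    find_candidates=lambda number, key: list(scores),
    score=lambda parsed, c: scores[c],
)
print(result)  # segment 4357
```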

### Geocoding steps

- Original address: 125 East Oak Street 15213
- Address parsed: `|125|East|Oak|Street|15213`
- Abbreviations standardized: `|125|E|Oak|St|15213`
- Elements assigned to match keys (house number, street name, street type, street direction, ZIP Code): `[HN]:125 [SN]:Oak [ST]:St [SD]:E [ZP]:15213`
- Index values calculated: `[HN]:125 [SN]:Oak (Soundex #) [ST]:St [SD]:E [ZP]:15213 (Index #)`
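The parsing, standardization, and match-key steps above can be sketched as follows. The abbreviation table and the token-classification rules are simplified assumptions for this one example; real engines use much larger code tables and context-sensitive rules.

```python
import re

# A small, assumed abbreviation table; real engines use much larger code tables.
ABBREVIATIONS = {
    "east": "E", "west": "W", "north": "N", "south": "S",
    "street": "St", "avenue": "Ave", "road": "Rd",
}
DIRECTIONS = {"E", "W", "N", "S"}
STREET_TYPES = {"St", "Ave", "Rd"}

def standardize(address):
    """Parse on whitespace and replace full words with standard abbreviations."""
    return [ABBREVIATIONS.get(token.lower(), token) for token in address.split()]

def assign_match_keys(tokens):
    """Assign tokens to the match keys HN, SN, ST, SD, ZP (simplified rules)."""
    keys = {}
    for token in tokens:
        if token.isdigit() and "HN" not in keys:
            keys["HN"] = token                    # first number: house number
        elif re.fullmatch(r"\d{5}", token):
            keys["ZP"] = token                    # later 5-digit number: ZIP Code
        elif token in DIRECTIONS:
            keys["SD"] = token
        elif token in STREET_TYPES:
            keys["ST"] = token
        else:
            keys["SN"] = (keys.get("SN", "") + " " + token).strip()  # multi-word names
    return keys

tokens = standardize("125 East Oak Street 15213")
print("|" + "|".join(tokens))  # |125|E|Oak|St|15213
print(assign_match_keys(tokens))
```

Because every single-letter direction is classified as SD, a name such as "North Star RD" would be misparsed here; that ambiguity is one reason real engines rely on context and scoring.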

### Soundex index

- Matches names based on how they sound: two names match if their indices match.
- Translates a name into a 4-character index: 1 letter followed by 3 digits.
- The first character of the name remains unchanged.
- Adjacent letters in the name that have the same Soundex key are coded as a single digit.
- If the end of the name is reached before 3 digits are filled, zeros complete the code.

| Key | Letters |
|---|---|
| 1 | b f p v |
| 2 | c g j k q s x z |
| 3 | d t |
| 4 | l |
| 5 | m n |
| 6 | r |
| (disregard) | a e h i o u y w |

Examples:

- Oake = O-200, Oak = O-200
- Smith = S-530, Smythe = S-530
- Paine = P-500, Payne = P-500
- Callahan = C-450, Calahan = C-450
- Beadles = B-342, Beattles = B-342
- Schultz = S-243, Shults = S-432 (these codes apply the adjacency rule only after the first letter; the National Archives rules also treat a letter coded the same as the first letter as adjacent to it, which codes Schultz as S-432, the same as Shults)

Sources: http://www.sconsig.com/sastips/soundex-01.htm, http://www.archives.gov/research/census/soundex.html
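A sketch of the Soundex rules listed above, following the National Archives convention that a letter coded the same as the first letter is skipped and that h and w do not separate letters with the same code (vowels do separate them). This is one common variant, not necessarily the one any particular geocoding engine uses.

```python
# Soundex digit for each consonant; a, e, i, o, u, y, h, w have no digit.
SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for letter in letters
}

def soundex(name):
    """Return the 4-character Soundex index (1 letter + 3 digits) for a name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = SOUNDEX_DIGITS.get(first, "")  # the first letter's code counts for adjacency
    for ch in letters[1:]:
        if ch in "hw":
            continue                          # h and w do not separate same-coded letters
        code = SOUNDEX_DIGITS.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code                       # a vowel resets, so same codes split by a vowel repeat
    return (first.upper() + "".join(digits) + "000")[:4]

for name in ["Oake", "Oak", "Callahan", "Beattles", "Schultz", "Shults"]:
    print(name, soundex(name))
```

Starting `previous` at `""` instead of the first letter's code reproduces the slide's S-243 for Schultz, which is why that variant fails to match Schultz with Shults.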

### Scoring candidates

- A rule base scores matches between source and reference addresses.
- Start with a score of 100.
- Subtract points for each mismatch.
- Examples from a rule base:
  - Soundex indices match but street names do not (-2)
  - Street type missing in source (-1)
  - Street types do not match (-2)
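The three example rules can be sketched as a scoring function. A real rule base has many more rules; this sketch models only these three and assumes the Soundex keys are precomputed (candidates with different keys would not have been found in the first place). The `StreetRef` record is a hypothetical structure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreetRef:
    name: str
    soundex_key: str            # precomputed Soundex index of the name
    street_type: Optional[str]  # None when the type is missing

def score_match(source: StreetRef, reference: StreetRef) -> int:
    """Score a source address against a reference street using the three example rules."""
    score = 100
    if source.soundex_key == reference.soundex_key and source.name != reference.name:
        score -= 2  # Soundex indices match but street names do not
    if source.street_type is None:
        score -= 1  # street type missing in source
    elif source.street_type != reference.street_type:
        score -= 2  # street types do not match
    return score

oak = StreetRef("Oak", "O200", "St")
print(score_match(StreetRef("Oak", "O200", "St"), oak))    # 100
print(score_match(StreetRef("Oake", "O200", None), oak))   # 97
print(score_match(StreetRef("Oake", "O200", "Ave"), oak))  # 96
```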

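### Candidate streets

Candidates identified for 125 East Oak Street 15213:

| From | To | Street | Type | Side | Parity | Direction | Street_ |
|---|---|---|---|---|---|---|---|
| 2 | 98 | Oak | St | R | E | W | 4344 |
| 1 | 99 | Oak | St | L | O | W | 4345 |
| 100 | 198 | Oak | St | R | E | E | 4346 |
| 101 | 199 | Oak | St | L | O | E | 4357 |

Candidates after scoring and filtering:

| From | To | Street | Type | Side | Parity | Direction | Street_ |
|---|---|---|---|---|---|---|---|
| 100 | 198 | Oak | St | R | E | E | 4346 |
| 101 | 199 | Oak | St | L | O | E | 4357 |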
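The filtering step on the candidate table can be approximated with a hard filter on street name, type, direction, and address range. This is a simplification (a real engine scores these attributes rather than discarding on the first mismatch), and the `SideRange` record is a hypothetical layout of the table's columns.

```python
from typing import NamedTuple

class SideRange(NamedTuple):
    low: int          # From
    high: int         # To
    street: str
    street_type: str
    side: str         # "L" or "R"
    parity: str       # "E" (even) or "O" (odd)
    direction: str    # directional suffix
    street_id: int    # Street_

REFERENCE = [
    SideRange(2, 98, "Oak", "St", "R", "E", "W", 4344),
    SideRange(1, 99, "Oak", "St", "L", "O", "W", 4345),
    SideRange(100, 198, "Oak", "St", "R", "E", "E", 4346),
    SideRange(101, 199, "Oak", "St", "L", "O", "E", 4357),
]

def filter_candidates(rows, house_number, street, street_type, direction):
    """Keep side ranges on the right street whose address range contains the number."""
    return [r for r in rows
            if r.street == street and r.street_type == street_type
            and r.direction == direction and r.low <= house_number <= r.high]

kept = filter_candidates(REFERENCE, 125, "Oak", "St", "E")
print([r.street_id for r in kept])  # [4346, 4357]
```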

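### Address matched as point

Best candidate matched:

| From | To | Street | Type | Side | Parity | Direction | Street_ |
|---|---|---|---|---|---|---|---|
| 101 | 199 | Oak | St | L | O | E | 4357 |

Figure: house number 125 placed along the Oak St segment on its odd-numbered side (101 to 199); the figure also shows Pine Ave and the adjacent Oak St segment numbered 2 to 98 and 1 to 99.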
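Choosing the best candidate and placing the point can be sketched in two steps: keep the side whose parity matches the house number, then interpolate linearly along the segment. The segment endpoints below are hypothetical, and the point is left on the centerline; production locators typically also offset it a short distance toward the matched side.

```python
def pick_parity_side(candidates, house_number):
    """Keep only the side whose parity (E = even, O = odd) matches the house number."""
    wanted = "O" if house_number % 2 else "E"
    return [c for c in candidates if c["parity"] == wanted]

def interpolate(house_number, low, high, start, end):
    """Place a house number along a segment by linear interpolation of its address range."""
    t = 0.5 if high == low else (house_number - low) / (high - low)
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))

candidates = [
    {"low": 100, "high": 198, "side": "R", "parity": "E", "street_id": 4346},
    {"low": 101, "high": 199, "side": "L", "parity": "O", "street_id": 4357},
]
best = pick_parity_side(candidates, 125)[0]
# Hypothetical segment endpoints, in meters, from the low end to the high end.
point = interpolate(125, best["low"], best["high"], start=(0.0, 0.0), end=(200.0, 0.0))
print(best["street_id"], point)
```

For 125 on the 101 to 199 side, the point falls (125 - 101) / (199 - 101), or about 24%, of the way along the segment.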

## Problems and solutions

### Possible problems

- Variations in street names
  - Fifth Avenue, Fifth Ave., 5th AV
  - Saw Mill Run Blvd, Route 51
- Data entry errors
  - Fidth Avenue
  - Sawmill Run
- Place names
  - White House, Heinz Field, Empire State Building
- Intersections
  - Fifth Avenue and Craig Street
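Simple spelling variants such as Fifth Avenue, Fifth Ave., and 5th AV can be reduced to one comparison key before matching. A minimal sketch with small, assumed lookup tables:

```python
import re

# Partial, assumed lookup tables; extend as needed.
ORDINAL_WORDS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th", "fifth": "5th"}
TYPE_VARIANTS = {"avenue": "ave", "ave": "ave", "av": "ave",
                 "street": "st", "st": "st", "boulevard": "blvd", "blvd": "blvd"}

def normalize_street(name):
    """Reduce spelling variants of a street name to one comparison key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())  # drops periods and other punctuation
    tokens = [ORDINAL_WORDS.get(t, t) for t in tokens]
    tokens = [TYPE_VARIANTS.get(t, t) for t in tokens]
    return " ".join(tokens).upper()

for variant in ["Fifth Avenue", "Fifth Ave.", "5th AV"]:
    print(normalize_street(variant))  # 5TH AVE each time
```

This does not fix data entry errors such as "Fidth Avenue", and it cannot relate names that share no spelling, such as Saw Mill Run Blvd and Route 51; those cases need scoring, cleaning, or alias tables.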

### Possible problems (continued)

- Zones: the same street address can occur in different ZIP Codes, e.g., 100 Main ST 15101 and 100 Main ST 16202
- P.O. boxes: e.g., P.O. Box 125
- Missing street data

### Solutions

- Clean data before geocoding.
- Purchase or build high-quality maps (field verification).
- Use postal address standards.
- Assign house numbers in rural areas.
- Use alias tables.

| Alias | Address |
|---|---|
| White House | 1600 Pennsylvania Avenue |
| Heinz Field | 100 Art Rooney Avenue |
| Empire State Building | 350 5th Ave |

### Alias table

| Alias | Address |
|---|---|
| CMU | 5000 Forbes Av |
| Carnegie Mellon | 5000 Forbes Av |
| Carnegie Mellon U | 5000 Forbes Av |
| Carnegie Mellon Univ | 5000 Forbes Av |
| Carnegie Mellon University | 5000 Forbes Av |
| Etc. | |
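An alias table works as a lookup applied before geocoding: if the input is a known place name, it is replaced with the street address, otherwise it passes through unchanged. A minimal sketch using the entries from the alias tables above; lowercasing and dropping periods is an assumed normalization so that, for example, "Carnegie Mellon Univ." still matches.

```python
# Entries from the alias tables above; keys are stored in normalized (lowercase) form.
ALIASES = {
    "white house": "1600 Pennsylvania Avenue",
    "heinz field": "100 Art Rooney Avenue",
    "empire state building": "350 5th Ave",
    "cmu": "5000 Forbes Av",
    "carnegie mellon": "5000 Forbes Av",
    "carnegie mellon u": "5000 Forbes Av",
    "carnegie mellon univ": "5000 Forbes Av",
    "carnegie mellon university": "5000 Forbes Av",
}

def resolve_alias(text):
    """Replace a known place name with its street address; pass other input through."""
    key = " ".join(text.lower().replace(".", " ").split())
    return ALIASES.get(key, text)

print(resolve_alias("Carnegie Mellon Univ."))  # 5000 Forbes Av
print(resolve_alias("125 Oak St E"))           # 125 Oak St E
```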

## Geocoding layer sources

### US Census TIGER files

- Digitized from 1:100,000-scale maps
- Pros:
  - Free and easy to download
  - Uniform across jurisdictional lines (nationally)
  - Street address formatting works well with standard GIS geocoding capabilities
- Cons:
  - Incomplete data
  - Placement of address points is approximate

### TIGER line attribute table

- Census street centerlines are extracted from the lines (edges) that make up census boundaries.
- Example file: `tl_2009_04013_edges.shp`
- Selection query: `"FEATCAT" = 'S'`

### MAF/TIGER

- Master Address File / Topologically Integrated Geographic Encoding and Referencing
- MAF is a complete inventory of housing units and businesses in the United States and its territories.
- TIGER is the collection of lines as we know it.
- MAF is used to produce mail-out census forms and American Community Survey (ACS) random samples.
- MAF/TIGER produces maps for on-the-ground census takers.
- MAF is confidential.
- TIGER 2009 and newer have much improved positional accuracy.

### US Census ZIP Codes

- ZIP Code Tabulation Areas (ZCTAs)
- Approximations for census purposes
- Do not reflect actual ZIP Code areas and are not kept up to date

### Local jurisdictions: parcel address points

- Pros:
  - Accurate placement of residential locations (parcel positional data is often very good, e.g., +/- 5 meters or less)
- Cons:
  - May need to contact individuals within agencies to get the most up-to-date data
  - May not be available, or may cost a substantial amount of money
  - Data ends at jurisdictional boundaries
  - Data files tend to be very large

### Local jurisdictions: street centerlines

- Pros:
  - Potentially more up to date (often yearly updates, sometimes quarterly)
  - Accuracy is often adequate for city infrastructure needs (typically +/- 10 meters or less)
- Cons:
  - May need to contact individuals within agencies to get the most up-to-date data
  - Data ends at jurisdictional boundaries

### Private vendors

- StreetMap USA
  - National dataset (US and Canada)
  - Address locators are prebuilt and can geocode across the United States
- GDT Dynamap/2000 US street data
  - Small fee for individual ZIP Code layers
  - Map layers are the highest-quality street map layers in terms of appearance, completeness, and accuracy
  - More than one million changes every quarter
  - Maps include more than 14 million US street segments, plus postal boundaries, landmarks, water features, and other features

### Online geocoding

- ArcGIS.com, Google, GeoCommons, Maptive, etc.
- Pros:
  - Fast and easy to access
  - Free or inexpensive
- Cons:
  - Loss of privacy/confidentiality
  - Accuracy may be limited or unknown
  - Results may be hard to use in desktop GIS

## Geocoding in ArcGIS

### Create an address locator

- In ArcCatalog

### Choose an address locator style

- The style is the skeleton of the address locator.
- The choice is based on the data tables and the reference layer.

### Address locator styles

| Style | Reference dataset geometry | Reference dataset representation | Address search parameters | Examples | Applications |
|---|---|---|---|---|---|
| US Address - Dual Ranges | Lines | Address range for both sides of street segment | All address elements in a single field | 320 Madison St.; N2W1700 County Rd.; 105-30 Union St. | Finding a house on a specific side of the street |
| US Address - Single House | Points or polygons | Each feature represents an address | All address elements in a single field | 71 Cherry Ln.; W1700 Rock Rd.; 38-76 Carson Rd. | Finding parcels, buildings, or address points |

Note: there are other styles…

### Other styles (build custom locators)

- Queens, NY
- Salt Lake City, UT
- Regions of Illinois & Wisconsin
- Germany
- … and many others!

### Choose reference layer

- Streets or ZIP Codes

### ArcGIS locator parameters

### Geocode in ArcMap

1. Add the tabular data and the streets layer.
2. Add the address locator.
3. Geocode the addresses.
4. View the geocoding results.
5. Interactively rematch addresses.

### Address rematching

- Investigate unmatched addresses.
- This generally requires expertise and knowledge of the local streets.
- Compare the street names in the streets attribute table with those in the address table.

### Prepare log file

- The log file records why addresses did not get geocoded.
- It is useful for future work on cleaning addresses or repairing street maps.

| Incorrect address | Possible reason/solution |
|---|---|
| 490 Penn Avenue | Missing ZIP Code |
| 111 Hawksworth | Spelled incorrectly |
| 900 Smallman Street | TIGER street missing |
| 900 Lib Ave | Spelled incorrectly |
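Keeping the log as a CSV file makes it easy to sort and revisit later. A minimal sketch using Python's standard `csv` module with the example entries from the log above; the column names and the file name `geocode_log.csv` are hypothetical.

```python
import csv
import io

# Unmatched addresses and reasons, as in the example log above.
unmatched = [
    ("490 Penn Avenue", "Missing ZIP Code"),
    ("111 Hawksworth", "Spelled incorrectly"),
    ("900 Smallman Street", "TIGER street missing"),
    ("900 Lib Ave", "Spelled incorrectly"),
]

def write_log(rows, stream):
    """Write a CSV log of unmatched addresses with a possible reason or solution."""
    writer = csv.writer(stream)
    writer.writerow(["incorrect_address", "possible_reason_or_solution"])
    writer.writerows(rows)

buffer = io.StringIO()  # for a real file: open("geocode_log.csv", "w", newline="")
write_log(unmatched, buffer)
print(buffer.getvalue())
```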

### Summary

- Geocoding overview
- Polygon geocoding
- Linear (street) geocoding
- Problems and solutions
- Geocoding layer sources
- Geocoding in ArcGIS

Next week: Tutorial chapter 9, and discussion of term projects. See the iSchool syllabus links: http://courses.ischool.utexas.edu/Arctur_David/2013/fall/385T/schedule.php
