BioGeomancer: Semi-automated Georeferencing Engine


1 BioGeomancer: Semi-automated Georeferencing Engine
John Wieczorek, Aaron Steele, Dave Neufeld, P. Bryan Heidorn, Robert Guralnick, Reed Beaman, Chris Frazier, Paul Flemons, Nelson Rios, Greg Hill, Youjun Guo

2 Spatially Challenged Occurrence Data
LA PEÑITA; 5.5. KM N
Baird Mtns.; Salmon R. headwaters
CALIENTE MOUNTAIN
10 MI SW CANAS, RIO HIGUERON
near Sedan
4.4 MI N, 6.2 MI W SEMINOLE

3 Spatially Enabled Occurrence Data

4 Input - Verbatim Locality Strings
LA PEÑITA; 5.5. KM N
Baird Mtns.; Salmon R. headwaters
CALIENTE MOUNTAIN
10 MI SW CANAS, RIO HIGUERON
near Sedan
4.4 MI N, 6.2 MI W SEMINOLE

5 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete

6 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
We need these to start processing.

7 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
We need these to start processing.
These are assumptions we should not hold to be true.

8 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation

9 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation
There is more than one way to accomplish string interpretation.

10 Locality Interpretation Methods
Regular expression analysis
- GeoLocate - Tulane
- Enhanced BioGeomancer Classic – Yale
Machine Learning/Natural Language Processing
- U. Illinois, Urbana-Champaign
- Inxight Software, Inc.

11 Locality Types
F – feature
P – path
FO – offset from a feature, sans heading
FOH – offset from feature at a heading
FO+ – orthogonal offsets from a feature
FPOH – offset at a heading from a feature along a path
31 other locality types known so far

12 Five Most Common Locality Types*
51.0% - feature
21.4% - locality not recorded
17.6% - offset from feature at a heading
8.6% - path
5.8% - undefined
*Based on 500 records randomly selected from the 296k records georeferenced manually in the MaNIS Project.

13 Clause
A subset of a locality description to which a locality type can be applied.

14 Step 1: Define Clause Boundaries
LA PEÑITA; 5.5. KM N
Baird Mtns.; Salmon R. headwaters
CALIENTE MOUNTAIN
10 MI SW CANAS, RIO HIGUERON
near Sedan
4.4 MI N, 6.2 MI W SEMINOLE

15 Step 1: Define Clause Boundaries
<LA PEÑITA; 5.5. KM N>

16 Step 1: Define Clause Boundaries
<LA PEÑITA; 5.5. KM N> <Baird Mtns.; >

17 Step 1: Define Clause Boundaries
<LA PEÑITA; 5.5. KM N> <Baird Mtns.; ><Salmon R. headwaters>

18 Step 1: Define Clause Boundaries
<LA PEÑITA; 5.5. KM N>
<Baird Mtns.; ><Salmon R. headwaters>
<CALIENTE MOUNTAIN>
<10 MI SW CANAS, ><RIO HIGUERON>
<near Sedan>
<4.4 MI N, 6.2 MI W SEMINOLE>

19 Step 2: Determine Locality Types
<FOH>LA PEÑITA; 5.5. KM N</FOH>

20 Step 2: Determine Locality Types
<FOH>LA PEÑITA; 5.5. KM N</FOH> <F>Baird Mtns.; </F>

21 Step 2: Determine Locality Types
<FOH>LA PEÑITA; 5.5. KM N</FOH> <F>Baird Mtns.; </F><PS>Salmon R. headwaters</PS>

22 Step 2: Determine Locality Types
<FOH>LA PEÑITA; 5.5. KM N</FOH>
<F>Baird Mtns.; </F><PS>Salmon R. headwaters</PS>
<F>CALIENTE MOUNTAIN</F>
<FOH>10 MI SW CANAS, </FOH><P>RIO HIGUERON</P>
<NF>near Sedan</NF>
<FO+>4.4 MI N, 6.2 MI W SEMINOLE</FO+>
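
As a rough illustration of the regular-expression approach to this step, the sketch below tags a clause as FOH or FO+ by counting "distance, units, heading" phrases. The function name, the unit list, and the pattern itself are assumptions made for this example, not the rules used by GeoLocate or BioGeomancer. Clauses without an offset phrase (F, P, PS, NF) are left unclassified, since telling a feature from a path generally needs gazetteer evidence.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
# Two-letter headings come first so "SW" is not read as "S".
HEADING = r"(?:NE|NW|SE|SW|N|S|E|W)"
UNITS = r"(?:KM|MI|M|FT)"
# A number (tolerating a stray trailing period, as in "5.5. KM"),
# then units, then a compass heading.
OFFSET = rf"\d+(?:\.\d+)?\.?\s*{UNITS}\s+{HEADING}\b"


def classify_offset_clause(clause: str) -> Optional[str]:
    """Return "FO+", "FOH", or None when no offset phrase is found."""
    text = clause.strip(" ,;").upper()
    offsets = re.findall(OFFSET, text)
    if len(offsets) >= 2:
        return "FO+"   # orthogonal offsets from a feature
    if len(offsets) == 1:
        return "FOH"   # offset from a feature at a heading
    return None        # F, P, or another type: needs other evidence


for clause in ["LA PEÑITA; 5.5. KM N", "10 MI SW CANAS, ",
               "4.4 MI N, 6.2 MI W SEMINOLE", "near Sedan"]:
    print(repr(clause), "->", classify_offset_clause(clause))
```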

23 Step 3: Interpret Clauses
<FOH>LA PEÑITA; 5.5. KM N</FOH>
Feature: LA PEÑITA
Offset: 5.5
Offset Units: KM
Heading: N
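
A minimal sketch of how an FOH clause might be broken into the four fields shown above. The regular expression and the `interpret_foh` name are assumptions for illustration; whatever text remains after the offset phrase is removed is simply taken to be the feature name.

```python
import re
from typing import Optional

# Hypothetical pattern: number, optional stray period, units, heading.
FOH_OFFSET = re.compile(
    r"(?P<offset>\d+(?:\.\d+)?)\.?\s*"
    r"(?P<units>KM|MI|M|FT)\s+"
    r"(?P<heading>NE|NW|SE|SW|N|S|E|W)\b",
    re.IGNORECASE,
)


def interpret_foh(clause: str) -> Optional[dict]:
    """Split an FOH clause into feature, offset, units, and heading."""
    match = FOH_OFFSET.search(clause)
    if match is None:
        return None
    # Everything outside the offset phrase is treated as the feature name.
    remainder = clause[:match.start()] + clause[match.end():]
    return {
        "feature": remainder.strip(" ,;"),
        "offset": float(match.group("offset")),
        "units": match.group("units").upper(),
        "heading": match.group("heading").upper(),
    }


print(interpret_foh("LA PEÑITA; 5.5. KM N"))
print(interpret_foh("10 MI SW CANAS, "))
```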

24 Step 4: Find Feature Descriptions
<FOH>LA PEÑITA; 5.5. KM N</FOH>
Feature: LA PEÑITA
Offset: 5.5
Offset Units: KM
Heading: N

25 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation
Treat spatial data references as accurate

26 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation
Treat spatial data references as accurate
This is another assumption we should not hold to be true.

27 “Davis, Yolo County, California”

28 “Davis, Yolo County, California”

29 “Davis, Yolo County, California”

30 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation
Treat spatial data references as accurate
Apply rules for spatial description building

31 Step 5: Construct Spatial Description for Each Clause

32 Step 5: Construct Spatial Description for Each Clause
West of B
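
For an offset-at-a-heading clause, one ingredient of the spatial description is the point reached by travelling the offset distance from the feature along the heading. The sketch below computes that point with the standard great-circle destination formula on a spherical Earth. The unit table, bearing table, and `offset_point` name are assumptions for illustration, and the sketch yields only the center point, not the uncertainty that a full spatial description would also carry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation
KM_PER_UNIT = {"KM": 1.0, "MI": 1.609344, "M": 0.001, "FT": 0.0003048}
BEARING_DEG = {"N": 0, "NE": 45, "E": 90, "SE": 135,
               "S": 180, "SW": 225, "W": 270, "NW": 315}


def offset_point(lat, lon, offset, units, heading):
    """Return (lat, lon) in degrees reached from a feature's coordinates."""
    d = offset * KM_PER_UNIT[units] / EARTH_RADIUS_KM  # angular distance
    theta = math.radians(BEARING_DEG[heading])
    phi1, lam1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(math.sin(phi1) * math.cos(d)
                     + math.cos(phi1) * math.sin(d) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(d) * math.cos(phi1),
        math.cos(d) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lam2)


# Placeholder coordinates (0, 0), not a real gazetteer entry.
print(offset_point(0.0, 0.0, 10, "MI", "SW"))
```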

33 Step 6: Construct Final Spatial Interpretation
10 MI SW CANAS, RIO HIGUERON
Clause 1: <FOH>10 MI SW CANAS, </FOH>
Clause 2: <P>RIO HIGUERON</P>

34 Step 6: Construct Final Spatial Interpretation
10 MI SW CANAS, RIO HIGUERON
Clause 1: <FOH>10 MI SW CANAS, </FOH>
Clause 2: <P>RIO HIGUERON</P>
We hold these clauses to be simultaneously true.

35 Step 6: Construct Final Spatial Interpretation
10 MI SW CANAS, RIO HIGUERON
Clause 1: <FOH>10 MI SW CANAS, </FOH>
Clause 2: <P>RIO HIGUERON</P>
We hold these clauses to be simultaneously true.
The final spatial description is the intersection of the spatial descriptions of all clauses.
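
The intersection rule can be sketched with a deliberately simplified geometry: here each clause's spatial description is assumed to be an axis-aligned bounding box, which is an assumption of this example rather than the representation the engine uses. An empty intersection means the clauses cannot all be true at once, so no interpretation survives.

```python
from typing import Iterable, Optional, Tuple

# Assumed representation: (min_lon, min_lat, max_lon, max_lat).
Box = Tuple[float, float, float, float]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    min_lon, min_lat = max(a[0], b[0]), max(a[1], b[1])
    max_lon, max_lat = min(a[2], b[2]), min(a[3], b[3])
    if min_lon > max_lon or min_lat > max_lat:
        return None
    return (min_lon, min_lat, max_lon, max_lat)


def final_footprint(clause_boxes: Iterable[Box]) -> Optional[Box]:
    """Intersect the footprints of all clauses of one locality."""
    boxes = iter(clause_boxes)
    result = next(boxes, None)
    for box in boxes:
        if result is None:
            break  # already empty: the clauses contradict each other
        result = intersect(result, box)
    return result


# Abstract coordinates for illustration, not real places.
offset_clause = (0.0, 0.0, 4.0, 4.0)  # e.g. the FOH clause
path_clause = (2.0, 1.0, 6.0, 3.0)    # e.g. the P clause
print(final_footprint([offset_clause, path_clause]))
```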

36 Legacy Locality Data Issues
Treat locality description as accurate
Treat locality description as complete
Apply rules for locality string interpretation
Treat spatial data references as accurate
Apply rules for spatial description building
Apply criteria to reject unwanted hypotheses

37 Additional Input - Preferences
Assume terrestrial locations
Assume aquatic locations
- marine only
- freshwater only
Assume direct offsets
Assume offsets by road, if possible

38 Output
Original data
Zero, one, or more spatial interpretations
- spatial footprint
- point-radius description
Process metadata
- preferences (e.g., GeoLocate method, assume by road)
- omissions (e.g., unused information)
- confidence values
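
One way to picture this output is as a small record type. The class and field names below are hypothetical, chosen only to mirror the items listed on the slide; they are not BioGeomancer's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpatialInterpretation:
    """One georeferencing hypothesis for a locality (hypothetical fields)."""
    latitude: float                       # point-radius description: center
    longitude: float
    radius_m: float                       # point-radius description: radius
    footprint: Optional[str] = None       # spatial footprint, e.g. as WKT
    confidence: Optional[float] = None    # confidence value, if reported


@dataclass
class GeoreferenceResult:
    """Original data plus zero, one, or more interpretations."""
    original_locality: str
    interpretations: List[SpatialInterpretation] = field(default_factory=list)
    preferences: Dict[str, str] = field(default_factory=dict)
    omissions: List[str] = field(default_factory=list)


# A locality that produced no interpretation is still a valid result.
result = GeoreferenceResult(
    original_locality="near Sedan",
    preferences={"offsets": "assume by road"},
)
print(len(result.interpretations))
```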

39 Conclusion
Georeferences are hypotheses
Hypotheses require testing
Tested hypotheses should be so noted


