When ontology and reality collide:


1 When ontology and reality collide:
The Archaeotools project, faceted classification and natural language processing in an archaeological context.
Stuart Jeffrey, Julian Richards, Fabio Ciravegna, Stewart Waller, Sam Chapman, Ziqi Zhang, Tony Austin.
CAA Budapest, 5th April 2008

2 AHRC-EPSRC-JISC eScience research grants scheme:
PARTNERS: Natural Language Processing Research Group, Department of Computer Science, University of Sheffield
AIM: To allow archaeologists to discover, share and analyse datasets and legacy publications which have hitherto been very difficult to integrate into existing digital frameworks
BUILDS UPON: Common Information Environment (CIE) Enhanced Geospatial Browser
Joint Information Systems Committee

3 Three distinct Workpackages:
Workpackage 1: Advanced faceted classification / geospatial browser; 1m+ records; 4 primary facets (What, Where, When and Media).
Workpackage 2: Natural language processing / data-mining of grey literature, plus tagging.
Workpackage 3: Data-mining of historic literature, plus geoXwalk.
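Workpackage 1 browses a million-plus records through four facets. A faceted index is essentially one inverted index per facet, and a search is the intersection of the record sets for the chosen values. The sketch below is a minimal, hypothetical illustration (toy records, made-up field names), not the project's implementation:

```python
from collections import defaultdict

# Hypothetical toy records: each has one term per facet (What, Where, When, Media).
RECORDS = {
    "r1": {"what": "BARROW", "where": "North Yorkshire", "when": "BRONZE AGE", "media": "Text"},
    "r2": {"what": "BARROW", "where": "Wiltshire", "when": "NEOLITHIC", "media": "Image"},
    "r3": {"what": "VILLA", "where": "North Yorkshire", "when": "ROMAN", "media": "Text"},
}


def build_index(records):
    """Build one inverted index per facet: facet -> value -> set of record ids."""
    index = defaultdict(lambda: defaultdict(set))
    for rid, facets in records.items():
        for facet, value in facets.items():
            index[facet][value].add(rid)
    return index


def search(index, all_ids, **selected):
    """Return the ids matching every selected facet value (set intersection)."""
    result = set(all_ids)
    for facet, value in selected.items():
        result &= index[facet].get(value, set())
    return result


def facet_counts(index, facet, candidates):
    """Count, for each value of one facet, how many of the current hits carry it."""
    counts = {}
    for value, ids in index[facet].items():
        n = len(ids & candidates)
        if n:
            counts[value] = n
    return counts


idx = build_index(RECORDS)
yorkshire = search(idx, RECORDS, where="North Yorkshire")
print(sorted(yorkshire))                      # ['r1', 'r3']
print(facet_counts(idx, "when", yorkshire))   # {'BRONZE AGE': 1, 'ROMAN': 1}
```

The per-value counts are what a faceted browser would display next to each option, so users see how many records remain before they click.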

4 Datasets include:
National Monuments Records (Scotland, Wales, England)
Excavation Index (EH)
Archive Holdings
Local Authority Historic Environment Records
Thesauri include:
Thesaurus of Monument Types (TMT)
Thesaurus of Object Types
MIDAS Period list
UK Government list of administrative areas: County, District, Parish (CDP) (not MIDAS)
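Thesauri such as TMT organise terms hierarchically, which is what lets a faceted browser drill down from broad to narrow categories. A minimal sketch, assuming a thesaurus loaded as a hypothetical term-to-broader-term mapping (the entries below are invented for illustration, not copied from TMT): each record is indexed under its own term and every broader term, and a term missing from the thesaurus raises an error because the record cannot be placed in the hierarchy.

```python
# Hypothetical fragment of a monument-type hierarchy (term -> broader term).
# Illustrative only; not copied from the real Thesaurus of Monument Types.
BROADER = {
    "ROUND BARROW": "BARROW",
    "LONG BARROW": "BARROW",
    "BARROW": "FUNERARY SITE",
    "FUNERARY SITE": None,  # top term: no broader term
}


def ancestors(term, broader=BROADER):
    """Return [term, broader term, ..., top term].

    Raises KeyError when the term is absent from the thesaurus, i.e. the
    record cannot be placed in the facet hierarchy.
    """
    chain, current = [], term
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in thesaurus at {current!r}")
        chain.append(current)
        current = broader[current]
    return chain


def index_with_broader_terms(record_terms, broader=BROADER):
    """Index each record under its own term and every broader term, so that
    selecting a broad facet value also returns records with narrower terms."""
    index = {}
    for rid, term in record_terms.items():
        for t in ancestors(term, broader):
            index.setdefault(t, set()).add(rid)
    return index


idx = index_with_broader_terms({"r1": "ROUND BARROW", "r2": "LONG BARROW"})
print(sorted(idx["BARROW"]))  # ['r1', 'r2']
```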

5 System diagram (components): Oracle RDBMS; MIDAS XML records; RDF resources; XML documents of thesauri; information extraction; When, Where, What ontologies as entries to the faceted index; knowledge triple store; query; user interface.
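The diagram on slide 5 shows MIDAS XML records being turned into RDF resources for a knowledge triple store. The sketch below shows one plausible shape for that step, serialising a flat XML record as N-Triples with the standard library. The XML element names, the `id` attribute and both URI prefixes are hypothetical; the slides do not give the real MIDAS XML schema or the project's RDF vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs and element names; the slides do not give the real
# MIDAS XML schema or the project's RDF vocabulary.
BASE = "http://example.org/record/"
VOCAB = "http://example.org/facet#"


def parse_record(xml_text):
    """Read a flat <record id="..."> element into (id, {child tag: text})."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return root.get("id"), fields


def literal(value):
    """Serialise a string as an N-Triples literal (escape backslashes, quotes, newlines)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(xml_text):
    """Emit one N-Triples line per non-empty field of the record."""
    record_id, fields = parse_record(xml_text)
    subject = f"<{BASE}{record_id}>"
    return [
        f"{subject} <{VOCAB}{tag}> {literal(text)} ."
        for tag, text in sorted(fields.items())
        if text
    ]


xml_record = '<record id="evan-howe"><where>North Yorkshire</where><what>BARROW</what></record>'
for line in record_to_ntriples(xml_record):
    print(line)
```

A production pipeline would more likely point each facet value at a thesaurus concept URI rather than a plain literal, so that the triple store can follow broader/narrower links.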


7 Search Demo 1: Click to zoom in to England

8 Search Demo 1: Click to choose ‘Results’ tab

9 Search Demo 1: Click to view ‘EVAN HOWE, North Yorkshire’ record.

10 Search Demo 1: Click RESET to go back to CIE root slide

11 “WHAT”
Records that have no subject information: 19,269 of 1,001,407 records (2%)
Records that use terms not found in TMT, so cannot be indexed (6,442 unique terms): 101,507 of 1,001,407 records (10.1%)
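The “WHAT”, “WHEN” and “WHERE” figures sort every record into three outcomes: no value at all, a value the controlled vocabulary cannot resolve, or an indexable value. A minimal sketch of that audit, assuming exact matching after trimming and upper-casing against a vocabulary stored in upper case (the real matching rules are not stated on the slides):

```python
from collections import Counter


def classify(term, vocabulary):
    """'missing' if the record has no term, 'indexed' if the term is in the
    controlled vocabulary, otherwise 'unresolvable'."""
    if term is None or not term.strip():
        return "missing"
    return "indexed" if term.strip().upper() in vocabulary else "unresolvable"


def coverage(terms, vocabulary):
    """Return {category: (count, percentage of all records)} plus the set of
    distinct unresolvable terms."""
    labels = [classify(t, vocabulary) for t in terms]
    counts = Counter(labels)
    total = len(terms)
    summary = {
        k: (counts[k], round(100 * counts[k] / total, 1))
        for k in ("missing", "unresolvable", "indexed")
    }
    unresolved = {t.strip().upper() for t, k in zip(terms, labels) if k == "unresolvable"}
    return summary, unresolved


# The slide's own figures, recomputed to one decimal place:
print(round(100 * 19_269 / 1_001_407, 1))   # 1.9 (shown rounded as 2% on the slide)
print(round(100 * 101_507 / 1_001_407, 1))  # 10.1
```

The distinct-term set corresponds to the "unique terms" count on the slide: it is the list a curator would review to extend the thesaurus or add synonym mappings.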

12 “WHEN”
Records that have no temporal information: 292,793 of 1,001,407 records (29.2%)
Records that use period terms not found in MIDAS, so cannot be indexed (457 types of irresolvable dates): 114,505 of 1,001,407 records (11.4%)
e.g. 1066, 11th Centuary, C11, 11C, Eleventh Century
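The examples on this slide are different ways of writing a date in (or as) the eleventh century, none of which matches a MIDAS period term directly. A sketch of one way to normalise such strings to a century number with regular expressions; the patterns are assumptions written for these examples only (AD dates, no ranges or BC), and mapping the resulting century onto a MIDAS period is not shown. Only one or two digits are accepted after a leading "C", so "c 1066" (circa) is not misread as a century.

```python
import re

_ORDINAL_WORDS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth", "twentieth", "twenty-first",
]
ORDINALS = {w: i for i, w in enumerate(_ORDINAL_WORDS, start=1)}

YEAR = re.compile(r"^(\d{3,4})$")                        # "1066"
NUMERIC_CENTURY = re.compile(                            # "11th Century", "11C"
    r"^(\d{1,2})(?:st|nd|rd|th)?\s*(?:c|cent\w*)\.?$", re.IGNORECASE)
PREFIX_CENTURY = re.compile(r"^c\.?\s*(\d{1,2})$", re.IGNORECASE)  # "C11"
WORD_CENTURY = re.compile(                               # "Eleventh Century"
    r"^([a-z]+(?:-[a-z]+)?)\s+cent\w*$", re.IGNORECASE)


def to_century(text):
    """Return the AD century a date string refers to, or None if unrecognised.

    `cent\\w*` deliberately tolerates misspellings such as "Centuary".
    """
    s = text.strip()
    if m := YEAR.match(s):
        return (int(m.group(1)) - 1) // 100 + 1  # AD 1001-1100 -> 11
    for pattern in (NUMERIC_CENTURY, PREFIX_CENTURY):
        if m := pattern.match(s):
            return int(m.group(1))
    if m := WORD_CENTURY.match(s):
        return ORDINALS.get(m.group(1).lower())
    return None


for variant in ["1066", "11th Centuary", "C11", "11C", "Eleventh Century"]:
    print(variant, to_century(variant))  # each prints 11
```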

13 “WHERE”
Records that have no spatial information: 11,126 of 1,001,407 records (1.1%)
Records that use terms not found in CDP, so cannot be indexed: 245,601 of 1,001,407 records (24.5%)
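Many place terms fail against CDP only because of spelling or formatting variation. A sketch of a normalise-then-fuzzy-match lookup using the standard library's `difflib.get_close_matches`; the place list is a handful of illustrative names, not an extract of the CDP list, and the normalisation rules and the 0.85 cutoff are assumptions.

```python
import difflib
import re

# Illustrative place names only; not an extract of the real CDP list.
CDP_PARISHES = ["Kirkby Malzeard", "Stokesley", "St Ives", "Great Ayton"]


def normalise(name):
    """Lower-case, unify 'Saint'/'St.' to 'st', drop punctuation, collapse spaces."""
    s = name.lower().strip()
    s = re.sub(r"\bsaint\b|\bst\.", "st", s)
    s = re.sub(r"[^a-z0-9 ]+", " ", s)
    return re.sub(r"\s+", " ", s).strip()


def match_place(name, gazetteer=CDP_PARISHES, cutoff=0.85):
    """Return the gazetteer entry for `name`: exact match after normalisation
    first, then the closest fuzzy match above `cutoff`, else None."""
    lookup = {normalise(p): p for p in gazetteer}
    key = normalise(name)
    if key in lookup:
        return lookup[key]
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


print(match_place("Saint Ives"))  # St Ives
print(match_place("Stokesly"))    # Stokesley
```

A high cutoff keeps false positives down; in practice fuzzy hits would probably be queued for a curator to confirm rather than indexed automatically.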


18 linear

19 Three distinct Workpackages:
Workpackage 1: Advanced faceted classification / geospatial browser; 1m+ records; 4 primary facets (What, Where, When and Media).
Workpackage 2: Natural language processing / data-mining of grey literature, plus tagging.
Workpackage 3: Data-mining of historic literature, plus geoXwalk.

20 XML tagging of semantic content
CIDOC CRM
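To make "XML tagging of semantic content" concrete, the sketch below wraps known terms in a text with inline XML elements named after CIDOC CRM classes (E4 Period, E53 Place, E55 Type). It is a plain dictionary-lookup tagger standing in for the project's NLP pipeline, which the slides do not describe; the lexicon and the element names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lexicon mapping lower-case terms to element names modelled on
# CIDOC CRM classes; a real pipeline would draw terms from the thesauri.
LEXICON = {
    "round barrow": "E55_Type",
    "barrow": "E55_Type",
    "bronze age": "E4_Period",
    "north yorkshire": "E53_Place",
}


def tag(text, lexicon=LEXICON):
    """Return `text` as an XML string with lexicon terms wrapped in elements."""
    # Longest terms first, so "round barrow" is preferred over "barrow".
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))  # untagged text, XML-escaped
        element = lexicon[m.group(1).lower()]
        out.append(f"<{element}>{escape(m.group(1))}</{element}>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "<doc>" + "".join(out) + "</doc>"


print(tag("A round barrow of Bronze Age date, North Yorkshire & beyond."))
```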

21 University researchers
Local authority curators


25 Three distinct Workpackages:
Workpackage 1: Advanced faceted classification / geospatial browser; 1m+ records; 4 primary facets (What, Where, When and Media).
Workpackage 2: Natural language processing / data-mining of grey literature, plus tagging.
Workpackage 3: Data-mining of historic literature, plus geoXwalk.

