Knowledge Organization Systems and Information Discovery Douglas Tudhope Inaugural Lecture.


1 Knowledge Organization Systems and Information Discovery Douglas Tudhope Inaugural Lecture

2 Acknowledgements Research team members and collaborators: Ceri Binding (University of Glamorgan); Andreas Vlachidis (University of Glamorgan); Keith May, English Heritage (EH); Stuart Jeffrey, Julian Richards, Archaeology Data Service (ADS), Archaeology Department, University of York

3 Collaborative acknowledgements Harith Alani, Steve Harris, Paul Beynon-Davies, Traugott Koch, Dorothee Block, Marianne Lykke, Daniel Cunliffe, Brian Matthews, Emlyn Everitt, Stuart Lewis, Kora Golub, Hugh Mackay, Rachel Heery, Jim Moon, Chris Jones, Renato Souza, Iolo Jones, Carl Taylor

4 Information Discovery Literal string match (eg Google) is good for some kinds of searches: specific concrete topics where all we want are some relevant results and we do not care how many we miss! Google is less good at more conceptual (re)search topics where it is important to be sure we have not missed anything important, eg medical, legal, scholarly research. Searching data and documents is a recent general research focus, variously termed eScience, Digital Humanities, Cyberinfrastructure. data.gov.uk is a recent initiative for government data.

5 Words are tricky! "When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean--neither more nor less." (Lewis Carroll) Various potential problems with literal string search: different words mean the same thing; the same word means different things; trivial spelling differences can affect results, as can a particular choice of synonym or a slightly different perspective in choice of concept. How to address this issue?
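The problems listed on this slide can be seen in a few lines of Python. This is a minimal sketch only: the captions and the `literal_search` helper are invented for illustration and are not taken from any of the collections discussed in the lecture.

```python
# Hypothetical mini-collection; every caption here is invented for illustration.
records = [
    "Refuse heap beside the north wall",
    "Midden deposit with oyster shell",
    "Rubbish pit cut into the ditch fill",
    "Colour photograph of a hearth",
    "Color photograph of a kiln",
]


def literal_search(query, docs):
    """Return the documents containing the query as a case-insensitive substring."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


# Different words, same thing: "refuse heap" and "midden" are missed.
rubbish_hits = literal_search("rubbish", records)

# Trivial spelling difference: the "Color" caption is missed.
colour_hits = literal_search("colour", records)

print(rubbish_hits)
print(colour_hits)
```

Each query returns only the caption that happens to use the searcher's own wording, which is the gap that the rest of the lecture tries to close.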

6 This lecture Brief look at the history of work on this topic at Glamorgan Examples from recent AHRC funded research on cross search of different archaeological datasets and reports - try to give a general flavour Discuss some current research issues

7 This lecture Part of a general move towards a (more) machine understandable Web

8 Machine readable vs machine understandable What we say to the machine: The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins What the machine understands: 

9 (More) machine understandable What we say to the machine: Title: The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins What the machine understands: 

10 (More) machine understandable What we say to the machine: Title: The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins What the machine understands: Book ID Author Publisher --- conceptual structure (ontology)

11 (More) machine understandable What we say to the machine: Title: The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins What the machine understands: Book ID Author Publisher --- conceptual structure (ontology) --- vocabularies for terminology and knowledge organization Theodor Geisel
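The step from machine readable to (more) machine understandable can be sketched in code. The example below is an illustration under stated assumptions: the field list, the `parse_record` helper and the one-entry `NAME_AUTHORITY` table are hypothetical stand-ins for a conceptual structure and a name vocabulary, not any real system.

```python
import re

# Assumed conceptual structure: the fields a "Book" record may carry.
FIELDS = ("Title", "ISBN", "Author", "Publisher")

# Assumed name vocabulary linking a pen name to the person behind it.
NAME_AUTHORITY = {"Dr. Seuss": "Theodor Geisel"}


def parse_record(text):
    """Split a flat string into labelled fields, then consult the vocabulary."""
    pattern = r"\b(%s):\s*" % "|".join(FIELDS)
    parts = re.split(pattern, text)
    # parts alternates: [leading text, label, value, label, value, ...]
    record = {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    author = record.get("Author")
    if author in NAME_AUTHORITY:
        record["AuthorRealName"] = NAME_AUTHORITY[author]
    return record


labelled = parse_record(
    "Title: The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins"
)
unlabelled = parse_record(
    "The Cat in the Hat ISBN: 0007158440 Author: Dr. Seuss Publisher: Collins"
)

print(labelled)
print("Title" in unlabelled)
```

Without the `Title:` label the program cannot tell that the leading words are a title, and without the vocabulary it cannot connect the pen name to Theodor Geisel.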

12 Knowledge Organization Systems eg classifications, thesauri and ontologies help semantic interoperability Reduce ambiguity by defining terms and providing synonyms Organise concepts via semantic relationships

13 Knowledge Organization Systems - classifications, thesauri and ontologies help semantic interoperability Reduce ambiguity by defining terms and providing synonyms Organise concepts via semantic relationships EH Monuments Type Thesaurus

14 Knowledge Organization Systems - classifications, thesauri and ontologies help semantic interoperability Reduce ambiguity by defining terms and providing synonyms Organise concepts via semantic relationships EH Monuments Type Thesaurus

15 Origins of research Polytechnic of Wales Research Assistantship (collaborating with Paul Beynon-Davies, Chris Jones - Carl Taylor’s PhD) Experimental museum exhibit Extract of collections database - Pontypridd Historical and Cultural Centre

16 Origins of research Polytechnic of Wales Research Assistantship (collaborating with Paul Beynon-Davies, Chris Jones - Carl Taylor’s PhD) Experimental museum exhibit Extract of collections database - Pontypridd Historical and Cultural Centre Hard to generalise and maintain if based on manual linking of information -> dynamic implicit links In this case based on Social History and Industrial Classification (SHIC) and indexing for place, time period

17 Indexing on subject, period, place

18 Similar or different?

19

20 FACET - Faceted Access to Cultural hEritage Terminology Subsequent EPSRC funded project with Science Museum, National Railway Museum and J. Paul Getty Trust - Art & Architecture Thesaurus (AAT) Aims: Integration of thesaurus into user interface Semantic query expansion

21 FACET research question “The major problem lies in developing a system whereby individual parts of subject headings containing multiple AAT terms are broken apart, individually exploded hierarchically, and then reintegrated to answer a query with relevance” (Toni Petersen, AAT Director) Example Query: mahogany, dark yellow, brocading, Edwardian, armchair for National Railway Museum collection - eg royal carriage
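The three steps in the quotation (break the terms apart, explode each one hierarchically, reintegrate with relevance) can be sketched as follows. This is not the FACET algorithm: the tiny `NARROWER` hierarchy, the `decay` weighting and the averaging in `score` are all assumptions chosen to keep the illustration short, and the hierarchy is not an extract of the AAT.

```python
# Hypothetical fragment of a thesaurus hierarchy: broader term -> narrower terms.
NARROWER = {
    "wood": ["mahogany", "oak"],
    "chair": ["armchair", "side chair"],
    "yellow": ["dark yellow"],
}


def explode(term, decay=0.5):
    """Map a term and all its narrower terms to a weight that shrinks with depth."""
    weights = {term: 1.0}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child in NARROWER.get(current, []):
            if child not in weights:
                weights[child] = weights[current] * decay
                frontier.append(child)
    return weights


def score(query_terms, item_terms):
    """Average, over the query terms, of the best expanded match among the item's terms."""
    total = 0.0
    for q in query_terms:
        expanded = explode(q)
        total += max((expanded.get(t, 0.0) for t in item_terms), default=0.0)
    return total / len(query_terms)


# Invented collection items, each indexed with several thesaurus terms.
items = {
    "A": ["mahogany", "armchair"],
    "B": ["oak", "side chair"],
    "C": ["brass", "lamp"],
}
query = ["wood", "armchair"]
ranking = sorted(items, key=lambda k: score(query, items[k]), reverse=True)
print(ranking)
```

Item A matches both parts of the query (one exactly, one through a narrower term), item B matches only one part through expansion, and item C matches nothing, so results are ranked by closeness of match.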

22 FACET Web Demonstrator - Semantic Query Expansion

23 FACET Web Demonstrator - how to generalise? FACET - more sophisticated search but still a single database How to generalise to multiple datasets and thesauri? How to connect with text documents?

24 STAR Semantic Technologies for Archaeological Resources AHRC funded project(s) with English Heritage and the ADS Generalise previous methods to: different datasets with different structures; reports of excavations. ADS OASIS Grey Literature Library (unpublished reports). OASIS: Online AccesS to the Index of archaeological investigationS

25 STAR Semantic Technologies for Archaeological Resources Currently excavation datasets isolated with different terminology systems Currently no connection with grey literature excavation reports Aims Cross search at a conceptual level archaeological datasets with associated grey literature

26 STAR Semantic Technologies for Archaeological Resources Need for integrating conceptual framework and terminology control via thesauri and glossaries EH (Keith May) designed an ontology describing the archaeological process

27 The archaeological process Events in the present and events in the past, related by the place in which they occur and the physical remains in that place Activities in the present investigate the remains of the past (affecting them in the process)

28 Events in the present Excavation // Drawing and Photography Survey // Sampling Treatments and Processing Classification // Grouping and Phasing Measuring including scientific dating Recording of observations Dissemination // Interpretation // Analysis

29 Events in the past have results in the present Events shaping natural environment geological, environmental and biological processes

30 Events in the past have results in the present Events shaping natural environment geological, environmental and biological processes Events concerned with object production, disposal or loss (how ‘finds’ produced and later deposited in archaeological context)

31 Events in the past have results in the present Events shaping natural environment geological, environmental and biological processes Events concerned with object production, disposal or loss (how ‘finds’ produced and later deposited in archaeological context) Construction, modification and destruction events relating to human buildings

32 Events in the past have results in the present Conceptual framework to model these archaeological events (an EH extension of a standard cultural heritage ontology) Need to move beyond simple Who – What – Where – When model typically used in state of the art cultural heritage databases

33 Typical ‘Advanced Search’ model - does not deal with events Typical Who - What - Where - When advanced search user interface (search form: Who, What, Where, When fields, each combined with and/or, returning Resources)

34 Typical ‘Advanced Search’ limitations Typical Who - What - Where - When model - needs more semantics (same search form) Archaeological ‘find’ (eg coin) Archaeological ‘context’ (eg hearth)

35 Typical ‘Advanced Search’ limitations Need to define relationships between entities and allow multiple connections (same search form) Archaeological ‘find’ (eg coin) Archaeological ‘context’ (eg hearth) When photo was taken? When ‘find’ originally made? When ‘find’ deposited?

36 Typical ‘Advanced Search’ limitations Assigning dates and classifying are important ‘events’ in the present - outcomes of the archaeological process (interpretations can differ) (same search form) Who made dating judgment? Archaeological ‘find’ (eg coin) Archaeological ‘context’ (eg hearth) When photo was taken? When ‘find’ originally made? When ‘find’ deposited?

37 Broader conceptual framework (ontology) Modeling multiple interpretations – linked to underlying data within the ontology -> ‘multivocality’ in archaeology (search form: Who, What, Where, When fields, each combined with and/or, returning Resources) Who made dating judgment? Archaeological ‘find’ (eg coin) Archaeological ‘context’ (eg hearth) When photo was taken? When ‘find’ originally made? When ‘find’ deposited?

38 Broader conceptual framework (ontology) EH extension of CIDOC Conceptual Reference Model (CRM) explicit modelling of archaeological events – complicated!
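Why a single "When" field is not enough can be shown with a small data structure. The sketch below is only loosely inspired by the event-based idea on these slides: the `Dating` and `Find` classes, their field names and all the values are hypothetical, and they are not classes or properties of the CIDOC CRM or of the EH extension.

```python
from dataclasses import dataclass, field


@dataclass
class Dating:
    """One dating judgment: who assigned which period to which event."""
    event: str        # the event being dated, e.g. "production" or "deposition"
    period: str
    assigned_by: str


@dataclass
class Find:
    label: str
    context: str
    datings: list = field(default_factory=list)

    def when(self, event):
        """All recorded answers to 'when?' for one kind of event."""
        return [(d.period, d.assigned_by) for d in self.datings if d.event == event]


# Invented example: one coin, several events, two differing interpretations.
coin = Find(label="coin", context="hearth")
coin.datings += [
    Dating("production", "Roman", "Specialist A"),
    Dating("deposition", "Late Roman", "Specialist A"),
    Dating("deposition", "Early medieval", "Specialist B"),
]

print(coin.when("production"))
print(coin.when("deposition"))
```

Asking "when?" now requires saying which event is meant, and the answer can carry more than one interpretation, each attributed to whoever made the judgment.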

39 STAR general architecture (diagram) STAR client applications: Windows applications, browser components (full text search, browse concept space, navigate via expansion, cross search archaeological datasets). STAR web services. STAR datasets (expressed in terms of CRM): EH thesauri and CRM ontology, archaeological datasets (CRM), grey literature indexing (CRM).

40 Natural Language Processing (NLP) of archaeological grey literature Extract key concepts in same semantic representation as for data. Allows unified searching of different datasets and grey literature in terms of same underlying conceptual structure “ditch containing prehistoric pottery dating to the Late Bronze Age”
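A very rough idea of concept extraction from the example phrase is given below. It is a simple dictionary (gazetteer) lookup written for illustration, not the NLP pipeline used in STAR: the `GAZETTEER` entries and the type names `Context`, `Find` and `Period` are assumptions.

```python
import re

# Hypothetical gazetteer: surface term -> concept type.
GAZETTEER = {
    "ditch": "Context",
    "hearth": "Context",
    "pottery": "Find",
    "coin": "Find",
    "prehistoric": "Period",
    "late bronze age": "Period",
    "bronze age": "Period",
}


def annotate(text):
    """Return (matched text, concept type) pairs in order of appearance."""
    # Longest terms first so that "late bronze age" wins over "bronze age".
    terms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(%s)\b" % "|".join(re.escape(t) for t in terms), re.IGNORECASE
    )
    return [(m.group(0), GAZETTEER[m.group(0).lower()]) for m in pattern.finditer(text)]


phrase = "ditch containing prehistoric pottery dating to the Late Bronze Age"
annotations = annotate(phrase)
print(annotations)
```

Once a report phrase has been reduced to typed concepts like these, it can be searched in the same terms as the excavation data.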

41 NLP output – what the machine sees!

42 STAR Demonstrator – search for a conceptual pattern An Internet Archaeology publication on one of the (Silchester Roman) datasets we used in STAR discusses the finding of a coin within a hearth. -- does the same thing occur in any of the grey literature reports? Requires comparison of extracted data with NLP indexing in terms of the ontology.

43 STAR Demonstrator – search for a conceptual pattern Research paper reports finding a coin in hearth – exist elsewhere?
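Searching for a conceptual pattern such as "a coin found in a hearth" across both data and indexed reports can be sketched with simple subject-predicate-object statements. Everything here is hypothetical: the identifiers, the predicate names `type`, `found_in` and `source`, and the `finds_in_context` helper are invented for illustration and do not reflect the STAR data or the CRM property names.

```python
# Invented statements from two kinds of source, in one shared representation.
triples = [
    ("d:find1", "type", "coin"),
    ("d:ctx1", "type", "hearth"),
    ("d:find1", "found_in", "d:ctx1"),
    ("d:find1", "source", "excavation dataset"),
    ("r:find7", "type", "coin"),
    ("r:ctx3", "type", "hearth"),
    ("r:find7", "found_in", "r:ctx3"),
    ("r:find7", "source", "grey literature report"),
    ("r:find8", "type", "pottery"),
    ("r:ctx4", "type", "ditch"),
    ("r:find8", "found_in", "r:ctx4"),
    ("r:find8", "source", "grey literature report"),
]


def finds_in_context(statements, find_type, context_type):
    """Return (source, find, context) for every find of one type in a context of another."""
    types = {s: o for s, p, o in statements if p == "type"}
    sources = {s: o for s, p, o in statements if p == "source"}
    return sorted(
        (sources.get(s, "unknown"), s, o)
        for s, p, o in statements
        if p == "found_in"
        and types.get(s) == find_type
        and types.get(o) == context_type
    )


matches = finds_in_context(triples, "coin", "hearth")
print(matches)
```

Because the dataset record and the report annotation share one structure, a single query returns matches from both.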

44 Current issues and goals a) Apply research outcomes in practice (knowledge transfer): semantic terminology services, ‘rubbish example’ using the ADS Archaeology Image Bank b) NLP challenges: negation! Negative findings? c) Multivocality in archaeology: broader picture of the research issues

45 Archaeology is rubbish! Google search for archaeology rubbish

46 ADS Archaeology Image Bank Example No results when searching for rubbish or refuse – what to do?

47 STAR Semantic Terminology Services - concept expansion (as web service) -> midden

48 MIDDEN n dunghill, refuse heap midden dunghill, compost heap, refuse heap,... muddle, mess... dirty slovenly person... midden mavis or midden raker --- searchers of refuse heaps (Concise Scots dictionary - Mairi Robinson, Scottish National Dictionary Association)

49 ADS Archaeology Image Bank Example No results when searching for rubbish or refuse – try midden!
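The rubbish example can be sketched as a concept expansion step placed in front of an ordinary text search. This is an assumption-laden illustration, not the STAR terminology web service: the `CONCEPTS` entry (including the choice of `rubbish` and `refuse` as entry terms for `midden`), the captions and both helper functions are invented.

```python
# Hypothetical thesaurus entry: a preferred term with its entry terms.
CONCEPTS = {
    "midden": {
        "preferred": "midden",
        "entry_terms": ["rubbish", "refuse", "refuse heap", "dunghill"],
    },
}


def expand(query):
    """Return the query plus every label of any concept the query belongs to."""
    q = query.lower()
    terms = {q}
    for concept in CONCEPTS.values():
        labels = {concept["preferred"], *concept["entry_terms"]}
        if q in labels:
            terms |= labels
    return terms


def search(query, captions, use_expansion=True):
    """Substring search over captions, optionally after concept expansion."""
    terms = expand(query) if use_expansion else {query.lower()}
    return [c for c in captions if any(t in c.lower() for t in terms)]


# Invented image captions.
captions = ["Shell midden in section", "Hearth with burnt stones"]

literal = search("rubbish", captions, use_expansion=False)
expanded = search("rubbish", captions)
print(literal)
print(expanded)
```

The literal search finds nothing, while the expanded search reaches the caption indexed under the specialist term.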

50 NLP challenges – not just negation detection

51 NLP challenges – need for negative findings!
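The basic negation problem can be illustrated with a cue-word check in the spirit of rule-based negation detectors. This is a deliberately naive sketch: the cue list, the five-word window and the `is_negated` helper are assumptions, and, as the slide titles suggest, real reports need more than this.

```python
import re

# Assumed negation cues; real systems use much longer, curated lists.
NEGATION_CUES = ("no", "not", "without", "absence of", "lack of")


def is_negated(sentence, term, window=5):
    """True if a negation cue occurs within `window` words before the term."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term_tokens = term.lower().split()
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            preceding = " ".join(tokens[max(0, i - window):i])
            if any(re.search(r"\b%s\b" % re.escape(cue), preceding) for cue in NEGATION_CUES):
                return True
    return False


negative = is_negated("No evidence of Roman pottery was found", "pottery")
positive = is_negated("The ditch contained notable pottery", "pottery")
print(negative, positive)
```

An indexer that ignored the cue would record the first sentence as a pottery find. Treating it as a negative finding keeps that mention out of the positive results while still recording that pottery was looked for and not found.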

52 Archaeologists have to plan for the future “Research excavations, therefore, must be planned for posterity, eschewing the quick answer and setting up a framework of excavation and recording which can be handed over, extended, modified and improved over decades and in some cases, centuries.” Techniques of Archaeological Excavation, Philip Barker (1993) Archaeology in particular lends itself to the reuse of (excavation) data Connect interpretations with the underlying data Revisit previous archaeological interpretations and findings - excavations inevitably based on a limited sample

53 Archaeological Multivocality - more voices involved than just original project team? Expose (invisible) datasets for wider analysis and reuse Meta studies comparing different excavation projects Connect datasets and wider grey literature – look for wider patterns Open up a broader range of research questions that might be answered when we connect currently isolated excavation datasets Allow different communities to share data and expertise

54 Words are tricky! We should have a great many fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves. (John Locke) Emergent classification? – an outcome of the archaeological process - both constructing and constraining the world Map between different classifications and glossaries rather than one imposed standard?

55 Words are tricky! Words are not as satisfactory as we should like them to be, but, like our neighbours, we have got to live with them and must make the best and not the worst of them. (Samuel Butler) Major issues remain but knowledge organization systems offer some current assistance for moving beyond literal string search and making the best of the words we have to use

