
Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel, Department of Library and Information Studies, University at Buffalo


1 Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel, Department of Library and Information Studies, University at Buffalo. Cologne Conference on Interoperability and Semantics in Knowledge Organization, Cologne University of Applied Sciences, Institute of Information Management (IIM), July 19, 2010

2 Mapping through a Hub
Hub: Water transport; Inland water transport; Ocean transport; Traffic station Water transport; Traffic station Inland water tr.; Traffic station Ocean transport
Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; Ocean transportation; Inland waterway tr. > Ports; Ports
LCSH: Shipping; Inland water transport; Merchant marine; Harbors
German: Hafen

3 Outline. Objective: Interoperability Plus. KOS concept hub: canonical expressions. Examples: knowledge base and applications. Implementation: canonical expressions local, hub global; knowledge-based, computer-assisted creation of canonical expressions to represent concepts; crowdsourcing. Cross-language mapping and shades of meaning. Conclusion.

4 Objective: Improve semantic-based search across multiple collections in multiple languages. Interoperability between any two participating KOS (Knowledge Organization Systems). Support for search, esp. facet-based search, for any collection indexed by a participating KOS, and for search based on free text or free-form social tagging. Assistance in cataloging (metadata creation) by catalogers or users (social tagging). Long-range goal: a Web service where a KOS can be uploaded and mappings to specified target KOS are returned.

5 KOS Concept Hub. Interoperability is achieved by representing concepts from all participating KOS through canonical expressions, such as a description logic formula using atomic concepts and relationships. The backbone of the proposed system is an extensible faceted core classification of atomic concepts together with a set of relationships. Mapping from KOS to KOS is achieved by reasoning over these canonical expressions.
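A minimal sketch of what "reasoning over canonical expressions" could look like, under a strong simplifying assumption: each canonical expression is reduced to a plain set of atomic concept codes, and subsumption is approximated by set inclusion. The talk itself speaks of description logic formulas, so this is an illustration rather than the proposed system. The class labels and codes are taken from the LCC and LCSH example slides; the names `relate` and `map_kos` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A canonical expression, simplified to a set of atomic concept codes.
Expr = FrozenSet[str]

# Expressions copied from the example slides ("Underwater" has no code there, so it is omitted).
LCC: Dict[str, Expr] = {
    "TL681.S6 Airplanes. Soundproofing": frozenset({"L17", "L33", "R37"}),
    "VM367.S6 Submarines. Soundproofing": frozenset({"L17", "L37", "R37", "T73"}),
    "TH1725 Soundproof construction": frozenset({"P23", "P43", "R37"}),
}
LCSH: Dict[str, Expr] = {
    "Aeroplanes-Soundproofing": frozenset({"L17", "L33", "R37"}),
    "Ships-Soundproofing": frozenset({"L17", "L37", "R37"}),
    "Buildings-Soundproofing": frozenset({"P23", "P43", "R37"}),
}


def relate(source: Expr, target: Expr) -> str:
    """Describe the target relative to the source: exact, broader, narrower, or none."""
    if source == target:
        return "exact"
    if target < source:   # target uses fewer atomic concepts, so it is more general
        return "broader"
    if source < target:   # target adds atomic concepts, so it is more specific
        return "narrower"
    return "none"


def map_kos(source_kos: Dict[str, Expr],
            target_kos: Dict[str, Expr]) -> Dict[str, List[Tuple[str, str]]]:
    """For every source class, list the related target classes with their relation."""
    result: Dict[str, List[Tuple[str, str]]] = {}
    for s_term, s_expr in source_kos.items():
        hits = []
        for t_term, t_expr in target_kos.items():
            relation = relate(s_expr, t_expr)
            if relation != "none":
                hits.append((t_term, relation))
        result[s_term] = hits
    return result


if __name__ == "__main__":
    for term, hits in map_kos(LCC, LCSH).items():
        print(term, "->", hits)
```

Neither KOS is compared with the other directly: both are expressed in the same atomic concepts, and the mapping falls out of comparing those expressions.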


7 Mapping through a Hub
Hub: Traffic station; Vehicle parking; Terminal facilities; Water transport; Inland water transport; Ocean transport; Traffic station Water transport. By type of water transport: Traffic station Inland water tr.; Traffic station Ocean transport. By component of traffic station: Vehicle parking Water transport; Terminal facilities Water transport
Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; Ocean transportation; Inland waterway tr. > Ports; Ports
LCSH/AAT: Shipping; water transport; Inland water transport; Merchant marine; Harbors; ports; harbors


9 Examples from the Library of Congress Classification and the LC Subject Headings

10 Core Classification (faceted classification)
L00 Transportation and traffic
L10 Traffic system components
L13 Traffic facilities
L15 Traffic stations
L17 Vehicles
L30 Modes of transportation
L33 Air transport
L37 Water transport
P00 Buildings, construction
P23 Buildings
P27 Architecture
P43 Construction
R00 Engineering
R30 Acoustics
R37 Soundproofing
T70 Military vs. civilian
T73 Military
T77 Civilian

11 LCC classes and their canonical expressions
HE Ports, harbors, docks, wharves, etc. = L15 Traffic stations L37 Water transport T77 Civilian
NA2800 Architectural acoustics = P27 Architecture R30 Acoustics
NA Airport buildings = L15 Traffic stations L33 Air transport P23 Buildings T77 Civilian
NA6330 Dock buildings, ferry houses, etc. = L15 Traffic stations L37 Water transport P23 Buildings T77 Civilian
TC Harbor works = L15 Traffic stations L37 Water transport R00 Engineering T77 Civilian
TH1725 Soundproof construction = P23 Buildings P43 Construction R37 Soundproofing
TL681.S6 Airplanes. Soundproofing = L17 Vehicles L33 Air transport R37 Soundproofing
TL Airways (Routes). Airports and landing fields. Aerodromes = L13 Traffic facilities L33 Air transport Technical aspects
VA67-79 Naval ports, bases, reservations, docks = L15 Traffic stations L37 Water transport T73 Military
VM367.S6 Submarines. Soundproofing = L17 Vehicles L37 Water transport R37 Soundproofing T73 Military Underwater

12 LC subject headings with combinations of atomic concepts
Aeroplanes-Soundproofing = L17 Vehicles L33 Air transport R37 Soundproofing
Airports-Buildings = P23 Buildings L15 Traffic stations L33 Air transport
Buildings-Soundproofing = P23 Buildings P43 Construction R37 Soundproofing
Ships-Soundproofing = L17 Vehicles L37 Water transport R37 Soundproofing

13 Mapping through a Hub
Hub: L17 Vehicles L33 Air transport R37 Soundproofing; L17 Vehicles L37 Water transport R37 Soundproofing; L17 Vehicles L37 Water transport R37 Soundproofing T73 Military Underwater
LCC: TL681.S6 Airplanes. Soundproofing; VM367.S6 Submarines. Soundproofing
LCSH: Aeroplanes-Soundproofing; Ships-Soundproofing

14 Mapping user queries
User query: free text; combination of elemental concepts through facets (guided query formulation); controlled term(s) from a KOS, possibly found through browsing a KOS
Hub: canonical form of query (DL formula)
Final query: (enriched) free-text query; query in terms of a KOS

15 Query: L17 Vehicles AND R37 Soundproofing
TL681.S6 Airplanes. Soundproofing [L17 Vehicles L33 Air transport R37 Soundproofing]
VM367.S6 Submarines. Soundproofing [L17 Vehicles L37 Water transport R37 Soundproofing Military]
Aeroplanes-Soundproofing [L17 Vehicles L33 Air transport R37 Soundproofing]
Ships-Soundproofing [L17 Vehicles L37 Water transport R37 Soundproofing]
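The query on this slide can be read as a facet search over canonical expressions: a class or heading is retrieved when its expression contains every atomic concept in the query, whichever KOS it comes from. The sketch below assumes expressions are stored as sets of codes; the `INDEX` structure and the `search` function are hypothetical names for illustration only.

```python
from typing import Dict, FrozenSet, List

# Canonical expressions for the four entries shown on the slide, reduced to concept codes.
INDEX: Dict[str, FrozenSet[str]] = {
    "LCC TL681.S6 Airplanes. Soundproofing": frozenset({"L17", "L33", "R37"}),
    "LCC VM367.S6 Submarines. Soundproofing": frozenset({"L17", "L37", "R37", "T73"}),
    "LCSH Aeroplanes-Soundproofing": frozenset({"L17", "L33", "R37"}),
    "LCSH Ships-Soundproofing": frozenset({"L17", "L37", "R37"}),
}


def search(query: FrozenSet[str], index: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return every entry whose expression contains all concepts of the query (AND semantics)."""
    return [entry for entry, expr in index.items() if query <= expr]


if __name__ == "__main__":
    # L17 Vehicles AND R37 Soundproofing retrieves all four entries across both KOS.
    print(search(frozenset({"L17", "R37"}), INDEX))
    # Adding L33 Air transport narrows the result to the airplane entries.
    print(search(frozenset({"L17", "R37", "L33"}), INDEX))
```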

16 Examples from NALT, LCSH, DDC, and SWD. NALT: National Agricultural Library Thesaurus. LCSH: Library of Congress Subject Headings. DDC: Dewey Decimal Classification. SWD: Schlagwortnormdatei.

17 Mapping through a Hub
Hub: [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; [isa] Legal rule; [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
LCSH: Air - Pollution; Laws and regulations; Air - Pollution - Laws and regulations
NALT: Air pollution; Laws and regulations; Air pollution AND Laws and regulations

18 Mapping through a Hub
Hub: [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; [isa] Legal rule; [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}; [isa] International treaty [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}; [isa] Rights [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
DDC: Air pollution; 340 Law; Air pollution [Law]; Air pollution rights
SWD: Luftverschmutzung; Gesetz; ???; Übereinkommen über weiträumige grenzüberschreitende Luftverschmutzung; Umweltzertifikat

19 Soil moisture vs. Soil water. LCSH term: Soil moisture = [isa] Water [containedIn] Soil. NALT term: Soil water = [isa] Water [containedIn] Soil. Mapping, LCSH to NALT: Soil moisture corresponds to Soil water.

20 Greenhouse gardening. LCSH term: Greenhouse gardening = [isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home. NALT terms: Home gardening = [isa] Gardening [inEnvironment] Home; Greenhouse = [isa] Greenhouse. Mapping, LCSH to NALT: Greenhouse gardening corresponds to Home gardening AND Greenhouse.

21 Salad greens. LCSH term: Salad greens = [isa] Green leafy vegetable [usedFor] Salad. NALT term: Green leafy vegetables = [isa] Green leafy vegetable. Mapping, LCSH to NALT: Salad greens BT Green leafy vegetables.
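The three mapping outcomes on the preceding slides (an equivalent term, an AND combination of target terms, and a broader term) can be derived mechanically once each term is reduced to the atomic concepts in its expression. The sketch below makes two assumptions that are not in the slides: relationship labels such as [isa] and [inEnvironment] are ignored, and the combination is chosen by a simple greedy cover. The function name `map_term` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Concepts = FrozenSet[str]

# NALT terms from the slides, reduced to the atomic concepts in their expressions.
NALT: Dict[str, Concepts] = {
    "Soil water": frozenset({"Water", "Soil"}),
    "Home gardening": frozenset({"Gardening", "Home"}),
    "Greenhouse": frozenset({"Greenhouse"}),
    "Green leafy vegetables": frozenset({"Green leafy vegetable"}),
}


def map_term(source: Concepts, target_kos: Dict[str, Concepts]) -> Tuple[str, List[str]]:
    """Return ('=', [term]), ('AND', terms), ('BT', terms) or ('none', [])."""
    # 1. Equivalence: some target term has exactly the same concepts.
    for term, concepts in target_kos.items():
        if concepts == source:
            return "=", [term]
    # 2. Candidates: target terms built only from concepts of the source.
    candidates = [(t, c) for t, c in target_kos.items() if c < source]
    chosen: List[str] = []
    covered: set = set()
    # Greedy cover: prefer terms that carry more concepts, skip terms adding nothing new.
    for term, concepts in sorted(candidates, key=lambda tc: (-len(tc[1]), tc[0])):
        if not concepts <= covered:
            chosen.append(term)
            covered |= concepts
    if not chosen:
        return "none", []
    # 3. Full cover gives an AND combination; a partial cover only a broader term.
    return ("AND", chosen) if covered == source else ("BT", chosen)


if __name__ == "__main__":
    print(map_term(frozenset({"Water", "Soil"}), NALT))                 # Soil moisture
    print(map_term(frozenset({"Gardening", "Greenhouse", "Home"}), NALT))  # Greenhouse gardening
    print(map_term(frozenset({"Green leafy vegetable", "Salad"}), NALT))   # Salad greens
```

Ignoring the relationship labels is what lets Greenhouse ([isa] Greenhouse) match the [inEnvironment] Greenhouse part of the LCSH term; a fuller implementation would have to reason over the relationships as well.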

22 Emerging diseases. LCSH term: Emerging infectious diseases = [isa] Disease [hasProperty] Infectious [hasProperty] Emerging. NALT term: Emerging diseases = [isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging. Mapping, LCSH to NALT: Emerging infectious diseases ??? Emerging diseases; Emerging infectious diseases BT Emerging diseases.

23 Mapping through a Hub
Hub: [isa] Worker [hasGender] Female; [isa] Worker [hasGender] Female [hasStatus] Employee; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Unskilled; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Skilled; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] Salaried; [isa] Work BeingDone [executedBy] {Worker [hasGender] Female}
DDC: Women workers
SWD: Arbeitnehmerin; Arbeiterin; Ungelernte Arbeiterin; Hilfsarbeiterin; Facharbeiterin; Angestellte; Frauenarbeit

24 Knowledge base for query formulation
Physician = [isa] Worker [profLevel] Doctoral [domain] Medicine
Oncologist = [isa] Worker [profLevel] Doctoral [domain] Oncology
Ophthalmologist = [isa] Worker [profLevel] Doctoral [domain] Ophthalmology
Physician ST Doctor
Ophthalmologist ST Eye doctor
Medicine BT Health care
[isa] Worker [profLevel] Doctoral BT Professional
Income ST Earnings
Income NT Compensation
Compensation ET Pay
Compensation NT Wages
Fee schedule [usedBy] {Insurance company [domain] Health care} Compensation [receivedBy] Physician

25 Mapping user queries
User query: Doctor's pay
Hub: Compensation [receivedBy] Physician
Final query, (enriched) free-text query: [(Physician OR Doctor OR Oncologist OR Ophthalmologist OR (Professional AND (Medicine OR "Health care" OR Oncology OR Ophthalmology))) AND (Pay OR Earnings OR Compensation OR Wages OR Income)] OR [("fee schedule" OR fee) AND ("health insurance" OR "Blue Cross" OR Medicare OR Medicaid)]
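A rough sketch of how part of such an enriched free-text query could be generated from the knowledge base on the previous slide. It makes several assumptions: ST and ET are both treated as synonym-like links, Oncologist and Ophthalmologist are treated as narrower than Physician (the slide implies this only through their domains), and broader terms such as Income and Professional are not followed, so the output is a subset of the query shown above. The names `SYNONYMS`, `NARROWER`, `expand`, and `build_query` are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Synonym-like links (ST / ET on the slide) and narrower-term links (NT, or assumed).
SYNONYMS: Dict[str, List[str]] = {
    "Physician": ["Doctor"],
    "Ophthalmologist": ["Eye doctor"],
    "Compensation": ["Pay"],
    "Income": ["Earnings"],
}
NARROWER: Dict[str, List[str]] = {
    "Physician": ["Oncologist", "Ophthalmologist"],  # assumption: derived from their domains
    "Compensation": ["Wages"],
    "Income": ["Compensation"],
}


def expand(concept: str) -> List[str]:
    """Collect a concept, its synonyms, and everything narrower (breadth-first)."""
    terms: List[str] = []
    seen = set()
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        for term in [current] + SYNONYMS.get(current, []):
            if term not in terms:
                terms.append(term)
        queue.extend(NARROWER.get(current, []))
    return terms


def or_group(terms: List[str]) -> str:
    """Join terms with OR, quoting multi-word terms as phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"


def build_query(concepts: List[str]) -> str:
    """AND together one OR group per concept of the canonical query."""
    return " AND ".join(or_group(expand(c)) for c in concepts)


if __name__ == "__main__":
    # Canonical query from the hub: Compensation [receivedBy] Physician
    print(build_query(["Physician", "Compensation"]))
```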

26 Examples from the realm of AAT Taiwan. AAT: Art and Architecture Thesaurus (Getty). AAT Taiwan: TELDAP, Institute for Information Science, Academia Sinica. TGM: Thesaurus of Graphic Materials, Library of Congress. E-HowNet: A Lexical Knowledge Base for Semantic Composition, Academia Sinica.

27 Mapping through a Hub
Hub: Facility Worship; Facility Worship Judaism; Facility Worship Christianity; Facility Worship Islam; Facility Worship Buddhism; Facility Worship Taoism
TGM: temples; synagogues; churches; mosques; Buddhist temples; Taoist temples
AAT/Chinese: temples (buildings); synagogues (buildings); churches (buildings); mosques (buildings)

28 Mapping to Chinese. Use E-HowNet formal semantic expressions. Use terms that already exist in E-HowNet. Add terms using computer-assisted derivation of semantic expressions, as described later for English.

29 E-HowNet ontology Building Facilities Chinese Word: English: Temple Conceptual expression: {facilities : domain = {religion }} Chinese Word: English: Buddhist temple Conceptual expression: {facilities : domain = {Buddhist }} Chinese Word: English: Taoist temple/ Taoist quan Conceptual expression: {facilities : domain = {Taoism }} 29

30 Implementation: How to get to the promised land

31 Examples of deriving canonical expressions. Creating canonical expressions is key. Start out with some examples.

32 Underlying faceted classification
L00 Transportation and traffic
L10 Traffic system components
L13 Traffic facilities
L15 Traffic stations
L17 Vehicles
L30 Modes of transportation
L33 Air transport
L37 Water transport
P00 Buildings, construction
P23 Buildings
P27 Architecture
P43 Construction
R00 Engineering
R30 Acoustics
R37 Soundproofing
T70 Military vs. civilian
T73 Military
T77 Civilian

33 Method: Assigning atomic concepts 1. HE Transportation = L00 Transportation and traffic T77 Civilian. HE Ports, harbors, docks, wharves, etc. Inherited: L00 Transportation and traffic T77 Civilian. Added by editor: L15 Traffic stations L37 Water transport. Resolved to: L15 Traffic stations L37 Water transport T77 Civilian.

34 Method: Assigning atomic concepts 2. NA Airport buildings. From database already established: Airport = L15 Traffic stations L33 Air transport; Buildings = P23 Buildings. Added by editor: T77 Civilian. Resolved to: L15 Traffic stations L33 Air transport P23 Buildings T77 Civilian.

35 Method: Assigning atomic concepts 3. TL681.S6 Airplanes. Soundproofing. From database already established: Airplane = L17 Vehicles L33 Air transport; Soundproofing = R37 Soundproofing. Added by editor: nothing. Resolved to: L17 Vehicles L33 Air transport R37 Soundproofing.

36 Method: Assigning atomic concepts 4. Aeroplanes-Soundproofing. From database already established: Aeroplanes = Airplane [spelling variant]. Therefore the term is recognized as the same as Airplanes. Soundproofing. Resolved to: L17 Vehicles L33 Air transport R37 Soundproofing.

37 Method: Assigning atomic concepts 5. Any class formed by geographical subdivision, such as NA Airport buildings, NA6305.E3 Egypt. Recognized using a dictionary of geographical names. Inherits from the subject class above it; simply add the country: L15 Traffic stations L33 Air transport P23 Buildings T77 Civilian Egypt. No editor checking needed.
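The assignment methods on these slides share one resolution step: combine the inherited concepts with the concepts added by the editor or found in the database, then drop any concept that is merely a broader version of another one present. A minimal sketch follows, assuming a hierarchy for the core classification that is inferred from its code numbering (for example L15 under L10 under L00); that hierarchy and the names `BROADER`, `ancestors`, and `resolve` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set

# Assumed broader-concept links, inferred from the numbering of the core classification.
BROADER: Dict[str, str] = {
    "L10": "L00", "L13": "L10", "L15": "L10", "L17": "L10",
    "L30": "L00", "L33": "L30", "L37": "L30",
    "P23": "P00", "P27": "P00", "P43": "P00",
    "R30": "R00", "R37": "R30",
    "T73": "T70", "T77": "T70",
}


def ancestors(code: str) -> Set[str]:
    """All broader concepts of a code, following BROADER links upward."""
    found: Set[str] = set()
    while code in BROADER:
        code = BROADER[code]
        found.add(code)
    return found


def resolve(inherited: Iterable[str], added: Iterable[str]) -> List[str]:
    """Merge inherited and added concepts, then eliminate redundant broader concepts."""
    combined = set(inherited) | set(added)
    redundant = {a for code in combined for a in ancestors(code)}
    return sorted(combined - redundant)


if __name__ == "__main__":
    # Method 1: HE Ports, harbors... inherits L00 and T77; the editor adds L15 and L37.
    print(resolve({"L00", "T77"}, {"L15", "L37"}))
    # Method 5: a geographical subdivision inherits everything and adds the country.
    print(resolve({"L15", "L33", "P23", "T77"}, {"Egypt"}))
```

In the first example L00 Transportation and traffic disappears because L15 and L37 already imply it, which matches the "Resolved to" line on the slide.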

38 Distributed implementation. Key principle: canonical expressions can be created locally; the hub places each concept in a global structure. The person or algorithm producing canonical expressions needs to know only the core classification, not the structure of the often large KOS to be mapped.

39 Distributed implementation. Ideally, use one central faceted classification of core concepts, but multiple mapped core classifications could be used. The central core classification is extensible and should be continuously updated by many contributors. The central core classification must be able to express shades of meaning and, in the long run, usage information.

40 Distributed implementation. A KOS could assign canonical expressions to its concepts; let's call this a semantically enhanced KOS, or SEKOS. It is then a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used).

41 Efficient creation of canonical expressions. Apply existing knowledge: the larger the knowledge base, the less effort is needed to process a new KOS. Use knowledge of KOS structure for hierarchical inheritance. Use linguistic analysis of terms and captions. Eliminate redundant atomic concepts. Check or produce mapping results from the assignment of concepts to the same records. Get human editors' input and verification where needed through a user-friendly interface. Crowdsourcing, one term at a time. KOS owners may verify and edit data pertaining to their KOS.

42 Knowledge base. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 1. A faceted classification of atomic concepts, seeded from sources with well-developed facets such as UDC, the Alcohol and Other Drug (AOD) Thesaurus, the Harvard Business Thesaurus, the Art and Architecture Thesaurus, and various systems called ontologies.

43 Knowledge base 2. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 2. Linguistic knowledge bases such as WordNet, E-HowNet (Chinese), FrameNet, and mono-, bi-, and multilingual dictionaries and thesauri. 3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Schlagwortnormdatei, MeSH and UMLS, AGROVOC, Gene Ontology. 4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a canonical expression. One database: intellectual, not physical. Could be in Linked Data.

44 Take-home message: It is time to unify many disparate mapping efforts on a sound semantic footing.

45 Dagobert Soergel buffalo.edu 45

46 man: -female +mature; woman: +female +mature; boy: -female -mature; girl: +female -mature; child: +/-female -mature

47 3. To help students distinguish between 'beat', 'earn', 'gain' & 'win'. Columns: beat, earn, gain, win. Rows: an opponent x; money x x; a reputation x; a football match x; somebody in a competition x; a living x; weight x; experience x; a victory x x; an advantage x; time x ??

48 Air pollution laws. LCSH term: Air - Pollution - Laws and regulations = [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}. NALT terms: Air pollution = [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; Laws and regulations = [isa] Legal rule. Mapping, LCSH to NALT: Air - Pollution - Laws and regulations corresponds to Air pollution AND Laws and regulations. Interpretation for indexing and searching in both directions.


50 This project will achieve the following: interoperability between any two participating Knowledge Organization Systems (KOS), to the extent the two schemes allow; facet-based search for any collection indexed by a participating KOS, and for free-text search; assistance in cataloging (metadata creation) by catalogers or users (social tagging). Long-range goal: a Web service where a KOS can be uploaded and mappings to specified target KOS are returned. Means: Create a comprehensive knowledge base relating many classification schemes and subject heading lists used in libraries and in other contexts (LCC, DDC, DMOZ directory, LCSH, European schemes). Use combinations of atomic concepts taken from a well-structured underlying faceted classification to represent the meaning of classes and subject headings.

51 Why might this work? Principled, powerful, concept-based intellectual technology. Long-established ideas revived using modern linguistic and AI methods. Large scale creates synergy: as more KOS participate in the system, processing a new KOS requires less effort. Unified access to mapping data from many projects. Need to talk to MACS, CrissCross, Stitch, OAEI (Ontology Alignment Evaluation Initiative), etc. about importing their mapping data. Configured as a Semantic Web resource.

52 We will seed the faceted classification from many sources that have well-developed facets, such as the AOD Thesaurus, the Harvard Business Thesaurus (if it is made available), the Art and Architecture Thesaurus (some facets have mainly elemental concepts), and various ontologies to be identified. Will consult CIDOC-CRM for structure.

53 The internal data format will be rich, to deal with any kind of information on concepts and terms that will be useful. Will keep very detailed track of sources. Will keep track of access rights. Will import from and export to any format required, especially SKOS and OWL. Note: Many of these formats are limited and will not preserve all information available in the proposed system.

54 Cologne themes (Koeln Themen). Role indicators for building themes; arrangement of themes for exploration under user control; carry-over from citation order. Practical problem of connection to the participating systems: should use IDs for combinations in the Hub. Make sure that the hub stays consistent with participating systems.


56 Mapping Issues 1: Terms related to Chinese religious concepts. The word "temples" is frequently treated as equivalent to the Chinese term miao. However, because the buildings serve different purposes and house different spirits, the names of religious buildings in Taiwan vary. Temples (buildings) (religious buildings,,... Built Environment (Hierarchy Name)). Note: Buildings housing places devoted to the worship of a deity or deities. In the strictest sense, it refers to the dwelling place of a deity, and thus often houses a cult image. In modern usage a temple is generally a structure, but it was originally derived from the Latin "templum" and historically has referred to an uncovered place affording a view of the surrounding region. For Christian or Islamic religious buildings the terms "churches" or "mosques" are generally used, but an exception is that "temples" is used for Protestant, as opposed to Roman Catholic, places of worship in France and some French-speaking regions. Q1. The mapping team has found that "temples" in AAT is broader than the corresponding concept in Chinese. It is therefore necessary to distinguish among the individual Chinese terms before mapping.

57 Mapping Issues 1: Terms related to Chinese religious concepts. Despite their similar appearance, each of these buildings differs slightly from the others.
miao ( ): In the past, a place to worship ancestors. Since the Han dynasty, it has been used as a place to worship both ancestors and spirits.
ci ( ): Built to worship or commemorate saints, famous scholars, poets, or people of great achievement. Sometimes also refers to places where ancestors are worshipped.
si ( ): Generally refers to a place where Buddhist spirits are worshipped. Sometimes also refers to a place where Buddhist monks live.
an ( ): Formerly referred to a scholar's place of study ( ). Nowadays it refers to a place where Buddhist nuns live.
guan ( ): Refers only to Taoist buildings.
yan ( ): Refers to miao ( ) established near or on a mountain.

58 Mapping Issues 2: A Chinese set term stands for a broader meaning (Wenwan). A word combining two words, "cultural object" and "antique curio". It refers specifically to objects used in an educated person's reading room, including writing equipment, small tools, and decorations. It represents the culture of the reading room, combining the practical function of a scholar's study equipment with art crafts made for appreciation. Common objects include ink stones, seals, washing vessels, and finely sculptured decorations. Elegance and exquisiteness are its essential characteristics. It is produced in a highly artistic manner. Nowadays it has become a popular collectible, valued more as an artifact than as equipment.

59 Mapping Issues 2: A Chinese set term stands for a broader meaning. Examples: lotus pod shaped vessel for injecting water; banana leaf shaped wooden plate; olive stone boat sculpture; blue snuff bottle; lotus leaf shaped washing vessel; seal; ivory desk tidy.

60 Mapping Issues 2: A Chinese set term stands for a broader meaning. Q2. The mapping team has found that the meaning of Wenwan is broader than that of the term "desk sets", although the two partly coincide. The two terms are therefore inexact equivalents. Is it more suitable to create a new term Wenwan in the structure, or should it be mapped to desk sets? desk sets (sets (groups),,... Object Groupings and Systems). Note: Sets of matching articles intended to be used on a desk including such articles as inkstands, pen trays, and stamp boxes.

61 When English terms have broader meanings (1/2). EX1: ID: Record Type: concept. stitching (,, Processes and Techniques). Note: Refers to the process of fastening, joining, closing, uniting, mending, or creating ornamentation by stitches, which are the portions of thread left in fabric or another material by the in and out movement of a threaded needle through the thickness or surface of the material, or the loops of thread created on a needle in knitting or other needlework. In the context of textiles and needleworking, its meaning overlaps with "sewing." In the context of bookbinding, it refers to the fastening together a number of leaves or gatherings by passing the thread or wire through all of the sheets at once; it is distinct from "sewing," which, in the context of bookbinding, is used for the joining of leaves or gatherings together one by one by drawing thread or wire backwards and forwards through the back fold of each sheet to attach it to the cords. The meaning of "stitching" changes with context (bookbinding vs. needleworking). In AAT, both meanings are explained in the same record, but translating the term into Chinese requires two different translations: (feng he) for needleworking and (feng ding) for bookbinding. The same problem occurs in the record for sewing (ID: ). Illustrations: Stitching in needleworking; Stitching in bookbinding.

62 When English terms have broader meanings (2/2). EX2: Record Type: concept. patios (,,... Components (Hierarchy Name)). Note: Paved recreation areas adjoining contemporary houses and the paved interior courts of Spanish or Spanish-style buildings. The term refers to two types of open spaces, so two different translations are possible. Illustrations: Spanish patio; Patio adjoining a house.

63 When English terms have broader meanings (2/2). EX3: Record Type: concept. maculatures (, prints (visual works),... Visual and Verbal Communication). Note: Prints made by taking a second impression without reinking the plate, often used for cleaning the plate. May also refer to blotting paper. Also used for scrap paper that can reinforce fabric in Medieval embroidery. The term "maculatures" can be used in three different contexts (prints, blotting paper, and scrap paper), and there are three corresponding translations. Q3: In this case, since the record contains multiple meanings, it is not a question of which one is the preferred term, so how should the Chinese translations be displayed?


