EHRI Vocabularies and Linked Open Data : An enrichment?

1 EHRI Vocabularies and Linked Open Data : An enrichment?
Annelies van Nispen 15/05/2018 CONNECTING COLLECTIONS

2 The EHRI Portal connecting archives and users
Online inventory of institutions and collections about the Holocaust
Making sources visible in a systematic fashion in order to counteract the fragmentation of the sources
Reveal interconnections (e.g. through a multilingual thesaurus; collation of authority files; relationships between originals and copies)
EHRI focuses on collection descriptions

3 EHRI Portal https://portal.ehri-project.eu/

4 EHRI Vocabularies
EHRI Thesaurus (subject terms)
Camps
Ghettos
Administrative districts
Places (GeoNames)
Persons
Corporate bodies

5 EHRI Vocabularies
Main tool for multilingual information
Retrieval & search functionality
Cataloguing and integration tool for incoming data
Holocaust-related knowledge base, useful for further developments, e.g. NER, LOD or …

6 EHRI Vocabularies and Linked Open Data
Experiments with EHRI Vocabularies and LOD
Places – GeoNames
Persons – VIAF
Camps & Ghettos – Wikidata
Aim: Enrich EHRI's vocabularies and, where possible, publish them as LOD

7 EHRI Places & GeoNames

8 GeoNames Reconciliation - problematic cases
Places not listed in GeoNames (e.g. Altreich)
Places listed in GeoNames but missing spelling variants (e.g. Babyn Iar)
More than one location per place name, e.g. "Berlin" from "1(Berlin, sowjetischer Sektor)" mapped to 176 different locations
Access points which are difficult to disambiguate without context (e.g. "Bauer" can be the German word for "peasant", a German family name, or a German town)
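The cases above can be sorted mechanically before any manual review. What follows is a minimal sketch, not the EHRI workflow: it assumes access points follow the "number(name, qualifier)" layout seen in the slide's examples, and that the number of GeoNames candidates per name has already been obtained elsewhere. The function names `extract_place` and `classify` are hypothetical.

```python
import re


def extract_place(access_point: str) -> str:
    """Return the leading place name from an access point.

    Assumes the 'number(name, qualifier)' layout seen in the slide's
    examples, e.g. '1(Berlin, sowjetischer Sektor)'. Strings that do not
    follow that layout are treated as the place name itself.
    """
    match = re.match(r"^\s*\d*\s*\(([^)]*)\)\s*$", access_point)
    inner = match.group(1) if match else access_point
    return inner.split(",")[0].strip()


def classify(candidate_count: int) -> str:
    """Label a name by how many gazetteer candidates it returned."""
    if candidate_count == 0:
        return "unmatched"   # e.g. a place not listed at all
    if candidate_count == 1:
        return "unique"
    return "ambiguous"       # needs context or manual review


name = extract_place("1(Berlin, sowjetischer Sektor)")
print(name, classify(176))
```

Dropping the qualifier ("sowjetischer Sektor") is exactly what produces the ambiguity, so a fuller version would keep it as context for disambiguation rather than discard it.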

9 GeoNames: More issues
Access points with typos not clustered by OpenRefine (e.g. "Aushwitz" instead of "Auschwitz")
Access points wrongly filtered out as person names (e.g. "Amsterdam, Landsmeer")
Common nouns sometimes give false positives, e.g. "Artillerie" from "1(Artillerie)" mapped to a part of town in New Caledonia
Problem: Historical states, such as Yugoslavia or Czechoslovakia, are not properly linked to parents / children in the GeoNames dataset
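One plausible reason for the missed typo is sketched below. The slide does not say which OpenRefine clustering method was used; this assumes key-collision (fingerprint) clustering, which only groups strings whose normalized keys are identical. The `fingerprint` function is a simplified approximation of that idea, not OpenRefine's actual code, and `edit_distance` is a plain Levenshtein implementation added for comparison.

```python
import re
import unicodedata


def fingerprint(value: str) -> str:
    """Simplified key-collision fingerprint (an approximation only):
    fold to ASCII, lowercase, drop punctuation, sort unique tokens."""
    folded = unicodedata.normalize("NFKD", value)
    folded = folded.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", folded.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Different keys, so key collision cannot cluster the two spellings ...
print(fingerprint("Aushwitz") == fingerprint("Auschwitz"))
# ... although they are only one edit apart.
print(edit_distance("aushwitz", "auschwitz"))
```

A distance-based (nearest-neighbour) pass with a small threshold would catch such pairs, at the cost of more false clusters among short place names.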

10 EHRI Persons & VIAF

11 EHRI Personalities and VIAF
Experiment with automatic matching to VIAF of persons data from Yad Vashem, CDEC and Cegesoma, with a manual quality check on the matching results.
Issues:
Many people carry the same name
Not enough information on birth/death dates, places or profession to distinguish individuals
Spelling variants/mistakes

12 Outcome of experiment
Of 100 Yad Vashem (YV) names, 68 were matched against entries in VIAF.
High ambiguity in matching: a total of 234 matches, so each matched name was matched 3.44 times on average.
Of the 68 matched names, 31 were correct and 37 were false positives.
The ambiguity in cases of a correct match was sometimes higher, e.g. the correct one in a set of 5/6 matches.
Cegesoma and CDEC data give similar results, with the CDEC data producing an even higher share of false positives.
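The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported for the Yad Vashem sample; the variable names are illustrative, and reading "68 matches" as "68 matched names" is an interpretation of the slide.

```python
names_tested = 100        # Yad Vashem names in the sample
names_matched = 68        # names with at least one VIAF match
candidate_matches = 234   # total VIAF matches across matched names
correct = 31              # matched names confirmed correct by manual check
false_positives = 37      # matched names judged wrong by manual check

# Average number of VIAF candidates per matched name.
avg_candidates = candidate_matches / names_matched

# Share of matched names that were correct (precision of the matching).
precision = correct / names_matched

# Share of all tested names that ended up with a correct match.
correct_share_of_sample = correct / names_tested

print(f"average candidates per matched name: {avg_candidates:.2f}")
print(f"precision among matched names: {precision:.1%}")
print(f"correct matches in whole sample: {correct_share_of_sample:.0%}")
```

Fewer than half of the matched names were correct, which is consistent with the later conclusion that the persons vocabulary produced too many errors for automatic linking.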

13 Ghettos, Camps and Wikidata

14 Import Ghettos in Wikidata
Name of the ghetto in different languages
Unique EHRI identifier for the ghetto
Associated place name and its unique identifier in Wikidata
Coordinates from Yad Vashem and/or USHMM
Unique identifiers from online resources, including The Yad Vashem Encyclopedia of the Ghettos During the Holocaust and the USHMM Holocaust Encyclopedia
Added statement qualifying the entry as a "ghetto in Nazi-occupied Europe"
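The items listed above amount to a small record per ghetto. The following is a hypothetical sketch of such a record as a Python data class, useful for checking completeness before an import; the class name, field names and placeholder values are all assumptions, and no actual Wikidata property or item identifiers are implied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GhettoRecord:
    """One ghetto entry prepared for import (hypothetical structure)."""
    ehri_id: str                       # unique EHRI identifier
    labels: Dict[str, str]             # language code -> name of the ghetto
    place_wikidata_id: str             # Wikidata item of the associated place
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon), if known
    external_ids: Dict[str, str] = field(default_factory=dict)  # resource -> id
    instance_of: str = "ghetto in Nazi-occupied Europe"  # qualifying statement

    def missing_fields(self) -> List[str]:
        """Names of the optional parts that are still empty."""
        missing = []
        if not self.labels:
            missing.append("labels")
        if self.coordinates is None:
            missing.append("coordinates")
        if not self.external_ids:
            missing.append("external_ids")
        return missing


# Placeholder values only; these are not real identifiers.
record = GhettoRecord(
    ehri_id="EHRI-GHETTO-PLACEHOLDER",
    labels={"en": "Example Ghetto"},
    place_wikidata_id="Q-PLACEHOLDER",
)
print(record.missing_fields())
```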

15 EHRI Ghettos & Wikidata

16

17 Wikidata to EHRI Portal
English name of the ghetto
Place where the ghetto was located
Coordinates for the location
EHRI-assigned unique identifier for the ghetto
Associated unique identifiers from online resources
Multilingual labels generated from the name of the places

18 EHRI Vocabularies & LOD: An enrichment?
Mixed results
GeoNames set has problems, but we will use it for further development
Personalities: too many errors, and a sensitive vocabulary
Ghettos, Camps and Wikidata: a positive experience

19 CONNECTING KNOWLEDGE
The Jewish Museum of Greece (GR)
Jewish Historical Institute (PL)
King’s College London (UK)
Ontotext AD (BG)
Elie Wiesel National Institute for the Study of Holocaust in Romania (RO)
DANS Data Archiving and Networked Services (NL)
Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR)
ITS International Tracing Service (DE)
Hungarian Jewish Archives (HU)
INRIA Institute for Research in Computer Science and Automation (FR)
Vilna Gaon State Jewish Museum (LT)
VWI Vienna Wiesenthal Institute for Holocaust Studies (AT)
Foundation Jewish Contemporary Documentation Center (IT)
NIOD Institute for War, Holocaust and Genocide Studies (NL)
CEGESOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE)
Jewish Museum in Prague (CZ)
Center for Holocaust Studies at the Institute for Contemporary History in Munich (DE)
YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL)
United States Holocaust Memorial Museum (USA)
Bundesarchiv (DE)
The Wiener Library Institute for the Study of the Holocaust & Genocide (UK)
Holocaust Documentation Centre (SK)
Polish Center for Holocaust Research (PL)
EHRI is funded by the European Union

