Published by Winifred Oliver. Modified over 9 years ago.
1
Metadata Normalisation in Europeana
EuropeanaLocal Knowledge Sharing Workshop
The Hague, 13 & 14 January 2009
Julie Verleyen, Scientific Coordinator, Europeana Office
2
Session
A. Workflow
B. Metadata normalisation with ESE
C. Approach in practice: Demo of tools used
D. Knowledge SHARING Workshop: Discussion of the practice for EuropeanaLocal
4
CONTENT SURVEY #0
5
Stage #0: Content survey
Input:
Output: Specifications of content contribution
Files shown: Excel specs questionnaire
10
Stage #1: Harvesting and package creation
Input: Excel specs
Output:
Harvested data in XML (XML raw data)
Collection-specific analysis tool (HTML analysis tool)
Sample of source data: 1000 records (XML sample raw data)
Mapping specifications template (TXT mapping template)
12
Stage #2: Analysis and mapping specifications
Input: Excel specs, HTML analysis tool, XML sample raw data, TXT mapping template
Output: TXT mapping specs
15
Stage #3: Mapping and normalisation
Input: XML raw data, TXT mapping specs
Output: XML normalised mapped data, XML profile
Quality check
16
NORMALISER
17
STAGE 3
19
Stage #4: Database storage and indexing
Input: XML normalised mapped data
Output: DB, INDEX
21
Europeana Semantic Elements (ESE)
Europeana “Schema” for the Prototype
Based on Dublin Core Metadata Element Set (DCMES) (ISO )
49 Elements (26 Elements & 23 Refinements)
Created through discussions in July/August 2008
22
ESE specialities
europeana:country
europeana:provider (dc:source)
europeana:language (dc:language)
europeana:type (dc:type, dc:format)
europeana:year (dc:date)
europeana:isShownBy (dc:relation)
europeana:isShownAt (dc:relation)
europeana:object
europeana:uri (dc:identifier)
23
ESE specialities
All normalised:
Syntax
Value
Let’s examine their characteristics
24
Normalised ESE terms: Country
Definition: Country of content provider. If several countries: Europe
Format: String, ex: switzerland, germany,…
Reference: TEL controlled list. Supports TEL interface translation mechanism
Mechanism: Manual
In portal: Facet browsing of search results
26
Normalised ESE terms: Provider
Definition: Organisation sending the data to Europeana
Format: String, ex: Musées lausannois, Nasjonalbiblioteket,…
Reference: Europeana controlled list of content providers:
Mechanism: Manual but potentially can be automated
In portal: Facet browsing of search results
29
Normalised ESE terms: Language
Definition: Language of provider’s country (ESE: languages of the metadata)
Format: 2 letters, ex: it, no, fr, en, es,…
Reference: ISO 639-1 language codes
Exception: If several languages: “mul”
Mechanism: Manual but potentially can be automated
In portal: Facet browsing of search results
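The language rule above could be automated along the following lines. This is only a sketch under assumptions: the function name `normalise_language` is hypothetical, and the check covers the two-letter shape of a code only, not membership in the ISO 639-1 list.

```python
import re
from typing import Iterable, Optional

# Shape check only: two lowercase ASCII letters. This does not verify
# that the code is actually registered in ISO 639-1.
TWO_LETTERS = re.compile(r"^[a-z]{2}$")


def normalise_language(codes: Iterable[str]) -> Optional[str]:
    """Return one two-letter code, "mul" for several languages, or None."""
    distinct = sorted({c.strip().lower() for c in codes if c.strip()})
    if not distinct:
        return None          # nothing supplied
    if len(distinct) > 1:
        return "mul"         # exception named on the slide
    code = distinct[0]
    return code if TWO_LETTERS.match(code) else None


if __name__ == "__main__":
    print(normalise_language(["IT"]))
    print(normalise_language(["fr", "en"]))
```

Returning `None` for an unusable value leaves the decision (fix manually or discard) to the caller.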
30
Normalised ESE terms: Type
Definition: Type of the original object
Format: String
Reference: 4 Europeana types: IMAGE, TEXT, SOUND, VIDEO
Mechanism: Manual: Mapping specified by content provider
In portal: Categorisation display; facet browsing of search results
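A provider-specified type mapping can be pictured as a simple lookup table. The sketch below is illustrative only: the source values in `EXAMPLE_TYPE_MAPPING` and the function `map_type` are hypothetical and do not come from any real collection or from the Europeana tools.

```python
from typing import Dict, Optional

# The four Europeana types listed on the slide.
EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}

# Hypothetical mapping a content provider might specify.
EXAMPLE_TYPE_MAPPING: Dict[str, str] = {
    "photograph": "IMAGE",
    "manuscript": "TEXT",
    "oral history": "SOUND",
    "film": "VIDEO",
}


def map_type(source_value: str, mapping: Dict[str, str]) -> Optional[str]:
    """Map a provider type value to a Europeana type, or None if unmapped."""
    target = mapping.get(source_value.strip().lower())
    return target if target in EUROPEANA_TYPES else None


if __name__ == "__main__":
    print(map_type(" Film ", EXAMPLE_TYPE_MAPPING))
    print(map_type("sculpture", EXAMPLE_TYPE_MAPPING))
```

A value that maps to nothing yields `None`, which matters because records without a Europeana type are discarded during normalisation.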
32
Normalised ESE terms: Year
Definition: Date of creation of the original object (analog or born digital)
Format: 4 digits [YYYY], ex: 1950
Reference: Europeana year
Mechanism: Automatic extraction with “YearExtractor” converter
In portal: Facet browsing of search results; browsing by time (timeline)
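The slides do not describe how the “YearExtractor” converter works. As a rough illustration of the idea, a minimal stand-in could pull standalone 4-digit runs out of a free-text date value. The function `extract_years` and its matching rule are assumptions, not the real converter.

```python
import re
from typing import List

# A 4-digit run that is not part of a longer digit sequence.
# Assumption: any such run is treated as a candidate year.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_years(date_value: str) -> List[str]:
    """Return every standalone 4-digit run found in a date string."""
    return YEAR_PATTERN.findall(date_value)


if __name__ == "__main__":
    for raw in ["1950", "1950-03-12", "ca. 1890", "1914-1918", "undated"]:
        print(raw, "->", extract_years(raw))
```

A range such as `1914-1918` yields two candidates. Whether the real converter keeps both, the first, or neither is not stated in the slides.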
35
Normalised ESE terms: isShownBy
Definition: URL to the digital object
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Linking
37
Normalised ESE terms: isShownAt
Definition: URL to the digital object with context
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Linking
39
Normalised ESE terms: Object
Definition: URL to the digital object as thumbnail
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Display
41
Normalised ESE terms: URI
Definition: Record identifier for Europeana system
Format: URI
Mechanism: Automatic: special algorithm guaranteeing uniqueness (and integrity) of records
http://www.europeana.eu/resolve/record/91101/0BAF44EDF8B98F1322DEEAD4AB989778E6394418
In portal: MyEuropeana; full digital object view in Europeana
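The “special algorithm” is not specified in the slides. The last segment of the example URI is 40 hexadecimal characters, the length of a SHA-1 digest, so one plausible reading is a hash of a source identifier. The sketch below assumes exactly that. The choice of SHA-1, the hashed input, the reading of the middle path segment as a collection id, and the name `make_record_uri` are all guesses for illustration.

```python
import hashlib

# Prefix taken from the example URI on the slide.
RESOLVE_PREFIX = "http://www.europeana.eu/resolve/record"


def make_record_uri(collection_id: str, source_identifier: str) -> str:
    """Build a deterministic record URI from a collection id and an identifier.

    Assumption: hashing the provider's identifier with SHA-1. The actual
    Europeana algorithm is not described in the source.
    """
    digest = hashlib.sha1(source_identifier.encode("utf-8")).hexdigest().upper()
    return f"{RESOLVE_PREFIX}/{collection_id}/{digest}"


if __name__ == "__main__":
    print(make_record_uri("91101", "hypothetical-source-id-001"))
```

Under this assumption the same input always yields the same URI, and distinct identifiers yield distinct URIs except in the case of a hash collision.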
43
Metadata normalisation in practice
Demo of stage #3’s workflow:
1. Go through data of example collection #1
2. Practical exercise: let’s normalise example collection #2 for Europeana!!
3. 2 examples of known issues
MAPPING & NORMALISATION #3
44
SUBVERSION (SVN)
45
COLLECTION FOLDER
SOURCE XML
MAPPING SPECS TXT
OUTPUT XML
MAPPING/NORM. SPECS XML
46
Example 1: “Midas” collection
83 moving image records from the Association des Cinémathèques Européennes
Harvested data
Fields mapping/Type values mapping specs
Analysis file (source data)
Mapping file
Profile file
Analysis file + sample (normalised data)
47
Example 2: “Outsider Art Museum” collection 4142 records from the Musées Lausannois
48
Known issues with mapping/profile files
1. Wrong syntax in mapping file causes errors in profile.xml: using “=>” inside a comment in mapping.txt creates a mapping entry in profile.xml!
Ex: ………
52
BEFORE
53
AFTER
54
Known issues with mapping/profile files
2. Wrong syntax in mapping file causes errors in profile.xml: there should be 2 blanks, not one, between “=>” and “N/A”; otherwise the mapping specification is not well formatted in XML in profile.xml.
Ex: ………………….
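A small pre-check on mapping.txt could catch this spacing problem before profile.xml is generated. The sketch below follows the rule as stated on this slide (exactly two blanks between “=>” and “N/A”). The full mapping.txt syntax is not given in the slides, so the sample lines and the function `check_na_spacing` are hypothetical.

```python
import re
from typing import List

# Captures the run of spaces between "=>" and "N/A" on a line.
ARROW_NA = re.compile(r"=>( *)N/A")


def check_na_spacing(mapping_text: str) -> List[int]:
    """Return 1-based line numbers where "=>" and "N/A" are not
    separated by exactly two spaces."""
    bad_lines = []
    for number, line in enumerate(mapping_text.splitlines(), start=1):
        match = ARROW_NA.search(line)
        if match and len(match.group(1)) != 2:
            bad_lines.append(number)
    return bad_lines


if __name__ == "__main__":
    # Hypothetical mapping lines; only the "=>" / "N/A" part matters here.
    sample = "field_a =>  N/A\nfield_b => N/A\nfield_c => title"
    print(check_na_spacing(sample))
```

Lines that do not contain “N/A” after “=>” are ignored by this check.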
55
MAPPING.TXT PROFILE.XML MAPPING.TXT PROFILE.XML profile.xml with error: 2 white spaces!
56
Documentation in Europeana context
Europeana Semantic Elements (ESE) v3.1
“Europeana – Data Offline Preparation”
Commented version of “profile.xml”
“Quality Control Checklist”
60
Discussion
Questions about Europeana metadata ingestion/normalisation process?
Integration and/or compatibility of this process with EuropeanaLocal content strategy:
Where will normalisation take place?
By whom?
…
61
Thank you Julie.Verleyen@kb.nl
64
Discarding factors during normalisation
Duplicated records
Records without URLs to digital object
Records without Europeana type (SOUND, TEXT, IMAGE, VIDEO)
Records linking to copyright-protected digital objects
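The discarding factors can be read as a filter applied to each record. The sketch below is one possible reading, not the Europeana implementation. The record layout (a dict with `id`, `isShownBy`, `isShownAt`, `type`, `copyright_protected`) is hypothetical, and treating "without URLs" as "neither isShownBy nor isShownAt" is an assumption.

```python
from typing import Dict, List, Set, Tuple

EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}


def discard_reasons(record: Dict, seen_ids: Set[str]) -> List[str]:
    """List the discarding factors that apply to one record."""
    reasons = []
    if record.get("id") in seen_ids:
        reasons.append("duplicated record")
    # Assumption: a record needs at least one of the two link fields.
    if not (record.get("isShownBy") or record.get("isShownAt")):
        reasons.append("no URL to digital object")
    if record.get("type") not in EUROPEANA_TYPES:
        reasons.append("no Europeana type")
    if record.get("copyright_protected"):
        reasons.append("copyright-protected digital object")
    return reasons


def filter_records(records: List[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Split records into kept ones and (record, reasons) pairs to discard."""
    seen_ids: Set[str] = set()
    kept, discarded = [], []
    for record in records:
        reasons = discard_reasons(record, seen_ids)
        if reasons:
            discarded.append((record, reasons))
        else:
            kept.append(record)
        # Every id is remembered, whether or not the record was kept.
        seen_ids.add(record.get("id"))
    return kept, discarded
```

Returning the reasons alongside each discarded record makes it possible to report back to the content provider why a record was dropped.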