Why: what was the idea? 1. Create a data infrastructure. 2. Data + the knowledge products that are produced on the basis of data: a) Efficient access to large volumes of data b) Promote comparative analysis c) Support dissemination of knowledge d) Support the idea that knowledge has to be empirically based e) Create an infrastructure that may grow under its own momentum
How: A distributed model: data stored and maintained locally, with modern technology substituting for central institutions. One common entry point, a portal. One common metadata standard, which we were supposed to contribute to. One technical solution. One common multilingual thesaurus.
More hows: A requirement was that the user communities participated, allowed themselves to be activated and invested some resources: a) Develop a classification of resources b) Use a common metadata standard. Give improved semantics / ontology. Help solve some language issues. Produce more heterogeneous data. Produce better-quality data. Give better administration of data.
Resource promotion and integration: Tools for publishing and finding data. Guidelines for publishing and finding data. Access control. And there should be room for others; we could go beyond CESSDA.
The Portal: Metadata is all about communication. A set of tools + an idea: data is the core that facilitates a “conversation”. Technology, functionality. Multilingual thesaurus. Metadata standard.
EC contribution Total EC funding € - Received € = Remaining €
List of deliverables
D1.1 - Project Initiation Document
D3.1 - Functional Specification and Design - M3
D5.1 - Guidelines Thesaurus construction & translation
D1.2 - Quality Assurance Plan
D2.1 - User Analysis Report - M6
D3.2 - MADIERA Prototype - M6
D7.1 - Dissemination Plan - M6
D1.3 - Periodic Progress Report (6-month) - M7
D2.2 - Usability test - MADIERA Prototype - M8
D3.3 - MADIERA Beta Version 1 - M15
D3.3a - MADIERA Beta Version 2 - M17
D3.3b - MADIERA Publisher Beta Version B - M17
D4.1 - Recommendation - Geo-referencing system
D6.1 - Guidelines - Content provision & access control
D2.3 - Usability test - MADIERA Beta version -
D1.4 - Periodic Progress Report (12-month) - M14
D4.2 - Methodology identification comparable elements
D3.4 - MADIERA Version M23
D4.3 - Naming and identification recommendation
D5.2 - Report on adm mechanisms for thesaurus maintenance - M
User guides and training packs for content provision - M18
D6.3 - First version of hyper-linked information space demonstrator - M23
D6.4 - Data archive content provision workshop -
D6.5 - Workshop on content metadata (CDG/DDI)
D7.2 - On-going dissemination events
D7.3 - User guides and training packs - M23
D8.2 - Workshops for non-archive data providers -
D2.4 - Usability test - MADIERA Version 1 - M24
D1.5 - Periodic Progress Report (18-month) - M19
D5.3 - Extended multilingual thesauri - M24
D6.6 - Hyperlinked information-space demonstrator version 2 - M24
D1.6 - Periodic Progress Report (24-month) - M26
D4.4 - Package of revised recommendations - M27
D5.4 - Evaluation Workshops - M30
D1.7 - Periodic Progress Report (30-month) - M31
D1.8 - Third annual report - M38
D2.5 - Final usability test report - M38
D3.5 - MADIERA Version M38
D5.5 - Additional thesaurus hierarchies - M38
D8.3 - Technological Implementation Plan - M41
D1.8 - Final Report - M41
The Portal We have data identified at 3 levels: Study, Variable group and Variable
                        Study   Variable group   Variable
Free text search          X           X             X
CESSDA Classification     X
ELSST 1                   X
ELSST 2                   X           X             X
Archives                  X
NUTS                      X
The Portal The free-text search lets the user specify a completely free search term. If you search for “sausage”, you will presently get 1 hit, at variable level. This term (sausage) seems not to be in ELSST (yet). If you search for “radio”, you get hits. “Radio” is a word used in many languages (all languages with data on the servers). If you search for “fjernsyn”, you get hits. “Fjernsyn” is the Norwegian word for television. If we expand the word “fjernsyn” to its equivalents in other languages, we get hits. Such an expansion checks against ELSST and picks up the translations. Common to all: searching in free text may give hits at all three levels of data. When browsing, some terms (keywords) are automatically translated back to the user. The CESSDA classification is a controlled vocabulary used for the DDI element topcClas, which is at study level. If this term is used systematically, we can set up a catalogue structure. A study could then typically be published in more than one catalogue. ELSST1 is finer-grained than the CESSDA classification; it gives the impression of an alphabetically sorted list of keywords, and it gives easy access to translations and to the systematic structure with synonyms and related terms. But it works at study level.
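The expansion of a free-text term through the thesaurus, as described above, can be sketched roughly as follows. This is only an illustration: the tiny `THESAURUS` dictionary, the function names `expand_term` and `search`, and the sample documents are all assumptions, and the slides do not show how the portal actually queries ELSST.

```python
# Hypothetical miniature thesaurus: concept -> preferred term per language.
# The real ELSST content and lookup mechanism are not shown in the slides.
THESAURUS = {
    "television": {"en": "television", "no": "fjernsyn",
                   "de": "Fernsehen", "fr": "télévision"},
    "radio": {"en": "radio", "no": "radio", "de": "Radio", "fr": "radio"},
}


def expand_term(term, thesaurus=THESAURUS):
    """Return the term plus its translations, if the thesaurus knows it."""
    needle = term.casefold()
    for translations in thesaurus.values():
        known = {t.casefold() for t in translations.values()}
        if needle in known:
            # The set removes duplicates such as "radio" in several languages.
            return sorted(known)
    # Terms missing from the thesaurus (like "sausage") are searched as typed.
    return [needle]


def search(term, documents, expand=False):
    """Free-text search over (level, text) pairs; returns the matching pairs."""
    terms = expand_term(term) if expand else [term.casefold()]
    return [(level, text) for level, text in documents
            if any(t in text.casefold() for t in terms)]


# Assumed sample records, one per level of data.
DOCUMENTS = [
    ("study", "Fjernsyn og radio i Norge"),
    ("variable", "Hours of television watched per day"),
    ("variable group", "Media use"),
]
```

With these sample records, `search("fjernsyn", DOCUMENTS)` matches only the Norwegian study, while `search("fjernsyn", DOCUMENTS, expand=True)` also picks up the English-language variable.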
The Portal ELSST2 matches on a few key text fields (title, abstract, keywords, subject, etc.). The most important thing about the “etc.” is that it searches DDI elements at three different levels: study, variable group (name) and variable (label, text, concept). Archives lists the servers under the portal; for every server, the studies are listed in alphabetical order. The NUTS list gives units at different levels of NUTS; the search could use coordinates inserted in GeoBndBox. I don’t know how this is done (which DDI elements are used).
Functionality: Geo-Cartography Finding data by geography. Europe is a mixture of political, administrative and statistical units. Code, Name, Coordinates. Problem: Publish
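One way to picture finding data by geography from a unit's code, name and coordinates is a bounding-box overlap test. The sketch below is an assumption, not the portal's implementation: the class names `BoundingBox` and `GeoUnit`, the function `studies_for_unit`, and any codes or coordinates passed to them are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in degrees; boxes crossing the antimeridian are not handled."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True when the two boxes overlap or touch."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass(frozen=True)
class GeoUnit:
    """A geographic unit described by code, name and coordinates."""
    code: str
    name: str
    box: BoundingBox


def studies_for_unit(unit, study_boxes):
    """Return titles of studies whose bounding box intersects the unit's box."""
    return [title for title, box in study_boxes.items()
            if box.intersects(unit.box)]
```

For example, a made-up unit `GeoUnit("XX1", "Example region", BoundingBox(0, 0, 10, 10))` would match a study covering `BoundingBox(5, 5, 15, 15)` but not one covering `BoundingBox(20, 20, 30, 30)`.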
Functionality: Naming Conventions Objective: for a user to be able to update (metadata). 1. Add to the metadata of a study. 2. Use could also lead to changes, corrections, updates. Distinguish between two components of an identification: identifier (static) and version code (dynamic). The elements that we identify consist of data and metadata. Elements could also be a complex mixture of instances that make up a study. And studies could be part of series.
Functionality: Naming Conventions Diagram: Series, Study, Instance 1, Instance 2, Data, Metadata. All this is described as a complex set of modules. Data comes from data producers; metadata comes from archives.
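The split between a static identifier and a dynamic version code can be illustrated with a small sketch. Everything here is hypothetical: the class names `Identification` and `Study`, the `identifier.vN` string format, and the integer version counter are assumptions chosen for illustration, not the naming convention the project recommended.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Identification:
    identifier: str   # static: never changes once assigned
    version: int = 1  # dynamic: incremented on every update

    def bump(self):
        """Return the same identifier with the next version code."""
        return replace(self, version=self.version + 1)

    def __str__(self):
        # Assumed display format; the slides do not define one.
        return f"{self.identifier}.v{self.version}"


@dataclass(frozen=True)
class Study:
    ident: Identification
    metadata: dict
    series: Optional[str] = None  # a study may be part of a series

    def update_metadata(self, **changes):
        """Return a new version of the study with added or corrected metadata."""
        merged = {**self.metadata, **changes}
        return replace(self, ident=self.ident.bump(), metadata=merged)
```

Updating a study this way leaves the earlier version untouched, so both versions can be referred to by the same identifier with different version codes.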
DDI 3.0
ID   Module                    Simple / Complex   P/L
W    Wrapper                   1..1               L
A    Archive                   1..1               L
G    Group                     n                  P
C    Concept                   n                  P
DC   Data Collection           1..n               P
I    Instrumentation           1..n               P
LC   Logical Data Structure    1..n               P
PS   Physical Data Structure   1..n               L
PI   Physical Instance         1..n               L