CESSDA Expert Seminar 2009 Atle Alvheim Norwegian Social Science Data Archive.

1 CESSDA Expert Seminar 2009 Atle Alvheim Norwegian Social Science Data Archive

2

3 A common future? The last 15 years have been focused on building up a common data infrastructure for the social sciences, based on modern web technology.

4
1. The web: the idea that the archives could create an integrated catalog, Grenoble 1994
2. DDI: a richer and better data documentation format, R. Rockwell / ICPSR, IASSIST 1995
3. Integrate 3-4 components: Internet / web, common catalog, DDI. Access, explore, analyse, download data: the social science dream machine. NESSTAR, J. Ryssevik / S. Musgrave. ILSES (Integrated Library and Survey Data Extraction Service)
4. Richer services: FASTER (data types), LIMBER (attack the language barrier)
5. One single common entry point: Madiera, Metadater

5 CESSDA metadata harvester
[Diagram labels: Nesstar, Server 1, Server 2, Server 3, Server 4, Server 5, OAI-PMH, Search, Browse, Lucene, ELSST, Topical list]
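The harvesting step named on this slide can be sketched as follows. This is a minimal illustration of OAI-PMH paging, not the CESSDA harvester itself: the base URL, the record identifiers and the sample response are made up, and `oai_dc` is used as the metadata prefix only because every OAI-PMH server must support it (a DDI prefix would depend on the server).

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for one page of a harvest."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on this page and the token for the next page."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in
                   root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response from one of the servers, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:server1.example:study-1</identifier></header></record>
    <record><header><identifier>oai:server1.example:study-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would repeat the request with each returned token until no token comes back, then hand the collected records to the indexing step.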

6 [Diagram labels: Square files. Resources: CESSDA template, controlled vocabularies, multilingual thesaurus, CESSDA classification. Harvester, Indexing tool, Portal. Server 1, 2, 3, 4. Publishing client, Browsing tool, Search tool]

7 THE RESEARCHER Looking for data...Cultivate knowledge... THE ARCHIVES THE PORTAL Bridging the gap

8
1. Greenland 2. Iceland 3. Faroe Islands 4. Norway 5. Sweden 6. Finland 7. Aaland Islands 8. Estonia 9. Latvia 10. Lithuania 11. Belorussia 12. Ukraine 13. Moldova 14. Poland 15. Germany 16. Denmark 17. England 18. Scotland 19. Wales 20. Northern Ireland 21. Ireland 22. Netherlands 23. Belgium 24. Luxembourg 25. France 26. Portugal 27. Spain 28. Andorra 29. Monaco 30. Switzerland 31. Italy 32. San Marino 33. Vatican State 34. Slovenia 35. Liechtenstein 36. Austria 37. Czech Republic 38. Slovakia 39. Hungary 40. Romania 41. Bulgaria 42. Serbia 43. Croatia 44. Bosnia & Herzegovina 45. Montenegro 46. Kosovo 47. Albania 48. Macedonia 49. Greece 50. Cyprus South 51. Cyprus North 52. Malta 53. Turkey 54. Russia ? 55. Georgia ?? 56. Armenia ??? 57. Israel ????
30 languages, 45 legal systems.
We are supposed to support research and break down technical, linguistic, judicial and economic barriers.
Several processes and timelines in a layered system.

9 Share formats and routines Access and download Instrument development Control access

10

11
1. Make a more powerful interface to data holdings: more sophisticated search / browse possibilities, more focused, even across languages; better possibilities to handle results
2. Handle more complex data structures: over time, across space, across languages, linking micro and macro
These we may see as analytic dimensions.
3. Persistent identity: connect knowledge products back to the data used, turning the traditional picture upside down
These are more practical management issues.
4. Handle problems of double storage. Data dynamics: more than one value in a table cell. Versioning, updating, comments, links, references. Adding to the data item
5. Single Sign On: need to pass information and access more than one server; logging

12 Conceptualisation, Instrument, Data production (SIP), Data documentation (AIP), Question DB
The researcher formulates a problem and needs data to analyse it.
If data have to be collected, we need an instrument, a questionnaire.
When data are collected, with the necessary metadata, they represent a SIP.
A questions and concepts DB is a very useful tool for developing instruments.
To make data ready for archiving they have to be documented (and processed), lifted from a SIP to an AIP.
Data documentation should be based on standardised procedures / best practices and common tools for all CESSDA (+) archives.
DDI 2/3 expressed as a Template/DDI-profile, which is a) a selection of elements, with status b) element repositories c) controlled vocabularies d) a multilingual thesaurus e) a gazetteer, geographic classification f) the CESSDA study classification. This requires software or a manual / clear guidelines. DDI becomes the glue that holds this whole system together.
A questions DB is potentially problematic for data documentation processes. Better to import directly via the questionnaire.
Much data is generated by the public statistical system or other producers.
A questions DB will make it possible to find questions from concepts (needs an interface), learn from others, encourage comparative research, look up translations and keep contact with the user community.
Metadata standard: tool for instrument development, tool for data collection, tool for documentation.
Question DB, translations. An overarching plan. Integration of components. Data have a life-cycle.
The archive: a greenhouse or a graveyard?
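Finding questions from concepts and looking up translations can be sketched as a small in-memory index. Everything here is an assumption made for illustration: the class and field names, the concept label and the question wordings are hypothetical and do not come from any CESSDA tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    question_id: str
    concept: str
    # Language code -> question wording in that language.
    texts: Dict[str, str] = field(default_factory=dict)


class QuestionDB:
    """Toy question database indexed by concept (illustrative only)."""

    def __init__(self) -> None:
        self._by_concept: Dict[str, List[Question]] = defaultdict(list)

    def add(self, question: Question) -> None:
        self._by_concept[question.concept].append(question)

    def find_by_concept(self, concept: str) -> List[Question]:
        # .get() avoids creating an empty entry for unknown concepts.
        return list(self._by_concept.get(concept, []))

    def translations(self, concept: str, language: str) -> List[str]:
        """Wordings available in one language for all questions on a concept."""
        return [q.texts[language]
                for q in self.find_by_concept(concept)
                if language in q.texts]


# Hypothetical content, for illustration only.
db = QuestionDB()
db.add(Question("q1", "political trust",
                {"en": "How much do you trust parliament?",
                 "no": "Hvor stor tillit har du til Stortinget?"}))
db.add(Question("q2", "political trust",
                {"en": "How much do you trust politicians?"}))
```

An instrument designer could then ask for `db.translations("political trust", "no")` to reuse an existing Norwegian wording instead of translating from scratch.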

13 [Diagram labels: Metadata, Data, Ingest, AIP, Question DB, Data repositories (UKDA, DDA, FSD), Our AIPs]
When an AIP is inserted into an archive or storage it can trigger an update of a question database. Or do updates happen as a harvesting process?
To what degree are packages pre-defined or built for purposes?
A question database will be related to a basic storage. Do updates happen as a guarded / explicit process? What are the criteria?

14 Combinations (columns: Language / Metadata standard / Storage / Archive)
Finnish and English / DDI 3.1 / Other / Nesstar / FSD
Danish and English / DDI 2.x / Nesstar / Fedora / DDA
English / DDI 3.0 / DDI 2.0 / Other / Fedora / UKDA
Because of storage complexity, harvesting also becomes quite complex.

15 [Diagram labels: Data repositories (UKDA, DDA, FSD), Metadata, Data, SSO / AAA, LOG-DB]
Data repositories are guarded by access policies. Policies are usually formulated at institution or repository level.
Policies are activated by the crossing of the line between metadata and data, which is at data package level.
Should policies be linked to packages instead of repositories? Should they be an obligatory part of the metadata? Then we need to have policies formalised.
Data repositories should be documented in the national language plus a common language. Different documentation templates for national and international language.
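One way to read the question of package-level versus repository-level policies is as a fallback rule: a policy carried in the package metadata wins, otherwise the repository default applies. The sketch below assumes that reading; the class names, policy names and the single `requires_login` flag are hypothetical simplifications, not a formalised CESSDA policy model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AccessPolicy:
    name: str
    requires_login: bool


@dataclass
class DataPackage:
    package_id: str
    repository: str
    # Package-level policy, present only if formalised in the metadata.
    policy: Optional[AccessPolicy] = None


# Hypothetical repository-level defaults, for illustration only.
REPOSITORY_POLICIES: Dict[str, AccessPolicy] = {
    "UKDA": AccessPolicy("registered users", requires_login=True),
    "FSD": AccessPolicy("open", requires_login=False),
}


def effective_policy(package: DataPackage,
                     repository_policies: Dict[str, AccessPolicy]) -> AccessPolicy:
    """Package-level policy wins; otherwise fall back to the repository default."""
    if package.policy is not None:
        return package.policy
    return repository_policies[package.repository]


def may_download(package: DataPackage,
                 repository_policies: Dict[str, AccessPolicy],
                 logged_in: bool) -> bool:
    """Checked only when the user crosses from metadata to data."""
    policy = effective_policy(package, repository_policies)
    return logged_in or not policy.requires_login
```

Browsing metadata never calls `may_download`; the check sits exactly at the line between metadata and data that the slide describes.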

16 CV: LifeCycleEvent
Study Proposal, Study Design, Instrument Design, Funding, Interviewer training, Ethics Review, Sampling, Instrument pre-testing, Pilot study, Questionnaire translation, Documentation translation, DATA COLLECTION, Data collection reports, Post-collection processing, Data production, Initial data quality checks, Metadata production, Original release, DEPOSIT, Post-production processing, Data quality checks, Data editing, Data integration, Processing for Disclosure, Metadata editing, Preservation package production, Dissemination package, New version production, New version release / publication
From producer to consumer: the data archival work. Cover the whole data (or project?) life-cycle. Locate, explore and download.

17 NSD's Nesstar servers
[Diagram labels: NSDData, Metadata, CivicActive, ESS, Political system, Innovation Norway, Church data, School nesstar, Welfare data, Eurosphere, Opinion polls; data types: Micro, Cube, Mix, Metadata, Qualitative]
The CESSDA data archives will in due time be data providers, aggregators and single service providers. This is an illustration of what would presently be the NSD situation.
CESSDA complications: we need services that cover many servers and many conditions for use.

18 NSD's Nesstar servers
[Same diagram as slide 17]
Functionalities we need, on a scale from producer to consumer:
NSDData: multilinguality, translation support, DDI profile, ELSST
CivicActive: switch absolute/relative figures, convert cubes to rectangular files
ESS: complex files, link micro and macro levels
Political system: auto-publishing from databases to service system
Eurosphere: text, qualitative data
Link servers
Complex servers: selective login

19 The user authentication problem: almost always at institutional level

20 The user authorisation problem: very often at resource level
[Diagram labels: User, Portal, Server 1, DDA, Server 2, Server 3, ZA, Server 4, Server 5, UKDA, Server n, Dataset 1, Dataset 2, Dataset 3]
Users, affiliated with national institutions, based on a common justification (research) and working within specific projects (with roles within projects?), want to access data resources in different institutions and countries.
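The split between the two problems can be sketched as two separate checks: authentication asks which institution vouches for the user, authorisation asks whether this particular dataset is open to one of the user's projects. All names below (institutions, projects, dataset grants) are hypothetical, and a real single sign-on setup would delegate the first check to an identity federation rather than a hard-coded set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class User:
    username: str
    institution: str
    projects: FrozenSet[str] = frozenset()


# Hypothetical trust list: authentication is decided per institution.
TRUSTED_INSTITUTIONS: Set[str] = {"University A", "University B"}

# Hypothetical grants: authorisation is decided per resource (dataset).
DATASET_GRANTS: Dict[str, Set[str]] = {
    "dataset-1": {"project-x"},
    "dataset-2": {"project-y"},
}

# Every decision is logged: (username, dataset, allowed).
access_log: List[Tuple[str, str, bool]] = []


def authenticate(user: User) -> bool:
    """Institution-level check: is the user's home institution trusted?"""
    return user.institution in TRUSTED_INSTITUTIONS


def authorise(user: User, dataset_id: str) -> bool:
    """Resource-level check: does any of the user's projects hold a grant?"""
    granted_projects = DATASET_GRANTS.get(dataset_id, set())
    allowed = authenticate(user) and bool(user.projects & granted_projects)
    access_log.append((user.username, dataset_id, allowed))
    return allowed
```

Keeping the two checks separate means a portal can sign a user in once and still let each server decide on its own datasets.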

21 Complex?

22

23 DDI 2/3 expressed as Template/DDI-profile, as a) selection of elements, with status b) controlled vocabularies c) multilingual thesaurus d) gazetteer e) CESSDA classification
[Diagram labels, numbered 1 to 12 on the slide: Portal, Search, Browse (ELSST x time, space, methodology), ELSST, Query service, Data loader (may handle multiple and complex data packages), Explore and compare functionality, Ingest (AIP), Data repositories (UKDA, DDA, FSD), SSO/AAA, Politics (repository or package level), Metadata, Data, Web browser, Conceptualisation, Instrument, Data production (SIP), Data documentation, Harmonisation (and concepts) DB, Log database, Question DB, Download, Registry, CESSDA Toolkit, Tool, Intermediate storage]

24 CESSDA WS
[Diagram labels: Concept Bank, Classification Bank, Geo Bank, Question Bank, Questionnaire Bank, Metadata Ingester, Instruction Bank, Universe Bank, Variable Bank, 3CDB, QDB, Future Services, Study Bank, C3DB WS, QDB WS, Future WS, … Banks, 3CDB Applications, QDB Applications, 3CDB/QDB Applications, Future Applications, Nesstar Publisher, Reporting Tools, Admin Tools, Security Tools, non-DDI Objects, DDI 1/2.x, Ingest WS, Publication Tool, Legacy Database, DDI 3.0+, Custom Exporter, DDI 3.0 Converter, local objects]
Could interact with WS for metadata preparation.
The ingester performs quality assurance, splits metadata and maintains referential integrity for storage in the CESSDA Bank.
DDI-centric back-end.
CESSDA-DB stores all low-level objects.
Back-end maintenance and reporting tools.
Web services exposed for public consumption.
Internal web services stack.
3CDB/QDB applications call relevant WS.

25 Ingestion/Registration Process
[Diagram labels: Concept Bank, Classification Bank, Geo Bank, Question Bank, Questionnaire Bank, Instruction Bank, Universe Bank, Variable Bank, Study Bank, … Banks, Metadata Ingester, Nesstar Publisher, DDI 1/2.x, Ingest WS, Publication Tool, Legacy Database, DDI 3.0+, Custom Exporter, DDI 3.0 Converter, Metadata Registry, Publication WS, Metadata Repositories (Banks)]
Example: Submission of a Nesstar DDI will typically result in creation of objects in the following banks: study, classifications, variables, instance (files) and possibly concepts, universes, questions, instructions if such variable-level metadata have been compiled.
Example: A legacy system used for the production of questionnaires could create objects in the question, questionnaire, instruction, concepts, universes and classification banks. This may happen outside the context of a survey (question bank), and no variable would be associated with these objects.
Metadata optimization / harmonization: Optimization of the metadata (merging duplicates, aligning on harmonized objects, etc.) can be done using various automated, semi-automated or manual methods during the various stages of submission (this can also be performed later on).
Submission: Object registration could be automated upon release of the metadata by the provider. Workflow can be implemented as necessary.
Repository: Many metadata repositories can exist around the network. These can be deployed at the provider level, or as shared metadata storage.
Repository WS interfaces: Note that metadata repositories also expose a set of general and specialized web services along with administrative / security interfaces.
Submission: Submission packages are prepared by providers in compliance with the CESSDA DDI3+ specification. Publication tools are used to manage packages and control the ingestion process. Packages are broken down and stored in various banks (as needed).
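The breaking down of a submission package into banks can be sketched roughly as below. This is a simplified illustration under stated assumptions: the package is a plain dictionary rather than DDI XML, only three of the banks are modelled, and the function names and the single referential-integrity rule (a variable may only point at a known question) are hypothetical.

```python
from typing import Dict, List, Tuple

Bank = Dict[str, dict]  # object id -> stored object


def empty_banks() -> Dict[str, Bank]:
    """Only three of the banks on the slide are modelled here."""
    return {"study": {}, "question": {}, "variable": {}}


def ingest(package: dict, banks: Dict[str, Bank]) -> List[Tuple[str, str]]:
    """Split a package into per-bank objects and return what was newly registered."""
    objects = {
        "study": [package["study"]],
        "question": package.get("questions", []),
        "variable": package.get("variables", []),
    }

    # Quality assurance before anything is stored: a variable may only
    # reference a question that is in this package or already banked.
    known_questions = set(banks["question"]) | {q["id"] for q in objects["question"]}
    for variable in objects["variable"]:
        ref = variable.get("question")
        if ref is not None and ref not in known_questions:
            raise ValueError(
                f"variable {variable['id']} refers to unknown question {ref}")

    registered: List[Tuple[str, str]] = []
    for bank_name, items in objects.items():
        for item in items:
            # Objects already in a bank are not duplicated.
            if item["id"] not in banks[bank_name]:
                banks[bank_name][item["id"]] = item
                registered.append((bank_name, item["id"]))
    return registered
```

Because validation runs before any write, a rejected package leaves the banks untouched; merging near-duplicates and aligning on harmonised objects would be a later, separate step.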

26 [Repeat of the overview diagram from slide 23]

