Research data workflow Practice in Slovenian Social Science Data Archives SERSCIDA WP4 – WORKSHOP Ljubljana September 2013.

1 Research data workflow Practice in Slovenian Social Science Data Archives SERSCIDA WP4 – WORKSHOP Ljubljana September 2013

2 SIP, AIP, DIP
- Submission Information Package (SIP)
- Archival Information Package (AIP)
- Dissemination Information Package (DIP)
[Diagram labels: SIP, DIP, AIP; long-term preservation]
www.serscida.eu

3 Recommended formats – input
Type of material: Questionnaire
- Recommended format: Rich Text Format (*.rtf); structured metadata record of questionnaire (*.xml) by DDI or CAI programme (*.bmi)
- Other acceptable formats: other text formats (*.docx, *.txt, etc.); *.pdf or other graphical formats; printed version
Type of material: Data material (data file)
- Recommended format: SPSS (*.por, *.sav); plain text data, ASCII (*.txt) + structured text or mark-up file containing metadata information (variable names, labels, categories, question text)
- Other acceptable formats: other statistical packages; tables (*.xlsx etc.); databases
Type of material: Textual material (study description, codebook, interviewer instructions, speech to respondents, copies of research reports)
- Recommended format: Rich Text Format (*.rtf)
- Other acceptable formats: printed version; *.pdf or other graphical formats; other text formats (*.docx, *.txt, etc.)

4 Recommended formats – distribution
- STUDY DESCRIPTION: DDI structured XML
- DATA FILE: ASCII + XML; distributed in formats that can be exported from Nesstar
- OTHER TEXTUAL MATERIAL: PDF

5 Recommended formats – archiving
- DATA FILE: ASCII (*.txt) + XML with DDI file and data description

6 Recommended formats – archiving
- QUESTIONNAIRE, TEXT MATERIAL: original (any format) + distribution files (PDF)
- STUDY DESCRIPTION: DDI structured XML

7 Licence Agreement
Licence permitting commercial use
Free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
- to make commercial use of the work
Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial licence
Free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- Noncommercial: You may not use this work for commercial purposes.

8 Naming files and versioning
- File name format: StudyID_MaterialType_Language_Version_Subversion.FileFormat
- Example: sutr1006_p1_sl_v1_r2.txt
- URN: URN:SI:UNI-LJ-FDV:ADP:StudyID_MaterialType_Language_Version
- Example: URN:SI:UNI-LJ-FDV:ADP:sutr1006_p1_sl_v1
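
The naming convention on this slide can be checked mechanically. The following is a minimal sketch, assuming that no field contains an underscore or a dot (the slide shows only one example, so this is a guess). The class and function names are hypothetical and not part of any archive tooling.

```python
import re
from typing import NamedTuple

URN_PREFIX = "URN:SI:UNI-LJ-FDV:ADP:"

# Assumption: each field is a run of characters without "_" or ".".
FILE_NAME_RE = re.compile(
    r"^(?P<study_id>[^_.]+)_(?P<material_type>[^_.]+)_(?P<language>[^_.]+)"
    r"_(?P<version>[^_.]+)_(?P<subversion>[^_.]+)\.(?P<file_format>[^_.]+)$"
)


class ArchiveFileName(NamedTuple):
    """Hypothetical container for the six parts of a file name."""
    study_id: str
    material_type: str
    language: str
    version: str
    subversion: str
    file_format: str

    def file_name(self) -> str:
        # First five fields joined by "_", then the extension.
        return "_".join(self[:5]) + "." + self.file_format

    def urn(self) -> str:
        # The URN stops at Version; Subversion and FileFormat are left out.
        return URN_PREFIX + "_".join(self[:4])


def parse_file_name(name: str) -> ArchiveFileName:
    """Split a file name into its parts or raise ValueError."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return ArchiveFileName(**match.groupdict())


parsed = parse_file_name("sutr1006_p1_sl_v1_r2.txt")
print(parsed.urn())
```

For the slide's example file name, `parsed.urn()` gives `URN:SI:UNI-LJ-FDV:ADP:sutr1006_p1_sl_v1`, which matches the URN example on the slide.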

9 Managing workflow
- Project tracking software
- Task for every study, with 29 subtasks covering:
  - general part with email correspondence
  - managing deposited materials
  - preparing data file
  - preparing study description
  - publishing
- http://nesstar2.adp.fdv.uni-lj.si:8080/browse/RAZ-4536

10 Cleaning data and documentation
- Frequencies check
- Variable names, values
- Missing values
- Recode
- Weight
- Anonymisation
- Cumulative dataset
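
Two of the checks listed above, the frequencies check and the handling of missing values, could look roughly like this. It is a sketch only: the variable name `q1`, its valid codes, and the missing-value code 9 are invented for illustration, and in practice the archive may well do this work in a statistical package.

```python
from collections import Counter


def frequencies(rows, variable):
    """Count how often each value of `variable` occurs."""
    return Counter(row.get(variable) for row in rows)


def undocumented_values(rows, variable, valid_values):
    """Values that occur in the data but are not listed in the codebook."""
    return set(frequencies(rows, variable)) - set(valid_values)


def recode_missing(rows, variable, missing_codes, missing_marker=None):
    """Return new rows with the given codes replaced by one missing marker."""
    return [
        {**row, variable: missing_marker}
        if row.get(variable) in missing_codes
        else dict(row)
        for row in rows
    ]


# Hypothetical data: q1 is a 1-5 scale, 9 means "no answer", 7 is a stray code.
rows = [{"q1": 1}, {"q1": 5}, {"q1": 9}, {"q1": 7}, {"q1": 1}]
print(frequencies(rows, "q1"))
print(undocumented_values(rows, "q1", valid_values={1, 2, 3, 4, 5, 9}))
cleaned = recode_missing(rows, "q1", missing_codes={9})
```

The stray code 7 is reported rather than silently recoded, since deciding what it means normally requires going back to the depositor.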

11 Anonymisation Sebastian Kočar Expert Assistant in Social Science Data Archives SERSCIDA WP4 – WORKSHOP Ljubljana September 2013

12 Anonymisation in the archives - types
- basic anonymisation, mostly of academic research datasets
- anonymisation of Eurostat files
- anonymisation of official statistics Public Use Files (PUF)

13 Basic anonymisation of distributed microdata in archives
- deleting variables: direct identifiers (telephone numbers, addresses etc.) are removed.
- recoding indirect identifiers: recoding includes removing values and bracketing (combining the categories of a variable). Serious researchers can still receive datasets with non-recoded indirect identifiers.
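
The two steps on this slide, deleting direct identifiers and bracketing an indirect identifier, can be sketched as follows. The variable names and the age bands are assumptions chosen for illustration. They are not the archive's actual rules.

```python
# Hypothetical names of direct identifiers to drop.
DIRECT_IDENTIFIERS = {"telephone", "address"}

# Illustrative age bands as (lowest, highest, label); None means "no upper limit".
AGE_BRACKETS = [
    (0, 17, "0-17"),
    (18, 29, "18-29"),
    (30, 44, "30-44"),
    (45, 64, "45-64"),
    (65, None, "65+"),
]


def bracket(value, brackets):
    """Map a numeric value to the label of the band that contains it."""
    for low, high, label in brackets:
        if value >= low and (high is None or value <= high):
            return label
    raise ValueError(f"value {value!r} falls outside all brackets")


def basic_anonymise(record):
    """Drop direct identifiers and replace exact age with an age band."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age"] = bracket(cleaned["age"], AGE_BRACKETS)
    return cleaned


record = {"id": 1, "telephone": "000", "address": "n/a", "age": 37, "q1": 4}
print(basic_anonymise(record))
```

Keeping the original record untouched and returning a new one makes it easy to hold both versions: the recoded file for general distribution and the non-recoded file for researchers who qualify for it.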

14 Anonymisation of Eurostat files (the case of the Eurostat Labour Force Survey)
- deleting variables: indirect identifiers and unneeded variables are removed (municipality, wave number etc.)
- bracketing: age, marital status, education, years of residence, age of establishment of residence, duration of search for employment, professional status, country & nationality
- classification: income figures are not given; respondents are divided into classes based on their income
- aggregation: economic activity and occupation values are aggregated at 1-digit level
- top-coding: restricting the upper range of a variable (number of hours worked)
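
Top-coding, 1-digit aggregation and income classification are each a one-line transformation. The sketch below shows the idea. The cap of 80 hours and the income class bounds are made-up illustration values, not the Eurostat LFS criteria.

```python
def top_code(value, cap):
    """Restrict the upper range of a variable: values above `cap` become `cap`."""
    return min(value, cap)


def aggregate_to_one_digit(code):
    """Keep only the first digit of a hierarchical classification code.

    The code is expected as a string so that leading zeros are preserved.
    """
    return str(code)[0]


def income_class(income, upper_bounds):
    """Return the 1-based class whose upper bound first covers `income`.

    `upper_bounds` must be sorted ascending; incomes above the last bound
    fall into one extra, open-ended class.
    """
    for index, bound in enumerate(upper_bounds, start=1):
        if income <= bound:
            return index
    return len(upper_bounds) + 1


hours = [top_code(h, cap=80) for h in (38, 40, 95)]
occupation = aggregate_to_one_digit("2411")
income = income_class(1200, upper_bounds=[500, 1000, 2000])
```

With these inputs, 95 hours becomes 80, the occupation code "2411" becomes "2", and an income of 1200 falls into class 3.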

15 Anonymisation of official statistics Public Use Files for distribution in archives
- anonymisation software: μ-Argus, R (sdcMicro, bethel, sampling packages), Cornell Anonymization Toolkit, synthetic data generators
- anonymisation techniques: data reduction techniques (global recoding, local suppression etc.), data perturbation techniques (micro-aggregation, PRAM etc.), sampling, generating synthetic microdata
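
As an illustration of one perturbation technique named above, here is a minimal sketch of univariate micro-aggregation with a fixed group size. It is a simplified textbook variant written for this example, not the algorithm implemented in sdcMicro or μ-Argus. The function name and the parameter `k` are assumptions.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbours.

    Values are sorted and cut into consecutive groups of size k; a final
    group smaller than k is merged into the previous one.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


incomes = [10, 50, 20, 40, 30]
print(microaggregate(incomes, k=2))
```

Because every value is replaced by its group mean, the overall total and mean of the variable are preserved, while no published value belongs to fewer than `k` records.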

16 Anonymisation – a case study
- PUF prepared in cooperation with SORS Sector for General Methodology and Standards
- anonymisation procedure which follows Eurostat LFS anonymisation criteria (in SPSS)
- calculating individual and global risk (R, sdcMicro)
- calculating strata allocation, based on individual risk averages by strata (R, bethel)
- stratified sampling, based on the inclusion probability of a certain case (R, sampling, samplecube)
- sample weights recalculation
- LFS 2010 PUF distributed in August 2013
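
The risk and reweighting steps of this procedure can be illustrated with a deliberately naive sketch. It assumes that individual risk is approximated as 1 divided by the weighted frequency of a record's key combination, and that subsampled records have their weight divided by their inclusion probability. sdcMicro uses more refined risk estimators, and the slide does not spell out the reweighting formula, so both are assumptions here. All names are hypothetical.

```python
from collections import defaultdict


def _key(record, key_variables):
    """Combination of indirect identifiers that could single out a person."""
    return tuple(record[v] for v in key_variables)


def individual_risks(records, key_variables, weight="weight"):
    """Naive risk per record: 1 / estimated population frequency of its key."""
    population_freq = defaultdict(float)
    for record in records:
        population_freq[_key(record, key_variables)] += record[weight]
    return [
        min(1.0, 1.0 / population_freq[_key(record, key_variables)])
        for record in records
    ]


def average_risk(risks):
    """Mean individual risk, used here as a simple file-level summary."""
    return sum(risks) / len(risks)


def reweight_subsample(selected, inclusion_probs, weight="weight"):
    """Divide each kept record's weight by its subsampling inclusion probability."""
    return [
        {**record, weight: record[weight] / p}
        for record, p in zip(selected, inclusion_probs)
    ]


records = [
    {"region": "A", "age": "18-29", "weight": 50.0},
    {"region": "A", "age": "18-29", "weight": 50.0},
    {"region": "B", "age": "65+", "weight": 2.0},
]
risks = individual_risks(records, key_variables=["region", "age"])
kept = reweight_subsample(records[:1], inclusion_probs=[0.5])
```

In this toy data the third record is the risky one, because its key combination represents only about two people in the population. Averaging such risks by stratum is the kind of input the strata allocation step on the slide would need.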
