
Data quality control, Data formats and preservation, Versioning and authenticity, Data storage Managing research data well workshop London, 30 June 2009.


1 Data quality control, Data formats and preservation, Versioning and authenticity, Data storage
Managing research data well workshop
London, 30 June 2009; Manchester, 1 July 2009

2 Good data management, good research
– high quality data needs to be planned and specific for its purpose
– data can be understood and used now and in future
– data can then be shared and re-used

3 Can you understand / use these data?
SrvMthdDraft.doc
SrvMthdFinal.doc
SrvMthdLastOne.doc
SrvMthdRealVersion.doc

4 Quality control
Data quality control at various stages:
data collection
  – e.g. instrument calibration; expert opinion; multiple measurements; computer-assisted interviews
data entry, digitisation, transcription and coding: standardised and consistent procedures
  – e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files
data checking and verifying: automated and/or manual
  – e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
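Two of the automated checks listed above, double entry and out-of-range detection, can be sketched in a few lines of Python. This is a minimal illustration, assuming each keying of the data is held as a dict mapping record id to field values; the field names, the 0 to 120 age range and the missing-value code -9 are hypothetical.

```python
# Hypothetical example: compare two independent keyings of the same records
# (double entry) and flag values outside a plausible range.

def double_entry_mismatches(first, second):
    """Return (record_id, field, first_value, second_value) for every disagreement.

    Both inputs map a record id to a dict of field -> value.
    A record or field keyed only once shows up with None for the missing side.
    """
    mismatches = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


def out_of_range(records, field, low, high, missing_codes=()):
    """Return ids of records whose value for `field` lies outside [low, high].

    Values listed in `missing_codes` (e.g. -9 for 'not answered') are skipped,
    so documented missing-value codes are not reported as errors.
    """
    flagged = []
    for record_id, values in sorted(records.items()):
        value = values.get(field)
        if value is None or value in missing_codes:
            continue
        if not (low <= value <= high):
            flagged.append(record_id)
    return flagged


entry_1 = {1: {"age": 34, "sex": 1}, 2: {"age": 210, "sex": 2}, 3: {"age": -9, "sex": 1}}
entry_2 = {1: {"age": 34, "sex": 1}, 2: {"age": 21, "sex": 2}, 3: {"age": -9, "sex": 1}}

print(double_entry_mismatches(entry_1, entry_2))  # [(2, 'age', 210, 21)]
print(out_of_range(entry_1, "age", 0, 120, missing_codes=(-9,)))  # [2]
```

Both checks only flag candidates; deciding which keying is correct still needs a look at the source document.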

5 Data formats
choice of software format for digital data depends on:
  – planned data analyses
  – software availability
  – hardware used
  – discipline-specific standards and customs
digital data are software dependent
digital data are endangered by obsolescence of software/hardware
best formats for long-term preservation: standard formats, interchangeable formats, open formats
  – e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML
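A minimal Python sketch of exporting a small table to two of the open, software-independent formats named above (tab-delimited and CSV), using only the standard library `csv` module. The table contents are hypothetical.

```python
import csv
import io

# Hypothetical example table: a header row followed by data rows.
rows = [
    ["id", "region", "income"],
    [1, "North East", 21500],
    [2, "London", 34200],
]


def to_delimited_text(rows, delimiter):
    """Serialise rows as delimited plain text (tab or comma) readable by any software."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


tab_text = to_delimited_text(rows, "\t")  # tab-delimited
csv_text = to_delimited_text(rows, ",")   # comma-delimited (CSV)
print(tab_text)
print(csv_text)
```

To write to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass it to `csv.writer`; UTF-8 is an assumed encoding choice, not one the slide specifies.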

6 Data format conversions
convert data for preservation or back-up, e.g. export, save as
beware of conversion errors:
  – loss of internal metadata
    > e.g. convert MS Access to tab-delimited tables
  – loss of editing, formatting, formulae
    > e.g. convert MS Word to RTF
  – truncation or loss of data
    > e.g. string variables lost in SPSS-to-STATA conversion
check for errors and changes after conversion
Example 1: MS Excel to tab-delimited
Example 2: Word to XML
Example 3: Proprietary audio file (DVF) to WAV
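One way to check for errors and changes after conversion is a round-trip comparison: read the converted file back and compare it cell by cell with the source. A minimal Python sketch, assuming the source rows are available in memory and the target is tab-delimited text; the truncating export below is simulated and is not the behaviour of any particular package.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text back into a list of rows (every cell is a string)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def conversion_differences(original_rows, converted_text, delimiter="\t"):
    """Compare original rows with what the converted file actually contains.

    Cells are compared as strings, because delimited text stores no data types.
    Returns (row, column, original, converted) tuples; a cell or row missing
    on one side shows up as None.
    """
    converted_rows = read_delimited(converted_text, delimiter)
    differences = []
    for r in range(max(len(original_rows), len(converted_rows))):
        orig = [str(c) for c in original_rows[r]] if r < len(original_rows) else []
        conv = converted_rows[r] if r < len(converted_rows) else []
        for c in range(max(len(orig), len(conv))):
            a = orig[c] if c < len(orig) else None
            b = conv[c] if c < len(conv) else None
            if a != b:
                differences.append((r, c, a, b))
    return differences


original = [["id", "comment"], [1, "respondent moved house in 2008"]]
# Simulated lossy export that truncated the string variable to 12 characters.
exported = "id\tcomment\n1\trespondent m\n"
print(conversion_differences(original, exported))
# [(1, 1, 'respondent moved house in 2008', 'respondent m')]
```

A round trip catches truncated or dropped values, but not lost internal metadata such as variable labels, which still needs checking against the documentation.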

7 [Figure: MS Excel format vs. tab-delimited text format]

8 Version control
keep track of different copies or versions of data files
which methods to use depends on:
› single site vs. across locations
› single vs. multiple users
› different versions to be stored vs. files to be synchronised
single user of data files:
› file naming: unique file names with date or version number (avoid spaces!)
  e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04
› version control table or file history within or alongside the data file
› version control facility within software, e.g. MS Windows software
multiple users of data files:
› same as above
› control rights to file editing: read/write permissions, e.g. Windows Explorer
› versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3
› manual merging of multiple entries/edits
synchronise files, e.g. MS SyncToy software
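A minimal Python sketch of the single-user file-naming rule (unique names with a version number and date, no spaces). The exact pattern `Base_version_status_date.ext` and the ISO 8601 date (YYYY-MM-DD, which sorts chronologically) are assumptions for illustration; the slide's own examples use a different date layout. Requires Python 3.8+ for the `:=` operator.

```python
import re
from datetime import date


def versioned_name(base, version, status=None, on=None, ext=".txt"):
    """Return a file name like 'FoodInterview_2_draft_2009-06-30.txt'.

    Hypothetical convention: base name, version number, optional status word,
    optional ISO 8601 date; spaces are rejected rather than silently replaced.
    """
    if re.search(r"\s", base) or (status and re.search(r"\s", status)):
        raise ValueError("avoid spaces in file names")
    parts = [base, str(version)]
    if status:
        parts.append(status)
    if on:
        parts.append(on.isoformat())
    return "_".join(parts) + ext


def next_version(existing_names, base):
    """Return one more than the highest version number already used for `base`."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)(?:_|\.|$)")
    versions = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    return max(versions, default=0) + 1


files = ["FoodInterview_1_draft.txt", "FoodInterview_1_final.txt", "HealthTests_1.txt"]
v = next_version(files, "FoodInterview")
print(versioned_name("FoodInterview", v, "draft", date(2009, 6, 30)))
# FoodInterview_2_draft_2009-06-30.txt
```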

9 Authenticity of data
master files:
  – assign responsibility for master files
  – record changes to master files
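Recording changes to master files can be as simple as a change log kept alongside the master file, in the spirit of the version control table on slide 8. A minimal Python sketch, assuming a tab-delimited log with hypothetical columns version, date, changed_by and description; the file names and the "data manager" role are illustrative.

```python
import csv
import os
import tempfile
from datetime import date

LOG_FIELDS = ["version", "date", "changed_by", "description"]


def record_change(log_path, changed_by, description, on=None):
    """Append a row to a tab-delimited change log kept next to the master file.

    The version number is one more than the last version already logged,
    so the log doubles as a simple version history of the master file.
    """
    rows = []
    if os.path.exists(log_path):
        with open(log_path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f, delimiter="\t"))
    entry = {
        "version": int(rows[-1]["version"]) + 1 if rows else 1,
        "date": (on or date.today()).isoformat(),
        "changed_by": changed_by,
        "description": description,
    }
    write_header = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS, delimiter="\t")
        if write_header:
            writer.writeheader()
        writer.writerow(entry)
    return entry


with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "SurveyMaster_changelog.tab")
    record_change(log, "data manager", "master file created", date(2009, 6, 1))
    latest = record_change(log, "data manager", "recoded missing values to -9", date(2009, 6, 15))
    with open(log, encoding="utf-8") as f:
        log_text = f.read()

print(latest["version"])  # 2
```

Assigning one person to write to this log (and to the master file) is what keeps the record trustworthy; the code only makes the record consistent.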

10 Data storage
digital storage media are unreliable
file formats and physical storage media ultimately become obsolete
optical (CD, DVD) and magnetic media (hard drive, tapes) are vulnerable and subject to physical degradation
Best practice:
  – use data formats with long-term readability
  – use a storage strategy with at least two different forms of storage
  – copy/migrate data files to new media two to five years after they were first created
  – check the integrity of stored data files at regular intervals (checksum)
  – know your back-up strategy: institutional/personal; network server/PC/laptop
  – maintain an original copy, an external local copy and an external remote copy
  – test file recovery
Data Protection Act and data back-up: may require minimal data copies for personal data; secure storage
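A minimal Python sketch of a checksum-based integrity check: compute a checksum for each stored file once, keep the resulting manifest, and recompute at regular intervals to detect silent corruption. The folder layout and file names are hypothetical; SHA-256 via the standard `hashlib` module is one common choice, not one the slide prescribes.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder):
    """Map each file name in `folder` to its checksum (store this with the data)."""
    return {
        name: sha256_of(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if os.path.isfile(os.path.join(folder, name))
    }


def verify(folder, manifest):
    """Return names of files that are missing or whose checksum has changed."""
    problems = []
    for name, expected in manifest.items():
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or sha256_of(path) != expected:
            problems.append(name)
    return problems


with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "survey.tab"), "wb") as f:
        f.write(b"id\tage\n1\t34\n")
    with open(os.path.join(folder, "codebook.txt"), "wb") as f:
        f.write(b"age: age in years; -9 = not answered\n")
    manifest = make_manifest(folder)
    before = verify(folder, manifest)
    # Simulate silent corruption of one stored file.
    with open(os.path.join(folder, "survey.tab"), "wb") as f:
        f.write(b"id\tage\n1\t43\n")
    after = verify(folder, manifest)

print(before, after)  # [] ['survey.tab']
```

The same manifest can confirm that the original, external local and external remote copies are identical, by running `verify` against each copy's folder.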

11 Example: data storage and preservation at UKDA
  – preservation copy (UKDA)
  – shadow copy (UKDA)
  – dissemination copy to reduce load on main system
  – near-site online copy (on campus)
  – off-site online copy
  – tape-based offline copy (UKDA)
Multi-copy, multi-storage media and multi-version resilience
[Diagram labels: scheduled; nightly; robotic; 3-monthly]

12 Good data management practice
  – plan data management early
  – assign roles and responsibilities
  – design data management according to the needs and purpose of the research
  – manage data throughout the research

13 Resources
ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf
Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London: National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf
UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/
See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp

