Research Data Management Gareth Cole. Data Curation Officer. 14th January 2015.

1 Research Data Management Gareth Cole. Data Curation Officer. 14th January 2015

2 Today’s Session Introductions Data Storage Data Sharing Open Access

3 Introductions Who are we? Who are you and why are you here?

4 Why Manage Data? Short-Term: Increase efficiency and save time Simplify your life Meet funder and institutional requirements Long-term: Preserve your data Easier Sharing and collaboration Raise your visibility and research profile

5 Roles and Responsibilities in RDM

6 PGR Policy On open access to research papers and research data management (RDM) – http://hdl.handle.net/10036/4279 RDM: Encourage good practice in RDM. Compliance with funder policy. Data should be made available on open access when legally, commercially and ethically appropriate.

7 Guidance Annual data review with supervisor (checklist) “The lead PGR Supervisor and the PGR student should discuss and review research data management annually.” Responsibilities “Responsibility for ongoing, day-to-day management of their research data lies with PGR students. Where the PGR is part of a project, data management policy will be set and monitored by the Principal Investigator (PI) and the PGR will be expected to comply with project guidelines. The lead PGR Supervisor is responsible for advising the PGR student on good practice in research data management.” Online guidance, training workshops and face-to-face advice service.

8 RCUK Common Principles on Data Policy The Common Principles are available on the RCUK website: Open Data Accessible Data Discoverable data Legal, ethical and commercial considerations should be taken into account Privileged use of the data allowed Data use should be acknowledged Public funds can be used to support the management and sharing of publicly-funded research data

9 ESRC Policy Data management plan Period of privileged use: Data must be made available or archived within three months of end of the award. Data Sharing/Archiving: Via UKDS “Where research data are considered confidential or contain sensitive personal data, award holders must seek to secure consent for data sharing or alternatively anonymise the data in order to make sharing possible.” Monitoring: The final payment of a grant may be withheld if data has not been offered for deposit to the required standard. Costs: The ESRC will review any costs associated with implementing the data plan.

10 Useful Links University of Exeter PGR policy Draft supervisor checklist RCUK Common Principles on Data Policy ESRC research data policy ESRC DCC summary of funders’ RDM policies

11 Storing your Data Data storage Back-up File naming Versioning Data security, encryption and destruction Where to find further information

12 Which is the final version?

13 Data Storage 1 Where will you be working: at home; in the office; both; fieldwork? Will you be working collaboratively? U:Drive – 20GB allowance ExeHub – 1TB through OneDrive but… Cloud storage (not for sensitive or confidential data) Computer hard drive External hard drives and USB sticks Hard copy of documents

14 Data Storage 2 Remember: File formats and physical storage media become obsolete. All digital media are fallible: optical (CD, DVD) and magnetic media (hard drive, tapes) degrade Never assume the format will be around forever Storage strategy best practice: At least two storage formats Prefer open or standard formats – e.g. OpenDocument Format (ODF), comma-separated values Some proprietary data formats such as MS Excel are likely to be accessible for a reasonable, but not unlimited, time Maintain original copy, external local copy and external remote copy Copy data files to new media two to five years after first created Check data integrity of stored data files regularly (checksum e.g. FastSum)
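
The regular integrity check recommended on this slide can be sketched in Python. This is an illustrative stand-in for a dedicated tool such as FastSum, not a description of how FastSum works; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so that
    large data files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True when both files have identical content (same checksum)."""
    return file_checksum(original) == file_checksum(copy)
```

Recording the checksum when a file is first stored, and recomputing it after each copy to new media, would show whether the content has silently changed.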

15 Non-Digital Storage Always follow the procedures stated in your ethical approval Confidential items, e.g. signed consent forms, interview notes Store securely, behind a lock Separate from data files Printed materials, photographs Degradation from sunlight and acid (sweat on skin, in paper) Use high quality media for long-term storage/preservation. e.g. using acid-free paper & boxes, non-rust paperclips (no staples)

16 Why Back-up? Back-ups are additional copies that can be used to restore originals. Protect against: software failure, hardware failure, malicious attack, natural disasters e.g. University of Southampton fire It’s not backed-up unless it’s backed-up with a strategy Backing-up need not be expensive 1TB external drive = around £50

17 Back-Up Strategy Know your institutional and personal back-up strategy: What’s backed-up? - all, some data? Where? - original copy, external local and remote copies What media? - CD, DVD, external hard drive, tape, etc. How often? – assess frequency and automate the process For how long is it kept? Verify and recover - never assume, regularly test a restore Make sure you know which version is the most up to date...
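
The "verify and recover" step above could be partly automated. The following is a minimal sketch, assuming the back-up is an ordinary folder that mirrors the original; `check_backup` and its result keys are hypothetical names chosen for the example, and the comparison relies on the standard library's `filecmp.cmpfiles`.

```python
import filecmp
from pathlib import Path


def check_backup(original_dir, backup_dir):
    """Compare every file under original_dir with the file of the same
    relative path under backup_dir, byte by byte."""
    original_dir = Path(original_dir)
    names = sorted(
        str(path.relative_to(original_dir))
        for path in original_dir.rglob("*")
        if path.is_file()
    )
    # shallow=False forces a content comparison rather than trusting
    # file size and modification time alone.
    match, mismatch, errors = filecmp.cmpfiles(
        original_dir, backup_dir, names, shallow=False
    )
    return {
        "ok": sorted(match),
        "changed": sorted(mismatch),
        "missing_or_unreadable": sorted(errors),
    }
```

An empty "changed" and "missing_or_unreadable" list only shows that the copies agree; it does not replace an occasional test restore.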

18 File Naming File name = principal identifier of file Easy to: identify, locate, retrieve, access Provides context e.g.: version number e.g. FoodInterview_1.1 date e.g. HealthTest_2011-04-06 content description e.g. BGHSurveyProcedures creator name e.g. CommsPlanGJC

19 File Naming: Best Practice Brief and relevant No special characters, dots or spaces For separation use underscores _ or - Name independent of location Date: YYYY_MM_DD Have a System! Consistent and logical naming system Develop a system with colleagues for shared data
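
One way to apply these rules consistently is to generate file names rather than type them. The sketch below is an assumption-laden example, not a University convention: the helper name `build_file_name`, the CamelCase description and the zero-padded `v02` version suffix are all choices made for illustration. The only dot in the result is the one before the file extension.

```python
import re
from datetime import date


def build_file_name(description, when, version=None, extension=""):
    """Build a name such as FoodInterview_2011_04_06_v02.csv."""
    # Keep letters and digits only, so spaces and special characters vanish.
    words = re.findall(r"[A-Za-z0-9]+", description)
    stem = "".join(word[:1].upper() + word[1:] for word in words)
    parts = [stem, when.strftime("%Y_%m_%d")]  # date as YYYY_MM_DD
    if version is not None:
        parts.append(f"v{version:02d}")
    name = "_".join(parts)
    return f"{name}.{extension.lstrip('.')}" if extension else name


print(build_file_name("food interview", date(2011, 4, 6), version=2, extension="csv"))
# prints FoodInterview_2011_04_06_v02.csv
```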

20 Version Control Tools/Strategies Record file status/versions Record relationships between files e.g. data file and documentation; similar data files Keep track of file locations e.g. laptop vs. PC

21 Version Control: Single User File naming; unique file name with date or version number Version control table or file history alongside data file

22 Version Control: Multiple Users Control rights to file editing: read/write permissions e.g. Microsoft Office Versioning/file sharing software: e.g. Google Drive, Amazon S3 Merging of multiple entries/edits

23 Version Control: Multiple Locations Synchronise files e.g. MS SyncToy software, DropBox Use remote desktop

24 Encryption: Personal or Sensitive Data Encrypt anything you would not send on a postcard: for moving files e.g. transcripts; for storing files e.g. shared areas, mobile devices Free software that is easy to use: 7-Zip – this is what the University recommends: http://as.exeter.ac.uk/it/infosec/encryptfiles/

25 Data Destruction When you delete data and documentation from a hard drive, it is probably not gone: Files need to be overwritten to ensure they are irretrievably deleted: BCWipe - uses ‘military-grade procedures to surgically remove all traces of any file’ If in doubt, physically destroy the drive using an approved secure destruction facility Physically destroy portable media, as you would shred paper

26 Data Security Protect data from unauthorised access, use, change, disclosure and destruction Personal data need more protection – always keep separate Control access to computers passwords anti-virus and firewall protection, power surge protection networked vs non-networked PCs all devices: desktops, laptops, memory sticks, mobile devices all locations: work, home, travel restrict access to sensitive materials e.g. consent forms, patient records Proper disposal of equipment (and data) even reformatting the hard drive is not sufficient Control physical access to buildings, rooms, cabinets

27 Further Information University of Exeter Code of Good Practice in the Conduct of Research Data back-up Storage Organising Files Ethical approval Data security and destruction guidance from Information Security External Back-up advice from UK Data Archive UKDA checksum exercise

28 Data Sharing Two stages of your project when you may share data “Live” sharing during the project Making your “completed” data available at the end of your project

29 Why Should you Share your Data Benefits – “Live” data Increased collaboration opportunities with colleagues Increased exposure of your current work Increased efficiency across research group Benefits – “Completed” data Increased citation counts Increased exposure for your work Increased chance of collaboration in the future Allows others to build on your research Policy RCUK Common Principles on Data Policy University Policy

30 Group Exercise One Thinking of the data you have shared: What are the pros and cons of the different methods you have used? What issues did you face when sharing your data? Why haven’t you shared data? Feedback to the group.

31 Sharing your Data – During your Research With your supervisor; with project colleagues; with external interested parties Cloud Storage – Dropbox, Google Drive, OneDrive etc. Not recommended for sensitive or personal data Email – issues with large data and/or sensitive data. Potential version control problems USB sticks – easily lost. Can transfer viruses External hard drives – less suitable if collaborator is at a different institution Websites – lack of permanency. Need internet connection. May not have access rights to the site FTP – Not secure. Data can be intercepted Hard copy documents – one of a kind

32 How to Share your Data – at the end of your Research Archive Repositories Discipline specific archive Archaeology Data Service UK Data Service Wellcome Trust list of Data Repositories (Inter)national archive UKDA University repository – Open Research Exeter (ORE) Link your data with your thesis/research papers Websites Link from your University personal webspace to data in a repository Link from academic network sites Academia.edu, ResearchGate.net

33 Issues in Data Sharing Ethical and Data Protection Act Copyright and legal issues File size Metadata Discoverability of the data Re-use of data Documentation of data File format – open or proprietary What to share Quality control and versioning

34 Ethical and DPA Issues Not all data can be shared. You must ensure that you don’t share data you are not allowed to: Abide by your ethical approval Abide by the Data Protection Act Are you sharing this data securely? Have you got consent to share the data? Use Cloud Storage wisely – not for sensitive data Getting consent: Advice from the UKDA Ethics advice from: College Ethics Officers (Exeter username and password needed) DPA Advice: recordsmanagement@exeter.ac.uk

35 Copyright and Legal Issues You must abide by any contract you or your project group have signed: This may state that you are not allowed to share the data or it may include the conditions of sharing You must be aware of who owns the copyright for the data you are sharing: You may not be allowed to share it You must get permission from the copyright owner before sharing data Also applies to data in your thesis Advice from JISC Digital Media on using images

36 File Size Large files cannot be emailed Some files may not fit onto USB sticks How do you know if a file has been received? Use the University’s File Drop Box (up to 600MB) Large files can take a long time to upload to Cloud Storage

37 File Format Is the file format you are using widely used? If not, can you migrate it to a more widely used format? E.g. .xlsx (Excel); .pdf Is the format you are using an “open” format or is it “proprietary”? Open formats can be more easily accessed by other researchers e.g. SPSS files can be saved as .csv files. Word files can be saved as an Open Document format (.odt rather than .docx) Make sure you don’t lose important information when migrating formats See advice from the UKDA

38 Example: Format Conversion MS Excel format Tab-delimited text format Loss of annotation

39 Metadata Record the metadata as you collect/create your data Have you provided information about the data with the data you share? It is needed for discoverability, reuse, reproducibility and verification etc. E.g. Author Title Date of creation Publisher Abstract Description of the data University web page
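
The fields listed on this slide can be captured in a small machine-readable record stored next to the data. The sketch below is one possible layout, assuming a JSON sidecar file; the class name, field names and helper functions are illustrative and do not follow any particular repository's schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    author: str
    title: str
    date_created: str  # ISO 8601 date, e.g. "2015-01-14"
    publisher: str
    abstract: str
    description: str


def missing_fields(record):
    """Names of fields that are still blank and need filling in."""
    return [name for name, value in asdict(record).items() if not value.strip()]


def to_json(record):
    """Serialise the record so it can be saved alongside the data files."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)
```

Checking `missing_fields` at the point of data collection is a cheap way to act on the advice to record metadata as you go rather than at deposit time.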

40 Example of Metadata Record in Institutional Repository

41 Supporting Documentation Have you provided enough information for another researcher to be able to understand, retrieve, validate and re-use the data? Where was the data created? How was the data created? What hardware and software were used? What methodologies were used? What assumptions did you make in your experiments? Why are there anomalies in your data? Along with the metadata, the documentation should enable the data to be understood and reusable independently of any other publications, data etc. Advice from the UKDA

42 What to Share? You don’t need to share all your “live” data Only data that is helpful and useful to the recipient What to archive? Consider policy/legal requirements In collaboration with your supervisor or PI develop a set of criteria: Only the data supporting your publications? Data that can reproduce your results? Data that can validate your results? How unique or significant is your data? University web pages

43 Data Re-use Data citation is becoming more common Get credit for all your research If others use your data it can increase your citation rates. Sharing can mean that your data is re-used in areas you didn’t think it could be. E.g. ships’ logs are being used by climate scientists Prof. Tim Naylor on data sharing: ‘I have examples of people who could have simply lifted the data, gone away and done something with it and given me a citation for it; but actually they have come to me and said, “OK, I’ve got this data, which is yours, we’re interested in it, but we need your expertise to interpret it” and then I get a co-authorship out of it as well.’

44 Open Access to Completed Data You will be required to make your data available on Open Access where appropriate RCUK Common Principles on Data Wellcome Trust Policy Statement UoE policies Link publications and supporting data RCUK requires a statement in published research papers saying where the supporting data can be accessed Archives and repositories

45 Further Information External: UKDA guidance on “Planning for Sharing” DCC table showing if your funder provides a data centre NIHR Research Governance Framework For Health and Social Care “Data relevant to findings should also be accessible.” (p. 14) Internal: Exeter research data management web pages Exeter University’s Institutional Repository (ORE) PhD student’s experience of copyright issues Contact rdm@exeter.ac.uk for help and advice

46 Ethical and Legal Issues in Data Sharing Ethical arguments for archiving data Duty of confidentiality/DPA Options for sharing personal/sensitive data

47 Ethical Arguments For Archiving Data Store and protect data securely Not burden over-researched, vulnerable groups Make best use of hard-to-obtain data (e.g., elites, socially excluded) Extend voices of participants Provide greater research transparency Enable fullest ethical use of rich data

48 Duty of Confidentiality UK: Duty of confidentiality exists in common law Public interest can override duty of confidentiality; best practice is to avoid vague or general promises in consent forms If participant consents to share data, then sharing does not breach confidentiality

49 Data Protection Act, 1998 Personal data: Relate to living individual Individual can be identified from those data or from those data and other information Include any expression of opinion about the individual Fair processing: Open and transparent Justified and reasonable; not kept longer than necessary Processed in accordance with the rights of data subjects, e.g. right to be informed about how data will be used, stored, processed, transferred, destroyed, right to access info and data held Security Protect against unauthorised access, data loss, damage to data Not transferred abroad without adequate protection Only disclosed if consent has been given to do so (except legal duty)

50 Sensitive Personal Data Sensitive personal data Refers to individual's race or ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sex life, criminal proceedings or convictions. Can only be processed for research purposes if: Explicit consent (ideally in writing) has been obtained; or Medical research by a health professional or equivalent with duty of confidentiality; or Analysis of racial/ethnic origins for purpose of equal opportunities monitoring; or In substantial public interest and won’t cause substantial damage and distress.

51 Sharing Personal Data In groups discuss: Can we ensure the fair processing of personal data if the data is to be shared at the end of a project? How? Can we ensure the security of personal data whilst allowing data to be shared at the end of a project? If so, how? 5 minutes Feedback to whole group

52 Options for Sharing Confidential Data Obtain informed consent Protect identities e.g. anonymisation or by not collecting personal data. If data are anonymised (personal identifiers removed) then DP laws will not apply as these no longer constitute ‘personal data’ Restrict /regulate access where needed (all or part of data) Securely storing personal or sensitive data

53 Informed Consent What is “informed” consent? Purpose of the research What is involved in participation Benefits and risks Mechanism of withdrawal Data uses – primary research, storing, processing, re-use, sharing, archiving Strategies to ensure confidentiality of data where this is relevant – anonymisation, access restrictions Informed consent for unknown future uses A great deal of information can be provided: Who can access the data Purposes – research or teaching or both/other Confidentiality protections, undertakings of future users General consent

54 Consent through the Research Lifecycle Plan early in research Consider jointly and in dialogue with participants 1. Engagement in the research process E.g. Decide who approves final versions of transcripts 2. Dissemination in presentations, publications, the web Decide who approves research outputs 3. Data sharing and archiving Consider future uses of data

55 Consent Form Meets requirements of Data Protection laws Simple Avoids excessive warnings Complete for all purposes: use, publishing, sharing UK Data Archive model consent form

56 When to Ask for Consent
One-off consent. Pros: Simple; least hassle for participant. Cons: Research outputs (not known in advance); participants will not know all content they will contribute.
Process consent. Pros: Most complete for assuring active consent. Cons: Might not get consent needed before losing contact; repetitive, can annoy participant.

57 Do Participants Consent to Share Data? Foot and mouth disease in N. Cumbria (2001-2003) Sensitive community information 40/54 interviews; 42/54 diaries; audio restricted Deposited in UKDS Medical research and biobank models Enduring, broad, open consent No time limits; no recontact required Wales Cancer Bank: 99% consent rate for 2500+ patients – samples of tumour, normal tissue and blood plus anonymised data sets for researchers.

58 Anonymisation A person’s identity can be disclosed through: Direct identifiers e.g. name, address, postcode, telephone number, voice, picture Often not essential research information (for administrative use) Indirect identifiers e.g. occupation, geography, unique or exceptional values or characteristics Possible disclosure of identity in combination with other information

59 Key points for Anonymising Never disclose personal data - unless consent for disclosure Reasonable/appropriate level of anonymity Maintain maximum meaningful information for context, do not over-anonymise Where possible replace rather than remove Re-users of data have the same legal and ethical obligation to not disclose confidential information as primary users

60 Anonymising Quantitative Data Remove direct identifiers e.g. names, address, institution, photo Reduce the precision/detail of a variable through aggregation e.g. birth year vs. date of birth, area rather than village Generalise meaning of detailed text variable e.g. occupational expertise Restrict upper/lower ranges of a variable to hide outliers e.g. income, age Combine variables e.g. rural/urban variable for place variables
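
Three of these techniques (removing direct identifiers, reducing precision, and restricting upper/lower ranges) can be sketched on a single survey record. The field names, the identifier list and the range limits below are invented for illustration; suitable values depend on the dataset and on what your ethical approval allows.

```python
def clamp(value, lower, upper):
    """Restrict a value to the range [lower, upper] to hide outliers."""
    return max(lower, min(value, upper))


def anonymise_record(
    record,
    direct_identifiers=("name", "address", "institution"),
    income_range=(0, 100_000),
    age_range=(18, 90),
):
    """Return an anonymised copy of one record (a plain dict)."""
    # 1. Remove direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    # 2. Reduce precision: keep birth year instead of full date of birth
    #    (assumes an ISO-style "YYYY-MM-DD" string).
    if "date_of_birth" in cleaned:
        cleaned["birth_year"] = int(cleaned.pop("date_of_birth")[:4])
    # 3. Restrict upper and lower ranges (top- and bottom-coding).
    if "income" in cleaned:
        cleaned["income"] = clamp(cleaned["income"], *income_range)
    if "age" in cleaned:
        cleaned["age"] = clamp(cleaned["age"], *age_range)
    return cleaned
```

The function returns a new dict and leaves the input untouched, so the original copy can be kept separately and securely.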

61 Geo-Referenced Data Spatial references (point coordinates, small areas) may disclose position of individuals, organisations, businesses Removing spatial references prevents disclosure; but also all geographical and related information lost Possible solutions Reduce precision: Replace point coordinate with larger, non-disclosing geographical area e.g. km² area, postcode district, road Replace point coordinate with meaningful variable typifying the geographical position; or summary statistics of location e.g. catchment area, poverty index, population density Keep spatial references and impose access restrictions on data
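
Replacing a point coordinate with a km² area can be sketched as follows, assuming the coordinates are projected eastings and northings in metres (for example on a national grid). The function name and the 1 km default are illustrative choices.

```python
def to_grid_square(easting_m, northing_m, cell_size_m=1000):
    """Return the south-west corner of the grid cell containing the point,
    so every point inside the same cell is reported identically."""
    return (
        int(easting_m // cell_size_m) * cell_size_m,
        int(northing_m // cell_size_m) * cell_size_m,
    )


print(to_grid_square(292345.7, 92811.2))
# prints (292000, 92000)
```

A fixed cell size may still be disclosive where very few people live in a cell, so the appropriate size remains a judgment for each dataset.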

62 Anonymising Qualitative Data Don’t collect disclosive data unless necessary. Edit at time of transcription except longitudinal studies - anonymise when data collection complete (linkages) Avoid blanking out; use pseudonyms or replacements. Identify replacements, e.g. with [brackets] Avoid over-anonymising - removing/aggregating information in text can distort data, make them unusable, unreliable or misleading. Consistency within research team and throughout project. Keep anonymisation log of all replacements, aggregations or removals made – keep separate from anonymised data files.
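
Bracketed replacements and an anonymisation log can be produced together. This is a minimal sketch for plain-text transcripts; the function name, the log layout and the example names are assumptions, and simple string matching will not catch misspellings or nicknames, so a manual read-through is still needed.

```python
import re


def pseudonymise(text, replacements):
    """Replace each original term with its [bracketed] replacement and
    return the new text plus a log of what was changed."""
    log = []
    # Longest terms first, so "Anna Smith" is handled before "Anna".
    for original in sorted(replacements, key=len, reverse=True):
        marked = f"[{replacements[original]}]"
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(lambda match: marked, text)
        if count:
            log.append(
                {"original": original, "replacement": marked, "occurrences": count}
            )
    return text, log


transcript = "Anna met Dr Patel in Exeter. Anna left early."
anonymised, log = pseudonymise(
    transcript,
    {"Anna": "Participant 1", "Dr Patel": "GP", "Exeter": "city"},
)
print(anonymised)
# prints [Participant 1] met [GP] in [city]. [Participant 1] left early.
```

Because the log contains the original names, it should be saved separately from the anonymised files, as the slide advises.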

63 Anonymising Qualitative Data

64 Access Control Essential when anonymisation ineffective or damaging to quality E.g. visual or audio data Gradation of access controls Open Access Metadata only – contact details for requesting data reuse, End User Licence Dark Archive Embargo for given time period Multiple access controls can apply to different data types within one study

65 Useful Links Getting consent: Advice from the UKDA Ethics advice from: College Ethics Officers (Exeter username and password needed) DPA Advice: recordsmanagement@exeter.ac.uk

66 Using Secondary Data Data review Copyright and data sharing Data citation

67 Data Review Good practice to demonstrate that no suitable data are available for re-use before collecting new data; data review (e.g. via UKDS) You must abide by any contract you or your project group have signed about using secondary data: This may state that you are not allowed to share data or include the conditions of sharing

68 Copyright Find out who owns the copyright for the data you are using/sharing: Copyright permissions must be sought and granted prior to data sharing/archiving. Also applies to data in your thesis/research papers e.g. PhD student’s copyright case study Copyright holders give permission to data archives to preserve data and make them accessible to users. Data archives publish data – they hold no copyright.

69 Data Citation 1 Data citation: Good research practice. Acknowledges the author's sources. Makes identifying data easier. Promotes the reproduction of research results. Makes it easier to find data. Allows the impact of data to be tracked. Provides a structure which recognises and rewards data creator

70 Data Citation 2 Include enough information so that the exact version of the data being cited can be located. Include a Digital Object Identifier (DOI) Each dataset used must have a separate citation. Example: University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], November 2011. SN: 6614, http://dx.doi.org/10.5255/UKDA-SN-6614-2
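
If you cite several datasets, building each citation from the same set of fields keeps them consistent. The sketch below reproduces the layout of the example on this slide; the function and parameter names are invented for illustration, and other archives or journals may require a different order of elements.

```python
def format_data_citation(
    creator, title, edition, place, distributor, date, study_number, doi
):
    """Assemble a dataset citation in the layout of the slide's example."""
    return (
        f"{creator}, {title} [computer file]. {edition}. "
        f"{place}: {distributor} [distributor], {date}. "
        f"SN: {study_number}, http://dx.doi.org/{doi}"
    )


citation = format_data_citation(
    creator=(
        "University of Essex. Institute for Social and Economic Research "
        "and National Centre for Social Research"
    ),
    title="Understanding Society: Wave 1, 2009-2010",
    edition="2nd Edition",
    place="Colchester, Essex",
    distributor="UK Data Archive",
    date="November 2011",
    study_number="6614",
    doi="10.5255/UKDA-SN-6614-2",
)
```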

71 Useful Links Advice from JISC Digital Media on using images UKDA advice on copyright ESRC: Data Citation: What you need to know How to cite census data

72 Data Management Plans (DMP) “Plans typically state what data will be created and how, and outline the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.” Digital Curation Centre website

73 Why write a DMP? Many funders now require a DMP as part of the application process Helps the associated project with data management issues Makes the project members think about relevant issues

74 Data Management Planning Bids to most major funders now require a DMP outlining: Roles and responsibilities What data will be created and how Data formats Documentation of data Storage and back up Data sharing Long-term preservation and access... Guidance available on the University web pages: University web pages on data management plans

75 Helpful links DMPonline DCC policies Funder Information: ESRC

76 Any Questions Contact us: openaccess@exeter.ac.uk

