Organizing Your Research Project: DATA Claudia Neuhauser University of Minnesota Informatics Institute.

1 Organizing Your Research Project: DATA Claudia Neuhauser University of Minnesota Informatics Institute

2 From Data to Knowledge…
Data
– Text
– Numbers
– Images
Information
– Processed data
– Categorization
Knowledge
– Understood by the human mind
– Context
Source (Image): http://www.nickdiakopoulos.com/2011/12/16/data-information-knowledge-visualization/

3 …to Decision Making
Using evidence from big (or small) data to make decisions
– Education
– Community engagement
  – Smart cities
  – Transportation
  – Energy
– Precision health care
– Precision agriculture
– Procurement
– …
Source: http://momentsinmyhead.files.wordpress.com/2010/02/fork_in_road.jpg

4 Big Data
Volume
– Data size
– “Each day, we create more than 70 times the amount of information in the Library of Congress.” (D. Walton, 2014)
– Lots of small data…
Velocity
– Streaming data from sensors
– Real-time analysis
Variety
– Data sources
– Structured and unstructured data
Source: http://ad-exchange.fr/wp-content/uploads/2013/06/big-data.jpg

5 Big versus Small Data
Most data are small
Similar management but different challenges
Data Life Cycle
– Data Management Guide for Public Participation in Scientific Research: https://www.dataone.org/sites/all/documents/DataONE-PPSR-DataManagementGuide.pdf
Tools at the Libraries (https://www.lib.umn.edu/datamanagement/tools)
– Data Management Plan
– Metadata
– Repositories
[Figure: the data life cycle: Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze. Figure source: DataONE (https://www.dataone.org/best-practices)]

6 Planning Your Research Project: Learning from Design
Treat it like a design problem
– Identify the gap and the need
– Define the problem
  – Ask “Why?” repeatedly so that you don’t end up solving a problem that does not fill the gap
– Explore the solution space
  – Identify constraints
– Iterate
– Prototype
  – Excel may be a good start: use it if it does the job and gets you going
  – More sophisticated tools may eventually be needed
– Start at the end
  – Don’t build a database before you know what you want to do
Communication gap between data science and domain expertise
– You start where you feel comfortable
  – Data science: build a database
  – Domain expert: what’s the gap in knowledge?

7 Planning Your Research Project: Managing Your Data
Data management plan
– Assign roles and responsibilities
– Determine types of data and format
Sharing of data
– Expected schedule
– Method of sharing
– Agreements
– Confidentiality of data
  – IRB approval
– Long-term preservation
– Metadata
– Reusing vs. acquiring new data
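
Metadata is easier to keep complete if it is captured in a fixed structure from the start. The following is a minimal sketch, assuming a handful of fields loosely modeled on Dublin Core elements; the class name `DatasetMetadata`, the `REQUIRED` field list, and the `missing_fields` helper are hypothetical illustrations, not something the slides or the University prescribe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record, loosely modeled on Dublin Core elements."""
    title: str
    creator: str
    date: Optional[str] = None         # e.g. ISO 8601 collection date, "2015-01-31"
    file_format: Optional[str] = None  # e.g. "text/csv"
    description: Optional[str] = None  # what was measured, how, and why
    rights: Optional[str] = None       # sharing terms and confidentiality notes
    keywords: List[str] = field(default_factory=list)


# Fields this hypothetical project treats as mandatory before data are shared.
REQUIRED = ("title", "creator", "date", "file_format", "rights")


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of required fields that are empty or None, in REQUIRED order."""
    return [name for name in REQUIRED if not getattr(record, name)]


if __name__ == "__main__":
    rec = DatasetMetadata(title="Soil moisture, plot A", creator="J. Doe",
                          file_format="text/csv")
    print(missing_fields(rec))  # ['date', 'rights']
```

Running a check like this before each data transfer is one way to make the "Metadata" item of the plan enforceable rather than aspirational.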

8 Collaboration
Communication among team members
Trust
Integrity
Identifying roles
Project management
– Personal recommendation: check out Asana
Practical issues
– Who owns the data?
– Who can use the data for publications, and how are team members acknowledged?
– Who will access the data?
– What happens if a member leaves the team?
– Can different people access the data at the same time?
– Who pays for data storage?
– What happens to the data after the team disbands?

9 Data Processing
“80% of the work in any data project is cleaning the data.” – D.J. Patil, U.S. Chief Data Scientist
Quality control is essential
Integrating different data sets can be very difficult and time-consuming
– Plan for it
Metadata is essential when merging data sets and when reusing data
Missing and incomplete data
Document what you did: you will forget the details
Data modeling
– Relationships among the different data tables
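
The data-modeling and integration points can be made concrete with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the two tables, their columns, and the sample rows are invented for illustration. A LEFT JOIN surfaces measurements that match no site (a typical problem when merging data from different sources), and a NULL count flags missing readings; both are checks worth recording when you document what you did.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per field site, many measurements per site.
SCHEMA = """
CREATE TABLE sites (
    site_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE measurements (
    id      INTEGER PRIMARY KEY,
    site_id TEXT NOT NULL,   -- links each reading to sites.site_id
    value   REAL             -- NULL marks a missing reading
);
"""


def load_example(conn: sqlite3.Connection) -> None:
    """Insert made-up rows that mimic two data sets merged from different sources."""
    conn.executemany("INSERT INTO sites VALUES (?, ?)",
                     [("S1", "Plot A"), ("S2", "Plot B")])
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)",
                     [(1, "S1", 0.31),
                      (2, "S2", None),   # missing value
                      (3, "S3", 0.27),   # site absent from the sites table
                      (4, "s1", 0.29)])  # inconsistent key spelling (case differs)
    conn.commit()


def orphan_measurements(conn: sqlite3.Connection) -> List[int]:
    """IDs of measurements whose site_id has no matching row in sites."""
    rows = conn.execute("""
        SELECT m.id
        FROM measurements AS m
        LEFT JOIN sites AS s ON m.site_id = s.site_id
        WHERE s.site_id IS NULL
        ORDER BY m.id
    """).fetchall()
    return [r[0] for r in rows]


def count_missing(conn: sqlite3.Connection) -> int:
    """Number of measurements with no recorded value."""
    return conn.execute(
        "SELECT COUNT(*) FROM measurements WHERE value IS NULL").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_example(conn)
    print(orphan_measurements(conn))  # [3, 4]
    print(count_missing(conn))        # 1
```

Row 4 is caught because SQLite's default text comparison is case-sensitive, so "s1" does not match "S1"; deciding whether that is a typo or a genuinely different site is exactly the kind of cleaning decision to write down.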

10 Analyzing Data
“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.” – Jeffrey Heer, University of Washington and co-founder of Trifacta
Don’t be afraid to explore data with user-friendly tools
– Excel PowerPivot
– Tableau
Beware of spurious patterns in your data
– Multiple hypothesis testing
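
Why multiple hypothesis testing produces spurious patterns: if m independent tests are each run at level α and every null hypothesis is true, the chance of at least one false positive is 1 − (1 − α)^m. A minimal sketch follows; it shows two standard corrections, Bonferroni and the Benjamini-Hochberg false discovery rate procedure, chosen here for illustration (the slides do not name a method), and the p-values are made up.

```python
from typing import List


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[int]:
    """Indices that stay significant when each test is held to alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals: List[float], q: float = 0.05) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure (FDR controlled at q)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest p upward
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank  # largest rank k with p_(k) <= (k / m) * q
    return sorted(order[:k_max])


if __name__ == "__main__":
    print(round(familywise_error_rate(0.05, 20), 3))  # 0.642
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print([i for i, p in enumerate(pvals) if p < 0.05])  # [0, 1, 2, 3, 4] uncorrected
    print(bonferroni(pvals))          # [0]
    print(benjamini_hochberg(pvals))  # [0, 1]
```

With 20 tests at α = 0.05, a false positive is more likely than not (about 64%). Bonferroni is the more conservative choice; Benjamini-Hochberg trades a controlled fraction of false discoveries for more power.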

11 Communicating Results
What a technical user wants to see…
What a stakeholder wants to see…

12 Research Data Management Policy
New policy (January 2015)
– UWide Policy Library: Research Data Management: Archiving, Ownership, Retention, Security, Storage, and Transfer
  – Establishes high-level guidance for coordinating the institution’s efforts to satisfy research data storage and infrastructure needs
  – Clarifies ownership and stewardship of research data
– Student data ownership is similar to copyright
– PI as steward of data
Use Case Categorization Scheme Committee

13 Research Data
Recorded factual material commonly accepted in the scientific or scholarly community as necessary to validate research findings, excluding preliminary analyses, drafts of scholarly or scientific work, plans for future research, peer reviews, communications with colleagues and physical objects (e.g., laboratory samples).

14 Ownership (Policy)
Unless superseded by specific terms of sponsorship or other agreements or University policy (e.g., Copyright), the University owns all research data generated or acquired by University employees (faculty and staff) or non-student trainees or fellows (not employed by the University) through research projects conducted at or under the auspices of the University of Minnesota, regardless of funding source.
– Students own research data that they generate or acquire in their academic work, unless the research data are:
  – generated or acquired within the scope of their employment at the University;
  – generated or acquired through use of substantial University resources; or
  – subject to other agreements that supersede this right (e.g., Research Data Ownership Acknowledgment form signed by student and PI).
Research data generated or acquired by students outside of their academic work or by volunteers through research projects conducted at or under the auspices of the University of Minnesota, regardless of funding source, are owned by the University unless superseded by specific terms of sponsorship or other agreements.

15 Stewardship (Policy)
Principal Investigator (PI)
– Determines what needs to be retained in sufficient detail and for an adequate period of time.
– Manages access to research data.
– Selects the vehicle for publication or presentation of the data.
– Shares research data, including placing research data in public repositories, unless specific terms of sponsorship or other agreements supersede these rights.
– Is responsible for ensuring that critical, high-value research data under their stewardship are preserved.
– Educates all participants in the research project about their obligations regarding research data.
– Alerts Sponsored Projects Administration (SPA) if a grant or contract may require management of research data that go beyond standard requirements.

16 Retaining and Archiving Data
PIs are responsible for ensuring that critical, high-value research data under their stewardship are preserved.
The PI is responsible for determining what needs to be retained in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy, and compliance with laws and regulations governing the conduct of research.
PIs must retain research data for at least the minimum period required by applicable laws and regulations, sponsorship requirements, or other agreements.
PIs may choose to retain the data beyond the minimum period, up to any deadline specified by laws, regulations or other agreements.
PIs must destroy research data when required by laws, regulations, or other agreements, on or before a specified deadline, and follow the applicable process for destroying research data.
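
The three retention rules combine into a window: keep the data at least until the longest applicable minimum period ends, and, where a destruction deadline applies, dispose of them no later than the earliest such deadline. A minimal sketch, assuming each requirement can be expressed in whole years after the project ends; the `Requirement` class, its fields, the conflict check, and the year counts in the example are hypothetical, and real periods must come from the applicable laws, sponsor terms, and agreements.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Requirement:
    """One hypothetical retention rule from a law, sponsor, or agreement."""
    source: str
    min_years: int = 0                          # retain at least this long after project end
    destroy_within_years: Optional[int] = None  # destroy no later than this many years after end


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_window(end: date, reqs: List[Requirement]) -> Tuple[date, Optional[date]]:
    """Return (earliest allowed disposal date, destruction deadline or None)."""
    keep_until = add_years(end, max((r.min_years for r in reqs), default=0))
    deadlines = [add_years(end, r.destroy_within_years)
                 for r in reqs if r.destroy_within_years is not None]
    destroy_by = min(deadlines) if deadlines else None
    if destroy_by is not None and destroy_by < keep_until:
        # Hypothetical safeguard: the requirements cannot all be met; escalate to a human.
        raise ValueError("destruction deadline precedes the minimum retention period")
    return keep_until, destroy_by


if __name__ == "__main__":
    reqs = [Requirement("sponsor", min_years=3),
            Requirement("data use agreement", min_years=1, destroy_within_years=5)]
    print(retention_window(date(2015, 6, 30), reqs))
    # (datetime.date(2018, 6, 30), datetime.date(2020, 6, 30))
```

Between the two dates the PI may choose how long to keep the data; after the deadline, the applicable destruction process applies.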
