iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF-1115210).


Filling a data literacy and computational literacy gap
Presenter: Deborah Paul, Florida State University, Integrated Digitized Biocollections (iDigBio)
At the Biodiversity Information Standards (TDWG) 2014 Conference, Elmia Congress Centre, Rydberg Hall, Jönköping, Sweden, Oct 2014
Authors: Deborah Paul
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2 Researchers are experiencing a lot of data pain and are frustrated or limited by their current workflows

3 Sentiments on data within the NSF BIO Centers (BEACON, SESYNC, NESCent, iPlant, iDigBio)
I usually manage data in Excel and it's terrible and I want to do it better.
I'm organizing GIS data and it's becoming a nightmare.
My advisor insists that we store 50,000 barcodes in a spreadsheet, and something must be done about that.
I'm having a hard time analyzing microarray, SNP or multivariate data with Excel and Access.
I want to use public data.
I work with faculty at undergrad institutions and want to teach data practices, but I need to learn it myself first.
I'm interested in going into industry and companies are asking for data analysis experience.
I'm trying to reboot my lab's workflow to manage data and analysis in a more sustainable way.
I'm re-entering data over and over again by hand and know there's a better way.
I have overwhelming amounts of data.
I'm tired of feeling out of my depth on computation and want to increase my confidence.

4 Goal: Develop and teach workshops to help train the next generation of researchers in good data analysis and management practices to enable individual research progress and open and reproducible research.

5 What’s Data Carpentry?
Two-day intensive workshops, modeled on Software Carpentry.
Learning objective: Researchers should be able to retrieve, view, manipulate, analyze and store their own and others' data in an open and reproducible way.
Data Carpentry is focused on data: the workshop introduces one data set at the beginning, and that data set is used throughout to teach how to manage and analyze data in an effective and reproducible way.
Data Carpentry is designed for novices: there are no prerequisites, and no prior knowledge about the tools is assumed.
Data Carpentry is domain specific by design.

6 A typical 2-day Data Carpentry Workshop
Day 1 morning: Better spreadsheet skills and introduction to more powerful tools.
Day 1 afternoon: Introduction to databases, combining and querying data using SQL.
Day 2 morning: Introduction to R and managing data in R.
Day 2 afternoon: Workflows, visualizing data, and making research repeatable.

7

8 Data Literacy and Computational Literacy

9 Consider this task: A database has two tables: Scientist and Lab. Scientist's columns are the scientist's user ID, name, and address; Lab's columns are lab IDs, lab names, and scientist IDs. Write an SQL statement that outputs the number of scientists in each lab.
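One possible answer to the task above, sketched in Python with the standard-library sqlite3 module so the SQL can be run as written. The column names (user_id, lab_id, lab_name, scientist_id) and the sample rows are assumptions made for illustration; the slide only describes the columns in words. The sketch also assumes the Lab table holds one row per lab membership.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows matching the slide's description.
conn.executescript("""
CREATE TABLE Scientist (user_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Lab (lab_id INTEGER, lab_name TEXT, scientist_id INTEGER);
INSERT INTO Scientist VALUES
    (1, 'Ada', '1 Main St'), (2, 'Grace', '2 Oak Ave'), (3, 'Rosalind', '3 Elm Rd');
INSERT INTO Lab VALUES
    (10, 'Genomics', 1), (10, 'Genomics', 2), (20, 'Ecology', 3);
""")

# Join each lab membership to its scientist, then count scientists per lab.
QUERY = """
SELECT Lab.lab_name, COUNT(DISTINCT Lab.scientist_id) AS n_scientists
FROM Lab
JOIN Scientist ON Scientist.user_id = Lab.scientist_id
GROUP BY Lab.lab_id, Lab.lab_name
ORDER BY Lab.lab_name;
"""

counts = conn.execute(QUERY).fetchall()
for lab_name, n_scientists in counts:
    print(lab_name, n_scientists)
```

The JOIN restricts the count to scientist IDs that actually exist in Scientist. If every ID in Lab is known to be valid, a GROUP BY on Lab alone would give the same numbers.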

10 Data Carpentry curriculum
Preparing data for analysis.
How to organize data and use spreadsheet programs more effectively, but also to recognize their limitations.
Getting data out of spreadsheets and into tools such as R or Python that allow for reproducible workflows and have more capabilities.
Using databases, including managing and querying data in SQL.
Workflows and automating repetitive tasks, in particular using the command line shell and shell scripts.
Using data and computational resources, in particular publicly available ones such as Amazon, DataDryad and Figshare.
Overall, conducting data and computation-heavy research more efficiently, reproducibly and openly.
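As a rough illustration of moving data out of a spreadsheet and into a scripted, repeatable analysis, the sketch below reads a CSV export with Python's standard csv module and computes a per-group mean. The column names (species, weight_g), the values, and the function name mean_weight_by_species are all made up for this example; they do not come from the workshop materials.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of a spreadsheet (made-up values for illustration).
CSV_TEXT = """species,weight_g
DM,40
DM,48
PF,7
PF,
"""

def mean_weight_by_species(csv_file):
    """Return the mean weight per species, ignoring rows with no weight."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        raw = row["weight_g"].strip()
        if not raw:
            continue  # skip missing measurements instead of guessing a value
        totals[row["species"]] += float(raw)
        counts[row["species"]] += 1
    return {species: totals[species] / counts[species] for species in totals}

# io.StringIO stands in for an open file; with a real export one would
# pass open("export.csv", newline="") instead.
means = mean_weight_by_species(io.StringIO(CSV_TEXT))
print(means)
```

Because every step lives in the script, rerunning it on a corrected export repeats the whole analysis without any manual re-entry.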

11 Data Carpentry instructor development and resources
Training and supporting instructors is another primary goal of Data Carpentry.
Providing open source/Creative Commons materials for re-use.
Potentially acting as a hub for instructional materials on data analysis and management.

12 Materials development
Materials currently exist for multiple domains and topics, and we are working with people in different domains to develop more.
Topics: Shell, R, Python, SQL, Excel, data cleaning, text mining, HDF5
Domains: Ecology, genomics, social science, neuroscience, geosciences

13 Community-driven effort
Data Carpentry board: Karen Cranston (NESCent), Hilmar Lapp (Duke), Aleksandra Pawlik (ELIXIR UK), Karthik Ram (rOpenSci), Tracy Teal (Michigan State), Ethan White (Univ of Florida), Greg Wilson (Software Carpentry)
Contributors: 20 people contributing to materials development already
4 workshops taught, 11 instructors, ~20 helpers
Open source materials

14 Thank you very much! (Tack så mycket!)
Work presented here made possible by many and especially…
Tracy K Teal, Michigan State University
Francois Michonneau, iDigBio Post Doc
Katja Seltmann, AMNH, TTD - TCN
Matt Collins, iDigBio
Kevin Love, iDigBio
Reed Beaman, iDigBio
SESYNC, iPlant, BEACON, NESCent, and the Data Carpentry Board

facebook.com/iDigBio twitter.com/iDigBio vimeo.com/idigbio idigbio.org/rss-feed.xml webcal://
Find out more at