Using Spreadsheets in Research – Best Practices


1 Using Spreadsheets in Research – Best Practices
Jan Cheetham, PhD (DoIT Academic Technology)
Barry Radler, PhD (UW Institute on Aging)
RDS Brownbag, October 8, 2013
Jan and I became interested in how spreadsheets are used in research because they are used so widely (including outside the sciences, for example in the humanities), yet there seemed to be few ways to standardize how they manage data and metadata. Metadata is information about data. Spreadsheets can hold a wealth of data, but they carry very little metadata to help interpret their contents. We decided to explore how we could develop some standards and find tools that would help address this deficit.

2 Spreadsheets in Research
Usage is ubiquitous
Problems with documentation, reproducibility
Example: Reinhart and Rogoff paper scandal (April, 2013)
Cited 567 times since 2010
Referenced in stimulus/austerity debates
Famous example: The original research by Carmen Reinhart and Ken Rogoff, titled "Growth in a Time of Debt," claimed that economic growth slowed dramatically for countries whose public debt crossed a threshold of 90% of Gross Domestic Product. Since its publication, this finding has often been cited in stimulus/austerity debates (and 567 times in academic literature), but many economists were unable to replicate it, in part because of the authors' reluctance to share their original data.
6/28/2019

3 They eventually did share their data, and a grad student found an Excel spreadsheet error: when computing the average economic growth for countries whose public debt exceeded 90% of GDP, the authors accidentally omitted 5 countries. The error was buried in the cell that held the formula. Fixing it changed the findings and the basis for their paper. So with this cautionary tale of a real-world problem in which using a spreadsheet for analysis caused a huge error, Jan is going to discuss some best practices.
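The kind of error described here, an average taken over a range that stops short, can be sketched in a few lines. This is only an illustration: the country names and growth figures below are made up (they are not the Reinhart and Rogoff data), and `average_growth` and `omitted_rows` are hypothetical helper names.

```python
from statistics import mean

# Made-up growth figures (percent) for ten placeholder countries.
# These are NOT the Reinhart-Rogoff data.
growth = {
    "A": 2.0, "B": 3.0, "C": -1.0, "D": 4.0, "E": 1.0,
    "F": 2.5, "G": 0.5, "H": 3.5, "I": 1.5, "J": 2.0,
}

def average_growth(values, countries):
    """Average growth over an explicit list of countries."""
    return mean(values[c] for c in countries)

def omitted_rows(used, expected):
    """Return the rows that should have been in the calculation but were not."""
    return sorted(set(expected) - set(used))

all_countries = sorted(growth)
truncated = all_countries[:5]  # mimics a formula range that stops five rows short

full_avg = average_growth(growth, all_countries)
short_avg = average_growth(growth, truncated)

print(f"all rows: {full_avg:.2f}, truncated range: {short_avg:.2f}")
print("omitted:", omitted_rows(truncated, all_countries))
```

In a script the set of rows being averaged is visible and can be checked, whereas in a spreadsheet the range is hidden inside the formula cell.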

4 Recommendations for Spreadsheets
Save a RAW copy of each spreadsheet
Take a class, become a power spreadsheet user
See resources at: data/spreadsheets/
Consider moving to other software for analysis
Make your spreadsheets more:
  Machine readable
  Human readable
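One way to act on the first recommendation is sketched below: copy the file into a separate folder, mark the copy read-only, and record a checksum so later changes can be detected. The folder name `raw` and the function names are assumptions for illustration, not part of the talk.

```python
import hashlib
import shutil
import stat
from pathlib import Path

def save_raw_copy(source, raw_dir="raw"):
    """Copy a file into raw_dir, make the copy read-only, return (path, sha256)."""
    source = Path(source)
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    if target.exists():
        # Never overwrite an existing raw copy.
        raise FileExistsError(f"raw copy already exists: {target}")
    shutil.copy2(source, target)   # copy2 also preserves timestamps
    target.chmod(stat.S_IREAD)     # owner read-only
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return target, digest

def verify_raw_copy(path, expected_digest):
    """Return True if the file still matches the checksum recorded earlier."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest() == expected_digest

# Example usage (file name is hypothetical):
# target, digest = save_raw_copy("survey_2013.xlsx")
# print(target, digest)
```

All analysis then happens on a working copy, and the checksum gives a simple way to confirm that the raw file was never edited.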

5 Machine-readable

6 Human-readable
A separate document that describes and explains:
The data set
Variables (field names, column headings, etc.)
Data values (codes, data labels, etc.)
Type of values and formats
What data values mean; code lists
Formulas and analysis steps
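A starting point for such a document can be generated from the data itself. The sketch below builds a minimal data dictionary from CSV text using only the standard library. The column names, descriptions, and code lists are invented for illustration, and the type inference is deliberately crude; the descriptions and code meanings still have to be supplied by a person.

```python
import csv
import io

def infer_type(values):
    """Guess a column type from its non-empty values."""
    if not values:
        return "empty"
    for caster, label in ((int, "integer"), (float, "decimal")):
        try:
            for v in values:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"

def build_data_dictionary(csv_text, descriptions=None, code_lists=None):
    """Return one entry per column: name, description, type, missing count, codes."""
    descriptions = descriptions or {}
    code_lists = code_lists or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [r[name] for r in rows if r[name] not in (None, "")]
        entries.append({
            "variable": name,
            "description": descriptions.get(name, "UNDOCUMENTED"),
            "type": infer_type(values),
            "missing": len(rows) - len(values),
            "codes": code_lists.get(name, {}),
        })
    return entries

# Invented sample data for illustration only.
sample = "id,sex,age,income\n1,1,34,52000.5\n2,2,,48000\n3,1,51,\n"
dictionary = build_data_dictionary(
    sample,
    descriptions={"id": "Respondent identifier", "sex": "Sex of respondent"},
    code_lists={"sex": {"1": "Male", "2": "Female"}},
)
for entry in dictionary:
    print(entry)
```

Columns that come back as `UNDOCUMENTED` are a quick checklist of what the human-readable document still needs to explain.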

7 One Tool: Colectica for Excel
[Diagram labels: Dataset, Variable, Codes, Categories]
I’m going to talk about one tool (it’s free!) that appends metadata to spreadsheets and can help document their contents. Colectica realized that people aren’t going to abandon spreadsheets just because they are difficult to document, so they decided to accommodate people’s behavior by creating Colectica for Excel (CFE), which treats spreadsheets in Excel as flat-file datasets.

8 Colectica for Excel
Documents and identifies:
Datasets
Variables
Code lists
Categories
Add more metadata than Excel supports on its own
Colectica for Excel is based on DDI
CFE describes and documents datasets, variables, code lists, and categories. DDI (Data Documentation Initiative) is an XML-based standard for describing social science and biomedical data. It essentially creates dynamic electronic codebooks that are both human- and machine-readable. Not only does Colectica for Excel help researchers better document their spreadsheets, it also makes the spreadsheet compliant with DDI, so it can be ingested into any DDI system (archives, repositories, publishers?).

9 Colectica for Excel
Enables reuse of metadata through unique identification
Variables can share the same code list
Records audit trail through versioning
Reuse of information is one of the key advantages of DDI.
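The idea of reuse through unique identification and versioning can be sketched with plain data classes. This is an illustrative model only; it is not Colectica's or DDI's actual object model, and every class, field, and variable name below is hypothetical.

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass(frozen=True)
class CodeList:
    """A list of (value, label) pairs with a stable identifier and a version."""
    name: str
    codes: tuple
    version: int = 1
    identifier: str = field(default_factory=lambda: str(uuid4()))

    def revise(self, codes):
        """Return a new version that keeps the same identifier."""
        return CodeList(self.name, tuple(codes), self.version + 1, self.identifier)

@dataclass
class Variable:
    """A variable points at a code list by identifier and version, not by copy."""
    name: str
    label: str
    code_list_id: str
    code_list_version: int

def label_for(variable, value, registry):
    """Look up the label for a coded value through the shared registry."""
    code_list = registry[(variable.code_list_id, variable.code_list_version)]
    return dict(code_list.codes)[value]

yes_no = CodeList("YesNo", ((1, "Yes"), (2, "No")))
registry = {(yes_no.identifier, yes_no.version): yes_no}

# Two variables share one code list instead of each carrying its own copy.
smoker = Variable("smoker", "Currently smokes", yes_no.identifier, yes_no.version)
insured = Variable("insured", "Has health insurance", yes_no.identifier, yes_no.version)

# A revision keeps the identifier; the old version stays available as a record.
yes_no_v2 = yes_no.revise(yes_no.codes + ((9, "Refused"),))
registry[(yes_no_v2.identifier, yes_no_v2.version)] = yes_no_v2

print(label_for(smoker, 1, registry), label_for(insured, 2, registry))
```

Because variables refer to the code list by identifier and version, a correction is made once, and older versions remain in the registry as a simple audit trail.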

10 Document Datasets
DEMO: The CFE app adds a tab to Excel called Data Documentation.

11 Document Variables

12 Record Custom Fields

13 Document Code Lists

14 Reuse Code Lists

15 Publish Documentation
Generate Codebooks from Excel
PDF, HTML, Word, XSL-FO
Publish DDI 3.1 XML

16 Resources
Colectica for Excel:
DataUp dataup.cdlib.org/
I demoed Colectica because I’m familiar with it and DDI. There are other tools available from organizations like DataUp.

17 Research Data Services
“Practice good data management!”
Resources: Research Data Services researchdata.wisc.edu
You can find just about everything we talked about today (and more!) on the RDS website, under managing data, using spreadsheets.

