Science Gateways and their role in Reproducibility
Nancy Wilkins-Diehr, San Diego Supercomputer Center
firstname.lastname@example.org
That reproducibility is a problem has already been established, but
Brian Granger (IPython developer), talk at UCSD, May 2014
– Computing (thus software) is one of the foundations of data science
– Important decisions are being made on these data: political, financial, institutional, peer review system, social
– Several recent examples of errors in academic data:
   “Growth in a Time of Debt”, Reinhart and Rogoff (2010); Herndon, Ash, Pollin (critique, 2013)
   “Capital in the 21st Century”, Piketty (2014)
   BICEP2 (2014)
Software designed for specific purposes
May do what it does well, but if it is not designed to enforce reproducibility, it will be nearly impossible for a user to achieve it
– Excel: almost impossible to design a reproducible experiment
– GitHub: almost impossible not to design a reproducible experiment
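One way to read "designed to enforce reproducibility" is that the tool records what was run, on which input, and with which settings, without the user having to remember to do so. The following is only a minimal sketch of that idea, not a description of how any particular gateway works. The names `run_with_provenance`, `fingerprint`, and `mean_of_column` are hypothetical and exist only for this illustration.

```python
import hashlib
import json
import platform


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw input bytes."""
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(func, input_bytes: bytes, params: dict) -> dict:
    """Run func and return its result together with a provenance record.

    The record captures the input hash, the parameters, and the Python
    version, so a later run can be checked against the same conditions.
    """
    result = func(input_bytes, **params)
    return {
        "function": func.__name__,
        "input_sha256": fingerprint(input_bytes),
        "params": params,
        "python_version": platform.python_version(),
        "result": result,
    }


def mean_of_column(input_bytes: bytes, column: int) -> float:
    """Hypothetical analysis step: mean of one column of CSV-like bytes."""
    rows = [line.split(",") for line in input_bytes.decode().splitlines() if line]
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


# Illustrative data only; not taken from the talk.
data = b"1,10\n2,20\n3,30\n"
record = run_with_provenance(mean_of_column, data, {"column": 1})
print(json.dumps(record, indent=2, sort_keys=True))
```

Because the record is produced by the wrapper rather than by the user, every run leaves the same kind of trail, which is closer to the GitHub end of the spectrum than to the Excel end.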
Many have used science gateways to address reproducibility
– IPython notebooks: Perez, Fernando, and Brian E. Granger. "An Open Source Framework For Interactive, Collaborative And Reproducible Scientific Computing And Education." (2013).
– Galaxy: Goecks, Jeremy, Anton Nekrutenko, and James Taylor. "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences." Genome Biol 11.8 (2010): R86.
– VisTrails: Freire, Juliana. "Making computations and publications reproducible with VisTrails." Computing in Science & Engineering 14.4 (2012): 18-25.
– nanoHUB: Lundstrom, Mark, and Gerhard Klimeck. "The NCN: science, simulation, and cyber services." Emerging Technologies-Nanoelectronics, 2006 IEEE Conference on. IEEE, 2006.
Issues for discussion
– What issues do gateway designers need to consider for reproducibility so they can follow the GitHub model and not the Excel model?
– What happens when a software framework itself goes away? What needs to be considered?
– What does it mean to be reproducible for the long term? How long? How is this possible?