A Data Restore Model for Reproducibility in Computational Statistics
Daniel Bahls, ZBW
I-Know 2013, Graz, Austria
The ZBW is a member of the Leibniz Association

Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps

Repeatability in Science
- Fundamental criterion: verification is the job of the community
- Experiments must lead to the same findings when run by different researchers under certain constant parameters
- Further: robustness (w.r.t. measuring errors, etc.)
- Repeatability vs. Reproducibility vs. Verifiability

Repeatability in Economics and the infamous case of Rogoff and Reinhart

Improving Review Processes
- Justin Wolfers, Betsey Stevenson, economists at University of Michigan
... so we need access to the data
If we try it all on our own and cannot reproduce the results, what does it mean?

McCullough – Experiences & Recommendations

McCullough – Requirements & Experiences

McCullough – Requirements & Experiences

Sweave – Literate Programming for Statistics

Sweave – Literate Programming for Statistics

Data Publishing in Economics / Social Sciences
Different disciplines have different challenges
Characteristics of empirical research:
- sensitive / protected data
- distributed external data sources
Data Sharing: submit data bundles to third-party repositories?

Data Management: The Black Box Approach
a data set copy (some resource bundle)
Open questions: data review, curation, legal situation, re-use, transparency, repeatability

Statistical Data on the Semantic Web

Data Restore Model
[Diagram labels: Spreadsheet, obs, data set]

Data Restore Model
[Diagram labels: Spreadsheet, obs, data set]

[Diagram labels: DataSet, type, UserDataSet, Data Items, type, Data Items from own survey, includesData, external dataset, buildScript]
No gaps – Trust – Incentive
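One possible reading of the diagram labels above, as a minimal sketch: a user data set keeps a reference to the external data set rather than a copy, plus the researcher's own data items and a build script that recombines the two. Every name in the code (ExternalRef, UserDataSet, restore, the fetch callback, the row contents) is a hypothetical illustration and not taken from the presented system; only the EuroStat / Household XZ / 0.2 values echo the example slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ExternalRef:
    """Pointer to a data set held by its original provider (hypothetical fields)."""
    source: str
    dataset: str
    version: str


@dataclass
class UserDataSet:
    """Own data items plus references to external data, never a copy of it."""
    own_items: List[Row]
    includes: List[ExternalRef]
    build_script: Callable[[List[Row], Dict[ExternalRef, List[Row]]], List[Row]]


def restore(uds: UserDataSet, fetch: Callable[[ExternalRef], List[Row]]) -> List[Row]:
    """Re-fetch every referenced data set and re-run the build script."""
    external = {ref: fetch(ref) for ref in uds.includes}
    return uds.build_script(uds.own_items, external)


# Illustrative usage with made-up rows; a real fetch would contact the provider.
ref = ExternalRef(source="EuroStat", dataset="Household XZ", version="0.2")


def fake_fetch(r: ExternalRef) -> List[Row]:
    return [{"country": "AT", "households": 100}]


def build(own: List[Row], external: Dict[ExternalRef, List[Row]]) -> List[Row]:
    rows = list(own)
    for data in external.values():
        rows.extend(data)
    return rows


uds = UserDataSet(
    own_items=[{"country": "DE", "households": 5}],
    includes=[ref],
    build_script=build,
)
restored = restore(uds, fake_fetch)
```

The point of the sketch is that the external provider stays the single source for its data: the user data set can be published without redistributing protected material, and anyone with access rights can rebuild it.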

Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]

Integration with Research Environments

Review and Re-use
[Diagram labels: Client; Source Code Repository; Archive A, Archive B, Archive C, Archive D; DOI; Code and Data Templates; Authenticate & Request Data]

Data Infrastructure Concept
- One source per data set: transparency, curation by highest expertise
- Data protection: make data publishing possible for all scenarios
- Data and code integration: one-click solution, no manual efforts for replication attempts
- Precise citation: traceable data provenance

Incentives for the Research Community
- Transparency increases trust: no gaps – trust – incentive
- Easy re-use: the research models applied live longer
- More impact: more citations

Incentives for the Research Community
- Material for tutorials: students learn computational research in practice
- Research is more efficient: easier to understand and pick up the research of others
- Secured knowledge: replication attempts in different research environments and contexts
- Discussion, inspiration, innovation
- Non-findings may get more recognition

What we are currently working on
The Rogoff and Reinhart / Herndon case
- apply Data Restore Model
- add semantic data documentation (partly available as RDF already)
- model by Data and Code ontology

Data and Code Ontology
[Diagram labels: Data and Code; System Environment; Resources; HW; SW; Replication Attempts; Experiment Setup; Maven; Make; Build; Virtualisation; Emulation; Linked Science; Social Media; Data References; Semantic Coding?]

What we are currently working on
The Koenker Zeileis case
- Model relations between Data and Code instances
[Diagram labels: protected; public use file; figures; data set; transformation by code]

Data Access and Retrieval

Next Steps
1. Challenge, Goals, Requirements
2. The Data Restore Model
3. Semantic Linkup / Data Annotation
4. Data Retrieval and Reuse
5. System Architecture
6. Validation / Evaluation

Thank you
Daniel Bahls, ZBW
d.bahls@zbw.eu

So there are still gaps
Examples:
- A data set is titled "EU Unemployment statistics 2012, EuroStat": age class? seasonal adjustments?
- Executing the code does not produce the results: wrong data? system environment? error? (cf. Herndon's replication of the Rogoff/Reinhart research)
- A DOI does not specify the file format
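The first and last gaps listed above can be illustrated with a small check that flags what a loosely worded data reference leaves unspecified. This is only a sketch under assumptions: the field names in REQUIRED_FIELDS and the function missing_fields are hypothetical and chosen to mirror the questions on the slide, not an actual metadata standard.

```python
from typing import Dict, List

# Hypothetical set of details a data reference would need for an exact restore.
REQUIRED_FIELDS = ["source", "title", "age_class", "seasonal_adjustment", "file_format"]


def missing_fields(reference: Dict[str, str]) -> List[str]:
    """Return the required fields that the reference does not specify."""
    return [name for name in REQUIRED_FIELDS if not reference.get(name)]


# A reference that carries only what a title-style citation would give.
loose_reference = {
    "source": "EuroStat",
    "title": "EU Unemployment statistics 2012",
}

gaps = missing_fields(loose_reference)
print(gaps)
```

A reference that passes such a check still says nothing about the system environment, which is why the second gap needs a separate description of the code's execution context.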

Data and Code Ontology
[Diagram labels: observation; string value; s, p, o; data ref; default value; for_stata; for_spss]

Proxy Relations
Such a relationship can be stated within the semantic model.
- Dataset for economic growth (GDP or the like)
- Dataset for Aluminium Price Index
hasProxyRel: describes the proxy relation
- details on correlation
- best practices
- frequency of use
- ...
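A proxy relation of this kind could be written down as subject-predicate-object statements. The sketch below uses plain Python tuples instead of an RDF library to stay self-contained. Only hasProxyRel appears on the slide; the ex: prefix, the node names, and the predicates ex:proxyFor and ex:note are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements: the aluminium price index data set has a proxy
# relation node, which in turn names what it is a proxy for.
triples = [
    ("ex:AluminiumPriceIndex", "ex:hasProxyRel", "ex:proxyRel1"),
    ("ex:proxyRel1", "ex:proxyFor", "ex:EconomicGrowth"),
    ("ex:proxyRel1", "ex:note", "details on correlation"),
]


def proxies_for(target: str, graph: Iterable[Triple]) -> Set[str]:
    """Find data sets linked via hasProxyRel to a relation node that targets `target`."""
    graph = list(graph)
    relation_nodes = {s for s, p, o in graph if p == "ex:proxyFor" and o == target}
    return {s for s, p, o in graph if p == "ex:hasProxyRel" and o in relation_nodes}


found = proxies_for("ex:EconomicGrowth", triples)
```

Modelling the relation as its own node, rather than a direct link between the two data sets, leaves room to attach the descriptive details the slide lists (correlation, best practices, frequency of use).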
