ZBW is a member of the Leibniz Association. A Data Restore Model for Reproducibility in Computational Statistics. Daniel Bahls, ZBW, I-Know 2013, Graz, Austria


1 ZBW is a member of the Leibniz Association. A Data Restore Model for Reproducibility in Computational Statistics. Daniel Bahls, ZBW, I-Know 2013, Graz, Austria

2 Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps

3 Repeatability in Science
- Fundamental criterion: verification is the job of the community
- Experiments must lead to the same findings for different researchers under certain constant parameters
- Further: robustness (w.r.t. measurement errors, etc.)
- Repeatability vs. reproducibility vs. verifiability

4 Repeatability in Economics and the infamous case of Rogoff and Reinhart

5 Improving Review Processes
- Justin Wolfers, Betsey Stevenson, economists at University of Michigan
... so we need access to the data.
If we try it all on our own and cannot reproduce the results, what does it mean?

6 McCullough – Experiences & Recommendations

7 McCullough – Requirements & Experiences

8 McCullough – Requirements & Experiences

9 Sweave – Literate Programming for Statistics

10 Sweave – Literate Programming for Statistics

11 Data Publishing in Economics / Social Sciences
Different disciplines have different challenges.
Characteristics of empirical research:
- sensitive / protected data
- distributed external data sources
Data sharing: submit data bundles to 3rd-party repositories?

12 Data Management: The Black Box Approach
A data set copy (some resource bundle): ?
[Figure labels: data review; curation; legal situation; re-use; transparency; repeatability]

13 Statistical Data on the Semantic Web

14 Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps

15 Data Restore Model
[Figure labels: Spreadsheet; obs; data set]

16 Data Restore Model
[Figure labels: Spreadsheet; obs; data set]

17 [Figure labels: DataSet; type; UserDataSet; Data Items; type; Data Items from own survey; includesData; external dataset; buildScript]
No gaps – Trust – Incentive
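One possible reading of the diagram labels above, sketched in Python: a UserDataSet holds the data items from the researcher's own survey, references an external data set instead of copying it, and carries a build script that restores the full data set on demand. The class and field names, the `fetch` callback, the `join_on_household` build script, and all sample rows are illustrative assumptions and not part of the presented model; only the EuroStat metadata values are taken from the talk's own example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ExternalDataSet:
    """Reference to a data set that stays with its original publisher."""
    source: str
    name: str
    version: str


@dataclass
class UserDataSet:
    """Own data items plus references to external data and a build script."""
    own_items: List[Row]
    includes_data: List[ExternalDataSet]
    # Hypothetical signature: (own items, fetched external rows) -> restored rows
    build_script: Callable[[List[Row], Dict[str, List[Row]]], List[Row]]

    def restore(self, fetch: Callable[[ExternalDataSet], List[Row]]) -> List[Row]:
        """Fetch every referenced external data set, then run the build script."""
        external = {ref.name: fetch(ref) for ref in self.includes_data}
        return self.build_script(self.own_items, external)


def join_on_household(own: List[Row], external: Dict[str, List[Row]]) -> List[Row]:
    """Example build script: attach external income figures to own survey rows."""
    income = {row["household"]: row["income"] for row in external["Household XZ"]}
    return [{**row, "income": income[row["household"]]} for row in own]


def fake_fetch(ref: ExternalDataSet) -> List[Row]:
    """Stand-in for a request to the publisher's archive (made-up values)."""
    return [{"household": 1, "income": 30000}, {"household": 2, "income": 42000}]


eurostat = ExternalDataSet(source="EuroStat", name="Household XZ", version="0.2")
study = UserDataSet(
    own_items=[{"household": 1, "satisfaction": 4}, {"household": 2, "satisfaction": 2}],
    includes_data=[eurostat],
    build_script=join_on_household,
)
restored = study.restore(fake_fetch)
```

In this sketch the published bundle contains only the own survey rows, the reference, and the build script; the external rows are fetched again at restore time, so the external publisher remains the single source for its data.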

18 Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]

19 Integration with Research Environments

20

21 Review and Re-use
[Figure labels: Client; Source Code Repository; Archive A; Archive B; Archive C; Archive D; DOI; Code and Data Templates; Authenticate & Request Data]
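The flow suggested by these figure labels could look roughly like the sketch below: the client reads data references (DOIs) from a code and data template, finds the archive that holds each DOI, authenticates, and requests the data. Everything here is an assumption for illustration: the in-memory archives, the token check, the DOI strings, and all function names are hypothetical, and no real resolver or archive API is implied.

```python
from typing import Dict, List


class Archive:
    """In-memory stand-in for one data archive (hypothetical)."""

    def __init__(self, name: str, holdings: Dict[str, List[dict]], allowed_tokens: set):
        self.name = name
        self._holdings = holdings
        self._allowed_tokens = allowed_tokens

    def holds(self, doi: str) -> bool:
        return doi in self._holdings

    def request(self, doi: str, token: str) -> List[dict]:
        """Hand out data only to authenticated clients."""
        if token not in self._allowed_tokens:
            raise PermissionError(f"{self.name}: access to {doi} denied")
        return self._holdings[doi]


def restore_data(dois: List[str], archives: List[Archive], token: str) -> Dict[str, List[dict]]:
    """Resolve each DOI to the archive holding it and request the data there."""
    restored = {}
    for doi in dois:
        archive = next((a for a in archives if a.holds(doi)), None)
        if archive is None:
            raise LookupError(f"no archive holds {doi}")
        restored[doi] = archive.request(doi, token)
    return restored


# Made-up archives, DOIs, and values, for illustration only.
archive_a = Archive("Archive A", {"doi:example/a": [{"year": 2012, "value": 1.5}]}, {"secret"})
archive_b = Archive("Archive B", {"doi:example/b": [{"year": 2012, "value": 7.0}]}, {"secret"})
template_dois = ["doi:example/a", "doi:example/b"]
data = restore_data(template_dois, [archive_a, archive_b], token="secret")
```

Protected data never leaves its archive without authentication in this sketch: a reviewer without a valid token gets a `PermissionError` instead of a silent gap.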

22 Data Infrastructure Concept
- One source per data set: transparency, curation by highest expertise
- Data protection: make data publishing possible for all scenarios
- Data and code integration: one-click solution, no manual effort for replication attempts
- Precise citation: traceable data provenance

23 Incentives for the Research Community
- Transparency increases trust: no gaps – trust – incentive
- Easy re-use: the research models applied live longer
- More impact: more citations

24 Incentives for the Research Community
- Material for tutorials: students learn computational research in practice
- Research is more efficient: easier to understand and pick up the research of others
- Secured knowledge: replication attempts in different research environments and contexts; discussion, inspiration, innovation
- Non-findings may get more recognition

25 Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps

26 What we are currently working on
The Rogoff and Reinhart / Herndon case:
- apply the Data Restore Model
- add semantic data documentation (partly available as RDF already)
- model by Data and Code ontology

27 Data and Code Ontology
[Figure labels: Data and Code; System Environment; Resources; HW; SW; Replication Attempts; Experiment Setup; Maven; Make; Build; Virtualisation; Emulation; Linked Science; Social Media; Data References; Semantic Coding?]

28 What we are currently working on
The Koenker / Zeileis case: model relations between Data and Code instances
[Figure labels: protected; public use file; figures; data set; transformation by code]

29 Data Access and Retrieval

30 Next Steps
1. Challenge, Goals, Requirements
2. The Data Restore Model
3. Semantic Linkup / Data Annotation
4. Data Retrieval and Reuse
5. System Architecture
6. Validation / Evaluation

31 Thank you Daniel Bahls, ZBW

32 So there are still gaps
Examples:
- A data set is titled "EU Unemployment statistics 2012, EuroStat": age class? seasonal adjustments?
- Executing the code does not produce the results: wrong data? system environment? error? (cf. Herndon's replication of the Rogoff/Reinhart research)
- The DOI does not specify the file format

33 Data and Code Ontology
[Figure labels: observation; string value; spo; data ref; default value; for_stata; for_spss]

34 Proxy Relations
Such relationships can be stated within the semantic model.
[Figure: Dataset for economic growth (GDP or the like) and Dataset for Aluminium Price Index, linked by hasProxyRel]
Describes the proxy relation:
- details on correlation
- best practices
- frequency of use
- ...
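A minimal sketch of how such a proxy relation might be stated and queried as subject-predicate-object triples, written in plain Python rather than a real RDF library. Only the name hasProxyRel comes from the slide; the `ex:` identifiers, the `ex:proxyTarget` property, the direction of the relation, and the helper functions are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumed direction: the proxy data set points to a relation node,
# which in turn points to the data set it stands in for.
triples: List[Triple] = [
    ("ex:AluminiumPriceIndexDataset", "ex:hasProxyRel", "ex:proxyRel1"),
    ("ex:proxyRel1", "ex:proxyTarget", "ex:EconomicGrowthDataset"),
    # The relation node is where details on correlation, best practices,
    # and frequency of use could be attached (omitted here, no values known).
]


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def proxies_for(store: List[Triple], target: str) -> List[str]:
    """List the data sets that are declared as proxies for `target`."""
    relations = [s for (s, _, _) in match(store, p="ex:proxyTarget", o=target)]
    return [
        s
        for rel in relations
        for (s, _, _) in match(store, p="ex:hasProxyRel", o=rel)
    ]


found = proxies_for(triples, "ex:EconomicGrowthDataset")
```

Modelling the relation as its own node, instead of a direct link between the two data sets, leaves room to describe the relation itself, which is what the list above asks for.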

