The Phystat Repository For Physics Statistics Code M. Fischler, J. Linnemann, M. Paterno, P. Canal Samsi, March 7, 2006 Duke University.

1 The Phystat Repository For Physics Statistics Code
M. Fischler, J. Linnemann, M. Paterno, P. Canal
SAMSI, March 7, 2006, Duke University

2 The Repository
A broadly accessible collection, pertaining to statistics used in physics, of:
– Tools and utilities
– Modules and libraries
– Code fragments and technical documentation
The idea emerged as an adjunct of the PHYSTAT Conferences on Statistical Problems in Particle Physics, Astrophysics, and Cosmology
– Small workshop held in August at FNAL

3 Observations at PHYSTAT and at the Workshop
Many of the papers presented at PHYSTAT05 (Oxford) and PHYSTAT03 (SLAC) would benefit from a common place to cite code and technical expositions concerning statistical techniques
– Citing a package for more detail about what was done in a physics publication is a primary motivator for the Phystat repository
Many of the participants have code modules and tools which they would like to make more readily available to the physics community

4 The Useful Statistics Repository Would Contain
Tools and utilities
– Useful stand-alone packages
Modules and libraries
– Working code intended as building blocks for others' programs
Major integrated toolsets
Code fragments
– Illustrating the precise statistical algorithms applied to major experiments' analyses
– Not necessarily intended to run intact outside their original environment
Technical documentation of statistical algorithms
– Perhaps more detailed than would be appropriate for archival journal papers

5 Does Such a Repository Have To Be Created?
Existing arXiv-style repositories
– Are not a place for code and libraries
Existing code repositories (e.g., SourceForge, R Project)
– Would not be appropriate for code fragments or expositions documenting experiments' algorithms
– Physics statistics code would get lost in the mass of packages
Code collections by individual physicists
– Continuity issues: will it be there in 10 years?

6 The Phystat Repository Strategy
Institutional responsibility is key
– To ensure that archived material will remain available over time
– Assigned package numbers (e.g., PHYSTAT/ /v2) will be suitable for use as citations, without concern that they will become invalid
We should be as inclusive as possible
– No restrictions based on which platforms or languages a package works with
– No acceptance/refereeing wrestling
– The broadest possible acceptance of licensing approaches
Don't be too ambitious
– The repository content will come from the community, not from the repository maintainers

7 Universal download access
– Sophisticated search and browsing aids
– Multi-view classification of contents
Mildly moderated content submission
– As unrestrictive as possible
Support for value added
– User comments
– Validation and endorsement
FNAL Computing Division commitment
– Support for site mechanism, archival storage, and content moderation

8 Intended Scope of the Repository
Hypothesis testing
– Model comparison
– Classical and Bayesian tests
Fitting/parameter estimation
Limit setting
Categorization
– Decision tree, neural net, …
Random distribution generation
{Your suggestions here}
– E.g., if people feel Phystat is a good place to share tracking algorithms, it can be flexible

9 Using the Repository (organized using Plone)
Main page has:
– How-to instructions (and links) for
    – Finding packages
    – Submitting/modifying a package
    – Commenting, validating, and so forth
– Links to all the PHYSTAT conferences
– Links to related web resources
– Navigation to each type of package
– Search tools



12 Using the Repository
Navigation leads to several types of page:
– Package lists
    – Created dynamically as a result of searches or selection of categories of packages
    – Contain names, one-line descriptions
– Package pages
    – Full description of one package
    – Download button
– Submit-a-package form
    – Fields for descriptions, uploads

13 Using the Repository
Searches by
– Category
    – Executable utility, library, code fragment, ROOT macro, …
– Language
    – C++, R, Python, Fortran, …
– Purpose
    – Fitting, categorization, hypothesis testing
– Keywords
Package pages
– Description
– Download
    – Multiple versions allowed
– User discussion
– Validation links
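The search facets listed on this slide (category, language, purpose, keywords) can be pictured as simple filters over package records. The Python sketch below is only an illustration of that idea and is not how the Plone site implements search; the function `search_packages`, the dictionary keys, and the sample catalog entries are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical package records; names and values are made up for illustration.
CATALOG: List[Dict[str, Any]] = [
    {"name": "toy-fitter", "category": "Library", "language": "C++",
     "purpose": "Fitting", "keywords": ["likelihood", "minimization"]},
    {"name": "toy-tree", "category": "Executable utility", "language": "Python",
     "purpose": "Categorization", "keywords": ["decision tree"]},
    {"name": "toy-limits", "category": "Code Fragment", "language": "C++",
     "purpose": "Hypothesis testing", "keywords": ["likelihood", "limits"]},
]


def search_packages(packages: List[Dict[str, Any]],
                    category: Optional[str] = None,
                    language: Optional[str] = None,
                    purpose: Optional[str] = None,
                    keyword: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return packages matching every facet that was supplied (None = ignore)."""
    results = []
    for pkg in packages:
        if category is not None and pkg["category"] != category:
            continue
        if language is not None and pkg["language"] != language:
            continue
        if purpose is not None and pkg["purpose"] != purpose:
            continue
        # Keyword match is case-insensitive and requires an exact keyword.
        if keyword is not None and keyword.lower() not in [k.lower() for k in pkg["keywords"]]:
            continue
        results.append(pkg)
    return results


# Combine two facets: C++ packages tagged with the keyword "likelihood".
cpp_likelihood = [p["name"] for p in search_packages(CATALOG, language="C++", keyword="Likelihood")]
print(cpp_likelihood)  # ['toy-fitter', 'toy-limits']
```

Facets left as `None` are ignored, so the same function covers both a single-category listing and a narrower combined search.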

14 Submitting Content
The author should prepare:
– A package name
– A one-line description (suitable for reading in lists of packages)
– A full description (a paragraph suitable to let users decide whether to download)
– Tarball containing
    – Code (if applicable)
    – Build tools (if applicable)
    – Documentation (if available)
    – Test/sample data (if available)
    – Scripts that would reproduce figures from a paper (if applicable)
– Answers to: type, purpose, language, platforms
    – Pulldowns make entering these easy
– (Optional) keywords
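The checklist above maps naturally onto a small record with a completeness check. The following is a minimal Python sketch under stated assumptions, not the repository's actual submission form: the class `PackageSubmission`, its field names, the `missing_fields` helper, and the example values are hypothetical, and treating the tarball as optional is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PackageSubmission:
    """Hypothetical record mirroring the items an author is asked to prepare."""
    name: str
    one_line_description: str      # shown in lists of packages
    full_description: str          # paragraph that helps users decide to download
    package_type: str              # e.g. "Library", "Code Fragment"
    purpose: str                   # e.g. "Fitting"
    language: str                  # e.g. "C++"
    platforms: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)   # optional
    tarball_path: Optional[str] = None  # code, build tools, docs, sample data

    def missing_fields(self) -> List[str]:
        """Return the names of required items that are still empty."""
        required_text = ["name", "one_line_description", "full_description",
                         "package_type", "purpose", "language"]
        missing = [n for n in required_text if not getattr(self, n).strip()]
        if not self.platforms:
            missing.append("platforms")
        return missing


# Made-up example: everything filled in except the full description.
example = PackageSubmission(
    name="toy-limit-calculator",
    one_line_description="Toy upper-limit calculator",
    full_description="",
    package_type="Executable utility",
    purpose="Limit setting",
    language="Python",
    platforms=["Linux"],
)
print(example.missing_fields())  # ['full_description']
```

Keywords are left out of the check because the slide marks them as optional.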

15 Submitting Content
"Come as you are" philosophy
– Don't want to discourage busy physicists from submitting citable work because documentation is in poor shape
Goal is that submitting a prepared package will take five minutes or less
– Check boxes for type, purpose, language
– Pulldown list for keywords
Package will become publicly visible after a moderator verifies it is suitable

16 Policies
This is a code (and papers) repository
– Packages contain source code and/or technical or theoretical documentation
– Build instructions and files should be included where relevant
– The repository does not distribute executables
(Loose) content control
– Must be relevant to some area of physics
– Must be related to statistics, probability, fitting, categorization, or a similar area
– The moderator(s) are not trying to be judges of quality

17 Policies
License issues
– Submitters must agree to let our site freely distribute the package (of course)
– Submissions are allowed to attach whatever license agreements they wish
    – As long as we can distribute the package
– The author, not the repository, is responsible for any enforcement of copyright and license issues
    – Repository held harmless against improper use by downloaders

18 Policies
Steering committee
– 5-10 people active in statistics in physics
– Probable initial configuration includes:
    – Jim Linnemann (initial chair) (Atlas, D0)
    – Louis Lyons (CDF)
    – Harrison Prosper (D0, CMS, Cosmology)
    – Glen Cowan (PDG statistics editor)
    – Kyle Cranmer (Atlas)
    – Roger Barlow (Babar)
– Meet primarily by
– Set policies, directions of value-added work, and so forth

19 Repository Support Activities (Phase I of Phystat)
Establishment of web site
– With mechanisms for browsing, submission/updating, and discussion
– With assignment of submission numbers suitable for use as citations in papers
Licensing and filtering policies
– Must satisfy FNAL/DOE criteria
Community consensus on content policies
– And formation of steering committee
Dissemination of info about Phystat

20 Value-Added Activities (Phase II)
These are all potential
– Depending on community desires and time available
– Some done by supporters/moderators
– Others depend on participation by outside physicists
Classification/validation related:
– Distinguish actively maintained usable packages from archival entries
– Organizing user feedback synopsis
– Lists of known working platforms for packages
– Basic functional validation/certification
– Organization of community comparisons among packages

21 Possible Value-Added Activities (Phase II)
Extending scope
– Keep a "code wanted" list
    – People express needs for specific capabilities
– Looking for and interfacing to relevant software produced by the statistics community
– Blobel: how about mathematical methods?
Improving capabilities
– Integrating related packages
– Soliciting/supporting/adding extensions to submitted code
– Portability enhancements

22 You Can Make the Repository Valuable
Add to the contents of the repository
– Submit packages to be disseminated
– Submit code fragments defining how your analysis did statistics
    – You can reliably cite your submitted code by its Phystat number, much like a paper in arXiv
    – Prosper et al., or Prosper et al., v2
– Submit documents explaining choices of statistical approaches
The repository is pretty empty today
– But there is a large backlog of code and tools potentially valuable to the HEP community!

23 What Next
Make use of the repository!
– Browse for packages you may be able to use
– Browse to see how various experiments tackled your statistics issues
– Use the repository to download versions of major packages
Add value to packages
– Validation and endorsement comments
– Report problems and make suggestions
Comment about repository mechanics
