ValidR: The Challenges of Validating R. Chris Campbell, PhD, Senior Consultant

2 Chris Campbell – Senior Consultant, ccampbell@mango-solutions.com. ValidR: The Challenges of Validating R. Chris Campbell, PhD
    # This code is a complete hack,
    # may or may not work, etc.

3 Who are we? Statisticians – R, SAS; Scientists – R, Julia; Developers – Java, C#; Quality Managers – GAMP

4 What do our customers want? User interfaces; Training; Consulting/Advice; Code; ValidR

5 What does validation mean? “Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product, meeting its predetermined specifications and quality attributes” U.S. Food and Drug Administration (2013). 21 CFR Part 11: Electronic Records, Electronic Signatures, http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=11&showFR=1

6 WHY VALIDATE R?

7 Source code source?
    # Dear maintainer:
    # Once you are done trying to 'optimize' this routine,
    # and have realized what a terrible mistake that was,
    # please increment the following counter as a warning
    # to the next guy:
    # total_hours_wasted_here = 42

8 Not R!
    # Dear maintainer:
    # Once you are done trying to 'optimize' this routine,
    # and have realized what a terrible mistake that was,
    # please increment the following counter as a warning
    # to the next guy:
    # total_hours_wasted_here = 42

9 Source code source?
    stop() # Hammertime!

10 Not R!
    stop() # Hammertime!

11 Source code source?
    ## WARNING:
    ## This code is a complete hack, may or may not work, etc.
    ## Use at your own risk. You have been warned.

12 package:codetools
    ## WARNING:
    ## This code is a complete hack, may or may not work, etc.
    ## Use at your own risk. You have been warned.
R package: “codetools”, version 0.2-8. Author and maintainer: Luke Tierney

13 Why Validate R?

14 Why Validate R? R Foundation response to CFR Part 11: “R: Regulatory Compliance and Validation Issues. A Guidance Document for the Use of R in Regulated Clinical Trial Environments” [1]. The document relates to base R plus recommended packages only. R CMD check provides no guarantee that an R add-on package meets its specifications. Authors do not have to write tests. [1] http://www.r-project.org/doc/R-FDA.pdf

15 WHAT IS VALIDATION?

16 What is Validation? “Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product, meeting its predetermined specifications and quality attributes” U.S. Food and Drug Administration (2013). 21 CFR Part 11: Electronic Records, Electronic Signatures, http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=11&showFR=1

17 The Lay Person’s Interpretation of CFR Part 11. Prior to developing software: document what the software should do; document the acceptable tolerance. Afterwards: test (and document) that the software does what it should; re-test for consistency.

18 Example – Rock Climbing Rope. Before the rope is deemed ‘safe’ (valid) for use, the manufacturer will: define the intended use; specify tolerance measures (e.g. max weight); test the rope many times within and outside the tolerance; document this in a safety report.

19 Example – An R Package. Before the package is deemed ‘safe’ (valid) for use, Mango will: define the intended use; specify tolerance measures (via unit tests); test the package many times within and outside the tolerance; document this in a validation report (IQ / OQ / PQ).

20 A “High degree of assurance”. The statement is not quantifiable. There are an infinite number of possible tests. Assurance can be built through: experience; testing.

21 Testing code is important. Expected successful use cases; expected failure use cases.
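The two kinds of use case can be sketched in base R. This is only an illustration: `celsius_to_kelvin()` and the tolerance value are hypothetical and do not come from the slides, and a real validation suite would more likely use a test framework such as testthat.

```r
# Hypothetical function under test (not taken from the presentation).
celsius_to_kelvin <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x + 273.15
}

# Expected successful use case: the result must match the documented
# value within a stated tolerance.
ok <- isTRUE(all.equal(celsius_to_kelvin(25), 298.15, tolerance = 1e-8))

# Expected failure use case: invalid input must raise an error
# rather than return a silently wrong answer.
failed <- inherits(try(celsius_to_kelvin("25"), silent = TRUE), "try-error")

# Both checks must hold for the requirement to count as tested.
stopifnot(ok, failed)
```

Writing the tolerance into the test is what turns "acceptable tolerance" from a statement in a specification into something that can be re-run for consistency.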

22 What has changed? Expected new functionality? Regressions? Dependencies?

23 THE CHALLENGES OF VALIDATING R

24 Typical Software Development Process: Define Requirements; Build Software; Test Software against Requirements; Create Validation Documentation

25 Gather Requirements. Packages don’t typically come with a list of requirements. How do we determine the intended use? Package descriptions/help files; package vignettes; experience of use of packages.

26 Build Software. Software is already built. How do we gain an understanding of the package structure? The functionMap package.

27 Test Against Requirements. Package authors don’t have to write tests. How do we test the requirements? Writing specific unit tests for requirements; understanding the level of testing with testCoverage.

28 Create Validation Documentation. Creating documentation for a large number of R packages. Using Sweave/knitr would require validation of a TeX installation. How can we create the documentation without these tools?

29 THE functionMap PACKAGE

30 The R project is big... 5533 packages on CRAN; 1748 packages on R-Forge; 824 packages on Bioconductor

31 How do you find your way?

32 What does it do? See function names; see relationships between functions.

33 Map a package
    > require(functionMap)
    > prsVT <- parseRfolder("visualTest/R")
    > nVT <- createNetwork(prsVT)
    > plotFunctionMap(nVT,
    +     pdffile = "visualTest_map.pdf",
    +     label.cex = 0.8)

35 How does it work? Parse R code; determine the functions that are called within each function; create a network object to show relationships; create graphics mapping the code.
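The first three steps can be approximated with base R alone. The sketch below is not functionMap's implementation, which the slides do not show; it only illustrates the idea using `parse()` and `all.names()`, and the three functions in `src` are hypothetical.

```r
# Hypothetical package code, supplied as text so the example is self-contained.
src <- "
helper  <- function(x) x + 1
wrapper <- function(x) helper(x) * 2
top     <- function(x) wrapper(helper(x))
"

# Step 1: parse the R code into a list of unevaluated expressions.
exprs <- parse(text = src, keep.source = FALSE)

# Keep the body of every top-level `name <- function(...) body` definition.
defs <- list()
for (e in exprs) {
  is_assign <- is.call(e) && identical(e[[1]], as.name("<-"))
  if (is_assign && is.call(e[[3]]) &&
      identical(e[[3]][[1]], as.name("function"))) {
    defs[[as.character(e[[2]])]] <- e[[3]][[3]]
  }
}

# Step 2: for each function, find which of the known functions its body mentions.
# Step 3: store the relationships as an edge list (from = caller, to = callee).
edges <- do.call(rbind, lapply(names(defs), function(f) {
  callees <- intersect(all.names(defs[[f]]), names(defs))
  data.frame(from = rep(f, length(callees)), to = callees,
             stringsAsFactors = FALSE)
}))

print(edges)
```

Matching on names alone is a simplification: a local variable that shares its name with a package function would be reported as a call, so a production tool needs a more careful walk of the parse tree.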

36 What’s next? Handle classes; improve interactive graphic; make available on CRAN.

37 THE visualTest PACKAGE

38 How to compare rendered outputs? File size? File identity (MD5 checksum)? Pixel values? Image summary?

39 Windows versus Unix: identical script. [Images: the same plot rendered on Windows and on Unix]

40 Windows versus Unix: different render
    > file.info("windows/VR-616_plot-lm01.jpg")["size"]
                                  size
    windows/VR-616_plot-lm01.jpg 29381
    > file.info("unix/VR-616_plot-lm01.jpg")["size"]
                               size
    unix/VR-616_plot-lm01.jpg 25035

41 Windows versus Unix: files not the same
    > library(tools)  # provides md5sum()
    > md5sum("windows/VR-616_plot-lm01.jpg")
          windows/VR-616_plot-lm01.jpg
    "461268e4edb5f1872c913394511df7aa"
    > md5sum("unix/VR-616_plot-lm01.jpg")
             unix/VR-616_plot-lm01.jpg
    "ff974c8678648d7611f99efc035c9772"

42 Windows versus Unix: pixels not the same
    > round(grWin[52:54, 52:54, 1], 3)
          [,1]  [,2]  [,3]
    [1,] 0.859 0.957 1.000
    [2,] 0.957 1.000 1.000
    [3,] 0.004 0.122 0.122
    > round(grUnx[52:54, 52:54, 1], 3)
          [,1] [,2]  [,3]
    [1,] 1.000    1 1.000
    [2,] 0.945    1 0.992
    [3,] 1.000    1 0.988

43 What does visualTest do? Describes a rendered file as a fingerprint; compares the fingerprints of two files.

44
    > file.info("jollyfisherman-spotL.png")["size"]
                                size
    jollyfisherman-spotL.png 1261055
    > file.info("jollyfisherman-spotR.png")["size"]
                                size
    jollyfisherman-spotR.png 1254452

45
    > require(visualTest)
    > isSimilar(file = "jollyfisherman-spotL.png",
    +     fingerprint = "jollyfisherman-spotR.png")
    [1] FALSE

46
    > isSimilar(file = "jollyfisherman-spotL.png",
    +     fingerprint = "jollyfisherman-spotR.png",
    +     threshold = 1)
    [1] TRUE

47 How does it work? RGB array to 2D matrix; Fourier transform matrix; sum 2D matrix to 1D vector; compare vectors.
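The four steps can be sketched with base R's `fft()`, which performs a two-dimensional transform when given a matrix. This is an illustration of the general idea under stated assumptions, not visualTest's actual algorithm: the greyscale conversion, the column-sum reduction, the normalisation and the distance measure are all choices made for this example, and the images are synthetic random arrays.

```r
# Reduce an RGB array (rows x cols x 3) to a 1D frequency "fingerprint".
fingerprint <- function(rgb) {
  grey    <- apply(rgb, c(1, 2), mean)  # RGB array -> 2D matrix
  spec    <- Mod(fft(grey))             # magnitude of the 2D Fourier transform
  profile <- colSums(spec)              # 2D matrix -> 1D vector
  profile / sum(profile)                # normalise so images are comparable
}

# Compare two images by the largest difference between their fingerprints.
distance <- function(a, b) max(abs(fingerprint(a) - fingerprint(b)))

set.seed(1)
dims  <- c(32, 32, 3)
img   <- array(runif(prod(dims)), dim = dims)                     # reference image
noisy <- img + array(rnorm(prod(dims), sd = 0.001), dim = dims)   # tiny render noise
other <- array(runif(prod(dims)), dim = dims)                     # unrelated image

d_same  <- distance(img, img)
d_noisy <- distance(img, noisy)
d_other <- distance(img, other)

# A fuzzy comparison accepts small distances and rejects large ones.
print(c(same = d_same, noisy = d_noisy, other = d_other))
```

A fuzzy test would then accept a pair of files when the distance falls below a chosen threshold, which is how small platform-specific rendering differences can pass while a genuinely different plot fails.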

48 visualTest: summarizes an image as a fingerprint; fuzzy comparisons of similar images.

49 THE testCoverage PACKAGE

50 What does it do? Determines which code in a package is covered by unit tests.

51 How does it work? Reads package code and adds trace points; runs package unit tests and marks points as hit; creates a full report of code hit.

52 How does it work? Starting from the source line
    y <- x + 10
the code is transformed in four steps:
    1. Get parse table mapping symbols to unique ID
    2. Replace symbols by UIDs:            `_1` <- `_2` + 10
    3. Insert trace calls:                 `_1` <- {trace(); `_2` + 10}
    4. Instrument trace calls and replace UIDs by symbols:
                                           y <- {trace(2); x + 10}
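The effect of inserting trace calls can be shown with a much simpler base R sketch. It does not reproduce testCoverage's parse-table and UID approach; it only wraps each top-level statement of a function body in a block that records a hit. `trace_hit()`, `instrument()` and `safe_log()` are hypothetical names introduced for this example.

```r
# Registry of trace points: name -> has this point been executed?
hits <- new.env()

# Record that a trace point was reached.
trace_hit <- function(id) {
  assign(id, TRUE, envir = hits)
  invisible(NULL)
}

# Wrap every top-level statement of f's body as { trace_hit(id); statement }.
instrument <- function(f, name) {
  b <- body(f)
  stopifnot(is.call(b), identical(b[[1]], as.name("{")))
  for (i in seq_along(b)[-1]) {
    id <- paste0(name, ":", i - 1)
    assign(id, FALSE, envir = hits)   # register the point as not yet hit
    b[[i]] <- bquote({ trace_hit(.(id)); .(b[[i]]) })
  }
  body(f) <- b
  f
}

# Hypothetical function under test.
safe_log <- function(x) {
  if (x <= 0) {
    return(NA_real_)
  }
  log(x)
}

safe_log_i <- instrument(safe_log, "safe_log")

# The only "unit test" exercises the early return, so log(x) is never reached.
res <- safe_log_i(-1)

covered <- unlist(mget(ls(hits), envir = hits))
print(covered)        # which trace points were hit
print(mean(covered))  # fraction of statements covered
```

Here only the first of the two statements is reached, so the report shows that the `log(x)` line has no test exercising it, which is the kind of gap a coverage report is meant to expose.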

53 What’s next? Work on incorporating classes and methods; improve reporting; make available on CRAN.

54 Validating R. Understand components – functionMap; test components – testthat, visualTest; check coverage – testCoverage; manual review – careful work.

55 WHAT’S NEXT?

56 Key Dates. September 2014: EARL Conference (15-17); release testCoverage and functionMap on CRAN. Easter 2015: release of validated R 3.1.X.
