Designing and Creating applications built on R Richard Pugh, Andy Nicholls & Chris Campbell 23rd October 2012.


1 Designing and Creating applications built on R Richard Pugh, Andy Nicholls & Chris Campbell 23rd October 2012

2 Thank you for the invitation to speak tonight

3 Andy Nicholls Senior R Consultant Chris Campbell Senior R Consultant Richard Pugh Principal R Consultant & Co-Founder

4 Agenda Who are Mango Solutions? Why Build Analytic Applications on R? Formal R Application Development Case Studies The R Community Discussion

5 Who are Mango Solutions?

6 Overview of Mango Solutions Private Company formed in 2002 Global Team of ~70 Cross-Sector Software and Services ISO 9001 Accredited

7 Located here... Bath, UK; London, UK; Shanghai, CN; Basel, CH

8 Spend a lot of time here...

9 The Beginning: October 2002 Started by 2 ex-Insightful colleagues Sales Guy (BO, Cognos etc) Techy Guy (S+, SAS, R etc) Idea to deploy predictive analytics to business users

10 Why Mango? Early awful ideas DataStatz Stats Entertainment VizUStat Stats2U In the end, named after my colleague’s cat

11

12 What we do? R Training Code Creation Validation Support Consultants

13 What we do? Consultants Developers Analytic Application Development

14 Mango Key Industries Mango work across sectors: Pharmaceuticals Mango Imaging Finance Energy Sensory

15 Why Build Analytic Applications on R?

16 Why Analytics? Analytics can help people answer all sorts of questions I believe there is no company in the world today who cannot benefit from analytics in some way More and more people are realising it

17 Who is a good driver? How do we win more games? What bonus should I pay? Will someone like this? What are they likely to want? When might this break?

18 Why build Analytic Applications? 3 key reasons we see: To deploy analytical tools to decision makers To make an analyst’s life more efficient To add rigour to an analyst’s workflow

19 Deploying Analytics Adding analytics into a business process can mean more informed decisions can be made Complex analytics shouldn’t be attempted by non-analysts Means there is a communication between the decision maker and the analyst

20 Deploying Analytics If we build an application which … is easy for the decision maker to use contains the correct analysis to apply communicates analytical results in suitable manner … this leads to some major benefits!

21 Benefits for the Decision Maker: No need to wait for information; Can perform “what if” analysis; Decision not dependent on analyst availability. Benefits for the Analyst: Less need to perform often-repetitive tasks; Comfortable that the “right” analysis is being run; Can get on with more strategic things?

22 Analytic Engine User Interface Data Analytic Outputs Data Storage Analytic Code Mgment Analytic App Structure

23 Why build Analytic Applications on R? Building applications requires installing analytic engine on desktops, servers, clusters, clouds R is license free Building analytic applications involves integrating an analytic engine with other technologies (data sources, UI etc) R’s open nature means it can be readily integrated

24 Why build Analytic Applications on R? We want a programmable engine so that it can be readily extended (i.e. no black boxes please) R can be extended by the developer as needed We often want to be able to deploy new algorithms and techniques as they become available R is rapidly developed

25 Formal R Application Development

26 Formal R Development Creating sophisticated analytic applications requires a formal development approach This mostly means taking standard development practices and applying them to analytics Mango’s formal R development procedures and structure have been evolving since their inception ~2004

27 Project Mgment Requirements Behaviour Driven Code Review Review board StatET runit roxygen2 Continuous Integration Issue Tracking Quality Manual Dev Procedures R Coding Standards mangoUtils Knowledge Mgment

28 Project Mgment Requirements Behaviour Driven Code Review Review board StatET runit roxygen2 Continuous Integration Issue Tracking Quality Manual Dev Procedures Coding Standards mangoUtils Knowledge Mgment

29 Project Mgment Requirements Behaviour Driven Code Review Review board StatET testthat roxygen2 Continuous Integration Issue Tracking Quality Manual Dev Procedures R Coding Standards mangoUtils Knowledge Mgment

30 Case Studies

31 These are examples of applications we’ve built that use R in some way We’ve presented a range of information about each including: Business Reason for the application Technical Approach Some Technical Detail where applicable Things that worked well / things that didn’t

32 Case Studies Ranges from information we can fully disclose to only being able to say vague things about the customer Only so much info we can give today – please see us after or contact us and we can step through things in more detail Richard Pugh = rpugh@mango-solutions.com Andy Nicholls = anicholls@mango-solutions.com Chris Campbell = ccampbell@mango-solutions.com

33 Case Studies PKPD Web Modelling Platform M&S Workflow Platform Non-Compartmental Analysis Application Coffee Blend Optimisation Tool Pipeline Corrosion Forecasting Application Backtesting Application

34 CASE STUDY PKPD WEB PLATFORM

35 Case Study: PKPD Modelling Overview Pharmacokinetics-pharmacodynamics (PKPD) is the study of the manner in which a drug transitions through the body and its impact on a target disease PK is highly complex, involving sophisticated non-linear mixed effects modelling approaches

36 Case Study: PKPD Modelling Overview Modellers use “NONMEM” software in order to fit these models Inputs and outputs to NONMEM are a mixture of structured and unstructured textual files R often used to analyse the outputs in order to assess model fit (see “xpose4” library)

37 Case Study: PKPD Modelling Overview PKPD is an evolving and exciting area, with modellers needing flexibility and a variety of tools However, being within life sciences, rigour around workflows is key in order to satisfy regulatory requirements

38 Case Study: PKPD Modelling The Challenge Build a modern modelling platform that provides rigour whilst allowing the modellers the flexibility they need Range of technical users from “everything is a shell script” to “which button do I click” Execution of third party tools (NONMEM, R, SAS, PsN, …) in a controlled manner Interface to generate reproducible graphics, tables and reports

39 Case Study: PKPD Modelling The “R” bit Where does R fit in? Many users use R and want to be able to develop scripts and execute them on an internal grid R used as the graphics engine to support the model evaluation and reporting processes Users want to be able to execute R interactively with objects in their project

40 The App App Server Execution Server(s) MIF MIF Queue Cloud Grid + Others RPool Mgr

41 Case Study: PKPD Modelling What is a “Report Item Definition”? Definition of a graph or table that can be executed from Navigator Consists of snippet of R code, options that may be presented to the user, required columns, and a few other bits Can be used in a number of situations in the application Originally XML then stored in Db (XML shown to give a feel for structure on next slide)
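
The parts listed on this slide (an R snippet, user-facing options, required columns) could be sketched in R as a plain list. This is only an illustration: the field names (`name`, `requiredColumns`, `options`, `code`), the `runReportItem` helper and the sample data are hypothetical, and the real definitions were XML held in a database.

```r
# Hypothetical Report Item Definition: a named list holding an R snippet,
# user-facing options with defaults, and the columns the snippet needs.
rid <- list(
  name = "Mean concentration by time",
  requiredColumns = c("TIME", "CONC"),
  options = list(logScale = FALSE),
  code = quote({
    out <- aggregate(CONC ~ TIME, data = data, FUN = mean)
    if (options$logScale) out$CONC <- log(out$CONC)
    out
  })
)

# Run a definition against a data frame, after checking required columns
# and merging any user-supplied options over the defaults.
runReportItem <- function(rid, data, userOptions = list()) {
  missingCols <- setdiff(rid$requiredColumns, names(data))
  if (length(missingCols) > 0) {
    stop("Missing required columns: ", paste(missingCols, collapse = ", "))
  }
  options <- modifyList(rid$options, userOptions)
  eval(rid$code, list(data = data, options = options))
}

# Made-up data: two subjects sampled at two time points.
pk <- data.frame(TIME = c(0, 0, 1, 1), CONC = c(4, 6, 9, 11))
result <- runReportItem(rid, pk)
```

Keeping the required columns next to the code means the same definition can be rejected early, with a readable message, wherever it is called from.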

42 Report Options Source Data Command Definition

43 The App / RPool Manager
Data | Graph | Table | Text
Data Item | Graph Item | Table Item | Text Item
Data Frame | Graphics | Table Object | Character
xml Method | xml Method | xml Method | xml Method

44 Execution Engine (Java) Command Definitions Command Results Version Control

45 Case Study: PKPD Modelling How are “RIDs” used? Created, managed by Super Users (under version control) Called in a few places in the application: Directly (create this graph with this data) In “Run Views” (reports) In “Comparison Views” (reports that compare models) In “Template Reports” (tagged docx files)

46 Case Study: PKPD Modelling Outcome The app in general was a big success The “R” part was created as a separate service that we have since reused in a number of other applications (e.g. Lloyds Risk Platform!) Shame that regulatory rules forced some design which we’re now building alternatives to Next: interactive graphical presentation

47 CASE STUDY M&S WORKFLOW PLATFORM

48 Case Study: M&S Workflow Platform Overview Exciting project for major pharmaceutical company Possibly the closest we’ve come to deploying an analyst’s workflow in a scalable platform Hundreds of pre-clinical (animal) studies are run by a team of ~400 scientists Analysis performed by roughly 15 advanced modellers Outcome: most studies not analysed!

49 Case Study: M&S Workflow Platform The Challenge Idea to create a truly scalable platform to allow bench scientists to run their own analysis Modeller publishes an analysis “protocol” containing analysis paths, code, and support documentation Desktop application pulls from central set of protocols and “derives” the interface which is presented to the user Modeller can put in checks to ensure things look right (e.g. data is of right format, model fit is particularly poor but user seems keen to create predictions from it)
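
The two kinds of check mentioned on this slide (data format, poor model fit) might look something like the sketch below. The function names, the column names and the 0.8 R-squared threshold are all assumptions made for illustration; the real protocols ran R and NONMEM code that is not shown in the talk.

```r
# Hypothetical data check: confirm the columns a protocol step expects
# are present and numeric. Returns a character vector of problems.
checkDataFormat <- function(data, required = c("DOSE", "RESPONSE")) {
  problems <- character(0)
  for (col in required) {
    if (!col %in% names(data)) {
      problems <- c(problems, paste("Missing column:", col))
    } else if (!is.numeric(data[[col]])) {
      problems <- c(problems, paste("Column is not numeric:", col))
    }
  }
  problems
}

# Hypothetical fit check: flag a poorly fitting model before the user
# goes on to create predictions from it. The 0.8 threshold is arbitrary.
checkModelFit <- function(fit, minRSquared = 0.8) {
  rSquared <- summary(fit)$r.squared
  if (rSquared < minRSquared) {
    sprintf("Model fit is poor (R-squared = %.2f); predictions may be unreliable",
            rSquared)
  } else {
    character(0)
  }
}

# Made-up study data with a near-linear dose response.
study <- data.frame(DOSE = c(1, 2, 3, 4, 5),
                    RESPONSE = c(2.1, 3.9, 6.2, 7.8, 10.1))
formatProblems <- checkDataFormat(study)
fitProblems <- checkModelFit(lm(RESPONSE ~ DOSE, data = study))
```

Returning messages rather than stopping lets the application decide whether a problem blocks the step or is merely shown to the scientist as a warning.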

50 Case Study: M&S Workflow Platform The Solution Eclipse RCP application executing R and NONMEM scripts on an internal LSF grid, with protocols and code held in SVN Generated workflow “protocol” definition (XML) detailing possible paths in a step, linked to R scripts and NONMEM model code with corresponding dialog Built “Protocol Developer” Eclipse interface onto repository RCP application derives analysis paths, UI, options and commentary to guide the end user

51 Protocol Metadata Workflow Analysis Step Data Check Step Commentary R Script R Script NM Model File Options

52 Modeller Scientist NONMEM LSF Grid Protocol Server File System

53 Possible Models Derived Options Commentary

54

55 Case Study: M&S Workflow Platform How did it go? Technical solution was very strong and applicable to other areas RCP good technology, but steep learning curve Testing was complex Agile project – pros and cons Ultimately, not deployed (site closure)

56 CASE STUDY NON-COMPARTMENTAL ANALYSIS

57 RapidNCA, the non-compartmental analysis workflow tool Need for RapidNCA Using .NET RapidNCA Structure Code Quality Connections with R.NET Complete & Deploy RapidNCA

58 Need for RapidNCA Customer needed to send monthly reports to dozens of trial centres Small team, so time limited Predefined non-compartmental analysis Standardized report

59 Using .NET What is .NET? Object-oriented environment to develop applications Safe execution environment Choice of programming languages Framework consisting of: runtime class library Developed with Visual Studio

60 Using .NET Visual Studio A graphical programming tool (IDE) Visual Studio Express - free version

61 Using .NET Choice of languages C# is the main one F# is a functional language (similar concepts to OCaml) XAML (a Microsoft declarative XML language) for interactive graphics C++/CLI useful for legacy and bespoke parallel processing (including GPGPU) Other possibilities... VB.Net is very like C# (no advantage over it) Third parties have added languages to the CLI platform

62 Using .NET “Ajar Source” Platform Not exactly open source, but… Most CLI third party languages are open C# and VB.Net are not, but many open source projects based on them Microsoft have made F# open source Compiler is free Other editors / IDEs are available

63 Using .NET Performance Performance is very good On graphics (millions of data points will plot with ease and zoom smoothly) Computation is fast enough in C#, calling R adds little overhead Standard Maths library is limited; third parties and MS maths for “drawing” are better Data parallel computation is possible on the desktop (GPGPU) F# provides further “big data” capabilities

64 User Interface Data Analytic Outputs Data Storage Analytic Code Mgment Analytic Engine Data Service RapidNCA Structure

65 RapidNCA Structure MangoNca Analytic Code Analyse Element Do Analysis Get Analysis Unit Tests Data Checks

66 RapidNCA Structure MangoNca Analytic Code

67 Code Quality Unit Tests Ensure product works! User/Customer/Payer trust Ease of maintenance/extension

68 Code Quality Run Code, Check Output Working Cases
> require(RUnit)
> # there are other automated test packages!
> test1 <- ncaAnalysis(Conc = c(4, 9, 8, 6, 4:1, 1),
+     Time = 0:8, Dose = 100, Dof = 2)
> checkEquals(test1[1, "ROutput_adjr2"], 0.9714937901,
+     tol = 1e-8)
[1] TRUE
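
The same run-then-compare pattern can be tried without the NCA library or RUnit. In this sketch `aucLast` is a hypothetical stand-in (a linear trapezoidal AUC), not Mango's function, and the reference value 35.5 was worked out by hand for these inputs.

```r
# Hypothetical stand-in for an NCA routine: area under the
# concentration-time curve by the linear trapezoidal rule.
aucLast <- function(Conc, Time) {
  stopifnot(length(Conc) == length(Time), !is.unsorted(Time))
  n <- length(Time)
  sum(diff(Time) * (Conc[-n] + Conc[-1]) / 2)
}

# Working case: same concentrations as the slide, unit time steps.
conc <- c(4, 9, 8, 6, 4:1, 1)
auc <- aucLast(Conc = conc, Time = 0:8)

# Compare with the hand-calculated reference, to a stated tolerance.
testPassed <- isTRUE(all.equal(auc, 35.5, tolerance = 1e-8))
```

Stating the tolerance explicitly, as the slide does with `tol = 1e-8`, records how close the result is required to be rather than leaving it to a default.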

69 Code Quality Error Case Unit Tests Use try Handled Error Cases
> test7 <- try(AUCLast(Conc = 1:10, Time = 9:0),
+     silent = TRUE)
> checkEquals(test7,
+     "Error in checkOrderedVector(Time,... ")
[1] TRUE
> test26 <- ncaAnalysis(Conc = c(4, 9, 8, 6, 4:1, 1),
+     Time = 0:8, Dof = 1)
> checkEquals(test26[, "ROutput_Error"],
+     "Error in checkSingleNumeric(Dose,... ")
[1] TRUE
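
An error-case test of this kind can be sketched in base R as follows. The `checkOrderedVector` here is a hypothetical reimplementation modelled on the name used in the slides, and matching a fragment of the message with `grepl` is a swapped-in alternative to comparing the whole error string.

```r
# Hypothetical check, modelled on the slide's checkOrderedVector.
checkOrderedVector <- function(x, description = "x") {
  if (is.unsorted(x)) {
    stop(description, " is not ordered. Actual value is ",
         paste(x, collapse = " "))
  }
  invisible(TRUE)
}

# try() with silent = TRUE captures the error instead of halting the test run.
result <- try(checkOrderedVector(9:0, description = "Time"), silent = TRUE)

# The test passes only if an error was raised AND it is the one we expected.
isError <- inherits(result, "try-error")
messageOk <- grepl("Time is not ordered",
                   conditionMessage(attr(result, "condition")))
```

Checking the message as well as the error class guards against the test passing because of some unrelated failure.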

70 Connections with R.NET What will be provided to R? What will be returned from R? What happens if something goes wrong?

71 Connections with R.NET Using the R Service R.NET allows R calls to be submitted to an R service R.NET connects to R down to Expression level Data from return objects passed back into .NET

72 Connections with R.NET Data Checks Function may be passed data outside its anticipated structure
> checkOrderedVector(c(0, 1, 3, 2, 4),
+     description = "Time")
Error in checkOrderedVector(c(0, 1, 3, 2, 4), description = "Time") :
  Error: Time is not ordered. Actual value is 0 1 3 2 4
>

73 Connections with R.NET Data Checks The tool expects a certain return object An error in an R call should be trapped by the communicating function Return object passed as normal An error checking element of the return object can report information about the error
> check01 <- try(checkOrderedVector(Time,
+     description = "Time"), silent = TRUE)
> if (is(check01, "try-error")) { return(object) }
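
The idea of always handing back the expected return object, with the error carried inside it, can be sketched like this. `safeAnalysis` and the `ROutput_AUC` column are hypothetical; only the `ROutput_Error` column name comes from the slides, and `tryCatch` is used here in place of the `try` shown above.

```r
# Hypothetical analysis wrapper: always returns a one-row data frame,
# with any error message carried in the ROutput_Error column.
safeAnalysis <- function(Conc, Time) {
  out <- data.frame(ROutput_AUC = NA_real_, ROutput_Error = "",
                    stringsAsFactors = FALSE)
  res <- tryCatch({
    if (is.unsorted(Time)) stop("Time is not ordered")
    if (length(Conc) != length(Time)) stop("Conc and Time differ in length")
    n <- length(Time)
    sum(diff(Time) * (Conc[-n] + Conc[-1]) / 2)
  }, error = function(e) e)
  if (inherits(res, "error")) {
    out$ROutput_Error <- conditionMessage(res)   # report, do not rethrow
  } else {
    out$ROutput_AUC <- res
  }
  out
}

good <- safeAnalysis(Conc = c(4, 9, 8, 6), Time = 0:3)
bad  <- safeAnalysis(Conc = c(4, 9, 8, 6), Time = c(0, 2, 1, 3))
```

Because the shape of the result never changes, the calling application only has to inspect one column to decide whether to display results or an error.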

74 Connections with R.NET
_pluginsManager = new RPluginManager(PluginLocation, RLocation);
_pluginsManager.SetActivePlugin();
_session = _pluginsManager.GetSession();
bool sessionOk = _pluginsManager.TryMakeSession(out _session);
R is efficiently accessed, via R.Net (as pictured in Visual Studio) via a Plugin (as above)

75 Connections with R.NET User Interface Data Analytic Outputs Data Storage Analytic Code Mgment Analytic Engine Data Service R.NET

76 Connections with R.NET [diagram labels: Analysis Display, Get PK Params, Data Service, Dialog Service, App Logger, Status Bar Service, App Config Mgment, Data Importers, Project Wizard, Validators, Receive R Output, Create R Expressns; layers: .NET, Data Service, R.NET]

77 Connections with R.NET Using the framework
_pluginsManager = new RPluginManager(PluginLocation, RLocation);
_pluginsManager.SetActivePlugin();
_session = _pluginsManager.GetSession();
bool sessionOk = _pluginsManager.TryMakeSession(out _session);
_session.SetNumericSymbol("TimePtVector", CheckTimePointData(toAnalyse));
_session.SetNumericSymbol("ConcVector", CheckConcentrationPointData(toAnalyse));
var evalString = string.Format("ncaAnalysis(TimePtVector, ConcVector, …
MathEngineDataRowDto ncaGetBack = _session.PerformNumericEvaluation(evalString, "ROutput_Error");
_lastErrors = ncaGetBack.ErrorStrings;
_session.FlushConsole();
_pluginsManager.RelinquishSession();
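
Seen from the R side, the calls on this slide presumably amount to assigning two numeric vectors and then parsing and evaluating a string. The sketch below mimics that in plain R under stated assumptions: `ncaAnalysis` is replaced by a hypothetical stub with made-up output columns (apart from `ROutput_Error`), the session is an ordinary environment, and the evaluated string is a shortened stand-in because the original is truncated on the slide.

```r
# Stub standing in for the real ncaAnalysis(), which is not public.
ncaAnalysis <- function(Time, Conc) {
  data.frame(ROutput_Cmax = max(Conc),
             ROutput_Tmax = Time[which.max(Conc)],
             ROutput_Error = "", stringsAsFactors = FALSE)
}

# A fresh environment plays the role of the R session.
session <- new.env()
assign("ncaAnalysis", ncaAnalysis, envir = session)

# Rough equivalent of the two SetNumericSymbol calls.
assign("TimePtVector", c(0, 1, 2, 3), envir = session)
assign("ConcVector", c(4, 9, 8, 6), envir = session)

# Rough equivalent of building evalString and evaluating it.
evalString <- "ncaAnalysis(TimePtVector, ConcVector)"
ncaGetBack <- eval(parse(text = evalString), envir = session)
```

Passing data in as named symbols and keeping the evaluated string short avoids pasting large vectors into R code as text.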

78 Complete & Deploy RapidNCA Can users understand how to use tool? How confident are we in tool output? On-going code review Independent test team Installation Qualification Operational Qualification Performance Qualification

79 Deploy Tool

80 Data Import

81 Map Variables

82 Review Analysis

83 Review Grouping

84 Generate Report

85 Select Report Type

86 Add Group Comments

87 View Report

88 Conclusions Great graphical interfaces can be built using .NET Intuitive interactive features are available R.NET allows R analysis to be accessed as a service Good coding practice will ensure application is robust Work on a well engineered framework will be rewarded with desktop solutions created at high speed

89 CASE STUDY COFFEE BLEND OPTIMIZATION

90 Company Background A global chocolatier, biscuit baker, candy maker and maker of gum.

91 Business /Technical Situation The client was using a desktop SPLUS application to simulate and optimise coffee blends for their manufacturing teams Hugely successful application saving the company $millions They wanted to make improvements and expand the usage beyond Global Statistics and beyond coffee Also keen to remove the license fee

92 Application Workflow Import Data from Excel Graphical Visualisations Export Data Run Blend Optimiser Simulate Blends Audit Log

93 System Architecture Functions for GUI Functions for Analysis R Package Optimizer Data Import Data Export

94 Approach Development phase split into three separate pieces: Code conversion GUI creation Development and integration of a new optimiser Each required the generation of unit and system tests and appropriate documentation, including help files Design specifications captured prior to development Project estimated at c90 man days over 3 months

95 Creation of new GUI

96 GUI Choices Some R/R-based technologies we could have used... tcltk is R’s ‘recommended’ menu builder Glade, RGtk2 gWidgets rpanel Deducer manipulate (RStudio)...

97 GUI Choices Other options: Choice is almost limitless Often they require a knowledge of other languages such as Java or C Possibly warrants a standalone talk...

98 Creation of a New GUI using RGtk2 RGtk2 adapter for R of the GTK+ engine Gimp Toolkit Glade can be used to trial new features GTK allows for automated testing of the GUI Huge time saving

99

100

101 Code Conversion Mango took a test-based approach for the code conversion (RUnit) Allows for automated testing in future revisions Simple PASS/FAIL reporting SPLUS knowledge not required for R code development

102

103 Optimization The original SPLUS application used the SPLUS NuOpt optimizer R NuOpt exists but only on license Mango used an open source optimiser that we integrated into the R GUI Mango implemented a ‘quick run’ option to allow quick comparisons with the simulation piece

104

105 Primary Benefits New departments are now benefitting from the application The application is now in the hands of the manufacturing teams, reducing the burden on Global Statistics Test-based approach facilitates future development of the application

106 CASE STUDY PIPELINE CORROSION APP

107 Background One of the biggest companies in the world with thousands of staff Oilfield Exploration Team based in the UK but with responsibility for complex exploration areas Alaska, shale fields etc

108 Business Situation Thousands of miles of pipeline corroding in freezing, isolated areas How do you choose how often to inspect them? The cost of a leak can run into many billions of £s

109 Technical Situation Customer Team were analysing data using S-PLUS Insightful Miner with many non-analytical workarounds Process was messy and took a long time to run

110 System Architecture This piece is one of several in a continuous workflow All information is fed back into the database Functions for GUI Functions for Analysis R Package Access Database General Workflow Read Write

111 Approach Consulting engagement to improve programming techniques and statistical methodology Create an R package for the code Construct a GUI in order to deploy to non-technical users on the frontline

112

113

114 An Interesting Challenge: Converting S-Plus Code to R

115 This is Easy, Right? Some (true?) statements: R can be considered as a different implementation of S There are some important differences, but much code written for S runs unaltered under R Discuss... Source: www.r-project.org

116 Considerations S+ applications can generally be split into two pieces: An underlying library of code A set of functions defining the menu system and help pages

117 Approach There are essentially two approaches to code conversion: Direct Conversion Test-based Conversion

118 Direct Conversion Requires knowledge of both languages (stdev vs sd) Relatively quick to achieve Difficult to prove the new code does what the old code did

119 Test-based Conversion Generating unit tests in S+ requires some S+ knowledge Takes some time to generate and document tests but better in the long-run Unit tests give a definitive PASS/FAIL result Can often be automated
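
A test-based conversion can be sketched as a set of reference cases recorded from the old system and replayed against the new code. Everything below is illustrative: the case list, the `spread` and `runCases` names and the reference values are made up (they are not real S+ output), and the only detail taken from the slides is that S+ `stdev` corresponds to R `sd`.

```r
# Reference cases: inputs plus the outputs recorded from the legacy
# system. These values are illustrative, not real S+ output.
referenceCases <- list(
  list(name = "small sample", input = c(2, 4, 4, 4, 5, 5, 7, 9),
       expected = 2.138090),
  list(name = "constant", input = c(3, 3, 3), expected = 0),
  list(name = "two values", input = c(1, 3), expected = 1.414214)
)

# Converted function: the S+ stdev() call becomes sd() in R.
spread <- function(x) sd(x)

# Run every case and report a definitive PASS or FAIL for each.
runCases <- function(cases, fun, tolerance = 1e-6) {
  status <- vapply(cases, function(case) {
    ok <- isTRUE(all.equal(fun(case$input), case$expected,
                           tolerance = tolerance))
    if (ok) "PASS" else "FAIL"
  }, character(1))
  data.frame(case = vapply(cases, function(case) case$name, character(1)),
             status = status, stringsAsFactors = FALSE)
}

report <- runCases(referenceCases, spread)
```

Once the cases exist, the same report can be regenerated automatically after every later change to the converted code.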

120 Code Conversion Challenges The application upgrade usually coincides with an operating system upgrade Windows (or other) version and R version need to be determined in advance It is almost guaranteed that the new system will produce different results for the same test data!

121 What is “different”? Often this is simply rounding Still require agreement on precision: 0.049782 vs 0.050436 If simulation is involved this can be VERY difficult to define!!! Appearance of graphics may also differ
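
Taking the two numbers on this slide as an example, whether they "agree" depends entirely on the tolerance chosen. The sketch below uses base R `all.equal`, which for values of this size compares the relative difference with the tolerance; the 1% and 2% thresholds and the variable names are assumptions for illustration.

```r
legacyValue <- 0.049782   # value from the old system (from the slide)
newValue    <- 0.050436   # value from the new system (from the slide)

# Relative difference between the two results (roughly 1.3%).
relDiff <- abs(newValue - legacyValue) / abs(legacyValue)

# The same pair passes or fails depending on the agreed tolerance.
agreeAt2Percent <- isTRUE(all.equal(legacyValue, newValue, tolerance = 0.02))
agreeAt1Percent <- isTRUE(all.equal(legacyValue, newValue, tolerance = 0.01))

# For simulation, fixing the seed makes reruns on ONE system repeatable,
# though it does not by itself make two different systems agree.
set.seed(123); runA <- rnorm(5)
set.seed(123); runB <- rnorm(5)
sameSeedRepeats <- identical(runA, runB)
```

This is why the precision needs to be agreed with the business owner before conversion starts, rather than argued over once differences appear.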

122 Other Challenges As the business owner I want to use the opportunity to improve the application: New menu items New functionality Modifications to existing functionality All of these require careful planning

123 Primary Benefits for Customer Rationalised code base means the analysis is quicker and extensible by end-users Construction of a front-end has enabled rollout to users on the front line in Alaska Conversion to R has removed license cost

124 CASE STUDY BACKTESTING APP FOR HEDGE FUND

125 Case Study: Backtesting App Overview Backtesting has a key role to play in the testing of automated trading strategies Asked by a Hedge Fund Manager to build for his team of users (who love Excel) Mango were asked to build a backtesting platform that was more sophisticated than what was on offer from other vendors Sorry that the details may be occasionally sketchy in this section

126 Case Study: Backtesting App The Challenge Key parts of the challenge included: Integration with standard finance data streams Advanced portfolio optimisation Flexibility to define automated strategy Transaction-cost based benefit analysis Leverage of financial hurdle ARCH-style error incorporation Advanced reporting

127 Alpha Storage Data Storage Data Flow .NET Interface RdotNet C Interface! .Rda Files

128 How I learnt apply functions!! Some hacky code here …
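
The code from this slide is not in the transcript. Purely as an illustration of the apply-family style the title refers to, the sketch below runs a toy moving-average rule over several window lengths with `vapply`. The prices, the trading rule and the function names are all made up and have nothing to do with the client's strategies.

```r
# Made-up price series for illustration only.
prices <- c(100, 101, 103, 102, 105, 107, 106, 108, 110, 109, 111, 114)

# Toy rule: hold the asset from day t to t+1 if the price on day t is above
# its trailing moving average over `window` days; otherwise hold cash.
backtest <- function(window, prices) {
  n <- length(prices)
  returns <- diff(prices) / prices[-n]            # return from day t to t+1
  days <- window:(n - 1)                          # days with a full window
  movAvg <- vapply(days, function(t) {
    mean(prices[(t - window + 1):t])
  }, numeric(1))
  signal <- prices[days] > movAvg                 # decided using data up to t
  prod(1 + returns[days] * signal) - 1            # cumulative strategy return
}

# Apply the same backtest across several window lengths in one call.
windows <- c(2, 3, 5)
results <- vapply(windows, backtest, numeric(1), prices = prices)
names(results) <- paste0("window_", windows)
```

Writing the backtest as a function of its parameter is what makes the final `vapply` call possible, replacing an explicit loop over window lengths.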

129 Case Study: Backtesting App The Outcome Very successful hedge fund Convinced the users to use R – UI dropped!

130 The R Community

131 IP Considerations IP based on R includes: New libraries & code New scripts Mango attempt to open source (with client permission) any “R-side” generic functionality Also feedback and assist library authors User Interface Analytic Code New R Libraries

132 Great Example MSToolkit library built for Pfizer Funded by Pfizer, built by Mango Released as open source library Since extended by other companies

133 R Community Contribute code where allowed/useful Sponsor R conferences and events Provide free training courses / webinars Organise and fund many R user groups (LondonR, BaselR, ZurichR, ShanghaiR, NewJerseyR, …)

134 The End!

135 Summary Thank you for the invitation Hope the discussion was useful We could only cover a certain amount of detail in the time, so ask us for more if interested!

136 Andy Nicholls anicholls@mango-solutions.com Chris Campbell ccampbell@mango-solutions.com Richard Pugh rpugh@mango-solutions.com

