RStat: Differentiators & Benefits
- Based on R-Project
  - Open Source
  - Maintained by a worldwide consortium of universities, scientists, government-funded research organizations, and statisticians
  - Over 2000 packages
- RStat is a GUI to R
  - Intuitive guided approach to modeling
  - Simple model evaluation
  - Intended for both business analysts and advanced modelers
- Single BI and Predictive Modeling Environment
  - Re-use metadata and queries
  - Perform data manipulation and sampling
  - Build scoring applications
- Unique Deployment Method for Scoring Solutions
  - Scoring models are built directly into WebFOCUS (WF) metadata
  - Deployment on any platform and operating system: Windows, Unix, Linux, z/OS, and iSeries
RStat 1.2 Enhancements: New Modeling Technique
Survival Analysis: Two Techniques
- Cox Regression: risk scoring routine
- Parametric Time Regression: time scoring routine
What Survival Analysis Does and When to Use It
- Survival analysis encompasses a wide variety of methods for analyzing the timing of events with censored data.
- Censoring: nearly every sample contains some cases that have not experienced the event by the time observation ends.
- Use it to study the causes and timing of:
  - Births and Deaths
  - Marriages and Divorces
  - Arrests and Convictions
  - Job Changes and Promotions
  - Bankruptcies and Mergers
  - Wars and Revolutions
  - Residence Changes
  - Consumer Purchases
  - Adoption of Innovations
  - Hospitalizations
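To make the idea of censoring concrete, here is a minimal sketch of the Kaplan-Meier survival estimate in plain Python. Kaplan-Meier is a standard survival-analysis method, but it is not one of the two RStat techniques named on the slide, and the function name and sample data below are invented for illustration only. Censored cases count toward the number at risk until they drop out of observation, but never count as events.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event, or until observation stopped
    observed:  True if the event occurred, False if the case was censored
    """
    # Count events (not censored cases) at each distinct time.
    events = Counter(t for t, e in zip(durations, observed) if e)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Everyone still under observation just before time t is at risk,
        # including cases that are censored at or after t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical sample: 7 cases, 3 of them censored (observed=False).
durations = [2, 3, 3, 5, 8, 8, 9]
observed = [True, True, False, True, False, True, False]

for t, s in kaplan_meier(durations, observed):
    print(f"t={t}: S(t)={s:.3f}")
```

Dropping the censored cases instead of keeping them in the at-risk count would bias the estimated survival downward, which is why survival methods treat censoring explicitly.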
RStat 1.2 Enhancements – cont’d
New Scoring Routines:
- Neural Network model with comprehensive output: enables users to compile NNET models into WebFOCUS functions for creating applications.
- Transformation capabilities for scoring routines: allows data manipulation within the RStat tool. Available methods include Imputation, Scaling, and Remapping.
Enhanced statistical output:
- Indicators added to Regression models
- ANOVA table to show significance: enables users to determine which variables are significant to the model.
Performance and Usability optimization:
- Auto sampling for faster visualization of large data sets in the KMeans model: enables more efficient resource usage when displaying Cluster model statistics and data plots.
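The three transformation methods named above (Imputation, Scaling, Remapping) can be sketched generically in plain Python. This is an illustration of the concepts only: the slide does not say which variants RStat implements, so mean imputation and min-max scaling are assumptions, and every function name and value here is hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def scale_min_max(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def remap(values, mapping, default=None):
    """Replace each category with its mapped code; unknowns get `default`."""
    return [mapping.get(v, default) for v in values]


# Hypothetical input columns.
ages = [4, None, 10, 16]
genders = ["M", "F", "U"]

filled = impute_mean(ages)
scaled = scale_min_max(filled)
codes = remap(genders, {"M": 0, "F": 1}, default=-1)

print(filled)
print(scaled)
print(codes)
```

Whatever transformation is applied while building a model has to be applied identically at scoring time, which is presumably why the slide ties these capabilities to the scoring routines.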
RStat 1.2 Enhancements – cont’d
Performance and Usability optimization:
- Model optimization: only the variables used to create the model are included in the exported C file. (In RStat 1.1, all variables selected by the user were included in the model.)
- Enhanced Log functionality: allows users to create R scripts for use with other applications, such as a Dialogue Manager application.
- Process Cancellation capability: allows users to cancel a long-running process from within RStat.
- Special characters functionality: enables efficient handling of data with special characters.
- Timestamp within the RConsole and Log Textview: enables users to match log entries with any errors received, making troubleshooting easier.
Copyright 2007, Information Builders. Slide 7
Demo: Child Welfare Use Case
Goal: identify which children will stay in Child Welfare programs, and at what age they will leave the programs. This is a time-to-event analysis.
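As a rough illustration of what a time-to-event question looks like in code, the sketch below fits the simplest parametric survival model, an exponential model with a constant exit rate, to censored durations. It is not a description of RStat's parametric regression routine, which the slides do not detail. The data are made up, the function names are hypothetical, and a real analysis would also include covariates.

```python
import math


def fit_exponential_rate(durations, observed):
    """Maximum-likelihood exit rate under an exponential model.

    With censoring, the estimate is (number of observed events) divided by
    (total time under observation, censored cases included).
    """
    n_events = sum(1 for e in observed if e)
    total_time = sum(durations)
    return n_events / total_time


def survival_probability(rate, t):
    """Probability of still being in the program at time t."""
    return math.exp(-rate * t)


def median_time(rate):
    """Time by which half of the cases are expected to have exited."""
    return math.log(2) / rate


# Made-up example: months in a program; False means the child was still
# in the program when observation ended (a censored case).
months = [12, 30, 18, 40, 20]
exited = [True, True, False, True, False]

rate = fit_exponential_rate(months, exited)
print(f"estimated exit rate per month: {rate:.4f}")
print(f"P(still in program at 24 months): {survival_probability(rate, 24):.3f}")
print(f"median time in program: {median_time(rate):.1f} months")
```

The censored cases contribute their observed time to the denominator without adding an event, so they lower the estimated exit rate instead of being discarded.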
Foster Care Analytical Framework: Background and Optimization Goals
- Half a million children in foster care
- Managed by county departments and the private agencies that train families
- Finding a child a permanent home is a team effort
- Severe consequences of poor foster care: youth who leave the system are more likely to be homeless, incarcerated, unemployed, and unskilled.
Foster Care Analytical Framework: Goals & Benefits
- Give all parties involved in the process a better understanding of the factors that contribute to better foster care
- Provide a standardized analytic and reporting system
- Match children with better foster parents
- Optimize the duration of a child's foster care
Survival Analysis – Child Welfare
Survival Analysis – Child Welfare (cont’d)
Thank you!
"...if you are serious about statistics as a career, you need to become familiar with R because it is the most powerful and flexible language available, and may become the lingua franca of statistical programming in the near future."
Source: "Statistics in a Nutshell" by Sarah Boslaugh, published by O'Reilly