Using R as enterprise-wide data analysis platform Zivan Karaman.

1 Using R as enterprise-wide data analysis platform Zivan Karaman

2 Limagrain
- Field seeds
- Vegetable seeds and garden products
- Cereal ingredients and bakery products
- Our profession: improvement and valorization of plants
- Our mission: innovate in order to create varieties that meet the expectations of farmers, market gardeners, industrialists and consumers

3 Limagrain research
- 73 research centers
  – Europe: 41 centers
  – Americas: 22 centers
  – Asia Pacific: 10 centers
- Annual budget: €102 million, 12% of professional sales
- 1,200 researchers

4 Context
- Plant breeding aims at creating new varieties (stable forms with desirable agronomic properties) from the existing genetic diversity. It is a long and resource-consuming activity.
- Many field trials and laboratory experiments are needed to evaluate the tested plant material.
- Huge amounts of data must be analysed by users who are not specialists in statistics and computing… and it must be done quickly!

5 Needs
- Data to be analysed must be retrieved from the operational databases and processed quickly
- Most end users are geographically dispersed, with no local support for data analysis
- Some types of analysis require long and complex computations
→ a client/server architecture, with computations done on the server side (to minimise WAN traffic), and a Web interface for routine analyses
- But… some users need (much) more flexibility
- … and we all want to use the same tool

6 Users
- End users
  – occasional and routine analyses
  – ease of use/GUI (Web interface)
- Power users
  – regular, more flexible, interactive analyses
  – ease of use/GUI (desktop application)
- Developers
  – develop tools for the users
  – software engineering tools (IDE, source code management)
- Expert users (statisticians)
  – develop and test new statistical methodology
  – require a flexible programming language

7 Requirements
- Rich function set for statistical data analysis and flexible graphics
- Possibility to extend the built-in functions
- Database connectivity and access to the file system
- Integration with other software
- Handling large problems (upsizing)
- Capacity to build user-friendly interfaces (GUI)
- Capacity to be used over the Web (server)
- Standard software development tools
- Ease of deployment

8 Rich function set & extensibility
- The R programming environment is an invitation to explore the data and create one's own functions, the only limit being the user's imagination
- R provides a rich set of functions for statistical data analysis and extremely flexible graphics capabilities
→ limited built-in support for interactive graphics (linked views): is rggobi the way to go?
→ Graphlets®, a useful S-PLUS® feature that we miss
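As a small illustration of "creating one's own functions", here is a sketch of a per-group summary helper applied to R's built-in `PlantGrowth` dataset (dried plant weights for a control and two treatment groups). The function name `group_summary` and its arguments are hypothetical, chosen for this example; it uses only base R.

```r
# Hypothetical helper (not from any package): summarise a numeric
# response by group using only base R functions.
group_summary <- function(data, response, group) {
  y <- data[[response]]
  g <- factor(data[[group]])
  data.frame(
    group = levels(g),                          # same order as tapply()
    n     = as.vector(tapply(y, g, length)),
    mean  = as.vector(tapply(y, g, mean)),
    sd    = as.vector(tapply(y, g, sd)),
    row.names = NULL
  )
}

# Usage on the built-in PlantGrowth data (weight by ctrl/trt1/trt2)
res <- group_summary(PlantGrowth, response = "weight", group = "group")
print(res)
```

Once written, such a function can be reused on any trial data frame, or bundled into an in-house package.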

9 Database connectivity & file system
- Database access
  – RODBC provides a wide range of possibilities, including access to Excel files
  → it can't handle queries returning multiple result sets (a list of data frames), which would be helpful
- File system access
  – excellent set of functions for accessing the local file system and even files over the internet
  – can handle zip files, but…
  → full support for zip-file management (create, list contents, add/remove files, etc.) would be nice
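The file-system point can be illustrated with a short base-R sketch that writes a data frame to a plain CSV file and to a gzip-compressed copy, then reads the compressed copy back through a `gzfile()` connection. The temporary folder and file names are hypothetical; the RODBC side is not shown because it needs a configured data source.

```r
# Sketch: round-trip a data frame through plain and gzip-compressed
# text files using only base R file-system functions.
dir <- tempfile("trials_")              # hypothetical working folder
dir.create(dir)

plain <- file.path(dir, "plantgrowth.csv")
write.csv(PlantGrowth, plain, row.names = FALSE)

# gzfile() opens a connection that compresses on write
packed <- file.path(dir, "plantgrowth.csv.gz")
con <- gzfile(packed, "w")
write.csv(PlantGrowth, con, row.names = FALSE)
close(con)

# read.csv() accepts a gzfile() connection and closes it afterwards
back <- read.csv(gzfile(packed))

print(list.files(dir))
print(file.info(c(plain, packed))$size)
```

For zip archives specifically, `utils::unzip()` can list (`list = TRUE`) or extract the contents, while `utils::zip()` delegates to an external zip program (the `R_ZIPCMD` setting), which is the gap the slide points at.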

10 Integration with other software
- R provides excellent built-in support for integrating existing Fortran or C code
- Communication protocols exist for directly integrating R with Java and other software, both as client and server
- On Windows, any COM-compliant software can be used to drive R (a GUI front-end, for example)
- Finally, through the rich set of functions for accessing operating-system files and the possibility to invoke the system shell, any program that can read and write text files in batch mode can easily be interfaced with R
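The last point, interfacing any batch program through text files, can be sketched as follows. R writes an input file, runs the external program with `system2()`, and reads the output file back. The helper name `run_batch_tool` is hypothetical, and the Unix-like `sort` command merely stands in for an in-house batch program; it is assumed to be on the PATH.

```r
# Sketch of the "text files in, text files out" pattern.
# Assumption: a Unix-like `sort` command is on the PATH; in practice it
# would be replaced by the batch program being interfaced.
run_batch_tool <- function(values, command = "sort", args = "-n") {
  infile  <- tempfile(fileext = ".txt")
  outfile <- tempfile(fileext = ".txt")
  on.exit(unlink(c(infile, outfile)))   # clean up temporary files

  writeLines(as.character(values), infile)
  # stdout = <file name> redirects the program's output to that file;
  # the return value is then the exit status
  status <- system2(command, args = c(args, shQuote(infile)),
                    stdout = outfile)
  if (status != 0) stop("external program failed: ", command)

  as.numeric(readLines(outfile))
}

print(run_batch_tool(c(42, 7, 19, 3)))
```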

11 Upsizing
- Microsoft Windows is our common platform
- Some problems require more than the 4 GB of memory that standard Windows can manage
  – we hope to be able to handle them on 64-bit Linux
  – R code can be painlessly moved from 32-bit Windows to 64-bit Linux (can it?), providing a straightforward way of upsizing
- Long-running simulations: several R packages provide support for parallel computing
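For long-running simulations, a minimal sketch of parallel computing is shown below. It uses the `parallel` package, bundled with R since version 2.14.0; this is a substitute chosen for illustration, not necessarily one of the packages the slide refers to. The bootstrap task and the worker count of 2 are assumptions. A PSOCK cluster from `makeCluster()` also works on Windows.

```r
# Sketch: run independent bootstrap replicates on a local cluster of
# R worker processes (package `parallel`, shipped with R >= 2.14.0).
library(parallel)

# Hypothetical simulation task: bootstrap mean of PlantGrowth weights
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

n_workers <- 2                          # assumption: 2 local cores
cl <- makeCluster(n_workers)            # PSOCK cluster, portable to Windows
clusterSetRNGStream(cl, iseed = 2024)   # independent, reproducible RNG streams

# Extra arguments to parLapply() are passed on to boot_mean()
boot <- unlist(parLapply(cl, 1:1000, boot_mean, x = PlantGrowth$weight))
stopCluster(cl)

print(length(boot))
print(quantile(boot, c(0.025, 0.975)))
```

The same script runs unchanged on a 64-bit Linux server, where only `n_workers` would typically be raised.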

12 User-friendly interfaces
- Several GUI toolkits are available as add-on packages
- Providing a standard set of tools for building user interfaces as part of the core distribution would be very helpful
- Common data analysis functions could be implemented through this standard GUI toolkit (as in GenStat® or S-PLUS®)
- Another way is to use R's excellent integration capabilities to develop the user interface in Java, VB, or another tool, but this requires resorting to a completely different programming language

13 Web server
- Several implementations of R Web servers are available
- They use different technologies and offer different sets of functionality
- We have an in-house Web portal and distributed computing platform that currently uses S-PLUS® Server from Insightful
- We plan to integrate R using the R/DCOM interface
→ having a feature like Insightful Graphlets® would allow us to implement some user interaction in the Web application

14 Software development tools
- IDE
  – Tinn-R on Windows
  – StatET Eclipse plug-in
  – …
  → why not provide a standard IDE (probably Eclipse-based) as part of the core distribution?
- Debugger, profiler
  – good tools are available
  → integration with the IDE (graphical debugging)
- Source code management
  – Subversion
  → integration with the IDE

15 Deployment
- Keeping users' computers up to date with current software versions is a system administrator's nightmare
- The R package installation/update system provides everything one would ever need to keep R-based software up and running!
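A deployment script built on the package system might first check which required packages are missing or older than a minimum version, and install only those. Below is a minimal sketch. The helper names `outdated_packages` and `ensure_packages` and the minimum versions are hypothetical. Only `system.file()`, `packageVersion()` and `install.packages()` from base/utils are used.

```r
# Sketch: find required packages that are missing or older than the
# minimum version an in-house application needs.
# `outdated_packages` and `ensure_packages` are hypothetical helpers.
outdated_packages <- function(required) {
  needs <- vapply(names(required), function(pkg) {
    if (!nzchar(system.file(package = pkg))) return(TRUE)  # not installed
    utils::packageVersion(pkg) < required[[pkg]]           # too old
  }, logical(1))
  names(required)[needs]
}

# Install (from the configured repositories) only what is needed
ensure_packages <- function(required, repos = getOption("repos")) {
  pkgs <- outdated_packages(required)
  if (length(pkgs) > 0) utils::install.packages(pkgs, repos = repos)
  invisible(pkgs)
}

# Minimum versions are illustrative only
required <- c(stats = "2.0.0", notARealPackage = "1.0")
print(outdated_packages(required))
```

For a blanket refresh, `update.packages(ask = FALSE)` updates every outdated installed package from the configured repositories without prompting.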

16 Conclusions
R provides an excellent platform for delivering data analysis functions enterprise-wide:
+ broad range of statistical methods included
+ highly flexible graphics
+ ease of extending existing code
+ great database and file system connectivity
+ built-in facilities for package updates
Possible improvements:
± include a standard, multi-platform IDE and (at least) some form of GUI toolkit in the core distribution

17 Thank you for your attention

