Using R as an enterprise-wide data analysis platform
Zivan Karaman

Limagrain
Field seeds; vegetable seeds and garden products; cereal ingredients and bakery products
Our profession: improvement and valorization of plants
Our mission: innovate in order to create varieties that meet the expectations of farmers, market gardeners, industrialists and consumers

Limagrain research
73 research centers:
– Europe: 41 centers
– Americas: 22 centers
– Asia Pacific: 10 centers
Annual budget: €102 million (12% of professional sales)
… researchers

Context
Plant breeding aims at creating new varieties (stable forms with desirable agronomic properties) from the existing genetic diversity. It is a long and resource-consuming activity.
Many field trials and laboratory experiments are needed to evaluate the tested plant material.
Huge amounts of data must be analysed by users who are not specialists in statistics and computing… and it must be done quickly!

Needs
– Data to be analysed must be retrieved from the operational databases and processed quickly
– Most end users are geographically dispersed, with no local support for data analysis
– Some types of analysis require long and complex computations
→ a client/server architecture, with computations done on the server side (to minimise WAN traffic), and a Web interface for routine analyses
But…
– Some users need (much) more flexibility
– … and we all want to use the same tool

Users
End users
– occasional and routine analyses
– ease of use/GUI (Web interface)
Power users
– regular, more flexible, interactive analyses
– ease of use/GUI (desktop application)
Developers
– develop tools for the users
– software engineering tools (IDE, source code management)
Expert users (statisticians)
– develop and test new statistical methodology
– require a flexible programming language

Requirements
– Rich function set for statistical data analysis, and flexible graphics
– Possibility to extend the built-in functions
– Database connectivity and access to the file system
– Integration with other software
– Handling large problems (upsizing)
– Capacity to build user-friendly interfaces (GUI)
– Capacity to be used over the Web (server)
– Standard software development tools
– Ease of deployment

Rich function set & extensibility
The R programming environment is an invitation to explore the data and create your own functions; the only limit is the user's imagination.
R provides a rich set of functions for statistical data analysis and extremely flexible graphics capabilities.
→ Built-in support for interactive graphics (linked views) is limited: is Rggobi the way to go?
→ Graphlets® is a useful S-PLUS® feature that we miss.
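
The following is a minimal sketch of what "creating your own functions" can look like. The helper builds a per-group summary (mean, standard deviation, coefficient of variation) from base R functions only. The trial data, the column names and the function name `summarise_by_group` are hypothetical and were chosen only for illustration.

```r
# Hypothetical field-trial data: yields (t/ha) for three varieties, four plots each.
# The values are made up for illustration only.
trial <- data.frame(
  variety = rep(c("A", "B", "C"), each = 4),
  yield   = c(7.1, 7.4, 6.9, 7.3,
              8.0, 7.6, 8.3, 7.9,
              6.2, 6.8, 6.5, 6.4)
)

# User-defined helper built on base R: per-group count, mean, standard
# deviation and coefficient of variation (CV, in percent).
summarise_by_group <- function(data, response, group) {
  stats <- lapply(split(data[[response]], data[[group]]), function(x) {
    m <- mean(x)
    s <- sd(x)
    c(n = length(x), mean = m, sd = s, cv_pct = 100 * s / m)
  })
  # Bind the per-group vectors into one data frame, one row per group
  out <- as.data.frame(do.call(rbind, stats))
  out[[group]] <- rownames(out)
  rownames(out) <- NULL
  out[, c(group, "n", "mean", "sd", "cv_pct")]
}

res <- summarise_by_group(trial, "yield", "variety")
print(res)
```

Once defined, the helper can be reused on any data frame with a numeric response and a grouping column, just like a built-in function.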

Database connectivity & file system
Database access
– RODBC provides a wide range of possibilities, including access to Excel files
→ it can't handle queries that return multiple result sets (as a list of data frames), which would be helpful
File system access
– excellent set of functions for accessing the local file system and even files over the Internet
– can handle zip files, but…
→ full support for zip-file management (create, list contents, add/remove files, etc.) would be nice
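
The missing multiple-result-set support can be approximated by collecting one data frame per query or file into a named list. The sketch below does this for CSV files in a temporary directory, using only base R. The file names and values are hypothetical. The commented RODBC lines, which assume a DSN called `TrialsDB` and a vector of SQL statements named `queries`, show how the same loop might look against a database.

```r
# Hypothetical RODBC equivalent (not run; assumes an ODBC data source "TrialsDB"
# and a character vector of SQL statements called queries):
#   ch     <- RODBC::odbcConnect("TrialsDB")
#   tables <- lapply(queries, function(q) RODBC::sqlQuery(ch, q))
#   RODBC::odbcClose(ch)

# Create a scratch directory with two small per-site files (made-up data)
trial_dir <- file.path(tempdir(), "trials")
dir.create(trial_dir, showWarnings = FALSE)
write.csv(data.frame(plot = 1:3, yield = c(7.1, 7.4, 6.9)),
          file.path(trial_dir, "site_north.csv"), row.names = FALSE)
write.csv(data.frame(plot = 1:3, yield = c(8.0, 7.6, 8.3)),
          file.path(trial_dir, "site_south.csv"), row.names = FALSE)

# Read every CSV file in the directory into a named list of data frames
files  <- list.files(trial_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv)
names(tables) <- sub("\\.csv$", "", basename(files))

# Stack the tables, keeping the source name in a "site" column
combined <- do.call(rbind, Map(function(df, site) cbind(site = site, df),
                               tables, names(tables)))
rownames(combined) <- NULL
print(combined)

# A member of a zip archive can be read without unpacking it, e.g.
#   read.csv(unz("trials.zip", "site_north.csv"))   # hypothetical archive
```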

Integration with other software
– R provides excellent built-in support for integrating existing Fortran or C code
– Communication protocols exist for integrating R directly with Java and other software, with R acting as either client or server
– On Windows, any COM-compliant software can be used to drive R (for example, as a GUI front end)
– Finally, through the rich set of functions for accessing operating system files and the ability to invoke the system shell, any program that can read and write text files in batch mode can easily be interfaced with R
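
The last point can be sketched in base R as three steps: write the input, call the program with `system2()`, and read the output. The point in question is driving an external program through text files and the system shell. To keep the example self-contained, the "external program" here is a small hypothetical script run by `Rscript`. In practice it would be whatever batch tool needs to be wrapped. The wrapper name `run_batch_program` is also illustrative.

```r
# Hypothetical stand-in for an external batch program: a tiny script that reads
# numbers from an input file and writes their squares to an output file.
prog <- tempfile(fileext = ".R")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "x <- scan(args[1], quiet = TRUE)",
  "writeLines(format(x^2), args[2])"
), prog)

run_batch_program <- function(x) {
  infile  <- tempfile(fileext = ".txt")
  outfile <- tempfile(fileext = ".txt")
  # 1. Write the input data as plain text, one value per line
  writeLines(format(x), infile)
  # 2. Invoke the program through the shell; system2() returns its exit status
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, c(shQuote(prog), shQuote(infile), shQuote(outfile)))
  if (status != 0) stop("external program failed with status ", status)
  # 3. Read the program's text output back into R
  scan(outfile, quiet = TRUE)
}

run_batch_program(c(1, 2, 3))
```

The same three-step wrapper applies unchanged to any command-line tool; only the command and its arguments differ.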

Upsizing
– Microsoft Windows is our common platform
– Some problems require more than the 4 GB of memory that standard (32-bit) Windows can manage
– We hope to be able to handle them on 64-bit Linux
– R code can be moved painlessly from 32-bit Windows to 64-bit Linux (can it?), providing a straightforward way to upsize
– Long-running simulations: several R packages provide support for parallel computing
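
For the long-running simulations mentioned above, one option is the `parallel` package that ships with current R releases. It may not be the package the author had in mind. The sketch runs a hypothetical simulation on two local worker processes with reproducible random-number streams. It also checks whether the running R build is 64-bit.

```r
library(parallel)

# TRUE on a 64-bit build of R (8-byte pointers), FALSE on a 32-bit build
is_64bit <- .Machine$sizeof.pointer == 8
print(is_64bit)

# Hypothetical simulation: one replicate draws n plot yields and returns their mean
simulate_once <- function(i, n = 1000) {
  mean(rnorm(n, mean = 7, sd = 1))
}

cl <- makeCluster(2)                   # two local worker processes (PSOCK)
clusterSetRNGStream(cl, iseed = 42)    # independent, reproducible RNG streams
estimates <- unlist(parLapply(cl, 1:100, simulate_once))
stopCluster(cl)

summary(estimates)
```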

User-friendly interfaces
– Several GUI toolkits are available as add-on packages
– Providing a standard set of tools for building user interfaces as part of the core distribution would be very helpful
– Common data analysis functions could be implemented through this standard GUI toolkit (as in GenStat® or S-PLUS®)
– Another way is to use R's excellent integration capabilities to develop the user interface in Java, VB, or another tool, but this requires resorting to another, completely different programming language

Web server
– Several implementations of R Web servers are available
– They use different technologies and offer different sets of functionality
– We have an in-house Web portal and distributed computing platform that currently uses S-PLUS® Server from Insightful
– We plan to integrate R using the R/DCOM interface
→ A feature like Insightful Graphlets® would allow us to implement some user interaction in the Web application

Software development tools
IDE
– Tinn-R on Windows
– StatET Eclipse plug-in
– …
→ why not provide a standard IDE (probably Eclipse-based) as part of the core distribution?
Debugger, profiler
– good tools are available
→ integration with the IDE (graphical debugging)
Source code management
– Subversion
→ integration with the IDE
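
As a small illustration of the profiling tools mentioned above, base R's sampling profiler `Rprof()` can be switched on around a piece of code. The results can then be summarised with `summaryRprof()`. The deliberately inefficient function `slow_cumsum` is hypothetical and exists only to give the profiler something to measure.

```r
# Deliberately inefficient cumulative sum: it re-sums the prefix and grows
# the result vector on every iteration (quadratic time).
slow_cumsum <- function(x) {
  out <- numeric(0)
  for (i in seq_along(x)) out <- c(out, sum(x[seq_len(i)]))
  out
}

x <- runif(20000)
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack every 10 ms
res <- slow_cumsum(x)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                  # time attributed to each function itself
all.equal(res, cumsum(x))           # same result as the built-in cumsum()
```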

Deployment
– Keeping users' computers up to date with current software versions is a system administrator's nightmare
– The R package installation/update system provides everything one would ever need to keep R-based software up and running!
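
A deployment script built on this system might first check which required packages are missing or too old, and only then call `install.packages()` or `update.packages()`. The helper `check_requirements` and the requirement list below are hypothetical. The sketch only inspects the local library and leaves the actual installation commented out, since that needs a repository connection.

```r
# Return a named character vector describing required packages that are
# missing or older than the requested minimum version (empty if all is well).
check_requirements <- function(required) {
  installed <- rownames(installed.packages())
  problems  <- character(0)
  for (pkg in names(required)) {
    if (!pkg %in% installed) {
      problems[pkg] <- "not installed"
    } else if (packageVersion(pkg) < required[[pkg]]) {
      problems[pkg] <- paste("installed version", as.character(packageVersion(pkg)),
                             "is older than", required[[pkg]])
    }
  }
  problems
}

# Illustrative requirements: "stats" ships with R; the second name is made up
reqs <- c(stats = "2.0.0", notARealPackage = "1.0")
res  <- check_requirements(reqs)
print(res)

# With a repository available, the problems could then be fixed, e.g.
#   install.packages(names(res)[res == "not installed"])
#   update.packages(ask = FALSE)
```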

Conclusions
R provides an excellent platform for delivering data analysis functions enterprise-wide:
+ broad range of statistical methods included
+ highly flexible graphics
+ ease of extending existing code
+ great database and file system connectivity
+ built-in facilities for package updates
Possible improvements:
± include a standard, multi-platform IDE and (at least) some form of GUI toolkit in the core distribution

Thank you for your attention