Pentaho business analytics & data integration


1 Pentaho business analytics & data integration

2 About Us – Zaponet data science solutions
Zaponet is a service integrator and development shop providing solutions and professional services for building state-of-the-art data products that leverage big-data and data-science technologies. Zaponet architects, designs, and builds big-data solutions: data warehouses, user-profile systems, recommendation engines, complex event processing, and more. Some of our technology partners are Pentaho, Cloudera, Infobright, Vertica, Kognitio, and GigaSpaces.
*Future meetup: Pentaho Weka for data science

3 About Me – Amjad Akkawi
Zaponet CTO; experience in Pentaho

4 Agenda
Pentaho in business analytics & data integration
Pentaho BI Demo
Pentaho PDI Demo

5 About Pentaho
Recognized leader in business analytics & data integration
Subscription-based business model
Achieved critical mass: over 1,200 commercial customers; over 10,000 production deployments; over 185 countries
Stewardship of most important open source analytics projects
Industry recognition
Over 160 partners globally

6 Why Customers Love Pentaho
Speed of Deployment: marketing dashboard in less than 1 day; 2 weeks time to market; 8 weeks time to market; fully rolled out in budget in 4 months
Innovation & Scalability: analyzing buying patterns of 5 million members; music files from 20,000 sources; operational reports at all 1000 retail stores; analytics on 500,000 patient records
Superior Customer Service: “… a great partner through every phase of our project”; “… better functionality and more support”; “… top-notch professional support”; “Pentaho support is as good as its software”
Total Value: 75% lower acquisition costs; €350K+ cost saving; less than 1 month ROI; “…ROI was almost immediate.”

7 Pentaho Business Analytics
Pentaho in the Big Data Fabric
Pentaho Business Analytics; R; 3rd Party BI Tools; Applications; 3rd Party Tools
Big Analytics
Data Integration: Job Orchestration; Workflow; Scheduling; High Performance Visual IDE
Hadoop: Java MapReduce, Pig; Pentaho MapReduce
NoSQL Databases; Analytic Databases
Big Data Mgmt

8 High Level Feature/Functions
Components are independent
Dashboards: self-service interactive KPI & metrics and visualization (Information Consumers)
Reporting: ad hoc and operational reports (Business Users)
Analysis: self-service interactive and ad hoc analysis (Knowledge Workers / Business Users)
Data: high performance data integration, big data, cleansing and presentation (Power Users, Developers & DBAs)
Data Mining: advanced predictive analysis (Advanced Power Users & Viewers)

10 Dashboards

11 Dashboards & Interactive Dashboards

12 Dashboards – Geo Location-Based

14 Reports – Interactive, Static, Distributed

15 Reports – Reporting Pack & House Styles

16 Reports – Reporting Pack & House Styles

18 Enhanced In-Memory Analytics
Enhanced in-memory caching for speed-of-thought visualization & analysis: more re-usability of in-memory data; fewer trips to the database/disk
Builds on existing unique extreme-scale in-memory analytics: support for external data grids (Infinispan / JBoss Enterprise Data Grid and Memcached); scale to caching hundreds of GBs (potentially TBs) of data in-memory
Competition: Java heap or C++ memory space (a few GB at most; most BI products), or proprietary (hard to manage) in-memory technology (e.g. QlikView, MicroStrategy)

Even the best interactive visualization is frustrating if the end user has to sit waiting interminably for the system to respond. Testing has shown that usage of BI systems drops dramatically once response time starts to exceed 5 seconds, as users tend to lose their train of thought. This is what we mean by “speed of thought” response times: a snappy system that keeps up with the user's train of thought.

By avoiding database round-trips, in-memory data caching is a popular and growing approach to providing this performance. But with the dramatic growth of data volumes, it has become more and more challenging for traditional BI applications to keep up. Pentaho is the “only” business analytics provider to use the extreme-scale in-memory caching technology used to power some of the world’s highest-volume consumer websites, such as YouTube and Amazon.com. That technology is known as data grids: a way of caching large amounts of data across an inexpensive cluster of commodity servers. Pentaho’s analytics supports two of the leading data grids, Infinispan (also known as JBoss Enterprise Data Grid) and Memcached.

Traditional in-memory products written in either Java or C++ are constrained to a limited amount of memory on the server on which they execute, at most a few GBs. By contrast, a data grid can be distributed across a cluster of commodity servers and can address hundreds of GBs, and potentially TBs, of memory in future as hardware memory sizes get larger and less expensive. This allows customers to load all or most of their data into memory, delivering consistent speed-of-thought response times that are orders of magnitude faster than querying a database because the data the user needed was not in memory.

A couple of vendors, QlikTech and MicroStrategy, do provide proprietary in-memory caching capabilities. But these require special training and skills to use and maintain, and they are single-server solutions, so they are constrained to the amount of physical memory that can be installed on a single server, typically no more than 64GB.
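The round-trip saving described in these notes can be illustrated with a minimal cache-aside sketch. This is not Pentaho's implementation or the Infinispan/Memcached API: the `QueryCache` class, the `fake_database` function, and the SQL string are hypothetical, and a plain Python dict stands in for a distributed data grid.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple[str, int]]


class QueryCache:
    """Cache-aside wrapper: look in memory first, query the database only on a miss."""

    def __init__(self, run_query: Callable[[str], Rows]) -> None:
        self._run_query = run_query          # hypothetical (slow) database call
        self._store: Dict[str, Rows] = {}    # stand-in for a distributed data grid
        self.hits = 0
        self.misses = 0

    def get(self, sql: str) -> Rows:
        if sql in self._store:
            self.hits += 1                   # served from memory, no round-trip
            return self._store[sql]
        self.misses += 1                     # first request pays the database cost
        rows = self._run_query(sql)
        self._store[sql] = rows
        return rows


def fake_database(sql: str) -> Rows:
    # Hypothetical backend; a real one would execute the SQL.
    return [("EMEA", 120), ("APAC", 95)]


cache = QueryCache(fake_database)
first = cache.get("SELECT region, sales FROM facts")
second = cache.get("SELECT region, sales FROM facts")
print(cache.hits, cache.misses)
```

The second identical request is answered from memory. A data grid applies the same idea with the store spread across several servers, which is what lifts the single-server memory limit mentioned above.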

19 Analyzer – Table format

20 Analyzer – Chart format

21 Analyzer: Geo Location-Based Analysis

23 Scenario 1
Operational Database; Dashboard; Report

24 Scenario 2
Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analyzer

25 Metadata – Schema Workbench
Complex calculations and multi-cube requirements may need more modeling

26 Scenario 3
Structured Data; Unstructured Data
Pentaho Data Integration (PDI): source data acquisition; initial consolidation as required
Big data technology and/or staging area & data vault
Pentaho Data Integration (PDI): cleansing; transformation; change data capture; data warehouse management
Data Mart(s) / Warehouse
Metadata
Dashboard; Report; Analyzer
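Change data capture, one of the PDI tasks listed on this slide, can be sketched as a comparison of two snapshots of a source table. This is only an illustration under assumed data, not how PDI implements it: the `detect_changes` function, the `Snapshot` layout, and the sample rows are hypothetical.

```python
from typing import Dict, Tuple

# key -> (name, city); a hypothetical row layout for illustration
Snapshot = Dict[int, Tuple[str, str]]


def detect_changes(previous: Snapshot, current: Snapshot):
    """Return (inserted, updated, deleted) rows between two snapshots."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserted, updated, deleted


yesterday = {1: ("Ana", "Lisbon"), 2: ("Ben", "Oslo"), 3: ("Cy", "Rome")}
today = {1: ("Ana", "Lisbon"), 2: ("Ben", "Bergen"), 4: ("Di", "Kyiv")}

inserted, updated, deleted = detect_changes(yesterday, today)
print(inserted, updated, deleted)
```

Only the three changed rows would need to be pushed on to the warehouse; the unchanged row with key 1 is skipped.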

27 Variations on a Theme
Structured Data; Unstructured Data; Ad-hoc Data
Pentaho Data Integration (PDI): source data acquisition; initial consolidation as required
Big data technology and/or staging area & data vault
Pentaho Data Integration (PDI): cleansing; transformation; change data capture; data warehouse management
Data Mart(s) / Warehouse
Metadata
Dashboard; Report; Analyzer
Alerting: SMS, & attachments

28 Enterprise Edition Data Integration Server
PDI Components: Enterprise Edition Data Integration Server
Execution and remote monitoring
Integrated scheduling
Enterprise security options
Enhanced content management, including revision history and locking
Remote distributed cluster-based processing

29 Kettle Conceptual Model

30 Pentaho Data Integration
Step based processing engine with instant visualization of results
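The step-based model can be pictured as rows streaming through a chain of small steps. The sketch below uses Python generators for that idea; it is not Kettle's API, and the step names (`read_rows`, `clean_names`, `keep_numeric_amounts`) and sample rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def read_rows() -> Iterator[Row]:
    # Hypothetical input step; a real one would read a file or table.
    yield {"name": " alice ", "amount": "10"}
    yield {"name": "BOB", "amount": "x"}
    yield {"name": "carol", "amount": "7"}


def clean_names(rows: Iterable[Row]) -> Iterator[Row]:
    # Cleansing step: trim whitespace and normalise capitalisation.
    for row in rows:
        yield {**row, "name": str(row["name"]).strip().title()}


def keep_numeric_amounts(rows: Iterable[Row]) -> Iterator[Row]:
    # Filter step: drop rows whose amount is not a whole number, convert the rest.
    for row in rows:
        if str(row["amount"]).isdigit():
            yield {**row, "amount": int(row["amount"])}


# Steps are chained; each row flows through every step in turn.
result = list(keep_numeric_amounts(clean_names(read_rows())))
print(result)
```

Because every step consumes and emits rows, the output of any intermediate step can be inspected on its own, which is the property the slide's "instant visualization of results" relies on.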

31 Pentaho Data Integration
Step based performance

32 Pentaho Data Integration
Integrated Metadata Creation

33 Pentaho and Big Data Forrester Wave, Enterprise Hadoop Solutions, Q1 2012
Only vendor in strong performer category: “an impressive Hadoop integration tool”
Only business analytics vendor
Richest functionality
Most extensive integration with open source Apache Hadoop and major Hadoop distributions

34 Expanded Insight into Big and Diverse Data
Improved support for Hadoop: simpler deployment across Hadoop clusters (support for the Hadoop cache; Debian RPM installer); performance and ease of use enhancements for Pentaho MapReduce visual development; support for Hadoop Security data access
New NoSQL database support: Cassandra; MongoDB
Growing the Pentaho big data community: open sourced all big data components (Hadoop & NoSQL); Apache License, same as used by leading Hadoop and NoSQL distros; new big data developer resources (how-to documents, videos, walk-throughs)

With 4.5, Pentaho continues to improve and expand the industry’s broadest support for big data stores, including the leading Hadoop distributions, NoSQL databases, and analytic databases. With this release we’ve made it easier to deploy PDI across a Hadoop cluster by using Hadoop’s cache (something no other ETL engine can do). And we’ve improved the performance of Pentaho MapReduce, which is Pentaho’s visual alternative to writing MapReduce jobs in Java or Pig, such that the performance of these jobs is comparable to, and often faster than, hand-written code.

For example, in a recent POC at a major Wall Street financial institution, the POC took us 2 days versus a month for a major SI, and execution time went from 5 minutes to 20 seconds in one use case and from 7 minutes to 27 seconds in another. That is less than 10% of the time to implement, and execution about 15 times faster. Granted, a competent SI could probably tune their code to deliver comparable execution times, but this illustrates the dramatic advantages in developer productivity we provide, and the fact that you don’t have to find and hire a scarce and expensive big data specialist developer.

We also now support Hadoop security, the new mechanism that Hadoop uses to control data access via authentication and authorization. We’re also focused on growing the Pentaho big data community, and during the last quarter we open sourced all the big data capabilities in PDI and created the Pentaho Big Data Community website. This is resulting in broad adoption of Pentaho for big data projects, which will over time trickle up to revenue.
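The POC figures quoted in these notes can be checked with a few lines of arithmetic. The timings come from the notes; the length of "a month" is not stated, so both 30 calendar days and 22 working days are assumptions made here for illustration.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the second run is than the first."""
    return before_seconds / after_seconds


case_one = speedup(5 * 60, 20)   # 5 minutes down to 20 seconds
case_two = speedup(7 * 60, 27)   # 7 minutes down to 27 seconds

# Implementation effort: 2 days versus "a month" (length assumed, see above).
effort_calendar = 2 / 30
effort_working = 2 / 22

print(f"{case_one:.1f}x and {case_two:.1f}x faster")
print(f"{effort_calendar:.0%} to {effort_working:.0%} of the SI's build time")
```

Both use cases come out at roughly 15 times faster, and under either reading of "a month" the build time stays below the 10% quoted in the notes.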

35 Hadoop Data Management & Integration
Accessible by any ETL developer or data scientist Pentaho MapReduce

36 NoSQL Data Management & Integration
Visual Job Orchestration Any Data Source Accessible by any ETL developer or data scientist

37 Visual Job Orchestration Any Data Source
Scheduling Accessible to any ETL developer or data scientist

38 Pentaho Integration Options
Pentaho BI Server Other Application Pentaho Custom Stuff My Application Pentaho Components

39 Integration Options: Bundled, Mashup, Extended, Embedded
Bundled
Value: fastest way to get analytics that have your look & feel
What it takes: Pentaho is a separate app, branded with the partner’s logo, look & feel. Optional: partner app may include links to Pentaho reports, analysis and dashboards (popping a new window). Optional: single sign-on creates a seamless experience.
Skill level: limited HTML skills

Mashup
Value: an integrated experience for your end user
What it takes: Pentaho & partner app have the same UI. Pentaho User Console, or individual reports, analysis or dashboards, are included in the partner app. Single sign-on creates a seamless experience.
Skill level: HTML skills

Extended
Value: customizing Pentaho for your experience
What it takes: Pentaho’s core functionality is extended through plug-ins. Examples: connecting to custom data sources; adding new visualizations; customizing security; replacing the Pentaho rules engine.
Skill level: Java skills

Embedded
Value: ultimate integration and customization
What it takes: integrate with the partner’s app server; directly embed Pentaho into your app; call Pentaho Java APIs from your app.
Skill level: knowledge of Pentaho architecture

40 Q & A
NEXT … Pentaho PDI Demo; Pentaho BI Demo

41 “Traditional” Database Support
DATA ANALYSIS DATA INTEGRATION

42 Broadest Support for Big Data Platforms
Hadoop NoSQL Analytic Databases

