1
BI for Big Data Beyond the Hype
2
Pentaho Mission
The Future of Analytics: Big Data Exploration without Boundaries
* Modern, unified data integration and business analytics platform
* Native integration into big data ecosystem
* Embeddable, cloud-ready analytics
* Fast and broad innovation: open source development model
* Critical mass achieved: over 1,000 commercial customers, over 10,000 production deployments
3
Big Data Solutions Engineering, Pentaho
Ian Fyfe
Ian brings over 20 years of experience in the business analytics software market, with roles spanning consulting services, pre-sales engineering, product management and product marketing. Ian started his career by co-founding a business intelligence startup and has worked at Business Objects, Informix, Epiphany, PeopleSoft and Jaspersoft.
4
Common Use Cases
5
The Value of Big Data for our Customers
Big opportunities
* Drive incremental revenue: predict customer behavior across all channels; understand and monetize customer behavior
* Improve operational effectiveness: machines/sensors (predict failures, network attacks); financial risk management (reduce fraud, increase security)
* Reduce data warehouse cost: integrate new data sources without increased database cost; provide online access to ‘dark data’
6
Example Use Cases Today
Transactional: fraud detection; financial services / stock markets
Sub-Transactional: weblogs; social/online media; telecoms events
Non-Transactional: web pages, blogs etc.; documents; physical events; application events; machine events
* Not many companies have transactional data that qualifies as Big Data. Credit card companies and financial services companies are about it.
* With stock market data we are talking about every stock trade and the bid and ask prices between the transactions, for every stock on multiple markets over a significant time period.
* For many other companies the Big Data is sub-transactional: it is the events that lead up to transactions.
* Weblogs are semi-structured or badly structured. Consider the number of weblog entries created as you look for a book online: researching 5-10 books, reading reviews and comments. You might generate 1,000 entries and may or may not buy a book, so there are potentially lots of entries for no transaction. We also want to enrich this data with metadata about the URLs and information about the location of the user.
* In an online game or world, every interaction between participants and the system, and between the participants themselves, is logged. An individual participant might generate more than 1 million events for their one monthly transaction.
* A single phone call or text message generates many events within a telecoms company.
US and Worldwide: +1 (866) | Slide © 2010, Pentaho. All Rights Reserved.
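To make the weblog point concrete, here is a minimal Python sketch that parses a handful of log lines, skips the badly structured ones, enriches each entry with a URL category, and counts how many entries each visitor generated against how many transactions. Everything in it is an assumption for illustration: the Common Log Format layout, the sample addresses and paths, the `URL_CATEGORY` table, and the idea that a request to `/checkout` marks a transaction are not taken from the slides.

```python
import re
from collections import Counter

# Hypothetical sample data: six lines in Common Log Format, one of them broken.
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2012:13:55:36 +0000] "GET /books/101 HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2012:13:56:02 +0000] "GET /books/101/reviews HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Oct/2012:13:57:40 +0000] "GET /books/102 HTTP/1.1" 200 2210',
    '198.51.100.7 - - [10/Oct/2012:14:01:12 +0000] "GET /books/101 HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2012:14:03:45 +0000] "POST /checkout HTTP/1.1" 200 512',
    'corrupted line that does not match the pattern',
]

# Hypothetical URL metadata used to enrich each entry.
URL_CATEGORY = {"/books": "browse", "/checkout": "transaction"}

LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def categorize(path):
    """Look up the URL category by path prefix; unknown paths become 'other'."""
    for prefix, category in URL_CATEGORY.items():
        if path.startswith(prefix):
            return category
    return "other"


def summarize(lines):
    """Count log entries and transactions per visitor, plus unparseable lines."""
    events = Counter()
    transactions = Counter()
    skipped = 0
    for line in lines:
        record = parse(line)
        if record is None:
            skipped += 1
            continue
        events[record["host"]] += 1
        if categorize(record["path"]) == "transaction":
            transactions[record["host"]] += 1
    return events, transactions, skipped


if __name__ == "__main__":
    events, transactions, skipped = summarize(LOG_LINES)
    for host, count in events.items():
        print(host, "entries:", count, "transactions:", transactions[host])
    print("skipped lines:", skipped)
```

In this sample one visitor produces several entries and no transaction, which is the sub-transactional pattern the notes describe. At real volumes the same parse, enrich and count steps would run as a distributed job instead of a single loop.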
7
Click Stream Analytics
From buying patterns to revenue
Business Challenge
* Monetize buying patterns hidden in billions of data points
* Quickly analyze multi-channel click stream data
Pentaho Benefits
* Reduced ETL time to analyze blended data from Hadoop, HBase & data warehouse
* Use of big data analytics to grow revenue from targeted campaigns
8
Device Data Analytics
Big Data for Fortune 100 Enterprise Storage provider
Business Challenge
* Affordably scale machine data from storage devices for customer support app
* Predict device failure
* Enhance product performance
Pentaho Benefits
* Easy to use ETL & analysis for Hadoop, HBase, & Oracle data sources
* 15x cost improvement
* Stronger performance against customer SLAs
9
Innovative Organizations Use Pentaho to Unlock Value from Big Data Stores
* Healthcare: Embedded Pentaho to better patient care & compliance through analysis of unstructured digital pen data stored in CouchDB
* Online Retailer: Understanding the buying patterns of 5 million users from click stream data stored in Hadoop & HBase
* Gaming: Better monetization of premium game features through analyzing large volumes of player data stored in MongoDB & Infobright
* Social Commerce: Better campaign performance through monitoring social media, page clicks and marketing data stored in HP Vertica
* Travel & Entertainment: Helping thousands of travel partners like expedia.co.uk and thomascook.fr improve promotional targeting using HBase and Hadoop
* Mobile & Digital Media: Embedded Pentaho to measure massive volumes of mobile and event data generated from mobile devices stored in MongoDB
TAKE-AWAYS: Pentaho has many big data customers across a range of industries and big data platforms.
10
Pentaho Embedded Analytics
New Revenue Stream in Eight Weeks
Business Challenge
* Gain new revenue source from add-on module with reporting, analysis & dashboards
* Get to market fast to differentiate
Pentaho Benefits
* Easy to embed & brand
* Broad capabilities result in new revenue stream
* Increased functionality & compelling visualizations
11
Embedded Analytics: Pentaho Uniquely Positioned to Win
Why We Win in Embedded:
* Architectural ‘sweet spot’ for Pentaho platform
* Flexible pricing, adaptable to fit partner pricing
* Open source and innovation
* Fastest time-to-market for embedded analytics
Continued Leadership:
* Cloud & multi-tenancy ease-of-use
* Simplified REST services for ISVs
* BI Platform SDK enhancements: deep solution examples, tutorials and training
* Continued focus on standards and extensibility
[Figure labels: Dashboard Designer, Dashboard Framework]
12
Big Data Technologies: BI Strengths and Weaknesses
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866)
13
The Current Solutions
Current database solutions are designed for structured data.
* Optimized to answer known questions quickly
* Schemas dictate form/context
* Difficult to adapt to new data types and new questions
* Expensive at petabyte scale
[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data]
14
Main Big Data Technologies
Hadoop
* Low cost, reliable scale-out architecture
* Distributed computing
* Proven success in Fortune 500 companies
* Exploding interest
NoSQL Databases
* Huge horizontal scaling and high availability
* Highly optimized for retrieval and appending
* Types: document stores, key-value stores, graph databases
Analytic RDBMS
* Optimized for bulk-load and fast aggregate query workloads
* Types: column-oriented, MPP, in-memory
TAKE-AWAYS: Pentaho provides complete integrated DI+BI for every leading big data platform.
15
Hadoop Core Components
Hadoop Distributed File System (HDFS)
* Massive redundant storage across a commodity cluster
MapReduce
* Map: distribute a computational problem across a cluster
* Reduce: collect the answers to all the sub-problems and combine them
Many distros available
Big Data solutions are not databases. They don’t provide the capabilities that BI toolsets expect of a database.
Hadoop also has high latency: even the smallest possible query takes much longer to execute than it would on a database.
Hadoop is optimized for executing very intensive data processing tasks on very large amounts of data. It is not optimized for quick queries. Some Hadoop experts recommend configuring the workloads so that Hadoop jobs take an hour or more. This conflicts with OLAP performance criteria of 5-10 seconds per query.
There are database implementations within the Hadoop world, such as Hive and HBase.
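The map and reduce steps described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for the example, and a real cluster would run the map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list of records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one input split handled by a separate map task.
    splits = ["big data beyond the hype", "big data for bi"]
    print(run_job(splits))
```

Even this toy job has to map, group and reduce before it can return anything. On a cluster each of those stages also carries scheduling and startup overhead, which is why the notes contrast Hadoop job times with interactive OLAP query times.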
16
Major Hadoop Utilities
* Apache Pig: high-level language for expressing data analysis programs
* Apache Hive: SQL-like language and metadata repository
* Apache HBase: the Hadoop database; random, real-time read/write access
* Hue: browser-based desktop interface for interacting with Hadoop
* Apache ZooKeeper: highly reliable distributed coordination service
* Oozie: server-based workflow engine for Hadoop activities
* Flume: distributed service for collecting and aggregating log and event data
* Sqoop: integrating Hadoop with RDBMS
* Apache Whirr: library for running Hadoop in the cloud
17
Hadoop & Databases
18
Big Data Platform Challenges
“The working conditions can be shocking” (ETL Developer)
Unfortunately for developers who are used to working with data transformation tools, productivity within the Hadoop environment is not what they are used to.
19
Challenges
* Somewhat immature
* Lack of tooling
* Steep technical learning curve
* Hiring qualified people
* Availability of enterprise-ready products and tools
* High latency (Hadoop)
* Running inside the cluster
20
Ingestion / Manipulation / Integration
Challenges: Scheduling, Modeling, Ingestion / Manipulation / Integration
Would you rather do this? … or this?
TAKE-AWAYS: The better choice is obviously visual development.
21
Investigating BI & Big Data Solutions
22
Questions to Ask
Business Drivers
* Mandate to reduce EDW costs?
* Clear use case that you need to solve?
* Do you have access to the technical skill set?
Technical
* Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?
* Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?
* Are you OK waiting minutes or even hours to access your big data?
* Are you OK using a spreadsheet-like interface to access and analyze your data?
* Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?
* Do you need to enrich your big data with data from outside of the big data platform?
* Is the big data you want to analyze bigger than the amount of memory you have available?
23
Demo
24
Complete Big Data Analytics & Visual Data Management
Pentaho Big Data Analytics
* Data ingestion, manipulation, integration
* Enterprise & ad hoc reporting
* Data discovery, visualization
* Predictive analytics
Data sources: Hadoop, NoSQL, analytic databases, relational
25
Open Discussion
26
Thank You
Join the conversation. You can find us on:
* blog.pentaho.com
* Facebook.com/Pentaho
* @Pentaho
* Pentaho Business Analytics