Pentaho Mission
The Future of Analytics: Big Data Exploration without Boundaries
- Modern, unified data integration and business analytics platform
- Native integration into big data ecosystem
- Embeddable, cloud-ready analytics
Fast and Broad Innovation
- Open source development model
Critical mass achieved
- Over 1,000 commercial customers
- Over 10,000 production deployments

Ian Fyfe
Big Data Solutions Engineering, Pentaho
Ian brings over 20 years of experience in the business analytics software market with roles spanning consulting services, pre-sales engineering, product management and product marketing. Ian started his career by co-founding a business intelligence startup and has worked at Business Objects, Informix, Epiphany, PeopleSoft and Jaspersoft.

The Value of Big Data for our Customers
Big opportunities
Drive incremental revenue
- Predict customer behavior across all channels
- Understand and monetize customer behavior
Improve operational effectiveness
- Machines/sensors: predict failures, network attacks
- Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
- Integrate new data sources without increased database cost
- Provide online access to ‘dark data’

Click Stream Analytics
From buying patterns to revenue
Business Challenge
- Monetize buying patterns hidden in billions of data points
- Quickly analyze multi-channel click stream data
Pentaho Benefits
- Reduced ETL time to analyze blended data from Hadoop, HBase & data warehouse
- Use of big data analytics to grow revenue from targeted campaigns

Device Data Analytics
Big Data for Fortune 100 Enterprise Storage provider
Business Challenge
- Affordably scale machine data from storage devices for customer support app
- Predict device failure
- Enhance product performance
Pentaho Benefits
- Easy-to-use ETL & analysis for Hadoop, HBase & Oracle data sources
- 15x cost improvement
- Stronger performance against customer SLAs

Innovative Organizations Use Pentaho to Unlock Value from Big Data Stores
- Healthcare: Embedded Pentaho to improve patient care & compliance through analysis of unstructured digital pen data stored in CouchDB
- Online Retailer: Understanding the buying patterns of 5 million users from click stream data stored in Hadoop & HBase
- Gaming: Better monetization of premium game features through analyzing large volumes of player data stored in MongoDB & Infobright
- Social Commerce: Better campaign performance through monitoring social media, page clicks and marketing data stored in HP Vertica
- Travel & Entertainment: Helping thousands of travel partners like expedia.co.uk and thomascook.fr improve promotional targeting using HBase and Hadoop
- Mobile & Digital Media: Embedded Pentaho to measure massive volumes of mobile and event data generated from mobile devices stored in MongoDB
TAKE-AWAYS
Pentaho has many big data customers across a range of industries and big data platforms.

Pentaho Embedded Analytics
New Revenue Stream in Eight Weeks
Business Challenge
- Gain new revenue source from add-on module with reporting, analysis & dashboards
- Get to market fast to differentiate
Pentaho Benefits
- Easy to embed & brand
- Broad capabilities result in new revenue stream
- Increased functionality & compelling visualizations

Embedded Analytics
Pentaho Uniquely Positioned to Win
[Figure labels: Dashboard Designer, Dashboard Framework]
Why We Win in Embedded:
- Architectural ‘sweet spot’ for Pentaho platform
- Flexible pricing, adaptable to fit partner pricing
- Open source and innovation
- Fastest time-to-market for embedded analytics
Continued Leadership:
- Cloud & multi-tenancy ease-of-use
- Simplified REST services for ISVs
- BI Platform SDK enhancements: deep solution examples, tutorials and training
- Continued focus on standards and extensibility

The Current Solutions
Current database solutions are designed for structured data.
- Optimized to answer known questions quickly
- Schemas dictate form/context
- Difficult to adapt to new data types and new questions
- Expensive at petabyte scale
[Chart: Gigabytes of data created (in billions), 2005 to 2015; series: structured data, unstructured data; axis ticks at 5,000 and 10,000; annotation: 10%]

Main Big Data Technologies
Hadoop
- Low cost, reliable scale-out architecture
- Distributed computing
- Proven success in Fortune 500 companies
- Exploding interest
NoSQL Databases
- Huge horizontal scaling and high availability
- Highly optimized for retrieval and appending
- Types: document stores, key-value stores, graph databases
Analytic RDBMS
- Optimized for bulk-load and fast aggregate query workloads
- Types: column-oriented, MPP, in-memory
TAKE-AWAYS
Pentaho provides complete integrated DI+BI for every leading big data platform.

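As a rough illustration of why a column-oriented layout suits the "fast aggregate query workloads" mentioned for analytic databases, here is a minimal Python sketch. It is not how any particular product is implemented. The sample records, the field names, and the helper names (`to_columns`, `total_row_store`, `total_column_store`) are all hypothetical, and the sketch assumes every record has the same fields.

```python
from typing import Dict, List

Row = Dict[str, object]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per field (column layout).

    Assumes every row carries the same fields.
    """
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows: List[Row], field: str) -> float:
    """Aggregate over a row layout: every full record is visited."""
    return sum(row[field] for row in rows)


def total_column_store(columns: Dict[str, list], field: str) -> float:
    """Aggregate over a column layout: only the one needed column is read."""
    return sum(columns[field])


# Hypothetical sample data, for illustration only.
sales = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "west", "product": "b", "amount": 20},
    {"region": "east", "product": "c", "amount": 30},
]
sales_columns = to_columns(sales)
```

Both functions return the same total. The difference is how much data each one has to touch: the column version reads a single contiguous list, which is the property column stores exploit for aggregates over very wide tables.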
Major Hadoop Utilities
- Apache Pig: High-level language for expressing data analysis programs
- Apache Hive: SQL-like language and metadata repository
- Apache HBase: The Hadoop database. Random, real-time read/write access
- Hue: Browser-based desktop interface for interacting with Hadoop
- Apache ZooKeeper: Highly reliable distributed coordination service
- Oozie: Server-based workflow engine for Hadoop activities
- Flume: Distributed service for collecting and aggregating log and event data
- Sqoop: Integrating Hadoop with RDBMS
- Apache Whirr: Library for running Hadoop in the cloud

Big Data Platform Challenges
“The working conditions can be shocking”
Unfortunately for developers who are used to working with data transformation tools, the productivity within the Hadoop environment is not what they are used to.
ETL Developer

Challenges
- Somewhat immature
- Lack of tooling
- Steep technical learning curve
- Hiring qualified people
- Availability of enterprise-ready products and tools
- High latency (Hadoop)
- Running inside the cluster

Ingestion / Manipulation / Integration Challenges
- Scheduling
- Modeling
- Ingestion / Manipulation / Integration
Would you rather do this? … or this?
TAKE-AWAYS
The better choice is obviously visual development

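The images this slide compared are not preserved in the text. As a rough idea of what the non-visual, hand-coded alternative to visual development can look like, here is a minimal sketch of a map/reduce-style page-click count in plain Python. It runs locally and does not use Hadoop itself. The tab-separated record layout (`user_id`, `page`, `timestamp`) and all function names are assumptions made for illustration, not something taken from the deck.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_clicks(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (page, 1) for each well-formed click record.

    Assumed (hypothetical) record layout: user_id <TAB> page <TAB> timestamp.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the job
        _user_id, page, _timestamp = fields
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each page."""
    return {page: sum(values) for page, values in grouped.items()}


def count_clicks(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    return reduce_counts(shuffle(map_clicks(lines)))
```

Even this small job requires explicit parsing, error handling and grouping logic. A real cluster job would add packaging, scheduling and deployment on top.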
Questions to Ask
Business Drivers
- Mandate to reduce EDW costs?
- Clear use case that you need to solve?
- Do you have access to the technical skill set?
Technical
- Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?
- Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?
- Are you OK waiting minutes or even hours to access your big data?
- Are you OK using a spreadsheet-like interface to access and analyze your data?
- Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?
- Do you need to enrich your big data with data from outside of the big data platform?
- Is the big data you want to analyze bigger than the amount of memory you have available?

Complete Big Data Analytics & Visual Data Management
Pentaho Big Data Analytics
- Data Ingestion, Manipulation, Integration
- Enterprise & Ad Hoc Reporting
- Data Discovery, Visualization
- Predictive Analytics
Hadoop | NoSQL | Analytic Databases | Relational