BI for Big Data Beyond the Hype.

Pentaho Mission
The Future of Analytics: Big Data Exploration without Boundaries
  Modern, unified data integration and business analytics platform
  Native integration into big data ecosystem
  Embeddable, cloud-ready analytics
Fast and Broad Innovation
  Open source development model
  Critical mass achieved
    Over 1,000 commercial customers
    Over 10,000 production deployments

Ian Fyfe, Big Data Solutions Engineering, Pentaho
Ian brings over 20 years of experience in the business analytics software market with roles spanning consulting services, pre-sales engineering, product management and product marketing. Ian started his career by co-founding a business intelligence startup and has worked at Business Objects, Informix, Epiphany, PeopleSoft and Jaspersoft.

Common Use Cases

The Value of Big Data for our Customers: Big opportunities
Drive incremental revenue
  Predict customer behavior across all channels
  Understand and monetize customer behavior
Improve operational effectiveness
  Machines/sensors: predict failures, network attacks
  Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
  Integrate new data sources without increased database cost
  Provide online access to ‘dark data’

Example Use Cases Today
Transactional
  Fraud detection
  Financial services / stock markets
Sub-Transactional
  Weblogs
  Social/online media
  Telecoms events
Non-Transactional
  Web pages, blogs etc.
  Documents
  Physical events
  Application events
  Machine events

* Not many companies have transactional data that qualifies as Big Data. Credit card companies and financial services companies are about it.
* With stock market data we are talking about every stock trade and the bid and ask prices between the transactions, for every stock on multiple markets over a significant time period. For many other companies the Big Data is sub-transactional: it is the events that lead up to transactions.
* Weblogs are semi-structured or badly structured. Consider the number of weblog entries created as you look for a book online: researching 5-10 books, reading reviews and comments. You might generate 1000 entries and may or may not buy a book, so there are potentially lots of entries for no transaction. We also want to enrich this data with metadata about the URLs and information about the location of the user.
* In an online game or world every interaction between participants and the system, and between each other, is logged. An individual participant might generate more than 1 million events for their 1 monthly transaction.
* A single phone call or text message generates many events within a telecoms company.

US and Worldwide: +1 (866) 660-7555 | © 2010, Pentaho. All Rights Reserved. www.pentaho.com
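The gap between sub-transactional events and actual transactions described in the notes above can be illustrated with a small sketch. The log layout, field names and sample values here are hypothetical and chosen only for illustration; real weblogs would first need parsing.

```python
from collections import Counter

# Hypothetical weblog records: (user_id, url, is_purchase).
# The layout and values are illustrative assumptions, not a real log format.
SAMPLE_LOG = [
    ("u1", "/book/101", False),
    ("u1", "/book/101/reviews", False),
    ("u1", "/book/102", False),
    ("u1", "/checkout", True),
    ("u2", "/book/101", False),
    ("u2", "/book/103", False),
]

# Hypothetical URL metadata used to enrich the raw entries.
URL_CATEGORIES = {
    "/book/101": "product",
    "/book/102": "product",
    "/book/103": "product",
    "/book/101/reviews": "review",
    "/checkout": "purchase",
}


def events_per_transaction(log):
    """Return {user_id: (event_count, transaction_count)}."""
    events = Counter()
    transactions = Counter()
    for user_id, _url, is_purchase in log:
        events[user_id] += 1
        if is_purchase:
            transactions[user_id] += 1
    return {user: (events[user], transactions[user]) for user in events}


def enrich(log, url_categories):
    """Attach a category to each entry; URLs without metadata get 'unknown'."""
    return [
        (user_id, url, url_categories.get(url, "unknown"), is_purchase)
        for user_id, url, is_purchase in log
    ]


if __name__ == "__main__":
    for user, (n_events, n_tx) in events_per_transaction(SAMPLE_LOG).items():
        print(f"{user}: {n_events} events, {n_tx} transactions")
```

In this toy data one user generates several entries for a single purchase and another generates entries with no purchase at all, which is the pattern the notes describe at much larger scale.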

Click Stream Analytics: From buying patterns to revenue
Business Challenge
  Monetize buying patterns hidden in billions of data points
  Quickly analyze multi-channel click stream data
Pentaho Benefits
  Reduced ETL time to analyze blended data from Hadoop, HBase & data warehouse
  Use of big data analytics to grow revenue from targeted campaigns

Device Data Analytics: Big Data for a Fortune 100 Enterprise Storage Provider
Business Challenge
  Affordably scale machine data from storage devices for customer support app
  Predict device failure
  Enhance product performance
Pentaho Benefits
  Easy to use ETL & analysis for Hadoop, HBase, & Oracle data sources
  15x cost improvement
  Stronger performance against customer SLAs

Innovative Organizations Use Pentaho to Unlock Value from Big Data Stores
Healthcare: Embedded Pentaho to better patient care & compliance through analysis of unstructured digital pen data stored in CouchDB
Online Retailer: Understanding the buying patterns of 5 million users from click stream data stored in Hadoop & HBase
Gaming: Better monetization of premium game features through analyzing large volumes of player data stored in MongoDB & Infobright
Social Commerce: Better campaign performance through monitoring social media, page clicks and email marketing data stored in HP Vertica
Travel & Entertainment: Helping thousands of travel partners like expedia.co.uk and thomascook.fr improve promotional targeting using HBase and Hadoop
Mobile & Digital Media: Embedded Pentaho to measure massive volumes of mobile and event data generated from mobile devices stored in MongoDB
TAKE-AWAYS: Pentaho has many big data customers across a range of industries and big data platforms.

Pentaho Embedded Analytics: New Revenue Stream in Eight Weeks
Business Challenge
  Gain new revenue source from add-on module with reporting, analysis & dashboards
  Get to market fast to differentiate
Pentaho Benefits
  Easy to embed & brand
  Broad capabilities result in new revenue stream
  Increased functionality & compelling visualizations

Embedded Analytics: Pentaho Uniquely Positioned to Win
Why We Win in Embedded:
  Architectural ‘sweet spot’ for Pentaho platform
  Flexible pricing, adaptable to fit partner pricing
  Open source and innovation
  Fastest time-to-market for embedded analytics
Continued Leadership:
  Cloud & multi-tenancy ease-of-use
  Simplified REST services for ISVs
  BI Platform SDK enhancements: deep solution examples, tutorials and training
  Continued focus on standards and extensibility
[Diagram labels: Dashboard Designer, Dashboard Framework]

Big Data Technologies: BI Strengths and Weaknesses
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

The Current Solutions
Current database solutions are designed for structured data.
  Optimized to answer known questions quickly
  Schemas dictate form/context
  Difficult to adapt to new data types and new questions
  Expensive at petabyte scale
[Chart: gigabytes of data created (in billions), 2005 to 2015, structured data vs. unstructured data]

Main Big Data Technologies
Hadoop
  Low cost, reliable scale-out architecture
  Distributed computing
  Proven success in Fortune 500 companies
  Exploding interest
NoSQL Databases
  Huge horizontal scaling and high availability
  Highly optimized for retrieval and appending
  Types: document stores, key-value stores, graph databases
Analytic RDBMS
  Optimized for bulk-load and fast aggregate query workloads
  Types: column-oriented, MPP, in-memory
TAKE-AWAYS: Pentaho provides complete integrated DI+BI for every leading big data platform.
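Why a column-oriented layout suits aggregate queries can be shown with a toy sketch. This is an in-memory illustration under simplifying assumptions, not how any particular analytic database stores data; the table, field names and values are hypothetical.

```python
# Hypothetical order table in row-oriented form: one dict per record.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 150.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows, field):
    """Aggregate over rows: every full record is visited to read one field."""
    return sum(row[field] for row in rows)


def total_column_store(columns, field):
    """Aggregate over columns: only the one needed column is read."""
    return sum(columns[field])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_row_store(ROWS, "amount"))
    print(total_column_store(columns, "amount"))
```

Both functions return the same total. The difference is in how much data has to be touched: the column version reads a single contiguous list, which is the property that aggregate-heavy workloads benefit from.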

Hadoop Core Components
Hadoop Distributed File System (HDFS)
  Massive redundant storage across a commodity cluster
MapReduce
  Map: distribute a computational problem across a cluster
  Reduce: collect the answers to all the sub-problems and combine them
Many distros available

Big Data solutions are not databases. They don’t provide the capabilities that BI toolsets expect of a database. Hadoop also has high latency: even the smallest possible query takes much longer to execute than it would on a database. Hadoop is optimized for executing very intensive data processing tasks on very large amounts of data, not for quick queries. Some Hadoop experts recommend configuring the workloads so that Hadoop jobs take an hour or more. This conflicts with OLAP performance criteria of 5-10 seconds per query. There are database implementations within the Hadoop world, such as Hive and HBase.
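The map and reduce steps named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, using word counting as an assumed example task; it does not use Hadoop or any of its APIs, and a real job would run the map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = []
    for chunk in chunks:  # each chunk would be handled by a separate mapper
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big hype"]))
```

The sketch also hints at where the latency comes from: every job passes through all three stages over the full input, which is efficient for heavy batch work but not for a quick lookup.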

Major Hadoop Utilities
Apache Pig: High-level language for expressing data analysis programs
Apache Hive: SQL-like language and metadata repository
Apache HBase: The Hadoop database. Random, real-time read/write access
Hue: Browser-based desktop interface for interacting with Hadoop
Apache Zookeeper: Highly reliable distributed coordination service
Oozie: Server-based workflow engine for Hadoop activities
Flume: Distributed service for collecting and aggregating log and event data
Sqoop: Integrating Hadoop with RDBMS
Apache Whirr: Library for running Hadoop in the cloud

Hadoop & Databases

Big Data Platform Challenges
“The working conditions can be shocking” (ETL Developer)
Unfortunately for developers who are used to working with data transformation tools, the productivity within the Hadoop environment is not what they are used to.

Challenges
  Somewhat immature
  Lack of tooling
  Steep technical learning curve
  Hiring qualified people
  Availability of enterprise-ready products and tools
  High latency (Hadoop)
  Running inside the cluster

Ingestion / Manipulation / Integration
Challenges: Scheduling, Modeling, Ingestion / Manipulation / Integration
Would you rather do this … or this?
TAKE-AWAYS: The better choice is obviously visual development.

Investigating BI & Big Data Solutions

Questions to Ask
Business Drivers
  Mandate to reduce EDW costs?
  Clear use case that you need to solve?
  Do you have access to technical skill set?
Technical
  Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?
  Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?
  Are you ok waiting minutes or even hours to access your big data?
  Are you ok using a spreadsheet-like interface to access and analyze your data?
  Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?
  Do you need to enrich your big data with data from outside of the big data platform?
  Is the big data you want to analyze bigger than the amount of memory you have available?
http://blog.pentaho.com/tag/ian-fyfe/
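The last technical question matters because a tool that loads everything into memory stops working once the data outgrows it. A minimal sketch of the alternative, assuming CSV input with hypothetical `region` and `amount` columns, is to aggregate while streaming so that only the running totals are held in memory:

```python
import csv
import io
from collections import defaultdict


def streaming_totals(lines, key_field, value_field):
    """Sum value_field per key_field while reading one row at a time.

    `lines` can be any iterable of text lines, such as an open file, so
    the full data set never has to fit in memory; only the totals do.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


if __name__ == "__main__":
    # Small in-memory stand-in for a file far larger than RAM (hypothetical data).
    sample = io.StringIO("region,amount\nEU,10\nUS,5\nEU,2.5\n")
    print(streaming_totals(sample, "region", "amount"))
```

This only works for aggregations whose intermediate state stays small. Queries that need all rows at once, such as a global sort, call for a platform that can spill to disk or distribute the work.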

Demo

Complete Big Data Analytics & Visual Data Management
Data Ingestion, Manipulation, Integration
Enterprise & Ad Hoc Reporting, Data Discovery, Visualization, Predictive Analytics
Pentaho Big Data Analytics
Hadoop, NoSQL, Analytic Databases, Relational

Open Discussion

Thank You
Join the conversation. You can find us on: blog.pentaho.com, Facebook.com/Pentaho, @Pentaho
Pentaho Business Analytics