We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byKaelyn Levit
Modified about 1 year ago
© Hortonworks Inc Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1
© Hortonworks Inc Big Data: Organizational Game Changer Page 2 Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM WEB BIG DATA Offer details Support Contacts Customer Touches Segmentation Web logs Offer history A/B testing Dynamic Pricing Affiliate Networks Search Marketing Behavioral Targeting Dynamic Funnels User Generated Content Mobile Web SMS/MMS Sentiment External Demographics HD Video, Audio, Images Speech to Text Product/Service Logs Social Interactions & Feeds Business Data Feeds User Click Stream Sensors / RFID / Devices Spatial & GPS Coordinates Increasing Data Variety and Complexity Transactions + Interactions + Observations = BIG DATA
© Hortonworks Inc What is a Data Driven Business? DEFINITION Better use of available data in the decision making process RULE Key metrics derived from data should be tied to goals PROVEN RESULTS Firms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology* Page 3 * “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011)
© Hortonworks Inc Page 4
© Hortonworks Inc Page 5
© Hortonworks Inc Page 6
© Hortonworks Inc optimize Big Data: Optimize Outcomes at Scale SportsChampionships IntelligenceDetection FinanceAlgorithms AdvertisingPerformance FraudPrevention Retail / WholesaleInventory turns ManufacturingSupply chains HealthcarePatient outcomes EducationLearning outcomes GovernmentCitizen services Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation. Page 7
© Hortonworks Inc Dashboards, Reports, Visualization, … CRM, ERP Web, Mobile Point of sale Enterprise Big Data Flows Page 8 Big Data Platform Big Data Platform Business Transactions & Interactions Business Intelligence & Analytics Unstructured Data Log files DB data Exhaust Data Social Media Sensors, devices Classic Data Integration & ETL Capture Big Data Collect data from all sources structured &unstructured Process Transform, refine, aggregate, analyze, report Distribute Results Interoperate and share data with applications/analytics Feedback Use operational data w/in the big data platform 1234
© Hortonworks Inc Univ. California - Irvine Medical Center Optimizing patient outcomes while lowering costs Current system, Epic holds 22 years of patient data, across admissions and clinical information –Significant cost to maintain and run system –Difficult to access, not-integrated, stand alone Apache Hadoop sunsets legacy system and augments new electronic medical records (EMR) –Migrate legacy data to Apache Hadoop, replaced existing ETL and temporary databases and captures complete info –Eliminate maint of legacy system for $500K annual savings –Integrate data with EMR and clinical front-end for Better patient service and improved research Page 9 UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year More than 400 specialty and primary care physicians Opened in bed medical facility
© Hortonworks Inc Data Platform for Big Data Data Platform Requirements for Big Data Page 10 Capture Collect data from all sources - structured and unstructured data all speeds batch, async, streaming, real-time Process Transform, refine, aggregate, analyze, report Exchange Deliver data with enterprise data systems Share data with analytic applications and processing Operate Provision, monitor, diagnose, manage at scale Reliability, availability, affordability, scalability, interoperability Operating Systems Virtual Platforms Cloud Platforms Big Data Appliances Across all deployment models
© Hortonworks Inc Big Data Transactions, Interactions, Observations Apache Hadoop & Big Data Use Cases Page 11 RefineExploreEnrich Business Case
© Hortonworks Inc Enterprise Data Warehouse Operational Data Refinery Hadoop as platform for ETL modernization Capture Capture new unstructured data along with log files all alongside existing sources Retain inputs in raw form for audit and continuity purposes Process Parse the data & cleanse Apply structure and definition Join datasets together across disparate data sources Exchange Push to existing data warehouse for downstream consumption Feeds operational reporting and online systems Page 12 UnstructuredLog files Refinery Structure and join Capture and archive Parse & Cleanse Refine Explore Enric h DB data Upload
© Hortonworks Inc “Big Bank” Key Benefits Capture and archive –Retain 3 – 5 years instead of 2 – 10 days –Lower costs –Improved compliance Transform, change, refine –Turn upstream raw dumps into small list of “new, update, delete” customer records –Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible) –Turn raw weblogs into sessions and behaviors Upload –Insert into Teradata for downstream “as-is” reporting and tools –Insert into new exploration platform for scientists to play with Page 13
© Hortonworks Inc Visualization ToolsEDW / Datamart Explore Big Data Exploration & Visualization Hadoop as agile, ad-hoc data mart Capture Capture multi-structured data and retain inputs in raw form for iterative analysis Process Parse the data into queryable format Explore & analyze using Hive, Pig, Mahout and other tools to discover value Label data and type information for compatibility and later discovery Pre-compute stats, groupings, patterns in data to accelerate analysis Exchange Use visualization tools to facilitate exploration and find key insights Optionally move actionable insights into EDW or datamart Page 14 Capture and archive uploadJDBC / ODBC Structure and join Categorize into tables UnstructuredLog files DB data Refine Explore Enrich Optional
© Hortonworks Inc “Hardware Manufacturer” Key Benefits Capture and archive –Store 10M+ survey forms/year for > 3 years –Capture text, audio, and systems data in one platform Structure and join –Unlock freeform text and audio data –Un-anonymize customers Categorize into tables –Create HCatalog tables “customer”, “survey”, “freeform text” Upload, JDBC –Visualize natural satisfaction levels and groups –Tag customers as “happy” and report back to CRM database Page 15
© Hortonworks Inc Online Applications Enrich Application Enrichment Deliver Hadoop analysis to online apps Capture Capture data that was once too bulky and unmanageable Process Uncover aggregate characteristics across data Use Hive Pig and Map Reduce to identify patterns Filter useful data from mass streams (Pig) Micro or macro batch oriented schedules Exchange Push results to HBase or other NoSQL alternative for real time delivery Use patterns to deliver right content/offer to the right person at the right time Page 16 Derive/Filter Capture Parse NoSQL, HBase Low Latency Scheduled & near real time UnstructuredLog files DB data RefineExplore Enrich
© Hortonworks Inc “Clothing Retailer” Key Benefits Capture –Capture weblogs together with sales order history, customer master Derive useful information –Compute relationships between products over time –“people who buy shirts eventually need pants” –Score customer web behavior / sentiment –Connect product recommendations to customer sentiment Share –Load customer recommendations into HBase for rapid website service Page 17
© Hortonworks Inc Hadoop in Enterprise Data Architectures Page 18 EDW Existing Business Infrastructure ODS & Datamarts Applications & Spreadsheets Visualization & Intelligence Discovery Tools IDE & Dev Tools Low Latency/NoSQ L Web Applications Operations Custom Existing Templeton Sqoop WebHDFS Flume HCatalog Pig HBase Hive AmbariHAOozie ZooKeeper MapReduceHDFS Big Data Sources (transactions, observations, interactions) CRMERP Exhaust Data logsfiles financials Social Media New Tech Datameer Tableau Karmasphere Splunk
© Hortonworks Inc Batch & Scheduled Integration Near Real-Time Integration Existing Infrastructure HDFS Pig Hadoop Integration Options REST Hive HBase MapReduce HCatalog WebHDFS Databases & Warehouses Applications & Spreadsheets Visualization & Intelligence Flume Logs & Files Existing Infrastructure HDFS PigHiveHBase MapReduce HCatalog Databases & Warehouses Applications & Spreadsheets Visualization & Intelligence Logs & Files Data Integration (Talend, Informatica) ODBC/JDBC SQOOP
© Hortonworks Inc Storage What Is the Size of Your Cluster? Open Source data management with scale-out storage & distributed processing Page 20 HDFS Distributed across “nodes” Natively redundant Name node tracks locations Compute Map Reduce Splits a task across processors “near” the data & assembles results Self-Healing, High Bandwidth Clustered Storage Key Characteristics Scalable –Efficiently store & process petabytes –Linear scale driven by additional processing and storage Reliable –Redundant storage –Failover across nodes and racks Flexible –Store all types of data in any format –Apply schema on analysis & sharing of data Economical –Use commodity hardware –Open source software guards against vendor lock-in
© Hortonworks Inc Hadoop Balance Innovation & Stability Page 21 time relative % customers The CHASM Customers want solutions & convenience Customers want technology & performance Innovators, technology enthusiasts Early adopters, visionaries Early majority, pragmatists Late majority, conservatives Laggards, Skeptics Source: Geoffrey Moore - Crossing the Chasm
© Hortonworks Inc Where Does It Fit into Your Business? VerticalRefineExploreEnrich Retail & Web Log Analysis Ad Optimization Cross Channel Analytics Social Network Analysis Event Analytics Dynamic Pricing Session & Content Optimization Recommendation Engines Retail Loyalty Program Optimization Brand and Sentiment Analysis Dynamic Pricing/Targeted Offer Intelligence Threat IdentificationPerson of Interest DiscoveryCross Jurisdiction Queries Finance Risk Modeling & Fraud Identification Trade Performance Analytics Surveillance and Fraud Detection Customer Risk Analysis Real-time upsell, cross sales marketing offers Energy Smart Grid: Production Optimization Grid Failure Prevention Smart Meters Individual Power Grid Manufacturing Supply Chain OptimizationCustomer Churn Analysis Dynamic Delivery Replacement parts Healthcare & Payer Electronic Medical Records (EMPI) Clinical Trials Analysis Supply Chain Optimizations Insurance Premium Determination Page 22
© Hortonworks Inc Simplify deployment to get started quickly and easily Monitor, manage any size cluster with familiar console and tools Only platform to include data integration services to interact with any data Metadata services opens the platform for integration with existing applications Dependable high availability architecture Tested at scale to future proof your cluster growth Hortonworks Data Platform Page 23 Reduce risks and cost of adoption Lower the total cost to administer and provision Integrate with your existing ecosystem
© Hortonworks Inc Next Steps? Expert role based training Course for admins, developers and operators Certification program Custom onsite options Page 24 Download Hortonworks Data Platform hortonworks.com/download 1 2 Use the getting started guide hortonworks.com/get-started 3 Learn more… get support Full lifecycle technical support across four service levels Delivered by Apache Hadoop Experts/Committers Forward-compatible Hortonworks Support hortonworks.com/traininghortonworks.com/support
© Hortonworks Inc Hortonworks Page 1. © Hortonworks Inc Big Data Changes the Game Megabytes Gigabytes Terabytes Petabytes Purchase detail.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Harnessing Big Data with Hadoop Dipti Sangani; Madhu Reddy DBI210.
What we know or see What’s actually there Wikipedia : In information technology, big data is a collection of data sets so large and complex that it.
IoT Scenario - Connected Cars / Devices Cloud gateways Queue Service Get Data Get Reference Data Business Logic Store Raw Data Store Reporting Data.
Unlock your Big Data with Analytics and BI on Office365 Brian Culver ● SharePoint Fest Seattle● BI102 ● August 18-20, 2015.
Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.
AZURE DISTRIBUTED DATA Storage, HDInsight Hadoop, Azure Data Lake.
Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM.
Big Data Analytics with Excel Peter Myers Bitwise Solutions.
Business Intelligence: The Next Big Thing (Really!) John Bair CTO, Ajilitee Sep 14, 2012 Presented to TDWI St. Louis Chapter.
© 2011 IBM Corporation Smarter Software for a Smarter Planet The Capabilities of IBM Software Borislav Borissov SWG Manager, IBM.
E-Business Systems CHAPTER 7 Lecture-9/ T. Nouf Almujally 1.
©2011 SAP AG. All rights reserved.1 Confidential SAP Sybase IQ– Big Data Opportunity Tele Cheat Sheet Topic, Audience, Industry Topic/ Product: SAP Sybase.
Tech Evangelist, Microsoft Responsible for Azure Evangelism in Denmark Economics and Statistics background (Aarhus University) Blog:
Hortonworks: Hadoop for the Enterprise ONLY 100 open source Apache Hadoop data platform % Founded in 2011 HADOOP 1 ST distribution to go public IPO Fall.
Hadoop in the Wild CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine Harfouche.
Course : Study of Digital Convergence. Name : Srijana Acharya. Student ID : Date : 11/28/2014. Big Data Analytics and the Telco : How Telcos.
Fraud Detection in Banking using Big Data By Madhu Malapaka For ISACA, Hyderabad Chapter Date: 14 th Dec 2014 Wilshire Software.
Powered by the Microsoft Azure Platform, Truck Tin Helps Your Sales Consultants Improve Efficiency, Information Sharing, Client Relations MICROSOFT AZURE.
1© 2015 IBM Corporation Unlocking the power of the API economy Client Briefing Nov.
+ Big Data IST210 Class Lecture. + Big Data Summary by EMC Corporation (http://www.emc.com) More videos that.
© 2012 Datameer, Inc. All rights reserved. Page 1 © 2012 Datameer, Inc. All rights reserved. Hadoop in Financial Services Adam Gugliciello, Solutions Engineer.
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
BIG DATA. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database.
1 Reviewing Data Warehouse Basics. Lessons 1.Reviewing Data Warehouse Basics 2.Defining the Business and Logical Models 3.Creating the Dimensional Model.
1 © Cloudera, Inc. All rights reserved. Alexander Bibighaus| Director of Engineering The Future of Data Management with Hadoop and the Enterprise Data.
Hadoop in the Wild CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.
Introduction to Hadoop and HDFS. Table of Contents Hadoop – Overview Hadoop Cluster HDFS.
1 Knowledge Portals and Knowledge Management Tools Lecture Eleven (Chapter 11, Notes; Chapter 13, Textbook)
Big Data for the SQL Eye Cindy Look, it’s SQL! SELECT score, fun FROM toDo WHERE type = 'they pay me for
© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware.
Techcello Provides SaaS Lifecycle Management Solution to “SaaS-ify” Your Application Efficiently on the Powerful Microsoft Azure Cloud Platform MICROSOFT.
12-1 Copyright © 2013 Pearson Canada Inc. Enhancing Decision Making Oleh : Kundang K Juman Enhancing Decision Making Oleh : Kundang K Juman CHAPTER TWELVE.
Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014.
Data Warehousing: Defined and Its Applications Pete Johnson April 2002.
BUSINESS INTELLIGENCE & ADVANCED ANALYTICS DISCOVER | PLAN | EXECUTE JANUARY 14, 2016.
An Information Architecture for Hadoop Mark Samson – Systems Engineer, Cloudera.
Accumulus Delivers Enterprise Class Subscription Billing and Automation Solutions for Gaming, Retail, and More on the Scalable Microsoft Azure Platform.
Increasing Manufacturing Uptime Is Made Easier with RtTech’s Industrial Facilities Application RtDuet, Powered by the Microsoft Azure Cloud MICROSOFT AZURE.
BI 202 Data in the Cloud Creating SharePoint 2013 BI Solutions using Azure 6/20/2014 SharePoint Fest NYC.
Data Warehousing at Acxiom Paul Montrose Data Warehousing at Acxiom Paul Montrose.
MICROSOFT AZURE ISV PROFILE: BUYING BUTLER LTD Our free concierge buying service makes complex purchases easy. Our first category is cars: We help consumers.
Dell Software Unified Communications Command Suite (UCCS) Provides Flexible, Cross-Platform Management, Reporting and Data Diagnostics MICROSOFT AZURE.
Optimize your Open Data 5 Best Practices for Designing Data-Driven Apps Glenn Hess Federal Sales Engineer Actuate, Inc.
Task Performance Group Provides Cutting-Edge E-Commerce B2B EDI Integration Using MegaXML SaaS Solution on Microsoft Azure Cloud Platform MICROSOFT AZURE.
Hadoop on Azure 101 What is the Big Deal? Dennis Mulder Solution Architect Microsoft Corporation.
McGraw-Hill/Irwin Copyright © 2008, The McGraw-Hill Companies, Inc. All rights reserved.
© 2017 SlidePlayer.com Inc. All rights reserved.