We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byKaelyn Levit
Modified over 2 years ago
© Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1
© Hortonworks Inc. 2012 Big Data: Organizational Game Changer Page 2 Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM WEB BIG DATA Offer details Support Contacts Customer Touches Segmentation Web logs Offer history A/B testing Dynamic Pricing Affiliate Networks Search Marketing Behavioral Targeting Dynamic Funnels User Generated Content Mobile Web SMS/MMS Sentiment External Demographics HD Video, Audio, Images Speech to Text Product/Service Logs Social Interactions & Feeds Business Data Feeds User Click Stream Sensors / RFID / Devices Spatial & GPS Coordinates Increasing Data Variety and Complexity Transactions + Interactions + Observations = BIG DATA
© Hortonworks Inc. 2012 What is a Data Driven Business? DEFINITION Better use of available data in the decision making process RULE Key metrics derived from data should be tied to goals PROVEN RESULTS Firms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology* Page 3 * “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011) 1110010100001010011101010100010010100100101001001000010010001001000001000100000100 01001001000100001011100001001000100010100100101111010100100010010010100101001001111 1001010010100011111010001001010000010010001010010111101010011001001010010001000111
© Hortonworks Inc. 2012 Page 4
© Hortonworks Inc. 2012 Page 5
© Hortonworks Inc. 2012 Page 6
© Hortonworks Inc. 2012 optimize Big Data: Optimize Outcomes at Scale SportsChampionships IntelligenceDetection FinanceAlgorithms AdvertisingPerformance FraudPrevention Retail / WholesaleInventory turns ManufacturingSupply chains HealthcarePatient outcomes EducationLearning outcomes GovernmentCitizen services Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation. Page 7
© Hortonworks Inc. 2012 Dashboards, Reports, Visualization, … CRM, ERP Web, Mobile Point of sale Enterprise Big Data Flows Page 8 Big Data Platform Big Data Platform Business Transactions & Interactions Business Intelligence & Analytics Unstructured Data Log files DB data Exhaust Data Social Media Sensors, devices Classic Data Integration & ETL Capture Big Data Collect data from all sources structured &unstructured Process Transform, refine, aggregate, analyze, report Distribute Results Interoperate and share data with applications/analytics Feedback Use operational data w/in the big data platform 1234
© Hortonworks Inc. 2012 Univ. California - Irvine Medical Center Optimizing patient outcomes while lowering costs Current system, Epic holds 22 years of patient data, across admissions and clinical information –Significant cost to maintain and run system –Difficult to access, not-integrated, stand alone Apache Hadoop sunsets legacy system and augments new electronic medical records (EMR) –Migrate legacy data to Apache Hadoop, replaced existing ETL and temporary databases and captures complete info –Eliminate maint of legacy system for $500K annual savings –Integrate data with EMR and clinical front-end for Better patient service and improved research Page 9 UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year More than 400 specialty and primary care physicians Opened in 1976 422-bed medical facility
© Hortonworks Inc. 2012 Data Platform for Big Data Data Platform Requirements for Big Data Page 10 Capture Collect data from all sources - structured and unstructured data all speeds batch, async, streaming, real-time Process Transform, refine, aggregate, analyze, report Exchange Deliver data with enterprise data systems Share data with analytic applications and processing Operate Provision, monitor, diagnose, manage at scale Reliability, availability, affordability, scalability, interoperability Operating Systems Virtual Platforms Cloud Platforms Big Data Appliances Across all deployment models
© Hortonworks Inc. 2012 Big Data Transactions, Interactions, Observations Apache Hadoop & Big Data Use Cases Page 11 RefineExploreEnrich Business Case
© Hortonworks Inc. 2012 Enterprise Data Warehouse Operational Data Refinery Hadoop as platform for ETL modernization Capture Capture new unstructured data along with log files all alongside existing sources Retain inputs in raw form for audit and continuity purposes Process Parse the data & cleanse Apply structure and definition Join datasets together across disparate data sources Exchange Push to existing data warehouse for downstream consumption Feeds operational reporting and online systems Page 12 UnstructuredLog files Refinery Structure and join Capture and archive Parse & Cleanse Refine Explore Enric h DB data Upload
© Hortonworks Inc. 2012 “Big Bank” Key Benefits Capture and archive –Retain 3 – 5 years instead of 2 – 10 days –Lower costs –Improved compliance Transform, change, refine –Turn upstream raw dumps into small list of “new, update, delete” customer records –Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible) –Turn raw weblogs into sessions and behaviors Upload –Insert into Teradata for downstream “as-is” reporting and tools –Insert into new exploration platform for scientists to play with Page 13
© Hortonworks Inc. 2012 Visualization ToolsEDW / Datamart Explore Big Data Exploration & Visualization Hadoop as agile, ad-hoc data mart Capture Capture multi-structured data and retain inputs in raw form for iterative analysis Process Parse the data into queryable format Explore & analyze using Hive, Pig, Mahout and other tools to discover value Label data and type information for compatibility and later discovery Pre-compute stats, groupings, patterns in data to accelerate analysis Exchange Use visualization tools to facilitate exploration and find key insights Optionally move actionable insights into EDW or datamart Page 14 Capture and archive uploadJDBC / ODBC Structure and join Categorize into tables UnstructuredLog files DB data Refine Explore Enrich Optional
© Hortonworks Inc. 2012 “Hardware Manufacturer” Key Benefits Capture and archive –Store 10M+ survey forms/year for > 3 years –Capture text, audio, and systems data in one platform Structure and join –Unlock freeform text and audio data –Un-anonymize customers Categorize into tables –Create HCatalog tables “customer”, “survey”, “freeform text” Upload, JDBC –Visualize natural satisfaction levels and groups –Tag customers as “happy” and report back to CRM database Page 15
© Hortonworks Inc. 2012 Online Applications Enrich Application Enrichment Deliver Hadoop analysis to online apps Capture Capture data that was once too bulky and unmanageable Process Uncover aggregate characteristics across data Use Hive Pig and Map Reduce to identify patterns Filter useful data from mass streams (Pig) Micro or macro batch oriented schedules Exchange Push results to HBase or other NoSQL alternative for real time delivery Use patterns to deliver right content/offer to the right person at the right time Page 16 Derive/Filter Capture Parse NoSQL, HBase Low Latency Scheduled & near real time UnstructuredLog files DB data RefineExplore Enrich
© Hortonworks Inc. 2012 “Clothing Retailer” Key Benefits Capture –Capture weblogs together with sales order history, customer master Derive useful information –Compute relationships between products over time –“people who buy shirts eventually need pants” –Score customer web behavior / sentiment –Connect product recommendations to customer sentiment Share –Load customer recommendations into HBase for rapid website service Page 17
© Hortonworks Inc. 2012 Hadoop in Enterprise Data Architectures Page 18 EDW Existing Business Infrastructure ODS & Datamarts Applications & Spreadsheets Visualization & Intelligence Discovery Tools IDE & Dev Tools Low Latency/NoSQ L Web Applications Operations Custom Existing Templeton Sqoop WebHDFS Flume HCatalog Pig HBase Hive AmbariHAOozie ZooKeeper MapReduceHDFS Big Data Sources (transactions, observations, interactions) CRMERP Exhaust Data logsfiles financials Social Media New Tech Datameer Tableau Karmasphere Splunk
© Hortonworks Inc. 2012 Batch & Scheduled Integration Near Real-Time Integration Existing Infrastructure HDFS Pig Hadoop Integration Options REST Hive HBase MapReduce HCatalog WebHDFS Databases & Warehouses Applications & Spreadsheets Visualization & Intelligence Flume Logs & Files Existing Infrastructure HDFS PigHiveHBase MapReduce HCatalog Databases & Warehouses Applications & Spreadsheets Visualization & Intelligence Logs & Files Data Integration (Talend, Informatica) ODBC/JDBC SQOOP
© Hortonworks Inc. 2012 Storage What Is the Size of Your Cluster? Open Source data management with scale-out storage & distributed processing Page 20 HDFS Distributed across “nodes” Natively redundant Name node tracks locations Compute Map Reduce Splits a task across processors “near” the data & assembles results Self-Healing, High Bandwidth Clustered Storage Key Characteristics Scalable –Efficiently store & process petabytes –Linear scale driven by additional processing and storage Reliable –Redundant storage –Failover across nodes and racks Flexible –Store all types of data in any format –Apply schema on analysis & sharing of data Economical –Use commodity hardware –Open source software guards against vendor lock-in
© Hortonworks Inc. 2012 Hadoop Balance Innovation & Stability Page 21 time relative % customers The CHASM Customers want solutions & convenience Customers want technology & performance Innovators, technology enthusiasts Early adopters, visionaries Early majority, pragmatists Late majority, conservatives Laggards, Skeptics Source: Geoffrey Moore - Crossing the Chasm www.hortonworks.com/moore
© Hortonworks Inc. 2012 Where Does It Fit into Your Business? VerticalRefineExploreEnrich Retail & Web Log Analysis Ad Optimization Cross Channel Analytics Social Network Analysis Event Analytics Dynamic Pricing Session & Content Optimization Recommendation Engines Retail Loyalty Program Optimization Brand and Sentiment Analysis Dynamic Pricing/Targeted Offer Intelligence Threat IdentificationPerson of Interest DiscoveryCross Jurisdiction Queries Finance Risk Modeling & Fraud Identification Trade Performance Analytics Surveillance and Fraud Detection Customer Risk Analysis Real-time upsell, cross sales marketing offers Energy Smart Grid: Production Optimization Grid Failure Prevention Smart Meters Individual Power Grid Manufacturing Supply Chain OptimizationCustomer Churn Analysis Dynamic Delivery Replacement parts Healthcare & Payer Electronic Medical Records (EMPI) Clinical Trials Analysis Supply Chain Optimizations Insurance Premium Determination Page 22
© Hortonworks Inc. 2012 1 1 Simplify deployment to get started quickly and easily Monitor, manage any size cluster with familiar console and tools Only platform to include data integration services to interact with any data Metadata services opens the platform for integration with existing applications Dependable high availability architecture Tested at scale to future proof your cluster growth Hortonworks Data Platform Page 23 Reduce risks and cost of adoption Lower the total cost to administer and provision Integrate with your existing ecosystem
© Hortonworks Inc. 2012 Next Steps? Expert role based training Course for admins, developers and operators Certification program Custom onsite options Page 24 Download Hortonworks Data Platform hortonworks.com/download 1 2 Use the getting started guide hortonworks.com/get-started 3 Learn more… get support Full lifecycle technical support across four service levels Delivered by Apache Hadoop Experts/Committers Forward-compatible Hortonworks Support hortonworks.com/traininghortonworks.com/support
© Hortonworks Inc Hortonworks Page 1. © Hortonworks Inc Big Data Changes the Game Megabytes Gigabytes Terabytes Petabytes Purchase detail.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Harnessing Big Data with Hadoop Dipti Sangani; Madhu Reddy DBI210.
What we know or see What’s actually there Wikipedia : In information technology, big data is a collection of data sets so large and complex that it.
Microsoft Ignite /28/2017 6:07 PM
Unlock your Big Data with Analytics and BI on Office365 Brian Culver ● SharePoint Fest Seattle● BI102 ● August 18-20, 2015.
Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.
AZURE DISTRIBUTED DATA Storage, HDInsight Hadoop, Azure Data Lake.
Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM.
Big Data Analytics with Excel Peter Myers Bitwise Solutions.
Business Intelligence: The Next Big Thing (Really!) John Bair CTO, Ajilitee Sep 14, 2012 Presented to TDWI St. Louis Chapter.
© 2011 IBM Corporation Smarter Software for a Smarter Planet The Capabilities of IBM Software Borislav Borissov SWG Manager, IBM.
Lecture-9/ T. Nouf Almujally
©2011 SAP AG. All rights reserved.1 Confidential SAP Sybase IQ– Big Data Opportunity Tele Cheat Sheet Topic, Audience, Industry Topic/ Product: SAP Sybase.
R and HDInsight in Microsoft Azure
Microsoft Partner since 2011
Hadoop in the Wild CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine Harfouche.
Course : Study of Digital Convergence. Name : Srijana Acharya. Student ID : Date : 11/28/2014. Big Data Analytics and the Telco : How Telcos.
© 2017 SlidePlayer.com Inc. All rights reserved.