Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM.

Presentation on theme: "Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM."— Presentation transcript:

1

2

3

4

5

6 Transactions + Interactions + Observations = BIG DATA
Data volume: Megabytes, Gigabytes, Terabytes, Petabytes
Tiers: ERP, CRM, WEB, BIG DATA
Increasing Data Variety and Complexity: Purchase detail, Purchase record, Payment record, Offer details, Support Contacts, Customer Touches, Segmentation, Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Audio, Images, Speech to Text, Product/Service Logs, Social Interactions & Feeds, Business Data Feeds, User Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates

7 APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM REPOSITORIES: RDBMS, EDW, MPP
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation
Source: IDC. 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020

8 OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM REPOSITORIES: RDBMS, EDW, MPP
Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
SOURCES (second listing): OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation

9 SCALE SCOPE New Analytic Apps New types of data LOB-driven

10 A Modern Data Architecture/Data Lake
SCALE, SCOPE: New Analytic Apps; New types of data; LOB-driven
RDBMS, MPP, EDW
Governance & Integration, Security, Operations, Data Access, Data Management
Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale

11

12 HDP 2.1: Hortonworks Data Platform
OPERATIONS: Provision, Manage & Monitor (Ambari (SCOM), Zookeeper); Scheduling (Oozie)
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, WebHDFS)
DATA ACCESS: Script (Pig); Search (Solr); SQL (Hive/Tez, HCatalog); NoSQL (HBase); Stream (Storm); Others (In-Memory Analytics, ISV engines); Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
Deployment Choice: Linux, Windows, On-Premise, Cloud
Hortonworks Data Platform (HDP): The Only Completely Open Distribution for Apache Hadoop. Fundamentally Versatile and Comprehensive enterprise capabilities. Wholly Integrated for deep ecosystem interoperability.

13 HDP certifies most recent & stable community innovation
Hortonworks Data Platform components: Solr, Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Falcon
Releases: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)
Categories: Governance & Integration, Security, Operations, Data Access, Data Management
Version labels as extracted from the chart: 2.2.0 1.1.2 0.11.0 0.12.0 2.4.0 0.12.1 0.13.0 0.94.6 0.96.1 0.98.0 0.9.1 0.7.0 0.8.0 0.9.0 4.7.2 1.4.3 1.4.4 1.3.1 1.4.0 1.2.5 1.4.4 1.5.1 3.3.2 4.0.0 3.4.5 0.4.0 4.0.0 Falcon 0.5.0

14 SOURCES; APPLICATIONS; OPERATIONAL TOOLS; DEV & DATA TOOLS; INFRASTRUCTURE; DATA SYSTEM; HDInsight; Azure; New! Power BI

15 Traditional Database vs. Hadoop Platform
SCALE (storage & processing): NoSQL, MPP Analytics, EDW

                Traditional Database             Hadoop Platform
schema          Required on write                Required on read
speed           Reads are fast                   Writes are fast
governance      Standards and structured         Loosely structured
processing      Limited, no data processing      Processing coupled with data
data types      Structured                       Multi and unstructured
best fit use    Interactive OLAP Analytics       Data Discovery
                Complex ACID Transactions        Processing unstructured data
                Operational Data Store           Massive Storage/Processing
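
The "required on write" versus "required on read" contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schema, the field names, and all three helper functions are hypothetical, and neither a real database nor Hadoop is involved.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"id": int, "state": str, "price": float}

def write_with_schema(store, record):
    """Schema on write: validate before storing, reject anything that does not fit."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    store.append(record)

def write_raw(store, line):
    """Schema on read: store the raw line untouched, so the write does no checking."""
    store.append(line)

def read_with_schema(store):
    """Schema on read: parse and coerce only when the data is queried."""
    for line in store:
        try:
            raw = json.loads(line)
            yield {field: ftype(raw[field]) for field, ftype in SCHEMA.items()}
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that do not fit this reader's schema

# Schema on write: the table only ever holds validated rows.
table = []
write_with_schema(table, {"id": 1, "state": "WA", "price": 9.5})

# Schema on read: anything can be stored; structure is imposed later.
lake = []
write_raw(lake, '{"id": "2", "state": "CA", "price": "4.25", "extra": true}')
write_raw(lake, "not json at all")
rows = list(read_with_schema(lake))
```

In this sketch the cost of interpretation moves from the writer to the reader, which is one way to read the slide's "reads are fast" versus "writes are fast" row.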

16 All offerings co-engineered by Hortonworks and Microsoft. Enjoy seamless interoperability across on-premises and cloud.

17

18 DATA ACCESS: Script (Pig); Search (Solr); SQL (Hive/Tez, HCatalog); NoSQL (HBase, Accumulo); Stream (Storm); Others (In-Memory Analytics, ISV engines); Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), 1 … N

19 1st Gen of Hadoop: Single Use System (Batch Apps)
HDFS (redundant, reliable storage)
MapReduce (cluster resource management & data processing)
2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
Redundant, Reliable Storage (HDFS)
Efficient Cluster Resource Management & Shared Services (YARN)
Flexible Data Processing: Hive, Pig, others…; Batch: MapReduce; Batch & Interactive: Tez; Online Data Processing: HBase, Accumulo; Stream Processing: Storm; others …
Classic Hadoop Apps

20 ResourceManager: Scheduler
NodeManagers run containers for three workloads:
Batch: map 1.1, map 1.2, reduce 1.1
Interactive SQL: vertex 1.1.1, vertex 1.1.2, vertex 1.2.1, vertex 1.2.2
Real-Time: nimbus 0, nimbus 1, nimbus 2

21 Apache Hive Contribution… an Open Community at its finest
1,672 Jira Tickets Closed; 145 Developers; 44 Companies; ~390,000 Lines Of Code Added… (2x); 13 Months
Stack: Business Analytics, Custom Apps; Apache Hive (SQL); Apache Tez; Apache MapReduce; Apache YARN; HDFS (Hadoop Distributed File System), 1 … N

22 Replaces MapReduce as primitive for Hive, Pig, etc. Task with pluggable Input, Processor and Output. Diagram: Tez Task (Input, Processor, Output)
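
The idea of a task assembled from a pluggable Input, Processor and Output can be sketched as follows. This is not the Tez API (Tez is a Java framework); the `Task` class and its parameter names are hypothetical and only show the composition pattern the slide describes.

```python
from typing import Callable, Iterable, List

class Task:
    """A task built from three pluggable parts: input, processor, output."""

    def __init__(
        self,
        read: Callable[[], Iterable],
        process: Callable[[Iterable], Iterable],
        write: Callable[[Iterable], None],
    ) -> None:
        self.read = read        # where records come from
        self.process = process  # what is done to them
        self.write = write      # where results go

    def run(self) -> None:
        # Wire the three parts together: input -> processor -> output.
        self.write(self.process(self.read()))

# Usage: swap any of the three parts without touching the other two.
sink: List[str] = []
task = Task(
    read=lambda: ["wa", "ca", "wa"],
    process=lambda rows: (r.upper() for r in rows),
    write=sink.extend,
)
task.run()
```

Because each part is passed in, the same processor could read from a file and write to a network channel in another task, which is the kind of flexibility a fixed map/reduce pair does not offer.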

23 Hive – MR vs. Hive – Tez
Query: SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state
Hive – MR plan: SELECT a.state; JOIN (a, c); SELECT c.price; SELECT b.id; JOIN(a, b); GROUP BY a.state; COUNT(*); AVERAGE(c.price)
Hive – Tez plan: SELECT a.state, c.itemId; JOIN (a, c); JOIN(a, b); GROUP BY a.state; COUNT(*); AVERAGE(c.price); SELECT b.id
Diagram labels: M (map) and R (reduce) tasks; HDFS
Tez avoids unneeded writes to HDFS
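
To make the semantics of the slide's query concrete, here is a minimal in-memory sketch of the same two joins and the grouped aggregate. The table contents and the `hash_join` helper are hypothetical, invented for illustration. Note that the slide writes AVERAGE, whereas Hive's built-in aggregate is named AVG.

```python
from collections import defaultdict

# Hypothetical sample tables; only the column names come from the slide's query.
a = [
    {"id": 1, "itemId": 10, "state": "WA"},
    {"id": 2, "itemId": 11, "state": "CA"},
    {"id": 3, "itemId": 10, "state": "WA"},
]
b = [{"id": 1}, {"id": 2}, {"id": 3}]
c = [{"itemId": 10, "price": 4.0}, {"itemId": 11, "price": 6.0}]

def hash_join(left, right, key):
    """Inner equi-join: index the right side by key, then probe with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**r, **l} for l in left for r in index.get(l[key], [])]

# FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId)
ab = hash_join(a, b, "id")
abc = hash_join(ab, c, "itemId")

# GROUP BY a.state with COUNT(*) and the average of c.price
prices_by_state = defaultdict(list)
for row in abc:
    prices_by_state[row["state"]].append(row["price"])
result = {s: (len(p), sum(p) / len(p)) for s, p in prices_by_state.items()}
```

Here the intermediate results `ab` and `abc` stay in memory between steps. That is loosely analogous to the slide's point: a chain of separate MapReduce jobs would write each intermediate result to HDFS, while a single Tez graph can pass it along directly.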

24 SQL Compliance
Hive provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop
Version legend: Hive 0.12 (HDP 2.0), Hive 0.11, Hive 0.13 (HDP 2.1)

Hive SQL Datatypes             Hive SQL Semantics
INT                            SELECT, INSERT
TINYINT/SMALLINT/BIGINT        GROUP BY, ORDER BY, SORT BY
BOOLEAN                        JOIN on explicit join key
FLOAT                          Inner, outer, cross and semi joins
DOUBLE                         Sub-queries in FROM clause
STRING                         ROLLUP and CUBE
TIMESTAMP                      UNION
BINARY                         Windowing Functions (OVER, RANK, etc)
DECIMAL                        Custom Java UDFs
ARRAY, MAP, STRUCT, UNION      Standard Aggregation (SUM, AVG, etc.)
DATE                           Advanced UDFs (ngram, Xpath, URL)
VARCHAR                        Sub-queries for IN/NOT IN, HAVING
CHAR                           Expanded JOIN Syntax
                               INTERSECT / EXCEPT

25

26 Disaster Recovery and Backup between environments. Publishing data between environments for Discovery. Site to Site; Site to Cloud

27 Define sophisticated retention policies. Simplify data retention for audit, compliance, or for data re-processing.
Staged Data: Retain 5 Years
Cleansed Data: Retain 3 Years
Conformed Data: Retain 3 Years
Presented Data: Retain Last Copy Only
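
The retention rules listed on this slide can be expressed as a small lookup plus one check. The sketch below is plain Python, not Falcon's configuration format; the `expired` function and the stage keys are hypothetical names, and only the retention periods come from the slide.

```python
from datetime import date

# Retention per stage, in years, as listed on the slide.
# None stands for "retain last copy only".
RETENTION_YEARS = {"staged": 5, "cleansed": 3, "conformed": 3, "presented": None}

def expired(stage, created, today, is_latest=False):
    """Return True if a dataset instance may be removed under the policy."""
    years = RETENTION_YEARS[stage]
    if years is None:
        # Keep only the most recent copy; every older copy may go.
        return not is_latest
    try:
        cutoff = created.replace(year=created.year + years)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year.
        cutoff = created.replace(year=created.year + years, day=28)
    return today >= cutoff

today = date(2014, 6, 1)
staged_expired = expired("staged", date(2010, 1, 1), today)      # under 5 years old
cleansed_expired = expired("cleansed", date(2010, 1, 1), today)  # over 3 years old
```

Keeping the policy in one table makes it easy to audit: changing a retention period is a one-line edit, and the check itself does not change.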

28

29 HDFS (Hadoop Distributed File System); MapReduce Indexing Job

30

31

32 Knox Gateway
Clients: REST Client, JDBC Client, Browser
Firewall
DMZ: Knox Gateway (GW), a stateless reverse proxy instance deployed in DMZ
- Requests streamed through GW to Hadoop services after auth.
- URLs rewritten to refer to gateway
Identity Providers: Enterprise Identity Provider (LDAP/AD)
Firewall
HDP Cluster 1: Masters, JT, NN, WebHCat, Oozie, YARN, HBase, Hive, DN, TT
HDP Hadoop Cluster 2: Masters, JT, NN, WebHCat, Oozie, YARN, HBase, Hive, DN, TT
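
The "URLs rewritten to refer to gateway" step can be illustrated with a short sketch. It is not Knox's rewrite engine: the function name, the host names, and the gateway base URL below are all hypothetical, and the sketch shows only the general idea of replacing an internal address with the gateway's address so that clients never see cluster hosts.

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_to_gateway(internal_url, gateway_base, internal_hosts):
    """Point a URL for an internal cluster host at the gateway instead."""
    parts = urlsplit(internal_url)
    if parts.netloc not in internal_hosts:
        return internal_url  # not a cluster-internal URL; leave it alone
    gw = urlsplit(gateway_base)
    # Keep the service path and query, swap scheme and host for the gateway's.
    path = gw.path.rstrip("/") + parts.path
    return urlunsplit((gw.scheme, gw.netloc, path, parts.query, parts.fragment))

# Hypothetical addresses, for illustration only.
GATEWAY = "https://gw.example.com:8443/gateway/cluster1"
INTERNAL = {"namenode.internal:50070"}

rewritten = rewrite_to_gateway(
    "http://namenode.internal:50070/webhdfs/v1/tmp?op=LISTSTATUS",
    GATEWAY,
    INTERNAL,
)
untouched = rewrite_to_gateway("http://other.example.org/x", GATEWAY, INTERNAL)
```

With a rewrite like this applied to responses, only the gateway in the DMZ needs to be reachable through the outer firewall.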

33 Ambari: Deploy, Manage, Monitor. AMBARI WEB (PROVISION, MANAGE, MONITOR); REST APIs; AMBARI SERVER (PROVISION | MANAGE | MONITOR); compute & storage

34 Ambari SCOM Mgmt Pack; Ambari SCOM Server; HADOOP: Storage & Process at Scale. Ambari SCOM Server aggregates + exposes Hadoop metrics. Ambari SCOM monitors health + alerts in case of problems.

35

36

37

38

39 www.microsoft.com/learning http://microsoft.com/msdn http://microsoft.com/technet http://channel9.msdn.com/Events/TechEd

40

41

42

