Intel® Distribution for Apache Hadoop* Ram Lakshminarayan Asia Pac – BDM Datacenter.


1 Intel® Distribution for Apache Hadoop* Ram Lakshminarayan Asia Pac – BDM Datacenter

2 Other brands and names are the property of their respective owners. From the dawn of civilization until 2003, humans created 5 exabytes of information. Now we create that same amount of information in two days! In 2012, the digital universe of data will expand to 2.72 zettabytes (ZB). It is then predicted to double every two years.
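The doubling claim on this slide can be turned into a rough projection. The sketch below assumes the slide's 2.72 ZB figure for 2012 and a clean doubling every two years; the function name and the smooth interpolation between doubling points are illustrative assumptions, not something the deck specifies.

```python
def projected_zettabytes(year, base_year=2012, base_zb=2.72, doubling_years=2.0):
    """Estimate the digital universe size in ZB, assuming steady doubling.

    base_zb and doubling_years come from the slide; everything else is
    an illustrative assumption.
    """
    return base_zb * 2 ** ((year - base_year) / doubling_years)


if __name__ == "__main__":
    for year in (2012, 2014, 2016, 2020):
        print(year, round(projected_zettabytes(year), 2), "ZB")
```

Under these assumptions the estimate grows sixteenfold between 2012 and 2020, since eight years contain four doubling periods.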

3 What is Big Data?
Datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze*
Unstructured volume, variety, value and velocity
* "Big data: The next frontier for innovation, competition, and productivity", McKinsey Global Institute
[Chart: Volume over Time, structured (relational) data vs. unstructured (multi-structured) data]
Intelligent Transportation System (Shanghai)
Volume: massive scale & growth
Variety: many different forms
Value: predictive analytics
Velocity: near-realtime processing
Logs/records: 9TB/day
Image: 900TB/day
Video: 3PB/day
Near realtime image/video processing needed
Near realtime queries required
Deep, complex analysis for traffic prediction, criminal detection, …
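To put the per-day figures quoted for the Shanghai Intelligent Transportation System on one scale, the short sketch below totals them. It assumes decimal units (1 PB = 1000 TB), which the slide does not state; the variable and function names are illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units; the slide does not say

# Daily volumes as quoted on the slide, converted to terabytes.
daily_tb = {
    "logs/records": 9,
    "image": 900,
    "video": 3 * TB_PER_PB,
}


def total_daily_tb(sources):
    """Sum the daily volume of all sources, in TB."""
    return sum(sources.values())


def share_by_source(sources):
    """Return each source's fraction of the total daily volume."""
    total = total_daily_tb(sources)
    return {name: tb / total for name, tb in sources.items()}


if __name__ == "__main__":
    print("total:", total_daily_tb(daily_tb) / TB_PER_PB, "PB/day")
    for name, share in share_by_source(daily_tb).items():
        print(f"{name}: {share:.1%}")
```

Under that unit assumption the three sources add up to roughly 3.9 PB per day, with video accounting for about three quarters of it.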

4 Big Data usage across industries: Education, Financial Services

5 Big Data opportunity, a vertical industry view. Source: Gartner

6 Hadoop Introduction
Source: http://blog.spec-india.com
Source: http://www.bodhtree.com
Hadoop is: A flexible, extensible open source framework
Hadoop includes:
Storage (HDFS)
NoSQL database (HBase)
Distributed compute (MapReduce)
Plus more utilities
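For readers new to the "Distributed compute (MapReduce)" component, the sketch below illustrates the programming model with a word count. It is a single-process, in-memory illustration written in plain Python, not the Hadoop API; all function names are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The point of the split is that mappers and reducers only see independent pieces of the data, which is what lets a framework spread them over many machines.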

7 Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics
Responsive, Energy Efficient, High Availability, Secure, Choice
Big Data Building Blocks
Compute: Intel® Xeon® Product Family E3-E5-E7; Intel® Atom™; Intel® Xeon Phi™
Network: Intel® Ethernet Controllers; Intel® Ethernet Adapters; Intel® Ethernet Switch Silicon; Intel® True Scale Fabric
Storage: Intelligent Storage [1]; Scale-out Storage [1]; Scale-up Storage [1]; Intel® SSD 710 series, DC S3700 (SATA); Intel® SSD 910 series (PCIe)
Software & Technologies: Intel® Distribution for Apache Hadoop; Intel® Data Center Manager; Intel® Node Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI
[1] Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors
What is in it for us?

8 Intel’s Role in Big Data: Investing in Solution Research and Services for Big Data
Accelerating big data analytics through faster and more effective CPU, storage, I/O and network platforms.
Driving innovation in big data applications by providing an optimized software stack and services.
Fostering the growth of the big data ecosystem through broad collaboration with partners.

9 Intel® Distribution for Apache Hadoop
What did we launch…?
Focus on near real-time analytics w/ HBase & Hive enhancements
Access control, encryption, secure data movement
Job throughput efficiency for HDFS
Dynamic replication for HDFS & HBase
Intel optimized total solution architecture - distro, storage, network, compute
Intel Supported Distribution Subscription
Open Source
Optimized Intel IA/Distro
5X Performance for Real-time jobs
HBase as the data store. Query all CDRs in a month
− Inserting 10000 records/second/server
− Read from disk: >400 queries/second/server
Intel® Manager for Hadoop* Software: Deployment, Configuration, Monitoring, Alerting and Security
HDFS*: Hadoop Distributed File System
MapReduce: Distributed Processing Framework
HBase*: Columnar Storage
Zookeeper*: Coordination
Flume: Log Collector
Sqoop: Data Exchange
Pig*: Scripting
Hive*: SQL-Like Query
Oozie*: Workflow
Mahout*: Data Mining
R-connector

10 Intel® Manager for Apache Hadoop
Compatible with Intel or Other Popular Distributions
Quick cluster/node deployment
Tab navigate between components
Node Guided wizards, tasks, workflows
Single pane config for MapReduce fair or capacity scheduling
Tuning controls for HBase data

11 Driving The Key Pillars for Big Data
Providing cross-stack optimizations using Hadoop as lead vehicle and open source as adoption driver
Intel IA Architecture, Performance, Management, Cloud Enablement
[Diagram labels: Flash Storage; Caching & Non-volatile Memory; Throughput; Distributed Tables Across Data Centers; Snapshots; File based encryption; MapReduce Jobs; Access Control List at cell level; SSE Instruction Sets; Infiniband; AES-NI Encryption; HDFS Cross Data Center Replication; Security; Archival for cold data on HDFS; OS Kernel caching; Hot file replication; API AuthN; Data Movement]
NETWORK, STORAGE, COMPUTE
Ensuring Scale-out architectures work best on Intel platforms

12 Intel Platform Benefits for Big Data
TeraSort for 1TB Data: >4 Hours to ~7 Minutes
Intel® Xeon® E5-2690 processor
Intel® SSD 520 Series
Intel® 10GbE Adapters
Deploy Intel Distribution for Apache Hadoop*
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel Internal testing. For more information go to: intel.com/performance
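The TeraSort figures on this slide imply a speedup that is easy to check. The sketch below treats ">4 hours" as exactly 4 hours and "~7 minutes" as exactly 7 minutes, so the result is only a rough lower bound; the function name is illustrative.

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_minutes / after_minutes


# Slide figures, rounded: more than 4 hours before, about 7 minutes after.
BEFORE_MINUTES = 4 * 60
AFTER_MINUTES = 7

if __name__ == "__main__":
    print(f"roughly {speedup(BEFORE_MINUTES, AFTER_MINUTES):.0f}x faster")
```

With these rounded inputs the improvement comes out at roughly 34x for the 1TB sort.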

13 Government - Smart Traffic
Intelligent Transport System: Hadoop for Predictive Analytics
Crime prevention, Info sharing, Predictive Traffic Analytics
Machine Generated Data: Embedded HBase client in camera for real-time inserts of structured/unstructured data
30000+ camera data collection points
2 billion HBase records
Petabytes of traffic data
Terabytes of images
1 week of Data mining
Results: Automated queries for traffic violation
Crime Prevention: ID fake licenses <1 minute
Traffic Routing
[Diagram labels: App Servers; Regional Data Collection; Distributed Processing Across District Nodes; Derived Analytics Services; Crime Prevention; Citizen Traffic Services]

14 Telco - China Mobile Group Guangdong
Hadoop & Xeon optimized Big Data storage & analytics
Challenge: Deliver real time access to Call Data Records (CDR) for billing self service
Solution: Chose Hadoop + Xeon over RDBMS to remove data access bottlenecks, increase storage, and scale system
Benefits: Lower TCO, 30x performance increase, stable operation, analytics on subscriber usage for targeted promotions
Data Characteristics:
30TB billing data/month
Real-time retrieval of 30 days CDRs
300k records/second, 800k insert speed/sec
15 analytics queries
133 server nodes
