Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.


1 Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise
OCJUG, 2014

2 Hadoop Ecosystem

3 Real-Time Insight with In-Memory ETL
[Diagram: batch versus stream processing, plotted against application complexity]
Batch (Map-Reduce): data accumulation 24 hours; data processing 8 hours; intermediate files; RDBMS, EDW, MPP
Stream (in memory): real-time, event driven; seconds; complex processing

4

5

6 Enterprise Repositories
[Architecture diagram; recoverable labels]
Source data: Feed 1, Feed 2, Feed…., Feed 400; XML files, sensor data, MS Queues, social, databases
Scalable ingestion (scale out): extract, transform, load
Enterprise repositories: RDBMS, EDW, NoSQL, Hive, HDFS raw archive
Visualize: business analytics, business intelligence tools, visualization tools; analytics, alerts, events, ad hoc query
Consumers: data access service, feed consumers/applications

7 Stream Processing
A Stream is a sequence of data events with a schema.
An Operator takes input streams and computes output streams.
Each Operator is your business logic in Java, or comes from our library.
An Application is a Directed Acyclic Graph (DAG).
[Diagram: example DAG with nodes numbered 1 to 6]
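The stream, operator, and DAG vocabulary on this slide can be illustrated with a few lines of plain Java. This is only a sketch under stated assumptions: `MiniDag`, `Operator`, `apply`, and `run` are hypothetical names invented for illustration and are not the DataTorrent API, streams are modeled as in-memory lists rather than unbounded event flows, and the DAG is a simple two-operator chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; this is NOT the DataTorrent API.
final class MiniDag {

    /** An operator maps each input event to zero or more output events. */
    interface Operator<I, O> extends Function<I, List<O>> {}

    /** Pushes every event of a stream through one operator, producing the next stream. */
    static <I, O> List<O> apply(List<I> stream, Operator<I, O> op) {
        List<O> out = new ArrayList<>();
        for (I event : stream) {
            out.addAll(op.apply(event));
        }
        return out;
    }

    /** A two-operator linear DAG: keep even numbers, then double them. */
    static List<Integer> run(List<Integer> input) {
        Operator<Integer, Integer> keepEven =
                e -> (e % 2 == 0) ? List.of(e) : List.<Integer>of();
        Operator<Integer, Integer> doubler = e -> List.of(e * 2);
        return apply(apply(input, keepEven), doubler);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4))); // prints [4, 8]
    }
}
```

Each lambda stands in for "your business logic"; a real streaming platform would run the operators concurrently in separate containers and pass events between them as they arrive.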

8 DataTorrent Hadoop GRID
[Cluster diagram; recoverable labels: DT Console, dtCLI, DT Gateway, Resource Manager, four NM nodes, StrAM, operators 1 to 6, MapReduce tasks]

9 DataTorrent Platform:
High Performance: Real-time data ingestion. In-memory processing. Billions of operations per second.
Extreme Scalability: DataTorrent automatically scales out/in to changing loads. Sub-second latency with linear scalability. Complex big data applications.
Mission Critical: Built-in fault tolerance. 24/7 uptime guaranteed. Update your application while it's running!
Hadoop 2.0 Native: Runs on your existing Apache Hadoop cluster. Develop faster and support any business logic with our open-source framework. Integrate seamlessly with your existing data flow.

10 DataTorrent YARN Interaction
DataTorrent is a Java interface-based API:
  Default Implementation: Platform
  Custom Implementation: Application Development
Platform components have various configuration properties:
  Container Size (Hadoop dependent)
  Operator Memory
  Max Number of Containers
  Locality of the Operators and Streams
  C-Group (coming soon, Hadoop dependent)
  Static and Dynamic Partitioning
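The slide names static and dynamic partitioning without describing the mechanism. One common way to spread a stream across several instances of an operator is to route each event by a hash of its key, and the sketch below assumes that approach. `KeyPartitioner`, `partitionFor`, and `distribute` are hypothetical names, not platform classes, and the platform's actual routing scheme may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash-based routing; NOT the DataTorrent partitioning API.
final class KeyPartitioner {

    /** Returns the index of the partition that should receive events with this key. */
    static int partitionFor(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Splits a batch of keys into partitionCount buckets. */
    static List<List<String>> distribute(List<String> keys, int partitionCount) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, partitionCount)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("feed-1", "feed-2", "feed-3", "feed-4");
        // Static partitioning: the count is fixed when the application starts.
        System.out.println(distribute(keys, 2));
        // Dynamic partitioning: the same routing is re-run with a new count under load.
        System.out.println(distribute(keys, 4));
    }
}
```

In this model, static partitioning fixes `partitionCount` at launch, while dynamic partitioning changes it at run time, which also means any per-key state has to be moved to the new owner of each key.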

11 Checkpointing Transparent, Distributed, and Asynchronous
Resource requirements are directly proportional to:
  Size of the state
  Frequency of checkpointing
Most operators have a small (a few KB) state footprint.
Techniques to lower the cost:
  Identify the state with minimum footprint
  Use external storage
  Incremental checkpoints
  Faster media
  Stateless operators
  Less frequent checkpoints
  Disable checkpointing
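The advice to identify the state with the minimum footprint can be illustrated with standard Java serialization, where fields marked `transient` are left out of the serialized form. The sketch below is an assumption-laden illustration, not the platform's checkpointing code: `CheckpointSketch`, `CountingOperator`, `checkpoint`, and `restore` are hypothetical names, the snapshot is taken synchronously into a byte array rather than asynchronously to distributed storage, and the platform may use a different serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch; NOT the DataTorrent checkpointing implementation.
final class CheckpointSketch {
    static final int SCRATCH_BYTES = 1 << 20;

    static final class CountingOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        long count;                                        // state that must survive a failure
        transient byte[] scratch = new byte[SCRATCH_BYTES]; // rebuildable buffer, not checkpointed

        void process(Object event) {
            count++;
        }
    }

    /** Serializes the operator's non-transient state into a snapshot. */
    static byte[] checkpoint(CountingOperator op) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an operator from a snapshot and recreates its transient buffer. */
    static CountingOperator restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            CountingOperator op = (CountingOperator) in.readObject();
            op.scratch = new byte[SCRATCH_BYTES]; // transient fields are null after deserialization
            return op;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CountingOperator op = new CountingOperator();
        for (int i = 0; i < 3; i++) {
            op.process("event-" + i);
        }
        byte[] snapshot = checkpoint(op);
        System.out.println("snapshot bytes: " + snapshot.length);
        System.out.println("restored count: " + restore(snapshot).count);
    }
}
```

The operator holds a one-megabyte buffer in memory, yet the snapshot contains only the counter and class metadata, so the cost of each checkpoint tracks the size of the state that actually needs to be recovered.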

12 DataTorrent vs Alternatives
Developed from the ground up to do streaming natively in Hadoop
Relieves application developers from handling fault tolerance
High performance yet resource friendly
Linearly scalable
Hadoop native and co-exists with other Hadoop applications
500+ open source operators
UI dashboard widgets
Preferred by enterprises after trying alternatives
Enterprise-grade support

13 Real-Time Fault Tolerant Use Cases
Big Data ETL Offload
Predictive Analytics
Scalable Ingestion
Operational Monitoring and Alerts
Real-Time Business Actions
Internet of Things Security

14 Demos malhar-users@googlegroups.com

