Big thanks to everyone!

Presentation transcript:

Big thanks to everyone!

The convergence of real-time analytics and event-driven applications
@StephanEwen
Flink Forward San Francisco, April 11, 2017

2016 was the year when streaming technologies became mainstream.
2017 is the year to realize the full spectrum of streaming applications.

Some large scale streaming applications

Detecting fraud in real time
- As fraudsters get better, need to update models without downtime
- Live 24/7 service
- Credit card transactions
- Notifications and alerts
- Evolving fraud models built by data scientists

Athena X
- SQL to define metrics
- Thresholds and actions to trigger
- Blends analytics and actions
- Streams from Hadoop, Kafka, etc.
- SQL, thresholds, actions
- Analytics, alerts, derived streams

- Route events to Kafka, ES, Hive
- Complex interaction sessions rules
- Mix of stateless / small state / large state
- Stream Processing as a Service: launching, monitoring, scaling, updating
- DSL to define jobs

Blink, based on Flink
- A core system in Alibaba Search
- Machine learning, search, recommendations
- A/B testing of search algorithms
- Online feature updates to boost conversion rate
- Alibaba is a major contributor to Flink, contributing many changes back to open source

Complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation)

What can we learn from these?
- All these applications run on Flink
- Applications, not just analytics: not just finding out what the data means, but acting on it at the same time
- Workloads going beyond the traditional Hadoop realm
- Hadoop is a possible deployment target, source, and sink
- Container engines and other storage systems are increasingly popular with Flink

So, what is data streaming?
- The first wave of streaming was the lambda architecture: aid batch systems to be more real-time
- The second wave was analytics (real time and lag-time), based on distributed collections, functions, and windows
- The next wave is much broader: a new architecture for event-driven applications

Event-driven applications

Event-driven applications: stateful, event-driven, event-time-aware processing
- Stream Processing (streams, windows, …)
- Event-driven Applications (event sourcing, CQRS, …)
- Batch Processing (data sets)

Events, State, Time, and Snapshots
f(a,b): event-driven function executed distributedly

Events, State, Time, and Snapshots
f(a,b): maintain fault-tolerant local state, similar to any normal application

Events, State, Time, and Snapshots
f(a,b) with wall clock and event time clock: access and react to notions of time and progress, handle out-of-order events

Events, State, Time, and Snapshots
f(a,b) with wall clock and event time clock: snapshot a point-in-time view for recovery, rollback, cloning, versioning, etc.
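The four ingredients on these slides (events, state, time, snapshots) can be illustrated with a minimal plain-Scala sketch. It does not use the Flink API; `Event`, `State`, `EventDrivenFunction`, and the 500 ms timer delay are hypothetical names and values chosen for illustration, and the whole thing runs in a single process rather than distributed.

```scala
// Hypothetical model of an event-driven function; not Flink API.
final case class Event(key: String, timestamp: Long, amount: Long)

// Immutable state: holding a reference to a State value acts as a snapshot.
final case class State(totals: Map[String, Long], timers: Map[String, Long])

object EventDrivenFunction {
  val empty: State = State(Map.empty, Map.empty)

  // f(event, state): update the per-key total and (re)register an
  // event-time timer 500 ms after the event's timestamp.
  def processElement(state: State, e: Event): State = {
    val total = state.totals.getOrElse(e.key, 0L) + e.amount
    State(
      state.totals.updated(e.key, total),
      state.timers.updated(e.key, e.timestamp + 500)
    )
  }

  // Fire every timer whose time is at or before the given event-time
  // watermark; returns the remaining state and the emitted (key, total) pairs.
  def advanceTime(state: State, watermark: Long): (State, List[(String, Long)]) = {
    val (due, pending) = state.timers.partition { case (_, t) => t <= watermark }
    val fired = due.keys.toList.sorted.map(k => k -> state.totals(k))
    (state.copy(timers = pending), fired)
  }
}
```

Because the state is an immutable value, keeping an earlier `State` around gives a point-in-time view that can be used for rollback or cloning; a real system would additionally persist it to durable storage.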

Event-driven applications: stateful, event-driven, event-time-aware processing
- Stream Processing (streams, windows, …)
- Event-driven Applications (event sourcing, CQRS, …)
- Batch Processing (data sets)

The APIs
- Stream SQL
- Table API (dynamic tables)
- DataStream API (streams, windows)
- Process Function (events, state, time)
Use cases served by these layers: analytics, stream and batch processing, stateful event-driven applications

Process Function

    import org.apache.flink.api.common.state.ValueState
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    class MyFunction extends ProcessFunction[MyEvent, Result] {

      // declare state to use in the program
      lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

      override def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
        // work with event and state
        (event, state.value) match { … }

        out.collect(…)   // emit events
        state.update(…)  // modify state

        // schedule a timer callback
        ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
      }

      override def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
        // handle callback when event-/processing-time instant is reached
      }
    }

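Data Stream API

    // read raw lines from Kafka
    val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09[String](…))

    // parse each line into an event
    val events: DataStream[Event] = lines.map((line) => parse(line))

    // aggregate per sensor over 5-second windows
    // (MyAggregationFunction is a user-defined aggregate function)
    val stats: DataStream[Statistic] = events
      .keyBy("sensor")
      .timeWindow(Time.seconds(5))
      .aggregate(new MyAggregationFunction())

    stats.addSink(new RollingSink(path))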
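To make the shape of the keyed, windowed aggregation on this slide concrete, here is a rough plain-Scala equivalent over an in-memory collection. It is only a sketch: `Reading`, `WindowedSum`, and the choice of summing a `value` field are assumptions for illustration, windows are assigned from a timestamp carried in each record, and timestamps are assumed to be non-negative milliseconds.

```scala
// Hypothetical record type; the slide's Event and Statistic types are not shown.
final case class Reading(sensor: String, timestamp: Long, value: Double)

object WindowedSum {
  // Start of the tumbling window that contains the timestamp.
  def windowStart(timestamp: Long, sizeMs: Long): Long =
    timestamp - (timestamp % sizeMs)

  // Group by (sensor, window start), then sum the values in each group.
  def sumPerSensorAndWindow(readings: Seq[Reading], sizeMs: Long): Map[(String, Long), Double] =
    readings
      .groupBy(r => (r.sensor, windowStart(r.timestamp, sizeMs)))
      .map { case (key, rs) => key -> rs.map(_.value).sum }
}
```

The `groupBy` step plays the role of `keyBy` plus window assignment, and the final `map` plays the role of the aggregation function; a stream processor does the same work incrementally as events arrive.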

Table API & Stream SQL

Streaming Architecture for Event-driven Applications

Compute, State, and Storage
- Classic tiered architecture: compute layer; database layer holding application state + backup
- Streaming architecture: compute + application state; stream storage and snapshot storage (backup)

Performance
- Classic tiered architecture: synchronous reads/writes across tier boundary
- Streaming architecture: all modifications are local; asynchronous writes of large blobs

Consistency
- Classic tiered architecture: distributed transactions; at scale typically at-most / at-least once
- Streaming architecture: exactly once per state (=1); snapshot consistency across states
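One way to picture "exactly once per state" is a snapshot that stores the state together with the input position it corresponds to: after a failure, the application restores the snapshot and replays the input from that position, so every event is reflected in the state exactly once. The sketch below is plain Scala under that assumption; `Snapshot`, `ExactlyOnceState`, and the word-count logic are hypothetical, and the replayable log stands in for something like a Kafka topic.

```scala
// A snapshot pairs the state with the input offset it was taken at.
final case class Snapshot(offset: Int, counts: Map[String, Long])

object ExactlyOnceState {
  // Application logic: count occurrences per word.
  def step(counts: Map[String, Long], word: String): Map[String, Long] =
    counts.updated(word, counts.getOrElse(word, 0L) + 1L)

  // Process the log from the snapshot's offset up to `until` (exclusive)
  // and return a new snapshot at that position.
  def run(log: Vector[String], from: Snapshot, until: Int): Snapshot =
    Snapshot(until, log.slice(from.offset, until).foldLeft(from.counts)(step))
}
```

For example, if a run that started from a checkpoint at offset 2 crashes after processing up to offset 4, its in-memory progress is simply discarded; restarting from the checkpoint and running to the end yields the same counts as an uninterrupted run.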

Scaling a Service
- Classic tiered architecture: provision compute; separately provision additional database capacity
- Streaming architecture: provision compute and state together

Rolling out a new Service
- Classic tiered architecture: provision a new database (or add capacity to an existing one)
- Streaming architecture: provision compute and state together; simply occupies some additional backup space

Time, Completeness, Out-of-order
- Classic tiered architecture: ?
- Streaming architecture: event time clocks define data completeness; event time timers handle actions for out-of-order data
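The idea that an event-time clock defines completeness can be sketched as follows: a watermark trails the highest timestamp seen by a fixed bound, and a window is emitted only once the watermark has passed its end, so events that arrive out of order within that bound are still counted. This is a plain-Scala illustration, not Flink API; `Timed`, `WindowState`, `EventTimeWindows`, the 1000 ms window size, and the 300 ms bound are all assumptions, and events later than the bound are not handled specially.

```scala
import scala.collection.immutable.SortedMap

final case class Timed(timestamp: Long, value: Long)

// Open windows (window start -> running sum) plus the highest timestamp seen.
final case class WindowState(open: SortedMap[Long, Long], maxTimestamp: Long)

object EventTimeWindows {
  val sizeMs = 1000L          // tumbling window size (assumed)
  val maxOutOfOrderMs = 300L  // how far the watermark trails (assumed)

  val empty: WindowState = WindowState(SortedMap.empty[Long, Long], Long.MinValue)

  // Add the event to its window, advance the watermark, and emit every
  // window whose end is at or before the watermark as (start, sum).
  def onEvent(s: WindowState, e: Timed): (WindowState, List[(Long, Long)]) = {
    val start = e.timestamp - (e.timestamp % sizeMs)
    val open = s.open.updated(start, s.open.getOrElse(start, 0L) + e.value)
    val maxTs = math.max(s.maxTimestamp, e.timestamp)
    val watermark = maxTs - maxOutOfOrderMs
    val (complete, pending) = open.partition { case (ws, _) => ws + sizeMs <= watermark }
    (WindowState(pending, maxTs), complete.toList)
  }
}
```

In this sketch an event with timestamp 800 that arrives after one with timestamp 1200 still lands in the window starting at 0, because the watermark (900 at that point) has not yet passed that window's end.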

Repair External State
Streaming architecture
- Streams (let's say Kafka etc.), live application, external state: wrong results
- Backed up data (HDFS, S3, etc.)

Repair External State
Streaming architecture
- Streams (let's say Kafka etc.), live application, external state
- Backed up data (HDFS, S3, etc.), application on backup input: overwrite with correct results

Repair External State
Streaming architecture
- Streams (let's say Kafka etc.), live application, external state
- Backed up data (HDFS, S3, etc.), application on backup input: overwrite with correct results
- Each service doubles as a batch job!
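The repair procedure can be sketched in plain Scala: the same application logic that consumes the live stream is run over the backed-up input, and its output overwrites the wrong entries in the external state. `Txn`, `Repair`, and the per-account sum are hypothetical stand-ins for the real application logic, and the maps stand in for an external store.

```scala
final case class Txn(account: String, amount: Long)

object Repair {
  // The application logic, written once and usable on live or backup input.
  def applyAll(events: Iterator[Txn]): Map[String, Long] =
    events.foldLeft(Map.empty[String, Long]) { (acc, t) =>
      acc.updated(t.account, acc.getOrElse(t.account, 0L) + t.amount)
    }

  // Recompute results from the backup and overwrite the external state.
  // Keys that do not appear in the backup are left unchanged.
  def repair(external: Map[String, Long], backup: Seq[Txn]): Map[String, Long] =
    external ++ applyAll(backup.iterator)
}
```

Since `applyAll` only needs an iterator of events, it does not care whether they come from a bounded backup or an unbounded stream, which is the sense in which each service doubles as a batch job.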

Streaming has outgrown the Hadoop Stack
- Event-driven applications and realtime analytics converge with Apache Flink
- Event-driven applications become easier to manage, faster, and more powerful following a streaming architecture implemented with Flink