
1 Connecting Apache Flink® to the World: Reviewing the streaming connectors. Robert Metzger, Aljoscha Krettek (@rmetzger_)

2 What to expect from this talk
- Overview of all available connectors
- Kafka connector internals
- End-to-end exactly-once
- Apache Bahir and the future of connectors
- [Bonus] Message queues and the message-acknowledging source

3 Connectors in Apache Flink®: "Hello World, let's connect"

4 Connectors in Flink 1.1 (connector: delivery guarantee)
- Streaming files: both source and sink are exactly-once
- Apache Kafka: consumers (sources) exactly-once
- Amazon Kinesis: consumers (sources) exactly-once
- RabbitMQ / AMQP: consumers (sources) exactly-once
- Elasticsearch: no guarantees
- Apache Cassandra: exactly-once with idempotent updates
- Apache NiFi: no guarantees
- Redis: no guarantees
There is also a Twitter source and an ActiveMQ connector in Apache Bahir.

5 Streaming connectors by activity
Streaming connectors ordered by number of threads/mentions on the user@flink list:
- Apache Kafka (250+) (since 0.7)
- Apache Cassandra (38) (since 1.1)
- Elasticsearch (34) (since 0.10)
- File sources (~30) (since 0.10)
- Redis (27) (since 1.0)
- RabbitMQ (11) (since 0.7)
- Kinesis (10) (since 1.1)
- Apache NiFi (5) (since 0.10)
Date of evaluation: 5.9.2016

6 The Apache Kafka Connector

7 Apache Kafka connector: Intro. "Apache Kafka is publish-subscribe messaging rethought as a distributed commit log." This page contains material copied from http://kafka.apache.org/documentation.html#introduction

8 Apache Kafka connector: Consumer
Flink has two main Kafka consumer implementations:
- For Kafka 0.8, an implementation against the "SimpleConsumer" API of Kafka
- For Kafka 0.9+, the new Kafka consumer API (KAFKA-1326)
The producers are basically the same across versions.
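For orientation, a minimal sketch of wiring up the 0.9+ consumer in the DataStream API as it looked around Flink 1.1; the topic name, bootstrap servers, and group id below are placeholders.

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Standard Kafka client properties; values are placeholders.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-demo");

        // FlinkKafkaConsumer09 targets the new consumer API (Kafka 0.9+);
        // FlinkKafkaConsumer08 would be used against the SimpleConsumer API.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka source sketch");
    }
}
```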

9 Kafka 0.8 Consumer (diagram: Flink TaskManagers reading topicA/topicB partitions from a four-broker Kafka cluster). Each TaskManager has one Consumer Thread, which coordinates Fetcher Threads for each Kafka broker.

10 Kafka 0.8 Broker rebalance (diagram, step 1: a broker fails; step 2: its Fetcher Thread returns the partitions to the Consumer Thread). The consumer is able to handle broker failures.

11 Kafka 0.8 Broker rebalance (diagram continued: the failed broker's partitions topicB:4, topicB:2, topicA:1 are back at the Consumer Thread). On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed.

12 Kafka 0.8 Broker rebalance (diagram, step 3: Kafka reassigns the partitions to the remaining brokers; step 4: Flink assigns the partitions to existing or new Fetcher Threads).

13 Kafka 0.9+ Consumer (diagram: one Consumer Thread per TaskManager talking to the Kafka cluster). Since Kafka 0.9, the new Consumer API handles broker failures/rebalancing, offset committing, topic querying, and more.

14 Exactly-once for Kafka consumers
- The mechanism is the same for all connector versions.
- Offsets are committed to ZooKeeper / the broker so the group.id can be resumed and external tools can monitor lag (at-least-once).
- Offsets are checkpointed as part of Flink's state for exactly-once.
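A short sketch of what this looks like on the job side: checkpointing has to be enabled for the consumer to snapshot its offsets. The 5-second interval is arbitrary.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// With checkpointing enabled, the Kafka consumer snapshots its offsets as Flink state
// (exactly-once within the pipeline) and additionally commits them to ZooKeeper / the
// broker for monitoring (which on its own is only at-least-once).
env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);
```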

15 (Diagram state: consumer offsets = 0, 0; counter = 0; no pending or completed checkpoints.) This toy example reads from a Kafka topic with two partitions, each containing "a", "b", "c", ... as messages. The offsets are set to 0 for both partitions, and a counter is initialized to 0.

16 (Diagram state: offsets = 1, 0; counter = 0.) The Kafka consumer starts reading messages from partition 0. Message "a" is in flight; the offset for the first partition has been set to 1.

17 (Diagram state: offsets = 2, 1; counter = 1.) Message "a" arrives at the counter, which is set to 1. Both consumers read the next records ("b" and "a") and the offsets are set accordingly. In parallel, the checkpoint coordinator decides to trigger a checkpoint at the source.

18 (Diagram state: offsets = 3, 1; counter = 2; snapshot "offsets = 2, 1" recorded at the coordinator.) The source has created a snapshot of its state (offsets = 2, 1), which is now stored in the checkpoint coordinator. A checkpoint barrier was emitted after messages "a" and "b".

19 (Diagram state: offsets = 3, 2; counter = 3; checkpoint holds offsets = 2, 1 and counter = 3.) The map operator has received checkpoint barriers from both sources and checkpoints its state (counter = 3) in the coordinator. At the same time, the consumers keep reading more data from the Kafka partitions.

20 (Diagram state: offsets = 3, 2; counter = 4; "notify checkpoint complete" sent to the source.) The checkpoint coordinator informs the Kafka consumer that the checkpoint has been completed, and the consumer commits the checkpointed offsets to ZooKeeper. Note that Flink does not rely on the Kafka offsets in ZooKeeper for restoring from failures.

21 (Diagram state: ZooKeeper/broker now shows offset partition 0: 2, offset partition 1: 1.) The checkpointed offsets are now persisted in ZooKeeper. External tools such as the Kafka offset checker can see the lag of the consumer group.

22 (Diagram state: offsets = 4, 2; counter = 5.) Processing advances further.

23 A failure occurs (such as a worker failure).

24 (Diagram state: restored to offsets = 2, 1 and counter = 3.) The checkpoint coordinator resets all operators participating in checkpointing to the last completed checkpoint. The Kafka sources start from offsets 2 and 1, and the counter's value is 3.

25 The system continues processing; the counter's value is consistent across the worker failure.
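As a rough sketch of the pipeline used in this walkthrough, the counter can live in a user function that snapshots it with each checkpoint, so that the restored counter always matches the restored Kafka offsets. This uses the Checkpointed interface from the Flink 1.1 era; the class name is illustrative.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

// Counts every record it sees. The counter is snapshotted together with the Kafka
// offsets on each checkpoint, which is what makes the restore in the walkthrough
// consistent (offsets = 2, 1 and counter = 3 belong to the same checkpoint).
public class CountingMap implements MapFunction<String, String>, Checkpointed<Integer> {

    private int counter = 0;

    @Override
    public String map(String value) {
        counter++;
        return value;
    }

    @Override
    public Integer snapshotState(long checkpointId, long checkpointTimestamp) {
        return counter;
    }

    @Override
    public void restoreState(Integer state) {
        counter = state;
    }
}
```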

26 End-to-end exactly-once

27 Consistently move and process data
Flink makes it possible to move data between systems while keeping consistency.
- Sources, exactly-once: Apache Kafka, Kinesis, RabbitMQ / ActiveMQ, file monitoring
- Sinks, exactly-once: rolling file sink
- Sinks, exactly-once with idempotent updates: Apache Cassandra, Elasticsearch, Redis
- Sinks, at-least-once (duplicates): Apache Kafka

28 Continuous File Monitoring
(Diagram: a monitoring task periodically queries a file system and hands file paths/offsets to parallel file readers, which emit records.)
- The monitoring task checkpoints the last "modification time"
- The file readers checkpoint the current file + offset and the list of pending files to read
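A minimal sketch of setting up continuous monitoring with the DataStream API, assuming the readFile(format, path, watchType, interval) variant; the exact signature differs slightly between Flink versions, and the path and polling interval below are placeholders.

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// One monitoring task lists the directory periodically (every 10s here, arbitrary);
// parallel readers then process the discovered files/splits.
String path = "hdfs:///logs/";  // placeholder path
DataStream<String> lines = env.readFile(
        new TextInputFormat(new Path(path)),
        path,
        FileProcessingMode.PROCESS_CONTINUOUSLY,
        10000L);
```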

29 Rolling / Bucketing File Sink
- Bucketing by system time (e.g. one bucket per hour: 8:00, 9:00, 10:00, 11:00)
- Bucketing based on record data (e.g. value ranges 0-4 and 5-9)
(Diagram: a bucketing operator routing records into hourly buckets and into value-range buckets.)
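A sketch of system-time bucketing with the filesystem connector's RollingSink as of the Flink 1.1 era (later versions added a BucketingSink that also supports bucketing by record data); the base path and date pattern are placeholders.

```java
import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;

// System-time bucketing: the DateTimeBucketer opens a new directory per hour
// under the base path (both the path and the format string are placeholders).
RollingSink<String> sink = new RollingSink<>("hdfs:///flink/output");
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"));

// Attach to any DataStream<String>, e.g. the `lines` stream from the sketch above:
// lines.addSink(sink);
```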

30 Bucketing File Sink exactly-once
- On Hadoop 2.7+, we call truncate() to remove invalid data on restore
- On earlier versions, we write a metadata file with valid offsets
- Downstream consumers must take the valid-offset metadata into account

31 Kafka Producer: Avoid data loss
- Apache Kafka does not currently provide the infrastructure to produce in an exactly-once fashion.
- By avoiding data loss, we can guarantee at-least-once.
- On checkpoint, Flink calls flush() and waits until the number of unacknowledged records reaches 0, guaranteeing that the data has been written to the broker.
(Diagram: the Flink Kafka producer tracking unacknowledged = 7 records against a Kafka broker/partition.)
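A sketch of configuring the producer so that a completed checkpoint implies the records have reached Kafka. The setFlushOnCheckpoint/setLogFailuresOnly switches are assumptions about the producer base class and may not be available in every 1.x release; topic and broker address are placeholders.

```java
import java.util.Properties;

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder

FlinkKafkaProducer09<String> producer =
        new FlinkKafkaProducer09<>("output-topic", new SimpleStringSchema(), props);

// On each checkpoint the producer flushes and waits until all pending records are
// acknowledged by the broker, so a completed checkpoint implies the data is in Kafka
// (at-least-once; duplicates remain possible after a restore).
producer.setFlushOnCheckpoint(true);
// Fail the job on send errors instead of only logging them; otherwise records can be lost.
producer.setLogFailuresOnly(false);

// Attach to any DataStream<String>:
// stream.addSink(producer);
```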

32 Apache Bahir and the future of connectors: What's next

33 Future of Connectors in Flink
- Kafka 0.10 support, with timestamps
- Dynamic scaling support for Kafka and other connectors
- Refactoring of the Kafka connector API

34 Apache Bahir™
- Bahir is a community specialized in connectors, allowing faster releases independent of engine releases.
- Apache Bahir™ was created to give community-contributed connectors a platform that follows Apache governance.
- The Flink community decided to move some of our connectors there. Kafka, Kinesis, streaming files, ... will stay in Flink!
- Flink connectors in Bahir: ActiveMQ, Redis, Flume sink, RethinkDB (incoming), streaming HBase (incoming).
- New connector contributions are welcome!
Disclaimer: The description of the Bahir community is my personal view. I am not a representative of the project.

35 Time for questions

36 Connectors in Apache Flink
- Ask me now!
- Follow me on Twitter: @rmetzger_
- Ask the Flink community on user@flink.apache.org
- Ask me privately at rmetzger@apache.org

37 Exactly-once for Message Queues

38 Message Queues supported by Flink
- Traditional message queues have different semantics than Kafka, Kinesis, etc.
- RabbitMQ: Advanced Message Queuing Protocol (AMQP), available in Apache Flink
- ActiveMQ: Java Message Service (JMS), available in Apache Bahir (no release yet)
Image source: http://www.instructables.com/id/Spark-Core-Photon-and-CloudMQTT/step1/What-is-Message-Queuing/
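A sketch of attaching the RabbitMQ source, assuming the RMQConnectionConfig builder from the Flink 1.1-era connector; host, credentials, and queue name are placeholders.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000);  // checkpointing is needed for exactly-once from RabbitMQ

RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
        .setHost("localhost")        // placeholder connection settings
        .setPort(5672)
        .setUserName("guest")
        .setPassword("guest")
        .setVirtualHost("/")
        .build();

// usesCorrelationId = true: the source de-duplicates replayed messages by their
// correlation id, which together with checkpointing gives exactly-once.
DataStream<String> messages = env.addSource(
        new RMQSource<>(connectionConfig, "my-queue", true, new SimpleStringSchema()));
```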

39 Message Queue Semantics
(Diagram: the Flink Kafka consumer tracks an offset into a retained log, whereas the Flink RabbitMQ source consumes messages that are then gone from the queue.)
- In message queues, messages are removed once they are consumed
- Replay is therefore not possible

40 Message Acknowledging
Once a checkpoint has been completed by all operators, the messages covered by that checkpoint are acknowledged, which removes them from the queue.
(Diagram: the source buffers ids 1-3 under checkpoint 1 and ids 4-6 under checkpoint 2, both unconfirmed; when checkpoint 1 completes, the source sends ACKs for ids 1-3 and marks checkpoint 1 as confirmed.)

41 Message Acknowledging
In case of a failure, all unacknowledged messages are consumed again.
(Diagram: after a system failure, the queue still holds ids 1-8 because neither checkpoint was confirmed; the messages are not lost and are delivered again after recovery.)

42 Message Acknowledging
What happens if the system fails after a checkpoint is completed, but before all messages have been acknowledged?
(Diagram: checkpoint 1 completes and its ACKs for ids 1-3 are being sent when the failure hits.)
Flink stores a correlation ID for each unacknowledged message in order to de-duplicate on restore.
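To make the mechanism concrete, here is an illustrative sketch of the pattern (not Flink's actual source classes): ids consumed since the last checkpoint are buffered, attached to the checkpoint when it is triggered, and only acknowledged to the queue once Flink's CheckpointListener callback confirms that checkpoint as complete. Apart from notifyCheckpointComplete, the method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.runtime.state.CheckpointListener;

public class AckingSketch implements CheckpointListener {

    // ids read from the queue since the last snapshot
    private final List<String> idsSinceLastCheckpoint = new ArrayList<>();
    // checkpointId -> ids that become safe to acknowledge once that checkpoint completes
    private final Map<Long, List<String>> pendingAcks = new HashMap<>();

    /** Called for every message taken from the queue. */
    public void onMessage(String correlationId) {
        idsSinceLastCheckpoint.add(correlationId);
    }

    /** Called when a checkpoint is triggered at the source. */
    public void onCheckpointTriggered(long checkpointId) {
        pendingAcks.put(checkpointId, new ArrayList<>(idsSinceLastCheckpoint));
        idsSinceLastCheckpoint.clear();
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        List<String> ids = pendingAcks.remove(checkpointId);
        if (ids != null) {
            for (String id : ids) {
                // acknowledge `id` to the message queue here,
                // e.g. via the queue client's basicAck-style call
            }
        }
    }
}
```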

