
1 Data Freeway: Scaling Out to Realtime
Authors: Eric Hwang, Sam Rash {ehwang,rash}@fb.com
Speaker: Haiping Wang ctqlwhp1022@gamil.com

2 Agenda
» Data at Facebook
» Realtime Requirements
» Data Freeway System Overview
» Realtime Components
  - Calligraphus/Scribe
  - HDFS use case and modifications
  - Calligraphus: a ZooKeeper use case
  - ptail
  - Puma
» Future Work

3 Big Data, Big Applications / Data at Facebook
» Lots of data
  - More than 500 million active users
  - 50 million users update their statuses at least once each day
  - More than 1 billion photos uploaded each month
  - More than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) shared each week
  - Data rate: over 7 GB/second
» Numerous products can leverage the data
  - Revenue related: Ads Targeting
  - Product/User Growth related: AYML, PYMK, etc.
  - Engineering/Operation related: Automatic Debugging
  - Puma: streaming queries

4 Example: User-related Application
» Major challenges: scalability, latency

5 Realtime Requirements
» Scalability: 10-15 GBytes/second
» Reliability: no single point of failure
» Data loss SLA: 0.01%
  - Loss due to hardware: at most 1 machine in 10,000 may lose data
» Delay of less than 10 seconds for 99% of data
  - Typically we see ~2 s
» Easy to use: as simple as tail -f /var/log/my-log-file

6 Data Freeway System Diagram
» Scribe & Calligraphus get data into the system
» HDFS at the core
» ptail provides data out
» Puma is an emerging streaming analytics platform

7 Scribe
» Scalable distributed logging framework
» Very easy to use: scribe_log(string category, string message) (client sketch below)
» Mechanics
  - Built on top of Thrift
  - Runs on every machine at Facebook, collecting log data into a set of destinations
  - Buffers data on local disk if the network is down
» History
  - 2007: started at Facebook
  - Oct 2008: open-sourced
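
A minimal Java sketch of the client side, assuming LogEntry and scribe.Client are the classes the Thrift compiler generates from scribe.thrift; the host and port (1463 is the conventional Scribe port) are illustrative.

```java
import java.util.Collections;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ScribeLogExample {
    public static void main(String[] args) throws Exception {
        // Scribe daemons speak framed binary Thrift.
        TTransport transport = new TFramedTransport(new TSocket("localhost", 1463));
        transport.open();
        scribe.Client client = new scribe.Client(new TBinaryProtocol(transport));

        // The whole API surface: a category naming the stream, and a message.
        LogEntry entry = new LogEntry("my_category", "hello from an app server");
        client.Log(Collections.singletonList(entry));

        transport.close();
    }
}
```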

8 Calligraphus
» What
  - Scribe-compatible server written in Java
  - Emphasis on a modular, testable code base and on performance
» Why
  - Extract a simpler design from the existing Scribe architecture
  - Cleaner integration with the Hadoop ecosystem: HDFS, ZooKeeper, HBase, Hive
» History
  - In production since November 2010
  - ZooKeeper integration since March 2011

9 HDFS: a different use case
» Message hub
  - Add concurrent reader support and sync
  - Writers + concurrent readers: a form of pub/sub model

10 HDFS: add Sync
» Sync
  - Implemented in 0.20 (HDFS-200); writer sketch below
  - Partial chunks are flushed
  - Blocks are persisted
  - Provides durability
  - Lowers write-to-read latency
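
A writer-side sketch of how sync is used, assuming the Hadoop 0.20 append branch where FSDataOutputStream.sync() (HDFS-200) is available; the path is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncingWriter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/staging/my_category/current"));
        for (int i = 0; i < 1000; i++) {
            out.write(("event " + i + "\n").getBytes("UTF-8"));
            // sync() flushes the partial chunk and persists block state, so a
            // concurrent reader can see these bytes within seconds.
            out.sync();
        }
        out.close();
    }
}
```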

11 HDFS: Concurrent Reads Overview
» Without changes, stock Hadoop 0.20 does not allow reading the block being written
» Realtime apps need to read the block being written in order to achieve < 10 s latency

12 HDFS: Concurrent Reads Implementation
1. DFSClient asks the NameNode for the file's blocks and locations
2. DFSClient asks a DataNode for the length of the block being written
3. DFSClient opens the last block for reading (sketch below)
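
A tailing-reader sketch of the three steps above; it assumes the modified 0.20 client, which re-fetches block locations and the in-progress block length on each open. The path and poll interval are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTailer {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/staging/my_category/current");
        long offset = 0;
        byte[] buf = new byte[64 * 1024];
        while (true) {
            // Re-open so the client repeats steps 1-2: fresh block locations
            // and the current length of the block being written.
            FSDataInputStream in = fs.open(file);
            in.seek(offset);
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);  // hand new bytes downstream
                offset += n;
            }
            in.close();
            Thread.sleep(1000);               // poll for more data
        }
    }
}
```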

13 Calligraphus: Log Writer
[Diagram: Scribe categories (Category 1-3) arriving at a Calligraphus server, destined for HDFS]
» How to persist to HDFS?

14 Calligraphus (Simple)
[Diagram: every Calligraphus server writes each category it receives straight to HDFS]
» Total number of directories = number of categories x number of servers

15 Calligraphus (Stream Consolidation)
[Diagram: Scribe categories flow into a Router tier, which forwards each stream to the Writer that owns it; Routers and Writers coordinate through ZooKeeper]
» Total number of directories = number of categories (routing sketch below)
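
A hedged sketch of the router-side step: pick a bucket for a message and forward it to the writer that currently owns that bucket in the ZooKeeper-backed map. The bucketing scheme, class, and method names are assumptions for illustration, not Calligraphus's actual code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CategoryRouter {
    // (category + "/" + bucket) -> owning writer address; this reader-side
    // cache is refreshed from ZooKeeper (next slide).
    private final Map<String, String> ownerCache = new ConcurrentHashMap<>();
    private final int numBuckets = 5;  // matches the 5 buckets per category in the diagram

    String writerFor(String category, byte[] message) {
        // Spread a category's traffic across its buckets; all traffic for one
        // bucket reaches a single writer, so each category needs only one
        // HDFS directory instead of one per category per server.
        int bucket = Math.floorMod(Arrays.hashCode(message), numBuckets);
        return ownerCache.get(category + "/" + bucket);
    }
}
```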

16 ZooKeeper: Distributed Map
» Design
  - ZooKeeper paths as tasks (e.g. /root/<category>/<bucket>)
  - Canonical ZooKeeper leader elections under each bucket for bucket ownership (sketch below)
  - Independent load management: leaders can release tasks
  - Reader-side caches
  - Frequent sync with the policy db
[Diagram: a Root node over categories A-D, each divided into buckets 1-5]
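
A sketch of the canonical ZooKeeper leader-election recipe applied to one bucket; the path layout mirrors the diagram above, but the znode names are assumptions.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BucketElection {
    // Returns true if this server now owns the bucket (e.g. /root/A/3).
    public static boolean tryOwnBucket(ZooKeeper zk, String bucketPath, String serverId)
            throws Exception {
        // Each candidate writer creates an ephemeral sequential znode under
        // the bucket; the znode vanishes with the server's session, which is
        // what makes failover quick.
        String me = zk.create(bucketPath + "/candidate-", serverId.getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate with the lowest sequence number is the leader.
        // (A full recipe would also watch the next-lowest znode so a
        // successor takes over the moment the leader fails.)
        List<String> children = zk.getChildren(bucketPath, false);
        Collections.sort(children);
        return me.endsWith(children.get(0));
    }
}
```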

17 ZooKeeper: Distributed Map
» Realtime properties
  - Highly available
  - No centralized control
  - Fast mapping lookups
  - Quick failover on writer failures
  - Adapts to new categories and changing throughput

18 Distributed Map: Performance Summary
» Bootstrap (~3,000 categories)
  - Full election participation in 30 seconds
  - All election winners identified in 5-10 seconds
  - Stable mapping converges in about three minutes
» Election or failure response usually < 1 second
  - Worst case bounded in the tens of seconds

19 Canonical Realtime ptail Application
» Hides the fact that there are many HDFS instances: the user specifies a category and gets a single stream
» Checkpointing (sketch below)
» Feeds Puma
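
A minimal sketch of the checkpointing idea, assuming a byte-oriented stream; the checkpoint file format here is an invention for illustration. On restart, a consumer loads the saved offset and seeks there before reading, so events are neither dropped nor double-counted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CheckpointedTail {
    // A checkpoint is just (file, byte offset): enough to resume exactly
    // where a crashed consumer left off, keeping downstream counts accurate.
    static void saveCheckpoint(Path ckpt, String file, long offset) throws IOException {
        Files.write(ckpt, (file + "\t" + offset).getBytes(StandardCharsets.UTF_8));
    }

    static long loadOffset(Path ckpt) throws IOException {
        if (!Files.exists(ckpt)) return 0;
        String[] parts = new String(Files.readAllBytes(ckpt),
                StandardCharsets.UTF_8).split("\t");
        return Long.parseLong(parts[1]);
    }
}
```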

20 Puma Overview
» Realtime analytics platform
» Metrics: count, sum, unique count, average, percentile
» Uses ptail checkpointing for accurate calculations in the case of failure
» Puma nodes are sharded by keys in the input stream
» HBase for persistence (sketch below)
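
A hedged sketch of how a Puma-style node might persist a count metric to HBase with the classic client API; the table name, row-key scheme, and column names are assumptions, not Puma's actual schema.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "puma_metrics");

        // Row key = (shard key, time bucket); atomic server-side increments
        // keep counts correct even with many events per key.
        byte[] row = Bytes.toBytes("example.com:2011-06-01-12:00");
        table.incrementColumnValue(row, Bytes.toBytes("m"),
                Bytes.toBytes("count"), 1L);
        table.close();
    }
}
```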

21 Puma Write Path

22 Puma Read Path
» Performance
  - Elapsed time typically 200-300 ms for 30-day queries
  - 99th percentile, cross-country: < 500 ms for 30-day queries

23 Future Work
» Puma
  - Enhance functionality: add application-level transactions on HBase
  - Streaming SQL interface
» Compression

