SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.


1 SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D

2 Index
01 SAMSON Platform
02 File-System
03 Streaming MapReduce
04 Eco-system
05 Architecture

3 SAMSON Platform 01

4 Overview
- SAMSON is a distributed processing engine designed for efficient analytics over data streams.
- Internal distributed file-system optimized for shared data processing.
- Provides an extension of the MapReduce framework, with more efficient MapReduce joins.
- Streaming MapReduce allows incremental processing of data feeds.
- Uses existing big-data storage solutions such as Apache HDFS or MongoDB for fetching and storing data.
- Built to be deployed on Ubuntu, Red Hat Linux and in virtual machines.

5 Samson 0.6 DEB · Samson 0.6 RPM · User guide · Available for VM

6 SAMSON Platform Architecture

7 Key Platform Components
- File-system
- Streaming MapReduce
- Eco-system
- Architecture

8 File-system 02

9 Distributed big-data platform for high-performance processing over unbounded streams of data
[Diagram: cores and hard disks (HD); SAMSON distributed file-system; MapReduce for streamed processing]

10 SAMSON distributed file-system

Apache Hadoop HDFS              | SAMSON
--------------------------------+-----------------------------------
Data is uniformly distributed   | Binary data is distributed by key
Large heterogeneous clusters    | Homogeneous cluster
Non-uniform datasets            | Uniform datasets
Maximum resource usage          | Efficient accumulation operations
                                | Efficient JOIN operations
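The "distributed by key" idea can be illustrated with a small sketch. It assumes a plain hash partitioner; `Record`, `worker_for_key` and `partition_by_key` are hypothetical names, and the slides do not say how SAMSON actually assigns keys to workers. The only point is that every record with the same key lands on the same worker, which is what makes accumulation and JOIN operations local.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical key-value record (not a SAMSON type).
struct Record {
    std::uint64_t key;
    std::string value;
};

// Assumption: a simple hash partitioner decides which worker owns a key.
std::size_t worker_for_key(std::uint64_t key, std::size_t num_workers) {
    assert(num_workers > 0);
    return std::hash<std::uint64_t>{}(key) % num_workers;
}

// Group records into one bucket per worker. Records that share a key
// always end up in the same bucket, whatever dataset they come from.
std::vector<std::vector<Record>> partition_by_key(
        const std::vector<Record>& records, std::size_t num_workers) {
    std::vector<std::vector<Record>> buckets(num_workers);
    for (const Record& r : records) {
        buckets[worker_for_key(r.key, num_workers)].push_back(r);
    }
    return buckets;
}
```

Partitioning two datasets with the same function places matching keys on the same worker, so a join between them needs no further data movement in this sketch.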

11 SAMSON distributed file-system
We periodically receive a set of documents. We want to compute the accumulated word-count each time we receive an update.
[Diagram: first, second and third inputs. Labels: MapReduce with redistribution (6, 12, 18); Reduce (6, 6, 6)]
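The accumulated word-count can be sketched as folding each new batch into stored state, so that earlier inputs are never read again. This is a single-process illustration under that assumption only; `WordCounts` and `accumulate_batch` are hypothetical names, not part of SAMSON.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Accumulated state: word -> number of occurrences seen so far.
using WordCounts = std::map<std::string, long>;

// Fold one new batch of documents into the accumulated state.
// Only the new batch is scanned; previous batches are not reprocessed.
void accumulate_batch(WordCounts& state,
                      const std::vector<std::string>& new_documents) {
    for (const std::string& document : new_documents) {
        std::istringstream words(document);
        std::string word;
        while (words >> word) {
            ++state[word];
        }
    }
}
```

Calling `accumulate_batch` once per update keeps the work proportional to the size of the update rather than to everything received so far.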

12 Streaming MapReduce 03

13 [Diagram: SAMSON and DELILAH clients. Labels: 4 Gb; Upload; Download; Run operations]

14 SAMSON and DELILAH
- Upload new operations & data types
- Run operations
- Open API for 3rd party developers

15 SAMSON

16 Operation: Map [Diagram: Map operation in SAMSON]

17 Operation: Reduce [Diagram: Reduce operation in SAMSON with State, Input and Output]

18 Eco-system 04

19 Top level view…
delilah: console-based client
- Upload data
- Download data
- Run commands
- Platform monitor
samsonPush, samsonPop: binaries to stream data into and out of SAMSON
samsonClient: C++ library to develop new plugins (examples: samsonPush, samsonPop)
module: 3rd party C++ shared library
- New data types
- New operations
Tools provided for simplified development!!

20 Delilah client

21 SAMSON Module example…

Module description file:

    Module simple_mobility
    {
        title   "Simple mobility example"
        author  "Andreu Urruela"
        version "0.1.1"
    }

    data UserArea
    {
        system.String name;
        system.UInt x;
        system.UInt y;
        system.UInt radius;
    }

    data Position
    {
        system.UInt x;
        system.UInt y;
        system.TimeUnix time;
    }

    …

    parser parser_cdrs
    {
        out system.UInt simple_mobility.Position
        helpLine "Parse input CDRs to get user-position"
    }

Parser implementation (C++):

    class parser_cdrs : public samson::system::SimpleParser
    {
        std::vector<char*> words; // Words parsed from the current line

        // key and value are used below as on the slide;
        // their declarations are not shown there.

        void parseLine( char * line, samson::KVWriter *writer )
        {
            // Split the line into words
            split_in_words( line, words );

            // Expected format: USER_ID CDR X Y time
            if( words.size() < 5 )
                return; // Not enough fields for a valid record

            if( strcmp( words[1], "CDR" ) != 0 )
                return; // Invalid format

            // Set the key (user id)
            key.value = atoll( words[0] );

            // Set the position (x, y, time)
            value.set( atoll( words[2] ), atoll( words[3] ), atoll( words[4] ) );

            // Emit the key-value pair on output 0
            writer->emit( 0, &key, &value );
        }
    };
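The parsing rule in the example above (USER_ID CDR X Y time) can be tried outside SAMSON with a standalone sketch. It does not use the SAMSON headers: `Position` here is a plain struct, `parse_cdr_line` is a hypothetical free function, and splitting on whitespace stands in for `split_in_words`, whose exact behaviour is not shown on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Plain stand-in for the simple_mobility.Position data type.
struct Position {
    std::uint64_t x;
    std::uint64_t y;
    std::uint64_t time;
};

// Parse one line of the form "USER_ID CDR X Y time".
// Returns false (and leaves the outputs untouched) for any other line.
bool parse_cdr_line(const std::string& line,
                    std::uint64_t& user_id, Position& position) {
    // Assumption: fields are separated by whitespace.
    std::istringstream stream(line);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back(word);
    }

    if (words.size() < 5) return false;   // not enough fields
    if (words[1] != "CDR") return false;  // not a CDR record

    // Like atoll on the slide, strtoull yields 0 for non-numeric text.
    user_id = std::strtoull(words[0].c_str(), nullptr, 10);
    position.x = std::strtoull(words[2].c_str(), nullptr, 10);
    position.y = std::strtoull(words[3].c_str(), nullptr, 10);
    position.time = std::strtoull(words[4].c_str(), nullptr, 10);
    return true;
}
```

In the real module the parsed pair would be handed to the writer instead of being returned through output parameters.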

22 SAMSON ecosystem tools

Tool                   | Description
-----------------------+--------------------------------------------------------
samsonModuleBootstrap  | Create necessary files to start developing a new module
samsonModuleParser     | Create .h and .cpp files from a description file
samsonCat              | Visualize binary content of SAMSON
samsonSetup            | Edit setup of a SAMSON system
samsonStarter          | Set up a new SAMSON cluster
samsonData             | Show a transactional log for debugging
samsonPush             | Push data from STDIN to a particular queue
samsonPop              | Pop data from a particular queue to STDOUT
samsonLocal            | Emulate a local cluster for fast development of modules

23 File system · Stream MapReduce · Ecosystem · Architecture · Demo

24 Architecture 05

25 Communication Protocols… (Goal / Solution / Why?)
[Diagram: delilah]
Goal: Platform messages
  Why: flexibility; backward compatibility
Goal: Data serialization
  Solution: proprietary serialization format
  Why: maximum data compression (no field separator); best for fast sequential processing; easy job distribution
Goal: Monitoring
  Why: no recompilation needed; best tools for querying (XPath)
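The "no field separator" point can be illustrated with a generic binary encoding. SAMSON's serialization format is proprietary and not described on the slide, so the base-128 varint scheme and the names `write_varint` and `read_varint` below are assumptions chosen for illustration, not SAMSON's format. Each value carries its own length information, so values can be written back to back and read sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value as a base-128 varint: 7 payload bits per byte,
// high bit set when more bytes follow. Small values take fewer bytes.
void write_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Read one varint starting at pos and advance pos past it.
// No separator is needed: the high bit marks where each value ends.
std::uint64_t read_varint(const std::vector<std::uint8_t>& in,
                          std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        assert(pos < in.size());
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}
```

A record such as (user id, x, y, time) would then be four consecutive varints, read back in the same order they were written.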

26 Worker
[Diagram: Engine library (independent development) containing the Disk Manager, Network Manager, Memory Manager, Process Manager (cores), and the runtime engine and notification system]

27 Engine library
Disk Manager / Network Manager
- Controller to access local disk and network connections
- Asynchronous notifications using the engine notification system
- Multiple threads are used if required
Memory Manager (our retain-release model, similar to Objective-C)
- Simple system to control memory usage
- Used to optimize memory allocation under heavy load
Process Manager (similar to Apple's Grand Central Dispatch library)
- System to control independent "heavy" tasks to be executed
- Automatic creation / destruction of threads
- Optional "fork" mode with a shared-memory system to get output
Runtime Engine & Notification system
- Inspired by the message-passing system implemented in Objective-C
- Single loop to run all state-update operations
- Thread protection to interact with the Disk/Network/Memory/Process Managers
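The retain-release model mentioned for the Memory Manager can be sketched as follows. This is a minimal single-threaded illustration, assuming a fixed memory limit; the class names `MemoryManager` and `Buffer` and their methods are hypothetical and do not reproduce the engine library's API.

```cpp
#include <cassert>
#include <cstddef>

class Buffer;

// Hypothetical manager that only tracks how many bytes are in use.
class MemoryManager {
public:
    explicit MemoryManager(std::size_t limit) : limit_(limit), used_(0) {}
    bool can_allocate(std::size_t n) const { return used_ + n <= limit_; }
    std::size_t used() const { return used_; }
private:
    friend class Buffer;
    std::size_t limit_;
    std::size_t used_;
};

// Reference-counted buffer: created with a count of 1, freed when the
// count drops to 0, at which point the manager gets its bytes back.
class Buffer {
public:
    // Returns nullptr when the request would exceed the memory limit.
    static Buffer* create(MemoryManager& manager, std::size_t size) {
        if (!manager.can_allocate(size)) return nullptr;
        manager.used_ += size;
        return new Buffer(manager, size);
    }
    void retain() { ++count_; }
    void release() {
        assert(count_ > 0);
        if (--count_ == 0) {
            manager_.used_ -= size_;
            delete this;
        }
    }
private:
    Buffer(MemoryManager& manager, std::size_t size)
        : manager_(manager), size_(size), count_(1), data_(new char[size]) {}
    ~Buffer() { delete[] data_; }
    MemoryManager& manager_;
    std::size_t size_;
    int count_;
    char* data_;
};
```

Every component that keeps a pointer to a buffer calls `retain()`, and calls `release()` when done; the memory is returned only after the last holder releases it. A multi-threaded engine would need an atomic counter, which this sketch leaves out.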

28 Worker
[Diagram: multi-core Worker with a Block Manager (disk-memory balancer) on top of the Engine library (independent development): Disk Manager, Network Manager, Memory Manager, Process Manager (cores), runtime engine and notification system]

29 Block Manager
- Maintains a reference to all blocks of data contained in a Worker (on disk or in memory)
- Keeps a list sorted by when the blocks will be used (future operations)
- Low-priority blocks are flushed to disk first
- High-priority blocks are loaded from disk first
- Connected to the Disk Manager inside the engine through the EngineNotificationSystem, which it uses to schedule read and write operations
Important: since the order of blocks changes continuously as new processing operations are scheduled, the Block Manager is made aware of the new order and is able to react accordingly.
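The flush and load rules of the Block Manager can be sketched as two small functions. The sketch assumes that priority is expressed as a `next_use` position in the schedule and that there is a fixed memory budget; `Block`, `blocks_to_flush` and `next_block_to_load` are hypothetical names, and the real Block Manager's data structures are not shown on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block descriptor.
struct Block {
    int id;            // assumed non-negative
    std::size_t size;  // bytes
    int next_use;      // smaller = needed sooner = higher priority
    bool in_memory;
};

// Ids of in-memory blocks to write to disk, lowest priority first,
// until the in-memory total fits within memory_budget.
std::vector<int> blocks_to_flush(std::vector<Block> blocks,
                                 std::size_t memory_budget) {
    std::size_t in_memory = 0;
    for (const Block& b : blocks) {
        if (b.in_memory) in_memory += b.size;
    }
    // Blocks needed furthest in the future come first.
    std::sort(blocks.begin(), blocks.end(),
              [](const Block& lhs, const Block& rhs) {
                  return lhs.next_use > rhs.next_use;
              });
    std::vector<int> to_flush;
    for (const Block& b : blocks) {
        if (in_memory <= memory_budget) break;
        if (!b.in_memory) continue;
        to_flush.push_back(b.id);
        in_memory -= b.size;
    }
    return to_flush;
}

// Id of the on-disk block needed soonest, or -1 if none is on disk.
int next_block_to_load(const std::vector<Block>& blocks) {
    int best_id = -1;
    int best_use = 0;
    for (const Block& b : blocks) {
        if (b.in_memory) continue;
        if (best_id == -1 || b.next_use < best_use) {
            best_id = b.id;
            best_use = b.next_use;
        }
    }
    return best_id;
}
```

Because both functions recompute the order from the current `next_use` values, calling them again after new operations are scheduled reflects the changed priorities.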

30 Worker
[Diagram: multi-core Worker. Queues Manager (txt_cdrs, cdrs, users) and Stream Operations Manager (input data; Operation A, Operation B, Operation C, ordered by priority) on top of the Block Manager (disk-memory balancer) and the Engine library (independent development): Disk Manager, Network Manager, Memory Manager, Process Manager (cores), runtime engine and notification system]

31 Queue & Stream Operations Manager
- Contain references to all the blocks held in queues and stream operations
- Both systems are connected to the Block Manager to inform it about the priority of blocks
- The Stream Operations Manager is connected to the Process Manager to schedule 3rd party operations in the engine subsystem
[Diagram: Queues Manager (txt_cdrs, cdrs, users); Stream Operations Manager (Operation A, Operation B, Operation C, ordered by priority)]
