System S lead: Nagui Halim


1 System S lead: Nagui Halim
CLASP: Collaborating, Autonomous Stream Processing Systems
Including a Brief Survey of “System S”, an Approach to Scale Out for Stream Computing
Mike Branson, Fred Douglis, Brad Fawcett, Zhen Liu, Anton Riabov, Fan Ye
System S lead: Nagui Halim
Note that several of the slides in this talk are hidden. They might be used in a longer version of the talk.
9/20/2018

2 The Event-Driven World
A montage of applications in the event-driven space: IPv6 map; RFID-enabled checkout; RFID-enabled scanning; VoIP, VoD, IMS, etc.; surveillance; intelligent oil field; billing; security; fraud alerts; fraud/compliance; retrospective processing ("ENRON Chairman indicted..."); location-based service (traffic).
Emerging business needs in certain markets (program trading, fraud management, compliance, intelligent oil field, location-based service, logistics, presence (SIP)) and solution trends towards service-oriented architectures are driving the need to process intelligent data on a just-in-time basis.
Overview CLASP 9/20/2018

3 Latest Example Evaluation CLASP 9/20/2018

4 Stream Computing, Conceptual

5 Stream Computing, Conceptual
[Figure: example flow graph built from the PEs SGD, PCP, SAO, MU, SPE, SPA, VBF, JAE, DSN, DNV, PSD, SPM, and DFE, with outputs labeled "… to web server 1", "… to web server 2", and "… to GSM player", and the annotation "Listen to conversation (A,B)"]

6 Multi-site Flow Graphs
CLASP extends stream computing to multiple autonomous processing sites:
- Increased scalability
- Resource reuse
- Increased availability
- Access to site-specific resources
[Figure: flow graph spanning Site A, Site B, and Site C]

7 Outline
- System S overview
- CLASP overview
- Virtual Organizations
- Distributed execution
- Failover
- Performance evaluation
- Related Work
- Conclusions

8 Progressive Information Extraction
[Figure: levels of information extraction arranged along a "Processing complexity" axis]
- Packet fields (energy, pitch): speech/non-speech detection
- Compressed-domain speech (cepstrum; cepstrum f(.)): speaker detection, speaker verification
- Raw-domain speech (cepstrum; raw cepstrum): speech to text
- Volumetric (packet size, packet inter-arrival rate): activity detection, conversational pairing

9 Three PE Roles
Source, Transform, and Sink: properties related to position in the application's topology.

Source PE
- Transforms source data to one or more output streams of SDOs.
- The process method in the PE class executes continuously, reading source data and writing SDOs.
- No input ports; one or more output ports.
- [Diagram: source data (application-specific protocol) enters a PE Container holding the PE Class, process(handle, payload) { }, which emits SDOs]

Transform PE
- Consumes one or more streams of SDOs, and generates or forwards the SDOs on one or more output streams.
- An SDO handler is implemented to process inbound SDOs received on one or more ports.
- One or more input ports; one or more output ports.
- [Diagram: PE Container holding the SDO Handler, receive(handle, SDO) { }]

Sink PE
- Given one or more streams of inbound data, generates / saves findings from the stream application.
- One or more input ports; no output ports.
- [Diagram: PE Container holding the SDO Handler, receive(handle, SDO) { }, writing to a Knowledge Base]
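The three roles can be illustrated with a minimal Python sketch. This is not the System S API: the class names, the dict used as an SDO, the `emit` callback, and the way `handle` is passed as a plain port name are all assumptions made for illustration. Only the `process`/`receive` method names and the port rules come from the slide.

```python
from typing import Callable, Dict, Iterable, List

SDO = Dict[str, object]  # hypothetical stand-in for a Stream Data Object
Emit = Callable[[str, SDO], None]  # (port handle, SDO) -> None


class SourcePE:
    """Source role: no input ports, one or more output ports."""

    def __init__(self, emit: Emit) -> None:
        self.emit = emit

    def process(self, handle: str, payloads: Iterable[str]) -> None:
        # Read raw source data and write one SDO per payload.
        for seq, payload in enumerate(payloads):
            self.emit(handle, {"seq": seq, "payload": payload})


class TransformPE:
    """Transform role: consumes SDOs, generates or forwards SDOs."""

    def __init__(self, emit: Emit) -> None:
        self.emit = emit

    def receive(self, handle: str, sdo: SDO) -> None:
        # Example transformation (assumed): add the payload length.
        out = dict(sdo)
        out["length"] = len(str(sdo["payload"]))
        self.emit(handle, out)


class SinkPE:
    """Sink role: one or more input ports, no output ports."""

    def __init__(self) -> None:
        self.findings: List[SDO] = []

    def receive(self, handle: str, sdo: SDO) -> None:
        # Save findings (the slide's diagram shows a Knowledge Base here).
        self.findings.append(sdo)


def run_pipeline(payloads: Iterable[str]) -> List[SDO]:
    """Wire Source -> Transform -> Sink and push the payloads through."""
    sink = SinkPE()
    transform = TransformPE(emit=sink.receive)
    source = SourcePE(emit=transform.receive)
    source.process("port0", payloads)
    return sink.findings
```

Calling `run_pipeline(["hello", "hi"])` pushes two SDOs through the chain. Direct method calls stand in for the stream transport that a real runtime would provide.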

10 Transform PEs
Classifiers, Annotators, Filters, Aggregators, etc.: properties related to the PE's function. There are many types of PEs in the transform role, and the types will evolve.

Classifier
- Classifies and routes SDO to output port (e.g. application protocol: chat, etc.)
- One input port, one or more output ports
- Input SDO forwarded to output port

Association
- Determine logical proximity, COI
- One or more input ports, one output port
- Format and content of outbound SDOs differs from inbound SDO
- Frequency of outbound SDOs typically less than inbound

Annotator
- Adds one or more attributes to SDO (e.g. identifies subject of document)
- One input port, one output port
- Input SDO forwarded to output port

Segmenter
- Segments inbound SDO into one or more outbound SDOs
- One input port, one output port
- Format and content of outbound SDO differs from inbound SDO

Filter
- Forwards only SDOs that match specified properties (e.g. forwards only SDOs where the topic is specific to coffee)
- One input port, one output port
- Input SDO (conditionally) forwarded to output port

Generic
- An unspecified transform. A generic type is used whenever a predefined type will not suffice.
- One or more input ports, and one or more output ports
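A few of these transform types can be sketched as plain Python functions over dict-shaped SDOs. The function names, the `"protocol"`, `"topic"`, and `"text"` keys, and the `"other"` fallback port are hypothetical; the slide only specifies the behavior and port counts.

```python
from typing import Callable, Dict, List, Set

SDO = Dict[str, object]  # hypothetical stand-in for a Stream Data Object


def classify(sdo: SDO, ports: Set[str]) -> str:
    """Classifier: pick the output port that matches the SDO's protocol."""
    protocol = sdo.get("protocol")
    return protocol if protocol in ports else "other"


def annotate(sdo: SDO, key: str, value: object) -> SDO:
    """Annotator: forward the SDO with one extra attribute."""
    out = dict(sdo)
    out[key] = value
    return out


def filter_sdos(sdos: List[SDO], predicate: Callable[[SDO], bool]) -> List[SDO]:
    """Filter: forward only the SDOs that match the predicate."""
    return [s for s in sdos if predicate(s)]


def segment(sdo: SDO, size: int) -> List[SDO]:
    """Segmenter: split one inbound SDO into one or more outbound SDOs."""
    text = str(sdo["text"])
    return [
        {**sdo, "text": text[i:i + size], "segment": i // size}
        for i in range(0, len(text), size)
    ]
```

For instance, annotating each SDO with a topic and then filtering on `topic == "coffee"` mirrors the slide's Annotator and Filter examples.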

11 User Space, Inquiry Manager, Inquiry Services, Job Manager
User Space
- Users need to submit inquiries (<Inquiry>) and eventually examine results.
- How do flow graphs get built? How are they rendered into JDL? Not by users: their time should be spent on their real jobs.

Inquiry Services (Inquiry Manager, Semantic Planner, Plan Solver, Renderer)
- The Inquiry Services subsystem automatically renders inquiries into secure and privacy-preserving, deployable Jobs (<Job>).

Job Management (Job Manager, Graph Manager, Optimizing Scheduler (SODA), Resource Discovery, Monitoring, Collection)
- The Job Management subsystem accepts an inquiry job and deploys a flow graph.
- A deployable set of stream-connected sources and analytics must be assembled and deployed across processing nodes.

Stream Processing Core

12 Our Goal: Automatic Generation of Stream Processing Graphs
- Ease of use
- Shorter time to deployment
- Reusable software components

PE legend:
- SPA: Streaming Protocol Annotator
- VBF: Value-Based Filter
- SCF: Stream Courtesy Finder
- JoinCC: Join Cust & Emp to Courtesy Level
- IPN: Identified Person Notification
- SGD: Speech Gait Detection
- PCP: Progressive Conversation Pairing
- JAE: Join Asynch Events
- DSN: Denoise via Social Network
- DUPP: Dig Up Person Pair Info
- N(CE)^S: Employee Selector
- SPM: Stream Progression Monitor
- DFE: Distributed Front-End
- PSD: Progressive Speaker Detection

[Figure: an Analyst submits an inquiry through the INQ GUI to the Inquiry Compiler, which generates a stream processing graph. Other labels in the graph: Data Source, Source PE, WLG, NES, "Mark", DUP, PFA, JoinL, JoinSC, SLF, SSF, ES, N(CE)S^2]

13 Reference Application & Demo Overview
Reference Application Background
- Global Services company manages business operations using the company's VoIP communications network
- Monitor employee performance and customer satisfaction
- Uses VoIP calls as source data

Demo Screens: Inquiry UI Screen, Result Screen, StreamSight Screen
Behind the Scenes: Inquiry Compiler, SPC, Runtime, Job Management
Data labels: VoIP Packets, IP Address, SSID, GSM Audio, Session Information, Speaker Identification, Speaker Location, Employee Info, Joined Employee Info & Location, Location, WLG, Anton

Notes: Show what is believed to be the current p3 plan (some aspects are evolving and subject to change, but this is a reasonable view). This talk will focus on the Job Management part. Additionally, the demo will illustrate where just one of several integration successes is occurring and barriers are being knocked down, as we head towards the overall sys

System S Services Framework glossary:
- ISL: inquiry specification language
- SPPL: stream processing planning language
- ORCA: optimizer resource common area

14 Semantic-based Technologies
Devise machine-interpretable semantic descriptions:
- For users' inquiries expressed in ISL, the Inquiry Specification Language
- For the stream processing components: data sources, PEs
Develop an inquiry compiler that can interpret these descriptions and assemble components to satisfy inquiry goals.

Example inquiry:
    Produce Result {?EarthImage_1, ?Keyword_1}
    Where ?EarthImage_1 a :Image;
              :imageOf :Earth;
              :describedBy ?Keyword_1 .
          ?Keyword_1 a :Keyword .

[Figure: semantic description graphs for the inquiry goal, for a data source (Hubble Earth Image Source, which produces the Hubble Earth Image Stream), and for a PE (Image Pattern Recognizer, which requires an Image Input Stream and produces a Keyword-Annotated Image Output Stream). Edge labels include type, subClassOf, imageOf, capturedBy, describedBy, contains, requires, and produces.]

15 CLASP
A single site provides a very powerful processing complex. Multiple, cooperating sites enable additional benefits:

Increased Breadth of Analysis
- Access data and HW uniquely available at another site
- Leverage results produced by another site
- Collaborate with users at another site

Scalability
- Divide and conquer complex problems
- Utilize processing power of other sites
- Leverage unique processing capabilities
- Manage workload

Reliability
- Recover from site failures
- Sites provide failover support to one another

16 Approach
Independent Sites
- May operate autonomously
- May cooperate with other sites

Scalability
- Must allow for a large number of sites
- Sites must be organized to enable scalability

Site interaction is controlled via policy
- A site's Common Interest Policy (CIP) defines what sites it will cooperate with
- Policies define Virtual Organizations (VOs)
- Similar to Grid computing

[Figure labels: Cooperative VO, Federated VO]

17 Approach
Independent Sites
- May operate autonomously
- May cooperate with other sites

Scalability
- Must allow for a large number of sites
- Sites must be organized to enable scalability

Site interaction is controlled via policy
- A site's Common Interest Policy (CIP) defines what sites it will cooperate with
- Policies define Virtual Organizations (VOs)

VOs are another unit of interaction
- VOs may cooperate as a unit with other sites and VOs
- Flexible enough to support sophisticated organizational arrangements
- Scalable management of large sites: divide a large site into sub-sites

[Figure labels: Cooperative VO, Federated VO]

18 Cooperation Models
Control
- Federated: sites under the influence of a common authority; centralized point of control, planning, trust
- Cooperative: peer-to-peer, for mutual benefit; decentralized planning
- VOs can be hierarchical and can overlap

Resource Allocation
- Define CIPs
- Control access to streams, processing, failover
- Agreements (similar to WS-Agreements for Grid)
- Dynamic determination of what is allowed
- CIPs were initially flat text files; structured versions (XML) in latest version

19 Virtual Organization Management
Sites join VOs to enable them to solve problems that they can't solve alone.
VOs are defined by a Common Interest Policy
- Not a traditional style policy
- Declarative and data oriented
- Defines VO behavior style, as well as structure:
  - Membership
  - VO cooperation style (federated vs. cooperative)
  - Capability sharing terms
Creation of, and changes to, VOs/CIPs require human involvement and approval (in most cases)
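As an illustration of a declarative, data-oriented policy, the sketch below models a CIP as a small Python record holding membership, cooperation style, and capability-sharing terms. The field names, the capability strings ("streams", "processing", "failover"), and the `allows` check are assumptions for illustration; the slides do not show the actual CIP format (described elsewhere in the talk as flat text files, later XML).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CommonInterestPolicy:
    """Hypothetical in-memory form of a VO's Common Interest Policy."""

    vo_name: str
    style: str  # "federated" or "cooperative"
    members: Set[str]
    # Capability-sharing terms: provider site -> capabilities it offers
    # to other members of the VO.
    shared: Dict[str, Set[str]] = field(default_factory=dict)

    def allows(self, requester: str, provider: str, capability: str) -> bool:
        """Return True if the policy lets `requester` use `capability` at `provider`."""
        if requester not in self.members or provider not in self.members:
            return False
        return capability in self.shared.get(provider, set())


# Example policy with made-up site names.
cip = CommonInterestPolicy(
    vo_name="example-vo",
    style="federated",
    members={"site1", "site2", "site3"},
    shared={"site1": {"streams"}, "site2": {"processing", "failover"}},
)
```

With this policy, `cip.allows("site3", "site2", "failover")` is true, while a request from a non-member site is rejected regardless of the sharing terms.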

20 CLASP Architecture
- VO Planner: create VO-wide distributed plans
- VO Manager (VO MGR): handle inter-site relationships and invocations
- Resource Awareness Engine (RAE): identify resources at other sites and report status to other sites
- Remote Execution Coordinator (REC): extends JMN to multiple sites
- Tunneling Manager (TM): extends SPC to other sites
- Failover Manager (FM): responsible for monitoring other sites and initiating recovery
- Heterogeneity Manager (HM): handles differences in execution environments

[Figure: layered stack, top to bottom: User Interface; CLASP (VO Planner, VO MGR, FM, REC, HM, RAE, TM); Job Management; Stream Processing Core]

21 Example Capabilities
- Partitioning of distributed plans
- Remote execution of distributed plans

Scenario:
- Data sources available only at Sites 1, 4
- Special processing hardware only at Sites 2, 4
- Analyst at Site 3 wants results

[Figure labels: Distributed plan inquiry, Inquiry Service, Data Source, Results; REC and TM components at Sites 1 to 4]
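One way to picture plan partitioning is sketched below: given a flow graph and a placement of PEs onto sites, every edge that crosses a site boundary is cut and replaced by a tunnel sink at the upstream site and a tunnel source at the downstream site. The function name, the edge-list representation, and the `SI<n>`/`SO<n>` naming are assumptions (the SI/SO abbreviations are borrowed from a later slide's legend); the actual CLASP planner is not shown in the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (upstream PE, downstream PE)


def partition_plan(
    edges: List[Edge], placement: Dict[str, str]
) -> Tuple[Dict[str, List[Edge]], List[Tuple[str, str, str, str]]]:
    """Split a flow graph into per-site subjobs, adding tunnels on cross-site edges.

    Returns (subjobs, tunnels), where tunnels holds
    (upstream site, tunnel sink, downstream site, tunnel source).
    """
    subjobs: Dict[str, List[Edge]] = defaultdict(list)
    tunnels: List[Tuple[str, str, str, str]] = []
    for src, dst in edges:
        src_site, dst_site = placement[src], placement[dst]
        if src_site == dst_site:
            subjobs[src_site].append((src, dst))
            continue
        # Cross-site edge: end the stream in a tunnel sink locally and
        # re-create it from a tunnel source at the remote site.
        tunnel_id = len(tunnels)
        sink, source = f"SI{tunnel_id}", f"SO{tunnel_id}"
        subjobs[src_site].append((src, sink))
        subjobs[dst_site].append((source, dst))
        tunnels.append((src_site, sink, dst_site, source))
    return dict(subjobs), tunnels


# Scenario loosely following the slide: data at site1, special hardware
# at site2, analyst at site3. PE names are made up.
edges = [("DataSource", "Analytic"), ("Analytic", "Results")]
placement = {"DataSource": "site1", "Analytic": "site2", "Results": "site3"}
subjobs, tunnels = partition_plan(edges, placement)
```

In this example both edges cross sites, so the plan becomes three subjobs joined by two tunnels.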

22 Example Capabilities
- Partitioning of distributed plans
- Remote execution of distributed plans
- Recovery of jobs from failed site

[Figure labels: Inquiry Service, Data Source, Results; REC, TM, and FM components at Sites 1 to 4; heartbeats exchanged between FMs; failure notification]
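The heartbeat-based failure detection suggested by the figure can be sketched as follows. This is a simplified illustration under assumed names (`FailoverManager`, `heartbeat`, `failed_sites`, `recover`) and an assumed fixed timeout; the slides do not describe the actual detection rule or the recovery protocol. Timestamps are passed in explicitly so the example is deterministic.

```python
from typing import Dict, List


class FailoverManager:
    """Tracks heartbeats from peer sites and flags sites that go silent."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # seconds of silence before a site is considered failed
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, site: str, now: float) -> None:
        """Record a heartbeat received from `site` at time `now`."""
        self.last_seen[site] = now

    def failed_sites(self, now: float) -> List[str]:
        """Return sites whose last heartbeat is older than the timeout."""
        return sorted(
            site for site, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def recover(jobs_by_site: Dict[str, List[str]], failed: str, backup: str) -> List[str]:
    """Move the failed site's jobs to the backup site; return the moved jobs."""
    moved = jobs_by_site.pop(failed, [])
    jobs_by_site.setdefault(backup, []).extend(moved)
    return moved


fm = FailoverManager(timeout=5.0)
fm.heartbeat("site1", now=0.0)
fm.heartbeat("site2", now=0.0)
fm.heartbeat("site1", now=4.0)  # site2 stays silent after t=0
```

At `now=6.0` only `site2` has been silent longer than the timeout, so `fm.failed_sites(6.0)` reports it and `recover` can hand its jobs to a backup site.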

23 Testbed
Hardware
- Dual Xeon 3.06GHz CPUs, 800MHz, 512KB L2 caches, 4GB memory, 80GB disk
- 1Gbps LAN

Software
- Linux SUSE 9
- Java prototype, 40,000 LOC

Federated VO with 4 sites

Enterprise Global Service (EGS) application
- Monitor quality of customer service personnel over VoIP:
  - Location of employee
  - Courtesy level
  - Satisfaction of customers

24 Methodology
Workload generation
- Synthesized traffic from a set of parameters

Analysis
- Location (loc), satisfaction (sat), courtesy (court or cor) of a particular person
- The flow graph depends on the type of inquiry
- Jobs are reused within one site, and across sites within a VO
- No sharing across sites in the non-VO case
- Vary the amount of reuse

Notes: Constraints on jobs available. Understand at smaller scale first, then at larger scale. Inquiries that can be produced.

25 Methodology
Three representative sets of inquiries in a spectrum of distribution:
- Set 1, maximum reuse across sites: each execution site has the same 6 inquiries (2 loc/2 court/2 sat, 2 loc/2 court/2 sat, 2 loc/2 court/2 sat)
- Set 2, minimum reuse across sites: each execution site has 6 inquiries of a different type (6 loc, 6 court, 6 sat)
- Set 3, average case across sites: each execution site has 6 random inquiries (mixture of types and person names)

Table (columns: Set, Site 1, Site 2, Site 3). Entries as they appear in the transcript; the per-site cell boundaries did not survive:
- 1: Maximum reuse: loc SHIMEI, loc FAYE, cor SHIMEI, cor FAYE, sat SHIMEI, sat FAYE
- 2: Minimum reuse: loc ENRIQUE, loc NAOMI, loc LEONARD, loc EMILY, cor LEONARD, cor NORMAN, cor MARK, cor ENRIQUE, sat LEONARD, sat MARK, sat EMILY, sat NAOMI
- 3: Typical reuse: cor NAOMI, sat NORMAN, loc MARCIA, cor EMILY

26 Set 1, site 1: 6 jobs
[Figure: flow graph of the six jobs at site 1 (1: Loc A, 2: Loc B, 3: Courtesy A, 4: Courtesy B, 5: SAT A, 6: SAT B), built from the PEs SPE, SPA, VBF, DFE, SPM, DUP, PSD, NES, SXF, JCF, JCS, NEC, DUPP, SGD, PCP, JAE, DSN, and IPN]

27 The same 18 inquiries in the VO
Each site has incoming tunnels to receive results from other sites and report to local users.
[Figure: the jobs 1: Loc A, 2: Loc B, 3: Courtesy A, 4: Courtesy B, 5: Sat A, 6: Sat B connected across sites. Legend: SI: Tunnel Sink; SO: Tunnel Source; ID: report results]

28 Goals of Evaluation
- Rate of results production
  - Synergy of stream reuse
  - Impact of system load
- Impact of failover
- Job deployment time
- Planning performance

29 Initial Results
Running in a VO has little gain! Why?
- Jobs are lightly loaded and pumping out results at full speed

30 Set 1, site 1: 6 jobs
Add processing loops (“load level”) to these PEs to create artificial load.
[Figure: the flow graph of the six jobs (1: Loc A, 2: Loc B, 3: Courtesy A, 4: Courtesy B, 5: SAT A, 6: SAT B) with the loaded PEs marked by function: finding location, finding courtesy, finding satisfaction. PEs shown: SPE, SPA, VBF, DFE, SPM, DUP, PSD, NES, SXF, JCF, JCS, NEC, DUPP, SGD, PCP, JAE, DSN, IPN]

31 Total number of results in Set 1
- As load level increases, jobs start to slow down and produce fewer results
  - Up to 1/3-1/2 in highest load
- Jobs running in a VO produce more than their counterparts at an individual site
  - Up to 50% more
Notes: Load levels below 40k are not high enough; VO and individual-site results are similar there, so they are not shown.

32 Total number of results in Set 2
Interesting results: running at individual sites is slightly better. Why?
- Little reuse for jobs across sites
- Still pay overhead:
  - Extra tunneling PEs
  - Synchronization effect
Lesson: if sites share little common processing, running them in a VO may not yield a gain.

33 Total result number in set 3
- Similar to that of set 1
- Sites share significant common processing: each has a mixture of job types

34 Side-by-side

35 Gain of Failover
- Set up 7 VO jobs distributed across 3 execution sites, 6 separate jobs on 4th (backup) site
- After 60 seconds, kill one site; run another 120 seconds after recovery
- Compare two load levels (chart panels: "No added load", "Load 80000")
- When load is low, jobs are not affected after recovery
- When load is high, all jobs are affected and produce fewer results
[Figure labels: No added load, Load 80000, No synchronization]

36 Failover Time Breakdown
[Figure: result sequence number ("Result seqnum") over time ("Time (1/10 sec)") for the Courtesy dist job, annotated with the events below]
- 402 sec: Site 2 fails
- 415 sec: Site failure detected
- 419 sec: Job recovered
- 430 sec: App results resumed
Chart annotation: 3.5 seconds for job recovery.
Average: 3.6 sec for job recovery, 11.0 sec for app resuming.
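The phases of this failover can be read off the event labels with simple subtraction. The sketch below uses only the four labeled times from the slide; the dictionary keys and the function name are made up, and because the labels appear rounded to whole seconds, the computed job-recovery phase (4 sec) only approximates the 3.5 seconds annotated on the chart.

```python
from typing import Dict

# Event times in seconds, as labeled on the slide.
events = {
    "site_fails": 402,
    "failure_detected": 415,
    "job_recovered": 419,
    "results_resumed": 430,
}


def phase_durations(ev: Dict[str, int]) -> Dict[str, int]:
    """Split the outage into detection, job recovery, and app resume phases."""
    return {
        "detection": ev["failure_detected"] - ev["site_fails"],
        "job_recovery": ev["job_recovered"] - ev["failure_detected"],
        "app_resume": ev["results_resumed"] - ev["job_recovered"],
        "total": ev["results_resumed"] - ev["site_fails"],
    }


durations = phase_durations(events)
```

The app-resume phase computed this way (11 sec) matches the 11.0 sec average quoted on the slide.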

37 Job deployment time breakdown
Current implementation
- Each subjob is a separate deploying thread
- Each job within a subjob is a separate thread
- One normal job, one tunnel job for each sink/source tunnel PE

Observations
- Overall job deployment time is determined by the greatest of all jobs
- The sink query is the greatest among all components: it needs to wait for the source end to be ready

[Figure labels: Sink, Source (three of each)]
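Because every subjob and every job within it deploys in its own thread, the overall deployment time is the maximum over all jobs rather than the sum. The sketch below makes that explicit; the function names and all timing values are hypothetical and chosen only so that a tunnel sink job is the slowest component, as the slide observes.

```python
from typing import Dict, Tuple

# Hypothetical per-job deployment times in seconds: site -> job -> time.
DeployTimes = Dict[str, Dict[str, float]]


def overall_deploy_time(subjobs: DeployTimes) -> float:
    """Parallel deployment finishes when the slowest job finishes."""
    return max(t for jobs in subjobs.values() for t in jobs.values())


def bottleneck(subjobs: DeployTimes) -> Tuple[str, str, float]:
    """Return (site, job, time) for the job that determines the overall time."""
    return max(
        ((site, job, t) for site, jobs in subjobs.items() for job, t in jobs.items()),
        key=lambda item: item[2],
    )


example = {
    "site1": {"normal": 1.2, "tunnel_source": 0.8},
    "site2": {"normal": 1.0, "tunnel_sink": 2.5},
}
```

Here `overall_deploy_time(example)` is 2.5, set by the tunnel sink job at `site2`, even though the sum of all job times is larger.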

38 Remaining Issues and Lessons
- How to prevent one misbehaving/slow PE from affecting other jobs
  - Can be severe in a large flow graph
- The dominating factor of gain is how much common processing exists across sites
- Simpler PEs for evaluation
  - EGS is a complex application
  - Some PEs have built-in adaptivity

39 Related Work
Stream processing
- System S: SPC (ICDCS’06), failover (ARES’07), autonomic job management (ICAC’07), autonomic aspects of CLASP (HOTAC’07)
- Borealis (MIT/Brown), TelegraphCQ (Berkeley), STREAM (Stanford)

Grid
- Virtual organizations
- WS-Agreements

40 Future Work
- More complete integration with the rest of System S
  - Resource awareness
  - Site failover, application checkpointing
- Interactions with other systems
  - Heterogeneity
  - General interfaces
- Negotiations
  - More complicated agreements
- Run “real” applications
- Performance tuning

41 Conclusions
- Stream processing systems have some similarity to Grid computing, but numerous differences require a new approach
  - Continual data flows across sites
  - Coping with intermittent failures
- Planning in System S is very different from database-centric streaming systems
  - Distributed planning is an extra layer of complexity
- Collaborating stream systems can obtain gains in performance, availability, and scalability

42 CLASP: Collaborating, Autonomous Stream Processing Systems
Fred Douglis 9/20/2018

