Overview of Microsoft StreamInsight Torsten Grabs Lead Program Manager Microsoft StreamInsight.

1 Overview of Microsoft StreamInsight
Torsten Grabs, Lead Program Manager, Microsoft StreamInsight

2 The Need for an Event-Driven Platform
Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.

                   Database Applications             Event-driven Applications
Query Paradigm     Ad-hoc queries or requests        Continuous standing queries
Latency            Seconds, hours, days              Milliseconds or less
Data Rate          Hundreds of events/sec            Tens of thousands of events/sec or more
Query Semantics    Declarative relational analytics  Declarative relational and temporal analytics

[Diagram: a database application answers a request with a response; an event-driven application consumes an input stream and produces an output stream of events.]

3 Scenarios for Event-Driven Applications
[Chart: application classes plotted by aggregate data rate (events/sec, up to ~1 million) against latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms). Classes shown: Relational Database Applications; Data Warehousing Applications; Operational Analytics Applications (e.g., Logistics); Web Analytics Applications; Manufacturing Applications; Monitoring Applications; Financial Trading Applications. One region is marked "CEP Target Scenarios".]

4 Example Scenarios
[Diagram: data streams flow into an Event Processing Engine, which writes to a Stream Data Store & Archive and looks up Asset Specs & Parameters. Inputs come from asset instrumentation for data acquisition and from subscriptions to data feeds.]
Manufacturing: sensors on the plant floor; react through device controllers; aggregated data; 10,000 events/sec
Web Analytics: click-stream data; online customer behavior; page layout; 100,000 events/sec
Financial Services: stock & news feeds; algorithmic trading; patterns over time; super-low latency; 100,000 events/sec
Power, Utilities: energy consumption; outages; smart grids; 100,000 events/sec
Query types: threshold queries; event correlation from multiple sources; pattern queries
Uses: visual trend-line and KPI monitoring; batch & product management; automated anomaly detection; real-time customer segmentation; algorithmic trading; proactive condition-based maintenance

5 StreamInsight Platform
[Diagram: event sources feed Input Adapters, which feed the StreamInsight Engine running standing queries (query logic); the engine feeds Output Adapters, which feed event targets. The diagram distinguishes StreamInsight application development from the StreamInsight application at runtime.]
Event sources: devices, sensors; web servers; event stores & databases; stock ticker, news feeds
Event targets: event stores & databases; pagers & monitoring devices; KPI dashboards, SharePoint UI; trading stations

6 What is Project “Austin”?
Rich temporal (StreamInsight) and sequential (Reactive Framework) analytics models
Dynamic, flexible query and data source management experience
Turnkey connectivity for platform data sources and sinks (SQL Azure, Windows Azure Table Storage)
Integrated with Azure management portal and billing experiences
Real-time data collection from a wide variety of connected devices (sensors, smart meters, servers, tablets, phones)
Standards-compliant endpoints (REST, XML, JSON)
Securable data ingress with data enrichment and transformation (geo-tagging, etc.)
Multi-tenant Azure service with flexible, elastic capacity for collection and analytics
Federated scale-out collection and analytics
Distributed service monitoring and tracing

7 StreamInsight on Azure: “Austin”
[Architecture diagram. Components shown: Scalable Data Ingress Adapter; Prebuilt Input Adapters; Austin StreamInsight Engine running standing queries (StreamInsight Query, Reactive Query); Prebuilt Output Adapters; Data Egress Adapter; Authentication; Built-in Archive; Management Service; Monitoring Service. The diagram distinguishes StreamInsight application development from the StreamInsight application at runtime.]

8 Events
Events expose different temporal characteristics:
  Point-in-time events
  Interval events with fixed duration
  Interval events with initially unknown duration
Rich payloads capture all properties of an event.
[Figure: events a, b, c, d, e plotted by payload/value against time, with timestamps t1 through t5.]

9 Event Types
Events in Microsoft's CEP platform use the .NET type system.
Events are structured and can have multiple fields.
Fields are typed using the .NET Framework types.
Timestamp fields provisioned by the CEP engine capture all the different temporal event characteristics.
Event sources populate the timestamp fields.
Example event type:
  Timestamps / Metadata
  Long pumpID
  String Type
  String Location
  Double flow
  Double pressure
  …
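The three temporal shapes and the pump payload from the slides can be modelled roughly as follows. This is a minimal Python sketch, not StreamInsight's .NET event classes: the names `PumpReading`, `Event`, and `shape`, and the convention that a point event has `end == start`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class PumpReading:
    """Payload fields copied from the slide's example event type."""
    pump_id: int
    type: str
    location: str
    flow: float
    pressure: float


@dataclass(frozen=True)
class Event:
    """Hypothetical event: timestamps plus a typed payload."""
    start: datetime
    end: Optional[datetime]  # None means the duration is not yet known
    payload: PumpReading

    def shape(self) -> str:
        if self.end is None:
            return "interval (duration unknown)"
        if self.end == self.start:  # assumed convention for point events
            return "point"
        return "interval"


reading = PumpReading(42, "centrifugal", "plant-1", 12.5, 3.1)
t0 = datetime(2012, 1, 1, 12, 0, 0)
point = Event(t0, t0, reading)
fixed = Event(t0, t0 + timedelta(seconds=30), reading)
open_ended = Event(t0, None, reading)
```

Keeping the timestamps outside the payload mirrors the slide's split between engine-provisioned timestamp fields and application-defined fields.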

10 Event Streams & Adapters
A stream is a possibly infinite sequence of events:
  Insertions of new events
  Changes to event durations
Stream characteristics:
  Event/data arrival patterns:
    Steady rate with end-of-stream indication
    Intermittent, random, or in bursts
  Out-of-order events: the order of arrival of events does not match the order of their application timestamps
Adapters:
  Receive/get events from the data source
  Enqueue events for processing in the engine
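One common way to cope with out-of-order arrival is to buffer events and release them in application-timestamp order once the source signals that time has advanced. The slide does not describe StreamInsight's own mechanism, so the Python sketch below is only an illustration; `ReorderBuffer`, `enqueue`, and `advance_to` are hypothetical names.

```python
import heapq
from typing import Any, List, Tuple


class ReorderBuffer:
    """Holds events until a time marker says no earlier event can arrive."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = 0  # tie-breaker so equal timestamps keep arrival order

    def enqueue(self, timestamp: int, payload: Any) -> None:
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1

    def advance_to(self, marker: int) -> List[Tuple[int, Any]]:
        """Release, in timestamp order, every event stamped before `marker`."""
        released = []
        while self._heap and self._heap[0][0] < marker:
            timestamp, _, payload = heapq.heappop(self._heap)
            released.append((timestamp, payload))
        return released


buf = ReorderBuffer()
for ts, name in [(3, "c"), (1, "a"), (2, "b"), (5, "e")]:
    buf.enqueue(ts, name)  # arrival order differs from timestamp order
first = buf.advance_to(4)
second = buf.advance_to(10)
```

The trade-off is latency: the longer the buffer waits for a marker, the more disorder it can absorb.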

11 Typical CEP Queries
Typical CEP queries require a combination of functionality:
  Complex type describes event properties
  Calculations introduce additional event properties
  Grouping by one or more event properties
  Aggregation for each event group over a pre-defined period of time, typically a window
  Multiple event groups monitored by the same query
  Correlate event streams
  Check for absence of activity with a data source
  Enrich events with reference data
  Collection of assets may change over time
We want to make writing and maintaining those queries easy or even effortless.

12 StreamInsight Query Features
Operators over streams:
  Calculations (PROJECT)
  Correlation of streams from different data sources (JOIN)
  Check for absence of activity with a data source (EXISTS)
  Selection of events from streams (FILTER)
  Stream partitioning (GROUP & APPLY)
  Aggregation (SUM, COUNT, …)
  Ranking and heavy hitters (TOP-K)
  Temporal operations: hopping window, sliding window
  Extensibility: add new domain-specific operators

13 LINQ Query Examples
LINQ Example (GROUP&APPLY, WINDOW):
    from e3 in MyStream3
    group e3 by e3.i into SubStream
    from win in SubStream.HoppingWindow(FiveMinutes, ThreeSeconds)
    select new { i = SubStream.Key, a = win.Avg(e => e.f) };
  Callouts: Grouping, Window, Project & Aggregate
LINQ Example (JOIN, PROJECT, FILTER):
    from e1 in MyStream1
    join e2 in MyStream2 on e1.ID equals e2.ID
    where e1.f2 == "foo"
    select new { e1.f1, e2.f4 };
  Callouts: Join, Filter, Project
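To make the GROUP&APPLY plus hopping-window example concrete, here is a small Python sketch of what such a query computes over a finite list of point events. It is an illustration, not StreamInsight's implementation. It assumes integer timestamps, windows of length `size` that start at every non-negative multiple of `hop`, and events given as `(timestamp, key, value)` tuples; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hopping_window_avg(
    events: Iterable[Tuple[int, str, float]], size: int, hop: int
) -> Dict[Tuple[int, str], float]:
    """Average `value` per (window start, key) over hopping windows."""
    acc = defaultdict(lambda: [0.0, 0])  # (window start, key) -> [sum, count]
    for t, key, value in events:
        k = t // hop  # index of the latest window starting at or before t
        # Walk back through every window [k*hop, k*hop + size) that contains t.
        while k >= 0 and k * hop + size > t:
            entry = acc[(k * hop, key)]
            entry[0] += value
            entry[1] += 1
            k -= 1
    return {wk: total / count for wk, (total, count) in sorted(acc.items())}


result = hopping_window_avg(
    [(0, "a", 10.0), (1, "a", 20.0), (3, "a", 30.0), (3, "b", 5.0)],
    size=4,
    hop=2,
)
```

Because `size` exceeds `hop`, windows overlap and a single event (here the ones at time 3) contributes to more than one window, which is the defining property of a hopping window.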

14 Extensibility SDK
Built-in operators do not cover all functionality:
  Need for domain-specific extensions
  Integrate with functionality from existing libraries
Support for extensions in the CEP platform:
  User-defined operators, functions, aggregates
  Code written in .NET, deployed as a .NET assembly
  Query operators and LINQ can refer to functionality of the assembly
Temporal snapshot operator framework:
  Interface to implement user-defined operators
  Manages operator state and snapshot changes
  Framework does the heavy lifting to deal with intricate temporal behavior such as out-of-order events

15 Resiliency
Outages happen in computing:
  Power outages
  “Patch Tuesday”
  Human mistakes
  Planned and unplanned downtime
Systems need to be “resilient” to outages:
  Minimize damage
  Become operational again quickly
The specific requirements depend on how mission-critical your application is.

16 Resiliency: Timeliness
Timeliness: recover from outages quickly. The goal is simple: as fast as possible.
StreamInsight doesn't store event data, but it does store query state. This state may be significant, and it may be slow to recreate.

17 Resiliency: Correctness

18 What is Checkpointing?
Checkpointing saves a query's state to disk.
  You control when the checkpoint is initiated.
  StreamInsight takes care of saving out consistent state.
After an outage, StreamInsight can restore this state.
  This limits state loss during an outage, speeding recovery.
  Level of correctness depends on additional work we are able to perform.
  Recovery process is coordinated by StreamInsight.

19 Checkpointing API
    public IAsyncResult server.BeginCheckpoint(
        Query query, AsyncCallback asyncCallback, object asyncState);
    public bool server.EndCheckpoint(IAsyncResult asyncResult);
    public void server.CancelCheckpoint(IAsyncResult asyncResult);
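The idea behind checkpointing, saving a query's state so it can be restored after an outage, can be illustrated without the StreamInsight API. The Python sketch below is a toy: `CountingQuery`, its JSON file format, and `demo` are all hypothetical and say nothing about how StreamInsight serializes state.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class CountingQuery:
    """Toy standing query whose state is a per-key event count."""

    def __init__(self, counts: Optional[Dict[str, int]] = None) -> None:
        self.counts: Dict[str, int] = dict(counts or {})

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1

    def checkpoint(self, path: str) -> None:
        # Write to a temporary file and rename it, so an outage in the
        # middle of a checkpoint never leaves a half-written file behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.counts, f)
        os.replace(tmp_path, path)

    @classmethod
    def restore(cls, path: str) -> "CountingQuery":
        with open(path) as f:
            return cls(json.load(f))


def demo() -> Dict[str, int]:
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "query.ckpt")
        query = CountingQuery()
        for key in ["a", "b", "a"]:
            query.process(key)
        query.checkpoint(path)
        query.process("c")  # arrives after the checkpoint
        del query           # simulated outage: in-memory state is gone
        return CountingQuery.restore(path).counts
```

After recovery the event `"c"` is missing from the restored state. Checkpointing alone gives only this: state retention, with everything after the last checkpoint lost.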

20 When is Checkpointing Useful?
Provides a mechanism to recover from an outage:
  To recover from unexpected system failure.
  To handle expected outages (e.g., Patch Tuesday).
  For machine migration.
Not a panacea:
  Does not provide uninterrupted service.
  Does not protect against broken query logic.

21 Using Checkpoints
We'll walk through three progressively stricter checkpointing scenarios:
  1. State retention.
  2. Equivalent events.
  3. Exact equivalence.

22 Low Bar: State Retention Ideal output: Real output: HGFEDCBA … BA H’G’F’ …

23 Checkpointing jihgfedc … jihgfedc … Enqueue markers into input streams to instruct operators to save their state.

24 Checkpointing jihgfedc … jihgfedc … oops

25 Recovery nmlkjihg … nmlkjihg … Load saved operator state and then start consuming input.

26 Medium Bar: Equivalent Events Ideal output: Real output: HGFEDCBA … BA DCB …

27 Filling the Gaps
StreamInsight needs help:
  Missing state since last checkpoint.
  Missed events during outage.
Solution: replayable adapters.
The dance:
  1. StreamInsight picks a place in the input stream.
  2. StreamInsight communicates this to the input adapter.
  3. The input adapter replays from the chosen spot.

28 Checkpointing … … jihgfedc kjihgfedlkjihgfe jihgfedc kjihgfedlkjihgfe

29 Recovery lkjihgfe … lkjihgfe …

30 A Place in the Stream hgfedcba … Physical Stream

31 Communicating the State
Input adapter factories can optionally implement one of:
  IHighWaterMarkInputAdapterFactory
  IHighWaterMarkTypedInputAdapterFactory
In a recovery situation, StreamInsight will then call Create with a high-water mark.
The factory is then responsible for properly cueing the input.
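The replay handshake can be sketched as follows. This Python code only illustrates the idea and does not reproduce the .NET interfaces named on the slide. `ReplayableAdapterFactory`, its `create` method, the integer timestamps, and the choice to replay events stamped at or after the mark are all assumptions.

```python
from typing import Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (application timestamp, payload)


class ReplayableAdapterFactory:
    """Creates input adapters over a durable, replayable event log."""

    def __init__(self, log: List[Event]) -> None:
        self._log = list(log)

    def create(self, high_water_mark: Optional[int] = None) -> Iterator[Event]:
        """Return an adapter (here just an iterator) cued to the right spot.

        On a normal start no mark is given and the whole log is read.
        On recovery the engine supplies a mark and the adapter replays
        only events stamped at or after it (an assumed convention).
        """
        for timestamp, payload in self._log:
            if high_water_mark is None or timestamp >= high_water_mark:
                yield (timestamp, payload)


factory = ReplayableAdapterFactory([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
normal_start = list(factory.create())
after_recovery = list(factory.create(high_water_mark=3))
```

The approach only works if the source is durable: an adapter reading from a live sensor with no log behind it has nothing to replay.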

32 StreamInsight in Action Internet of Things Demo

33 The Demo StreamInsight “Austin”

34 StreamInsight Design Principles
Scalability:
  Aggregate data rate keeps increasing.
  Minimal resource impact (co-located).
  Local computation
  Avoid flooding the network
Programmability:
  Extensibility: UserDefinedAggregates, UserDefinedFunctions, UserDefinedOperators.
  Composability.
  Developer experience (language, IDE, debugging, supportability)
Adaptability:
  Easy to integrate via adapters.
  Portability (servers, edge devices)

35 StreamInsight Architecture
[Architecture diagram. Components shown: Host Process; Web Service; Engine; Compiler; Expression / Type Service; Runtime; Execution Operators; Stream Manager; Event Manager; Query Scheduler; Plan Manager; Synopsis; Command Dispatcher; Management Service; Metadata; Diagnostics / Tracing; Stream; OS; Adapters.]

36 Management Service: Highlights
[Architecture diagram repeated from slide 35.]
Manageability API for query management (i.e., create, start, stop, delete query) and for supportability and monitoring of running queries.
Same manageability API for both embedded deployment and web service clients.

37 Compiler & Expressions: Highlights
[Architecture diagram repeated from slide 35.]
Standardized IL allows us to implement a variety of syntactic surfaces over the algebra, e.g., LINQ, CQL, etc.
  Allows for domain-specific front-end languages
  Prepared for future extensions
Compile-time type checking and type-safe code generation for minimal runtime impact.
Support for UDFs, UDAggs, UDOs.
JIT code generation for field references and expression evaluation, for low-latency processing of high event rates.
Basing on the CLR helps leverage:
  Code generator, JIT support
  Type system
  Tools and libraries (LINQ Expressions, IDE, etc.)

38 Events & Streams: Highlights
[Architecture diagram repeated from slide 35.]
JIT code generation for field references and expression evaluation, because interpreting these references is sub-optimal for low-latency processing of high event rates.
Leverage JIT code generation support in the CLR runtime for LINQ expressions.
Bind the query to different deployment environments based on the metadata.
Event manager is implemented as a combination of managed and native code in order to minimize overhead and ensure predictable performance.
Events are read-only and reference-counted by streams (to minimize data copying).

39 Query Scheduler: Highlights
[Architecture diagram repeated from slide 35.]
A query is executed by scheduling the individual operators as they become active.
Operator state transition is managed by the Scheduler.
When an operator becomes active, a thread is scheduled for execution.
Scheduling decision is based on the priority of the query and other parameters.
Data flow architecture: reduced coupling and pipeline parallelism.
Operators are affinitized to a thread/core (multi-core environments) to decrease lock contention and increase caching benefits.
Periodic checks and migration for load balancing.

40 Execution Operators: Highlights
[Architecture diagram repeated from slide 35.]
Efficient implementation of operators that perform incremental evaluation as each event is processed.
Clean, formal semantics. Leverage relational semantics whenever possible.
GroupAndApply operator:
  Enables parallelism for scale-up (multi-core).
  Groups are dynamically instantiated and torn down based upon the data.
  Large numbers of groups can be simultaneously active (~50M active groups for MSN.com).
[Diagram: Group splits the input into groups A, B, C; Apply runs on each group, producing X, Y, Z; Union merges the results.]
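Incremental evaluation with dynamically created and removed groups can be sketched in a few lines of Python. This is an illustration of the idea only, not the engine's operator code. `GroupedRunningAverage`, `insert`, and `retract` are hypothetical names, and a running average stands in for the "apply" step.

```python
from typing import Dict, List, Optional


class GroupedRunningAverage:
    """Per-key running average updated in O(1) per event."""

    def __init__(self) -> None:
        self._groups: Dict[str, List[float]] = {}  # key -> [sum, count]

    def insert(self, key: str, value: float) -> float:
        # A group is instantiated the first time its key is seen.
        state = self._groups.setdefault(key, [0.0, 0])
        state[0] += value
        state[1] += 1
        return state[0] / state[1]

    def retract(self, key: str, value: float) -> Optional[float]:
        """Remove an event that has left the window; None if the group is gone."""
        state = self._groups[key]
        state[0] -= value
        state[1] -= 1
        if state[1] == 0:
            del self._groups[key]  # tear the group down once it is empty
            return None
        return state[0] / state[1]

    def active_groups(self) -> List[str]:
        return sorted(self._groups)


g = GroupedRunningAverage()
a1 = g.insert("A", 2.0)
a2 = g.insert("A", 4.0)
b1 = g.insert("B", 10.0)
a3 = g.retract("A", 2.0)
a4 = g.retract("A", 4.0)
```

Each event touches only its own group's state, which is what lets groups be processed independently and, in a real engine, in parallel.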

41 The StreamInsight Team
Founded in 2008 based on incubation between MSR and SQL teams
Small team, by Microsoft standards
Roles in Microsoft engineering teams:
  Program Managers: customer scenarios, functional specs, APIs, project mgmt, evangelism
  Developers: architecture, technical design, product code, unit tests
  Testers: test breakout, test code, lab runs, release signoff
Using agile development methods

42 StreamInsight Roadmap
StreamInsight 2.1 (on prem):
  Development experience
  Major API overhaul
StreamInsight on Azure (Cloud):
  StreamInsight service on Windows Azure
  Currently private CTP
  GA this summer
Using Scrum to organize and manage schedules
Work organized in sprints/milestones
CTP (Community Technology Preview) after each milestone, similar to a public beta
TAP (Technology Adopter Program) as we get closer to the planned release

43 For More Information
StreamInsight download location:
StreamInsight blog:
StreamInsight MSDN documentation: us/library/ee362541(SQL.105).aspx
StreamInsight MSDN portal: us/ee aspx
