Advanced Database Systems: DBS CB, 2nd Edition. Advanced Topics of Interest: DB in the Cloud, and SQL & Stream Processing.


1 Advanced Database Systems: DBS CB, 2nd Edition. Advanced Topics of Interest: DB in the Cloud, and SQL & Stream Processing

2 Outline: DB in the Cloud; SQL and Stream Processing

3 DB in the Cloud

4 Cloud Introduction. Provide the “why”, “what”, and “how” of SQL in the cloud. Cloud perception: costs? New capabilities? Massively scalable computing? IaaS, PaaS, SaaS. Private vs. public clouds. Interoperability & standards.

5 Cloud Introduction. Definition: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

6 Cloud Characteristics
Five essential cloud characteristics:
- On-demand self-service (service & deployment model)
- Broad network access (deployment model)
- Resource pooling (service & deployment model); location independent
- Rapid elasticity (deployment model)
- Measured service (service model)

7 Cloud Business View: reducing cost, new capabilities, risk, opportunities. Remaking the value, decision, and influence chain.

8 Cloud: Platform Dimensions
Platform dimensions:
- Productivity
- Scale
- Trust
- Viability
- Velocity
- Relationship
Platform vendors succeed when the platform helps others succeed.

9 Cloud Computing Architecture: Infrastructure-as-a-Service, Security-as-a-Service, Storage-as-a-Service, Integration-as-a-Service, Database-as-a-Service, Information-as-a-Service, Process-as-a-Service, Platform-as-a-Service, Application-as-a-Service, Management/Governance-as-a-Service, Testing-as-a-Service.

10 Cloud: Success Factors
Cloud success factors:
- Utility computing capability: technical capability; datacenter innovation capability
- Application pattern capability: not just about the browser; platform, delivery, and tooling
- Platform ecosystem: work with ISVs, SIs, VARs, and businesses to get to the cloud

11 Cloud Challenges
Challenges:
- Identity and access management
- Composition / workflow
- Trust: availability, performance, information protection
- Latency: it matters
- Separating logical / physical administration

12 QoS-Aware Cloud
Design and development of QoS:
- Ability to meet QoS application requirements as specified in the hosting SLA
- Current application server (AS) technology is not fully instrumented to meet those requirements
Two principal middleware services:
- Configuration Service (CS)
- Monitoring Service (MS)
- Complemented by an adaptive Load Balancing Service
These operate both on a single AS and on a cluster of ASs.

13 Cloud: Evolution of Virtualization

14 Cloud: Evolution of Virtualization. Power saving with Distributed Power Management (DPM)

15 SQL and Stream Processing

16 Agenda
- Data streams: What are they? Why now? Applications
- DSMS: architecture & issues
- Query processing

17 Data Streams: What and Where?
Continuous, unbounded, rapid, time-varying streams of data elements (tuples). They occur in a variety of modern applications:
- Network monitoring and traffic engineering
- Sensor networks, RFID tags
- Telecom call records
- Financial applications
- Web logs and click-streams
- Manufacturing processes
DSMS = Data Stream Management System

18 DBMS versus DSMS
DBMS: persistent relations; one-time queries; random access; access plan determined by query processor and physical DB design.
DSMS: transient streams (and persistent relations); continuous queries; sequential access; unpredictable data characteristics and arrival patterns.

19 Continuous Queries
One-time queries: run once to completion over the current data set.
Continuous queries: issued once and then continuously evaluated over the data.
Examples: notify me when the temperature drops below X; tell me when the price of stock Y exceeds 300.
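
The difference between the two query styles can be sketched in a few lines of Python. This is only an illustration: the function names `continuous_query` and `below`, the `(timestamp, temperature)` tuple layout, and the threshold of 18.0 are assumptions, not part of any DSMS discussed in the slides.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical tuple layout: (timestamp, temperature).
Reading = Tuple[int, float]


def continuous_query(
    stream: Iterable[Reading], predicate: Callable[[Reading], bool]
) -> Iterator[Reading]:
    """Registered once; evaluates the predicate on every arriving tuple
    and emits matches immediately instead of waiting for the input to end."""
    for reading in stream:
        if predicate(reading):
            yield reading


def below(threshold: float) -> Callable[[Reading], bool]:
    """Build the 'temperature drops below X' condition."""
    return lambda reading: reading[1] < threshold


# A finite list stands in for an unbounded stream.
readings = [(1, 21.5), (2, 19.0), (3, 17.5), (4, 20.1), (5, 16.9)]
alerts = list(continuous_query(readings, below(18.0)))
print(alerts)  # [(3, 17.5), (5, 16.9)]
```

Because `continuous_query` is a generator, it would keep producing alerts for as long as the input keeps producing tuples, which is the behaviour a one-time query cannot offer.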

20 The (Simplified) Big Picture (Stanford stream data manager). Figure labels: Input streams, Register Query, DSMS, Scratch Store, Streamed Result, Stored Result, Archive, Stored Relations.

21 (Simplified) Network Monitoring. Figure labels: Register Monitoring Queries; DSMS; Scratch Store; Network measurements, Packet traces; Intrusion Warnings; Online Performance Metrics; Archive; Lookup Tables.

22 Triggers? Recall triggers in traditional DBMSs? Why not use triggers to process continuous queries over data streams?

23 Making Things Concrete
Outgoing (call_ID, caller, time, event)
Incoming (call_ID, callee, time, event)
event = start or end
Figure labels: DSMS, Central Office, Central Office, ALICE, BOB

24 Query 1 (self-join)
Find all outgoing calls longer than 2 minutes:
    SELECT O1.call_ID, O1.caller
    FROM Outgoing O1, Outgoing O2
    WHERE (O2.time - O1.time > 2
      AND O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end')
- Result requires unbounded storage
- Can provide result as data stream
- Can output after 2 min, without seeing end
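
The last observation, that a long call can be reported before its end event arrives, can be sketched as follows. This is a minimal illustration and not how any particular DSMS evaluates the query; it assumes tuples arrive in timestamp order, that `time` is measured in minutes, and the name `long_calls` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Outgoing(call_ID, caller, time, event); time is assumed to be in minutes.
Event = Tuple[int, str, float, str]


def long_calls(outgoing: Iterable[Event], limit: float = 2.0) -> Iterator[Tuple[int, str]]:
    """Stream (call_ID, caller) for every call that lasts longer than `limit`.

    Assumes tuples arrive in timestamp order, so each arriving tuple also
    tells us how far the clock has advanced.
    """
    open_calls: Dict[int, Tuple[str, float]] = {}  # call_ID -> (caller, start time)
    for call_id, caller, time, event in outgoing:
        # Report every open call that has now been running longer than the
        # limit, whether or not its end event has been seen yet.
        overdue = [cid for cid, (_, start) in open_calls.items() if time - start > limit]
        for cid in overdue:
            yield cid, open_calls.pop(cid)[0]
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end":
            # A short call (or one already reported) is simply forgotten.
            open_calls.pop(call_id, None)


events = [
    (1, "alice", 0.0, "start"),
    (2, "bob", 0.5, "start"),
    (2, "bob", 1.5, "end"),
    (3, "carol", 3.0, "start"),
    (1, "alice", 10.0, "end"),
]
print(list(long_calls(events)))  # [(1, 'alice'), (3, 'carol')]
```

In this sketch a call leaves the state as soon as it is reported or ended, so the operator only remembers calls that are currently open and still under the limit; the result itself is delivered as a stream rather than stored.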

25 Query 2 (join)
Pair up callers and callees:
    SELECT O.caller, I.callee
    FROM Outgoing O, Incoming I
    WHERE O.call_ID = I.call_ID
- Can still provide result as data stream
- Requires unbounded temporary storage …
- … unless streams are near-synchronized
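
One way to picture why near-synchronized streams keep the temporary storage small is a symmetric hash join over the two inputs. The sketch below is illustrative only: it assumes the two streams have been merged into one sequence of tagged tuples, and that each call_ID appears once per stream (for example, only start events), which is what allows matched state to be dropped. The name `pair_up` and the `"O"`/`"I"` tags are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Tagged tuple: (source, call_ID, party), where source is "O" (Outgoing,
# party = caller) or "I" (Incoming, party = callee). Layout is an assumption.
Tagged = Tuple[str, int, str]


def pair_up(events: Iterable[Tagged]) -> Iterator[Tuple[str, str]]:
    """Symmetric hash join on call_ID that emits (caller, callee) pairs.

    Each side keeps only the tuples still waiting for a partner. If the
    streams are near-synchronized the partner arrives soon and the state
    stays small; if one side lags arbitrarily, the state grows without bound.
    """
    pending_out: Dict[int, str] = {}  # call_ID -> caller, waiting for callee
    pending_in: Dict[int, str] = {}   # call_ID -> callee, waiting for caller
    for source, call_id, party in events:
        if source == "O":
            if call_id in pending_in:
                yield party, pending_in.pop(call_id)
            else:
                pending_out[call_id] = party
        else:
            if call_id in pending_out:
                yield pending_out.pop(call_id), party
            else:
                pending_in[call_id] = party


merged = [("O", 1, "alice"), ("I", 2, "dave"), ("I", 1, "bob"), ("O", 2, "carol")]
print(list(pair_up(merged)))  # [('alice', 'bob'), ('carol', 'dave')]
```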

26 Query 3 (group-by aggregation)
Total connection time for each caller:
    SELECT O1.caller, sum(O2.time - O1.time)
    FROM Outgoing O1, Outgoing O2
    WHERE (O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end')
    GROUP BY O1.caller
Cannot provide result in an (append-only) stream:
- Output updates?
- Provide current value on demand?
- Memory?
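
The "output updates" option can be sketched as an operator that emits a revised total every time a call ends. This is one possible reading of the slide, not the method of any specific system; the name `connection_time_updates` is hypothetical, and the sketch assumes each call's start tuple arrives before its end tuple.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Outgoing(call_ID, caller, time, event)
Event = Tuple[int, str, float, str]


def connection_time_updates(outgoing: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Emit (caller, running total) each time one of the caller's calls ends.

    A later tuple for the same caller supersedes the earlier one, so the
    output is an update stream rather than an append-only result.
    """
    starts: Dict[int, float] = {}                    # call_ID -> start time
    totals: Dict[str, float] = defaultdict(float)    # caller -> total so far
    for call_id, caller, time, event in outgoing:
        if event == "start":
            starts[call_id] = time
        elif event == "end" and call_id in starts:
            totals[caller] += time - starts.pop(call_id)
            yield caller, totals[caller]


calls = [
    (1, "alice", 0.0, "start"),
    (2, "bob", 1.0, "start"),
    (1, "alice", 3.0, "end"),
    (3, "alice", 4.0, "start"),
    (2, "bob", 4.5, "end"),
    (3, "alice", 6.0, "end"),
]
print(list(connection_time_updates(calls)))
# [('alice', 3.0), ('bob', 3.5), ('alice', 5.0)]
```

The memory question from the slide shows up directly here: `totals` holds one entry per caller ever seen, so it grows with the number of distinct callers.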

27 DSMS: Architecture & Issues. Data streams and stored relations: architectural differences. Declarative language for registering continuous queries. Flexible query plans and execution strategies. Centralized? Distributed?

28 Agenda
- Data streams: What are they? Why now? Applications
- DSMS: architecture & issues
- Query processing

29 DSMS: Issues. Relation: tuple set or sequence? Updates: modifications or appends? Query answer: exact or approximate? Query evaluation: one or multiple passes? Query plan: fixed or adaptive?

30 Architectural Issues
DSMS: resource (memory, per-tuple computation) limited; reasonably complex, near-real-time query processing; useful to identify what data to populate in the database; query evaluation: one pass; query plan: adaptive.
DBMS: resource (memory, disk, per-tuple computation) rich; extremely sophisticated query processing and analysis; useful to audit query results of data stream systems; query evaluation: arbitrary; query plan: fixed.

31 STREAM System Challenges
Must cope with:
- Stream rates that may be high, variable, and bursty
- Stream data that may be unpredictable and variable
- Continuous query loads that may be high and variable
Query answer: exact or approximate?

32 Query Model. Figure labels: User/Application, DSMS Query Processor.

33 Agenda
- Data streams: What are they? Why now? Applications
- DSMS: architecture & issues
- Query processing: language, operators, optimization, multi-query optimization

34 Agenda
- Data streams: What are they? Why now? Applications
- DSMS: architecture & issues
- Query processing: language, operators, optimization, multi-query optimization

35 Stream Query Language. SQL extension. Queries reference or produce relations or streams. Examples: GSQL [Gigascope], CQL [STREAM]. Figure labels: Stream or Finite Relation, Stream Query Language.

36 Example: Continuous Query Language (CQL). Start with SQL, then add: streams as a new data type; continuous instead of one-time semantics; windows on streams (derived from SQL-99); sampling on streams (basic).

37 Impact of Limited Memory. Continuous streams grow unboundedly. Queries may require unbounded memory. One solution: approximate query evaluation.

38 Approximate Query Evaluation
Why?
- Handling load: streams coming too fast
- Avoid unbounded storage and computation
- Ad hoc queries need approximate history
How? Sliding windows, synopses, samples, load shedding.
Major issues?
- Metric for set-valued queries
- Composition of approximate operators
- How is it understood/controlled by the user?
- Integrate into query language
- Query planning and interaction with resource allocation
- Accuracy-efficiency-storage tradeoff and global metric
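
As one concrete instance of the "samples" approach, the sketch below uses reservoir sampling to keep a fixed-size uniform sample of a stream in bounded memory. The slides do not name a specific sampling scheme, so this choice is an assumption made for illustration, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream of
    unknown length, using O(k) memory and a single pass."""
    rng = rng or random.Random()
    sample: List[T] = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < k:
            # The first k items fill the reservoir.
            sample.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                sample[j] = item
    return sample


# Seeded generator so repeated runs draw the same sample.
picked = reservoir_sample(range(1000), 10, random.Random(42))
print(len(picked))  # 10
```

An aggregate computed over `picked` approximates the same aggregate over the whole stream, which is the accuracy-for-storage trade the slide refers to.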

39 Windows
Mechanism for extracting a finite relation from an infinite stream. Various window proposals for restricting operator scope:
- Windows based on ordering attribute (e.g., time)
- Windows based on tuple counts
- Windows based on explicit markers (e.g., punctuations)
- Variants (e.g., partitioning tuples in a window)
Figure labels: Stream, window specifications, finite relations manipulated using SQL, streamify.

40 Windows Terminology. Figure labels: start time, current time, time axis (t1, t2, t3, t4, t5), sliding window, tumbling window.
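
The two window types named on this slide can be sketched over a stream of `(timestamp, value)` tuples. This is an illustrative reading of the terms rather than the semantics of any particular query language; it assumes timestamps are non-decreasing and the window size is positive, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Item = Tuple[int, str]  # (timestamp, value); layout is an assumption


def tumbling_windows(stream: Iterable[Item], size: int) -> Iterator[List[Item]]:
    """Cut the time axis into back-to-back intervals [0, size), [size, 2*size), ...
    and emit each non-empty interval once, when a tuple from a later one arrives."""
    current, bucket = None, []
    for ts, value in stream:
        index = ts // size
        if current is not None and index != current:
            yield bucket
            bucket = []
        current = index
        bucket.append((ts, value))
    if bucket:  # only reached when the input is finite
        yield bucket


def sliding_windows(stream: Iterable[Item], size: int) -> Iterator[List[Item]]:
    """After every arrival, emit the tuples whose timestamp lies in (ts - size, ts]."""
    window: Deque[Item] = deque()
    for ts, value in stream:
        window.append((ts, value))
        while window[0][0] <= ts - size:
            window.popleft()
        yield list(window)


data = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (5, "e")]
print(list(tumbling_windows(data, 2)))
# [[(0, 'a'), (1, 'b')], [(2, 'c'), (3, 'd')], [(5, 'e')]]
print(list(sliding_windows(data, 2)))
# [[(0, 'a')], [(0, 'a'), (1, 'b')], [(1, 'b'), (2, 'c')], [(2, 'c'), (3, 'd')], [(5, 'e')]]
```

Tumbling windows never overlap, so each tuple belongs to exactly one; sliding windows overlap, so a tuple can appear in several consecutive outputs.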

41 Query Operators. Selections: WHERE clause. Projections: SELECT clause. Joins: FROM clause. Group-by (aggregations): GROUP BY clause.

42 Query Operators
Selections and projections on streams are straightforward:
- Local per-element operators
Projection may need to include the ordering attribute.
Joins are problematic:
- May need to join tuples that are arbitrarily far apart
- Equijoin on stream ordering attributes may be tractable
The majority of the work focuses on joins using windows.
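
A window-restricted join can be sketched as follows: each arriving tuple is matched only against tuples from the other stream that are at most `width` time units old, so tuples that are "arbitrarily far apart" are never joined and old state can be expired. The merged, timestamp-ordered input, the `"L"`/`"R"` tags, and the name `window_join` are assumptions made for this illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Merged input in timestamp order: (timestamp, side, key, payload),
# where side is "L" or "R". This layout is an assumption.
Tagged = Tuple[int, str, str, str]


def window_join(events: Iterable[Tagged], width: int) -> Iterator[Tuple[str, str]]:
    """Equijoin on key, restricted to tuples at most `width` time units apart.

    Each side buffers only its recent tuples, so memory is bounded by the
    arrival rate times the window width.
    """
    windows: Dict[str, Deque[Tuple[int, str, str]]] = {"L": deque(), "R": deque()}
    for ts, side, key, payload in events:
        # Expire tuples that have fallen out of the window on both sides.
        for buffered in windows.values():
            while buffered and buffered[0][0] < ts - width:
                buffered.popleft()
        other = "R" if side == "L" else "L"
        # Probe the opposite window, then remember the new tuple.
        for _, other_key, other_payload in windows[other]:
            if other_key == key:
                yield (payload, other_payload) if side == "L" else (other_payload, payload)
        windows[side].append((ts, key, payload))


feed = [
    (0, "L", "k1", "l0"),
    (1, "R", "k1", "r1"),
    (2, "R", "k2", "r2"),
    (5, "L", "k2", "l5"),
    (6, "R", "k1", "r6"),
]
print(list(window_join(feed, 3)))  # [('l0', 'r1'), ('l5', 'r2')]
```

In the example, `r6` finds no partner because `l0` had already expired by time 6; an unrestricted join would have paired them, at the cost of remembering `l0` forever.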

43 Blocking Operators
Blocking:
- No output until entire input seen
- Streams: input never ends
Simple aggregates: output an “update” stream.
Set output (sort, group-by):
- Root: could maintain output data structure
- Intermediate nodes: try non-blocking analogs
Join:
- Apply sliding-window restrictions

44 Optimization in DSMS
Traditionally, table-based cardinalities are used in the query optimizer.
- Goal of the query optimizer: minimize the size of intermediate results
This is problematic in a streaming environment: all streams are unbounded (infinite size). Novel optimization objectives are needed that are relevant when the input sources are streams.

45 Query Optimization in DSMS
Novel notions of optimization:
- Stream-rate-based [e.g., NiagaraCQ]
- Resource-based [e.g., STREAM]
- QoS-based [e.g., Aurora]
Continuous adaptive optimization.
Possibility that objectives cannot be met:
- Resource constraints
- Bursty arrivals under limited processing capabilities

46 Stream Projects
- Amazon/Cougar (Cornell): sensors
- Aurora (Brown/MIT): sensor monitoring, dataflow
- Hancock (AT&T): telecom streams
- Niagara (OGI/Wisconsin): Internet XML databases
- OpenCQ (Georgia): triggers, incremental view maintenance
- Stream (Stanford): general-purpose DSMS
- Tapestry (Xerox): pub/sub content-based filtering
- Telegraph (Berkeley): adaptive engine for sensors
- Tribeca (Bellcore): network monitoring

47 END

