Time Series Data in MongoDB


1 Time Series Data in MongoDB
Massimo Brignoli, Senior Solutions Architect, MongoDB Inc.

2 Agenda
- What is time series data?
- Schema design considerations
- Broader use case: operational intelligence
- MMS Monitoring schema design
- Thinking ahead
- Questions

3 What is time series data?

4 Time Series Data is Everywhere
- Financial markets pricing (stock ticks)
- Sensors (temperature, pressure, proximity)
- Industrial fleets (location, velocity, operational)
- Social networks (status updates)
- Mobile devices (calls, texts)
- Systems (server logs, application logs)

5 Time Series Data at a Higher Level
- Widely applicable data model
- Applies to several different “data use cases”
- Various schema and modeling options
- Application requirements drive schema design

6 Time Series Data Considerations
- Resolution of the raw events
- Resolution needed to support applications, analysis, and reporting
- Data retention policies: how data ages out and how long it must be retained

7 Schema Design Considerations

8 Designing For Writing and Reading
- Document per event
- Document per minute (average)
- Document per minute (by second)
- Document per hour

9 Document Per Event
{ server: "server1",
  load: 92,
  ts: ISODate("2013-10-10T22:07:38.000Z") }
- Relational-centric approach
- Insert-driven workload (see the example below)
- Aggregations computed at the application level
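A minimal sketch of the insert-driven workload this model implies, assuming a collection named db.metrics (the collection name and values are illustrative, not from the slides):

    // One document per raw sample; nothing is aggregated at write time.
    db.metrics.insert({
        server: "server1",
        load: 92,
        ts: new Date()   // time the sample was taken
    });

Averages and rollups then have to be computed in the application or in a later aggregation pass.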

10 Document Per Minute (Average)
{ server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2013-10-10T22:07:00.000Z") }
- Pre-aggregate to compute the per-minute average more easily
- Update-driven workload (see the example below)
- Resolution at the minute level
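A sketch of how the application might maintain such a document, assuming a db.metrics collection and one sample per second (the helper name is hypothetical):

    // Fold one sample into its per-minute document with a single upsert.
    function recordSample(server, load, ts) {
        var minute = new Date(ts);
        minute.setUTCSeconds(0, 0);              // truncate to the minute boundary

        db.metrics.update(
            { server: server, ts: minute },
            { $inc: { load_num: 1, load_sum: load } },   // running count and sum
            { upsert: true }                             // first sample creates the document
        );
    }

The per-minute average is then derived at read time as load_sum / load_num, which keeps the write path to a single $inc.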

11 Document Per Minute (By Second)
{ server: "server1",
  load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
  ts: ISODate("2013-10-10T22:07:00.000Z") }
- Store per-second data at the minute level
- Update-driven workload (see the example below)
- Pre-allocate structure to avoid document moves
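A sketch of the per-second update under this model, again assuming a db.metrics collection; the field path "load.&lt;second&gt;" is built dynamically and the sample value is illustrative:

    // Write one second's sample into its slot inside the minute document.
    var ts = new Date();                         // time the sample was taken
    var minute = new Date(ts);
    minute.setUTCSeconds(0, 0);

    var setDoc = {};
    setDoc["load." + ts.getUTCSeconds()] = 45;   // e.g. { "load.37": 45 }

    db.metrics.update(
        { server: "server1", ts: minute },
        { $set: setDoc },
        { upsert: true }
    );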

12 Document Per Hour (By Second)
{ server: "server1",
  load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
  ts: ISODate("2013-10-10T22:00:00.000Z") }
- Store per-second data at the hourly level
- Update-driven workload
- Pre-allocate structure to avoid document moves
- Updating the last second requires 3599 steps

13 Document Per Hour (By Second)
{ server: "server1",
  load: { 0: { 0: 15, …, 59: 45 },
          …,
          59: { 0: 25, …, 59: 75 } },
  ts: ISODate("2013-10-10T22:00:00.000Z") }
- Store per-second data at the hourly level, with nesting
- Update-driven workload
- Pre-allocate structure to avoid document moves (see the example below)
- Updating the last second now requires only 59 + 59 steps
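A sketch of the pre-allocation the slide calls for, assuming the nested per-hour model and a db.metrics collection (the function name and zero placeholders are illustrative):

    // Pre-allocate the full minute/second structure for one hour so later $set
    // updates never grow the document (which would cause document moves on MMAPv1).
    function preallocateHour(server, hourStart) {
        var doc = { server: server, ts: hourStart, load: {} };
        for (var m = 0; m < 60; m++) {
            doc.load[m] = {};
            for (var s = 0; s < 60; s++) {
                doc.load[m][s] = 0;          // placeholder, overwritten as samples arrive
            }
        }
        db.metrics.insert(doc);
    }

    // e.g. preallocateHour("server1", ISODate("2013-10-10T22:00:00Z"));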

14 Characterizing Write Differences
- Example: data generated every second
- Capturing data per minute requires:
  - Document per event: 60 writes
  - Document per minute: 1 write, 59 updates
- Transition from insert-driven to update-driven workload
- Individual writes are smaller
- Performance and concurrency benefits

15 Characterizing Read Differences
- Example: data generated every second
- Reading data for a single hour requires:
  - Document per event: 3600 reads
  - Document per minute: 60 reads (see the example below)
- Read performance is greatly improved
- Optimal with tuned block sizes and read ahead
- Fewer disk seeks
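A sketch of that one-hour read under the document-per-minute model, assuming the { server, ts } fields used in the earlier examples:

    // An index on { server, ts } supports the range scan.
    db.metrics.ensureIndex({ server: 1, ts: 1 });

    // Read one hour of data: 60 per-minute documents instead of 3600 raw events.
    db.metrics.find({
        server: "server1",
        ts: { $gte: ISODate("2013-10-10T22:00:00Z"),
              $lt:  ISODate("2013-10-10T23:00:00Z") }
    });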

16 MMS Monitoring Schema Design

17 MMS Monitoring
- MongoDB Management Service Monitoring
- Available in two flavors:
  - Free cloud-hosted monitoring
  - On-premise with MongoDB Enterprise
- Monitor single-node, replica set, or sharded cluster deployments
- Metric dashboards and custom alert triggers

18 MMS Monitoring

19 MMS Monitoring

20 MMS Application Requirements
- Resolution defines the granularity of the stored data
- Range controls the retention policy, e.g. after 24 hours keep only 5-minute resolution
- Display dictates the stored pre-aggregations, e.g. total and count

21 Monitoring Schema Design
{ timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: { 0: 999999, …, 59: 1800000 } }
- Per-minute document model
- Documents store individual metrics and counts
- Supports “total” and “avg/sec” display

22 Monitoring Data Updates
db.metrics.update(
  { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    type: "memory_used" },
  { $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 } }
)
- A single update is required to add new data and increment the associated counts

23 Monitoring Data Management
- Data is stored at different granularity levels for read performance
- Collections are organized into specific time intervals
- Retention is managed by simply dropping collections as they age out (see the example below)
- Document structure is pre-created to maximize write performance
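A sketch of interval-based retention; the per-day collection name below is illustrative, not an MMS internal:

    // One collection per interval (here, per day) of pre-created minute documents.
    var day = db.getCollection("metrics_daily_20131010");

    // Once the interval falls outside the retention window, drop the whole collection:
    day.drop();   // far cheaper than remove({ timestamp_minute: { $lt: cutoff } })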

24 Use Case: Operational Intelligence

25 What is Operational Intelligence?
- Storing log data: capturing application- and/or server-generated events
- Hierarchical aggregation: a rolling approach to generating rollups, e.g. hourly > daily > weekly > monthly
- Pre-aggregated reports: processing raw events to generate reporting

26 Storing Log Data
Raw Apache log line:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

Stored as a document:
{ _id: ObjectId('4f442120eb03305789000000'),
  host: "127.0.0.1",
  user: 'frank',
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  request: "GET /apache_pb.gif HTTP/1.0",
  status: 200,
  response_size: 2326,
  referrer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" }

27 Pre-Aggregation
- Analytics across raw events can involve many reads
- Alternative schemas can improve read and write performance
- Data can be organized into coarser buckets
- Transition from insert-driven to update-driven workloads

28 Pre-Aggregated Log Data
{ timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
  resource: "/index.html",
  page_views: { 0: 50, …, 59: 250 } }
- Leverages time series-style bucketing
- Tracks individual metrics, e.g. page views (see the example below)
- Improves performance for reads and writes
- Minimal processing overhead
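A sketch of how a page view might be counted into this per-minute document, assuming a collection named db.pageviews_minute (the collection and helper names are hypothetical):

    // Count one page view in the per-second slot of its minute document.
    function countPageView(resource, ts) {
        var minute = new Date(ts);
        minute.setUTCSeconds(0, 0);

        var inc = {};
        inc["page_views." + ts.getUTCSeconds()] = 1;   // e.g. { "page_views.36": 1 }

        db.pageviews_minute.update(
            { timestamp_minute: minute, resource: resource },
            { $inc: inc },
            { upsert: true }
        );
    }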

29 Hierarchical Aggregation
- An analytical approach, as opposed to a schema approach
- Leverage the built-in Aggregation Framework or MapReduce (see the sketch below)
- Execute multiple tasks sequentially to aggregate at varying levels: raw events > hourly > weekly > monthly
- The rolling approach distributes the aggregation workload
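A sketch of one rung of such a rollup using the Aggregation Framework, assuming raw samples live in an "events" collection shaped like the document-per-event example ({ server, load, ts }); the collection names are illustrative:

    // Roll one hour of raw events up into an hourly summary collection.
    db.events.aggregate([
        { $match: { ts: { $gte: ISODate("2013-10-10T22:00:00Z"),
                          $lt:  ISODate("2013-10-10T23:00:00Z") } } },
        { $group: {
            _id: { server: "$server",
                   year:  { $year: "$ts" },
                   month: { $month: "$ts" },
                   day:   { $dayOfMonth: "$ts" },
                   hour:  { $hour: "$ts" } },
            avg_load: { $avg: "$load" },
            samples:  { $sum: 1 }
        } },
        { $out: "metrics_hourly" }   // $out needs MongoDB 2.6+; older versions insert the results manually
    ]);

The same pattern repeats from hourly to daily, weekly, and monthly collections, run on a rolling schedule.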

30 Thinking Ahead

31 Before You Start
- What are the application requirements?
- Is pre-aggregation useful for your application?
- What are your retention and age-out policies?
- What are the gotchas?
  - Pre-create document structure to avoid fragmentation and performance problems
  - Organize your data for growth: time series data grows fast!

32 Down The Road
- Scale-out considerations
  - Vertical vs. horizontal (with sharding)
- Understanding the data
  - Aggregation
  - Analytics
  - Reporting
- Deeper data analysis
  - Patterns
  - Predictions

33 Scaling Time Series Data in MongoDB
- Vertical growth
  - Larger instances with more CPU and memory
  - Increased storage capacity
- Horizontal growth
  - Partitioning data across many machines
  - Dividing and distributing the workload

34 Time Series Sharding Considerations
- What are the application requirements?
  - Primarily collecting data
  - Primarily reporting data
  - Both
- Map those back to:
  - Write performance needs
  - Read/write query distribution
  - Collection organization (see MMS Monitoring)
- Example shard key: { metric name, coarse timestamp } (see the sketch below)
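A sketch of that shard key choice, assuming MMS-style per-minute metric documents and illustrative database/collection names; leading with the metric type spreads inserts across shards instead of hot-spotting on the monotonically increasing timestamp:

    sh.enableSharding("monitoring");

    // Index backing the { metric name, coarse timestamp } shard key
    var metrics = db.getSiblingDB("monitoring").metrics;
    metrics.ensureIndex({ type: 1, timestamp_minute: 1 });

    sh.shardCollection("monitoring.metrics", { type: 1, timestamp_minute: 1 });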

35 Aggregates, Analytics, Reporting
- The Aggregation Framework can be used for analysis
  - Does it work with the chosen schema design?
  - What sorts of aggregations are needed?
- Reporting can be done on a predictable, rolling basis (see “Hierarchical Aggregation”)
- Consider secondary reads for analytical operations (example below)
  - Minimize load on production primaries
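A minimal sketch of routing an analytical query to a secondary, assuming a replica set deployment and the metrics collection used earlier:

    // Read from a secondary (falling back to the primary if none is available)
    // so analytical scans stay off the production primary.
    db.metrics.find({ type: "memory_used" }).readPref("secondaryPreferred");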

36 Deeper Data Analysis
- Leverage the MongoDB-Hadoop connector
  - Bi-directional support for reading and writing
  - Works with online and offline data (e.g. backup files)
- Compute using MapReduce
  - Patterns
  - Recommendations
  - Etc.
- Explore data with Pig and Hive

37 Questions?

38 Resources
- Schema Design for Time Series Data in MongoDB (blog post)
- Operational Intelligence Use Case
- Data Modeling in MongoDB
- Schema Design (webinar)

