In quest of the operational database for real-time environmental monitoring and early warning systems Bartosz Baliś, Marian Bubak, Daniel Harezlak, Piotr.


1 In quest of the operational database for real-time environmental monitoring and early warning systems Bartosz Baliś, Marian Bubak, Daniel Harezlak, Piotr Nowakowski, Maciej Pawlik, and Bartosz Wilk Department of Computer Science AGH University of Science and Technology, Kraków, Poland ICCS 2017, Zürich, Switzerland

2 Outline
Motivation and objectives
Research context: DSS, urgent computing
Methodology: data sets and workloads
Models for time series data for 4 databases: MongoDB, PostgreSQL, Redis, and InfluxDB
Experimental evaluation:
  Performance of different models
  Mixed workload performance
Conclusion
I will present the broader context of this research, which is the ISMOP project.

3 Challenges for data management
Diverse data sets (spatial, time series, binary, metadata) and data usage patterns
  Solution: multiple data stores and models to address diverse needs
Data-intensive processing
  Threat level evaluation scenario: 130 GB or more of data to search per 1 km of levee
  Solution: appropriate data infrastructure

4 Motivation and objectives
Environmental monitoring and decision support systems need to process massive sensor data streams in real time
  Simultaneous reads and writes
Objective of the study: evaluate four different DBs and corresponding data models: MongoDB, PostgreSQL, Redis, InfluxDB
  How to best represent time series data?
  What are the limits of the evaluated DBs?

5 Research context Data management in large-scale environmental monitoring, early warning, and decision support systems

6 Applied methodology
Four database technologies chosen: MongoDB, PostgreSQL, Redis, InfluxDB
Research questions:
  How best to implement time series data in a given data model and database?
  How do alternative models perform for different queries?
  What are the reasonable volume limits for an operational database?
  What are the performance limits of the alternative approaches and what factors influence them?

7 Methodology: test data sets
Time series records consisting of (time series id, time stamp, value)
Generated data sets representing measurements from 10,000 sensors
For experiments, databases were populated with 10M to 1B records
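
A data set of this shape can be sketched in a few lines. This is only an illustration: the slides do not say how values or time stamps were generated, so the uniform random values, the one-second step, and the name `generate_records` are all assumptions. With 10,000 sensors, 10M records correspond to 1,000 samples per sensor and 1B records to 100,000 samples per sensor.

```python
import random
from typing import Iterator, Tuple

# (time series id, time stamp, value), as described on the slide
Record = Tuple[int, int, float]


def generate_records(num_sensors: int, samples_per_sensor: int,
                     start_ts: int = 0, step_s: int = 1,
                     seed: int = 42) -> Iterator[Record]:
    """Yield synthetic records in time order, one per sensor per time step.

    Values are uniform random numbers (an assumption, not the authors' method).
    """
    rng = random.Random(seed)
    for i in range(samples_per_sensor):
        ts = start_ts + i * step_s
        for sensor_id in range(num_sensors):
            yield (sensor_id, ts, rng.uniform(0.0, 100.0))


# Small demo; the study's scale would be num_sensors=10_000.
records = list(generate_records(num_sensors=10, samples_per_sensor=5))
```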

8 Methodology: test workloads
Read workload: three test queries
  Query 1: random access. Return 1000 records for random time series IDs and time stamps. This represents a query which is difficult to optimize. It may occur in certain types of visualizations spanning many sensors.
  Query 2: recent measurements. Return 10 latest records for 100 random time series IDs.
  Query 3: downsampling. Return 100 records for 100 random time series IDs, where the returned records are selected by downsampling the latest n*100 records for each of the time series IDs.
Write workload: 10,000 new records per second
  Written to DBs in batches of ,000
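
Query 3 can be read as "take the latest n*100 records of a series and keep 100 of them". A minimal sketch of that selection for one series, assuming the records are already sorted by ascending time stamp and that downsampling means keeping every n-th record (the slides do not specify the selection rule; `downsample_latest` is a hypothetical name):

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, int, float]  # (time series id, time stamp, value)


def downsample_latest(series: Sequence[Record], n: int,
                      out_size: int = 100) -> List[Record]:
    """Keep every n-th record of the latest n * out_size records.

    `series` must hold the records of ONE time series, oldest first.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    window = series[-n * out_size:]   # latest n * out_size records
    return list(window[n - 1::n])     # every n-th, ending at the newest


# Demo: 1000 records for series 7, downsampled by a factor of 5.
series = [(7, t, float(t)) for t in range(1000)]
sampled = downsample_latest(series, n=5)
```

In the benchmark this selection would be repeated for each of the 100 randomly chosen time series IDs.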

9 Databases
MongoDB: document database
PostgreSQL: relational database
Redis: in-memory dictionary data server
InfluxDB: native time series database

10 Data models: MongoDB
Model 1: one record = one document
Model 2: single document = multiple records (e.g. 1 hour of records)
  Fewer documents in database
  Documents can be pre-created
customId and timestamp should be indexed to improve query performance
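
The difference between the two document shapes can be shown without a database. The sketch below builds plain Python dictionaries; `customId` and `timestamp` come from the slide, while the `value` and `values` field names, the integer-second time stamps, and the offset-keyed layout inside the hourly document are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int, float]  # (time series id, time stamp in seconds, value)
SECONDS_PER_HOUR = 3600


def to_model1(records: Iterable[Record]) -> List[dict]:
    """Model 1: one document per record."""
    return [{"customId": sid, "timestamp": ts, "value": v}
            for sid, ts, v in records]


def to_model2(records: Iterable[Record]) -> List[dict]:
    """Model 2: one document per (series, hour), holding that hour's records."""
    buckets: Dict[Tuple[int, int], Dict[str, float]] = defaultdict(dict)
    for sid, ts, v in records:
        hour_start = ts - ts % SECONDS_PER_HOUR
        # Key each value by its offset (in seconds) within the hour.
        buckets[(sid, hour_start)][str(ts - hour_start)] = v
    return [{"customId": sid, "timestamp": hour_start, "values": values}
            for (sid, hour_start), values in sorted(buckets.items())]


demo = [(1, 0, 1.0), (1, 10, 2.0), (1, 3600, 3.0), (2, 5, 4.0)]
model1_docs = to_model1(demo)
model2_docs = to_model2(demo)
```

Four records become four documents under Model 1 but only three under Model 2; with one reading per second the reduction would be up to 3600-fold per series.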

11 Data models: PostgreSQL
Model 1: single monolithic table with three columns (id, time stamp, value)
Model 2: partitioned table
  Time series ID as the partition key
  Increases DB scalability but introduces write overhead
Model 3: multi-column table (not implemented)
  One row: (id, time stamp, multiple values)
  Smaller tables
  Queries more difficult to implement
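
The trade-off in Model 2 can be illustrated with a toy in-memory stand-in. This is not PostgreSQL: the slides do not say how time series IDs were mapped to partitions, so the modulo routing rule and the `PartitionedStore` class are assumptions. The point is only that every write pays a routing step, while a per-series read scans a single, smaller partition.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int, float]  # (time series id, time stamp, value)


class PartitionedStore:
    """Toy model of a table partitioned by time series ID (hypothetical)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, List[Record]] = defaultdict(list)

    def partition_for(self, series_id: int) -> int:
        # Assumed routing rule: ID modulo the number of partitions.
        return series_id % self.num_partitions

    def insert(self, record: Record) -> None:
        # Routing happens on every write; this is the extra write cost.
        self.partitions[self.partition_for(record[0])].append(record)

    def latest(self, series_id: int, count: int) -> List[Record]:
        # Only one partition is scanned instead of the whole data set.
        rows = [r for r in self.partitions[self.partition_for(series_id)]
                if r[0] == series_id]
        return sorted(rows, key=lambda r: r[1])[-count:]


store = PartitionedStore(num_partitions=4)
for ts in range(10):
    for sid in range(8):
        store.insert((sid, ts, float(ts)))
recent = store.latest(series_id=5, count=3)
```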

12 Data models: Redis
Model 1: one record = one Redis HASH
Model 2: one record = one Redis STRING
Model 3: SORTED SETS
  Elements of the set = values
  Name of the set = time series ID
  Score associated with values = time stamp
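
Model 3 can be mimicked in pure Python to show how a score-ordered set supports time range queries. The class below is a small stand-in for the relevant sorted-set behaviour, not a Redis client; its method names merely echo the Redis commands ZADD and ZRANGEBYSCORE. One caveat worth noting: members of a Redis sorted set are unique, so two readings with the same value in one series would collapse into one element. Prefixing the member with the time stamp, as done in the demo, is one possible workaround and an assumption here, not something the slides describe.

```python
import bisect
from typing import Dict, List, Tuple


class SortedSetStore:
    """In-memory stand-in for sorted sets keyed by set name (hypothetical)."""

    def __init__(self) -> None:
        self._sets: Dict[str, List[Tuple[int, str]]] = {}

    def zadd(self, name: str, score: int, member: str) -> None:
        entries = self._sets.setdefault(name, [])
        # Members are unique: re-adding a member replaces its old score.
        entries[:] = [e for e in entries if e[1] != member]
        bisect.insort(entries, (score, member))

    def zrangebyscore(self, name: str, lo: int, hi: int) -> List[str]:
        """Return members with lo <= score <= hi, ordered by score."""
        entries = self._sets.get(name, [])
        start = bisect.bisect_left(entries, (lo, ""))
        return [m for s, m in entries[start:] if s <= hi]


store = SortedSetStore()
for ts, value in [(100, 21.5), (200, 22.0), (300, 23.5)]:
    # Set name = time series ID, score = time stamp, member = "ts:value".
    store.zadd("series:42", ts, f"{ts}:{value}")
recent = store.zrangebyscore("series:42", 150, 300)

# Plain values as members: the repeated value 21.5 overwrites the first one.
plain = SortedSetStore()
plain.zadd("series:42", 100, "21.5")
plain.zadd("series:42", 200, "22.0")
plain.zadd("series:42", 300, "21.5")
collapsed = plain.zrangebyscore("series:42", 0, 1000)
```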

13 Data models: InfluxDB
Only one model: InfluxDB’s native representation
One record = one point in a time series tagged with time series ID (indexed)
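
In InfluxDB's line protocol a point is written as a measurement name, optional tags, fields, and a time stamp; tags are indexed and fields are not. A minimal sketch of how one record could be serialized this way, assuming a measurement called `sensor_reading`, a tag called `series_id`, a field called `value`, and nanosecond time stamps (all of these names are illustrative, not taken from the slides):

```python
def to_line_protocol(series_id: int, ts_ns: int, value: float,
                     measurement: str = "sensor_reading") -> str:
    """Format one record as an InfluxDB line protocol point.

    The time series ID goes into a tag (indexed); the reading goes
    into a float field (not indexed).
    """
    return f"{measurement},series_id={series_id} value={float(value)!r} {ts_ns}"


line = to_line_protocol(series_id=42, ts_ns=1500000000000000000, value=21.5)
```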

14 Write performance and disk usage
Redis achieves the best write throughput, InfluxDB is almost as good
InfluxDB's disk space optimization is excellent
Redis consumes a surprisingly large amount of memory (10M records on a 4 GB machine)

15 Query execution
Clear advantage of MongoDB M2 over M1
Partitioning induces overhead for small DB sizes, but for large DBs the performance gain outweighs it
Redis M3 performs best, but not for Q1
InfluxDB has excellent performance and exceptional scalability

16 Mixed workload performance
InfluxDB performs almost as well with 1B records as Redis does with 10M
MongoDB's 8-12 s response times may not be sufficient
Mixed workload affects PostgreSQL the most due to complex table locking and index updates (ACID compliance)

17 Conclusion
Proper time series representation is crucial for performance
The native time series database InfluxDB outperforms the competition
However, there are other factors in choosing a technology than just performance
  Sometimes one chooses “boring technology”(*) because it’s more predictable, easier to operate and maintain
  One may choose to store time series data in the same DB as metadata to keep the technology stack small
  For less demanding use cases even a good old RDB may prove sufficient
(*)

18 More at http://www.ismop.edu.pl bubak@agh.edu.pl

19 Acknowledgement This research was supported by the National Centre for Research and Development (NCBiR) under Grant No. PBS1/B9/18/2013.

