1 Summary of Alma-OSF’s Evaluation of MongoDB for Monitoring Data
Heiko Sommer, June 13, 2013
Heavily based on the presentation by Tzu-Chiang Shen, Leonel Peña
ALMA Integrated Computing Team Coordination & Planning Meeting #1, Santiago, 17-19 April 2013

2 Monitoring Storage Requirement
ICT-CPM1 17-19 April 2013
- Expected data rate with 66 antennas:
  - 150,000 monitor points (“MP”s) total.
  - MPs get archived once per minute.
  - ~1 minute of MP data is bucketed into a “clob”.
  - ~7000 clobs/s (2500 clobs/s + dependent MP demultiplexing + fluctuations)
  - ~25-30 GB/day, ~10 TB/year, equivalent to ~310 KByte/s or ~2.485 Mbit/s
- Monitoring data characteristics:
  - Simple data structure: [ID, timestamp, value]
  - But a huge amount of data
  - Read-only data
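The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, assuming decimal units (1 KByte = 1000 bytes, 1 GB = 10^9 bytes) and a 365-day year; the variable names are illustrative and not taken from the presentation. The result of 2.48 Mbit/s differs slightly from the slide's 2.485, presumably because the slide started from an unrounded byte rate.

```javascript
// Consistency check of the slide's data-rate figures.
// Assumption: decimal units (1 KByte = 1000 bytes, 1 GB = 1e9 bytes).
const monitorPoints = 150000;        // from the slide
const archiveIntervalSeconds = 60;   // "archived once per minute"
const kBytePerSecond = 310;          // from the slide

// One clob per monitor point per minute gives the base clob rate.
const baseClobsPerSecond = monitorPoints / archiveIntervalSeconds;

// Convert the byte rate into the other units quoted on the slide.
const mbitPerSecond = (kBytePerSecond * 8) / 1000;
const gBytePerDay = (kBytePerSecond * 86400) / 1e6;
const tBytePerYear = (gBytePerDay * 365) / 1000;

console.log(baseClobsPerSecond);        // 2500
console.log(mbitPerSecond.toFixed(2));  // 2.48
console.log(gBytePerDay.toFixed(1));    // 26.8
console.log(tBytePerYear.toFixed(1));   // 9.8
```

The daily volume lands inside the quoted 25-30 GB/day range, and the yearly volume matches the quoted ~10 TB/year.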

3 Prior DB Investigations
- Oracle: See Alisdair’s slides.
- MySQL
  - Query problems, similar to Oracle DB
- HBase (2011-08)
  - Got stuck with Java client problems
  - Poor support from the community
- Cassandra (2011-10)
  - Keyspace / replicator issue resolved
  - Poor insert performance: only 270 inserts / minute (unclear what size)
  - Clients froze
- These experiments were done “only” with some help from archive operators, not within the scope of a student’s thesis, as was later the case with MongoDB.
- “Administrational complexity” was also mentioned, without details.

4 Very Brief Introduction of MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary-encoded variant of JSON.
- Documents within a collection can differ in structure.
  - For monitor data we don’t really need this freedom.
- Other features: Sharding, Replication, Aggregation (Map/Reduce)

SQL        mongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index

5 Very Brief Introduction of MongoDB …
A document in mongoDB:
    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }

6 Schema Alternatives 1.) One MP value per doc
- One MP value per doc.
- One MongoDB collection total, or one per antenna.

7 Schema Alternatives 2.) MP clob per doc
- A clob (~1 minute of flattened MP data) per doc.
- Collection per antenna / other device.

8 Schema Alternatives 3.) Structured MP /day/doc
- One monitor point data structure per day
- Monthly database
- Shard key = antenna + MP, which keeps matching docs on the same node.
- Updates of pre-allocated documents.
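A pre-allocated per-day document for variant 3 might be built roughly as follows. This is a minimal sketch, not the schema actually used at the OSF: the field names (`metadata.antenna`, `metadata.component`, `metadata.monitorPoint`, `metadata.date`, `hourly.<hour>.<minute>.<second>`) are inferred from the query examples elsewhere in this deck, while the helper name `buildDayDocument` and the `null` placeholder for not-yet-written samples are assumptions.

```javascript
// Hypothetical helper: builds an empty, pre-allocated day document for one
// monitor point. Field names are inferred from the deck's query examples;
// the null placeholder value is an assumption.
function buildDayDocument(antenna, component, monitorPoint, date) {
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    hourly[h] = {};
    for (let m = 0; m < 60; m++) {
      hourly[h][m] = {};
      for (let s = 0; s < 60; s++) {
        hourly[h][m][s] = null; // slot to be filled by a later update
      }
    }
  }
  return { metadata: { antenna, component, monitorPoint, date }, hourly };
}

const doc = buildDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
console.log(Object.keys(doc.hourly).length);         // 24
console.log(Object.keys(doc.hourly[15][29]).length); // 60
```

Because every slot already exists, writing a sample becomes an in-place update of a fixed-size document rather than an insert, which is the point of the "updates of pre-allocated documents" bullet.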

9 Analysis
- Advantages of variant 3.):
  - Fewer documents within a collection
    - There will be ~150,000 documents per day
    - The index will be smaller as well.
  - No data fragmentation problem
  - Once a specific document is identified via the index (O(log n)), a specific range or a single value can be accessed in O(1)
  - Smaller ratio of metadata / data

10 What would a query look like?
- Query to retrieve a value with seconds-level granularity:
  - E.g.: to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at 2012-09-15T15:29:18:
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29.18': 1 }
    );

11 What would a query look like? …
- Query to retrieve a range of values:
  - E.g.: to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE during minute 29 (2012-09-15T15:29):
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
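The two queries differ only in how deep the projection path goes. The following sketch shows one way a client could derive the filter and projection from a timestamp. The helper name `buildMonitorQuery` is hypothetical, and the sketch assumes that date strings and path components are not zero-padded, as the literal "2012-9-15" and 'hourly.15.29.18' in the slides suggest. The resulting objects would be passed to `findOne` on the appropriate monthly collection.

```javascript
// Hypothetical helper: derives the findOne() filter and projection from a
// timestamp. Assumption: date and path components are not zero-padded.
function buildMonitorQuery(antenna, component, monitorPoint, timestamp, granularity) {
  const match = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2}))?$/.exec(timestamp);
  if (!match) throw new Error("Unsupported timestamp: " + timestamp);
  const [, year, month, day, hour, minute, second] = match;

  // Path into the per-day document: hourly.<hour>.<minute>[.<second>]
  const path = ["hourly", Number(hour), Number(minute)];
  if (granularity === "second") {
    if (second === undefined) throw new Error("Timestamp has no seconds");
    path.push(Number(second));
  }

  return {
    filter: {
      "metadata.date": `${year}-${Number(month)}-${Number(day)}`,
      "metadata.monitorPoint": monitorPoint,
      "metadata.antenna": antenna,
      "metadata.component": component
    },
    projection: { [path.join(".")]: 1 }
  };
}

const single = buildMonitorQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE",
                                 "2012-09-15T15:29:18", "second");
const range = buildMonitorQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE",
                                "2012-09-15T15:29", "minute");
console.log(Object.keys(single.projection)[0]); // hourly.15.29.18
console.log(Object.keys(range.projection)[0]);  // hourly.15.29
```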

12 Indexes
- A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date
    db.monitorData_[MONTH].ensureIndex(
      { "metadata.antenna": 1,
        "metadata.component": 1,
        "metadata.monitorPoint": 1,
        "metadata.date": 1 }
    );

13 Testing Hardware / Software
- A cluster of two nodes was created
  - CPU: Intel Xeon quad-core X5410
  - RAM: 16 GByte
  - SWAP: 16 GByte
- OS:
  - RHEL 6.0
  - 2.6.32-279.14.1.el6.x86_64
- MongoDB
  - V2.2.1

14 Testing Data
- Real data from Sep-Nov 2012 was used initially, but a tool to generate random data was then implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Number of components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents: 100,450
  - Total number of documents: 1,104,950
  - Average size per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB
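The document counts on this slide follow directly from the generator parameters, as the short check below shows. It is only a sanity check with illustrative variable names. The size estimate assumes 1 GB = 1000 MB and uses the rounded 1.3 MB average, so it only approximates the reported collection size.

```javascript
// Sanity check of the generated test data set (parameters from the slide).
const antennas = 70;
const componentsPerAntenna = 41;
const monitorPointsPerComponent = 35;
const days = 11;
const averageDocumentMB = 1.3; // rounded average from the slide

// One document per monitor point per day (schema variant 3).
const dailyDocuments = antennas * componentsPerAntenna * monitorPointsPerComponent;
const totalDocuments = dailyDocuments * days;

// Rough size estimate; assumption: 1 GB = 1000 MB.
const estimatedCollectionGB = (totalDocuments * averageDocumentMB) / 1000;

console.log(dailyDocuments);                   // 100450
console.log(totalDocuments);                   // 1104950
console.log(estimatedCollectionGB.toFixed(0)); // 1436
```

The estimate of roughly 1,436 GB is within about 5% of the reported 1,375.23 GB, which is plausible given the rounding of the average document size.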

15 Database Statistics

16 Data Sets

17 Data Sets …

18 Data Sets

19 Schema 1: One Sample of Monitoring Data per Document

20 Proposed Schema:

21 More tests
- For more tests, see https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB

22 TODO
- Test performance of aggregations / combined queries
- Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data to improve the performance of queries like:
  - E.g.: search for monitor points whose values are >= 10
- Test performance with a year's worth of data
- Stress tests with a large number of concurrent queries
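The kind of per-range statistics mentioned in the Map/Reduce item could look like the sketch below. It is plain JavaScript operating on one minute's sub-document (seconds mapped to values) and does not use MongoDB's mapReduce API. The function name `summarize` and the treatment of `null` as "no sample" are assumptions.

```javascript
// Hypothetical reduce step: summary statistics for one sub-document that maps
// seconds to values. Non-numeric entries (e.g. null) are treated as "no sample".
function summarize(samples) {
  const values = Object.values(samples)
    .filter((v) => typeof v === "number" && !Number.isNaN(v));
  if (values.length === 0) {
    return { count: 0, min: null, max: null, avg: null };
  }
  let min = values[0];
  let max = values[0];
  let sum = 0;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  return { count: values.length, min, max, avg: sum / values.length };
}

const minute = { 0: 1, 1: 3, 2: null };
const stats = summarize(minute);
console.log(stats);           // { count: 2, min: 1, max: 3, avg: 2 }
console.log(stats.max >= 10); // false
```

With such summaries stored per range, a threshold query like "values >= 10" can skip every range whose `max` is below the threshold without reading the individual samples.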

23 Conclusion @ OSF
- MongoDB is suitable as an alternative for permanent storage of monitoring data.
  - Reported 25,000 clobs/s ingestion rate in the tests.
- The schema and indexes are fundamental to achieving millisecond-level response times.

24 Comments
- What are the requirements going to be like?
  - Only extraction by time interval and offline processing?
  - Or also “data mining” running on the DB?
  - All queries ad-hoc and responsive, or also batch jobs?
  - Repair / flagging of bad data? Later reduction of redundancies?
- Can we hide the MP-to-document mapping from upserts/queries?
  - Currently queries have to patch together results at the 24-hour and monthly breaks.
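The "patching together" at day and month breaks could be hidden behind a small helper that splits a requested time interval into per-day segments, each of which maps to one document. This is a sketch under stated assumptions: the function name `splitByDay` is hypothetical, days are cut at UTC midnight, and date strings are not zero-padded (as in the deck's query examples). Mapping the month to a concrete collection name is left out because the deck only shows the placeholder monitorData_[MONTH].

```javascript
// Hypothetical helper: splits [start, end) into per-day segments so that each
// segment can be answered from a single per-day document.
// Assumptions: day boundaries at UTC midnight; date strings are not zero-padded.
function splitByDay(startIso, endIso) {
  const start = new Date(startIso);
  const end = new Date(endIso);
  if (isNaN(start) || isNaN(end) || end <= start) {
    throw new Error("Invalid interval");
  }
  const segments = [];
  let cursor = start;
  while (cursor < end) {
    // Date.UTC normalizes day overflow, so this also crosses month breaks.
    const nextMidnight = new Date(Date.UTC(
      cursor.getUTCFullYear(), cursor.getUTCMonth(), cursor.getUTCDate() + 1));
    const segmentEnd = nextMidnight < end ? nextMidnight : end;
    segments.push({
      date: `${cursor.getUTCFullYear()}-${cursor.getUTCMonth() + 1}-${cursor.getUTCDate()}`,
      month: cursor.getUTCMonth() + 1, // selects the monthly collection
      from: cursor.toISOString(),
      to: segmentEnd.toISOString()
    });
    cursor = segmentEnd;
  }
  return segments;
}

const segments = splitByDay("2012-09-30T23:00:00Z", "2012-10-01T01:00:00Z");
console.log(segments.map((s) => s.date)); // [ '2012-9-30', '2012-10-1' ]
```

A caller would run one query per segment and concatenate the results in order, so application code never sees the day or month boundaries.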

