ALMA Integrated Computing Team Coordination & Planning Meeting #1, Santiago, 17-19 April 2013: Evaluation of mongoDB for Persistent Storage of Monitoring Data


1 Evaluation of mongoDB for Persistent Storage of Monitoring Data
Tzu-Chiang Shen, Leonel Peña
ALMA Integrated Computing Team Coordination & Planning Meeting #1, Santiago, 17-19 April 2013

2 Monitoring Storage Requirement (ICT-CPM1, 17-19 April 2013)
- Expected data rate with 66 antennas:
  - ~6,000-7,000 clobs/s, ~25-30 GB/day
  - equivalent to ~310 KByte/s or ~2.485 Mbit/s
  - ~130,000-150,000 monitor points
- Monitoring data characteristics:
  - Simple data structure: [timestamp, value]
  - But a huge amount of data
  - Read-only data
  - Data is sorted at the moment of insertion
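The rate figures above can be cross-checked with a few lines of plain JavaScript (runnable in Node.js). This is a sketch assuming decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes, 1 Mbit = 10^6 bits); the helper names are illustrative and not part of any ALMA code.

```javascript
// Cross-check of the slide-2 data-rate figures.
// Assumption: decimal units (1 GB = 1e9 B, 1 KB = 1e3 B, 1 Mbit = 1e6 bit).
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400 s

// Sustained rate in KB/s needed to write `gbPerDay` GB in one day.
function gbPerDayToKBps(gbPerDay) {
  return (gbPerDay * 1e9) / SECONDS_PER_DAY / 1e3;
}

// Daily volume in GB produced by a sustained rate of `kBps` KB/s.
function kBpsToGbPerDay(kBps) {
  return (kBps * 1e3 * SECONDS_PER_DAY) / 1e9;
}

// Rate in Mbit/s for a rate given in KB/s (8 bits per byte).
function kBpsToMbitps(kBps) {
  return (kBps * 1e3 * 8) / 1e6;
}

for (const gb of [25, 30]) {
  const kBps = gbPerDayToKBps(gb);
  console.log(`${gb} GB/day = ${kBps.toFixed(1)} KB/s = ${kBpsToMbitps(kBps).toFixed(2)} Mbit/s`);
}
// The 310 KByte/s quoted on the slide falls inside the 25-30 GB/day range.
console.log(`310 KB/s = ${kBpsToGbPerDay(310).toFixed(1)} GB/day = ${kBpsToMbitps(310).toFixed(2)} Mbit/s`);
```

Under these assumptions, 25-30 GB/day corresponds to roughly 289-347 KB/s (about 2.3-2.8 Mbit/s), and 310 KB/s to about 26.8 GB/day and 2.48 Mbit/s, consistent with the slide's rounded figures.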

3 Very Brief Introduction of MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary variant of JSON.
- Documents within a collection are not required to have the same fields.
- Other features: sharding, replication, aggregation (Map/Reduce).
SQL        mongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index

4 Very Brief Introduction of MongoDB (cont.)
A document in mongoDB:
{
  _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
  user_id: "abc123",
  age: 55,
  status: 'A'
}

5 Schema Alternatives for Monitoring Data
- One monitoring point per document

6 Schema Alternatives (cont.)
- A clob per document

7 Schema Alternatives (cont.)
- A monitor point per day per document
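The per-day layout can be sketched from the field paths used in the queries on slides 9-11 (metadata.date, metadata.antenna, metadata.component, metadata.monitorPoint and hourly.<hour>.<minute>.<second>). The plain-JavaScript sketch below assumes unpadded keys (e.g. "2012-9-15", "15"); the function names and the sample value are hypothetical.

```javascript
// Hypothetical builder for a "one monitor point per day" document.
// Field paths follow the slide queries; unpadded keys are an assumption.
function newDayDocument(antenna, component, monitorPoint, date) {
  return {
    metadata: { date, antenna, component, monitorPoint },
    hourly: {}, // hour -> minute -> second -> value
  };
}

// Store one [timestamp, value] sample in its hour/minute/second slot,
// creating the intermediate sub-documents on first use.
function addSample(doc, hour, minute, second, value) {
  if (!doc.hourly[hour]) doc.hourly[hour] = {};
  if (!doc.hourly[hour][minute]) doc.hourly[hour][minute] = {};
  doc.hourly[hour][minute][second] = value;
}

const dayDoc = newDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
addSample(dayDoc, 15, 29, 18, 1); // the value 1 is a made-up sample
addSample(dayDoc, 15, 29, 19, 1);
console.log(JSON.stringify(dayDoc, null, 2));
```

Because every sample of the day lands inside one document, the number of documents grows with the number of monitor points and days, not with the number of samples.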

8 Analysis
- Advantages:
  - The number of documents within a collection is bounded: there will be ~150,000 documents per day, so the size of the indexes is bounded as well.
  - No data fragmentation problem.
  - Once a specific document is identified (O(log n) via the index), access to a specific range or a single value can be done in O(1).
  - Smaller metadata-to-data ratio.
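The two-step access argued above can be illustrated with plain JavaScript stand-ins: a sorted array of compound keys plays the role of the index (binary search, O(log n)), and the document's hour/minute/second nesting is then followed directly. This is a simplified model, not how MongoDB stores its B-tree indexes or BSON documents; the data, the POINT_B name and the antenna list are hypothetical.

```javascript
// Simplified model of "find the document via the index, then read the value".
// Not MongoDB internals: a sorted array stands in for the B-tree index.

// Lexicographic comparison of compound keys [antenna, component, point, date].
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Binary search over index entries sorted by key: O(log n) comparisons.
function indexLookup(index, key) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = compareKeys(index[mid].key, key);
    if (c === 0) return index[mid].doc;
    if (c < 0) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

// Build a small hypothetical index: one day document per (antenna, point).
const entries = [];
for (const antenna of ["DV10", "DV01", "DA41"]) {
  for (const point of ["GATE_VALVE_STATE", "POINT_B"]) {
    const key = [antenna, "FrontEnd/Cryostat", point, "2012-9-15"];
    const doc = { hourly: { 15: { 29: { 18: antenna + "/" + point } } } };
    entries.push({ key, doc });
  }
}
entries.sort((x, y) => compareKeys(x.key, y.key));

// Step 1: O(log n) index search; step 2: direct hour/minute/second access.
const found = indexLookup(entries, ["DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15"]);
console.log(found && found.hourly[15][29][18]); // DV10/GATE_VALVE_STATE
```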

9 What would a query look like?
- Query to retrieve a value with second-level granularity:
  - E.g., to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE on antenna DV10 at 2012-09-15T15:29:18:
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29.18': 1 }
    );

10 What would a query look like? (cont.)
- Query to retrieve a range of values:
  - E.g., to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE on antenna DV10 during minute 29 (2012-09-15T15:29):
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
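Both queries follow one pattern: the filter pins down the day document, and the projection path selects either one second (hourly.H.M.S) or a whole minute (hourly.H.M). A hypothetical helper that builds the two findOne() arguments from a timestamp could look like the sketch below. It assumes unpadded keys, as in "2012-9-15", and it only builds the objects; choosing the monitorData_[MONTH] collection is left to the caller.

```javascript
// Hypothetical helper: builds the findOne() filter and projection used in the
// slide 9-10 queries. Assumes unpadded keys ("2012-9-15", "hourly.15.29.18").
// Accepts "YYYY-MM-DDTHH:MM:SS" (one value) or "YYYY-MM-DDTHH:MM" (one minute).
function buildDayDocQuery(point, timestamp) {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2}))?$/.exec(timestamp);
  if (!m) throw new Error("unsupported timestamp: " + timestamp);
  const [year, month, day, hour, minute] = m.slice(1, 6).map(Number);

  const filter = {
    "metadata.date": `${year}-${month}-${day}`,
    "metadata.monitorPoint": point.monitorPoint,
    "metadata.antenna": point.antenna,
    "metadata.component": point.component,
  };

  // Seconds present -> a single value; absent -> every second of that minute.
  const path = ["hourly", hour, minute];
  if (m[6] !== undefined) path.push(Number(m[6]));
  const projection = { [path.join(".")]: 1 };

  return [filter, projection];
}

const gateValve = { antenna: "DV10", component: "FrontEnd/Cryostat", monitorPoint: "GATE_VALVE_STATE" };
console.log(buildDayDocQuery(gateValve, "2012-09-15T15:29:18")); // slide 9
console.log(buildDayDocQuery(gateValve, "2012-09-15T15:29"));    // slide 10
```

In the mongo shell, the two returned objects would be passed as the first and second arguments of findOne(), as on the slides.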

11 Indexes
- A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date
    db.monitorData_[MONTH].ensureIndex(
      { "metadata.antenna": 1,
        "metadata.component": 1,
        "metadata.monitorPoint": 1,
        "metadata.date": 1 }
    );
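A compound index is most useful for queries that constrain its leading fields. The typical query constrains all four indexed fields, so the order of the fields inside the filter document does not matter, whereas a query on the date alone cannot use this index to narrow its scan. The sketch below models that prefix rule in a deliberately simplified way (MongoDB's real query planner also weighs ranges, sorts and other indexes); the function name is hypothetical.

```javascript
// Simplified illustration of the compound-index prefix rule (not MongoDB's
// query planner): equality conditions narrow the scan of a compound index
// only on the leading index fields they cover without gaps.
const INDEX_KEYS = [
  "metadata.antenna",
  "metadata.component",
  "metadata.monitorPoint",
  "metadata.date",
];

// Returns the leading index fields that the filter constrains.
function usableIndexPrefix(indexKeys, filter) {
  const prefix = [];
  for (const field of indexKeys) {
    if (!Object.prototype.hasOwnProperty.call(filter, field)) break;
    prefix.push(field);
  }
  return prefix;
}

const typicalQuery = {
  "metadata.date": "2012-9-15",
  "metadata.monitorPoint": "GATE_VALVE_STATE",
  "metadata.antenna": "DV10",
  "metadata.component": "FrontEnd/Cryostat",
};
console.log(usableIndexPrefix(INDEX_KEYS, typicalQuery).length); // 4: all fields
console.log(usableIndexPrefix(INDEX_KEYS, { "metadata.date": "2012-9-15" }).length); // 0
```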

12 Testing Hardware / Software
- A cluster of two nodes was created:
  - CPU: Intel Xeon X5410 (quad-core)
  - RAM: 16 GByte
  - Swap: 16 GByte
- OS:
  - RHEL 6.0
  - Kernel 2.6.32-279.14.1.el6.x86_64
- MongoDB:
  - v2.2.1

13 Testing Data
- Real data from Sep-Nov 2012 was used initially, but a tool to generate random data was then implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents: 100,450
  - Total documents: 1,104,950
  - Average size per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB
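The document counts above follow directly from the per-day schema (one document per monitor point per day). A quick check in JavaScript, using only the parameters listed on this slide:

```javascript
// Recomputes the document counts of the synthetic test data set (slide 13).
const antennas = 70;
const componentsPerAntenna = 41;
const pointsPerComponent = 35;
const days = 11;

// One document per monitor point per day.
const dailyDocuments = antennas * componentsPerAntenna * pointsPerComponent;
const totalDocuments = dailyDocuments * days;

console.log(dailyDocuments); // 100450
console.log(totalDocuments); // 1104950
```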

14 Database Statistics

15 Data Sets

16 Data Sets (cont.)

17 Data Sets

18 Schema 1: One Sample of Monitoring Data per Document

19 Proposed Schema

20 More tests
- For more tests, see https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB

21 Pending
- Test the performance of aggregations/combined queries.
- Use Map/Reduce to precompute statistics (max, min, avg, etc.) over ranges of data, to improve the performance of queries such as:
  - e.g., searching for monitoring points whose values are >= 10
- Test performance with a year's worth of data.
- Stress tests with a large number of concurrent queries.
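The Map/Reduce idea listed above can be sketched in plain JavaScript that imitates the map/reduce/finalize contract: map emits key/value pairs, reduce merges values and must be associative because it may be re-applied to partial results, and finalize derives the average. This is an in-memory simulation, not a call to MongoDB's mapReduce command; the per-hour grouping, the key format and the POINT_X data are hypothetical. The precomputed maximum then lets a query such as "values >= 10" skip every hour whose maximum is below 10.

```javascript
// In-memory imitation of map -> reduce -> finalize over day documents.
// Hypothetical: statistics are grouped per monitor point, date and hour.
function mapDayDocument(doc, emit) {
  const { monitorPoint, date } = doc.metadata;
  for (const [hour, minutes] of Object.entries(doc.hourly)) {
    for (const seconds of Object.values(minutes)) {
      for (const value of Object.values(seconds)) {
        emit(`${monitorPoint}@${date}T${hour}`, { min: value, max: value, sum: value, count: 1 });
      }
    }
  }
}

// Associative merge of partial statistics (safe to apply repeatedly).
function reduceStats(key, values) {
  return values.reduce((a, b) => ({
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    sum: a.sum + b.sum,
    count: a.count + b.count,
  }));
}

// Derive the average once all partials for a key are merged.
function finalizeStats(key, s) {
  return { min: s.min, max: s.max, avg: s.sum / s.count };
}

function runMapReduce(docs) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapDayDocument(doc, emit));
  const result = {};
  for (const [key, values] of groups) result[key] = finalizeStats(key, reduceStats(key, values));
  return result;
}

const docs = [{
  metadata: { monitorPoint: "POINT_X", date: "2012-9-15" },
  hourly: { 15: { 29: { 18: 4, 19: 12 } }, 16: { 0: { 0: 3 } } },
}];
const stats = runMapReduce(docs);

// "values >= 10": only hours whose precomputed max reaches 10 need to be read.
const candidates = Object.keys(stats).filter((key) => stats[key].max >= 10);
console.log(stats);
console.log(candidates); // [ 'POINT_X@2012-9-15T15' ]
```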

22 Conclusion
- MongoDB is suitable as an alternative for permanent storage of monitoring data.
- The choice of schema and indexes is fundamental to achieving millisecond-level response times.

