
1 02 August 2004 OraMonPlans 08/04

2 Topics
Enhancements
– OraMon DB redundancy layer
– Compare and fix OraMon configurations
– Expiry of historical data
– Saving disk space
OraMonArch
Bugs
Others
– OraMon OO development with Together
– OraMon changes for Maciej’s alarm interfacing system?

4 OraMon DB redundancy layer
Requirements:
1. OraMon should retry connecting after losing the DB connection.
   Currently (as of OraMon 0.0.3), upon DB connection failure, OraMon issues a [FATAL] log (+ failure kind) and stops.
2. OraMon should support a ‘Do(Not)InsertSamples’ command.
   Currently, OraMon inserts or does not insert samples according to the value of the environment variable MR_READONLY.
3. OraMon should have a ‘HeartBeat’ command.
   Currently, one may check whether an OraMon instance is alive by issuing an MR API query to it (via lemon-utils/lemon-cli.pl).
Approaches to satisfy ‘retry connect’ and ‘Do(Not)InsertSamples’:
– ‘External’: (set some variables and) start OraMon
  Pros: Simple to implement, no internal changes to OraMon
  Con: A few minutes of downtime
– ‘Internal’: Change OraMon to satisfy the requirements by adding specific code
  Pros and cons are the opposite of those of ‘External’

5 OraMon DB redundancy layer
‘External’ solutions:
1. Retry connect after losing DB connection
   A simple (restart-oramon like) service that issues:
   /etc/rc.d/init.d/OraMon start
   after OraMon stops, if the ‘failure kind’ belongs to a TBD failure set.
2. ‘InsertSamples’ command to OraMon
   Restart OraMon after un/setting MR_READONLY:
   Do insert: unset MR_READONLY ; /etc/rc.d/init.d/OraMon restart
   Do not insert: set MR_READONLY=yes ; /etc/rc.d/init.d/OraMon restart
3. OraMon ‘HeartBeat’
   Check for a sane response to a lemon-cli.pl query.
   Should not get: Failed to MRs_getSamples() : #-1 : Connection refused
   Example:
   perl lemon-utils/lemon-cli.pl --metrics="10002" --nodes="lcgmon002d" --remote-server="http://ccs002d:12510"
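The two ‘external’ checks on this slide (restart only for certain failure kinds, and a heartbeat based on the lemon-cli.pl response) could be sketched roughly as follows. This is a minimal illustration, not OraMon code: the log line layout "[FATAL] <failure kind>" and the contents of RESTARTABLE_FAILURES are assumptions, since the slides leave the failure set as TBD.

```python
import re

# Hypothetical failure kinds that justify a restart; the real set is TBD.
RESTARTABLE_FAILURES = {"connection lost", "connection refused"}

# Assumed log layout: "[FATAL] <failure kind>". The real format may differ.
FATAL_LINE = re.compile(r"\[FATAL\]\s*(?P<kind>.+)")


def failure_kind(log_line):
    """Return the lower-cased failure kind of a [FATAL] line, or None."""
    match = FATAL_LINE.search(log_line)
    return match.group("kind").strip().lower() if match else None


def should_restart(log_line):
    """True if the last log line names a failure kind in the restartable set."""
    return failure_kind(log_line) in RESTARTABLE_FAILURES


def heartbeat_ok(cli_output):
    """Treat a lemon-cli.pl response as sane if it is non-empty and does not
    contain the known failure message quoted on the slide."""
    return bool(cli_output.strip()) and "Failed to MRs_getSamples()" not in cli_output
```

A watchdog service would call should_restart() on the last log line after OraMon stops and, if it returns True, issue /etc/rc.d/init.d/OraMon start.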

6 OraMon DB redundancy layer
‘Internal’ solutions:
1. Retry connect after losing DB connection
   Change OraMon code: when an SQL command fails because of a failure in a TBD failure set, do not fail, but rather try to connect again first (a few times, sleeping between tries).
2. ‘InsertSamples’ command to OraMon
   Reuse and extend the existing proprietary ‘insert samples’ protocol:
   – Define a ‘pseudo’ metricId (set) that OraMon interprets as commands rather than as metrics to be inserted.
   – Commands arrive from a specific port or from the samples port.
   – Commands may be added to a ‘metrics configuration’-like configuration.
3. OraMon ‘HeartBeat’: the same as in the ‘External’ solution
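The ‘internal’ retry idea (on a failed SQL command, reconnect a few times with a sleep between tries before giving up) might look roughly like the sketch below. It is written in Python purely for illustration. TransientDbError, execute_with_retry and the connect/sql_command callables are hypothetical names, and which failures count as retryable is still TBD in the slides.

```python
import time


class TransientDbError(Exception):
    """Hypothetical stand-in for a failure from the (TBD) retryable set."""


def execute_with_retry(sql_command, conn, connect,
                       max_tries=3, delay_s=5.0, sleep=time.sleep):
    """Run sql_command(conn). On a transient failure, reconnect and rerun,
    up to max_tries times, sleeping delay_s seconds before each try."""
    try:
        return sql_command(conn)
    except TransientDbError as err:
        last_error = err
    for _ in range(max_tries):
        sleep(delay_s)
        try:
            conn = connect()          # reconnecting may itself fail
            return sql_command(conn)
        except TransientDbError as err:
            last_error = err
    # All tries are used up: surface the last failure to the caller.
    raise last_error
```

Passing sleep as a parameter keeps the sketch testable without real delays. Failures outside the retryable set are not caught and propagate immediately, which matches the current stop-on-failure behaviour.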

7 Changing metrics configuration
Related OraMon documentation: Changing metrics configuration
German’s email 19/7, [Lemon] changes in metric data fields:
- Changes (adding/removing/changing data fields) to latestOnly metrics: ok
  David:
  - ok.
  - When applying a new configuration, all (TBD: changed) latest tables and views will be automatically dropped.
- Changes to latestOnly metrics which have a historical table defined but no longer used (reconfigured from 'latestOnly=false' to true): drop the historical table altogether.
  David:
  - ok.
  - Also, drop tables of removed metrics?
  - (Also, is archiving of tables to be dropped required?)

8 Changing metrics configuration (cont.)
- Changes to 'historical' metrics (not latestOnly):
  - Added data fields: OK
    David: TBD: ok iff adding fields does not complicate restoring old data that lack the new fields.
  - Removed and changed data fields: drop historical values in DB, or refuse (global OraMon configuration Boolean parameter).
    David: I doubt that dropping historical data will solve potential problems while restoring older data. Assuming this is correct, ‘refuse’ will always be applied.
- Changes where historical values should be preserved: define a new metric ID. I don't think any conversion magic is appropriate, and to be consistent, it would also have to be applied to all historical data already archived into CASTOR, which is far from trivial.
  David: As a rule of thumb, I suggest avoiding changes to archived data.
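The rules discussed in the email thread can be summarised as a small decision function. The sketch below only restates the policy as proposed (latestOnly changes are applied, added fields on historical metrics are applied, removed or changed fields are either refused or cause historical values to be dropped, depending on a global Boolean). The function name, argument names and return strings are hypothetical, and the ‘added fields’ case is still marked TBD in the slides.

```python
def decide_change(latest_only, added=(), removed=(), changed=(),
                  drop_on_incompatible=False):
    """Return what to do with a metric configuration change.

    latest_only          -- True if the metric keeps only the latest sample
    added/removed/changed -- names of data fields affected by the change
    drop_on_incompatible -- the global Boolean: drop history instead of refusing
    """
    if not (added or removed or changed):
        return "no_change"
    if latest_only:
        # Latest tables and views are simply dropped and recreated.
        return "apply"
    if removed or changed:
        # Incompatible change to a historical metric.
        return "drop_history" if drop_on_incompatible else "refuse"
    # Only added fields on a historical metric (OK, subject to the TBD above).
    return "apply"
```

If David’s doubt about dropping historical data holds, drop_on_incompatible would always stay False, and the only way to make an incompatible change would be to define a new metric ID.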

9 Changing metrics configuration: David’s suggestions
- Observation: the OraMon level of complexity of adding a field is similar to that of applying other ‘compatible’ changes: remove field, change length.
- In order to avoid clashes between existing OraMon data schemas and previously archived data, I suggest that:
  - Each change to a metricClass will get new metricIds
  - Previous metricIds will be marked ‘obsolete’ by a new metadata field
  - Previous metricIds may have a ‘replaced by metricId’ metadata field
- In order to preserve older data and allow data schema changes, I suggest that when a ‘compatible’ change is applied to a metricClass, its existing historical table will be renamed to the new name, and automatic fixes will be applied by OraMon.
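The ‘obsolete’ and ‘replaced by metricId’ metadata fields suggested above could be modelled as in the following sketch. It is an illustration only: MetricMeta, supersede and current_id are hypothetical names, the registry is a plain dictionary rather than OraMon’s real metadata store, and the new id 10003 in the usage note is made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MetricMeta:
    metric_id: int
    fields: Tuple[str, ...]
    obsolete: bool = False              # the suggested 'obsolete' field
    replaced_by: Optional[int] = None   # the suggested 'replaced by metricId'


def supersede(registry: Dict[int, MetricMeta], old_id, new_id, new_fields):
    """Register new_id as the successor of old_id and mark old_id obsolete."""
    if new_id in registry:
        raise ValueError("metricId %d already exists" % new_id)
    old = registry[old_id]
    old.obsolete = True
    old.replaced_by = new_id
    registry[new_id] = MetricMeta(new_id, tuple(new_fields))
    return registry[new_id]


def current_id(registry: Dict[int, MetricMeta], metric_id):
    """Follow 'replaced by' links until a metricId that is not replaced."""
    seen = set()
    while registry[metric_id].replaced_by is not None:
        if metric_id in seen:
            raise ValueError("cycle in 'replaced by' links")
        seen.add(metric_id)
        metric_id = registry[metric_id].replaced_by
    return metric_id
```

For example, after supersede(registry, 10002, 10003, ["load", "users"]), archived data for 10002 keeps its old schema, while current_id(registry, 10002) points readers at 10003.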

10 Expiry of historical data
Item 4162, expiry of historical data: To be discussed at CERN (submitted 2004-Jul-19 12:14 by jveldik)

11 Saving disk space
Compress partitions
– How to: have the OraMon partitions thread compress partitions that are at least one day old
– TBD: May cause unexpected complications
– Saving space is important, but not urgent
Make numbers (and strings) smaller
– May be applied after applying all ‘Changing metrics configuration’ items
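The selection step of the ‘compress partitions that are at least one day old’ idea might be sketched as below. The partition description, a (name, end_time, compressed) tuple, is an assumption made for the example, as is measuring age from the end of the partition’s time range. The actual compression statement is not shown here.

```python
from datetime import datetime, timedelta


def partitions_to_compress(partitions, now, min_age=timedelta(days=1)):
    """Return names of uncompressed partitions whose time range ended
    at least min_age before now.

    partitions -- iterable of (name, end_time, compressed) tuples (assumed shape)
    """
    return [name
            for name, end_time, compressed in partitions
            if not compressed and now - end_time >= min_age]
```

The partitions thread would call this periodically and compress only the returned partitions, so the partition still receiving inserts is never touched.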

12 OraMonArch
OraMonArch documentation
– If ‘archive and not drop’ is required, the implementation should be enhanced, since the current implementation drops and returns data
– Two OraMonArch instances, continuous and non-continuous: non-continuous requests cannot be queued
– OraMonArch transaction error on stop/crash after a DDL command and before updating the relevant checkpoint

13 Bug reports
Item ID | Summary | Submitted on | Submitted by
4000 | OraMon packaging issues, broken restart-oramon. Minor: understand a minor rpm mistake: restart-oramon is installed by the OraMon non-config rpm | 2004-Jul-05 07:33 | gcancio
4001 | LSB compliance for OraMon. Minor | 2004-Jul-05 07:40 | gcancio
4002 | OraMon should continue running with old metadata if incompatibility is found. Medium: see: Compare and fix OraMon configurations | 2004-Jul-05 07:58 | gcancio
4004 | Floating point exception error using OraMonAdmin. Small: fix a bug | 2004-Jul-05 08:35 | gcancio
4015 | Define/document policy for valid / invalid configuration changes. Small: OraMon should also check for valid characters and keywords for e.g. metric field descriptions. This should be part of the documentation as well. Add: OraMon and/or the script that creates the metrics configuration may be enhanced to check against using Oracle reserved words as identifiers (http://www-rohan.sdsu.edu/doc/oracle/server803/A54661_01/ares.htm). Make sure that OraMon will not fail with fieldNames that consist of more than one word + strange chars (see email from 19/7) | 2004-Jul-05 12:43 | gcancio
4074 | OraMon - Validation Failures. Minor | 2004-Jul-08 10:22 | waldron
4097 | Add OraMon possible errors to its documentation. Small | 2004-Jul-12 12:28 | dfront
4162 | Expiry of historical data. To be discussed at CERN | 2004-Jul-19 12:14 | jveldik
4180 | OraMon should support number sizes and a boolean type. Small. Add: learn if OraMon and agent can use the same code for metric validation | 2004-Jul-21 05:58 | dfront
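The check proposed under item 4015 (reject field names that are Oracle reserved words, or that contain spaces or unusual characters) could be prototyped as in the sketch below. RESERVED_SUBSET is only a small illustrative subset of Oracle’s reserved words, not the full list from the linked page. The identifier pattern assumes unquoted identifiers of at most 30 characters, and the function name is hypothetical.

```python
import re

# Illustrative subset only; the full list is in the Oracle documentation.
RESERVED_SUBSET = frozenset({
    "SELECT", "FROM", "WHERE", "TABLE", "INDEX", "DATE", "NUMBER",
    "ORDER", "GROUP", "LEVEL", "SIZE", "COMMENT", "USER",
})

# Assumed rule for unquoted identifiers: a letter first, then letters,
# digits, '_', '$' or '#', up to 30 characters in total.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")


def field_name_problems(name):
    """Return a list of reasons why name is unsuitable as a column name."""
    problems = []
    if not IDENTIFIER.fullmatch(name):
        problems.append("not a plain identifier")
    if name.upper() in RESERVED_SUBSET:
        problems.append("reserved word")
    return problems
```

Running the same function in both OraMon and the script that creates the metrics configuration would also address the note under item 4180 about sharing validation code.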

14 Bugs found while installing OraMon 0.0.3
1) OraMon views show a time that is one hour later than the real time
2) OraMonArch/Cont service script (/etc/rc.d/init.d/OraMonArchContCtl): returns only after completing the work. It should return immediately. May cause the computer to hang at reboot.
3) Probable problem: metric validation errors at lcgmon002d differ from those at ccs002d
4) To be addressed to German: recognizing a metric configuration change according to date causes rpm update to fail by mistake. Suggested fix: a hard-coded date attribute.
5) To be checked: I suspect that logrotate does not work at ccs002d for /var/log/OraMon.log, because it grew to 66M as of 27/7
6) OraMonArch transaction error on stop/crash after a DDL command and before updating the relevant checkpoint (see above)

