UDM is watching you, UDMs

Oct 11, 2013 / By Andrey Goryunov

Over time, monitoring several thousand targets of different versions, running on different operating systems, accumulates additional checks and user-defined metrics (UDMs) for specific requirements. With a dozen super administrator accounts in play, an Oracle Enterprise Manager environment needs dedicated monitoring so that checks customized and configured for certain targets are not lost.

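How could this monitoring be better organized? One approach I find useful is to create user-defined metrics that monitor other user-defined metrics. Knowing the list of UDMs that should be configured for each target, I created UDMs that gather information about those metrics and send alerts if there are any discrepancies. The query below lists the expected target and UDM pairs, outer-joins them to mgmt$metric_current, and counts the pairs that are missing or whose last collection is older than the cutoff. A non-zero count indicates a discrepancy.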

select count(*) from (
/* Expected (target, UDM) pairs; note is free text carried into the result. */
with sql_metrics as (
            select 'OEMP' target_name, 'prod_targets_no_metrics' metric_name, '' note from dual
  union all select 'OEMP', 'target_removed', '' from dual
  union all select 'OEMP', 'test_targets_no_metrics', '' from dual
  union all select 'OEMP', 'UDM_count_autofiles', '' from dual
  union all select 'OEMP', 'UDM_asm_dg', '' from dual
  union all select 'PROD.WORLD', 'UDM_apply_status', '' from dual
  union all select 'TEST.WORLD', 'UDM_apply_status', '' from dual
  union all select 'STDBY.WORLD', 'UDM_standby_lag', '' from dual
  union all select 'PROD.WORLD', 'changes_lag', '' from dual
  union all select 'REP.WORLD', 'changes_lag', '' from dual
  union all select 'TEST.WORLD', 'changes_lag', '' from dual
  union all select 'TNP', 'standby_lag', '' from dual
  union all select 'E.WORLD', 'UDM_alertlog', '' from dual
),
/* UDM values currently held in the repository, per target and metric column. */
sql_current as (
  select c.target_type, c.target_name, c.metric_label, c.column_label,
    max(c.collection_timestamp) last_date, count(*) cnt
  from mgmt$metric_current c
  where c.metric_label like 'User-Defined%Metric%'
  group by c.target_type, c.target_name, c.metric_label, c.column_label
)
select target_name, metric_name,
  decode(nvl(cnt, 0), 0, 'Error: Metric does not exist',
    'Error: Metric exists but coll time older than 3 days') msg,
  note
from (
  /* The outer join keeps expected UDMs that have no row in the repository. */
  select m.target_name, m.metric_name, c.cnt, c.last_date, m.note
  from sql_metrics m, sql_current c
  where m.target_name = c.target_name(+)
  and m.metric_name = c.column_label(+)
)
/* The cutoff is 3.4 days, a little more than the 3 days quoted in the message. */
where last_date < sysdate - 3.4
or last_date is null
)

But what about the UDM check itself? What if it is removed as well? For that purpose, I created a script that is scheduled on the OMS host and periodically runs emcli collect_metric. If the collection does not complete successfully, the on-call DBA is alerted. The script is invoked as follows:

/home/oracle/working/ag/report_udm_not_there.sh OEMP UDM_existence >/dev/null 2>&1

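report_udm_not_there.sh:
#!/bin/bash
# Usage: report_udm_not_there.sh <target_name> <collection_name>
# Forces a collection of the given UDM and mails the log if it does not succeed.
CMD_PATH=/home/oracle/working/ag
export JAVA_HOME=/e00/oracle/middleware/oms11g/jdk
export PATH=$JAVA_HOME/bin:$PATH
DB=$1
UDM=$2
LOG="$CMD_PATH/$(basename "$0" .sh)_${DB}_${UDM}.log"
# Send stdout and stderr to the same log; "2>&1" keeps the two streams from overwriting each other.
"$CMD_PATH/emcli" collect_metric -target_type=oracle_database -target_name="$DB" -collection="$UDM" >"$LOG" 2>&1
# Success is detected by the confirmation message in the emcli output.
if ! grep -q "was collected at repository successfully" "$LOG"
then
  mailx -s "issues with $UDM at $DB" admin@site.com < "$LOG"
fi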

Another way to detect UDMs being disassociated from targets is to use Oracle's Flashback Query feature, which lets you compare the current contents of mgmt$metric_current with its contents at a past point in time (say, 15 minutes ago). However, this approach can be too general.
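
As a rough illustration of the flashback idea, here is a minimal sketch in the same bash style as the script above. It only prints a comparison query and does not connect to anything. The function name build_flashback_sql is hypothetical, and the query assumes that flashback query is usable against mgmt$metric_current in your repository and that undo retention covers the chosen interval, neither of which I have verified for every release.

```shell
#!/bin/bash
# Sketch only: prints a flashback comparison query; it does not run it.
# build_flashback_sql is a hypothetical helper name, not part of OEM or emcli.

build_flashback_sql() {
  local minutes="$1"
  # Accept digits only, because the value is embedded in the SQL text.
  case "$minutes" in
    ''|*[!0-9]*) echo "usage: build_flashback_sql <minutes>" >&2; return 1 ;;
  esac
  # "\$" keeps the shell from expanding the "$" in the view name.
  cat <<EOF
select target_name, column_label
from mgmt\$metric_current as of timestamp (systimestamp - numtodsinterval(${minutes}, 'minute'))
where metric_label like 'User-Defined%Metric%'
minus
select target_name, column_label
from mgmt\$metric_current
where metric_label like 'User-Defined%Metric%';
EOF
}

# Print the query for the interval given as the first argument (default: 15 minutes).
build_flashback_sql "${1:-15}"
```

The output could be piped to sqlplus with a repository connect string of your own. Any rows returned are UDM columns that were present at the earlier point in time but are absent now, whatever the reason, so the result may well need further filtering.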

Happy OEM monitoring!
