Verify that Oracle RDS Alert Log Monitoring Works | Official Pythian®® Blog

Written by Babette Turner | May 12, 2022 4:00:00 AM

One of the standard Oracle DBA tasks is to monitor the alert log for errors. Whether you’re using the integrated AWS Cloudwatch or customized monitoring scripts, you’ll want to test that the monitoring function is working properly.

The Need for Reliable Alert Log Testing

There can be many noise pages that you want to exclude from alerting. For instance, if your network is sometimes unstable, you’ll frequently see TNS errors in your alert log–perhaps something like “ORA-12196: TNS:received an error from TNS”.

Managing Include and Exclude Lists

You’re aware of these errors and don’t want to be notified each time they occur. In this case, you have an “EXCLUDE list” which lists all errors you wish to ignore or exclude from paging. It’s less likely, but not impossible, that instead you have an “INCLUDE list” which includes only those alert log messages on which you wish to be notified.

Once you have created your EXCLUDE or INCLUDE list, how do you verify that your monitoring is working correctly and that you’re indeed excluding or including the appropriate notifications? You don’t really want to cause errors on your system to test the monitoring. Instead, you write fake error messages to your alert log, and then confirm your monitoring.

The Limitation of RDS Environments

Oracle provides a mechanism to write directly to the alert log using the DBMS_SYSTEM.KSDWRT procedure. However, this is not available in a relational database service (RDS) environment. So, we need to find another way to write to the alert log to test monitoring.

A Workaround for Writing to the Alert Log

The following provides an example of a simple workaround to write to the alert log. It requires only three database objects: a table, a procedure, and a scheduler job.

Step 1: Creating a Control Mechanism

To avoid filling the alert log with useless information, we put a simple mechanism in place that allows us to turn on/off writing errors to the alert log. We use a simple control table for this:

create table control_job_failing ( should_job_fail char(1)); insert into control_job_failing values ( 'N'); commit;

Step 2: Strategies for Forcing Errors

Next, you’ll cause Oracle to raise certain harmless error conditions and temporarily include these errors in your EXCLUDE or INCLUDE list. There are three ways to force Oracle to raise an error:

Oracle PL/SQL provides the built-in procedure RAISE_APPLICATION_ERROR. This allows you to raise an exception in the error range of -20000 to -20999.
The Oracle built-in RAISE allows you to raise one of the pre-defined exceptions.
Manually execute an invalid command, such as “divide by zero”.

Step 3: Developing the Test Procedure

This procedure will read the control table and then either terminate successfully or terminate with an error based on the control table value:

create or replace procedure proc_control_job_failing  as  v_should_job_fail char(1);  divisor number;  begin  select should_job_fail into v_should_job_fail  from control_job_failing;  if v_should_job_fail = 'Y' then  -- using one of the three ways to force an error  raise_application_error ( -20000, 'Forcing failure for testing');  -- raise INVALID_CURSOR ;  -- select 5/0 into divisor from dual;  end if;  end;  /

Step 4: Automating with DBMS_SCHEDULER

Lastly, create a scheduler job that will run the procedure on a regular basis:

begin  dbms_scheduler.create_job (  job_name => 'JOB_FAILING_OR_NOT',  job_type => 'STORED_PROCEDURE',  job_action => 'PROC_CONTROL_JOB_FAILING',  start_date => systimestamp + 5/1440,  repeat_interval => 'FREQ=HOURLY;BYMINUTE=0,5,10,15,20,25,30,35,40,45,50,55',  enabled => TRUE);  END;  /

Verification and Monitoring Analysis

Confirming the Scheduler Setup

Verify that the scheduler job has been created successfully:

 set linesize 150 trimspool on  col job_name format a30  col next_run_date format a30  col start_date format a30  col repeat_interval format a20  select job_name, enabled, next_run_date, start_date, repeat_interval  from dba_scheduler_jobs where job_name = 'JOB_FAILING_OR_NOT';

Running the Simulated Failure

Test the scheduler job by running it with the control setting to do nothing and then with the value set to cause the error to be written:

exec dbms_scheduler.run_job( 'AVAIL.JOB_FAILING_OR_NOT'); update control_job_failing set should_job_fail = 'Y' ; commit; exec dbms_scheduler.run_job( 'AVAIL.JOB_FAILING_OR_NOT');

View the scheduler run details to verify that it has been run by the scheduler:

set linesize 200 trimspool on  col log_date format a20  col status format a15  col additional_info word_wrapped format a80  select log_date, status, error#, additional_info  from dba_SCHEDULER_JOB_RUN_DETAILS  where job_name = 'JOB_FAILING_OR_NOT'  order by log_date;

Validating Alert Log Entries

You’ll want to confirm that the errors actually have been written to the alert log. You don’t necessarily want to review all records in the alert log to locate and view that your test errors were written. Use the following SQL to view the alert log messages from just before and just after the most recent alert log error:

 with find_time  as ( select max(originating_timestamp) max_timestamp  from alertlog where instr(message_text, 'ORA-') > 0 )  select ORIGINATING_TIMESTAMP, message_text  from alertlog, find_time  where ( ORIGINATING_TIMESTAMP > ( find_time.max_timestamp - 5/1440 )  and ORIGINATING_TIMESTAMP < ( find_time.max_timestamp + 5/1440 )  )  order by ORIGINATING_TIMESTAMP;

Final Thoughts on Monitoring Integrity

So, now we have shown how to force an error to the alert log and view alert log messages created by our testing. Hopefully by now, your regular alert log monitoring has notified you of the errors you have generated and caused to be written. If not, take this time to review your alert log monitoring and EXCLUDE list to verify if you should have been notified. I hope you’ve found this post helpful. Feel free to share your questions in the comments and subscribe so you don’t miss the next post.

Oracle Database Consulting Services

Ready to optimize your Oracle Database for the future?

View full post