One of the standard Oracle DBA tasks is to monitor the alert log for errors. Whether you’re using the integrated AWS Cloudwatch or customized monitoring scripts, you’ll want to test that the monitoring function is working properly.
There can be many noise pages that you want to exclude from alerting. For instance, if your network is sometimes unstable, you’ll frequently see TNS errors in your alert log–perhaps something like “ORA-12196: TNS:received an error from TNS”.
You’re aware of these errors and don’t want to be notified each time they occur. In this case, you have an “EXCLUDE list” which lists all errors you wish to ignore or exclude from paging. It’s less likely, but not impossible, that instead you have an “INCLUDE list” which includes only those alert log messages on which you wish to be notified.
Once you have created your EXCLUDE or INCLUDE list, how do you verify that your monitoring is working correctly and that you’re indeed excluding or including the appropriate notifications? You don’t really want to cause errors on your system to test the monitoring. Instead, you write fake error messages to your alert log, and then confirm your monitoring.
Oracle provides a mechanism to write directly to the alert log using the DBMS_SYSTEM.KSDWRT procedure. However, this is not available in a relational database service (RDS) environment. So, we need to find another way to write to the alert log to test monitoring.
The following provides an example of a simple workaround to write to the alert log. It requires only three database objects: a table, a procedure, and a scheduler job.
To avoid filling the alert log with useless information, we put a simple mechanism in place that allows us to turn on/off writing errors to the alert log. We use a simple control table for this:
create table control_job_failing ( should_job_fail char(1)); insert into control_job_failing values ( 'N'); commit;
Next, you’ll cause Oracle to raise certain harmless error conditions and temporarily include these errors in your EXCLUDE or INCLUDE list. There are three ways to force Oracle to raise an error:
RAISE_APPLICATION_ERROR. This allows you to raise an exception in the error range of -20000 to -20999.RAISE allows you to raise one of the pre-defined exceptions.This procedure will read the control table and then either terminate successfully or terminate with an error based on the control table value:
create or replace procedure proc_control_job_failing as v_should_job_fail char(1); divisor number; begin select should_job_fail into v_should_job_fail from control_job_failing; if v_should_job_fail = 'Y' then -- using one of the three ways to force an error raise_application_error ( -20000, 'Forcing failure for testing'); -- raise INVALID_CURSOR ; -- select 5/0 into divisor from dual; end if; end; /
Lastly, create a scheduler job that will run the procedure on a regular basis:
begin dbms_scheduler.create_job ( job_name => 'JOB_FAILING_OR_NOT', job_type => 'STORED_PROCEDURE', job_action => 'PROC_CONTROL_JOB_FAILING', start_date => systimestamp + 5/1440, repeat_interval => 'FREQ=HOURLY;BYMINUTE=0,5,10,15,20,25,30,35,40,45,50,55', enabled => TRUE); END; /
Verify that the scheduler job has been created successfully:
set linesize 150 trimspool on col job_name format a30 col next_run_date format a30 col start_date format a30 col repeat_interval format a20 select job_name, enabled, next_run_date, start_date, repeat_interval from dba_scheduler_jobs where job_name = 'JOB_FAILING_OR_NOT';
Test the scheduler job by running it with the control setting to do nothing and then with the value set to cause the error to be written:
exec dbms_scheduler.run_job( 'AVAIL.JOB_FAILING_OR_NOT'); update control_job_failing set should_job_fail = 'Y' ; commit; exec dbms_scheduler.run_job( 'AVAIL.JOB_FAILING_OR_NOT');
View the scheduler run details to verify that it has been run by the scheduler:
set linesize 200 trimspool on col log_date format a20 col status format a15 col additional_info word_wrapped format a80 select log_date, status, error#, additional_info from dba_SCHEDULER_JOB_RUN_DETAILS where job_name = 'JOB_FAILING_OR_NOT' order by log_date;
You’ll want to confirm that the errors actually have been written to the alert log. You don’t necessarily want to review all records in the alert log to locate and view that your test errors were written. Use the following SQL to view the alert log messages from just before and just after the most recent alert log error:
with find_time as ( select max(originating_timestamp) max_timestamp from alertlog where instr(message_text, 'ORA-') > 0 ) select ORIGINATING_TIMESTAMP, message_text from alertlog, find_time where ( ORIGINATING_TIMESTAMP > ( find_time.max_timestamp - 5/1440 ) and ORIGINATING_TIMESTAMP < ( find_time.max_timestamp + 5/1440 ) ) order by ORIGINATING_TIMESTAMP;
So, now we have shown how to force an error to the alert log and view alert log messages created by our testing. Hopefully by now, your regular alert log monitoring has notified you of the errors you have generated and caused to be written. If not, take this time to review your alert log monitoring and EXCLUDE list to verify if you should have been notified. I hope you’ve found this post helpful. Feel free to share your questions in the comments and subscribe so you don’t miss the next post.
Ready to optimize your Oracle Database for the future?