Blog | Pythian

Killing an Oracle Job Dead

Written by Lukas Vysusil | Mar 30, 2009 4:00:00 AM

If you ever thought it was easy to kill an Oracle job, let me show you something that gave me a headache today.

We’re testing an application that uses custom code to de-queue and propagate AQ messages between databases. There are nine processes in total, and all of them are submitted as jobs using DBMS_JOB. They are supposed to run all the time, waiting for messages to de-queue. Once in a while, I need to kill them in order to recompile the objects they are executing.

The steps are simple enough: remove/break all jobs in question, kill the jobs that are running, recompile the packages with the new version, and submit/un-break the jobs again. This time, however, things were not going as smoothly as I expected.

1. The Standard Approach: Removal and Session Termination

First, I generate and execute the commands to remove all currently running jobs using DBMS_IJOB.

Executing DBMS_IJOB.REMOVE

SQL

SYS@ORCL>select 'exec dbms_ijob.remove('||job||');' from dba_jobs_running;  'EXECDBMS_IJOB.REMOVE('||JOB||');' ---------------------------------------------------------------- exec dbms_ijob.remove(261); ... exec dbms_ijob.remove(269);  9 rows selected.  SYS@ORCL>exec dbms_ijob.remove(261); PL/SQL procedure successfully completed. ... 

Killing the Database Sessions

As a next step, I’ll just kill the sessions of the running jobs:

SQL

SYS@ORCL>select 'alter system kill session '''||sid||','||serial#||''';' from v$session where sid in (select sid from dba_jobs_running);  'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';' -------------------------------------------------------------------------------- alter system kill session '120,73'; ... alter system kill session '159,68';  9 rows selected.  SYS@ORCL>alter system kill session '120,73'; System altered. ... 

And the running jobs are gone:

SQL

SYS@ORCL>select count(*) from dba_jobs_running;    COUNT(*) ----------          0 

2. The Mystery of the Respawning Jobs

I’m thinking this is fine, but after a few seconds, the jobs are back!

Identifying the Ghost Processes

I found this behavior quite intriguing, so I searched the net and found a really nice blog explaining in great detail how DBMS_JOB behaves. That gave a hint that the OS process may actually keep the database processes alive.

So I think, that sounds reasonable—let’s kill the OS processes instead.

SQL

SYS@ORCL>!ps -ef | grep ora_j oracle   16234     1  0 12:58 ?        00:00:00 ora_j000_ORCL ... oracle   16250     1  9 12:58 ?        00:00:07 ora_j008_ORCL  SYS@ORCL>!kill -9 16234 16236 16238 16240 16242 16244 16246 16248 16240 16252 Done! 

Hmm, but not really—after a while, the jobs are back from the dead under different Oracle PIDs and different OS PIDs.

3. Desperate Measures: Tuning JOB_QUEUE_PROCESSES

How do I get rid of them? Of course! Why hadn’t I thought of this before? I’ll set job_queue_processes to 0. Kill the processes and they will not respawn.

Forcing the Queue to Zero

SQL

SYS@ORCL>alter system set job_queue_processes=0; System altered.  -- Killed all processes again.  SYS@ORCL>select count(*) from dba_jobs_running;   COUNT(*) ----------          0  SYS@ORCL>alter system set job_queue_processes=10; System altered.  SYS@ORCL>select count(*) from dba_jobs_running;   COUNT(*) ----------          9 

Oops, they’re back again! Notice that the jobs were gone from the DBA_JOBS table, and the running jobs were referencing non-existent job IDs. How can they be respawning?

4. The Solution: Don't Forget to Commit

I decided to expand my knowledge about jobs and read the whole of James Koopmann’s blog. And then I saw the light! User JB mentioned in the comments that one has to run COMMIT after manipulating jobs.

So after all of the above, I just typed...

SQL

SYS@ORCL> commit; Commit complete. 

...then killed the jobs again, and bingo! Those pesky jobs were finally gone.

Lessons Learned

I’ve done the same procedure in the past using DBMS_JOB and never faced the issue, but I realize that I always switched to the SYS user in order to kill sessions, so it was autocommitted upon reconnect.

It is important to mention—the respawned jobs were not doing what they were supposed to do; they were not de-queuing messages. The database version I’m using is 10.2.0.4, and it’s true for both DBMS_JOB/IJOB, as well as REMOVE and BROKEN procedures. After all, things are as simple as a commit, aren’t they?

Oracle Database Consulting Services

Ready to optimize your Oracle Database for the future?