Oracle Trace Files in User Directories

Mar 21, 2013 / By Babette Turner-Underwood

Tags: , ,

I encountered an unusual situation recently where Oracle was writing trace files into the user directory instead of the usual oracle diagnostic destinations. Our monitoring software indicated that the “/home” directory was filling up. When I checked the space, I saw it had become 85% full.

$ df /home
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/home 20642428 16503552 3090300 85% /home

Digging further, I found the Pythian user was using a large amount of space. This was very unusual since the monitoring software we use has a very small footprint and the log files rotate so as not to use a lot of disk space.

$ cd home
$ du -sh *
392K bturner
16K lost+found
....
4.5G pythian
...

Going even further, I found that an unfamiliar directory “oradig_pythian” had been created. I had never seen this directory before, and it was not related to our monitoring software.

$ cd pythian
$ du -sh *
2.4M avail
11M logs
4.3G oradiag_pythian
...

Drilling down the “oradiag_pythian”, I found an usually named directory “host_1460457767_80″, which was full of subdirectories that were similar in name to the Oracle diagnostic directories.

$ pwd
/home/pythian/oradiag_pythian/diag/clients/user_pythian/host_1460457767_80
$ du -sh *
2.9G alert
4.0K cdump
4.0K incident
4.0K incpkg
4.0K lck
276K metadata
4.0K metadata_dgif
4.0K metadata_pv
4.0K stage
4.0K sweep
1.4G trace

The only file in the “trace” directory was a sqlnet.log file. This 1.4GB file was being written to many times per minute, complaining of a missing Oracle diagnostic directory.

$ ls -ltr trace | tail -3
total 1466924
-rw-rw---- 1 pythian pythian 1500659093 Mar 13 09:36 sqlnet.log
$ cd trace
[pythian@bsn-s-ora11g-1 trace]$ tail -10 sqlnet.log
Wed Mar 13 09:42:04 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:04 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

In addition, the “alert” directory was full of “log.xml” files.

$ ls -ltr alert | tail -3
-rw-r—– 1 pythian pythian 10485849 Mar 10 14:43 log_293.xml
-rw-r—– 1 pythian pythian 10485849 Mar 12 19:05 log_294.xml
-rw-r—– 1 pythian pythian 2915094 Mar 13 09:36 log.xml
$ ls -lt alert | tail -3
-rw-r—– 1 pythian pythian 10485849 Jul 16 2012 log_3.xml
-rw-r—– 1 pythian pythian 10485849 Jul 15 2012 log_2.xml
-rw-rw—- 1 pythian pythian 10485895 Jul 14 2012 log_1.xml

Checking the contents of the most recent log.xml file also showed that log files were being written to and being filled up with messages of the missing directory.

$ tail -40 log.xml
host_addr='10.20.30.40'>
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

In checking for directory, the first directory “/u01/app/oracle/product/11.2.0/db_1/log” did exist. The second directory “/u01/app/oracle/product/11.2.0/db_1/log/diag/clients” did not exist.

$ ls -l /u01/app/oracle/product/11.2.0/db_1/log
total 8
drwxr-xr-t 3 oracle oinstall 4096 May 10 2012 client_hostname
drwxrwxr-x 3 oracle oinstall 4096 May 10 2012 diag

$ ls -l /u01/app/oracle/product/11.2.0/db_1/log/diag/clients
ls: /u01/app/oracle/product/11.2.0/db_1/log/diag/clients: No such file or directory

$ cd /u01/app/oracle/product/11.2.0/db_1/log/diag
$ ls -l
total 8
-rw-r—– 1 oracle oinstall 16 Mar 6 14:32 adrci_dir.mif
drwxr-x— 2 oracle oinstall 4096 Nov 18 22:21 rdbms

I decided to create the directory and see what happened. Nothing happened. And by “nothing”, I mean “nothing” in a good way. Oracle stopped writing to the sqlnet.log file and the log.xml files to indicate the directory was missing.
At the same time, nothing was created in the new clients directory.

$ cd /u01/app/oracle/product/11.2.0/db_1/log/diag
$ mkdir clients
$ cd clients
$ ls

Seems like it was spending a lot of time complaining about a missing directory that it did not need to use. So why did Oracle create the “oradiag_pythian” directory and start complaining about the missing “client” directory?

I was unable to find anything on metalink that would explain this behavior. Thankfully, I still had the FIRST log.xml file that was created, and there was some clue there as to why these files were created.

$ head -40 log_1.xml

Create Relation ADR_CONTROL

Create Relation ADR_INVALIDATION

Create Relation INC_METER_IMPT_DEF

Create Relation INC_METER_PK_IMPTS

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log]
[/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log]
[/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

<msg org_id=’oracle’ /=” time=’2012-06-18T13:41:01.773-04:00′
type=’UNKNOWN’ level=’16′ host_id=’10.20.30.40′

Using the information in this first log file, I was able to find a similar incident that had occurred for someone else, and they had been able to track it down to Oracle bug 58917 (which I could not access on Metalink).

In the related incident ( https://savannah.cern.ch/bugs/?58917 ), they noted that this had occurred following an ORA-24550.

For the record: the Oracle signal handler producing these ORA-24550 can probably be completed disabled (e.g. if one wants to use the ROOT signal handler) by setting DIAG_SIGHANDLER_ENABLED=FALSE.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>