Solving an uncommon Oracle error code - ORA-01041
So, here is the thing. We have a customer running an 11.2 single-instance Oracle database on Windows. I know the version is a bit old, and Windows may not be the most common choice for running an Oracle database, but that is not the point of this post. The customer asked us to back up the database directly to a CIFS volume using RMAN. This is a common and wise practice: keep your database backups outside the server the database runs on. Piece of cake; the only caveat is that the Oracle Windows service must run under a domain account that has permissions on the CIFS volume. Some basic research on MOS and voilà: How to Change Oracle Owner from Local System Account to Domain User Account in Windows (Doc ID 2035714.1). After obtaining the maintenance window and executing the action plan to move the services from the local SYSTEM account to a domain account, a simple RMAN backup writing directly to the CIFS volume works just fine:
RMAN> backup current controlfile format '\\backup-nas.domain.com\oracle\backups\oracle\ctl_file_text.bkp';
Starting backup at 16-08-2018 11:50:24
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 16-08-2018 11:50:25
channel ORA_DISK_1: finished piece 1 at 16-08-2018 11:50:40
piece handle=\\backup-nas.domain.com\oracle\backups\oracle\CTL_FILE_TEXT.BKP tag=TAG20180816T115024 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
Finished backup at 16-08-2018 11:50:40
Starting Control File and SPFILE Autobackup at 16-08-2018 11:50:40
piece handle=G:\ORACLE\FAST_RECOVERY_AREA\PROTECT\AUTOBACKUP\2018_08_16\O1_MF_S_984311440_FQBOR0Y3_.BKP comment=NONE
Finished Control File and SPFILE Autobackup at 16-08-2018 11:50:41
We now reboot the server to make sure that everything comes up fine. Ooops!!
PS C:\Users\pythian.admin> sqlplus / as sysdba
SQL*Plus: Release 220.127.116.11.0 Production on Thu Aug 16 11:01:38 2018
Copyright (c) 1982, 2017, Oracle. All rights reserved.
ERROR:
ORA-01041: internal error. hostdef extension doesn't exist
What happened?
Good question, indeed. The OERR information is not exactly life-saving:
Error: ORA 1041
Text: internal error. hostdef extension doesn't exist
---------------------------------------------------------
Cause: Pointer to hstdef extension in hstdef is null.
Action: Likely a known or new bug.
Explanation: This is usually reported when a connection has broken for some reason.
Diagnosis:
1) Check the same operation for any ORA 3113 or ORA 3114 type errors. The ORA 1041 error usually results from an unexpected disconnection.
2) Follow the same steps as you would to progress an ORA 3113
Obviously, there are no ORA-3113 or ORA-3114 errors to be seen. There is also not much information about this error in MOS, and Google does not help much either. However, one MOS note looked promising: ORA-1041 When Trying to Connect as Sysdba to Startup Database (Doc ID 552218.1). From the note:
The system time was set incorrectly. The local system time did not match the time on the domain server causing the authentication to fail.
Why does this look promising? Because the database shows an uptime of one hour right after starting up, which means something is changing the system time when it shouldn't.
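To see why a clock change would show up this way, here is a minimal Python sketch of the arithmetic. It assumes that uptime is computed as a plain wall-clock difference (for example SYSDATE minus STARTUP_TIME from V$INSTANCE); that is our assumption, and the timeline and the one-hour step below are hypothetical.

```python
from datetime import datetime, timedelta

def apparent_uptime(startup_wall: datetime, now_wall: datetime) -> timedelta:
    # Assumption: uptime is a plain wall-clock difference, so any step in the
    # system clock after startup is added to (or subtracted from) the result.
    return now_wall - startup_wall

# Hypothetical timeline: the instance records its startup time while the
# server clock is still one hour behind the domain controller.
startup_recorded = datetime(2018, 8, 16, 10, 0, 0)

real_elapsed = timedelta(minutes=2)  # time that actually passed
clock_step = timedelta(hours=1)      # forward correction applied by time sync
now_recorded = startup_recorded + real_elapsed + clock_step

print(apparent_uptime(startup_recorded, now_recorded))  # 1:02:00
```

Two minutes of real uptime are reported as just over an hour; an uptime much larger than the time since the restart is a hint that the clock moved underneath the instance.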
Windows Time service is fidgeting with my database
After reviewing the Windows event log, I noticed a sudden jump in the time during server startup.
[Figure: note the jump in the time]
This is the Windows kernel updating the system time after the Windows Time service connects to the domain controller and synchronizes the server clock with it. The problem is that all of this happens after the Oracle services have started, hence the startup issue. Supporting this hypothesis, when the Oracle services are restarted manually once the server is up and running, everything works as expected.
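Besides the event log, a clock step can be spotted by comparing the wall clock with a monotonic clock, which time synchronization does not step. The Python sketch below only illustrates that idea and is not part of the original investigation; the function names, the sampling interval and the 2-second tolerance are arbitrary choices for illustration.

```python
import time

def clock_step_seconds(wall_start, mono_start, wall_now, mono_now):
    """Return how far the wall clock moved beyond the real elapsed time.

    Positive means the clock was stepped forward, negative means backward.
    """
    wall_elapsed = wall_now - wall_start
    real_elapsed = mono_now - mono_start  # time.monotonic() ignores system clock updates
    return wall_elapsed - real_elapsed

def watch_for_step(interval=1.0, samples=5, tolerance=2.0):
    """Sample both clocks and return the first step larger than `tolerance` seconds."""
    wall0, mono0 = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        step = clock_step_seconds(wall0, mono0, time.time(), time.monotonic())
        if abs(step) > tolerance:
            return step
    return 0.0  # no step detected during the sampling window

if __name__ == "__main__":
    # Hypothetical numbers: 120 s of real time, but the wall clock advanced 3720 s.
    print(clock_step_seconds(1000.0, 50.0, 4720.0, 170.0))  # 3600.0
```

The tolerance absorbs the small gap between reading the two clocks; gradual slewing by the time service stays below it, while a one-hour step does not.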
Diagnosis and prognosis
So, the situation is as follows:
- Oracle Windows services work fine after being set up with a domain account.
- RMAN backups are working after the above setup.
- On Windows boot-up, the Oracle services start before the Windows Time service has set the correct system time.
- After the system is up and running, restarting the Oracle services brings them up without errors.
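The diagnosis above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a description of how Oracle authenticates: we assume the failing step is Kerberos-based domain authentication with the common default of 5 minutes for "Maximum tolerance for computer clock synchronization", and the one-hour offset is hypothetical.

```python
from datetime import timedelta

# Assumption: Kerberos with the default 5-minute maximum clock skew.
MAX_SKEW = timedelta(minutes=5)

def auth_would_fail(offset_from_dc: timedelta, max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the local clock differs from the domain controller by more than max_skew."""
    return abs(offset_from_dc) > max_skew

# Hypothetical boot sequence matching the diagnosis: the Oracle services start
# while the clock is still 1 hour behind the DC, then are restarted by hand
# after the Windows Time service has synchronized the clock.
oracle_starts = {
    "at boot, before time sync": timedelta(hours=-1),
    "manual restart, after time sync": timedelta(seconds=0),
}

for when, offset in oracle_starts.items():
    outcome = "fails" if auth_would_fail(offset) else "works"
    print(f"Oracle service start {when}: authentication {outcome}")
```

Only the order of events differs between the two starts; the account, the services and the database are the same.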