The best times to apply Oracle quarterly patches
When is the best time to apply an Oracle quarterly patch? I suppose the answer is, "It depends." As it turns out, we started applying the October 2018 Grid Infrastructure PSU for AIX the day after it was released, and it was a disaster. The patch left the GI and DB libraries corrupted in an Oracle Restart configuration. Oracle's only suggestion was to roll back, which was easier said than done, since even simple commands such as crsctl config would fail. Pretty ugly.
The Disaster: Library Corruptions on AIX
After applying the patch, we encountered immediate failures. Commands that should be routine began throwing fatal errors:
# crsctl config has
exec(): 0509-036 Cannot load program crsctl.bin because of the following errors:
rtld: 0712-001 Symbol ztca_Shutdown was referenced from module crsctl.bin(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol ztpk_SetKeyInfo was referenced from module /u01/11.2.0/grid/lib/libhasgen11.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol ztpk_DestroyKey was referenced from module /u01/11.2.0/grid/lib/libhasgen11.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol ztpk_Sign was referenced from module /u01/11.2.0/grid/lib/libhasgen11.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol ztpk_Verify was referenced from module /u01/11.2.0/grid/lib/libhasgen11.so(), but a runtime definition of the symbol was not found.
rtld: 0712-002 fatal error: exiting.
Challenges with Oracle Support and Patch Integrity
To make matters worse, there was a breakdown in communication among Oracle's support engineers. I recall one engineer mentioning that there were issues with the current patch release but, unfortunately, this was not documented in the SR.
Identifying Potential Points of Failure
Another possible contributor to the failure was a corrupted patch download: the checksum did not match, and we had not checked it before applying the patch. So maybe there were no issues with the patch itself after all. Fast forward: we tried patching the environment again, only to have it fail a second time. Finally, the same Oracle support engineer who had not thought there were issues with the patch release confirmed that there were.
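Checking the download before touching any Oracle home would have ruled out one suspect early. Here is a minimal sketch of that check in Python, assuming the patch download page lists a SHA-256 digest for the zip file. The function names are hypothetical and are not part of any Oracle tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a multi-gigabyte patch zip
        # never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patch_is_intact(path, expected_hex):
    """Compare the computed digest with the published one.

    Leading/trailing whitespace and letter case in the published
    digest are ignored, since they often differ when copy-pasted.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling patch_is_intact() with the path to the downloaded zip and the digest copied from the download page returns False for a truncated or corrupted file, which is the cue to download it again rather than run the patch.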
Lessons Learned and Best Practices for Patching
This experience served as a significant learning opportunity for our team. Here are the core takeaways:
- Don't be too hasty to patch: Waiting a few weeks for the community to "smoke test" a new release can save you from being the first to find a critical bug.
- Document everything: Always get the name of the engineer you are speaking with and ensure the conversation is documented in the SR.
- Trust, but verify: Oracle engineers are human too, and mistakes can happen.
- Always have a backup: This was our saving grace. How many backup GI and DB homes should you have before patching? As many as it takes to ensure a quick recovery.
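As an illustration of the last point, here is a rough sketch of taking a timestamped archive of a home before patching. This is an assumption about how such a backup could be scripted, not the procedure we actually used. The function names are hypothetical, and a real GI home would typically need to be archived as root with the stack stopped so that file ownership and permissions are captured consistently.

```python
import tarfile
import time
from pathlib import Path


def backup_home(home, backup_dir):
    """Archive an Oracle home into backup_dir and return the archive path."""
    home = Path(home).resolve()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp the archive name so several backups can coexist.
    stamp = time.strftime("%Y%m%d_%H%M%S")
    archive = backup_dir / f"{home.name}_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the home's own directory name so the
        # archive can be unpacked under any parent directory.
        tar.add(home, arcname=home.name)
    return archive


def archive_contains(archive, member):
    """Return True if the archive lists the given member path."""
    with tarfile.open(archive, "r:gz") as tar:
        return member in tar.getnames()
```

A quick check such as archive_contains(archive, "grid/lib/libhasgen11.so") after the backup confirms that the archive is readable and holds the libraries you would need in a rollback. The backup directory should sit outside the home being archived.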
Happy patching, and stay tuned! In the next post, I will share how we managed to recover from the disaster.