The Butterfly Effect
Apr 23, 2008 / By David Ashlock
If you check out the “Butterfly Effect” on Wikipedia, you’ll find a rather interesting reference to “sensitive dependence on initial conditions in chaos theory.” It’s a fascinating phrase that probably doesn’t mean much to most people until it happens to you. I could give you lots of theoretical examples, but perhaps a real-life one will make more sense.
Last week, a client of Pythian’s came to us with an environment that had recently been upgraded from Oracle Applications 11.5.9 to 11.5.10. In the past, Pythian has not supported the Oracle Applications environment for this client, but that is one of the strengths of Pythian — we have DBAs with a broad range of knowledge and expertise to support just about anything thrown at us. The emails from the client users suggested a myriad of performance issues, ranging from forms being slow to POs not being processed.
I can sympathize with the client, as I used to be one (that’s how I came to work for Pythian — but it’s also another story that I shall relate sometime). Performance issues in Applications can be tricky at best, as there are so many diverse factors to consider. Not having supported Oracle Apps for this client before means that we were starting with a clean slate and just looking for things out of the ordinary based on experience with other clients, sort of “searching for a needle in a haystack.”
The first thing to do was simply log in to the application and take a quick walk around: did anything stand out? What was the first impression? It took a bit of time to get connected initially, but the forms seemed to come up without any undue issues. One of the early comments in the email we had received mentioned that POs weren’t being processed by the workflows. Maybe a trip to “View Requests” was in order . . .
Ouch! It took about five minutes after launching my query to get the results, and now I felt the user’s pain. As bad as the result was, it did at least have a payoff: the Workflow Background Processes were failing, so there was somewhere to start. Being a DBA can be like being a doctor whose patients complain of random symptoms: our jobs are so much easier if something is quantifiably wrong. I’m thinking that doctors probably get excited when they see a broken bone that can be easily diagnosed. That was exactly how I felt about these errors.
It wasn’t quite that simple: the log and out files were not being generated for the failures. Thwarted in my efforts, but not exactly at a dead end, I decided to take a quick peek at the Concurrent Manager log files. They should show some sort of detail about the Requests that were being processed. Sure enough, the Workflow Background Manager log file was 1100 pages long and chock full of the same, repeating error: it could not write to the $APPLCSF/$APPLOUT directory on the server.
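When a log runs to over a thousand pages of the same complaint, a quick tally of repeated lines can surface the culprit faster than paging through it. The following is only a minimal sketch of that idea, not what I actually ran: the function name `top_repeated_lines` and the sample messages are made up for illustration, and real Concurrent Manager log lines will look different.

```python
import re
from collections import Counter


def top_repeated_lines(lines, n=5):
    """Return the n most frequent lines, ignoring differences in digits.

    Runs of digits (request IDs, timestamps) are replaced with '#' so that
    otherwise identical messages are counted together.
    """
    counts = Counter()
    for line in lines:
        key = re.sub(r"\d+", "#", line.strip())
        if key:  # skip blank lines
            counts[key] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample lines; the wording is invented for this example.
    sample = [
        "Request 101: cannot write to output directory",
        "Request 102: cannot write to output directory",
        "Manager started",
    ]
    for message, count in top_repeated_lines(sample):
        print(count, message)
```

To use it on a real file, pass an open file object, for example `top_repeated_lines(open(path))`, where `path` is whichever manager log you are inspecting.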
Feeling like the blind squirrel that found the nut, I logged into the server and verified that the directory did not exist. The $APPLCSF/$APPLLOG directory did, but the $APPLOUT one didn’t. I am speculating that during the upgrade from 11.5.9 to 11.5.10, the environment was changed to have context added to the $APPLLOG directories, so that they were now in the format of out/<sid>_<server> (perhaps to support multiple environments on this server at some future time, but once again I am speculating).
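A check like the one I did by hand could be scripted and run after any upgrade or configuration change. Here is a rough sketch, assuming the APPLCSF, APPLLOG and APPLOUT variables are set in the environment of the user running it; the function name `check_conc_dirs` and the status strings are my own invention, not anything shipped with Oracle Applications.

```python
import os
from pathlib import Path


def check_conc_dirs(env=None):
    """Report whether $APPLCSF/$APPLLOG and $APPLCSF/$APPLOUT are usable.

    Returns a dict mapping each variable name to a (path, status) tuple,
    where status is 'ok', 'missing', 'not writable' or 'unset'.
    """
    env = os.environ if env is None else env
    base = env.get("APPLCSF")
    results = {}
    for var in ("APPLLOG", "APPLOUT"):
        sub = env.get(var)
        if not base or not sub:
            results[var] = (None, "unset")
            continue
        path = Path(base) / sub
        if not path.is_dir():
            status = "missing"
        elif not os.access(path, os.W_OK):
            status = "not writable"
        else:
            status = "ok"
        results[var] = (str(path), status)
    return results


if __name__ == "__main__":
    for var, (path, status) in check_conc_dirs().items():
        print(f"{var}: {path} -> {status}")
```

The sketch only reports; it deliberately does not create anything, since the right ownership and permissions for a missing directory depend on the environment.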
In this case, there was an out directory, but nothing else. The fix for this seemed rather straightforward, so I created the directory and set the permissions on it. I also bounced the Concurrent Managers and ran the cmclean.sql script for good measure. Immediately after the bounce, I was pleased to see that files were now showing up in the newly created directory.
I logged into the Application and verified that the Workflow Background Processes were completing normally and users were also able to confirm that POs were being created. All was well in Mudville . . .
Until I tried to query in the View Requests screen again. It still took five minutes, if not longer this time. I did a quick count on the FND_CONCURRENT_REQUESTS table and was shocked to see that it held more than 4.7 million records. Small wonder that the form was slow. Further investigation revealed that the last successful Purge of Concurrent Requests happened on October 3, 2007. There had been one that was scheduled for two days prior to my investigation, but naturally it had failed due to the missing directory.
After spending most of a weekend running the Purge requests on the environment in slices so that the data held in the undo_tablespace didn’t get overwritten, I have this moral to the story: always be careful when making even the smallest configuration changes in Oracle Applications, lest the Butterfly Effect happen to your environment. I had never given much thought to the idea that a single missing directory could bring an entire Oracle Applications environment to its knees in a matter of days.
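For anyone facing a similar backlog, the slicing can be planned as a series of purge runs with steadily smaller age cut-offs, so that each run removes only a bounded chunk of history. This is a minimal sketch of that planning step only, assuming the purge is driven by an "older than N days" value; the function name `age_thresholds` and the numbers in the example are illustrative, not the values I used.

```python
def age_thresholds(oldest_days, keep_days, step_days):
    """Yield descending 'older than N days' cut-offs for successive purge runs.

    oldest_days: approximate age of the oldest data to be purged.
    keep_days:   how many days of history should remain afterwards.
    step_days:   maximum span of history removed by any single run.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if keep_days < 0:
        raise ValueError("keep_days must not be negative")
    age = oldest_days - step_days
    while age > keep_days:
        yield age
        age -= step_days
    # The final run always stops at the retention target.
    yield keep_days


if __name__ == "__main__":
    # Illustrative numbers: about 200 days of backlog, keep 30 days,
    # remove at most 30 days of history per run.
    print(list(age_thresholds(200, 30, 30)))  # -> [170, 140, 110, 80, 50, 30]
```

Each value would be submitted as one purge run, waiting for it to complete before submitting the next; a smaller step means more runs but less undo generated per run.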