Upgraded to 11GR2? Congrats! You are in Direct Reads Trouble!
Nov 14, 2012 / By Yury Velikanov
I made this blog title intentionally provocative. However it isn’t far from the truth. Don’t believe me? Continue reading :)
Emergency call
I came up with the graph bellow to demonstrate the problem. I was called on an emergency to help a desperate client to take their Oracle e-Business Suite system’s performance under control yesterday. After researching for several hours we found the root cause of the problem. The graph reflects direct reads percentage vs all physical reads. You would say looking on aggregated data and percentages is a bad style and waste of time. Well … I will leave it to you to decide. I think the graph demonstrates the problem very well and this is all that counts.

You do not need to be a scientist to tell when the system has been upgraded to the 11.2.0.3 version. It was kind of obvious looking on the graph, right? Well it took us a bit of time to find where the problem was. To save your time I put an SQL together that you can use to build a graph just like this one.
Did you migrate your Oracle database to 11.2.0.2 or a higher version recently? “Are you ready see how deep the rabbit hole goes?” (c) Matrix. Here you go :)
direct_reads_11g_problem.sql
If you see a significant change in the pattern after upgrade to 11G, you may be interested to have a look on the next set of statistics.

The blue line represents the Physical Reads statistic. The red line are Direct Reads. Did you notice that Physical read volume increased from 100MB/s to 400MB/s? Well my client didn’t just notice it. An expensive storage array’s resources got saturated and the whole user community were paralyzed for several days. A good portion of the first day the client spent working with storage vendor as all other Oracle IOs (e.g. log file sync, db file scattered read, db file scattered read etc) became very slow . Than many hours work with Oracle Support followed. After all that the client called us as they had run out of ideas. If we would have the SQL handy we would avoid 1 day of troubleshooting efforts and focus our efforts on how to fix the issue. Keep the script handy. Who knows it may save you some troubleshooting efforts.
Other recently migrated client
Following up on the emergency call’s results, out of curiosity, I decided to check another client’s environment recently migrated to 11.2.0.3 version. Have a look. I think I don’t need to make too many additional comments.

and the Direct Reads contribution to total Physical Reads graph for the same database looks like the following

Quick fix
We are still in a discussion on how to resolve the issue for good. The following two actions have been implemented as a quick fix to give a bit of relief to the system’s user community.
- Disable a new functionality related to Direct reads enhancement introduced in the 11.2.0.2 version. Google for the event number and you will find several related blog posts including Tanel’s blog post here.
SQL> alter system set events '10949 trace name context forever';
- [see my comment bellow]
Switch offAutomatic Memory Management. IMHO: it should be switched off for any serious Oracle database. It introduces more problems than it adds value
High 'direct path read' waits in 11g [ID 793845.1]
NOTES:
- Be aware that the script is AWR based. Therefore a database should have Diagnostic licence in order to use it
- If you don’t have the Diagnostic licence just use the idea and convert it to STATSPACK ( don’t forget to share it with me and others :)
- Please do not hesitate to drop me a message if you spot any mistakes (I know I do those all the time) or have related comments
Let me know if the script I have introduced to you in this blog post helped you to spot something interesting in your system.
======================================================
Added by Yury on 2012.11.24. As result of internal discussion with Pythian folks (special thanks to Marc Billette)
Additional Comments related to this blog post
- % of DPR doesn’t mean anything. High number could be a good or bad thing for different systems. The answer as always …. It depends :)
- However in my clients’ case the DPR increase was accompanied by SIGNIFICANT physical IO increase (see my blog post second graph).
- As the change in behavior happened right after 11G upgrade it was enough to points us to the temporary fix
- A single serial DPR is more efficient (cheap from resources point of view) than traditional Oracle physical Read (db file sequential read or db file scattered read).
- If it is given that a process MUST read X physical blocks and none of table blocks are cached at the moment then DPR will be faster and utilize less resources on the server.
- However DPR doesn’t cache anything. Therefore none of others processes can share the work the process did.
On Exadata
- The Exadata designed to make a lot of DPR to leverage data processing offloading to cells hosts
- In fact I think the new serial direct reads decision mechanism was implemented because and for the Exadata (just guessing here)
- High DPR just an indication that an environment doesn’t use buffer cache. That’s all. Is it good or bad ….. It as always … depends :)
Just an idea
16 comments on “Upgraded to 11GR2? Congrats! You are in Direct Reads Trouble!”
Leave a Reply
You must be logged in to post a comment.

Also great example to the benefits of visualizing your performance metrics.
+1 to disabling automatic memory management. Other than very small, inconsequential, test databases, I have yet to see one instance where it is worth even considering!
The thing that has to be thought out is this: automating-this-or-that does not immediately imply that the target will be ideally tuned!
“Automatic management” is *NOT* a synonym for “automatically tuned”! Far from it.
Automation takes some time to recognize a change in conditions that warrants a change in parameters. In turn, that change of parameters may involve extensive re organization of the memory and its consequent use by other sections. That consequent change in turn may trigger other boundaries that cause further automation changes, with the consequence that the system may easily be chasing its tail trying to self-tune for a situation that *might* have been relevant hours ago!
Add to that an application that might work on daily cycles like e-business suite, and you got a test case for wild oscillations.
In electronics, this situation is called a self-oscillation: a condition where control feedback in a closed system reaches and affects the input processing waaaaay after it should have done so. And the system promptly enters into potentially destructive oscillation. I say potentially destructive because that might be the intention: oscillators and wave generators work exactly on such – self-oscillation through in-phase feedback, also called positive feedback.
As opposed to out-of-phase feedback which acts to control a system and not let it oscillate outside of a small range.
Unfortunately, most current “automated” software setups of late seem to suffer from a total lack of knowledge of their developers of these basic engineering principles. ANY closed system is susceptible to oscillation through feedback, and software is certainly no exception.
I have yet to see ANY of my 11.2.0.3 databases have ANY I/O problems, quite the contrary in fact. But then again, there is not a single one of them that is using automatic memory management.
As rightly mentioned by Martin Berger (@martinberx) on tweeter http://bit.ly/RT5f1l the AMM wasn’t really disabled.
We just increased minimal limits for shared_pool_size=5G and db_cache_size=38G as the sga_target was still set to the higher value AMM was still enabled.
On the other hand it looks we could implement one of the changes to fix the issue as
– setting the event disables the Direct Reads related feature introduced in 11.2.0.2
– increasing db_cache_size impacts on the default default threshold value (2%) for the small tables
Well in this case the team didn’t want to take any changes and therefore both changes found their way in prod.
Yury
PS Now we are advising/planning to disable AMM in this envrionment for good in few weeks time :)
Hi Yury,
This post was informative great troubleshooting, just few additions
ASMM Enabled in large Oracle Apps environment indicates that ERP runs on a vanilla installation
above which there would be tonnes of application customizations that will make the ASMM confused
or mislead it with its ever confusing workload, In Prod never recommended, Sholud take the memory
advisories suggestion to set the SGA manually, and non-zero value on sga_target will ignore the
manual AMM settings, So sga_target=0 will be good :)
Whereever there are such vanilla instances running the below is one common mistake that goes unnoticed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
And, I most cases there would be GATHER_STATS_JOB(default with 10g & above) enabled instead of
FND_STATS, If its running on a vanilla installation the performance degrades again.
If the above case is true, then it should be disabled using below command
exec DBMS_SCHEDULER.DISABLE(‘GATHER_STATS_JOB’);
Below should be the concurrent program that should be enabled in that case.
Name of the Concurrent / DBMS job: Gather Schema Statistics
Executable Name/short Name: FNDGSCST
Schedule / Submit Interval: , From start of prior run.
Parameter Reference:
Schema Name=ALL
Estimate Percent=10
Degree=3
Backup Flag=NoBackup
History Mode=None
Gather Option=Gather
Keep other parameters as defaulted.
Good point Ajith in regards the vanilla EBS instance VS EBS with customizations. I will use it in the future in discussions with my clients. Thanks you for the hint.
Hi, thanks for sharing. can you please share how did you create these graphs out of the sql you shared ?
Take h_date column horizontal ax and pyh_reads/dir_reads OR R_PCT OR pyh_mb_s/dir_mb_s as source for vertical ax. You should be good with figuring the rest.
Nice article , thank you for your effort
Maybe not necessary to disable AMM, Yury. Pegging the minimal amount of db cache and minimal amount of shared memory will ensure things don’t oscillate violently between too wide values. You’re simply forcing the AMM to not go bersek and cause other issues. “Putting on the brakes”, so to speak.
I think you guys went the right way with these changes, although likely I wouldn’t have entirely disabled the direct reads – they might serve a good purpose in some cases. Putting the “brakes” on AMM should be enough to keep things under control. Definitely keep us posted on the results once you turn AMM off completely and any observations after that.
I’m agree, “not necessary to disable AMM”, setting minimum value for key memory pools can help, and always “set up” KEEP or RECACLE pool to appropriate size, they will not be adjusted by AMM and not all memory pools are automatically resized by AMM.
What actually I hate with AMM I can’t lock the SGA to prevent “paging out” to disk, in rare situation it could be handy. Honestly I never use this in production, but I like to try in QA for instance
I like to enable AMM, the potential performance gain from AMM are significant, but everything depending on “workload”. I recently disable AMM on one instance, simply “workload” contains a huge number of sort & read intensive operations and causing shifting memory between SGA and PGA very frequently, and then single session is taking a huge amount of memory, I’m fed up with this and disable AMM.
In my opinion, it is not bad idea to enable AMM, the potential performance gain from AMM is significant, but monitoring and “setting minimum size for certain memory area” is extremely important.
By the way compatibility could be issue like AMM and LINUX &HugePages, …
So lot of plus and minus, nothing is perfect, but I’m not agreeing that AMM is useless.
Andjelko Miovcic
It’s a feature that I’m unconvinced about.
I understand that benefit that it’s meant to deliver but in reality I’ve seen it cause more problems that it resolves (but that’s because we get called about the problems, right?).
+1 for turning off AMM regardless. It probably works well on perfect systems but who’d know?
+1 for increasing the size of the buffer cache where possible. That’s my advice.
The last significant upgrade to 11gR2 that I was involved involved multiple testing cycles eventually led to a 4x increase in the BC – not just because of direct path reads – but that increase meant direct path was generally fine when used.
Should really be picked up by proper testing cycles in advance of migration.
Similar story from Coskan:
http://coskan.wordpress.com/2011/08/15/what-happens-when-you-did-not-do-a-load-test-before-release-upgrades/
Of course, the other factor of the feature is the “adaptive” part of “adaptive serial direct reads” where the behaviour can adapt to what is actually cached.
I blogged observations on this in a prod system where the plan didn’t change, the buffer cache size didn’t change, just the amount of data cached (probably) changed:
http://orastory.wordpress.com/2012/06/19/flippin-witch-hunt-adaptive-direct-path-read/
Thanks Dominic for reading, commenting and sharing good information sources with myself and the rest of community. I have added yours and Coskan’s blogs to my Google Reader.
I’ve seen this happen in two large databases that have been upgraded to 11GR2.
What seems to happen is that memory is very heavily allocated to the PGA and you end up with a very small SGA as a result. Hence you end up with ever increasing Direct Reads.
You can set SGA_TARGET and PGA_TARGET (which are minimum values) so that you leave only a small portion of your MEMORY_TARGET for the database to automatically manage.
I have also seen a suggestion (which I have not tested yet) that setting Dead Connection Detection will mitigate this problem. The reasoning being that orphaned sessions in the database that consumed a lot of PGA memory are not being cleaned up, so you end up with more memory being allocated to the PGA over time unnecessarily .
Hey Paul. As I am working with EBS environments and those tends to be relatively important to the clients (read gets a bit more DBA time than others) I don’t use MEMORY_TARGET at all and probable will not use in the nearest future. One of the disadvantages of MEMORY_TARGET is the fact that you can’t use Huge Pages in such configuration. IMHO: For SGAs over 5-10GB it starting to impact performance significantly.
I never saw anything with reads, however I did find that oracle took way more memory that it was assigned in 11.2+, even when memory target is set to around 500M, the processes for the database would spill well over that value.
Turning AMM off didn’t fix the issue.
This was on AIX 6.
http://blog.vishalgupta.com/2011/08/19/direct-path-reads-11g-changed-behaviour/