The Pitfalls of Oracle’s Automatic SGA Management

Dec 14, 2006 / By Raj Thukral

Tags: , ,

Of late, Oracle has had the tendency to automate a lot of the usual DBA tasks.

I didn’t complain on autoextensible datafiles, I just had to make sure my disk had enough space to handle the maximum potential space requirement. I also didn’t complain too loudly when a file extended to a size that triggered an Oracle bug and caused the database to crash irrecoverably. It was 8i after all, and things have improved since. Or have they?

I was mildly amused by automatic PGA and for the most part it worked well.
The default dba_scheduler job to refresh statistics interfered with my own job to gather statistics, and the schedule was the worst possible for the db I first saw it on, after an upgrade from 9i to 10.1 — but that was a minor irritant and quickly resolved.

However, I have to draw the line at automatic SGA management. I thought it seemed like a good feature and left it at its default in a new 10.2.0.2 database that I recently set up. As production ramps up on the db, I can see how a combination of application issues (non-bound SQL) and automatic SGA can quickly cause problems. The database has been running for about a week now. The application has very little bound SQL and our best efforts to get the developers to start binding SQL have been mostly unsuccessful, since they are hard–pressed to maintain the legacy code, which is mostly spaghetti.

So here’s how the SGA looks after one week of running in production:

SQL> show sga

Total System Global Area 7516192768 bytes
Fixed Size                  2082856 bytes
Variable Size            4513073112 bytes
Database Buffers         2952790016 bytes
Redo Buffers               48246784 bytes
SQL>

Wow… 4.5G of the 7.5G or so is “Variable Size” — I would expect that the db cache should have been bigger than the variable size, but no. The db cache, which started out at 6G+ is down to half its size.

Lets see where the variable size is allocated:

SQL> select component,current_size/1024/1024 as "Size MB" from v$sga_dynamic_components
2  /

COMPONENT                                                           Size MB
---------------------------------------------------------------- ----------
shared pool                                                            4272
large pool                                                               16
java pool                                                                16
streams pool                                                              0
DEFAULT buffer cache                                                   2816
KEEP buffer cache                                                         0
RECYCLE buffer cache                                                      0
DEFAULT 2K buffer cache                                                   0
DEFAULT 4K buffer cache                                                   0
DEFAULT 8K buffer cache                                                   0
DEFAULT 16K buffer cache                                                  0
DEFAULT 32K buffer cache                                                  0
ASM Buffer Cache                                                          0

13 rows selected.

SQL>

4.2G in the shared pool. It was just under 4G yesterday, it’s still growing!
Oracle thinks it can make queries run better by not having to parse them repeatedly, so it grows the shared pool to keep as many queries as it can. Problem: the shared pool is now 4G+. That takes a while to trawl through, which of course adds to the spins on the library cache lock and pushes up CPU usage. And of course I/O usage is high(er) because the db cache is now lower. A lose-lose situation I think. I’m going to get into more detail on this when I have the time, but for this db, I think I’m going to switch to manual SGA and set a hard limit of 500M-1G on the shared pool. There isn’t any point in trying to hold on to what is going to be mostly–unique SQL anyway.

Oracle, please spare me the “automatic” stuff unless you can make it a little more intelligent!

–Raj.

3 Responses to “The Pitfalls of Oracle’s Automatic SGA Management”

  • Noons says:

    Raj, there is much worse to come!

    Wait until you have to run multiple schemas in same db, with same table names.
    If you’re using ANY of the automatic sga and pga management features since 9ir1, it’s odds on that on a loaded system you’ll see the bugs I alert to in my blog.

    They are caused by appaling QA of the pga_aggregate code and other auto memory management parameters.

    As well: try to use BLOBs on ASSM tablespaces in any 9i or 10g and aside from the very latest – and unproven – patch releases, I can virtually guarantee you’ll get a corrupt LOB segment as soon as the load goes up!

    And Oracle wants us to go through all the trouble of installing regular “patch releases” when they can’t fix the most basic of basics? Helllooooo?

    All I can say at this stage is: use “auto-anything” with a grain of salt bigger than Ben-Hur…

  • krisgopala says:

    Just going through this entry, a question comes up on my mind. You have mentioned that the application uses very little binding. Isn’t that a basic design flaw ? Is it fair for us to expect Oracle to automatically fix mistakes in application design?

  • Andy says:

    We ran into the issue you described too. The trick is to set minimum values for the pools, especially the default buffer cache, that way when your unbound queries dry up the other pools, they can only dry them up to a point, and over time they’ll balance out again.

    We’re using 10.2.0.3, and we’ve loaded many terrabytes of *lobs into ASSM, and we haven’t had any problems. In fact, it stores them more efficently than the OS was storing them…we use less space by putting them in the DB then we did when they were out of the DB. I think this is because the full OS block is allocated in the OS, but ASSM is a little more efficient. I wonder in what version you experienced issues, Noons? We have had the shared cursor issue you mentioned, that’s a scary one…and I know of at least 3 different bugs that cause that. Basicly, Oracle matches the sql text in the shared pool for your query, and doesn’t consider the schema that executed that query…so 2 scheams w/the same table name will get the same result, even if they have different data in the tables.

    I agree krisgopala, but the problem is the impression from the users…they don’t know about the unbound sql, they only see the db running slowly and they blame Oracle, and of course, the DBA. Still, by setting minimum values, you can compensate for the bad code…a little bit.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>