Oracle 10.2.0.3 coming soon, and a data guard corruption bug
Sep 22, 2006 / By Marc Fielding
It looks like Oracle has started testing the 10.2.0.3 patchset. A preliminary list of fixed bugs is in MetaLink note 391116.1. The “important” bug fixes are:
- corruption in NOCACHE LOBs (bug 5212539, also fixed in 188.8.131.52 and an upcoming 10.1.0.x release)
- wrong results in aggregate functions using the “hash group by” access path (bug 4604970)
- PGA corruption when using shared server (bug 5114396)
- server handle leak on Windows (bug 5077897)
- workaround for changed locking behavior of SELECT FOR UPDATE queries in 184.108.40.206/10.1.0.4/10.2.0.1 (bug 4969880)
But the most serious issue is index corruption on databases upgraded to 10.1.0.5 through 10.2.0.2 when using Data Guard in redo apply mode. Paraphrasing note 386830.1: bad redo metadata for index blocks gets written, and standard corruption checks do not detect it. If the same block is later used to generate redo of its own (after a state change or instance recovery, for example), the block may become corrupted. Errors then occur when querying or updating such blocks on the standby in read-only mode, or after the standby becomes a primary site. If the corrupt block belongs to a bootstrap index, the database won’t start up at all.
For index corruption to occur, the following things must happen, in order:
- The database is upgraded from a pre-10.1.0.5 version to a version between 10.1.0.5 and 10.2.0.2
- Redo from an index block is applied elsewhere (typically on a physical standby running Data Guard redo apply)
- The location where the redo was applied is then modified and generates redo of its own (typically after a role change)
- This newly generated redo is applied, resulting in corruption (typically on the former primary database after a role change)
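The first condition in the list above is the only one that can be checked from version numbers alone, and a small script can help when auditing many databases. The following is a minimal sketch in Python, not an Oracle-supplied tool: the function names are hypothetical, and the version strings are assumed to come from your own inventory. It says nothing about the remaining conditions, which depend on how redo has actually been applied.

```python
# Hypothetical helper: checks only the version-range condition described above.
# The affected range (10.1.0.5 through 10.2.0.2, inclusive) is taken from the post.
FIRST_AFFECTED = (10, 1, 0, 5)
LAST_AFFECTED = (10, 2, 0, 2)


def parse_version(text):
    """Turn a dotted version string such as '10.2.0.2.0' into a tuple of ints.

    Only the first four components are kept, so a five-part version
    compares the same way as its four-part form.
    """
    return tuple(int(part) for part in text.strip().split("."))[:4]


def in_affected_range(version):
    """Return True if the version lies between 10.1.0.5 and 10.2.0.2 inclusive."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


def upgrade_is_exposed(old_version, new_version):
    """Return True if an upgrade matches the first condition:
    from a pre-10.1.0.5 version to a version in the affected range."""
    return parse_version(old_version) < FIRST_AFFECTED and in_affected_range(new_version)


if __name__ == "__main__":
    # Example inventory of (old version, new version) pairs; values are illustrative.
    for old, new in [("10.1.0.4", "10.2.0.2"), ("10.1.0.5", "10.2.0.2")]:
        print(old, "->", new, "exposed:", upgrade_is_exposed(old, new))
```

Tuples of integers are used instead of string comparison so that a component such as 10 sorts after 9 rather than before it.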
To protect against this corruption:
- Apply the one-off patch for bug 5380055 on your platform
- If you have a database that has already applied version 10.1.0.5+ redo (typically a physical standby), there are additional steps in note 386830.1 to “bump” the database SCN. This operation must be done in restricted mode, will require downtime, and can be dangerous, so be careful!