Exadata's Best Kept Secret: Storage Indexes

Jul 20, 2010 12:00:00 AM

While features like Smart Scan often steal the spotlight in Exadata discussions, Storage Indexes are a silent powerhouse of performance. Often mentioned only as a footnote in official documentation, they offer a unique advantage: unlike offloading, which moves query processing onto the storage cells, storage indexes let a cell skip disk I/O altogether for regions that cannot contain matching rows.

On a typical production system, the impact can be staggering. Consider these real-world statistics:

SQL> select name,value from v$sysstat
  2  where name in ('physical read total bytes',
  3  'cell physical IO bytes saved by storage index');

NAME                                             VALUE
------------------------------------------------ ------------------
physical read total bytes                        468779565615616
cell physical IO bytes saved by storage index    251319174832128

In this instance, counting the saved bytes alongside the bytes actually read, over a third of all potential I/O was avoided entirely because of storage indexes.
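The arithmetic behind that figure can be checked with a few lines of Python. This is a minimal sketch that assumes the saved bytes are not already included in 'physical read total bytes'; that reading is an assumption, chosen because it reproduces the "over a third" figure. The variable names are illustrative only.

```python
# Values copied from the v$sysstat output above.
physical_read_total_bytes = 468_779_565_615_616
bytes_saved_by_storage_index = 251_319_174_832_128

# Assumption: bytes saved are NOT part of the physical read total, so the
# I/O that would have happened without storage indexes is the sum of both.
potential_read_bytes = physical_read_total_bytes + bytes_saved_by_storage_index

fraction_saved = bytes_saved_by_storage_index / potential_read_bytes
print(f"{fraction_saved:.1%} of potential read I/O avoided")  # 34.9% of potential read I/O avoided
```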

How Storage Indexes Work

Storage indexes function by dividing the storage on each ASM grid disk into small "chunks." They are not stored on disk; instead, they are in-memory structures maintained by the storage server (cell) software.

The Creation and Maintenance Process

Initial Scan: The first time an entire chunk is read (typically during a full-table scan), the storage server records the highest and lowest (min/max) values for each column in that chunk.
Tracking: These values are stored in the storage cell's memory.
Updates/Inserts: As new data is inserted or updated, the storage server checks the column values against the existing min/max range and updates the index in memory if necessary.
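Deletes: These are trickier. Storage indexes are most likely left as-is, since re-reading an entire chunk just to determine new boundaries would be expensive. A stale range is still safe: after a delete it can only be wider than the remaining data, so it never causes a needed chunk to be skipped.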
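The maintenance steps above can be sketched as a small in-memory structure. This is an illustrative model only, not Exadata's actual implementation: the `ChunkSummary` class, its method names, and the sample rows are all hypothetical, and the delete behavior follows the article's guess that summaries are left untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ChunkSummary:
    """Hypothetical per-chunk min/max summary, kept only in memory."""
    ranges: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def record_scan(self, rows):
        """Initial scan: rebuild min/max per column from every row in the chunk."""
        self.ranges = {}
        for row in rows:
            self.record_write(row)

    def record_write(self, row):
        """Insert/update: widen a column's range only if the value falls outside it."""
        for column, value in row.items():
            low, high = self.ranges.get(column, (value, value))
            self.ranges[column] = (min(low, value), max(high, value))

    def record_delete(self, row):
        """Delete: leave the summary as-is (assumed). The old range still
        bounds the remaining rows, so it stays safe, just less selective."""
        pass


summary = ChunkSummary()
summary.record_scan([{"id": 101, "qty": 5}, {"id": 140, "qty": 2}])
summary.record_write({"id": 150, "qty": 9})   # widens both ranges
summary.record_delete({"id": 101, "qty": 5})  # no change to the summary
print(summary.ranges)  # {'id': (101, 150), 'qty': (2, 9)}
```

Leaving the range untouched on delete trades some selectivity for not having to re-read the chunk, which matches the reasoning given for deletes.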

The Power of Row Elimination

When a table scan request arrives, the storage server compares the fixed predicates in your WHERE clause against the min/max values for the relevant chunks. If the requested range falls outside a chunk’s boundaries, the storage server skips the I/O for that chunk entirely.
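The comparison itself is a simple range-overlap test. The sketch below assumes a hypothetical `order_date` column stored as `YYYYMMDD` integers and made-up min/max values for four chunks; the function name `can_skip` is likewise illustrative.

```python
def can_skip(chunk_min, chunk_max, pred_low, pred_high):
    """True when the predicate range cannot overlap the chunk's min/max range."""
    return pred_high < chunk_min or pred_low > chunk_max


# Hypothetical (min, max) of order_date for four chunks.
chunk_ranges = [
    (20100101, 20100131),
    (20100201, 20100228),
    (20100301, 20100331),
    (20100215, 20100410),  # a chunk holding a wider mix of dates
]

# Models: WHERE order_date BETWEEN 20100210 AND 20100220
chunks_to_read = [
    i for i, (low, high) in enumerate(chunk_ranges)
    if not can_skip(low, high, 20100210, 20100220)
]
print(chunks_to_read)  # [1, 3]
```

Chunks 0 and 2 are skipped without any disk read. Chunk 3 must still be read because its range overlaps the predicate, even though it may turn out to hold no matching rows.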

Why Storage Indexes are Effective

You might assume that a 1 MB chunk of data would contain such a wide range of values that an index would rarely be able to skip it. However, data in many systems is naturally "clustered."

The Correlation Factor

Data architects often use incrementing ID values. Because rows with adjacent IDs are generally inserted at around the same time, they (and often their dates) end up physically stored close together. If you delete data in bulk or keep it indefinitely, your storage chunks naturally maintain highly correlated, narrow ranges of values. This clustering allows Storage Indexes to be incredibly efficient at eliminating unnecessary reads for specific ID or date ranges.
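A small simulation makes the effect of clustering visible. It is a toy model under stated assumptions: 10,000 integer IDs, a hypothetical 100 rows per chunk standing in for however many rows fit in a real chunk, and a query for a 100-ID range. The same IDs are laid out once in insertion order and once shuffled.

```python
import random

ROWS_PER_CHUNK = 100  # hypothetical stand-in for the rows held by one chunk


def chunks_read(ids, low, high):
    """Count chunks whose min/max range overlaps the predicate low <= id <= high."""
    count = 0
    for start in range(0, len(ids), ROWS_PER_CHUNK):
        chunk = ids[start:start + ROWS_PER_CHUNK]
        if not (high < min(chunk) or low > max(chunk)):
            count += 1
    return count


clustered = list(range(10_000))       # IDs stored in insertion order
scattered = clustered[:]
random.Random(42).shuffle(scattered)  # same IDs, stored in random order

clustered_reads = chunks_read(clustered, 2_500, 2_599)
scattered_reads = chunks_read(scattered, 2_500, 2_599)
print(clustered_reads, "of 100 chunks read when IDs are clustered")
print(scattered_reads, "of 100 chunks read when IDs are scattered")
```

With clustered data only the single chunk holding IDs 2500 to 2599 has to be read. With scattered data nearly every chunk spans almost the whole ID range, so min/max values eliminate little or nothing.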

Advantages and Disadvantages

Storage indexes offer several benefits that traditional B-tree or Bitmap indexes cannot match, though they do come with specific trade-offs.

Key Advantages

Zero Storage Overhead: They consume no disk space.
Minimal Maintenance: They require negligible CPU to maintain compared to traditional indexes.
Automatic Coverage: They cover all commonly queried columns automatically, even in combination, without requiring a DBA to create them.

Known Drawbacks

Full Scan Dependency: They are designed specifically for full-table scans. Index-based access paths (like a primary key lookup) see fewer benefits.
In-Memory Only: Because they are in-memory structures, they do not survive a storage server shutdown and must be rebuilt after a reboot.
Security Restrictions: To prevent accidental data disclosure, storage indexes do not store min/max values for encrypted tables.

As DBMS products like Netezza and InfiniDB have shown, "I/O avoidance" is the next frontier in performance. On Exadata, Storage Indexes prove that sometimes the fastest I/O is the one you never have to do.