What's in your Exadata Smart Flash Cache?

One of the exclusive Exadata features is the Smart Flash Cache (Oracle White Paper PDF). On a full rack, there is 5 TB of flash cache, which can store a significant amount of data, quite often several times more than the working set for a given reporting system.
What's so cool about the Exadata Smart Flash Cache?
This flash cache is quite smart: it can differentiate between full table scans and single-block I/O reads. There has been speculation that the cache can even prioritize things like file headers, controlfile headers and index root blocks, but I have no official information on the subject. The granularity goes all the way down to the object level, where you can assign a higher caching priority (KEEP) or disable caching altogether (NONE). NOTE: when you use the KEEP option on an object, even full table scans become cacheable, with reads happening concurrently from flash cache and disk for maximum throughput. To achieve the marketed 51 GB/sec scan speed, objects need to have that option set.

Unfortunately, there is no easy way to monitor what's in that cache. All Oracle has provided is the "list flashcachecontent" command in the cellcli tool, which has no summarization options and only displays object numbers. For example:

[sql]
CellCLI> list flashcachecontent where objectNumber = 130387 detail;
         cachedKeepSize:         0
         cachedSize:             16384
         dbID:                   1890751346
         dbUniqueName:           QADW
         hitCount:               14
         missCount:              1
         objectNumber:           130387
         tableSpaceNumber:       5
[/sql]

That's why I wrote this handy little tool. It allows you to query the flash cache contents of all cells, much like you can query the buffer cache (db_cache) contents through v$bh. It works by combining 11g's new external table "preprocessor" feature with a Perl parser script that executes the command simultaneously on all cell nodes.

Installation is very simple, assuming a standard Exadata installation:
- On any compute node, create the /tmp/pythian directory and download pythian_cell_cache_extract.pl (link) into it
- Make it executable:
chmod +x /tmp/pythian/pythian_cell_cache_extract.pl
- As the "oracle" user, copy the /tmp/pythian directory to all compute nodes in /home/oracle:
for n in $(/u01/app/11.2.0/grid/bin/olsnodes) ; do scp -rp /tmp/pythian $n:/home/oracle ; done
- Ensure there are no prompts and that you can reach every cell:
perl -ne '/cell="([\d.]+)"/ && system "ssh celladmin\@$1 hostname"' /etc/oracle/cell/network-config/cellip.ora
- As a DBA user, create an Oracle directory object pointing to that directory: [sql]create directory pythian_cc as '/home/oracle/pythian';[/sql]
- Create the external table:
[sql]
create table pythian_cell_cache_usage (
  cellNode         varchar2(50),
  cachedKeepSize   number,
  cachedSize       number,
  dbID             number,
  dbUniqueName     varchar2(30),
  hitCount         number,
  missCount        number,
  objectNumber     number,
  tableSpaceNumber number
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY pythian_cc
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR pythian_cc:'pythian_cell_cache_extract.pl'
    nologfile
    nobadfile
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('.') /* location is mandatory, using a dummy value */
)
REJECT LIMIT UNLIMITED;
[/sql]
- Flash cache information is now available in the pythian_cell_cache_usage table
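A per-object summary like the sample output below can be obtained with a query along these lines. This is only a sketch: it assumes a single database on the machine, silently drops object numbers that no longer exist in the data dictionary, and rolls all partitions of an object into one row. A more detailed, partition-level query is shown later in the post.
[sql]
-- Sketch: summarize flash cache content per object across all cells.
-- Assumes one database on the machine; otherwise filter on dbUniqueName as well.
select t.name                                   tablespace_name,
       o.owner,
       o.object_name,
       o.object_type,
       round(sum(c.cachedKeepSize)/1024/1024)   keep_mb,
       round(sum(c.cachedSize)/1024/1024)       tot_mb,
       round(ratio_to_report(sum(c.cachedSize)) over () * 100) "%",
       sum(c.hitCount)                          hits,
       sum(c.missCount)                         misses,
       round(sum(c.hitCount) /
             nullif(sum(c.hitCount) + sum(c.missCount), 0) * 100) hit_ratio
from   pythian_cell_cache_usage c, v$tablespace t, dba_objects o
where  c.tableSpaceNumber = t.ts#
and    c.objectNumber     = o.data_object_id
group  by t.name, o.owner, o.object_name, o.object_type
order  by tot_mb desc;
[/sql]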
Sample output
TABLESPACE NAME | OWNER | NAME | OBJECT_TYPE | KEEP_MB | TOT_MB | % OF CACHE | HITS | MISSES | HIT_RATIO % |
---|---|---|---|---|---|---|---|---|---|
DATA1 | DW | FACT_1 | TABLE SUBPARTITION | 0 | 434834 | 22 | 112603679 | 31810169 | 78 |
DATA1 | DW | SALE_1 | TABLE SUBPARTITION | 0 | 287343 | 14 | 28676301425 | 2540494574 | 92 |
DATA1 | DW | FACT_2 | TABLE SUBPARTITION | 0 | 253586 | 13 | 221315469 | 101387498 | 69 |
DATA2 | DS | TRAN | TABLE SUBPARTITION | 0 | 218011 | 11 | 183859969 | 1242401286 | 13 |
DATA2 | DS | BACK1 | TABLE SUBPARTITION | 0 | 132479 | 7 | 56857485 | 265896016 | 18 |
DATA2 | DS | PROD | TABLE PARTITION | 0 | 114091 | 6 | 267917339 | 504285619 | 35 |
DATA1 | DW | FACT_1_OLD | TABLE SUBPARTITION | 0 | 111587 | 6 | 600855731 | 64207493 | 90 |
DATA1 | DW | DIM_1 | TABLE PARTITION | 0 | 85746 | 4 | 67589302 | 1919066582 | 3 |
NYN_DATA | DS | BAK_1_PK | INDEX | 0 | 44406 | 2 | 9003500 | 1939610 | 82 |
NYN_DATA | DS | LOG1 | TABLE SUBPARTITION | 0 | 26926 | 1 | 709001 | 5225315 | 12 |
NYN_DATA | DS | TRAN_ID | INDEX SUBPARTITION | 0 | 24260 | 1 | 185520035 | 860841 | 100 |
NYN_DATA | DS | PRODUCT_PK | INDEX | 0 | 19030 | 1 | 22859063 | 529640 | 98 |
NYN_DATA | DS | PROCESS_PK | INDEX | 0 | 16092 | 1 | 3631500 | 434151 | 89 |
What conclusions can we extract from this information, and can we tune the smart flash cache?
- Determine whether objects that use a lot of cache space have a low hit ratio - Some workloads have access patterns that simply cannot benefit much from the smart flash cache. Imagine a scenario where the volume of historical data is much larger than the volume of recent data, but access to recent data is still predominant. Using the cell flash cache usage data, you can verify that the recent partitions have a high hit ratio compared to the historical ones. An action item from such application behavior could be to put the recent partitions in "keep" mode (see the CELL_FLASH_CACHE sketch after this list). This query can be handy in that case (as written it covers subpartitioned tables only, and it is a bit slow to run):
[sql]
with cc as (
  select dbid, dbuniquename, t.name, objectnumber,
         sum(cachedkeepsize) cachedkeepsize,
         sum(cachedsize)     cachedsize,
         sum(hitcount)       hitcount,
         sum(misscount)      misscount
  from   pythian_cell_cache_usage c, v$tablespace t
  where  c.tablespacenumber = t.ts#
  group  by dbid, dbuniquename, t.name, objectnumber
),
o as (
  select distinct o.data_object_id, o.owner, o.object_name, o.object_type,
         tsp.table_owner, tsp.table_name, tsp.partition_name, ds.bytes
  from   dba_objects o, dba_tab_subpartitions tsp, dba_segments ds
  where  tsp.table_owner       = o.owner
  and    tsp.table_name        = o.object_name
  and    tsp.subpartition_name = o.subobject_name
  and    ds.owner              = tsp.table_owner
  and    ds.segment_name       = tsp.table_name
  and    ds.partition_name     = tsp.subpartition_name
  and    o.object_name         = 'YOUR_TABLE'
  and    o.owner               = 'YOUR_OWNER'
)
select cc.name, o.owner, o.table_name, o.partition_name, o.object_type,
       round(sum(cachedkeepsize)/1024/1024) keep_mb,
       round(sum(cachedsize)/1024/1024)     tot_mb,
       round(sum(cachedsize)/nullif(sum(o.bytes),0)*100) "%obj C",
       round(ratio_to_report(sum(cachedsize)) over() * 100) "%",
       sum(hitcount) hits,
       round(sum(hitcount)/sum(nullif(cachedsize,0))*1024*1024) hits_p_mb,
       sum(misscount) misses,
       round(sum(hitcount) / nullif(sum(hitcount) + sum(misscount), 0) * 100) hit_ratio
from   cc, o
where  cc.objectnumber = o.data_object_id
group  by cc.name, o.owner, o.object_name, o.object_type, o.table_name, o.partition_name
order  by tot_mb desc;
[/sql]
- Determine if your important objects are getting their fair share of cache space - If you have ETL jobs running at night and user queries during the day, it is very likely that the nightly jobs are polluting the cache with data that will never be accessed again, staging tables being a typical example. The flash usage information can help you determine whether this is happening, and you can then prevent such tables from making it into the flash cache (the sketch after this list shows the CELL_FLASH_CACHE setting that does this).
- Determine how much of your "keep" objects you are actually using - If you are already leveraging the ability to set different cache policies based on importance, this information is vital. You can verify that kept objects indeed have high hit ratios (saved IOs). The pythian_cell_cache_usage table is also the only way to determine how much space your "kept" objects are actually consuming, and how many more objects you can afford to "keep".
- See the cache hit ratio of "keep" objects and calculate their "saved IOs" in comparison to other objects. Identify high-consumption objects and consider compressing them to trade cache space for CPU time. Hit ratios are very relevant for the flash cache because they are less likely to be artificially inflated: there is already a cache layer above it (the buffer cache) that absorbs most of the abuse, so any "cache hit" in the flash cache is a saved IO. This is not the case with the buffer cache, where it is normal to revisit the same block multiple times during a single query; in a nested loop join driven by an index, for example, the root index block is accessed many times.
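For the "keep" and "avoid caching" action items above, the object-level policy is controlled by the CELL_FLASH_CACHE storage attribute. The statements below are only a sketch with made-up object and partition names; verify the effect on your own workload with the pythian_cell_cache_usage data before and after the change.
[sql]
-- Hypothetical object names, for illustration only.

-- Give a heavily used recent partition higher caching priority:
alter table dw.fact_1 modify partition fact_1_2011_q4
  storage (cell_flash_cache keep);

-- Keep an entire table (this also makes its full scans cacheable):
alter table dw.dim_1 storage (cell_flash_cache keep);

-- Stop an ETL staging table from polluting the cache:
alter table ds.stage_load storage (cell_flash_cache none);

-- Revert to the default behaviour:
alter table ds.stage_load storage (cell_flash_cache default);

-- How much flash the "kept" objects are currently consuming overall:
select round(sum(cachedKeepSize)/1024/1024) keep_mb,
       round(sum(cachedSize)/1024/1024)     total_mb
from   pythian_cell_cache_usage;
[/sql]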
Known issues
- Some object numbers do not exist in the database. The query still accounts for those, and tries to match the tablespace, but only gives the "max" object_id for further research
- Multiple databases on the same Exadata machine would require extra joins in the query
- Hit and miss statistics are totals since the last cell startup. To use this information accurately, snapshots need to be taken and only the deltas analyzed; a rough sketch of such a snapshot approach follows
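A very simple snapshot approach could look like the sketch below. The snapshot table name is made up for illustration; in practice you would schedule the insert (for example with dbms_scheduler) and prune old snapshots, and remember that the counters reset when a cell restarts, so deltas spanning a restart will be misleading.
[sql]
-- Hypothetical snapshot table holding timestamped copies of the cache content.
create table pythian_cell_cache_snap as
select sysdate snap_time, c.* from pythian_cell_cache_usage c where 1 = 0;

-- Take a snapshot (schedule this at a fixed interval):
insert into pythian_cell_cache_snap
select sysdate, c.* from pythian_cell_cache_usage c;
commit;

-- Hit and miss deltas per object between the two most recent snapshots:
select cur.objectnumber,
       cur.hits   - prev.hits   hit_delta,
       cur.misses - prev.misses miss_delta
from  (select objectnumber, sum(hitcount) hits, sum(misscount) misses
       from   pythian_cell_cache_snap
       where  snap_time = (select max(snap_time) from pythian_cell_cache_snap)
       group  by objectnumber) cur,
      (select objectnumber, sum(hitcount) hits, sum(misscount) misses
       from   pythian_cell_cache_snap
       where  snap_time = (select max(snap_time) from pythian_cell_cache_snap
                           where  snap_time < (select max(snap_time)
                                               from   pythian_cell_cache_snap))
       group  by objectnumber) prev
where  cur.objectnumber = prev.objectnumber
order  by hit_delta desc;
[/sql]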