Oracle RAC Cache Fusion Efficiency: A Buffer Cache Analysis
My apologies — there will be no broccoli in this post.
Many people would like to know how well their application will run in RAC. Would it be faster or slower? Would it run at all?
Well, I have a query that can answer that question. Thereâ€™s a caveat however. You have to first put your application in RAC, then the query can tell you how well it runs.
Here is a bit of theory: The whole idea behind RAC is centuries old, and comes from big empires in the past: divide and conquer.
Read more after the jump, a buffer cache status query and the logic behind it.
Split your workload in pieces and have multiple machines do it. Sounds easy, but when you have multiple workers on the same project, some workers will run into each other. Add to this the consistency requirements and running into each other becomes an even greater problem.
For example, when you build a skyscraper, you have many people working on different floors at the same time. This is efficient. But when they need to look at the master plan, they may interfere with each other as there’s only 1 copy of the master plan, and only so many people may be around it at one time to read it.
Enough theory. Basically, the more evenly your work is split, the more efficient RAC is. To rephrase this: the fewer blocks you share on your nodes, the better RAC will run.
I recently upgraded my buffer cache status query to be RAC-aware. It used to be a select and join from
dba_objects, and a few “group byâ€™s” to make it more readable. It basically showed which objects had cached data and how much, with some extra details of how much of that data was dirty (needs to be flushed to disk).
To upgrade the query to RAC, I joined
gv$bh so that I could see the other nodes caches. Then I decided to extend it a bit more, and include CR (consistent read, read at a point of time) and ASSM space management block details. Then my numbers stopped adding up as some blocks were cached on both nodes, others were not.
Then the real RAC-awareness came. I added the possibility of differentiating between a block being cached on one node, or two nodes. I came up with the concept of a shared and non-shared cached block.
This is the output of the Oracle buffer cache status with cache fusion details:
Hereâ€™s a legend:
Note that the very first line has no owner/object at all. Itâ€™s a summary of your ENTIRE cache on ALL nodes. I have also edited the table content so that it fits our column.
Note too that the RAC awareness supports only 2 nodes at this point.
SUBOBJECT_NAME â€“ in case of partitioned objects, this will be the partition name. If empty, either non-partitioned object or summarized data for a partitioned object. (N.B.: to make the table narrower, I renamed this to SN.)
D% – percent of the cache (for this object) that needs to be flushed to disk. Empty if under 1% – I added this for a clearer report.
DIRTY â€“ number of dirty buffers (if the object is checkpointed, that many writes will be needed).
MBYTES â€“ megabytes of cache this object is occupying (everything).
SHA â€“ number of blocks that are cached (shared) on both nodes.
SHA% – percent shared for the object.
% – percent of cache this object is using (everything).
PI â€“ number of blocks that are representing a â€œpast imageâ€ â€“ see RAC concepts â€“ blocks that were dirty on the current node, and were requested in exclusive mode (for modifications) on the another node.
CR â€“ number of blocks that are a CR (consistent read). A CR is usually a read in the past. When you run a query and an update is issued after your query started, you will need to create CR blocks, by replicating CUR (current) blocks and applying UNDO records to them.
CR_SHA â€“ same as above, but shared on both nodes.
ASSM â€“ number of blocks cached (for the object) that are used to manage intra-segment object space. Basically, space management â€œoverheadâ€.
And finally cfe% – cache fusion Efficiency (assuming 2 nodes). What does this number show you? It shows you how much of your object(s) you are accessing uniquely from one node. The higher the number, the more unique your access is.
Here are some examples to help you better understand it.
- Your lookup tables (detail tables) will have a low cfe. The details are needed on both nodes for display purposes.
- Your transaction tables should have a high cfe. You should be sharing less of that data.
- If a table has blocks that are constantly jumping from node to node due to concurrent updates (worst case for RAC), you would have a low cfe and a high number of PI blocks.
And here is the query. Itâ€™s a beast, but it runs relatively fast, and provides a lot of data. I usually try to fit as few lines as possible, as I want to concentrate on the results. This gives you a top 100, which I find more the sufficient in most case:
/* RAC Buffer cache (db cache + cache fusion) contents status Christo Kutrovsky - The Pythian Group */ select * from (select * from (select distinct o.owner, o.object_name, o.object_type as type, SUBOBJECT_NAME, round(case when sum(d_cnt) / sum(tot) * 100 >= 1 then sum(d_cnt) / sum(tot) * 100 end, 2) as "D%", sum(d_cnt) as dirty, round(sum(tot) * p.bs / 1024, 1) as mbytes, round(sum(cur_sha) * p.bs / 1024, 1) as sha, round(sum(cur_sha) / sum(tot) * 100, 1) as "sha%", round(sum(r) * 100, 1) as "%", round(sum(pi) * p.bs / 1024, 1) as pi, round(sum(cr) * p.bs / 1024, 1) as cr, round((sum(cr_sha)) * p.bs / 1024, 1) as cr_sha, round((sum(assm)) * p.bs / 1024, 1) as assm, round((1 - sum(cur_sha) / sum(tot) * 2) * 100, 1) as "cfe2%", sum(cur_x) as x from (select to_number(decode(temp, 'Y', 9, decode(status, 'free', 0, objd))) as objd, temp, count(nullif(dirty, 'N')) as d_cnt, sum(pi) as pi, sum(cr) as cr, round(avg(cr), 1) as cr_i, sum(cr_min_inst / nullif(i, 1)) as cr_sha, sum(assm / i) as assm, sum(assm) - sum(assm / i) as assm_sha, sum(xcur) as cur_x, sum(scur / nullif(i, 1)) as cur_sha, sum(tot) as tot, sum(r) as r from (select inst_id, file#, block#, temp, dirty, status, objd, class#, count(*) as tot, decode(status, 'cr', count(*)) as cr, case when status in ('scur') then count(*) end as sha1, decode(status, 'pi', count(*)) as pi, count(distinct inst_id) over(partition by class#, file#, block#) as i, sum(decode(status, 'cr', count(*))) over(partition by inst_id, file#, block#) as cr_min_inst, decode(status, 'xcur', count(*)) as xcur, decode(status, 'scur', count(*)) as scur, case when class# in (8, 9, 10) then count(*) end as assm, ratio_to_report(count(*)) over() as r from gv$bh group by inst_id, file#, block#, status, temp, dirty, objd, class#) group by decode(status, 'free', 0, objd), temp) h, (select owner, object_name, subobject_name, object_id, data_object_id, object_type, row_number() over(partition by data_object_id order by object_type) rn, 'N' as temp from dba_objects where data_object_id > 0 union all select ' ', '<<<FREE BLOCKS>>>', null, null, 0, null, 1, 'N' from dual union all select ' ', '<<<ROLLBACK>>>', null, /*to_char(rownum)*/ null, 4294967296 - rownum, '', 1, 'N' from dual connect by dummy = dummy and rownum < 100 union all select ' ', '<<<TEMP SEGMENT>>>', null, null, 9, null, 1, 'Y' as temp from dual) o, (select value / 1024 as bs from v$parameter where name = 'db_block_size') p where o.data_object_id = h.objd and o.rn = 1 and o.temp = h.temp --and o.owner not in ('SYS','SYSTEM') group by p.bs, rollup((o.owner, o.object_name, o.object_type), (SUBOBJECT_NAME))) order by mbytes desc) where rownum <= 100