Is Oracle Smart Flash Cache a “SPOF”?

Posted in: DBA Lounge, Oracle, Technical Track

 

Oracle Smart Flash Cache (OSFC) is a nice feature that was introduced in Oracle 11g Release 2. I only recently had a real use case for it, so I looked into it with one main goal: to determine whether adding this extra caching layer would introduce a new Single Point Of Failure (SPOF). This was a concern because the solid-state cards/disks used for caching are normally configured without redundancy to maximize the available space. I couldn’t find out what happens when one of the devices fails, either in the documentation or on My Oracle Support, so I decided to test it!
The idea behind OSFC is to provide a second level of “buffer cache” on solid-state devices, which have better response times than re-reading data blocks from spinning disks. When the buffer cache runs out of space, clean (not “dirty”) blocks are evicted from it and written to the OSFC. Dirty blocks are first written to the data files by DBWR, and only then copied to the OSFC and evicted from the buffer cache. You can read more about what it is, how it works and how to configure it in the Oracle Database Administrator’s Guide for 11.2 and 12.1, and in the Oracle white paper “Oracle Database Smart Flash Cache”.

In my case the OSFC was considered for a database running on an Amazon EC2 instance. We used EBS volumes as ASM disks for the data files, and as EBS volumes are attached over the network behind the scenes, we wanted to remove that little bit of I/O latency by using the instance store (ephemeral SSDs) for the Smart Flash Cache. An additional benefit would be a reduction in the IOPS done on the EBS volumes, and that’s a big deal, as it’s not that difficult to reach the IOPS limits of EBS volumes.

 

Configuration

I did the testing on my VirtualBox VM, which ran Oracle Linux 7.2 and Oracle Database 12.1.0.2 EE. I simply added another VirtualBox disk and used it for the OSFC (a reminder: this was not a performance test). The device was presented to the database via a separate ASM disk group named “FLASH”. The OSFC was enabled by setting the following parameters in the parameter file:

  • db_flash_cache_file='+FLASH/flash.dat'
  • db_flash_cache_size='8G'

The 1st surprise came when I bounced the database to enable the new settings: the DB didn’t start, and it reported the error “ORA-00439: feature not enabled: Server Flash Cache”. Luckily, I found a known issue in the MOS note “Database Startup Failing With ORA-00439 After Enabling Flash Cache (Doc ID 1550735.1)”, and after forcefully installing two RPMs from OL5 (enterprise-release and redhat-release-5Server), the database came up.

 

Testing

The test I chose was really simple. These are the preparation steps I did:

  • Reduced the buffer cache of the DB to approximately 700 MB.
  • Created table T1 of size ~1598 MB.
  • Set parameter _serial_direct_read=NEVER (to avoid direct path reads when scanning large tables; I really wanted to cache everything this time).
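
The sizing behind these steps can be checked with a little arithmetic. The following is a minimal sketch, not part of the original test: the function name is hypothetical, and it ignores any other objects competing for the buffer cache, so the result is only a lower bound on how much of T1 has to leave the buffer cache during a full scan.

```python
def min_spill_mb(table_mb: float, buffer_cache_mb: float) -> float:
    """Lower bound (in MB) on table data that cannot stay in the buffer cache."""
    return max(0.0, table_mb - buffer_cache_mb)


TABLE_MB = 1598             # approximate size of T1
BUFFER_CACHE_MB = 700       # approximate buffer cache size
FLASH_CACHE_MB = 8 * 1024   # db_flash_cache_size = 8G

spill = min_spill_mb(TABLE_MB, BUFFER_CACHE_MB)
print(f"at least {spill:.0f} MB of T1 must be evicted from the buffer cache")
print("whole table fits in the flash cache:", TABLE_MB <= FLASH_CACHE_MB)
```

In other words, more than half of the table has to be evicted during the scan, and the 8G flash cache is large enough to hold all of it.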

The next step was to full-scan the table by running “select count(*) from T1”. I also traced the operation to see what was happening:

    • During the 1st execution I observed the following wait events (all multi-block reads from data files, as expected). However, I knew the buffer cache was too small to fit all the blocks, so a large share of them would end up in the OSFC as they were flushed out of the buffer cache:
      WAIT #140182517664832: nam='db file scattered read' ela= 6057 file#=10 block#=90244 blocks=128 obj#=92736 tim=19152107066
      WAIT #140182517664832: nam='db file scattered read' ela= 4674 file#=10 block#=90372 blocks=128 obj#=92736 tim=19152113919
      WAIT #140182517664832: nam='db file scattered read' ela= 5486 file#=10 block#=90500 blocks=128 obj#=92736 tim=19152121510
      WAIT #140182517664832: nam='db file scattered read' ela= 4888 file#=10 block#=90628 blocks=128 obj#=92736 tim=19152129096
      WAIT #140182517664832: nam='db file scattered read' ela= 3754 file#=10 block#=90756 blocks=128 obj#=92736 tim=19152133997
      WAIT #140182517664832: nam='db file scattered read' ela= 8515 file#=10 block#=90884 blocks=124 obj#=92736 tim=19152143891
      WAIT #140182517664832: nam='db file scattered read' ela= 7177 file#=10 block#=91012 blocks=128 obj#=92736 tim=19152152344
      WAIT #140182517664832: nam='db file scattered read' ela= 6173 file#=10 block#=91140 blocks=128 obj#=92736 tim=19152161837
      
    • The 2nd execution of the query confirmed the reads from the OSFC:
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 989 p1=0 p2=0 p3=0 obj#=92736 tim=19288463835
      WAIT #140182517664832: nam='db file scattered read' ela= 931 file#=10 block#=176987 blocks=3 obj#=92736 tim=19288465203
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 589 p1=0 p2=0 p3=0 obj#=92736 tim=19288466044
      WAIT #140182517664832: nam='db file scattered read' ela= 2895 file#=10 block#=176991 blocks=3 obj#=92736 tim=19288469577
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 1582 p1=0 p2=0 p3=0 obj#=92736 tim=19288471506
      WAIT #140182517664832: nam='db file scattered read' ela= 1877 file#=10 block#=176995 blocks=3 obj#=92736 tim=19288473665
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 687 p1=0 p2=0 p3=0 obj#=92736 tim=19288474615
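
To compare the two executions without reading the trace line by line, the WAIT lines can be summarized per event. This is a minimal sketch that assumes the trace format shown above; the function name is hypothetical, and counting each flash cache read as one block is an assumption based on the event name (“single block”), since those lines carry no blocks= field.

```python
import re
from collections import defaultdict

# Event name and elapsed time (microseconds) from a WAIT line.
WAIT_RE = re.compile(r"WAIT #\d+: nam='(?P<event>[^']+)' ela= *(?P<ela>\d+)")
# Number of blocks read, present on 'db file scattered read' lines.
BLOCKS_RE = re.compile(r"\bblocks=(\d+)")


def summarize_waits(lines):
    """Return {event: {"count", "ela_us", "blocks"}} for all WAIT lines."""
    summary = defaultdict(lambda: {"count": 0, "ela_us": 0, "blocks": 0})
    for line in lines:
        match = WAIT_RE.search(line)
        if not match:
            continue  # not a WAIT line
        stats = summary[match.group("event")]
        stats["count"] += 1
        stats["ela_us"] += int(match.group("ela"))
        blocks = BLOCKS_RE.search(line)
        # Assumption: a line without blocks= is a single-block read.
        stats["blocks"] += int(blocks.group(1)) if blocks else 1
    return dict(summary)


sample = [
    "WAIT #140182517664832: nam='db flash cache single block physical read' ela= 989 p1=0 p2=0 p3=0 obj#=92736 tim=19288463835",
    "WAIT #140182517664832: nam='db file scattered read' ela= 931 file#=10 block#=176987 blocks=3 obj#=92736 tim=19288465203",
    "WAIT #140182517664832: nam='db flash cache single block physical read' ela= 589 p1=0 p2=0 p3=0 obj#=92736 tim=19288466044",
]

for event, s in summarize_waits(sample).items():
    avg = s["ela_us"] / s["count"]
    print(f"{event}: waits={s['count']} blocks={s['blocks']} avg_ela_us={avg:.0f}")
```

Feeding it a whole trace file (for example `summarize_waits(open(path))`) gives a quick view of how many blocks came from the flash cache versus the data files.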
      

 

Crashing it?

Once the OSFC was in use, I decided to “pull out the SSD” by removing the device /dev/asm-disk03-flash, which I had created using udev rules and which backed the FLASH disk group.
Nothing happened at first, so I executed the query against the T1 table again, as it would access the data in the OSFC. This is what I saw:

      1. The query didn’t fail; it completed normally. The OSFC was not used, and the query transparently fell back to normal disk I/O.
      2. I/O errors for the removed disk were logged in the alert log, followed by messages about the Flash Cache being disabled. The instance didn’t crash!
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        Tue Dec 15 17:07:49 2015
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        ORA-15081: failed to submit an I/O operation to a disk
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        Encounter unknown issue while accessing Flash Cache. Potentially a hardware issue
        Flash Cache: disabling started for file
        0
        
        Flash cache: future write-issues disabled
        Start disabling flash cache writes..
        Tue Dec 15 17:07:49 2015
        Flash cache: DBW0 stopping flash writes...
        Flash cache: DBW0 garbage-collecting for issued writes..
        Flash cache: DBW0 invalidating existing flash buffers..
        Flash cache: DBW0 done with write disabling. Checking other DBWs..
        Flash Cache file +FLASH/flash.dat (3, 0) closed by dbwr 0
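
Because the cache is disabled silently from the application’s point of view, it may be worth watching the alert log for these messages. The following is a minimal sketch: the message fragments are copied from the excerpt above and may be worded differently in other versions, and the function name and event labels are hypothetical.

```python
import re

# Message prefixes taken from the alert log excerpt above.
DISABLE_MARKERS = (
    "Flash Cache: disabling started",
    "Flash cache: future write-issues disabled",
)
# e.g. "Flash Cache file +FLASH/flash.dat (3, 0) closed by dbwr 0"
CLOSED_RE = re.compile(r"Flash Cache file (?P<file>\S+) .*closed by dbwr")


def flash_cache_events(alert_lines):
    """Yield (line_number, kind, detail) for flash cache disable messages."""
    for number, line in enumerate(alert_lines, start=1):
        text = line.strip()
        if any(text.startswith(marker) for marker in DISABLE_MARKERS):
            yield number, "disabling", text
            continue
        match = CLOSED_RE.search(text)
        if match:
            yield number, "closed", match.group("file")


sample = [
    "Encounter unknown issue while accessing Flash Cache. Potentially a hardware issue",
    "Flash Cache: disabling started for file",
    "0",
    "Flash cache: future write-issues disabled",
    "Flash cache: DBW0 stopping flash writes...",
    "Flash Cache file +FLASH/flash.dat (3, 0) closed by dbwr 0",
]

for number, kind, detail in flash_cache_events(sample):
    print(number, kind, detail)
```

A monitoring job could run this over newly appended alert log lines and raise an alert on the first "disabling" event.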
        

 

Re-enabling the OSFC

Once the OSFC was automatically disabled, I wanted to know whether it could be re-enabled without bouncing the database. I added back the missing ASM disk, but that didn’t trigger the re-enabling of the OSFC automatically.
I had to set the db_flash_cache_size='8G' parameter again, and then the cache was re-enabled, which was also confirmed by a message in the alert log:

Tue Dec 15 17:09:46 2015
Dynamically re-enabling db_flash_cache_file 0
Tue Dec 15 17:09:46 2015
ALTER SYSTEM SET db_flash_cache_size=8G SCOPE=MEMORY;
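
The re-enabling step could be scripted once the device is back. This is only a sketch under stated assumptions: the statement text is the one logged above, but the helper names are hypothetical, the cursor is assumed to be any DB-API style cursor from an Oracle driver connected with sufficient privileges, and `RecordingCursor` is a stand-in so the example runs without a database.

```python
import re

# Accept only plain size literals such as 8G or 512M before building the DDL.
SIZE_RE = re.compile(r"^\d+[KMG]?$", re.IGNORECASE)


def reenable_statement(size: str) -> str:
    """Build the ALTER SYSTEM statement seen in the alert log above."""
    if not SIZE_RE.match(size):
        raise ValueError(f"unexpected size literal: {size!r}")
    return f"ALTER SYSTEM SET db_flash_cache_size={size} SCOPE=MEMORY"


def reenable_flash_cache(cursor, size: str = "8G") -> str:
    """Execute the statement through a DB-API style cursor (assumption)."""
    statement = reenable_statement(size)
    cursor.execute(statement)
    return statement


class RecordingCursor:
    """Stand-in cursor that records statements instead of running them."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)


cursor = RecordingCursor()
print(reenable_flash_cache(cursor))
```

The size is validated before it is placed into the statement because DDL cannot use bind variables.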

Conclusions

Good news! It appears to be safe (and also logical) to configure Oracle Smart Flash Cache on non-redundant solid-state devices, as their failures don’t affect the availability of the database. However, you may experience a performance impact when the OSFC is disabled. I did the testing on 12.1.0.2 only, so this may behave differently in other versions.

 

About the Author

Maris Elsins is an experienced Oracle Applications DBA currently working as Lead Database Consultant at The Pythian Group. His main areas of expertise are troubleshooting and performance tuning of Oracle Database and e-Business Suite systems. He is a blogger and a frequent speaker at Oracle related conferences such as UKOUG, Collaborate, Oracle OpenWorld, HotSos, and others. Maris is an Oracle ACE, an Oracle Certified Master, and a co-author of “Practical Oracle Database Appliance” (Apress, 2014). He's also a member of the board at Latvian Oracle User Group.

6 Comments

Great info, thanks Maris.

Thanks Jared!

Just for fun try starting another instance that also wants to use asm-disk03-flash after it is already in use. The result might surprise you. :)

Hi Kevin,

thanks for the idea!
Based on your comment I assumed that both instances would use the same device for the OSFC, which would probably lead to wrong results, an instance crash or something nasty like that. So I tested it (again on 12.1.0.2), and the results surprised me indeed!

1. I was able to start both instances having the same device set for OSFC
2. I ran some queries to populate and use OSFC on both instances
3. I received the following errors in the alert log on both instances and the OSFC was automatically disabled.
Encounter problem verifying flash cache file /dev/asm-disk03-flash. Disable flash cache and issue an ORA-700 for diagnostics
Errors in file /u01/app/oracle/diag/rdbms/lab12cdr/lab12cdr/trace/lab12cdr_gen0_18705.trc (incident=45641) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [kcbl2vfyfh_action], [db_flash_cache_file integrity check failed], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/lab12cdr/lab12cdr/incident/incdir_45641/lab12cdr_gen0_18705_i45641.trc
Mon Jan 04 13:26:45 2016
Dumping diagnostic data in directory=[cdmp_20160104132645], requested by (instance=1, osid=18705 (GEN0)), summary=[incident=45641].
Mon Jan 04 13:26:45 2016
Flash Cache: disabling started for file
0

4. File lab12cdr_gen0_18705.trc contains:
Verifying flash cache header read wrong values: corrupt 0, bpid 3, inst_id 1, bsz 8192, db_unique_name LAB12cDR db id 1637252222, polluted 1, file_id_string Oracle RDBMS Flash Cache File (retry 0)
Verifying flash cache header read wrong values: corrupt 0, bpid 3, inst_id 1, bsz 8192, db_unique_name LAB12cDR db id 1637252222, polluted 1, file_id_string Oracle RDBMS Flash Cache File (retry 1)
Verifying flash cache header read wrong values: corrupt 0, bpid 3, inst_id 1, bsz 8192, db_unique_name LAB12cDR db id 1637252222, polluted 1, file_id_string Oracle RDBMS Flash Cache File (retry 2)
Incident 45641 created, dump file: /u01/app/oracle/diag/rdbms/lab12cdr/lab12cdr/incident/incdir_45641/lab12cdr_gen0_18705_i45641.trc
ORA-00700: soft internal error, arguments: [kcbl2vfyfh_action], [db_flash_cache_file integrity check failed], [], [], [], [], [], [], [], [], [], []

Start disabling flash cache (3, 0): /dev/asm-disk03-flash..
Flash Cache: disabling started for file
0
Flash cache: future write-issues disabled
Start disabling flash cache writes..

*** 2016-01-04 13:26:46.871
Flash cache: disable completed
Disabling completed.

It didn’t crash either of the instances, and it didn’t allow wrong results or corruption. It’s good that there’s some protection built in!

Maris

I am having the same issue: “ORA-00439: feature not enabled: Server Flash Cache”.
How did you forcefully install the RPMs?
Is it supported by Oracle?
Thanks in advance

Hi,

I remember I installed the enterprise-release rpm forcefully (the --force switch for rpm).
I also mention this in the blog post: after forcefully installing two RPMs from OL5 (enterprise-release and redhat-release-5Server), the database came up.

You may want to start with the following MOS document: Database Startup Failing With ORA-00439 After Enabling Flash Cache (Doc ID 1550735.1)
There are a few others too where the error and the workarounds are explained.

Maris
