Moving Oracle Datafiles to a ZFS Filesystem with the Correct Recordsize

Aug 19, 2009

 

Full credit for this tale should go to my colleague Christo Kutrovsky for the inspiration and basic plan involved.

Identifying the I/O Performance Bottleneck on Solaris x86

We recently migrated a large database from Solaris SPARC to Solaris x86-64. All seemed to go well with the migration, but in the next few weeks, we noticed some I/O issues cropping up. Some research led us to find that the ZFS filesystem used to hold the datafiles was killing us on I/O.
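If you want to watch this kind of behavior yourself, Solaris makes it easy to observe pool-level I/O while the database is under load. A quick sketch, assuming the pool is named zfs-data as in the example below:

$ zpool iostat -v zfs-data 5
$ iostat -xnz 5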

The default "recordsize" setting for ZFS was 128k.

$ /usr/sbin/zfs get recordsize zfs-data
NAME      PROPERTY    VALUE  SOURCE
zfs-data  recordsize  128K   default

An Oracle database typically uses 8k for the block size, but in this case it was 16k. We saw basically the same thing that Neelakanth Nadgir described in his blog post, Databases and ZFS:

With ZFS, not only was the throughput much lower, but we used more [than] twice the amount of CPU per transaction, and we are doing 2x times the IO. The disks are also more heavily utilized. We noticed that we were not only reading in more data, but we were also doing more IO operations [than] what is needed.

The Strategy: Aligning ZFS Recordsize with Oracle Block Size

The fix is to set the ZFS recordsize for a datafile filesystem to match the Oracle instance's db_block_size. We also read in the ZFS Best Practices Guide that redo logs should be in a separate filesystem with the default ZFS recordsize of 128k. We already had them separate, so we just needed to get our datafiles on a ZFS filesystem with a 16k recordsize.
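To confirm the block size your instance is actually using, check the db_block_size parameter; the value shown here reflects our 16k case:

SQL> show parameter db_block_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
db_block_size                        integer     16384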

The first step is to change the ZFS recordsize for the current filesystem. Easy enough, but the change in recordsize will only apply to new files written there. It doesn't do anything for our existing terabyte-plus of datafiles. This means we need to move the files to a new ZFS filesystem with the correct recordsize already set.
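Changing the property itself is a one-liner (run as root; the filesystem name follows our earlier example):

# zfs set recordsize=16k zfs-data
$ /usr/sbin/zfs get recordsize zfs-data
NAME      PROPERTY    VALUE  SOURCE
zfs-data  recordsize  16K    local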

Managing Disk Space During Migration

That's where the magic happens. My first thought was that we'd need a whole other slab of disk. Then I learned that we can simply create a new ZFS filesystem on the same pool of disks as the old filesystem. Obviously we won't be able to hold two complete copies of the datafiles, but this would let us migrate groups of files at a time (for example, a tablespace). When the move is complete, disk space is released from the old filesystem, so the sum of used disk is essentially unchanged.
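Creating the new filesystem on the same pool, with the recordsize set at creation time, looks something like this (the dataset name zfs-data/oradata16k is purely illustrative):

# zfs create -o recordsize=16k zfs-data/oradata16k
$ /usr/sbin/zfs get recordsize zfs-data/oradata16k
NAME                 PROPERTY    VALUE  SOURCE
zfs-data/oradata16k  recordsize  16K    local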

Method 1: The Standard Tablespace Offline Move

The rest is just a matter of moving datafiles. Now, we all know you can't just move or copy Oracle datafiles from a running instance. We also know that clients want as little downtime as possible, so doing a complete shutdown/move/mount/rename/open is out of the question. We could offline a tablespace, move its datafiles, rename them in the database, then online the tablespace again. Here's a sample scenario, starting with tablespace and datafile creation:

SQL> create tablespace zfstest
  2  datafile '/u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf'
  3  size 128m;

Tablespace created.

SQL> alter tablespace zfstest
  2  add datafile '/u01/app/oracle/oradata/orcl/zfs128k/zfstest02.dbf'
  3  size 128m;

Tablespace altered.

SQL> create table zfsobjects tablespace zfstest
  2  as select * from dba_objects;

Table created.

Assume the following:

  • /u01/app/oracle/oradata/orcl/zfs128k/ is a ZFS filesystem with default 128k recordsize
  • /u01/app/oracle/oradata/orcl/zfs16k/ is a ZFS filesystem with a 16k recordsize

So here we have datafiles created on a ZFS filesystem with the default 128k recordsize, and a small table just to test data access. To move the entire tablespace in one go, we do this:

SQL> alter tablespace zfstest offline;

Tablespace altered.

SQL> !mv /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf /u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf

SQL> alter database rename file '/u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf'
  2  to '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf';

Database altered.

SQL> !mv /u01/app/oracle/oradata/orcl/zfs128k/zfstest02.dbf /u01/app/oracle/oradata/orcl/zfs16k/zfstest02.dbf

SQL> alter database rename file '/u01/app/oracle/oradata/orcl/zfs128k/zfstest02.dbf'
  2  to '/u01/app/oracle/oradata/orcl/zfs16k/zfstest02.dbf';

Database altered.

SQL> alter tablespace zfstest online;

Tablespace altered.

SQL> select count(*) from zfsobjects;

  COUNT(*)
----------
     68844

Method 2: Individual Datafile-Level Migration

Pretty simple. We escape to the OS once per datafile to run the mv command. If you want to be safe, you could use cp instead of mv to copy the file. Just know that disk space from the old filesystem won't be released until you delete the old files.
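If you do go the cp route, the sequence for the first datafile would look like this sketch (same paths as above), with the rm coming only after the tablespace is verified back online:

SQL> alter tablespace zfstest offline;

Tablespace altered.

SQL> !cp /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf /u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf

SQL> alter database rename file '/u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf'
  2  to '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf';

Database altered.

SQL> alter tablespace zfstest online;

Tablespace altered.

SQL> !rm /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf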

Note that this operation could take a very long time in the case of large tablespaces. We could try a somewhat similar approach of taking one datafile offline at a time and moving it. We'd just need to perform a quick recovery before bringing it online:

SQL> alter database datafile '/u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf' offline;

Database altered.

SQL> !mv /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf /u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf

SQL> alter database rename file '/u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf'
  2  to '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf';

Database altered.

SQL> recover datafile '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf';
Media recovery complete.

SQL> alter database datafile '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf' online;

Database altered.

Note that your instance needs to be in ARCHIVELOG mode to be able to take datafiles offline. Obviously, if someone tried to access data in an offline datafile, they'd get this error:

SQL> select count(*) from zfsobjects;
select count(*) from zfsobjects
       *
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/zfs16k/zfstest01.dbf'
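As an aside, it's easy to confirm the log mode before you start; this is a standard query, nothing specific to our setup:

SQL> select log_mode from v$database;

LOG_MODE
------------
ARCHIVELOG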

Method 3: Optimal Migration with RMAN "Backup as Copy"

So if a datafile is particularly large, this can still mean a few minutes of some data being offline. We have one more tool at our disposal that provides a quicker turnaround: RMAN's BACKUP AS COPY command.

As the name suggests, this creates a copy of a datafile as a backup. The benefit is that it does this while the current datafile (and data) remains online. We then just need to offline, switch, recover, and online each datafile, which takes very little time compared to the time spent moving a large datafile.

Executing the RMAN Switch

First, we want to know the file numbers of our datafiles:

SQL> select file_id, file_name
  2  from dba_data_files
  3  where tablespace_name='ZFSTEST';

   FILE_ID FILE_NAME
---------- -------------------------------------------------------
         6 /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf
         7 /u01/app/oracle/oradata/orcl/zfs128k/zfstest02.dbf

Now we create our RMAN command file and execute it:

RMAN> backup as copy
2> tablespace zfstest format '/u01/app/oracle/oradata/orcl/zfs16k/%U';
3>
4> sql 'alter database datafile 6 offline';
5> switch datafile 6 to copy;
6> recover datafile 6;
7> sql 'alter database datafile 6 online';
8>
9> sql 'alter database datafile 7 offline';
10> switch datafile 7 to copy;
11> recover datafile 7;
12> sql 'alter database datafile 7 online';

Voila! The tablespace is moved with minimal downtime:

SQL> select file_id, file_name
  2  from dba_data_files
  3  where tablespace_name='ZFSTEST';

   FILE_ID FILE_NAME
---------- --------------------------------------------------------------------------------
         6 /u01/app/oracle/oradata/orcl/zfs16k/data_D-ORCL_I-1203425866_TS-ZFSTEST_FNO-6_01kn0irh
         7 /u01/app/oracle/oradata/orcl/zfs16k/data_D-ORCL_I-1203425866_TS-ZFSTEST_FNO-7_02kn0is0

Post-Migration Cleanup and Verification

Obviously, you can generate all your statements from SQL scripts, and you'll definitely want to do this if your tablespace has a large number of datafiles. If you are scripting, you could opt to back up one datafile at a time if you need to preserve file names.
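For example, here's one way to spool the per-datafile RMAN commands straight out of the dictionary. This is a rough sketch, and the spool file name is just an example:

SQL> set pagesize 0 linesize 200 feedback off
SQL> spool move_zfstest.rman
SQL> select 'sql ''alter database datafile ' || file_id || ' offline'';' || chr(10) ||
  2         'switch datafile ' || file_id || ' to copy;' || chr(10) ||
  3         'recover datafile ' || file_id || ';' || chr(10) ||
  4         'sql ''alter database datafile ' || file_id || ' online'';'
  5    from dba_data_files
  6   where tablespace_name = 'ZFSTEST';
SQL> spool off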

Deleting Residual Datafile Copies

UPDATE: Teammate Marc Billette pointed out that the old datafiles are not removed when doing a datafile copy and switch. I had left them on disk, still consuming space! Once we are done with all the datafile switches, our original datafiles become the copies, and we can list and delete them in RMAN like this:

RMAN> list datafilecopy all;

List of Datafile Copies
=======================

Key     File S Completion Time Ckp SCN    Ckp Time
------- ---- - --------------- ---------- ---------------
4       6    A 18-AUG-09       629982     18-AUG-09
        Name: /u01/app/oracle/oradata/orcl/zfs128k/zfstest01.dbf

5       7    A 18-AUG-09       630001     18-AUG-09
        Name: /u01/app/oracle/oradata/orcl/zfs128k/zfstest02.dbf

RMAN> delete datafilecopy all;

Thanks to Marc for catching this early in the published life of the article! I hope this post can help other people who run into this problem with Oracle on ZFS (I've heard from one on Twitter already), or people just wanting to move datafiles.

Again, thanks to Pythian DBA Christo Kutrovsky for laying this process out for us!

Note: this process was tested on Oracle 10gR2 (10.2.0.2) and 11gR1 (11.1.0.6).
