Google ‘Backup and DR’ for Oracle

Raivis Saldabols

November 1, 2023

Tags: Oracle, Google Cloud, Technical Track, Vmware, Recovery, Backups, Databases

Whenever I think of Oracle backups, I basically think of writing an Oracle RMAN backup to disk (the simplified version), plus some retention, monitoring, and other details.

However, once it’s on disk, somebody else in the organization takes care of it and ensures it meets the agreed retention standards: the backup must be secure, redundant, available, encrypted, and so on. Backups are not just for production workloads. We now back up everything, even development and test environments. We test our backup scripts before moving them to live environments. We perform restore and recovery tests. We clone test environments from production backups. Backups play a crucial role in any IT infrastructure.

Let’s talk about backups

In particular, let’s focus on Oracle database and application backups. Today, an organization's IT infrastructure contains far more than Oracle application and database services. There are connected business intelligence tools, external and internal integrations, data pipelines, and so much more. Usually, one or two teams take care of support and backups. In a perfect world, all of those systems would follow the same policies, patterns, and retention - however, that is usually not the case. There are regulatory obligations for financial data, other regulations for auditing, etc.

It is not simple. Scripts are tailored to each use case - specific scripts for applications, filesystems, or databases. Databases are a special case: you can’t simply copy the datafiles and consider that a backup. Each database has its own tool. For Oracle, it’s RMAN (Recovery Manager), and the Oracle Database Backup and Recovery User's Guide has 924 pages. It is feature-rich, but it’s 924 pages of technical details.

That said, modern IT architecture requires some sort of backup management tool, especially in the cloud, and there are tools and services available for various use cases. Let’s take a closer look at one of them - the Google Backup and DR service.

What is Google Backup and DR service?

According to the website:

“Managed backup and disaster recovery (DR) service for centralized, application-consistent data protection. Protect workloads running in Google Cloud and on-premises by backing them up to Google Cloud.”

It is very easy to deploy - basically, log in to the Google Cloud Platform (GCP) console, enable the service, and deploy the appliance. Once it is provisioned (which takes around one hour), a management console is provided.

It does not have the native GCP look and feel; in fact, this tool was previously known as Actifio, which Google acquired back in 2020. It is now integrated into GCP as a service. Actifio users will find it very similar, with very few changes. The service also comes almost pre-configured (backup pools, thresholds, etc.), so it can be used immediately after deployment with only a few additional configuration steps, such as profiles and templates based on your organizational retention requirements.

Note that Google provides solid documentation on the Backup and DR service and how to manage it, but there are still some low-level topics that the Actifio documentation covers in much greater detail.

The Backup menu lists the database and application types that are supported out of the box.

Natively, Google Backup and DR can work with Compute Engine VMs (using service account privileges) and VMware Engine (using specific vSphere service users). Every other application or database requires an agent to be installed and configured on the server, iSCSI support, and firewall openings (enterprise setups require multiple ports to be opened).

One more requirement is a backup template. The template holds all the policies for moving data into snapshot pools and OnVault pools (buckets), as well as backup frequencies, retention, and other attributes.

Key difference and concept

It is all storage snapshots. As an Oracle DBA, I found this solution quite hard to understand and trust at first. Eventually, it makes a lot of sense and opens up new possibilities, which I’ll try to explain.

I come from an Oracle E-Business Suite background, and usually, when we do Oracle RMAN backup, we issue something like this:

RMAN> backup incremental level 0 database tag 'LVL0_<date>' to destination '/backup/';

Of course, there are many additional lines for retention, channels, archive logs, control files, etc., but in essence it is a LVL0 RMAN backup taken on weekends and a LVL1 incremental backup every other day. That gives us a full (usually compressed) database backup on disk that we trust and can use whenever we need it. However, it comes at the cost of additional disk usage and, most importantly, every full or LVL0 backup of a bigger database is time-consuming and expensive in I/O and CPU, because we have to read through all the database datafiles, compress them, and write them to disk. Oracle EBS database backups are often offloaded to a standby to avoid extra load on the primary database.

Google Backup and DR instead uses Oracle incremental merge backups. Basically, this is an Oracle database capability that creates a copy of the database and periodically updates that copy by merging incremental changes into it.

This is what happens in high-level detail:

Agent mounts a virtual disk volume to the server using the iSCSI controller.
Oracle updates a copy of the database with changes using the construct:

RMAN> backup incremental level 1 for recover of copy with tag 'db_copy' database;

The datafile copies are brought up to date through media recovery of the copy.
Regular archivelog and control file backups are taken for consistency.
The agent unmounts the attached virtual disk.
The Backup appliance takes a snapshot of the virtual disk volume.
The process repeats.
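
The RMAN part of the steps above can be sketched roughly as follows. This is only an illustration of the incremental merge construct, not the script the Backup and DR agent actually runs: the function name `build_incremental_merge_script` is hypothetical, and the archivelog and control file commands are assumptions about what the "regular backups for consistency" could look like.

```python
def build_incremental_merge_script(tag: str = "db_copy") -> str:
    """Return an RMAN run block for one incremental merge cycle.

    Hypothetical helper for illustration; the real agent generates
    and executes its own commands.
    """
    # Keep the tag simple so it can be quoted safely in the script.
    if not tag.replace("_", "").isalnum():
        raise ValueError(f"unsupported tag: {tag!r}")

    commands = [
        # Level 1 incremental that is meant to be merged into the image copy
        # (on the very first run RMAN creates the level 0 image copy instead).
        f"BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG '{tag}' DATABASE;",
        # Merge the incremental into the datafile copies.
        f"RECOVER COPY OF DATABASE WITH TAG '{tag}';",
        # Assumed consistency backups (archivelogs and control file).
        "BACKUP ARCHIVELOG ALL;",
        "BACKUP CURRENT CONTROLFILE;",
    ]
    body = "\n".join("  " + command for command in commands)
    return "RUN {\n" + body + "\n}"


if __name__ == "__main__":
    print(build_incremental_merge_script())
```

Because only changed blocks are read and merged on each cycle, the cost of a run depends on the change rate rather than on the total database size.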

As a result, there is a consistent backup copy on the virtual disk. Experience shows that a backup/snapshot of a multi-terabyte database with a medium Oracle EBS load takes 10-15 minutes to complete, except for the first backup, of course. The Backup appliance displays this backup as a snapshot in the snapshot pool, and to a DBA it appears as a “full and consistent” backup. This backup can be used to restore or clone a database within minutes, on the same or a different server, with the same or a different database name. We no longer run regular database restore operations from backup sets; instead, we attach a virtual disk with consistent database datafile copies, copy them, perform minimal recovery, and start the database.

Here are a few key things to keep in mind:

All Oracle database backup management is done through the Backup and DR appliance - backup, restore, recover, and clone operations. There are no more backup scripts scheduled in cron.
All cloning operations can also be managed through the Backup and DR appliance, if preferred.
The same Backup templates can be applied to the Filesystem, NFS mount, or Oracle database.
The Backup service is Oracle ASM, RAC database, and Clusterware aware.
Backups can still be taken on the Standby Database.
A cloned database lives on an iSCSI virtual disk volume, and it’s forked from a snapshot (called active mount). For a long-term database, we need to think about cloning into server-attached disks.

Point-in-Time Recovery

As Oracle DBAs, we’re used to being able to restore to a point in time, to an SCN, or to meet other requirements. It turns out that the Google Backup and DR service has those capabilities built in as well. Along with the incremental merge backup, the Oracle archive logs are backed up too. The documentation covers this in detail.

Here’s a little note, though. While I was running version 11.0.6, those screens were not available to me, so I could not execute those tests out of the box (maybe a later update will fix this). However, it’s important for a DBA to be able to perform point-in-time recovery, and that is still doable. Each Backup and DR Oracle backup includes the subsequent archivelog, controlfile, and spfile backups. Point-in-time recovery can be achieved by mounting two consecutive snapshots and doing some manual RMAN recovery using the available archivelogs. It is not unusual to have snapshots every 30 minutes, even for bigger databases, so recovery should not take too long.
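
As a rough sketch of that manual approach, the snippet below picks the two consecutive snapshots that bracket a target time and prints the kind of RMAN commands a DBA might then run. The function names and the sample timestamps are hypothetical, and the script assumes the datafile copies and archivelogs from the mounted snapshots have already been made available to the instance; treat it as an outline rather than a tested procedure.

```python
from bisect import bisect_right
from datetime import datetime


def pick_snapshots(snapshot_times, target):
    """Return (base, following) snapshot times that bracket the target.

    The base snapshot provides the datafile copies; the following one is
    expected to hold the archivelogs needed to roll forward to the target.
    """
    times = sorted(snapshot_times)
    index = bisect_right(times, target)
    if index == 0:
        raise ValueError("target is older than the oldest snapshot")
    if index == len(times):
        raise ValueError("no later snapshot holds the archivelogs yet")
    return times[index - 1], times[index]


def recovery_script(target):
    """Build an RMAN point-in-time recovery block for the target time."""
    stamp = target.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "RUN {\n"
        f"  SET UNTIL TIME \"TO_DATE('{stamp}', 'YYYY-MM-DD HH24:MI:SS')\";\n"
        "  RECOVER DATABASE;\n"
        "}\n"
        "ALTER DATABASE OPEN RESETLOGS;"
    )


if __name__ == "__main__":
    # Hypothetical snapshots taken every 30 minutes.
    snapshots = [
        datetime(2023, 11, 1, 10, 0),
        datetime(2023, 11, 1, 10, 30),
        datetime(2023, 11, 1, 11, 0),
    ]
    target = datetime(2023, 11, 1, 10, 40)
    base, following = pick_snapshots(snapshots, target)
    print("mount datafiles from:", base)
    print("mount archivelogs from:", following)
    print(recovery_script(target))
```

With 30-minute snapshots, the roll-forward never has to apply more than about half an hour of archivelogs, which is why the manual route stays reasonably quick.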

Some use cases

Now, let’s see some use cases once we have those snapshots available.

Daily or weekly clone

There may be a requirement to clone the production environment daily for development or business purposes. If the database is huge, regular restore or duplicate database operations may take too long to run every day. In the cloud, it’s also important to know where the data lives, as prod and non-prod may not be in the same zone or region. In that case, daily multi-terabyte data moves come at an egress cost.

In this scenario, the Google Backup and DR workflow feature becomes very handy. It can automate, schedule, and orchestrate a full daily or weekly database cloning activity. In fact, we have recently enabled such a workflow for Oracle EBS customers. The workflow can be configured with custom pre- and post-scripts. In the pre-workflow script, we need to make sure that the virtual volume is not in use so that it can be disconnected. The workflow executes both the Oracle database and the Oracle EBS application filesystem clone operations. Lastly, in the post-workflow scripts, we execute the Rapid Clone steps with pre-saved context files, along with other post-clone EBS operations (like password changes, URL changes, data masking, etc.).

Heavy report offload

Let’s say you need to run an extremely expensive report that can’t be run in the live database because it would degrade performance. Again, you can clone the database on any server with Oracle software installed, run the report, and destroy the database once it is no longer needed. It is not a very likely use case, as a read-only standby database would most probably serve better here. Still, I hope it shows how flexible access to an Oracle database becomes with these snapshots, which are close to instant compared to old-fashioned RMAN restore/recover operations.

Things to keep in mind

When using such a tool, you need to be sure of several things:

Backups are running on schedule.
There is enough space configured.
Failures are noticed.
Backup appliance and agent updates are applied.

As the Google Backup and DR service is part of the GCP ecosystem, it can be plugged into Google Cloud Monitoring and Logging. That makes it possible to receive monitoring alerts about failed backups or other issues reported by the Backup service.
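
To illustrate the first check, here is a minimal sketch of how "backups are running on schedule" could be verified from a list of last successful job times. It does not call any Backup and DR or Cloud Monitoring API; the function name, the application names, and the grace period are all hypothetical, and the job times would have to come from whatever report or log export you have available.

```python
from datetime import datetime, timedelta


def overdue_backups(last_success, interval, now, grace=timedelta(minutes=15)):
    """Return the applications whose last successful backup is too old.

    last_success maps an application name to the time of its last
    successful backup job; interval is the expected backup frequency.
    """
    limit = interval + grace
    return sorted(
        app for app, finished in last_success.items() if now - finished > limit
    )


if __name__ == "__main__":
    now = datetime(2023, 11, 1, 12, 0)
    # Hypothetical job history for three protected applications.
    history = {
        "ebs_prod_db": datetime(2023, 11, 1, 11, 40),
        "ebs_prod_appfs": datetime(2023, 11, 1, 10, 0),
        "bi_reports_db": datetime(2023, 11, 1, 11, 20),
    }
    print(overdue_backups(history, timedelta(minutes=30), now))
```

A check like this catches the quiet failure mode where no job fails outright but a backup simply stops running.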

The reporting section of the Backup and DR appliance comes with a number of pre-built report templates. It’s quite easy to get a summary of backed-up applications, completed or failed jobs, and many other events. However, I was surprised that I could not find a way to email the reports out of the box.

There are also updates being released for the Backup and DR service itself and its agents from time to time. The good thing is that it is just a few clicks away and fully managed from the Backup and DR appliances. The bad thing is that you can’t have a running backup job during the update process. So, those updates have to be planned and manually triggered or scheduled.
