Reviewing the operation modes of Oracle GoldenGate BigQuery Handler

4 min read
Jan 31, 2019 12:00:00 AM

GoldenGate for Big Data 12.3.2.1.1 introduces a new target — Google BigQuery. The BigQuery handler can work in two Audit log modes:

  1. auditLogMode = true
  2. auditLogMode = false

I want to review the differences between these two operation modes supported by the BigQuery handler. I’m going to use a simple, single table using Oracle -> BigQuery replication.

Configuring the Source Oracle Environment

Let's create the source table:

SQL> create table rasskazov.tst(id number primary key, name1 varchar2(20), name2 varchar2(20), name3 varchar2(20)); Table created.  

I’m going to use the following extract and pump configuration files. This configuration is suitable for testing purposes only. Production systems should use password encryption. I’m using a non-default remote host port because GoldenGate for Big Data is running on the same test VM.

[oracle@ol6-121-rac1 dirprm]$ cat bigext.prm  EXTRACT bigext  USERID ggate@testdb, PASSWORD ...  EXTTRAIL ./dirdat/or  TABLE RASSKAZOV.TST;   oracle@ol6-121-rac1 dirprm]$ cat bigpump.prm  EXTRACT bigpump  userid ggadmin@testdb, password "..."  RMTHOST ol6-121-rac1, MGRPORT 7839  RMTTRAIL /u01/app/oracle/product/oggbd/dirdat/or  TABLE RASSKAZOV.TST;  

Let’s enable supplemental logging and create the extracts:

GGSCI (ol6-121-rac1.localdomain) 1> dblogin USERID ggate@testdb, PASSWORD ... Successfully logged into database.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 2> add trandata rasskazov.tst Logging of supplemental redo data enabled for table RASSKAZOV.TST.  TRANDATA for scheduling columns has been added on table 'RASSKAZOV.TST'.  TRANDATA for instantiation CSN has been added on table 'RASSKAZOV.TST'.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 3> add extract bigext, INTEGRATED TRANLOG, BEGIN NOW EXTRACT (Integrated) added.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 4> add exttrail ./dirdat/or, extract bigext EXTTRAIL added.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 5> ADD EXTRACT bigpump, EXTTRAILSOURCE ./dirdat/or EXTRACT added.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 6> ADD RMTTRAIL /u01/app/oracle/product/oggbd/dirdat/or, EXTRACT bigpump RMTTRAIL added.  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 7> start extract bigext Sending START request to MANAGER ... EXTRACT BIGEXT starting  GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 8> start extract bigpump Sending START request to MANAGER ... EXTRACT BIGPUMP starting  

Setting Up BigQuery Replicats: Audit Mode vs. Normal Mode

Now I'm going to prepare the replicats. Sample parameter files are located in $OGGHOME/AdapterExamples/big-data/bigquery directory. I’m going to create two replicats, one for each audit mode.

The auditLogMode = true replicat uses the following bigquery.props file:

[oracle@ol6-121-rac1 dirprm]$ cat bigquery.props  gg.handlerlist=bigquery  gg.handler.bigquery.type=bigquery  gg.handler.bigquery.projectId=goldengate-test-project  gg.handler.bigquery.datasetId=testdataset  gg.handler.bigquery.credentialsFile=/u01/app/oracle/product/oggbd/dirprm/credentials.json  gg.handler.bigquery.connectionTimeout=20000  gg.handler.bigquery.readTimeout=30000  gg.handler.bigquery.pkUpdateHandling=delete-insert  # If audit log mode is set to true, specify the metacolumns template as well  gg.handler.bigquery.auditLogMode=true  gg.handler.bigquery.metaColumnsTemplate=${optype},${position}  goldengate.userexit.writers=javawriter  javawriter.stats.display=TRUE  javawriter.stats.full=TRUE  gg.log=log4j  gg.log.level=INFO  gg.report.time=30sec  gg.classpath=/home/oracle/bigquery/libs/*:/home/oracle/bigquery/* javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar  

The second replicat is using the noaudbq.props parameter file. The only difference is gg.handler.bigquery.auditLogMode is set to "false".

These are the replicat parameter files:

[oracle@ol6-121-rac1 dirprm]$ cat rbigq.prm  REPLICAT rbigq  TARGETDB LIBFILE libggjava.so SET property=dirprm/bigquery.props  DDL INCLUDE ALL  GROUPTRANSOPS 1000  MAPEXCLUDE dbo.excludetable  MAP rasskazov.tst, TARGET A.TST;   [oracle@ol6-121-rac1 dirprm]$ cat rnoaud.prm  REPLICAT rnoaud  TARGETDB LIBFILE libggjava.so SET property=dirprm/noaudbq.props  DDL INCLUDE ALL  GROUPTRANSOPS 1000  MAPEXCLUDE dbo.excludetable  MAP rasskazov.tst, TARGET B.TST;  

Initial Data Ingestion and Table Auto-Creation

Lets' create and start our replicats:

GGSCI (ol6-121-rac1.localdomain) 5> info all  Program     Status      Group       Lag at Chkpt  Time Since Chkpt  MANAGER     RUNNING                                             REPLICAT    RUNNING     RBIGQ       00:00:00      00:00:05          REPLICAT    RUNNING     RNOAUD      00:03:17      00:00:06          

Let's insert the first record.

SQL> insert into rasskazov.tst values(1, 'a', 'b', 'c'); 1 row created. SQL> commit; Commit complete.  

I did not create the destination BigQuery tables, but GoldenGate created A_TST and B_TST automatically. A_TST table is populated by the RBIGQ replicat running in log mode, featuring optype and position columns. The B_TST table has a deleted column which the replicat sets to true if a record has been removed.

Update Operations and Supplemental Logging Risks

Let's try to update our record:

SQL> update rasskazov.tst set name1 = 'd'; 1 row updated. SQL> commit; Commit complete.  

We can see the first problem: B_TST table running in auditLogMode = false lost the NAME2 and NAME3 values. To prevent this, we have to enable supplemental logging for all columns:

GGSCI (ol6-121-rac1.localdomain as ggate@testdb) 2> add trandata rasskazov.tst allcols  SQL> update rasskazov.tst set name1 = 'd'; 1 row updated. SQL> commit; Commit complete.  

Everything looks much better now. We have three records in A_TST (log mode) and a single record in B_TST.

Delete Operations and Resulting Data States

Let's test the delete operation:

SQL
SQL> delete rasskazov.tst; 1 row deleted. SQL> commit; Commit complete.  

Everything works as expected. There is one new record with optype='D' in A_TST and the deleted field is set to true in B_TST.

The Streaming Buffer and Potential for Inconsistency

But wait, how can this work for the B_TST table running in auditLogMode = false, assuming that GoldenGate does not use DML statements because of BigQuery limitations? The answer is the Streaming Buffer.

Let's insert another row, but update it after a while, when the Streaming Buffer is disconnected from a table.

We can see Streaming Buffer statistics via the bq show --format=prettyjson command. We might see estimated rows in the buffer, but 0 rows for the actual table. I assume GoldenGate uses streaming inserts in de-duplication mode. But this doesn’t work if the record is not in the Streaming Buffer.

If we wait for the buffer to flush and then perform an update:

SQL> update rasskazov.tst set name1 = 'd' where id = 2; 1 row updated. SQL> commit; Commit complete.  

We can see an inconsistency here. There are two records with the same ID and no way to determine which is correct. If we run a delete statement after the buffer flushes, we end up with multiple entries for the same ID, some marked as deleted and some not.

Conclusion: Choosing the Right Mode

In summary, auditLogMode = false behavior depends on whether the record is in the Streaming Buffer or not. This mode should be used very carefully because it can lead to inconsistent data.

Oracle Database Consulting Services

Ready to optimize your Oracle Database for the future?

 

On this page

Ready to unlock value from your data?

With Pythian, you can accomplish your data transformation goals and more.