GoldenGate 12.2 big data adapters: part 1 - HDFS

Gleb Otochkin

February 29, 2016

Tags: Cloud, Business Insights, Microservices

December 2015 brought us a new version of GoldenGate, and a new version for Big Data adapters for the GoldenGate. Let's have a look at what we have now and how it works. I am going to start from the HDFS adapter. As a first step, we need to prepare our source database for replication. It becomes easier with every new GoldenGate version. We will need to perform several steps: a) Enable archive logging on our database. This particular step requires downtime. [code lang="sql"] orcl> alter database mount; Database altered. orcl> alter database archivelog; Database altered. orcl> alter database open; [/code] b) Enable force logging and minimal supplemental logging. No need to shutdown database for this. [code lang="sql"] orcl> alter database add supplemental log data; Database altered. orcl> alter database force logging; Database altered. orcl> SELECT supplemental_log_data_min, force_logging FROM v$database; SUPPLEME FORCE_LOGGING -------- --------------------------------------- YES YES [/code] c) Switch parameter "enable_goldengate_replication" to "TRUE". Can be done online. [code lang="sql"] orcl> alter system set enable_goldengate_replication=true sid='*' scope=both; System altered. orcl> [/code] And we are almost done. Now we can create a schema for a GoldenGate administrator, and provide required privileges. I've just granted DBA role to the user to simplify process. In any case you will need it in case of integrated capture. For a production installation I advise you to have a look at the documentation to verify necessary privileges and roles. [code lang="sql"] orcl> create user ogg identified by welcome1 default tablespace users temporary tablespace temp; orcl> grant connect, dba to ogg; [/code] Let's create a test schema to be replicated. We will call it schema on the source as ggtest and I will name the destination schema as bdtest. It will allow us also to check how the mapping works in our replication. [code lang="sql"] orcl> create tablespace ggtest; -- optional step orcl> create user ggtest identified by welcome1 default tablespace ggtest temporary tablespace temp; orcl> grant connect, resource to ggtest; [/code] Everything is ready on our source database for the replication. Now we are installing Oracle GoledenGate for Oracle to our database server. We can get the software from the Oracle site on the download page in the Middleware section, GoldenGate, Oracle GoldenGate for Oracle databases. We are going to use 12.2.0.1.1 version of the software. The installation is easy - you need to unzip the software and run Installer which will guide you through couple of simple steps. The installer will unpack the software to the destination location, create subdirectories, and register GoldenGate in the Oracle global registry. [code lang="bash"] [oracle@sandbox distr]$ unzip fbo_ggs_Linux_x64_shiphome.zip [oracle@sandbox distr]$ cd fbo_ggs_Linux_x64_shiphome/Disk1/ [oracle@sandbox Disk1]$ ./runInstaller [/code] We continue by setting up parameters for Oracle GoldenGate (OGG) manager and starting it up. You can see that I've used a default blowfish encryption for the password. In a production environment you may consider another encryption like AES256. I've also used a non-default port for the manager since I have more than one GoldenGate installation on my test sandbox. [code lang="text"] [oracle@sandbox ~]$ export OGG_HOME=/u01/oggora [oracle@sandbox ~]$ cd $OGG_HOME [oracle@sandbox oggora]$ ./ggsci GGSCI (sandbox.localdomain) 1> encrypt password welcome1 BLOWFISH ENCRYPTKEY DEFAULT Using Blowfish encryption with DEFAULT key. Encrypted password: AACAAAAAAAAAAAIARIXFKCQBMFIGFARA Algorithm used: BLOWFISH GGSCI (sandbox.localdomain) 2> edit params mgr PORT 7829 userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT purgeoldextracts /u01/oggora/dirdat/*, usecheckpoints GGSCI (sandbox.localdomain) 3> start manager Manager started. [/code] Let's prepare everything for initial load, and later online replication. I've decided to use GoldenGate initial load extract as the way for initial load for the sake of consistency for the resulted dataset on Hadoop. I prepared the parameter file to replicate my ggtest schema and upload all data to the trail file on remote site. I've used a minimum number of options for all my processes, providing only handful of parameters required for replication. Extract options is a subject deserving a dedicated blog post. Here is my simple initial extract: [code lang="text"] [oracle@sandbox oggora]$ cat /u01/oggora/dirprm/ini_ext.prm SOURCEISTABLE userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT --RMTHOSTOPTIONS RMTHOST sandbox, MGRPORT 7839 RMTFILE /u01/oggbd/dirdat/initld, MEGABYTES 2, PURGE --DDL include objname ggtest.* TABLE ggtest.*; [/code] Then we run the initial load extract in passive node and it will create a trail file with the data. The trail file will be used later for our initial load on the target side. [code lang="text"] [oracle@sandbox oggora]$ ./extract paramfile dirprm/ini_ext.prm reportfile dirrpt/ini_ext.rpt [oracle@sandbox oggora]$ ll /u01/oggbd/dirdat/initld* -rw-r-----. 1 oracle oinstall 3028 Feb 16 14:17 /u01/oggbd/dirdat/initld [oracle@sandbox oggora]$ [/code] We can also prepare our extract on the source site as well. I haven't used datapump in my configuration limiting the topology only by simplest and strait-forward extract to replicat configuration. Of course, in any production configuration I would advise using datapump on source for staging our data. Here are my extract parameters, and how I added it. I am not starting it yet because I must have an Oracle GoldenGate Manager running on the target, and the directory for the trail file should be created. You may have guessed that the Big Data GoldenGate will be located in /u01/oggbd directory. [code lang="text"] [oracle@sandbox oggora]$ ggsci Oracle GoldenGate Command Interpreter for Oracle Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO Linux, x64, 64bit (optimized), Oracle 12c on Dec 12 2015 02:56:48 Operating system character set identified as UTF-8. Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved. GGSCI (sandbox.localdomain) 1> edit params ggext extract ggext userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT --RMTHOSTOPTIONS RMTHOST sandbox, MGRPORT 7839 RMTFILE /u01/oggbd/dirdat/or, MEGABYTES 2, PURGE DDL include objname ggtest.* TABLE ggtest.*; GGSCI (sandbox.localdomain) 2> dblogin userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT Successfully logged into database. GGSCI (sandbox.localdomain as ogg@orcl) 3> register extract GGEXT database 2016-02-16 15:37:21 INFO OGG-02003 Extract GGEXT successfully registered with database at SCN 17151616. GGSCI (sandbox.localdomain as ogg@orcl) 4> add extract ggext, INTEGRATED TRANLOG, BEGIN NOW EXTRACT (Integrated) added. [/code] Let's leave our source site for a while and switch to the target . Our target is going to be a box where we have hadoop client and all requirement java classes. I used the same box just to save resources on my sandbox environment. You may run different GoldeGate versions on the same box considering, that Manager ports for each of them will be different. Essentially we need a Hadoop client on the box, which can connect to HDFS and write data there. Installation of Hadoop client is out of the scope for this article, but you can easily get all necessary information from the Hadoop home page . Having all required Hadoop classes we continue by installing Oracle GoldenGate for Big Data, configuring and starting it up. In the past I received several questions from people struggling to find the exact place where all adapters could be uploaded. The Adapters were well "hidden" on "Oracle edelivery", but now it is way simpler. You are going to GoldenGate download page on Oracle site and find the section "Oracle GoldenGate for Big Data 12.2.0.1.0" where you can choose the OGG for Linux x86_64, Windows or Solaris. You will need an Oracle account to get it. We upload the file to our linux box, unzip and unpack the tar archive. I created a directory /u01/oggbd as our GoldenGate home and unpacked the tar archive there. The next step is to create all necessary directories. We start command line utility and create all subdirectories. [code lang="text"] [oracle@sandbox ~]$ cd /u01/oggbd/ [oracle@sandbox oggbd]$ ./ggsci Oracle GoldenGate Command Interpreter Version 12.2.0.1.0 OGGCORE_12.2.0.1.0_PLATFORMS_151101.1925.2 Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12 Operating system character set identified as UTF-8. Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved. GGSCI (sandbox.localdomain) 1> create subdirs Creating subdirectories under current directory /u01/oggbd Parameter files /u01/oggbd/dirprm: created Report files /u01/oggbd/dirrpt: created Checkpoint files /u01/oggbd/dirchk: created Process status files /u01/oggbd/dirpcs: created SQL script files /u01/oggbd/dirsql: created Database definitions files /u01/oggbd/dirdef: created Extract data files /u01/oggbd/dirdat: created Temporary files /u01/oggbd/dirtmp: created Credential store files /u01/oggbd/dircrd: created Masterkey wallet files /u01/oggbd/dirwlt: created Dump files /u01/oggbd/dirdmp: created GGSCI (sandbox.localdomain) 2> [/code] We are changing port for our manager process from default and starting it up. I've already mentioned that the port was changed due to the existence off several GoldenGate managers running from different directories. [code lang="text"] GGSCI (sandbox.localdomain) 2> edit params mgr PORT 7839 ..... GGSCI (sandbox.localdomain) 3> start manager Manager started. [/code] Now we have to prepare parameter files for our replicat processes. Let's assume the environment variable OGGHOME represents the GoldenGate home and in our case it is going to be /u01/oggbd. Examples for the parameter files can be taken from $OGGHOME/AdapterExamples/big-data directories. There you will find examples for flume, kafka, hdfs, hbase and for metadata providers. Today we are going to work with HDFS adapter. I copied files to my parameter files directory ($OGGHOME/dirprm) and modified them accordingly: [code lang="bash"] oracle@sandbox oggbd]$ cp /u01/oggbd/AdapterExamples/big-data/hdfs/* /u01/oggbd/dirprm/ oracle@sandbox oggbd]$ vi /u01/oggbd/dirprm/hdfs.props [/code] Here are my values for the hdfs.props file: [code lang="bash"] [oracle@bigdata dirprm]$ cat hdfs.props gg.handlerlist=hdfs gg.handler.hdfs.type=hdfs gg.handler.hdfs.includeTokens=false gg.handler.hdfs.maxFileSize=1g gg.handler.hdfs.rootFilePath=/user/oracle/gg gg.handler.hdfs.fileRollInterval=0 gg.handler.hdfs.inactivityRollInterval=0 gg.handler.hdfs.fileSuffix=.txt gg.handler.hdfs.partitionByTable=true gg.handler.hdfs.rollOnMetadataChange=true gg.handler.hdfs.authType=none gg.handler.hdfs.format=delimitedtext #gg.handler.hdfs.format.includeColumnNames=true gg.handler.hdfs.mode=tx goldengate.userexit.timestamp=utc goldengate.userexit.writers=javawriter javawriter.stats.display=TRUE javawriter.stats.full=TRUE gg.log=log4j gg.log.level=INFO gg.report.time=30sec gg.classpath=/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop/etc/hadoop/:/usr/lib/hadoop/lib/native/* javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar [/code] You can find information about all those parameters in oracle documentation here, but there are parameters you will most likely need to change from default:

gg.handler.hdfs.rootFilePath - it will tell where the directories and files have to be written on HDFS.
gg.handler.hdfs.format - you can setup one of the four formats supported by adapter.
goldengate.userexit.timestamp - it will depend from your preferences for transactions timestamps written to your hdfs files.
gg.classpath - it will depend from location for your hadoop jar classes and native libraries.

You can see I've mentioned the gg.handler.hdfs.format.includeColumnNames parameter. It is supposed to put column name before each value in the output file on HDFS. It may be helpful in some cases, but at the same time it makes the file bigger. If you are planning to create an external Hive table, you may consider commenting on it as I have. The next parameter file is for our data initialization replicat file. You may consider using a Sqoop or another way to make the initial load for your tables, but I think it makes sense to use the GG replicat if the table size is relatively small. It makes the resulting file-set more consistent with the following replication, since it will be using the same engine and format. So, here is my replicat for initial load: [code lang="text"] [oracle@sandbox dirprm]$ cat /u01/oggbd/dirprm/irhdfs.prm --passive REPLICAT for initial load irhdfs -- Trail file for this example is located in "dirdat/initld" -- Command to run REPLICAT: -- ./replicat paramfile dirprm/irhdfs.prm reportfile dirrpt/ini_rhdfs.rpt SPECIALRUN END RUNTIME EXTFILE /u01/oggbd/dirdat/initld --DDLERROR default discard setenv HADOOP_COMMON_LIB_NATIVE_DIR=/usr/lib/hadoop/lib/native DDL include all TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 MAP ggtest.*, TARGET bdtest.*; [/code] I was running the initial load in passive mode, without creating a managed process and just running it from command line. Here is an example: [code lang="bash"] [oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irhdfs.prm reportfile dirrpt/ini_rhdfs.rpt [oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/ Found 2 items drwxr-xr-x - oracle oracle 0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_1 drwxr-xr-x - oracle oracle 0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2 [oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/bdtest.test_tab_1 Found 1 items -rw-r--r-- 1 oracle oracle 624 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_1/bdtest.test_tab_1_2016-02-16_14-37-43.376.txt [oracle@sandbox oggbd]$ hadoop fs -tail /user/oracle/gg/bdtest.test_tab_1/bdtest.test_tab_1_2016-02-16_14-37-43.376.txt IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:43.37300000000000-100000020121371O62FX2014-01-24:19:09:20RJ68QYM52014-01-22:12:14:30 IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600000000000-100000021552371O62FX2014-01-24:19:09:20HW82LI732014-05-11:05:23:23 IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600100000000-100000022983RXZT5VUN2013-09-04:23:32:56RJ68QYM52014-01-22:12:14:30 IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600200000000-100000024414RXZT5VUN2013-09-04:23:32:56HW82LI732014-05-11:05:23:23 [oracle@sandbox oggbd]$ [/code] You can see the Hadoop directories and files created by the initial load. As soon as the initial load has run we can start our extract and replicat to keep the destination side updated. We are moving to source and starting our extract prepared earlier. [code lang="text"] GGSCI (sandbox.localdomain as ogg@orcl) 6>start extract ggext Sending START request to MANAGER ... EXTRACT GGEXT starting [/code] So, moving back to target and preparing our replicat. I used the replicat with the following parameters: [code lang="text"] [oracle@sandbox oggbd]$ cat /u01/oggbd/dirprm/rhdfs.prm REPLICAT rhdfs -- Trail file for this example is located in "dirdat/or" directory -- Command to add REPLICAT -- add replicat rhdfs, exttrail dirdat/or --DDLERROR default discard setenv HADOOP_COMMON_LIB_NATIVE_DIR=/usr/lib/hadoop/lib/native DDL include all TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 MAP ggtest.*, TARGET bdtest.*; [/code] We are adding the replicat to our configuration and it is going to carry on the replication. [code lang="text"] GGSCI (sandbox.localdomain) 1> add replicat rhdfs, exttrail dirdat/or REPLICAT added. GGSCI (sandbox.localdomain) 2> start replicat rhdfs Sending START request to MANAGER ... REPLICAT RHDFS starting GGSCI (sandbox.localdomain) 3> [/code] Our replication is up and running, the initial load worked fine, and we can test and see what we have on source and target. Here is the data in one of our source tables: [code lang="sql"] orcl> select * from ggtest.test_tab_1; PK_ID RND_STR USE_DATE RND_STR_1 ACC_DATE ---------------- ---------- ----------------- ---------- ----------------- 1 371O62FX 01/24/14 19:09:20 RJ68QYM5 01/22/14 12:14:30 2 371O62FX 01/24/14 19:09:20 HW82LI73 05/11/14 05:23:23 3 RXZT5VUN 09/04/13 23:32:56 RJ68QYM5 01/22/14 12:14:30 4 RXZT5VUN 09/04/13 23:32:56 HW82LI73 05/11/14 05:23:23 orcl> [/code] I've created an external Hive table for the table test_tab_1 to have better look. [code lang="sql"] hive> CREATE EXTERNAL TABLE BDTEST.TEST_TAB_1 (tran_flag string, tab_name string, tran_time_utc timestamp, tran_time_loc string,something string, something1 string, > PK_ID INT, RND_STR VARCHAR(10),USE_DATE string,RND_STR_1 string, ACC_DATE string) > stored as textfile location '/user/oracle/gg/bdtest.test_tab_1'; OK Time taken: 0.327 seconds hive> select * from BDTEST.TEST_TAB_1; OK I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:43.373000 00000000-10000002012 1 371O62FX 2014-01-24:19:09:20 RJ68QYM5 2014-01-22:12:14:30 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896000 00000000-10000002155 2 371O62FX 2014-01-24:19:09:20 HW82LI73 2014-05-11:05:23:23 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896001 00000000-10000002298 3 RXZT5VUN 2013-09-04:23:32:56 RJ68QYM5 2014-01-22:12:14:30 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896002 00000000-10000002441 4 RXZT5VUN 2013-09-04:23:32:56 HW82LI73 2014-05-11:05:23:23 Time taken: 0.155 seconds, Fetched: 4 row(s) hive> [/code] You can see the table definition is a bit different from what we have on the source site and you will see why. We got additional columns on the destination side. Interesting that while some of them have a pretty clear purpose, the other columns are not totally clear and have null values. The first column is a flag for operation, and it shows what kind of operation we have gotten in this row. It can be "I" for insert, "D" for delete and "U" for an update. The second column is table name. The third one is a timestamp in UTC showing when the transaction occurred. The next one is another time in local timezone informing the time of commit, and the next column has a commit sequence number. Those columns can help you to construct the proper data set for any given time. Let's insert and update some row(s) on source and see how it will show up on the target: [code lang="sql"] orcl> insert into ggtest.test_tab_1 values (5,'TEST_1',sysdate,'TEST_1',sysdate); 1 row created. orcl> commit; orcl> update ggtest.test_tab_1 set RND_STR='TEST_1_1' where PK_ID=5; 1 row updated. orcl> commit; Commit complete. orcl> [/code] Let's check how it is reflected on the destination site. We see two new rows, where each row represents a DML operation. One was for the insert and the second one was for the update. [code lang="sql"] [highlight="7,8" hive> select * from BDTEST.TEST_TAB_1; OK I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:43.373000 00000000-10000002012 1 371O62FX 2014-01-24:19:09:20 RJ68QYM5 2014-01-22:12:14:30 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896000 00000000-10000002155 2 371O62FX 2014-01-24:19:09:20 HW82LI73 2014-05-11:05:23:23 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896001 00000000-10000002298 3 RXZT5VUN 2013-09-04:23:32:56 RJ68QYM5 2014-01-22:12:14:30 I BDTEST.TEST_TAB_1 2016-02-16 19:17:40.746699 2016-02-16T14:37:44.896002 00000000-10000002441 4 RXZT5VUN 2013-09-04:23:32:56 HW82LI73 2014-05-11:05:23:23 I BDTEST.TEST_TAB_1 2016-02-16 20:43:32.000231 2016-02-16T15:43:37.199000 00000000000000002041 5 TEST_1 2016-02-16:15:43:25 TEST_1 2016-02-16:15:43:25 U BDTEST.TEST_TAB_1 2016-02-16 20:43:53.000233 2016-02-16T15:43:56.056000 00000000000000002243 5 TEST_1_1 Time taken: 2.661 seconds, Fetched: 6 row(s) hive> [/code] It work for deletes too, only flag will be "D" instead of "I" for insert and "U" for updates. What about DDL support? Let's truncate the table. [code lang="sql"] orcl> truncate table ggtest.test_tab_1; Table truncated. orcl> [/code] And here, there is nothing in our hdfs files. Maybe I've missed something, but it looks like the truncate operation is not creating any record. I need to dig a bit more. I will try to make a separate blog post about DDL support for the Big Data Adapters. It works pretty well when we create a new table and insert new rows. It also works if we change one of the existing tables, adding or dropping a column. Let's try to create a new table here: [code lang="sql"] orcl> create table ggtest.test_tab_4 (pk_id number, rnd_str_1 varchar2(10),acc_date date); Table created. orcl> insert into ggtest.test_tab_4 select * from ggtest.test_tab_2; 1 row created. orcl> commit; Commit complete. orcl> [/code] You can see that it has created a new directory and file for the new table. Additionally, if you add a column the new file will be used for all DML operations for the updated table. It will help to separate tables with different structure. [code lang="bash"] [oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/ Found 3 items drwxr-xr-x - oracle oracle 0 2016-02-16 15:43 /user/oracle/gg/bdtest.test_tab_1 drwxr-xr-x - oracle oracle 0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2 drwxr-xr-x - oracle oracle 0 2016-02-16 15:56 /user/oracle/gg/bdtest.test_tab_4 [oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/bdtest.test_tab_4/ Found 1 items -rw-r--r-- 1 oracle oracle 127 2016-02-16 15:56 /user/oracle/gg/bdtest.test_tab_4/bdtest.test_tab_4_2016-02-16_15-56-50.373.txt [oracle@sandbox oggbd]$ hadoop fs -tail /user/oracle/gg/bdtest.test_tab_4/bdtest.test_tab_4_2016-02-16_15-56-50.373.txt IBDTEST.TEST_TAB_42016-02-16 20:56:47.0009532016-02-16T15:56:50.371000000000000000000068327IJWQRO7T2013-07-07:08:13:52 [oracle@sandbox oggbd]$ [/code] At first glance it looks good, but let's try to create a table as select. [code lang="sql"] orcl> create table ggtest.test_tab_3 as select * from ggtest.test_tab_2; Table created. orcl> [/code] Not sure if it is an expected behavior or bug, but apparently it is not working. Our replicat is broken and complains that "DDL Replication is not supported for this database". [code lang="text"] [oracle@sandbox oggbd]$ tail -5 ggserr.log 2016-02-16 15:43:37 INFO OGG-02756 Oracle GoldenGate Delivery, rhdfs.prm: The definition for table GGTEST.TEST_TAB_1 is obtained from the trail file. 2016-02-16 15:43:37 INFO OGG-06511 Oracle GoldenGate Delivery, rhdfs.prm: Using following columns in default map by name: PK_ID, RND_STR, USE_DATE, RND_STR_1, ACC_DATE. 2016-02-16 15:43:37 INFO OGG-06510 Oracle GoldenGate Delivery, rhdfs.prm: Using the following key columns for target table bdtest.TEST_TAB_1: PK_ID. 2016-02-16 15:48:49 ERROR OGG-00453 Oracle GoldenGate Delivery, rhdfs.prm: DDL Replication is not supported for this database. 2016-02-16 15:48:49 ERROR OGG-01668 Oracle GoldenGate Delivery, rhdfs.prm: PROCESS ABENDING. [oracle@sandbox oggbd]$ [oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/ Found 2 items drwxr-xr-x - oracle oracle 0 2016-02-16 15:43 /user/oracle/gg/bdtest.test_tab_1 drwxr-xr-x - oracle oracle 0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2 [oracle@sandbox oggbd]$ [/code] What can we say in summary? The replication works, and supports all DML, and some DDL commands. You will need to prepare to get consistent datasets for any given time using flag for operation, and time for the transaction. In my next few posts, I will cover other Big Data adapters for GoldenGate.

Insight and analysis of technology and business strategy

GoldenGate 12.2 big data adapters: part 1 - HDFS

Pythian Blogs

Tell us how we can help!