GoldenGate 12.2 big data adapters: part 1 - HDFS
December 2015 brought us a new version of GoldenGate, and a new version for Big Data adapters for the GoldenGate. Let's have a look at what we have now and how it works. I am going to start from the HDFS adapter. As a first step, we need to prepare our source database for replication. It becomes easier with every new GoldenGate version. We will need to perform several steps: a) Enable archive logging on our database. This particular step requires downtime. [code lang="sql"] orcl> alter database mount; Database altered. orcl> alter database archivelog; Database altered. orcl> alter database open; [/code] b) Enable force logging and minimal supplemental logging. No need to shutdown database for this. [code lang="sql"] orcl> alter database add supplemental log data; Database altered. orcl> alter database force logging; Database altered. orcl> SELECT supplemental_log_data_min, force_logging FROM v$database; SUPPLEME FORCE_LOGGING -------- --------------------------------------- YES YES [/code] c) Switch parameter "enable_goldengate_replication" to "TRUE". Can be done online. [code lang="sql"] orcl> alter system set enable_goldengate_replication=true sid='*' scope=both; System altered. orcl> [/code] And we are almost done. Now we can create a schema for a GoldenGate administrator, and provide required privileges. I've just granted DBA role to the user to simplify process. In any case you will need it in case of integrated capture. For a production installation I advise you to have a look at the documentation to verify necessary privileges and roles. [code lang="sql"] orcl> create user ogg identified by welcome1 default tablespace users temporary tablespace temp; orcl> grant connect, dba to ogg; [/code] Let's create a test schema to be replicated. We will call it schema on the source as ggtest and I will name the destination schema as bdtest. It will allow us also to check how the mapping works in our replication. [code lang="sql"] orcl> create tablespace ggtest; -- optional step orcl> create user ggtest identified by welcome1 default tablespace ggtest temporary tablespace temp; orcl> grant connect, resource to ggtest; [/code] Everything is ready on our source database for the replication. Now we are installing Oracle GoledenGate for Oracle to our database server. We can get the software from the Oracle site on the download page in the Middleware section, GoldenGate, Oracle GoldenGate for Oracle databases. We are going to use 220.127.116.11.1 version of the software. The installation is easy - you need to unzip the software and run Installer which will guide you through couple of simple steps. The installer will unpack the software to the destination location, create subdirectories, and register GoldenGate in the Oracle global registry. [code lang="bash"] [oracle@sandbox distr]$ unzip fbo_ggs_Linux_x64_shiphome.zip [oracle@sandbox distr]$ cd fbo_ggs_Linux_x64_shiphome/Disk1/ [oracle@sandbox Disk1]$ ./runInstaller [/code] We continue by setting up parameters for Oracle GoldenGate (OGG) manager and starting it up. You can see that I've used a default blowfish encryption for the password. In a production environment you may consider another encryption like AES256. I've also used a non-default port for the manager since I have more than one GoldenGate installation on my test sandbox. [code lang="text"] [oracle@sandbox ~]$ export OGG_HOME=/u01/oggora [oracle@sandbox ~]$ cd $OGG_HOME [oracle@sandbox oggora]$ ./ggsci GGSCI (sandbox.localdomain) 1> encrypt password welcome1 BLOWFISH ENCRYPTKEY DEFAULT Using Blowfish encryption with DEFAULT key. Encrypted password: AACAAAAAAAAAAAIARIXFKCQBMFIGFARA Algorithm used: BLOWFISH GGSCI (sandbox.localdomain) 2> edit params mgr PORT 7829 userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT purgeoldextracts /u01/oggora/dirdat/*, usecheckpoints GGSCI (sandbox.localdomain) 3> start manager Manager started. [/code] Let's prepare everything for initial load, and later online replication. I've decided to use GoldenGate initial load extract as the way for initial load for the sake of consistency for the resulted dataset on Hadoop. I prepared the parameter file to replicate my ggtest schema and upload all data to the trail file on remote site. I've used a minimum number of options for all my processes, providing only handful of parameters required for replication. Extract options is a subject deserving a dedicated blog post. Here is my simple initial extract: [code lang="text"] [oracle@sandbox oggora]$ cat /u01/oggora/dirprm/ini_ext.prm SOURCEISTABLE userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT --RMTHOSTOPTIONS RMTHOST sandbox, MGRPORT 7839 RMTFILE /u01/oggbd/dirdat/initld, MEGABYTES 2, PURGE --DDL include objname ggtest.* TABLE ggtest.*; [/code] Then we run the initial load extract in passive node and it will create a trail file with the data. The trail file will be used later for our initial load on the target side. [code lang="text"] [oracle@sandbox oggora]$ ./extract paramfile dirprm/ini_ext.prm reportfile dirrpt/ini_ext.rpt [oracle@sandbox oggora]$ ll /u01/oggbd/dirdat/initld* -rw-r-----. 1 oracle oinstall 3028 Feb 16 14:17 /u01/oggbd/dirdat/initld [oracle@sandbox oggora]$ [/code] We can also prepare our extract on the source site as well. I haven't used datapump in my configuration limiting the topology only by simplest and strait-forward extract to replicat configuration. Of course, in any production configuration I would advise using datapump on source for staging our data. Here are my extract parameters, and how I added it. I am not starting it yet because I must have an Oracle GoldenGate Manager running on the target, and the directory for the trail file should be created. You may have guessed that the Big Data GoldenGate will be located in /u01/oggbd directory. [code lang="text"] [oracle@sandbox oggora]$ ggsci Oracle GoldenGate Command Interpreter for Oracle Version 18.104.22.168.1 OGGCORE_22.214.171.124.0_PLATFORMS_151211.1401_FBO Linux, x64, 64bit (optimized), Oracle 12c on Dec 12 2015 02:56:48 Operating system character set identified as UTF-8. Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved. GGSCI (sandbox.localdomain) 1> edit params ggext extract ggext userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT --RMTHOSTOPTIONS RMTHOST sandbox, MGRPORT 7839 RMTFILE /u01/oggbd/dirdat/or, MEGABYTES 2, PURGE DDL include objname ggtest.* TABLE ggtest.*; GGSCI (sandbox.localdomain) 2> dblogin userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT Successfully logged into database. GGSCI (sandbox.localdomain as ogg@orcl) 3> register extract GGEXT database 2016-02-16 15:37:21 INFO OGG-02003 Extract GGEXT successfully registered with database at SCN 17151616. GGSCI (sandbox.localdomain as ogg@orcl) 4> add extract ggext, INTEGRATED TRANLOG, BEGIN NOW EXTRACT (Integrated) added. [/code] Let's leave our source site for a while and switch to the target . Our target is going to be a box where we have hadoop client and all requirement java classes. I used the same box just to save resources on my sandbox environment. You may run different GoldeGate versions on the same box considering, that Manager ports for each of them will be different. Essentially we need a Hadoop client on the box, which can connect to HDFS and write data there. Installation of Hadoop client is out of the scope for this article, but you can easily get all necessary information from the Hadoop home page . Having all required Hadoop classes we continue by installing Oracle GoldenGate for Big Data, configuring and starting it up. In the past I received several questions from people struggling to find the exact place where all adapters could be uploaded. The Adapters were well "hidden" on "Oracle edelivery", but now it is way simpler. You are going to GoldenGate download page on Oracle site and find the section "Oracle GoldenGate for Big Data 126.96.36.199.0" where you can choose the OGG for Linux x86_64, Windows or Solaris. You will need an Oracle account to get it. We upload the file to our linux box, unzip and unpack the tar archive. I created a directory /u01/oggbd as our GoldenGate home and unpacked the tar archive there. The next step is to create all necessary directories. We start command line utility and create all subdirectories. [code lang="text"] [oracle@sandbox ~]$ cd /u01/oggbd/ [oracle@sandbox oggbd]$ ./ggsci Oracle GoldenGate Command Interpreter Version 188.8.131.52.0 OGGCORE_184.108.40.206.0_PLATFORMS_151101.1925.2 Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12 Operating system character set identified as UTF-8. Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved. GGSCI (sandbox.localdomain) 1> create subdirs Creating subdirectories under current directory /u01/oggbd Parameter files /u01/oggbd/dirprm: created Report files /u01/oggbd/dirrpt: created Checkpoint files /u01/oggbd/dirchk: created Process status files /u01/oggbd/dirpcs: created SQL script files /u01/oggbd/dirsql: created Database definitions files /u01/oggbd/dirdef: created Extract data files /u01/oggbd/dirdat: created Temporary files /u01/oggbd/dirtmp: created Credential store files /u01/oggbd/dircrd: created Masterkey wallet files /u01/oggbd/dirwlt: created Dump files /u01/oggbd/dirdmp: created GGSCI (sandbox.localdomain) 2> [/code] We are changing port for our manager process from default and starting it up. I've already mentioned that the port was changed due to the existence off several GoldenGate managers running from different directories. [code lang="text"] GGSCI (sandbox.localdomain) 2> edit params mgr PORT 7839 ..... GGSCI (sandbox.localdomain) 3> start manager Manager started. [/code] Now we have to prepare parameter files for our replicat processes. Let's assume the environment variable OGGHOME represents the GoldenGate home and in our case it is going to be /u01/oggbd. Examples for the parameter files can be taken from $OGGHOME/AdapterExamples/big-data directories. There you will find examples for flume, kafka, hdfs, hbase and for metadata providers. Today we are going to work with HDFS adapter. I copied files to my parameter files directory ($OGGHOME/dirprm) and modified them accordingly: [code lang="bash"] oracle@sandbox oggbd]$ cp /u01/oggbd/AdapterExamples/big-data/hdfs/* /u01/oggbd/dirprm/ oracle@sandbox oggbd]$ vi /u01/oggbd/dirprm/hdfs.props [/code] Here are my values for the hdfs.props file: [code lang="bash"] [oracle@bigdata dirprm]$ cat hdfs.props gg.handlerlist=hdfs gg.handler.hdfs.type=hdfs gg.handler.hdfs.includeTokens=false gg.handler.hdfs.maxFileSize=1g gg.handler.hdfs.rootFilePath=/user/oracle/gg gg.handler.hdfs.fileRollInterval=0 gg.handler.hdfs.inactivityRollInterval=0 gg.handler.hdfs.fileSuffix=.txt gg.handler.hdfs.partitionByTable=true gg.handler.hdfs.rollOnMetadataChange=true gg.handler.hdfs.authType=none gg.handler.hdfs.format=delimitedtext #gg.handler.hdfs.format.includeColumnNames=true gg.handler.hdfs.mode=tx goldengate.userexit.timestamp=utc goldengate.userexit.writers=javawriter javawriter.stats.display=TRUE javawriter.stats.full=TRUE gg.log=log4j gg.log.level=INFO gg.report.time=30sec gg.classpath=/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop/etc/hadoop/:/usr/lib/hadoop/lib/native/* javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar [/code] You can find information about all those parameters in oracle documentation here, but there are parameters you will most likely need to change from default:
- gg.handler.hdfs.rootFilePath - it will tell where the directories and files have to be written on HDFS.
- gg.handler.hdfs.format - you can setup one of the four formats supported by adapter.
- goldengate.userexit.timestamp - it will depend from your preferences for transactions timestamps written to your hdfs files.
- gg.classpath - it will depend from location for your hadoop jar classes and native libraries.