In the other posts in the series I reviewed four standard handlers provided out of the box. But what if you need more than that? What if you have another platform as a replication target? A good place to start your search is the Oracle Data Integration project on java.net. If you work regularly with ODI or GoldenGate it is one of the sites to save in your bookmarks. On this site you can find software you need, or even join the project and contribute as a developer. All software in the project is distributed under CDDL license.
In this article I am going to test the Oracle GoldenGate Adapter for MongoDB taken from the Oracle Data Integration project. The adapter is supposed to work with Oracle GoldenGate for Big Data, and deliver the same functionality as the standard handlers we have tested before. We already have the source side with extract working on an Oracle 12c database.
The replication is running from Oracle GoldenGate against on Oracle database to to the GoldenGate for Big Data where we've placed the adapter for MongoDB. The adapter can be downloaded. In the zip file you will find the adapter jar file, source code, and examples for the necessary configuration files.
As a first step I created a directory called "mongo" in my GoldenGate home for Big Data, and unpacked everything there. Later in the excerpts and output the environment variable $OGGBD is used to reference the Oracle GoldenGate Big Data home.
[oracle@sandbox ~]$ wget https://java.net/projects/oracledi/downloads/download/GoldenGate/Oracle%20GoldenGate%20Adapter%20for%20MongoDB/OGG%20for%20mongodb%20adapter_v1.0.zip .................. [oracle@sandbox ~]$ cd $OGGBD [oracle@sandbox oggbd]$ mkdir mongo [oracle@sandbox oggbd]$ unzip -q /u01/distr/OGG_for_mongodb_adapter_v1.0.zip -d mongo/ [oracle@sandbox oggbd]$ cd mongo/ [oracle@sandbox mongo]$ ll total 36 drwxr-xr-x. 2 oracle oinstall 4096 Mar 15 15:08 bin drwxr-xr-x. 2 oracle oinstall 4096 Feb 18 2016 dirprm -rw-r--r--. 1 oracle oinstall 16621 Feb 25 2016 LICENSE.txt -rw-r--r--. 1 oracle oinstall 2465 Mar 15 15:08 pom.xml drwxr-xr-x. 3 oracle oinstall 4096 Feb 18 2016 src [oracle@sandbox mongo]$
Now that we have the adapter placed in $OGGBD/mongo/bin, we need a java driver to access the MongoDB instance. You can read how to get the drivers in the installation guide. This is where we find a “MongoDB driver” which has all interfaces but requires the bson and mongodb-driver-core, and we have “Uber MongoDB Java Driver” containing everything. The latter is what we'll use for our replication. I am creating a new sub-directory "lib" inside the $OGGBD/mongo where I place the driver:
[oracle@sandbox mongo]$ mkdir lib [oracle@sandbox mongo]$ cd lib [oracle@sandbox lib]$ wget https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.2.2/mongo-java-driver-3.2.2.jar --2016-09-01 11:46:34-- https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.2.2/mongo-java-driver-3.2.2.jar Resolving oss.sonatype.org... 52.22.249.229, 107.23.166.173 Connecting to oss.sonatype.org|52.22.249.229|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1484724 (1.4M) [application/java-archive] Saving to: “mongo-java-driver-3.2.2.jar” 100%[=====================================================================================================================================================================>] 1,484,724 1.76M/s in 0.8s 2016-09-01 11:46:35 (1.76 MB/s) - “mongo-java-driver-3.2.2.jar” saved [1484724/1484724] [oracle@sandbox lib]$
The next step is to create our parameter files based on the examples located in $OGGBD/mongo/dirprm. For the MongoDB parameter file I just copied the file mongo.props to $OGGBD/dirprm/mongo.props and adjusted the gg.classpath pointing to our downloaded java driver and the MongoDB adapter while keeping the rest of the file intact:
[oracle@sandbox oggbd]$ cat $OGGBD/dirprm/mongo.props gg.handlerlist=mongodb gg.handler.mongodb.type=oracle.goldengate.delivery.handler.mongodb.MongoDBHandler gg.handler.mongodb.clientURI=mongodb://localhost:27017/ #gg.handler.mongodb.clientURI=mongodb://ogg:ogg@localhost:27017/?authSource=admin&authMechanism=SCRAM-SHA-1 gg.handler.mongodb.mode=tx goldengate.userexit.timestamp=utc goldengate.userexit.writers=javawriter javawriter.stats.display=TRUE javawriter.stats.full=TRUE gg.log=log4j gg.log.level=DEBUG gg.report.time=30sec ##CHANGE THE PATH BELOW gg.classpath=/u01/oggbd/mongo/lib/mongo-java-driver-3.2.2.jar:/u01/oggbd/mongo/bin/ogg-mongodb-adapter-1.0.jar: javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar:
For an initial load I've created a parameter file irmongo.prm based on sample rmongo.prm from $OGGBD/mongo/dirprm.
[oracle@sandbox oggbd]$ cat $OGGBD/dirprm/irmongo.prm -- Initial load SPECIALRUN END RUNTIME EXTFILE dirdat/initld TARGETDB LIBFILE libggjava.so SET property=dirprm/mongo.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 MAP ggtest.*, TARGET ggtest.*;
So, GoldenGate is now ready, and we can focus on the MongoDB instance. For demonstration we are using a really simple installation without any special customization. I’ve used the latest community edition of MongoDB 3.2. It can be easily installed to your box using yum service from the MongoDB repository.
[oracle@sandbox oggbd]$ cat /etc/yum.repos.d/mongodb-org-3.2.repo [mongodb-org-3.2] name=MongoDB Repository baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.2/x86_64/ gpgcheck=1 enabled=1 gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc [root@sandbox ~]# yum install -y mongodb-org ............ [root@sandbox ~]# service mongod start [root@sandbox ~]# service mongod status mongod (pid 10459) is running...
I've made minimal changes in the default mongodb configuration and hence am pretty much using it as it is out of box. From the mongodb shell you can check existing databases and see that nothing has been created so far.
[oracle@sandbox ~]$ mongo MongoDB shell version: 3.2.9 connecting to: test > show dbs local 0.000GB >
Now that everything is ready to go, I can run the initial load using an OGG trail file. As you can see I’ve used a passive replicat with the parameter file created earlier.
[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irmongo.prm reportfile dirrpt/irmongo.rpt [oracle@sandbox oggbd]$
As result I got a new database in MongoDB with two collections(tables) inside. Exactly what we would expect.
> show dbs GGTEST 0.000GB local 0.000GB > use GGTEST switched to db GGTEST > show collections TEST_TAB_1 TEST_TAB_2 > db.TEST_TAB_1.find() { "_id" : ObjectId("57c8789452ed9223e6036190"), "ACC_DATE" : "2014-10-26:08:18:58", "USE_DATE" : "2016-02-13:08:34:19", "RND_STR_1" : "BGBXRKJL", "RND_STR_2" : "M6ZSQF4V", "ID" : "1" } ...
The initial load worked perfectly well. We got the database, collections(tables) and data inside. You can see that the new mongodb “_id” key was created for each record and the date format is “YYYY-MM-DD:HH24:MI:SS” which looks like a default date format for GoldenGate.
To start ongoing replication, we need a parameter file for the replicat. I've copied the rmongo.prm and changed only the schema mapping line.
[oracle@sandbox oggbd]$ cat $OGGBD/dirprm/rmongo.prm REPLICAT rmongo TARGETDB LIBFILE libggjava.so SET property=dirprm/mongo.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 SOURCECATALOG ORCL MAP GGTEST.*, TARGET GGTEST.*;
Now I can add the replicat and start it up:
GGSCI (sandbox.localdomain) 7> add replicat rmongo, exttrail dirdat/or REPLICAT added. GGSCI (sandbox.localdomain) 8> start replicat rmongo Sending START request to MANAGER ... REPLICAT RMONGO starting
The first test was a simple insert of a record into our test table:
orcl> insert into ggtest.test_tab_1 values (6,'test_ins',sysdate,'test_ins',sysdate); 1 row created. orcl> commit;
It worked perfectly well and we see the record replicated to the collection on MongoDB side.
> db.TEST_TAB_1.find({"ID":"6"}) { "_id" : ObjectId("57c87a6952ed9223e6036195"), "ACC_DATE" : "2016-09-01:14:58:41", "USE_DATE" : "2016-09-01:14:58:41", "RND_STR_1" : "test_ins", "RND_STR_2" : "test_ins", "ID" : "6" }
The update and delete operations worked correctly as well, with changes successfully appearing in MongoDB in near real-time.
I tried some DDL, starting with indexes. On MongoDB we have only one default index for “_id” key. I tried to create a new index on the Oracle side, but it was not replicated to the MongoDB.
When you add a column to the table it will work, but only for newly inserted records. If you try to update one of the old records and put a new value to the new column it is not going to change the old record.
I also tried to run a "create a table as select" (CTAS) and received the same error as for any other Big Data adapters:
ERROR OGG-00453 Oracle GoldenGate Delivery, rmongo.prm: DDL Replication is not supported for this database. ERROR OGG-01668 Oracle GoldenGate Delivery, rmongo.prm: PROCESS ABENDING.
Of course we can use a workaround by splitting the operation into two steps: creating the table first and inserting all records from the other table after that.
Truncates are not replicated and you have to be careful because you may end up with duplicated rows in your MongoDB. According to documentation they have to be replicated, but in my tests, the old data was not wiped out and all the new inserts created duplicated records in the collection.
> db.TEST_TAB_1.count() 10
I continued the test by checking performance. A simple insert for about 1,000,000 records revealed interesting results.
Initially, with default parameters, the rate was about 300,000 records per hour (83 ops). Switching gg.handler.mongodb.mode to "op" (operational) increased this to 807 ops. However, the biggest bottleneck was the log level. Switching gg.log.level from "DEBUG" to "INFO" changed everything—the rate jumped to 9,500 ops!
When increasing speed, the java Garbage Collector may struggle. I had to increase the java heap memory from 512M to 1024M to avoid java.lang.OutOfMemoryError.
In the final OLTP workload tests using JMeter, I reached up to 227 transactions per second with 75% updates without issues, aside from slight delays in retrieving status statistics.
Here is a short summary. The MongoDB adapter is easy to use, it has most of expected configuration options, replicates all DML correctly and with good speed. It doesn't replicate truncates and some other DDL like an index creation, and cannot handle Create Table As Select (CTAS) clause. You need to keep in mind all table changes will be applied to only new inserted rows. I would advise to switch logging from "DEBUG" to "INFO" and use "op" (operational) mode for the replicat to improve performance and avoid memory related issues with big transactions. Also, you may need to tune java memory parameters for the adapter since the default value did show instability on large transactions. And the last thing I would love to see in the package is a piece of proper documentation with a description of all possible parameters. I hope you have found this article helpful. Happy implementations.
Ready to optimize your Oracle Database for the future?