In this blog post, we continue our review of the new Oracle GoldenGate Big Data adapters. In the first part of the series, I tested the basic HDFS adapter and checked how it worked with some DML and DDL. In this article, I will try the Flume adapter and see how it performs.
A quick reminder on what Flume is: we aren't talking about the popular Australian musician. Apache Flume is a pipeline or streaming system designed to move large amounts of data efficiently.
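It has a simple architecture consisting of three main components: a source that receives events, a channel that buffers them, and a sink that delivers them to the destination.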
The focus of this article is how we can pass data from Oracle to Flume using GoldenGate. Let's assume we have an Oracle source system replicating DML and DDL for the GGTEST schema using Oracle GoldenGate 12.2.
First, we ensure the GoldenGate for Big Data (OGG BD) manager is up and running:
GGSCI (sandbox.localdomain) 1> info manager

Manager is running (IP port sandbox.localdomain.7839, Process ID 18521).
We need to prepare the configuration file for the agent to handle the incoming stream. We will set our source to Avro (though Thrift is also supported) and the sink to HDFS. While using Flume to write to HDFS might seem redundant (since OGG has a native HDFS adapter), this setup is excellent for comparing adapter capabilities.
Flume Agent Configuration (flume.conf):
# Name/aliases for the components on this agent
agent.sources = ogg1
agent.sinks = hdfs1
agent.channels = ch1

# Avro source
agent.sources.ogg1.type = avro
agent.sources.ogg1.bind = 0.0.0.0
agent.sources.ogg1.port = 4141

# Describe the sink
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://sandbox/user/oracle/ggflume

# Use a channel which buffers events in memory
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000
agent.channels.ch1.transactionCapacity = 10000

# Bind the source and sink to the channel
agent.sources.ogg1.channels = ch1
agent.sinks.hdfs1.channel = ch1
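A quick way to sanity-check the wiring in a file like this is to confirm that every source and sink points at a channel the agent actually declares. The Python sketch below is only an illustration: `parse_properties` and `check_wiring` are hypothetical helper names I made up for this post, not part of Flume, and the parser handles only simple `key = value` lines.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="agent"):
    """Return a list of wiring problems; an empty list means none were found."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        # A source can feed several channels, hence the plural 'channels' key.
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} has no channel")
        for channel in bound:
            if channel not in declared:
                problems.append(f"source {source} -> undeclared channel {channel}")
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink drains exactly one channel, hence the singular 'channel' key.
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no channel")
        elif channel not in declared:
            problems.append(f"sink {sink} -> undeclared channel {channel}")
    return problems


# The agent configuration from this post, reduced to the wiring-related keys.
FLUME_CONF = """
agent.sources = ogg1
agent.sinks = hdfs1
agent.channels = ch1
agent.sources.ogg1.type = avro
agent.sources.ogg1.port = 4141
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://sandbox/user/oracle/ggflume
agent.channels.ch1.type = memory
agent.sources.ogg1.channels = ch1
agent.sinks.hdfs1.channel = ch1
"""

print(check_wiring(parse_properties(FLUME_CONF)))  # prints []
```

Note the asymmetry the check relies on: the source uses the plural `channels` key while the sink uses the singular `channel`, which is an easy detail to mistype.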
Now we prepare the OGG configuration. Examples for the Flume adapter can be found in $OGG_HOME/AdapterExamples/big-data/flume/.
We need to adjust flume.props to point to our handler and define the format.
dirprm/flume.props snippet:
gg.handlerlist = flumehandler
gg.handler.flumehandler.type=flume
gg.handler.flumehandler.RpcClientPropertiesFile=custom-flume-rpc.properties
gg.handler.flumehandler.format=avro_op
gg.handler.flumehandler.mode=tx
gg.handler.flumehandler.EventMapsTo=tx
gg.handler.flumehandler.PropagateSchema=true
gg.handler.flumehandler.format.WrapMessageInGenericAvroMessage=true
The custom-flume-rpc.properties file is used by the OGG adapter to connect to the flume-ng agent:
client.type=default
hosts=h1
hosts.h1=localhost:4141
batch-size=100
connect-timeout=20000
request-timeout=20000
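Before starting a replicat, it can save some head-scratching to confirm that the endpoint named in the RPC properties is actually listening. Here is a minimal Python sketch of that check. The helper names `parse_rpc_hosts` and `is_reachable` are hypothetical, and the probe only verifies that a TCP connection can be opened; it does not speak the Avro protocol.

```python
import socket


def parse_rpc_hosts(text):
    """Extract (host, port) pairs from 'hosts' / 'hosts.<alias>' properties."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    endpoints = []
    for alias in props.get("hosts", "").split():
        host, _, port = props[f"hosts.{alias}"].rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# The connection-related part of custom-flume-rpc.properties from this post.
RPC_PROPS = """
client.type=default
hosts=h1
hosts.h1=localhost:4141
"""

if __name__ == "__main__":
    for host, port in parse_rpc_hosts(RPC_PROPS):
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

The port in `hosts.h1` has to match `agent.sources.ogg1.port` in the agent configuration (4141 in this setup), otherwise the handler has nothing to connect to.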
With the configurations in place, we start with an initial load using a passive replicat.
# Executing initial load
[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt
Upon success, three new files appear on HDFS: two containing the schema description and one containing the actual data for the replicated tables.
Next, we start a permanent replicat (rflume) to handle ongoing changes. To test it, we insert a row into the source Oracle database:
orclbd> insert into ggtest.test_tab_1 values (7, dbms_random.string('x', 8), sysdate-7, dbms_random.string('x', 8), sysdate-6);
orclbd> commit;
Immediately after the commit, Flume generates new files on HDFS. The first file contains the updated schema, and the second contains the payload (the transaction data).
I executed regression testing using JMeter, pushing approximately 29 transactions per second. Even with a single Flume channel and a modest Hadoop environment, the system maintained a healthy response time without errors, packing about 900 transactions per HDFS file.
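Those two figures also tell us roughly how often a new file appeared. The short Python sketch below just does the arithmetic. It assumes the rate was steady and that both numbers are averages; the function names are mine, and what actually triggers a file roll depends on HDFS sink settings that are not part of the configuration shown in this post.

```python
def seconds_per_file(tx_per_second, tx_per_file):
    """Average time needed to fill one HDFS file at a steady transaction rate."""
    return tx_per_file / tx_per_second


def files_per_hour(tx_per_second, tx_per_file):
    """Average number of HDFS files produced per hour at that rate."""
    return 3600 / seconds_per_file(tx_per_second, tx_per_file)


# Figures from the JMeter run: about 29 tx/s and about 900 tx per file.
print(f"{seconds_per_file(29, 900):.1f} s per file, "
      f"about {files_per_hour(29, 900):.0f} files per hour")
# prints: 31.0 s per file, about 116 files per hour
```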
The adapter's behavior during DDL varies depending on the command:
The Oracle GoldenGate Flume adapter works as expected, successfully supporting the flow of transactions from Oracle to Flume using both Avro and Thrift sources. While this test served as a basic functional validation, a production implementation would require a more robust architecture.