At the ‘Extract Data’ phase of the migration process (see migration process here ), add the ‘GenerateTableFetch’ processor. This processor generates ‘SQL select’ statements to fetch data from the table in multiple chunks. Edit the properties of the processor and set the ‘Partition Size’ property to limit the number of rows fetched in a single chunk. Set the ‘Partition Size’ based on the configuration (heap size) of the Apache Nifi instance. The ‘GenerateTableFetch’ processor is commonly used for incremental data extraction. Given a column/list of columns, which uniquely represents a row, to keep a record of the previously fetched rows, the processor fetches only the new or updated records from the table. Add the column-name(s), which will be used for incremental data extraction, to the ‘Maximum-value Columns’ property of the processor. Note that if the 'Maximum-value Columns' property is left empty, then all columns of the table will be used for incremental fetch and will have a performance impact. One of the drawbacks of the first approach is that, for a better performance, the 'GenerateTableFetch' processor requires a list of column(s) to be added as a property to use for incremental data extraction.
In the second approach, extract all the table data using ExecuteSQL processor at once, and then split the data before ingesting. At the ‘Ingest Data’ phase of the migration process, use the ‘SplitAvro’ processor to split the incoming Avro file into multiple multiple Avro files. Set the ‘Output Size’ property of the processor equal to the number of rows to be contained in a single split file. In this way, the ‘PutDatabaseRecord’ processor will receive multiple small flowfiles, preventing ‘OutOfMemory’ error or Nifi crash. Unlike the previous approach, the advantage of this approach is that the ‘SplitAvro’ processor does not require a list of column name(s) for splitting the flowfile.
In this post, we discussed two approaches to handle large table during database migration. If you have any comments or questions, please post them in the comments section.
Ready to optimize your Oracle Database for the future?