{"id":41688,"date":"2025-11-30T23:04:54","date_gmt":"2025-11-30T22:04:54","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=41688"},"modified":"2025-11-30T23:04:56","modified_gmt":"2025-11-30T22:04:56","slug":"oracle-to-postgresql-migration-with-flink-cdc","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/","title":{"rendered":"Oracle to PostgreSQL Migration with Flink CDC"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>When you want to migrate from the big red to PostgreSQL, most of the time you can afford the downtime of the export\/import process and start from something fresh. It is simple and reliable, and Ora2pg is one of the go-to tools for that. But sometimes you cannot afford the downtime, either because the database is critical for business operations or because the DB is too big to run the export\/import process. <br>Hence the following example of using &#8220;logical replication&#8221; between Oracle and PostgreSQL with Flink CDC. I call it that even though it is an event stream process, because for DBAs it has roughly the same limitations and constraints as standard logical replication. <\/p>\n\n\n\n<p>Here is the layout:<\/p>\n\n\n\n<p><strong>Oracle Source \u2192 Flink CDC \u2192 JDBC Sink \u2192 PostgreSQL Target<\/strong><\/p>\n\n\n\n<p>This approach is based on production experience migrating large Oracle databases, where we achieved throughput of <strong>19,500 records per second<\/strong>\u2014a 65x improvement over our initial baseline. But more importantly, it transformed a &#8220;big bang&#8221; migration event into a controlled, observable, and recoverable process.<\/p>\n\n\n\n<p>The geek in me says that Flink CDC is a powerful tool for migrations. 
The consultant says it should not be used blindly\u2014it&#8217;s relevant for specific use cases where the benefits outweigh the operational complexity.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-each-piece-does\">What Each Piece Does<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Oracle (source)<\/strong>: The source database. Flink CDC reads directly from tables via JDBC for snapshot mode, or from redo logs for streaming CDC mode.<\/li>\n\n\n\n<li><strong>Flink CDC source (Oracle)<\/strong>: A Flink table that wraps the Debezium Oracle connector. It reads data and turns it into a changelog stream (insert\/update\/delete). Key options control snapshot mode, parallelism, and fetch sizes.<\/li>\n\n\n\n<li><strong>Flink runtime<\/strong>: Runs a streaming job that:\n<ul class=\"wp-block-list\">\n<li><strong>Snapshot<\/strong>: Reads current table state, optionally in parallel chunks<\/li>\n\n\n\n<li><strong>Checkpoints<\/strong>: State is stored so restarts resume exactly from the last acknowledged point<\/li>\n\n\n\n<li><strong>Transforms<\/strong>: You can filter, project, cast types, and even aggregate in Flink SQL<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>JDBC sink (PostgreSQL)<\/strong>: Another Flink table. With a PRIMARY KEY defined, the connector performs UPSERT semantics (<code>INSERT ... ON CONFLICT DO UPDATE<\/code> in PostgreSQL). It writes in batches, flushes on checkpoints, and retries on transient errors.<\/li>\n\n\n\n<li><strong>PostgreSQL (target)<\/strong>: Receives the stream and ends up with the migrated data. 
With proper tuning (especially <code>rewriteBatchedInserts=true<\/code>), it can handle high throughput.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-flink-and-debezium-how-cdc-works\">Flink and Debezium: How CDC Works<\/h2>\n\n\n\n<p>Flink CDC connectors use Debezium which is an open-source platform for Change Data Capture that captures row-level changes in databases by reading transaction logs.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502                        Flink CDC Architecture                         \u2502\n\u2502                                                                       \u2502\n\u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n\u2502  \u2502    Oracle    \u2502    \u2502      Flink CDC Connector       \u2502    \u2502  Sink  \u2502 \u2502\n\u2502  \u2502   Database   \u2502    \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502    \u2502Database\u2502 \u2502\n\u2502  \u2502              
\u2502\u2500\u2500\u2500\u25ba\u2502  \u2502   Debezium (embedded)   \u2502   \u2502\u2500\u2500\u2500\u25ba\u2502        \u2502 \u2502\n\u2502  \u2502  \u2022 Redo logs \u2502    \u2502  \u2502   \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500     \u2502   \u2502    \u2502  Post- \u2502 \u2502\n\u2502  \u2502  \u2022 Tables    \u2502    \u2502  \u2502   \u2022 Oracle connector    \u2502   \u2502    \u2502  greSQL\u2502 \u2502\n\u2502  \u2502              \u2502    \u2502  \u2502   \u2022 Log parsing         \u2502   \u2502    \u2502        \u2502 \u2502\n\u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2502  \u2502   \u2022 Event streaming     \u2502   \u2502    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n\u2502                      \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2502               \u2502\n\u2502                      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518               \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-why-debezium\">Why Debezium?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Log-based CDC<\/strong>: Reads database transaction logs, not polling 
tables\u2014much lower overhead<\/li>\n\n\n\n<li><strong>Low impact<\/strong>: Minimal performance hit on source database<\/li>\n\n\n\n<li><strong>Exactly-once delivery<\/strong>: When combined with Flink&#8217;s checkpointing<\/li>\n\n\n\n<li><strong>Schema tracking<\/strong>: Handles schema evolution in streaming scenarios<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-snapshot-vs-cdc-modes\">Snapshot vs. CDC Modes<\/h3>\n\n\n\n<p>When you configure a Flink CDC source, you can choose:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Snapshot Only<\/strong>: Read current table state (what we use in this demo)\u2014fastest for one-time migrations<\/li>\n\n\n\n<li><strong>Snapshot + CDC<\/strong>: Initial snapshot, then stream ongoing changes\u2014for zero-downtime migrations<\/li>\n\n\n\n<li><strong>CDC Only<\/strong>: Stream only new changes (requires existing snapshot)<br><br><em>Note: the snapshot itself can be done in one transaction (which can be long for big tables) or as an incremental snapshot.<\/em> <em>Since I am using Oracle Express Edition for this demo, I will stick with the normal snapshot.<\/em> <em>For big tables, Standard\/Enterprise Edition is required because incremental snapshots rely on supplemental logging.<\/em><\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-anatomy-of-a-flink-sql-pipeline\">Anatomy of a Flink SQL Pipeline<\/h2>\n\n\n\n<p>A Flink SQL migration pipeline has <strong>four distinct parts<\/strong>. Understanding each part is critical for troubleshooting and optimization.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-part-1-runtime-configuration-set-statements\">Part 1: Runtime Configuration (SET Statements)<\/h3>\n\n\n\n<p>These settings control how the Flink job executes. 
Think of them as the &#8220;knobs&#8221; you turn to tune behavior:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n-- Pipeline identification\nSET &#039;pipeline.name&#039; = &#039;Oracle-to-PostgreSQL: CUSTOMERS Migration&#039;;\n\n-- Runtime mode: STREAMING for CDC, BATCH for one-time loads\nSET &#039;execution.runtime-mode&#039; = &#039;STREAMING&#039;;\n\n-- Parallelism: how many workers process data concurrently\nSET &#039;parallelism.default&#039; = &#039;4&#039;;\n\n-- Checkpointing: how often Flink saves progress for recovery\nSET &#039;execution.checkpointing.mode&#039; = &#039;AT_LEAST_ONCE&#039;;\nSET &#039;execution.checkpointing.interval&#039; = &#039;60 s&#039;;\nSET &#039;execution.checkpointing.timeout&#039; = &#039;10 min&#039;;\nSET &#039;execution.checkpointing.min-pause&#039; = &#039;30 s&#039;;\n\n-- Restart strategy: what happens on failure\nSET &#039;restart-strategy.type&#039; = &#039;fixed-delay&#039;;\nSET &#039;restart-strategy.fixed-delay.attempts&#039; = &#039;3&#039;;\nSET &#039;restart-strategy.fixed-delay.delay&#039; = &#039;10 s&#039;;\n<\/pre><\/div>\n\n\n<p><strong>Key points:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>AT_LEAST_ONCE<\/code> is faster than <code>EXACTLY_ONCE<\/code> for snapshot migrations where idempotency is guaranteed by upserts<\/li>\n\n\n\n<li>Checkpoint interval affects both recovery granularity and overhead<\/li>\n\n\n\n<li>Higher parallelism isn&#8217;t always better\u2014you can hit contention on the target<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-part-2-source-table-definition-oracle-cdc\">Part 2: Source Table Definition (Oracle CDC)<\/h3>\n\n\n\n<p>This defines how Flink reads from Oracle. 
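<\/p>\n\n\n\n<p>Before writing the source definition, it can help to pull the column types straight from the Oracle data dictionary and map them to Flink SQL types by hand. A quick look-up, assuming the DEMO schema used in this post:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT column_name, data_type, data_precision, data_scale, nullable\nFROM all_tab_columns\nWHERE owner = &#039;DEMO&#039;\n  AND table_name = &#039;CUSTOMERS&#039;\nORDER BY column_id;\n<\/pre><\/div>\n\n\n<p>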
The column definitions must match your Oracle schema, using Flink SQL types:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nDROP TABLE IF EXISTS src_customers;\n\nCREATE TABLE src_customers (\n    -- Column definitions must match Oracle schema\n    -- Use Flink SQL types that map to Oracle types\n    CUSTOMER_ID   DECIMAL(10,0),\n    FIRST_NAME    STRING,\n    LAST_NAME     STRING,\n    EMAIL         STRING,\n    PHONE         STRING,\n    CREATED_AT    TIMESTAMP(6),\n    STATUS        STRING,\n    -- Primary key is required for CDC (NOT ENFORCED = Flink won&#039;t validate)\n    PRIMARY KEY (CUSTOMER_ID) NOT ENFORCED\n) WITH (\n    -- Connector type: oracle-cdc (uses Debezium internally)\n    &#039;connector&#039; = &#039;oracle-cdc&#039;,\n\n    -- Oracle connection details\n    &#039;hostname&#039; = &#039;oracle&#039;,\n    &#039;port&#039; = &#039;1521&#039;,\n    &#039;username&#039; = &#039;demo&#039;,\n    &#039;password&#039; = &#039;demo&#039;,\n\n    -- Database configuration (pluggable database for Oracle XE)\n    -- Use url to connect via service name instead of SID\n    &#039;url&#039; = &#039;jdbc:oracle:thin:@\/\/oracle:1521\/XEPDB1&#039;,\n    &#039;database-name&#039; = &#039;XEPDB1&#039;,\n    &#039;schema-name&#039; = &#039;DEMO&#039;,\n    &#039;table-name&#039; = &#039;CUSTOMERS&#039;,\n\n    -- Snapshot mode: &#039;initial&#039; = full snapshot, then stop (for snapshot-only)\n    &#039;scan.startup.mode&#039; = &#039;initial&#039;,\n\n    -- IMPORTANT: Disable incremental snapshot for this demo\n    -- Incremental snapshot requires additional Oracle configuration\n    &#039;scan.incremental.snapshot.enabled&#039; = &#039;false&#039;,\n\n    -- Debezium snapshot configuration\n    &#039;debezium.snapshot.mode&#039; = &#039;initial&#039;,\n    &#039;debezium.snapshot.fetch.size&#039; = &#039;10000&#039;\n);\n<\/pre><\/div>\n\n\n<p><strong>Key 
concepts:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>PRIMARY KEY NOT ENFORCED<\/strong>: Tells Flink the key exists but it won&#8217;t validate uniqueness<\/li>\n\n\n\n<li><strong><code>scan.incremental.snapshot.enabled<\/code><\/strong>: Set to <code>false<\/code> for simple snapshots; <code>true<\/code> requires Oracle archive log mode and supplemental logging<\/li>\n\n\n\n<li><strong><code>debezium.snapshot.fetch.size<\/code><\/strong>: How many rows to fetch per database round-trip\u2014larger = fewer round-trips<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-part-3-sink-table-definition-postgresql-jdbc\">Part 3: Sink Table Definition (PostgreSQL JDBC)<\/h3>\n\n\n\n<p>This defines how Flink writes to PostgreSQL:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nDROP TABLE IF EXISTS sink_customers;\n\nCREATE TABLE sink_customers (\n    -- Column definitions for PostgreSQL target\n    customer_id   BIGINT,\n    first_name    STRING,\n    last_name     STRING,\n    email         STRING,\n    phone         STRING,\n    created_at    TIMESTAMP(6),\n    status        STRING,\n    PRIMARY KEY (customer_id) NOT ENFORCED\n) WITH (\n    -- Connector type: jdbc\n    &#039;connector&#039; = &#039;jdbc&#039;,\n\n    -- PostgreSQL connection with batch optimization\n    -- rewriteBatchedInserts=true is CRITICAL for performance (5-10x improvement)\n    &#039;url&#039; = &#039;jdbc:postgresql:\/\/postgres:5432\/demo?rewriteBatchedInserts=true&#039;,\n    &#039;table-name&#039; = &#039;customers&#039;,\n    &#039;username&#039; = &#039;demo&#039;,\n    &#039;password&#039; = &#039;demo&#039;,\n    &#039;driver&#039; = &#039;org.postgresql.Driver&#039;,\n\n    -- Sink parallelism (tune based on target DB capacity)\n    -- Too high can cause contention; 4-8 is usually optimal\n    &#039;sink.parallelism&#039; = &#039;4&#039;,\n\n    -- Buffer configuration for throughput\n    
&#039;sink.buffer-flush.max-rows&#039; = &#039;10000&#039;,\n    &#039;sink.buffer-flush.interval&#039; = &#039;5 s&#039;,\n\n    -- Retry configuration\n    &#039;sink.max-retries&#039; = &#039;3&#039;\n);\n<\/pre><\/div>\n\n\n<p><strong>Key optimization: <code>rewriteBatchedInserts=true<\/code><\/strong> is critical for PostgreSQL performance. This tells the JDBC driver to rewrite individual INSERT statements into a single multi-row INSERT:<\/p>\n\n\n\n<p>Without this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nINSERT INTO t (a,b) VALUES (1,&#039;x&#039;);\nINSERT INTO t (a,b) VALUES (2,&#039;y&#039;);\nINSERT INTO t (a,b) VALUES (3,&#039;z&#039;);\n<\/pre><\/div>\n\n\n<p>With <code>rewriteBatchedInserts=true<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nINSERT INTO t (a,b) VALUES (1,&#039;x&#039;),(2,&#039;y&#039;),(3,&#039;z&#039;);\n<\/pre><\/div>\n\n\n<p>This single change gave us a <strong>5-10x throughput improvement<\/strong> in production.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-part-4-data-flow-insert-select\">Part 4: Data Flow (INSERT\u2026SELECT)<\/h3>\n\n\n\n<p>This starts the actual data migration. 
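<\/p>\n\n\n\n<p>A note on idempotency: because the sink table declares a primary key, the JDBC connector writes upserts rather than plain inserts, so rows replayed after a restart simply overwrite themselves. Conceptually, each batched write behaves like this (a sketch based on the demo table, not the connector&#8217;s exact generated statement):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nINSERT INTO customers (customer_id, first_name, last_name, email, phone, created_at, status)\nVALUES (1, &#039;FirstName1&#039;, &#039;LastName1&#039;, &#039;customer1@example.com&#039;, &#039;+1-555-0001&#039;, now(), &#039;ACTIVE&#039;)\nON CONFLICT (customer_id) DO UPDATE SET\n    first_name = EXCLUDED.first_name,\n    last_name  = EXCLUDED.last_name,\n    email      = EXCLUDED.email,\n    phone      = EXCLUDED.phone,\n    created_at = EXCLUDED.created_at,\n    status     = EXCLUDED.status;\n<\/pre><\/div>\n\n\n<p>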
The <code>CAST<\/code> operations convert Oracle types to PostgreSQL types:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nINSERT INTO sink_customers\nSELECT\n    CAST(CUSTOMER_ID AS BIGINT) AS customer_id,\n    FIRST_NAME AS first_name,\n    LAST_NAME AS last_name,\n    EMAIL AS email,\n    PHONE AS phone,\n    CREATED_AT AS created_at,\n    STATUS AS status\nFROM src_customers;\n<\/pre><\/div>\n\n\n<p>This single statement:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Reads from the Oracle source table<\/li>\n\n\n\n<li>Transforms data types (CAST operations)<\/li>\n\n\n\n<li>Writes to the PostgreSQL sink table<\/li>\n\n\n\n<li>Handles batching, parallelism, and fault tolerance automatically<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-complete-sql-example\">Complete SQL Example<\/h2>\n\n\n\n<p>Here is the complete migration pipeline that you can run in Flink SQL Client. 
This is production-ready code with all the optimizations we&#8217;ve discussed:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- =============================================================================\n-- Flink CDC Pipeline: Migrate CUSTOMERS table (Oracle -&gt; PostgreSQL)\n-- =============================================================================\n-- Mode: Snapshot-only (no incremental, no streaming)\n-- Source: Oracle XE 21c\n-- Target: PostgreSQL 18\n-- =============================================================================\n\n-- ============================================================================\n-- PART 1: Runtime Configuration (SET statements)\n-- ============================================================================\n-- These settings control how the Flink job executes\n\nSET &#039;pipeline.name&#039; = &#039;Oracle-to-PostgreSQL: CUSTOMERS Migration&#039;;\nSET &#039;execution.runtime-mode&#039; = &#039;STREAMING&#039;;\nSET &#039;parallelism.default&#039; = &#039;4&#039;;\n\n-- Checkpointing configuration\n-- AT_LEAST_ONCE is faster for snapshot\/migration workloads\nSET &#039;execution.checkpointing.mode&#039; = &#039;AT_LEAST_ONCE&#039;;\nSET &#039;execution.checkpointing.interval&#039; = &#039;60 s&#039;;\nSET &#039;execution.checkpointing.timeout&#039; = &#039;10 min&#039;;\nSET &#039;execution.checkpointing.min-pause&#039; = &#039;30 s&#039;;\n\n-- Restart strategy for fault tolerance\nSET &#039;restart-strategy.type&#039; = &#039;fixed-delay&#039;;\nSET &#039;restart-strategy.fixed-delay.attempts&#039; = &#039;3&#039;;\nSET &#039;restart-strategy.fixed-delay.delay&#039; = &#039;10 s&#039;;\n\n-- ============================================================================\n-- PART 2: Source Table Definition (Oracle CDC)\n-- ============================================================================\n-- This defines how Flink reads from Oracle 
using Debezium under the hood\n\nDROP TABLE IF EXISTS src_customers;\n\nCREATE TABLE src_customers (\n    -- Column definitions must match Oracle schema\n    -- Use Flink SQL types that map to Oracle types\n    CUSTOMER_ID   DECIMAL(10,0),\n    FIRST_NAME    STRING,\n    LAST_NAME     STRING,\n    EMAIL         STRING,\n    PHONE         STRING,\n    CREATED_AT    TIMESTAMP(6),\n    STATUS        STRING,\n    -- Primary key is required for CDC (NOT ENFORCED = Flink won&#039;t validate)\n    PRIMARY KEY (CUSTOMER_ID) NOT ENFORCED\n) WITH (\n    -- Connector type: oracle-cdc (uses Debezium internally)\n    &#039;connector&#039; = &#039;oracle-cdc&#039;,\n\n    -- Oracle connection details\n    &#039;hostname&#039; = &#039;oracle&#039;,\n    &#039;port&#039; = &#039;1521&#039;,\n    &#039;username&#039; = &#039;demo&#039;,\n    &#039;password&#039; = &#039;demo&#039;,\n\n    -- Database configuration (pluggable database for Oracle XE)\n    -- Use url to connect via service name instead of SID\n    &#039;url&#039; = &#039;jdbc:oracle:thin:@\/\/oracle:1521\/XEPDB1&#039;,\n    &#039;database-name&#039; = &#039;XEPDB1&#039;,\n    &#039;schema-name&#039; = &#039;DEMO&#039;,\n    &#039;table-name&#039; = &#039;CUSTOMERS&#039;,\n\n    -- Snapshot mode: &#039;initial&#039; = full snapshot, then stop (for snapshot-only)\n    &#039;scan.startup.mode&#039; = &#039;initial&#039;,\n\n    -- IMPORTANT: Disable incremental snapshot for this demo\n    -- Incremental snapshot requires additional Oracle configuration\n    &#039;scan.incremental.snapshot.enabled&#039; = &#039;false&#039;,\n\n    -- Debezium snapshot configuration\n    &#039;debezium.snapshot.fetch.size&#039; = &#039;10000&#039;\n);\n\n-- ============================================================================\n-- PART 3: Sink Table Definition (PostgreSQL JDBC)\n-- ============================================================================\n-- This defines how Flink writes to PostgreSQL\n\nDROP TABLE IF EXISTS 
sink_customers;\n\nCREATE TABLE sink_customers (\n    -- Column definitions for PostgreSQL target\n    customer_id   BIGINT,\n    first_name    STRING,\n    last_name     STRING,\n    email         STRING,\n    phone         STRING,\n    created_at    TIMESTAMP(6),\n    status        STRING,\n    PRIMARY KEY (customer_id) NOT ENFORCED\n) WITH (\n    -- Connector type: jdbc\n    &#039;connector&#039; = &#039;jdbc&#039;,\n\n    -- PostgreSQL connection with batch optimization\n    -- rewriteBatchedInserts=true is CRITICAL for performance (5-10x improvement)\n    &#039;url&#039; = &#039;jdbc:postgresql:\/\/postgres:5432\/demo?rewriteBatchedInserts=true&#039;,\n    &#039;table-name&#039; = &#039;customers&#039;,\n    &#039;username&#039; = &#039;demo&#039;,\n    &#039;password&#039; = &#039;demo&#039;,\n    &#039;driver&#039; = &#039;org.postgresql.Driver&#039;,\n\n    -- Sink parallelism (tune based on target DB capacity)\n    -- Too high can cause contention; 4-8 is usually optimal\n    &#039;sink.parallelism&#039; = &#039;4&#039;,\n\n    -- Buffer configuration for throughput\n    &#039;sink.buffer-flush.max-rows&#039; = &#039;10000&#039;,\n    &#039;sink.buffer-flush.interval&#039; = &#039;5 s&#039;,\n\n    -- Retry configuration\n    &#039;sink.max-retries&#039; = &#039;3&#039;\n);\n\n-- ============================================================================\n-- PART 4: Data Flow (INSERT...SELECT)\n-- ============================================================================\n-- This starts the actual data migration\n-- CAST operations convert Oracle types to PostgreSQL types\n\nINSERT INTO sink_customers\nSELECT\n    CAST(CUSTOMER_ID AS BIGINT) AS customer_id,\n    FIRST_NAME AS first_name,\n    LAST_NAME AS last_name,\n    EMAIL AS email,\n    PHONE AS phone,\n    CREATED_AT AS created_at,\n    STATUS AS status\nFROM src_customers;\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" 
id=\"h-lab-setup\">LAB Setup <\/h3>\n\n\n\n<p>In my LAB I am using PG18 and Oracle XE Docker containers, plus the Flink JobManager and TaskManager containers, with the following definition:<br> <br><strong>Create a <code>docker-compose.yml<\/code>:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: yaml; title: ; notranslate\" title=\"\">\nservices:\n  # Oracle Database 21c XE (Source)\n  oracle:\n    image: gvenzl\/oracle-xe:21-slim-faststart\n    container_name: oracle-demo\n    environment:\n      ORACLE_PASSWORD: OracleDemo123\n      APP_USER: demo\n      APP_USER_PASSWORD: demo\n    ports:\n      - &quot;1521:1521&quot;\n    volumes:\n      - oracle-data:\/opt\/oracle\/oradata\n      - .\/oracle-init:\/container-entrypoint-initdb.d\n    healthcheck:\n      test: &#x5B;&quot;CMD&quot;, &quot;healthcheck.sh&quot;]\n      interval: 30s\n      timeout: 10s\n      retries: 10\n      start_period: 120s\n    networks:\n      - flink-network\n\n  # PostgreSQL 18 (Target)\n  postgres:\n    image: postgres:18\n    container_name: postgres-demo\n    environment:\n      POSTGRES_USER: demo\n      POSTGRES_PASSWORD: demo\n      POSTGRES_DB: demo\n    ports:\n      - &quot;5432:5432&quot;\n    volumes:\n      - postgres-data:\/var\/lib\/postgresql\n      - .\/postgres-init:\/docker-entrypoint-initdb.d\n    healthcheck:\n      test: &#x5B;&quot;CMD-SHELL&quot;, &quot;pg_isready -U demo -d demo&quot;]\n      interval: 10s\n      timeout: 5s\n      retries: 5\n    networks:\n      - flink-network\n\n  # Flink Job Manager\n  flink-jobmanager:\n    build:\n      context: .\/flink\n      dockerfile: Dockerfile\n    container_name: flink-jobmanager\n    ports:\n      - &quot;8081:8081&quot;\n    command: jobmanager\n    environment:\n      - |\n        FLINK_PROPERTIES=\n        jobmanager.rpc.address: flink-jobmanager\n        jobmanager.memory.process.size: 1600m\n        parallelism.default: 4\n        state.backend.type: rocksdb\n    volumes:\n      
- .\/pipelines:\/opt\/flink\/pipelines\n    networks:\n      - flink-network\n    depends_on:\n      oracle:\n        condition: service_healthy\n      postgres:\n        condition: service_healthy\n\n  # Flink Task Manager\n  flink-taskmanager:\n    build:\n      context: .\/flink\n      dockerfile: Dockerfile\n    container_name: flink-taskmanager\n    command: taskmanager\n    environment:\n      - |\n        FLINK_PROPERTIES=\n        jobmanager.rpc.address: flink-jobmanager\n        taskmanager.memory.process.size: 2048m\n        taskmanager.numberOfTaskSlots: 8\n    networks:\n      - flink-network\n    depends_on:\n      - flink-jobmanager\n\nvolumes:\n  oracle-data:\n  postgres-data:\n\nnetworks:\n  flink-network:\n    driver: bridge\n<\/pre><\/div>\n\n\n<p><\/p>\n\n\n\n<p><strong>Create <code>flink\/Dockerfile<\/code>:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: yaml; title: ; notranslate\" title=\"\">\nFROM flink:1.20.3-scala_2.12-java11\n\n# Download Flink CDC connector for Oracle\nRUN wget -P \/opt\/flink\/lib\/ \\\n    https:\/\/repo1.maven.org\/maven2\/org\/apache\/flink\/flink-sql-connector-oracle-cdc\/3.5.0\/flink-sql-connector-oracle-cdc-3.5.0.jar\n\n# Download JDBC connector\nRUN wget -P \/opt\/flink\/lib\/ \\\n    https:\/\/repo1.maven.org\/maven2\/org\/apache\/flink\/flink-connector-jdbc\/3.2.0-1.19\/flink-connector-jdbc-3.2.0-1.19.jar\n\n# Download PostgreSQL JDBC driver\nRUN wget -P \/opt\/flink\/lib\/ \\\n    https:\/\/repo1.maven.org\/maven2\/org\/postgresql\/postgresql\/42.7.4\/postgresql-42.7.4.jar\n\n# Download Oracle JDBC driver\nRUN wget -P \/opt\/flink\/lib\/ \\\n    https:\/\/repo1.maven.org\/maven2\/com\/oracle\/database\/jdbc\/ojdbc11\/23.5.0.24.07\/ojdbc11-23.5.0.24.07.jar\n<\/pre><\/div>\n\n\n<p><\/p>\n\n\n\n<p>Access the Flink Web UI at: http:\/\/localhost:8081<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"h-running-the-migration\">Running the Migration<\/h2>\n\n\n\n<p>Let&#8217;s execute the actual migration with full command outputs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-1-verify-source-data-before-migration\">Step 1: Verify Source Data (Before Migration)<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n$ docker exec oracle-demo bash -c &quot;echo &#039;SELECT COUNT(*) FROM customers;&#039; | \\\n    sqlplus -s demo\/demo@\/\/localhost:1521\/XEPDB1&quot;\n\n  COUNT(*)\n----------\n     10000\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-2-verify-target-is-empty-before-migration\">Step 2: Verify Target is Empty (Before Migration)<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n$ docker exec postgres-demo psql -U demo -d demo -c &quot;SELECT COUNT(*) FROM customers;&quot;\n\n count\n-------\n     0\n(1 row)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-3-run-the-migration-pipeline\">Step 3: Run the Migration Pipeline<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n$ docker exec flink-jobmanager \/opt\/flink\/bin\/sql-client.sh \\\n    -f \/opt\/flink\/pipelines\/migrate-customers.sql\n\n&#x5B;INFO] Executing SQL from file.\nFlink SQL&gt; SET &#039;pipeline.name&#039; = &#039;Oracle-to-PostgreSQL: CUSTOMERS Migration&#039;;\n&#x5B;INFO] Execute statement succeeded.\n...\nFlink SQL&gt; INSERT INTO sink_customers SELECT ...\n&#x5B;INFO] Submitting SQL update statement to the cluster...\n&#x5B;INFO] SQL update statement has been successfully submitted to the cluster:\nJob ID: c554d99dce69b084607080502c13ffca\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-4-monitor-progress\">Step 4: Monitor Progress<\/h3>\n\n\n\n<p>Check the Flink Web UI at http:\/\/localhost:8081 or use the REST 
API:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n$ curl -s http:\/\/localhost:8081\/jobs | jq &#039;.jobs&#x5B;] | select(.status == &quot;RUNNING&quot; or .status == &quot;FINISHED&quot;)&#039;\n{\n  &quot;id&quot;: &quot;c554d99dce69b084607080502c13ffca&quot;,\n  &quot;status&quot;: &quot;FINISHED&quot;\n}\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-5-verify-migration-after\">Step 5: Verify Migration (After)<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n$ docker exec postgres-demo psql -U demo -d demo -c &quot;SELECT COUNT(*) FROM customers;&quot;\n\n count\n-------\n 10000\n(1 row)\n\n$ docker exec postgres-demo psql -U demo -d demo -c &quot;SELECT * FROM customers LIMIT 3;&quot;\n\n customer_id |  first_name   |  last_name   |          email           |    phone\n-------------+---------------+--------------+--------------------------+-------------\n        8836 | FirstName8836 | LastName8836 | customer8836@example.com | +1-555-8836\n        4740 | FirstName4740 | LastName4740 | customer4740@example.com | +1-555-4740\n        8835 | FirstName8835 | LastName8835 | customer8835@example.com | +1-555-8835\n(3 rows)\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-type-mapping-reference\">Type Mapping Reference<\/h2>\n\n\n\n<p>Before migrating data, you need to understand the type conversions. 
Here&#8217;s a reference:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Oracle<\/th><th>Flink SQL<\/th><th>PostgreSQL<\/th><th>Notes<\/th><\/tr><\/thead><tbody><tr><td>NUMBER(10)<\/td><td>DECIMAL(10,0)<\/td><td>BIGINT<\/td><td>Use CAST in SELECT<\/td><\/tr><tr><td>NUMBER(12,2)<\/td><td>DECIMAL(12,2)<\/td><td>NUMERIC(12,2)<\/td><td>Direct mapping<\/td><\/tr><tr><td>VARCHAR2(n)<\/td><td>STRING<\/td><td>VARCHAR(n)<\/td><td>Direct mapping<\/td><\/tr><tr><td>DATE<\/td><td>TIMESTAMP(6)<\/td><td>TIMESTAMP<\/td><td>Oracle DATE includes time<\/td><\/tr><tr><td>TIMESTAMP<\/td><td>TIMESTAMP(6)<\/td><td>TIMESTAMP<\/td><td>Direct mapping<\/td><\/tr><tr><td>CLOB<\/td><td>STRING<\/td><td>TEXT<\/td><td>Large text<\/td><\/tr><tr><td>BLOB<\/td><td>BYTES<\/td><td>BYTEA<\/td><td>Binary data<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-performance-optimization-what-we-learned\">Performance Optimization: What We Learned<\/h2>\n\n\n\n<p>Based on production experience, here are the key optimizations that improved throughput from <strong>300 rec\/sec to 19,500 rec\/sec<\/strong> (65x improvement).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-understanding-cpu-bound-vs-iops-bound-pipelines\">Understanding CPU-Bound vs. IOPS-Bound Pipelines<\/h3>\n\n\n\n<p>Before tuning, you need to understand what&#8217;s limiting your pipeline. 
This is critical because the solutions are different:<\/p>\n\n\n\n<p><strong>CPU-Bound Pipeline:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Symptoms: High CPU usage on Flink Task Manager, low disk I\/O on target database<\/li>\n\n\n\n<li>Cause: Complex transformations, serialization\/deserialization overhead, too few parallel workers<\/li>\n\n\n\n<li>Solution: Increase parallelism, simplify transformations, use more Task Manager slots<\/li>\n<\/ul>\n\n\n\n<p><strong>IOPS-Bound Pipeline:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Symptoms: Low CPU usage on Flink, high disk I\/O or lock contention on target database<\/li>\n\n\n\n<li>Cause: Too many small writes, target database bottleneck, excessive parallelism causing lock contention<\/li>\n\n\n\n<li>Solution: Larger batch sizes, <code>rewriteBatchedInserts=true<\/code>, reduce sink parallelism, tune target database<\/li>\n<\/ul>\n\n\n\n<p><strong>Network-Bound Pipeline:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Symptoms: High network wait times, gaps between source reads and sink writes<\/li>\n\n\n\n<li>Cause: Small fetch sizes, high latency between Flink and databases<\/li>\n\n\n\n<li>Solution: Larger fetch sizes, co-locate components when possible<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-identify-your-bottleneck\">How to Identify Your Bottleneck<\/h3>\n\n\n\n<p>In the Flink Web UI, look at:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Backpressure indicators<\/strong>: Red\/yellow backpressure on source = sink can&#8217;t keep up (IOPS-bound)<\/li>\n\n\n\n<li><strong>Records sent\/received<\/strong>: Compare source output rate vs. 
sink input rate<\/li>\n\n\n\n<li><strong>Checkpoint duration<\/strong>: Long checkpoints often indicate IOPS issues on state backend<\/li>\n\n\n\n<li><strong>Task Manager metrics<\/strong>: CPU%, memory usage, GC pauses<\/li>\n<\/ol>\n\n\n\n<p>On your databases:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Oracle: Check redo log generation rate\nSELECT * FROM V$SYSSTAT WHERE NAME LIKE &#039;%redo%&#039;;\n\n-- PostgreSQL: Check write activity\nSELECT * FROM pg_stat_bgwriter;\nSELECT * FROM pg_stat_database WHERE datname = &#039;demo&#039;;\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-critical-optimizations\">Critical Optimizations<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-1-jdbc-batch-rewriting-5-10x-improvement\">1. JDBC Batch Rewriting (5-10x Improvement)<\/h4>\n\n\n\n<p>The single most impactful optimization for IOPS-bound pipelines:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n&#039;url&#039; = &#039;jdbc:postgresql:\/\/host\/db?rewriteBatchedInserts=true&#039;\n<\/pre><\/div>\n\n\n<p>This is so important I&#8217;ll repeat it: <strong>this single parameter gave us 5-10x throughput improvement<\/strong>. Without it, every row is a separate INSERT statement. With it, rows are batched into efficient multi-row INSERTs.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-2-sink-parallelism-2-4x-improvement\">2.
Sink Parallelism (2-4x Improvement)<\/h4>\n\n\n\n<p>More workers can process more data\u2014but there&#8217;s a sweet spot:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n&#039;sink.parallelism&#039; = &#039;12&#039;\n<\/pre><\/div>\n\n\n<p>Our testing showed:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Parallelism<\/th><th>Throughput<\/th><th>Notes<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>5,000 rec\/sec<\/td><td>Baseline<\/td><\/tr><tr><td>4<\/td><td>12,000 rec\/sec<\/td><td>Good improvement<\/td><\/tr><tr><td>8<\/td><td>17,000 rec\/sec<\/td><td>Still scaling<\/td><\/tr><tr><td>12<\/td><td>19,500 rec\/sec<\/td><td>Sweet spot<\/td><\/tr><tr><td>16<\/td><td>18,000 rec\/sec<\/td><td>Contention starts<\/td><\/tr><tr><td>24<\/td><td>15,000 rec\/sec<\/td><td>Too much contention<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Why does too much parallelism hurt?<\/strong> Lock contention on the target database. Each parallel writer tries to acquire locks, and beyond a certain point, they spend more time waiting than writing.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-3-buffer-size-tuning\">3. Buffer Size Tuning<\/h4>\n\n\n\n<p>Larger buffers = fewer flushes = better throughput (at cost of memory and latency):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&#039;sink.buffer-flush.max-rows&#039; = &#039;50000&#039;\n&#039;sink.buffer-flush.interval&#039; = &#039;10 s&#039;\n<\/pre><\/div>\n\n\n<p>For IOPS-bound pipelines, larger buffers are critical. For CPU-bound pipelines, smaller buffers with higher parallelism may be better.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-4-source-fetch-size\">4. 
Source Fetch Size<\/h4>\n\n\n\n<p>Reduce round-trips to the source database:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- For JDBC connector:\n&#039;scan.fetch-size&#039; = &#039;20000&#039;\n\n-- For CDC connector:\n&#039;debezium.snapshot.fetch.size&#039; = &#039;20000&#039;\n<\/pre><\/div>\n\n\n<p>Larger fetch sizes reduce network overhead but increase memory usage. Find your balance based on available memory.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-5-checkpointing-mode\">5. Checkpointing Mode<\/h4>\n\n\n\n<p>For migrations (where exactly-once is less critical):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET &#039;execution.checkpointing.mode&#039; = &#039;AT_LEAST_ONCE&#039;;\n<\/pre><\/div>\n\n\n<p><code>AT_LEAST_ONCE<\/code> is faster than <code>EXACTLY_ONCE<\/code> because it doesn&#8217;t require barriers to align data across all parallel paths. Since our sink uses upserts (INSERT ON CONFLICT), duplicate processing is idempotent anyway.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-6-checkpoint-interval\">6. Checkpoint Interval<\/h4>\n\n\n\n<p>Longer intervals = less overhead, but longer recovery time on failure:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET &#039;execution.checkpointing.interval&#039; = &#039;60 s&#039;;\n<\/pre><\/div>\n\n\n<p>For our production migrations, 45-60 seconds was optimal. 
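<\/p>\n\n\n\n<p>Putting optimizations 1-6 together, a tuned pipeline might look like the following sketch. The table name, columns, and parameter values are illustrative (and credentials are omitted), not universal recommendations:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Session-level settings (see sections 5 and 6)\nSET &#039;execution.checkpointing.mode&#039; = &#039;AT_LEAST_ONCE&#039;;\nSET &#039;execution.checkpointing.interval&#039; = &#039;60 s&#039;;\n\n-- Sink with batch rewriting, parallelism, and buffer tuning applied\nCREATE TABLE sink_customers (\n    customer_id BIGINT,\n    email STRING,\n    PRIMARY KEY (customer_id) NOT ENFORCED\n) WITH (\n    &#039;connector&#039; = &#039;jdbc&#039;,\n    &#039;url&#039; = &#039;jdbc:postgresql:\/\/postgres:5432\/demo?rewriteBatchedInserts=true&#039;,\n    &#039;table-name&#039; = &#039;customers&#039;,\n    &#039;sink.parallelism&#039; = &#039;12&#039;,\n    &#039;sink.buffer-flush.max-rows&#039; = &#039;50000&#039;,\n    &#039;sink.buffer-flush.interval&#039; = &#039;10 s&#039;\n);\n<\/pre><\/div>\n\n\n<p>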
Shorter intervals caused excessive state backend I\/O (another IOPS consideration).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-performance-reference\">Performance Reference<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Setting<\/th><th>Baseline<\/th><th>Optimized<\/th><th>Impact<\/th><\/tr><\/thead><tbody><tr><td>rewriteBatchedInserts<\/td><td>false<\/td><td>true<\/td><td>5-10x<\/td><\/tr><tr><td>sink.parallelism<\/td><td>1<\/td><td>12<\/td><td>2-4x<\/td><\/tr><tr><td>buffer-flush.max-rows<\/td><td>1000<\/td><td>50000<\/td><td>1.5-2x<\/td><\/tr><tr><td>fetch-size<\/td><td>1000<\/td><td>20000<\/td><td>1.3-1.5x<\/td><\/tr><tr><td>checkpoint.mode<\/td><td>EXACTLY_ONCE<\/td><td>AT_LEAST_ONCE<\/td><td>1.2-1.3x<\/td><\/tr><tr><td><strong>Combined Throughput<\/strong><\/td><td>300 rec\/sec<\/td><td>19,500 rec\/sec<\/td><td><strong>65x<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-real-world-tuning-process\">Real-World Tuning Process<\/h3>\n\n\n\n<p>Here&#8217;s how I approach tuning a new migration:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Start with defaults<\/strong>: Run the pipeline and observe behavior in Flink UI<\/li>\n\n\n\n<li><strong>Identify the bottleneck<\/strong>: Is it CPU, IOPS, or network?<\/li>\n\n\n\n<li><strong>Apply the biggest lever first<\/strong>: Usually <code>rewriteBatchedInserts=true<\/code> for PostgreSQL<\/li>\n\n\n\n<li><strong>Increase parallelism gradually<\/strong>: Watch for the point where throughput stops improving<\/li>\n\n\n\n<li><strong>Tune batch sizes<\/strong>: Larger for IOPS-bound, smaller for CPU-bound<\/li>\n\n\n\n<li><strong>Monitor the target database<\/strong>: Watch for lock contention, checkpoint lag, WAL accumulation<\/li>\n\n\n\n<li><strong>Document your findings<\/strong>: Each environment is different; what works for one may not work for another<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-incremental-snapshot-for-large-databases\">Incremental Snapshot for Large Databases<\/h2>\n\n\n\n<p>For databases larger than ~100GB, incremental snapshot mode is essential. Instead of reading entire tables at once (which can cause locks and memory issues), incremental snapshot divides tables into chunks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-what-is-incremental-snapshot\">What is Incremental Snapshot?<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502                   Incremental Snapshot                          \u2502\n\u2502                                                                 \u2502\n\u2502  Table (1M rows, chunked by ID):                                \u2502\n\u2502                                                                 \u2502\n\u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510                      \u2502\n\u2502  \u2502 Chunk \u2502 Chunk \u2502 Chunk \u2502 Chunk \u2502 Chunk \u2502                      \u2502\n\u2502  \u2502  1    \u2502  2    \u2502  3    \u2502  4    \u2502  5    \u2502  ...                 
\u2502\n\u2502  \u2502 1-200K\u2502200K-  \u2502400K-  \u2502600K-  \u2502800K-  \u2502                      \u2502\n\u2502  \u2502       \u2502 400K  \u2502 600K  \u2502 800K  \u2502 1M    \u2502                      \u2502\n\u2502  \u2514\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2518                      \u2502\n\u2502      \u2502       \u2502       \u2502       \u2502                                  \u2502\n\u2502      \u25bc       \u25bc       \u25bc       \u25bc                                  \u2502\n\u2502   Process in parallel, no table locks                           \u2502\n\u2502                                                                 \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-oracle-requirements\">Oracle Requirements<\/h3>\n\n\n\n<p>Incremental snapshot with CDC requires additional Oracle configuration:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Archive Log Mode<\/strong>: Must be enabled:\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Check current mode\nSELECT LOG_MODE FROM V$DATABASE;\n\n-- Enable (requires DB restart)\nSHUTDOWN IMMEDIATE;\nSTARTUP MOUNT;\nALTER DATABASE ARCHIVELOG;\nALTER DATABASE OPEN;\n<\/pre><\/div><\/li>\n\n\n\n<li><strong>Supplemental Logging<\/strong>:\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nALTER DATABASE ADD SUPPLEMENTAL LOG DATA;\nALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;\n<\/pre><\/div><\/li>\n\n\n\n<li><strong>LogMiner Privileges<\/strong> for the CDC user:
<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nGRANT SELECT ON V_$DATABASE TO cdc_user;\nGRANT SELECT ON V_$LOG TO cdc_user;\nGRANT SELECT ON V_$LOGFILE TO cdc_user;\nGRANT SELECT ON V_$ARCHIVED_LOG TO cdc_user;\nGRANT EXECUTE ON DBMS_LOGMNR TO cdc_user;\nGRANT EXECUTE ON DBMS_LOGMNR_D TO cdc_user;\nGRANT SELECT ON V_$LOGMNR_LOGS TO cdc_user;\nGRANT SELECT ON V_$LOGMNR_CONTENTS TO cdc_user;\nGRANT FLASHBACK ANY TABLE TO cdc_user;\n<\/pre><\/div><\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-pipeline-configuration\">Pipeline Configuration<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE TABLE src_large_table (...) WITH (\n    &#039;connector&#039; = &#039;oracle-cdc&#039;,\n    &#039;url&#039; = &#039;jdbc:oracle:thin:@\/\/oracle:1521\/XEPDB1&#039;,\n    &#039;database-name&#039; = &#039;XEPDB1&#039;,\n    &#039;schema-name&#039; = &#039;DEMO&#039;,\n    &#039;table-name&#039; = &#039;LARGE_TABLE&#039;,\n\n    -- Enable incremental snapshot\n    &#039;scan.incremental.snapshot.enabled&#039; = &#039;true&#039;,\n    &#039;scan.incremental.snapshot.chunk.size&#039; = &#039;100000&#039;,\n    &#039;scan.incremental.snapshot.chunk.key-column&#039; = &#039;ID&#039;,\n\n    -- Debezium settings\n    &#039;debezium.snapshot.fetch.size&#039; = &#039;20000&#039;\n);\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-when-to-use-incremental-snapshot\">When to Use Incremental Snapshot<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Database Size<\/th><th>Recommendation<\/th><\/tr><\/thead><tbody><tr><td>&lt; 10 GB<\/td><td>Standard snapshot (JDBC)<\/td><\/tr><tr><td>10-100 GB<\/td><td>Either approach works<\/td><\/tr><tr><td>&gt; 100 GB<\/td><td>Incremental snapshot required<\/td><\/tr><tr><td>Active production DB<\/td><td>Incremental snapshot recommended<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2
class=\"wp-block-heading\" id=\"h-production-implementation-advice\">Production Implementation Advice<\/h2>\n\n\n\n<p>Before taking this approach to production, there are several considerations to keep in mind. First, this lab setup runs Flink in standalone mode which is fine for testing but lacks persistence\u2014if you restart the Flink processes, your pipelines are lost. For production, you&#8217;ll want to deploy on Kubernetes using the official Flink Kubernetes Operator, which provides proper state management, automatic recovery, and horizontal scaling. Second, pay close attention to version compatibility because not all latest versions of Flink, CDC connectors, and JDBC drivers work together\u2014I learned this the hard way, so check the compatibility matrix before building your stack and stick with LTS versions like Flink 1.20 when possible. Third, externalize your checkpoints to durable storage like S3, MinIO, or HDFS rather than local filesystem, as this enables true fault tolerance and job recovery across restarts. Fourth, implement proper monitoring by connecting Flink&#8217;s metrics to Prometheus and Grafana, setting up alerts for checkpoint failures, backpressure, and throughput drops\u2014the Web UI is great for debugging but not for 24\/7 operations. Fifth, secure your connections by using SSL\/TLS for database connections, storing credentials in a secrets manager rather than plain text in SQL files, and implementing network segmentation between Flink and your databases. Finally, if your organization allows it, seriously consider managed services like AWS Managed Flink, Confluent Cloud, or Azure Stream Analytics, which eliminate most of the operational burden of running Flink clusters yourself. 
The official documentation provides comprehensive guidance for production deployments: <a href=\"https:\/\/nightlies.apache.org\/flink\/flink-cdc-docs-stable\/docs\/get-started\/introduction\/\">Apache Flink CDC Introduction<\/a>.<br><br>For example, in a migration project for an Oracle database of 800GB, around 1,500 tables, and 4.8 billion rows, the VM that hosted the Flink services had 16 cores and 48GB of RAM. The initial incremental snapshot ran for 3.5 days at a throughput of 18,000 records\/sec and 15k IOPS. Several pieces of automation had to be built, such as generating the pipelines for all tables and sequencing the transition from the initial load to the streaming phase while keeping the CPU cores busy.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-we-ve-learned\">What We&#8217;ve Learned<\/h2>\n\n\n\n<p>Through this guide, we&#8217;ve explored database migration with Flink CDC and learned several important lessons.<\/p>\n\n\n\n<p><strong>On the technical side:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start simple: use snapshot mode first and add complexity like incremental or streaming CDC only when needed. Don&#8217;t overengineer a one-time migration.<\/li>\n\n\n\n<li>Understand your bottleneck: the tuning strategy differs completely depending on whether your pipeline is CPU-bound, IOPS-bound, or network-bound.<\/li>\n\n\n\n<li>The <code>rewriteBatchedInserts=true<\/code> parameter is magic for PostgreSQL, giving us a 5-10x improvement with a single setting.<\/li>\n\n\n\n<li>Parallelism has a sweet spot where more isn&#8217;t always better: we found 12 workers optimal before lock contention started hurting performance.<\/li>\n\n\n\n<li>Checkpointing is a trade-off between throughput and recovery time, with 45-60 seconds being optimal for migrations.<\/li>\n\n\n\n<li>Type mapping matters: incorrect Oracle \u2192 Flink \u2192 PostgreSQL conversions cause silent data corruption.<\/li>\n<\/ul>\n\n\n\n<p><strong>Operationally:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitor everything, using the Flink Web UI alongside source and target database metrics.<\/li>\n\n\n\n<li>Test thoroughly on a test environment first: production surprises are expensive.<\/li>\n\n\n\n<li>Have a rollback plan by keeping the source database running until cutover is verified.<\/li>\n\n\n\n<li>Document your tuning: each environment is different, and what works for one may not work for another.<\/li>\n<\/ul>\n\n\n\n<p><strong>Strategically:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Know when NOT to use Flink: simpler tools are better for small databases or same-technology migrations.<\/li>\n\n\n\n<li>Factor in the operational complexity of maintaining another system.<\/li>\n\n\n\n<li>Consider cloud-managed Flink\/CDC solutions if your organization allows it.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h2>\n\n\n\n<p>Flink CDC transforms database migrations from anxious &#8220;big bang&#8221; events into controlled, observable, and recoverable processes by combining real-time monitoring in the Flink Web UI, fault tolerance through checkpointing, configurable parallelism for performance, and transform capabilities in Flink SQL, making it a powerful tool for cross-technology migrations. We achieved a 65x throughput improvement (300 \u2192 19,500 rec\/sec) by understanding our bottlenecks and applying targeted optimizations; the key insight is to identify whether you&#8217;re CPU-bound or IOPS-bound and tune accordingly.
As with any tool, use it where it fits: for large, cross-technology migrations with near zero-downtime requirements, Flink CDC is excellent, but for small databases or simple same-technology copies, stick with native tools.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-resources\">Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/flink.apache.org\/docs\/\">Apache Flink Documentation<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/apache\/flink-cdc\">Flink CDC Connectors<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/debezium.io\/documentation\/reference\/connectors\/oracle.html\">Debezium Oracle Connector<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/ora2pg.darold.net\/\">ora2pg Tool<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/jdbc.postgresql.org\/\">PostgreSQL JDBC Driver<\/a><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction When wanting to migrate from the big red to PostgreSQL, most of the time you can afford the downtime of the export\/import process and starting from something fresh. It is simple and reliable. Ora2pg being one of the go-to tools for that. 
But sometimes, you can afford the downtime, either because the database is [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":41707,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[59,83],"tags":[614,3667,2562,96,2602],"type_dbi":[2749],"class_list":["post-41688","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oracle","category-postgresql","tag-cdc","tag-flink","tag-migration-2","tag-oracle","tag-postgresql-2","type-postgresql"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Oracle to PostgreSQL Migration with Flink CDC - dbi Blog<\/title>\n<meta name=\"description\" content=\"Migrate Oracle to PostgreSQL with Apache Flink CDC. Complete guide with SQL examples, Docker setup, and performance tuning tips that achieved 65x throughput gains.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Oracle to PostgreSQL Migration with Flink CDC\" \/>\n<meta property=\"og:description\" content=\"Learn how to migrate an Oracle database to PostgreSQL using Apache Flink CDC with this complete step-by-step guide. We cover the full pipeline setup from source to sink, explain how Flink CDC enables cross-database logical replication through transaction log parsing, and share production-tested performance optimizations that achieved 65x throughput improvement (300 to 19,500 records per second). 
Whether you need a one-time snapshot migration or continuous zero-downtime replication, this tutorial provides the SQL examples, Docker environment, and tuning strategies to get you started.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-30T22:04:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-30T22:04:56+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1494\" \/>\n\t<meta property=\"og:image:height\" content=\"727\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Adrien Obernesser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrien Obernesser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\"},\"author\":{\"name\":\"Adrien Obernesser\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"headline\":\"Oracle to PostgreSQL Migration with Flink CDC\",\"datePublished\":\"2025-11-30T22:04:54+00:00\",\"dateModified\":\"2025-11-30T22:04:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\"},\"wordCount\":2351,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp\",\"keywords\":[\"cdc\",\"Flink\",\"migration\",\"Oracle\",\"postgresql\"],\"articleSection\":[\"Oracle\",\"PostgreSQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\",\"name\":\"Oracle to PostgreSQL Migration with Flink CDC - dbi 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp\",\"datePublished\":\"2025-11-30T22:04:54+00:00\",\"dateModified\":\"2025-11-30T22:04:56+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"description\":\"Migrate Oracle to PostgreSQL with Apache Flink CDC. Complete guide with SQL examples, Docker setup, and performance tuning tips that achieved 65x throughput gains.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp\",\"contentUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp\",\"width\":1494,\"height\":727},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Oracle to PostgreSQL Migration with Flink 
CDC\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\",\"name\":\"Adrien Obernesser\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"caption\":\"Adrien Obernesser\"},\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Oracle to PostgreSQL Migration with Flink CDC - dbi Blog","description":"Migrate Oracle to PostgreSQL with Apache Flink CDC. 
Complete guide with SQL examples, Docker setup, and performance tuning tips that achieved 65x throughput gains.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/","og_locale":"en_US","og_type":"article","og_title":"Oracle to PostgreSQL Migration with Flink CDC","og_description":"Learn how to migrate an Oracle database to PostgreSQL using Apache Flink CDC with this complete step-by-step guide. We cover the full pipeline setup from source to sink, explain how Flink CDC enables cross-database logical replication through transaction log parsing, and share production-tested performance optimizations that achieved 65x throughput improvement (300 to 19,500 records per second). Whether you need a one-time snapshot migration or continuous zero-downtime replication, this tutorial provides the SQL examples, Docker environment, and tuning strategies to get you started.","og_url":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/","og_site_name":"dbi Blog","article_published_time":"2025-11-30T22:04:54+00:00","article_modified_time":"2025-11-30T22:04:56+00:00","og_image":[{"width":1494,"height":727,"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp","type":"image\/webp"}],"author":"Adrien Obernesser","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adrien Obernesser","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/"},"author":{"name":"Adrien Obernesser","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"headline":"Oracle to PostgreSQL Migration with Flink CDC","datePublished":"2025-11-30T22:04:54+00:00","dateModified":"2025-11-30T22:04:56+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/"},"wordCount":2351,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp","keywords":["cdc","Flink","migration","Oracle","postgresql"],"articleSection":["Oracle","PostgreSQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/","url":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/","name":"Oracle to PostgreSQL Migration with Flink CDC - dbi 
Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp","datePublished":"2025-11-30T22:04:54+00:00","dateModified":"2025-11-30T22:04:56+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"description":"Migrate Oracle to PostgreSQL with Apache Flink CDC. Complete guide with SQL examples, Docker setup, and performance tuning tips that achieved 65x throughput gains.","breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/11\/Screenshot-2023-09-10-125939.webp","width":1494,"height":727},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Oracle to PostgreSQL Migration with Flink 
CDC"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd","name":"Adrien Obernesser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","caption":"Adrien 
Obernesser"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/41688","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/153"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=41688"}],"version-history":[{"count":19,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/41688\/revisions"}],"predecessor-version":[{"id":41708,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/41688\/revisions\/41708"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media\/41707"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=41688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=41688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=41688"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=41688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}