GoldenGate is a powerful replication tool, but it is so versatile that you should always check whether what you intend to do is supported or not. Oracle provides a few interesting scripts and views to check your environments before attempting a replication with GoldenGate, and I wanted to write about the DBA_GOLDENGATE_NOT_UNIQUE view.

Before explaining what this view contains and how to interpret it, you should remember that GoldenGate replicates data by replaying changes that occurred on the source database. One of the most important problems GoldenGate must solve is to identify the exact row that must be updated or deleted on the target database. This means that GoldenGate needs a way to uniquely identify every row in a table.

The simplest solution is a primary key, but GoldenGate row identification can also be done through a non-null unique index. When neither exists, GoldenGate must fall back to less efficient mechanisms. This means using all available columns to identify a row.

The DBA_GOLDENGATE_NOT_UNIQUE view contains tables that have no primary key and no non-null unique index. It is there to help you identify potential issues in your database schema and prepare in the best possible way a migration with GoldenGate or any replication work.

Here is an example of the content of the view:

OWNER			       TABLE_NAME		      BAD_COLUMN
------------------------------ ------------------------------ ----------
APP_PDB1		       TEST_NOPK		      N
APP_PDB1		       TEST_CLOB		      Y

The view is very simple, yet very often misunderstood. The owner and table_name column names are self-explanatory, but let’s talk about bad_column. This column has two possible values: Y and N.

First case – bad_column = 'N'

N is what you will most often see in this view. You should treat these as warnings. If you intend to replicate a table from this list, you can choose to ignore the warning if you don’t care about replication performance.

Tables in this list will be replicated using all columns as a substitute key. GoldenGate can usually handle these tables without any functional issue. However, UPDATE and DELETE operations become more expensive because every column must be used to identify the target row.

From my experience, you should still always discuss with the application owner or someone who knows the underlying database schema to try to avoid performance issues. A list of solutions is given at the end of the blog.

Second case – bad_column = 'Y'

If there is one point to remember from this blog, it’s this: you should never replicate data from tables with bad_column='Y' without thorough investigation.

Tables listed here contain one or more columns that GoldenGate cannot safely use as part of a substitute key, such as LONG or CLOB.

Since GoldenGate cannot build a complete substitute key from all columns, row identification becomes problematic. Depending on the table structure and replication configuration, GoldenGate may be unable to correctly locate rows for UPDATE or DELETE operations.

In practice, a table with bad_column='Y' should be considered a candidate for schema remediation before replication (see the list of options below).

Will there be an error if GoldenGate cannot identify uniqueness ?

This is a very common misconception when talking about these primary key issues. GoldenGate does not verify that the columns used for row identification are truly unique. If duplicate rows exist, a replicat may update or delete an unintended row. In many cases no explicit GoldenGate error will be generated because, from GoldenGate’s perspective, a matching row was found. It could silently delete the wrong row, and you will never hear a word about it. In the logs, you won’t see the difference.

What can be done to solve these issues ?

Essentially, there are four possible approaches:

  • If you have the ability to create a primary key or a non-null unique index on the source database, go for it before the initial load of your target database. This is the safest approach because uniqueness is enforced by the database itself.
  • If you cannot alter the source schema but are 100% sure that a subset of columns that already exists in the table can uniquely identify every row, you can use the KEYCOLS keyword in your parameter files.
  • Sometimes, the application design prevents duplicates from being generated. Again, you need to be sure about this. But sometimes, it means that you could rely on the application design.
  • When bad_column is N, you could decide to not do anything. For tables with a lot of activity, it could slow down your replication. During a migration, it could even completely block it in production. This should only be done in last resort.

A common misconception is that every table listed in DBA_GOLDENGATE_NOT_UNIQUE is unsupported by GoldenGate. This is not the case. At the other end of the spectrum, I’ve heard countless of times that these warnings can be ignored. Do that, and you will have nightmares solving issues once you hit production. The view merely highlights tables for which GoldenGate cannot rely on a primary key or a non-null unique index for row identification. The real concern is the bad_column flag: N is a performance warning, while Y deserves immediate investigation before any replication project.