By Franck Pachot

Comparisons with NULL can be misleading, and it’s even worse for unique constraint validation. Having partial nulls in a composite key can be tricky because the ANSI SQL specification is not very easy to understand, and the implementation can depend on the RDBMS. Here is an example with a composite unique key and a foreign key on Oracle.

Unique constraint

I create a table with a composite unique constraint:

SQL> create table TABLE1 (a char, b char, unique(a,b));
Table TABLE1 created.

I can insert a row with a=’X’ and b=’X’:

SQL> insert into TABLE1 values ('X','X');
1 row inserted.

I cannot insert the same row:

SQL> insert into TABLE1 values ('X','X');
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C0015464) violated

I insert another row with the same value for column a but a different value for column b:

SQL> insert into TABLE1 values ('X','Y');
1 row inserted.

And another row with the same value for column a but a null for column b:

SQL> insert into TABLE1 values ('X',null);
1 row inserted.

However, I cannot insert the same row a second time:

SQL> insert into TABLE1 values ('X',null);
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C0015464) violated

The Oracle documentation describes this as:
Because of the search mechanism for unique key constraints on multiple columns, you cannot have identical values in the non-null columns of a partially null composite unique key constraint.

This looks like an implementation reason (the search mechanism is the index that enforces the unique constraint). What does SQL-92 say?
A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.

How to interpret this? We cannot insert two (‘X’,null) rows because that would be two rows with the same non-null value (a=’X’), so the Oracle implementation is compliant.

Or is it? We can also read the definition as saying that the unique constraint is violated only when rows have non-null values in all the unique columns and those values are the same. This is what MySQL and PostgreSQL do: they accept duplicates when there is at least one null.
This is also what I find more intuitive: I usually consider NULL as a value that is not known at insert time but that will be assigned later during the lifecycle of the row. Thus, I expect to be able to insert rows that contain a null, and to have the constraint checked only when all columns have a value.
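
If you want that second reading on Oracle, one possible workaround is a unique function-based index instead of a unique constraint. This is only a sketch under assumptions: TABLE3 and TABLE3_UK are hypothetical names that are not part of the example above, and the comments describe the outcome I would expect rather than a captured output.

```sql
-- TABLE3 and TABLE3_UK are hypothetical names used for illustration only
create table TABLE3 (a char, b char);

-- Both expressions evaluate to null unless a and b are both set.
-- Entirely null index entries are not stored, so only complete keys
-- are checked for uniqueness.
create unique index TABLE3_UK on TABLE3 (
  case when a is not null and b is not null then a end,
  case when a is not null and b is not null then b end
);

insert into TABLE3 values ('X',null);
insert into TABLE3 values ('X',null);  -- expected to be accepted: not indexed
insert into TABLE3 values ('X','X');
insert into TABLE3 values ('X','X');   -- expected to fail with ORA-00001
```

The trade-off is that this is an index, not a constraint: a foreign key cannot reference (a,b) of such a table because there is no unique or primary key constraint on those columns.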

It is probably an implementation choice by Oracle, which stores nulls as zero-length strings and therefore cannot have two identical entries in a unique index.

Now inserting a row where a is null and b is null:

SQL> insert into TABLE1 values (null,null);
1 row inserted.

And because that does not violate the rule, whichever way we read it (the non-null values are not the same because there are no non-null values at all here), I can insert a second one:

SQL> insert into TABLE1 values (null,null);
1 row inserted.

This is documented as:
Unless a NOT NULL constraint is also defined, a null always satisfies a unique key constraint

As for the implementation, there is no problem because entirely null entries are not stored in the B-tree index. They are stored in bitmap indexes, but bitmap indexes cannot be used to enforce a unique constraint.

In summary, here is what can be stored in a table where (a,b) is unique but nullable:

SQL> select rownum,TABLE1.* from TABLE1;
 
    ROWNUM A B
---------- - -
         1 X X
         2 X Y
         3 X  
         4    
         5   

Foreign key

Now that I have a unique key, I can reference it:

SQL> create table TABLE2 (a char, b char, foreign key(a,b) references TABLE1(a,b));
Table TABLE2 created.

Yes. You don’t need to reference the primary key. Any unique key, even with nullable columns, can be referenced.

I can insert a row whose parent exists:

SQL> insert into TABLE2 values('X','X');
1 row inserted.

As I have no unique key on the child, it’s a many-to-one relationship:

SQL> insert into TABLE2 values('X','X');
1 row inserted.

I also have a parent with a=’X’ and b=’Y’:

SQL> insert into TABLE2 values('X','Y');
1 row inserted.

But I’ve no parent with a=’Y’:

SQL> insert into TABLE2 values('Y','Y');
*
ERROR at line 1:
ORA-02291: integrity constraint (SYS.SYS_C0015465) violated - parent key not found

So far so good. I said that I have a many-to-one relationship, but it’s actually many-to-one-or-zero because my columns are nullable:

SQL> insert into TABLE2 values(null,null);
1 row inserted.

Nothing surprising yet. But I have a composite key with nullable columns here, and I can insert a row where a=’X’ and b is null:

SQL> insert into TABLE2 values('X',null);
1 row inserted.

But do you think that all non-null parent values must exist?

SQL> insert into TABLE2 values('Y',null);
1 row inserted.

Once again, this is documented as:
If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.

And this is what is specified in SQL-92:
If no <match type> was specified then, for each row R1 of the referencing table, either at least one of the values of the referencing columns in R1 shall be a null value, or the value of each referencing column in R1 shall be equal to the value of the corresponding referenced column in some row of the referenced table.

More details about the other match types can be found in the Oracle Development Guide.

That may look strange but, still thinking about nulls as unknown values, you can consider that constraints cannot be validated until we know all the values.

Here is what I was able to insert into my table even with no a=’Y’ in the parent:

SQL> select rownum,TABLE2.* from TABLE2;
 
    ROWNUM A B
---------- - -
         1 X X
         2 X X
         3 X Y
         4 X  
         5    
         6 Y  
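
To find the child rows that slipped through this way, a query along the following lines can help. It is a sketch, not something from the documentation: it lists the partially null rows of TABLE2 whose non-null columns match no row of TABLE1, which is roughly the check a partial match would perform. With the rows inserted above, I would expect it to return only the (‘Y’,null) row.

```sql
-- Partially null child rows whose non-null portion has no matching parent
select t2.a, t2.b
  from TABLE2 t2
 where (t2.a is null or t2.b is null)           -- at least one null...
   and not (t2.a is null and t2.b is null)      -- ...but not entirely null
   and not exists (
         select 1
           from TABLE1 t1
          where (t2.a is null or t1.a = t2.a)   -- compare only the
            and (t2.b is null or t1.b = t2.b)   -- non-null child columns
       );
```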

So what?

Having nulls in a composite unique key or foreign key can be misleading, so it’s better to ensure that what you define fits what you expect. It’s probably better to prevent partial nulls in a foreign key (a check constraint can ensure that if one column is null then all columns must be null) or to have an additional referential integrity constraint which ensures that you can set only the allowed values for a subset of columns (in our case, a table with column a as primary key that we can reference).
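
Both options can be sketched as follows. The constraint names and the TABLE0 table are hypothetical, chosen only for this illustration, and the statements assume the TABLE1 and TABLE2 contents shown above.

```sql
-- Option 1: all-or-nothing nulls in the composite foreign key.
-- Existing partially null rows must be removed (or fixed) first,
-- otherwise the new constraint cannot be validated.
delete from TABLE2
 where (a is null and b is not null)
    or (a is not null and b is null);

alter table TABLE2 add constraint TABLE2_AB_ALL_OR_NONE   -- hypothetical name
  check ( (a is null and b is null) or (a is not null and b is not null) );

-- Option 2: an additional parent table for column a alone.
-- TABLE0 is a hypothetical table holding the allowed values of a.
create table TABLE0 (a char primary key);
insert into TABLE0 select distinct a from TABLE1 where a is not null;

-- A non-null a must now exist in TABLE0, even when b is null.
-- If used without option 1, a row like ('Y',null) must be removed first.
alter table TABLE2 add constraint TABLE2_A_FK              -- hypothetical name
  foreign key (a) references TABLE0 (a);
```

With option 1, (‘X’,null) and (‘Y’,null) are both rejected in TABLE2. With option 2 alone, (‘X’,null) is still accepted but (‘Y’,null) is not, because ‘Y’ is not an allowed value for a.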