A customer, who is on 19c with his Oracle databases asked me recently why he still has Height Balanced Histograms in his database? E.g.

SQL> select histogram, count(*) from dba_tab_columns where histogram <> 'NONE' group by histogram order by 2;

HISTOGRAM         COUNT(*)
--------------- ----------
TOP-FREQUENCY            4
HEIGHT BALANCED          5
HYBRID                  39
FREQUENCY              492

SQL> 

In 12.1. Oracle introduced Top Frequency and Hybrid Histograms, which should replace Height Balanced histograms. There are 2 main reasons why Height Balanced histograms may still be there:

1. Top Frequency and/or Hybrid histograms are disabled

By setting the preference ENABLE_HYBRID_HISTOGRAMS and ENABLE_TOP_FREQ_HISTOGRAMS to 0 (globally or on table level) you can disable the new histogram types. The default value is 3 and enables the 2 new histogram types:

SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> 

2. When gathering statistics and using a non-default value for ESTIMATE_PERCENT (default is DBMS_STATS.AUTO_SAMPLE_SIZE) then Height Balanced histograms will be used instead of the new histogram types.

The question is on how to find out what caused HEIGHT BALANCED histograms to be created?

First let’s check what table-columns have Height Balanced histograms and when they’ve been created:

SQL> select table_name, column_name, last_analyzed from dba_tab_columns where histogram='HEIGHT BALANCED';

TABLE_NAME                       COLUMN_NAME                      LAST_ANALYZED
-------------------------------- -------------------------------- -------------------
T1                               TIMESTAMP                        27.07.2022 13:11:31
T1                               LAST_DDL_TIME                    27.07.2022 13:11:31
T1                               CREATED                          27.07.2022 13:11:31
T1                               OBJECT_ID                        27.07.2022 13:11:31
T1                               OBJECT_NAME                      27.07.2022 13:11:31

SQL> 

Ok, I do only have a test table T1 (copy of ALL_OBJECTS) where 5 columns have a HEIGHT BALANCED histogram. What caused them to be created?

Are the new histogram types disabled?

SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> 

REMARK: Alternatively you may also run this query to get the global preferences:

SQL> select sname, nvl(to_char(sval1),spare4) value
  2  from sys.optstat_hist_control$
  3  where sname like '%HISTOGRAMS';

SNAME                          VALUE
------------------------------ --------------------------------
ENABLE_TOP_FREQ_HISTOGRAMS     3
ENABLE_HYBRID_HISTOGRAMS       3

SQL> 

So globally the new histogram types are enabled and I could also check if something specific has been set on table-level:

SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS',ownname=>'CBLEILE',tabname=>'T1') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS',ownname=>'CBLEILE',tabname=>'T1') hist_enabled from dual;

HIST_ENABLED
------------
3

SQL> 

I.e. the new histogram types are enabled. So that’s not the reason I still have height balanced histograms. To find out how statistics were gathered on table T1 the NOTES column on DBA_OPTSTAT_OPERATIONS is very useful. I split this into 2 SQL-statements to improve the readability:

SQL> select target, end_time, operation
  2  from dba_optstat_operations
  3  where end_time between to_date('27-JUL-2022 13:11:00','dd-mon-yyyy hh24:mi:ss')
  4  and to_date('27-JUL-2022 13:12:00','dd-mon-yyyy hh24:mi:ss');

TARGET          END_TIME                            OPERATION
--------------- ----------------------------------- ------------------------
"CBLEILE"."T1"  27-JUL-22 01.11.31.849682 PM +01:00 gather_table_stats

SQL> select notes
  2  from dba_optstat_operations
  3  where end_time between to_date('27-JUL-2022 13:11:00','dd-mon-yyyy hh24:mi:ss')
  4  and to_date('27-JUL-2022 13:12:00','dd-mon-yyyy hh24:mi:ss');

NOTES
--------------------------------------------------
<params><param name="block_sample" val="FALSE"/><p
aram name="cascade" val="NULL"/><param name="concu
rrent" val="FALSE"/><param name="degree" val="NULL
"/><param name="estimate_percent" val="50"/><param
 name="force" val="FALSE"/><param name="granularit
y" val="AUTO"/><param name="method_opt" val="FOR A
LL COLUMNS SIZE 254"/><param name="no_invalidate"
val="NULL"/><param name="ownname" val="CBLEILE"/><
param name="partname" val=""/><param name="reporti
ng_mode" val="FALSE"/><param name="statid" val=""/
><param name="statown" val=""/><param name="statta
b" val=""/><param name="stattype" val="DATA"/><par
am name="tabname" val="T1"/></params>

So statistics were gathered with dbms_stats.gather_table_stats and ESTIMATE_PERCENT => 50 was used (see the NOTES column). Hence HEIGHT BALANCED histograms were created.

To fix this there are 2 possibilities:

  1. Tell the developer or DBA who gathers statistics to use the default for ESTIMATE_PERCENT so that he can fix the code accordingly.
  2. Ignore non-default settings when gathering statistics. I.e. there is a preference PREFERENCE_OVERRIDES_PARAMETER which actually ignores parameters provided in dbms_stats and uses the preference on the table instead. E.g.
SQL> select dbms_stats.get_prefs('ESTIMATE_PERCENT','CBLEILE','T1') from dual;

DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT','CBLEILE','T1')
------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

SQL> exec dbms_stats.set_table_prefs('CBLEILE','T1','PREFERENCE_OVERRIDES_PARAMETER','TRUE');
SQL> exec dbms_stats.gather_table_stats('CBLEILE','T1',ESTIMATE_PERCENT=>50,options=>'GATHER',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> select column_name, histogram from user_tab_columns
  2  where table_name='T1' and histogram <> 'NONE';

COLUMN_NAME                      HISTOGRAM
-------------------------------- ---------------
OBJECT_TYPE                      FREQUENCY

SQL>

Why do I only have a frequency based histogram and no other histograms anymore?

The reason is that the setting
method_opt=>’FOR ALL COLUMNS SIZE 254′
has also been overwritten by the preference on the table:

SQL> select dbms_stats.get_prefs('METHOD_OPT','CBLEILE','T1') t1_prefs from dual;

T1_PREFS
----------------------------
FOR ALL COLUMNS SIZE AUTO

I.e. to use ‘FOR ALL COLUMNS SIZE 254’ I have to set the preference as well (here for test purposes):

SQL> exec dbms_stats.set_table_prefs('CBLEILE','T1','METHOD_OPT','FOR ALL COLUMNS SIZE 254');

SQL> exec dbms_stats.gather_table_stats('CBLEILE','T1',ESTIMATE_PERCENT=>50,options=>'GATHER',method_opt=>'FOR ALL COLUMNS SIZE 1');

REMARK: I used method_opt=>’FOR ALL COLUMNS SIZE 1′ on purpose which means “disable histograms”.

SQL> select column_name, histogram from user_tab_columns
  2  where table_name='T1' and histogram <> 'NONE';

COLUMN_NAME                      HISTOGRAM
-------------------------------- ---------------
OWNER                            FREQUENCY
OBJECT_NAME                      HYBRID
SUBOBJECT_NAME                   HYBRID
OBJECT_ID                        HYBRID
DATA_OBJECT_ID                   HYBRID
OBJECT_TYPE                      FREQUENCY
CREATED                          HYBRID
LAST_DDL_TIME                    HYBRID
TIMESTAMP                        HYBRID
STATUS                           FREQUENCY
TEMPORARY                        FREQUENCY
GENERATED                        FREQUENCY
SECONDARY                        FREQUENCY
NAMESPACE                        FREQUENCY
SHARING                          FREQUENCY
EDITIONABLE                      FREQUENCY
ORACLE_MAINTAINED                FREQUENCY
APPLICATION                      FREQUENCY
DEFAULT_COLLATION                FREQUENCY
DUPLICATED                       FREQUENCY
SHARDED                          FREQUENCY

21 rows selected.

SQL> 

I.e. with the preference PREFERENCE_OVERRIDES_PARAMETER = TRUE I can “force” to use all preferences set on the table (or the global preferences if the table preferences have not been set). So I overwrote my manual settings ESTIMATE_PERCENT=>50 and method_opt=>’FOR ALL COLUMNS SIZE 1′ here.

REMARK: Check if statistics have been gathered, because options=>’GATHER’ may also have been overwritten with the preference on the table.

Be careful with PREFERENCE_OVERRIDES_PARAMETER = TRUE because it may have unwanted side-effects (as seen above) or statistics gathering in the application may take longer than before.

Summary: If you still see HEIGHT BALANCED histograms in your database then you probably use ESTIMATE_PERCENT <> Default when gathering statistics. It’s recommended to fix that and gather statistics with ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE. There have been lots of improvements since 11g to speed up stats gathering (like e.g. the use of approx_for_count_distinct = TRUE). Hence settings like ESTIMATE_PERCENT => 1 to speed up stats gathering on huge tables are usually not necessary anymore today.