Docker is becoming more and more popular these days and a lot of companies are starting to really use it. In one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason is that we wanted to compile from source so that we only get what is really required. Why have PostgreSQL compiled with Tcl support when nobody will ever use it? Here is how we did it …
To dig in right away, this is the simplified Dockerfile:
FROM debian
# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
ENV PGDATABASE ""
ENV PGUSERNAME ""
ENV PGPASSWORD ""
COPY docker-entrypoint.sh /
RUN set -ex \
    && apt-get update && apt-get install -y \
        ca-certificates \
        curl \
        procps \
        sysstat \
        libldap2-dev \
        libpython-dev \
        libreadline-dev \
        libssl-dev \
        bison \
        flex \
        libghc-zlib-dev \
        libcrypto++-dev \
        libxml2-dev \
        libxslt1-dev \
        bzip2 \
        make \
        gcc \
        unzip \
        python \
        locales \
    && rm -rf /var/lib/apt/lists/* \
    && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
    && mkdir /u01/ \
    && groupadd -r postgres --gid=999 \
    && useradd -m -r -g postgres --uid=999 postgres \
    && chown postgres:postgres /u01/ \
    && mkdir -p "$PGDATA" \
    && chown -R postgres:postgres "$PGDATA" \
    && chmod 700 "$PGDATA" \
    && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
    && echo "$PG_SHA256  /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
    && mkdir -p /home/postgres/src \
    && chown -R postgres:postgres /home/postgres \
    && su postgres -c "tar \
        --extract \
        --file /home/postgres/postgresql.tar.bz2 \
        --directory /home/postgres/src \
        --strip-components 1" \
    && rm /home/postgres/postgresql.tar.bz2 \
    && cd /home/postgres/src \
    && su postgres -c "./configure \
        --enable-integer-datetimes \
        --enable-thread-safety \
        --with-pgport=5432 \
        --prefix=/u01/app/postgres/product/$PG_VERSION \
        --with-ldap \
        --with-python \
        --with-openssl \
        --with-libxml \
        --with-libxslt" \
    && su postgres -c "make -j 4 all" \
    && su postgres -c "make install" \
    && su postgres -c "make -C contrib install" \
    && rm -rf /home/postgres/src \
    && apt-get update && apt-get purge --auto-remove -y \
        libldap2-dev \
        libpython-dev \
        libreadline-dev \
        libssl-dev \
        libghc-zlib-dev \
        libcrypto++-dev \
        libxml2-dev \
        libxslt1-dev \
        bzip2 \
        gcc \
        make \
        unzip \
    && apt-get install -y libxml2 \
    && rm -rf /var/lib/apt/lists/*
ENV LANG en_US.utf8
USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]
We based the image on the latest Debian image; that is line 1. The following lines define the PostgreSQL version and some environment variables we will use later. What follows installs all the packages required for building PostgreSQL from source, adds the operating system user and group, prepares the directories, fetches the PostgreSQL source code, and runs configure, make and make install. Pretty straightforward. Finally, to shrink the image, we remove all the packages that are no longer required once PostgreSQL is compiled and installed.
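The download-and-verify step is the part of the build that is easiest to get subtly wrong, so here is a minimal sketch of it as a standalone shell function. The function name `verify_sha256` is hypothetical and is not part of the Dockerfile above. It assumes GNU coreutils `sha256sum`, whose check format separates the hash and the file name with two spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original Dockerfile):
# check that a file matches an expected SHA-256 checksum.
# Assumes GNU coreutils sha256sum is installed.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum -c reads "<hash>  <file>" lines (two spaces) and
    # returns non-zero when the checksum does not match.
    printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}

# Possible usage inside a build step:
#   verify_sha256 /home/postgres/postgresql.tar.bz2 "$PG_SHA256" || exit 1
```

Returning the exit status instead of printing anything lets the caller chain the check with `&&`, the same way the RUN instruction chains its steps.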
The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:
#!/bin/bash
# these are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}
# create the database and the user
_pg_create_database_and_user()
{
${PGHOME}/bin/psql -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
${PGHOME}/bin/psql -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}
# start the PostgreSQL instance
_pg_prestart()
{
${PGHOME}/bin/pg_ctl -D ${PGDATA} -w start
}
# start postgres and do not disconnect
# required for docker
_pg_start()
{
${PGHOME}/bin/postgres "-D" "${PGDATA}"
}
# stop the PostgreSQL instance
_pg_stop()
{
${PGHOME}/bin/pg_ctl -D ${PGDATA} stop -m fast
}
# initdb a new cluster
_pg_initdb()
{
${PGHOME}/bin/initdb -D ${PGDATA} --data-checksums
}
# adjust the postgresql parameters
_pg_adjust_config() {
# PostgreSQL parameters
echo "shared_buffers='128MB'" >> ${PGAUTOCONF}
echo "effective_cache_size='128MB'" >> ${PGAUTOCONF}
echo "listen_addresses = '*'" >> ${PGAUTOCONF}
echo "logging_collector = 'on'" >> ${PGAUTOCONF}
echo "log_truncate_on_rotation = 'on'" >> ${PGAUTOCONF}
echo "log_filename = 'postgresql-%a.log'" >> ${PGAUTOCONF}
echo "log_rotation_age = '1440'" >> ${PGAUTOCONF}
echo "log_line_prefix = '%m - %l - %p - %h - %u@%d '" >> ${PGAUTOCONF}
echo "log_directory = 'pg_log'" >> ${PGAUTOCONF}
echo "log_min_messages = 'WARNING'" >> ${PGAUTOCONF}
echo "log_autovacuum_min_duration = '60s'" >> ${PGAUTOCONF}
echo "log_min_error_statement = 'NOTICE'" >> ${PGAUTOCONF}
echo "log_min_duration_statement = '30s'" >> ${PGAUTOCONF}
echo "log_checkpoints = 'on'" >> ${PGAUTOCONF}
echo "log_statement = 'none'" >> ${PGAUTOCONF}
echo "log_lock_waits = 'on'" >> ${PGAUTOCONF}
echo "log_temp_files = '0'" >> ${PGAUTOCONF}
echo "log_timezone = 'Europe/Zurich'" >> ${PGAUTOCONF}
echo "log_connections=on" >> ${PGAUTOCONF}
echo "log_disconnections=on" >> ${PGAUTOCONF}
echo "log_duration=off" >> ${PGAUTOCONF}
echo "client_min_messages = 'WARNING'" >> ${PGAUTOCONF}
echo "wal_level = 'replica'" >> ${PGAUTOCONF}
echo "hot_standby_feedback = 'on'" >> ${PGAUTOCONF}
echo "max_wal_senders = '10'" >> ${PGAUTOCONF}
echo "cluster_name = '${PGDATABASENAME}'" >> ${PGAUTOCONF}
echo "max_replication_slots = '10'" >> ${PGAUTOCONF}
echo "work_mem=8MB" >> ${PGAUTOCONF}
echo "maintenance_work_mem=64MB" >> ${PGAUTOCONF}
echo "wal_compression=on" >> ${PGAUTOCONF}
echo "max_wal_senders=20" >> ${PGAUTOCONF}
echo "shared_preload_libraries='pg_stat_statements'" >> ${PGAUTOCONF}
echo "autovacuum_max_workers=6" >> ${PGAUTOCONF}
echo "autovacuum_vacuum_scale_factor=0.1" >> ${PGAUTOCONF}
echo "autovacuum_vacuum_threshold=50" >> ${PGAUTOCONF}
# Authentication settings in pg_hba.conf
echo "host all all 0.0.0.0/0 md5" >> ${PGHBACONF}
}
# initialize and start a new cluster
_pg_init_and_start()
{
# initialize a new cluster
_pg_initdb
# set params and access permissions
_pg_adjust_config
# start the new cluster
_pg_prestart
# set username and password
_pg_create_database_and_user
}
# check if $PGDATA exists
if [ -e "${PGDATA}" ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # initialize and start the new cluster
    _pg_init_and_start
    # create PGDATA
    mkdir -p "${PGDATA}"
    # create the log directory
    mkdir -p "${PGDATA}/pg_log"
fi
# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start
The important point here is: PGDATA is a persistent volume that is mounted into the Docker container. When the container comes up we need to check whether something that looks like a PostgreSQL data directory is already there. If so, we just start the instance with what is there. If nothing is there, we create a new instance. Remember: this is just a template and you might need to do more checks in your case. The same is true for what we add to pg_hba.conf here: it is not something you should do on real systems, but it can be handy for testing.
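As one example of such an additional check, the sketch below classifies a directory before deciding whether to run initdb. It is an illustration only, not the script used above: the function name `pg_datadir_state` is hypothetical, and it relies on the fact that an initialized cluster contains a `PG_VERSION` file next to the `base` directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original entrypoint script):
# report the state of a candidate data directory.
#   prints "cluster"  when it looks like an initialized cluster
#   prints "empty"    when it does not exist or has no entries
#   prints "unknown"  when it holds something else
pg_datadir_state() {
    dir="$1"
    if [ -f "$dir/PG_VERSION" ] && [ -d "$dir/base" ]; then
        echo "cluster"
    elif [ ! -e "$dir" ]; then
        echo "empty"
    elif [ -d "$dir" ] && [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "empty"
    else
        echo "unknown"
    fi
}
```

An entrypoint could then initialize only in the `empty` case, start the instance in the `cluster` case, and stop with a clear message in the `unknown` case.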
Hope this helps …