Because PostgreSQL is fully open source, there are many forks of it. One of them is Greenplum, which describes itself as “an advanced, fully featured, open source data warehouse, based on PostgreSQL”. Sounds interesting, so let’s give it a try. This will be a series of blog posts, and in this first one we’re going to prepare the operating system, install the software and verify the installation afterwards.
What follows is basically a short version of the official Greenplum installation guide.
One of the requirements is either to disable SELinux or to configure it properly for the Greenplum installation. As this is only a playground, let’s do it the easy way and just disable it. This can be done by setting SELINUX to “disabled” in /etc/sysconfig/selinux and rebooting the system (I am using Rocky Linux 9 here):
[gpadmin@rocky9-gp7-master ~]$ grep -w SELINUX /etc/sysconfig/selinux
# SELINUX= can take one of these three values:
# NOTE: Up to RHEL 8 release included, SELINUX=disabled would also
SELINUX=disabled
[root@rocky9-gp7-master ~]$ reboot
[root@rocky9-gp7-master ~]$ getenforce
Disabled
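The edit of the configuration file itself is not shown above. A minimal sketch of how it could be scripted with sed follows; the function name disable_selinux_in is made up for this example, and the snippet only works on a scratch file. On Rocky Linux /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, which is why GNU sed's --follow-symlinks option is used here (without it, sed -i would replace the symlink with a regular file).

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file.
# Only the active "SELINUX=" line is changed; comments and SELINUXTYPE stay as they are.
disable_selinux_in() {
    # --follow-symlinks (GNU sed) edits the link target if "$1" is a symlink
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Try it on a scratch file, not on the real configuration
tmp="$(mktemp)"
printf '# SELINUX= can take one of these three values:\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"
disable_selinux_in "$tmp"
grep -w SELINUX "$tmp"
```

On a real node the function would be called as root with /etc/selinux/config as its argument, followed by the reboot shown above.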
The same applies to the local firewall, which either has to be disabled or configured properly:
[root@rocky9-gp7-master ~]$ systemctl stop firewalld
[root@rocky9-gp7-master ~]$ systemctl disable firewalld
Removed "/etc/systemd/system/multi-user.target.wants/firewalld.service".
Removed "/etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service".
To avoid depending on DNS, the hosts file on all three of my nodes looks like this:
[root@rocky9-gp7-master ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.200 rocky9-gp7-master rocky9-gp7-master.it.dbi-services.com
192.168.122.201 rocky9-gp7-segment1 rocky9-gp7-segment1.it.dbi-services.com
192.168.122.202 rocky9-gp7-segment2 rocky9-gp7-segment2.it.dbi-services.com
The first node is the so-called “Coordinator Host”. This one will receive all the client requests and route them to the so-called “Segment Hosts”. In this case there are two segment nodes, and those will host the actual data.
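To confirm that the three names from the hosts file really resolve on each node, something like the following could be used. This is only a sketch: check_names is a made-up helper, and it relies on getent, which looks up names the same way the system resolver does (/etc/hosts first on a default setup).

```shell
# Hypothetical helper: report whether each given host name resolves locally.
# getent follows nsswitch.conf, so /etc/hosts entries are taken into account.
check_names() {
    for h in "$@"; do
        if getent hosts "$h" >/dev/null 2>&1; then
            echo "resolves: $h"
        else
            echo "MISSING:  $h"
        fi
    done
}

check_names rocky9-gp7-master rocky9-gp7-segment1 rocky9-gp7-segment2
```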
For the kernel and system requirements, these are the recommended settings:
[root@rocky9-gp7-master ~]$ cat /etc/sysctl.conf
# kernel.shmall = _PHYS_PAGES / 2 # See Shared Memory Pages
kernel.shmall = 197951838
# kernel.shmmax = kernel.shmall * PAGE_SIZE
kernel.shmmax = 810810728448
kernel.shmmni = 4096
vm.overcommit_memory = 2 # See Segment Host Memory
vm.overcommit_ratio = 95 # See Segment Host Memory
net.ipv4.ip_local_port_range = 10000 65535 # See Port Settings
kernel.sem = 250 2048000 200 8192
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
kernel.core_pattern=/var/core/core.%h.%t
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ipfrag_high_thresh = 41943040
net.ipv4.ipfrag_low_thresh = 31457280
net.ipv4.ipfrag_time = 60
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296
[root@rocky9-gp7-master ~]$ sysctl -p
[root@rocky9-gp7-master ~]$ egrep "^\*" /etc/security/limits.conf
* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
* soft core unlimited
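The kernel.shmall and kernel.shmmax values in the sysctl.conf above look like generic example values (they correspond to a host with roughly 1.6 TB of memory), while the comments next to them describe how to derive them for the machine at hand. A small sketch of that calculation, using nothing but getconf and shell arithmetic:

```shell
# kernel.shmall = _PHYS_PAGES / 2, kernel.shmmax = kernel.shmall * PAGE_SIZE
phys_pages="$(getconf _PHYS_PAGES)"
page_size="$(getconf PAGE_SIZE)"

shmall=$((phys_pages / 2))
shmmax=$((shmall * page_size))

# Print the two lines in sysctl.conf syntax
echo "kernel.shmall = ${shmall}"
echo "kernel.shmmax = ${shmmax}"
```

The two printed lines can then replace the corresponding entries in /etc/sysctl.conf before running sysctl -p.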
Another requirement is that rc.local needs to be enabled or, in other words, it needs to be executable when the systems are starting up:
[root@rocky9-gp7-master ~]$ chmod +x /etc/rc.d/rc.local
[root@rocky9-gp7-master ~]$ reboot
As usual on systems which host a database, it is recommended to disable transparent huge pages (this requires a reboot as well):
[root@rocky9-gp7-master ~]$ grubby --update-kernel=ALL --args="transparent_hugepage=never"
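After the reboot, the active setting can be read from sysfs, where the kernel marks the current mode with square brackets (for example "always madvise [never]"). The helper below is a made-up convenience function that extracts just the bracketed value; treat it as a sketch.

```shell
# Hypothetical helper: print the active THP mode, i.e. the value in [brackets]
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "transparent_hugepage: $(thp_mode "$thp_file")"
else
    echo "no THP interface found at $thp_file"
fi
```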
Deactivate systemd’s IPC object removal (this is already the default on Rocky Linux 9, but anyway):
[root@rocky9-gp7-master ~]$ sed -i 's/#RemoveIPC=no/RemoveIPC=no/g' /etc/systemd/logind.conf
[root@rocky9-gp7-master ~]$ systemctl restart systemd-logind.service
As Greenplum should run under a dedicated user, let’s create it:
[root@rocky9-gp7-master ~]$ groupadd gpadmin
[root@rocky9-gp7-master ~]$ useradd -g gpadmin -m gpadmin
[root@rocky9-gp7-master ~]$ passwd gpadmin
Changing password for user gpadmin.
New password:
BAD PASSWORD: The password fails the dictionary check - it is based on a dictionary word
Retype new password:
passwd: all authentication tokens updated successfully.
The sudo configuration is optional, but as it makes life a lot easier, let’s configure this as well for the gpadmin user:
[root@rocky9-gp7-master ~]$ grep gpadmin /etc/sudoers
gpadmin ALL=(ALL) NOPASSWD: ALL
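How the rule got into /etc/sudoers is not shown above. One common alternative, sketched here under the assumption that a drop-in file under /etc/sudoers.d is acceptable, is to write the rule to a separate file and syntax-check it before installing it. The function name write_sudo_rule is made up, and the snippet only writes to a scratch file.

```shell
# Hypothetical helper: write a password-less sudo rule for one user to a file
write_sudo_rule() {
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1" > "$2"
    chmod 0440 "$2"
}

rule_file="$(mktemp)"
write_sudo_rule gpadmin "$rule_file"
cat "$rule_file"

# As root, the file could then be checked and installed, for example:
#   visudo -c -f "$rule_file"
#   install -m 0440 "$rule_file" /etc/sudoers.d/gpadmin
```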
The installation of Greenplum is just a matter of installing the rpm, which can be downloaded from the project’s GitHub repository:
[root@rocky9-gp7-master ~]$ su - gpadmin
Last login: Wed Feb 28 14:48:01 CET 2024 on pts/0
[gpadmin@rocky9-gp7-master ~]$ ls -l
total 50320
-rw-r--r-- 1 gpadmin gpadmin 51527129 Feb 28 14:50 open-source-greenplum-db-7.1.0-el9-x86_64.rpm
[gpadmin@rocky9-gp7-master ~]$ sudo dnf localinstall ./open-source-greenplum-db-7.1.0-el9-x86_64.rpm
Rocky Linux 9 - BaseOS 14 kB/s | 4.1 kB 00:00
Rocky Linux 9 - BaseOS 5.6 MB/s | 2.2 MB 00:00
Rocky Linux 9 - AppStream 22 kB/s | 4.5 kB 00:00
Rocky Linux 9 - AppStream 12 MB/s | 7.4 MB 00:00
Rocky Linux 9 - Extras 6.7 kB/s | 2.9 kB 00:00
Rocky Linux 9 - Extras 24 kB/s | 14 kB 00:00
Dependencies resolved.
=====================================================================================================================================
Package Architecture Version Repository Size
=====================================================================================================================================
Installing:
open-source-greenplum-db-7 x86_64 7.1.0-1.el9 @commandline 49 M
Installing dependencies:
annobin x86_64 12.12-1.el9 appstream 977 k
apr x86_64 1.7.0-12.el9_3 appstream 122 k
apr-util x86_64 1.6.1-23.el9 appstream 94 k
apr-util-bdb x86_64 1.6.1-23.el9 appstream 12 k
...
tar-2:1.34-6.el9_1.x86_64 unzip-6.0-56.el9.x86_64
zip-3.0-35.el9.x86_64
Complete!
[gpadmin@rocky9-gp7-master ~]$ sudo chown -R gpadmin:gpadmin /usr/local/greenplum-db*
(The last step could also be done automatically by the package, but as it is not and the documentation recommends doing it, let’s do so.)
As password-less ssh is a requirement as well, let’s generate ssh keys on the coordinator node, create the authorized_keys file and then copy over the whole “.ssh” directory to the other nodes. Once this is done, password-less SSH connections should already work between the nodes:
[gpadmin@rocky9-gp7-master ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/gpadmin/.ssh/id_rsa):
Created directory '/home/gpadmin/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/gpadmin/.ssh/id_rsa
Your public key has been saved in /home/gpadmin/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:+8uzanzyzxWSPhvtEHQiZ3s5+qSM9/YdUshPjEz4ojg [email protected]
The key's randomart image is:
+---[RSA 3072]----+
| |
| . |
| ..=.. |
| ===+. |
| S .=*=+ |
| .....*+o |
| E.. *.+o |
| =oo+.@o o|
| ..=B**o+.o|
+----[SHA256]-----+
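[gpadmin@rocky9-gp7-master ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[gpadmin@rocky9-gp7-master ~]$ chmod 600 ~/.ssh/authorized_keys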
[gpadmin@rocky9-gp7-master ~]$ scp -r .ssh/ rocky9-gp7-segment1:/home/gpadmin/
[gpadmin@rocky9-gp7-master ~]$ scp -r .ssh/ rocky9-gp7-segment2:/home/gpadmin/
[gpadmin@rocky9-gp7-master ~]$ ssh rocky9-gp7-segment1
Last login: Wed Feb 28 15:04:18 2024 from 192.168.122.200
[gpadmin@rocky9-gp7-segment1 ~]$
logout
Connection to rocky9-gp7-segment1 closed.
[gpadmin@rocky9-gp7-master ~]$ ssh rocky9-gp7-segment2
Last login: Wed Feb 28 14:50:50 2024
[gpadmin@rocky9-gp7-segment2 ~]$
logout
Connection to rocky9-gp7-segment2 closed.
To verify the SSH setup there is a utility called “gpssh”. Before using it, create a file called “hostfile_exkeys” and add all the host names which will be part of the cluster:
[gpadmin@rocky9-gp7-master ~]$ echo "rocky9-gp7-master
rocky9-gp7-segment1
rocky9-gp7-segment2" > /home/gpadmin/hostfile_exkeys
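Before handing this file to gpssh, the same hosts can be checked with plain ssh, which makes it easier to tell an SSH problem from a Greenplum problem. The function below is only a sketch, and check_hosts is a made-up name. BatchMode=yes makes ssh fail instead of asking for a password, so a missing key shows up as a failure rather than a prompt.

```shell
# Hypothetical helper: try a non-interactive login on every host in a hostfile
check_hosts() {
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue   # skip empty lines
        # -n keeps ssh from reading the rest of the hostfile from stdin
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok:     $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done < "$1"
    return "$failed"
}

# Usage on the coordinator:
#   check_hosts /home/gpadmin/hostfile_exkeys
```

The function returns a non-zero status as soon as one host cannot be reached without a password.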
Testing the SSH setup can then be done by asking “gpssh” to execute commands on all the nodes like this:
[gpadmin@rocky9-gp7-master ~]$ /usr/local/greenplum-db/bin/gpssh -f /home/gpadmin/hostfile_exkeys -e 'ls -l /usr/local/greenplum-db'
Traceback (most recent call last):
File "/usr/local/greenplum-db/bin/gpssh", line 32, in <module>
from gppylib.util import ssh_utils
ModuleNotFoundError: No module named 'gppylib'
… and this fails. The reason is that the Greenplum environment is not yet set up properly. This can be done by sourcing “greenplum_path.sh” in the gpadmin user’s .bash_profile (log in again, or source the file in the current shell, for the change to take effect):
[gpadmin@rocky9-gp7-master ~]$ tail -1 .bash_profile
. /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@rocky9-gp7-master ~]$ /usr/local/greenplum-db/bin/gpssh -f /home/gpadmin/hostfile_exkeys -e 'ls -l /usr/local/greenplum-db'
This fails again with:
Traceback (most recent call last):
File "/usr/local/greenplum-db/bin/gpssh", line 32, in <module>
from gppylib.util import ssh_utils
File "/usr/local/greenplum-db-7.1.0/lib/python/gppylib/util/ssh_utils.py", line 13, in <module>
from gppylib.commands.unix import Hostname, Echo
File "/usr/local/greenplum-db-7.1.0/lib/python/gppylib/commands/unix.py", line 18, in <module>
from pkg_resources import parse_version
ModuleNotFoundError: No module named 'pkg_resources'
The reason is that the python3-setuptools package is not installed on the system. So, let’s install it and try again:
[gpadmin@rocky9-gp7-master ~]$ sudo dnf install -y python3-setuptools
[gpadmin@rocky9-gp7-master ~]$ /usr/local/greenplum-db/bin/gpssh -f /home/gpadmin/hostfile_exkeys -e 'ls -l /usr/local/greenplum-db'
[rocky9-gp7-segment1] ls -l /usr/local/greenplum-db
[rocky9-gp7-segment1] lrwxrwxrwx 1 gpadmin gpadmin 29 Feb 28 14:53 /usr/local/greenplum-db -> /usr/local/greenplum-db-7.1.0
[ rocky9-gp7-master] ls -l /usr/local/greenplum-db
[ rocky9-gp7-master] lrwxrwxrwx 1 gpadmin gpadmin 29 Feb 28 14:52 /usr/local/greenplum-db -> /usr/local/greenplum-db-7.1.0
[rocky9-gp7-segment2] ls -l /usr/local/greenplum-db
[rocky9-gp7-segment2] lrwxrwxrwx 1 gpadmin gpadmin 29 Feb 28 14:53 /usr/local/greenplum-db -> /usr/local/greenplum-db-7.1.0
Now everything looks fine and we can proceed with creating the “Data Storage Areas”, but this will be the topic of the next post.
Paolo
03.04.2024
Hi Daniel,
Nice compact guide compared to the online install documentation.
Where exactly did you find the open source rpm file? I can only find the ones on the VMWare site, on the GitHub repo are only source archives.
Daniel Westermann
06.04.2024
Hi Paolo,
you're welcome.
I am pretty sure it was available on the GitHub repo at that time, but it isn't anymore. Either download it from here (https://network.pivotal.io/products/vmware-greenplum) or build it from source.
Cheers,
Daniel