dbi Blog

In a previous blog, I talked about some basis and presented some possible architectures for Alfresco. Now that this introduction has been done, let’s dig into the real blogs about how to setup a HA/Clustering Alfresco environment. In this blog in particular, I will talk about the Repository layer.

For the Repository Clustering, there are three prerequisites (and that’s all you need):

A valid license which include the Repository Clustering
A shared file system which is accessible from all Alfresco nodes in the Cluster. This is usually a NAS accessed via NFS
A shared database

Clustering the Repository part is really simple to do: you just need to put the correct properties in the alfresco-global.properties file. Of course, you could also manage it all from the Alfresco Admin Console but that’s not recommended, you should really always use the alfresco-global.properties by default. The Alfresco Repository Clustering is using Hazelcast. It was using JGroups and EHCache as well before Alfresco 4.2 but now it’s just Hazelcast. So to define an Alfresco Cluster, simply put the following configuration in the alfresco-global.properties of the Alfresco node1:

[alfresco@alf_n1 ~]$ getent hosts `hostname` | awk '{ print $1 }'
10.10.10.10
[alfresco@alf_n1 ~]$
[alfresco@alf_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### Content Store
dir.root=/shared_storage/alf_data
...
### DB
db.username=alfresco
db.password=My+P4ssw0rd
db.name=alfresco
db.host=db_vip
## MySQL
#db.port=3306
#db.driver=com.mysql.jdbc.Driver
#db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
#db.pool.validate.query=SELECT 1
## PostgreSQL
db.driver=org.postgresql.Driver
db.port=5432
db.url=jdbc:postgresql://${db.host}:${db.port}/${db.name}
db.pool.validate.query=SELECT 1
## Oracle
#db.driver=oracle.jdbc.OracleDriver
#db.port=1521
#db.url=jdbc:oracle:thin:@${db.host}:${db.port}:${db.name}
#db.pool.validate.query=SELECT 1 FROM DUAL
...
### Clustering
alfresco.cluster.enabled=true
alfresco.cluster.interface=10.10.10.10-11
alfresco.cluster.nodetype=Alfresco_node1
alfresco.hazelcast.password=Alfr3sc0_hz_Test_pwd
alfresco.hazelcast.port=5701
alfresco.hazelcast.autoinc.port=false
alfresco.hazelcast.max.no.heartbeat.seconds=15
...
[alfresco@alf_n1 ~]$

And for the Alfresco node2, you can use the same content:

[alfresco@alf_n2 ~]$ getent hosts `hostname` | awk '{ print $1 }'
10.10.10.11
[alfresco@alf_n2 ~]$
[alfresco@alf_n2 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### Content Store
dir.root=/shared_storage/alf_data
...
### DB
db.username=alfresco
db.password=My+P4ssw0rd
db.name=alfresco
db.host=db_vip
## MySQL
#db.port=3306
#db.driver=com.mysql.jdbc.Driver
#db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
#db.pool.validate.query=SELECT 1
## PostgreSQL
db.driver=org.postgresql.Driver
db.port=5432
db.url=jdbc:postgresql://${db.host}:${db.port}/${db.name}
db.pool.validate.query=SELECT 1
## Oracle
#db.driver=oracle.jdbc.OracleDriver
#db.port=1521
#db.url=jdbc:oracle:thin:@${db.host}:${db.port}:${db.name}
#db.pool.validate.query=SELECT 1 FROM DUAL
...
### Clustering
alfresco.cluster.enabled=true
alfresco.cluster.interface=10.10.10.10-11
alfresco.cluster.nodetype=Alfresco_node2
alfresco.hazelcast.password=Alfr3sc0_hz_Test_pwd
alfresco.hazelcast.port=5701
alfresco.hazelcast.autoinc.port=false
alfresco.hazelcast.max.no.heartbeat.seconds=15
...
[alfresco@alf_n2 ~]$

Description of the Clustering parameters:

alfresco.cluster.enabled: Whether or not you want to enable the Repository Clustering for the local Alfresco node. The default value is false. You will want to set that to true for all Repository nodes that will be used by Share or any other client. If the Repository is only used for Solr Tracking, you can leave that to false
alfresco.cluster.interface: This is the network interface on which Hazelcast will listen for Clustering messages. This has to be an IP, it can’t be a hostname. To keep things simple and to have the same alfresco-global.properties on all Alfresco nodes however, it is possible to use a specific nomenclature:
- 10.10.10.10: Hazelcast will try to bind on 10.10.10.10 only. If it’s not available, then it won’t start
- 10.10.10.10-11: Hazelcast will try to bind on any IP within the range 10-11 so in this case 2 IPs: 10.10.10.10 or 10.10.10.11. If you have, let’s say, 4 IPs assigned to the local host and you don’t want Hazelcast to use 2 of these, then specify the ones that it can use and it will pick one from the list. This can also be used to have the same content for the alfresco-global.properties on different hosts… One server with IP 10.10.10.10 and a second one with IP 10.10.10.11
- 10.10.10.* or 10.10.*.*: Hazelcast will try to bind on any IP in this range, this is an extended version of the XX-YY range above
alfresco.cluster.nodetype: A human-friendly string to represent the local Alfresco node. It doesn’t have any use for Alfresco, that’s really more for you. It is for example interesting to put a specific string for Alfresco node that won’t take part in the Clustering but that are still using the same Content Store and Database (like a Repository dedicated for the Solr Tracking, as mentioned above)
alfresco.hazelcast.password: The password to use for the Alfresco Repository Cluster. You need to use the same password for all members of the same Cluster. You should as well try to use a different password for each Cluster that you might have if they are in the same network (DEV/TEST/PROD for example), otherwise it will get ugly
alfresco.hazelcast.port: The default port that will be used for Clustering messages between the different members of the Cluster
alfresco.hazelcast.autoinc.port: Whether or not you want to allow Hazelcast to find another free port in case the default port (“alfresco.hazelcast.port”) is currently used. It will increment the port by 1 each time. You should really set this to false and just use the default port, to have full control over the channels that Clustering communications are using otherwise it might get messy as well
alfresco.hazelcast.max.no.heartbeat.seconds: The maximum time in seconds allowed between two heartbeat. If there is no heartbeat in this period of time, Alfresco will assume the remote node isn’t running/available

As you can see above, it’s really simple to add Clustering to an Alfresco Repository. Since you can(should?) have the same set of properties (except the nodetype string maybe), then it also really simplifies the deployment… If you are familiar with other Document Management System like Documentum for example, then you understand the complexity of some of these solutions! If you compare that to Alfresco, it’s like walking on the street versus walking on the moon where you obviously first need to go to the moon… Anyway, once it’s done, the logs of the Alfresco Repository node1 will display something like that when you start it:

2019-07-20 15:14:25,401  INFO  [cluster.core.ClusteringBootstrap] [localhost-startStop-1] Cluster started, name: MainRepository-<generated_id>
2019-07-20 15:14:25,405  INFO  [cluster.core.ClusteringBootstrap] [localhost-startStop-1] Current cluster members:
  10.10.10.10:5701 (hostname: alf_n1)

Wait for the Repository node1 to be fully started and once done, you can start the Repository node2, it needs to be started sequentially normally. You will see on the logs of the Repository node1 that another node joined automatically the Cluster:

2019-07-20 15:15:06,528  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-<generated_id>.event-3] Member joined: 10.10.10.11:5701 (hostname: alf_n2)
2019-07-20 15:15:06,529  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-<generated_id>.event-3] Current cluster members:
  10.10.10.10:5701 (hostname: alf_n1)
  10.10.10.11:5701 (hostname: alf_n2)

On the logs of the Repository node2, you can see directly at the initialization of the Hazelcast Cluster that the two nodes are available.

If you don’t want to check the logs, you can see pretty much the same thing from the Alfresco Admin Console. By accessing “http(s)://<hostname>:<port>/alfresco/s/enterprise/admin/admin-clustering“, you can see currently available cluster members (online nodes), non-available cluster members (offline nodes) as well as connected non-cluster members (nodes using the same DB & Content Store but with “alfresco.cluster.enabled=false”, for example to dedicate a Repository to Solr Tracking).

Alfresco also provides a small utility to check the health of the cluster which will basically ensure that the communication between each member is successful. This utility can be accessed at “http(s)://<hostname>:<port>/alfresco/s/enterprise/admin/admin-clustering-test“. It is useful to include a quick check using this utility in a monitoring solution for example, to ensure that the cluster is healthy.

Other posts of this series on Alfresco HA/Clustering: