I recently had a case at a customer where newly uploaded documents in Alfresco always became searchable, but only after 2min45s to 3min. The environment in question is an Alfresco Content Services 7.1 with High Availability (Repository and Share Clustering, Solr Sharding on multiple nodes, …) used for QA/TEST. It is part of an ongoing project to upgrade Alfresco 6.1 to 7.1 and, during testing, documents took a long time to become visible in searches while everything else worked properly.

 

Since there are a few similar environments on these exact same versions and setup, I tried to replicate the issue on two of them, without success. On those instances, documents are searchable within roughly 15 to 20s, which is expected given the Solr Tracker schedule. No out-of-the-ordinary configuration was applied; it’s a rather straightforward setup that matches the previous one.

 

Since it’s a new version (ACS 7.1 / ASS 2.0.2), I thought that it could be linked to the Sharding method or the number of Shards, even if it shouldn’t be (unless there is a bug, of course). So, I did some tests: reducing the number of Shards, changing the Sharding method, and completely removing Sharding to go back to a standard Alfresco/Archive cores setup. Here is an example of the main steps that can be used to remove the current Shards and create a single new one for live documents:

### Solr Node1 & Node2
## Remove Solr Shards
$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=workspace://SpacesStore&coreName=alfresco-0"
$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=workspace://SpacesStore&coreName=alfresco-1"
$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=workspace://SpacesStore&coreName=alfresco-2"
$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=archive://SpacesStore&coreName=archive-0"
## Stop Solr
$ sudo systemctl stop solr.service
## Cleanup config
$ rm -rf $SOLR_HOME/solrhome/rerank--a*
## Cleanup indexes
$ ls $SOLR_DATA_HOME/
content  index  models
$ rm -rf $SOLR_DATA_HOME/*/*
## Start Solr
$ sudo systemctl start solr.service
## Create Solr Shard for alfresco-0
$ solr_node_id=1    # for Solr Node2: solr_node_id=2
$ range=25000000
$ total_shards=20
$ for shard_id in $(seq 0 0); do
  begin_range=$((${shard_id} * ${range}))
  end_range=$(((${shard_id} + 1) * ${range}))
  curl -v "http://localhost:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=${total_shards}&numNodes=${total_shards}&nodeInstance=${solr_node_id}&template=rerank&coreName=alfresco&shardIds=${shard_id}&property.shard.method=DB_ID_RANGE&property.shard.range=${begin_range}-${end_range}&property.shard.instance=${shard_id}"
  echo ""
  echo "  -->  Range N°${shard_id} created with: ${begin_range}-${end_range}"
  echo ""
  sleep 2
done
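
The range arithmetic used in the loop above can be checked without touching Solr. The following is a minimal dry-run sketch, assuming the same DB_ID_RANGE layout and range size as in the commands above; the `shard_range` helper is a hypothetical name introduced for illustration, not part of Alfresco Search Services.

```shell
# Dry run: print the DB_ID_RANGE of a shard without calling Solr.
# shard_range is a hypothetical helper, not an Alfresco/Solr command.
shard_range() {
  local shard_id="$1"
  local range="$2"
  local begin_range=$(( shard_id * range ))
  local end_range=$(( (shard_id + 1) * range ))
  echo "${begin_range}-${end_range}"
}

# Same range size as in the steps above
range=25000000
for shard_id in 0 1 2; do
  echo "Shard ${shard_id}: $(shard_range "${shard_id}" "${range}")"
done
```

Printing the ranges first makes it easier to confirm that each Solr node will be created with the intended boundaries before any core is actually added.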

 

After a full reindex, the behavior was the same, as could be expected. My next test would have been to revert to the previously used Alfresco Search Services version (ASS 1.4.3). However, while testing and checking the SUMMARY and REPORT generated by Solr (e.g. http://localhost:8983/solr/admin/cores?action=SUMMARY&core=alfresco-0), I found it strange that the values of “Date for last TX on server” and “Last Index TX Commit Date” didn’t match the time at which I had uploaded my latest document to Alfresco: they were delayed. I deleted and re-uploaded the document and saw the same thing; the times still didn’t match. The only plausible explanation was that the Operating System time differed between the Solr and the Alfresco/DB servers.
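
A mismatch like this can be confirmed directly by comparing the epoch time reported by each server. The sketch below is one way to do it, under stated assumptions: `clock_skew` and `check_skew` are hypothetical helpers, the host names in the usage comment are placeholders, and the 5-second tolerance is an arbitrary value rather than an Alfresco recommendation.

```shell
# Absolute difference in seconds between two epoch timestamps.
# clock_skew is a hypothetical helper introduced for illustration.
clock_skew() {
  local diff=$(( $1 - $2 ))
  if [ "${diff}" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo "${diff}"
}

# Report whether the skew between two timestamps is within a tolerance (seconds).
check_skew() {
  local skew
  skew=$(clock_skew "$1" "$2")
  if [ "${skew}" -le "$3" ]; then
    echo "OK: clocks differ by ${skew}s"
  else
    echo "WARNING: clocks differ by ${skew}s"
  fi
}

# Usage (placeholder host names, requires SSH access to both servers):
#   solr_time=$(ssh solr_node1 date -u +%s)
#   repo_time=$(ssh alfresco_node1 date -u +%s)
#   check_skew "${solr_time}" "${repo_time}" 5
```

The SSH round trips add a little latency to the measurement, so a difference of a second or two is not meaningful; a difference of minutes is.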

 

What happened is that these servers do not have access to the internet and were not configured with an internal Time Server. The “systemd-timesyncd” service was enabled, but it couldn’t synchronize the time because of that:

root@solr_node1:~$ timedatectl status
                      Local time: Mon 2021-12-13 12:35:34 UTC
                  Universal time: Mon 2021-12-13 12:35:34 UTC
                        RTC time: Mon 2021-12-13 12:38:05
                       Time zone: Etc/UTC (UTC, +0000)
       System clock synchronized: no
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
root@solr_node1:~$
root@solr_node1:~$ timedatectl set-ntp off
root@solr_node1:~$ timedatectl set-ntp on
root@solr_node1:~$
root@solr_node1:~$ systemctl restart systemd-timesyncd.service
root@solr_node1:~$
root@solr_node1:~$ systemctl status systemd-timesyncd.service
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-12-13 12:38:55 UTC; 30s ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 29451 (systemd-timesyn)
   Status: "Connecting to time server 91.189.94.4:123 (ntp.ubuntu.com)."
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/systemd-timesyncd.service
           └─29451 /lib/systemd/systemd-timesyncd

Dec 13 12:38:55 solr_node1 systemd[1]: Starting Network Time Synchronization...
Dec 13 12:38:55 solr_node1 systemd[1]: Started Network Time Synchronization.
Dec 13 12:39:05 solr_node1 systemd-timesyncd[29451]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
Dec 13 12:39:15 solr_node1 systemd-timesyncd[29451]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
root@solr_node1:~$
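
The “System clock synchronized” line is the one worth checking on every node. As a rough sketch, the hypothetical `is_clock_synced` helper below reads the output of `timedatectl status` on standard input and succeeds only when that line reports `yes`; it assumes the English output format shown above.

```shell
# Succeeds (exit code 0) if the timedatectl output read from stdin
# reports a synchronized system clock. Hypothetical helper; it assumes
# the "System clock synchronized: yes|no" line shown above.
is_clock_synced() {
  grep -Eq '^[[:space:]]*System clock synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Usage on a node:
#   if timedatectl status | is_clock_synced; then
#     echo "clock is synchronized"
#   else
#     echo "clock is NOT synchronized" >&2
#   fi
```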

 

As a quick workaround, the time was manually synchronized:

root@solr_node1:~$ date -s "13 Dec 2021 12:41:16Z"
Mon Dec 13 12:41:16 UTC 2021
root@solr_node1:~$
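
When setting the time by hand like this, taking the value from a reference server avoids typing mistakes. A small sketch, assuming GNU `date`; `to_date_arg` is a hypothetical helper and the reference host name in the usage comment is a placeholder:

```shell
# Convert an epoch timestamp into the UTC string format used with
# "date -s" above. Assumes GNU date; to_date_arg is a hypothetical helper.
to_date_arg() {
  LC_ALL=C date -u -d "@$1" '+%d %b %Y %H:%M:%SZ'
}

# Usage (as root, placeholder host name):
#   ref_epoch=$(ssh alfresco_node1 date -u +%s)
#   date -s "$(to_date_arg "${ref_epoch}")"
```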

 

Right after that, the issue was gone: documents were searchable around 20s after being uploaded to Alfresco. Obviously, the long-term solution is to configure the Operating System correctly with the customer’s own Time Servers. If internet access is blocked, they most likely have dedicated internal Time Servers. For information, this is how to configure custom Time Servers on Ubuntu 18.04:

root@solr_node1:~$ vi /etc/systemd/timesyncd.conf
root@solr_node1:~$
root@solr_node1:~$ grep -v '^#' /etc/systemd/timesyncd.conf
[Time]
NTP=ntp1.domain.com
FallbackNTP=ntp2.domain.com
root@solr_node1:~$
root@solr_node1:~$ systemctl restart systemd-timesyncd.service
root@solr_node1:~$
root@solr_node1:~$ systemctl status systemd-timesyncd.service
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-12-13 15:28:26 UTC; 15s ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 29817 (systemd-timesyn)
   Status: "Synchronized to time server 10.10.10.10:123 (ntp1.domain.com)."
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/systemd-timesyncd.service
           └─29817 /lib/systemd/systemd-timesyncd

Dec 13 15:28:25 solr_node1 systemd[1]: Starting Network Time Synchronization...
Dec 13 15:28:26 solr_node1 systemd[1]: Started Network Time Synchronization.
Dec 13 15:28:35 solr_node1 systemd-timesyncd[29817]: Synchronized to time server 10.10.10.10:123 (ntp1.domain.com).
root@solr_node1:~$
root@solr_node1:~$ timedatectl
                      Local time: Mon 2021-12-13 15:28:50 UTC
                  Universal time: Mon 2021-12-13 15:28:50 UTC
                        RTC time: Mon 2021-12-13 15:28:50
                       Time zone: Etc/UTC (UTC, +0000)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
root@solr_node1:~$
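
With several Solr and Alfresco nodes to fix, the same settings can be generated rather than edited by hand on each server. The sketch below only prints the `[Time]` section; `render_timesyncd_conf` is a hypothetical helper, the server names are the placeholders used above, and writing the result to a drop-in file under `/etc/systemd/timesyncd.conf.d/` instead of editing the main file is my assumption about how one might roll it out.

```shell
# Print a [Time] section for systemd-timesyncd.
# $1 = primary NTP server, $2 = fallback NTP server.
# render_timesyncd_conf is a hypothetical helper introduced for illustration.
render_timesyncd_conf() {
  printf '[Time]\nNTP=%s\nFallbackNTP=%s\n' "$1" "$2"
}

# Usage (as root); the drop-in file name is hypothetical:
#   mkdir -p /etc/systemd/timesyncd.conf.d
#   render_timesyncd_conf ntp1.domain.com ntp2.domain.com \
#     > /etc/systemd/timesyncd.conf.d/10-custom-ntp.conf
#   systemctl restart systemd-timesyncd.service
```

After the restart, `timedatectl` should report “System clock synchronized: yes” on each node, as in the output above.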