{"id":12450,"date":"2019-04-30T12:34:39","date_gmt":"2019-04-30T10:34:39","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/"},"modified":"2024-09-10T17:37:47","modified_gmt":"2024-09-10T15:37:47","slug":"two-node-patroni-does-not-failover-when-one-node-crashes","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/","title":{"rendered":"Two node patroni does not failover when one node crashes"},"content":{"rendered":"<p>A few days ago I had the interesting mission to build a two-node Patroni PostgreSQL 9.6 cluster using etcd in our openDB Appliance environment. Sounds easy: one leader, one secondary, both running etcd, that&#8217;s it. But that&#8217;s only the first impression.<br \/>\n<!--more--><\/p>\n<p>We start with two newly created <a href=\"https:\/\/www.dbi-services.com\/opendb-appliance\/\" target=\"_blank\" rel=\"noopener noreferrer\">OpenDB Appliance servers<\/a> and deploy the database home using the openDB Appliance.<br \/>\nAfterwards we install Patroni and etcd and do all the configuration for the cluster, etcd and Patroni (for more information about building a Patroni cluster click <a href=\"https:\/\/www.dbi-services.com\/blog\/using-ansible-to-bring-up-a-three-node-patroni-cluster-in-minutes\/\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>).<\/p>\n<p>When we start the etcd and Patroni services, everything looks fine: both members are up and running, and replication works fine as well.<br \/>\n<code><br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl list<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. 
For details see: &lt;http:\/\/initd.org\/psycopg\/docs\/install.html#binary-install-from-pypi&gt;.<br \/>\n\"\"\")<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 | Leader | running |  3 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 |        | running |  3 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n<\/code><\/p>\n<p>Let&#8217;s try a manual failover &#8211; this works as expected:<br \/>\n<code><br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl switchover<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. For details see: &lt;http:\/\/initd.org\/psycopg\/docs\/install.html#binary-install-from-pypi&gt;.<br \/>\n\"\"\")<br \/>\nMaster [opendb_primary]:<br \/>\nCandidate ['opendb_stby'] []: opendb_stby<br \/>\nWhen should the switchover take place (e.g. 
2015-10-01T14:30)  [now]:<br \/>\nCurrent cluster topology<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 | Leader | running |  3 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 |        | running |  3 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\nAre you sure you want to switchover cluster PG1, demoting current master opendb_primary? [y\/N]: y<br \/>\n2019-04-25 01:19:07.89086 Successfully switched over to \"opendb_stby\"<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 |        | stopped |    |   unknown |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 | Leader | running |  3 |           |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n<\/code><\/p>\n<p>Another check confirms that opendb_stby is now the leader &#8211; looks perfect.<br \/>\n<code><br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl list<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. 
For details see: &lt;http:\/\/initd.org\/psycopg\/docs\/install.html#binary-install-from-pypi&gt;.<br \/>\n\"\"\")<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 |        | running |  4 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 | Leader | running |  4 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n<\/code><\/p>\n<p>As a next step, let&#8217;s try to stop the Patroni service on the leading node to force the automatic failover.<br \/>\n<code><br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl list<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. 
For details see: &lt;http:\/\/initd.org\/psycopg\/docs\/install.html#binary-install-from-pypi&gt;.<br \/>\n\"\"\")<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 |        | running |  4 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 | Leader | running |  4 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] sudo systemctl stop patroni<br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl list<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. For details see: &lt;http:\/\/initd.org\/psycopg\/docs\/install.html#binary-install-from-pypi&gt;.<br \/>\n\"\"\")<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 | Leader | running |  5 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 |        | stopped |    |   unknown |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1]<br \/>\n<\/code><\/p>\n<p>Yeah, this also works as expected!<\/p>\n<p>Now, let us try the &#8220;worst case&#8221; and shut down the leading node.<br \/>\nThis is where the problem starts! There are tons of error messages in the log file. 
The secondary recognizes that the leader has gone, but does not become the new leader.<\/p>\n<p><code><br \/>\nApr 24 13:10:34 opendb_stby patroni: 2019-04-24 13:10:34,124 INFO: no action.  i am a secondary and i am following a leader<br \/>\nApr 24 13:10:36 opendb_stby patroni: 2019-04-24 13:10:36,766 WARNING: Request failed to opendb_primary: GET http:\/\/192.168.22.33:8008\/patroni (('Connection aborted.', error(104, 'Connection reset by peer')))<br \/>\nApr 24 13:10:36 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream MsgApp v2 reader)<br \/>\nApr 24 13:10:36 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream Message reader)<br \/>\nApr 24 13:10:36 opendb_stby etcd: failed to dial 1154b56a13168e2a on stream MsgApp v2 (peer 1154b56a13168e2a failed to find local node f13d668ae0cba84)<br \/>\nApr 24 13:10:36 opendb_stby etcd: peer 1154b56a13168e2a became inactive (message send to peer failed)<br \/>\nApr 24 13:10:36 opendb_stby patroni: 2019-04-24 13:10:36,838 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',)': \/v2\/keys\/service\/PG1\/leader<br \/>\nApr 24 13:10:36 opendb_stby patroni: 2019-04-24 13:10:36,839 <strong>ERROR: Request to server http:\/\/192.168.22.33:2379 failed: MaxRetryError(\"HTTPConnectionPool(host='192.168.22.33', port=2379): Max retries exceeded with url: \/v2\/keys\/service\/PG1\/leader (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))\",)<\/strong><br \/>\nApr 24 13:10:36 opendb_stby patroni: 2019-04-24 13:10:36,839 INFO: Reconnection allowed, looking for another server.<br \/>\nApr 24 13:10:36 opendb_stby patroni: 2019-04-24 13:10:36,839 INFO: Selected new etcd server http:\/\/192.168.22.34:2379<br \/>\nApr 24 13:10:37 opendb_stby etcd: lost the TCP streaming 
connection with peer 1154b56a13168e2a (stream Message writer)<br \/>\nApr 24 13:10:38 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream MsgApp v2 writer)<br \/>\nApr 24 13:10:38 opendb_stby etcd: f13d668ae0cba84 stepped down to follower since quorum is not active<br \/>\nApr 24 13:10:38 opendb_stby etcd: f13d668ae0cba84 became follower at term 19197<br \/>\nApr 24 13:10:38 opendb_stby etcd: raft.node: f13d668ae0cba84 lost leader f13d668ae0cba84 at term 19197<br \/>\nApr 24 13:10:39 opendb_stby patroni: 2019-04-24 13:10:39,344 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ReadTimeoutError(\"HTTPConnectionPool(host=u'192.168.22.34', port=2379): Read timed out. (read timeout=2.5)\",)': \/v2\/keys\/service\/PG1\/leader<br \/>\nApr 24 13:10:40 opendb_stby etcd: f13d668ae0cba84 is starting a new election at term 19197<br \/>\nApr 24 13:10:40 opendb_stby etcd: f13d668ae0cba84 became candidate at term 19198<br \/>\nApr 24 13:10:40 opendb_stby etcd: f13d668ae0cba84 received MsgVoteResp from f13d668ae0cba84 at term 19198<br \/>\nApr 24 13:10:40 opendb_stby etcd: f13d668ae0cba84 [logterm: 19197, index: 447303] sent MsgVote request to 1154b56a13168e2a at term 19198<br \/>\nApr 24 13:10:41 opendb_stby etcd: f13d668ae0cba84 is starting a new election at term 19198<br \/>\nApr 24 13:10:41 opendb_stby etcd: f13d668ae0cba84 became candidate at term 19199<br \/>\nApr 24 13:10:41 opendb_stby etcd: f13d668ae0cba84 received MsgVoteResp from f13d668ae0cba84 at term 19199<br \/>\nApr 24 13:10:41 opendb_stby etcd: f13d668ae0cba84 [logterm: 19197, index: 447303] sent MsgVote request to 1154b56a13168e2a at term 19199<br \/>\nApr 24 13:10:41 opendb_stby patroni: 2019-04-24 13:10:41,938 ERROR: Request to server http:\/\/192.168.22.34:2379 failed: MaxRetryError(u'HTTPConnectionPool(host=u'192.168.22.34', port=2379): Max retries exceeded with url: \/v2\/keys\/service\/PG1\/leader (Caused by 
ReadTimeoutError(\"HTTPConnectionPool(host=u'192.168.22.34', port=2379): Read timed out. (read timeout=2.5)\",))',)<br \/>\nApr 24 13:10:41 opendb_stby patroni: 2019-04-24 13:10:41,938 INFO: Reconnection allowed, looking for another server.<br \/>\nApr 24 13:10:41 opendb_stby patroni: 2019-04-24 13:10:41,939 ERROR: Machines cache is empty, no machines to try.<br \/>\nApr 24 13:10:41 opendb_stby patroni: 2019-04-24 13:10:41,939 INFO: Selected new etcd server http:\/\/192.168.22.33:2379<br \/>\nApr 24 13:10:43 opendb_stby etcd: f13d668ae0cba84 is starting a new election at term 19199<br \/>\n<\/code><\/p>\n<h3>Why? What went wrong?<\/h3>\n<p>Searching the web brings up lots of ideas, e.g. errors in the configuration, but none of them helped&#8230;<\/p>\n<p>There is one fundamental problem:<br \/>\nA two-node Patroni cluster with only two etcd members won&#8217;t fail over automatically. How should the secondary node know that it should become the leader?<\/p>\n<p>We need a third etcd host: an etcd cluster needs a majority of its members to agree on updates to the cluster state. 
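The quorum arithmetic behind this can be sketched in a few lines of shell (an illustrative addition, not part of the original setup):

```shell
# Quorum an etcd cluster of n members needs to stay writable: floor(n/2) + 1.
# A 2-member cluster therefore tolerates zero failures, a 3-member cluster one.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
```

With two members, the survivor of a crash holds only one of two required votes, so it can never win an election and grant Patroni the leader key.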
This means: when one node fails, the two remaining etcd members still hold a majority, so the secondary can safely be promoted to the new leader.<\/p>\n<p>We have to make some changes on the two existing hosts, and on the third host we need to install etcd.<\/p>\n<p><code><br \/>\nwget https:\/\/github.com\/etcd-io\/etcd\/releases\/download\/v3.3.10\/etcd-v3.3.10-linux-amd64.tar.gz<br \/>\ntar -axf etcd-v3.3.10-linux-amd64.tar.gz<br \/>\ncp etcd-v3.3.10-linux-amd64\/etcd* \/u01\/app\/opendb\/local\/dmk\/bin<br \/>\nmkdir -p \/u02\/pgdata\/etcd<br \/>\nrm -rf etcd-v3.3.10-linux-amd64 etcd-v3.3.10-linux-amd64.tar.gz<br \/>\n<\/code><\/p>\n<p>Afterwards we create the etcd configuration file on the third host and extend the files on the other two hosts to include the new host.<br \/>\n<code><br \/>\ncat &gt; \/u01\/app\/opendb\/local\/dmk\/dmk_postgres\/etc\/etcd.conf &lt;&lt; EOF<br \/>\nname: opendb-stby2<br \/>\ndata-dir: \/u02\/opendb\/pgdata\/etcd<br \/>\ninitial-advertise-peer-urls: http:\/\/192.168.22.35:2380<br \/>\nlisten-peer-urls: http:\/\/192.168.22.35:2380<br \/>\nlisten-client-urls: http:\/\/192.168.22.35:2379,http:\/\/localhost:2379<br \/>\nadvertise-client-urls: http:\/\/192.168.22.35:2379<br \/>\ninitial-cluster: opendb-primary=http:\/\/192.168.22.33:2380,opendb-stby=http:\/\/192.168.22.34:2380,opendb-stby2=http:\/\/192.168.22.35:2380<br \/>\nEOF<br \/>\n<\/code><\/p>\n<p>Create the etcd.service on the third node<br \/>\n<code><br \/>\n[opendb@opendb_standby2 etc]$ cat \/etc\/systemd\/system\/etcd.service<br \/>\n#<br \/>\n# systemd integration for etcd<br \/>\n# Put this file under \/etc\/systemd\/system\/etcd.service<br \/>\n#     then: systemctl daemon-reload<br \/>\n#     then: systemctl list-unit-files | grep etcd<br \/>\n#     then: systemctl enable etcd.service<br \/>\n#<br \/>\n[Unit]<br \/>\nDescription=dbi services etcd service<br 
\/>\nAfter=network.target<br \/>\n[Service]<br \/>\nUser=opendb<br \/>\nType=notify<br \/>\nExecStart=\/u01\/app\/opendb\/local\/dmk\/dmk_postgres\/bin\/etcd --config-file \/u01\/app\/opendb\/local\/dmk\/dmk_postgres\/etc\/etcd.conf<br \/>\nRestart=always<br \/>\nRestartSec=10s<br \/>\nLimitNOFILE=40000<br \/>\n[Install]<br \/>\nWantedBy=multi-user.target<br \/>\n<\/code><\/p>\n<p>Enable the etcd service and reboot the third node<br \/>\n<code><br \/>\nsudo systemctl daemon-reload<br \/>\nsudo systemctl list-unit-files | grep etcd<br \/>\nsudo systemctl enable etcd.service<br \/>\nsudo systemctl reboot<br \/>\n<\/code><\/p>\n<p>Now Patroni can be installed on the third node as well<br \/>\n<code><br \/>\nsudo pip install --upgrade pip<br \/>\npip install --upgrade --user setuptools<br \/>\npip install --user psycopg2-binary<br \/>\npip install --user patroni[etcd]<br \/>\n<\/code><\/p>\n<p>Afterwards we have to change the patroni.yml file on both database hosts and recreate the database so that everything works as expected. Otherwise the third etcd member won&#8217;t be recognized by the cluster.<\/p>\n<p>1. Shut down Patroni on both nodes<br \/>\n<code><br \/>\nsystemctl stop patroni<br \/>\n<\/code><\/p>\n<p>2. Clean up the data directories<br \/>\n<code><br \/>\nrm -rf \/u02\/opendb\/pgdata\/96\/PG1<br \/>\n<\/code><\/p>\n<p>3. Clean up the cluster configuration in etcd<br \/>\n<code><br \/>\npatronictl -c \/u01\/app\/opendb\/local\/dmk\/dmk_postgres\/etc\/patroni.yml remove PG1<br \/>\n<\/code><\/p>\n<p>4. Change the patroni.yml file on both database hosts<br \/>\n<code><br \/>\netcd:<br \/>\n   hosts: 192.168.22.33:2379,192.168.22.34:2379,192.168.22.35:2379<br \/>\n<\/code><\/p>\n<p>5. Start etcd on all THREE nodes, since Patroni needs a working etcd cluster before it can start<br \/>\n<code><br \/>\nsystemctl start etcd<br \/>\n<\/code><\/p>\n<p>6. Start Patroni on both nodes where the database should run. This will build a new cluster.<br \/>\n<code><br \/>\nsystemctl start patroni<br \/>\n<\/code><\/p>\n<p>Everything works as before. 
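As a side note, the three etcd.conf files created above differ only in the member name and IP addresses, so they can be generated from a single template. A sketch (member names and IPs taken from this setup; the local output directory etcd-conf is purely illustrative):

```shell
# Generate one etcd.conf per cluster member into ./etcd-conf/<name>.conf.
mkdir -p etcd-conf
cluster="opendb-primary=http://192.168.22.33:2380,opendb-stby=http://192.168.22.34:2380,opendb-stby2=http://192.168.22.35:2380"
for member in opendb-primary=192.168.22.33 opendb-stby=192.168.22.34 opendb-stby2=192.168.22.35; do
  name=${member%%=*}   # part before '=': the member name
  ip=${member#*=}      # part after '=': the member IP
  cat > "etcd-conf/${name}.conf" << EOF
name: ${name}
data-dir: /u02/opendb/pgdata/etcd
initial-advertise-peer-urls: http://${ip}:2380
listen-peer-urls: http://${ip}:2380
listen-client-urls: http://${ip}:2379,http://localhost:2379
advertise-client-urls: http://${ip}:2379
initial-cluster: ${cluster}
EOF
done
```

Copying the generated files to the right host avoids the classic mistake of forgetting to extend initial-cluster on one of the existing members.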
The patronictl list output still shows only the two Patroni members<br \/>\n<code><br \/>\nopendb@opendb_primary:\/home\/opendb\/ [PG1] patronictl list<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 | Leader | running |  7 |       0.0 |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 |        | running |  7 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n<\/code><\/p>\n<p>The etcd cluster also looks good<br \/>\n<code><br \/>\nopendb@opendb_primary:\/home\/opendb\/ [PG1] etcdctl cluster-health<br \/>\nmember f13d668ae0cba84 is healthy: got healthy result from http:\/\/192.168.22.34:2379<br \/>\nmember 1154b56a13168e2a is healthy: got healthy result from http:\/\/192.168.22.33:2379<br \/>\nmember c8520dba9907702d is healthy: got healthy result from http:\/\/192.168.22.35:2379<br \/>\ncluster is healthy<br \/>\n<\/code><\/p>\n<p>Now let&#8217;s try another shutdown of the leader node and see what happens:<br \/>\n<code><br \/>\nopendb@opendb_stby:\/home\/opendb\/ [PG1] patronictl list<br \/>\n\/home\/opendb\/.local\/lib\/python2.7\/site-packages\/psycopg2\/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. For details see: .<br \/>\n  \"\"\")<br \/>\n2019-04-25 03:40:26,447 - WARNING - Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to 192.168.22.33 timed out. 
(connect timeout=2.5)')': \/v2\/machines<br \/>\n2019-04-25 03:40:28,953 - ERROR - Failed to get list of machines from http:\/\/192.168.22.33:2379\/v2: MaxRetryError(\"HTTPConnectionPool(host='192.168.22.33', port=2379): Max retries exceeded with url: \/v2\/machines (Caused by ConnectTimeoutError(, 'Connection to 192.168.22.33 timed out. (connect timeout=2.5)'))\",)<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n| Cluster |     Member     |      Host     |  Role  |  State  | TL | Lag in MB |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n|   PG1   | opendb_primary | 192.168.22.33 |        | stopped |    |   unknown |<br \/>\n|   PG1   |  opendb_stby   | 192.168.22.34 | Leader | running |  8 |       0.0 |<br \/>\n+---------+----------------+---------------+--------+---------+----+-----------+<br \/>\n03:40:29 opendb@opendb_stby:\/home\/opendb\/ [PG1]<br \/>\n<\/code><\/p>\n<p>Also in \/var\/log\/messages you can see the successful failover to opendb_stby<br \/>\n<code><br \/>\nApr 25 03:40:09 opendb_stby patroni: 2019-04-25 03:40:09,096 WARNING: Request failed to opendb_primary: GET http:\/\/192.168.22.33:8008\/patroni (('Connection aborted.', error(104, 'Connection reset by peer')))<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 [term 11] received MsgTimeoutNow from 1154b56a13168e2a and starts an election to get leadership.<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 became candidate at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 received MsgVoteResp from f13d668ae0cba84 at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 [logterm: 11, index: 92328] sent MsgVote request to 1154b56a13168e2a at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 [logterm: 11, index: 92328] sent MsgVote request to c8520dba9907702d at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: raft.node: f13d668ae0cba84 
lost leader 1154b56a13168e2a at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 received MsgVoteResp from c8520dba9907702d at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections<br \/>\nApr 25 03:40:09 opendb_stby etcd: f13d668ae0cba84 became leader at term 12<br \/>\nApr 25 03:40:09 opendb_stby etcd: raft.node: f13d668ae0cba84 elected leader f13d668ae0cba84 at term 12<br \/>\nApr 25 03:40:09 opendb_stby patroni: 2019-04-25 03:40:09,190 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds<br \/>\nApr 25 03:40:09 opendb_stby patroni: 2019-04-25 03:40:09,194 INFO: promoted self to leader by acquiring session lock<br \/>\nApr 25 03:40:09 opendb_stby patroni: server promoting<br \/>\nApr 25 03:40:09 opendb_stby patroni: 2019-04-25 03:40:09,200 INFO: cleared rewind state after becoming the leader<br \/>\nApr 25 03:40:09 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream Message reader)<br \/>\nApr 25 03:40:09 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream MsgApp v2 reader)<br \/>\nApr 25 03:40:09 opendb_stby etcd: failed to dial 1154b56a13168e2a on stream Message (peer 1154b56a13168e2a failed to find local node f13d668ae0cba84)<br \/>\nApr 25 03:40:09 opendb_stby etcd: peer 1154b56a13168e2a became inactive (message send to peer failed)<br \/>\nApr 25 03:40:09 opendb_stby etcd: lost the TCP streaming connection with peer 1154b56a13168e2a (stream Message writer)<br \/>\nApr 25 03:40:10 opendb_stby patroni: 2019-04-25 03:40:10,271 INFO: Lock owner: opendb_stby; I am opendb_stby<br \/>\nApr 25 03:40:10 opendb_stby patroni: 2019-04-25 03:40:10,304 INFO: no action.  i am the leader with the lock<br \/>\n<\/code><\/p>\n<h3>Conclusion<\/h3>\n<p>Running, and especially failing over, a Patroni cluster in two-node mode is only possible with a third etcd host. 
Otherwise the automatic failover won&#8217;t work when one node crashes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few days ago I had the interesting mission to build a two Node Patroni Postgres 9.6 Cluster using etcd in our openDB Appliance environment. Sounds easy, one leader, one secondary, both running etcd, that&#8217;s it. But that&#8217;s only the first impression.<\/p>\n","protected":false},"author":28,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[229],"tags":[1618,1543,1002],"type_dbi":[],"class_list":["post-12450","post","type-post","status-publish","format-standard","hentry","category-database-administration-monitoring","tag-etcd","tag-patroni","tag-postgresql-9-6"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Two node patroni does not failover when one node crashes - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Two node patroni does not failover when one node crashes\" \/>\n<meta property=\"og:description\" content=\"A few days ago I had the interesting mission to build a two Node Patroni Postgres 9.6 Cluster using etcd in our openDB Appliance environment. Sounds easy, one leader, one secondary, both running etcd, that&#8217;s it. 
But that&#8217;s only the first impression.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2019-04-30T10:34:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-10T15:37:47+00:00\" \/>\n<meta name=\"author\" content=\"Open source Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Open source Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\"},\"author\":{\"name\":\"Open source Team\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b\"},\"headline\":\"Two node patroni does not failover when one node crashes\",\"datePublished\":\"2019-04-30T10:34:39+00:00\",\"dateModified\":\"2024-09-10T15:37:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\"},\"wordCount\":536,\"commentCount\":0,\"keywords\":[\"etcd\",\"Patroni\",\"PostgreSQL 9.6\"],\"articleSection\":[\"Database Administration &amp; 
Monitoring\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\",\"name\":\"Two node patroni does not failover when one node crashes - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"datePublished\":\"2019-04-30T10:34:39+00:00\",\"dateModified\":\"2024-09-10T15:37:47+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Two node patroni does not failover when one node crashes\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi 
Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b\",\"name\":\"Open source Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g\",\"caption\":\"Open source Team\"},\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/open-source-team\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Two node patroni does not failover when one node crashes - dbi Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/","og_locale":"en_US","og_type":"article","og_title":"Two node patroni does not failover when one node crashes","og_description":"A few days ago I had the interesting mission to build a two Node Patroni Postgres 9.6 Cluster using etcd in our openDB Appliance environment. Sounds easy, one leader, one secondary, both running etcd, that&#8217;s it. 
But that&#8217;s only the first impression.","og_url":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/","og_site_name":"dbi Blog","article_published_time":"2019-04-30T10:34:39+00:00","article_modified_time":"2024-09-10T15:37:47+00:00","author":"Open source Team","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Open source Team","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/"},"author":{"name":"Open source Team","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b"},"headline":"Two node patroni does not failover when one node crashes","datePublished":"2019-04-30T10:34:39+00:00","dateModified":"2024-09-10T15:37:47+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/"},"wordCount":536,"commentCount":0,"keywords":["etcd","Patroni","PostgreSQL 9.6"],"articleSection":["Database Administration &amp; Monitoring"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/","url":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/","name":"Two node patroni does not failover when one node crashes - dbi 
Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"datePublished":"2019-04-30T10:34:39+00:00","dateModified":"2024-09-10T15:37:47+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b"},"breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/two-node-patroni-does-not-failover-when-one-node-crashes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Two node patroni does not failover when one node crashes"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/59554f0d99383431eb6ed427e338952b","name":"Open source Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/eb4fb12e386e8c41fdef0733e8114594cf2653e4f55e9fa2161442b8eaf3f657?s=96&d=mm&r=g","caption":"Open source 
Team"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/open-source-team\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/12450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=12450"}],"version-history":[{"count":1,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/12450\/revisions"}],"predecessor-version":[{"id":34705,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/12450\/revisions\/34705"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=12450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=12450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=12450"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=12450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}