After the minimal DC/OS setup from my previous articles, I wanted to extend the cluster with a load balancer.
There are two options for load balancing in DC/OS:
1. Marathon-LB (a layer 7 load balancer, used for external requests, based on HAProxy)
2. Named VIPs (a layer 4 load balancer used for internal TCP traffic)

In this article we will use Marathon-LB. If you want to read more about the VIP solution, see the DC/OS documentation.

I also want to configure keepalived, which provides an automatic, unicast-based failover for high availability.

Preparation

To use a load balancer, I had to extend the DC/OS cluster I built earlier in "Deploy DC/OS using Ansible (Part 1)". The new cluster has the following structure:
[Figure: structure of the extended DC/OS cluster]

Implementation of marathon-lb

I used the DC/OS CLI, but you can also install the marathon-lb package using the catalog on the web interface.

[root@dcos-master ~]# dcos package install marathon-lb
By Deploying, you agree to the Terms and Conditions https://mesosphere.com/catalog-terms-conditions/#community-services
We recommend at least 2 CPUs and 1GiB of RAM for each Marathon-LB instance.

*NOTE*: For additional ```Enterprise Edition``` DC/OS instructions, see https://docs.mesosphere.com/administration/id-and-access-mgt/service-auth/mlb-auth/
Continue installing? [yes/no] yes
Installing Marathon app for package [marathon-lb] version [1.12.3]
Marathon-lb DC/OS Service has been successfully installed!
See https://github.com/mesosphere/marathon-lb for documentation.

Create a keepalived configuration

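To implement keepalived on both public agents, create two JSON files (guidance is available on GitHub): one for the master and one for the backup.
Adapt the IPs to your environment and make sure you do not mix up the IPs of the master and the backup: each instance's KEEPALIVED_UNICAST_SRC_IP is its own agent's address, and KEEPALIVED_UNICAST_PEER_0 is the other agent's address. You also have to adapt KEEPALIVED_VIRTUAL_IPADDRESS_1, the virtual IP that both agents share, and check that KEEPALIVED_INTERFACE matches the network interface name on your public agents.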
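Because swapped IPs are easy to miss by eye, a small consistency check can help before deploying. The following is only a sketch: the function names `env_value` and `check_keepalived_pair` are made up for this example, and the extraction is plain text matching (not a real JSON parser) that assumes one "KEY": "value" pair per line, as in the files shown in this article.

```shell
#!/bin/sh
# Print the value of one KEEPALIVED_* variable found in an app definition.
# Text matching only: assumes the key and its value sit on the same line.
env_value() {
    # $1 = file, $2 = variable name
    sed -n 's/.*"'"$2"'": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Return 0 if master and backup definitions mirror each other, 1 otherwise.
check_keepalived_pair() {
    # $1 = master definition, $2 = backup definition
    m_src=$(env_value "$1" KEEPALIVED_UNICAST_SRC_IP)
    m_peer=$(env_value "$1" KEEPALIVED_UNICAST_PEER_0)
    m_vip=$(env_value "$1" KEEPALIVED_VIRTUAL_IPADDRESS_1)
    b_src=$(env_value "$2" KEEPALIVED_UNICAST_SRC_IP)
    b_peer=$(env_value "$2" KEEPALIVED_UNICAST_PEER_0)
    b_vip=$(env_value "$2" KEEPALIVED_VIRTUAL_IPADDRESS_1)
    rc=0
    if [ -z "$m_src" ] || [ "$m_src" != "$b_peer" ]; then
        echo "master source IP '$m_src' does not match backup peer '$b_peer'"
        rc=1
    fi
    if [ -z "$b_src" ] || [ "$b_src" != "$m_peer" ]; then
        echo "backup source IP '$b_src' does not match master peer '$m_peer'"
        rc=1
    fi
    if [ -z "$m_vip" ] || [ "$m_vip" != "$b_vip" ]; then
        echo "virtual IPs differ: '$m_vip' vs '$b_vip'"
        rc=1
    fi
    return "$rc"
}

# Example call, once both files exist:
#   check_keepalived_pair keepalived-master.json keepalived-backup.json && echo "pair looks consistent"
```

The check only compares the two files with each other. It cannot tell whether the addresses actually belong to your public agents.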

[root@dcos-master ~]# cd /etc/
[root@dcos-master etc]# mkdir keepalived
[root@dcos-master etc]# cd keepalived
[root@dcos-master keepalived]# cat keepalived-master.json
{
  "id": "/keepalived-master",
  "acceptedResourceRoles": [
    "slave_public"
  ],
  "constraints": [
    [
      "hostname",
      "LIKE",
      "192.168.22.104"
    ]
  ],
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "arcts/keepalived",
      "forcePullImage": false,
      "privileged": false,
      "parameters": [
        {
          "key": "cap-add",
          "value": "NET_ADMIN"
        }
      ]
    }
  },
  "cpus": 0.5,
  "disk": 0,
  "env": {
    "KEEPALIVED_AUTOCONF": "true",
    "KEEPALIVED_VIRTUAL_IPADDRESS_1": "192.168.22.150/24",
    "KEEPALIVED_STATE": "MASTER",
    "KEEPALIVED_UNICAST_PEER_0": "192.168.22.106",
    "KEEPALIVED_INTERFACE": "en0ps8",
    "KEEPALIVED_UNICAST_SRC_IP": "192.168.22.104"
  },
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 100,
  "gpus": 0,
  "networks": [
    {
      "mode": "host"
    }
  ],
  "portDefinitions": [],
  "requirePorts": true,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "fetch": []
}
[root@dcos-master keepalived]# cat keepalived-backup.json
{
  "id": "/keepalived-backup",
  "acceptedResourceRoles": [
    "slave_public"
  ],
  "constraints": [
    [
      "hostname",
      "LIKE",
      "192.168.22.106"
    ]
  ],
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "arcts/keepalived",
      "forcePullImage": false,
      "privileged": false,
      "parameters": [
        {
          "key": "cap-add",
          "value": "NET_ADMIN"
        }
      ]
    }
  },
  "cpus": 0.5,
  "disk": 0,
  "env": {
    "KEEPALIVED_AUTOCONF": "true",
    "KEEPALIVED_VIRTUAL_IPADDRESS_1": "192.168.22.150/24",
    "KEEPALIVED_STATE": "BACKUP",
    "KEEPALIVED_UNICAST_PEER_0": "192.168.22.104",
    "KEEPALIVED_INTERFACE": "en0ps8",
    "KEEPALIVED_UNICAST_SRC_IP": "192.168.22.106"
  },
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 124,
  "gpus": 0,
  "networks": [
    {
      "mode": "host"
    }
  ],
  "portDefinitions": [],
  "requirePorts": true,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "fetch": []
}

Add the keepalived apps to DC/OS

[root@dcos-master keepalived]# dcos marathon app add keepalived-master.json
Created deployment b41b4526-86ae-4b70-a254-429c3c212ce3
[root@dcos-master keepalived]# dcos marathon app add keepalived-backup.json
Created deployment f854a621-f402-4e72-9f46-150b11c6a7c8

Does everything work as expected?

You can easily check whether everything works as expected from the CLI:

[root@dcos-master keepalived]# dcos marathon app list
ID                  MEM  CPUS  TASKS  HEALTH  DEPLOYMENT  WAITING  CONTAINER  CMD
/keepalived-backup  124  0.5    1/1    N/A       ---      False      DOCKER   N/A
/keepalived-master  100  0.5    1/1    N/A       ---      False      DOCKER   N/A
/marathon-lb        800   1     2/2    2/2       ---      False      DOCKER   N/A
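
If you want to script this check, the table can be filtered for apps that are not fully running. This is a minimal sketch: `unready_apps` is a hypothetical helper name, and it assumes the column layout shown above, with TASKS as the fourth column in the form running/requested.

```shell
#!/bin/sh
# Read the output of `dcos marathon app list` on stdin and print every app
# whose TASKS column (4th field) does not show equal numbers, e.g. 0/1.
unready_apps() {
    awk 'NR > 1 && NF >= 4 {
        split($4, t, "/")
        if (t[1] != t[2]) print $1 " " $4
    }'
}

# Example call on the master:
#   dcos marathon app list | unready_apps
```

No output means every listed app has as many running tasks as requested.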

You can also use the web interface, which I prefer in this case because it offers a great overview: you can check the health of the services, their IP addresses, and more.

Select keepalived-master to check its state.

Select the log tab and make sure the master is in “MASTER STATE”
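
The same log check can be sketched on the command line. The helper name `last_vrrp_state` is made up for this example, and it assumes that keepalived writes lines containing "Entering MASTER STATE" or "Entering BACKUP STATE" to the task log, as seen in the log tab.

```shell
#!/bin/sh
# Read keepalived log lines on stdin and print the most recent VRRP state
# (MASTER, BACKUP or FAULT). Prints nothing if no state change was logged.
last_vrrp_state() {
    grep -Eo 'Entering (MASTER|BACKUP|FAULT) STATE' | tail -n 1 | awk '{print $2}'
}

# Example call (assumption: the lines end up in the task's default log file;
# if not, name the file explicitly, e.g. add "stderr" as the last argument):
#   dcos task log --lines=200 keepalived-master | last_vrrp_state
```

Only the last state transition counts, because an instance may log an earlier state before it settles.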

Do the same for the keepalived-backup. The log should show the backup in “BACKUP STATE”
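
As a final check, you can look for the virtual IP on the public agents: it should be present on the agent in master state and absent on the backup. The sketch below assumes a Linux agent with the `ip` tool, and that the address is added with the same prefix length as in KEEPALIVED_VIRTUAL_IPADDRESS_1. `has_vip` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given IPv4 address/prefix appears in the
# one-line-per-address output of `ip -o -4 addr show` read from stdin.
has_vip() {
    # $1 = address with prefix length, e.g. 192.168.22.150/24
    awk -v vip="$1" '$3 == "inet" && $4 == vip { found = 1 } END { exit found ? 0 : 1 }'
}

# Example call on a public agent:
#   ip -o -4 addr show | has_vip 192.168.22.150/24 && echo "this agent holds the VIP"
```

Stopping the keepalived-master app and repeating the check on the other agent is a simple way to see the failover happen.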