Finally, with all the configuration done, we can run the playbooks.
Create SSH Access
First, SSH access needs to be set up on all nodes. The playbook access-onprem.yml takes care of this.
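The playbook reads the public key from ~/.ssh/ansible-dcos.pub on the control host, so that key pair has to exist before the first run. A minimal sketch, assuming a fresh RSA key without a passphrase (adjust path and options to your needs):

# Create the key pair the playbook distributes to all nodes;
# the name ansible-dcos matches the lookup in the playbook below
ssh-keygen -t rsa -b 4096 -f ~/.ssh/ansible-dcos -N "" -C "ansible-dcos"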
Be careful: I used CentOS on my systems, so I commented out the apt-get/Debian-based part. If you want to run the playbook on another operating system, adjust it carefully:
---
# This playbook enables access to all Ansible targets via SSH
- name: setup the ansible requirements on all nodes
  hosts: all:!localhost   # every inventory host except localhost
  #hosts: all
  serial: 20
  remote_user: "{{ initial_remote_user | default('root') }}"
  become: true
  tasks:
    # - name: attempt to update apt's cache
    #   raw: test -e /usr/bin/apt-get && apt-get update
    #   ignore_errors: yes
    # - name: attempt to install Python on Debian-based systems
    #   raw: test -e /usr/bin/apt-get && apt-get -y install python-simplejson python
    #   ignore_errors: yes
    - name: attempt to install Python on CentOS-based systems
      raw: test -e /usr/bin/yum && yum -y install python-simplejson python
      ignore_errors: yes
    - name: Create admin user group
      group:
        name: admin
        system: yes
        state: present
    - name: Ensure sudo is installed
      package:
        name: sudo
        state: present
    - name: Remove user centos
      user:
        name: centos
        state: absent
        remove: yes
    - name: Create Ansible user
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        shell: /bin/bash
        comment: "Ansible management user"
        home: "/home/{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        createhome: yes
        # NOTE: the user module expects a crypted hash here; a plain
        # string like this will not work as a usable login password
        password: "admin123"
    - name: Add Ansible user to admin group
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        groups: admin
        append: yes
    - name: Add authorized key
      authorized_key:
        user: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        state: present
        key: "{{ lookup('file', lookup('env','HOME') + '/.ssh/ansible-dcos.pub') }}"
    - name: Copy sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.tmp
    - name: Backup sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.bak
    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'Debian'
    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        insertafter: '^root'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'RedHat'
    - name: Replace sudoers file
      shell: visudo -q -c -f /etc/sudoers.tmp && cp -f /etc/sudoers.tmp /etc/sudoers
    - name: Test Ansible user's access
      local_action: "shell ssh {{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}@{{ ansible_host }} 'sudo echo success'"
      become: False
      register: ansible_success
    - name: Remove Ansible SSH key from bootstrap user's authorized keys
      lineinfile:
        path: "{{ ansible_env.HOME }}/.ssh/authorized_keys"
        state: absent
        regexp: '^ssh-rsa AAAAB3N'
      when: ansible_success.stdout == "success"
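All those lookup('ini', ...) calls read the management user's name from the remote_user setting in ansible.cfg, one directory above the playbook, so the playbook and later ad-hoc commands agree on a single user. A minimal sketch of the relevant section; the user name ansible and the key path are assumptions from my setup:

[defaults]
remote_user = ansible
private_key_file = ~/.ssh/ansible-dcos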
Start the playbook for SSH access
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# ansible-playbook plays/access-onprem.yml

PLAY [setup the ansible requirements on all nodes] ****************************************************************************************

TASK [Gathering Facts] ****************************************************************************************
ok: [192.168.22.103]
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]
[....]

PLAY RECAP **************************************************************************************
192.168.22.100             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.101             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.102             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.103             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.104             : ok=14   changed=6    unreachable=0    failed=0
This is not the whole output of the playbook. One important note: during “TASK [Test Ansible user’s access]” I had to enter the Ansible password five times, once per node. After that the playbook finished successfully.
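To avoid typing a passphrase or password once per node, you can load the key into an SSH agent before starting the playbook; a short sketch, assuming the key pair created earlier:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/ansible-dcos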
Ping the servers using Ansible
After the playbook has finished successfully, do a test ping:
[root@dcos-ansible ansible-dcos]# ansible all -m ping
192.168.22.102 | SUCCESS => {
"changed": false,
"ping": "pong"
}
192.168.22.100 | SUCCESS => {
"changed": false,
"ping": "pong"
}
192.168.22.104 | SUCCESS => {
"changed": false,
"ping": "pong"
}
192.168.22.101 | SUCCESS => {
"changed": false,
"ping": "pong"
}
192.168.22.103 | SUCCESS => {
"changed": false,
"ping": "pong"
}
In case of trouble, it is really helpful to use the “-vvv” option.
It is also possible to ping only a single server:
ansible 192.168.22.100 -m ping
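Both options can be combined, for example to debug the connection to a single node with full verbosity:

ansible 192.168.22.100 -m ping -vvv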
Roll out the DC/OS installation
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# cat plays/install.yml
---
- name: setup the system requirements on all nodes
  hosts: all
  serial: 20
  become: true
  roles:
    - common
    - docker

- name: generate the DC/OS configuration
  hosts: bootstraps
  serial: 1
  become: true
  roles:
    - bootstrap

- name: deploy nodes
  hosts: [ masters, agents, agent_publics ]
  serial: 20
  become: true
  roles:
    - node-install
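The three plays target the host groups bootstraps, masters, agents and agent_publics, so these groups must exist in your inventory. A minimal sketch; the assignment of my five IPs to the groups is purely illustrative:

[bootstraps]
192.168.22.100

[masters]
192.168.22.101

[agents]
192.168.22.102
192.168.22.103

[agent_publics]
192.168.22.104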
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# ansible-playbook plays/install.yml

PLAY [setup the system requirements on all nodes] *********************************************************************

TASK [Gathering Facts] *********************************************************************
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]
[....]
If the installation fails on some servers, Ansible skips them for the rest of the run and writes their names to a .retry file, giving you the opportunity to rerun the playbook against only the failed servers:
ansible-playbook plays/install.yml --limit @/root/ansible-dcos/plays/install.retry
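The .retry file is simply a list of the failed hosts, one inventory name per line, so it is worth a look before rerunning; a hypothetical example with one failed agent:

[root@dcos-ansible ansible-dcos]# cat plays/install.retry
192.168.22.103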
If you cannot connect to your master via browser, check /var/log/messages for error messages. In my case the master was looking for the eth0 interface, which isn’t available on my VMs.
Just change the detect_ip script as follows, according to your network interface. The same change is needed on all agent nodes as well.
[root@dcos-master bin]# cat /opt/mesosphere/bin/detect_ip
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show enp0s8 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
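If you are not sure which interface carries the node's address, list them first; afterwards, running the edited script should print the IP you expect (the interface name enp0s8 is from my VMs, substitute yours):

# find the interface that carries the node's IP
ip -o -4 addr show
# then verify the edited script returns that address
/opt/mesosphere/bin/detect_ip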
Install the CLI
For those of you who prefer a CLI, just install it on your master:
[root@dcos-master ~]# [ -d /usr/local/bin ] || sudo mkdir -p /usr/local/bin
[root@dcos-master ~]# curl https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.11/dcos -o dcos
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 13.9M 100 13.9M 0 0 1313k 0 0:00:10 0:00:10 --:--:-- 3920k
[root@dcos-master ~]# sudo mv dcos /usr/local/bin
[root@dcos-master ~]# chmod +x /usr/local/bin/dcos
[root@dcos-master ~]# dcos cluster setup http://192.168.22.101
If your browser didn't open, please go to the following link:
http://192.168.22.101/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob
Enter OpenID Connect ID Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik9UQkVOakZFTWtWQ09VRTRPRVpGTlRNMFJrWXlRa015Tnprd1JrSkVRemRCTWpBM1FqYzVOZyJ9.eyJlbWFpbCI6Imp1bGlhLmd1Z2VsQGdtYWlsLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJpc3MiOiJodHRwczovL2Rjb3MuYXV0aDAuY29tLyIsInN1YiI6Imdvb2dsZS1vYXV0aDJ8MTA2NTU2OTI5OTM1NTc2MzQ1OTEyIiwiYXVkIjoiM3lGNVRPU3pkbEk0NVExeHNweHplb0dCZTlmTnhtOW0iLCJpYXQiOjE1NDA0NTA4MTcsImV4cCI6MTU0MDg4MjgxN30.M8d6dT4QNsBmUXbAH8B58K6Q2XvnCKnEd_yziiijBXHdW18P2OnJEYrKa9ewvOfFhyisvLa7XMU3xeBUhoqX5T6mGkQo_XUlxXM82Ohv3zNCdqyNCwPwoniX4vU7R736blcLRx1aB8TJnydNb0H0IzEAVzaYBQ1CRV-4a9KsiMXKBBPlskOSvek4b_FRghA6hsjMA2eO-G5r3B6UgHo6CCwdwVrhsOygvJ5NwDC0xiFrnkW-SjZRZztCN8cRj7b40VH43uY6R2ibxJfE7SaGpbWzLyp7juUJ766WXar3O7ww42bYIqLnAx6YmWG5kFeJnmJGT-Rdmhl2JuvdABoozA
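Once the token is accepted, a quick check shows whether the CLI can talk to the cluster:

dcos node       # lists the cluster's nodes
dcos service    # lists the running services, e.g. marathon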
That’s it, now you can configure and use your DC/OS. Always keep in mind: the ntpd service is essential for a working DC/OS node. Also make use of /var/log/messages, it really helps!
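For the time synchronization, a quick check on each node helps; both commands come with the stock CentOS ntp package:

systemctl is-active ntpd
ntpq -p    # the selected time source is marked with a '*'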
One little thing I have to mention at the end: don’t rely on the official documentation and the troubleshooting guide, they do not help as much as expected…