Finally, after all the configuration is done, we can run the playbooks.
Create SSH Access
First, SSH access needs to be set up on all nodes. For this, the playbook access-onprem.yml is used:
Be careful: I used CentOS on my system, so I commented out the apt-get and Debian-based parts.
If you want to run the playbook on another operating system, adjust it carefully.
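If you are not sure which family your nodes belong to, a quick check like the following shows which package manager each node has (a sketch, using my node IPs and assuming you can still log in as root with a password):

# print, for every node, whether it is Debian-based (apt-get) or RPM-based
for host in 192.168.22.100 192.168.22.101 192.168.22.102 192.168.22.103 192.168.22.104; do
  ssh root@$host 'test -e /usr/bin/apt-get && echo "$HOSTNAME: Debian-based (apt-get)" || echo "$HOSTNAME: RPM-based (yum)"'
done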
---
# This playbook enables access to all ansible targets via ssh
- name: setup the ansible requirements on all nodes
  hosts: all:!localhost
  #hosts: all
  serial: 20
  remote_user: "{{ initial_remote_user | default('root') }}"
  become: true
  tasks:
#  - name: attempt to update apt's cache
#    raw: test -e /usr/bin/apt-get && apt-get update
#    ignore_errors: yes
#  - name: attempt to install Python on Debian-based systems
#    raw: test -e /usr/bin/apt-get && apt-get -y install python-simplejson python
#    ignore_errors: yes
  - name: attempt to install Python on CentOS-based systems
    raw: test -e /usr/bin/yum && yum -y install python-simplejson python
    ignore_errors: yes
  - name: Create admin user group
    group:
      name: admin
      system: yes
      state: present
  - name: Ensure sudo is installed
    package:
      name: sudo
      state: present
  - name: Remove user centos
    user:
      name: centos
      state: absent
      remove: yes
  - name: Create Ansible user
    user:
      name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
      shell: /bin/bash
      comment: "Ansible management user"
      home: "/home/{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
      createhome: yes
      password: "admin123"
  - name: Add Ansible user to admin group
    user:
      name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
      groups: admin
      append: yes
  - name: Add authorized key
    authorized_key:
      user: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
      state: present
      key: "{{ lookup('file', lookup('env','HOME') + '/.ssh/ansible-dcos.pub') }}"
  - name: Copy sudoers file
    command: cp -f /etc/sudoers /etc/sudoers.tmp
  - name: Backup sudoers file
    command: cp -f /etc/sudoers /etc/sudoers.bak
  - name: Ensure admin group can sudo
    lineinfile:
      dest: /etc/sudoers.tmp
      state: present
      regexp: '^%admin'
      line: '%admin ALL=(ALL) NOPASSWD: ALL'
    when: ansible_os_family == 'Debian'
  - name: Ensure admin group can sudo
    lineinfile:
      dest: /etc/sudoers.tmp
      state: present
      regexp: '^%admin'
      insertafter: '^root'
      line: '%admin ALL=(ALL) NOPASSWD: ALL'
    when: ansible_os_family == 'RedHat'
  - name: Replace sudoers file
    shell: visudo -q -c -f /etc/sudoers.tmp && cp -f /etc/sudoers.tmp /etc/sudoers
  - name: Test Ansible user's access
    local_action: "shell ssh {{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}@{{ ansible_host }} 'sudo echo success'"
    become: False
    register: ansible_success
  - name: Remove Ansible SSH key from bootstrap user's authorized keys
    lineinfile:
      path: "{{ ansible_env.HOME }}/.ssh/authorized_keys"
      state: absent
      regexp: '^ssh-rsa AAAAB3N'
    when: ansible_success.stdout == "success"
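One thing to prepare before the run: the "Add authorized key" task expects a key pair called ansible-dcos in the home directory on the control machine. If you have not created it yet, a minimal sketch (assuming a passphrase-less RSA key is acceptable in your environment):

# create the key pair the playbook looks up under ~/.ssh/ansible-dcos(.pub)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/ansible-dcos -C "ansible-dcos" -N ""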
Start the Playbook for the SSH access
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# ansible-playbook plays/access-onprem.yml

PLAY [setup the ansible requirements on all nodes] ****************************************************************

TASK [Gathering Facts] ****************************************************************
ok: [192.168.22.103]
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]

[....]

PLAY RECAP **************************************************************************************
192.168.22.100             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.101             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.102             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.103             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.104             : ok=14   changed=6    unreachable=0    failed=0
This is not the whole output of the playbook. Important to know: during the task "Test Ansible user's access" I had to enter the Ansible user's password five times, once per node. After that, the playbook finished successfully.
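If you want to avoid typing the password once per node, one option is to load the matching private key into an ssh-agent before starting the playbook, so the test ssh connection can authenticate with the key instead (assuming ~/.ssh/ansible-dcos is the private key belonging to the authorized public key):

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/ansible-dcos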
Ping the servers using Ansible
After the playbook has finished successfully, do a test ping:
[root@dcos-ansible ansible-dcos]# ansible all -m ping
192.168.22.102 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.100 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.104 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.101 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.103 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
In case of trouble it is really helpful to use the “-vvv” option.
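For example, to see the full connection details of the ping:

ansible all -m ping -vvv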
It is also possible to ping only one server using
ansible 192.168.22.100 -m ping
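You can also target a whole inventory group instead of a single host, for example only the masters (group name as used in plays/install.yml):

ansible masters -m ping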
Roll out the DC/OS installation
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# cat plays/install.yml
---
- name: setup the system requirements on all nodes
  hosts: all
  serial: 20
  become: true
  roles:
  - common
  - docker

- name: generate the DC/OS configuration
  hosts: bootstraps
  serial: 1
  become: true
  roles:
  - bootstrap

- name: deploy nodes
  hosts: [ masters, agents, agent_publics ]
  serial: 20
  become: true
  roles:
  - node-install
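Before the real run, it can be worth validating the playbook and listing which hosts each play will touch; both are standard ansible-playbook flags:

ansible-playbook plays/install.yml --syntax-check
ansible-playbook plays/install.yml --list-hosts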
[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# ansible-playbook plays/install.yml

PLAY [setup the system requirements on all nodes] *********************************************************************

TASK [Gathering Facts] *********************************************************************
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]

[....]
In case some installation steps fail, Ansible skips the affected server and gives you the opportunity to rerun the playbook on just the failed hosts via the generated .retry file:
ansible-playbook plays/install.yml --limit @/root/ansible-dcos/plays/install.retry
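Alternatively, you can limit the rerun to a single host:

ansible-playbook plays/install.yml --limit 192.168.22.102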
If you cannot connect to your master via browser, check /var/log/messages for error messages. In my case, the master was looking for the eth0 interface, which isn't available on my VM.
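For example, to search the log for hints about the failing interface lookup (eth0 in my case):

grep -i eth0 /var/log/messages | tail -5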
Just change the detect_ip script as follows, according to your network interface. The same step is needed on all agent nodes as well.
[root@dcos-master bin]# cat /opt/mesosphere/bin/detect_ip
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show enp0s8 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
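Afterwards, a quick sanity check that the adjusted script returns the address you expect (enp0s8 is my interface name, replace it with yours):

/opt/mesosphere/bin/detect_ip
ip -4 addr show enp0s8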
Install the CLI
For those of you who prefer a CLI, just install it on your master.
[root@dcos-master ~]# [ -d /usr/local/bin ] || sudo mkdir -p /usr/local/bin
[root@dcos-master ~]# curl https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.11/dcos -o dcos
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.9M  100 13.9M    0     0  1313k      0  0:00:10  0:00:10 --:--:-- 3920k
[root@dcos-master ~]# sudo mv dcos /usr/local/bin
[root@dcos-master ~]# chmod +x /usr/local/bin/dcos
[root@dcos-master ~]# dcos cluster setup http://192.168.22.101
If your browser didn't open, please go to the following link:

    http://192.168.22.101/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob

Enter OpenID Connect ID Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik9UQkVOakZFTWtWQ09VRTRPRVpGTlRNMFJrWXlRa015Tnprd1JrSkVRemRCTWpBM1FqYzVOZyJ9.eyJlbWFpbCI6Imp1bGlhLmd1Z2VsQGdtYWlsLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJpc3MiOiJodHRwczovL2Rjb3MuYXV0aDAuY29tLyIsInN1YiI6Imdvb2dsZS1vYXV0aDJ8MTA2NTU2OTI5OTM1NTc2MzQ1OTEyIiwiYXVkIjoiM3lGNVRPU3pkbEk0NVExeHNweHplb0dCZTlmTnhtOW0iLCJpYXQiOjE1NDA0NTA4MTcsImV4cCI6MTU0MDg4MjgxN30.M8d6dT4QNsBmUXbAH8B58K6Q2XvnCKnEd_yziiijBXHdW18P2OnJEYrKa9ewvOfFhyisvLa7XMU3xeBUhoqX5T6mGkQo_XUlxXM82Ohv3zNCdqyNCwPwoniX4vU7R736blcLRx1aB8TJnydNb0H0IzEAVzaYBQ1CRV-4a9KsiMXKBBPlskOSvek4b_FRghA6hsjMA2eO-G5r3B6UgHo6CCwdwVrhsOygvJ5NwDC0xiFrnkW-SjZRZztCN8cRj7b40VH43uY6R2ibxJfE7SaGpbWzLyp7juUJ766WXar3O7ww42bYIqLnAx6YmWG5kFeJnmJGT-Rdmhl2JuvdABoozA
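As a quick smoke test of the freshly configured CLI, you can list the configured cluster and its nodes (both subcommands exist in the 1.11 CLI):

dcos cluster list
dcos node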
That's it, now you can configure and use your DC/OS cluster. Always keep in mind: the ntpd service is essential for a working DC/OS node. Also check /var/log/messages, it really helps!
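To verify the time synchronization on a node:

systemctl status ntpd
ntpq -p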
One little thing I have to mention at the end: don't put too much trust in the official documentation and the troubleshooting guide; they did not help as much as I expected…