Introduction

You can run a playbook for specific host(s), a group of hosts, or “all” (all hosts of the inventory).

Ansible then runs the tasks in parallel on the specified hosts. To avoid overload, the degree of parallelism, called “forks”, is limited to 5 by default.

A task with a loop (e.g. with_items:) is executed serially by default. To run it in parallel, you can use the “async” mode.

Unfortunately, this async mode does not work when the loop includes roles or task files. In this blog post we will see a workaround to run roles in parallel (on the same host).

Parallelization over the Ansible hosts

In this example, we have 3 hosts (dbhost1, dbhost2, dbhost3) in the dbservers group (use ansible-inventory --graph to see all your groups), and we run the following sleep1.yml playbook:

- name: PLAY1
  hosts: [dbservers]
  gather_facts: no
  tasks:
    - ansible.builtin.wait_for: timeout=10

The tasks of the playbook run in parallel on all hosts of the dbservers group, but never on more hosts at the same time than the “forks” parameter allows. The parameter can be set in ansible.cfg (forks), with the environment variable ANSIBLE_FORKS, or with the command-line option --forks.
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html
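
To confirm which value is actually in effect for a run, a minimal sketch can print the ansible_forks magic variable. The play name SHOWFORKS is made up for illustration and is not part of the original playbooks.

```yaml
- name: SHOWFORKS   # hypothetical play name
  hosts: localhost
  gather_facts: no
  tasks:
    - name: print the maximum number of forks for this run
      ansible.builtin.debug:
        var: ansible_forks
```

Started with --forks 2, the task should report 2; without any setting it should report the default of 5.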

$ time ansible-playbook sleep1.yml --forks 2
...
ok: [dbhost1]  #appears after 10sec
ok: [dbhost2]  #appears after 10sec
ok: [dbhost3]  #appears after 20sec
...
real    0m22.384s

With forks=2, the results of dbhost1 and dbhost2 are both returned after 10 seconds (the two 10-second waits run in parallel). dbhost3 has to wait until one of the running tasks is completed, so the playbook completes after approx. 20 seconds. With forks=1 it takes 30 seconds, with forks=3 it takes 10 seconds (plus overhead).
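
As a rough model of these numbers (an assumption: a single task that needs the same time t on each of the N hosts, with f forks and all overhead ignored), the runtime can be estimated as:

```latex
T \approx \left\lceil \frac{N}{f} \right\rceil \cdot t
\qquad
N = 3,\; f = 2,\; t = 10\,\mathrm{s}
\;\Rightarrow\;
T \approx 2 \cdot 10\,\mathrm{s} = 20\,\mathrm{s}
```

The same estimate gives 30 seconds for f = 1 and 10 seconds for f = 3, matching the values above.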

Parallelization of loops

By default, a loop is not run in parallel:

- name: PLAY2A
  hosts: localhost
  tasks:
    - set_fact:
        sleepsec: [ 1, 2, 3, 4, 5, 6, 7 ]

    - name: nonparallel loop
      ansible.builtin.wait_for: "timeout={{item}}"
      with_items: "{{sleepsec}}"
      register: loop_result

This sequential run takes at least 28 seconds (1+2+3+4+5+6+7).

To run the same loop in parallel, use “async”:
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_async.html

- name: PLAY2B
  hosts: localhost
  gather_facts: no
  tasks:
    - name: parallel loop
      ansible.builtin.wait_for: "timeout={{item}}"
      with_items: "{{sleepsec}}"
      register: loop_result
      async: 600  # Maximum runtime in seconds. Adjust as needed.
      poll: 0     # Fire and continue (never poll here)

    # in the meantime, you can run other tasks

    - name: Wait for parallel loop to finish (poll)
      async_status:
        jid: "{{ item.ansible_job_id }}"
      register: loop_check
      until: loop_check.finished
      delay: 1      # Check every second
      retries: 600  # Retry up to 600 times.
                    # delay*retries should be at least the "async:" value above
      with_items: "{{ loop_result.results }}"

In the first task we start all sleeps in parallel. Each job may run for at most 600 seconds (async: 600), and we do not wait for the results (poll: 0). A later task polls the background jobs until all of them have finished. This execution takes only a little more than 7 seconds (the longest sleep plus some overhead). Between the loop and the poll you can add other tasks to use the waiting time for something more productive. Or, if you know that your loop takes at least 1 minute, you can add an ansible.builtin.wait_for: "timeout=60" before the polling task to reduce the polling overhead.
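
The last suggestion could look like the following sketch. It repeats PLAY2B with an extra wait between the start of the jobs and the polling task. The play name PLAY2C and the sleep values are assumptions chosen so that every job runs longer than one minute.

```yaml
- name: PLAY2C   # hypothetical play name
  hosts: localhost
  gather_facts: no
  vars:
    sleepsec: [ 61, 62, 63 ]   # assumed: every job takes more than 60 seconds
  tasks:
    - name: parallel loop
      ansible.builtin.wait_for:
        timeout: "{{ item }}"
      with_items: "{{ sleepsec }}"
      register: loop_result
      async: 600
      poll: 0

    # No job can be finished yet, so one long wait replaces 60 polling rounds
    - name: wait for the known minimum runtime
      ansible.builtin.wait_for:
        timeout: 60

    - name: Wait for parallel loop to finish (poll)
      ansible.builtin.async_status:
        jid: "{{ item.ansible_job_id }}"
      register: loop_check
      until: loop_check.finished
      delay: 1
      retries: 600
      with_items: "{{ loop_result.results }}"
```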

For example, we have an existing role that creates and configures a new user account in many, sometimes long-running steps, e.g. add to LDAP, create an NFS share, create a certificate, send a welcome mail, … Most of these tasks are not bound to a specific host and run on “localhost”, calling a REST API.

The following dummy role can be copied and pasted to see how parallel execution works.

# roles/create_user/tasks/main.yml
- debug: var=user
- ansible.builtin.wait_for: timeout=10

Now we have to create many user accounts and would like to do that in parallel. We use the code above and adapt it:

- name: PLAY3A
  hosts: localhost
  gather_facts: no
  tasks:
    - set_fact:
        users: [ 'Dave', 'Eva', 'Hans' ]

    - name: parallel user creation
      ansible.builtin.include_role: name=create_user
      with_items: "{{users}}"
      loop_control:
        loop_var: user
      register: loop_result
      async: 600
      poll: 0

But unfortunately, Ansible does not accept this for include_role:
ERROR! 'poll' is not a valid attribute for a IncludeRole

The obvious solution would be to rewrite the role so that every task runs in async mode.
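
Such a rewrite could look like the following sketch, which is not taken from a real role. It assumes that the role receives the whole users list and loops over it itself. Every long-running step then needs its own pair of "start" and "wait" tasks. The variable names step_jobs, step_check and job are made up for illustration.

```yaml
# roles/create_user/tasks/main.yml (hypothetical async variant)
- name: long-running step, started for all users at once
  ansible.builtin.wait_for:
    timeout: 10
  with_items: "{{ users }}"
  loop_control:
    loop_var: user
  register: step_jobs
  async: 600
  poll: 0

- name: wait until the step has finished for every user
  ansible.builtin.async_status:
    jid: "{{ job.ansible_job_id }}"
  with_items: "{{ step_jobs.results }}"
  loop_control:
    loop_var: job
  register: step_check
  until: step_check.finished
  delay: 1
  retries: 600
```

With many steps per user this quickly doubles the size of the role, which is why re-using the unchanged role is more attractive.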

But is there no better solution to re-use existing roles? Let’s see…

Parallel execution of roles in a loop

As we already know:

  • Ansible can run playbooks/tasks in parallel over different hosts (hosts parameter of the play).
  • Ansible can run tasks with a loop in parallel with the async option, but
  • Ansible can NOT run tasks with a loop in parallel for include_role or include_tasks

So the trick is to run the roles on “different” hosts. Localhost has a special property: besides the well-known IP 127.0.0.1, the addresses 127.0.0.2 to 127.255.255.254 also refer to localhost (on Linux, check it with ‘ping’). We will therefore run our create_user role on “different” localhosts in parallel. For that, we create a host group at runtime that consists of localhost addresses. The number of these localhost IPs is equal to the number of users to create.

users[0] is Dave. It will be created on 127.0.0.1
users[1] is Eva. It will be created on 127.0.0.2
users[2] is Hans. It will be created on 127.0.0.3
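
The mapping above can be previewed without creating anything. The following sketch only prints the address that would handle each user. The play name SHOWMAPPING is an assumption for illustration.

```yaml
- name: SHOWMAPPING   # hypothetical play name
  hosts: localhost
  gather_facts: no
  vars:
    users: [ 'Dave', 'Eva', 'Hans' ]
  tasks:
    # item.0 is the 0-based list index, item.1 is the user name.
    # The address is built from index+1 because the IPs start at 127.0.0.1.
    - name: show which localhost IP handles which user
      ansible.builtin.debug:
        msg: "{{ item.1 }} -> 127.0.{{ (item.0 + 1) // 256 }}.{{ (item.0 + 1) % 256 }}"
      with_indexed_items: "{{ users }}"
```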

- name: create dynamic localhosts group
  hosts: localhost
  gather_facts: no
  vars:
    users: [ 'Dave', 'Eva', 'Hans' ]
  tasks:
    # Create a group of localhost IPs;
    # Ansible will treat them as "different" hosts.
    # To know which localhost IP should create which user:
    # the last 2 numbers of the IP identify the element of the {{users}} list:
    # 127.0.1.12 -> (1*256 + 12)-1 = 267 -> users[267]
    # -1: the first array element is 0, but the localhost IPs start at 127.0.0.1
    - name: create parallel execution localhosts group
      add_host:
        name: "127.0.{{item|int // 256}}.{{ item|int % 256 }}"
        group: localhosts
      with_sequence: start=1 end="{{users|length}}"

- name: create useraccounts
  hosts: [localhosts]  # [ 127.0.0.1, 127.0.0.2, ... ]
  connection: local
  gather_facts: no
  vars:
    users: [ 'Dave', 'Eva', 'Hans' ]
  # this play runs in parallel over the [localhosts] 
  tasks:
    - set_fact:
        ip_nr: "{{ inventory_hostname.split('.') }}"

    - name: parallel user creation
      ansible.builtin.include_role:
        name: create_user
      vars:
        user: "{{ users[ (ip_nr[2]|int*256 + ip_nr[3]|int-1) ] }}"

In this example, it runs in 11 seconds with forks=3. With forks=1 (no parallelism) it takes 32 seconds.
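
As an alternative sketch (not the method used above), the user name can be attached to each generated host as a host variable when the host is added. The second play then needs neither the users list nor the IP arithmetic. This relies on add_host storing additional keys as variables of the new host. The index variable idx is a name chosen for illustration.

```yaml
- name: create dynamic localhosts group
  hosts: localhost
  gather_facts: no
  vars:
    users: [ 'Dave', 'Eva', 'Hans' ]
  tasks:
    - name: create one pseudo-host per user and attach the user name to it
      ansible.builtin.add_host:
        name: "127.0.{{ (idx + 1) // 256 }}.{{ (idx + 1) % 256 }}"
        groups: localhosts
        user: "{{ item }}"     # stored as a host variable of the new host
      with_items: "{{ users }}"
      loop_control:
        index_var: idx         # 0-based position of the item in the list

- name: create useraccounts
  hosts: localhosts
  connection: local
  gather_facts: no
  tasks:
    - name: parallel user creation   # "user" comes from the host variable
      ansible.builtin.include_role:
        name: create_user
```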

The degree of parallelism (forks) depends on your use case and your infrastructure. If you have to restore files, the network bandwidth, the disk I/O or the number of tape slots is probably limited. Choose a value of forks that does not overload your infrastructure.

If some tasks or the whole role have to run on a host other than localhost (e.g. to create a local user account on a server), you can use delegate_to: "{{remote_host}}".
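
A delegated task inside the role could look like the following sketch. The task itself is hypothetical, and remote_host has to be defined by the caller. Because the calling play sets connection: local, the sketch sets connection: ssh on the task so that the local connection is not inherited. Whether this is needed depends on your inventory, since an ansible_connection variable on the target host takes precedence over the keyword.

```yaml
# roles/create_user/tasks/main.yml (excerpt; hypothetical task)
- name: create a local account on a remote server
  ansible.builtin.user:
    name: "{{ user }}"
  delegate_to: "{{ remote_host }}"   # remote_host must be defined by the caller
  connection: ssh                    # do not inherit "connection: local" from the play
  become: yes                        # assumption: account creation needs root
```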

This principle is ideal for plays that are not bound to a specific host, typically tasks that run from localhost and call a REST API without logging in to a server via ssh.

Summary

Ansible is optimized to run playbooks on different hosts in parallel. The degree of parallelism can be limited by the “forks” parameter (default 5).

Ansible can run loops in parallel with the async mode. Unfortunately, that does not work if the loop includes a role or tasks.

The workaround to run roles in parallel on the same host is to assign every loop item to a different host, and then to run the role on these different hosts. For the different hosts we can use the localhost IPs between 127.0.0.1 and 127.255.255.254 to build a dynamic host group; the number of hosts corresponds to the number of loop items.