Even if Ansible is powerful and flexible, it can be considered “slow”. It will be anyway faster, and more consistent, than doing the same steps manually. Nevertheless, we will experiment to make it even faster. I found few of them on the Internet, but rarely with figures of what to expect.

In this blog post, I will cover one of them and run different scenarios. We will also dig inside some internal mechanism used by Ansible.

SSH Connections

Ansible is connection intensive as it opens, and closes, many ssh connections to the targeted hosts.

I found two possible ways to count the amount of connections from control to agents nodes:

  • Add -vvv option to the ansible-playbook command.
  • grep audit.log file:
tail -f /var/log/audit/audit | grep USER_LOGIN

First option is really too much verbose, but I used it with the first playbook below to confirm the second option give the same count.

Simple Playbook

To demonstrate that, let’s start with a very minimal playbook without fact gathering:

---
- name: Test playbook
  hosts: all
  gather_facts: false
  pre_tasks:
    - name: "ping"
      ansible.builtin.ping:
...

This playbook triggered 8 ssh connections to the target host. If I enable facts gathering, count goes to 14 connections. Again, this is quiet a lot knowing that playbook does not do much beside check target is alive.

To summarize:

gather_factsconnectionstiming (s)
false81,277
true141,991
ping playbook results

What are All These Connection For?

To determine what are these connections doing, we can analyze verbose (really verbose!!) output of Ansible playbook without fact gathering.

First Connection

First command of first connection is echo ~opc && sleep 0 which will return the home directory of ansible user.

Second

Second command is already scary:

( umask 77 && mkdir -p "` echo /home/opc/.ansible/tmp `"&& mkdir "` echo /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482 `" && echo ansible-tmp-1712570792.3350322-23674-205520350912482="` echo /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482 `" ) && sleep 0
  1. It set the umask for the commands to follow.
  2. Create tmp directory to store any python script on target
  3. In that directory, create a directory to store the script for this specific task
  4. Makes this ssh command return the temporary variable with full path to the task script directory
  5. sleep 0

This one is mainly to ensure directory structure exists on the target.

Third

I will not paste this one here as it is very long and we can easily guess what it does with log just before:

Attempting python interpreter discovery

Roughly, what it does, it tries many versions of python.

Fourth

Next, it will run a python script with discovered python version to determine Operating System type and release.

Fifth

Fifth connection is actually a sftp command to copy module content (AnsiballZ_ping.py). AnsiballZ is a framework to embed module into script itself. This allows to be run modules with a single Python copy.

Seventh

This one is simply ensuring execution permission is set on temporary directory (ie. ansible-tmp-1712570792.3350322-23674-205520350912482) as well the python script (ie. AnsiballZ_ping.py).

Eighth and Last Connection

Lastly, the execution of the ping module itself:

/usr/bin/python3.9 /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482/AnsiballZ_ping.py && sleep 0

Optimization

To reduce the amount of connection, there is one possible option: Pipelining

To enable that, I simply need to add following line in ansible.cfg:

pipelining = true

Or set ANSIBLE_PIPELINING environment variable to true.

How does it improve our playbook execution time:

gather_factsconnectionstiming (s)
false3 (-62%)0,473 (-63%)
true4 (-71%)1,275 (-36%)
ping playbook results with pipelining

As we can see there is a significant reduction on the amount of ssh connections as well as a reduction of the playbook duration.

In this configuration, only 3 connections are made:

  • python interpreter discovery (connection #3)
  • OS type discovery (connection #4)
  • python module execution (connection #8). AnsiballZ data is piped to that process.

With the pipelining option, I also noticed that the Ansible temporary directory is not created.

Of course, we can’t expect such big speed-up on a real life playbook. So, we should do it now.

Deploy WebLogic Server Playbook

Let’s use the WebLogic YaK component to deploy a single WebLogic instance. It includes dbi service best practices, latest CPU patches and SSL configuration. The “normal” run takes 13 minutes 30 seconds when the pipelined run takes 12 minutes 13 seconds. This is 10% faster.

This is nice, but not as good as previous playbook. Why is that? Because most of the time is not spent in ssh connections, but with actual work (running WebLogic installer, starting services, patching with OPatch, etc).

What Next?

With such results, you might wonder why isn’t it enabled by default? As per documentation, there is a limitation:

This can conflict with privilege escalation (become). For example, when using sudo operations you must first disable ‘requiretty’ in the sudoers file for the target hosts, which is why this feature is disabled by default.

Ansible documentation

Until now, with all tests I have made, I never encountered that limitation. Did you?