Ansible, a popular open-source automation tool, can be vital to managing complex IT environments. However, as your infrastructure grows, playbook runs can slow down. Here are some strategies and tips, drawn from expert practice and real-world scenarios, for improving your playbook’s performance.
1. Limit Resource Hogging with Forks
Ansible uses parallelism to execute tasks across different hosts, and this is controlled by the forks configuration. While increasing the number of forks allows more hosts to be managed simultaneously, it can also consume more resources. It’s essential to find a balance that fits your environment. To adjust it, modify the forks setting in your ansible.cfg:

    [defaults]
    forks = 10

A practical use case for adjusting forks is when you’re deploying applications across many small environments, such as during a microservices rollout. Increasing forks might speed up the overall deployment time.
2. Reduce Task Load with Polling Intervals
By default, Ansible waits for each task to complete on all nodes before moving on to the next one. In scenarios where tasks involve waiting (like waiting for a service to start), consider using asynchronous execution with specified polling intervals:
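
    - name: Start the long-running process
      command: /usr/bin/long_running_process --option
      async: 3600
      poll: 0
      register: long_job   # holds the job ID that async_status needs

    - name: Check back later
      async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
      delay: 60

This setup starts the task in the background (poll: 0), registers its job ID, and then checks every 60 seconds, up to 30 times, whether it has completed. Any tasks placed between the two run while the process is still working, so Ansible is free to do other things in the meantime.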
3. Optimize Fact Gathering
Fact gathering can generate a significant delay, especially when dealing with many hosts. If your plays do not require facts, you can disable the gathering using gather_facts: no, or selectively gather only necessary facts to reduce overhead:

    - hosts: all
      gather_facts: no   # skip the automatic full gather
      tasks:
        - name: Gather only specific facts
          setup:
            gather_subset:
              - network

4. Use Static Imports Instead of Dynamic Includes
Dynamic includes (include_tasks or include_role) can add overhead since they are evaluated at runtime. If your play’s structure is well-defined and doesn’t require conditional execution of tasks, switching to static imports (import_tasks, import_role) might provide a performance boost:

    - name: Use static imports
      import_tasks: setup.yml

5. Implement Caching
Ansible supports caching facts between playbook runs to avoid unnecessary gathering of data. Set up a caching method in ansible.cfg and choose from memory, JSON files, or a Redis database:

    [defaults]
    gathering = smart
    fact_caching = jsonfile
    fact_caching_connection = /tmp/ansible-cache
    fact_caching_timeout = 3600

6. Profile Your Playbooks
Identify bottlenecks by enabling callback plugins to profile tasks:
Note:
callbacks_enabled is the correct setting on current releases, including Ansible 2.18+. The old callback_whitelist was renamed to it in 2.11 and removed in 2.15.

    [defaults]
    callbacks_enabled = profile_tasks

This configuration shows the execution time for each task, helping you to pinpoint where optimizations are most needed.
7. Simplify Expressions and Loops
Complex expressions and extensive loops within tasks can degrade performance. Evaluate expressions ahead of play execution or simplify them, and break down large loops into smaller, more manageable tasks.
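As one illustration of trimming a loop, the sketch below contrasts a per-item loop with a single call that takes the whole list. It assumes dnf-based hosts; the app_packages variable and the package names are made up for the example, and the two tasks sit side by side only for comparison.

```yaml
# Sketch only: app_packages and its contents are illustrative placeholders.
- hosts: all
  become: true
  vars:
    app_packages:
      - git
      - curl
      - unzip
  tasks:
    # Slower: one module invocation per package.
    - name: Install packages one at a time
      ansible.builtin.dnf:
        name: "{{ item }}"
        state: present
      loop: "{{ app_packages }}"

    # Usually faster: the whole list goes to dnf in a single invocation.
    - name: Install packages in one call
      ansible.builtin.dnf:
        name: "{{ app_packages }}"
        state: present
```

The second form avoids one round trip per item. Not every module accepts a list, so check the module's documentation before dropping the loop.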
8. Leverage Native Modules
Whenever possible, use Ansible’s native modules rather than shell commands. Native modules are optimized in Python and run more efficiently than wrapping shell scripts or commands.
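To make the difference concrete, here is a rough sketch of the same job done both ways. The /opt/myapp path and the deploy user are hypothetical, and the two tasks are shown together only for comparison.

```yaml
# Sketch only: /opt/myapp and the deploy user/group are placeholders.
- hosts: all
  become: true
  tasks:
    # Shell version: runs every time and always reports "changed"
    # unless you add creates: or changed_when: yourself.
    - name: Create app directory (shell)
      ansible.builtin.shell: mkdir -p /opt/myapp && chown deploy:deploy /opt/myapp

    # Module version: checks the current state first and only
    # acts (and reports "changed") when something actually differs.
    - name: Create app directory (module)
      ansible.builtin.file:
        path: /opt/myapp
        state: directory
        owner: deploy
        group: deploy
        mode: "0755"
```

On repeat runs the module version has nothing to do, and its accurate changed status keeps handlers from firing unnecessarily.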
Conclusion
Optimizing Ansible playbooks is crucial as your IT infrastructure scales. By implementing these strategies — adjusting parallel execution, optimizing fact gathering, choosing static over dynamic imports, and more — you can significantly enhance playbook execution times, streamline management tasks, and maintain a more efficient automation environment.
Remember, the key is to test changes in a controlled environment before deploying them widely, ensuring that each adjustment contributes positively to the overall performance without compromising the reliability or functionality of your orchestrated workflows.
The Gotcha Nobody Warns You About: SSH Pipelining and sudo
SSH pipelining is one of the biggest single wins you can get — it cuts the number of SSH connections Ansible makes per task from three down to one. You enable it with two lines in ansible.cfg:

    [connection]
    pipelining = True

Enable it, run your playbook, and watch tasks fly. Then you hit a host that needs sudo and everything explodes with a cryptic "sudo: sorry, you must have a tty to run sudo" error. What happened?
Some RHEL and CentOS releases (older ones in particular), along with a bunch of other enterprise distros, ship with requiretty enabled in /etc/sudoers. That directive forces sudo to only run in a real terminal, which is exactly what pipelining avoids. The fix is to disable requiretty for your automation user:

    # On the target host, or push this via Ansible itself
    sudo visudo -f /etc/sudoers.d/ansible

    # Contents of /etc/sudoers.d/ansible
    ansible_user ALL=(ALL) NOPASSWD: ALL
    Defaults:ansible_user !requiretty

Replace ansible_user with whatever account Ansible connects as. The !requiretty line is the magic bit: it carves out an exception for that user without touching the global setting.
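Pushing the same sudoers drop-in with Ansible itself could look roughly like the play below. Treat it as a sketch under a few assumptions: automation_user is a made-up variable standing in for your connection account, visudo is assumed to live at /usr/sbin/visudo, and pipelining is switched off for this one play because requiretty is still in force until the file lands.

```yaml
# Sketch only: automation_user and the visudo path are assumptions.
- hosts: all
  become: true
  vars:
    ansible_pipelining: false        # requiretty still applies during this play
    automation_user: ansible_user    # placeholder for the account Ansible connects as
  tasks:
    - name: Exempt the automation user from requiretty
      ansible.builtin.copy:
        dest: /etc/sudoers.d/ansible
        owner: root
        group: root
        mode: "0440"
        content: |
          {{ automation_user }} ALL=(ALL) NOPASSWD: ALL
          Defaults:{{ automation_user }} !requiretty
        # Reject the file if it would not parse as valid sudoers
        validate: /usr/sbin/visudo -cf %s
```

The validate step matters here: a syntax error in a sudoers file can lock sudo out entirely, so the copy only goes through if visudo accepts the rendered content.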
If you can’t touch sudoers on the remote hosts (locked-down environment, someone else’s servers), you can disable pipelining selectively per play instead:

    - hosts: locked_down_servers
      vars:
        ansible_pipelining: false
      tasks:
        - name: This play runs without pipelining
          command: uptime

Not ideal, but it beats the whole playbook failing. The broader lesson: always test your ansible.cfg tuning against your actual sudo setup before rolling it out wide. Your staging environment that uses passwordless sudo will happily pass every test while prod silently disagrees.