Introduction to Ansible
In the world of Ansible, facts are the unsung heroes. These system variables, gathered during playbook execution, provide essential data used for decision-making, debugging, and generating reports. As versatile as any Ansible module, facts are integral to both ad-hoc commands and Playbooks. Let's explore how to harness their power.
Gathering Specific Facts
To fetch system facts, use the setup module:
[user@ansible ~]$ ansible localhost -m setup
The list of facts can be lengthy, but fear not! You can narrow it down with the filter parameter. Suppose you only want the IP address of a host; with a filter you can extract just that:
[user@ansible ~]$ ansible localhost -m setup -a "filter=*ipv4"
Note: the filter accepts the * (asterisk) wildcard. This command will output:
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_default_ipv4": {
            "address": "192.168.150.2",
            "alias": "wlp3s0",
            "broadcast": "192.168.150.255",
            "gateway": "192.168.150.1",
            "interface": "wlp3s0",
            "macaddress": "d0:ab:d5:58:8f:36",
            "mtu": 1500,
            "netmask": "255.255.255.0",
            "network": "192.168.150.0",
            "prefix": "24",
            "type": "ether"
        }
    },
    "changed": false
}
Another way to extract facts is gather_subset, which defaults to "all" and is mostly used in playbooks.
[user@ansible ~]$ ansible localhost -m setup -a "gather_subset=network"
Why is it mostly used in playbooks, you ask? Because a playbook usually needs only the facts that drive decisions during its execution.
What decisions? Let's use this playbook:
- name: Facts example
  hosts: localhost
  gather_facts: True
  gather_subset:
    - network
    - hardware
    - min
    - "!all"
  tasks:
    - name: Print a welcome message
      debug:
        msg: Hello {{ ansible_user_id }}. At {{ ansible_date_time.time }} of the {{ ansible_date_time.date }} you connected to this {{ ansible_system_vendor }} system which runs on {{ ansible_distribution }} {{ ansible_distribution_major_version }}. Have a good day!
This code prints:
[user@ansible ~]$ ansible-playbook facts-example.yml
PLAY [Facts example] ***********************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************
ok: [localhost]
TASK [Print a welcome message] *************************************************************************************************************
ok: [localhost] => {
"msg": "Hello juri. At 09:52:43 of the 2023-11-09 you connected to this LENOVO system which runs on Fedora 38. Have a good day!"
}
PLAY RECAP *********************************************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
With the ability to extract information about the systems, we can adapt our playbook to run specific tasks only when certain conditions are met. With the information above we can create a playbook meant for a Fedora host group (here we simply run it on localhost) that checks whether the system is at its latest OS version.
For simplicity, the latest_fedora_version variable is hardcoded. As of November 2023, the time of writing this blog post, Fedora had just released a new version, which inspired this simple playbook.
- name: Report if Fedora systems are at their latest version
  hosts: localhost
  gather_facts: True
  gather_subset:
    - network
    - "!min"
    - "!all"
  vars:
    latest_fedora_version: 39
  tasks:
    - name: Print system info
      debug:
        msg: "The system with hostname {{ ansible_hostname }} at IP {{ ansible_default_ipv4.address }} is NOT at its latest version"
      # The fact is a string, so cast it before comparing it with the integer variable
      when: ansible_distribution_major_version | int != latest_fedora_version
The output is of course:
[user@ansible ~]$ ansible-playbook fedora-version-check.yml
PLAY [Report if Fedora systems are at their latest version] ********************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************
ok: [localhost]
TASK [Print system info] *******************************************************************************************************************
ok: [localhost] => {
"msg": "The system with hostname nuclear00 at IP 192.168.150.2 is NOT at its latest version"
}
PLAY RECAP *********************************************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Well, it looks like somebody will have to update his system!
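Facts such as ansible_distribution_major_version are returned as strings, so comparing them with a number needs some care. Below is a minimal sketch of the same check using the version test, which compares version strings. The extra condition on ansible_distribution and the wording of the message are assumptions added for illustration, and the default fact gathering is used to keep the example short.

```yaml
- name: Report Fedora systems that are behind the latest release
  hosts: localhost
  gather_facts: true
  vars:
    # Assumption: hardcoded, as in the playbook above
    latest_fedora_version: 39
  tasks:
    - name: Print system info for outdated Fedora hosts
      debug:
        msg: "{{ ansible_hostname }} runs Fedora {{ ansible_distribution_major_version }}, the latest is {{ latest_fedora_version }}"
      # A list under "when" means all conditions must be true
      when:
        - ansible_distribution == "Fedora"
        - ansible_distribution_major_version is version(latest_fedora_version | string, '<')
```

With this form the task is skipped on hosts that are already at the latest version, and on hosts that do not run Fedora at all.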
Gather subset or filter?
filter can be more precise, but it tends to be slower: the default subset (all) is still gathered, and only then is the output filtered (the output here is redacted). It is mostly used in ad-hoc commands because of how little detail you extract with it.
[user@ansible ~]$ time ansible localhost -m setup -a "filter=*time*"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_date_time": {
            "date": "2023-11-09",
            ...
            "time": "09:58:47",
            ...
        }
    },
    "changed": false
}
real 0m4.769s
user 0m1.806s
sys 0m0.621s
gather_subset is fast, and several subsets can be gathered in one run, which makes gathering specific facts faster than gathering all of them. Use gather_subset when you need more than one variable from the system.
[user@ansible ~]$ time ansible localhost -m setup -a "gather_subset=network"
localhost | SUCCESS => {
    "ansible_facts": {
        ...
        "gather_subset": [
            "network"
        ],
        "module_setup": true
    },
    "changed": false
}
real 0m1.123s
user 0m1.016s
sys 0m0.155s
And what about gather_subset: all? At least in this example, the execution time is similar to filter, but far more data is extracted (again, redacted here).
[user@ansible ~]$ time ansible localhost -m setup -a "gather_subset=all"
localhost | SUCCESS => {
    "ansible_facts": {
        ...
        "gather_subset": [
            "all"
        ],
        "module_setup": true
    },
    "changed": false
}
real 0m4.698s
user 0m1.812s
sys 0m0.521s
We can see that, on this occasion, gather_subset: network takes roughly 1/4 of the time that filter needs to execute. This is a huge time saver when running a playbook that requires system facts on a pool of hundreds or thousands of nodes!
Gather_subset min or !min
With gather_subset we can extract only the facts that interest us, but did you know that writing gather_subset: network does not extract only network-related facts?
There is one special subset that is gathered by default whenever the fact gathering module runs: min.
[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=!min,network' | wc -l
430
[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=min,network' | wc -l
589
[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=network' | wc -l
588
You can see that unless you specify !min, the command includes the min facts by default.
You can check out the gather_subset documentation to learn more about subsets! Try them out in your command or playbook and fine-tune your playbooks with just the right facts, for a rocket-fast execution!
Pro Tip: Use gather_subset or filters also when dealing with older systems. By selecting only the necessary facts, you save time and resources, so your playbook runs smoothly even across a large pool of hosts.
Custom facts
Custom facts are a useful Ansible feature that lets you add facts to a system, either by writing them into a file on the system (local facts) or by creating them during playbook execution.
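Facts created during playbook execution are set with the set_fact module. Here is a minimal sketch; the fact name os_label is made up for illustration and is not something Ansible defines.

```yaml
- name: Create a fact at runtime
  hosts: localhost
  gather_facts: true
  tasks:
    - name: Derive a new fact from gathered facts
      set_fact:
        # Hypothetical fact name, built from two standard facts
        os_label: "{{ ansible_distribution }}-{{ ansible_distribution_major_version }}"

    - name: Use the new fact like any other variable
      debug:
        msg: "This host is labelled {{ os_label }}"
```

A fact set this way lives only for the current playbook run, unlike the local facts described next, which are stored on the host.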
Local facts
We have seen that facts are information about the system, extracted using the setup module. Creating local facts can be useful when you need to logically differentiate one host from another in cases where the IP address, the hostname or the OS distribution is not the most important key.
A logical division could be the host belonging to a geographical area or to a specific datacenter, although the Ansible inventory already helps us in that regard. So, what's special about local facts?
Static facts
The first method is straightforward: you create the proper file on the host, and the next time gather_facts runs you will find your custom facts in the ansible_local variable.
Create /etc/ansible/facts.d/foobar.fact (unless you edit ansible.cfg, this is the path where Ansible looks for custom facts) with this content:
[myfacts]
foo=bar
Let the fact gathering run, and your facts show up in the ansible_local variable.
[user@ansible ~]$ ansible localhost -m setup -a "filter=ansible_local"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_local": {
            "foobar": {
                "myfacts": {
                    "foo": "bar"
                }
            }
        }
    },
    "changed": false
}
Do not apply executable permission! An executable .fact file is run as a script instead of being read as static content.
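The same static fact can also be deployed and consumed from a playbook. This is a sketch under a few assumptions: the play runs against localhost, privilege escalation (become) is available to write under /etc/ansible, and the default facts path is in use.

```yaml
- name: Deploy and use a static local fact
  hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - name: Make sure the default facts directory exists
      file:
        path: /etc/ansible/facts.d
        state: directory
        mode: "0755"

    - name: Write the static fact file (readable, not executable)
      copy:
        dest: /etc/ansible/facts.d/foobar.fact
        mode: "0644"
        content: |
          [myfacts]
          foo=bar

    - name: Re-read the local facts only
      setup:
        filter: ansible_local

    - name: Act on the custom fact
      debug:
        msg: "foo is {{ ansible_local.foobar.myfacts.foo }}"
      when: ansible_local.foobar.myfacts.foo == "bar"
```

The explicit setup task is needed because the file is created during the play, after the point where facts would normally have been gathered.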
Scripted
The second method is less known, probably because it's not emphasized enough in the documentation mentioned above, although it is referenced multiple times.
Ansible lets us create a custom program that is executed when gather_facts runs. This can be Bash, Python or a compiled language, as long as its output is JSON. The example here prints INI-style output instead, which, as the result shows, was parsed as well.
Create the script /etc/ansible/facts.d/myscript.fact with this content:
#!/bin/bash
echo '[balloons]'
num=$(( 98 + 1 ))
echo "${num}:red"
And apply executable permissions: chmod +x /etc/ansible/facts.d/myscript.fact
Let the fact gathering run, and your facts show up in the ansible_local variable.
[user@ansible ~]$ ansible localhost -m setup -a "filter=ansible_local"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_local": {
            "myscript": {
                "balloons": {
                    "99": "red"
                }
            }
        }
    },
    "changed": false
}
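Since executable facts are expected to print JSON, here is a sketch of a JSON-printing variant of the script above, deployed from a playbook. It assumes /etc/ansible/facts.d already exists and that privilege escalation is available; the script body is an illustration, not the only valid form.

```yaml
- name: Deploy an executable fact that prints JSON
  hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - name: Install the fact script with executable permissions
      copy:
        dest: /etc/ansible/facts.d/myscript.fact
        mode: "0755"
        content: |
          #!/bin/bash
          # Same data as the INI-style script, emitted as a JSON object
          num=$(( 98 + 1 ))
          printf '{"balloons": {"%s": "red"}}\n' "${num}"

    - name: Re-read the local facts only
      setup:
        filter: ansible_local

    - name: Show what the script returned
      debug:
        var: ansible_local.myscript
```

The resulting structure under ansible_local.myscript should match the one shown above, with balloons as the top-level key.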
Exploring Ansible-cmdb
On the subject of fact gathering, ansible-cmdb is also worth mentioning.
It is possible to collect facts from a host and store them in a file.
Install ansible-cmdb using pip:
[user@ansible ~]$ pip install ansible-cmdb
Then, run the fact gathering module with the --tree option and the path to the output folder.
[user@ansible ~]$ ansible localhost -m setup --tree out/
Now run ansible-cmdb against this folder to create a formatted HTML page. Open the page on a system with a graphical interface and a browser, and it will pretty-print the facts just gathered.
[user@ansible ~]$ ansible-cmdb out/ > overview.html
Why disabling facts might be necessary
Fact gathering is enabled by default. To disable it, set the gather_facts parameter of the play to false, as below:
- name: Disable facts gather example
  hosts: all
  gather_facts: false
  ...
If your playbook, or role, does not require fact gathering, then disabling it is the best option for saving time.
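Disabling the automatic gathering does not prevent a play from collecting facts later, when it actually needs them. Below is a sketch that calls the setup module as a regular task, assuming only the default IPv4 address is of interest:

```yaml
- name: Gather only what is needed, on demand
  hosts: all
  gather_facts: false        # skip the implicit "Gathering Facts" task
  tasks:
    - name: Collect the network subset and keep a single fact
      setup:
        gather_subset:
          - "!all"
          - "!min"
          - network
        filter: ansible_default_ipv4

    - name: Show the default IPv4 address
      debug:
        var: ansible_default_ipv4.address
```

This combines the two techniques from earlier: gather_subset limits what is collected, and filter limits what is returned.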
Real-world use-cases
Simple use-case scenarios
A simple use case for Ansible facts in a playbook shows up when we want to install the Apache web server on a system that can be RHEL or Debian based, since the package name differs between these two Linux distribution families.
The installation itself can use Ansible's package module (instead of the yum or apt modules), but we need a way to differentiate between the two OS families.
playbook.yaml
---
- name: Starting Apache installation
  connection: local
  gather_facts: true
  hosts: localhost
  tasks:
    - name: Include OS specific variables
      include_vars: vars/{{ ansible_os_family }}.yaml
    - name: Install Apache
      package:
        name: "{{ apache_package }}"
        state: present
    - name: Start Apache and enable on boot
      service:
        name: "{{ apache_service }}"
        state: started
        enabled: true
vars/RedHat.yaml
---
apache_package: httpd
apache_service: httpd
vars/Debian.yaml
---
apache_package: apache2
apache_service: apache2
An ad-hoc command shows the value of ansible_os_family when run on a Fedora system:
[user@ansible ~]$ ansible -i inventory localhost -m setup -a "filter=ansible_os_family"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_os_family": "RedHat"
    },
    "changed": false
}
Advanced use-case examples
We can spice things up and use facts to apply dynamic changes to our systems. We can create a playbook that handles updates and reports based on the facts that Ansible can gather. The requirements are:
- The playbook does not know anything about the hosts. The inventory is as simple as the hostname only
- Facts must be able to tell whether the host is a webserver or a database from the packages it has installed:
  - If a webserver, it must have either httpd or apache2
  - If a database, it must have postgresql or mariadb
- According to the installed packages, the playbook must:
  - Update to the latest Apache version, on RHEL systems only
  - Print a list of hosts that run the MariaDB database
---
- name: Determine Server Roles
  hosts: all
  tasks:
    - name: Check installed packages and set facts
      package_facts:
        manager: auto
      register: package_info
    - name: Determine Web Servers and Databases
      set_fact:
        is_webserver: "{{ ('httpd' in package_info.ansible_facts.packages) or ('apache2' in package_info.ansible_facts.packages) }}"
        is_database: "{{ ('postgresql' in package_info.ansible_facts.packages) or ('mariadb' in package_info.ansible_facts.packages) }}"
      when: package_info.ansible_facts.packages is defined
    - name: Update Apache to the latest version on RHEL systems
      package:
        name: httpd
        state: latest
      when: is_webserver and ansible_os_family == "RedHat"
    - name: Print DB Servers with Mariadb installed
      debug:
        var: inventory_hostname
      when: is_database and ('mariadb' in package_info.ansible_facts.packages)
In this playbook:
- Server Roles Determination: The playbook starts by gathering package facts for all hosts. Based on the installed packages, it sets two facts, is_webserver and is_database, indicating whether the host is a web server or a database server, respectively.
- Updating Apache on RHEL Web Servers: For hosts identified as web servers with either httpd or apache2 installed, it updates Apache to the latest version, but only on RHEL systems (ansible_os_family == "RedHat").
- Printing Mariadb Database Servers: For hosts identified as database servers with mariadb installed, it prints the hostname. No inventory group is involved: the play targets all hosts and relies on the facts alone.
This playbook dynamically determines server roles based on installed packages and performs specific tasks accordingly.
It is also worth mentioning that although hosts: all may seem a good approach, it's generally better to let the inventory handle the hosts' logical groups.
In this case we could have used an inventory similar to this:
[webserver]
webhost1
webhost2
[database]
dbhost1
dbhost2
We could then have one playbook that runs on webserver and another for database. There are multiple ways of achieving the same result. What's important is to be as effective as possible and to avoid creating monsters! Remember that the readability of code is as important as the code itself.
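As a sketch of that inventory-driven alternative, the two jobs could look like the plays below. They are shown in one file for brevity and could just as well live in two separate playbooks. The group names come from the inventory above, and the mariadb package name is kept as in the earlier example.

```yaml
- name: Update Apache on RHEL web servers
  hosts: webserver
  tasks:
    - name: Update Apache to the latest version
      package:
        name: httpd
        state: latest
      when: ansible_os_family == "RedHat"

- name: Report database hosts running MariaDB
  hosts: database
  gather_facts: false        # only package facts are needed here
  tasks:
    - name: Collect the installed packages
      package_facts:
        manager: auto

    - name: Print hosts with MariaDB installed
      debug:
        var: inventory_hostname
      when: "'mariadb' in ansible_facts.packages"
```

Each play now states its target in the hosts line, so the role-detection facts are no longer needed and the conditions stay short.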
Conclusions
So, why do facts matter so much in Ansible? Well, these facts are like the detective clues that Ansible uses to understand the story of your servers. When Ansible knows the facts, like what kind of system it's dealing with or what apps are installed, it can tailor its actions accordingly.
Imagine you're a chef. Knowing the ingredients and the type of kitchen you're working in helps you cook up the perfect dish. Similarly, Ansible relies on facts to serve up the right commands to each server.
Facts make Ansible smart and adaptable. They turn it from a one-size-fits-all tool into a customized wizard for each server. So, pay attention to the facts, and Ansible will work its magic, making your server management a breeze!