Introduction to Ansible facts

In the world of Ansible, facts are the unsung heroes. These system variables, gathered during playbook execution, provide essential data used for decision-making, debugging, and generating reports. As versatile as any Ansible module, facts are integral to both ad-hoc commands and Playbooks. Let's explore how to harness their power.

Gathering Specific Facts

To fetch system facts you use the setup module:

[user@ansible ~]$ ansible localhost -m setup

The list of facts can be lengthy, but fear not! You can narrow it down with the filter parameter. Suppose you only want the IP address of a host; a filter extracts just that:

[user@ansible ~]$ ansible localhost -m setup -a "filter=*ipv4"

Note: The filter accepts the * (asterisk) wildcard. This command will output:

localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_default_ipv4": {
            "address": "192.168.150.2",
            "alias": "wlp3s0",
            "broadcast": "192.168.150.255",
            "gateway": "192.168.150.1",
            "interface": "wlp3s0",
            "macaddress": "d0:ab:d5:58:8f:36",
            "mtu": 1500,
            "netmask": "255.255.255.0",
            "network": "192.168.150.0",
            "prefix": "24",
            "type": "ether"
        }
    },
    "changed": false
}
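
The same filter can also be used from a playbook by calling the setup module as a task. This is a minimal sketch, assuming the host has a default IPv4 route (otherwise ansible_default_ipv4.address is undefined); the play and task names are only illustrative.

```yaml
- name: Gather only the default IPv4 facts
  hosts: localhost
  gather_facts: false        # skip the automatic gathering, we call setup ourselves
  tasks:
    - name: Run setup with a wildcard filter
      setup:
        filter: "*ipv4"

    - name: Show the address found by the filtered run
      debug:
        msg: "Default IPv4 address: {{ ansible_default_ipv4.address }}"
```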

Another way to extract facts is gather_subset, which defaults to "all" and is mostly used in playbooks.

[user@ansible ~]$ ansible localhost -m setup -a "gather_subset=network"

Why is it mostly used in Playbooks, you ask? Because when you run a Playbook you usually look for, and use, the facts that let you make decisions during the execution of the Playbook.

What decisions? Let's use this playbook:

- name: Facts example
  hosts: localhost
  gather_facts: True
  gather_subset:
  - network
  - hardware
  - min
  - "!all"
  tasks:
  - name: Print a welcome message
    debug:
      msg: Hello {{ ansible_user_id }}. At {{ ansible_date_time.time }} of the {{ ansible_date_time.date }} you connected to this {{ ansible_system_vendor }} system which runs on {{ ansible_distribution }} {{ ansible_distribution_major_version }}. Have a good day!

This code prints:

[user@ansible ~]$ ansible-playbook facts-example.yml 
PLAY [Facts example] ***********************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************
ok: [localhost]

TASK [Print a welcome message] *************************************************************************************************************
ok: [localhost] => {
    "msg": "Hello juri. At 09:52:43 of the 2023-11-09 you connected to this LENOVO system which runs on Fedora 38. Have a good day!"
}

PLAY RECAP *********************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

With the ability to extract information about the systems, we can adapt our playbook to run specific tasks only when certain conditions are met. With the information above we can create a playbook that is meant for a Fedora host group (although here we run it on localhost) and checks whether the system is at the latest OS version.

For simplicity, we hardcode the latest_fedora_version variable. At the time of writing this blog post, November 2023, Fedora had just released a new version, which gave me the inspiration to write this simple playbook.

- name: Report if Fedora systems are at their latest version
  hosts: localhost
  gather_facts: True
  gather_subset:
    - network
    - "!min"
    - "!all"
  vars:
    latest_fedora_version: 39
  tasks:
  - name: Print system info
    debug:
      msg: "The system with hostname {{ ansible_hostname }} at IP {{ ansible_default_ipv4.address }} is NOT at its latest version"
    # The fact is a string ("38"), so cast it before comparing with the integer variable
    when: ansible_distribution_major_version | int != latest_fedora_version

The output is of course:

[user@ansible ~]$ ansible-playbook fedora-version-check.yml 
PLAY [Report if Fedora systems are at their latest version] ********************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************
ok: [localhost]

TASK [Print system info] *******************************************************************************************************************
ok: [localhost] => {
    "msg": "The system with hostname nuclear00 at IP 192.168.150.2 is NOT at its latest version"
}

PLAY RECAP *********************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Well, it looks like somebody will have to update his system!
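
One detail worth keeping in mind for conditions like the one above: facts such as ansible_distribution_major_version are strings, while latest_fedora_version: 39 is parsed as an integer. The sketch below is one possible stricter variant, not the original playbook. It assumes we also want to skip hosts that are not Fedora, and it uses Ansible's version test to compare the two values as versions.

```yaml
- name: Report Fedora systems that are behind the latest release
  hosts: localhost
  gather_facts: true
  vars:
    latest_fedora_version: 39   # hardcoded, as in the example above
  tasks:
    - name: Print system info
      debug:
        msg: "{{ ansible_hostname }} runs Fedora {{ ansible_distribution_major_version }}, latest is {{ latest_fedora_version }}"
      when:
        # Both conditions must be true for the task to run
        - ansible_distribution == "Fedora"
        - ansible_distribution_major_version is version(latest_fedora_version | string, '<')
```

With a version comparison, a host that is already on the latest release is no longer reported.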

Gather subset or filter?

filter can be more precise, but it tends to be slower: the module still gathers every fact and only then filters the output (redacted here). It is used mostly in ad-hoc commands, because it extracts only a few details.

[user@ansible ~]$ time ansible localhost -m setup -a "filter=*time*"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_date_time": {
            "date": "2023-11-09",
...
            "time": "09:58:47",
...
    },
    "changed": false
}

real    0m4.769s
user    0m1.806s
sys     0m0.621s

gather_subset is fast, and multiple subsets can be gathered at the same time, which makes gathering specific facts faster than gathering all of them. Use gather_subset when you need more than one variable from the system.

[user@ansible ~]$ time ansible localhost -m setup -a "gather_subset=network"
localhost | SUCCESS => {
    "ansible_facts": {
...
        "gather_subset": [
            "network"
        ],
        "module_setup": true
    },
    "changed": false
}

real    0m1.123s
user    0m1.016s
sys     0m0.155s

And what about gather_subset: all? In this example at least, the execution time is similar to filter, but far more data is extracted (again, the output is redacted here).

[user@ansible ~]$ time ansible localhost -m setup -a "gather_subset=all"
localhost | SUCCESS => {
    "ansible_facts": {
...
        "gather_subset": [
            "all"
        ],
        "module_setup": true
    },
    "changed": false
}

real    0m4.698s
user    0m1.812s
sys     0m0.521s

We can see that, on this occasion, gather_subset: network takes roughly 1/4 of the time that filter takes. This is a huge time saver when running a playbook that requires system facts on a pool of hundreds or thousands of nodes!
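
The two options are not mutually exclusive. As a sketch, assuming only the default IPv4 details are needed, a setup task can restrict what is collected with gather_subset and then trim what is returned with filter:

```yaml
- name: Combine gather_subset and filter
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Collect only network facts and return only the default IPv4 entry
      setup:
        gather_subset:
          - "!all"
          - "!min"
          - network
        filter: ansible_default_ipv4
      register: ipv4_only     # illustrative variable name

    - name: Show what came back
      debug:
        var: ipv4_only.ansible_facts
```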

Gather_subset min or !min

With gather_subset we can extract only the facts that interest us, but did you know that writing gather_subset: network does not extract only network-related facts?
There is one special subset that is always gathered when the fact gathering module runs: min.

[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=!min,network' | wc -l
430
[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=min,network' | wc -l
589
[user@ansible ~]$ ansible localhost -m setup -a 'gather_subset=network' | wc -l
588

You can see that unless you specify !min, the command includes the min facts by default.
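
The same comparison can be sketched in a playbook by registering two setup runs and counting the top-level facts each one returns. The variable names without_min and with_min are made up for this illustration, and the numbers will differ from host to host.

```yaml
- name: Compare fact gathering with and without the min subset
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Gather network facts, excluding min
      setup:
        gather_subset:
          - "!min"
          - network
      register: without_min

    - name: Gather network facts, min included by default
      setup:
        gather_subset:
          - network
      register: with_min

    - name: Report how many top-level facts each run returned
      debug:
        msg: "without min: {{ without_min.ansible_facts | length }}, with min: {{ with_min.ansible_facts | length }}"
```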

You can check out the gather_subset documentation to learn more about subsets! Try them out in your commands or playbooks and fine-tune your playbooks with just the right facts, for rocket-fast execution!

Pro Tip: Use gather_subset or filters when dealing with older systems too. By selecting only the necessary facts, you save time and resources, so your playbook runs smoothly even on a large pool of hosts.

Custom facts

Custom facts are a useful feature of Ansible that lets you add facts to a system, either by writing them into a file on the system (local facts) or by creating them during a playbook execution.
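
Creating a fact during a playbook execution is done with the set_fact module. A minimal sketch follows; the fact name datacenter and its value are hypothetical and only serve the example.

```yaml
- name: Create a fact during the playbook run
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Set a custom fact for this host
      set_fact:
        datacenter: eu-west      # hypothetical value

    - name: Use the fact in a later task
      debug:
        msg: "This host lives in {{ datacenter }}"
```

A fact set this way lasts for the current run only, unlike the local facts described next, which are stored on the host.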

Local facts

We have seen that facts are information about the system, extracted using the setup module. Creating local facts may be useful when you need to logically differentiate one host from another and the IP address, the hostname or the OS distribution is not the most important key.

A logical division could be the geographical area a host belongs to, or a specific datacenter, although the Ansible inventory already helps us in that regard. So, what's special about local facts?

Static facts

The first method is straightforward: you create the proper file on the host, and the next time fact gathering runs you will find your custom facts in the ansible_local variable.

Create /etc/ansible/facts.d/foobar.fact with the content below. Unless you edit ansible.cfg, this is the path where Ansible looks for custom facts:

[myfacts]
foo=bar

Let fact gathering run, and your facts show up in the ansible_local variable.

[user@ansible ~]$ ansible localhost -m setup -a "filter=ansible_local"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_local": {
            "foobar": {
                "myfacts": {
                    "foo": "bar"
                }
            }
        }
    },
    "changed": false
}

Do not apply executable permission to this file! Ansible runs an executable .fact file as a script instead of reading its content.
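
The file can also be put in place by Ansible itself. The playbook below is a sketch under a few assumptions: privilege escalation (become) is available for writing under /etc, and the file modes shown are a reasonable choice rather than a requirement.

```yaml
- name: Deploy a static custom fact and read it back
  hosts: localhost
  become: true               # assumption: root is needed to write under /etc
  gather_facts: false
  tasks:
    - name: Ensure the facts directory exists
      file:
        path: /etc/ansible/facts.d
        state: directory
        mode: "0755"

    - name: Install the static fact file (not executable)
      copy:
        dest: /etc/ansible/facts.d/foobar.fact
        mode: "0644"
        content: |
          [myfacts]
          foo=bar

    - name: Re-read the local facts only
      setup:
        filter: ansible_local

    - name: Use the custom fact
      debug:
        msg: "foo is {{ ansible_local.foobar.myfacts.foo }}"
```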

Scripted

The second method is less known, probably because it is not emphasized enough in the documentation, although it is referenced multiple times.

Ansible facts allow us to create a custom program that is executed when fact gathering runs. It can be written in Bash, Python or a compiled language, as long as its output is JSON (the example below prints INI instead, which Ansible also parsed here).

Create the script /etc/ansible/facts.d/myscript.fact with this content:

#!/bin/bash
echo '[balloons]'
num=$(( 98 + 1 ))
echo "${num}:red"

And apply executable permissions: chmod +x /etc/ansible/facts.d/myscript.fact

Let fact gathering run again, and the script's output shows up in the ansible_local variable.

[user@ansible ~]$ ansible localhost -m setup -a "filter=ansible_local"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_local": {
            "myscript": {
                "balloons": {
                    "99": "red"
                }
            }
        }
    },
    "changed": false
}
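
As with the static file, the script can be deployed from a playbook. This is a sketch with the same assumptions as before (become is available, /etc/ansible/facts.d is the facts path); the only real difference is the executable mode. Note that the fact key is the string "99", so it is accessed with bracket notation.

```yaml
- name: Deploy an executable custom fact
  hosts: localhost
  become: true               # assumption: root is needed to write under /etc
  gather_facts: false
  tasks:
    - name: Ensure the facts directory exists
      file:
        path: /etc/ansible/facts.d
        state: directory
        mode: "0755"

    - name: Install the fact script with executable permissions
      copy:
        dest: /etc/ansible/facts.d/myscript.fact
        mode: "0755"
        content: |
          #!/bin/bash
          echo '[balloons]'
          num=$(( 98 + 1 ))
          echo "${num}:red"

    - name: Re-read the local facts only
      setup:
        filter: ansible_local

    - name: Use the scripted fact
      debug:
        msg: "Balloon 99 is {{ ansible_local.myscript.balloons['99'] }}"
```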

Exploring Ansible-cmdb

On the subject of fact gathering, ansible-cmdb is also worth mentioning.
It is possible to collect facts from a host, store them in a file, and let ansible-cmdb format them.

Install ansible-cmdb using pip:

[user@ansible ~]$ pip install ansible-cmdb

Then, run the fact gathering module with the --tree option and the path to the output folder.

[user@ansible ~]$ ansible localhost -m setup --tree out/

Now run ansible-cmdb against this folder to create a formatted HTML page. Open the page on a system with a graphical interface and a browser, and it will pretty-print the facts just gathered.

[user@ansible ~]$ ansible-cmdb out/ > overview.html

Why disabling facts might be necessary

Fact gathering is enabled by default. To disable it, set the gather_facts parameter in the playbook to false, as below:

- name: Disable facts gather example
  hosts: all
  gather_facts: false
...

If your playbook, or role, does not require fact gathering, then disabling it is the best option for saving time.
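
Disabling automatic gathering does not rule out using facts later in the same play. As a sketch, assuming only the distribution name is needed, the play below skips the initial gathering and calls setup with the min subset right before the task that needs it:

```yaml
- name: Gather facts only where they are needed
  hosts: all
  gather_facts: false
  tasks:
    - name: A task that needs no facts at all
      ping:

    - name: Gather just the minimal subset before it is needed
      setup:
        gather_subset:
          - "!all"
          - min

    - name: Use one of the gathered facts
      debug:
        msg: "{{ inventory_hostname }} runs {{ ansible_distribution }}"
```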

Real-world use-cases

Simple use-case scenarios

A simple use case for Ansible facts in a playbook shows up when we want to install the Apache web server on a system that can be RHEL or Debian based, since the package name differs between these two families of Linux distributions.
The installation itself can use Ansible's package module (instead of the yum or apt modules), but we need a way to differentiate between the two OS families.

playbook.yaml

---
- name: Starting Apache installation
  connection: local
  gather_facts: true
  hosts: localhost
  tasks:
  - name: Include OS specific variables
    include_vars: vars/{{ ansible_os_family }}.yaml

  - name: Install Apache
    package:
      name: "{{ apache_package }}"
      state: present

  - name: Start Apache and enable on boot
    service:
      name: "{{ apache_service }}"
      state: started
      enabled: true

vars/RedHat.yaml

---
apache_package: httpd
apache_service: httpd

vars/Debian.yaml

---
apache_package: apache2
apache_service: apache2

An ad-hoc command shows the value of ansible_os_family when run on a Fedora system:

[user@ansible ~]$ ansible -i inventory localhost -m setup -a "filter=ansible_os_family"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_os_family": "RedHat"
    },
    "changed": false
}
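
For a handful of values, the per-family variable files could be replaced by a dictionary keyed on ansible_os_family. This is only an alternative sketch; the variable name apache_package_map is made up for the example, and become: true assumes root is needed to install packages.

```yaml
- name: Install Apache using an inline package map
  hosts: localhost
  connection: local
  become: true               # assumption: root is needed to install packages
  gather_facts: true
  vars:
    apache_package_map:      # hypothetical name, keyed on ansible_os_family
      RedHat: httpd
      Debian: apache2
  tasks:
    - name: Install Apache
      package:
        name: "{{ apache_package_map[ansible_os_family] }}"
        state: present
```

Separate variable files scale better once each OS family needs more than a couple of values, which is why the playbook above uses them.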

Advanced use-case examples

We can spice things up and use facts to apply dynamic changes to our systems. We can create a playbook that handles updates and reports based on the facts that Ansible can gather.

  1. The playbook does not know anything about the hosts. The inventory contains nothing but the hostnames.
  2. Facts must tell us whether a host is a webserver or a database, based on the packages it has installed:
    1. If a webserver, it must have either httpd or apache2
    2. If a database, it must have postgresql or mariadb
  3. According to the packages installed, the playbook must:
    1. Update Apache to the latest version, on RHEL systems only
    2. Print a list of hosts that run the MariaDB database

---
- name: Determine Server Roles
  hosts: all
  tasks:
    - name: Check installed packages and set facts
      package_facts:
        manager: auto
      register: package_info

    - name: Determine Web Servers and Databases
      set_fact:
        is_webserver: "{{ ('httpd' in package_info.ansible_facts.packages) or ('apache2' in package_info.ansible_facts.packages) }}"
        is_database: "{{ ('postgresql' in package_info.ansible_facts.packages) or ('mariadb' in package_info.ansible_facts.packages) }}"
      when: package_info.ansible_facts.packages is defined

    - name: Update Apache to the latest version on RHEL systems
      package:
        name: httpd
        state: latest
      when: is_webserver and ansible_os_family == "RedHat"

    - name: Print DB Servers with Mariadb installed
      debug:
        var: inventory_hostname
      when: is_database and ('mariadb' in package_info.ansible_facts.packages)

In this playbook:

  • Server Roles Determination: The playbook starts by gathering package facts for all hosts. Based on the installed packages, it sets two facts, is_webserver and is_database, indicating whether the host is a web server or a database server, respectively.
  • Updating Apache on RHEL Web Servers: For hosts identified as web servers with either httpd or apache2 installed, it updates Apache (the httpd package) to the latest version, but only on RHEL systems (ansible_os_family == "RedHat").
  • Printing MariaDB Database Servers: For hosts identified as database servers with mariadb installed, it prints the hostname. Since the play targets hosts: all, no inventory group is involved.

This playbook dynamically determines server roles based on installed packages and performs specific tasks accordingly.
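
The package_facts module also stores its result under ansible_facts.packages, so the registered variable is optional. The following is a shorter sketch of the MariaDB report only, not a replacement for the whole playbook:

```yaml
- name: List hosts that have MariaDB installed
  hosts: all
  tasks:
    - name: Gather package facts
      package_facts:
        manager: auto

    - name: Print the hostname when the mariadb package is present
      debug:
        var: inventory_hostname
      when: "'mariadb' in ansible_facts.packages"
```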

It is also worth mentioning that although hosts: all may seem a good approach, it's generally better to let the inventory handle the logical groups of hosts.
In this case we could have used an inventory similar to this:

[webserver]
webhost1
webhost2

[database]
dbhost1
dbhost2

And have one playbook that runs on webserver and another playbook for database. There are multiple ways of achieving the same result. What's important is to be as effective as possible and to try not to create monsters! Remember that the readability of code is as important as the code itself.
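
With such an inventory, the web server part could shrink to something like the sketch below. It assumes the webserver group from the inventory above and keeps the OS family check, since the group may still mix RHEL and Debian hosts; become: true is again an assumption.

```yaml
- name: Update Apache on RHEL web servers
  hosts: webserver
  become: true               # assumption: root is needed to update packages
  tasks:
    - name: Update Apache to the latest version
      package:
        name: httpd
        state: latest
      when: ansible_os_family == "RedHat"
```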

Conclusions

So, why do facts matter so much in Ansible? Well, these facts are like the detective clues that Ansible uses to understand the story of your servers. When Ansible knows the facts, like what kind of system it's dealing with or what apps are installed, it can tailor its actions accordingly.

Imagine you're a chef. Knowing the ingredients and the type of kitchen you're working in helps you cook up the perfect dish. Similarly, Ansible relies on facts to serve up the right commands to each server.

Facts make Ansible smart and adaptable. They turn it from a one-size-fits-all tool into a customized wizard for each server. So, pay attention to the facts, and Ansible will work its magic, making your server management a breeze!