Prometheus Federation

In the realm of monitoring and observability, Prometheus has emerged as a prominent open-source solution, empowering organizations to gain valuable insights into their systems and applications.
One key feature that sets Prometheus apart is its federation capabilities, which allow users to aggregate and consolidate data from multiple Prometheus instances.

In this blog post, we will explore Prometheus federation, comparing two approaches: one with all job metrics and the other with specific metrics. By understanding the unique advantages and considerations of each approach, you'll be equipped with the knowledge to make an informed decision about which federation strategy suits your monitoring needs.

What is Prometheus Federation?

Imagine you have a large infrastructure with multiple systems and services, each generating valuable metrics that you want to monitor and analyze. Prometheus federation allows you to bring all those metrics together from different Prometheus instances into a centralized view.

Think of it like a team of detectives investigating a complex case. Each detective is responsible for gathering information about a specific aspect of the case. Similarly, each Prometheus instance focuses on monitoring specific services or clusters in your infrastructure and collects metrics related to them.

Now, to solve the case effectively, the detectives need to collaborate and share their findings with each other. They do this by periodically meeting, exchanging information, and consolidating their knowledge. In the same way, Prometheus federation enables different Prometheus instances to meet, exchange metrics data, and create a unified view of your entire infrastructure.

In summary, Prometheus federation is like a collaboration between detectives, where each detective (Prometheus instance) collects metrics about a specific area, and federation brings their findings together to create a comprehensive view of your infrastructure.

Use Case for Federation

In this example, we will explore a scenario involving one Rancher cluster and one central Prometheus instance, chosen for the sake of simplicity. It is important to note that this blog post will not dig into the intricacies of Prometheus architecture or the process of building Prometheus at scale.

To provide clarity throughout this blog post, let's establish some technical details and naming conventions:

Rancher cluster:

  • Kubernetes Cluster Name: rancher1.cloudwerkstatt.com
  • Kubernetes Flavor: Rancher
  • Number of Nodes: 3
  • CPU: 4vCPU
  • Memory: 8GB

Central Prometheus instance:

  • Instance: centralprom.cloudwerkstatt.com
  • CPU: 4vCPU
  • Memory: 4GB
  • Retention in Days: 30d
  • Retention in GB: not set for the purposes of this blog post

Additional technical details:

  • The Rancher cluster runs a Prometheus instance whose metrics are exposed through the /federate endpoint (made reachable externally via a Kubernetes Ingress object).
  • The central Prometheus instance scrapes the /federate endpoint of the Rancher cluster's Prometheus, serving as a centralized view of all infrastructure metrics.

Configuring Rancher Federation

The configuration process for Rancher federation is similar across different Kubernetes flavors, as the underlying logic remains consistent. Let's proceed with directly configuring federation for the Rancher clusters.

To make the federation endpoint reachable, we need to create an ingress that points to the Prometheus service's /federate URL, so that the central Prometheus instance can access the following URL:

  • Kubernetes Cluster Name: rancher1.cloudwerkstatt.com
  • Prometheus Federation URL: prom.rancher1.cloudwerkstatt.com/federate
  • Scrape Interval: 30s

Here's an example YAML for the ingress object that exposes Prometheus with federation:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: cluster-acme
  name: expose-prom
  namespace: cattle-monitoring-system
spec:
  rules:
  - host: prom.rancher1.cloudwerkstatt.com
    http:
      paths:
      - backend:
          service:
            name: rancher-monitoring-prometheus
            port:
              number: 9090
        path: /federate
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - prom.rancher1.cloudwerkstatt.com
    secretName: prom-secret

Feel free to adapt this example to match your specific cluster configurations and requirements. This YAML snippet demonstrates the creation of an ingress object, including the necessary annotations for certificate management. The host field specifies the URL for accessing the federation endpoint, while the backend section defines the service and port to route the traffic to. Additionally, the TLS section enables secure communication with the appropriate secret name for the SSL certificate.
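
Once the ingress is in place, it is worth verifying that the endpoint answers before pointing the central instance at it. Below is a minimal sketch in Python; it uses the example hostname from the ingress above, skips certificate verification just like the scrape configurations later in this post, and passes a match[] selector (the /federate endpoint expects at least one):

import ssl
import urllib.parse
import urllib.request

# Example federation endpoint exposed by the ingress above.
FEDERATE_URL = "https://prom.rancher1.cloudwerkstatt.com/federate"

# The /federate endpoint expects at least one match[] selector; this one
# mirrors the "collect all" selector used in the scrape config further below.
params = urllib.parse.urlencode({"match[]": '{__name__=~".+"}'})

# Skip certificate verification purely for testing, as in the example scrape configs.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with urllib.request.urlopen(f"{FEDERATE_URL}?{params}", context=ctx) as resp:
    body = resp.read().decode()

# The response is Prometheus text exposition format; print the first lines as a sanity check.
print("\n".join(body.splitlines()[:10]))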

With the federation configured, we can proceed to explore the comparison between federating all job metrics and federating specific metrics in the subsequent sections of this blog post.

Federate All Metrics

We will begin by federating all metrics to showcase the sheer amount of data that can be collected. However, it is essential to consider the purpose for which you want to collect all metrics; federating everything is typically not considered a best practice.

To federate all metrics, we need to configure the central Prometheus instance accordingly. By specifying the appropriate configuration, the central Prometheus instance will collect and aggregate metrics from all federated sources. It is important to note that federating all metrics can have implications on resource consumption, such as memory, CPU, and disk usage.

Within the prometheus.yml file of the central Prometheus configuration, you'll find the section that handles the collection of all metrics from the federated endpoint.

scrape_configs:
  - job_name: rancher-federate
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    scheme: https
    params:
      'match[]':
        - '{__name__=~".+"}' # Collect all
    tls_config:
      # For simplicity, let's skip SSL verification to focus on the main point of this blog
      insecure_skip_verify: true
    static_configs:
      - targets:
        - prom.rancher1.cloudwerkstatt.com
        labels:
          cluster_id: 'rancher1'
          env: "Dev"

Let's examine some real data. Please note that this Rancher cluster is currently empty, meaning it does not run any applications and thus contains less data than a cluster with multiple running apps (pods). The figures below illustrate the amount of data scraped in 24 hours. Now, envision having 10 or 100 clusters, each scraped in the same manner; such an approach would waste significant resources due to a suboptimal setup.

TSDB size of the central Prometheus instance (centralprom.cloudwerkstatt.com) after 24h: 630MB

Now, let's perform some calculations based on the expanded setup:

  • There are 40 clusters in total.
  • The approximate size of each federated cluster's TSDB (Time-Series Database) is 630MB.
  • The retention period for the central Prometheus would be set to 30 days.

If we intend to retain all this data for 30 days, we would require around (40 * 630 * 30) / 1024 ≈ 738GB of storage. However, this would significantly increase the memory footprint for Prometheus too. Consequently, you would need to allocate more memory and CPU resources to handle the increased workload.
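
For reference, the back-of-the-envelope storage math above can be captured in a few lines, making it easy to plug in your own cluster count, per-cluster growth, and retention (a sketch; the numbers are the example figures from this post):

# Rough storage estimate for the central Prometheus instance.
# Values below are the example figures from this post; adjust them for your setup.
clusters = 40            # number of federated clusters
tsdb_mb_per_day = 630    # observed TSDB growth per cluster per day, in MB
retention_days = 30      # retention period on the central Prometheus

total_gb = clusters * tsdb_mb_per_day * retention_days / 1024
print(f"Estimated storage: {total_gb:.1f} GB")  # ~738.3 GB for the federate-all case

Plugging in the 11MB per-cluster figure from the specific-metrics approach further below yields the roughly 12.9GB estimate used later in this post.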

In the next section, we will explore an alternative approach: federating specific metrics. This approach allows for a more targeted and efficient monitoring strategy, focusing only on the metrics that are most relevant to your specific use case and objectives.

Federate Specific Metrics

In this example, we will explore the approach of federating only a minimal set of metrics that are necessary for alerting purposes. By focusing on specific metrics, we can optimize resource usage and streamline monitoring operations.

To begin, we need to configure the central Prometheus instance to federate only the desired metrics. By specifying the appropriate configuration, we ensure that only the selected metrics are collected and aggregated. This targeted approach allows for more efficient resource utilization, including memory, CPU, and disk usage.

Please keep in mind that the prometheus.yml snippet below is just an example demonstrating federation with a specific set of metrics.

scrape_configs:
  - job_name: rancher-federate
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    scheme: https
    params:
      'match[]':
        - '{job="node-exporter"}'
        - '{job="rancher-monitoring-prometheus"}'
        - '{job="apiserver"}'
        - '{job="kube-state-metrics"}'
        - '{job="kube-scheduler"}'
        - '{job="kube-etcd"}'
        - '{job="kube-controller-manager"}'
        - '{job="kubelet"}'
        - '{job="coredns"}'
        - '{job="cert-manager"}'
    tls_config:
      insecure_skip_verify: true
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep
        regex: "(kube_pod_status_phase|\
      etcd_network_peer_round_trip_time_seconds_bucket|\
      kube_node_status_condition|\
      kube_pod_container_status_restarts_total|\
      kube_persistentvolume_status_phase|\
      node_memory_MemAvailable_bytes|\
      node_memory_MemTotal_bytes|\
      kubelet_volume_stats_available_bytes|\
      kubelet_volume_stats_capacity_bytes|\
      node_filesystem_avail_bytes|\
      node_filesystem_size_bytes|\
      node_filesystem_readonly|\
      apiserver_client_certificate_expiration_seconds_count|\
      apiserver_client_certificate_expiration_seconds_bucket|\
      aggregator_unavailable_apiservice|\
      kube_pod_status_phase|\
      kube_pod_info|\
      kube_node_status_capacity|\
      node_network_up|\
      node_nf_conntrack_entries|\
      node_nf_conntrack_entries_limit|\
      prometheus_tsdb_head_samples_appended_total|\
      coredns_panics_total|\
      coredns_dns_responses_total|\
      etcd_network_peer_round_trip_time_seconds_bucket|\
      etcd_server_has_leader|\
      etcd_network_peer_sent_failures_total|\
      etcd_mvcc_db_total_size_in_bytes|\
      etcd_server_quota_backend_bytes|\
      etcd_server_is_leader|\
      container_memory_working_set_bytes|\
      kube_pod_container_resource_limits_memory_bytes|\
      kube_node_role|\
      kube_pod_container_resource_requests|\
      kube_pod_container_resource_limits|\
      certmanager_certificate_ready_status|\
      certmanager_http_acme_client_request_count|\
      certmanager_certificate_expiration_timestamp_seconds|\
      kube_node_status_allocatable|\
      kubelet_node_name|\
      up)"
    static_configs:
      - targets:
        - prom.rancher1.cloudwerkstatt.com
        labels:
          cluster_id: 'rancher1'
          env: "Dev"

Let's take a look at some real data where only a specific set of jobs is scraped and only a specific set of metrics is stored in the central Prometheus. This ensures that the central Prometheus only stores metrics that come from the listed jobs and match the names defined in the metric_relabel_configs. The inclusion of action: keep is vital, as it ensures that any other metrics not meeting the criteria are discarded, resulting in a substantial reduction in TSDB size.

TSDB size of the central Prometheus instance (centralprom.cloudwerkstatt.com) after 24h: 11MB
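
To spot-check what actually made it into the central instance, you can count the time series carrying a given cluster_id label through the query API. Below is a small sketch; the address and port of the central Prometheus are assumptions for illustration:

import json
import urllib.parse
import urllib.request

# Hypothetical address and port of the central Prometheus instance.
CENTRAL_PROM = "http://centralprom.cloudwerkstatt.com:9090"

# Count the series that carry the cluster_id label attached in the scrape config above.
query = 'count({cluster_id="rancher1"})'
url = f"{CENTRAL_PROM}/api/v1/query?" + urllib.parse.urlencode({"query": query})

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

# The instant query result contains a single sample holding the series count.
print(json.dumps(result["data"]["result"], indent=2))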

Once again, let's perform some calculations based on the expanded setup:

  • There are 40 clusters in total.
  • The approximate size of each federated cluster's TSDB (Time-Series Database) is 11MB.
  • The retention period for the central Prometheus would be set to 30 days.

Retaining this data for 30 days in the central Prometheus would require around (40 * 11 * 30) / 1024 ≈ 12.9GB of storage.
It is now evident that the central Prometheus instance will contain far less data than in the previous case where all metrics were collected, which also decreases its memory and CPU footprint.

By federating specific metrics, we can create a monitoring setup that aligns closely with our alerting requirements. This approach enables us to focus on the metrics that are most crucial for detecting and responding to critical events. By eliminating unnecessary noise and reducing the volume of collected data, we can enhance the effectiveness and responsiveness of our alerting system.

In the next section, we will further explore the benefits and considerations of federating specific metrics, as well as provide guidance on best practices for implementing this approach effectively.

Benefits and Considerations of Federating Specific Metrics

Federating specific metrics offers several advantages, allowing for a more targeted and efficient monitoring strategy. Let's explore some of the key benefits and considerations associated with this approach:

  • Reduced Resource Consumption: Federating specific metrics reduces memory, CPU, and disk usage on the central Prometheus instance.

  • Reduced Risk of Federation Failures: Federation is not designed to pull every metric from every job, and attempting to federate a large number of metrics across all jobs can lead to issues; you may encounter errors with a "federation failed" message in the Prometheus logs.

Considerations:

  • Missing Metrics for Central Dashboarding: Federating specific metrics means that not all metrics are available in the central Prometheus instance. This can make central dashboarding more challenging, as you may not have a complete overview of all metrics in a single location. It requires careful planning and coordination to ensure that the essential metrics for centralized monitoring and reporting are federated.

  • Potential Overlooking of Critical Metrics: When selecting specific metrics for federation, there is a risk of overlooking metrics that may be crucial for identifying emerging issues or trends. Thoroughly analyze your system, involve stakeholders, and gather feedback to ensure that the chosen metrics accurately reflect the health and performance of your system.

  • Maintenance Overhead: Federating specific metrics may introduce additional maintenance overhead. As your system evolves, the set of critical metrics may change, requiring periodic review and refinement of the federated metrics. It is essential to allocate resources for monitoring configuration updates and metric validation to ensure the effectiveness of your monitoring setup.

Central Prometheus TSDB Stats Comparison

Time Series Database Stats:

All Metrics after 24H

Top 10 series count by metric names

Name Count
apiserver_request_slo_duration_seconds_bucket 11308
apiserver_request_duration_seconds_bucket 7760
etcd_request_duration_seconds_bucket 6696
apiserver_response_sizes_bucket 4264
apiserver_watch_events_sizes_bucket 2232
grpc_server_handled_total 2193
workqueue_queue_duration_seconds_bucket 1518
workqueue_work_duration_seconds_bucket 1518
container_blkio_device_usage_total 1296
apiserver_flowcontrol_priority_level_request_count_watermarks_bucket 1008

Specific Metrics after 24H

Top 10 series count by metric names

Name Count
kube_pod_status_phase 255
apiserver_client_certificate_expiration_seconds_bucket 180
container_memory_working_set_bytes 162
aggregator_unavailable_apiservice 117
etcd_network_peer_round_trip_time_seconds_bucket 102
kube_pod_container_status_restarts_total 60
kube_pod_info 51
kube_node_status_condition 45
node_network_up 42
up 30
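
Both tables above were taken from the TSDB status page of the central Prometheus web UI ("Top 10 series count by metric names"). The same data can be fetched over the HTTP API; the sketch below assumes a reasonably recent Prometheus version that exposes /api/v1/status/tsdb, and reuses the hypothetical central instance address from the earlier examples:

import json
import urllib.request

# Hypothetical address and port of the central Prometheus instance.
CENTRAL_PROM = "http://centralprom.cloudwerkstatt.com:9090"

# /api/v1/status/tsdb backs the TSDB status page in the Prometheus UI.
with urllib.request.urlopen(f"{CENTRAL_PROM}/api/v1/status/tsdb") as resp:
    stats = json.load(resp)["data"]

# Print the series counts by metric name, as shown in the tables above.
for entry in stats["seriesCountByMetricName"]:
    print(f'{entry["name"]}: {entry["value"]}')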

Making Informed Decisions for Prometheus Federation

Having covered both approaches, we need to point out that there is no one-size-fits-all solution. The choice of federation strategy depends on the specific needs of your environment and the trade-offs you are willing to make.

Furthermore, explore additional tools and technologies like Thanos that can complement Prometheus federation and address specific monitoring challenges. By continually adapting and enhancing your monitoring setup, you can stay proactive in identifying and addressing issues, ensuring the reliability and performance of your systems.

Feel free to reach out to us via the contact info on our website if you have questions or requests.