In the realm of monitoring and observability, Prometheus has emerged as a prominent open-source solution, empowering organizations to gain valuable insights into their systems and applications.
One key feature that sets Prometheus apart is its federation capabilities, which allow users to aggregate and consolidate data from multiple Prometheus instances.
In this blog post, we will explore Prometheus federation by comparing two approaches: federating all job metrics versus federating only specific metrics. By understanding the advantages and considerations of each approach, you'll be equipped to make an informed decision about which federation strategy suits your monitoring needs.
What is Prometheus Federation?
Imagine you have a large infrastructure with multiple systems and services, each generating valuable metrics that you want to monitor and analyze. Prometheus federation allows you to bring all those metrics together from different Prometheus instances into a centralized view.
Think of it like a team of detectives investigating a complex case. Each detective is responsible for gathering information about a specific aspect of the case. Similarly, each Prometheus instance focuses on monitoring specific services or clusters in your infrastructure and collects metrics related to them.
Now, to solve the case effectively, the detectives need to collaborate and share their findings with each other. They do this by periodically meeting, exchanging information, and consolidating their knowledge. In the same way, Prometheus federation enables different Prometheus instances to meet, exchange metrics data, and create a unified view of your entire infrastructure.
In summary, Prometheus federation is like a collaboration between detectives, where each detective (Prometheus instance) collects metrics about a specific area, and federation brings their findings together to create a comprehensive view of your infrastructure.
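In practical terms, federation simply means that one Prometheus server scrapes selected time series from the `/federate` endpoint of another Prometheus server. The following is a minimal sketch of that idea, with purely illustrative names; the full, working configuration for our setup follows later in this post.

```yaml
# Minimal federation sketch (illustrative names only).
scrape_configs:
  - job_name: federate-example
    honor_labels: true            # keep the labels set by the source Prometheus
    metrics_path: /federate
    params:
      'match[]':                  # only series matching these selectors are returned
        - '{job="node-exporter"}'
    static_configs:
      - targets:
          - source-prometheus.example.com   # the Prometheus instance being federated
```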
Use Case for Federation
In this example, we will explore a scenario involving one Rancher cluster and one central Prometheus instance, chosen for the sake of simplicity. It is important to note that this blog post will not delve into the intricacies of Prometheus architecture or the process of running Prometheus at scale.
To provide clarity throughout this blog post, let's establish some technical details and naming conventions:
Kubernetes Cluster Name | Kubernetes Flavor | Number of Nodes | CPU | Memory |
---|---|---|---|---|
rancher1.cloudwerkstatt.com | Rancher | 3 | 4vCPU | 8GB |
Central Prometheus Instance | CPU | Memory | Retention in Days | Retention in GB |
---|---|---|---|---|
centralprom.cloudwerkstatt.com | 4vCPU | 4GB | 30d | Not set for the purpose of this blog |
Additional technical details:
- The Rancher cluster operates a Prometheus instance that exposes its metrics through the `/federate` endpoint (implemented using a Kubernetes Ingress object).
- The central Prometheus instance consumes the `/federate` endpoint of the Rancher Prometheus instance, serving as a centralized infrastructure view for all metrics.
Configuring Rancher Federation
The configuration process for Rancher federation is similar across different Kubernetes flavors, as the underlying logic remains consistent. Let's proceed with directly configuring federation for the Rancher clusters.
To ensure seamless communication, we need to create an Ingress that points to the Prometheus service's `/federate` URL, so that the central Prometheus instance can reach the following endpoint:
Kubernetes Cluster Name | Prometheus Federation URL | Scrape Interval |
---|---|---|
rancher1.cloudwerkstatt.com | prom.rancher1.cloudwerkstatt.com/federate | 30s |
Here's an example YAML for the ingress object that exposes Prometheus with federation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: cluster-acme
  name: expose-prom
  namespace: cattle-monitoring-system
spec:
  rules:
    - host: prom.rancher1.cloudwerkstatt.com
      http:
        paths:
          - backend:
              service:
                name: rancher-monitoring-prometheus
                port:
                  number: 9090
            path: /federate
            pathType: ImplementationSpecific
  tls:
    - hosts:
        - prom.rancher1.cloudwerkstatt.com
      secretName: prom-secret
Feel free to adapt this example to match your specific cluster configurations and requirements. This YAML snippet creates an Ingress object, including the necessary annotation for certificate management. The `host` field specifies the URL for accessing the federation endpoint, while the `backend` section defines the service and port to route traffic to. Additionally, the `tls` section enables secure communication, referencing the secret that holds the SSL certificate.
With the federation configured, we can proceed to explore the comparison between federating all job metrics and federating specific metrics in the subsequent sections of this blog post.
Federate All Metrics
First, we will federate all metrics to showcase the sheer amount of data that can be collected. However, it is essential to consider why you would want to collect all metrics in the first place, as federating everything is typically not considered a best practice.
To federate all metrics, we need to configure the central Prometheus instance accordingly. By specifying the appropriate configuration, the central Prometheus instance will collect and aggregate metrics from all federated sources. It is important to note that federating all metrics can have implications on resource consumption, such as memory, CPU, and disk usage.
Within the `prometheus.yml` file of the central Prometheus instance, the following section handles the collection of all metrics from the federated endpoint.
scrape_configs:
  - job_name: rancher-federate
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    scheme: https
    params:
      'match[]':
        - '{__name__=~".+"}' # Collect all
    tls_config:
      # For simplicity, let's skip SSL verification to focus on the main point of this blog
      insecure_skip_verify: true
    static_configs:
      - targets:
          - prom.rancher1.cloudwerkstatt.com
        labels:
          cluster_id: 'rancher1'
          env: "Dev"
Let's examine some real data. Please note that this Rancher cluster is currently empty, meaning it does not run any applications and therefore contains less data than a cluster with multiple running apps (pods). The figures below show the amount of data scraped in 24 hours. Now, envision having 10 or 100 clusters, each scraped in the same manner; such an approach would waste a significant amount of resources due to the suboptimal setup.
Central Prometheus Instance | TSDB Size after 24h |
---|---|
centralprom.cloudwerkstatt.com | 630MB |
Now, let's perform some calculations based on the expanded setup:
- There are 40 clusters in total.
- The approximate size of each federated cluster's TSDB (Time-Series Database) is 630MB.
- The retention period for the central Prometheus would be set to 30 days.
If we intend to retain all this data for 30 days, we would require around (40 * 630 * 30) / 1024 ≈ 738GB of storage. On top of the disk usage, this would also significantly increase the memory footprint of Prometheus, so you would need to allocate more memory and CPU resources to handle the increased workload.
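For reference, the 30-day retention itself is controlled via Prometheus' startup flags rather than `prometheus.yml`. Below is a minimal sketch, assuming the central Prometheus runs as a plain Kubernetes Deployment; the image tag, size cap, and resource values are illustrative and would need to be adjusted to match the calculation above.

```yaml
# Illustrative excerpt of the central Prometheus container spec.
containers:
  - name: prometheus
    image: prom/prometheus:v2.45.0                 # illustrative version
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d          # keep samples for 30 days
      - --storage.tsdb.retention.size=750GB        # optional cap on on-disk TSDB size
    resources:
      requests:
        cpu: "4"
        memory: 16Gi                               # federating everything needs far more than 4GB
```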
In the next section, we will explore an alternative approach: federating specific metrics. This approach allows for a more targeted and efficient monitoring strategy, focusing only on the metrics that are most relevant to your specific use case and objectives.
Federate Specific Metrics
In this example, we will explore the approach of federating only a minimal set of metrics that are necessary for alerting purposes. By focusing on specific metrics, we can optimize resource usage and streamline monitoring operations.
To begin, we need to configure the central Prometheus instance to federate only the desired metrics. By specifying the appropriate configuration, we ensure that only the selected metrics are collected and aggregated. This targeted approach allows for more efficient resource utilization, including memory, CPU, and disk usage.
Please keep in mind that the `prometheus.yml` snippet below is just an example to demonstrate federation with a specific set of metrics.
scrape_configs:
  - job_name: rancher-federate
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    scheme: https
    params:
      'match[]':
        - '{job="node-exporter"}'
        - '{job="rancher-monitoring-prometheus"}'
        - '{job="apiserver"}'
        - '{job="kube-state-metrics"}'
        - '{job="kube-scheduler"}'
        - '{job="kube-etcd"}'
        - '{job="kube-controller-manager"}'
        - '{job="kubelet"}'
        - '{job="coredns"}'
        - '{job="cert-manager"}'
    tls_config:
      insecure_skip_verify: true
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep
        regex: "(kube_pod_status_phase|\
          etcd_network_peer_round_trip_time_seconds_bucket|\
          kube_node_status_condition|\
          kube_pod_container_status_restarts_total|\
          kube_persistentvolume_status_phase|\
          node_memory_MemAvailable_bytes|\
          node_memory_MemTotal_bytes|\
          kubelet_volume_stats_available_bytes|\
          kubelet_volume_stats_capacity_bytes|\
          node_filesystem_avail_bytes|\
          node_filesystem_size_bytes|\
          node_filesystem_readonly|\
          apiserver_client_certificate_expiration_seconds_count|\
          apiserver_client_certificate_expiration_seconds_bucket|\
          aggregator_unavailable_apiservice|\
          kube_pod_info|\
          kube_node_status_capacity|\
          node_network_up|\
          node_nf_conntrack_entries|\
          node_nf_conntrack_entries_limit|\
          prometheus_tsdb_head_samples_appended_total|\
          coredns_panics_total|\
          coredns_dns_responses_total|\
          etcd_server_has_leader|\
          etcd_network_peer_sent_failures_total|\
          etcd_mvcc_db_total_size_in_bytes|\
          etcd_server_quota_backend_bytes|\
          etcd_server_is_leader|\
          container_memory_working_set_bytes|\
          kube_pod_container_resource_limits_memory_bytes|\
          kube_node_role|\
          kube_pod_container_resource_requests|\
          kube_pod_container_resource_limits|\
          certmanager_certificate_ready_status|\
          certmanager_http_acme_client_request_count|\
          certmanager_certificate_expiration_timestamp_seconds|\
          kube_node_status_allocatable|\
          kubelet_node_name|\
          up)"
    static_configs:
      - targets:
          - prom.rancher1.cloudwerkstatt.com
        labels:
          cluster_id: 'rancher1'
          env: "Dev"
Let's take a look at some real data where only a specific set of jobs is scraped and only a specific set of metrics is stored in the central Prometheus. The configuration above ensures that the central Prometheus scrapes only the listed jobs and keeps only the metric names defined in `metric_relabel_configs`. The `action: keep` setting is vital: any metric that does not match the regex is discarded before it reaches the TSDB, resulting in a substantial reduction in TSDB size.
Central Prometheus Instance | TSDB Size after 24h |
---|---|
centralprom.cloudwerkstatt.com | 11MB |
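As a side note, the same allow-listing can alternatively be pushed into the `match[]` parameters themselves, so that the source Prometheus only returns the desired series over the network in the first place, instead of shipping whole jobs that are then filtered by `metric_relabel_configs`. A minimal sketch of that variation, using metric names from the list above:

```yaml
params:
  'match[]':
    # Ask the /federate endpoint only for the metric names we actually need.
    - '{__name__=~"kube_pod_status_phase|kube_node_status_condition|up"}'
```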
Once again, let's perform some calculations based on the expanded setup:
- There are 40 clusters in total.
- The approximate size of each federated cluster's TSDB (Time-Series Database) is 11MB.
- The retention period for the central Prometheus would be set to 30 days.
So the data would persist for 30 days in the central Prometheus, and we would require around (40 * 11 * 30) / 1024 ≈ 12.9GB of storage.
Now it is evident that the central Prometheus instance will contain much less data compared to the previous case where all metrics were collected. This also decreases the memory and CPU footprint on the central Prometheus.
By federating specific metrics, we can create a monitoring setup that aligns closely with our alerting requirements. This approach enables us to focus on the metrics that are most crucial for detecting and responding to critical events. By eliminating unnecessary noise and reducing the volume of collected data, we can enhance the effectiveness and responsiveness of our alerting system.
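To illustrate how these federated metrics feed alerting on the central Prometheus, here is a sketch of a rule that uses one of the federated metrics together with the cluster_id label attached by the federation scrape config; the alert name, threshold, and duration are made up for this example.

```yaml
groups:
  - name: federated-cluster-alerts
    rules:
      - alert: KubeNodeNotReady
        # kube_node_status_condition comes from kube-state-metrics and is part of the keep list above;
        # cluster_id is added by the central Prometheus' federation scrape config.
        expr: kube_node_status_condition{condition="Ready", status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} in cluster {{ $labels.cluster_id }} is not Ready"
```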
In the next section, we will further explore the benefits and considerations of federating specific metrics, as well as provide guidance on best practices for implementing this approach effectively.
Benefits and Considerations of Federating Specific Metrics
Federating specific metrics offers several advantages, allowing for a more targeted and efficient monitoring strategy. Let's explore some of the key benefits and considerations associated with this approach:
- Reduced Resource Consumption: Federating specific metrics optimizes memory, CPU, and disk usage on the central Prometheus instance.
- Reduced Federation Failures: Federation is not designed to ship every metric from every job, and attempting to federate a very large number of metrics can lead to issues. You may encounter errors in the Prometheus logs with a "federation failed" message.
Considerations:
- Missing Metrics for Central Dashboarding: Federating specific metrics means that not all metrics are available in the central Prometheus instance. This can make central dashboarding more challenging, as you may not have a complete overview of all metrics in a single location. It requires careful planning and coordination to ensure that the essential metrics for centralized monitoring and reporting are federated.
- Potential Overlooking of Critical Metrics: When selecting specific metrics for federation, there is a risk of overlooking metrics that are crucial for identifying emerging issues or trends. Thoroughly analyze your system, involve stakeholders, and gather feedback to ensure that the chosen metrics accurately reflect the health and performance of your system.
- Maintenance Overhead: Federating specific metrics may introduce additional maintenance overhead. As your system evolves, the set of critical metrics may change, requiring periodic review and refinement of the federated metrics. It is essential to allocate resources for monitoring configuration updates and metric validation to keep your monitoring setup effective.
Central Prometheus TSDB Stats Comparison
Time Series Database Stats:
All Metrics after 24H
Top 10 series count by metric names
Name | Count |
---|---|
apiserver_request_slo_duration_seconds_bucket | 11308 |
apiserver_request_duration_seconds_bucket | 7760 |
etcd_request_duration_seconds_bucket | 6696 |
apiserver_response_sizes_bucket | 4264 |
apiserver_watch_events_sizes_bucket | 2232 |
grpc_server_handled_total | 2193 |
workqueue_queue_duration_seconds_bucket | 1518 |
workqueue_work_duration_seconds_bucket | 1518 |
container_blkio_device_usage_total | 1296 |
apiserver_flowcontrol_priority_level_request_count_watermarks_bucket | 1008 |
Specific Metrics after 24H
Top 10 series count by metric names
Name | Count |
---|---|
kube_pod_status_phase | 255 |
apiserver_client_certificate_expiration_seconds_bucket | 180 |
container_memory_working_set_bytes | 162 |
aggregator_unavailable_apiservice | 117 |
etcd_network_peer_round_trip_time_seconds_bucket | 102 |
kube_pod_container_status_restarts_total | 60 |
kube_pod_info | 51 |
kube_node_status_condition | 45 |
node_network_up | 42 |
up | 30 |
Making Informed Decisions for Prometheus Federation
Having described all of the above, we need to point out that there is no one-size-fits-all solution. The choice of Prometheus federation approach depends on the specific needs of your environment and the trade-offs you are willing to make.
Furthermore, consider exploring additional tools and technologies like Thanos that can complement Prometheus federation and address specific monitoring challenges. By continually adapting and enhancing your monitoring setup, you can stay proactive in identifying and addressing issues, ensuring the reliability and performance of your systems.
Feel free to reach out to us via the contact info on our website if you have any questions or requests.