Kubernetes: The present and future of platform engineering
KubeCon 2024
Although I've watched quite a few talks from past KubeCons online, this was my first time attending one in person. I have to admit that the impact it had on me was much greater than what I initially expected.
But this post is not about my experience; it is about the K8s ecosystem. It intends to serve as a short summary of the awesome things going on in this industry.
KubeCon started with tens of lightning talks: short presentations of just 7-15 minutes about technologies that are part of the CNCF sandbox, incubating, and graduated projects (https://www.cncf.io/projects/).
Since Google's Borg evolved into what we know today as Kubernetes back in 2014 (https://kubernetes.io/blog/2015/04/borg-predecessor-to-kubernetes/), and after being donated to the CNCF in 2016 (https://www.cncf.io/reports/kubernetes-project-journey-report/), the project has received more than 314k commits from tens of thousands of individual contributors. It therefore comes as no surprise that this technology has created a whole spectrum of projects that gravitate around it.
The CNCF adopts and promotes the best of these projects, providing a structure to support them, incentivizing contributions, and offering a common code of conduct to ease collaboration. The CNCF encourages developers to take ownership, helping contributors feel part of a community. Most projects also have a clear contributor ladder (https://gateway-api.sigs.k8s.io/contributing/contributor-ladder/), where the path towards leadership or formal roles within the project is described. A clear contributor ladder ensures good long-term vitality for a project.
Each project is different: some opt for a lazy consensus model, where proposals are implemented if no one opposes them within a set amount of time. Others might have stricter decision mechanisms because of the implications changes could have for the businesses where they are deployed. There is no one-size-fits-all, but what is clear is that open source and community contributions are what built this platform, now a cornerstone of thousands of world-leading organizations.
If you want to get a feeling for how this is playing out, please give https://devstats.cncf.io/ a look. It's a set of dashboards (one per project) that visualize the sheer amount of community collaboration that goes into this industry.
It is not uncommon to find hundreds of events per hour in the Kubernetes GitHub repo, from industry leaders like Red Hat, Google, or Microsoft.
credit: https://devstats.cncf.io
Extensible by design
Since K8s is used in so many different scenarios, new requirements and challenges inevitably emerge around it.
This section is about exactly that: solutions to problems the community found along the way.
Kubernetes is very extensible and configurable. Frequently, these two characteristics are what allow for building solutions that integrate nicely with K8s.
Some examples follow.
Annotations
Annotations are just pieces of metadata: simple in principle but heavily leveraged by tons of projects (see the sketch after this list). Just to name a few:
- nginx ingress controller
- CNI plugins
- Service Meshes
- Monitoring components
- a very long etc.
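To make this concrete, here is a minimal sketch of an Ingress using one of the nginx ingress controller's real annotations; the resource and service names are hypothetical:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress                # hypothetical name
  annotations:
    # read by the nginx ingress controller, ignored by K8s core
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /app
            pathType: Prefix
            backend:
              service:
                name: example-service  # hypothetical backend service
                port:
                  number: 80

The annotation changes the controller's behavior without any change to the K8s API itself; that is the beauty of the mechanism.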
Custom Resource Definitions (CRDs)
Is there no vanilla K8s object (service, deployment, replicaset, pod, etc.) that fits your needs? Worry not, we got you covered; create your own. This extension of the API is also a cornerstone for all projects that require a resource that does not fit any of the default types. K8s was designed with extensibility in mind; you cannot create a one-size-fits-all container orchestration platform. In hindsight, this design decision is paying dividends.
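As a minimal sketch of what this looks like, here is a hypothetical Backup custom resource definition (the group, names, and fields are invented for illustration):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string   # cron expression
                target:
                  type: string   # where to store the backup

Once applied, kubectl get backups works just like any built-in type.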
Why am I bringing all this up?
Because it is key to understanding how most of these projects integrate with K8s. I would encourage the reader to verify the previous sentence on their own by having a quick look at the projects' documentation, just to name a few:
- cert-manager
- calico
- carvel
- konveyor
- argo
- falco
- prometheus
- operators in general
Common development patterns
Projects that wish to make their users' lives easier create operators. Operators use CRDs to automate the lifecycle of an app to an extent where the human can be considered just an "operator". It is common to find operators for some of the biggest projects; have a look at https://operatorhub.io/. Also consider following this pattern if you plan to build a project on top of K8s, your users will thank you.
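Building on the hypothetical Backup CRD sketched earlier: a user of such an operator would only declare the desired state, and the operator would reconcile everything else (creating jobs, credentials, and so on):

apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly-db-backup        # hypothetical resource
spec:
  schedule: "0 2 * * *"          # every night at 02:00
  target: s3://my-backups/db     # hypothetical destination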
Another very common pattern is the use of init containers. At KubeCon 2024, Cloudflare gave a talk about how they leverage init containers to ensure that requirements (S3 buckets in this particular case) were available, or otherwise automatically create them, saving time and preventing pods from failing when some requirement was not in place. For this, the init container was granted higher permissions than the actual pod, but since its lifetime is short and its capabilities can easily be limited (an API token that can only create S3 buckets when needed, nothing else), this is a much better approach than granting more permissions to the pod itself.
Init containers have been around for a long time, and they are a wonderful tool for such use cases. Can you think of a use case for an init container in your organization? 👀
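Here is a minimal sketch of the pattern described above; the images, bucket name, and Secret are hypothetical, and Cloudflare's actual setup surely differs:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: ensure-bucket
      image: amazon/aws-cli      # official AWS CLI image; pin a tag in practice
      command: ["sh", "-c"]
      # create the bucket only if it does not exist yet
      args:
        - aws s3api head-bucket --bucket "$BUCKET" ||
          aws s3api create-bucket --bucket "$BUCKET"
      env:
        - name: BUCKET
          value: my-app-bucket   # hypothetical bucket
      envFrom:
        - secretRef:
            name: bucket-creator-credentials  # token scoped to bucket creation only
  containers:
    - name: app
      image: my-org/my-app:1.0   # hypothetical app image; runs with fewer privileges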
Improvements in containerization
Small feature, big impact
In this section I'd like to touch on some improvements at the very base layer, containers, and also on how Red Hat is contributing to easing the pain that developers face when migrating their legacy apps to container platforms.
Container builds produce images. Depending on the size of the organization, the time it spends building and re-building images can be substantial. Consider that every new version of an app gets its own image. Any improvement in this process has a multiplicative effect, saving time.
In this context I'd like to talk about one of the features mentioned in a talk during this KubeCon. You might be familiar with git rebase, a command capable of "changing the base" of a set of commits. It would be great if we could also do that with container image layers.
Imagine the base OS layer of your image has a newly discovered vulnerability: instead of having to rebuild tens or thousands of images, you can just rebase them, which is fast and very cheap in CPU terms, since it reuses the unaffected existing layers.
credit: https://tag-env-sustainability.cncf.io/blog/
Quality of life
Migrating legacy applications designed to run on VMs into a container platform can be very challenging.
This is why Red Hat is working on a solution to ease this process. Enter the stage: Konveyor.
While still in beta (v1.0 expected in Q1 2025), it already supports several languages. Konveyor assists in getting a legacy app as close to container-native principles as possible, by translating source files into K8s resources.
One of the features that surprised me (at least) is Kai (Konveyor AI), which uses LLMs (Large Language Models, think ChatGPT) to analyze the code and suggest improvements. Amazing.
AI
Another big topic during the conference was AI. Nvidia gave several presentations about how they develop and optimize for running AI-related workloads on top of K8s. Nvidia takes advantage of its expertise in GPUs to be at the cutting edge of AI development. They presented several improvements and lessons learned while working on GPU acceleration for AI workloads, such as training LLMs.
GPU resources
By utilizing device plugins (https://github.com/NVIDIA/k8s-device-plugin), they are able to extend the scheduler's capabilities to also take GPU resources into account.
For example:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requests 1 GPU, fail to schedule otherwise
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
This is yet another example of how K8s' extensibility is key to running more and more diverse workloads on it.
Time versus space partitioning
Another concept mentioned was the optimization of GPU resources via partitioning; in essence, deciding how workloads share the hardware. Time partitioning consists of allocating resources to different tasks at different times: first task A, then task B, and so on. Space partitioning consists of allocating resources to several tasks simultaneously, but limiting each of them to some percentage of the total capacity. For instance: tasks A and B run in parallel, but task A gets 60% of compute cycles while task B gets only 40%.
The key, as Nvidia emphasized, resides in finding the right balance between the two to maximize GPU utilization and overall system efficiency. This is, as you might imagine, not an easy or straightforward process, since it depends on the nature of the workloads themselves.
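For time partitioning specifically, NVIDIA's device plugin supports a time-slicing configuration that exposes each physical GPU as several schedulable slices. A sketch (the replica count here is arbitrary; check the plugin's documentation for the full schema):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # consumed by the device plugin
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU appears as 4 nvidia.com/gpu resources

Space partitioning, in contrast, maps to technologies like NVIDIA's MIG, where a GPU is split into isolated hardware slices.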
Virtualization on top of virtualization: vGPUs
Another highlighted concept was the implementation of vGPUs. Similar to the somewhat more familiar vCPU concept, this approach adds another abstraction layer: virtualizing GPU resources so that multiple workloads can share physical GPUs efficiently. By virtualizing GPUs (again, much like vCPUs), improved resource utilization can be achieved, which is especially relevant in multi-tenant environments where workload isolation is required.
Solutions at scale
At this point, after social media, ever-growing user bases, big data, and AI, one could already have developed the intuition that the amount of data processed is ever-growing, and growing rather fast. But it was at a talk by Deutsche Bahn where a mention of a "new Moore's law" put this into figures rather than a vague idea. Deutsche Bahn claims that computing requirements multiply by 10 every 18 months, in contrast with the more widely known Moore's law, which (somewhat accurately) describes computing power as only doubling every 18 months. Following those figures, over three years requirements would grow 100x while computing power would only grow 4x, a gap that has to be closed with efficiency and scale.
While this was presented in a talk focused on efficiency and sustainability, to me it connects directly with the idea that
big corporations need solutions that are able to scale, and scale BIG.
Sustainability and Efficiency
Due to the benefits of virtualization, which allows better usage of hardware resources, K8s has a net positive environmental impact.
But this topic is way more complex than that; let's point out some ideas to illustrate it.
An important step towards sustainability is having some sort of energy monitoring. Even if it's simple and primitive, or the data is not the most accurate at first, it is still better than nothing. And once we have some metrics, we can start making informed decisions towards small incremental improvements.
For instance, downscaling test workloads when they are not required. How many engineers actually use non-production resources outside of office hours?
This doesn't mean having to scale down all test environments at 18:00 sharp; we've all had longer days for various reasons, so we can be flexible about it. It's about small incremental improvements.
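As one lightweight way to implement this, assuming a tool like kube-downscaler is deployed in the cluster, an annotation on the test workload is enough (the names here are hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-service             # hypothetical test workload
  namespace: staging
  annotations:
    # kube-downscaler scales this to zero outside the declared uptime window
    downscaler/uptime: "Mon-Fri 07:30-20:30 Europe/Berlin"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-service
  template:
    metadata:
      labels:
        app: test-service
    spec:
      containers:
        - name: app
          image: my-org/test-service:1.0   # hypothetical image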
Efficiency is also related to scheduling: we can fine-tune our scheduler to optimize for compaction or for distribution. Compaction aims to increase the utilization of the nodes, saving costs. Distribution, on the other hand, tries to spread the workload as much as possible, at the expense of having less utilized nodes/CPUs/GPUs. This usually yields better availability but incurs more cost.
This is a multi-variable optimization problem; there is no one-size-fits-all solution. Each organization will give a different weight to each of the variables. Some workloads might better tolerate the potential downtimes caused by lack of distribution (compaction), while others have to optimize for availability because they are business critical.
Whatever your needs are, because K8s is highly configurable and extensible, you have the tools to reach your goal, as the sketch below illustrates.
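For example, if compaction is what you are after, the default scheduler's scoring can be tuned to favor bin packing. A sketch of a KubeSchedulerConfiguration (passed to kube-scheduler via its --config flag):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated    # prefer packing pods onto fewer, fuller nodes
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

For distribution, the same knob can be flipped to LeastAllocated (the default), or topology spread constraints can be used on the workloads themselves.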
Much more
There are many more interesting topics that were discussed at KubeCon that we're not touching on in this post:
- Disaster Recovery strategies: simple backups vs. geographically replicated services.
- Custom load balancing algorithms to reduce inter availability zones traffic, and thus cost.
- Observability improvements.
- Security threat detection.
- etc.
The sessions were recorded 👀
Stay tuned for more K8s content 🚀