Flux2 migration: how we dropped our CPU usage by nearly 40x
Kubernetes is the go-to system for tasks like container orchestration and deployment. Still, it has its drawbacks. Teams running their applications on Kubernetes end up with many files of Kubernetes manifests.
As release cycles become more frequent, getting these manifests into a Kubernetes cluster becomes a tiresome, resource-intensive job that has to be repeated over and over. Enter Flux, a tool that automates this process.
At TrueLayer, we’ve already seen promising results from Flux. With the release of Flux2, we wanted to share how we migrated from Flux1 to the new version and what we gained from the process.
What is Flux?
Flux is a tool that continuously reconciles Kubernetes manifests from a source into a Kubernetes cluster. If the manifests are committed to a source such as a GitHub repository, Flux continuously synchronises them and applies them to the cluster.
At TrueLayer, we’ve been using Flux to bring Kubernetes manifests into clusters for a long while. With Flux1’s lifetime coming to a close, we started exploring the new possibilities of Flux2.
Flux2 has many new features, but the following were of particular interest to us:
Server-side validation
Use of CustomResourceDefinition objects to configure Flux
Architected to support multitenancy
We worked out the key benefits of migrating to Flux2 and how we could make it work for us.
Our GitOps setup
At TrueLayer, we have more than 40 engineering teams. Each team at TrueLayer owns many microservices/applications. Each application has its own Kubernetes manifest files.
As part of our Continuous Integration (CI) pipeline, we render the application manifests (using either Helm or Kustomize) and automatically create a GitHub pull request with the rendered Kubernetes manifest files against the corresponding GitOps repository.
The structure of our GitOps repositories centres around teams. Within each team repo, we follow the same pattern: a folder for each environment, containing folders for the deployed applications with their Kubernetes YAML manifests.
While the GitOps setup is fairly simple, our Flux1 setup was rather complex. We had one Flux GitOps agent for each GitOps repo per cluster. Each of these agents ran on one CPU core, so 40 CPU cores were permanently allocated.
We managed each Flux GitOps agent as a Helm release, using the Flux Helm chart, from a Terraform script.
Flux2 setup and the challenges
The conventional Flux2 setup would have been similar to Flux1: one Flux GitOps agent for each GitOps repository. Instead, we designed a new setup that takes advantage of Flux2's more configurable interface.
Flux2 introduced many CRDs, among which we wanted to use GitRepository and Kustomization. The GitRepository resource configures how a GitOps repository (e.g. from GitHub) is pulled into the cluster. The Kustomization resource then reads that source and applies the Kubernetes manifests it contains to the cluster itself. If we could have one Flux GitOps agent to sync all the Custom Resources, we could drop the CPU cores from 40 to just one. We were excited to make this happen.
Unlike the Flux1 setup, where we had one Flux GitOps agent per repository, the Flux2 setup has one agent per cluster. We configure each repository with a GitRepository Custom Resource, and every application in that repository with a Kustomization Custom Resource.
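To make the one-GitRepository-per-repo, one-Kustomization-per-application layout concrete, here is a minimal Python sketch that renders such manifests as JSON (which Kubernetes accepts alongside YAML). It is an illustration, not our actual tooling: the team name, URL, intervals, naming scheme and the flux-system namespace are all assumptions, and the API versions shown are those of recent Flux2 releases (older releases used beta versions).

```python
import json

# Assumed API versions; check which versions your Flux2 release serves.
SOURCE_API = "source.toolkit.fluxcd.io/v1"
KUSTOMIZE_API = "kustomize.toolkit.fluxcd.io/v1"
NAMESPACE = "flux-system"  # assumption: Flux's default namespace


def git_repository(team, url, branch="main"):
    """Build one GitRepository manifest for a team's GitOps repo."""
    return {
        "apiVersion": SOURCE_API,
        "kind": "GitRepository",
        "metadata": {"name": team, "namespace": NAMESPACE},
        "spec": {"interval": "1m", "url": url, "ref": {"branch": branch}},
    }


def kustomization(team, env, app):
    """Build one Kustomization manifest for a single application folder."""
    return {
        "apiVersion": KUSTOMIZE_API,
        "kind": "Kustomization",
        "metadata": {"name": f"{team}-{app}", "namespace": NAMESPACE},
        "spec": {
            "interval": "5m",
            # Points back at the GitRepository created for the same team.
            "sourceRef": {"kind": "GitRepository", "name": team},
            # Matches the <environment>/<application> folder layout.
            "path": f"./{env}/{app}",
            "prune": True,
        },
    }


def render(team, url, env, apps):
    """Return the GitRepository followed by one Kustomization per app."""
    manifests = [git_repository(team, url)]
    manifests += [kustomization(team, env, app) for app in sorted(apps)]
    return manifests


if __name__ == "__main__":
    # Hypothetical team, URL and application names.
    for manifest in render(
        "team-a",
        "https://example.com/team-a-gitops.git",
        "production",
        ["payments-api", "ledger"],
    ):
        print(json.dumps(manifest, indent=2))
```

The point of the sketch is the relationship between the two resources: every Kustomization refers to its team's GitRepository through sourceRef and differs only in the path it applies.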
We created a centralised repository to store the manifests of GitRepository and Kustomization Custom Resources for all applications. This repository is fondly named flux-capacitor, after the crucial component that makes time travel possible in the 1985 movie Back to the Future.
This flux-capacitor repository is synchronised and applied to the cluster by its own Flux GitOps agent. The agent is managed as a Helm release, using the Flux2 Helm chart, from a Terraform script.
The Flux GitOps agent creates the GitRepository and Kustomization Custom Resources in the cluster. These resources then synchronise the corresponding GitOps repository and apply the Kubernetes manifests in it. Since the GitOps repositories contain the Kubernetes manifests of the applications, the applications are deployed to the cluster.
We automated updates to the flux-capacitor repository: a script in the CI pipeline of each GitOps repository updates it whenever an application is added or deleted. This means the flux-capacitor repo is automatically kept up to date with the Flux configurations of the applications that are deployed to the cluster. The flux-capacitor repo also serves as the single source of truth for making sweeping changes to the Flux configurations if needed.
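The core of such a CI script can be sketched in a few lines of Python. This is not our actual script. It assumes the folder layout described earlier (an environment folder containing one folder per application), and the function names, environment name and application names are hypothetical. It only works out which Flux configurations to add or remove; committing the result to flux-capacitor is left out.

```python
import tempfile
from pathlib import Path


def discover_apps(repo_root, env):
    """Return the application folder names found under <repo_root>/<env>/."""
    env_dir = Path(repo_root) / env
    if not env_dir.is_dir():
        return set()
    return {entry.name for entry in env_dir.iterdir() if entry.is_dir()}


def plan_changes(desired, existing):
    """Compare apps in the GitOps repo with apps already configured.

    Returns (to_add, to_remove) as sorted lists of application names.
    """
    to_add = sorted(set(desired) - set(existing))
    to_remove = sorted(set(existing) - set(desired))
    return to_add, to_remove


if __name__ == "__main__":
    # Build a throwaway GitOps repo layout so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for app in ("payments-api", "ledger"):
            (root / "production" / app).mkdir(parents=True)

        desired = discover_apps(root, "production")
        # Hypothetical set of apps that already have Flux configuration.
        to_add, to_remove = plan_changes(desired, {"ledger", "old-service"})
        print("add:", to_add)        # add: ['payments-api']
        print("remove:", to_remove)  # remove: ['old-service']
```

Keeping the comparison separate from the file-system scan makes the add/remove logic easy to test without a real repository.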
With our original Flux setup, we were running one pod per GitOps repository, and with 40 teams, that required a lot of cash and CPU. With the new setup, we run just one Flux GitOps agent for an entire cluster. In total, one Flux GitOps agent manages over 40 GitRepository Custom Resources and 240 Kustomization Custom Resources.
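The arithmetic behind the saving is simple enough to write down. The sketch below uses the figures quoted in this post (40 agents at one core each under Flux1) and assumes the single Flux2 agent is also allocated one core. It gives the upper bound on the reduction, not a measured value.

```python
# Figures quoted in the post; one core for the Flux2 agent is an assumption.
FLUX1_AGENTS = 40
FLUX2_AGENTS = 1
CORES_PER_AGENT = 1


def allocated_cores(agents, cores_per_agent=CORES_PER_AGENT):
    """CPU cores permanently allocated to Flux agents."""
    return agents * cores_per_agent


def reduction_factor(before, after):
    """How many times smaller the allocation became."""
    return before / after


if __name__ == "__main__":
    before = allocated_cores(FLUX1_AGENTS)
    after = allocated_cores(FLUX2_AGENTS)
    print(before, after, reduction_factor(before, after))  # 40 1 40.0
```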
Our migration to Flux2 has paved the way for a config-managed setup. Not only did this drastically reduce costs, but it also made Flux reconciliations faster and reduced CPU usage by almost 40x.
As the sun sets on Flux1, migrating to Flux2 may sound like a daunting task. But with the right migration plan, engineering teams can reap the benefits.