gRPC load balancing in Rust
In the first post in our open source series, we share our solution for gRPC client-side load balancing, using ginepro.
We want to give back to the wider developer community. Each post in our open source series walks you through a challenge faced by TrueLayer's engineering teams – the code we wrote to solve the issue is released under an OSS license.
In this post, we'll explain how we tackled the challenge of gRPC load balancing in Rust.
Our solution was to release ginepro, a new gRPC channel implementation for tonic. ginepro (GitHub) provides client-side gRPC load balancing by enriching tonic's channel with periodic service discovery.
The background
TrueLayer has recently started adopting Rust as a backend language. As we deploy more and more services to production we have to constantly improve our applications so that they can handle more load and achieve the required reliability SLAs.
Load balancing gRPC requests has been a challenge: we do not use a service mesh and there was no gRPC client in the Rust ecosystem that satisfied all our requirements. To bridge the gap we built ginepro – an add-on to tonic's Channel which provides service discovery to perform client-side look-aside gRPC load balancing.
LoadBalancedChannel is a drop-in replacement for tonic's Channel.
The problem
gRPC uses the HTTP/2 protocol to multiplex requests and responses over a single TCP connection. This allows gRPC to be more efficient: you only pay the cost of establishing a connection once and better utilise the capacity of the underlying transport.
Multiplexing, though, has a few implications when it comes to load balancing.
HTTP/2 load balancing
HTTP/2 connections are persistent: a direct connection between a client (or a load-balancer) and a specific server should remain open as long as possible.
We do not open a new connection to a server every time we want to make a request.
When the client connects to the servers directly, load balancing is done on a per-request basis: for every request the client will choose a server and issue that request through an existing connection.
But what happens if load balancing is moved out of the client?
Clients will maintain a connection to a load balancer and all requests go through that single connection. Traditional network load balancers, however, are unable to tell application requests apart.
Since network load balancers function at the fourth layer of the OSI stack (known as the transport layer), they can only reason about TCP and UDP connections. Therefore they will only be able to forward the traffic from one client to one fixed server (remember that connections are persistent!).
To solve this problem HTTP/2 load balancers must be able to inspect the application traffic. This is what application load balancers are for: they can tell requests apart and choose a new server for every incoming one.
Skewed load
Let’s look at a scenario where failing to perform HTTP/2-aware load-balancing can cause servers to be utilised unevenly.
Let’s consider a toy system: one client and two servers, talking gRPC over HTTP/2.
When the client starts up, it performs service discovery (e.g. a DNS query): what IPs can I send my requests to?
The answer is a set with two elements: server A and server B. An HTTP/2 connection is established with each of them.
Every time the client needs to perform a request it now picks one of the two connections in turn (round-robin).
What happens if server A crashes?
In most gRPC implementations, the client will not perform service discovery again.
The client will start routing all its requests to server B, even if server C is spawned to replace server A!
Server C is completely unused, wasting resources and worsening the performance profile of the system (server B is overloaded!).
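The scenario above can be reproduced with a toy model. This is a minimal sketch, not gRPC code: `Client`, `refresh`, `send` and `run` are hypothetical names, servers are plain strings, and "service discovery" is simply a list handed in by the caller. It contrasts a round-robin client that never re-runs discovery with one that does.

```rust
use std::collections::HashMap;

/// Toy client that round-robins over the servers it knows about.
struct Client {
    known: Vec<String>,
    next: usize,
}

impl Client {
    /// "Service discovery" at start-up: remember whatever was returned.
    fn new(discovered: &[&str]) -> Self {
        Client {
            known: discovered.iter().map(|s| s.to_string()).collect(),
            next: 0,
        }
    }

    /// Re-run discovery and replace the known set (the step most clients skip).
    fn refresh(&mut self, discovered: &[&str]) {
        self.known = discovered.iter().map(|s| s.to_string()).collect();
        self.next = 0;
    }

    /// Pick the next known server that is alive, in round-robin order.
    /// Returns None when none of the known servers is alive.
    fn send(&mut self, alive: &[&str]) -> Option<String> {
        for _ in 0..self.known.len() {
            let candidate = self.known[self.next % self.known.len()].clone();
            self.next += 1;
            if alive.contains(&candidate.as_str()) {
                return Some(candidate);
            }
        }
        None
    }
}

/// Send `n` requests and count how many each server received.
fn run(client: &mut Client, alive: &[&str], n: usize) -> HashMap<String, usize> {
    let mut hits = HashMap::new();
    for _ in 0..n {
        if let Some(server) = client.send(alive) {
            *hits.entry(server).or_insert(0) += 1;
        }
    }
    hits
}
```

With a client created from `["A", "B"]` while only B and C are alive, ten calls to `send` all land on B and C receives nothing. After `refresh(&["B", "C"])` the same ten calls split five and five.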
An overview of gRPC load-balancing approaches
There are a few approaches we could take to avoid the scenario we just described:
- The servers periodically force the client to reconnect
- The client periodically performs service discovery
- We introduce an application load balancer
Out of process
In this category the complexity of load balancing is (entirely or partially) moved away from the client into one or more dedicated processes.
These processes can either sit alongside the application (e.g. a sidecar container) or run as standalone services.
Generally speaking, out-of-process gRPC load balancers come in two forms:
- Look-aside services that tell the client which server to call
- A completely separate service where load balancing, health checks, load reporting and service discovery are completely transparent to the application (e.g. Envoy).
With a look-aside service, the client still has to:
- maintain a connection to the look-aside process (what server should I call?)
- establish and maintain open connections to all healthy server backends.
Proxy
With a proxy all the logic is moved away from the client.
This leaves the client very simple as it only needs to maintain a single connection to the proxy. There are several ways to incorporate a proxy into your infrastructure:
- Service mesh: a dedicated infrastructure layer that controls service-to-service communication (e.g. Istio and Linkerd), deployed as a sidecar.
- Service proxy: a single standalone service that all clients connect to, configured for each gRPC service.
- Container sidecar proxy: a sidecar proxy deployed alongside every client, each configured to proxy to the same gRPC service.
However, nothing comes for free – there is always a trade-off:
- More moving parts in the hot path, impacting the latency of your system
- Both a service mesh and standalone proxies add a lot of complexity to your setup, with novel failure modes. They need to be set up, monitored and maintained.
TrueLayer’s approach
TrueLayer leverages gRPC to have strongly-typed contracts between applications written in various programming languages (C#, Rust, TypeScript, Python).
We currently do not run a service mesh in our Kubernetes clusters, therefore we do not get gRPC load balancing out of the box.
Historical precedent, C#: use an Envoy sidecar
Most of our early gRPC servers and clients were written in C#.
There we used the sidecar approach – a manually-configured Envoy proxy. With an Envoy sidecar you get a production-hardened solution with a considerable community around it. It was the fastest way to get gRPC load-balancing working at that point in time.
Standalone sidecar proxies, as we discussed, increase the overall complexity of the system: each one is another component to configure, operate and understand. In particular, configuration management scales poorly as the number of services increases, while testing and reproducing failure modes locally or on CI is fairly hard.
Rust opportunities
What about Rust? What does the gRPC landscape look like? Is client-side load-balancing viable?
Let’s look at Rust’s most popular gRPC crates:
- grpc-rs by TiKV – implements load balancing but no way of updating service IPs;
- grpc-rust – does not implement load balancing;
- tonic – implements load balancing and has building blocks for updating endpoints dynamically.
tonic has another upside: it builds on top of tower, a flexible and widely-used middleware framework for networking applications. In particular, it relies on tower::balance for the load balancing logic. The very same component is used by Linkerd’s data plane proxy: we are building on top of a stack of production-hardened components.
Building a tonic-based solution
We decided to give tonic a shot and invest some time to explore what a feature-complete client-side load balancing channel would look like. The result is ginepro, the crate we open sourced.
This section delves a bit deeper into the implementation details.
The gap we need to fill in tonic is service discovery: we want to periodically refresh the list of healthy server backends – the list of IPs we can send requests to. We assume that each service name maps to one or more IP addresses (i.e. replicas of the server). The logic to resolve the service name is decoupled from the channel implementation via a trait, LookupService.
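To make the idea of a lookup trait concrete, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not ginepro's definition: the method name `resolve`, its synchronous signature and the `DnsLookup` and `MockLookup` types are hypothetical, and the trait in the crate may differ (for instance by being asynchronous).

```rust
use std::collections::HashSet;
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Simplified, synchronous stand-in for a lookup trait (hypothetical signature).
trait LookupService {
    /// Resolve a service name and port into the set of addresses backing it.
    fn resolve(&self, hostname: &str, port: u16) -> io::Result<HashSet<SocketAddr>>;
}

/// Resolves through the operating system's resolver (DNS, hosts file, ...).
struct DnsLookup;

impl LookupService for DnsLookup {
    fn resolve(&self, hostname: &str, port: u16) -> io::Result<HashSet<SocketAddr>> {
        // Collecting into a set removes duplicate addresses.
        Ok((hostname, port).to_socket_addrs()?.collect())
    }
}

/// Test double: returns a fixed answer, or an error to simulate a failing lookup.
struct MockLookup {
    answer: Option<HashSet<SocketAddr>>,
}

impl LookupService for MockLookup {
    fn resolve(&self, _hostname: &str, _port: u16) -> io::Result<HashSet<SocketAddr>> {
        self.answer
            .clone()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "lookup failed"))
    }
}
```

Code that depends only on the trait can be exercised with `MockLookup { answer: None }` to simulate a failed lookup, while production code plugs in a real resolver.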
This contract gives us three properties: we can easily inject failures via a mock implementation to test unhappy scenarios, we do not constrain how IPs are resolved, and we can hide protocol-specific nuances like DNS record timeouts.
At TrueLayer, for example, we have two choices when it comes to resolving a service name to a list of IPs: we can either query Kubernetes’ DNS or query Kubernetes’ API directly (Endpoint resource).
How does LookupService fit into the client logic?
There is an event loop running in the background, managed by our LoadBalancedChannel.
LoadBalancedChannel keeps a set of known endpoints and, on a schedule, triggers service discovery.
We then remove the endpoints that have disappeared and add the newly discovered ones.
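The reconciliation step of that loop boils down to two set differences. The sketch below shows one way to express it, assuming endpoints are plain socket addresses. `Changeset`, `diff` and `reconcile` are hypothetical names, and the sketch stops short of the tonic wiring: it only updates a local view of the endpoints.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// What has to change to bring the known endpoints in line with a discovery result.
#[derive(Debug, Default, PartialEq)]
struct Changeset {
    added: HashSet<SocketAddr>,
    removed: HashSet<SocketAddr>,
}

/// Compare the endpoints we already know with the freshly discovered ones.
fn diff(known: &HashSet<SocketAddr>, discovered: &HashSet<SocketAddr>) -> Changeset {
    Changeset {
        // Discovered but not yet known: new backends to connect to.
        added: discovered.difference(known).cloned().collect(),
        // Known but no longer discovered: backends to drop.
        removed: known.difference(discovered).cloned().collect(),
    }
}

/// One iteration of the background loop: compute the changes, then adopt the new set.
fn reconcile(known: &mut HashSet<SocketAddr>, discovered: HashSet<SocketAddr>) -> Changeset {
    let changes = diff(known, &discovered);
    // A real channel would also forward each change to its balancer here.
    *known = discovered;
    changes
}
```

Returning the changes instead of applying them in place keeps the comparison easy to test: a discovery result identical to the known set yields an empty `Changeset`.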
It's worth pointing out that the gRPC client does not remove an endpoint if it starts failing: it will keep trying to connect until we explicitly tell the client to remove it (e.g. the server fails health check probes and gets removed by Kubernetes).