How to keep track of what your customers really care about

Marco Ieni, Senior Rust Backend Engineer
9 Feb 2022

After the launch of PayDirect, we learned a lot about how customers see and use our instant payment product. By listening to feedback, we found that we weren't tracking something they really cared about: how “instant” our instant payments actually are.

This article walks you through how we introduced a new service level indicator (SLI), in this case for payment settlement time. It also gives an overview of our observability stack and philosophy.

Quantifying what good looks like

Operating a service starts with a few basic questions:

  • Is the service up?

  • Are we responding to HTTP requests in a reasonable amount of time?

We want an answer to these questions in real time, and if the answer is alarming (for example, the service is down), we want to send an alert to the relevant engineering team. 

Service level indicators (SLIs)

We start by defining and measuring a set of service level indicators (SLIs) – a quantifiable representation of the product properties that users care about. 

Usually, we begin with a few standard metrics, such as availability, error rate, throughput and latency percentiles.
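To make this concrete, here is a minimal Rust sketch of how counter-based SLIs such as throughput, error rate and availability can be derived from two samples of monotonically increasing counters. The counter names (`requests_total`, `errors_total`) and the `CounterSample` type are hypothetical, and defining availability as one minus the error rate is an assumption made for this example. It is an illustration, not our production code.

```rust
/// A snapshot of two monotonically increasing counters, as a scraper might
/// record them. The type and field names are hypothetical.
#[derive(Clone, Copy, Debug)]
struct CounterSample {
    at_secs: f64,        // scrape timestamp, in seconds
    requests_total: u64, // all requests handled so far
    errors_total: u64,   // requests that failed so far
}

/// Throughput in requests per second between two samples.
/// Returns None if time did not advance or a counter went backwards
/// (e.g. after a restart); Prometheus itself compensates for counter resets.
fn throughput(prev: CounterSample, curr: CounterSample) -> Option<f64> {
    let elapsed = curr.at_secs - prev.at_secs;
    if elapsed <= 0.0 {
        return None;
    }
    let requests = curr.requests_total.checked_sub(prev.requests_total)?;
    Some(requests as f64 / elapsed)
}

/// Error rate: the fraction of requests in the interval that failed.
fn error_rate(prev: CounterSample, curr: CounterSample) -> Option<f64> {
    let requests = curr.requests_total.checked_sub(prev.requests_total)?;
    let errors = curr.errors_total.checked_sub(prev.errors_total)?;
    if requests == 0 {
        return None;
    }
    Some(errors as f64 / requests as f64)
}

/// Availability over the interval, defined here (by assumption) as 1 - error rate.
fn availability(prev: CounterSample, curr: CounterSample) -> Option<f64> {
    error_rate(prev, curr).map(|rate| 1.0 - rate)
}
```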

Service level objectives (SLOs)

Measuring is not enough. Is it good enough if your availability is 90.5%? What about 99.9%?

For each SLI, you want to define a threshold and a time period, for example: 99.9% of incoming requests in the last seven days were handled successfully. That is a service level objective (SLO).
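An objective like this can be checked with simple arithmetic. The sketch below is illustrative rather than our actual tooling: it evaluates a hypothetical 99.9% availability SLO over a window and computes the remaining error budget, i.e. how many more failed requests the window can absorb before the objective is breached. The `Slo` type and the parts-per-million representation are assumptions chosen to keep the arithmetic exact.

```rust
/// A hypothetical availability SLO such as "99.9% of requests in the last
/// seven days were handled successfully". The target is stored in parts per
/// million so the arithmetic stays exact (999_000 ppm = 99.9%).
struct Slo {
    target_ppm: u64, // must be <= 1_000_000
}

impl Slo {
    /// True if the observed success ratio meets or exceeds the target.
    fn is_met(&self, successful: u64, total: u64) -> bool {
        if total == 0 {
            return true; // no traffic in the window: nothing was violated
        }
        // successful / total >= target_ppm / 1_000_000, cross-multiplied in u128
        (successful as u128) * 1_000_000 >= (self.target_ppm as u128) * (total as u128)
    }

    /// Failures still allowed in the window before the SLO is breached
    /// (negative once the error budget is exhausted).
    fn remaining_error_budget(&self, successful: u64, total: u64) -> i64 {
        let allowed = (total as u128) * (1_000_000 - self.target_ppm as u128) / 1_000_000;
        let failures = total.saturating_sub(successful);
        allowed as i64 - failures as i64
    }
}
```

For example, with one million requests in the window, a 99.9% target allows 1,000 failures, so 500 failures leave a budget of 500.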

Defining your SLOs (and being accountable for them!) might be scary, but baking them into the design of your services can also be the most enjoyable part of the job! Plus, operating a service that stays available under heavy load is extremely satisfying.

Service level agreements (SLAs)

Saying that your service is “always” up and running might be too much, but it should at least satisfy the service level agreements (SLAs), which are basically SLOs written into the contract you signed with your customers. Failing to meet your SLAs can lead to unpleasant financial consequences.

Let’s get back to our problem. We started to receive several questions from our clients, all variations of the following:

  • Can you check why the transaction with id 904c09e2-5a55-4ed6-a169-f8dc886c8d67 is not settled yet?

  • Why are those transactions taking so long?

It’s a bad sign when clients spot a problem before you do. That’s when we realised that we were missing a key SLI: transaction settlement time.

The rise of a new SLI

  • Why are payments not instant?

Moving money is a complicated business. In fact, each payment goes through a multi-status lifecycle: 

  • Booked: TrueLayer checks that there are enough funds in the account

  • Submitted: TrueLayer instructs the payment scheme to process the payment

  • Settled: the payment scheme notifies us that money has landed in the beneficiary account
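The lifecycle above maps naturally onto a Rust enum. The sketch below is a hypothetical model, not the ledger's real types: it covers only the three statuses listed and assumes payments move strictly forward, one step at a time.

```rust
/// The three statuses described above. Real payments can also fail or be
/// rejected; those states are left out of this illustrative sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PaymentStatus {
    Booked,    // funds checked
    Submitted, // instruction sent to the payment scheme
    Settled,   // scheme confirmed the money landed
}

impl PaymentStatus {
    /// Next status on the happy path, or None once the payment is settled.
    fn next(self) -> Option<PaymentStatus> {
        match self {
            PaymentStatus::Booked => Some(PaymentStatus::Submitted),
            PaymentStatus::Submitted => Some(PaymentStatus::Settled),
            PaymentStatus::Settled => None,
        }
    }

    /// Only forward, one-step transitions are valid in this model.
    fn can_transition_to(self, to: PaymentStatus) -> bool {
        self.next() == Some(to)
    }
}
```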


We were already monitoring the success of each state change in isolation, i.e. the percentage of messages handled successfully and how long it took to process each of them. But we weren’t monitoring the payment lifecycle as a whole: how long it took to move from booked (the payment is authorised) to settled (the money has landed in the beneficiary’s bank account).

It was time to fix it.
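Conceptually, the missing SLI is the gap between two timestamps on the same payment. In this hypothetical sketch (the `PaymentTimeline` type and its fields are illustrative assumptions), `settlement_time` measures booked to settled end to end. That span also includes time spent waiting on the payment scheme, which per-message processing times do not capture.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-payment record holding the time of each status change.
struct PaymentTimeline {
    booked_at: SystemTime,
    submitted_at: Option<SystemTime>,
    settled_at: Option<SystemTime>,
}

impl PaymentTimeline {
    /// End-to-end settlement time: from booked to settled.
    /// None while the payment is still in flight.
    fn settlement_time(&self) -> Option<Duration> {
        self.settled_at?.duration_since(self.booked_at).ok()
    }

    /// Time spent waiting for the payment scheme after submission.
    fn time_at_scheme(&self) -> Option<Duration> {
        self.settled_at?.duration_since(self.submitted_at?).ok()
    }
}
```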

Tracking the new SLI

We use Prometheus to scrape and query real-time metrics exposed by our services.

In a nutshell, Prometheus periodically sends a GET request to the /metrics HTTP endpoint that each of our microservices exposes, and stores the results so that we can query them. We use these metrics both to notify teams when an alert is triggered and to power our dashboards in Grafana.
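For context, the body of a /metrics response is plain text in the Prometheus exposition format. The std-only Rust sketch below renders a histogram in that format to show what a scrape returns. The metric name `transaction_settlement_time_seconds` and the bucket bounds are hypothetical, and a real service would normally rely on a client library, such as the `prometheus` crate, instead of hand-rolling this.

```rust
use std::fmt::Write;

/// Minimal histogram, rendered in the Prometheus text exposition format.
struct Histogram {
    name: &'static str,
    help: &'static str,
    bounds: Vec<f64>, // ascending upper bounds ("le") of the finite buckets
    counts: Vec<u64>, // per-bucket counts; the extra last slot is the +Inf bucket
    sum: f64,
}

impl Histogram {
    fn new(name: &'static str, help: &'static str, bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len() + 1];
        Histogram { name, help, bounds, counts, sum: 0.0 }
    }

    /// Record one observation in the first bucket whose bound is >= value.
    fn observe(&mut self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|&bound| value <= bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += value;
    }

    /// Prometheus buckets are cumulative: each line counts every observation <= le.
    fn render(&self) -> String {
        let mut out = String::new();
        writeln!(out, "# HELP {} {}", self.name, self.help).unwrap();
        writeln!(out, "# TYPE {} histogram", self.name).unwrap();
        let mut cumulative: u64 = 0;
        for (i, count) in self.counts.iter().enumerate() {
            cumulative += *count;
            let le = match self.bounds.get(i) {
                Some(bound) => bound.to_string(),
                None => "+Inf".to_string(),
            };
            writeln!(out, "{}_bucket{{le=\"{}\"}} {}", self.name, le, cumulative).unwrap();
        }
        writeln!(out, "{}_sum {}", self.name, self.sum).unwrap();
        writeln!(out, "{}_count {}", self.name, cumulative).unwrap();
        out
    }
}
```

Because buckets are cumulative, the `+Inf` bucket always equals `_count`; this is what lets PromQL functions such as `histogram_quantile` estimate percentiles from the scraped buckets.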

All the data required to compute the transaction settlement time is stored in our payment ledger.

The simplest way to start tracking the new SLI would have been to add a new metric to the payment ledger’s /metrics endpoint. However, that isn’t the approach we chose.

The payment ledger is one of the most critical components in our entire stack. It's a fairly complex project on its own and, to keep it maintainable, we try to limit its scope as much as possible.

There was also a performance concern. We didn’t want to increase the load on the database that is used as the source of truth for payment statuses. Computing the transaction settlement time every 15 seconds would have reduced our maximum throughput (payments per second), which is one of the key SLIs for the payment ledger.

There is a way forward, though. Our payment ledger emits an event every time a payment is created or its status changes. We decided to use those events as the integration point for tracking the new SLI.

The rise of a new microservice

We created a new microservice called transaction-settlement-tracker. It consumes the payment ledger events and maintains a view over payment state transitions in a separate Postgres database.

The view does not store all information about a payment. It just keeps track of statuses and how long each transition took. This helps to keep the microservice focused and easy to maintain.
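The core of such a tracker can be sketched as an event handler that updates a per-payment row. In this illustrative version, an in-memory `HashMap` stands in for the Postgres view, and the event and row shapes (`LedgerEvent`, `PaymentView`) are hypothetical. It also assumes events arrive in order and exactly once, which a production consumer cannot take for granted.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Booked,
    Submitted,
    Settled,
}

/// Hypothetical shape of an event emitted by the payment ledger.
struct LedgerEvent {
    payment_id: String,
    status: Status,
    occurred_at: SystemTime,
}

/// One row of the view: statuses and transition timings only, no payment details.
struct PaymentView {
    booked_at: SystemTime,
    status: Status,
    last_change_at: SystemTime,
    transitions: Vec<(Status, Status, Duration)>, // (from, to, how long it took)
}

/// In-memory stand-in for the tracker's Postgres-backed view.
#[derive(Default)]
struct SettlementView {
    payments: HashMap<String, PaymentView>,
}

impl SettlementView {
    /// Apply one ledger event. Assumes events arrive in order, exactly once,
    /// and that the first event for a payment is `Booked`.
    fn apply(&mut self, event: LedgerEvent) {
        match self.payments.entry(event.payment_id) {
            Entry::Vacant(slot) => {
                slot.insert(PaymentView {
                    booked_at: event.occurred_at,
                    status: event.status,
                    last_change_at: event.occurred_at,
                    transitions: Vec::new(),
                });
            }
            Entry::Occupied(mut slot) => {
                let view = slot.get_mut();
                let took = event
                    .occurred_at
                    .duration_since(view.last_change_at)
                    .unwrap_or_default();
                view.transitions.push((view.status, event.status, took));
                view.status = event.status;
                view.last_change_at = event.occurred_at;
            }
        }
    }

    /// Booked-to-settled time for a settled payment; None otherwise.
    fn settlement_time(&self, payment_id: &str) -> Option<Duration> {
        let view = self.payments.get(payment_id)?;
        if view.status != Status::Settled {
            return None;
        }
        view.last_change_at.duration_since(view.booked_at).ok()
    }
}
```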


Looking at our view, we later realised that we could expose another key metric we were missing: the total number of in-flight transactions (transactions that have not settled yet).

The number of in-flight transactions has proven extremely helpful during our periodic load tests, showing us whether the payment ledger was struggling and how fast it recovered after a massive request spike.
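Counting in-flight transactions only requires remembering which payments have been booked but not yet settled. The sketch below is a hypothetical illustration: it keeps the unsettled payment ids in a set and renders the count as a Prometheus gauge under an assumed metric name, `payments_in_flight`.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Booked,
    Submitted,
    Settled,
}

/// Tracks payments that have been booked but not yet settled.
/// Assumes in-order delivery. A payment reported as Booked and then
/// Submitted is counted once, because set insertion is idempotent.
#[derive(Default)]
struct InFlightTracker {
    unsettled: HashSet<String>,
}

impl InFlightTracker {
    fn on_event(&mut self, payment_id: &str, status: Status) {
        match status {
            Status::Booked | Status::Submitted => {
                self.unsettled.insert(payment_id.to_string());
            }
            Status::Settled => {
                self.unsettled.remove(payment_id);
            }
        }
    }

    fn in_flight(&self) -> usize {
        self.unsettled.len()
    }

    /// Gauge in the Prometheus text format (metric name is hypothetical).
    fn render(&self) -> String {
        format!(
            "# TYPE payments_in_flight gauge\npayments_in_flight {}\n",
            self.in_flight()
        )
    }
}
```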


Conclusion

We’ve now been monitoring transaction settlement time for over two months, and we have a precise understanding of what normal and abnormal look like.

In the next few weeks, we will roll out alerts to page our on-call team when the metric fails to meet the threshold we have identified.
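As a rough sketch of what such an alert condition might look like, the code below pages when a chosen percentile of recent settlement times exceeds a threshold. The percentile, the threshold and the nearest-rank method are all assumptions for illustration, since the actual values are not stated here. In a Prometheus setup, the same idea would more likely be written as an alerting rule over a histogram using `histogram_quantile`.

```rust
use std::time::Duration;

/// Nearest-rank percentile of the samples, with `p` in (0, 100].
fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank of the smallest sample that covers p% of the data
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Hypothetical paging rule: page when the p-th percentile of recent
/// settlement times exceeds `threshold`.
fn should_page(samples: &[Duration], p: f64, threshold: Duration) -> bool {
    matches!(percentile(samples, p), Some(value) if value > threshold)
}
```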

In 2022, we’ll be able to proactively notify our customers of potential issues around settlement delays before they reach out to us! 
