Improving the classification of your transaction data with machine learning
Authors:
Alex Spanos: Lead Data Scientist, Data Platform
Daniele Paliotta: Machine Learning Engineer Intern, Data Platform
Transaction classification at TrueLayer
TrueLayer’s Data API is a uniform, reliable and secure conduit through which applications can retrieve banking data of their end-users, including their financial transaction history.
On top of the underlying connectivity to raw transactions, TrueLayer provides additional information and insights into the transactions themselves. In this context, our classification service enriches financial transactions with information about their purpose and the relevant counterparty.
The output of our classification service is embedded in the response of our Data API. For each purchase-related transaction (i.e. Credit/Debit Card payments and Direct Debits), the service tries to assign:
a category and sub-category based on our taxonomy; and
the merchant name (see the illustrative sketch below).
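For example, an enriched transaction could look something like the following. Note that the field names and taxonomy values here are illustrative, not the exact Data API response schema:

```python
# Hypothetical shape of an enriched transaction. Field names, category
# values and merchant resolution shown here are illustrative only.
enriched_transaction = {
    "description": "CARD PAYMENT TO COSTA COFFEE",
    "amount": -2.75,
    "currency": "GBP",
    "classification": {
        "category": "Eating Out",        # top-level category from the taxonomy
        "sub_category": "Coffee Shops",  # finer-grained sub-category
        "merchant_name": "Costa Coffee", # the resolved counterparty
    },
}
```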
The service is currently in beta, and in the Platform team we are actively working on enabling classification for other transaction types, such as Transfers and Standing Orders.
In September 2019, we shipped a new version of our classification service. This release represents a major milestone for TrueLayer: our first foray into the world of Machine Learning-enabled data products — the first of many to come!
Nomenclature note: this process is frequently referred to as “categorisation” in our broader ecosystem; however, we prefer “classification”.
The joys of rules-based systems
The original service for classifying purchase-related transactions was rules-based: an expert system.
It relied on classification rules built from the transaction description through an offline human annotation process. But the descriptions returned by providers can vary quite a bit with the provider itself and with payment-related conditions. So, to make annotation more efficient, we grouped similar transactions together by creating a complex set of parsing rules that standardised the form of the transaction description across providers and payment conditions.
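To give a flavour of what such standardisation involves, here is a minimal sketch. The noise patterns below are invented for illustration; the real rule set was far larger and provider-specific:

```python
import re

# Illustrative noise patterns: scheme prefixes, embedded dates and
# payment references that vary across providers and payment conditions.
NOISE_PATTERNS = [
    re.compile(r"\bCARD PAYMENT TO\b"),
    re.compile(r"\bON \d{2}[-/]\d{2}[-/]\d{4}\b"),
    re.compile(r"\bREF\b\W*\w+"),
]

def standardise(description: str) -> str:
    """Reduce a raw transaction description to a standardised form."""
    text = description.upper()
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

# Both of these reduce to "COSTA COFFEE", so they can be annotated
# (and later classified) as one group:
print(standardise("Card payment to Costa Coffee on 12/03/2019 ref: 1234"))
print(standardise("COSTA COFFEE REF 98765"))
```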
Although quite basic, the service succeeded in classifying around 75% of the purchase-related transactions flowing daily through our Data API (its coverage) — not too bad 😏.
However, as surely as day follows night, we ran into many of the problems rules-based systems suffer from.
Diminishing returns: ↘️
Human annotation stopped being a practical way to improve classification coverage, as the distribution of transaction frequencies by merchant is long-tailed: beyond the most popular merchants, each new rule covers only a handful of transactions. Coverage plateaued at around 75%.
Maintenance: ⚙️
When providers change their description formats, the complex parsing rules no longer work — and it becomes increasingly difficult to identify which specific rule is at fault. Creating ever more rules only adds to the maintenance overhead.
Generalisation: 📖
The parsing and classification rules were UK-specific. Very little of that knowledge transfers to other market geographies; rules would have to be built from scratch for each new market.
It became very clear that, to make our purchase classification service more valuable to our clients, we had to overcome the fundamental coverage and maintenance limitations imposed by the deterministic rules-based system and its non-scalable human annotation requirements. If only there were a way to infer the rules more easily…
Infusion of machine learning
This textbook problem provided the basis for developing our first Machine Learning-based service.
At a high level, we used supervised learning to infer models for transaction classification that map information relating to the transaction to a category/sub-category combination.
Compared to the (already complex) rules-based system, which relied only on the description, with supervised learning we could easily take advantage of additional transaction properties, such as amounts and timestamps.
By “feeding” these transaction properties (features) and the categories/sub-categories assigned through the human annotation process (labels) to the modelling algorithms, the algorithms learned patterns mapping the former to the latter — eventually enabling the models to generalise; that is, to accurately predict categories for previously unseen transactions. We will cover our Machine Learning model development workflow in more depth in a future blog post. Promise 🤞!
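In the meantime, here is a minimal sketch of the supervised setup using scikit-learn. The toy data, feature choices and model here are assumptions made purely for illustration, not our production pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy labelled data: features (description, amount) plus human-annotated labels.
transactions = pd.DataFrame({
    "description": ["COSTA COFFEE", "TESCO STORES", "SHELL PETROL", "STARBUCKS"],
    "amount": [2.75, 34.20, 51.00, 3.40],
    "category": ["Eating Out", "Groceries", "Transport", "Eating Out"],
})

features = ColumnTransformer([
    # Character n-grams cope well with messy, abbreviated descriptions.
    ("description", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), "description"),
    # In practice numeric features would be scaled; passthrough keeps the sketch short.
    ("amount", "passthrough", ["amount"]),
])

model = Pipeline([
    ("features", features),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(transactions[["description", "amount"]], transactions["category"])
```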
After some exploratory work, we also determined an optimal, model-dependent “minimum confidence” threshold that a prediction must meet before a classification is assigned — model predictions are not always trustworthy!
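Continuing the sketch above, the assignment logic could look roughly like this, with a purely illustrative threshold value:

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.6  # illustrative; the real value is model-dependent

def ml_classify(model, transaction_features):
    """Return a category only if the model is confident enough, else None."""
    probabilities = model.predict_proba(transaction_features)[0]
    best = int(np.argmax(probabilities))
    if probabilities[best] < CONFIDENCE_THRESHOLD:
        return None  # not trustworthy enough: leave the transaction unclassified
    return model.classes_[best]
```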
Eventually, we rolled out the new Machine Learning service as a “fallback”, invoked only when the rules-based classifier fails to assign a classification.
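The orchestration itself is a simple rules-first flow. In the sketch below, rules_classify is a stand-in for the expert system and ml_classify is the thresholded prediction from the previous sketch:

```python
def rules_classify(description):
    # Stand-in for the rules-based expert system: a lookup on the
    # standardised description. This one-rule table is illustrative.
    rules = {"COSTA COFFEE": ("Eating Out", "Coffee Shops")}
    return rules.get(description)

def classify(description, transaction_features):
    # The deterministic rules remain the primary classifier...
    result = rules_classify(description)
    if result is not None:
        return result
    # ...and the ML model is invoked only when no rule matches.
    return ml_classify(model, transaction_features)
```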
As a result, we measured a significant uplift in classification coverage for almost all providers (banks), around 10% globally.
Additionally, we measured the accuracy of our classification service at ~90% at the category level and ~75% at the sub-category level.
Wait: was it that easy?
Well, no. Introducing Machine Learning capabilities to an organisation from scratch is not a straightforward task!
Our goal in this project, in addition to creating a new service, was to build a sustainable, state-of-the-art machine learning pipeline, which could make the development of additional functionality fast and seamless.
On the one hand, model development is an iterative process that, as most Data Scientists can attest, consists of time-consuming and error-prone data understanding, cleaning and preprocessing steps. Achieving high predictive accuracy whilst maintaining model interpretability was another challenge we were conscious of during this stage. Thorough experiment management, tracking and collaboration were vital for us throughout this phase.
But this was only one part of the story. Engineering complexities arise when integrating such services into an enterprise-level infrastructure; in our case, Kubernetes microservices.
As an API provider, we impose strong latency and reliability requirements on our services and have invested heavily in monitoring, tracking and alerting.
But Machine Learning-powered services are a different beast! Apart from “functional” monitoring, we also need to closely monitor the quality of the predictions. How is it actually performing in the wild? When do we choose to replace the model? And when we do, what are the consequences?
We also need to be able to reproduce any previous model prediction at any given time, for any model version, for debugging and, potentially, compliance purposes. That is tricky! It requires full provenance through version control of the training dataset and the exact model pipeline (including parameters).
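To give a flavour of what that implies, each prediction would need to carry (or be joinable to) a provenance record along these lines; the field names are illustrative, not our actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionRecord:
    model_version: str       # exact versioned model artifact used
    training_data_hash: str  # fingerprint of the training dataset snapshot
    pipeline_params: dict    # full pipeline configuration, including hyperparameters
    features: dict           # raw model inputs at prediction time
    prediction: str          # what the service returned

record = PredictionRecord(
    model_version="purchase-classifier-1.0.0",
    training_data_hash="sha256:9f2c…",
    pipeline_params={"ngram_range": (2, 4), "confidence_threshold": 0.6},
    features={"description": "COSTA COFFEE", "amount": 2.75},
    prediction="Eating Out",
)
```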
Machine Learning has been described as the “High-Interest Credit Card of Technical Debt” for good reason. With great power comes great responsibility!
How we are addressing these issues at TrueLayer will be the topic of (yet) another blog post — stay tuned!
Why is this important for end-users?
With improved transaction classification data, personal finance management apps could provide end-users with a categorisation of their income and expenditure. This makes it easier for them to keep track of their spending, and creates opportunities for more personalisation and an improved user experience.
What’s next?
In the Data Platform team, we are working on making our classification service feature-complete.
The next Data Science challenge we are excited about is the classification of “transfer”-related transactions: Transfers, Standing Orders, etc.
This will give our clients the ability to identify income and rent-related transactions of their users, which in turn will unlock a host of new services we can offer.
If you have feedback or ideas to throw on our table, we are eager to hear from you! Get in touch by opening a ticket in our Help Centre.
Cheers to better data! 🥂 📊