Violette Cubier, an InFiNe.lu grantee, is currently attending the Boulder Digital Transformation of MFIs programme. This year the programme is offered online: once a week, Violette attends a module on a specific topic and summarises it for you:
This week’s module was about “Data and data analytics”. It was presented by Dean Caire, a consultant with extensive experience in credit scoring.
The amount of data captured by MFIs has grown substantially since 2010. It now includes client data captured by the MFIs themselves, data from third parties (credit bureaus, for instance), market research, call centres and all transactions with clients. We are also seeing increased use of alternative data, such as call detail records (from Mobile Network Operators), agents’ transactional data, geospatial data, social media data and psychometric surveys.
Some MFIs use this data to build predictive models that anticipate product purchases and client exits and, most often, predict loan repayment (credit scoring).
However, few MFIs have reached that stage yet. Most are still only collecting data, with little structure. In this module, Dean Caire therefore offered some advice for MFIs and organisations just starting their data analytics journey.
First piece of advice: before engaging in a data project, ask yourself what you want to achieve and what the ultimate business goal is. This will be the starting point for determining data needs.
As the IFC puts it in its handbook “Data Analytics and Digital Financial Services”¹, dashboards, for example, “should reflect demand from the business units and help them make more informed decisions”. The most useful data “are those that can be turned into the information needed to make decisions”.
As explained by Dean Caire, “the challenge in the first instance is not in amassing any and all types of customer data and information that can be captured in digital format. Instead, the challenge is in digitally capturing and organising the most relevant data captured in existing business processes”.
Collecting as much data as possible may seem useful, but it can also cause frustration. MFIs often have access to huge data repositories yet find no useful purpose for them. This is what Dean Caire calls a “data swamp”. He further explains that “simply having a lot of data does not solve problems. Data that has not been collected for the purpose of a specific set of analyses may never yield any answer, regardless of how much data you have. Adding more data unconnected to your questions will make your task harder, not easier”.
His advice is therefore to start with the data at hand, or data that can be acquired at a reasonable cost. Behavioural data (about what clients did in the past) is often a good start. Here, it helps to remember the 80/20 Pareto rule, which applies in two ways. First, a data scientist typically spends 80% of their time gathering, cleansing and storing data, and only 20% analysing it. Second, in data analytics projects, 20% of the available data set usually provides 80% (or more) of the answers (or explanatory power) for the analytical question.
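To make the second reading of the 80/20 rule more concrete, here is a minimal Python sketch. It assumes you already have an importance score for each candidate variable (how those scores are obtained, for example from a fitted model, is left open) and finds the smallest set of variables that accounts for a target share of total importance, 80% by default. The function name, variable names and scores are hypothetical, made up and chosen so that two out of ten variables reach the threshold.

```python
from typing import Dict, List


def pareto_subset(importance: Dict[str, float], target_share: float = 0.8) -> List[str]:
    """Return the smallest list of variables whose combined importance
    reaches target_share of the total, taking the most important first."""
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive number")

    selected: List[str] = []
    cumulative = 0.0
    # Walk through the variables from most to least important
    for name, score in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        cumulative += score
        if cumulative / total >= target_share:
            break
    return selected


# Hypothetical importance scores for ten candidate variables (illustration only)
scores = {
    "max_days_late_previous_loans": 50.0,
    "number_of_previous_loans": 30.0,
    "business_type": 6.0,
    "household_size": 4.0,
    "region": 3.0,
    "age": 2.0,
    "loan_purpose": 2.0,
    "phone_model": 1.0,
    "social_media_activity": 1.0,
    "education_level": 1.0,
}

print(pareto_subset(scores))
# → ['max_days_late_previous_loans', 'number_of_previous_loans']
```

With real data, the share of variables needed to reach 80% will of course vary; the point is simply to rank candidate data by how much it contributes and to start with the top of the list.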
This leads to a clear conclusion: focus on the most relevant data first. As Maria Fernandez Vidal, Dean Caire and Fernando Barbon explain in a CGAP technical guide on data-driven segmentation², one should “focus first on what is known or thought to be relevant. Data mining or looking at large data sets without a hypothesis on expected relationships may lead to spurious or chance results and models that may fit too closely to the particular data set and not work well with new data”. Data that has never been used, or has never been considered relevant in the past, is unlikely to revolutionise the analysis.
While data-driven segmentation (using machine learning to identify clusters) can yield new insights, it requires specialised skills (data scientists) and can be more costly and time-consuming. Rules-based segmentation (usually along only one or two dimensions of the data) can sometimes be enough to generate first insights.
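As an illustration of how simple rules-based segmentation can be, the Python sketch below assigns clients to segments using just two data dimensions: their loan cycle and the worst arrears observed on their past loans. The `Client` fields, the segment labels and the 30-day threshold are all hypothetical choices for illustration, not rules taken from the module.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Client:
    client_id: str
    loan_cycle: int      # number of loans taken so far, including the current one
    max_days_late: int   # worst arrears (in days) on past loans; 0 for new clients


def segment(client: Client, max_days_late_threshold: int = 30) -> str:
    """Assign a segment with two simple rules (hypothetical threshold)."""
    # Rule 1: first-time borrowers have no repayment history to judge
    if client.loan_cycle <= 1:
        return "new"
    # Rule 2: repeat borrowers are split by their worst past arrears
    if client.max_days_late <= max_days_late_threshold:
        return "repeat_good_payer"
    return "repeat_late_payer"


def segment_counts(clients: Iterable[Client]) -> Counter:
    """Count how many clients fall into each segment."""
    return Counter(segment(c) for c in clients)


# Hypothetical example portfolio
clients = [
    Client("C1", loan_cycle=1, max_days_late=0),
    Client("C2", loan_cycle=3, max_days_late=5),
    Client("C3", loan_cycle=2, max_days_late=45),
    Client("C4", loan_cycle=4, max_days_late=0),
]

for c in clients:
    print(c.client_id, segment(c))
print(segment_counts(clients))
```

Rules like these are transparent and easy to discuss with business units. A data-driven approach would instead let a clustering algorithm propose the groups, at the cost of the skills and time mentioned above.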
More generally, as explained by Dean Caire, “one simple truth about predictive models […] is that once a few good predictors are used in a model, each additional predictor contributes incrementally less to the model’s overall predictive power”.
This means that a model need not be extremely complex, or include as many data points as possible, to work well. When an MFI has enough data at hand, it should give preference to data points that:
• Are objective and can be observed directly, rather than reported by clients;
• Are intuitively the most relevant to credit risk / ability and willingness to repay a loan;
• Are not too costly to collect;
• Can be consistently collected for all clients.
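The four criteria above can be turned into a simple screening step when drawing up a shortlist of data points. The Python sketch below is one possible reading, under stated assumptions: relevance is recorded as an expert judgement, cost is an estimated amount per client in hypothetical units, and consistency is measured as the share of clients for whom the value is actually collected. The field names, thresholds and example records are all hypothetical, and applying the criteria as a strict pass/fail filter is a simplification of “giving preference”.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateDataPoint:
    name: str
    observed_directly: bool   # objective and observed, rather than reported by the client
    judged_relevant: bool     # expert view: linked to ability/willingness to repay
    cost_per_client: float    # estimated collection cost per client (hypothetical units)
    coverage: float           # share of clients for whom the value is collected (0 to 1)


def shortlist(candidates: List[CandidateDataPoint],
              max_cost: float = 1.0,
              min_coverage: float = 0.95) -> List[str]:
    """Keep only data points that meet all four criteria (illustrative thresholds)."""
    return [
        c.name
        for c in candidates
        if c.observed_directly
        and c.judged_relevant
        and c.cost_per_client <= max_cost
        and c.coverage >= min_coverage
    ]


# Hypothetical candidates
candidates = [
    CandidateDataPoint("days_late_on_previous_loans", True, True, 0.0, 1.00),
    CandidateDataPoint("self_reported_monthly_income", False, True, 0.5, 0.90),
    CandidateDataPoint("credit_bureau_report", True, True, 3.0, 0.60),
    CandidateDataPoint("savings_account_balance", True, True, 0.0, 0.97),
]

print(shortlist(candidates))
# → ['days_late_on_previous_loans', 'savings_account_balance']
```

An MFI could equally score each data point against the criteria and rank them, rather than excluding anything that misses a single threshold.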
1 Data Analytics and Digital Financial Services, IFC, first edition, 2017
2 Data-Driven Segmentation in Financial Inclusion, CGAP Technical Guide, Maria Fernandez Vidal, Dean Caire and Fernando Barbon, July 2019
InFiNe.lu is the Luxembourg platform that brings together public, private and civil society actors involved in inclusive finance. The value of InFiNe.lu lies in the wide range of expertise reflected in the diversity of its members.
Picture 1 © Pallab Seth