Data & model quality: what we learned from Yields.io.

Posted on 9-apr-2021 14:16:17

At our first DataOps Ghent Meetup of 2021, we had the opportunity to learn more about DataOps in FinTech. The second speaker of the night was Jos Gheerardyn, co-founder and CEO of Yields.io, the creators of Chiron, an AI platform for model risk management that uses AI for real-time model testing and validation at enterprise scale for financial institutions such as banks.

Yields.io is determined to instil trust in the planet's most impactful algorithms and make tools that help users and developers ensure their models live up to their standards.

“Data is today's new oil, and it’s the oil of the industry. It’s what many industries are consuming to generate value.” - Jos Gheerardyn.

We can compare data to oil in the 19th century: its value and potential are apparent, but it requires processing to create value. Those who put in the time to learn how to extract, refine, and utilise data will receive a great ROTI (Return On Time Invested), just as that investment paid off for the oil industry.


How Data drives value

Data is used in many ways. A primary use case is displaying it to discover obvious patterns: visualising the data leads to insights. Typically, this is where BI tools and dashboards come in.

Beyond that, new and less obvious patterns can be discovered in data. For this, different types of mathematical algorithms are used.

Issues in data

“Many algorithms are used on data that isn’t correct and might contain issues.”

Let’s look at two situations where this is the case and what effect it can have:

Incomplete data

A good algorithm needs the whole picture, or in this case, all the relevant (Roomba) data: a model fed an incomplete picture can only produce incomplete or skewed results.
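As a minimal sketch of what a completeness check can look like (our illustration, not part of the talk; the dataset and column names are hypothetical), consider flagging missing values before any algorithm touches the data:

```python
import pandas as pd

# Hypothetical robot-vacuum telemetry; zones and values are made up for illustration.
readings = pd.DataFrame({
    "room_zone": ["A", "B", "C", "D"],
    "coverage_pct": [98.0, None, 87.5, None],  # two zones never reported data
})

# Flag incomplete zones before feeding the data to any downstream algorithm.
missing = readings[readings["coverage_pct"].isna()]
if not missing.empty:
    print(f"Incomplete data for zones: {missing['room_zone'].tolist()}")
```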

Anomalies

A well-known example of the impact of data quality issues on a mathematical model's performance is the Melbourne building in Microsoft Flight Simulator. Its creators generate 3D maps from 2D map data using machine learning, and during this process the height of one building was entered incorrectly. The building ended up towering over the landscape and defying physics. It might have broken immersion for some, but it proved the importance of data quality checks for everyone.
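A simple anomaly check could have caught that height before it reached the renderer. As a rough sketch (our illustration, with made-up numbers), flag any value that sits implausibly far from its neighbours:

```python
import numpy as np

# Hypothetical building heights in metres for one suburb; one entry is a data-entry error.
heights = np.array([8.5, 6.0, 7.2, 9.1, 10.0, 637.0, 5.5, 8.0])

# Robust outlier test: distance from the median, scaled by the median absolute deviation.
median = np.median(heights)
mad = np.median(np.abs(heights - median))
scores = np.abs(heights - median) / (mad if mad else 1.0)

print(f"Suspicious heights: {heights[scores > 5]}")  # -> [637.]
```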

These are just two entries on a long list of issues with algorithms caused by data. It’s important to realise that there’s quite a bit of risk involved in applying models to just about any dataset.


Meaning of a model

Model risk has already been actively managed in the financial sector for quite some time, for several reasons: the sector produces massive amounts of data, it has historically been a very intensive user of mathematical models, and plenty of accidents have happened in the past. Banks, for example, have lost billions of dollars because of mistakes in models.

The definition of a model published by the Fed (the US Federal Reserve):

“The term model refers to a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” (Federal Reserve, 2011)

Put simply: the model is what sits between the input data and the results, whether those results are a desired output, behaviour, estimate, or prediction.

If you compare data to oil, the model is the machine it powers, producing your desired results. And, as any mechanical engineer will tell you, machines need to be managed correctly.

A mismanaged model, just like a mismanaged machine, carries risks that endanger the quality of your end product, which leads us to model risk.

The definition of model risk published by the Fed:

“The potential for adverse consequences from decisions based on incorrect or misused model outputs and reports. Model risk can lead to financial loss, poor business and strategic decision-making, or damage to a banking organisation’s reputation.” (Federal Reserve, 2011)

In summary, model risk captures what can go wrong as data flows into, through, and out of the model.


Issues with models

There are roughly two types of issues with models. First, there are the simple ones, such as bugs and software defects. Second, there are the more subtle and more frequently encountered ones, such as models applied in contexts for which they weren’t developed.

An example of the latter is building a model to predict tomorrow’s temperature. To make this happen, you need massive amounts of climate data: you train the model on historical climate data and then apply it to today’s climate. It probably won’t work as expected, because the climate has changed since the period you used as a reference.
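One practical way to catch such a mismatch is to compare the distribution the model was trained on with the data it currently receives. A minimal sketch, using synthetic temperature data and a standard two-sample test:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Synthetic daily temperatures: the historical training era vs. a warmer present day.
train_temps = rng.normal(loc=14.0, scale=5.0, size=1000)
live_temps = rng.normal(loc=16.5, scale=5.0, size=1000)

# A two-sample Kolmogorov-Smirnov test flags that the distributions no longer match.
stat, p_value = ks_2samp(train_temps, live_temps)
if p_value < 0.01:
    print(f"Drift detected (KS statistic {stat:.3f}): revalidate or retrain the model.")
```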

When using a model, you need to understand what data it was trained on and for what context it was developed.


Evolution of models

Models have evolved a lot over time. About ten years ago, they were primarily built bottom-up: you start from statistical assumptions and construct a model based on those assumptions.

Nowadays, we can learn from data directly and extract non-linear patterns from it with neural networks and other types of algorithms. This brings interesting side effects with it, such as bias and vulnerability to adversarial attacks.
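To make the contrast concrete, here is a small sketch (our illustration, not from the talk): a bottom-up model built on a linearity assumption misses a non-linear pattern that a small neural network learns directly from the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)  # non-linear pattern plus noise

# Bottom-up: assume a linear relationship and fit it.
linear = LinearRegression().fit(X, y)

# Data-driven: let a small neural network learn the shape from the data itself.
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0).fit(X, y)

print(f"Linear model R^2: {linear.score(X, y):.2f}")  # poor fit: the assumption is wrong
print(f"Neural net R^2:   {net.score(X, y):.2f}")     # captures the sine shape
```

The flexibility cuts both ways: the same capacity that lets the network find the pattern also lets it absorb bias from the data or react unpredictably to adversarial inputs.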

Model risk management 

Knowing how to handle the uncertainty that comes with mathematical modelling is called model risk management.

Models have become so common and impactful throughout the years that it’s important to understand and avoid their risks, because the consequences can be drastic. This is where Yields.io comes in.


Eager to learn more about Jos Gheerardyn, Yields.io and their platform for model risk management? 

Or do you want to learn more about DataOps solutions for banking? Go here.

You can watch the talk on the DataOps Ghent YouTube channel.

Topics: DataOps
