Transparency of machine learning algorithms


There is a lot of talk these days about the GDPR (General Data Protection Regulation), which will apply in the EU (European Union) from May 25, 2018. In this post, let's see what the GDPR means and what its main implications are for the companies that collect data from citizens. I am no expert on data privacy or algorithm transparency; I am just sharing some of my thoughts for the time being.

What is GDPR?

On the EU GDPR website, the message is clear:

The EU GDPR is the most important change in data privacy regulation in 20 years.

The Cambridge Analytica scandal, which involved data collected from Facebook, has also brought data privacy into the spotlight and emphasized the need for a more democratic regulation. The issue is not new, though: there were already laws protecting the data privacy of citizens. In fact, the GDPR replaces the Data Protection Directive (DPD) of 1995, improving on it and taking into account the advances in technology since then.

Times have changed, and the challenges are no longer the same as 20 years ago. Today, there is GAFAM (Google, Apple, Facebook, Amazon, Microsoft), the leading technology companies, which hold huge amounts of data collected from all kinds of sources, sometimes even without the knowledge of the users. We have to admit that not every user knows how the technologies they use work under the hood. Usually, their primary interest is to make use of those technologies, and very few take the time to read the terms when signing up for a new app.

So the goal of the GDPR is to protect all citizens, from the well-versed to the average user, when it comes to the use of their data by companies: not only the big ones, but every company that collects data about citizens by any means. Basically, it aims to put companies in a situation where they have to make an effort to be transparent about the use they make of the data they collect from European citizens, regardless of the nature of the data.

One key factor of the GDPR is the need for transparent information and communication, which leads to the second part of this post.

Why is it important to have transparent algorithms?

Simple examples that help to understand the transparency of algorithms are recommender systems, such as the ones behind ads, or the decision of a bank to grant a client credit or not. All these processes are powered today by algorithms that mostly behave as black boxes, in the sense that what happens inside the algorithm itself is not well understood and the factors that influence the output for a given input are not controlled.

Having transparent algorithms is mainly about having algorithms that are more explainable and whose output is predictable, in the sense that we know what influences the final decision. Today, even the recruiting process is being automated to filter out bad candidates by analyzing their CVs. Candidates never know why they were rejected: they just receive an automatic email, always in the same format, telling them that despite all the interest of their profile, no further action will be taken on their application. If information were shared fairly, they could be told that it is because they are overqualified for the position, or whatever the reason is.
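To make the idea more concrete, here is a minimal sketch in Python of what an explainable decision could look like for the bank credit example. Everything in it is an assumption made up for illustration: the feature names, the weights, the threshold and the helper functions decide and explain are hypothetical and do not come from any real bank or from the GDPR itself. The point is only that, with a simple weighted score, we can tell the applicant which factor weighed the most in the decision.

```python
from dataclasses import dataclass

# Hypothetical weights for an illustrative credit-scoring rule; the values
# are made up for this example and not taken from any real bank.
WEIGHTS = {
    "income_keur": 0.5,        # yearly income, in thousands of euros
    "years_employed": 2.0,     # seniority in the current job
    "missed_payments": -15.0,  # missed payments over the last year
}
THRESHOLD = 30.0  # hypothetical minimum score to accept the application


@dataclass
class Decision:
    accepted: bool
    score: float
    contributions: dict  # feature name -> its share of the score


def decide(applicant: dict) -> Decision:
    """Score an applicant and keep the contribution of every feature."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = sum(contributions.values())
    return Decision(score >= THRESHOLD, score, contributions)


def explain(decision: Decision) -> str:
    """Turn a decision into a sentence the applicant can read."""
    if decision.accepted:
        return f"Accepted with a score of {decision.score:.1f}."
    # The feature with the lowest contribution is what hurt the score most.
    worst = min(decision.contributions, key=decision.contributions.get)
    return (f"Rejected with a score of {decision.score:.1f} "
            f"(minimum {THRESHOLD:.1f}); main factor: {worst}.")


applicant = {"income_keur": 40, "years_employed": 3, "missed_payments": 1}
decision = decide(applicant)
print(explain(decision))
```

A real system would of course be more complex than a weighted sum, but the same principle applies: if the contribution of each factor is kept alongside the decision, the automatic email can carry an actual reason instead of a generic message.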

Having transparent algorithms, and being able to explain to all citizens how those algorithms behave, will lead to more fairness in the big data world. People will know how their data is being used, and they will also feel some control over the process. It will be no surprise to see perfume ads everywhere when you have just searched for Dior Sauvage on Google.

There is a lot more to say about this topic, but this blog post is only meant to give an overview of the matter.