Over the last two years UnBias has engaged with a wide range of stakeholders to explore the issue of bias in algorithmic decision-making systems. In this post I wanted to share some personal thoughts on this issue, especially in relation to the introduction of “AI” systems.
To begin, we should clarify that when “AI” is currently invoked, what is commonly meant is something like: “Artificial Intelligence has various definitions, but in general it means a program that uses data to build a model of some aspect of the world.” This clearly places the focus on the Machine Learning (ML) branch of algorithmic decision-making systems.
When we look at the ethical concerns that are raised around AI, one of the key areas of concern is unjustified, and often unintentional, bias/discrimination in algorithmic decisions. When we think of online services, one of the main areas where machine learning types of AI are being used is service personalization.
Discrimination and personalization are intrinsically connected. Discrimination means “making a difference between”; personalization promises that every individual is treated differently (in reality every individual is categorized into increasingly fine categories and treated as equal to the other people in that category). Provided there is a reasonable and acceptable justification for being treated differently from other people, such personalization/discrimination can be a good thing. When I use an online search engine I hope that the results I get back will be specific to the search term I put in and not some uniformly average response to the global average of search queries.
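As a concrete (and entirely invented) illustration of what this categorization looks like in practice, here is a minimal Python sketch. All the segment names, features and thresholds are made up; the point is simply that the service never models “you” as an individual, it assigns you to a bucket and treats everyone in that bucket alike.

```python
# A minimal sketch of personalization-as-categorization.
# Segment names, features and thresholds are invented for illustration only.

def assign_segment(age: int, clicks_sport: int, clicks_politics: int) -> str:
    # Increasingly fine categories: each branch splits users further.
    if clicks_sport > clicks_politics:
        return "young_sports_fan" if age < 30 else "older_sports_fan"
    return "young_news_reader" if age < 30 else "older_news_reader"

SEGMENT_CONTENT = {
    "young_sports_fan":  ["transfer rumours", "e-sports"],
    "older_sports_fan":  ["match reports", "golf"],
    "young_news_reader": ["explainers", "opinion"],
    "older_news_reader": ["analysis", "business"],
}

# Everyone who lands in the same segment sees the same "personalized" feed.
print(SEGMENT_CONTENT[assign_segment(age=26, clicks_sport=40, clicks_politics=5)])
```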
There are two interconnected ways in which algorithmic discrimination becomes problematic:
- When the system starts to differentiate between people on the basis of factors that we as a society have decided should not (or should no longer) be valid reasons for treating people differently (e.g. race, gender etc.). ML-based algorithmic decision-making systems (i.e. AI) are meant to avoid the pitfall of past systems, which were prone to perpetuate the implicit biases of the developers who manually set the decision criteria. As it turns out, however, if you have a system that needs to be trained on examples, it can be very difficult to make sure that it behaves in a way that reflects the world as we would want it to be, rather than the world as it currently is. This is especially true when the system does not ‘understand’ anything about the decision it is making, but simply follows statistical patterns in the training data. ML-based AI is intrinsically conservative: it assumes that the patterns in past examples are a good representation of the way things will be (the sketch after this list illustrates the point).
- Because ML systems generate their internal models of the world through a dynamic process of parameter adjustment in a large network of simple general-purpose computing nodes, the resulting models do not lend themselves to easy human interpretation. [side note: my MSc thesis was on using a combination of Fuzzy Logic and Artificial Neural Networks to produce an ML system that would generate human-readable models]. This intrinsic lack of transparency of ML systems is often compounded by additional layers of obscurity related to Intellectual Property rights and business strategies. The result is a lack of transparency, and a lack of means of redress/correction, in the way the personalization algorithm discriminates between us and others. You don’t know which category you will be put into because it is done by algorithmic inference based on data (that you didn’t know they were collecting about you); you were never asked (consent boxes that require reading and interpreting long legal text don’t count). If the algorithmic inference is wrong, there is no obvious way to correct it.
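To make these two points concrete, here is a minimal Python sketch (using scikit-learn and entirely synthetic, invented data): a classifier is trained on past decisions without ever seeing the sensitive attribute, yet still reproduces the historical bias through a correlated proxy feature, and the only “explanation” the fitted network offers is a pile of numerical weights.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 5_000

# Synthetic "historical" data: a sensitive attribute (group) influences a
# proxy feature (think postcode) and also the past decisions we train on.
group = rng.integers(0, 2, n)                      # 0 or 1
proxy = group + rng.normal(0, 0.3, n)              # correlates with group
skill = rng.normal(0, 1, n)                        # the legitimate factor
past_decision = (skill + 1.5 * (group == 0) + rng.normal(0, 0.5, n) > 1).astype(int)

# Train WITHOUT the sensitive attribute: only skill and the proxy are used.
X = np.column_stack([skill, proxy])
model = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=0)
model.fit(X, past_decision)
pred = model.predict(X)

# The learnt model still treats the two groups very differently, because the
# proxy lets it reconstruct the historical pattern it was trained on.
print("acceptance rate, group 0:", pred[group == 0].mean())
print("acceptance rate, group 1:", pred[group == 1].mean())

# And the only "explanation" on offer is a few thousand numerical weights.
n_params = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
print("number of learnt parameters:", n_params)
```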
Tying these two issues together is the danger that comes from blindly using ML systems without asking what model of the world they have learnt.
ML systems are being introduced on the basis of their high level of accuracy, as defined by performance on test data. For many tasks that have traditionally been very difficult for computers, like image classification and natural language processing, this accuracy can be much higher than was achievable with traditional hand-crafted methods based on theoretical models of how to solve the problem. The problem is that this measure of performance asks us to have faith in the black box on the basis of past success, not on the basis of the soundness of the underlying model.
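To illustrate that worry, here is a small Python sketch (scikit-learn, with synthetic data invented for the purpose): a model can score very highly on held-out test data simply because the test data shares the same quirks as the training data, which tells us nothing about whether its underlying model of the problem is sound.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 10_000

# The label is really driven by `cause`, but in the collected data a
# `spurious` feature happens to track the label almost perfectly.
cause = rng.normal(0, 1, n)
label = (cause + rng.normal(0, 0.3, n) > 0).astype(int)
spurious = label + rng.normal(0, 0.2, n)   # an artefact of how the data was gathered

X = spurious.reshape(-1, 1)                # the model never sees the real cause
X_train, X_test, y_train, y_test = train_test_split(X, label, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))   # high: both splits share the artefact

# If the world shifts and the artefact disappears, the "accurate" model collapses.
spurious_new = rng.normal(0, 1, n)         # correlation gone
print("accuracy after the shift:", model.score(spurious_new.reshape(-1, 1), label))
```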
Are we heading towards a new era of Faith-based reasoning, where we abandon the critical thinking of the scientific method and replace it with Faith in the correctness of machine inference (big data and statistics)?