Accuracy vs Explainability of Machine Learning Models [NIPS workshop poster review]
My ex-labmate Ryan Turner presented an awesome poster at the NIPS workshop on Black Box Learning and Inference that was an eye-opener for me. Here I'm going to cover what I think was the main take-home message, but I encourage everyone to take a look at the paper for the details:
- Ryan Turner (2015): A Model Explanation System
Many practical applications of machine learning systems call for the ability to explain why certain predictions are made. Consider a fraud detection system: it is not very useful for a user to see a list of possible fraud attempts without any explanation of why the system thought each attempt was fraud. You want to say something like 'the system thinks it's fraud because the credit card was used to make several transactions that are smaller than usual'. But such explanations are not always compatible with our machine learning model. Or are they?
When choosing a machine learning model we usually think in terms of two choices:
- accurate but black-box: The best classification accuracy is typically achieved by black-box models such as Gaussian processes, neural networks or random forests, or complicated ensembles of all of these. Just look at the Kaggle leaderboards. These models are called black-box and are often criticised because their inner workings are really hard to understand. They don't, in general, provide a clear explanation of why they made a certain prediction; they just spit out a probability.
- white-box but weak: At the other end of the spectrum, models whose predictions are easy to understand and communicate are usually very impoverished in their predictive capacity (linear regression, a single decision tree) or are inflexible and computationally cumbersome (explicit graphical models).
So which ones should we use: accurate black-box models, or less accurate but easy-to-explain white-box models?
The paper basically tells us that this is a false tradeoff. To summarise my take-home from this poster in one sentence:
Explainability is not a property of the model
Ryan presents a nice way to separate concerns of predictive power and explanation generation. He does this by introducing a formal framework in which simple, human-readable explanations can be generated for any black-box classifier, without assuming anything about the internal workings of the classifier.
If you think about it, it makes sense. If you watch someone playing chess, you can probably post-rationalise a move and give a reason why the player might think it's a good one. But you probably have no idea what algorithm the player was actually executing in their head.
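To make the idea of explaining a classifier purely from the outside concrete, here is a minimal sketch in Python. It is not the method from Ryan's paper; it swaps in a much cruder perturbation-based approach, just to show that an explanation can be produced using nothing but calls to the model's predict function. The `black_box` fraud scorer, its feature names, its weights and the `typical` baseline customer are all made up for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def black_box(x: Features) -> float:
    """Hypothetical fraud scorer returning P(fraud).

    The weights are invented for this example. The explainer below
    never looks inside this function; it only calls it.
    """
    score = (
        2.0 * x["num_transactions_last_hour"]
        - 0.05 * x["avg_amount"]
        + 1.5 * x["new_merchant"]
        - 4.0
    )
    return 1.0 / (1.0 + math.exp(-score))


def explain(
    predict: Callable[[Features], float],
    x: Features,
    baseline: Features,
) -> List[Tuple[str, float]]:
    """Rank features by how much the prediction drops when each one
    is reset, on its own, to a 'typical' baseline value."""
    p = predict(x)
    effects = []
    for name in x:
        perturbed = dict(x)
        perturbed[name] = baseline[name]
        effects.append((name, p - predict(perturbed)))
    # Largest absolute change first: that feature is the headline reason.
    return sorted(effects, key=lambda e: abs(e[1]), reverse=True)


# Assumed inputs: a flagged case and a typical customer to compare against.
suspicious = {"num_transactions_last_hour": 6.0, "avg_amount": 10.0, "new_merchant": 1.0}
typical = {"num_transactions_last_hour": 1.0, "avg_amount": 60.0, "new_merchant": 0.0}

ranking = explain(black_box, suspicious, typical)
for name, effect in ranking:
    print(f"{name}: {effect:+.3f}")
```

With these made-up numbers, the number of transactions in the last hour comes out as the dominant reason for the fraud flag, which is the kind of statement you could show to a user. You could replace `black_box` with any other function that returns a probability and `explain` would work unchanged, since it assumes nothing about how the prediction is computed.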
Now we have a way to explain why decisions were made by complex systems, even if that explanation is not an exact account of what the classifier algorithm actually did. This is super-important in applications such as face recognition, where the only models that seem to work today are large black-box models. As (buzzword alert) AI-assisted decision making becomes commonplace, the ability to generate simple explanations for black-box systems is only going to matter more, and I think Ryan has made some very good observations in this paper.
I recommend that everyone take a look; I'm definitely going to see the world a bit differently after reading this.