
‘Decoupled Classifiers for Fair and Efficient Machine Learning’ by Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, & Max Leiserson (2018)

Published on Apr 03, 2020

Dwork, C., Immorlica, N., Kalai, A. T., & Leiserson, M. (2018). Decoupled Classifiers for Group-Fair and Efficient Machine Learning. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:119-133.

URL: http://proceedings.mlr.press/v81/dwork18a.html


Abstract

When it is ethical and legal to use a sensitive attribute (such as gender or race) in machine learning systems, the question remains how to do so. We show that the naive application of machine learning algorithms using sensitive attributes leads to an inherent tradeoff in accuracy between groups. We provide a simple and efficient decoupling technique, that can be added on top of any black-box machine learning algorithm, to learn different classifiers for different groups. Transfer learning is used to mitigate the problem of having too little data on any one group.

Critical Annotation 

Identifying the Dataset 

With the help of transfer learning, this paper decouples classifiers and trains a separate model for each group. One motivation for this approach is to understand the tradeoff between accuracy and fairness by defining and minimising a joint loss function. Two experiments were conducted. The first uses 47 datasets from openml.org, with an arbitrary binary attribute standing in for a sensitive attribute; it was undertaken using a least-squares linear regression model. The second involves images of suits: 506 suit images (462 male, 44 female) and 1,295 no-suit images (633 male, 662 female).
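As a rough illustration only of what such a joint loss might look like (the paper's exact weighting differs, and `lambda_fair` and the error metric below are assumptions), one can combine overall error with a penalty on the gap between per-group error rates:

```python
import numpy as np

def joint_loss(y_true, y_pred, groups, lambda_fair=1.0):
    """Illustrative joint loss: overall error plus a penalty on the gap
    between per-group error rates. A sketch, not the paper's exact form."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    overall_error = np.mean(y_true != y_pred)
    group_errors = [np.mean(y_true[groups == g] != y_pred[groups == g])
                    for g in np.unique(groups)]
    fairness_gap = max(group_errors) - min(group_errors)
    return overall_error + lambda_fair * fairness_gap
```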

Methods 

The first experiment is numerical in nature and spans the 47 datasets mentioned above. Each dataset is turned into a classification problem: regression targets are binarised, features are normalised, categorical attributes are converted into binary features, and performance is evaluated with 5-fold cross-validation. Transfer learning is integral to this experiment: out-group examples (i.e. related data from the other group) are fed into a group's classifier to increase accuracy when that group's own data is scarce. Essentially, this methodology aims to reduce the accuracy gap between groups so as to avoid majority bias.
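A minimal sketch of decoupling with transfer learning, assuming a scikit-learn least-squares model and a simple down-weighting of out-group examples (the weighting scheme and function names are illustrative, not the authors' exact procedure):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_decoupled(X, y, groups, out_group_weight=0.2):
    """Train one least-squares model per group. Out-group examples are
    included with a reduced sample weight, a simple stand-in for the
    paper's transfer-learning step."""
    X, y, groups = map(np.asarray, (X, y, groups))
    models = {}
    for g in np.unique(groups):
        weights = np.where(groups == g, 1.0, out_group_weight)
        models[g] = LinearRegression().fit(X, y, sample_weight=weights)
    return models

def predict_decoupled(models, X, groups):
    """Route each example to the model trained for its group."""
    X, groups = np.asarray(X), np.asarray(groups)
    preds = np.empty(len(X))
    for g, model in models.items():
        mask = groups == g
        preds[mask] = model.predict(X[mask])
    return preds
```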

The second experiment serves as a visual manifestation of the first. Using the images of suits, the researchers hypothesise that a decoupled classifier can avoid majority bias and therefore achieve greater accuracy and fairness by minimising the joint loss. Both the coupled and decoupled classifiers were built on BVLC CaffeNet. The decoupled classifier used two support vector machines, one trained on the male pictures and the other on the female pictures.
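A hedged sketch of the per-group construction, assuming image features have already been extracted (the random arrays below merely stand in for CaffeNet embeddings, and the variable names are hypothetical):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data: in the experiment these would be CaffeNet features of
# the suit / no-suit images, with gender as the sensitive attribute.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))      # stand-in for CNN embeddings
is_suit = rng.integers(0, 2, size=200)      # suit vs. no-suit label
gender = rng.choice(["male", "female"], size=200)

# Decoupled classifier: one SVM per group, as described above.
decoupled = {}
for g in np.unique(gender):
    mask = gender == g
    decoupled[g] = LinearSVC().fit(features[mask], is_suit[mask])

# Coupled baseline: a single SVM trained on all images.
coupled = LinearSVC().fit(features, is_suit)
```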

Key Assumptions Stated by Authors 

Cautioning readers, the researchers do not claim that their experiments settle the pros and cons of the method, nor that their results generalise, given the small datasets and the manual work involved.

More substantially, the authors highlight the social consequences of making decisions based on algorithms that encode bias and unfairness. They note that straightforward applications of machine learning often make this tradeoff implicitly, since one source of unfairness is the scarcity of data on minority groups.

Inferred Assumptions 

The effort made to emphasise the presence of bias in statistical techniques is certainly commendable. Given the scope and field of study, the proofs and formulae used to capture these properties are also impressive. However, an assumption of totality undergirds these functions. While it is important to quantify these realities in the context of statistical experiments, quantification alone is insufficient. There are broader social consequences that exist beyond loss functions, and this is something the researchers should have flagged as a limitation.

Evaluating Results 

Transfer learning stood out as a significant factor in enhancing the decoupled-classifier method (without it, the coupled classifier remains more accurate). However, in the suits experiment the decoupled classifier still erred by misclassifying a woman in a suit, showing evidence of majority bias. Viewed through a broader lens, though, this work shows that loss functions can quantify and expose fairness-accuracy tradeoffs in a way that benefits participants and researchers alike.
