‘Piloting a Theory-Based Approach to Inferring Gender in Big Data’ by Jason Radford (2017)

Radford, J.: Piloting a theory-based approach to inferring gender in big data. In: Proceedings of the IEEE International Conference on Big Data (2017)

URL: https://ieeexplore.ieee.org/document/8258555/

Abstract

Machine learning methods can be used to accurately predict core characteristics about people such as their gender, age, race, or political orientation. However, prediction models tend not to generalize, offer little explanation for particular corpora, produce weak theory, and suffer from latent biases. In this study, we present an alternative approach to demographic inference combining sociological theories of gender with machine learning to create high-dimensional measures of gender rather than predict sex. We create measurement models for gender across five corpora: blog posts, tweets, crowdfunding essays, movie scripts, and professional writing. We show these models validly measure gender in the corpora and then compare their ability to predict author gender to standard prediction models. We find that measurement models of gender are as accurate and sometimes more accurate than prediction models. Thus we show theory-based measurement models are not only interpretable but performant.

Critical Annotation

Radford advocates an alternative approach to gender inference and develops a model that measures and predicts gender based on gender systems theory. This theory posits that gender is constructed on three levels: the individual, the interactional, and the institutional. He suggests that applying this scalar component of gender construction to a gender inference model would make the model more interoperable across different types of source material, since gender is inferred differently depending on the context. Based on this theory, Radford creates gender measurement models for five different source materials: (1) blog posts, (2) tweets, (3) crowdfunding essays, (4) movie scripts, and (5) professional writing. He tests how accurately the models measure gender in each of these source materials using machine learning topic and behaviour models. Comparing the results with standard prediction models, Radford finds that the measurement models of gender are as accurate as, and sometimes more accurate than, the standard prediction models.