Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., Strohmaier, M.: Inferring gender from names on the web: a comparative evaluation of gender detection methods. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 53-54 (2016)
Computational social scientists often harness the Web as a "societal observatory" where data about human social behavior is collected. This data enables novel investigations of psychological, anthropological and sociological research questions. However, in the absence of demographic information, such as gender, many relevant research questions cannot be addressed. To tackle this problem, researchers often rely on automated methods to infer gender from name information provided on the web. However, little is known about the accuracy of existing gender-detection methods and how biased they are against certain sub-populations. In this paper, we address this question by systematically comparing several gender detection methods on a random sample of scientists for whom we know their full name, their gender and the country of their workplace. We further suggest a novel method that employs web-based image retrieval and gender recognition in facial images in order to augment name-based approaches. Our findings show that the performance of name-based gender detection approaches can be biased towards countries of origin and such biases can be reduced by combining name-based and image-based gender detection methods.
Karimi et al. evaluate name-based gender inference tools and propose combining them with gender recognition on facial images retrieved from the web. The authors examine two frequently used name-based gender classification tools (Sexmachine and Genderize) and use a labeled dataset of names of varying origins to evaluate and compare them on accuracy and bias. They also use Face++ to infer gender from images: for each full name, they collect the first five Google thumbnails returned by a search query and apply image recognition to those thumbnails. They then combine the more accurate name-based tool, Genderize, with Face++ in two ways: one gives both methods equal weight, and the other applies Genderize first and falls back to facial recognition only for names it cannot identify. Both combinations achieve high accuracy, with the Genderize-first method reaching an accuracy of 92%. The authors propose this method because it increases the accuracy of gender detection across varying geographies. Because popular names from non-Western industrialized countries (such as China, South Korea or Brazil) are not sufficiently covered in name databases, combining name-based and image-based gender inference can help reduce this bias.
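
The Genderize-first combination described above can be illustrated with a minimal sketch. This is not the authors' implementation. The functions name_lookup and image_lookup are hypothetical stand-ins for a name-based tool and for face-based labels on retrieved thumbnails, and they do not call the real Genderize or Face++ APIs. The majority vote over thumbnails and the tie handling are also assumptions, since the summary does not say how per-image results are aggregated.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# "female", "male", or None when a method cannot decide.
Label = Optional[str]


def majority_label(labels: Sequence[Label]) -> Label:
    """Most common non-None label; None on a tie or when nothing was detected.

    Assumption: majority voting is used only for illustration here.
    """
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two leading labels
    return ranked[0][0]


def infer_gender_fallback(
    full_name: str,
    name_lookup: Callable[[str], Label],            # hypothetical name-based tool
    image_lookup: Callable[[str], Sequence[Label]],  # hypothetical per-thumbnail labels
    max_images: int = 5,                             # first five thumbnails, as in the summary
) -> Label:
    """Use the name-based result when available, otherwise fall back to images."""
    label = name_lookup(full_name)
    if label is not None:
        return label
    return majority_label(image_lookup(full_name)[:max_images])


# Toy stand-ins so the sketch runs without any network access.
TOY_NAME_TABLE = {"John": "male", "Maria": "female"}


def toy_name_lookup(full_name: str) -> Label:
    return TOY_NAME_TABLE.get(full_name.split()[0])


def toy_image_lookup(full_name: str) -> Sequence[Label]:
    return ["female", "female", None, "male", "female"]


if __name__ == "__main__":
    print(infer_gender_fallback("John Smith", toy_name_lookup, toy_image_lookup))
    print(infer_gender_fallback("Xiu Li", toy_name_lookup, toy_image_lookup))
```

With the toy stand-ins, "John Smith" is resolved by the name table alone, while "Xiu Li" is not in the table and is resolved by the vote over the image labels. The equal-weight variant mentioned in the summary would need a scoring rule that the summary does not specify, so it is not sketched here.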