
'Discriminating Gender on Twitter' by John D. Burger, John Henderson, George Kim and Guido Zarella (2011)

Published on Apr 03, 2020

Burger, J. D., Henderson, J., Kim, G., Zarella, G.: Discriminating gender on Twitter. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1301-1309 (2011)



Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.

Critical Annotation

In this paper, Burger et al. attempt to identify the gender of Twitter users using four different fields: the language content of tweets, as well as the full name, screen name, and description from the user's profile. They utilize more than 4 million tweets from 184,000 authors in various languages, 66.7% of which were in English. Both word- and character-level n-grams from each of these fields are used, in different combinations, as inputs to a Balanced Winnow2 classifier. The authors performed a number of experiments with the Winnow algorithm, evaluating the four fields both in isolation and in various combinations. Significantly, they compared their classifier's results to human performance on the same task, collected via a crowdsourcing platform. When considering a single tweet, the algorithm's predictive accuracy was 67.8%, rising to 75.5% when all of a user's tweets were used. The combination of all four fields achieved an accuracy of 92.0%, whereas the combination of tweets and screen name (the data most often available) achieved 81.4%. Surprisingly, all of these results exceeded the accuracy of the human raters, who predicted gender from individual messages with only 65.7% accuracy.
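The feature and classifier setup described above can be illustrated with a short sketch: word and character n-grams extracted from a text field, fed to a Balanced Winnow-style online learner (two positive weight vectors, with multiplicative promotion and demotion on mistakes). The function names, n-gram ranges, and hyperparameters below are illustrative assumptions, not the authors' actual implementation.

```python
from collections import defaultdict

def char_ngrams(text, n_min=1, n_max=5):
    """Character n-grams from a field (e.g. a screen name or tweet text)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def word_ngrams(text, n_min=1, n_max=2):
    """Word n-grams from a whitespace-tokenized field."""
    words = text.split()
    return [" ".join(words[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(words) - n + 1)]

class BalancedWinnow:
    """Minimal Balanced Winnow sketch: score(x) = sum_f (u[f] - v[f]);
    on a mistake, weights of active features are scaled multiplicatively."""

    def __init__(self, alpha=1.5, beta=0.5, threshold=1.0):
        self.u = defaultdict(lambda: 2.0)  # positive weight vector
        self.v = defaultdict(lambda: 1.0)  # negative weight vector
        self.alpha, self.beta, self.threshold = alpha, beta, threshold

    def score(self, feats):
        return sum(self.u[f] - self.v[f] for f in feats)

    def predict(self, feats):
        # +1 / -1 stand in for the two gender labels
        return 1 if self.score(feats) >= self.threshold else -1

    def update(self, feats, label):
        if self.predict(feats) != label:
            for f in feats:
                if label == 1:   # promote toward the positive class
                    self.u[f] *= self.alpha
                    self.v[f] *= self.beta
                else:            # demote
                    self.u[f] *= self.beta
                    self.v[f] *= self.alpha
```

In a setup like the paper's, the feature set for one author would be the union of n-grams drawn from whichever fields are available (tweets, full name, screen name, description), which is what lets the classifier degrade gracefully when only some fields are present.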
