Alowibdi, J. S., Buy, U. A., Yu, P.: Language Independent Gender Classification on Twitter. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 739-743 (2013)
Online Social Networks (OSNs) generate a huge volume of user-originated texts. Gender classification can serve multiple purposes. For example, commercial organizations can use gender classification for advertising. Law enforcement may use gender classification as part of legal investigations. Others may use gender information for social reasons. Here we explore language independent gender classification. Our approach predicts gender using five color-based features extracted from Twitter profiles (e.g., the background color in a user's profile page). Most other methods for gender prediction are typically language dependent. Those methods use high-dimensional spaces consisting of unique words extracted from such text fields as postings, user names, and profile descriptions. Our approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy. We prove the validity of our approach by examining different classifiers over a large dataset of Twitter profiles.
Alowibdi et al., recognizing that the vast majority of research on gender inference and classification on Twitter is language-dependent, present a language-independent inference model that predicts gender using five color-based features extracted from users’ Twitter profiles: (1) background color, (2) text color, (3) link color, (4) sidebar fill color, and (5) sidebar border color. They preprocess colors harvested from 53,326 Twitter profiles, normalizing them into five color-based features. Using this dataset, they perform a number of experiments with data subsets and with different classifiers to compare accuracy. They conclude that using only five color features (an approach they refer to as quantization) can provide reasonably accurate gender prediction, with the Naive Bayes/decision tree hybrid classifier achieving the highest overall accuracy of 71.4%.
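To make the feature idea concrete, the following is a minimal sketch of how five profile colors might be turned into a small categorical feature vector. It is not the authors' preprocessing: the field names, the hex-string input format, and the choice of four buckets per RGB channel are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the five profile colors named in the summary above.
COLOR_FIELDS = ["background", "text", "link", "sidebar_fill", "sidebar_border"]


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    """Parse a 6-digit hex color such as '0084B4' (a leading '#' is allowed)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {hex_color!r}")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def quantize(channel: int, levels: int = 4) -> int:
    """Map a 0..255 channel value onto one of `levels` equal-width buckets."""
    return channel * levels // 256


def profile_features(profile: Dict[str, str], levels: int = 4) -> List[int]:
    """Return one bucket id per color field (assumed scheme, levels**3 buckets)."""
    features = []
    for field in COLOR_FIELDS:
        r, g, b = hex_to_rgb(profile[field])
        bucket = (quantize(r, levels) * levels + quantize(g, levels)) * levels
        bucket += quantize(b, levels)
        features.append(bucket)
    return features


example_profile = {
    "background": "#FFFFFF",
    "text": "000000",
    "link": "0084B4",
    "sidebar_fill": "FFFFFF",
    "sidebar_border": "000000",
}
example_features = profile_features(example_profile)
```

A vector like this contains no words at all, which is what makes the approach language independent; it could then be passed to any off-the-shelf classifier for the kind of comparison the authors report.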