Yang, H., & Yuan, Y.: A general gender inference method based on web. In: Proceedings of the 2016 2nd International Conference on Artificial Intelligence and Industrial Engineering (2016)
Gender information, as a crucial part of human demographics, is valuable for its abundant connotations and potential applications. Though much effort has been made on the problem of gender inference, most existing methods are highly dependent on data from specific sources, like Twitter, and are difficult to be generalized to other tasks. In this work, we propose a general Web-based method for gender inference. We show that our model significantly outperforms state-of-the-art without much human workload or any limits on specific scenarios. Based on that, we also present a voting framework to efficiently incorporate several methods to further improve performance. Experiments show that our voting framework can achieve 96.9% accuracy.
Yang and Yuan propose a general method for efficiently and accurately inferring gender from unlabelled data, arguing that few existing studies have approached gender inference in a way that facilitates interoperability across data sources. They also argue that the scale of the Web provides enough information to infer a person’s gender automatically. They construct what they call a “smart query”, a search-engine query composed of a person’s name and representative gender keywords (in this study, “her” and “his”). The query (constructed as “name his OR her”) retrieves documents about the person that contain these gender-indicative keywords, and the results are used with a supervised SVM classification model for inference. Yang and Yuan also present a voting framework that combines their method with alternative methods of gender inference (facial recognition and a name-list method): each predictor model casts its inference result as a “vote”, and the gender label with the most votes becomes the final prediction. Their web-based method outperforms the other methods when each is evaluated in isolation (achieving 93.38% accuracy), and accuracy improves further, to 96.9%, when it is used as part of the voting framework.
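The query construction and the voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_smart_query` and `majority_vote` are hypothetical, the query string simply follows the “name his OR her” pattern quoted in the summary, and returning `None` on a tie or an empty vote list is an assumption, since the summary does not say how the framework resolves those cases.

```python
from collections import Counter
from typing import Optional, Sequence


def build_smart_query(name: str) -> str:
    """Build a query in the "name his OR her" pattern (hypothetical helper)."""
    return f"{name} his OR her"


def majority_vote(votes: Sequence[str]) -> Optional[str]:
    """Return the most-voted label, or None on a tie or empty input.

    The None fallback is an assumption made for this sketch only.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie exists when the top two labels have the same count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# One vote each from three illustrative predictors
# (web-based, facial recognition, name list).
query = build_smart_query("Alex Morgan")
label = majority_vote(["female", "female", "male"])
```

With an odd number of predictors and two labels, as in the three-method setup the summary describes, a tie cannot occur, so the fallback only matters if a predictor abstains or more labels are introduced.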