Zhu Z., Wang J., Li S., Zhou G. (2015) Interactive Gender Inference in Social Media. In: Liu A., Ishikawa Y., Qian T., Nutanong S., Cheema M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham
In this paper, we define a novel task named interactive gender inference, which aims to utilize interactive text to identify the genders of two interactive users. To address this task, we propose a two stage approach by well incorporating the dependency among the interactive samples sharing identical users. Specifically, we first apply a standard four-category classification algorithm to get a preliminary result, and then propose a global optimization algorithm to achieve better performance. Evaluation demonstrates the effectiveness of our proposed approach to interactive gender inference.
Identifying the Dataset
This study follows the same rationale as the paper on leveraging interactive text. In particular, the authors cite the significance of communication via interaction in sociology, as well as in AI research examining human-machine interaction and learning. The dataset is derived from Sina Microblog in China. Users with fewer than 10 comments were excluded, and only users who had identified their gender were included. Over 20,000 users were selected for the training dataset and 9,339 for the test dataset. Users in the training and test datasets have not interacted with each other at all.
The researchers propose a two-stage approach. First, they run a four-category classification (mm, mf, fm, ff) over each interaction to obtain a preliminary result. Second, they apply a global optimisation algorithm intended to improve the correctness of the inferred data. As in the article on leveraging interactive text, this paper uses the same features and classification algorithms, treating each interaction as an edge in a graph and pairing it with the genders of the two users it connects.
Key Assumptions Stated by Authors
Although interactive text is helpful in identifying a user, relying on the four-category classification alone can lead to contradictory findings: the same user may be identified as female in one interaction and as male in another. The authors have therefore framed their model and approach to favour consistency, regularity, and precision in the gender inferred for each individual user across a multitude of interactions.
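The contradiction described above can be illustrated with a minimal sketch. It is not the authors' global optimisation algorithm, which is not detailed in these notes; it swaps in a simple stand-in that sums per-user log-probabilities across edges. The function names, the toy probabilities, and the convention that the first letter of a label refers to the first user of an edge are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed convention: label "mf" means first user male, second user female.
LABELS = ["mm", "mf", "fm", "ff"]


def find_contradictions(edge_labels):
    """Return users who receive different genders on different edges.

    edge_labels maps (user_a, user_b) -> one of LABELS (a hard prediction).
    """
    seen = defaultdict(set)
    for (a, b), label in edge_labels.items():
        seen[a].add(label[0])
        seen[b].add(label[1])
    return {user for user, genders in seen.items() if len(genders) > 1}


def resolve_user_genders(edge_probs):
    """Assign one gender per user by pooling evidence from all of their edges.

    edge_probs maps (user_a, user_b) -> {label: probability}.
    This is a simple stand-in, not the method from the paper.
    """
    scores = defaultdict(lambda: {"m": 0.0, "f": 0.0})
    for (a, b), probs in edge_probs.items():
        # Marginal probability that each endpoint is male.
        p_a_male = probs["mm"] + probs["mf"]
        p_b_male = probs["mm"] + probs["fm"]
        for user, p_male in ((a, p_a_male), (b, p_b_male)):
            p_male = min(max(p_male, 1e-9), 1 - 1e-9)  # avoid log(0)
            scores[user]["m"] += math.log(p_male)
            scores[user]["f"] += math.log(1 - p_male)
    return {user: max(s, key=s.get) for user, s in scores.items()}


# Toy classifier outputs (hypothetical numbers).
edge_probs = {
    ("u1", "u2"): {"mm": 0.10, "mf": 0.60, "fm": 0.20, "ff": 0.10},
    ("u1", "u3"): {"mm": 0.10, "mf": 0.10, "fm": 0.45, "ff": 0.35},
}
# Stage one: take the most probable label on each edge independently.
edge_labels = {e: max(p, key=p.get) for e, p in edge_probs.items()}
contradictory = find_contradictions(edge_labels)
resolved = resolve_user_genders(edge_probs)
```

Here the per-edge predictions label `u1` as male on one edge and female on the other, while the pooled step forces a single label per user. This is precisely the consistency constraint that the critique in these notes takes issue with.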
Although most papers are problematic in their disregard for the gender spectrum, the issue in this paper is specifically its reliance on, and belief in, a static performativity of gender. Going against the inherently dialectic and dynamic nature of interactive text, the study pushes for standard and consistent classifications that may not be representative of certain populations and are unlikely to translate across cultural contexts.
Performance is evaluated using the F1 score for each category as well as the macro average of all F1 scores. The authors conclude that the two-stage approach outperforms other models in all cases and that the global optimisation algorithm is highly effective in validating genders. In principle, a two-step approach to evaluating data can enhance its validity. In this case, however, the second step does little beyond confirming the correctness of inferred data and assumed attributes.
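For readers unfamiliar with the metric, the following is a minimal sketch of per-category F1 and its macro average over the four labels. The example labels and predictions are invented for illustration and are not results from the paper.

```python
def f1_per_class(y_true, y_pred, labels):
    """Compute the F1 score for each label from paired true/predicted labels."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


labels = ["mm", "mf", "fm", "ff"]
# Hypothetical gold labels and predictions for six interactions.
y_true = ["mm", "mf", "fm", "ff", "mm", "ff"]
y_pred = ["mm", "mf", "ff", "ff", "mf", "ff"]

per_class = f1_per_class(y_true, y_pred, labels)
macro = macro_f1(y_true, y_pred, labels)
```

Because the macro average weights every category equally, a category that is never predicted correctly (here `fm`) pulls the overall score down regardless of how rare it is.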