Al Zamal, F., Liu, W., Ruths, D.: Homophily and latent attribute inference: inferring latent attributes of Twitter users from neighbors. In: Proceedings of International Conference on Weblogs and Social Media (2012)
In this paper, we extend existing work on latent attribute inference by leveraging the principle of homophily: we evaluate the inference accuracy gained by augmenting the user features with features derived from the Twitter profiles and postings of her friends. We consider three attributes which have varying degrees of assortativity: gender, age, and political affiliation. Our approach yields a significant and robust increase in accuracy for both age and political affiliation, indicating that our approach boosts performance for attributes with moderate to high assortativity. Furthermore, different neighborhood subsets yielded optimal performance for different attributes, suggesting that different subsamples of the user's neighborhood characterize different aspects of the user herself. Finally, inferences using only the features of a user's neighbors outperformed those based on the user's features alone. This suggests that the neighborhood context alone carries substantial information about the user.
Al Zamal et al. infer three latent attributes of Twitter users (gender, age and political affiliation) from their tweets. They then evaluate the extent to which features drawn from a user's neighbourhood (such as their closest and least popular friends on the platform) improve inference accuracy. For gender inference, the authors used a labelled dataset of 400 Twitter users together with those users' friends, the term Al Zamal et al. use, as opposed to their followers. For each of these users and their friends, the most recent 1000 tweets were collected and used for the classification model. Augmenting a user's features with those of their Twitter neighbourhood improved inference accuracy for both age and political affiliation (by at least 3 percent), whereas gender showed no statistically significant improvement. Al Zamal et al. concluded that the gain from neighbourhood features depends on the assortativity of an attribute (gender is reported to have low assortativity in both online and physical networks).
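The idea of augmenting a user's features with neighbourhood features can be illustrated with a minimal sketch. It assumes sparse feature vectors stored as dictionaries and assumes that neighbour features are combined by simple averaging. The function names, the `user:` and `nbr:` prefixes and the averaging step are all hypothetical choices made for illustration, not necessarily the procedure used in the paper.

```python
from typing import Dict, List

FeatureVector = Dict[str, float]


def average_features(vectors: List[FeatureVector]) -> FeatureVector:
    """Average sparse feature vectors key by key (a missing key counts as 0)."""
    if not vectors:
        return {}
    totals: FeatureVector = {}
    for vec in vectors:
        for name, value in vec.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(vectors) for name, total in totals.items()}


def augment_with_neighbours(
    user: FeatureVector, neighbours: List[FeatureVector]
) -> FeatureVector:
    """Join the user's own features with averaged neighbour features.

    Prefixes keep the two feature groups distinct, so a classifier can
    weight a user's own word usage separately from that of their friends.
    (Hypothetical scheme, assumed for illustration.)
    """
    combined = {f"user:{name}": value for name, value in user.items()}
    for name, value in average_features(neighbours).items():
        combined[f"nbr:{name}"] = value
    return combined


# Toy example: word counts for one user and two of their friends.
user = {"lol": 2.0}
friends = [{"lol": 1.0, "vote": 4.0}, {"vote": 2.0}]
augmented = augment_with_neighbours(user, friends)
```

Passing only the `nbr:` features to a classifier would correspond to the neighbourhood-only setting mentioned in the abstract, while passing the full augmented vector corresponds to the combined setting.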