Liu, W., Zamal, F. A., & Ruths, D.: Using social media to infer gender composition of commuter populations. In: Proceedings of the International Conference on Weblogs and Social Media (2012)
In order for a municipality to effectively service and engage its constituency, it must understand the composition of the communities within it. Up to the present, such demographic estimates for target populations have been obtained largely from census data or expensive, time-intensive surveys. In this paper, we use Twitter microblog content to estimate the gender makeup of commuting populations using different modes of transportation (cars, public transportation, and bikes) in Toronto, Canada. We apply a demographic inference algorithm to 33,215 public Twitter accounts that follow one of three popular transportation-related Twitter-based news feeds (one for traffic, one for public transportation updates, and one for bicycling). Recent census data provides ground truth against which to compare the estimates we derive from Twitter. We find that, for all three communities (car drivers, public transport users, and bicyclists), the estimates obtained from Twitter reflect the majority-minority relationships between genders reported in census data. This provides preliminary, but compelling evidence that Twitter may be a platform that can go beyond simply signaling the presence of physical communities to actually measure their compositions.
Liu et al. use the Twitter posts of commuters to infer the gender makeup of those taking different modes of transportation in Toronto, Canada. They identified popular Twitter accounts dedicated to broadcasting news for three commuter groups in the Toronto region (drivers, public transit users, and cyclists). For each of these accounts, the profile and most recent 1000 tweets of every follower with a public account were obtained. For each commuter group, a gender classifier was trained on labeled users and then applied to the remaining users, using features such as word and hashtag frequencies in tweets. The results were compared with Canadian census data to evaluate accuracy, and the estimates for all three commuter groups were found to reflect the majority-minority gender relationships reported in the census.
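The train-then-apply procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not name the classifier, so a simple multinomial Naive Bayes over word and hashtag counts is swapped in as a stand-in. The function names (`tokenize`, `train`, `predict`, `gender_share`) and the toy tweets are hypothetical and invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Matches plain words and hashtags (an assumed, simplified tokenizer).
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def train(labeled_users):
    """Count tokens per label from (tweets, label) pairs."""
    token_counts = defaultdict(Counter)
    user_counts = Counter()
    for tweets, label in labeled_users:
        user_counts[label] += 1
        for tweet in tweets:
            token_counts[label].update(tokenize(tweet))
    vocab = set()
    for counts in token_counts.values():
        vocab.update(counts)
    return token_counts, user_counts, vocab


def predict(model, tweets):
    """Return the most probable label under Naive Bayes with add-one smoothing."""
    token_counts, user_counts, vocab = model
    total_users = sum(user_counts.values())
    # Tokens never seen in training carry no information and are skipped.
    tokens = [t for tweet in tweets for t in tokenize(tweet) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in user_counts:
        score = math.log(user_counts[label] / total_users)
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in tokens:
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def gender_share(model, unlabeled_users):
    """Estimate the share of each predicted label among unlabeled users."""
    predictions = Counter(predict(model, tweets) for tweets in unlabeled_users)
    total = sum(predictions.values())
    return {label: n / total for label, n in predictions.items()}


# Toy data, invented for illustration; real input would be follower tweets.
labeled = [
    (["red #apple pie", "apple tart"], "female"),
    (["green #pear juice", "pear cake"], "male"),
]
unlabeled = [["#apple crumble"], ["pear and #pear"], ["apple apple"]]

model = train(labeled)
shares = gender_share(model, unlabeled)
```

In a setup like the one described, the resulting shares for each commuter group would then be compared against the corresponding census proportions to check whether the majority-minority relationship agrees.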