Lansley, G., & Longley, P. (2016). Deriving age and gender from forenames for consumer analytics. Journal of Retailing and Consumer Services, 30, 271-278.
This paper explores the age and gender distributions of the bearers of British forenames and identifies key trends in British naming conventions. Age and gender characteristics are known to greatly influence consumption behaviour, and so extracting and using names to indicate these characteristics from consumer datasets is of clear value to the retail and marketing industries. Data representing over 17 million individuals sourced from birth certificates and market data have been modeled to estimate the total age and gender distributions of 32,000 unique forenames in Britain. When aggregated into five year age bands for each gender, the data reveal distinctive age profiles for different names, which are largely a product of the rise and decline in popularity of different baby names over the past 90 years. The names database produced can be used to infer the expected age and gender structures of many consumer datasets, as well as to anticipate key characteristics of consumers at the level of the individual.
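The aggregation described in the abstract (counting the bearers of each forename by gender and five-year age band) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the records are invented toy data, and the names `age_band`, `build_profiles`, and `gender_share` are hypothetical helpers introduced only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (forename, gender, age in years).
# These are NOT data from the paper.
RECORDS = [
    ("Ethel", "F", 88), ("Ethel", "F", 91), ("Ethel", "F", 84),
    ("Chloe", "F", 19), ("Chloe", "F", 22), ("Chloe", "F", 17),
    ("Leslie", "M", 71), ("Leslie", "F", 68), ("Leslie", "M", 74),
]


def age_band(age, width=5):
    """Return the lower bound of the `width`-year band containing `age`."""
    return (age // width) * width


def build_profiles(records):
    """Count the bearers of each forename by (gender, age band)."""
    profiles = defaultdict(Counter)
    for name, gender, age in records:
        profiles[name][(gender, age_band(age))] += 1
    return profiles


def gender_share(profile, gender):
    """Fraction of a name's bearers recorded with the given gender."""
    total = sum(profile.values())
    matching = sum(n for (g, _), n in profile.items() if g == gender)
    return matching / total


profiles = build_profiles(RECORDS)
```

With profiles like these, a name found in a consumer dataset can be looked up to obtain an expected gender and age band. In this toy data, "Leslie" is split across genders, while "Ethel" and "Chloe" each sit in clearly separated age bands.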
Identifying the Dataset
This study uses the 2011 CACI Consumer register, which holds the credit card records of 44.8 million individuals, and birth certificate records to examine the popularity and gender assignment of names across time in the UK.
The datasets were spread across four distinct places (the O2, a football stadium, a business district, and a shopping centre) for spatial analysis purposes. The researchers sought to remain representative of the contemporary UK population while also capturing an overview of the names and age ranges recorded over the past few decades. Their strategy involved clustering and reclustering the data with an iterative algorithm that repeatedly reallocates each record to its nearest centroid (category) as the centroids are updated.
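The allocate-and-reallocate procedure described above resembles k-means clustering. The summary does not name the algorithm, so treating it as k-means is an assumption. The sketch below uses invented 2-D points and a deterministic initialisation (the first `k` points) purely for illustration. The function names `nearest` and `kmeans` are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, k, max_iter=100):
    """Assumed k-means loop: allocate points, update centroids, repeat."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: naive initialisation
    labels = None
    for _ in range(max_iter):
        # Allocation step: assign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy points, not data from the study.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, k=2)
```

In this toy run the second point starts in the wrong cluster and is reallocated once the centroids move, which is the "reclustering" behaviour the paragraph describes.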
Key Assumptions Stated by Authors
Experiments in this study are founded upon the idea that consumption patterns are significant to the construction of individual gendered identities, and that these patterns differ inherently between men and women. The authors noted sources of exclusion that largely revolved around age and place of birth: not all people (especially those under 18 or of low socioeconomic status) have access to credit cards, and a person may not be shopping only for themselves. Moreover, they recognise that the dataset is incomplete with respect to the many adults born outside the UK who now belong to its adult population.
Despite the iterative machine learning processes, the datasets used limit the range of adults and genders included: gender is treated as a binary, and this binary is then carried over into consumption patterns. If a study is based on preconceived or axiomatic notions of gendered consumption, the study and its learned models are likely to reproduce those same biases, creating a vicious cycle of stereotypical market identification and targeting.
While most names have distinctive age and gender distributions, some are spread more uniformly, so not all names can serve as effective discriminators. In addition to showing the demographic attributes of credit card holders and the adult UK population, the study demonstrates that night-time residence or permanent addresses are not necessary for geospatial analyses. Lastly, the authors propose that this study can be useful for advertisement targeting.
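One way to make the point about discriminators concrete is to score how concentrated a name's age distribution is. The sketch below uses Shannon entropy over age-band counts. This measure is an assumption chosen for illustration, not one the summary attributes to the authors, and the counts are invented.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical age-band counts for two names (four bands each).
concentrated = [4, 0, 0, 0]  # all bearers fall in one band: strong age signal
uniform = [1, 1, 1, 1]       # bearers spread evenly: weak age signal

low = entropy(concentrated)
high = entropy(uniform)
```

Under this assumed measure, a lower score means the name narrows down the age band more sharply. A name whose bearers are spread evenly across bands scores highest and tells an analyst the least.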
This approach may be less relevant in contexts where population records are not archived or accessible. It would also be of little use in settings where credit cards are not the primary mode of payment.