Antal, M., & Nemes, G. (2016, May). Gender recognition from mobile biometric data. In 2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI) (pp. 243-248). IEEE.
This paper investigates gender recognition from keystroke dynamics data and from touchscreen swipes. Classification measurements were performed using 10-fold cross-validation and leave-one-user-out cross-validation (LOUOCV). We show that when the goal is classifying data from unseen users, only the second approach is viable. Based on our limited datasets, we show that gender cannot be reliably predicted. The best results were 64.76% for the keystroke dataset and 57.16% for the swipes dataset. However, the classification accuracy is over 80% for more than half of the users in the case of the keystroke dynamics dataset.
Identifying the Dataset
In several studies, gender is commonly treated as a latent attribute. Drawing on Jain et al., this study qualifies gender as a soft biometric: an indicator that can supplement identity verification systems but cannot conclusively differentiate between two individuals. Given this framing of gender, the two datasets employed for this experiment are biometric, both collected from touchscreen mobile phones. One dataset compiles horizontal swiping data and the other consists of password typing data. Horizontal swipes were collected from 98 individuals as they completed an online “Eysenck Personality Questionnaire” on their smartphones. The questionnaire totaled 58 questions, and one swipe per question (the successful swipe, if there were multiple attempts) was collected for each subject. Acquiring raw data for the second dataset involved a controlled process (specific devices, no typing errors allowed, a specific age range, known gender) and totaled 42 users and 2142 samples (51 per user).
It is important to note that the act of swiping or typing is itself what is examined. According to the methods, characteristics such as velocity, pressure, finger area, and key hold time are used to process the raw data across both datasets. More specifically, each of the 58 horizontal swipes collected from the 92 users underwent a feature vector calculation with nine characteristics, and the measurements were then presented on a class-balanced dataset. The 14 features identified to evaluate the second dataset were ranked in the Weka data mining software via correlation-based ranking; rankings were performed with respect to both gender and user. In addition to conducting experiments with the full list of features, the authors tested the datasets with the five best-performing features as well.
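To make this processing concrete, here is a minimal sketch of computing a per-swipe feature vector and a correlation-based feature ranking. The feature names are illustrative assumptions, not the paper's exact nine swipe features, and the ranking function only loosely mirrors Weka's correlation-based attribute evaluation.

```python
import numpy as np

def swipe_features(t, x, y, pressure, area):
    """Illustrative per-swipe feature vector (hypothetical features; the
    paper computes nine characteristics per swipe, not fully listed here)."""
    t, x, y = np.asarray(t, float), np.asarray(x, float), np.asarray(y, float)
    duration = t[-1] - t[0]
    path_len = np.hypot(np.diff(x), np.diff(y)).sum()  # total trajectory length
    return np.array([
        duration,                 # swipe duration
        path_len,                 # distance travelled by the finger
        path_len / duration,      # mean velocity
        np.mean(pressure),        # mean finger pressure
        np.mean(area),            # mean finger contact area
        x[-1] - x[0],             # horizontal displacement
    ])

def rank_features_by_correlation(X, labels):
    """Rank feature columns by |Pearson correlation| with the class label,
    loosely mirroring Weka's correlation-based attribute ranking."""
    scores = np.array([abs(np.corrcoef(X[:, j], labels)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]  # best-correlated feature first
```

Ranking by correlation with gender (or with user identity) then lets the authors keep only the top-ranked features, such as the five-feature subsets tested in the paper.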
Key Assumptions Stated by Authors
The authors astutely observe that most studies in this field fail to acknowledge that they often train and test on data from the same individuals, particularly when evaluating the results of their experiments. Veering away from this pattern, the authors state that they cannot guarantee that the accuracy they observed for some features would carry over to predicting the gender of unseen users. Moreover, they state that gender is unlikely to be reliably ascertained from biometric information, a profound finding given the increasing reliance on gender to supplement and verify biometric forms of identity verification.
While the authors call out faults in their own paper and in the related literature, like many other studies, this one offers no cost-benefit analysis of choosing gender as an attribute to rely on. Gender is difficult to correlate with absolute characteristics, particularly characteristics such as finger pressure. As is the case with many features, language-based features included, such studies operate on the assumption that there is an inherent, statistically detectable connection between these features and gender. However, this framework limits our understanding of gender both qualitatively and quantitatively.
The datasets were evaluated with the Random Forest classification algorithm, run through a Java application built on the Weka data mining software. Cross-validation took two forms: 10-fold cross-validation (the classifier is trained on nine tenths of the data and tested on the remaining tenth, over 10 rounds) and leave-one-user-out cross-validation (LOUOCV), a variation of a method proposed by Cornelius and Kotz (the classifier is trained on all data except that of one user, who acts as the test set; this is repeated until every user has been treated as test data). The evaluation demonstrated significant discrepancies in accuracy between the two validation methods and between runs with varying numbers of features. On average, the ability of these methods to accurately predict gender is weak, with the exception of specific rounds of tests. The authors find that results for discriminating users by personality are less mixed than those for differentiating by gender. Ultimately, because accuracy on seen users does not carry over to predicting the gender of new users, the authors conclude that gender information cannot be acquired reliably from biometric data. There is little scope in this paper to account for the social and societal dimensions of the construction of gender; given the context of the study, the authors do well to acknowledge the limits of understanding gender within the parameters they have outlined.
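The methodological contrast the authors draw can be sketched as follows. This is a minimal illustration of LOUOCV only, with a toy nearest-centroid classifier standing in for the paper's Weka Random Forest; the point is the user-level grouping of train and test data, not the classifier itself.

```python
import numpy as np

def louocv_accuracy(X, y, users, fit, predict):
    """Leave-one-user-out CV: hold out all samples of one user, train on the
    rest, and repeat until every user has served as the test set."""
    accuracies = []
    for u in np.unique(users):
        test = users == u
        model = fit(X[~test], y[~test])
        accuracies.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accuracies))

# Toy nearest-centroid classifier (an assumption for illustration;
# the paper used Random Forest via Weka).
def fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]
```

Because no user's samples ever appear in both the training and the test split, LOUOCV estimates performance on genuinely unseen users, whereas plain 10-fold cross-validation mixes each user's samples across folds and so tends to overstate accuracy.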