A Comprehensive Evaluation of Gender Inference from Names: Findings from a Large-Scale Performance Study (arXiv:2308.12381v1 [cs.CL])

Research across many scientific disciplines, including medicine, sociology, political science, and economics, relies heavily on a person’s gender as a crucial piece of information. However, in the large datasets that are now increasingly available, gender information is often not readily accessible. In such cases, researchers are compelled to infer gender from other available information, primarily from individuals’ names. Although inferring gender from names may raise ethical concerns, researchers have limited alternatives and resort to such approaches when the end justifies the means. The primary objective of such studies is to examine patterns and factors contributing to gender disparities. Consequently, the need for name-to-gender inference has led to the development of numerous algorithmic approaches and software products, which are widely used in academia, industry, and governmental and non-governmental organizations worldwide. Despite their prevalence, these approaches have not been systematically evaluated and compared, making it difficult to determine the most effective approach for future research. In this study, we conduct a comprehensive performance evaluation of existing name-to-gender inference approaches using large annotated datasets of names. Additionally, we propose two novel hybrid approaches that outperform any individual existing approach.