Arming text-based gender inference with partition membership filtering and feature selection for online social network users
Published in The Computer Journal, 2025
Recommended citation: Çoban, Ö. and Altay Yücel, Ş. (2025). "Arming text-based gender inference with partition membership filtering and feature selection for online social network users". The Computer Journal, xx(x), xx-xx.
Abstract
This study simulates a text categorization-based gender inference attack on online social networks, primarily to examine how the partition membership filter (PMF) and feature selection (FS) affect the performance of an attribute inference mechanism, especially when texts are encoded with distributed representations. The task, which reduces to a binary machine learning (ML) problem in the field of artificial intelligence (AI), is studied in multilingual scenarios (i.e. Turkish and English) under four main cases. Extensive experiments show that distributed embeddings often outperform traditional embeddings. Among the four cases, applying FS to distributed embeddings is superior to the others, two of which incorporate PMF. The best F1-scores on the Turkish and English datasets are 0.727 and 0.611, obtained with the Random Forest and Support Vector Machine classifiers, respectively. Notably, this investigation has not previously been carried out on text data in the existing literature. The findings are therefore expected to provide useful insight for researchers studying text-based attribute inference attacks, as well as other text-based binary ML tasks in the field of AI.
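As a rough illustration of two ideas named in the abstract, the sketch below shows filter-style feature selection over dense embedding dimensions and the F1-score for a binary task. It is not the paper's method: the abstract does not say which FS criterion was used, so the separation score here (absolute difference of class means divided by the summed standard deviations) is an assumption, and the function names `select_top_k`, `project`, and `f1_score` and the toy vectors are hypothetical.

```python
from statistics import mean, pstdev


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_top_k(X, y, k):
    """Return the indices of the k dimensions that best separate the classes.

    Assumed scoring rule (not taken from the paper):
    |mean(class 1) - mean(class 0)| / (std(class 1) + std(class 0)).
    """
    n_dims = len(X[0])
    scores = []
    for j in range(n_dims):
        ones = [row[j] for row, label in zip(X, y) if label == 1]
        zeros = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(ones) + pstdev(zeros) + 1e-12  # avoid division by zero
        scores.append(abs(mean(ones) - mean(zeros)) / spread)
    ranked = sorted(range(n_dims), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected dimensions of every vector."""
    return [[row[j] for j in keep] for row in X]


# Toy "embeddings" (made up for illustration): only dimension 0 tracks the label.
X_toy = [[0.9, 0.10, 0.50], [0.8, 0.20, 0.40], [0.1, 0.15, 0.50], [0.2, 0.12, 0.45]]
y_toy = [1, 1, 0, 0]
kept = select_top_k(X_toy, y_toy, k=1)
X_reduced = project(X_toy, kept)
```

In a full pipeline, the reduced vectors would be passed to a classifier such as Random Forest or a Support Vector Machine, and `f1_score` would be computed on held-out predictions.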
Use Google Scholar for full citation