[Submitted on 21 Aug 2023]

Title: Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset

Authors: Md. Rafi-Ur-Rashid and 2 other authors

Abstract: Feature selection for text classification is a crucial task in text mining and information retrieval. Although Bangla is the sixth most widely spoken language in the world, it has received limited attention because text datasets are scarce. In this study, we collected, annotated, and prepared a comprehensive dataset of 212,184 Bangla documents in seven categories and made it publicly available. We used three deep generative models, namely an LSTM variational autoencoder (LSTM VAE), an auxiliary classifier generative adversarial network (AC-GAN), and an adversarial autoencoder (AAE), to extract text features, even though such models are primarily used in computer vision. We trained the three models on our dataset and used the resulting feature spaces for document classification. After evaluating the classifiers, we found that the adversarial autoencoder produced the most effective feature space.
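
The pipeline summarized in the abstract (learn an encoder, map documents into its feature space, then classify in that space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: a linear encoder fitted by truncated SVD stands in for the LSTM VAE, AC-GAN, and AAE encoders, a nearest-centroid rule stands in for the classifiers, and the six toy word-count vectors and the helper names (`fit_linear_encoder`, `encode`, `fit_centroids`, `predict`) are hypothetical.

```python
import numpy as np


def fit_linear_encoder(X, latent_dim):
    """Fit a linear encoder by truncated SVD (PCA).

    ASSUMPTION: this is only a stand-in for a trained generative-model
    encoder; it is not one of the models named in the abstract.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    proj = vt[:latent_dim].T  # shape: (n_features, latent_dim)
    return mean, proj


def encode(X, mean, proj):
    """Map documents into the learned feature space."""
    return (X - mean) @ proj


def fit_centroids(Z, y):
    """Compute one centroid per class in the feature space."""
    labels = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(Z, labels, centroids):
    """Assign each document to the class of its nearest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]


# Hypothetical toy word-count vectors: 6 documents, 4 vocabulary terms.
X = np.array(
    [
        [3, 2, 0, 0],
        [2, 3, 1, 0],
        [4, 1, 0, 1],
        [0, 0, 3, 2],
        [1, 0, 2, 3],
        [0, 1, 4, 2],
    ],
    dtype=float,
)
y = np.array([0, 0, 0, 1, 1, 1])  # hypothetical category labels

mean, proj = fit_linear_encoder(X, latent_dim=2)
Z = encode(X, mean, proj)              # extracted features
labels, centroids = fit_centroids(Z, y)
pred = predict(Z, labels, centroids)   # classification in feature space
```

Comparing feature extractors in this setup would amount to swapping the encoder while keeping the downstream classifier fixed, then evaluating on held-out documents rather than on the training set used in this toy example.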

Submission history

From: Md Rafi Ur Rashid

[v1] Mon, 21 Aug 2023 22:18:09 UTC (3,444 KB)