Beyond Transformers: Lightweight Multilingual Hate Speech Detection Using MLP and TF-IDF
Abstract
Online hate speech and trolling pose a growing threat to digital well-being and online reputation. In this work, we benchmark transformer-free machine learning techniques for detecting hate speech and toxic content in both monolingual and multilingual settings. Specifically, we experiment with an MLP-based classifier using TF-IDF and count-based feature extraction on two benchmark datasets: Jigsaw Toxic Comment Classification (multi-label) and HateXplain (multi-class). Our experiments (Section 4) show that the MLP-based model achieves state-of-the-art precision and F1 scores relative to transformer-based baselines such as BERT and XLM-R on both HateXplain and Jigsaw, reaching 97% and 93% accuracy respectively. We also analyze per-class precision, recall, confusion matrices, and ROC curves. These findings suggest that a lightweight and interpretable neural model can serve as a capacity-efficient alternative to transformer-based models for multilingual hate speech detection. The results also reveal limitations in handling class imbalance and semantic subtleties, which future work could address through ensemble techniques and multilingual adaptation.
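To make the pipeline described above concrete, the following is a minimal sketch of a TF-IDF (or count) vectorizer feeding an MLP classifier, covering both the multi-class (HateXplain-style) and multi-label (Jigsaw-style) setups. It is not the authors' implementation. It assumes scikit-learn as the library. The toy texts, the subset of label columns, the hidden-layer size, the n-gram range, and the iteration budget are all hypothetical choices for illustration only.

```python
"""Minimal sketch: TF-IDF or count features feeding an MLP classifier.

Assumptions (not taken from the paper): scikit-learn is the library; the toy
texts, the chosen label columns, hidden-layer size, n-gram range and
max_iter are illustrative placeholders.
"""
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for real dataset rows.
texts = [
    "you are a wonderful person",
    "have a great day everyone",
    "thanks for sharing this article",
    "you are an idiot and nobody likes you",
    "shut up you stupid troll",
    "people like you should disappear",
]


def build_model(count_features: bool = False):
    """Return a vectorizer + MLP pipeline.

    count_features=True swaps TF-IDF weighting for raw n-gram counts.
    """
    vectorizer = (
        CountVectorizer(ngram_range=(1, 2))
        if count_features
        else TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
    )
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    return make_pipeline(vectorizer, mlp)


# Multi-class setup: one label per text (HateXplain-style classes).
y_multiclass = ["normal", "normal", "normal", "offensive", "offensive", "hatespeech"]
clf = build_model()
clf.fit(texts, y_multiclass)
multiclass_pred = clf.predict(["you stupid troll"])

# Multi-label setup: one binary column per label (a hypothetical subset of the
# Jigsaw label set). MLPClassifier accepts a 2-D 0/1 target matrix and then
# uses independent logistic outputs instead of a softmax.
label_names = ["toxic", "insult", "threat"]
y_multilabel = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
])
ml_clf = build_model(count_features=True)
ml_clf.fit(texts, y_multilabel)
# One independent probability per label for each input text.
multilabel_proba = ml_clf.predict_proba(["people like you should disappear"])
```

The same `build_model` helper (a hypothetical name) serves both tasks. Only the shape of the target changes. A list of class strings yields a softmax multi-class model. A 0/1 matrix yields one sigmoid output per label, which matches the multi-label framing of Jigsaw. Whether the paper uses word or character n-grams, and which MLP hyperparameters it uses, is not stated here. Those settings should be taken from the experimental section rather than from this sketch.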