BERTRetrax: A Transformer-Augmented Pipeline for Retraction Detection in Scholarly Articles

Atanu Sarkar 1, Ritwika Das 2, Mauparna Nandan 3,*, and Anil Bikash Chowdhury 4

1 Dept. of Computer Applications, Techno India University, Kolkata, India, sarkarbubai1810@gmail.com
2 Dept. of Computer Science & Engineering (AIML), Techno Main Salt Lake, Kolkata, India, rimidas199467@gmail.com
3 Dept. of Computer Applications, Techno Main Salt Lake, Kolkata, India, mauparna2011@gmail.com (*Corresponding Author)
4 Dept. of Computer Applications, Techno India University, Kolkata, India, abchaudhuri007@gmail.com

Abstract. The rising prevalence of retracted scholarly articles has raised concerns about research integrity and the dissemination of dubious scientific findings. This study introduces BERTRetrax, a transformer-enhanced pipeline for the automated identification of retracted research papers. The framework integrates classical machine learning algorithms (Logistic Regression, Random Forest, K-Nearest Neighbors, and AdaBoost) with transformer-based models, including BERT, RoBERTa, ALBERT, DistilBERT, Electra, SciBERT, MobileBERT, and BioBERT. A substantial empirical dataset comprising over 50,000 entries from Retraction Watch was used to assess the efficacy of each model. Comprehensive preprocessing and hyperparameter tuning were applied, and performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices. BioBERT attained the highest classification accuracy among all models, 98.23%, surpassing the other general-purpose and domain-adapted transformer models. The findings highlight the efficacy of transformer models, especially those pre-trained on biomedical literature, in detecting retracted content.
BERTRetrax exhibits exceptional performance and versatility, providing a resilient and scalable solution to assist publishers, reviewers, and research institutions in maintaining academic integrity and responsibility.

Keywords: Retraction Detection · Transformer Models · BioBERT · Text Classification · Scientific Integrity
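
The evaluation measures named in the abstract (accuracy, precision, recall, and F1-score) can all be derived from the entries of a binary confusion matrix. The following is a minimal sketch of that relationship, not the authors' evaluation code. It assumes a label encoding of 1 for retracted and 0 for non-retracted, and the names `confusion_counts`, `binary_metrics`, and `BinaryMetrics` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), assuming 1 = retracted and 0 = non-retracted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Compute the four metrics from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(confusion_counts(y_true, y_pred))
    print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters when the retracted and non-retracted classes are imbalanced, since accuracy alone can remain high even if most retracted papers are missed.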