TY - GEN
T1 - Detecting Human-to-AI Author Change in Arabic Text
AU - Boutadjine, Amal
AU - Harrag, Fouzi
AU - Bensouilah, Mouad
AU - Karboua, Sabrina
AU - Deriche, Mohamed
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Recent large language models (LLMs), such as Gemini and ChatGPT, can produce fluent, human-like text when given precise instructions, which makes it difficult to distinguish human-authored from AI-generated content, particularly in morphologically rich languages like Arabic. Although these issues have prompted several studies on AI content identification, most earlier work framed the task as binary classification, assuming that a text is either wholly AI-generated or wholly human-written. This study introduces a novel detection methodology for text co-written by generative LLMs and humans, which identifies transitions in text authorship using machine learning and deep learning techniques. We propose Trans-Detect, a neural architecture that combines AraBERT with a bidirectional LSTM network and a specialized attention mechanism, designed to capture subtle linguistic variations in hybrid texts. Alongside traditional Random Forest, XGBoost, and LSTM-CNN models, our detector processes Arabic text datasets to identify and predict authorship transitions. The traditional models achieved macro F1-scores of 0.6 to 0.8, while the proposed neural architecture achieved an F1-score of 0.98, a significant improvement in detecting text origin segments that also reveals the complex nature of authorship changes. The findings provide a foundation for future research on integrating advanced natural language processing techniques to improve the accuracy and robustness of style change detection systems, particularly in handling the characteristics of Arabic text.
KW - AI-generated Text Detection
KW - Arabic Text Generation
KW - Generative AI
KW - Machine Learning
KW - Natural Language Processing
KW - Style Change
UR - https://www.scopus.com/pages/publications/105007285271
U2 - 10.1109/SSD64182.2025.10989918
DO - 10.1109/SSD64182.2025.10989918
M3 - Conference contribution
AN - SCOPUS:105007285271
T3 - 22nd IEEE International Multi-Conference on Systems, Signals and Devices, SSD 2025
SP - 348
EP - 353
BT - 22nd IEEE International Multi-Conference on Systems, Signals and Devices, SSD 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Multi-Conference on Systems, Signals and Devices, SSD 2025
Y2 - 17 February 2025 through 20 February 2025
ER -