
Abstract

Natural language inference (NLI), also known as textual entailment recognition (TER), is a crucial task in natural language processing that combines many fundamental aspects of language understanding. Despite significant recent advances in NLI, driven primarily by the development of diverse large-scale datasets, most of this progress has been confined to English. This is attributed to the scarcity of human-annotated corpora for most other languages, notably Arabic. In this paper, we present ArEntail, an Arabic NLI dataset of 6000 sentence pairs collected from news headlines and manually labeled to indicate whether or not an entailment relationship holds between the two sentences, without resorting to machine translation of English datasets. To our knowledge, this is the largest human-crafted NLI dataset for Arabic to date. We offer various data analyses and establish baseline results using state-of-the-art pre-trained models for Arabic, in addition to a human-based evaluation. Our findings reveal that AraBERT-base v2, the best-performing model, achieves an accuracy of 93%, a gap of 2.6% relative to human performance, which leaves valuable room for further advances in modeling techniques in future research. Moreover, the performance of the "hypothesis-only" baseline closely resembles that of a random guesser, indicating that annotation artifacts are rarer than in prior English NLI benchmarks. We also evaluated GPT-3.5-turbo in zero-shot and few-shot Arabic NLI scenarios and observed promising outcomes, with the model behaving cautiously: it tends to predict an entailment relationship only when it finds strong clues for one.
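The "hypothesis-only" diagnostic mentioned above trains a classifier that sees only the hypothesis and never the premise; if it scores well above chance, the labels can be guessed from surface cues (annotation artifacts) rather than from inference. The sketch below illustrates the idea under stated assumptions. It is not the paper's setup, which the abstract does not detail. It assumes binary labels (1 = entailment, 0 = no entailment), naive whitespace tokenization, and a multinomial Naive Bayes classifier chosen purely for illustration. The helper names and the toy English sentences are hypothetical. The toy data deliberately plants a negation cue, so the model scores far above the roughly 50% a random guesser would reach on balanced binary labels.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenization (a placeholder for a real Arabic tokenizer).
    return text.split()


def train_hypothesis_only_nb(examples, alpha=1.0):
    """Fit a multinomial Naive Bayes on hypotheses only; premises are ignored.

    examples: list of (premise, hypothesis, label) with label in {0, 1}.
    alpha: Laplace smoothing constant.
    """
    label_counts = Counter()
    token_counts = {0: Counter(), 1: Counter()}
    vocab = set()
    for _premise, hypothesis, label in examples:
        label_counts[label] += 1
        tokens = tokenize(hypothesis)
        token_counts[label].update(tokens)
        vocab.update(tokens)

    total = sum(label_counts.values())
    model = {"log_prior": {}, "log_lik": {}, "log_unk": {}}
    for label in (0, 1):
        model["log_prior"][label] = math.log(
            (label_counts[label] + alpha) / (total + 2 * alpha)
        )
        # One extra slot in the vocabulary reserves mass for unseen tokens.
        denom = sum(token_counts[label].values()) + alpha * (len(vocab) + 1)
        model["log_lik"][label] = {
            t: math.log((token_counts[label][t] + alpha) / denom) for t in vocab
        }
        model["log_unk"][label] = math.log(alpha / denom)
    return model


def predict(model, hypothesis):
    # Score each label from the hypothesis tokens alone; ties go to label 0.
    scores = {}
    for label in (0, 1):
        score = model["log_prior"][label]
        for t in tokenize(hypothesis):
            score += model["log_lik"][label].get(t, model["log_unk"][label])
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, examples):
    correct = sum(predict(model, h) == y for _p, h, y in examples)
    return correct / len(examples)


# Hypothetical toy data with a planted artifact: "not" appears only in
# non-entailed hypotheses, so the premise is never needed to guess the label.
train = [
    ("premise A", "the team did not win", 0),
    ("premise B", "prices did not rise", 0),
    ("premise C", "the team won the match", 1),
    ("premise D", "prices rose sharply", 1),
]
model = train_hypothesis_only_nb(train)
print(predict(model, "the minister did not resign"))  # 0
print(accuracy(model, train))  # 1.0
```

On a real dataset the same comparison is made on held-out pairs: an accuracy near chance, as reported for ArEntail, suggests the hypotheses alone carry little label signal.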

Original language: English
Article number: 100686
Pages (from-to): 509-535
Number of pages: 27
Journal: Language Resources and Evaluation
Volume: 59
Issue number: 1
State: Published (Mar 2025)

Keywords

  • AraBERT
  • Arabic NLI
  • BERT
  • Dataset
  • Few-shot learning
  • GPT-3.5
  • LLMs
  • Natural language inference
  • Textual entailment recognition
  • Zero-shot learning

Fingerprint

Dive into the research topics of 'ArEntail: manually-curated Arabic natural language inference dataset from news headlines'. Together they form a unique fingerprint.
