
Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde

  • Ali Fadel
  • Husam Musleh
  • Ibraheem Tuffaha
  • Mahmoud Al-Ayyoub
  • Yaser Jararweh
  • Elhadj Benkhelifa
  • Paolo Rosso
  • Jordan University of Science and Technology
  • Duquesne University
  • University of Staffordshire
  • Polytechnic University of Valencia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Authorship identification is essential for detecting the deceptive misuse of others' content and for exposing the owners of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task named Authorship Identification of SOurce COde (AI-SOCO) was proposed, focusing on identifying the authors of source code. The dataset consists of source code crawled from the CodeForces online judge platform, submitted by the top 1,000 human users who each have at least 100 correct C++ submissions. Participating systems were asked to predict the author of a given piece of source code from a predefined list of authors. In total, 60 teams registered on the task's CodaLab page, of which 14 submitted 94 runs. The results were surprisingly high, with many teams and baselines exceeding 90% accuracy. The systems used a wide range of models and techniques, from pretrained word embeddings (especially those adapted to handle source code) to stylometric features.
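To make the task setup concrete, the following is a minimal Python sketch of a stylometric approach, assuming character trigram profiles compared by cosine similarity. It is not one of the participating systems or an official AI-SOCO baseline; the function names and the toy "alice"/"bob" submissions are hypothetical. Each author gets one n-gram profile built from their training submissions, and an unseen submission is assigned to the most similar profile, with accuracy as the evaluation measure.

```python
from collections import Counter
import math


def char_ngrams(code: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (spacing, brace style, etc.)."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(train):
    """train: list of (author_id, source) pairs -> one summed profile per author."""
    profiles = {}
    for author, src in train:
        profiles.setdefault(author, Counter()).update(char_ngrams(src))
    return profiles


def predict(profiles, src):
    """Return the author from the predefined list whose profile is closest to src."""
    grams = char_ngrams(src)
    return max(profiles, key=lambda author: cosine(profiles[author], grams))


def accuracy(gold, pred):
    """Fraction of submissions whose predicted author matches the true author."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy data: two authors with distinct C++ formatting habits.
    train = [
        ("alice", "for(int i=0;i<n;i++){\n  cin>>a[i];\n}"),
        ("bob", "for (int i = 0; i < n; ++i)\n{\n    std::cin >> a[i];\n}"),
    ]
    profiles = build_profiles(train)
    print(predict(profiles, "for(int j=0;j<m;j++){ cin>>b[j]; }"))  # prints "alice"
```

The compact, unspaced snippet shares many more trigrams with alice's submission than with bob's, so it is attributed to alice. Real systems on a 1,000-author dataset would need richer features (or learned embeddings, as the abstract notes), but the predict-from-a-closed-list structure is the same.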

Original language: English
Title of host publication: FIRE 2020 - Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors: Prasenjit Majumder, Mandar Mitra, Surupendu Gangopadhyay, Parth Mehta
Publisher: Association for Computing Machinery
Pages: 4-8
Number of pages: 5
ISBN (Electronic): 9781450389785
State: Published - 16 Dec 2020
Externally published: Yes
Event: 12th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2020 - Hyderabad, India
Duration: 16 Dec 2020 to 20 Dec 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 12th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2020
Country/Territory: India
City: Hyderabad
Period: 16/12/20 to 20/12/20

Keywords

  • authorship-identification
  • datasets
  • source-code
