Using transliteration with entity resolution for Arabic datasets

  • Hashemite University
  • Princess Sumaya University for Technology
  • Applied Science Private University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Entity resolution (ER) is the task of identifying records that refer to the same real-world entity. It is used to link records across datasets and to match query records in real time against existing datasets. Indexing is a major step in the ER process that reduces the search space. Most existing indexing techniques used in the ER process are designed to work with English datasets and may not be suitable for other languages, such as Arabic. This paper proposes an enhancement that adapts such indexing techniques to Arabic by transliterating Arabic strings before the indexing step of the ER process. The proposed approach is evaluated experimentally and compared with using word stems as blocking keys in the indexing step. The results show better matching accuracy with transliteration than with word stems.
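The idea of transliterating before indexing can be illustrated with a minimal sketch. It is not the paper's implementation: the character map `TRANSLIT` is a simplified, hypothetical romanization, and the use of a Soundex-style code as the blocking key is an assumption chosen only because it is a common English-oriented indexing technique. The helper names `transliterate`, `soundex`, `build_blocks`, and `candidate_pairs` are likewise illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Simplified Arabic-to-Latin map (an assumption, not the paper's scheme).
TRANSLIT = {
    "ا": "a", "أ": "a", "إ": "i", "آ": "a", "ب": "b", "ت": "t", "ث": "th",
    "ج": "j", "ح": "h", "خ": "kh", "د": "d", "ذ": "dh", "ر": "r", "ز": "z",
    "س": "s", "ش": "sh", "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "ع": "a",
    "غ": "gh", "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "ة": "h", "و": "w", "ي": "y", "ى": "a", "ء": "", "ئ": "y",
    "ؤ": "w",
}

# Standard Soundex digit groups for Latin consonants.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def transliterate(text):
    """Map Arabic letters to Latin; keep ASCII, drop anything else unmapped."""
    return "".join(TRANSLIT.get(ch, ch if ch.isascii() else "") for ch in text)


def soundex(word):
    """Return a 4-character Soundex code for a Latin-script word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (key + "000")[:4]


def build_blocks(records):
    """Group record ids by the Soundex code of the transliterated name."""
    blocks = defaultdict(list)
    for rec_id, name in records.items():
        blocks[soundex(transliterate(name))].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Yield only the record pairs that share a block."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


records = {1: "أحمد", 2: "احمد", 3: "محمد", 4: "محمود"}
pairs = sorted(candidate_pairs(build_blocks(records)))
```

With these four toy records, only pairs that share a blocking key are compared, instead of all six possible pairs. This is the search-space reduction that indexing provides; whether a phonetic key, this romanization, or a different blocking technique matches the paper's experiments is not stated in the abstract.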

Original language: English
Title of host publication: Proceedings - 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, AICCSA 2017
Publisher: IEEE Computer Society
Pages: 593-597
Number of pages: 5
ISBN (Electronic): 9781538635810
DOIs
State: Published - 2 Jul 2017
Externally published: Yes
Event: 14th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2017 - Hammamet, Tunisia
Duration: 30 Oct 2017 → 3 Nov 2017

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 2017-October
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 14th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2017
Country/Territory: Tunisia
City: Hammamet
Period: 30/10/17 → 3/11/17

Keywords

  • Arabic Dataset
  • Entity Resolution
  • Indexing
  • Stemming
  • Transliteration
