Evaluation of classification algorithms for banking customer's behavior under Apache Spark Data Processing System

  • Princess Sumaya University for Technology

Research output: Contribution to journal, Conference article, peer-review

25 Scopus citations

Abstract

Many different classification algorithms can be used to analyze, classify, or predict data. These algorithms differ in their performance and results. Therefore, to select the best approach, a comparison study is required to identify the most appropriate approach for a given domain. This paper presents a comparative study of two classification techniques, namely Naïve Bayes (NB) and the Support Vector Machine (SVM), from the Machine Learning Library (MLlib) under the Apache Spark Data Processing System. The comparison is conducted after applying the two classifiers to a dataset consisting of customers' personal and behavioral information from Santander Bank in Spain. The dataset contains a training set of more than 13 million records and a testing set of about 1 million records. To apply the two classifiers to the dataset properly, a preprocessing step was performed to clean and prepare the data. Experimental results show that Naïve Bayes outperforms the Support Vector Machine in terms of precision, recall, and F-measure. Peer-review under responsibility of the Conference Program Chairs.
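
The abstract compares the classifiers by precision, recall, and F-measure. As a minimal sketch of how these three metrics are commonly computed for a binary task, the plain-Python function below derives them from true and predicted labels. It is not the paper's code and does not use Spark MLlib; the function name `precision_recall_f1`, the balanced F-measure (F1) choice, and the toy labels are assumptions made only for illustration and do not reproduce any result from the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the given positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells that the three metrics depend on.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (assumed, not from the Santander dataset).
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 0, 1, 1, 0]
toy_metrics = precision_recall_f1(toy_true, toy_pred)
```

Running the same function on the predictions of two classifiers over one shared test set gives directly comparable triples, which is the kind of comparison the abstract describes.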

Keywords

  • Big Data
  • Machine Learning
  • Naïve Bayes
  • Spark
  • Support Vector Machine
