Abstract
Many different classification algorithms can be used to analyze, classify, or predict data. These algorithms differ in their performance and results. Therefore, to select the best approach, a comparative study is required to identify the most appropriate approach for a given domain. This paper presents a comparative study of two classification techniques, namely Naïve Bayes (NB) and the Support Vector Machine (SVM), from the Machine Learning Library (MLlib) under the Apache Spark Data Processing System. The comparison is conducted after applying the two classifiers to a dataset consisting of customers' personal and behavioral information at Santander Bank in Spain. The dataset contains a training set of more than 13 million records and a testing set of about 1 million records. To properly apply the two classifiers to the dataset, a preprocessing step was performed to clean and prepare the data. Experimental results show that Naïve Bayes outperforms the Support Vector Machine in terms of precision, recall, and F-measure. Peer-review under responsibility of the Conference Program Chairs.
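The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is plain Python rather than the Spark MLlib pipeline used in the paper, and the function name `precision_recall_f1` and the toy labels are assumptions made for illustration only; they are not the paper's data or results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F-measure (F1) for one positive class.

    y_true and y_pred are equal-length sequences of labels.
    A ratio with a zero denominator is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical toy labels, not taken from the Santander dataset.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(actual, predicted))
```

Comparing two classifiers in this way amounts to computing the same three numbers on the same test set for each model's predictions.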
| Original language | English |
|---|---|
| Pages (from-to) | 559-564 |
| Number of pages | 6 |
| Journal | Procedia Computer Science |
| Volume | 113 |
| State | Published - 2017 |
| Externally published | Yes |
| Event | 8th International Conference on Emerging Ubiquitous Systems and Pervasive Networks, EUSPN 2017 and the 7th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare, ICTH 2017 - Lund, Sweden. Duration: 18 Sep 2017 → 20 Sep 2017 |
Keywords
- Big Data
- Machine Learning
- Naïve Bayes
- Spark
- Support Vector Machine
Fingerprint
Dive into the research topics of 'Evaluation of classification algorithms for banking customer's behavior under Apache Spark Data Processing System'. Together they form a unique fingerprint.