Abstract
Unsolicited emails such as phishing and spam emails cost businesses and individuals millions of dollars annually. Several models and techniques for automatically detecting spam emails have been introduced and developed, yet none has achieved 100% predictive accuracy. Among the proposed models, machine learning and deep learning algorithms have been the most successful, and natural language processing (NLP) has further improved their accuracy. In this work, the effectiveness of word embedding in classifying spam emails is investigated. The pre-trained transformer model BERT (Bidirectional Encoder Representations from Transformers) is fine-tuned to distinguish spam emails from non-spam (ham) emails. BERT uses attention layers to take the context of the text into account. Results are compared to a baseline DNN (deep neural network) model that contains a BiLSTM (bidirectional Long Short-Term Memory) layer and two stacked Dense layers. In addition, results are compared to a set of classic classifiers: k-NN (k-nearest neighbors) and NB (Naive Bayes). Two open-source data sets are used, one to train the model and the other to test the persistence and robustness of the model against unseen data. The proposed approach attained the highest accuracy, 98.67%, and an F1 score of 98.66%.
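The two reported metrics, accuracy and F1 score, can be illustrated with a minimal sketch. It assumes binary labels with spam as the positive class; the abstract does not state which F1 averaging was used, and the function name `accuracy_and_f1` and the toy labels are hypothetical, not taken from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive="spam"):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (spam) class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Accuracy: fraction of emails labelled correctly.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels invented for illustration only (not the paper's data).
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}  F1={f1:.4f}")
```

Because F1 ignores true negatives, it can diverge from accuracy when spam and ham are imbalanced, which is one reason both figures are worth reporting.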
| Original language | English |
|---|---|
| Pages (from-to) | 853-858 |
| Number of pages | 6 |
| Journal | Procedia Computer Science |
| Volume | 184 |
| State | Published - 2021 |
| Externally published | Yes |
| Event | 12th International Conference on Ambient Systems, Networks and Technologies, ANT 2021 / 4th International Conference on Emerging Data and Industry 4.0, EDI40 2021 / Affiliated Workshops, Warsaw, Poland; Duration: 23 Mar 2021 → 26 Mar 2021 |
Keywords
- BERT transformer
- Cybersecurity
- Deep learning
- Spam
- Word embedding