
Enhanced Best Fit Algorithm for Merging Small Files

  • Universiti Sains Malaysia
  • United Arab Emirates University

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

In the Big Data era, numerous sources and environments generate massive amounts of data. Data at this scale requires specialized tools and procedures that can evaluate the information effectively and support decisions about future changes. Hadoop is widely used to process this kind of data. However, it handles large files far more efficiently than small ones, so workloads made up of many small files reduce the framework's efficiency. This study proposes a novel solution to the problem: the Enhanced Best Fit Merging (EBFM) algorithm, which merges files according to predefined parameters (type and size). The algorithm ensures that the size of each generated file falls in the same range as the maximum block size. Its primary goal is to dynamically merge files that meet the stated criteria, grouped by file type, so that the resulting system remains effective and efficient. Merging takes place before the files are made available to the Hadoop framework. Additionally, the files generated by the system are named with specific keywords so that no file is overwritten and no data is lost. The proposed approach guarantees the generation of the fewest possible large files, which reduces the input/output memory burden and matches the kind of workload for which the Hadoop framework is most effective. The findings show that the proposed technique improves the framework's performance by approximately 64% when compared across all other potential performance-impairing variables. The proposed approach can be implemented in any environment that uses the Hadoop framework, including but not limited to smart cities and real-time data analysis.
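The merging idea described in the abstract can be illustrated with a generic best-fit packing sketch. This is not the authors' EBFM implementation, which the abstract does not specify in detail. It assumes a best-fit-decreasing strategy, a 128 MiB target (the default HDFS block size), and a hypothetical naming scheme `merged_<type>_<index>`; the function name `best_fit_merge_plan` and the tuple layout of the input are likewise illustrative assumptions.

```python
from collections import defaultdict

# Assumption: target merged-file size equals the default HDFS block size (128 MiB).
BLOCK_SIZE = 128 * 1024 * 1024


def best_fit_merge_plan(files, block_size=BLOCK_SIZE):
    """Plan which small files to merge together.

    files: iterable of (name, file_type, size_in_bytes) tuples.
    Returns a dict mapping a hypothetical merged-file name to the list of
    source file names it would contain. Only the plan is computed; no file
    contents are read or written.
    """
    # One list of bins per file type; each bin is [remaining_capacity, [names]].
    bins_by_type = defaultdict(list)

    # Best-fit decreasing: place larger files first (an assumption of this sketch).
    for name, ftype, size in sorted(files, key=lambda f: f[2], reverse=True):
        bins = bins_by_type[ftype]

        if size >= block_size:
            # Not a small file: give it a bin of its own with no spare capacity.
            bins.append([0, [name]])
            continue

        # Best fit: among bins that can hold the file, pick the one that
        # would be left with the least free space.
        best = None
        for candidate in bins:
            if candidate[0] >= size and (best is None or candidate[0] < best[0]):
                best = candidate

        if best is None:
            best = [block_size, []]
            bins.append(best)

        best[0] -= size
        best[1].append(name)

    # Type plus a running index keeps merged-file names unique, so no
    # output overwrites another.
    plan = {}
    for ftype, bins in bins_by_type.items():
        for index, (_, names) in enumerate(bins):
            plan[f"merged_{ftype}_{index:04d}"] = names
    return plan
```

Calling `best_fit_merge_plan([("a.txt", "txt", 60), ("b.txt", "txt", 50), ("c.txt", "txt", 40)], block_size=100)` groups `a.txt` with `c.txt` to fill one bin exactly and leaves `b.txt` in a second bin. Files of different types are never placed in the same merged file.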

Original language: English
Pages (from-to): 913-928
Number of pages: 16
Journal: Computer Systems Science and Engineering
Volume: 46
Issue number: 1
State: Published - 2023
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 11 - Sustainable Cities and Communities

Keywords

  • Big data
  • HDFS
  • Hadoop
  • MapReduce
  • small file
