A New Merging Numerous Small Files Approach for Hadoop Distributed File System

  • School of Electrical and Electronic Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

In the current era of big data, enormous volumes of data are recorded every second from multiple streams and environments of different types. This data is processed with specialized tools such as Hadoop, which manages processing with regard to memory, process allocation, size, and storage. The Hadoop framework is known to be efficient with a few large files rather than many small files; a large number of small files prevents the framework from working efficiently and greatly increases the processing time. To address this issue, this work proposes a new algorithm that merges many small files into a single large file based on certain match criteria (type and size). The merging is executed before the files are passed to the Hadoop framework. The proposed algorithm ensures that it generates the fewest possible large files, which reduces the I/O memory load and correlates with the efficiency of the Hadoop framework. The results show that the proposed algorithm improves the efficiency and the processing time of the Hadoop framework by approximately 40% across all the factors that hinder performance.
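
The abstract does not spell out the merging procedure, so the following is only a minimal sketch of the general idea (group small files by type, then pack each group by size into as few merged files as possible), not the authors' algorithm. The function name `plan_merges`, the use of the file extension as the "type", the 128 MiB target (the default HDFS block size), and the first-fit-decreasing packing heuristic are all assumptions made for illustration. First-fit decreasing is a heuristic and does not guarantee the minimum number of output files.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumption: merged files should not exceed the default HDFS block size (128 MiB).
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merges(files, target_size=BLOCK_SIZE):
    """Plan which small files to merge together (hypothetical helper).

    files: iterable of (path, size_in_bytes) pairs.
    Returns a list of (file_type, [paths]) entries, one per merged output file.
    """
    # Step 1: match on type. Here "type" is assumed to be the file extension.
    by_type = defaultdict(list)
    for path, size in files:
        by_type[PurePath(path).suffix.lower()].append((path, size))

    plan = []
    for file_type, group in sorted(by_type.items()):
        # Step 2: match on size. Pack largest-first into the first bin with room.
        bins = []  # each bin is [remaining_capacity, [paths]]
        for path, size in sorted(group, key=lambda item: item[1], reverse=True):
            for current in bins:
                if size <= current[0]:
                    current[0] -= size
                    current[1].append(path)
                    break
            else:
                # No bin has room, or the file alone exceeds target_size:
                # open a new bin for it.
                bins.append([max(target_size - size, 0), [path]])
        plan.extend((file_type, paths) for _, paths in bins)
    return plan
```

The function only produces a merge plan. Concatenating the files and loading the merged outputs into HDFS would be separate steps carried out before the Hadoop job runs.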

Original language: English
Title of host publication: 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665485845
State: Published - 2022
Externally published: Yes
Event: 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2022 - Prachuap Khiri Khan, Thailand
Duration: 24 May 2022 to 27 May 2022

Publication series

Name: 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2022

Conference

Conference: 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2022
Country/Territory: Thailand
City: Prachuap Khiri Khan
Period: 24/05/22 to 27/05/22

Keywords

  • HDFS
  • Hadoop
  • MapReduce
  • big data
  • data mining
