HDFS file operation fingerprints for forensic investigations
Understanding the Hadoop Distributed File System (HDFS) is increasingly important for forensic investigators because HDFS lies at the core of most Big Data environments. HDFS requires further study to establish how forensic investigations should be performed on it and what artifacts can be extracted from the framework. An HDFS deployment encompasses a large amount of data; thus, in most forensic analyses, it is not feasible to gather all of the data, and metadata and logs therefore play a vital role. In a thorough forensic analysis, metadata artifacts can be used to establish a timeline of events, highlight patterns of file-system operations, and point to gaps in the data.
This paper presents metadata observations for HDFS operations based on the fsimage file and hdfs-audit logs. Together, these observations form a roadmap of metadata changes that aids forensic investigations in an HDFS environment; understanding these changes helps a forensic investigator identify which actions were performed on the HDFS.
This study focuses on executing day-to-day (regular) file-system operations and recording which file metadata changes occur after each operation. Each operation was executed, and its fingerprints were documented in detail. The use of these fingerprints as artifacts for file-system forensic analysis is demonstrated through two case studies. The results include a detailed study of each operation, covering which system entity (user or service) performed the operation and when, information that is vital in most analysis cases. Moreover, the forensic value of the examined observations is demonstrated by employing these artifacts in forensic analysis.
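To illustrate how hdfs-audit log records can support the timeline reconstruction described above, the sketch below parses a single audit line into its timestamp, user, operation, and paths. The log line follows the standard hdfs-audit.log key=value layout, but the specific user, timestamp, and paths shown are hypothetical values chosen for demonstration, not data from the study.

```python
import re

# Hypothetical hdfs-audit.log record in the standard key=value layout.
AUDIT_LINE = (
    "2019-03-14 10:22:31,548 INFO FSNamesystem.audit: allowed=true "
    "ugi=hduser (auth:SIMPLE) ip=/127.0.0.1 cmd=rename "
    "src=/data/report.csv dst=/archive/report.csv "
    "perm=hduser:supergroup:rw-r--r-- proto=rpc"
)

def parse_audit_line(line):
    """Split an audit record into a timestamp plus its key=value fields."""
    timestamp = line[:23]  # e.g. '2019-03-14 10:22:31,548'
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    return {
        "time": timestamp,
        "user": fields.get("ugi"),       # who performed the operation
        "operation": fields.get("cmd"),  # what was done (rename, delete, ...)
        "src": fields.get("src"),
        "dst": fields.get("dst"),
    }

entry = parse_audit_line(AUDIT_LINE)
print(entry["time"], entry["user"], entry["operation"],
      entry["src"], "->", entry["dst"])
```

Collecting such entries across a log and sorting them by timestamp yields the event timeline that, combined with fsimage metadata, lets an investigator attribute each file-system operation to a user and a point in time.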