Monday, June 29, 2020

View FSImage and Edit Logs Files in Hadoop

In this blog post, we will learn how to view FsImage and edit log files in Hadoop. We will discuss how the FsImage and edit logs work, and the procedure for converting these binary files, which are not human-readable, into XML format.

So, let's begin by understanding how the FsImage and edit logs work.
FsImage :
The FsImage is an "image file" that contains a serialized form of all the directory and file inodes in the filesystem. It cannot be read with normal file system tools like cat.
Here, each inode is an internal representation of a file or directory’s metadata. It contains information such as the file’s replication level, modification and access times, access permissions, block size, and the blocks the file is made up of.
For directories, the modification time, permissions, and quota metadata are stored. In many situations it becomes important to read a clear-text version of the FsImage, for example to perform namespace analysis or to determine whether the FsImage is corrupt. For this purpose we can use a tool called the Offline Image Viewer.
Offline Image Viewer :
To convert the contents of an FsImage file into text, XML, or other file formats, we can use a tool called the Offline Image Viewer. It dumps the contents of HDFS FsImage files into human-readable formats to allow offline analysis and examination of a Hadoop cluster's namespace.
The Offline Image Viewer tool is capable of processing very large image files relatively quickly, converting them to one of several output formats. The tool handles the layout formats that were included with Hadoop versions 16 and up. If the tool is not able to process an image file, it will exit cleanly. The Offline Image Viewer does not require a Hadoop cluster to be running; it is entirely offline in its operation.
Syntax :
hdfs oiv -i fsimage -o fsimage.xml -p XML
The -i and -o command-line switches name the input and output files. The -p XML switch selects the XML output processor; it is needed here because XML is not the tool's default output format.
Example :
In the example below, we convert an FsImage file, which we have copied to our Desktop path, into XML format.
The figure below shows what the raw FsImage file looks like.

We can use the command below to convert the contents of the above file into a readable form (XML format).
hdfs oiv -i /home/acadgild/Desktop/fsimage_0000000000000000006 -o /home/acadgild/Desktop/fsimage.xml -p XML

The above command runs the Offline Image Viewer (oiv) tool, converts the FsImage file into XML using the XML output processor, and stores the output file fsimage.xml in the given path.
Output fsimage.xml file :

We can observe from the above figure that we have successfully converted the FsImage file into XML format. Each inode tag in the inode section holds values such as the modification time, access time, access permissions, block size, and quota metadata stored for files and directories.
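Once the FsImage is available as XML, simple namespace analysis can be scripted. The following is a minimal Python sketch, not part of the Hadoop tooling. The embedded sample document and its element names (INodeSection, inode, type, name, and so on) are assumptions modeled on typical oiv XML output and may differ between Hadoop versions; the function name summarize_inodes is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative sample only. The element names are assumptions modeled on
# typical oiv XML output; check them against your own fsimage.xml.
SAMPLE_FSIMAGE_XML = """<fsimage>
  <INodeSection>
    <inode>
      <id>16385</id><type>DIRECTORY</type><name></name>
      <mtime>1593400000000</mtime>
      <permission>acadgild:supergroup:rwxr-xr-x</permission>
    </inode>
    <inode>
      <id>16386</id><type>FILE</type><name>sample.txt</name>
      <replication>1</replication>
      <mtime>1593400001000</mtime><atime>1593400002000</atime>
      <permission>acadgild:supergroup:rw-r--r--</permission>
    </inode>
  </INodeSection>
</fsimage>
"""


def summarize_inodes(xml_text):
    """Return (count per inode type, list of file names) for oiv-style XML."""
    root = ET.fromstring(xml_text)
    type_counts = Counter()
    file_names = []
    for inode in root.iter("inode"):
        inode_type = inode.findtext("type", default="UNKNOWN")
        type_counts[inode_type] += 1
        if inode_type == "FILE":
            file_names.append(inode.findtext("name", default=""))
    return type_counts, file_names


if __name__ == "__main__":
    counts, files = summarize_inodes(SAMPLE_FSIMAGE_XML)
    print(dict(counts))
    print(files)
```

For a real image, the same function could be given the contents of the generated fsimage.xml file instead of the sample string. For very large images, a streaming parser such as ET.iterparse would likely be a better fit than loading the whole document.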
Now let us understand how edit logs work and how to convert edit log files into XML format.
Edit Logs :
When a filesystem client performs any write operation (such as creating or moving a file), the transaction is first recorded in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests.
Conceptually the edit log is a single entity, but it is represented as a number of files on disk. Each file is called a segment and has the prefix edits and a suffix that indicates the transaction IDs contained in it.
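The naming scheme can be made concrete with a small Python sketch that extracts transaction IDs from segment file names. This is only an illustration: the function name parse_segment and the sample names in the usage section are hypothetical, and the patterns assume the edits_<first>-<last> and edits_inprogress_<first> naming used by recent Hadoop releases (an underscore separator is also accepted).

```python
import re

# Finalized segment: edits_<first txid>-<last txid>
FINALIZED = re.compile(r"^edits_(\d+)[-_](\d+)$")
# Segment currently open for writes: edits_inprogress_<first txid>
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def parse_segment(name):
    """Return (first_txid, last_txid or None, is_open), or None if not a segment."""
    match = IN_PROGRESS.match(name)
    if match:
        return int(match.group(1)), None, True
    match = FINALIZED.match(name)
    if match:
        return int(match.group(1)), int(match.group(2)), False
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in [
        "edits_0000000000000000001-0000000000000000019",
        "edits_inprogress_00000000000000000020",
        "fsimage_0000000000000000019",
    ]:
        print(name, parse_segment(name))
```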

Only one file is open for writes at any one time (edits_inprogress_00000000000000000020 in the preceding example), and it is flushed and synced after every transaction before a success code is returned to the client. For namenodes that write to multiple directories, the write must be flushed and synced to every copy before returning successfully. This ensures that no transaction is lost due to machine failure.
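The flush-and-sync ordering described above can be illustrated with a short Python sketch. This is not how the namenode is implemented (the namenode is written in Java and its edit log format is binary); the function name append_transaction, the segment name, and the record contents are all hypothetical. The point is only that success is reported after every copy has been flushed and synced.

```python
import os
import tempfile


def append_transaction(directories, segment_name, record):
    """Append one record to the open segment in every directory, syncing each copy."""
    for directory in directories:
        path = os.path.join(directory, segment_name)
        with open(path, "ab") as segment:
            segment.write(record)
            segment.flush()             # push Python's buffer to the OS
            os.fsync(segment.fileno())  # ask the OS to persist the data to disk
    # Success is reported only after every copy has been synced.
    return True


if __name__ == "__main__":
    # Two temporary directories stand in for two namenode storage directories.
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        ok = append_transaction([d1, d2], "edits_inprogress_demo", b"mkdir /demo\n")
        print("acknowledged:", ok)
```

If any write or sync raises an exception, the function never returns True, so a caller would not acknowledge the transaction to the client.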
If something goes wrong with the Hadoop cluster and the edits file is corrupted, it is possible to save at least the part of the edits file that is correct. This can be done by converting the binary edits file to XML, editing it manually, and then converting it back to binary.
Thus, to convert these edit log files into human-readable form, we can use the Offline Edits Viewer tool.
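The recovery round trip can be sketched as a pair of Offline Edits Viewer invocations: one with the xml processor and, after the manual edit, one with the binary processor. The Python helper below only builds and prints the command lines; the helper names and the file paths are hypothetical, and it is a sketch of the workflow rather than a tested recovery procedure.

```python
import shlex


def oev_command(input_path, output_path, processor="xml"):
    """Build an 'hdfs oev' argument list using the -i, -o, and -p switches."""
    return ["hdfs", "oev", "-i", input_path, "-o", output_path, "-p", processor]


def recovery_commands(binary_edits, xml_path, repaired_edits):
    """Return the two conversions that surround the manual XML edit."""
    return [
        oev_command(binary_edits, xml_path, "xml"),       # binary -> XML
        oev_command(xml_path, repaired_edits, "binary"),  # edited XML -> binary
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in recovery_commands("edits", "edits.xml", "edits_repaired"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine where the hdfs command is installed, with the manual edit of the XML file performed between the two steps.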
Offline Edits Viewer :
The Offline Edits Viewer is a tool that converts the contents of edit log files into different file formats. Like the Offline Image Viewer, it does not require a Hadoop cluster to be running; it is entirely offline in its operation.
Syntax :
hdfs oev -i edits -o editsoutput.xml
The simplest usage of the Offline Edits Viewer is to provide just an input and an output file, via the -i and -o command-line switches.
Example :
In the example below, we convert an edit log file, which we have copied to our Desktop, into XML format.
The figure below shows what the raw edit log file looks like.

We can use the command below to convert the contents of the above file into a readable form (XML format).
hdfs oev -i /home/acadgild/Desktop/edits_0000000000001_0000000000000014 -o /home/acadgild/Desktop/edit.xml -p XML

Output edit.xml file :

We can observe from the above figure that we have successfully converted the edit log file into XML format. Each RECORD tag contains subtags such as OPCODE and DATA, where DATA holds fields such as the inode ID, the timestamp at which the path was accessed, the user name, the group name, and other tags.
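With the edit log in XML form, it becomes easy to see which operations it contains. The Python sketch below counts records per opcode. The embedded sample and its tag names (EDITS, RECORD, OPCODE, DATA, TXID) are assumptions modeled on typical oev XML output, the values are made up, and the function name count_opcodes is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative sample only. Tag names are assumptions modeled on typical
# oev XML output, and every value here is invented.
SAMPLE_EDITS_XML = """<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA><TXID>1</TXID></DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_MKDIR</OPCODE>
    <DATA>
      <TXID>2</TXID><INODEID>16386</INODEID><PATH>/demo</PATH>
      <TIMESTAMP>1593400000000</TIMESTAMP>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
    <DATA><TXID>3</TXID></DATA>
  </RECORD>
</EDITS>
"""


def count_opcodes(xml_text):
    """Count how many RECORD elements carry each OPCODE value."""
    root = ET.fromstring(xml_text)
    return Counter(
        record.findtext("OPCODE", default="UNKNOWN")
        for record in root.iter("RECORD")
    )


if __name__ == "__main__":
    for opcode, count in sorted(count_opcodes(SAMPLE_EDITS_XML).items()):
        print(opcode, count)
```

To analyze a real log, pass the contents of the generated edit.xml file to count_opcodes instead of the sample string.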
