Ext2fs and forensics

From SecuriWiki

Finished by team Haiying Luan(55137911) and Simon Mackey(55231647).

Second Extended File System

First, I will briefly introduce what the Second Extended File System is, what its features are, and what it looks like on disk.

File System

Before introducing the Second Extended File System, we have to make clear what a file system is.

A file system is a software mechanism that defines how files are named, stored, organized, and accessed on the logical volumes of partitioned storage. It is also the structure of files on a disk medium that is visible through the operating system. In Linux, "file system" can refer to two very distinct things: the directory tree, or the arrangement of files on disk partitions.

The Second Extended File System

Ext2fs (the Second Extended File System) was designed and implemented to fix some problems present in the first Extended File System. Its goal is to provide a powerful file system that implements Linux file semantics and offers advanced features. Ext2fs is a very robust file system: it reduces the risk of data loss under intensive use, and it includes provision for extensions, so that users can benefit from new features without reformatting their file systems.

Features

● Ext2fs supports standard Linux file types: regular files, directories, device special files and symbolic links.

● Ext2fs is able to manage file systems created on very large partitions. Hard disks are now far bigger than they used to be, and capacities of several hundred GB are normal, so typical partition sizes have grown from around 10 GB to 40 GB or more.

● Ext2fs provides long file names. It uses variable length directory entries. The maximal file name size is 255 characters. This limit could be extended to 1012 if needed.

● Ext2fs reserves some blocks for the super user (root). Normally, 5% of the blocks are reserved. This allows the administrator to recover easily from situations where user processes fill up file systems.

● File attributes allow users to modify the kernel behavior when acting on a set of files. One can set attributes on a file or on a directory. In the latter case, new files created in the directory inherit these attributes.

● Ext2fs allows the administrator to choose the logical block size when creating the file system. Block sizes can typically be 1024, 2048 and 4096 bytes. Using big block sizes can speed up I/O since fewer I/O requests, and thus fewer disk head seeks, need to be done to access a file. On the other hand, big blocks waste more disk space: on the average, the last block allocated to a file is only half full, so as blocks get bigger, more space is wasted in the last block of each file. In addition, most of the advantages of larger block sizes are obtained by Ext2 file system's preallocation techniques.

● Ext2fs implements fast symbolic links. A fast symbolic link does not use any data block on the file system. The target name is not stored in a data block but in the inode itself. This policy can save some disk space (no data block needs to be allocated) and speeds up link operations (there is no need to read a data block when accessing such a link). Of course, the space available in the inode is limited, so not every link can be implemented as a fast symbolic link. The maximal size of the target name in a fast symbolic link is 60 characters. The ext2 designers planned to extend this scheme to small files.

● Ext2fs keeps track of the file system state. A special field in the superblock is used by the kernel code to indicate the status of the file system. When a file system is mounted in read/write mode, its state is set to “Not Clean”. When it is unmounted or remounted in read-only mode, its state is reset to “Clean”. At boot time, the file system checker uses this information to decide if a file system must be checked.

Forensic Characteristics

The Ext2fs kernel code contains many performance optimizations that improve I/O speed when reading and writing files. They are also very useful for forensic analysis.

The following two allocation optimizations give deleted data very good locality, which helps forensic recovery:

● Related files through block groups

● Related blocks through the 8-bit clustering of block allocations.

First, Ext2fs contains many allocation optimizations. Block groups are used to cluster together related inodes and data: the kernel code always tries to allocate data blocks for a file in the same group as its inode. Because of this, the blocks of a deleted file are usually found close to its inode, which makes most deleted files easier to recover.

Second, when writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when allocating a new block. Preallocation hit rates are around 75% even on very full file systems. This preallocation achieves good write performance under heavy load. It also allows contiguous blocks to be allocated to files, which speeds up future sequential reads and makes the blocks of a deleted file easier to find.

Ext2fs physical data structure on the hard disk

The first thing to grasp about the Second Extended File System is that the sizes of all the meta-data structures are based on a "block" size rather than a "sector" size. This block size is variable depending on the size of the file system. On a floppy disk, for example, it is 1KB (2 sectors), while on a 10GB partition, the block size is normally 4KB or 8KB (8 and 16 sectors respectively). Each block is further sub-divided into "fragments".

Except for the superblock, all meta-data structures are resized to fit into blocks. This is something to remember when trying to mount any other file system than one on a floppy. The "Inode Table Block" for example will contain more entries in a 4KB block than in a 1KB block, so one will have to take that into account when accessing this particular structure. The next major aspect is that the file system is split into "block groups". While a floppy would contain only one block group holding all the blocks of the file system, a hard disk of 10GB could easily be split into 30 of such block groups; each holding a certain quantity of blocks.

At the start of each block group are various meta-data structures detailing the location of the other, more informative, meta-data structures that define the current file system state. An ext2 file system is organized the same way on a floppy as on a hard disk, and between small and large hard disks only the block size differs, so I use the organization of a typical ext2 file system as the example. It is represented in the following table:

Image:block1.JPG

Each block group contains a redundant copy of crucial filesystem control information (the superblock and the filesystem descriptors) and also contains a part of the filesystem (a block bitmap, an inode bitmap, a piece of the inode table, and data blocks). The structure of a block group is represented in this table:

Image:block2.JPG

Using block groups is a big win in terms of reliability: since the control structures are replicated in each block group, it is easy to recover a filesystem whose superblock has been corrupted. This structure also helps to get good performance: by reducing the distance between the inode table and the data blocks, it is possible to reduce the disk head seeks during I/O on files.

Image:block3.JPG

The layout on disk is very predictable as long as you know a few basic pieces of information: block size, blocks per group, and inodes per group. This information is all located in, or can be computed from, the superblock structure.

Without the superblock information, the disk is useless; therefore as soon as enough space is available, one or more superblock backups will be created on the disk.

The block bitmap and inode bitmap are used to identify which blocks and which inode entries are free to use. The data blocks are where the various files are stored. Note that a directory is also seen as a file under Ext2.

In Ext2fs, directories are managed as linked lists of variable length entries. Each entry contains the inode number, the entry length, the file name and its length. By using variable length entries, it is possible to implement long file names without wasting disk space in directories. The structure of a directory entry is shown in this table:

Image:directory1.JPG

Here is an example:

Image:Ext2fs-Graph1.JPG
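As a rough illustration of the linked-list structure described above, the following Python sketch walks the entries of one directory block. It assumes little-endian fields in the order inode number (4 bytes), entry length (2 bytes), name length, then the name itself. The function names parse_dir_block and make_entry are hypothetical and are not part of any ext2 tool.

```python
import struct


def parse_dir_block(block: bytes):
    """Yield (inode, rec_len, name) for each used entry in one directory block."""
    offset = 0
    while offset + 8 <= len(block):
        # Only the low byte of the name length is read: names are at most
        # 255 characters, so this also works on file systems that reuse the
        # high byte for a file type (assumption, not covered in the article).
        inode, rec_len, name_len = struct.unpack_from("<IHB", block, offset)
        if rec_len < 8:
            break  # corrupt entry length; stop rather than loop forever
        name = block[offset + 8:offset + 8 + name_len]
        if inode != 0:  # inode 0 marks an unused entry
            yield inode, rec_len, name.decode("latin-1")
        offset += rec_len  # the entry length links to the next entry


def make_entry(inode: int, name: bytes, rec_len: int) -> bytes:
    """Build one synthetic entry, padded to rec_len bytes (for demonstration)."""
    raw = struct.pack("<IHH", inode, rec_len, len(name)) + name
    return raw.ljust(rec_len, b"\0")


if __name__ == "__main__":
    demo = (make_entry(2, b".", 12) + make_entry(2, b"..", 12)
            + make_entry(11, b"lost+found", 1000))
    for entry in parse_dir_block(demo):
        print(entry)
```

Because each entry stores its own length, the walk never needs a fixed entry size, which is how long file names fit without wasting space on short ones.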

The Superblock, Group Descriptors and Inode Table are very important parts of ext2fs, so I will briefly explain them.

The superblock is the structure on an ext2 disk containing the very basic information about the file system properties. It is laid out in the following form:


offset    size  description
    0       4 	s_inodes_count
    4       4 	s_blocks_count
    8       4 	s_r_blocks_count
   12       4 	s_free_blocks_count
   16       4 	s_free_inodes_count
   20       4 	s_first_data_block
   24       4 	s_log_block_size
   28       4 	s_log_frag_size
   32       4 	s_blocks_per_group
   36       4 	s_frags_per_group
   40       4 	s_inodes_per_group
   44       4 	s_mtime
   48       4 	s_wtime
   52       2 	s_mnt_count
   54       2 	s_max_mnt_count
   56       2 	s_magic
   58       2 	s_state
   60       2 	s_errors
   62       2 	s_minor_rev_level
   64       4 	s_lastcheck
   68       4 	s_checkinterval
   72       4 	s_creator_os
   76       4 	s_rev_level
   80       2 	s_def_resuid
   82       2 	s_def_resgid
 -- EXT2_DYNAMIC_REV Specific --
   84       4 	s_first_ino
   88       2 	s_inode_size
   90       2 	s_block_group_nr
   92       4 	s_feature_compat
   96       4 	s_feature_incompat
  100       4 	s_feature_ro_compat
  104      16 	s_uuid
  120      16 	s_volume_name
  136      64 	s_last_mounted
  200       4 	s_algo_bitmap
 -- Performance Hints         --
  204       1 	s_prealloc_blocks
  205       1 	s_prealloc_dir_blocks
  206       2 	- (alignment)
 -- Journaling Support        --
  208      16 	s_journal_uuid
  224       4 	s_journal_inum
  228       4 	s_journal_dev
  232       4 	s_last_orphan
 -- Unused                    --
  236     788 	- (padding)
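To make the table concrete, here is a minimal Python sketch that decodes the first 84 bytes of a superblock (the fields before the EXT2_DYNAMIC_REV section) and derives the block size and the number of block groups. It relies on a few facts the article does not state and that should be treated as assumptions here: the superblock starts 1024 bytes into the volume, fields are little-endian, and s_magic is 0xEF53. The function names are hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # assumption: byte offset of the superblock on the volume
EXT2_MAGIC = 0xEF53       # assumption: expected value of s_magic

# Field names for offsets 0..83, in the order of the table above.
SB_FIELDS = (
    "s_inodes_count", "s_blocks_count", "s_r_blocks_count",
    "s_free_blocks_count", "s_free_inodes_count", "s_first_data_block",
    "s_log_block_size", "s_log_frag_size", "s_blocks_per_group",
    "s_frags_per_group", "s_inodes_per_group", "s_mtime", "s_wtime",
    "s_mnt_count", "s_max_mnt_count", "s_magic", "s_state", "s_errors",
    "s_minor_rev_level", "s_lastcheck", "s_checkinterval", "s_creator_os",
    "s_rev_level", "s_def_resuid", "s_def_resgid",
)
SB_FORMAT = "<13I6H4I2H"  # 13 x 4 bytes, 6 x 2 bytes, 4 x 4 bytes, 2 x 2 bytes


def parse_superblock(raw: bytes) -> dict:
    """Decode the fixed part of a superblock and add two derived values."""
    sb = dict(zip(SB_FIELDS, struct.unpack_from(SB_FORMAT, raw, 0)))
    if sb["s_magic"] != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock (bad s_magic)")
    sb["block_size"] = 1024 << sb["s_log_block_size"]
    blocks = sb["s_blocks_count"] - sb["s_first_data_block"]
    # Round up: the last group may be only partly filled.
    sb["group_count"] = -(-blocks // sb["s_blocks_per_group"])
    return sb


def read_superblock(image_path: str) -> dict:
    """Read the superblock from a partition image or device file."""
    with open(image_path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return parse_superblock(f.read(1024))
```

For forensic work it is safer to run read_superblock on an image copy of the partition than on the live device.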

The Group Descriptors are an array of group_desc structures, each describing a "block group" and giving the location of its inode table, its block and inode bitmaps, and some other useful information. The group descriptors are located in the first block following the block containing the superblock structure. Here is what one group descriptor looks like:


 offset   size description
     0       4 bg_block_bitmap
     4       4 bg_inode_bitmap
     8       4 bg_inode_table
    12       2 bg_free_blocks_count
    14       2 bg_free_inodes_count
    16       2 bg_used_dirs_count
    18       2 bg_pad
    20      12 bg_reserved
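The descriptor layout can be decoded in the same way. The sketch below is only an illustration: it assumes little-endian fields, 32 bytes per descriptor as in the table above, and that the superblock lives in block s_first_data_block, so the descriptor table is in the block after it. The function names are hypothetical.

```python
import struct

GROUP_DESC_SIZE = 32  # 3 x 4 bytes + 4 x 2 bytes + 12 reserved bytes

GD_FIELDS = (
    "bg_block_bitmap", "bg_inode_bitmap", "bg_inode_table",
    "bg_free_blocks_count", "bg_free_inodes_count", "bg_used_dirs_count",
)


def group_desc_table_block(s_first_data_block: int) -> int:
    """Block number of the descriptor table: the block after the superblock's."""
    return s_first_data_block + 1


def parse_group_desc(table: bytes, group: int) -> dict:
    """Decode the descriptor of one block group from the raw descriptor table."""
    values = struct.unpack_from("<3I3H", table, group * GROUP_DESC_SIZE)
    return dict(zip(GD_FIELDS, values))
```

With the block size taken from the superblock, the table can be read by seeking to group_desc_table_block(...) * block_size in the image.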

The "Inode Table" is used to keep track of every file: location, size, type and access rights are all stored in inodes. The filename is not stored there, though; within the inode tables, all files are referenced by their inode number. There is one inode table per group, and it can be located by reading "bg_inode_table" in its associated group descriptor. There are s_inodes_per_group inodes per table. Each inode contains the information about a single physical file on the system. A file can be a directory, a socket, a buffer, a character or block device, a symbolic link or a regular file. So an inode can be seen as a block of information related to an entity, describing its location on disk, its size and its owner. An inode looks like this:


   offset size description
     0       2 i_mode
     2       2 i_uid
     4       4 i_size
     8       4 i_atime
    12       4 i_ctime
    16       4 i_mtime
    20       4 i_dtime
    24       2 i_gid
    26       2 i_links_count
    28       4 i_blocks
    32       4 i_flags
    36       4 i_osd1
    40  15 x 4 i_block
   100       4 i_generation
   104       4 i_file_acl
   108       4 i_dir_acl
   112       4 i_faddr
   116      12 i_osd2
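The following Python sketch shows how an inode number can be turned into a position in an inode table, and how the first 100 bytes of the structure above can be decoded. It assumes that inode numbers start at 1, that fields are little-endian, and that an inode occupies 128 bytes (the size of the table above). The function names are hypothetical.

```python
import struct

INODE_FORMAT = "<2H5I2H3I15I"  # i_mode .. i_block, 100 bytes in total


def inode_location(ino: int, inodes_per_group: int, inode_size: int = 128):
    """Return (block group, byte offset inside that group's inode table)."""
    group, index = divmod(ino - 1, inodes_per_group)  # inode numbers start at 1
    return group, index * inode_size


def parse_inode(raw: bytes) -> dict:
    """Decode the fields that matter most for recovery from one raw inode."""
    f = struct.unpack_from(INODE_FORMAT, raw, 0)
    i_block = list(f[12:27])  # the 15 block numbers
    return {
        "i_mode": f[0], "i_uid": f[1], "i_size": f[2],
        "i_atime": f[3], "i_ctime": f[4], "i_mtime": f[5], "i_dtime": f[6],
        "i_gid": f[7], "i_links_count": f[8], "i_blocks": f[9],
        "direct": i_block[:12],          # block numbers of the first 12 data blocks
        "indirect": i_block[12],         # indirect block
        "double_indirect": i_block[13],  # doubly indirect block
        "triple_indirect": i_block[14],  # triply indirect block
        "deleted": f[6] != 0,            # a deletion time has been recorded
    }
```

The absolute position of an inode on disk is then bg_inode_table * block_size plus the offset returned by inode_location, using the descriptor of the returned group.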

For more detail, see The Second Extended File System, listed in the References.

A little bit about ext3fs

The ext3 filesystem is a journaling extension to the standard ext2 filesystem on Linux. Journaling results in massively reduced time spent recovering a filesystem after a crash, and is therefore in high demand in environments where high availability is important, not only to improve recovery times on single machines but also to allow a crashed machine's filesystem to be recovered on another machine when we have a cluster of nodes with a shared disk.

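However, ext3fs does not support this kind of undeletion: when a file is deleted, ext3 clears the block pointers in its inode, so the data blocks can no longer be found from the inode.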

Forensics

Linux is a multiuser, multitasking operating system, so deleted files are difficult to recover. Deleting a file only marks its inode as deleted and does not erase the file's contents, but other users and processes with write permission can overwrite the freed blocks very quickly. On a single-user or home machine, however, deleted files can often be retrieved if recovery is attempted soon after deletion.

As for the recovery rate: if the computer is effectively a single-user workstation and you weren't doing anything disk-intensive at the fatal moment of deleting the files, the recovery rate can be more than 80%.

The procedure principally involves finding the data on the raw partition device and making it visible again to the operating system. There are basically two ways of doing this: one is to modify the existing file system such that the deleted inodes have their ‘deleted’ flag removed, and hope that the data just magically falls back into place. The other method, which is safer but slower, is to work out where the data lies in the partition and write it out into a new file on another file system.

In this article, I only cover the second method, because I really don't think it's wise to play with a file system at a low enough level for the first method to work. The first method also has the problem that you can only reliably recover the first 12 blocks of each file, so if you have any long files to recover, you'll normally have to use the other method anyway. To write data elsewhere, you need to make sure you have a rescue partition somewhere: a place to write out new copies of the files you recover. Hopefully, your system has several partitions on it: perhaps a root, a /haiying, and a /home. With all these to choose from, you should have no problem: just create a new directory on one of these.

There is one step you need to take before attempting your data recovery: unmounting the file system. Retrieving your files then takes three steps: finding the deleted inodes, obtaining the details of the inodes, and recovering the data blocks.

Unmounting the file system

Regardless of which method you choose, the first step is to unmount the file system containing the deleted files. I strongly discourage any urges you may have to mess around on a mounted file system. This step should be performed as soon as possible after you realise that the files have been deleted; the sooner you can unmount, the smaller the chance that your data will be overwritten.

The simplest method is as follows: assuming the deleted files were in the /haiying file system, say:

# umount /haiying

You may, however, want to keep some things in /haiying available. So remount it read-only:

# mount -o ro,remount /haiying

If the deleted files were on the root partition, you'll need to add a -n option to prevent mount from trying to write to /etc/mtab:

# mount -n -o ro,remount /

Regardless of all this, it is possible that there will be another process using that file system (which will cause the unmount to fail with an error such as ‘Resource busy’). There is a program which will send a signal to any process using a given file or mount point: fuser. Try this for the /haiying partition:

# fuser -v -m /haiying

This lists the processes involved. Assuming none of them are vital, you can say

# fuser -k -v -m /haiying

to send each process a SIGKILL (which is guaranteed to kill it), or for example,

# fuser -k -TERM -v -m /haiying

to give each one a SIGTERM (which will normally make the process exit cleanly).

Finding the deleted inodes

The next step is to ask the file system which inodes have recently been freed. This is a task you can accomplish with debugfs. Start debugfs with the name of the device on which the file system is stored:

# debugfs /dev/hda5

If you want to modify the inodes directly, add a -w option to enable writing to the file system:

# debugfs -w /dev/hda5

The debugfs command to find the deleted inodes is lsdel. Type it at the debugfs prompt:

debugfs: lsdel

debugfs: 2692 deleted inodes found.
Inode Owner Mode Size Blocks Time deleted
164821 0 100600 8192 1/ 1 Sun May 13 19:22:46 2006
…………………………………………………………………………………
36137 0 100644 4 1/ 1 Tue Apr 24 10:11:15 2006
196829 0 100644 149500 38/ 38 Mon May 27 13:52:04 2006

Now, based only on the deletion time, the size, the type, and the numerical permissions and owner, you must work out which of these deleted inodes are the ones you want. With luck, you'll be able to spot them because they're the big bunch you deleted about five minutes ago. Otherwise, trawl through that list carefully.

I suggest that if possible, you print out the list of the inodes you want to recover. It will make life a lot easier.

In this case, debugfs found 2692 deleted inodes. In the list, the first field is the inode number, the second is the owner of the inode, the third is the mode (file type and permissions), and the remaining fields are the file size, the number of blocks used, and the deletion time.
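With thousands of deleted inodes, it can help to filter a saved copy of the lsdel listing rather than read it by eye. The Python sketch below parses lines in the format shown above and selects candidates by owner and size. The function names are hypothetical, and the two numbers in the Blocks column are kept as a pair without further interpretation.

```python
import re

# inode, owner, mode (octal), size, "n/ m" blocks, then the deletion time
LSDEL_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([0-7]+)\s+(\d+)\s+(\d+)/\s*(\d+)\s+(.*)$"
)


def parse_lsdel(text: str) -> list:
    """Return one dict per inode line; headers and elided lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = LSDEL_LINE.match(line)
        if not m:
            continue
        entries.append({
            "inode": int(m.group(1)),
            "owner": int(m.group(2)),
            "mode": int(m.group(3), 8),
            "size": int(m.group(4)),
            "blocks": (int(m.group(5)), int(m.group(6))),
            "deleted": m.group(7).strip(),  # kept as text, not parsed
        })
    return entries


def candidates(entries: list, owner: int, min_size: int = 0) -> list:
    """Inode numbers deleted by a given owner with at least min_size bytes."""
    return [e["inode"] for e in entries
            if e["owner"] == owner and e["size"] >= min_size]
```

For example, candidates(parse_lsdel(listing), owner=0, min_size=100000) narrows the list to large files that belonged to root.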

Obtaining the details of the inodes

Debugfs has a stat command which prints details about an inode. Issue the command for each inode in your recovery list. For example, if you're interested in inode number 196829, use this:

debugfs: stat <196829>

It will display:

Inode: 196829 Type: regular Mode: 0644 Flags: 0x0 Version: 1
User: 0 Group: 0 Size: 149500
File ACL: 0 Directory ACL: 0
Links: 0 Blockcount: 38
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x31a9a574 -- Mon May 27 13:52:04 2006
atime: 0x31a21dd1 -- Tue May 21 20:47:29 2006
mtime: 0x313bf4d7 -- Tue Mar 5 08:01:27 2006
dtime: 0x31a9a574 -- Mon May 27 13:52:04 2006
BLOCKS:
594858 594859 594866 594867 594868 594869 
…………………………………
TOTAL: 4

Recovering data blocks

In the example above, the file has 4 blocks, which is less than the limit of 12. Now we get debugfs to write the file into a new location, such as /mnt/hda6/haiying1.sav:

debugfs: dump <196829> /mnt/hda6/haiying1.sav

This can also be done with fsgrab:

# fsgrab -c 2 -s 594858 /dev/hda5 >> /mnt/hda6/haiying1.sav
# fsgrab -c 4 -s 594866 /dev/hda5 >> /mnt/hda6/haiying1.sav
…………………………………

With either debugfs or fsgrab, there will be some garbage at the end of /mnt/hda6/haiying1.sav, but that's fairly unimportant.

If the file we want to retrieve uses more than 12 blocks, things get harder. The inode itself lists the first 12 blocks, called direct blocks. The 13th block number in the inode is the indirect block, the 14th is the doubly indirect block, and the 15th is the triply indirect block. These blocks hold no file data; they are indexes giving the positions of further blocks:

● The block numbers of the first 12 data blocks are stored directly in the inode; these are sometimes referred to as the direct blocks.

● The inode contains the block number of an indirect block. An indirect block contains the block numbers of 256 additional data blocks.

● The inode contains the block number of a doubly indirect block. A doubly indirect block contains the block numbers of 256 additional indirect blocks.

● The inode contains the block number of a triply indirect block. A triply indirect block contains the block numbers of 256 additional doubly indirect blocks.

Recall that the kernel code always tries to allocate data blocks for a file in the same group as its inode. So if we assume that the file was not fragmented, there are several layouts of data blocks, according to how many data blocks the file used:

0 to 12

The block numbers are stored in the inode, as described above.

13 to 268

After the direct blocks, count one for the indirect block, and then there are 256 data blocks.

269 to 65804

As before, there are 12 direct blocks, a (useless) indirect block, and 256 blocks. These are followed by one (useless) doubly indirect block, and 256 repetitions of one (useless) indirect block and 256 data blocks.

65805 or more

The layout of the first 65804 blocks is as above. Then follow one (useless) triply indirect block and 256 repetitions of a ‘doubly indirect sequence’. Each doubly indirect sequence consists of a (useless) doubly indirect block, followed by 256 repetitions of one (useless) indirect block and 256 data blocks.

Of course, even if these assumed data block numbers are correct, there is no guarantee that the data in them is intact. In addition, the longer the file was, the less chance there is that it was written to the file system without appreciable fragmentation (except in special circumstances).

You should note that I assume throughout that your blocksize is 1024 bytes, as this is the standard value. If your blocks are bigger, some of the numbers above will change. Specifically: since each block number is 4 bytes long, blocksize/4 is the number of block numbers that can be stored in each indirect block. So every time the number 256 appears in the discussion above, replace it with blocksize/4. The `number of blocks required' boundaries will also have to be changed.
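The boundaries used above (12, 268 and 65804) follow directly from blocksize/4. As a small worked example, this Python sketch recomputes them for any block size; the function name is hypothetical.

```python
def layout_boundaries(block_size: int = 1024):
    """Largest data-block counts reachable with direct, indirect,
    doubly indirect and triply indirect addressing."""
    n = block_size // 4          # block numbers per indirect block
    direct = 12
    single = direct + n          # plus one indirect block
    double = single + n * n      # plus one doubly indirect block
    triple = double + n ** 3     # plus one triply indirect block
    return direct, single, double, triple


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, layout_boundaries(size))
```

With 1024-byte blocks this gives 12, 268 and 65804, matching the ranges listed above; with 4096-byte blocks the first two boundaries become 12 and 1036.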

Let's look at an example of recovering a longer file.

debugfs:  stat <1387>
Inode: 148004   Type: regular    Mode:  0644   Flags: 0x0   Version: 1
User:   503   Group:   100   Size: 1851347
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 3616
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x31a9a574 -- Mon May 27 13:52:04 2006
atime: 0x31a21dd1 -- Tue May 21 20:47:29 2006
mtime: 0x313bf4d7 -- Tue Mar  5 08:01:27 2006
dtime: 0x31a9a574 -- Mon May 27 13:52:04 2006
BLOCKS:
8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8583
TOTAL: 14

There seems to be a reasonable chance that this file is not fragmented: certainly, the first 12 blocks listed in the inode (which are all data blocks) are contiguous. So, we can start by retrieving those blocks:

# fsgrab -c 12 -s 8314 /dev/hda5 > /mnt/hda6/haiying2.sav

Now, the next block listed in the inode, 8326, is an indirect block, which we can ignore. But we trust that it will be followed by 256 data blocks (numbers 8327 through 8582).

# fsgrab -c 256 -s 8327 /dev/hda5 >> /mnt/hda6/haiying2.sav

The final block listed in the inode is 8583. Note that we're still looking good in terms of the file being contiguous: the last data block we wrote out was number 8582, which is 8327 + 255. This block 8583 is a doubly indirect block, which we can ignore. It is followed by up to 256 repetitions of an indirect block (which is ignored) followed by 256 data blocks. So doing the arithmetic quickly, we issue the following commands. Notice that we skip the doubly indirect block 8583, and the indirect block 8584 immediately (we hope) following it, and start at block 8585 for data.

# fsgrab -c 256 -s 8585 /dev/hda5 >> /mnt/hda6/haiying2.sav
# fsgrab -c 256 -s 8842 /dev/hda5 >> /mnt/hda6/haiying2.sav
# fsgrab -c 256 -s 9099 /dev/hda5 >> /mnt/hda6/haiying2.sav
# fsgrab -c 256 -s 9356 /dev/hda5 >> /mnt/hda6/haiying2.sav
# fsgrab -c 256 -s 9613 /dev/hda5 >> /mnt/hda6/haiying2.sav
# fsgrab -c 256 -s 9870 /dev/hda5 >> /mnt/hda6/haiying2.sav

Adding up, we see that so far we've written 12 + (7 * 256) blocks, which is 1804. The ‘stat’ results for the inode gave us a ‘blockcount’ of 3616; unfortunately these blocks are 512 bytes long (as a hangover from UNIX), so we really want 3616/2 = 1808 blocks of 1024 bytes. That means we need only four more blocks. The last data block written was number 10125. As we've been doing so far, we skip an indirect block (number 10126); we can then write those last four blocks.

# fsgrab -c 4 -s 10127 /dev/hda5 >> /mnt/hda6/haiying2.sav

Now, with some luck the entire file has been recovered successfully.
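The arithmetic in this example can be sketched in Python. Under the same assumption that the file is not fragmented, the function below walks the expected on-disk order (12 direct blocks, then indirect blocks each followed by their data blocks), skips every index block, and returns the runs of data blocks. The function names are hypothetical; only the fsgrab options -c and -s used above appear in the generated commands.

```python
def contiguous_runs(first_block: int, data_blocks: int, block_size: int = 1024):
    """Return (start, count) runs of data blocks for an unfragmented file."""
    n = block_size // 4  # block numbers per indirect block
    runs = []
    pos = first_block
    remaining = data_blocks

    def take(count):
        nonlocal pos, remaining
        count = min(count, remaining)
        if count > 0:
            runs.append((pos, count))
            pos += count
            remaining -= count

    def indirect_sequence():
        nonlocal pos
        pos += 1  # skip the indirect block
        take(n)

    def double_sequence():
        nonlocal pos
        pos += 1  # skip the doubly indirect block
        for _ in range(n):
            if remaining == 0:
                return
            indirect_sequence()

    take(12)  # direct blocks
    if remaining:
        indirect_sequence()
    if remaining:
        double_sequence()
    if remaining:
        pos += 1  # skip the triply indirect block
        for _ in range(n):
            if remaining == 0:
                break
            double_sequence()
    return runs


def fsgrab_commands(runs, device: str, out: str):
    """Format one fsgrab command per run; the first creates the output file."""
    return [f"fsgrab -c {count} -s {start} {device} {'>' if i == 0 else '>>'} {out}"
            for i, (start, count) in enumerate(runs)]


if __name__ == "__main__":
    size = 1851347                # Size from the stat output above
    blocks = -(-size // 1024)     # round up to whole 1024-byte blocks
    for cmd in fsgrab_commands(contiguous_runs(8314, blocks),
                               "/dev/hda5", "/mnt/hda6/haiying2.sav"):
        print(cmd)
```

Run on the inode above, this reproduces the start blocks used in the commands of this section. The result is only as good as the no-fragmentation assumption, so the recovered file should still be checked by hand.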

References

Design and Implementation of the Second Extended Filesystem

The Second Extended File System, Internal Layout, Dave Poirier

EXT3, Journaling Filesystem, 20 July, 2000, Dr. Stephen Tweedie

Linux Ext2fs Undeletion mini-HOWTO

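Linux文件系统的反删除方法简介 (A Brief Introduction to Undeletion Methods for Linux File Systems)

ext2文件系统下恢复误删除的文件 (Recovering Accidentally Deleted Files on an ext2 File System)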


--Haiying Luan 15:16, 20 April 2006 (IST)