Linux ext3 Quick Guide



Ext3 is compatible with ext2; in fact, you can think of it as an ext2 filesystem with a journal file.

The journaling capability means that after a crash you no longer have to wait for a lengthy fsck, and metadata corruption becomes much less likely.
Most notably, you can switch a partition back and forth between ext2 and ext3 without any problem, as long as it has a journal:
it is just a matter of giving the mount command the right filesystem type.


ext3 journaling options

-o data=writeback
-o data=ordered (default)
-o data=journal


The three modes differ in what is written to the journal: filesystem data and metadata, or metadata only.

journal - logs all filesystem data and metadata changes. The slowest of the three ext3 journaling modes,
this journaling mode minimizes the chance of losing the changes you have made to any file in an ext3 filesystem.

ordered - only logs changes to filesystem metadata, but flushes file data updates to disk before making changes
to associated filesystem metadata. This is the default ext3 journaling mode.

writeback - only logs changes to filesystem metadata but relies on the standard filesystem write process to write file
data changes to disk. This is the fastest ext3 journaling mode.
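The mode is selected with the data= mount option. The helper below is a minimal sketch, not part of any ext3 tool: the name ext3_mount_cmd is hypothetical, and it only checks the mode name and prints the mount command so you can review it before running it as root. For a permanent setting, the same data=... option goes into the options column of /etc/fstab.

```shell
# Print a mount command for an ext3 filesystem in a given data mode.
# Hypothetical helper: it does not run mount, it only prints the command.
ext3_mount_cmd() {
    dev=$1 mnt=$2 mode=${3:-ordered}   # ordered is the ext3 default
    case $mode in
        journal|ordered|writeback) ;;
        *) echo "unknown data mode: $mode" >&2; return 1 ;;
    esac
    echo "mount -t ext3 -o data=$mode $dev $mnt"
}

# Example: ext3_mount_cmd /dev/hdaX /mnt/data writeback
```

Note that ext3 refuses to change the data mode on a plain remount, so the option has to be given when the filesystem is mounted.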



The differences between these journaling modes are both subtle and profound. Using the "journal" mode requires that an ext3
filesystem write every change to a filesystem twice: once to the journal, and then again to the filesystem itself. This can
reduce the overall performance of your filesystem, but it is the mode most beloved by users, because it minimizes the chances of
losing changes to your files: both metadata and data updates are recorded in the ext3 journal and can be replayed when
the system reboots.

Using the "ordered" mode, only filesystem metadata changes are logged, which reduces redundancy between writing to the filesystem
and to the journal and is therefore faster. Though the changes to file data are not logged, they must be done before associated
filesystem metadata changes are made by the ext3 journaling daemon, which can slightly reduce the performance of your system.
However, using this journaling mode guarantees that files in the filesystem will never be out of sync with any related changes to
filesystem metadata.

Using the "writeback" mode is faster than the other two ext3 journaling modes because it only logs changes to filesystem metadata and
does not wait for associated changes to file data to be written before updating things like file size and directory information.
Because updates to file data are done asynchronously to journaled changes to filesystem metadata, files in the filesystem may exhibit
metadata inconsistencies such as owning data blocks to which updated data was not yet written when the system went down.
This isn't fatal, but can be disappointing to users.
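To see which mode a mounted filesystem is actually using, look at the "EXT3-fs: mounted filesystem with ... data mode" boot message or at its options in /proc/mounts, assuming your kernel lists data= there (some kernels may not). A small sketch; data_mode is a hypothetical helper, and the optional second argument only exists so it can be tried on a sample file.

```shell
# Print the data= journaling mode of the filesystem mounted on MOUNTPOINT,
# read from a mounts table (default /proc/mounts). Hypothetical helper.
data_mode() {
    mnt=$1 mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m {
        mode = "not listed"
        n = split($4, opts, ",")          # 4th field: comma-separated options
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^data=/) mode = substr(opts[i], 6)
        print mode
    }' "$mounts"
}

# Example: data_mode /home
```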



Commands:

e2fsck
E2fsck replays the journal automatically, and if the filesystem is otherwise clean, it skips the full filesystem check.

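By default, an exhaustive filesystem check is forced after a fixed number of mounts (traditionally 20; mke2fs may pick a different value, such as the 36 shown for /dev/sdb1 below) or every 180 days, whichever comes first.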

You can view the filesystem check interval (as well as lots of other interesting information) by typing tune2fs -l /dev/sdXX.


[root@localhost geyes]# tune2fs -l /dev/sdb1
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name:
Last mounted on:
Filesystem UUID: 22e4830c-7f65-43c5-9060-b017b402ecb6
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode filetype sparse_super
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 32768
Block count: 131056
Reserved block count: 6552
Free blocks: 121221
Free inodes: 32733
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 256
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2048
Inode blocks per group: 256
Filesystem created: Tue Mar 30 22:08:47 2010
Last mount time: Tue Mar 30 22:10:17 2010
Last write time: Tue Mar 30 23:21:51 2010
Mount count: 0
Maximum mount count: 36
Last checked: Tue Mar 30 23:21:51 2010
Check interval: 15552000 (6 months)
Next check after: Mon Sep 27 00:21:51 2010
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 33370d2c-2965-45ca-8751-9ba4cc28ecf0
Journal backup: inode blocks
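The fields that control forced checks are Mount count, Maximum mount count and Check interval; the interval is in seconds, so 15552000 / 86400 = 180 days. A rough sketch that summarizes them; check_status is a hypothetical helper and assumes the "Name: value" layout shown above.

```shell
# Summarize the forced-check settings from `tune2fs -l` output on stdin.
# Hypothetical helper; assumes "Name: value" lines as printed by tune2fs.
check_status() {
    awk -F': *' '
        $1 == "Mount count"         { mnt = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        $1 == "Check interval"      { split($2, a, " "); secs = a[1] + 0 }
        END {
            printf "mounts %d/%d, interval %d days\n", mnt, max, secs / 86400
            # A maximum of 0 or -1 disables the mount-count check.
            if (max > 0 && mnt >= max) print "full check forced on next e2fsck run"
        }'
}

# Usage: tune2fs -l /dev/sdb1 | check_status
```

tune2fs -c and -i change the two limits, as the mke2fs message at the end of this page points out.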




mke2fs -j /dev/hdaX

fsck.ext2 -f /dev/hdaX

[root@localhost geyes]# e2fsck /dev/sdb1
e2fsck 1.35 (28-Feb-2004)
/dev/sdb1 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

/dev/sdb1: recovering journal
/dev/sdb1: clean, 34/32768 files, 9835/131056 blocks
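In the transcript above the warning was answered with "yes" on a mounted filesystem, which is what the warning advises against. A script can refuse up front. A minimal sketch; is_mounted and safe_e2fsck are hypothetical helpers, and they compare device names literally as they appear in /proc/mounts (symlinked names, or a root device listed as /dev/root, are not resolved).

```shell
# Succeed if DEVICE is the source of a mount in the mounts table
# (default /proc/mounts). Names are compared literally.
is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run e2fsck only if the device is not mounted.
# Usage: safe_e2fsck /dev/sdb1 -f
safe_e2fsck() {
    dev=$1; shift
    if is_mounted "$dev"; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    e2fsck "$@" "$dev"
}
```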




You can convert an ext2 partition to ext3 in two simple steps:
1. Create a journal on it, e.g. "tune2fs -j /dev/hdaX".
2. Mount it as ext3, e.g. "mount -t ext3 /dev/hdaX /mnt/mount-point".
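The two steps can be wrapped in a small script. This is only a sketch: convert_to_ext3 and run are hypothetical names, and with DRY_RUN=1 the commands are printed instead of executed, so you can check them first.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Add a journal to an ext2 filesystem, then mount it as ext3.
convert_to_ext3() {
    dev=$1 mnt=$2
    run tune2fs -j "$dev" || return 1
    run mount -t ext3 "$dev" "$mnt"
}

# Preview: DRY_RUN=1 convert_to_ext3 /dev/hdaX /mnt/mount-point
```

If the partition is mounted at boot, remember to change its type in /etc/fstab to ext3 as well.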



With mke2fs -j /dev/hdaX you can format a partition as ext3 (as always, it will also be usable as an ext2 partition).



Adding a journal with tune2fs -j can be done on either an unmounted or a mounted filesystem.

(on mounted partition)
If you create the journal on a mounted filesystem you will see a .journal file.

Don't try to delete it, and don't back it up or restore it from a backup!

(on unmounted partition)
If you run tune2fs -j on an unmounted partition, an invisible journal file will be created.
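Hidden or visible, the journal is recorded as the has_journal feature in the superblock (compare the "Filesystem features" line in the /dev/sdb1 output above). A sketch for testing for it; has_journal is a hypothetical helper that reads tune2fs -l output on stdin.

```shell
# Succeed if the "Filesystem features" line of `tune2fs -l` output
# (read on stdin) includes has_journal. Hypothetical helper.
has_journal() {
    awk -F': *' '$1 == "Filesystem features" {
        n = split($2, f, " ")
        for (i = 1; i <= n; i++) if (f[i] == "has_journal") found = 1
    } END { exit !found }'
}

# Usage: tune2fs -l /dev/hdaX | has_journal && echo "journal present"
```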



The 'df' command says a partition is full, while 'du' reports free space
The usual cause is a user process that keeps a deleted file open.
When this happens, the space is not visible via 'du', since the file is no longer visible in the directory tree.
However, the space is still used by the file until it is deallocated, and that can only happen once the last process
which has the file open either closes its file descriptor to the file, or the process exits.
You can use the lsof program to try to find which process is keeping an open file.
Usually it is a log file or a large database file that has been rotated out while an older process still keeps it open.
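With lsof, "lsof +L1" lists open files whose link count is below one, i.e. files that have been deleted but are still open. Without lsof, the same information is visible under /proc, where a descriptor pointing at a deleted file is shown with a " (deleted)" suffix. A Linux-only sketch; deleted_open_files is a hypothetical helper, and as a normal user it only sees your own processes.

```shell
# Print "PID path" for every open descriptor that points at a deleted
# file, by reading the /proc/<pid>/fd/* symlinks. Linux-only.
# Processes whose fds cannot be read are skipped silently.
deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}              # e.g. "1234/fd/5"
                echo "${pid%%/*} $target" ;;  # keep only the PID part
        esac
    done
}
```

Restarting (or signalling) the process named by the PID releases the space, after which df and du agree again.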


If you have ext2 compiled into the kernel and ext3 as a module and your root filesystem is ext2/3, then the kernel will always mount the root fs as ext2 and
not as ext3 since at the time when the root filesystem is mounted, the kernel does not have access to the modules, since they are located on the root filesystem.
(This is a chicken and egg problem!)

If you have this setup, you might first want to consider whether it makes sense to compile ext3 as a module. If you want to use ext3 on your root filesystem,
the ext3 filesystem module will always be loaded and cannot be unloaded, so it might as well be compiled in. Furthermore, modules waste a tiny amount of memory
(on average 2k per module), and take up an extra entry in the TLB cache --- a slight, and perhaps not measurable disadvantage, but given that there is no real advantage
to compiling ext3 as a module, why bother?
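To see how your kernel was built, look for CONFIG_EXT3_FS in its configuration: =y means compiled in, =m means built as a module. A sketch, assuming the distribution installs the configuration as /boot/config-<version> (a common convention, not a guarantee); ext3_build is a hypothetical helper.

```shell
# Report how ext3 is built according to a kernel config file
# (default: /boot/config-<running version>, an assumed location).
ext3_build() {
    case $(grep '^CONFIG_EXT3_FS=' "${1:-/boot/config-$(uname -r)}" 2>/dev/null) in
        *=y) echo built-in ;;
        *=m) echo module ;;
        *)   echo "not enabled" ;;
    esac
}
```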

If you do want to compile ext3 as a module and use it for your root filesystem, it can be done, but you must boot into an initial ramdisk (initrd)
image as your root image. This initrd image will contain the necessary modules (scsi, ext3, etc.) so you can mount your "real" root filesystem and then use pivot_root
to replace the initrd root filesystem with the "real" root filesystem.

Most distributions do the pivot_root trick automatically, but they differ in how the tool which builds the initrd image needs to be called.

* On a SuSE system you have to put "jbd ext3" (in this order!) into the YaST setting INITRD_MODULES. Then do a mk_initrd and it should work.
* On a Red Hat system it seems to be sufficient to just run mkinitrd.
* On a Debian system (woody and above) you must run the command:
mkinitrd -o /boot/initrd.img-2.4.18-386 /lib/modules/2.4.18-386
where 2.4.18-386 must be replaced with the version number of the Debian kernel you are currently using. Note: this requires that you have the initrd-tools
package installed, and if you currently are not using an initrd setup, your lilo or grub configuration files must be modified to tell the booting kernel to use
the initrd image.
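The version to substitute is what uname -r prints for the running kernel. A sketch that only prints the Debian command for review; debian_mkinitrd_cmd is a hypothetical helper, and the paths follow the example above.

```shell
# Print the initrd-tools mkinitrd command for a kernel version,
# defaulting to the running kernel. Review it, then run it as root.
debian_mkinitrd_cmd() {
    ver=${1:-$(uname -r)}
    echo "mkinitrd -o /boot/initrd.img-$ver /lib/modules/$ver"
}
```

The bootloader entry then typically needs a matching initrd=/boot/initrd.img-<version> line in lilo.conf, or an initrd line in the GRUB menu entry.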

How do I convert my ext3 partition back to ext2?
Actually there is little need to do so, because in most cases it is sufficient to mount the partition explicitly as ext2.
But if you really need to convert your partition back to ext2, just do the following on an unmounted partition:

tune2fs -O ^has_journal /dev/hdaX

To be on the safe side you should force a fsck run on this partition afterwards:

fsck.ext2 -f /dev/hdaX

After this procedure you can safely delete the .journal file if there was any.
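The two commands can be scripted the same way. Again a sketch: convert_to_ext2 and run are hypothetical names, the filesystem must already be unmounted, and DRY_RUN=1 only prints the commands. ^has_journal is quoted so the shell passes it through unchanged.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Remove the journal from an UNMOUNTED ext3 filesystem, then force
# a full check, as described above.
convert_to_ext2() {
    dev=$1
    run tune2fs -O '^has_journal' "$dev" || return 1
    run fsck.ext2 -f "$dev"
}

# Preview: DRY_RUN=1 convert_to_ext2 /dev/hdaX
```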


If a system shuts down hard, is it still necessary to run e2fsck, even with journaling?

It's best to just always run e2fsck. [...]
E2fsck replays the journal automatically, and if the filesystem is otherwise clean, it skips the full filesystem check.
If the filesystem is not clean (because during the previous run the kernel noticed some filesystem inconsistencies), e2fsck will automatically do a full check
if it is necessary.
If you have multiple disks, fsck will run multiple e2fsck processes in parallel, which speeds up your boot sequence compared with letting the kernel replay the journal
for each filesystem when it mounts it, since the kernel does those journal replays sequentially instead of in parallel.
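Which filesystems fsck checks at boot, and in which order, comes from the sixth field (fs_passno) of /etc/fstab: 1 for the root filesystem, 2 for the others, 0 for no check. Filesystems with the same pass number on different disks are the ones fsck can check in parallel. A sketch that lists the passes; fsck_passes is a hypothetical helper and ignores finer details of how fsck groups disks.

```shell
# List boot-time fsck passes from an fstab file (default /etc/fstab).
# Prints "pass N: device (mount point)" for each entry whose sixth
# field is non-zero; 0 means the filesystem is never checked.
fsck_passes() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 > 0 {
        print "pass " $6 ": " $1 " (" $2 ")"
    }' "${1:-/etc/fstab}" | sort
}
```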

What is the largest possible size of an ext3 filesystem and of files on ext3?

Ext3 can support files up to 1TB. With a 2.4 kernel the filesystem size is limited by the maximal block device size, which is 2TB.

In 2.6 the maximum block device size (on 32-bit CPUs) is 16TB, but ext3 supports only up to 4TB.
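The 2TB and 16TB block device figures follow from simple arithmetic, assuming the usual explanation: the 2.4 block layer counts 512-byte sectors with 32-bit numbers, and 2.6 on 32-bit CPUs indexes 4 KiB page-cache pages with 32-bit numbers. max_tib is a hypothetical helper.

```shell
# Largest size addressable with BITS-bit indices into units of
# UNIT bytes, printed in TiB (2^40 bytes).
max_tib() {
    bits=$1 unit=$2
    echo $(( (1 << bits) * unit / (1 << 40) ))
}

max_tib 32 512    # 2.4: 32-bit sector numbers, 512-byte sectors -> prints 2
max_tib 32 4096   # 2.6, 32-bit CPU: 32-bit page index, 4 KiB pages -> prints 16
```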

How do I convert the journal file from version 1 to version 2?
Just type:

mount /dev/hdaX /mnt -o journal=update

to convert your old (ext3 v0.0.3* and earlier) filesystem to the new journal format.

How do I convert my root filesystem from version 1 to version 2?
Just issue a "lilo -R linux rw rootflags=journal=update" and reboot.


I updated ext3 today. Got all of my mounts converted. Now on boot, I see: "EXT3-fs: mounted filesystem with ordered data mode". Is this normal?
Nigel Metheringham answered this on the ext3-users mailing list as follows:

That's fine. The EXT3-fs message is just telling you it mounted the fs OK. It's also telling you what form of journaling you are using.

ext3 has 2 formats of journal:

* version 1 - default and only possibility for ext3 releases 0.0.3* and earlier
* version 2 - default for filesystems created with 0.0.4 and later

Version 2 journals support additional semantics required to allow metadata journaling, and provide 2 new forms of journaling - ordered and writeback.
Both V1 & V2 journals support data journaling (where everything that would go to disk is journaled).


Examples:

[root@localhost sysconfig]# lvmdiskscan
/dev/cdrom [ 68.80 MB]
/dev/sda1 [ 101.94 MB]
/dev/sda2 [ 7.90 GB] LVM physical volume
/dev/sdb [ 512.00 MB]
2 disks
1 partition
0 LVM physical volume whole disks
1 LVM physical volume
[root@localhost sysconfig]# e2fsck /dev/sdb
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>


...
[root@localhost sysconfig]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): m
Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the dos compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-512, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512): 128M

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (129-512, default 129):
Using default value 129
Last cylinder or +size or +sizeM or +sizeK (129-512, default 512): 128M
Value out of range.
Last cylinder or +size or +sizeM or +sizeK (129-512, default 512): +128M

Command (m for help): p

Disk /dev/sdb: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 128 131056 83 Linux
/dev/sdb2 129 251 125952 83 Linux


Command (m for help): v
782367 unallocated sectors

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@localhost sysconfig]# lvmdiskscan
/dev/cdrom [ 68.80 MB]
/dev/sda1 [ 101.94 MB]
/dev/sda2 [ 7.90 GB] LVM physical volume
/dev/sdb1 [ 127.98 MB]
/dev/sdb2 [ 123.00 MB]
1 disk
3 partitions
0 LVM physical volume whole disks
1 LVM physical volume
[root@localhost sysconfig]# pvscan
PV /dev/sda2 VG VolGroup00 lvm2 [7.88 GB / 32.00 MB free]
Total: 1 [7.88 GB] / in use: 1 [7.88 GB] / in no VG: 0 [0 ]
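The Blocks column in fdisk and the sizes lvmdiskscan reports can be reproduced from the geometry fdisk printed (64 heads, 32 sectors/track, 512-byte sectors, so 2048 sectors per cylinder). A sketch, assuming the classic DOS layout in which a partition starting at cylinder 1 skips the first track (whose first sector holds the partition table); part_blocks and kib_to_mib are hypothetical helpers. lvmdiskscan's "MB" here are units of 1024 × 1024 bytes.

```shell
# 1 KiB "Blocks" of a DOS partition from fdisk's CHS geometry.
# Args: heads sectors_per_track first_cylinder last_cylinder
part_blocks() {
    heads=$1 spt=$2 first=$3 last=$4
    sectors=$(( (last - first + 1) * heads * spt ))
    # Assumption: a partition starting at cylinder 1 skips track 0.
    if [ "$first" -eq 1 ]; then
        sectors=$(( sectors - spt ))
    fi
    echo $(( sectors / 2 ))   # two 512-byte sectors per 1 KiB block
}

# 1 KiB blocks to MiB with two decimals, as lvmdiskscan prints them.
kib_to_mib() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 1024 }'
}

part_blocks 64 32 1 128     # /dev/sdb1 -> prints 131056
part_blocks 64 32 129 251   # /dev/sdb2 -> prints 125952
kib_to_mib 131056           # -> prints 127.98
kib_to_mib 125952           # -> prints 123.00
```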


[root@localhost sysconfig]# mke2fs -j /dev/sdb1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
32768 inodes, 131056 blocks
6552 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
16 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 36 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
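The backup superblock list above follows from the sparse_super feature (listed in the tune2fs output earlier): besides the primary copy in group 0, backups are kept only in group 1 and in groups whose number is a power of 3, 5 or 7, and group g starts at First data block + g × blocks per group. These are the numbers to pass to e2fsck -b when the primary superblock is damaged, as in the /dev/sdb message earlier; mke2fs -n prints them without writing anything, provided you give it the same options used to create the filesystem. A sketch reproducing the list; sparse_backups and is_power_of are hypothetical helpers.

```shell
# Succeed if $1 is a power (including the 0th) of $2.
is_power_of() {
    n=$1
    while [ "$n" -gt 1 ] && [ $(( n % $2 )) -eq 0 ]; do n=$(( n / $2 )); done
    [ "$n" -eq 1 ]
}

# Backup superblock locations under sparse_super.
# Args: block_count blocks_per_group first_data_block
sparse_backups() {
    blocks=$1 per_group=$2 first=$3
    groups=$(( (blocks - first + per_group - 1) / per_group ))
    g=1 list=
    while [ "$g" -lt "$groups" ]; do
        if is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            list="$list${list:+, }$(( first + g * per_group ))"
        fi
        g=$(( g + 1 ))
    done
    echo "$list"
}

sparse_backups 131056 8192 1   # -> 8193, 24577, 40961, 57345, 73729
```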