Unix Files and Directories

Key Concepts:

A filesystem normally consists of a boot block, a super block, an i-node list, and directory and data blocks.

The i-node contains all the information about the file: the file type, the file's access permission bits, the size of the file, pointers to the data blocks for the file, and so on. Most of the information in the stat structure is obtained from the i-node. Only two items are stored in the directory entry: the filename and the i-node number.

When renaming a file without changing filesystems, the actual content of the file need not be moved - all that needs to be done is to have a new directory entry point to the existing i-node and have the old directory entry removed.
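The point about renaming can be checked with a short sketch: the i-node number reported by stat() should be the same before and after rename(). The function name rename_keeps_inode and the file names passed to it are assumptions made for illustration, and both names must live on the same filesystem.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Creates old_name, renames it to new_name, and compares the i-node
 * numbers seen before and after. Returns 1 if the i-node is unchanged,
 * 0 if it changed, -1 on error. (Hypothetical helper for illustration.)
 */
int rename_keeps_inode(const char *old_name, const char *new_name)
{
    struct stat before, after;
    int fd = open(old_name, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    close(fd);

    if (stat(old_name, &before) == -1)
        return -1;
    if (rename(old_name, new_name) == -1)   /* only directory entries change */
        return -1;
    if (stat(new_name, &after) == -1)
        return -1;

    unlink(new_name);                       /* clean up the demo file */
    return before.st_ino == after.st_ino;
}
```

Calling rename_keeps_inode("/tmp/a.tmp", "/tmp/b.tmp") is expected to return 1 on a typical local filesystem.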

Since the i-node number in the directory entry points to an i-node in the same filesystem, we cannot have a directory entry point to an i-node in a different filesystem. This is why hard links can't cross filesystems.

With a symbolic link, the actual contents of the file (the data blocks) contain the name of the file that the symbolic link points to.
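A minimal sketch of this, assuming a writable scratch path: it creates a symbolic link, reads its contents back with readlink(), and checks that lstat() reports a size equal to the length of the stored name. The function name symlink_stores_name is hypothetical, and a few filesystems may report the size differently.

```c
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a symbolic link at link_path whose contents are target.
 * Returns 1 if readlink() gives back the same text and lstat() reports
 * st_size == strlen(target), 0 if not, -1 on error.
 * The target itself does not need to exist.
 */
int symlink_stores_name(const char *target, const char *link_path)
{
    struct stat info;
    char buf[1024];
    ssize_t len;
    int ok;

    unlink(link_path);                      /* ignore failure: may not exist */
    if (symlink(target, link_path) == -1)
        return -1;
    if (lstat(link_path, &info) == -1) {    /* lstat: do not follow the link */
        unlink(link_path);
        return -1;
    }
    len = readlink(link_path, buf, sizeof(buf) - 1);
    if (len == -1) {
        unlink(link_path);
        return -1;
    }
    buf[len] = '\0';                        /* readlink does not add a NUL */

    ok = (size_t) info.st_size == strlen(target) && strcmp(buf, target) == 0;
    unlink(link_path);                      /* clean up the demo link */
    return ok;
}
```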

Each i-node has a link count that contains the number of directory entries that point to the i-node. Only when the link count goes to 0 can the file be deleted. This is why "unlinking a file" does not always mean "deleting the blocks associated with the file." This is also why the function that removes a directory entry is called unlink and not delete.
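The link count can be watched from a program. The sketch below, with hypothetical helper names link_count and link_count_demo, creates a file, adds a second name with link(), removes the first name, and records st_nlink after each step.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns the link count of path, or -1 if stat() fails. */
long link_count(const char *path)
{
    struct stat info;

    return stat(path, &info) == -1 ? -1 : (long) info.st_nlink;
}

/*
 * Creates name1, links it to name2, then unlinks name1.
 * counts[0..2] receive the link count after each step.
 * Returns 0 on success, -1 on error.
 */
int link_count_demo(const char *name1, const char *name2, long counts[3])
{
    int fd;

    unlink(name1);                          /* start from a clean state */
    unlink(name2);

    fd = open(name1, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    counts[0] = link_count(name1);          /* one directory entry */

    if (link(name1, name2) == -1) {
        unlink(name1);
        return -1;
    }
    counts[1] = link_count(name1);          /* two entries, one i-node */

    if (unlink(name1) == -1)
        return -1;
    counts[2] = link_count(name2);          /* file still exists via name2 */

    unlink(name2);                          /* last link: file can be freed */
    return 0;
}
```

On a typical local filesystem the three recorded counts are 1, 2, and 1.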

Any leaf directory always has a link count of 2: one from the directory entry that names the directory and one from the entry for dot in that directory. What about the link count of the root directory (/)? There, . and .. point to the same i-node, because the root has no parent.

Every subdirectory created in a directory increases that directory's link count by 1 (because of the .. entry of the new subdirectory).
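A small sketch, assuming a scratch directory name that does not yet exist, that records a directory's link count before and after a subdirectory is created in it. The function name dir_link_counts is hypothetical. On traditional filesystems such as ext4 the values are expected to be 2 and 3; some filesystems (Btrfs, for example) report 1 for every directory, so the numbers should be read with that in mind.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates directory parent, then parent/sub, and stores the link count
 * of parent before and after the subdirectory exists.
 * Returns 0 on success, -1 on error. Both directories are removed again.
 */
int dir_link_counts(const char *parent, long *before, long *after)
{
    char child[4096];
    struct stat info;
    int rc = -1;
    int n = snprintf(child, sizeof(child), "%s/sub", parent);

    if (n < 0 || (size_t) n >= sizeof(child))
        return -1;                          /* path too long */
    if (mkdir(parent, 0700) == -1)
        return -1;

    if (stat(parent, &info) == 0) {
        *before = (long) info.st_nlink;     /* leaf directory */
        if (mkdir(child, 0700) == 0) {
            if (stat(parent, &info) == 0) {
                *after = (long) info.st_nlink;  /* plus the child's ".." */
                rc = 0;
            }
            rmdir(child);
        }
    }
    rmdir(parent);
    return rc;
}
```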

Only a superuser process can create a new link that points to a directory, and many systems (Linux among them) do not allow it at all. The reason is that such links can cause loops in the filesystem.

When a file is closed the kernel first checks the count of the number of processes that have the file open. If this count has reached 0 then the kernel checks the link count, and if it is 0, the file's contents are deleted.
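This behavior is what makes the common "open, then unlink" idiom for temporary files work. A minimal sketch, with the hypothetical function name read_after_unlink: the file's only name is removed while it is still open, yet its contents remain readable through the descriptor until close(). Network filesystems such as NFS may behave differently.

```c
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Writes a message to a new file, unlinks the file while it is open,
 * then reads the message back through the still-open descriptor.
 * Returns 1 if the link count is 0 and the data is intact,
 * 0 if not, -1 on error.
 */
int read_after_unlink(const char *path)
{
    const char msg[] = "still here";
    char buf[sizeof(msg)];
    struct stat info;
    int ok = -1;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

    if (fd == -1)
        return -1;

    if (write(fd, msg, sizeof(msg)) == (ssize_t) sizeof(msg)
        && unlink(path) == 0                /* name gone, link count now 0 */
        && fstat(fd, &info) == 0
        && lseek(fd, 0, SEEK_SET) == 0
        && read(fd, buf, sizeof(buf)) == (ssize_t) sizeof(buf))
        ok = info.st_nlink == 0 && memcmp(buf, msg, sizeof(msg)) == 0;

    close(fd);      /* open count reaches 0: the kernel can free the blocks */
    return ok;
}
```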

A directory is just a file containing directory entries (filenames and associated i-node numbers). Adding, deleting, or modifying these directory entries can affect all three times associated with that directory. With the appropriate permissions, we can create new files in the directory and remove files from it, but we can't write to the directory itself. Only the kernel can write to a directory.
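The last point is easy to observe: POSIX specifies that open() on a directory with O_WRONLY or O_RDWR fails with EISDIR, whatever the permissions. A minimal sketch, with the hypothetical function name open_dir_for_writing:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Tries to open a directory for writing.
 * Returns 0 if the open unexpectedly succeeds, otherwise the errno
 * value set by open() (EISDIR is expected for a directory).
 */
int open_dir_for_writing(const char *dir)
{
    int fd = open(dir, O_WRONLY);

    if (fd != -1) {
        close(fd);
        return 0;
    }
    return errno;
}
```

Directory contents change only indirectly, through calls such as open() with O_CREAT, mkdir(), link(), unlink(), and rename().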



References:

Unix UFS i-node Structure is explained at this site.

A very good set of lecture notes about file systems is here.

A very good brief introduction to Unix filesystems and the Linux extfs and ext2fs is at: http://web.mit.edu/tytso/www/linux/ext2intro.html

For a complete description of Unix Files and Directories, please read Richard Stevens' Advanced Programming in the UNIX Environment.





stat man page.
http://www.manpages.info/linux/stat.2.html


stat(), fstat(), and lstat() all return a stat structure, which contains the following fields:

struct stat {
    dev_t     st_dev;     /* device */
    ino_t     st_ino;     /* inode */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device type (if inode device) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last change */
};

The value st_size gives the size of the file (if it is a regular file
or a symlink) in bytes. The size of a symlink is the length of the
pathname it contains, without trailing NUL.

The value st_blocks gives the size of the file in 512-byte blocks.
(This may be smaller than st_size/512 e.g. when the file has holes.)
The value st_blksize gives the "preferred" blocksize for efficient file
system I/O. (Writing to a file in smaller chunks may cause an inefficient
read-modify-rewrite.)
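The difference between st_size and st_blocks shows up with a sparse file. The sketch below, using the hypothetical function name make_sparse_file, seeks past the end of an empty file and writes a single byte, leaving a hole, then returns the stat structure for inspection. Whether the hole actually saves disk blocks depends on the filesystem.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Creates a file consisting of a hole of hole_bytes followed by one
 * byte of data, fills *info with its stat information, and removes it.
 * Returns 0 on success, -1 on error.
 */
int make_sparse_file(const char *path, off_t hole_bytes, struct stat *info)
{
    int rc = -1;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd == -1)
        return -1;

    if (lseek(fd, hole_bytes, SEEK_SET) != -1   /* skip: creates the hole */
        && write(fd, "x", 1) == 1               /* one real byte of data */
        && fstat(fd, info) == 0)
        rc = 0;

    close(fd);
    unlink(path);                               /* clean up the demo file */
    return rc;
}
```

After a successful call with a hole of 1048576 bytes, info.st_size is 1048577, while info.st_blocks * 512 is usually far smaller on filesystems that support holes.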

Not all of the Linux filesystems implement all of the time fields.
Some file system types allow mounting in such a way that file accesses
do not cause an update of the st_atime field. (See `noatime' in
mount(8).)

The field st_atime is changed by file accesses, e.g. by execve(2),
mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes).
Other routines, like mmap(2), may or may not update st_atime.

The field st_mtime is changed by file modifications, e.g. by mknod(2),
truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover,
st_mtime of a directory is changed by the creation or deletion of
files in that directory. The st_mtime field is not changed for changes
in owner, group, hard link count, or mode.

The field st_ctime is changed by writing or by setting inode information
(i.e., owner, group, link count, mode, etc.).



Sample code:

/* statinfo.c - demonstrates using stat() to obtain
 *              file information.
 *            - some members are just numbers...
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

void show_stat_info(char *, struct stat *);

int main(int ac, char *av[])
{
    struct stat info;                   /* buffer for file info */

    if (ac > 1) {
        if (stat(av[1], &info) != -1) {
            show_stat_info(av[1], &info);
            return 0;
        }
        perror(av[1]);                  /* report stat() errors */
    }
    return 1;
}

void show_stat_info(char *fname, struct stat *buf)
/*
 * displays some info from stat in a name=value format
 */
{
    /* the widths of these types vary between systems, so cast for printf */
    printf("  inode: %lu\n", (unsigned long) buf->st_ino);   /* inode */
    printf("   mode: %o\n",  (unsigned int) buf->st_mode);   /* type + mode */
    printf("  links: %lu\n", (unsigned long) buf->st_nlink); /* # links */
    printf("   user: %u\n",  (unsigned int) buf->st_uid);    /* user id */
    printf("  group: %u\n",  (unsigned int) buf->st_gid);    /* group id */
    printf("   size: %ld\n", (long) buf->st_size);           /* file size */
    printf("modtime: %ld\n", (long) buf->st_mtime);          /* modified */
    printf("   name: %s\n",  fname);                         /* filename */
}

Execution:

[root@ipc4 filesystems]# ./fileinfo /etc/passwd
mode: 100644
links: 1
user: 0
group: 0
size: 1888
modtime: 1256242470
name: /etc/passwd

-rw-r--r-- 1 root root 1888 Oct 22 13:14 /etc/passwd


The st_mode field of the stat structure encodes the i-node's type, the suid, sgid, and sticky bits, and the user/group/other permissions.
The mode printed above (octal 100644, shown by ls as -rw-r--r--) is the value of st_mode (type + mode): the type is octal 0100000
(binary 1000 in the top four bits) and the (Unix file access) mode is octal 644.

The low 16 bits break down as follows:

100644 (octal) => 1000   0     0     0       110      100         100
                  type   suid  sgid  sticky  u(rw-)   group(r--)  other(r--)

The type values are defined as follows in <sys/stat.h> (octal):

S_IFLNK 0120000 symbolic link
S_IFREG 0100000 regular file
S_IFBLK 0060000 block device
S_IFDIR 0040000 directory
S_IFCHR 0020000 character device
S_IFIFO 0010000 fifo


------------------------------------




A simple 'ls' command in C. Outline of the call structure (pseudocode):

main(argc, *argv[])
    while (--argc)
        do_ls(*++argv)

do_ls(dirname[])
    opendir(dirname)
    for each entry returned by readdir()
        dostat(direntp->d_name)
    closedir(dir_ptr)

dostat(*filename)
    stat(filename, &stat_info)
    show_file_info(filename, &stat_info)

show_file_info(*filename, *stat_info)
    mode_to_letters(stat_info->st_mode, modestr)
    uid_to_name(stat_info->st_uid)
    gid_to_name(stat_info->st_gid)

utility functions:
    uid_to_name
    gid_to_name
    mode_to_letters



/* lss.c
 * purpose  list contents of directory or directories
 * action   if no args, use . else list files in args
 * note     uses stat and pwd.h and grp.h
 * BUG:     dostat() passes the bare entry name to stat(), so entries are
 *          looked up relative to the current directory; try lss /tmp
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <dirent.h>
#include <sys/stat.h>

void do_ls(char[]);
void dostat(char *);
void show_file_info( char *, struct stat *);
void mode_to_letters( int , char [] );
char *uid_to_name( uid_t );
char *gid_to_name( gid_t );

int main(int ac, char *av[])
{
    if ( ac == 1 )
        do_ls( "." );
    else
        while ( --ac ){
            printf("%s:\n", *++av );
            do_ls( *av );
        }
    return 0;
}

void do_ls( char dirname[] )
/*
 * list files in directory called dirname
 */
{
    DIR *dir_ptr;                       /* the directory */
    struct dirent *direntp;             /* each entry */

    if ( ( dir_ptr = opendir( dirname ) ) == NULL )
        fprintf(stderr,"lss: cannot open %s\n", dirname);
    else
    {
        while ( ( direntp = readdir( dir_ptr ) ) != NULL )
            dostat( direntp->d_name );
        closedir(dir_ptr);
    }
}

void dostat( char *filename )
{
    struct stat info;

    if ( stat(filename, &info) == -1 )  /* cannot stat */
        perror( filename );             /* say why */
    else                                /* else show info */
        show_file_info( filename, &info );
}

void show_file_info( char *filename, struct stat *info_p )
/*
 * display the info about 'filename'. The info is stored in struct at *info_p
 */
{
    char modestr[11];

    mode_to_letters( info_p->st_mode, modestr );

    printf( "%s" , modestr );
    printf( "%4d " , (int) info_p->st_nlink);
    printf( "%-8s " , uid_to_name(info_p->st_uid) );
    printf( "%-8s " , gid_to_name(info_p->st_gid) );
    printf( "%8ld " , (long)info_p->st_size);
    /* ctime() gives "Wed Oct 28 22:00:00 2009\n"; skip the weekday */
    printf( "%.12s ", 4+ctime(&info_p->st_mtime));
    printf( "%s\n" , filename );
}

/*
 * utility functions
 */

/*
 * This function takes a mode value and a char array
 * and puts into the char array the file type and the
 * nine letters that correspond to the bits in mode.
 * NOTE: It does not code setuid, setgid, and sticky
 * codes
 */
void mode_to_letters( int mode, char str[] )
{
    strcpy( str, "----------" );            /* default=no perms */

    if ( S_ISDIR(mode) ) str[0] = 'd';      /* directory? */
    if ( S_ISCHR(mode) ) str[0] = 'c';      /* char devices */
    if ( S_ISBLK(mode) ) str[0] = 'b';      /* block device */

    if ( mode & S_IRUSR ) str[1] = 'r';     /* 3 bits for user */
    if ( mode & S_IWUSR ) str[2] = 'w';
    if ( mode & S_IXUSR ) str[3] = 'x';

    if ( mode & S_IRGRP ) str[4] = 'r';     /* 3 bits for group */
    if ( mode & S_IWGRP ) str[5] = 'w';
    if ( mode & S_IXGRP ) str[6] = 'x';

    if ( mode & S_IROTH ) str[7] = 'r';     /* 3 bits for other */
    if ( mode & S_IWOTH ) str[8] = 'w';
    if ( mode & S_IXOTH ) str[9] = 'x';
}

#include <pwd.h>

char *uid_to_name( uid_t uid )
/*
 * returns pointer to username associated with uid, uses getpwuid(3);
 * falls back to the numeric uid as a string if there is no entry
 */
{
    struct passwd *pw_ptr;
    static char numstr[12];             /* room for a 32-bit uid + NUL */

    if ( ( pw_ptr = getpwuid( uid ) ) == NULL ){
        snprintf(numstr, sizeof(numstr), "%u", (unsigned int) uid);
        return numstr;
    }
    else
        return pw_ptr->pw_name ;
}

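#include <grp.h>

char *gid_to_name( gid_t gid )
/*
 * returns pointer to group name associated with gid, uses getgrgid(3);
 * falls back to the numeric gid as a string if there is no entry
 */
{
    struct group *grp_ptr;
    static char numstr[12];             /* room for a 32-bit gid + NUL */

    if ( ( grp_ptr = getgrgid(gid) ) == NULL ){
        snprintf(numstr, sizeof(numstr), "%u", (unsigned int) gid);
        return numstr;
    }
    else
        return grp_ptr->gr_name;
}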

[root@ipc4 filesystems]# ./lss /home/shan/prog/unix .
/home/shan/prog/unix:
drwxr-xr-x 2 root root 4096 Oct 28 22:00 .
drwxr-xr-x 3 root root 4096 Oct 28 17:23 ..
filesystems: No such file or directory
.:
-rw-r--r-- 1 root root 3446 Oct 28 17:00 ls2.c
-rwxr-xr-x 1 root root 5382 Oct 28 19:09 fileinfo
-rw-r--r-- 1 root root 332 Oct 28 16:58 Makefile
-rwxr-xr-x 1 root root 7101 Oct 28 19:43 ls2
drwxr-xr-x 2 root root 4096 Oct 28 22:00 .
-rwxr-xr-x 1 root root 5075 Oct 28 17:00 filesize
drwxr-xr-x 3 root root 4096 Oct 28 17:23 ..
-rw-r--r-- 1 root root 322 Oct 28 16:58 filesize.c
-rw-r--r-- 1 root root 1216 Oct 28 19:09 fileinfo.c
-rw-r--r-- 1 root root 4467 Oct 28 19:12 fs.txt
-rw-r--r-- 1 root root 9524 Oct 28 19:04 stat.man
-rwxr-xr-x 1 root root 5382 Oct 28 19:09 f
-rw-r--r-- 1 root root 718 Oct 28 16:59 ls1.c
-rwxr-xr-x 1 root root 7101 Oct 28 22:00 lss
-rw-r--r-- 1 root root 3446 Oct 28 21:58 lss.c
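
The "filesystems: No such file or directory" line in the first listing is the BUG mentioned in the header comment of lss.c: readdir() returns bare entry names, and stat() resolves them relative to the current directory rather than the directory being listed. One possible fix, sketched here with the hypothetical helpers join_path and stat_in_dir (neither is part of lss.c), is to build the full path before calling stat(). The 4096-byte buffer size is an assumption.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Builds "dirname/filename" in buf.
 * Returns 0 on success, -1 if the result does not fit in size bytes.
 */
int join_path(char *buf, size_t size, const char *dirname, const char *filename)
{
    int n = snprintf(buf, size, "%s/%s", dirname, filename);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/*
 * stat() the entry filename that was read from directory dirname.
 * Returns the result of stat(), or -1 if the joined path is too long.
 */
int stat_in_dir(const char *dirname, const char *filename, struct stat *info)
{
    char path[4096];                    /* assumed upper bound for the demo */

    if (join_path(path, sizeof(path), dirname, filename) == -1)
        return -1;
    return stat(path, info);
}
```

With this approach, do_ls() would pass dirname along with each d_name, and the listing would still print only the bare entry name.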