UNIX Unleashed, System Administrator's Edition

- 4 -

The UNIX File System

by Sanjiv Guha

In the UNIX operating system, a file is a repository of raw or processed data stored as a stream of bytes (also known as characters). In UNIX, the data is encoded using ASCII, although systems such as IBM 3090 mainframe store a file's data in EBCDIC. The ASCII and EBCDIC codes are different from each other; that is, the same code means different things and the same character is denoted by different code in these two coding schemes. On different operating systems, the data is stored differently, which might cause problems if you are trying to process files created on a different operating system. You will need special programs to convert the data in the files created in one operating system to suit the needs of another.

Files contain different types of information. For example, a file can contain source code for a program in C or COBOL or C++, be a text document containing mail from a friend, or contain executable code for a program. These are some of the native file types supported by UNIX; that is, you can view or copy these types of files using UNIX commands. However, some files can't be processed by native UNIX commands. For example, a file containing data for a third-party database such as Oracle will need special programs to be processed, viewed, and so on.

A file can reside on different media. A file can also be a permanent file on disk, a temporary file in the memory, or a file displaying or accepting data from the terminal. If the file is a permanent file, you might be able to view it--but if the file is temporary, you might not know it exists.

The functions usually performed on a file are as follows:

Opening a file for processing
Reading data from a file for processing
Writing data to a file after processing
Closing a file after all the necessary processing has been done

Now that you have an idea about what a file is and what it contains, it's time to learn more about different types of files you will encounter.

File Types

This section discusses the various file types available in UNIX. You might be familiar with some of these types of files, such as text documents and source files.

Regular Files

Regular files are the ones with which you are probably most familiar. They are permanent in nature and contain data such as source code for a program, mail received from the boss, or a letter that you are writing to a friend. Many of these files contain text. In a text file, the data is organized into records. If, for example, this book were a file containing data about UNIX operating systems, each line in this book would be called a record.

How does UNIX know about these records? The answer is that there is a special character, called a newline character, which is used by UNIX to find out where a record ends and the next one starts. As you will see later, most of the UNIX commands support text processing. However, keep in mind that text files are not the only type of regular files. Some files have a stream of bytes without any newline characters. UNIX stores and copies such files like any others, but the line-oriented text commands cannot process their data meaningfully.

The following are examples of some of the regular files:

prog.c is a file containing a C source program.
prog.cbl is a file containing a COBOL source program.
prog.exe is a file containing executable code for a program.
invite.doc is a file containing an invitation to a party from a co-worker.

NOTE: The examples provided here follow the usual UNIX file naming conventions. However, these are just conventions, not rules. So, it is possible for someone to name a file prog.c, even though it contains a letter to his or her boss.

Here is an example of the list of attributes a file has. The file is called testfile, and the attributes are obtained by using the following command:

ls -al testfile

The result is

-rwxr-xr-x   2 guhas    staff       1012 Oct 30 18:39 testfile

UNIX keeps track of the file attributes using a data structure called an i-node. Each i-node in the system is identified by a number called the i-node number. Each file in the system has an associated i-node that contains information such as the following:

Ownership details of a file
Permission details of a file
Timestamps of a file (date and time of last access, date and time of last modification, and so on)
Type of the file

A number of timestamps are associated with a file. These times are

Last access time
Last modification time
Last i-node modification time

The last access time changes whenever the contents of the file are read. The last modification time changes when the contents of the file are modified. The last i-node modification time changes when any of the information stored in the i-node, such as the owner or the permissions, changes.

NOTE: Some UNIX versions, for instance, AIX, do not modify the last access time of a file when you execute it.
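The i-node number and the three timestamps described above can be inspected with options of ls. The following is a minimal sketch, assuming a Bourne-style shell; the scratch directory name is hypothetical and chosen only for illustration.

```shell
# Hypothetical scratch directory; any writable location would do.
demo_dir="${TMPDIR:-/tmp}/inode_demo.$$"
mkdir "$demo_dir"
echo "hello" > "$demo_dir/testfile"

# -i prints the i-node number in front of the filename.
inode_number=$(ls -i "$demo_dir/testfile" | awk '{print $1}')
echo "i-node number: $inode_number"

ls -l  "$demo_dir/testfile"   # shows the last modification time (the default)
ls -lu "$demo_dir/testfile"   # shows the last access time instead
ls -lc "$demo_dir/testfile"   # shows the last i-node modification time instead

rm -rf "$demo_dir"
```

On a freshly created file the three times are usually identical; they drift apart as the file is read, rewritten, or has its permissions changed.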

Directory Files

A directory file is a special file that records the names of the files stored in the directory and ties each name to the i-node that holds the file's attributes, such as its location, size, and time of modification. This special file can be read only by the UNIX operating system or programs expressly written to do directory processing. You may not view the content of the directory file, but you may use UNIX commands to inquire about these attributes of the directory. A file directory is like a telephone directory that contains address information about the files in it. When you ask UNIX to process a filename, UNIX looks up the specified directory to obtain information about the file. In each directory, you will always find two files:

1. . (single period)
2. .. (two consecutive periods)

The single period (.) refers to the current directory, and the two consecutive periods (..) refer to the directory one level up (sometimes referred to as parent directory).

An example of the attributes of a directory called testdir is presented here:

drwxr-xr-x   2 guhas    writer       512 Oct 30 18:39 testdir

rwxr-xr-x defines the permissions of testdir created by a user called guhas belonging to a group called writer. The size of the directory entry testdir is 512 bytes. The directory was last modified on October 30 at 6:39 p.m.

A directory is treated as a file by UNIX, but it has some special characteristics. A directory has at least two names. For example, if the current directory were /u/guhas and you created a sub-directory called testdir, two links would be created:

/u/guhas/testdir
/u/guhas/testdir/.

The entry /u/guhas/testdir is created in the directory /u/guhas, and the entry /u/guhas/testdir/. is created in the directory /u/guhas/testdir.

First, the entry /u/guhas/testdir is created as an empty directory and then is linked to /u/guhas/testdir/. (single period). Both these links exist during the life of the directory and are deleted when you delete the directory.
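One way to see that both names refer to the same directory is to compare their i-node numbers. The following is a minimal sketch, assuming a Bourne-style shell; the scratch directory under /tmp is a hypothetical stand-in for /u/guhas.

```shell
# Hypothetical scratch directory standing in for /u/guhas.
demo_dir="${TMPDIR:-/tmp}/dirlink_demo.$$"
mkdir "$demo_dir"
mkdir "$demo_dir/testdir"

# -d lists the directory itself rather than its contents;
# -i adds the i-node number as the first field.
inode_by_name=$(ls -di "$demo_dir/testdir"   | awk '{print $1}')
inode_by_dot=$(ls -di "$demo_dir/testdir/."  | awk '{print $1}')

echo "testdir   has i-node $inode_by_name"
echo "testdir/. has i-node $inode_by_dot"

rm -rf "$demo_dir"
```

Both lines should report the same i-node number, because the two entries are simply two names for one directory.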

Character and Block Device Files

The character special files are used for unbuffered I/O to and from a device, and the block special files are used when data is transferred in fixed-size packets. The character special files do I/O one character at a time, while the block special files use a buffer caching mechanism to increase the efficiency of data transfer by keeping an in-memory copy of the data. Some examples of these files are

Floppy disk device--character or block special file
Tape device--character special file
Terminal--character special file

UNIX treats the keyboard and the monitor (terminal) as files. The keyboard is considered an input file, also referred to as a standard input file (stdin in UNIX terminology). The terminal is considered an output file, also referred to as the standard output file (stdout in UNIX terminology).

An important corollary of the standard input and output is referred to as I/O redirection. In UNIX, using I/O redirection makes it possible to change the standard input file from keyboard to a regular file, and change the standard output file from terminal to a new or existing regular file.

All UNIX commands, by default, accept input from standard input, display output on standard output, and send error messages to standard error output. By using I/O redirection, it is possible to control the source and destination of the command input and output, respectively. It is possible to direct the output of a command to a different file than the standard output. Similarly, it is possible to accept the input from a file rather than standard input. It is also possible to direct error messages to a file rather than the standard error output. This gives you the flexibility to run commands in background, where these special files, that is, standard input, standard output, and standard error output, are not available. You can use regular files to redirect these input, output, or error messages when you are running commands in the background.

Another interesting special file is the bit bucket. This is defined as the file /dev/null. If you redirect the output of a command to /dev/null, the output is discarded. Suppose you wanted to run a command and were interested only in finding out whether the command execution generated errors. You would redirect the standard output to /dev/null. When you do so, the normal output of the command is thrown away, and only the error messages remain visible.
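The redirections described in this section can be sketched as follows. The syntax shown is that of the Bourne and Korn shells (the C shell redirects standard error differently), and the file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, for illustration only.
demo_dir="${TMPDIR:-/tmp}/redir_demo.$$"
mkdir "$demo_dir"
printf 'pear\napple\nfig\n' > "$demo_dir/unsorted.txt"

# Standard input comes from a file; standard output goes to a new file.
sort < "$demo_dir/unsorted.txt" > "$demo_dir/sorted.txt"

# Standard error goes to a file. ls fails here on purpose, so its
# non-zero exit status is ignored with "|| true".
ls "$demo_dir/no_such_file" 2> "$demo_dir/errors.txt" || true

# Standard output is discarded in the bit bucket; only errors are kept.
ls "$demo_dir" "$demo_dir/no_such_file" > /dev/null 2> "$demo_dir/errors_only.txt" || true

first_line=$(head -n 1 "$demo_dir/sorted.txt")
error_size=$(wc -c < "$demo_dir/errors.txt" | tr -d ' ')
echo "first sorted line: $first_line"
echo "bytes of error text captured: $error_size"

rm -rf "$demo_dir"
```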

Sockets

A socket is an application programming interface (API) that is used to communicate between two host computers. In other words, the socket performs network I/O. The socket abstraction has been designed to resemble a file, but a socket is not a real file. To use a socket in a program, create the socket and configure it with the required addresses of the local and remote hosts. After the socket is connected, the program can use it to communicate with the remote host. However, hosts can also communicate using connectionless sockets. A connected socket transfers data between two points between which a connection has been established. With a connectionless socket, the destination address has to be specified for each transfer; that is, the transfer is not limited to two fixed points. A connectionless socket can be used to communicate between any two computers in a network.

Network communication typically involves two programs: a client and a server. Client programs usually actively seek to connect to the server; server programs passively listen for incoming requests from clients. UNIX file I/O does not have such passive capabilities. So, sockets, although similar to files, are not identical to files. Sockets have extra system functions to handle capabilities needed by servers, such as passively listening and waiting for client requests.

The files with which most people are familiar reside on hard disks and have fixed addresses. Although an address may be modified by moving the file to a new location, this does not happen during operations on the file. This concept is suited for fixed-connection network communications. However, computers might need to communicate without fixed addresses, using connectionless communication. For connectionless communication, the UNIX concept of a file does not work because the point-to-point connection is not achieved. For this purpose, sockets have a number of special APIs.

Let us see how a connectionless communication is achieved. The program specifies the destination address to which the data has to be delivered. However, the program does not actually deliver this data; instead, it passes the data to the network to do the actual delivery.

The following is a list of socket API functions to transmit data:

send: Transmits data through a connected socket
write: Transmits data through a connected socket using a simple data buffer
writev: Transmits data through a connected socket (using noncontiguous memory locations)
sendto: Transmits data through an unconnected socket
sendmsg: Transmits data through an unconnected socket using a special data structure

The following is a list of functions used for receiving data using a socket:

recv: Reads data through a connected socket
read: Reads data through a connected socket using a simple data buffer
readv: Reads data through a connected socket (using noncontiguous memory locations)
recvfrom: Reads data through an unconnected socket
recvmsg: Reads data through an unconnected socket using a special data structure

A socket has a number of arguments associated with it. These arguments must be specified during socket creation. The first argument is the communication protocol family to be used to communicate. A number of protocols are available, of which Internet's TCP/IP is the most popular. While working with the protocol families, you should also know about the address families. Each network has a different format for the address of computers attached to it. The second argument is the type of the communication to be used. This data can be sent as a stream of bytes, as in a connection-oriented communication or as a series of independent packets (called datagrams), as in a connectionless communication. The last argument is the actual protocol to be used, which is part of the protocol family specified as the first argument.

Named Pipes

A named pipe is a file created to do inter-process communication. That is, it serves as a go-between for data between two programs. The sending process writes data to the named pipe, and the receiving process reads data from the named pipe. The data itself is temporary: it exists only while it is in transit between the processes, although the name of the pipe remains in the directory until you remove it. The data is processed on a FIFO (first-in, first-out) basis from the named pipe.
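The following is a minimal sketch of two processes communicating through a named pipe, assuming the mkfifo command is available (it is on most current UNIX systems). The pipe name mypipe and the scratch directory are hypothetical.

```shell
# Hypothetical scratch directory and pipe name.
demo_dir="${TMPDIR:-/tmp}/fifo_demo.$$"
mkdir "$demo_dir"
mkfifo "$demo_dir/mypipe"

# The first character of the ls -l output for a named pipe is p.
ls -l "$demo_dir/mypipe"

# The sending process runs in the background; it waits until
# a receiving process opens the pipe for reading.
printf 'first\nsecond\n' > "$demo_dir/mypipe" &

# The receiving process gets the data in the order it was written.
received=$(cat "$demo_dir/mypipe")
wait
echo "$received"

rm -rf "$demo_dir"
```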

Symbolic and Hard Links

Links create pointers to the actual files, without duplicating the contents of the files. That is, a link is a way of providing another name to the same file. There are two types of links to a file:

Hard link
Symbolic (or soft) link; also referred to as symlink

With hard links, the original filename and the linked filename point to the same physical address and are absolutely identical. There are two important limitations of a hard link. A directory cannot have a hard link, and it cannot cross a file system. (A file system is a physical space within which a file must reside; a single file cannot span more than one file system, but a file system can have more than one file in it.) It is possible to delete the original filename without deleting the linked filename. Under such circumstances, the file is not deleted, but the directory entry of the original file is deleted and the link count is decremented by 1. The data blocks of the file are deleted when the link count becomes zero.

With symbolic or soft links, there are two files: One is the original file, and the other is the linked filename containing the name of the original file. An important limitation of the symbolic link is that if you remove the original file, the linked filename remains but no longer leads to any data. However, a symbolic linked filename can cross file systems.

You should be careful about symbolic links. If you are not, you will be left with files that do not point anywhere because the original file has been deleted or renamed.

An important feature of the symbolic link is that it can be used to link directories as well as files.

If we have a file called origfile in the directory /u/guhas, whose characteristics are

-rw-r--r--   2 guhas    writer         30 Nov  8 01:14 origfile

a file called hlinkfile, which has been hard linked to origfile, will have the following characteristics

-rw-r--r--   2 guhas    writer         30 Nov  8 01:14 hlinkfile

The 2 before guhas signifies that there are two files linked to the same physical address (origfile and hlinkfile).

A file called slinkfile, which has been soft linked to origfile, will have the following characteristics

lrwxrwxrwx   1 guhas    writer          8 Nov  8 01:18 slinkfile -> origfile

The link is evident in the filename. In this case, if you delete origfile, slinkfile will be rendered useless.
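The difference between the two kinds of link can be sketched with the ln command. The filenames are those used above; the scratch directory is a hypothetical stand-in for /u/guhas.

```shell
# Hypothetical scratch directory standing in for /u/guhas.
demo_dir="${TMPDIR:-/tmp}/link_demo.$$"
mkdir "$demo_dir"
echo "some data" > "$demo_dir/origfile"

# Hard link: a second name for the same i-node.
ln "$demo_dir/origfile" "$demo_dir/hlinkfile"

# Symbolic link: a small file that stores the name origfile.
ln -s origfile "$demo_dir/slinkfile"

# origfile and hlinkfile show the same i-node number and a link count of 2.
ls -li "$demo_dir"

# Remove the original name and see what each link still provides.
rm "$demo_dir/origfile"
hard_content=$(cat "$demo_dir/hlinkfile")
if cat "$demo_dir/slinkfile" > /dev/null 2>&1; then
    soft_status="readable"
else
    soft_status="dangling"
fi
echo "hard link still contains: $hard_content"
echo "symbolic link is now: $soft_status"

rm -rf "$demo_dir"
```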

Naming Files and Directories

Each file is identified by a name, which is a sequence of characters. Older versions of UNIX severely limited the number of characters that could be used in a filename. Newer versions of UNIX have relaxed this limitation considerably. You should be careful when naming the files, though. Although UNIX allows most characters to be used as part of the filename, some of the characters have special meaning in UNIX and can pose some problems.

For example, the character > is used as an output redirection operator in UNIX. If you wanted to create a file named x>y, you might try the touch command:

touch x>y

You would then get two files: one named x and one named y.

To circumvent this problem, precede the special character with a backslash (\), which tells the shell to treat the next character literally, and use the touch command as follows:

touch x\>y

CAUTION: Using special characters such as asterisks (*) and dollar signs ($) as part of the filename causes problems because the shell interprets these characters itself before the command sees them. An unquoted asterisk is expanded to the matching filenames, and a dollar sign triggers variable substitution, so the command can end up operating on files you did not intend.
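If you must work with such names, a backslash or single quotes keep the shell from interpreting the special characters. The following is a minimal sketch, assuming a Bourne-style shell; the filenames other than x>y are hypothetical.

```shell
# Hypothetical scratch directory so the odd filenames are easy to clean up.
demo_dir="${TMPDIR:-/tmp}/quote_demo.$$"
mkdir "$demo_dir"
(
    cd "$demo_dir" || exit 1
    touch x\>y       # a backslash protects the single character that follows
    touch 'a*b'      # single quotes protect everything between them
    touch '$total'   # no variable substitution takes place inside single quotes
)

# Three files exist, named exactly as typed above.
ls "$demo_dir"
file_count=$(ls "$demo_dir" | wc -l | tr -d ' ')

rm -rf "$demo_dir"
```

Remember that the same quoting is needed every time you refer to these files later, which is a good reason to avoid such names in the first place.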

The following is a list of characters that may be used as part of the UNIX filenames:

A through Z or a through z
Numerals 0 through 9
Underscore (_)
Period (.)

The underscore can separate words in a filename, thus making the filename easier to read. For example, instead of naming a file mytestfile, you could name it my_test_file.

A period may be used to append an extension to a filename in a way similar to DOS filenames. For example, a C language source file containing a program called prog may be named prog.c. However, in UNIX you are not limited to one extension. You should keep in mind that a period (.), when used as the first character in a filename, has a special meaning. The period as the first character gives the file a hidden status. For example, if you had the files x and .x in your current directory, issuing an ls command will show you only the file x. To list both the files, use ls -a.

CAUTION: UNIX is case-sensitive. For example, a file named abc is different from a file named ABC.

Some of the system files begin with a . (period). These files, also called hidden files, are not displayed unless special flags are used. An example is the .profile file.

Table 4.1 provides a list of characters or combination characters that should be avoided because they have special meanings. This list is not exhaustive and depends on the UNIX shell you are running.

Table 4.1. Meaning of some special characters.

Character Meaning

$ Indicates the beginning of a shell variable name. For example, $var will look for a shell variable named var.

| Pipes standard output to next command.

# Starts a comment.

& Executes a process in the background.

? Matches one character.

* Matches zero or more characters.

$# Number of arguments passed to a shell script.

$* Arguments passed to a shell script.

$? Return code from the previously executed command.

$$ Process identification number.

> Output redirection operator.

< Input redirection operator.

` (backquote) Command substitution.

>> Output redirection operator (to append to a file).

[ ] Lists a range or set of characters. [a-z] means all characters a through z. [az] means characters a or z.

. filename Executes the commands in the file filename in the current shell.

: Separator between directory names in the PATH variable.

File System Organization

This chapter has discussed different types of files and filenames. You have also learned about special directory files. In this section, you will learn about the ways UNIX provides for organizing files so that you can easily locate and use them.

UNIX has provided the directory as a way of organizing files. The directory is a special file under which you can have files or more directories (also referred to as subdirectories). You can visualize the UNIX file structure as an upside-down tree with the root at the top. Thus, the top-level directory is called the root directory and is denoted by a single / (forward slash). All the directories and files belong to the root directory. You can also visualize the UNIX file system as a file cabinet in which the file cabinet is the root directory, the individual drawers are various directories under the root directory, the file folders are the subdirectories, and the files in the individual folders are the files under the directories or subdirectories. Figure 4.1 shows a typical directory tree structure.

Figure 4.1.
A directory tree.

Table 4.2 provides you with a list of standard directory names in the UNIX file system. This list is not exhaustive. A complete list would depend on the UNIX system you are working with.

Table 4.2. List of standard UNIX directories.

Directory name Details about the directory

/ Root directory. This is the parent of all the directories and files in the UNIX file system.

/bin Command-line executable directory. This directory contains many of the basic UNIX native command executables.

/dev Device directory containing special files for character- and block-oriented devices such as printers and keyboards. A file called null existing in this directory is called the bit bucket and can be used to redirect output to nowhere.

/etc System configuration files and executable directory. Most of the administrative, command-related files are stored here.

/lib The library files for various programming languages such as C are stored in this directory.

/lost+found This directory holds files that the file system check program recovers after the system shuts down abnormally but cannot reattach to any directory. The system uses this directory to recover these files. There is one lost+found directory in each file system.

/u Conventionally, all the user home directories are defined under this directory.

/usr This directory has a number of subdirectories (such as adm, bin, etc, and include). For example, /usr/include has various header files for the C programming language.

Pathnames

In UNIX, the filename used by the operating system to uniquely identify a file includes all the directory names, starting from the root directory, that lead to the file. This allows you to use the same filename for different files present under different directories. For example, if you kept mail received by month and day, you could create directories named january, february, march, and so on. Under each of these directories, you could create files such as day01, day02, and day03. The same holds true for directories. That is, you can have the same directory name under different directories.

This brings in the concept of current directory and relative pathnames. For example, if you were in the directory named january and you executed the command

ls -l day01

you would get the attributes of the file day01 under the january directory, which means that UNIX looked in the directory you were currently in to find out whether the file you specified was present or not. All the commands in UNIX use the current directory to resolve the filename if the filename does not have directory information. The relative pathname is always specified relative to the current directory you are in.

If you were in the directory january and wanted to get the attributes of file day01 in directory february, you would specify the absolute pathname of the file. That is, you would execute the command

ls -l /u/guhas/february/day01

UNIX uses the special characters .. (two consecutive periods) as a relative pathname to indicate the directory one level up or the parent directory. For example, if you were in the directory /u/guhas/january, .. (two consecutive periods) in the relative pathname would indicate the /u/guhas directory (which is the parent directory of /u/guhas/january) and ../.. in the relative pathname would indicate the /u directory.
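The three ways of naming a file described above can be sketched as follows. The january and february directories are created under a hypothetical scratch directory that stands in for /u/guhas, and the file contents are made up for illustration.

```shell
# Hypothetical scratch directory standing in for /u/guhas
# (assumes TMPDIR, if set, holds an absolute pathname).
demo_dir="${TMPDIR:-/tmp}/path_demo.$$"
mkdir "$demo_dir" "$demo_dir/january" "$demo_dir/february"
echo "jan mail" > "$demo_dir/january/day01"
echo "feb mail" > "$demo_dir/february/day01"

# Relative pathname, resolved against the current directory (january).
same_name=$(cd "$demo_dir/january" && cat day01)

# Relative pathname that climbs to the parent directory with .. first.
via_parent=$(cd "$demo_dir/january" && cat ../february/day01)

# Absolute pathname; the current directory does not matter.
via_absolute=$(cat "$demo_dir/february/day01")

echo "day01               -> $same_name"
echo "../february/day01   -> $via_parent"
echo "absolute pathname   -> $via_absolute"

rm -rf "$demo_dir"
```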

Working with Directories

When working with UNIX, you will always be placed in a directory. The directory you are in will depend on what you are working on. The directory you are currently in is called the current directory. UNIX uses the current directory information to resolve the relative pathname of a file.

A forward slash (/) in the filename means that you are working with a file in another directory. If the filename starts with a forward slash (/), you are using the absolute pathname of the file, which is resolved from the root directory. If the filename does not start with a forward slash, you are using a relative pathname, which is resolved from the current directory. A relative pathname that starts with .. (two consecutive periods) first moves up to the parent directory.

When you log into a UNIX system, the directory you are placed in is known as the home directory. Each user in the system has his or her own home directory and, by convention, it is /u/username. Korn shell and C shell use a special character tilde (~) as a shortcut to identify the home directory of a user. For example, if guhas is the user currently logged in, the following would hold true:

~ refers to the home directory of guhas.
~friend refers to the home directory of a user friend.

Listing Files and Directories with `ls`

You can use ls (with its various options) to list details about one or more files or directories on the system. Use ls to generate a list of files and directories in different orders, such as order by name and order by time. It is possible to list only certain details about files and directories--for example, only the filename. You will learn more about the options of the ls command in Chapter 5, "General Commands," but here are some examples that give insight into the details that a UNIX system stores about a file or directory. For example, executing the command in the current directory /u/guhas

ls -l

shows the following:

-rwxrwxrwx   1 guhas    staff       7161 May  8 15:35 example.c
drwxrwxrwx   3 guhas    staff       1536 Oct 19 00:54 exe
-rw-r--r--   2 guhas    staff         10 Nov  3 14:28 file1
-rw-r--r--   2 guhas    staff         10 Nov  3 14:28 file112

The details about a file include

Permission attributes of the file
Number of links
User who owns the file
Group to which the file belongs
Size of the file
Date and time when the file was last modified
Name of the file

The previous example shows that the current directory has a directory called exe and three files--example.c, file1, and file112. For the directory exe, the number of links shown is three, which can be counted as: one for the entry exe in the parent directory /u/guhas, one for the entry . (single period) inside exe itself, and one for the entry .. (two consecutive periods) in a subdirectory under exe. The number of links for the file example.c is one, because that file does not have any additional hard links. The number of links for the files file1 and file112 is two, because they are hard linked to each other.

As mentioned, you should be careful about hidden files. You will not know they exist if you do not use the option -a for ls. In the previous example, if you use

ls -al

you will see two more entries, . (single period) and .. (two consecutive periods), which are the directory and the parent directory entries.

In the previous examples, the first character before the permissions (for example, d in drwxrwxrwx) provides information about the type of the file. The file-type values are as follows:

d: The entry is a directory.
b: The entry is a block special file.
c: The entry is a character special file.
l: The entry is a symbolic link.
p: The entry is a first-in, first-out (FIFO) special file.
s: The entry is a local socket.
-: The entry is a regular file.
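As a rough illustration of the list above, the file-type character can be pulled out of the ls output with cut. The function name file_type_char is hypothetical, and the sketch assumes that /dev/null is a character special file, as it is on typical systems.

```shell
# Hypothetical helper: print the file-type character of one pathname.
file_type_char() {
    # -d keeps ls from listing the contents of a directory;
    # cut -c1 keeps only the first character of the line.
    ls -ld -- "$1" | cut -c1
}

root_type=$(file_type_char /)
null_type=$(file_type_char /dev/null)

echo "/          is of type $root_type"
echo "/dev/null  is of type $null_type"
```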

Creating and Deleting Directories: `mkdir` and `rmdir`

When you are set up as a user in a UNIX operating system, you usually are set up with the directory /u/username as your home directory. You will need to organize your directory structure. As with the files, you can use relative or absolute pathnames to create a directory. If your current directory is /u/guhas,

mkdir temp

will create a sub-directory called temp under the current directory; its absolute pathname is /u/guhas/temp.

mkdir /u/guhas/temp

can also be used to have the same effect as the previous one.

mkdir ../temp

can be used to create the directory /u/temp. This example uses .. (two consecutive periods) as part of the relative pathname to indicate that the directory temp will be created under the directory one level up, which is /u. Using mkdir, it is possible to create more than one directory at a time. For example, from your current directory, issue the following command:

mkdir testdir1 /u/guhas/temp/testdir2

which will create testdir1 in the current directory and testdir2 under /u/guhas/temp (assuming it exists). In this example, testdir1 uses a relative pathname, and /u/guhas/temp/testdir2 uses an absolute pathname.

If the directory is already present, UNIX will display an error stating that the directory already exists.

To create a directory, you must have write permission to the parent directory in which you are creating the subdirectory, and the parent directory must exist. However, many UNIX systems provide an option -p with mkdir so that the parent directory is also created if it does not already exist.

After you are finished using a directory or you run out of space and want to remove a directory, you can use the command rmdir to remove the directory.

If your current directory is /u/guhas and directory temp is under it, to remove the directory temp, use the command

rmdir temp

When you execute this command, you might get an error message stating Directory temp is not empty, which means that you still have files and directories under temp directory. You can remove a directory only if it is empty (all the files and directories under it have been removed). As with mkdir, it is possible to specify multiple directory names as part of the rmdir command. You cannot delete files using the rmdir command. For deleting files you will need to use the rm command instead.
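The behavior of mkdir and rmdir described in this section can be sketched as follows. The directory and file names are hypothetical, and the -p option is assumed to be available, as it is on most current systems.

```shell
# Hypothetical scratch directory standing in for /u/guhas.
demo_dir="${TMPDIR:-/tmp}/mkdir_demo.$$"
mkdir "$demo_dir"

mkdir "$demo_dir/temp"
touch "$demo_dir/temp/notes"

# rmdir refuses to remove a directory that still contains a file.
if rmdir "$demo_dir/temp" 2> /dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# After the file is removed with rm, rmdir succeeds.
rm "$demo_dir/temp/notes"
if rmdir "$demo_dir/temp"; then
    second_try="removed"
else
    second_try="refused"
fi

# -p creates the missing parent directories a and b along the way.
mkdir -p "$demo_dir/a/b/c"
if [ -d "$demo_dir/a/b/c" ]; then
    nested="created"
else
    nested="missing"
fi

echo "rmdir on a non-empty directory: $first_try"
echo "rmdir on an empty directory:    $second_try"
echo "mkdir -p a/b/c:                 $nested"

rm -rf "$demo_dir"
```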

Using the `find` Command

If you are working on multiple projects at the same time, it might not be possible for you to remember all the details about the various files you are working with. The find command comes to your rescue. The basic function of the find command is to find the filename or directory with specified characteristics in the directory tree specified.

The most basic form of the find command is

find . -print

There are a number of arguments you can specify with the find command for different attributes of files and directories. You will learn more about these arguments and their usage in Chapter 5, but here are some examples of these arguments:

name: Finds files with certain naming conventions in the directory structure
modify date: Finds files that have been modified during the specified duration
access date: Locates files that have been accessed during the specified duration
permission: Locates files with certain permission settings
user: Locates files that have specified ownership
group: Locates files that are owned by specified group
size: Locates files with specified size
type: Locates a certain type of file

Using the find command, it is possible to locate files and directories that match or do not match multiple conditions, for example:

-a to have multiple conditions ANDed
-o to have multiple conditions ORed
! to negate a condition
\( expression \) to group conditions so that any complex condition can be expressed

The find command has another group of arguments used for specifying the action to be taken on the files or directories that are found, for example:

-print prints the names of the files on standard output.
-exec command executes the specified command.

The most common reason for using the find command is to utilize its capability to recursively process the subdirectories.

NOTE: Always use the -print option of the find command. On some older UNIX versions, find executes but does not generate any output unless -print is specified. For example, to find all files that start with "t" in the current directory or the subdirectories under it, you should use find . -name "t*" -print rather than find . -name "t*".

If you want to obtain a list of all files accessed in the last 24 hours, execute the following command:

find . -atime 0 -print

If the system administrator wants a list of the .profile files (this file has a special use when logging into a UNIX system) used by all users, the following command should be executed:

find / -name .profile -print

You can also execute the find command with multiple conditions. If you wanted to find a list of files that have been modified in the last 24 hours and that have permissions of 777, you would execute the following command:

find . -perm 777 -a -mtime 0 -print

Reviewing Disk Utilization with `du` and `df`

Until now, this chapter has discussed files and directories but not the details of their physical locations and limitations. This section discusses the physical locations and limitations of the files and directories.

In UNIX, the files and directories reside on what are called file systems. File systems define the attributes of the physical devices on which the files reside. UNIX imposes restrictions on the file system size. A file cannot span file systems, so a file cannot exceed the size of the file system it resides on. A UNIX system will have multiple file systems, each of which has files and directories. To access files in a file system, the file system must be mounted. Another important concept is that of a network file system (NFS), which is used to access files on a physically different computer from the local computer. As with a local file system, an NFS file system must be mounted in order for you to access the files in it.

The command df is used to obtain the attributes of all or specified file systems in the system. Typically, the attributes displayed by the df command are as follows:

file system: Name of the file system
kbytes: Size of the file system in kilobytes
used: Amount of storage used
avail: Amount of storage still available
iused: Number of i-nodes used
capacity: Percentage of the total capacity used
%iused: Percentage of the available i-nodes already used
mounted on: The name of the top-level directory

If you are in your home directory and execute the following command

df .

which returns the following

File system    Total KB    free %used   iused %iused Mounted on
/dev/hd1        151552   41828   72%    5534    14% /u

it means that your home directory is on a file system called /dev/hd1 and the top-level directory in the file system is called /u. For this example, you will get the same result regardless of what your current directory is, as long as you are in a directory whose absolute pathname starts with /u.

You can execute the df command without any arguments to obtain a list of all the file systems in your system and their attributes. You can provide an absolute or a relative pathname for a directory to find out the file system attributes of the file system to which it belongs.
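The following sketch shows one way to pick the mount point out of the df output. It assumes a df that supports the POSIX -k and -P options and a file system name and mount point that contain no spaces. The column layout of plain df differs between UNIX versions, so the field number used here is an assumption tied to the -P format.

```shell
# -k reports sizes in kilobytes; the columns shown vary by system.
df -k .

# With -P the output is one header line plus one line per file system,
# and the mount point is the sixth field.
mount_point=$(df -P . | awk 'NR == 2 { print $6 }')
echo "current directory is on the file system mounted at: $mount_point"
```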

The du command displays the number of blocks for files and directories specified by the file and directory arguments and, recursively, for all directories within the specified directory argument.

You can execute the following command from your current directory:

du or du .

and obtain the following result

8        .

which means that the current directory contains no subdirectories and that the directory and the files in it take up eight blocks. If there were subdirectories under the current directory, each of them and its size would also have been displayed.
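To see the recursive behavior, here is a small sketch run against a scratch directory with two subdirectories. The names are hypothetical, mktemp is assumed to be available, and the -k option (sizes in kilobytes rather than blocks) is used so that the units are predictable. The actual sizes depend on the file system.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/proj/src" "$demo/proj/docs"
echo "hello" > "$demo/proj/src/main.c"

# One line per directory, subdirectories first, then the directory itself.
du -k "$demo/proj"

# -s prints only the grand total for the argument.
du -sk "$demo/proj"

# Three directories (src, docs, and proj) give three lines of output.
dir_lines=$(du -k "$demo/proj" | wc -l)
echo "directories reported: $dir_lines"

rm -rf "$demo"
```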

Determining the Nature of a File's Contents with `file`

The file command can be used to determine the type of a specified file. The file command actually reads through the file and performs a series of tests to determine its type. The command then displays the result on standard output.

If a file appears to be ASCII, the file command examines the first 512 bytes and tries to determine the language. If a file does not appear to be ASCII, the file command further attempts to distinguish a binary data file from a text file that contains extended characters.

If the file argument specifies an executable or object module file and the version number is greater than 0, the file command displays the version stamp.

The file command uses the /etc/magic file to identify files that have some sort of a magic number--that is, any file containing a numeric or string constant that indicates type.

For example, if you have a file called letter in your current directory and it contains a letter to your friend, executing the command

file letter

will display the following result:

letter:  commands text

If you have a file called prog and it is an executable program (and you are working on IBM RISC 6000 AIX version 3.1), executing the command

file prog

displays the following result:

prog:        executable (RISC System/6000 V3.1)

If you are in the /dev directory, which contains all the special files, executing the command

file hd1

for a file called hd1 (a disk on which a file system has been defined) will display the following result:

hd1:            block special

You will learn more about the options to be used with the file command in Chapter 5.
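The examples above can be tried with a short sketch such as the following. The wording of the output differs from one UNIX version to another, so no particular text is assumed for the letter or for /dev/null. The script also checks that the file command is installed, because some minimal systems omit it.

```shell
demo=$(mktemp -d)
echo "Dear friend, see you soon." > "$demo/letter"

if command -v file >/dev/null 2>&1; then
    file "$demo/letter"    # a text file; exact wording varies by system
    file /dev/null         # a special file in /dev
    result=$(file "$demo") # the scratch directory itself
else
    result="unavailable"
fi
echo "$result"

rm -rf "$demo"
```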

File and Directory Permissions

Earlier in this chapter, you saw that the ls command with the option -al displayed the permissions associated with a file or a directory. The permissions associated with a file or a directory tell who can or cannot access the file or directory, and what the user can or cannot do.

In UNIX, each user is identified with a unique login id. Additionally, multiple users can be grouped and associated with a group. A user can belong to one or more of these groups. However, a user belongs to one primary group. All other groups to which a user belongs are called secondary groups. The user login id is defined in the /etc/passwd file, and the user group is defined in the /etc/group file. The file and directory permissions in UNIX are based on the user and group.

Every file or directory has three sets of permissions associated with it:

Permissions for the owner: This identifies the operations the owner of the file or the directory can perform on the file or the directory.
Permissions for the group: This identifies the operations that can be performed by any user belonging to the group associated with the file or the directory.
Permissions for world: This identifies the operations everybody else (other than the owner and members of the group associated with the file or the directory) can do.

Using the permission attributes of a file or directory, a user can selectively provide access to users belonging to a particular group and users not belonging to a particular group. UNIX checks on the permissions in the order of owner, group, and other (world)--and the first permission that is applicable to the current user is used.

Here is an example of a file called testfile in the current directory, created by a user called guhas belonging to a group called staff. The file is set up so that only the user guhas can read or modify the file; users belonging to the group can read it, but nobody outside the group can access it. Executing the following command from the current directory

ls -al testfile

displays the permissions of the file testfile:

-rw-r-----    1 guhas    staff         2031  Nov 04 06:14 testfile

You should be careful when setting up permissions for a directory. If a directory has read permissions only, you might be able to obtain a list of the files in the directory, but you will be prevented from doing any operations on the files in that directory.

For example, if you have a directory called testdir in the current directory, which contains a file called testfile, and the group permissions for testdir are read-only, executing the following command

ls testdir

will display the result

testfile

However, if you want to see the content of the file testfile using the following command:

cat testdir/testfile

you will get the following error message:

cat: testdir/testfile permission denied

To perform any operation on testfile in testdir, you must have the execute permission for testdir.

If you want all the members in your group to know the names of the files in a particular directory but do not want to provide any access to those files, you should set up the directory using only read permission.

The owner of a file is determined by the user who creates the file. The group to which the file belongs is dependent on which UNIX system you are working on. In some cases, the group is determined by the current directory. In other cases, you might be able to change to one of your secondary groups (by using the newgrp command) and then create a file or directory belonging to that group.

Similarly, if you set up a directory with just execute permission for the group, all members of the group can access the directory. Without read permission, however, the members of the group cannot obtain a list of the directories or files in it. Someone who knows the name of a particular file within the directory can still access that file by using its pathname.

For example, let us assume that we have a sub-directory testdir under /u/guhas that has a file called testfile. Let us assume the sub-directory testdir has been set up with 710 permission (that is execute permission for the group). In such a case, if a member of the group executes the ls command on testdir, the following will be the result

ls -l testdir 
testdir unreadable
total 0

while if someone is aware of the file testfile and executes the following command

ls -l testdir/testfile 
-rw-r--r--   1 guhas    staff         23 Jul  8 01:48 testdir/testfile

then he or she will get all the information about the file testfile.
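The 710 setup described above can be reproduced with a sketch like the following, which only sets and displays the mode. It does not try to show the denial itself, because that requires logging in as a second user who is a member of the group (and the root user is never denied). The scratch directory and mktemp are assumptions made for the example.

```shell
demo=$(mktemp -d)
mkdir "$demo/testdir"
echo "some data" > "$demo/testdir/testfile"

# 710: owner rwx, group --x (search only), others no access
chmod 710 "$demo/testdir"

# The first ten characters of ls -ld are the file type and permission bits.
dir_mode=$(ls -ld "$demo/testdir" | cut -c1-10)
echo "$dir_mode"

# The owner still has read permission and can list the directory;
# a group member would have only search (x) permission.
ls -l "$demo/testdir"

rm -rf "$demo"
```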

In UNIX, there is a special user who has blanket permission to read, write and execute all files in the system regardless of the owner of the files and directories. This user is known as root.

The Permission Bits

You know that files and directories have owners and groups associated with them. The following are the three sets of permissions associated with a file or directory:

Owner permission
Group permission
World (other) permission

For each of these three types of permissions there are three permission bits associated. The following is a list of these permission bits and their meanings for files:

Read (r): The file can be read.
Write (w): The contents of the file can be modified. (Deleting or renaming a file depends on the permissions of the directory that contains it.)
Execute (x): The file can be executed.

The following is a list of these permissions and their meanings for directories:

Read (r): The names of the files in the directory can be listed.
Write (w): Files can be created, deleted, and renamed in the directory (execute permission on the directory is also needed).
Execute (x): Operations may be performed on the files in the directory. This bit is also called the search bit, because execute permission in a directory is not used to indicate whether a directory can be executed or not but to indicate whether you are permitted to search files under the directory.

Let us examine the directory permissions more closely. Suppose there is a sub-directory called testdir under the directory /u/guhas with the following permissions:

drwxrws---   3 guhas    staff       1536 Nov  4 06:00 testdir

Also a file called testfile is in the directory testdir with the following permission:

-rwxr-----   1 guhas    staff       2000 Nov  4 06:10 testfile

This means that the user guhas can read, modify, and rename the directory and files within the directory. Any member of the group staff also has access to the directory. The file testfile is set up with read permissions only for all members of group staff. However, because all members of staff have read, write, and execute permissions on testdir, anyone belonging to group staff may delete or rename the file testfile, and can therefore replace it with a different file of the same name.

CAUTION: If a user has write and execute permissions on a directory, that user can delete or rename the files in that directory regardless of the permissions set on the files themselves.
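The caution can be demonstrated with a short sketch: a file with no write permission at all is removed successfully because its directory is writable. The scratch directory is an assumption, and the -f option is used because some versions of rm ask for confirmation before removing a write-protected file.

```shell
demo=$(mktemp -d)
echo "important" > "$demo/testfile"

chmod 400 "$demo/testfile"   # read-only, even for the owner
chmod 700 "$demo"            # the owner may write to the directory

# Removal needs write permission on the directory, not on the file.
rm -f "$demo/testfile"

if [ -e "$demo/testfile" ]; then
    removed=no
else
    removed=yes
fi
echo "file removed: $removed"

rmdir "$demo"
```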

Permissions (for owners, groups, and others) are stored in the UNIX system as octal numbers. Each octal digit is stored using three bits, so each digit can vary from 0 through 7. Following is how an octal digit is stored:

Bit 1, value 0 or 1 (defines read permission)
Bit 2, value 0 or 1 (defines write permission)
Bit 3, value 0 or 1 (defines execute permission)

The first bit (read) has a weight of 4, the second bit (write) has a weight of 2, and the third bit (execute) has a weight of 1. For example, a value of 101 will be 5. (The value of binary 101 is (4 * 1) + (2 * 0) + (1 * 1) = 5.)

Let us now examine how to use the octal number to define the permissions. For example, you might want to define the following permissions for the file testfile in the current directory:

Owner read, write, and execute
Group read and execute
Others--no access at all

This can be defined as (using binary arithmetic):

Owner 111 = 7
Group 101 = 5
Others 000 = 0

Thus, the permission of the file testfile is 750.
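The same arithmetic can be worked through in the shell. In this sketch the variable names r, w, and x are hypothetical helpers holding the three weights, and the result is applied to a scratch file (created with mktemp, assumed to be available) so that the effect can be seen in the ls output.

```shell
# Weights of the three permission bits.
r=4; w=2; x=1

owner=$((r + w + x))   # read, write, and execute
group=$((r + x))       # read and execute
others=0               # no access at all

mode="$owner$group$others"
echo "$mode"

# Apply the mode to a scratch file and display the resulting bits.
demo=$(mktemp -d)
touch "$demo/testfile"
chmod "$mode" "$demo/testfile"
file_mode=$(ls -l "$demo/testfile" | cut -c1-10)
echo "$file_mode"

rm -rf "$demo"
```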

Some versions of UNIX provide an additional bit called the sticky bit as part of a directory permission. The purpose of the sticky bit is to allow only the owner of the directory, owner of the file, or the root user to delete and rename files.

The following is a convention for setting up permissions to directories and files. For private information, the permission should be set to 700. Only you will have read, write, and execute permissions on the directory or file.

If you want to make information public but you want to be the only one who can publish the information, set the permission to 755. Nobody else will have write access, and nobody else will be able to update the file or directory.

If you do not want the information to be accessed by anybody other than you or your group, set the permission for others to 0. The permission may be 770 or 750.

The following is an example of where you can set up permissions to deny permissions to a particular group. Assume that there is a directory called testdir in the current directory owned by a group called outsider. If you execute the following command in the current directory, the group outsider will not be able to perform any function on the directory testdir:

chmod 705 testdir

Default Permissions: `umask`

When a user logs into a UNIX system, she is provided with a default permission mask, called the umask. The files and directories the user creates receive default permissions determined by the umask.

You can find out what default permissions you have by executing the following command:

umask

It might display the following result:

022

umask is stored and displayed as a number to be subtracted from 777. 022 means that the default permissions are

777 - 022 = 755

That is, the owner can read, write, and execute; the group can read and execute; and all others can also read and execute. (Ordinary files are normally created without execute permission, so with this umask a new file typically receives 644 rather than 755.)

The default umask, usually set for all users by the system administrator, may be modified to suit your needs. You can do that by executing the umask command with an argument, which is the mask you want. For example, if you want the default permissions to be owner with read, write, and execute (7); group with read and execute (5); and others with only execute (1), umask must be set to 777 - 751 = 026. You would execute the command as follows:

umask 026
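The effect of this mask can be checked with a sketch like the one below, which creates a directory and a file in a scratch area and then restores the previous mask. It assumes the usual behavior in which mkdir requests 777 and touch requests 666 before the mask is applied, and that no access control lists change the result.

```shell
old_mask=$(umask)      # remember the current mask
demo=$(mktemp -d)

umask 026
mkdir "$demo/newdir"   # requested 777, mask 026 leaves 751
touch "$demo/newfile"  # requested 666, mask 026 leaves 640

dir_mode=$(ls -ld "$demo/newdir" | cut -c1-10)
file_mode=$(ls -l "$demo/newfile" | cut -c1-10)
echo "$dir_mode  newdir"
echo "$file_mode  newfile"

umask "$old_mask"      # restore the original mask
rm -rf "$demo"
```

A umask set this way lasts only for the current shell session; to make it permanent it is usually placed in a login file such as .profile.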

Changing Permissions: `chmod`

You have just seen how the default permissions can be set for files and directories. There might be times when you will want to modify the existing permissions of a file or directory to suit your needs. The reason for changing permissions might be that you want to grant or deny access to one or more individuals. This can be done by using the chmod command.

With the chmod command, you specify the new permissions you want on the file or directory. The new permissions can be specified in one of the following two ways:

In a three-digit, numeric octal code
In symbolic mode

You are already familiar with the octal mode. If you wanted the file testfile to allow the owner to read, write, and execute; the group to read; and others to execute, you would need to execute the following command:

chmod 741 testfile

When using symbolic mode, specify the following:

Whose (owner, group, or others) permissions you want to change
What (+ to add, - to subtract, = to equal) operation you want to perform on the permission
The permission (r, w, x)

Assuming that the current permission of testfile is 740 (the group has read-only permission), you can execute the following command to modify the permissions of testfile so that the group has write permissions also:

chmod g+w testfile

Another example of symbolic mode is when you want others to have the same permissions as the group for a file called testfile. You can execute the following command:

chmod o=g testfile

Another example of symbolic mode is when you want to modify the permissions of the group as well as the world. You can execute the following command to add write permission for the group and eliminate write permission for the world:

chmod g+w,o-w testfile
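The symbolic examples can be followed step by step in a sketch such as this one, which starts from mode 740 and prints the permission bits after each change. The show function is a hypothetical helper defined only for this example, and the scratch file is created with mktemp. Note that the clauses in a symbolic mode are separated by a comma with no space.

```shell
demo=$(mktemp -d)
f="$demo/testfile"
touch "$f"

# Helper: print the file type and permission bits of the scratch file.
show() { ls -l "$f" | cut -c1-10; }

chmod 740 "$f";     show   # -rwxr-----
chmod g+w "$f";     show   # -rwxrw----
chmod o=g "$f";     show   # -rwxrw-rw-
chmod g+w,o-w "$f"; show   # -rwxrw-r--

final_mode=$(show)
rm -rf "$demo"
```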

Changing Owner and Group: `chown` and `chgrp`

If you wanted to change the owner of a file or directory, you could use the chown command.

CAUTION: On UNIX systems with disk quotas, only the root user may change the owner of a file or directory.

If the file testfile is owned by user guhas, to change the ownership of the file to a user friend, you would need to execute the following command:

chown friend testfile

If you wanted to change the group to which a file belongs, you may use the chgrp command. The group must be one of the groups to which the owner belongs. That is, the group must be either the primary group or one of the secondary groups of the owner. Let us assume that user guhas owns the file testfile and the group of the file is staff. Also assume that guhas belongs to the groups staff and devt. To change the group of testfile from staff to devt, execute the following command:

chgrp devt testfile
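Because the groups staff and devt exist only in this example, the sketch below uses the current user's own primary group instead, taken from the id command. It works with numeric group ids so that it does not depend on any particular group names; the scratch file is an assumption.

```shell
demo=$(mktemp -d)
touch "$demo/testfile"

# Show the user id, primary group, and secondary groups of the current user.
id

# Numeric id of the primary group.
my_gid=$(id -g)

# chgrp accepts either a group name or a numeric group id.
chgrp "$my_gid" "$demo/testfile"

# ls -n lists numeric ids; the fourth field is the group id.
file_gid=$(ls -n "$demo/testfile" | awk '{ print $4 }')
echo "group id of testfile: $file_gid"

rm -rf "$demo"
```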

`Setuid` and `Setgid`

When you execute some programs, it becomes necessary to assume the identity of a different user or group. It is possible in UNIX to set the SET USER ID (setuid) bit of an executable so that when you execute it, you will assume the identity of the user who owns the executable. For example, if you are executing a file called testpgm, which is owned by specialuser, for the duration of the execution of the program you will assume the identity of specialuser. In a similar manner, if the SET GROUP ID (setgid) bit of an executable file is set, executing that file will result in you assuming the identity of the group that owns the file for the duration of the execution of the program.

Here is an example of how the SET USER ID bit is used. Suppose you wanted a backup of all the files in the system to be done by a nightshift operator. This usually is done by the root user. Create a copy of the backup program with the SET USER ID bit set. Then, the nightshift operator can execute this program and assume the identity of the root user during the duration of the backup.
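As a minimal sketch of how the bit is set and recognized, the following example marks a scratch copy of a hypothetical program called testpgm as setuid and then locates it with find. It uses a harmless shell script owned by the current user rather than a real backup program; many UNIX systems ignore the setuid bit on shell scripts, so this shows only the permission bits, not a change of identity.

```shell
demo=$(mktemp -d)
printf '#!/bin/sh\necho "running"\n' > "$demo/testpgm"

chmod 755 "$demo/testpgm"
chmod u+s "$demo/testpgm"   # same effect as chmod 4755

# An s in place of the owner's x shows that the setuid bit is set.
suid_mode=$(ls -l "$demo/testpgm" | cut -c1-10)
echo "$suid_mode"

# -perm -4000 matches files that have the setuid bit set.
find "$demo" -type f -perm -4000 -print

rm -rf "$demo"
```

Because a setuid program owned by root runs with root's privileges no matter who starts it, an administrator would normally review the list produced by a find command like the one above on a regular basis.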

Summary

This chapter discussed what files are, the various types of files, and how to organize the files in different directories. You learned how to define permissions on files and directories. You also saw some of the commands used on files, such as ls to list files and their details, chmod to change permissions on files and directories, chown to change the ownership of a file or directory, chgrp to change the group ownership of a file, umask to display and change default permission settings for the user, and du or df to find out about utilization of disk space. You will learn more about various UNIX commands in Chapter 5.

Character	Meaning
`$`	Indicates the beginning of a shell variable name. For example, `$var` will look for a shell variable named `var`.
`|`	Pipes standard output to the next command.
`#`	Starts a comment.
`&`	Executes a process in the background.
`?`	Matches one character.
`*`	Matches zero or more characters.
`$#`	Number of arguments passed to a shell script.
`$*`	Arguments passed to a shell script.
`$?`	Return code from the previously executed command.
`$$`	Process identification number.
`>`	Output redirection operator.
`<`	Input redirection operator.
`` ` `` (backquote)	Command substitution.
`>>`	Output redirection operator (to append to a file).
`[ ]`	Lists a range of characters. `[a-z]` means all characters `a` through `z`. `[az]` means the character `a` or `z`.
`. filename`	Executes the file `filename` in the current shell.
`:`	Separator between directory names in the `PATH` variable.

Directory name	Details about the directory
`/`	Root directory. This is the parent of all the directories and files in the UNIX file system.
`/bin`	Command-line executable directory. This directory contains all the UNIX native command executables.
`/dev`	Device directory containing special files for character- and block-oriented devices such as printers and keyboards. A file called `null` existing in this directory is called the bit bucket and can be used to redirect output to nowhere.
`/etc`	System configuration files and executable directory. Most of the administrative, command-related files are stored here.
`/lib`	The library files for various programming languages such as C are stored in this directory.
`/lost+found`	This directory contains the in-process files if the system shuts down abnormally. The system uses this directory to recover these files. There is one `lost+found` directory in all disk partitions.
`/u`	Conventionally, all the user home directories are defined under this directory.
`/usr`	This directory has a number of subdirectories (such as `adm`, `bin`, `etc`, and `include`). For example, `/usr/include` has various header files for the C programming language.