Files
There are two basic concepts in Unix: processes and files. Processes do things and files keep all the important data. An efficient filesystem is important for an operating system. When Unix was conceived around 1969-70, several design decisions were taken to simplify the filesystem. The reasoning was that a simple design would be efficient and would also provide a strong foundation for software development and operations.
Another important decision taken in the design of Unix was the generalization that anything on which I/O is done is a file. So regular files, directories, devices, and interprocess communication mechanisms such as pipes, FIFOs and sockets are all files. A program accesses any of them through a file descriptor. This also simplified commands, since one command can operate on many kinds of files.
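As a minimal sketch of the "everything is a file descriptor" idea, the following Python fragment reads from a regular file and from a pipe with exactly the same calls. The names `read_all` and `demo` and the file name `note.txt` are made up for this illustration.

```python
import os
import tempfile


def read_all(fd):
    """Read from any file descriptor until end of file."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # an empty result means end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    # A regular file, accessed through a descriptor.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.txt")  # hypothetical file name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.write(fd, b"from a file\n")
        os.close(fd)
        fd = os.open(path, os.O_RDONLY)
        from_file = read_all(fd)
        os.close(fd)

    # A pipe, accessed through the same read and write calls.
    read_end, write_end = os.pipe()
    os.write(write_end, b"from a pipe\n")
    os.close(write_end)  # closing the write end lets the reader see EOF
    from_pipe = read_all(read_end)
    os.close(read_end)
    return from_file, from_pipe
```

The helper `read_all` never needs to know what kind of object is behind the descriptor it is given.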
1.0 Disk, Partitions and Filesystems
The hard disk provides the medium on which data in files can be stored. Data in files persists even when the system is switched off. There may be multiple disks and similar secondary storage media. Each disk may have one or more partitions. Each partition can have a filesystem. There is a root filesystem, which is the filesystem for the system. When the system is powered on, it is mounted and becomes available. The root filesystem has the root directory, which is the starting point for traversing the tree-structured filesystem. More filesystems can be mounted at directory nodes in the root filesystem. Once mounted, the files in the mounted filesystem appear to be part of the filesystem tree. The filesystem tree looks like this.
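To make the idea of mounting concrete, here is a small sketch that finds the directory at which the filesystem holding a given path is mounted. It walks up the tree until `os.path.ismount` reports a mount point. The function name `find_mount_point` is invented for this example.

```python
import os


def find_mount_point(path):
    """Walk up from path until a directory that is a mount point is reached."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


if __name__ == "__main__":
    # The result depends on how the local system is partitioned.
    print(find_mount_point(os.getcwd()))
```

The loop always terminates, because the root directory (/) is itself a mount point.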
2.0 Inodes
Although a filesystem appears to be a tree, with files at its nodes and directories forming sub-trees, the data structure implemented in a filesystem is not a tree and is quite involved. The most important point is that a file itself does not have a name. It is identified by an i-number, which is the index into a table of inodes, traditionally stored near the beginning of the filesystem. The inode for a file contains all the control information for the file: the file type, permissions, the owner and group ids, the file size, the number of links to the file, the timestamps of the last access, last modification and last inode change (some newer filesystems also record a creation time), and direct and indirect pointers to the blocks of data in the file. Some filesystems can also keep the data of a very small file inline in the inode.
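Most of the inode fields listed above can be seen from a program through the stat call. The sketch below uses Python's `os.stat`. The function name `inode_summary` and the dictionary keys are chosen only for this illustration.

```python
import os
import stat
import tempfile


def inode_summary(path):
    """Return the main inode fields of a file as a dictionary."""
    st = os.stat(path)
    return {
        "i-number": st.st_ino,
        "type and permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,
        "owner uid": st.st_uid,
        "group gid": st.st_gid,
        "size in bytes": st.st_size,
        "last access": st.st_atime,
        "last modification": st.st_mtime,
        "last inode change": st.st_ctime,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"12345")
        f.flush()
        for field, value in inode_summary(f.name).items():
            print(field, "=", value)
```

Note that the file name is not among the fields. It is only the argument used to find the inode.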
3.0 File Types
There are many kinds of files under Linux. The major file types are regular files, directories, symbolic links, special files, named pipes and Unix domain sockets.
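The file type is recorded in the mode field of the inode, so a program can tell these types apart. The following sketch classifies a path using the standard `stat` module. The function name `file_type` and the returned strings are assumptions made for this example.

```python
import os
import stat


def file_type(path):
    """Return a short description of the type of the file at path."""
    mode = os.lstat(path).st_mode  # lstat does not follow symbolic links
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISBLK(mode):
        return "block special file"
    if stat.S_ISCHR(mode):
        return "character special file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"


if __name__ == "__main__":
    for name in ("/", "/dev/null"):
        print(name, "->", file_type(name))
```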
3.1 Regular Files
A file is a sequence of bytes. The operating system does not put any special bytes inside a file. At the time Unix was introduced, it was customary for computers to keep records inside a file, with control data for each record. In Unix, the data inside a file is put there only by the programs that write it. The operating system does not write any control data inside a file. As another example, some systems put carriage return (CR) and line feed (LF) control characters between lines so that the file can be displayed or printed on devices. Unix just puts an LF character when the user presses ENTER to indicate a new line. When the file is being displayed, the device driver puts in a CR before every LF so that the file is displayed or printed correctly. Once again, this conforms to the basic philosophy that file contents should be exactly what the user (or program) put in. If something more is required for display on a device, it should be done by the device driver at the time of output to the device.
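A short sketch can confirm that a file holds exactly the bytes a program wrote, with a single LF ending each line and nothing added by the system. The function name `stored_bytes` and the file name `lines.txt` are hypothetical.

```python
import os
import tempfile


def stored_bytes(lines):
    """Write each line followed by LF, then return (size on disk, raw bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lines.txt")  # hypothetical file name
        with open(path, "wb") as f:
            for line in lines:
                f.write(line.encode("ascii") + b"\n")
        size = os.stat(path).st_size
        with open(path, "rb") as f:
            return size, f.read()


if __name__ == "__main__":
    print(stored_bytes(["ab", "c"]))
```

Two lines holding three characters in total take five bytes: the three characters plus two LF bytes.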
3.2 Directories
It is not necessary to remember inode numbers to access a file, because directories map names to them. A directory is a special file. It is conceptually a two-column table, mapping a file name to an inode number. The combination “filename – inode number” is called a link. These are “hard” links and can only be made to files on the same filesystem. The number of such links is kept in the inode data structure. When the number of links becomes zero and the file is not held open by any process, the inode and its data blocks are freed.
A file appears in at least one directory. Each row in a directory can be for a file or another directory. This leads to a tree-like impression of the filesystem, with the root directory (/), at the top. Also, it means that a single inode can appear with different file names in different directories.
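The link mechanism described above can be tried out with a sketch like the following. It creates a file, adds a second hard link to it, lists the directory as a table of names and inode numbers, and then removes the first name. The function name `hard_link_demo` and the file names are invented for the example.

```python
import os
import tempfile


def hard_link_demo():
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "second.txt")
        with open(first, "w") as f:
            f.write("same data\n")

        os.link(first, second)  # add a second directory entry for the inode
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        links_after_link = os.stat(first).st_nlink

        # A directory is a table of (name, i-number) rows.
        table = sorted((entry.name, entry.inode()) for entry in os.scandir(tmp))

        os.unlink(first)  # remove one name; the inode and data remain
        links_after_unlink = os.stat(second).st_nlink
        with open(second) as f:
            content = f.read()
    return same_inode, links_after_link, links_after_unlink, content, table


if __name__ == "__main__":
    print(hard_link_demo())
```

After the first name is removed, the data is still reachable through the second name, because the link count in the inode has only dropped from 2 to 1.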
3.3 Symbolic Links
Each filesystem has its own inodes. So a “hard” link to an inode (and the file) can only be made in the filesystem in which that inode is present. To make it possible to link to a file present in another filesystem, symbolic links were introduced. A symbolic link is a separate file, with its own inode, whose data is the path of the target file.
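The difference from a hard link shows up clearly in a small experiment. In the sketch below, the symbolic link has its own inode, stores the target path as its data, and is left dangling when the target is removed. The function name `symlink_demo` and the file names are assumptions for illustration.

```python
import os
import stat
import tempfile


def symlink_demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("hello\n")

        os.symlink(target, link)
        result = {
            "target": target,
            "stored_path": os.readlink(link),         # data held by the link
            "is_symlink": stat.S_ISLNK(os.lstat(link).st_mode),
            "link_inode": os.lstat(link).st_ino,      # the link itself
            "followed_inode": os.stat(link).st_ino,   # what it points to
            "target_inode": os.stat(target).st_ino,
        }

        os.unlink(target)  # the link is not counted in the target's inode
        result["dangling"] = os.path.lexists(link) and not os.path.exists(link)
    return result


if __name__ == "__main__":
    for key, value in symlink_demo().items():
        print(key, "=", value)
```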
3.4 Special Files
Special files represent devices such as a hard disk or CD-ROM drive. They are mostly found in the /dev directory. There are two types of special files: block devices and character devices. On block devices, data can only be read or written in blocks. There is no such restriction on character devices, and even a small amount of data can be read or written. For block devices, data is cached in buffers in the kernel. Block devices also need to support random access. Only filesystems on block devices can be mounted.
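A program can check whether a path is a block or character special file and read the device numbers recorded in its inode. The sketch below assumes a typical Linux system where /dev/null exists and is a character device. The function name `device_info` is hypothetical.

```python
import os
import stat


def device_info(path):
    """Return (kind, major, minor) for a special file, or None otherwise."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "character"
    else:
        return None
    # st_rdev identifies the device that the special file stands for.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    # Assumption: /dev/null is present, as on most Unix-like systems.
    print("/dev/null ->", device_info("/dev/null"))
    print("/ ->", device_info("/"))
```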
3.5 Named Pipes
FIFOs, or named pipes, are used for interprocess communication. FIFOs behave just like pipes, except that they appear in the filesystem and can be opened, read and written by any process that has permission to do so. The standard open, read and write calls for files work on FIFOs.
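The following sketch creates a FIFO with `os.mkfifo` and passes a message through it using ordinary file calls. For brevity it uses a thread as the writer instead of a separate process; the FIFO itself behaves the same way. The function name `fifo_round_trip` and the file name `demo.fifo` are invented for this example.

```python
import os
import tempfile
import threading


def fifo_round_trip(message):
    """Send message through a named pipe and return what the reader received."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical name
        os.mkfifo(path)

        def writer():
            # Opening a FIFO for writing blocks until a reader opens it.
            with open(path, "wb") as f:
                f.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        # Opening for reading blocks until the writer has opened the FIFO.
        with open(path, "rb") as f:
            data = f.read()  # returns at EOF, once the writer closes
        thread.join()
        return data


if __name__ == "__main__":
    print(fifo_round_trip(b"hello through a FIFO\n"))
```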
3.6 Unix Domain Sockets
Unix domain sockets are also used for interprocess communication between processes on the same machine. The calls used are the same as those for network sockets. Unix domain sockets are fast, and the read and write calls can be used for sending and receiving data.
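As a rough illustration, the sketch below binds a Unix domain socket to a path, checks that the path shows up in the filesystem as a socket, and sends a short message using the plain write and read calls on the socket descriptors. Both ends live in one process here only to keep the example short. The function name `unix_socket_round_trip` and the file name `demo.sock` are assumptions.

```python
import os
import socket
import stat
import tempfile


def unix_socket_round_trip(message):
    """Send a short message over a Unix domain socket bound to a path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical name
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            server.bind(path)  # creates the socket file
            server.listen(1)
            is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)

            client.connect(path)  # same calls as for network sockets
            conn, _ = server.accept()
            with conn:
                # Plain write and read on the descriptors; fine for a
                # message small enough to fit in the socket buffer.
                os.write(client.fileno(), message)
                client.shutdown(socket.SHUT_WR)  # signal end of data
                received = b""
                while True:
                    chunk = os.read(conn.fileno(), 4096)
                    if not chunk:
                        break
                    received += chunk
        finally:
            client.close()
            server.close()
    return is_socket_file, received


if __name__ == "__main__":
    print(unix_socket_round_trip(b"ping"))
```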
4.0 File handling commands
4.1 pwd
The pwd command prints the current working directory.
    $ pwd
    /home/user1/src
4.2 ls
The ls command lists files.
    $ ls -ls
    total 60
     4 drwxr-xr-x 2 user1 user1  4096 Mar 21 20:04 dbus
    28 -rw-rw-r-- 1 user1 user1 25642 Apr 15 21:00 shell.c
     4 drwxr-xr-x 5 user1 user1  4096 Feb 29 18:53 socket
     4 drwxr-xr-x 2 user1 user1  4096 Jan 12 12:17 threads
     4 drwxr-xr-x 2 user1 user1  4096 Apr 19 02:17 time
    12 -rwxr-xr-x 1 user1 user1  8296 Apr  8 09:05 try
     4 -rw-r--r-- 1 user1 user1   186 Apr  8 09:05 try.c
The -i option prints the inode number for the file.
    $ ls -lsi
    total 60
     4063680  4 drwxr-xr-x 2 user1 user1  4096 Mar 21 20:04 dbus
    52167051 28 -rw-rw-r-- 1 user1 user1 25642 Apr 15 21:00 shell.c
     4063368  4 drwxr-xr-x 5 user1 user1  4096 Feb 29 18:53 socket
    52824390  4 drwxr-xr-x 2 user1 user1  4096 Jan 12 12:17 threads
     4195245  4 drwxr-xr-x 2 user1 user1  4096 Apr 19 02:17 time
    52824007 12 -rwxr-xr-x 1 user1 user1  8296 Apr  8 09:05 try
    52824001  4 -rw-r--r-- 1 user1 user1   186 Apr  8 09:05 try.c
4.3 file
The file command prints the type of a file.
    $ file *
    acpid.pid:         ASCII text
    acpid.socket:      socket
    alsa:              directory
    avahi-daemon:      directory
    boltd:             directory
    crond.reboot:      empty
    initctl:           symbolic link to /run/systemd/initctl/fifo
    initramfs:         directory
    lock:              sticky, directory
    log:               directory
    ntpd.pid:          ASCII text, with no line terminators
    sendsigs.omit.d:   directory
    snapd-snap.socket: socket
    snapd.socket:      socket
    spice-vdagentd:    directory
4.4 du
The du command estimates the disk space used by files and, recursively, by directories.
    $ du -h
    56K     ./socket/tcp
    52K     ./socket/udp
    68K     ./socket/select
    180K    ./socket
    72K     ./time
    24K     ./threads
    20K     ./dbus
    344K    .
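A rough idea of how such an estimate can be computed is sketched below. It adds up the blocks allocated to every entry under a directory, assuming that the `st_blocks` field counts 512-byte units, as it does on Linux. Unlike the real du, this sketch counts a file with several hard links once per name. The function name `disk_usage_bytes` is hypothetical.

```python
import os


def disk_usage_bytes(root):
    """Sum the allocated space, in bytes, of root and everything below it."""
    block = 512  # assumption: st_blocks is in 512-byte units
    total = os.lstat(root).st_blocks * block
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat: count a symbolic link itself, not its target
            total += os.lstat(os.path.join(dirpath, name)).st_blocks * block
    return total


if __name__ == "__main__":
    print(disk_usage_bytes("."), "bytes allocated under the current directory")
```

The result is based on allocated blocks rather than file sizes, which is why du usually reports more than the sum of the sizes shown by ls -l.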
4.5 df
The df command reports the free space available on mounted filesystems.
    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            3.9G     0  3.9G   0% /dev
    tmpfs           784M  2.0M  782M   1% /run
    /dev/sda3        92G   16G   71G  19% /
    tmpfs           3.9G   80M  3.8G   3% /dev/shm
    tmpfs           5.0M  4.0K  5.0M   1% /run/lock
    tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
    /dev/loop3       15M   15M     0 100% /snap/gnome-characters/399
    ...             ...   ...   ...  ...  ....
    /dev/sda4       801G   81G  679G  11% /home
    tmpfs           784M   20K  784M   1% /run/user/121
    tmpfs           784M   40K  784M   1% /run/user/1000
    /dev/loop22     291M  291M     0 100% /snap/vlc/1620
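The same kind of figures can be obtained from a program with the statvfs call. The sketch below reports the size, used space and space available to ordinary users for the filesystem holding a path, all in bytes. The function name `filesystem_space` is made up for this example.

```python
import os


def filesystem_space(path):
    """Return (size, used, available) in bytes for the filesystem at path."""
    vfs = os.statvfs(path)
    size = vfs.f_blocks * vfs.f_frsize
    used = (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize
    available = vfs.f_bavail * vfs.f_frsize  # space for unprivileged users
    return size, used, available


if __name__ == "__main__":
    size, used, available = filesystem_space("/")
    print("size:", size, "used:", used, "available:", available)
```

The available figure can be smaller than size minus used, because some filesystems reserve blocks for the superuser.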
4.6 cat
The cat command prints the contents of the file passed as an argument on the terminal.
    $ cat hello.c
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        printf ("Hello, World!\n");
    }
4.7 hexdump
cat is fine for text files, but if you have a binary data file and wish to know its contents, you can try the hexdump command.
    $ hexdump -cx hello
    0000000 177   E   L   F 002 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000000    457f    464c    0102    0001    0000    0000    0000    0000
    0000010 003  \0   >  \0 001  \0  \0  \0   0 005  \0  \0  \0  \0  \0  \0
    0000010    0003    003e    0001    0000    0530    0000    0000    0000
    0000020   @  \0  \0  \0  \0  \0  \0  \0   0 031  \0  \0  \0  \0  \0  \0
    0000020    0040    0000    0000    0000    1930    0000    0000    0000
    0000030  \0  \0  \0  \0   @  \0   8  \0  \t  \0   @  \0 035  \0 034  \0
    0000030    0000    0000    0040    0038    0009    0040    001d    001c
    0000040 006  \0  \0  \0 004  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0
    0000040    0006    0000    0004    0000    0040    0000    0000    0000
    0000050   @  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0
    ...
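For each group of 16 bytes, the first line shows the bytes as characters, with non-printing bytes shown as octal values or backslash escapes. The next line shows the same bytes as two-byte hexadecimal words. The leftmost column is the offset in the file, in hexadecimal. The words are shown in the byte order of the machine, which is why the first two bytes, 177 (hex 7f) and E (hex 45), appear as 457f in this output.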
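The two-byte word display can be reproduced with a short sketch. It assumes a little-endian machine, which is what the output above suggests, and it uses single spaces between words rather than the wider columns printed by hexdump. The function name `hexdump_words` is hypothetical.

```python
def hexdump_words(data):
    """Format bytes as offset plus two-byte little-endian hex words."""
    lines = []
    for offset in range(0, len(data), 16):
        row = data[offset:offset + 16]
        if len(row) % 2:
            row += b"\0"  # pad an odd final byte with zero
        words = [
            int.from_bytes(row[i:i + 2], "little")  # assumed byte order
            for i in range(0, len(row), 2)
        ]
        lines.append("%07x " % offset + " ".join("%04x" % w for w in words))
    return lines


if __name__ == "__main__":
    # The first eight bytes of the ELF file shown above.
    for line in hexdump_words(b"\x7fELF\x02\x01\x01\x00"):
        print(line)
```

Run on the first eight bytes of the file above, this gives the words 457f 464c 0102 0001, matching the start of the hexdump output.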
4.8 Text Editor
Text files are usually created with a text editor. There are many text editors under Linux, and the choice among them is a matter of taste or preference. Popular text editors include Emacs, vi, ed, nano and gedit.