Data is an important part of a program. In fact, programs are written so that data can be captured, processed, stored and presented to the user. The success of a program depends on how well its data is organized and used. In this post, we look at data types and expressions in the C language.
1.0 Data Type
All data has an underlying type. The number of persons in a room is an integer; it cannot be a fraction. On the other hand, the average annual rainfall of a city is a floating point number, because it can have a fractional part. The basic data types in C are:
Type | Description |
---|---|
char | character |
int | integer |
float | single precision floating point |
double | double precision floating point |
1.1 char
The basic type char is for storing characters. A char is really a small integer; it stores the integer code of a character (for example, 65 for 'A' in ASCII). A char occupies one byte.
The qualifier signed or unsigned can be added to char. With 8-bit bytes, an unsigned char variable can store values in the range 0 through 255, and a signed char variable can store values from -128 through 127 on two's complement machines. Whether a plain char is signed or unsigned is implementation-defined.
1.2 int
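The basic type int is for storing integers. On most current systems an int is 32 bits long. The qualifiers short and long can be applied to int. A short int is typically 16 bits long, and a long int is 64 bits long on 64-bit Linux and macOS but 32 bits on Windows and on 32-bit systems. The standard only guarantees that short and int are at least 16 bits, that long is at least 32 bits, and that short is no longer than int, which is no longer than long. Normally “int” is omitted after short and long; that is, “short x” means “short int x” and “long y” means “long int y”. The signed or unsigned qualifier may also be used with int. Since int is signed by default, unsigned is the qualifier seen more often; unsigned int is commonly used for bit masks.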
1.3 float and double
The float, double and long double types are single, double and extended precision floating point types respectively. A float typically occupies 4 bytes and a double 8 bytes; the size of long double varies (16 bytes with GCC on x86-64 Linux, 8 bytes with Microsoft's compiler). Of the three, double is the most commonly used, as it provides a balance between accuracy and economy of storage space.
2.0 Identifiers
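Identifier names can be constructed from uppercase and lowercase letters, digits and the underscore. The first character must be a letter or an underscore, but names beginning with an underscore are best avoided because many of them are reserved for the implementation. Identifiers are case sensitive, so count and Count are different names. As a convention, symbolic constant names are written in uppercase, whereas variable names are written in lowercase. In both cases, digits and underscores may be used.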
3.0 typedef
typedef defines a new name for an existing type. This helps in giving meaningful names to complicated declarations. For example, the typedefs
```c
typedef unsigned char __uint8_t;
typedef unsigned short int __uint16_t;
typedef unsigned int __uint32_t;
typedef unsigned long int __uint64_t;

typedef __uint8_t uint8_t;
typedef __uint16_t uint16_t;
typedef __uint32_t uint32_t;
typedef __uint64_t uint64_t;
```
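define easy-to-remember names for 8, 16, 32 and 64-bit unsigned integers. (These definitions match a 64-bit system, where unsigned long int is 64 bits. Names beginning with two underscores are reserved for the implementation, so your own code should not define them.) You do not need to write these typedefs in your C program: include the header stdint.h, and the types uint8_t, uint16_t, uint32_t and uint64_t become available. For example, consider the following program.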
```c
// try2.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main ()
{
    // __WORDSIZE comes from the glibc headers; it is not part of standard C
    printf ("__WORDSIZE = %d bits\n", __WORDSIZE);
    printf ("sizeof (int) = %d bytes, sizeof (int *) = %d bytes\n",
            (int) sizeof (int), (int) sizeof (int *));
    printf ("sizeof (uint8_t) = %d byte\n", (int) sizeof (uint8_t));
    printf ("sizeof (uint16_t) = %d bytes\n", (int) sizeof (uint16_t));
    printf ("sizeof (uint32_t) = %d bytes\n", (int) sizeof (uint32_t));
    printf ("sizeof (uint64_t) = %d bytes\n", (int) sizeof (uint64_t));
}
```
After compiling and running the above program on a 64-bit system, we get the following results.
```
$ gcc try2.c -o try2
$ ./try2
__WORDSIZE = 64 bits
sizeof (int) = 4 bytes, sizeof (int *) = 8 bytes
sizeof (uint8_t) = 1 byte
sizeof (uint16_t) = 2 bytes
sizeof (uint32_t) = 4 bytes
sizeof (uint64_t) = 8 bytes
```
4.0 Enumeration
An enumeration is a list of named integer constants. By default, the first constant has the value zero and each subsequent constant has the value of its predecessor plus 1. The default values can be overridden by explicit assignment. For example, consider the enumeration,
enum Day {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday};
Here, the value of Sunday is zero and that of Saturday is 6. For example,
```c
#include <stdio.h>
#include <string.h>

enum Day {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday};

int main ()
{
    enum Day weekday = Saturday;
    printf ("weekday = %d\n", weekday);
}
```
The above program prints weekday as 6. If we had defined the enumeration as
enum Day {Sunday = 1, Monday, Tuesday, Wednesday = 7, Thursday, Friday, Saturday};
the above program would have printed weekday as 10 (for Saturday).
5.0 Boolean type
The original C language did not have a boolean data type; the integer types served that purpose. The value zero (0) is considered false and anything non-zero is true, so 1 can be used for true. We can provide a boolean type with an enumeration, as in the example below.
```c
#include <stdio.h>
#include <string.h>

typedef enum {False, True} Boolean;

int main ()
{
    Boolean over = False;
    int i = 0;

    printf ("over = %d\n", over);
    while (!over) {
        printf ("i = %d\n", i);
        i++;
        if (i > 2)
            over = True;
    }
    printf ("over = %d\n", over);
}
```
After compiling and running the program, we get the following results.
```
$ gcc try4.c -o try
$ ./try
over = 0
i = 0
i = 1
i = 2
over = 1
```
It is not really necessary to define the Boolean enumeration yourself. The C99 standard added the built-in type _Bool, and the header stdbool.h defines the macros bool, true and false, so a bool variable can have the value false or true. Using stdbool.h, the program becomes,
```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int main ()
{
    bool over = false;
    int i = 0;

    printf ("over = %d\n", over);
    while (!over) {
        printf ("i = %d\n", i);
        i++;
        if (i > 2)
            over = true;
    }
    printf ("over = %d\n", over);
}
```
And, the result is the same as before.
```
$ gcc try4.c -o try
$ ./try
over = 0
i = 0
i = 1
i = 2
over = 1
```
6.0 Declarations
All variables need to be declared before use. We have seen declarations such as,
```c
int i;
char buffer [20];
double radius;
```
C distinguishes between declarations and definitions of variables. A definition introduces the name of the variable and sets aside storage for it. For example,
int num;
is a definition of the variable num, as it introduces the name num and sets aside storage for it. But there are cases where the variable is defined in some other file and we just need its name to use it. For example,
extern int counter;
Here, the declaration extern int counter says that counter is an integer defined elsewhere. No storage is set aside for it; only the name is introduced. The name counter can now be used in expressions.
6.1 const qualifier
The const qualifier in a declaration specifies that the value of the variable will not be changed; the compiler rejects any attempt to assign to it. For example,
```c
const double pi = 3.14159265358979323846;
const int scores [] = {34, 67, 98, 23};
```
When const is applied to an array, the values of the array elements cannot be changed.
7.0 Storage class
There are two properties associated with variable names, scope and lifetime. The scope of a variable name is the portion of the program in which the variable name is visible and can be used. The lifetime of a variable is the time the variable is “live”; the time during which a variable has valid memory and retains its value. Collectively, the scope and lifetime define the storage class of a variable. There are four storage classes: automatic, external, static and register.
7.1 Automatic
Automatic variables are defined inside functions, and their storage is typically allocated on the stack. They can be defined using the keyword auto; however, auto is the default and is generally omitted. The scope and lifetime of an automatic variable extend from the point of definition to the end of the enclosing block. Automatic variables are not initialized implicitly; unless you initialize one, its initial value is indeterminate (garbage).
7.2 External
As opposed to automatic variables, which are internal to functions, there are variables that are external to functions. These are global variables, and they are stored in the program's data area (the data or bss segment). The scope of such a variable extends from the point of definition to the end of the source file. However, if there is an automatic variable with the same name, the automatic variable takes precedence and hides the global variable within its block. The lifetime of an external variable is the lifetime of the program. An external variable without an explicit initializer is initialized to zero. The difference between definition and declaration is most significant for external variables: an external variable is defined in exactly one file and declared with the extern qualifier in the other files of the program. Once declared, it can be used in expressions.
7.3 Static
We have seen that automatic variables come and go with function calls; once a function returns, the values of its automatic variables are lost. However, if a variable inside a function is declared with the static keyword, it retains its value between function invocations: its lifetime is that of the program, while its scope is still the enclosing block. If an external variable or a function is declared static, it is visible only in the file in which it is defined and cannot be accessed from other files.
7.4 Register
Register variables are automatic variables. Putting the keyword register in front of a variable suggests to the compiler that the variable will be heavily used in calculations, so the compiler may place it in a machine register. The compiler is free to ignore the suggestion and place the variable wherever it deems fit. However, it is not possible to take the address of a variable declared with the register keyword, and this is true even when the variable is not actually stored in a register.
8.0 Operators
8.1 Arithmetic Operators
C has the four binary arithmetic operators, +, -, * and /, for addition, subtraction, multiplication and division respectively. The precedence of * and / is higher than that of + and -. Binary arithmetic operators associate left to right. Then there is the modulus operator, %, which gives the remainder of the division of two integers. If x and y are two integers, x / y gives the integer quotient with the fractional part discarded (since C99, truncation is toward zero, so -7 / 2 is -3), and x % y gives the remainder. If x is divisible by y, the remainder is zero. The % operator cannot be applied to float or double operands; the library function fmod in math.h serves that purpose.
C also has the unary + and - operators. The precedence of unary + and - is higher than that of binary *, / and %. Unary + and - associate right to left.
8.2 Relational Operators
The relational operators are <, <=, >, >=, == and !=. The first four, <, <=, > and >=, have the same precedence; == and != have a lower precedence. The relational operators associate left to right and have lower precedence than the binary arithmetic operators, so i < lim - 1 means i < (lim - 1). A word of caution about the equality operator, ==: it is a common error to write = instead. The code still compiles, because = is the assignment operator, so if (x = 0) assigns 0 to x and the condition is always false. This error is common and can be hard to find; compiling with gcc -Wall produces a warning when an assignment is used as a condition.
8.3 Logical Operators
There are two logical operators, && and ||. These have lower precedence than the relational operators, and && has higher precedence than ||. The value of an expression involving relational or logical operators is 1 if it is true and 0 if it is false. Expressions involving logical operators are evaluated left to right, and evaluation stops as soon as the truth or falsehood of the whole expression is known. For example, consider the statement,
while (x < a || y < b || z < c) ...
In the above example, if (x < a) evaluates as true, the other two conditions are not checked and the loop continues. If (x < a) evaluates as false, then (y < b) is checked. If (y < b) evaluates as true, the third condition is not checked and the loop continues. If both (x < a) and (y < b) evaluate as false, then the third condition, (z < c), is checked; if it evaluates as true, the loop continues, and if it evaluates as false, the loop terminates.
8.4 Logical negation operator
The logical negation operator, !, converts a non-zero operand to 0 and a zero operand to 1. This is quite useful in writing conditions for while loops. For example,
int over = 0; while (over == 0) ...
can be written as,
int over = 0; while (!over) ...
which reads more naturally, as “while not over”.
9.0 Type conversion
In an expression, there may be implicit type conversions. The basic principle is that the operand of the “lower” type is promoted to the “higher” type before the expression is evaluated. For example, if an int is combined with a float, the int is converted to float; if a float is combined with a double, the float is converted to double. char values are treated as small integers (they are promoted to int in expressions), so char can be freely mixed with integers. The language specification guarantees that the characters of the machine's basic character set, such as letters and digits, are non-negative when stored in a char; other values stored in a plain char may be negative on machines where char is signed.
An important type conversion is the type cast, in which we explicitly convert a value to a particular type. For example, the function sqrt expects a double argument and returns a double, and we wish to find the square root of an integer. While passing the integer to sqrt, we cast it to double. (Because math.h declares the prototype of sqrt, the compiler would perform this conversion automatically; the cast makes the intent explicit.)
```c
#include <stdio.h>
#include <string.h>
#include <math.h>

int main ()
{
    int num = 99;
    double ret = sqrt ((double) num);

    printf ("square root = %f\n", ret);
}
```
We can compile and run this program.
```
$ gcc try2.c -o try2 -lm
$ ./try2
square root = 9.949874
```
In the above example, it is as if num is assigned to a variable of type double, which is passed to the sqrt function. The value of variable num is not affected.
10.0 Increment and decrement operators
C has increment (++) and decrement (--) operators. The increment operator adds 1 to its operand, while the decrement operator subtracts 1. The operators can be used as a prefix, that is, before the operand, or as a postfix, after the operand. For example,
```c
// add 1 to i (prefix)
++i;
// add 1 to i (postfix)
i++;
// subtract 1 from i (prefix)
--i;
// subtract 1 from i (postfix)
i--;
```
So, what is the difference between prefix and postfix? With prefix, the value is incremented or decremented before it is used. With postfix, the operand is incremented or decremented after its value is used. If these operators are used on their own, as in the examples above, it does not matter whether prefix or postfix is used; the effect is the same. But consider the case,
k = a [j++];
First a [j] is assigned to k, and then, the index j is incremented. As another example, consider,
i = *++ptr;
which first increments the pointer ptr and then assigns the value pointed to by ptr to i. But if we use the postfix increment,
i = *ptr++;
the value pointed to by ptr is first assigned to i, and then ptr is incremented.
11.0 Assignment operators
The assignment operator evaluates the expression on the right and stores the value in the object on the left. The left side of an assignment must be a modifiable lvalue, such as a variable, an array element or an object reached through a pointer. Quite often, we find assignment statements such as,
i = i + 10;
In C, this can be written as,
i += 10;
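which is more compact and says directly “add 10 to i”. It can also help when the left side is a complicated expression, since that expression is written, and evaluated, only once. In C,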
var op= expr;
is a shorthand for
var = var op (expr);
The op can be any one of the binary arithmetic operators +, -, *, /, %, or any one of the binary bitwise operators <<, >>, &, | and ^.
It is important to note the parentheses around expr. So
x *= y + 10;
means
x = x * (y + 10);
and, not,
x = x * y + 10;
12.0 Bitwise operators
C has the following binary bitwise operators: & for bitwise AND, | for bitwise inclusive OR, ^ for bitwise exclusive OR, << for left shift and >> for right shift. There is also the unary ~ operator for one's complement. These bitwise operators can be applied to all integer types: char, signed short, unsigned short, signed int, unsigned int, etc. They cannot be applied to float, double and long double. Be careful with signed integers: right-shifting a negative value is implementation-defined, and most compilers fill in copies of the sign bit on the left, whereas the expectation might have been zeros. So, by default, unsigned integers should be used for bitwise operations, and signed integers only when they are really required.
Bit masks are not signed quantities either. We often encounter flags, which are of type int and are used as bit masks; the individual bits of these flags signify particular settings. For example, consider the open system call, which can create and open a new file.
int open (const char *pathname, int flags, mode_t mode);
The third parameter, mode, specifies the file permissions of the newly created file. The bit masks (in octal) for some of the permissions are,
Mask | Value | Description |
---|---|---|
S_IRWXU | 00700 | The user (file owner) has read, write, and execute permissions. |
S_IRUSR | 00400 | The user has read permission. |
S_IWUSR | 00200 | The user has write permission |
S_IXUSR | 00100 | The user has execute permission |
S_IRWXG | 00070 | The group has read, write, and execute permissions. |
S_IRGRP | 00040 | The group has read permission. |
S_IWGRP | 00020 | The group has write permission. |
S_IXGRP | 00010 | The group has execute permission. |
S_IRWXO | 00007 | Others have read, write, and execute permissions |
S_IROTH | 00004 | Others have read permission. |
S_IWOTH | 00002 | Others have write permission. |
S_IXOTH | 00001 | Others have execute permission. |
We can set bits in the operand with the | operator. To clear bits, we AND the operand with a mask that has the relevant bits set to 0 and the rest set to 1; applying ~ to the mask of the bits to be cleared produces such a mask. For example,
```c
#include <stdio.h>
#include <sys/stat.h>    // mode_t and the S_I* permission masks

int main (void)
{
    mode_t mode;

    mode = 0;
    // Set read, write and execute permissions for user
    mode |= S_IRWXU;
    // Set read, write and execute permissions for group
    mode |= S_IRWXG;
    // Set read, write and execute permissions for others
    mode |= S_IRWXO;
    // Remove read, write and execute permissions for others
    mode &= ~S_IRWXO;
    // Check whether read, write and execute bits are all set for user
    if ((mode & S_IRWXU) == S_IRWXU)
        printf ("user has rwx, mode = %o\n", (unsigned) mode);   // mode = 770
    return 0;
}
```
Note the expression that checks whether the read, write and execute bits are set for the user. A common error is to test only whether (mode & S_IRWXU) is true. That expression is non-zero if any one of the three bits is set. To check that all three are set, first compute (mode & S_IRWXU) and then check whether it is equal to S_IRWXU.
13.0 Conditional expressions
Consider the if statement,
if (a >= 0) x = a; else x = -a;
The if statement can be replaced with a conditional expression using the ternary operator, ?:
x = (a >= 0) ? a : -a;
The ternary operator combines three expressions
expr1 ? expr2 : expr3
First expr1 is evaluated. If it is true (non-zero), expr2 is evaluated and its value is the value of the whole expression. Otherwise, expr3 is evaluated and its value is the value of the whole expression. Only one of expr2 and expr3 is evaluated.
14.0 Precedence and associativity table
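The following table gives the precedence and associativity rules for operators in C, from highest to lowest precedence. Operators on the same row have the same precedence. The unary +, -, * (indirection) and & (address-of) in the second row have higher precedence than their binary forms further down.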
Operator | Associativity |
---|---|
() [] -> . | left to right |
! ~ ++ -- + - * & (type) sizeof | right to left |
* / % | left to right |
+ - | left to right |
<< >> | left to right |
< <= > >= | left to right |
== != | left to right |
& | left to right |
^ | left to right |
\| | left to right |
&& | left to right |
\|\| | left to right |
?: | right to left |
= += -= *= /= %= &= ^= \|= <<= >>= | right to left |
, | left to right |