Expressions

Expressions consist of sequences of operators and operands that specify the computation of a value, designate an object or function, and/or produce side effects. There are three important components of expressions: the operators themselves, the implicit conversions that operands undergo, and the rules governing the order of evaluation and the sequencing of side effects.

The basic concepts of expressions should be familiar to students in this class. What follows is a basic summary of the syntax and use of expressions in C. Additional required readings at the end of this module cover this topic in greater depth.

Constants and Literals

A constant is a scalar value, while a literal is a representation of an object. For example, 42 is an integer constant, while "Hello world!" is a string literal that represents an array of thirteen characters (char[13]): twelve visible characters plus a terminating null byte.

Integer Constants

An integer constant begins with a digit, and three bases are recognized: base 10 (decimal), base 8 (octal), and base 16 (hexadecimal).

If the first digit is non-zero, the constant is interpreted in base 10. For example, 42 is equal to 42.

If the first character is “0” followed by a digit, the constant is interpreted in base 8. For example, 052 is equal to $(5 \times 8^1) + (2 \times 8^0) = 40 + 2 = 42$.

If the first two characters are “0x” (or “0X”), the remaining digits are interpreted in base 16, using the case-insensitive letters ‘a’ through ‘f’ to represent the values 10 through 15. For example, 0x2a is equal to $(2 \times 16^1) + (10 \times 16^0) = 32 + 10 = 42$.

A “u” suffix may be added to specify an unsigned type, and an “l” or “ll” suffix to specify a long or long long integer type; the two kinds of suffix may be combined, as in 42ul. These suffixes may also be capitalized: “U”, “L”, and “LL”. The type actually used for a given constant is the first type, in the list the C standard specifies for that suffix combination and base, that is capable of holding the value.

Floating Constants

Floating constants are recognized by the presence of a decimal point “.” or an exponent part introduced by the letter ‘e’ or ‘E’. For example, 1.2 and 1e10 are both floating constants. See the Floating Constants section of the C standard for more information. As mentioned earlier, floating-point arithmetic will not be covered in this class.

Enumeration Constants

These were described previously: an enumeration constant is an identifier that has been declared in an enumeration declaration, and it has type int.

Character Constants

A character constant is enclosed in single quotes, as in 'a', and has type int, not char. Character constants are mapped in an implementation-defined manner to values in the execution character set. Typically, the ASCII character encoding is used to map character constants to integer values; the full table is available in the ascii(7) man page. Many modern compilers also support Unicode (for example, UTF-8) character constants.

As in most programming languages, several well-known escape sequences for special characters are recognized, such as '\n' (newline), '\t' (tab), '\\' (backslash), and '\0' (the null character). The full list is given in the C standard.

String Literals

String literals are sequences of characters enclosed in double quotes, as in "Hello World!". During compilation, a null byte is appended to each string literal, and the result is stored in a character array with static storage duration. The string literal is an lvalue designating that array and has type char[n], where n is the number of characters including the null byte.

The arrays holding string literals must be treated as read-only: although their type is char[n] rather than const char[n], modifying them produces undefined behavior. It is also unspecified whether distinct literals share storage. For example, the characters of "World" may be stored inside the array holding "Hello World", so the expression "World" == "Hello World" + 6 might or might not evaluate to true, depending on the compiler and any optimizations. Likewise, multiple string literals with the same contents may or may not refer to the same array.

Recall that, as a special case, when a character array is initialized with a string literal, as in char s[] = "Hello World", the string literal is shorthand for a brace-enclosed initializer list with an implied terminating null byte: char s[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'}. No string literal object is created in this context. This differs from the behavior of string literals in every other context and is a frequent source of confusion for beginners.

Compound Literals

C supports one other type of literal called a compound, or object, literal. Compound literals represent unnamed (anonymous) objects. A compound literal names a type, enclosed in parentheses, followed by a brace-enclosed initializer list; for example, (int[]){1,2,3,4}, which evaluates to type int[4] and contains the initialized values. Compound literals can also be structs and unions, such as (struct coordinates){.x = 3, .y = 5}.

The contents of a compound literal are modifiable unless its type is const-qualified. Its lifetime is the same as it would be had the object been explicitly declared in the same location: static storage duration at file scope, and automatic storage duration (ending with the enclosing block) inside a function body.

Operators

Basic Operators

These basic operators are available in most programming languages with consistent behavior. Familiarity with each of the operators mentioned in this section is an expectation of this class; remedial self-study may be required for some students.

Scalar Arithmetic

Operator    Description
+a          Unary plus
-a          Unary minus (negation)
a + b       Addition
a - b       Subtraction
a * b       Multiplication
a / b       Division
a % b       Remainder

Bitwise operators

Operator    Description
~a          Bitwise not
a & b       Bitwise and
a | b       Bitwise or
a ^ b       Bitwise xor
a << b      Bitwise shift left
a >> b      Bitwise shift right

Code Smell

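These operators give representation-dependent results, or produce undefined or implementation-defined behavior, when supplied with negative operands: for example, right-shifting a negative value is implementation-defined, and left-shifting a negative value is undefined. Shifting by a negative count, or by a count greater than or equal to the width of the promoted left operand, is undefined for any operand type. In general, these operators should only be used with unsigned types, or used very carefully with signed values that are provably non-negative.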

Assignment Operators

Operator    Description
a = b       Assignment
a += b      Addition assignment
a -= b      Subtraction assignment
a *= b      Multiplication assignment
a /= b      Division assignment
a %= b      Remainder assignment
a &= b      Bitwise and assignment
a |= b      Bitwise or assignment
a ^= b      Bitwise xor assignment
a <<= b     Bitwise shift left assignment
a >>= b     Bitwise shift right assignment

For all of the assignment operators of the form @=, where @ is a placeholder for an operator, a @= b is equivalent to a = a @ (b), except that a is evaluated only once. This matters when evaluating a produces side effects.

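The value of an assignment expression is the value stored in the left operand after the assignment (that is, converted to the left operand's type), so assignment expressions can be embedded in larger expressions. Because assignment is right-associative, a = b = c = 42 is parsed as a = (b = (c = 42)), which stores 42 in c, b, and a.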

Boolean Operators

Operator    Description
!a          Logical not
a || b      Logical or
a && b      Logical and
a == b      Equality
a != b      Inequality
a < b       Less than
a > b       Greater than
a <= b      Less than or equal
a >= b      Greater than or equal

All of these operators evaluate to an int value of either 1 or 0.

The operators || and && perform short-circuit evaluation: the left-hand operand is evaluated first, and the right-hand operand is evaluated only if the left-hand result does not already determine the overall result (that is, only if the left operand of && is nonzero, or the left operand of || is zero). Thus, the order of the operands is significant, affecting both program behavior and performance.

Warning

Signed integer overflow is undefined behavior. It can occur with the addition, subtraction, multiplication, division (as in INT_MIN / -1), and left shift operators, as well as with unary minus (as in -INT_MIN). Unsigned arithmetic, on the other hand, simply wraps around (it is performed modulo one more than the type's maximum value), so UINT_MAX + 1 equals 0, and converting -1 to an unsigned type results in the largest value that type can hold.

C-specific Operators

Member access operators

Operator    Description
a[b]        Array subscript
*a          Pointer dereference
&a          Address of
a.b         Member access
a->b        Member access through pointer

The array subscript operator a[b] is exactly equivalent to *(a + b).

The pointer dereference operator *a designates the object that a points to. The operand a must be a pointer type, T*, and the result is an lvalue of type T. It is undefined behavior to dereference a pointer that does not point at a valid object.

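The address-of operator &a evaluates to the address of a, which must be an lvalue (or a function designator) that is neither a bit-field nor declared register. If a has type T, the result has type T*.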

The member access operator a.b designates the member b of the structure or union a, while a->b designates the member b of the structure or union pointed at by a. The a->b operator is exactly equivalent to (*a).b.

Other operators

Operator    Description
a(...)      Function call operator
a, b        Comma operator
(type) a    Cast operator
a ? b : c   Conditional (ternary) operator
sizeof      sizeof operator

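The function call operator a(...) requires that a have type pointer to function (a function name automatically decays to such a pointer). When a prototype is in scope, each argument must be convertible, as if by assignment, to the type of the corresponding parameter of a.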

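The comma operator a, b evaluates a, discards its result, then evaluates b; the result has the type and value of b. The comma operator is rarely used, most often in for loop headers. Note that the commas separating function arguments or declarators are not comma operators.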

The cast operator (type) a converts the value of a to the named type. Conversions are discussed further below.

The ternary (conditional) operator a ? b : c first evaluates a. If a compares unequal to 0, the expression evaluates b; otherwise, it evaluates c. Only one of b and c is evaluated. It is equivalent to Python's b if a else c expression, with the operands in a different order.

The sizeof operator evaluates to the size, in bytes, of a particular type or object, as a value of type size_t. There are two forms:

  • For a given type, T, sizeof (T) evaluates to the size, in bytes, of T.

  • For an expression, e, which evaluates to a value of type T, sizeof e evaluates to the size, in bytes, of T. The expression e itself is not evaluated, unless its type is a variable length array type.

Code Smell

Avoid the sizeof (T) form, where T is a type name; there is usually an expression that could be used instead. That way, if the type of the expression ever changes, every sizeof expression referring to it is updated automatically. A common example is dynamically allocating an array of objects with malloc, as in:

T *my_array;
/* ... */
my_array = malloc(sizeof *my_array * array_size);

which automatically updates the size expression if the type of my_array changes, unlike my_array = malloc(sizeof (T) * array_size), which does not. This is an example of a programming principle called DRY: Don't Repeat Yourself.

Implicit Conversions

Implicit type conversion is performed automatically, without requiring an explicit cast. There are three basic kinds of implicit conversion:

Type Coercion

Type coercion refers to converting the operands of an expression to appropriate types for carrying out each operation. In C, coercion occurs when assigning, passing arguments, returning values, and initializing, whenever the value's type differs from the target type, and when performing arithmetic on operands of different types (the usual arithmetic conversions).

Type Promotion

When a value whose integer type has a lower conversion rank than int (such as char, short, or their unsigned variants) is used as an operand of most arithmetic and bitwise operators, it is implicitly converted to int or unsigned int; this is called integer promotion. If int can represent the entire range of values of the original type, the value is converted to int; otherwise, it is converted to unsigned int.

Implementation Detail

This reflects the intent that a plain int have the natural size suggested by the architecture of the target machine (although on most 64-bit platforms, int remains 32 bits). Promoting smaller values to this size allows calculations to be performed as efficiently as possible. Additionally, some processors cannot operate directly on data types smaller than int.

Type Decay

Type decay refers to the automatic conversion of certain types to other types with some loss of information. In C, an expression of array type decays into a pointer to the array's first element (losing the array's length), a function designator decays into a pointer to the function, and an lvalue decays into the value stored in the object it designates (the standard calls this lvalue conversion). Decay does not occur in certain contexts, such as when an array is the operand of the address-of operator & or of sizeof, or when a string literal initializes a character array.

Order of Evaluation

Within an expression, it is important to understand the order in which sub-expressions are evaluated. In C, a sequence point marks a settled state: all side effects of earlier evaluations are complete, and no side effects of later evaluations have yet taken place. Between two consecutive sequence points, sub-expressions may be evaluated in an unspecified order.

In general, there is a sequence point after a function's designator and arguments have been evaluated but before the function is actually called, at the end of each full expression, and at the end of each full declarator. There is also a sequence point after the evaluation of the first operand of the ||, &&, ?:, and comma operators.

If an object is modified between two sequence points and is also modified again, or read for any purpose other than computing the new value to be stored, the behavior is undefined. Common examples include:

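f(i, i++);  /* undefined: i is read and modified in unsequenced arguments */
i = ++i;    /* undefined in C: two unsequenced modifications of i */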