Expressions
Expressions are sequences of operators and operands that specify the computation of a value, designate an object or function, produce side effects, or some combination of these. There are three important aspects of expressions: the operators themselves, the implicit conversions that operands undergo, and the rules governing the order of evaluation and the sequencing of side effects.
The basic concepts of expressions should be familiar to students in this class. What follows is a basic summary of the syntax and use of expressions in C. Additional required readings at the end of this module cover this topic in greater depth.
Constants and Literals
A constant is a scalar value, while a literal is a representation of an object. For example, 42 is an integer constant, while "Hello world!" is a string literal which represents an array of thirteen characters (char[13]).
Integer Constants
An integer constant begins with a digit, and there are three recognized bases: base 10–decimal, base 8–octal, and base 16–hexadecimal.
If the first digit is non-zero, the constant is interpreted in base 10. For example, 42
is equal to 42.
If the first character is “0”, followed by a digit, then the constant is interpreted in base 8. For example, 052 is equal to 5 × 8 + 2 = 42.
If the first two characters are “0x”, then the remaining digits are interpreted in base 16, using the (case-insensitive) letters ‘a’ through ‘f’ to represent values 10 through 15. For example, 0x2a is equal to 2 × 16 + 10 = 42.
An additional “u” suffix may be added to specify an unsigned type, and an “l” or “ll” suffix to specify a long or long long integer type. These suffixes may also be capitalized: “U”, “L”, and “LL”. The actual type used for a given suffix combination is the first type, in the list that the C standard gives for that suffix and base, that can represent the specified value.
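The rules above can be checked with a short sketch. The function names here are hypothetical and exist only for illustration, and the second function compares sizes only, which is weaker evidence than comparing types.

```c
#include <assert.h>

/* Hypothetical helper for this sketch: adds up three spellings of 42. */
int sum_of_spellings(void)
{
    int dec = 42;    /* base 10: first digit is non-zero */
    int oct = 052;   /* base 8: leading 0, so 5 * 8 + 2 */
    int hex = 0x2A;  /* base 16: leading 0x, so 2 * 16 + 10 */

    assert(dec == oct && oct == hex);
    return dec + oct + hex;
}

/* Hypothetical helper: returns 1 if each suffixed constant has the same
 * size as the type its suffix asks for. */
int suffix_sizes_match(void)
{
    return sizeof(42u)   == sizeof(unsigned int)
        && sizeof(42L)   == sizeof(long)
        && sizeof(42ull) == sizeof(unsigned long long);
}
```

Calling sum_of_spellings() from main should return 126, since all three constants denote the same value.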
Floating Constants
Floating constants are recognized by the presence of a decimal “.” or an exponent part with the letter ‘e’. For example, 1.2
and 1e10
are both recognized as floating point values. See Floating Constants for more information. As mentioned earlier, floating point arithmetic will not be covered in this class.
Enumeration Constants
These were previously described–an enumeration constant is an identifier that has been declared in an enumeration declaration, and has type int
.
Character Constants
A character constant is enclosed in single quotes, as in 'a', and has type int, not char. Character constants are mapped in an implementation-defined manner to values in the execution character set. Typically, the ASCII character encoding is used to map character constants to integer values; the mapping is listed in the ascii(7) man page. Many modern compilers also support Unicode UTF-8 character constants.
As with most programming languages, several well-known escape sequences for special characters are recognized, as described in the C standards.
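A minimal sketch of these points follows. It assumes an ASCII execution character set (so that 'A' is 65) and must be compiled as C rather than C++, where character constants have type char. The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 because 'a' has type int in C. */
int char_constant_is_int_sized(void)
{
    return sizeof('a') == sizeof(int);
}

/* Hypothetical helper: returns the value of 'A', which is 65 only
 * under the ASCII assumption stated above. */
int value_of_capital_a(void)
{
    assert('\0' == 0);         /* the null character is always zero */
    assert('\x41' == '\101');  /* hex and octal escapes for the same value, 65 */
    return 'A';
}
```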
String Literals
String literals are sequences of characters enclosed in double quotes, as in "Hello World!". During compilation, a string literal has a null byte appended to it, and the result is stored in a character array with static storage duration. The string literal is an lvalue designating that array and has type char[n], where n is the number of characters, including the null byte.
The arrays holding string literals must not be modified: attempting to do so produces undefined behavior. It is also unspecified whether they overlap. For example, "Hello World" and "World" may refer to different locations in the same underlying character array, so that the expression "World" == "Hello World" + 6 might or might not evaluate as true depending on the compiler and any optimizations. Additionally, multiple string literals with the same contents may or may not refer to the same array.
Recall that, as a special case, when initializing a character array with a string literal, e.g. char s[] = "Hello World", the string literal is shorthand for a brace-enclosed initializer list with an implied terminating null byte, e.g. char s[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'}. No string literal object is created in this context, which is explicitly different from the behavior of string literals in all other contexts, and is often a source of confusion for beginners.
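The difference between a string literal and an array initialized from one can be sketched as follows. The function names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: size of the array that a string literal designates. */
size_t literal_array_size(void)
{
    assert(strlen("Hello World!") == 12);  /* strlen does not count the null byte */
    return sizeof("Hello World!");         /* the array type does: char[13] */
}

/* Hypothetical helper: edits an array that was initialized from a literal. */
char edit_array_copy(void)
{
    char s[] = "Hello World";       /* s is its own 12-element array */
    const char *p = "Hello World";  /* p points at a literal; writing through
                                       it would be undefined behavior */
    s[0] = 'J';                     /* fine: s is an ordinary modifiable array */
    assert(p[0] == 'H');
    return s[0];
}
```

Declaring pointers to string literals as const char *, as done for p here, lets the compiler reject accidental writes.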
Compound Literals
C supports one other type of literal called a compound, or object, literal. Compound literals represent unnamed (anonymous) objects. A compound literal names a type, enclosed in parentheses, followed by a brace-enclosed initializer list; for example, (int[]){1,2,3,4}, which has type int[4] and contains the initialized values. Compound literals can also be structs and unions, such as (struct coordinates){.x = 3, .y = 5}.
The contents of a compound literal are modifiable, unless the type is const-qualified, and the lifetime of a compound object is the same as it would be had the object in question been explicitly declared in the same location.
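A short sketch of compound literals in use is given below. The definition of struct coordinates is an assumption (the text names the type but not its member types), and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct coordinates { int x; int y; };  /* member types are assumed */

/* Hypothetical helper: sums the first count elements of values. */
static int sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

int sum_of_literal(void)
{
    return sum((int[]){1, 2, 3, 4}, 4);  /* unnamed int[4] passed directly */
}

int modify_literal(void)
{
    int *p = (int[]){1, 2, 3, 4};  /* lives until the enclosing block ends */
    p[0] = 10;                     /* allowed: the element type is not const */
    return p[0] + p[3];
}

int coordinate_sum(void)
{
    struct coordinates c = (struct coordinates){.x = 3, .y = 5};
    assert(c.x == 3 && c.y == 5);
    return c.x + c.y;
}
```

Because the array in modify_literal has automatic storage, returning p itself from the function would leave the caller with a dangling pointer.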
Operators
Basic Operators
These basic operators are available in most programming languages with consistent behavior. Familiarity with each of the operators mentioned in this section is an expectation of this class; remedial self-study may be required for some students.
- Scalar Arithmetic
+a
Unary plus
-a
Unary minus (negation)
a + b
Addition
a - b
Subtraction
a * b
Multiplication
a / b
Division
a % b
Remainder
- Bitwise operators
~a
Bitwise not
a & b
Bitwise and
a | b
Bitwise or
a ^ b
Bitwise xor
a << b
Bitwise shift left
a >> b
Bitwise shift right
Code Smell
The bitwise operators give inconsistent results, or produce undefined or implementation-defined behavior, when supplied with negative operands. In general, they should only be used with unsigned types, or used very carefully with signed values that are provably non-negative.
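Following that advice, a minimal sketch of common bit manipulations on unsigned values might look like this. The helper names are hypothetical; each requires the bit index n to be less than the width of unsigned int, which the assertions check.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns value with bit n set. */
unsigned set_bit(unsigned value, unsigned n)
{
    assert(n < sizeof value * CHAR_BIT);
    return value | (1u << n);
}

/* Hypothetical helper: returns value with bit n cleared. */
unsigned clear_bit(unsigned value, unsigned n)
{
    assert(n < sizeof value * CHAR_BIT);
    return value & ~(1u << n);
}

/* Hypothetical helper: returns value with bit n inverted. */
unsigned toggle_bit(unsigned value, unsigned n)
{
    assert(n < sizeof value * CHAR_BIT);
    return value ^ (1u << n);
}

/* Hypothetical helper: returns 1 if bit n of value is set, otherwise 0. */
int test_bit(unsigned value, unsigned n)
{
    assert(n < sizeof value * CHAR_BIT);
    return (value >> n) & 1u;
}
```

The constant 1u, rather than 1, keeps every shift in unsigned arithmetic.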
- Assignment Operators
a = b
Assignment
a += b
Addition assignment
a -= b
Subtraction assignment
a *= b
Multiplication assignment
a /= b
Division assignment
a %= b
Remainder assignment
a &= b
Bitwise and assignment
a |= b
Bitwise or assignment
a ^= b
Bitwise xor assignment
a <<= b
Bitwise shift left assignment
a >>= b
Bitwise shift right assignment
For all of the assignment operators of the form @=, where @ is a placeholder for an operator, a @= b is equivalent to a = a @ b except that a is only evaluated once; this may be relevant if the evaluation of a produces side effects.
The result of an assignment expression is the value stored in the left-hand operand after the assignment, so assignment expressions can be embedded in larger expressions. For example, a = b = c = 42 is parsed as a = (b = (c = 42)) and stores 42 in c, b, and a.
- Boolean Operators
!a
Logical not
a || b
Logical or
a && b
Logical and
a == b
Equality
a != b
Inequality
a < b
Less than
a > b
Greater than
a <= b
Less than or equal
a >= b
Greater than or equal
All of these operators evaluate to an int value of either 1 or 0.
The operators || and && perform short-circuit evaluation: the left-hand side is evaluated first; the right-hand side is then evaluated only if its outcome affects the overall result of the expression. Thus, the order of the operands is significant, affecting both program behavior and performance.
Warning
Signed integer overflow is undefined behavior. This can occur with the addition, subtraction, multiplication, division, and left shift operators. On the other hand, unsigned integer overflow simply wraps around so that UINT_MAX + 1 equals 0, and converting -1 to an unsigned type results in the largest value that type can hold.
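Three of the behaviors described in this section can be sketched together: single evaluation of the left operand of a compound assignment, short-circuit evaluation, and unsigned wraparound. The names slot, calls, and the three public functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

static int calls;  /* counts how often slot() runs */

/* Hypothetical helper with a side effect: returns the first element's address. */
static int *slot(int *array)
{
    ++calls;
    return &array[0];
}

int compound_assignment_call_count(void)
{
    int a[1] = {1};

    calls = 0;
    *slot(a) += 5;  /* the left operand is evaluated once;
                       *slot(a) = *slot(a) + 5 would call slot() twice */
    assert(a[0] == 6);
    return calls;
}

int is_positive(const int *p)
{
    return p != NULL && *p > 0;  /* *p is never evaluated when p is NULL */
}

unsigned wrapped_sum(void)
{
    assert((unsigned)-1 == UINT_MAX);  /* -1 converts to the largest value */
    return UINT_MAX + 1u;              /* unsigned arithmetic wraps to 0 */
}
```

Swapping the operands of && in is_positive would dereference p before checking it, which is why operand order matters.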
C-specific Operators
- Member access operators
a[b]
Array subscript
*a
Pointer dereference
&a
Address of
a.b
Member access
a->b
Member access through pointer
The array subscript operator a[b] is exactly equivalent to *(a + b).
The pointer dereference operator *a designates the object that a points to. The operand a must be a pointer type, T*, and the result is an lvalue of type T. It is undefined behavior to dereference a pointer that does not point at a valid object.
The address-of operator &a evaluates to the address of a. If a is type T, the result is type T*.
The member access operator a.b designates the member b of the structure or union a, while a->b designates the member b of the structure or union pointed at by a. The a->b operator is exactly equivalent to (*a).b.
- Other operators
a(...)
Function call operator
a, b
Comma operator
(type) a
Cast operator
a ? b : c
Ternary operator
sizeof
Size of operator
The function call operator a(...) requires that a have pointer-to-function type (a function name decays to such a pointer), and its enclosed arguments must be convertible to the types of the parameters of a.
The comma operator a, b evaluates a, discards its result, then evaluates b. The comma operator is used rarely.
The cast operator (type) a converts a to the named type. The result of the conversion is described below.
The ternary operator a ? b : c evaluates a. If a is true (non-zero), the expression evaluates to b; otherwise, it evaluates to c. This is equivalent to Python’s b if a else c conditional expression, with the operands in a different order.
The size-of operator sizeof evaluates to the size of a particular type or object, and there are two forms:
For a given type, T, sizeof (T) evaluates to the size, in bytes, of T.
For an expression, e, which has type T, sizeof e evaluates to the size, in bytes, of T.
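The equivalences and operators described in this section can be sketched as below. struct point and every function name are hypothetical, introduced only for this example.

```c
#include <assert.h>

struct point { int x; int y; };  /* hypothetical type for this sketch */

int subscript_equivalence(void)
{
    int a[] = {10, 20, 30};

    assert(sizeof a[0] == sizeof(int));  /* expression form and type form agree */
    return a[1] == *(a + 1);             /* a[b] is *(a + b) */
}

int arrow_equivalence(void)
{
    struct point p = {3, 5};
    struct point *pp = &p;

    return pp->y == (*pp).y && &pp->x == &p.x;  /* a->b is (*a).b */
}

int max_of(int a, int b)
{
    return a > b ? a : b;  /* evaluates to a if a > b is true, otherwise b */
}

int comma_in_for_loop(void)
{
    int i, j;

    /* The comma operator lets one for-clause hold two expressions. */
    for (i = 0, j = 10; i < j; ++i, --j) {
        /* empty body */
    }
    return i;  /* i and j meet in the middle */
}
```

The for loop is one of the few places where the comma operator is idiomatic.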
Code Smell
Avoid using the sizeof (T) form, where T is a type name. There is usually an expression that could be used instead. This way, if the type of the expression were ever to change, each sizeof operand referring to it would be automatically updated. A common example is when dynamically allocating an array of objects with malloc, as in T *my_array; /* ... */ my_array = malloc(sizeof *my_array * array_size); which automatically updates the size expression if the type of my_array changes, vs. my_array = malloc(sizeof (T) * array_size), which does not. This is an example of a programming principle called DRY: Don’t Repeat Yourself.
Implicit Conversions
Implicit type conversion is performed automatically without requiring an explicit cast. Three basic kinds of implicit conversion occur:
- Type Coercion
Type coercion refers to the process of converting the types of each operand of an expression into appropriate types to carry out each operation. In C, type coercion occurs when assigning, passing, and initializing where the value type differs from the target type, and when performing arithmetic with operands of different types.
- Type Promotion
When values of an integer type with rank less than or equal to that of int are used in an expression, they are usually implicitly converted to a value of type int or unsigned int. If int can represent the entire range of values of the original type, the value is converted to int; otherwise, unsigned int is used.
Implementation Detail
This is a reflection of the physical notion of int being the native word size of the target architecture. Promoting values to the native word size ensures that calculations are performed as efficiently as possible. Additionally, some systems may be physically incapable of performing certain operations on data types smaller than int.
- Type Decay
Type decay refers to the process by which certain types are automatically converted to another type with some loss of information. In C, array types automatically decay into pointers to their first elements, function types automatically decay into pointers to the functions themselves, and lvalues automatically decay into rvalues. There are certain contexts where decay does not occur, such as when taking the address of an object or when using the sizeof operator on an array.
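The three kinds of conversion can be observed with sizeof, as in the sketch below. It assumes that int can represent every unsigned char value, which holds on mainstream platforms. The function names are hypothetical.

```c
#include <assert.h>

/* Promotion: both operands become int before the addition. */
int promoted_sum(void)
{
    unsigned char a = 200, b = 100;

    assert(sizeof(a + b) == sizeof(int));
    return a + b;  /* computed as int, so it does not wrap at 256 */
}

/* Coercion: mixing int and long converts the int operand to long. */
int mixed_arithmetic_uses_wider_type(void)
{
    long wide = 1;
    int narrow = 2;

    return sizeof(wide + narrow) == sizeof(long);
}

/* Decay: an array converts to a pointer to its first element,
 * except as the operand of sizeof. */
int array_decays_except_under_sizeof(void)
{
    int values[4] = {1, 2, 3, 4};
    int *first = values;  /* decay happens here */

    return first == &values[0] && sizeof values == 4 * sizeof(int);
}
```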
Order of Evaluation
Within an expression, it is important to understand the order in which actual sub-expressions are evaluated. In C, a sequence point represents a settled state where all objects have well-defined values and no calculations are in progress. Between any two sequence points, there may be multiple sub-expressions that are evaluated in an indeterminate order.
In general, there is a sequence point before a function is called, after each full expression, and after each declarator. Additionally, there is a sequence point after the left-hand operand of the ||, &&, ?:, and , operators is evaluated.
If, between two sequence points, an object is modified more than once, or is modified and also read for any purpose other than computing the value to be stored, the behavior is undefined. Common examples of this are:
f(i, i++);
i = ++i;
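Both examples can be rewritten so that each read and each modification falls in its own full expression. The sketch below shows one possible rewrite; what the original calls were meant to compute is unknown, so the function add and the chosen argument values are assumptions.

```c
#include <assert.h>

/* Hypothetical stand-in for f: adds its two arguments. */
static int add(int first, int second)
{
    return first + second;
}

/* Well-defined alternative to f(i, i++). */
int call_with_old_and_new(void)
{
    int i = 1;
    int old = i;  /* read i in one full expression ... */

    i++;          /* ... and modify it in another */
    assert(i == 2);
    return add(old, i);
}

/* Well-defined alternative to i = ++i. */
int increment_once(int i)
{
    ++i;  /* the increment already stores the new value */
    return i;
}
```

Naming the intermediate value makes explicit which value of i each argument is supposed to receive.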