1. Proposal on Undefined Behavior.
Rationale:
The Standard's definition of “undefined behavior”, like the rest of the Standard, relies on an implicit expectation that implementations cooperate with the programmer and respect the C abstract semantics. For example, overflow is undefined behavior, yet per 5.1.2.3 Example 6, the Standard limits the reordering of certain expressions when the implementation traps on arithmetic overflow. The purpose of “undefined behavior” is to relieve the translator from hiding implementation differences and from making safety checks that are the responsibility of the C programmer. Unfortunately, some implementations interpret undefined behavior as a license for arbitrary transformations, without reference to the abstract semantics.
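The constraint in 5.1.2.3 Example 6 can be checked arithmetically. The sketch below is illustrative only: it models the 16-bit int range assumed in that example using long arithmetic, and reports whether every intermediate result stays in range for the original grouping and for a regrouped form. The function names are hypothetical, and the operand values used in the note that follows are the ones recalled from the Standard's example.

```c
#include <stdbool.h>

/* Range of a 16-bit int, as assumed in the Standard's example. */
static bool in_int16(long v)
{
    return v >= -32768L && v <= 32767L;
}

/* a + 32760 + b + 5, evaluated left to right as the abstract
   semantics require: (((a + 32760) + b) + 5).
   Returns true if no intermediate result leaves the 16-bit range. */
bool original_order_ok(long a, long b)
{
    long t1 = a + 32760L;
    long t2 = t1 + b;
    long t3 = t2 + 5L;
    return in_int16(t1) && in_int16(t2) && in_int16(t3);
}

/* The regrouped form ((a + b) + 32765) that a translator might
   prefer. Returns true if no intermediate result leaves the range. */
bool regrouped_order_ok(long a, long b)
{
    long t1 = a + b;
    long t2 = t1 + 32765L;
    return in_int16(t1) && in_int16(t2);
}
```

With a = -32754 and b = -15, the original order stays in range at every step, while the regrouped form computes a + b = -32769, which would trap on an implementation with 16-bit trapping ints. That is why the regrouping is not permitted there.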
As an example, some implementations roll over on arithmetic overflow but may delete programmer checks for roll-over as an “optimization”. The wording of the Standard does not support this interpretation. Possible undefined behavior ranges from
- “ignoring the situation completely with unpredictable results”, in which case the underlying processor architecture might increment in modular arithmetic, trap (which may or may not be handled), or do something else, but the translator “ignore[s] the situation” and produces code without making use of knowledge of any undefined behavior;
- to “behaving during translation or program execution in a documented manner characteristic of the environment”;
- to “terminating a translation or execution (with the issuance of a diagnostic message)”.
This range is very wide and provides implementations with latitude for optimization, but it does not support conforming implementations that produce hidden hazards for programmers. The purpose of the Standard is to harmonize implementations, as much as is consistent with the purpose and design of the language, not to license implementations that are hostile to programmer intent and C abstract semantics. The proposed change makes this explicit.
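The roll-over check at issue can be sketched as follows. This is an illustration under stated assumptions, not code from any particular program: the function names are hypothetical. The first function is the after-the-fact test that some translators remove on the grounds that signed overflow "cannot happen"; the second tests the operands before adding and is well defined under the current Standard.

```c
#include <limits.h>
#include <stdbool.h>

/* After-the-fact check: add first, then look for roll-over.
   If a + b overflows, the behavior is undefined under the current
   Standard, so a translator that assumes overflow is impossible
   may reduce this function to "return false". */
bool add_rolled_over(int a, int b)
{
    int sum = a + b;
    return (b > 0 && sum < a) || (b < 0 && sum > a);
}

/* Before-the-fact check: compares against the limits without ever
   performing an overflowing addition, so it is well defined today. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```

On an implementation that rolls over, a programmer may reasonably expect both functions to agree. The proposal would require the first check to survive translation unless overflow is actually impossible in the implementation.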
Implementations that violate this expectation for undefined behavior make C programming a hazardous endeavor in which the implementation is full of unpleasant surprises. A second consideration is that the more radical interpretations of undefined behavior are causing a fork in the language. Currently, the Linux operating system requires its translators to respect options that turn off a number of “UB is impossible” program transformations, such as pointer alias analysis, null pointer analysis, and overflow analysis. The existence of these modifications and their wide use indicate both that a great deal of existing practice depends on disabling the “UB is impossible” assumption and that accommodating a more modest use of UB-directed optimization is acceptable to developers of implementations.
It has been argued that permitting implementations to violate the constraints of the abstract machine in the presence of undefined behavior is necessary for high levels of optimization, but there are no published studies validating that claim, and, indeed, the Linux kernel code is highly optimized despite its extensive use of the translator options described in the preceding paragraph.
Proposed change.
undefined behavior
1 behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements beyond those in NOTE 2 below
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). This range is exhaustive, not inclusive, and does not permit conforming implementations to assign semantics that are self-inconsistent. In particular, conforming implementations cannot transform code on the basis that some behavior left undefined by the Standard is impossible unless it is, in fact, impossible in the implementation.
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
4 EXAMPLE Code that checks to see if an integer value has overflowed or an array index is past the limit of the array cannot be deleted or otherwise ignored during translation unless the overflow or out of bounds index is actually impossible in the implementation.
5 EXAMPLE If the value of the right operand of a shift is negative, the result is undefined but a conforming implementation cannot thereby assume that the value is not negative. The code “x<<y; if(y < 0)panic()” cannot be translated to omit the test and the panic unless a negative shift is known to cause a trap or y is known to be non-negative.
6 EXAMPLE Code that checks to see if a pointer value is null cannot be omitted during translation solely on the basis that the pointer is dereferenced before the check and that the earlier dereference would produce undefined behavior if the pointer value were null.
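The null pointer and shift examples above can be sketched in code. This is a minimal illustration: the structure, its member, and all function names are hypothetical. The "late check" function shows the pattern whose test some translators omit. The other two functions show the check placed before the operation, which is well defined under the current Standard.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
struct device {
    int flags;
};

/* Dereference first, check afterwards. A translator that reasons
   "d was dereferenced, so d cannot be null" may drop the test.
   The proposal would forbid dropping it on that basis alone. */
int read_flags_late_check(struct device *d)
{
    int f = d->flags;
    if (d == NULL)
        return -1;
    return f;
}

/* Check first, then dereference: well defined for any argument. */
int read_flags_early_check(struct device *d)
{
    if (d == NULL)
        return -1;
    return d->flags;
}

/* Shift with the range test placed before the shift. Returns false,
   leaving *out untouched, when y is negative or not less than the
   width of unsigned. */
bool shl_checked(unsigned x, int y, unsigned *out)
{
    if (y < 0 || y >= (int)(sizeof x * CHAR_BIT))
        return false;
    *out = x << y;
    return true;
}
```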
2. Proposal on Optimization
Rationale:
It is implicit in the Standard that “optimization” in a conforming implementation must produce behavior that conforms to the semantics of the language. Advances in optimization technology have led to the development of multi-pass translators in which the semantics of programs can be modified, possibly unintentionally, unless care is taken to preserve semantics across optimization passes. This change calls for conforming implementations to take that care.
Proposed change
1 The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant. A conforming implementation may not change the semantics of a program as an “optimization”.
3. Proposal on Alias
Rationale:
The restriction of lvalue access by type has never made much sense and required an indefensible exception for character pointers. Without this change it is impossible, in conforming C, to write an efficient “memcpy”, and impossible to write a memory allocator at all (since an allocator must repeatedly change the type of allocated storage). Furthermore, operations in longstanding common practice, such as using an int type to compute a checksum over a message structure, are forbidden under the current language. Access to the underlying representation of a type is a fundamental capability of C programming.
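The checksum idiom mentioned above can be sketched as follows. This is an illustration under stated assumptions: the message layout and function names are hypothetical, and uint32_t is used in place of int so that the sum itself cannot overflow. The first function reads the object as a sequence of words, which is the access the current rules forbid when the object's effective type is not a compatible word type. The second copies each word out with memcpy, which is the form that conforms today.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, for illustration only. */
struct message {
    uint16_t kind;
    uint16_t length;
    unsigned char payload[12];
};

/* Longstanding idiom: treat the object as an array of 32-bit words.
   Assumes p is suitably aligned for uint32_t. Under the current
   aliasing rule this is undefined when the object at p does not
   have an effective type compatible with uint32_t, for example
   when p points to a struct message. */
uint32_t checksum_by_words(const void *p, size_t nbytes)
{
    const uint32_t *w = (const uint32_t *)p;
    uint32_t sum = 0;
    for (size_t i = 0; i < nbytes / sizeof *w; i++)
        sum += w[i];
    return sum;
}

/* Form that conforms under the current rule: each word is copied
   out of the object representation with memcpy before use. */
uint32_t checksum_by_memcpy(const void *p, size_t nbytes)
{
    const unsigned char *b = p;
    uint32_t sum = 0;
    for (size_t i = 0; i + sizeof(uint32_t) <= nbytes; i += sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, b + i, sizeof w);
        sum += w;
    }
    return sum;
}
```

Both functions ignore any trailing bytes that do not fill a whole word. Applied to an array of uint32_t they are equally well defined and return the same sum. The difference arises only for objects such as struct message, where the proposed wording would make the first form conforming.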
Proposed change
7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type via a pointer with a properly aligned type.