C undefined behavior. From one of the LLVM developers:
This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:
    float *P;
    void zero_array() {
      int i;
      for (i = 0; i < 10000; ++i)
        P[i] = 0.0f;
    }

into “memset(P, 0, 40000)”. This optimization also allows many loads to be hoisted out of loops, common subexpressions to be eliminated, etc. This class of undefined behavior can be disabled by passing the -fno-strict-aliasing flag, which disallows this analysis. When this flag is passed, Clang is required to compile this loop into 10000 4-byte stores (which is several times slower), because it has to assume that it is possible for any of the stores to change the value of P, as in something like this:
    int main() {
      P = (float*)&P;  // cast causes TBAA violation in zero_array.
      zero_array();
    }

This sort of type abuse is pretty uncommon, which is why the standard committee decided that the significant performance wins were worth the unexpected result for “reasonable” type casts.
The permitted optimization is very effective: the code is 80 (EIGHTY!) times faster when clang at -O3 replaces the loop with a memset (some of that speedup comes from other optimizations, but still). But the programmer has the option of using memset directly, which produces exactly the same optimized code. In fact, the programmer can use memset directly precisely because, fundamentally, C wants to expose the underlying memory to the programmer.
The original motivation for leaving some C behavior undefined was that different processor architectures would produce different behaviors, and the expert C programmer was supposed to know about those. Now compiler writers and standards developers claim it is good to introduce “unexpected results” (results that surprise even the experienced programmer) because these permit a certain kind of optimization.
Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the “int” as if it were a “float”). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code.
To me: “The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc)” means, “we mucked up the standard and we are going to cause many systems to fail as these nuanced rules confuse and surprise otherwise careful and highly expert programmers”. Compiler writers like undefined behavior because, in their interpretation of the standard, it permits any arbitrary code transformation. Anything. The well-known controversial results include removing checks for null pointers due to an unreliable compiler inference about dereference behavior. These uses of “undefined”, and the limitations on reasonable type coercion, are based on an incorrect idea of the purpose of the C language. Unexpected results are a catastrophe for a C programmer. Limitations on compiler optimizations based on second-guessing the programmer are not catastrophes (and nothing prevents compiler writers from adding suggestions about optimizations). There are two cases for this loop:
- The C programmer is an expert who used a loop instead of memset for a good reason, or because this is not a performance-critical part of the code.
- The C programmer is not an expert, and the program almost certainly contains algorithmic weaknesses that are more significant than the loop.
Neither case benefits from the optimization. Programmers who want the compiler to optimize their algorithms using clever transformations should use programming languages that are better suited to large-scale compiler transformations, where type information is a clear indication of purpose. The C compiler doesn’t need to be a fifth-rate Mathematica or APL or FORTRAN or something like Haskell. Unexpected results are a far more serious problem for C than missed minor optimizations.