
DRAFT – WORK IN PROGRESS

C memory object and value semantics:


the space of de facto and ISO standards
[This revises and extends WG14 N2013]

Revision: 1571 2016-03-17

David Chisnall Justus Matthiesen Kayvan Memarian Peter Sewell Robert N. M. Watson
University of Cambridge
http://www.cl.cam.ac.uk/~pes20/cerberus/

Contents

1 Introduction
  1.1 Experimental Testing
  1.2 Summary of answers
2 Abstract Pointers
  2.1 Pointer Provenance
    2.1.1 Q1. Must the pointer used for a memory access have the right provenance, i.e. be derived from the pointer to the original allocation (with undefined behaviour otherwise)? (This lets compilers do provenance-based alias analysis)
    2.1.2 Q2. Can equality testing on pointers be affected by pointer provenance information?
    2.1.3 GCC and ISO C11 differ on the result of a == comparison on a one-past pointer
  2.2 Pointer provenance via integer types
    2.2.1 Q3. Can one make a usable pointer via casts to intptr_t and back?
    2.2.2 Q4. Can one make a usable pointer via casts to unsigned long and back?
    2.2.3 Q5. Must provenance information be tracked via casts to integer types and integer arithmetic?
    2.2.4 Q6. Can one use bit manipulation and integer casts to store information in unused bits of pointers?
    2.2.5 Q7. Can equality testing on integers that are derived from pointer values be affected by their provenance?
  2.3 Pointers involving multiple provenances
    2.3.1 Q8. Should intra-object pointer subtraction give provenance-free integer results?
    2.3.2 Q9. Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first?
    2.3.3 Q10. Presuming that one can have valid pointers with multiple provenances, does an inter-object pointer subtraction give a value with explicitly-unknown provenance or something more specific?
    2.3.4 Q11. Is the XOR linked list idiom supported?
    2.3.5 Q12. For arithmetic over provenanced integer values, is the provenance of the result invariant under plus/minus associativity?
    2.3.6 Multiple provenance semantics summarised
  2.4 Pointer provenance via pointer representation copying
    2.4.1 Q13. Can one make a usable copy of a pointer by copying its representation bytes using the library memcpy?
    2.4.2 Q14. Can one make a usable copy of a pointer by copying its representation bytes (unchanged) in user code?
    2.4.3 Q15. Can one make a usable copy of a pointer by copying its representation bytes by user code that indirectly computes the identity function on those bytes?

1 2016/3/17
    2.4.4 Q16. Can one carry provenance through dataflow alone or also through control flow?
  2.5 Pointer provenance and union type punning
    2.5.1 Q17. Is type punning between integer and pointer values allowed?
    2.5.2 Q18. Does type punning between integer and pointer values preserve provenance?
  2.6 Pointer provenance via IO
    2.6.1 Q19. Can one make a usable pointer via IO?
  2.7 Q20. Can one make a usable pointer from a concrete address (of device memory)?
  2.8 Pointer provenance for other allocators
  2.9 Stability of pointer values
    2.9.1 Q21. Are pointer values stable?
  2.10 Pointer Equality Comparison (with == and !=)
    2.10.1 Q22. Can one do == comparison between pointers to objects of non-compatible types?
    2.10.2 Q23. Can one do == comparison between pointers (to objects of compatible types) with different provenances that are not strictly within their original allocations?
    2.10.3 Q24. Can one do == comparison of a pointer and (void*)-1?
  2.11 Pointer Relational Comparison (with <, >, <=, or >=)
    2.11.1 Q25. Can one do relational comparison (with <, >, <=, or >=) of two pointers to separately allocated objects (of compatible object types)?
    2.11.2 Q26. Can one do relational comparison (with <, >, <=, or >=) of a pointer to a structure member and one to a sub-member of another member, of compatible object types?
    2.11.3 Q27. Can one do relational comparison (with <, >, <=, or >=) of pointers to two members of a structure that have incompatible types?
  2.12 Null pointers
    2.12.1 Q28. Can one make a null pointer by casting from a non-constant integer expression?
    2.12.2 Q29. Can one assume that all null pointers have the same representation?
    2.12.3 Q30. Can null pointers be assumed to have all-zero representation bytes?
  2.13 Pointer Arithmetic
    2.13.1 Q31. Can one construct out-of-bounds (by more than one) pointer values by pointer arithmetic (without undefined behaviour)?
    2.13.2 Q32. Can one form pointer values by pointer addition that overflows (without undefined behaviour)?
    2.13.3 Q33. Can one assume pointer addition wraps on overflow?
    2.13.4 Q34. Can one move among the members of a struct using representation-pointer arithmetic and casts?
    2.13.5 Q35. Can one move between subobjects of the members of a struct using pointer arithmetic?
    2.13.6 Q36. Can one implement offsetof using the addresses of members of a NULL struct pointer?
  2.14 Casts between pointer types
    2.14.1 Q37. Are usable pointers to a struct and to its first member interconvertible?
    2.14.2 Q38. Are usable pointers to a union and to its current member interconvertible?
  2.15 Accesses to related structure and union types
    2.15.1 Q39. Given two different structure types sharing a prefix of members that have compatible types, can one cast a usable pointer to an object of the first to a pointer to the second, that can be used to read and write members of that prefix (with strict-aliasing disabled and without packing variation)?
    2.15.2 Q40. Can one read from the initial part of a union of structures sharing a common initial sequence via any union member (if the union type is visible)?
    2.15.3 Q41. Is writing to the initial part of a union of structures sharing a common initial sequence allowed via any union member (if the union type is visible)?
    2.15.4 Q42. Is type punning by writing and reading different union members allowed (if the lvalue is syntactically obvious)?
  2.16 Pointer lifetime end

    2.16.1 Q43. Can one inspect the value, (e.g. by testing equality with ==) of a pointer to an object whose lifetime has ended (either at a free() or block exit)?
    2.16.2 Q44. Is the dynamic reuse of allocation addresses permitted?
  2.17 Invalid Accesses
    2.17.1 Q45. Can accesses via a null pointer be assumed to give runtime errors, rather than give rise to undefined behaviour?
    2.17.2 Q46. Can reads via invalid pointers be assumed to give runtime errors or unspecified values, rather than undefined behaviour?
3 Abstract Unspecified Values
  3.1 Trap Representations
    3.1.1 Q47. Can one reasonably assume that no types have trap representations?
    3.1.2 Q48. Does reading an uninitialised object give rise to undefined behaviour?
  3.2 Unspecified Values
    3.2.1 Q49. Can library calls with unspecified-value arguments be assumed to execute with an arbitrary choice of a concrete value (not necessarily giving rise to undefined behaviour)?
    3.2.2 Q50. Can control-flow choices based on unspecified values be assumed to make an arbitrary choice (not giving rise to undefined behaviour)?
    3.2.3 Q51. In the absence of any writes, is an unspecified value potentially unstable, i.e., can multiple usages of it give different values?
    3.2.4 Q52. Do operations on unspecified values result in unspecified values?
    3.2.5 Q53. Do bitwise operations on unspecified values result in unspecified values?
    3.2.6 Q54. Must unspecified values be considered daemonically for identification of other possible undefined behaviours?
    3.2.7 Q55. Can a structure containing an unspecified-value member be copied as a whole?
    3.2.8 Q56. Given multiple bitfields that may be in the same word, can one be a well-defined value while another is an unspecified value?
    3.2.9 Q57. Are the representation bytes of an unspecified value themselves also unspecified values? (not an arbitrary choice of concrete byte values)
    3.2.10 Q58. If one writes some but not all of the representation bytes of an uninitialized value, do the other representation bytes still hold unspecified values?
    3.2.11 Q59. If one writes some but not all of the representation bytes of an uninitialized value, does a read of the whole value still give an unspecified value?
  3.3 Structure and Union Padding
    3.3.1 Q60. Can structure-copy copy padding?
    3.3.2 Q61. After an explicit write of a padding byte, does that byte hold a well-defined value? (not an unspecified value)
    3.3.3 Q62. After an explicit write of a padding byte followed by a write to the whole structure, does the padding byte hold a well-defined value? (not an unspecified value)
    3.3.4 Q63. After an explicit write of a padding byte followed by a write to adjacent members of the structure, does the padding byte hold a well-defined value? (not an unspecified value)
    3.3.5 Q64. After an explicit write of zero to a padding byte followed by a write to adjacent members of the structure, does the padding byte hold a well-defined zero value? (not an unspecified value)
    3.3.6 Q65. After an explicit write of a padding byte followed by a write to a non-adjacent member of the whole structure, does the padding byte hold a well-defined value? (not an unspecified value)

    3.3.7 Q66. After an explicit write of a padding byte followed by writes to adjacent members of the whole structure, but accessed via pointers to the members rather than via the structure, does the padding byte hold a well-defined value? (not an unspecified value)
    3.3.8 Q67. Can one use a malloc’d region for a union that is just big enough to hold the subset of members that will be used?
    3.3.9 More remarks on padding
    3.3.10 Q68. Can the user make a copy of a structure or union by copying just the representation bytes of its members and writing junk into the padding bytes?
    3.3.11 Q69. Can one read an object as aligned words without regard for the fact that the object’s extent may not include all of the last word?
    3.3.12 Q70. Does concurrent access to two (non-bitfield) distinct members of a structure constitute a data race?
    3.3.13 Q71. Does concurrent access to a structure member and a padding byte of that structure constitute a data race?
    3.3.14 Q72. Does concurrent (read or write) access to an unspecified value constitute a data race?
4 Effective Types
  4.1 Basic effective types
    4.1.1 Q73. Can one do type punning between arbitrary types?
    4.1.2 Q74. Can one do type punning between distinct but isomorphic structure types?
  4.2 Effective types and character arrays
    4.2.1 Q75. Can an unsigned character array with static or automatic storage duration be used (in the same way as a malloc’d region) to hold values of other types?
  4.3 Effective types and subobjects
    4.3.1 Q76. After writing a structure to a malloc’d region, can its members be accessed via pointers of the individual member types?
    4.3.2 Q77. Can a non-character value be read from an uninitialised malloc’d region?
    4.3.3 Q78. After writing one member of a structure to a malloc’d region, can its other members be read?
    4.3.4 Q79. After writing one member of a structure to a malloc’d region, can a member of another structure, with footprint overlapping that of the first structure, be written?
    4.3.5 Q80. After writing a structure to a malloc’d region, can its members be accessed via a pointer to a different structure type that has the same leaf member type at the same offset?
    4.3.6 Q81. Can one access two objects, within a malloc’d region, that have overlapping but non-identical footprint?
5 Other Questions
  5.1 Q82. Given a const-qualified pointer to an object defined with a non-const-qualified type, can the pointer be cast to a non-const-qualified pointer and used to mutate the object?
  5.2 Q83. Can char and unsigned char be assumed to be 8-bit bytes?
  5.3 Q84. Can one assume two’s-complement arithmetic?
  5.4 Q85. In the absence of floating point, can one assume that no base types have multiple representations of the same value?
6 Related Work
  6.1 C formalised in HOL; Norrish; PhD thesis 1998
  6.2 A unified memory model for pointers; Tuch, Klein; LPAR 2005
  6.3 Types, bytes, and separation logic; Tuch, Klein, Norrish; POPL 2007
  6.4 Formal verification of a C-like memory model and its uses for verifying program transformations; Leroy and Blazy; JAR 2008
  6.5 CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency; Ševčík, Vafeiadis, Zappa Nardelli, Jagannathan, Sewell; POPL 2011, JACM 2013
  6.6 The CompCert Memory Model, Version 2; Leroy, Appel, Blazy, Stewart; INRIA RR-7987 2012
  6.7 Formal C semantics: CompCert and the C standard; Krebbers, Leroy, and Wiedijk; ITP 2014
  6.8 A Precise and Abstract Memory Model for C using Symbolic Values, Besson, Blazy, and Wilke; APLAS 2014

  6.9 A Concrete Memory Model for CompCert; Besson, Blazy, Wilke; ITP 2015
  6.10 A formal C memory model supporting integer-pointer casts; Kang, Hur, Mansky, Garbuzov, Zdancewic, Vafeiadis; PLDI 2015
  6.11 The C standard formalized in Coq; Krebbers; PhD thesis 2015
  6.12 An Executable Formal Semantics of C with Applications; Ellison and Roşu; POPL 2012
  6.13 A precise yet efficient memory model for C; SSV 2009; Cohen, Moskal, Tobies, Schulte
  6.14 Undefined Behavior: What Happened to My Code?; Wang, Chen, Cheung, Jia, Zeldovich, Kaashoek; APSys 2012, and Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior. Wang, Zeldovich, Kaashoek, Solar-Lezama; SOSP 13
  6.15 Beyond the PDP-11: Architectural support for a memory-safe C abstract machine; Chisnall et al.; ASPLOS 2015
  6.16 What every C programmer should know about undefined behavior; Lattner; Blog post 2011
  6.17 Proposal for a Friendly Dialect of C; Cuoq, Flatt, Regehr; Blog post 2014
  6.18 UB Canaries; Regehr; Blog post 2015
Bibliography
Index of Tests

1. Introduction

In this note we discuss the semantics of memory in C, focussing on the non-concurrent aspects: the semantics of pointers, casts, effective types, unspecified values, and so on. These make up what we call the memory object model, to distinguish it from the memory concurrency model that addresses the relaxed-memory semantics of C; the two are largely but not completely orthogonal, and together they give a complete semantics of C memory. This is a part of our larger Cerberus C semantics project.

We are concerned principally with the de facto standards of C as it is used in practice: the existing usage of C, especially in systems code, and the behaviour of the dominant compiler implementations and the idioms they support. We also discuss C as specified in the ISO C11 standard. The ISO and de facto standards can differ in important ways, and in reality neither of them is singular: the C11 standard is prose text, open to interpretation, and there are multiple distinct de facto standards in different contexts (some specific to particular compilers or compiler flags). We are developing a formal model intended to capture one reasonable view of the de facto standards, though, given the real conflicts seen between different views, this is intended only as a precise reference point for discussion; no single model can currently be acceptable to all parts of the C community. We may later equip it with switches to express particular views of de facto and/or ISO standards. We also discuss the intended behaviour of CHERI C [14], with its hardware support for capabilities [55, 56].

In the longer term, this analysis may be helpful to understand what a well-designed language for systems programming would have to support.

One can look at the de facto semantics from several different perspectives:

1. the languages implemented by mainstream compilers (GCC, Clang, ICC, MSVC, etc.), including the assumptions their optimisation passes make about user code and how these change with certain flags (e.g. GCC’s -fno-strict-aliasing and -fno-strict-overflow);
2. the idioms used in the corpus of mainstream systems code out there, especially in specific large-scale systems (Linux, FreeBSD, Xen, Apache, etc.);
3. the language that systems programmers believe they are writing in, i.e., the assumptions they make about what behaviour they can rely on;
4. the issues that arise in making C code portable between different compilers and architectures; and
5. the behaviour assumed, implicitly or explicitly, by code analysis tools.

We focus throughout on current mainstream C implementations: commonly used compilers and hardware platforms. One could instead consider the set of all current or historical C implementations, or even all conceivable implementations, but that (apart from being even harder to investigate) would lead to a semantics which is significantly different from the one used by the corpus of code we are concerned with, which does make more assumptions about C than that would permit. Our goals are thus rather different from those of the C standard committee, at least as expressed in this passage from the C99 Rationale v5.10: “Beyond this two-level scheme [conforming hosted vs freestanding implementations], no additional subsetting is defined for C, since the C89 Committee felt strongly that too many levels dilutes the effectiveness of a standard.” Our impression is that mainstream usage and implementations are using a significantly different language from that defined by the standard; this divergence makes the standard less relevant and leaves practice on an uncertain footing.

The main body of this note is a collection of 85 specific questions about the semantics of C, each stated reasonably precisely in prose and most supported by one or more test-case examples and by discussion of the ISO and de facto standards. Each particular view of C will have its own answers (or be unclear) for each of these questions; for some questions all views will agree on the answer, while for other questions different views have quite different answers. The answers for a particular view thus locate that view within an 85-dimensional space of conceivable Cs.

Our questions and test cases were developed in an iterative process of reading the literature (the ISO standards, defect reports, academic papers, and blog posts); building candidate models; writing tests; experimenting with those on particular compilers; writing the surveys we discuss below; analysing our survey results; and discussions with experts. We have tried to address all the important issues in the semantics of C memory object models, but there may well be others (as there is no well-defined space of “conceivable C semantics”, this cannot be complete in any precise sense); we would be happy to learn of others that we should add.

Our test cases are typically written to illustrate a particular semantic question as concisely as possible. Some are “natural” examples, of desirable C code that one might find in the wild, but many are testing corner cases, e.g. to explore just where the defined/undefined-behaviour boundary is, and would be considered pathological if they occurred in the form given in real code.

Making the tests concise to illustrate semantic questions also means that most are not written to trigger interesting compiler behaviour, which might only occur in a larger context that permits some analysis or optimisation pass to take effect. Moreover, following the spirit of C, compilers do not report all instances of undefined behaviour. Hence, only in some cases is there anything to be learned from the experimental compiler behaviour. For any executable semantics, on the other hand, running all of them should be instructive.

Direct investigation of (1) and (2) is challenging. For (1), the behaviour of mainstream compilers is really defined only by their implementations; it is not documented in sufficient detail to answer all the important questions. Those are very large bodies of code, and particular behaviour of analysis and optimisation passes may only be triggered on relatively complex examples. We include experimental data for all our tests nonetheless, for various C implementations; in some cases this is instructive.

Given a complete candidate model we could conceivably do random testing against existing implementations, but that is challenging in itself. One of our main concerns is the border between defined and undefined behaviour, but (a) we do not have a good random test generator for programs on that border (the existing Csmith test generator by Yang et al. [57] is intended to only produce programs without undefined behaviour, according to its authors’ interpretation), and (b) mainstream C implementations are not designed to report all instances of undefined behaviour; they instead assume its absence to justify optimisations.

For (2), it is hard to determine what assumptions a body of C code relies on. We draw on data from the ASPLOS 2015 paper by Chisnall et al. [14], both from instrumenting LLVM and trying to port a number of C programs to a more-than-usually restrictive implementation, their CHERI platform.

We can investigate (3) by asking the community of expert C programmers what properties they think they assume of the language in practice, which we have done with two surveys (to the best of our knowledge, this is a novel approach to investigating the de facto semantics of a widely used language). The first version, in early 2013, had 42 questions, with concrete code examples and subquestions about the de facto and ISO standards. We targeted this at a small number of experts, including multiple contributors to the ISO C or C++ standards committees, C analysis tool developers, experts in C formal semantics, compiler writers, and systems programmers. The results were very instructive, but this survey demanded a lot from the respondents; it was best done by discussing the questions with them in person over several hours. The concrete code examples helped make the questions precise, but they also created confusion: being designed to probe semantic questions about the language, many are not natural idiomatic code, but many readers tried to interpret them as such. Our second version (in mid 2015) was simplified, making it feasible to collect responses from a wider community. We designed 15 questions, focussed on some of the most interesting issues, asked only about the de facto standard (typically asking (a) whether some idiom would work in normal C compilers and (b) whether it was used in practice), and omitted the concrete code examples. Aiming for a modest-scale but technically expert audience, we distributed the survey among our local systems research group, at EuroLLVM 2015, via technical mailing lists: gcc, llvmdev, cfe-dev, libc-alpha, xorg, freebsd-developers, xen-devel, and Google C user and compiler lists, and via John Regehr’s blog, widely read by C experts. There were around 323 responses, including around 100 printed pages of textual comments. Most respondents reported expertise in C systems programming (255) and many reported expertise in compiler internals (64) and in the C standard (70). The results are available on the web¹; we refer to them where appropriate but do not include them here.

¹ www.cl.cam.ac.uk/~pes20/cerberus/

1.1 Experimental Testing

The examples are compiled and run with a range of tools:

• GCC 4.8, 4.9, and 5.3, and clang 33-37, all at O0, O2, and O2 with -fno-strict-aliasing, on x86 on FreeBSD, e.g.

    gcc48 -O2 -std=c11 -pedantic -Wall -Wextra
    -Wno-unused-variable -pthread

• clang37 with address, memory, and undefined-behaviour sanitisers, e.g.

    clang37 -fsanitize=address -std=c11 -pedantic
    -Wall -Wextra -Wno-unused-variable -pthread

• CHERI clang at O0, O2, and O2 with -fno-strict-aliasing, e.g.

    clang -O2 -std=c11 -target=cheri-unknown-freebsd
    -mcpu=mips3 -pedantic -Wall -Wextra -mabi=sandbox
    -Wno-unused-variable -lc -lmalloc_simple

• The CHERI CPU running pure MIPS code, e.g.:

    clang -O2 -std=c11 -target=mips64-unknown-freebsd
    -mcpu=mips3 -pedantic -Wall -Wextra
    -Wno-unused-variable

• the TrustInSoft tis-interpreter tool, version Magnesium-20151002+dev
• the KCC tool, in the evaluation version RV-Match v0.1 distributed by Runtime Verification Inc. at https://runtimeverification.com/match/download/, downloaded 2016-03-11.

Some tests rely on address coincidences for the interesting execution; for these we include multiple variants, tuned to the allocation behaviour in the implementations we consider. Running the tests on other platforms may need additional variants to be added.

The tests are run using a test harness, charon, that generates individual test instances from JSON files describing the tests and tools; charon logs all the compile and execution output (together with the test itself and information about the host) to another JSON file for analysis. The tests and harness can be packaged up in a single tarball that can be run easily. charon also supports cross-compilation, to let the CHERI tests be compiled on a normal host and executed on the CHERI FPGA-based hardware. Selected data from the combined log files is automatically included in this document.

1.2 Summary of answers

For each question we give multiple answers, as below. These should be treated with caution: given the complex and conflicted state of C, many are subject to interpretation or to revision, e.g. as we learn more about the de facto standards.

• iso: the ISO C11 standard
• defacto-usage: the de facto standard of usage in practice
• defacto-impl: the de facto standard of mainstream current implementations
• cerberus-defacto: the intended behaviour of our candidate de facto formal model
• cheri: the intended behaviour of CHERI
• tis: the observed behaviour of the TrustInSoft tis-interpreter
• kcc: the observed behaviour of the KCC tool

Note that the last two are inferences from single data points (and, for tis, some discussion with the developers); they should be treated with caution.

Of the 85 questions,

• for 39 the ISO standard is unclear;
• for 27 the de facto standards are unclear, in some cases with significant differences between usage and implementation; and
• for 27 there are significant differences between the ISO and the de facto standards.

We discuss related work in some detail in §6.

Acknowledgements

We thank all those who have provided responses to our C surveys, without which this work would not have been possible, especially Hans Boehm and Paul McKenney. We thank John Regehr and Pascal Cuoq for running tis-interpreter tests and discussion of the results, Jean Pichon-Pharabod for other testing assistance, and Colin Rothwell, Jon Woodruff, Mike Roe, and Simon Moore for their work on the CHERI ISA and CHERI C. We acknowledge funding from EPSRC grants EP/H005633 (Leadership Fellowship, Sewell) and EP/K008528 (REMS Programme Grant), and a Gates Cambridge Scholarship (Nienhuis). This work is also part of the CTSRD projects sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-10-C-0237. The views, opinions, and/or findings contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Department of Defense or the U.S. Government.

2. Abstract Pointers

The most important and subtle questions are about the extent to which C values (especially pointers, but also unspecified values, structures, and unions) are abstract, as opposed to being simple bit-vector-represented quantities.

2.1 Pointer Provenance

It might be tempting to think that a C pointer is completely concrete, simply a machine address, but things are not that simple, either in the de facto or ISO standards.

2.1.1 Q1. Must the pointer used for a memory access have the right provenance, i.e. be derived from the pointer to the original allocation (with undefined behaviour otherwise)? (This lets compilers do provenance-based alias analysis)

ISO: yes   DEFACTO-IMPL: yes   DEFACTO-USAGE: yes   CERBERUS-DEFACTO: yes   CHERI: yes   TIS: example not supported (memcmp of pointer representations)   KCC: Execution failed (unclear why)

Consider the following pathological code (adapted from the WG14 Defect Report DR260² and its committee response), first from the mainstream-implementation point of view.

² http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

EXAMPLE (provenance_basic_global_yx.c):
#include <stdio.h>
#include <string.h>
int y=2, x=1;
int main() {
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
  return 0;
}
GCC-4.8-O2:
  Addresses: p=0x600ba4 q=0x600ba4
  x=1 y=2 *p=11 *q=2
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_basic_global_yx.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
  stack: memcmp :: provenance_basic_global_yx.c:8 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Addresses: p=[sym(4 @ static(provenance_basic_global_yx.cd09dc1b8-37cb-43c0-8631-47836d4a99fb)) + 4] q=[sym(3 @ static(provenance_basic_global_yx.cd09dc1b8-37cb-43c0-8631-47836d4a99fb)) + 0]
  Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

Depending on the implementation, x and y might happen to be allocated in adjacent memory, in which case &x+1 and &y will have bitwise-identical representation values, the memcmp will succeed, and p (derived from a pointer to x) will have the same representation value as a pointer to a different object, y, at the point of the update *p=11. This can occur in practice with GCC -O2. The output of

  x=1 y=2 *p=11 *q=2

suggests that the compiler is reasoning that *p does not alias with y or *q, and hence that the initial value of y=2 can be propagated to the final printf.

This outcome would not be correct with respect to a naive concrete semantics, and so to make the compiler sound it is necessary for this program to be deemed to have undefined behaviour (which in C terms means that the compiler is allowed to do anything at all). GCC does not report a compile- or run-time warning or error for this example, but that is not required by the standard for programs with undefined behaviour. Note that this example does not involve type-based alias analysis, and the outcome is not affected by GCC's -fno-strict-aliasing flag. One might ask whether the mere formation of the pointer &x+1 is legal. We return to such questions later, but this case is explicitly permitted by the ISO standard.

Clang and GCC -O0 allocate differently, so one has to interchange the declarations of x and y to make p and q happen to hold bitwise identical values, but then the outcome does not exhibit the effects of similar analysis and optimisation. One has to treat such negative results with caution, of course: it does not follow that this version of the compiler will not optimise similar examples, as the negative result could be simply because the test is not complex enough to cause particular optimisations to fire.

EXAMPLE (provenance_basic_global_xy.c):
GCC-4.8-O0:
  Addresses: p=0x600bcc q=0x600bcc
  x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-5.3-O2:
  Addresses: p=0x600bb8 q=0x600bb0
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  Addresses: p=0x600b20 q=0x600b20
  x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)

CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
  Addresses: p=0x69d4c4 q=0x69d500
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_basic_global_xy.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
  stack: memcmp :: provenance_basic_global_xy.c:8 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Addresses: p=[sym(3 @ static(provenance_basic_global_xy.c89d33288-44ea-45b5-9b6f-7ac7e4e63f7e)) + 4] q=[sym(4 @ static(provenance_basic_global_xy.c89d33288-44ea-45b5-9b6f-7ac7e4e63f7e)) + 0]
  Execution failed (configuration dumped)

On the other hand, ICC on this version gives x=1 y=2 *p=11 *q=11, so also definitely needs this to be an undefined-behaviour program to be sound.

Clang37-UBSAN does not detect this undefined behaviour. The clang37-ASAN execution does not have the address coincidence needed to make the test result meaningful.

CHERI C behaves just like x86 Clang here because linker support (which is needed to provide provenance to pointers to globals) is not yet implemented.

For reference, consider similar examples but with two malloc'd regions rather than global statically allocated objects, e.g. provenance_basic_malloc_offset+2.c and provenance_basic_malloc_offset+12.c. Here according to the ISO standard it is illegal to form the pointer required to get from one to the other (as it is not one-past). We return to whether that is allowed in the de facto standard in §2.13 (p.62). Here GCC 4.8 appears not to assume a lack of aliasing; the Clang behaviour is the same as the previous example.

The current CHERI implementation treats globals and variables with automatic storage duration differently (pending improvements to the linker implementation). Accordingly, we include variants of the first test with automatic storage duration.

EXAMPLE (provenance_basic_auto_yx.c):
#include <stdio.h>
#include <string.h>
int main() {
  int y=2, x=1;
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
  return 0;
}
GCC-4.8-O0:
  Addresses: p=0x7fffffffea2c q=0x7fffffffea2c
  x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-5.3-O2:
  Addresses: p=0x7fffffffea20 q=0x7fffffffea18
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG33-O0:
  Addresses: p=0x7fffffffea28 q=0x7fffffffea28
  x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:

  Addresses: p=0x7fffffffe8f4 q=0x7fffffffe8e0
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_basic_auto_yx.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
  stack: memcmp :: provenance_basic_auto_yx.c:8 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

EXAMPLE (provenance_basic_auto_xy.c):
#include <stdio.h>
#include <string.h>
int main() {
  int x=1, y=2;
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
  return 0;
}
GCC-4.8-O2:
  Addresses: p=0x7fffffffea1c q=0x7fffffffea1c
  x=1 y=11 *p=11 *q=11
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_basic_auto_xy.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
  stack: memcmp :: provenance_basic_auto_xy.c:8 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

From the ISO-standard point of view, the committee response to Defect Report #260 appears to be regarded as definitive, though it has not been folded into the standard text. It takes the position that the provenance of a pointer value is significant, writing “[an implementation] may also treat pointers based on different origins as distinct even though they are bitwise identical”. The pointer addition in &x + 1 is legal³ but DR260 implies that the write *p = 11 gives rise to undefined behaviour, meaning that programmers should not write this code and the ISO standard does not constrain how compilers have to treat it. This licenses use of an analysis and optimisation that would otherwise be unsound.

Our de facto and ISO standard semantics should both deem this program to have undefined behaviour, to be sound w.r.t. GCC and ICC.

2.1.2 Q2. Can equality testing on pointers be affected by pointer provenance information?

ISO: yes (from DR260 CR) DEFACTO-USAGE: unknown DEFACTO-IMPL: yes, nondeterministically at each occurrence CERBERUS-DEFACTO: yes, nondeterministically at each occurrence CHERI: nondet TIS: Such pointer comparison is a source of nondeterminism which tis intentionally flags (with pointer_comparable) KCC: unclear (the printed addresses are not concrete values)

³ The addition is licensed by 6.5.6 “Additive operators”, where: 6.5.6p7 says “For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.”, and 6.5.6p8 says “[...] Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...]”.

[Question 4/15 of our What is C in practice? (Cerberus survey v2)⁴ relates to this.]

⁴ www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

The above example shows that C compilers have to be allowed to do static alias analysis and optimisation based on pointer provenance, but one would not expect a conventional C implementation to keep provenance information at run-time (unconventional and more defensive implementations such as Softbound [40], Hardbound [17], or CHERI might do that). To see this in practice, we form pointers p and q as above, with different provenance but identical representations, and then test their equality with == (instead of their representation equality with memcmp). The result is variously true or false depending on the context.

In this first example the equality result is false in GCC -O2 (even though the two pointers print the same):

EXAMPLE (provenance_equality_global_yx.c):
#include <stdio.h>
#include <string.h>
int y=2, x=1;
int main() {
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  _Bool b = (p==q);
  // can this be false even with identical addresses?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}
GCC-4.8-O2:
  Addresses: p=0x600b1c q=0x600b1c
  (p==q) = false
GCC-4.9-O2:
  Addresses: p=0x600b2c q=0x600b2c
  (p==q) = false
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600b1c q=0x600b1c
  (p==q) = false
GCC-4.9-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600b2c q=0x600b2c
  (p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_equality_global_yx.c:8:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Addresses: p=[sym(4 @ static(provenance_equality_global_yx.c840475e3-650f-4ffb-a6d4-27ce0e137690)) + 4] q=[sym(3 @ static(provenance_equality_global_yx.c840475e3-650f-4ffb-a6d4-27ce0e137690)) + 0]
  (p==q) = false
ISO: nondeterministically true or false
DEFACTO: nondeterministically true or false

The same holds (perhaps surprisingly) if the test is pulled out into another function (provenance_equality_global_fn_yx.c), but if that function is put into a separate compilation unit (provenance_equality_global_cu_yx_a.c and provenance_equality_global_cu_yx_b.c) the comparison gives true:

  p=0x601024 q=0x601024
  (p==q) = true

For Clang, again flipping the order of x and y, we see just true for all these tests where the addresses print the same.

EXAMPLE (provenance_equality_global_xy.c):
GCC-4.8-O0:
  Addresses: p=0x600b5c q=0x600b5c
  (p==q) = true
GCC-4.9-O0:
  Addresses: p=0x600b6c q=0x600b6c
  (p==q) = true
GCC-5.3-O2:
  Addresses: p=0x600b30 q=0x600b28
  (p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  Addresses: p=0x600b10 q=0x600b10
  (p==q) = true
CLANG34-O0: ... as above
CLANG35-O0:
  Addresses: p=0x600b28 q=0x600b28
  (p==q) = true
CLANG36-O0: ... as above
CLANG37-O0:
  Addresses: p=0x600b10 q=0x600b10
  (p==q) = true
CLANG33-O2:
  Addresses: p=0x600ab0 q=0x600ab0
  (p==q) = true
CLANG34-O2: ... as above
CLANG35-O2:

  Addresses: p=0x600ac8 q=0x600ac8
  (p==q) = true
CLANG36-O2: ... as above
CLANG37-O2:
  Addresses: p=0x600ab0 q=0x600ab0
  (p==q) = true
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600ac8 q=0x600ac8
  (p==q) = true
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600ab0 q=0x600ab0
  (p==q) = true
CLANG37-UBSAN:
  Addresses: p=0x627b34 q=0x627b34
  (p==q) = true
CLANG37-ASAN:
  Addresses: p=0x69d424 q=0x69d460
  (p==q) = false
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_equality_global_xy.c:8:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Addresses: p=[sym(3 @ static(provenance_equality_global_xy.c79ac8515-7473-4c82-8562-7588e69ceb19)) + 4] q=[sym(4 @ static(provenance_equality_global_xy.c79ac8515-7473-4c82-8562-7588e69ceb19)) + 0]
  (p==q) = false

EXAMPLE (provenance_equality_global_fn_xy.c):
GCC-4.8-O0:
  Addresses: p=0x600ba4 q=0x600ba4
  (p==q) = true
GCC-4.9-O0:
  Addresses: p=0x600bb4 q=0x600bb4
  (p==q) = true
GCC-5.3-O2:
  Addresses: p=0x600b70 q=0x600b68
  (p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  Addresses: p=0x600b58 q=0x600b58
  (p==q) = true
CLANG34-O0: ... as above
CLANG35-O0:
  Addresses: p=0x600b80 q=0x600b80
  (p==q) = true
CLANG36-O0: ... as above
CLANG37-O0:
  Addresses: p=0x600b68 q=0x600b68
  (p==q) = true
CLANG33-O2:
  Addresses: p=0x600b08 q=0x600b08
  (p==q) = true
CLANG34-O2: ... as above
CLANG35-O2:
  Addresses: p=0x600b20 q=0x600b20
  (p==q) = true
CLANG36-O2: ... as above
CLANG37-O2:
  Addresses: p=0x600b08 q=0x600b08
  (p==q) = true
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600b20 q=0x600b20
  (p==q) = true
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING:
  Addresses: p=0x600b08 q=0x600b08
  (p==q) = true
CLANG37-UBSAN:
  Addresses: p=0x627b84 q=0x627b84
  (p==q) = true
CLANG37-ASAN:
  Addresses: p=0x69d424 q=0x69d460
  (p==q) = false
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_equality_global_fn_xy.c:5:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);

  stack: f :: provenance_equality_global_fn_xy.c:14 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Addresses: p=[sym(3 @ static(provenance_equality_global_fn_xy.cd8999ccf-52a3-416c-b9c8-54e14d68f73b)) + 4] q=[sym(4 @ static(provenance_equality_global_fn_xy.cd8999ccf-52a3-416c-b9c8-54e14d68f73b)) + 0]
  (p==q) = false

and provenance_equality_global_cu_xy_a.c / provenance_equality_global_cu_xy_b.c.

For CHERI, we again give a version of the example using automatic storage location variables.

EXAMPLE (provenance_equality_auto_yx.c):
CLANG33-O0:
  Addresses: p=0x7fffffffea18 q=0x7fffffffea18
  (p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
  Addresses: p=0x7fffffffe974 q=0x7fffffffe960
  (p==q) = false
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_equality_auto_yx.c:8:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)

EXAMPLE (provenance_equality_auto_fn_yx.c):
CLANG33-O0:
  Addresses: p=0x7fffffffea18 q=0x7fffffffea18
  (p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
  Addresses: p=0x7fffffffe974 q=0x7fffffffe960
  (p==q) = false
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  Addresses: p=
  provenance_equality_auto_fn_yx.c:4:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
  stack: f :: provenance_equality_auto_fn_yx.c:14 <- main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:

  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)

and provenance_equality_auto_cu_yx_a.c / provenance_equality_auto_cu_yx_b.c.

To allow this variation, our candidate de facto model and any ISO standard semantics should both allow pointer comparison to either use provenance-aware or provenance-oblivious comparison nondeterministically. In many cases the two will give identical results (for performance of the executable semantics, for those one might choose not to make an explicit nondeterministic choice).

2.1.3 GCC and ISO C11 differ on the result of a == comparison on a one-past pointer

This arises from the preceding examples: a defect in the ISO standard text, in which the DR260 position has not been consistently incorporated.

From the ISO standard point of view, the standard is clear that in general pointers to different objects of compatible type can be compared with == (in contrast to relational operators, where such comparison gives undefined behaviour).⁵ But the text of C11 and DR260 seem inconsistent w.r.t. the result of the comparison. In the former, it is specified by 6.5.9p6: “Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)”

Footnote 109: “Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.”

⁵ The use of == to compare the two pointers is licensed by 6.5.9 Equality operators, which allows the case in which “both operands are pointers to qualified or unqualified versions of compatible types;”.

The last clause of 6.5.9p6 is surprising: given “a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space” the standard requires them to compare equal rather than merely permitting them to compare equal. This seems to conflict with the spirit of DR260, which allows the pointer provenance to be taken into account. The variation in experimental results can be licensed by the “may” in the DR260 “[an implementation] may also treat pointers based on different origins as distinct even though they are bitwise identical”.

The provenance_equality_global_yx.c behaviour is arguably a bug in GCC, violating 6.5.9p6, as we reported (see Fig. 1). The developer comments disagree, arguing that pointers need not have stable numerical values (we think that implausible, as it would break lots of code; we return to stability in §2.9, p.49). But probably the behaviour should be allowed in any case, and the standard should have something better than the if-and-only-if in 6.5.9p6. The proposal above to nondeterministically choose provenance-aware or concrete comparison relaxes the if-and-only-if (taking DR260 to have precedence over the C11 text).

2.2 Pointer provenance via integer types

In practice it seems to be routine to convert from a pointer type to a sufficiently wide integer type and back, e.g. to use unused bits of the pointer to store tag bits. The interaction between that and provenance is interesting.

2.2.1 Q3. Can one make a usable pointer via casts to intptr_t and back?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes

2.2.2 Q4. Can one make a usable pointer via casts to unsigned long and back?

ISO: implementation-defined DEFACTO-USAGE: yes (normally) DEFACTO-IMPL: yes (normally) CERBERUS-DEFACTO: yes (if unsigned long is wide enough) CHERI: no TIS: yes KCC: yes

We first have to consider the basic question of simple roundtrips, casting a pointer to an integer type and back, either via intptr_t or unsigned long:

EXAMPLE (provenance_roundtrip_via_intptr_t.c):
#include <stdio.h>
#include <inttypes.h>
int x=1;
int main() {
  int *p = &x;
  intptr_t i = (intptr_t)p;
  int *q = (int *)i;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}
GCC-4.8-O0:
  *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above

GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  *p=11 *q=11
  [value] done for function main
KCC:
  *p=11 *q=11
ISO: defined behaviour (if the intptr type is provided)
DEFACTO: defined behaviour

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
Bug ID: 61502
Summary: == comparison on "one-past" pointer gives wrong result
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: [...]

Created attachment 32934
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32934&action=edit
C code as pasted into bug report

The following code can produce a pointer to one-past the x object. When it does, according to the C11 standard text, the result of the pointer comparison should be true, but gcc gives false.

#include <stdio.h>
int y = 2, x=1;
int main()
{
  int *p;
  p = &x +1 ;
  printf("&x=%p &y=%p p=%p\n",(void*)&x, (void*)&y, (void*)p);
  _Bool b1 = (p==&y);
  printf("(p==&y) = %s\n", b1?"true":"false");
  return 0;
}

gcc-4.8 -std=c11 -pedantic -Wall -Wextra -O2 -o a.out pointer_representation_1e.c && ./a.out
&x=0x601020 &y=0x601024 p=0x601024
(p==&y) = false

gcc-4.8 --version
gcc-4.8 (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1

The pointer addition is licensed by 6.5.6 "Additive operators", where:

6.5.6p7 says "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.", and

6.5.6p8 says "[...] Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...]".

The pointer comparison is licensed by 6.5.9 "Equality operators", where:

6.5.9p7 says "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.",

6.5.9p6 says "Two pointers compare equal if and only if [...] or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)", and

Footnote 109 says "Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. [...]".

Figure 1. Bug ID: 61502

EXAMPLE (provenance_roundtrip_via_unsigned_long.c):
#include <stdio.h>
int x=1;
int main() {
  int *p = &x;
  unsigned long i = (unsigned long)p;
  int *q = (int *)i;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}
GCC-4.8-O0:
  *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above

CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  *p=11 *q=11
  [value] done for function main
KCC:
  *p=11 *q=11
ISO: implementation-defined
DEFACTO: defined behaviour

In the de facto standards this is clearly allowed, both for intptr_t and (as in Linux or more generally in Unix) some other integer types (e.g. unsigned long). This involves the “Int” idiom of the CHERI ASPLOS paper (storing a pointer in an integer variable in memory), which they observed commonly in practice.

One respondent comments that the 8086 model (up to 80286) had 16-bit near pointers (relying on segment registers for 4 more bits) and longer far pointers, so just copying the former wouldn't be sufficient. CDC6600 had pointers to 60-bit words, so character pointers were complex. Neither are current mainstream C.

The ISO standard leaves conversions between pointer and integer types almost entirely implementation-defined (except for conversion of integer constant 0 and null pointers), with:

6.3.2.3p5: “An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.67)”

6.3.2.3p6: “Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.”

(Footnote 67 says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.”; the exact force of this is not clear.)

On the other hand, 7.20 Integer types <stdint.h> introduces optional types intptr_t and uintptr_t with roundtrip properties from pointer to integer and back:

7.20.1.4p1 “The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: intptr_t”. “The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: uintptr_t”.

We presume that this “compare equal” is intended to imply that the result is interchangeable with the original pointer, but, as we have seen examples in which two pointers compare equal but access via one gives undefined behaviour while access via the other does not, this is unfortunate phrasing (it likely antedates DR260) and should be changed. In the CHERI case tags are not visible in memory, so there, too, a pointer and an integer might compare equal but not be equi-usable.

Note that these examples do not involve function pointers; things might be different there.

2.2.3 Q5. Must provenance information be tracked via casts to integer types and integer arithmetic?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: tis-interpreter sees the possibility of signed arithmetic overflow (correctly so, if one assumes nothing about memory layout) KCC: Execution failed (unclear why)

Should one be allowed to use intptr_t (or uintptr_t) arithmetic to work around provenance limitations? The next example (also pathological code) is a variant of the §2.1.1 (p.7) provenance_basic_global_yx.c in which we use integer arithmetic (and casts to and from intptr_t) instead of pointer arithmetic. The arithmetic again just happens (in these implementations) to be the right offset between the two global variables.

EXAMPLE (provenance_basic_using_intptr_t_global_yx.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

int y = 2, x = 1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  int *p = (int *)(ux + offset);
  int *q = &y;
  printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
         "\n",ux,(void*)p,uy);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

GCC-4.8-O2:
Addresses: &x=6294464 p=0x600bc4 &y=6294468
x=1 y=2 *p=11 *q=2
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_global_yx.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_global_yx.c:10:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

As before, we see that GCC seems to be assuming that this cannot occur, by making an optimisation that would be unsound if this program does not have undefined behaviour. This is consistent with the GCC documentation, which says: “When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.”⁶
⁶ https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html
Note that this GCC text presumes that there is an obvious “original pointer” associated with any integer value which is cast back to a pointer; as we discuss in §2.3 (p.23), that is not always the case.
As before, for this version of Clang we don’t see the optimisation for the analogous example with the two allocations flipped, so this is uninformative.

EXAMPLE (provenance_basic_using_intptr_t_global_xy.c):
GCC-4.8-O0:
Addresses: &x=6294528 p=0x600c04 &y=6294532
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-5.3-O2:
Addresses: &x=6294484 p=0x600bd8 &y=6294480
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: &x=6294332 p=0x600b40 &y=6294336
x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_global_xy.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_global_xy.c:

10:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

EXAMPLE (provenance_basic_using_intptr_t_global_xy_offset64.c):
CLANG37-UBSAN:
Addresses: &x=6454464 p=0x627d00 &y=6454468
CLANG37-ASAN:
Addresses: &x=6935840 p=0x69d560 &y=6935904
x=1 y=11 *p=11 *q=11
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_global_xy_offset64.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_global_xy_offset64.c:10:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

For CHERI we include a variant with automatic storage duration variables:
EXAMPLE (provenance_basic_using_intptr_t_auto_yx.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int main() {
  int y = 2, x = 1;
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  int *p = (int *)(ux + offset);
  int *q = &y;
  printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
         "\n",ux,(void*)p,uy);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

GCC-4.8-O0:
Addresses: &x=140737488349680 p=0x7fffffffe9f4 &y=140737488349684
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-5.3-O2:
Addresses: &x=140737488349692 p=0x7fffffffea00 &y=140737488349688
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG33-O0:
Addresses: &x=140737488349700 p=0x7fffffffea08 &y=140737488349704
x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: &x=140737488349360 p=0x7fffffffe8b4 &y=140737488349344
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_auto_yx.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_auto_yx.c:10

:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

EXAMPLE (provenance_basic_using_intptr_t_auto_yx_offset-16.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int main() {
  int y = 2, x = 1;
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = -16;
  int *p = (int *)(ux + offset);
  int *q = &y;
  printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
         "\n",ux,(void*)p,uy);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

CLANG37-O0:
Addresses: &x=140737488349668 p=0x7fffffffe9d4 &y=140737488349672
CLANG37-O2: ... as above (modulo addresses)
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: &x=140737488349328 p=0x7fffffffe880 &y=140737488349312
x=1 y=11 *p=11 *q=11
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_auto_yx_offset-16.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_auto_yx_offset-16.c:10:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

EXAMPLE (provenance_basic_using_intptr_t_auto_xy.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int main() {
  int x = 1, y = 2;
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  int *p = (int *)(ux + offset);
  int *q = &y;
  printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
         "\n",ux,(void*)p,uy);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

GCC-4.8-O2:
Addresses: &x=140737488349688 p=0x7fffffffe9fc &y=140737488349692
x=1 y=11 *p=11 *q=11
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_auto_xy.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_using_intptr_t_auto_xy.c:10:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;

stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

For reference, for a similar example using two malloc’d regions and a constant offset we also see similar GCC and Clang results as before: GCC sometimes assumes the two pointers do not alias (interestingly, only with GCC 4.9 -O2, not GCC 4.8 -O2), while these versions of Clang do not:

EXAMPLE (provenance_basic_using_intptr_t_malloc_offset_8.c):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
int main() {
  int *xp=malloc(sizeof(int));
  int *yp=malloc(sizeof(int));
  *xp=1;
  *yp=2;
  int *p = (int*) (((uintptr_t)xp) + 8);
  int *q = yp;
  printf("Addresses: xp=%p p=%p q=%p\n",
         (void*)xp,(void*)p,(void*)q);
  // if (p == q) {
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("*xp=%d *yp=%d *p=%d *q=%d\n",*xp,*yp,*p,*q);
  }
  return 0;
}

GCC-4.8-O0:
Addresses: xp=0x801417058 p=0x801417060 q=0x801417060
*xp=1 *yp=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2:
Addresses: xp=0x801417058 p=0x801417060 q=0x801417060
*xp=1 *yp=2 *p=11 *q=2
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
Addresses: xp=0x801417058 p=0x801417060 q=0x801417060
*xp=1 *yp=11 *p=11 *q=11
GCC-4.9-O2-NO-STRICT-ALIASING:
Addresses: xp=0x801417058 p=0x801417060 q=0x801417060
*xp=1 *yp=2 *p=11 *q=2
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: xp=0x801417058 p=0x801417060 q=0x801417060
*xp=1 *yp=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: xp=0x60200000eff0 p=0x60200000eff8 q=0x60200000efd0
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_basic_using_intptr_t_malloc_offset_8.c:6:[value] allocating variable malloc_main_l6
provenance_basic_using_intptr_t_malloc_offset_8.c:7:[value] allocating variable malloc_main_l7
Addresses: xp=
provenance_basic_using_intptr_t_malloc_offset_8.c:15:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_basic_using_intptr_t_malloc_offset_8.c:15 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

This matches the provenance_basic_malloc_offset+8.c example of §2.1.1 (p.7), which did the arithmetic di-

rectly on pointers instead of at uintptr_t, and for which the optimisation was observed in GCC.

2.2.4 Q6. Can one use bit manipulation and integer casts to store information in unused bits of pointers?
U: ISO
ISO: unclear - implementation-defined? DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: test not supported (_Alignof) KCC: Execution failed (unclear why)
Now we extend the first example of §2.2.1 (p.14), which cast a pointer to intptr_t and back, to use logical operations on the integer value to store some tag bits. The following code exhibits a strong form of this, storing the address and tag bit combination as a pointer (which thereby creates a misaligned pointer value, though one not used for accesses); a weaker form would store the combined value only as an integer.

EXAMPLE (provenance_tag_bits_via_uintptr_t_1.c):
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int x=1;
int main() {
  int *p = &x;
  // cast &x to an integer
  uintptr_t i = (uintptr_t) p;
  // check the bottom two bits of an int* are not used
  assert(_Alignof(int) >= 4);
  assert((i & 3u) == 0u);
  // construct an integer like &x with low-order bit set
  i = i | 1u;
  // cast back to a pointer
  int *q = (int *) i; // defined behaviour?
  // cast to integer and mask out the low-order two bits
  uintptr_t j = ((uintptr_t)q) & ~((uintptr_t)3u);
  // cast back to a pointer
  int *r = (int *) j;
  // are r and p now equivalent?
  *r = 11; // defined behaviour?
  _Bool b = (r==p);
  printf("x=%i *r=%i (r==p)=%s\n",x,*r,b?"true":"false");
}

GCC-4.8-O0:
x=11 *r=11 (r==p)=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
provenance_tag_bits_via_uintptr_t_1.c:10:[kernel] user error: syntax error
[kernel] user error: stopping on file "provenance_tag_bits_via_uintptr_t_1.c" that has errors. Add ’-kernel-msg-key pp’ for preprocessing command.
[kernel] Frama-C aborted: invalid user input.
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour
ISO: unclear - implementation-defined?

This idiom seems to be widely relied on in practice, and so our de facto standard semantics should allow it, for any integer type of the right width. It is the “Mask: simple masking of pointers” idiom of the CHERI ASPLOS paper, widely observed in practice.
Beyond just manipulating the low-order bits, Linux has “buddy allocators” in which one XORs some particular pointer bits to move inside a tree structure, within some allocated region (though perhaps not made by malloc).
In this example there is still an obvious unique provenance that one can track through the integer computation; in the next section we consider cases where that is not the case.
For mismatching widths, the GCC documentation⁷ gives a concrete algorithm for converting between integers and pointers which gives the identity on their bit representations in this case: “A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends [Footnote 1: Future versions of GCC may zero-extend, or use a target-defined ptr_extend pattern. Do not rely on sign extension.] if the pointer representation is smaller than the integer type, otherwise the bits are unchanged.” and “A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of
⁷ Section 4.7 Arrays and pointers of C Implementation-defined behavior, http://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html

the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.”
It does not comment on provenance, and it also leaves open the question of whether the implementation might use the low-order bits for its own purposes (making the assert((i & 3u) == 0u) of the example false). We take this to be an omission in the GCC documentation, and assume implementations do not (otherwise much existing code would break). Really, the set of unused bits of pointers of each alignment should be explicitly implementation-defined in the standard.
For mismatching widths a de facto semantic model has to choose whether to follow this GCC documentation (loosened according to the footnote and strengthened w.r.t. provenance and unused bits), or be more nondeterministic.
This example tells us that at least the specific operations on integers used here should preserve the provenance information. The simplest proposal would be to have all integer operations preserve provenance, but, as we discuss below, that is not always appropriate.
The CHERI behaviour here, failing in the assert, is quite subtle. The uintptr_t value i is a capability, and all arithmetic on it is done on the offset. The assert at the start fails because i & 3u first promotes 3u to intcap_t (the underlying type that uintptr_t is a typedef for), which gives an untagged capability with base 0 and offset 3. This is then anded with i by getting the offsets of both, anding them together, and applying the resulting offset to i. The result is therefore a capability with the base/length/permissions of i, but an offset of 0. This is then compared against a null capability, and the comparison fails (because it is not a null capability).
The assertion seems like something that a reasonable programmer ought to expect to work, so the best design is an open question at present. Without the assert (provenance_tag_bits_via_uintptr_t_1_no_assert.c) the test works on CHERI, so, interestingly, it is only code that is defensively written that will experience the problem.

2.2.5 Q7. Can equality testing on integers that are derived from pointer values be affected by their provenance?
U: ISO
ISO: unclear (we suggest no) DEFACTO-USAGE: no? DEFACTO-IMPL: no? (modulo Clang bug?) CERBERUS-DEFACTO: no CHERI: ? TIS: pointer_comparable KCC: Execution failed (unclear why)

EXAMPLE (provenance_equality_uintptr_t_global_yx.c):
#include <stdio.h>
#include <inttypes.h>
int y=2, x=1;
int main() {
  uintptr_t p = (uintptr_t)(&x + 1);
  uintptr_t q = (uintptr_t)&y;
  printf("Addresses: p=%" PRIxPTR " q=%" PRIxPTR "\n",
         p,q);
  _Bool b = (p==q);
  // can this be false even with identical addresses?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

GCC-4.8-O0:
Addresses: p=600b60 q=600b58
(p==q) = false
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2:
Addresses: p=600b34 q=600b34
(p==q) = true
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_equality_uintptr_t_global_yx.c:9:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: unclear - should be true when the addresses print equal?

EXAMPLE (provenance_equality_uintptr_t_global_xy.c):
#include <stdio.h>
#include <inttypes.h>
int x=1, y=2;
int main() {
  uintptr_t p = (uintptr_t)(&x + 1);
  uintptr_t q = (uintptr_t)&y;
  printf("Addresses: p=%" PRIxPTR " q=%" PRIxPTR "\n",
         p,q);
  _Bool b = (p==q);
  // can this be false even with identical addresses?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

GCC-4.8-O0:
Addresses: p=600b5c q=600b5c
(p==q) = true
GCC-4.9-O0: ... as above (modulo addresses)
GCC-5.3-O2:
Addresses: p=600b48 q=600b40
(p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=600b08 q=600b08
(p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: p=69d424 q=69d460
(p==q) = false
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_equality_uintptr_t_global_xy.c:9:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: unclear - should be true when the addresses print equal?

Can this print false even when the numeric addresses are identical? This is suggested by an example from Krebbers [29], as discussed in §6.11. The observed Clang ‘false’ behaviour seems to be a compiler bug, similar to the GCC bug reported by them.

2.3 Pointers involving multiple provenances
We now consider examples in which a pointer is constructed using computation based on multiple pointer values. How widely this is used is not clear to us. There are at least two important examples in the wild, the Linux and FreeBSD per-CPU allocators, and also the classic XOR linked list implementation (the latter, while much-discussed, appears not to be a currently common idiom, though pointer XOR is apparently used in L4 [48, §6.2]). We discuss both below.

2.3.1 Q8. Should intra-object pointer subtraction give provenance-free integer results?
This is uncontroversial:
ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes (third test has memcmp errors, as Q1) KCC: first tests ok, later tests not supported, with Execution failed error
We begin with some simple cases. Given two pointers within an array, one should certainly be able to calculate an offset, by subtracting them, that can be used either within the same array or within a different array, e.g.
&(x[0]) + (&(x[1])-&(x[0]))
&(x[0]) + (&(y[1])-&(y[0]))
and in full:

EXAMPLE (provenance_multiple_1_global.c):
#include <stdio.h>
int y[2], x[2];
int main() {
  int *p = &(x[0]) + (&(x[1])-&(x[0]));
  *p = 11; // is this free of undefined behaviour?
  printf("x[1]=%d *p=%d\n",x[1],*p);
  return 0;
}

GCC-4.8-O0:
x[1]=11 *p=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above

CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x[1]=11 *p=11
[value] done for function main
KCC:
x[1]=11 *p=11
DEFACTO: defined behaviour (x[1]=11 *p=11)
ISO: defined behaviour (x[1]=11 *p=11)

EXAMPLE (provenance_multiple_2_global.c):
#include <stdio.h>
int y[2], x[2];
int main() {
  int *p = &(x[0]) + (&(y[1])-&(y[0]));
  *p = 11; // is this free of undefined behaviour?
  printf("x[1]=%d *p=%d\n",x[1],*p);
  return 0;
}

GCC-4.8-O0:
x[1]=11 *p=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x[1]=11 *p=11
[value] done for function main
KCC:
x[1]=11 *p=11
DEFACTO: defined behaviour (x[1]=11 *p=11)
ISO: defined behaviour (x[1]=11 *p=11)

However, an offset constructed by intra-object subtraction within one object should not, when added to a pointer to a distinct object, license its use to access the first object: in the examples below, the following should not be allowed to be used to access y[0], and we observe GCC optimising based on that assumption.
&x[1] + (&y[1]-&y[1]) + 1
&x[1] + (&y[1]-&y[0]) + 0
In full:

EXAMPLE (provenance_multiple_3_global_yx.c):
#include <stdio.h>
#include <string.h>
int y[2], x[2];
int main() {
  int *p = &x[1] + (&y[1]-&y[1]) + 1;
  int *q = &y[0];
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("y[0]=%d *p=%d *q=%d\n",y[0],*p,*q);
  }
  return 0;
}

GCC-4.8-O0:
Addresses: p=0x600bd8 q=0x600bd8
y[0]=11 *p=11 *q=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2:
Addresses: p=0x600bc0 q=0x600bc0
y[0]=0 *p=11 *q=0

GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x600b50 q=0x600b50
y[0]=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_multiple_3_global_yx.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_multiple_3_global_yx.c:8 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Addresses: p=[sym(4 @ static(provenance_multiple_3_global_yx.c50873d42-1401-4b46-aa21-027bf737dfe9)) + 8] q=[sym(3 @ static(provenance_multiple_3_global_yx.c50873d42-1401-4b46-aa21-027bf737dfe9)) + 0]
Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

EXAMPLE (provenance_multiple_4_global_yx.c):
#include <stdio.h>
#include <string.h>
int y[2], x[2];
int main() {
  int *p = &x[1] + (&y[1]-&y[0]) + 0;
  int *q = &y[0];
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("y[0]=%d *p=%d *q=%d\n",y[0],*p,*q);
  }
  return 0;
}

GCC-4.8-O0:
Addresses: p=0x600bd8 q=0x600bd8
y[0]=11 *p=11 *q=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2:
Addresses: p=0x600bc0 q=0x600bc0
y[0]=0 *p=11 *q=0
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x600b50 q=0x600b50
y[0]=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at

main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_multiple_4_global_yx.c:8:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_multiple_4_global_yx.c:8 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Addresses: p=[sym(4 @ static(provenance_multiple_4_global_yx.cd741e02b-ed40-47f8-aec9-5d914005a66e)) + 8] q=[sym(3 @ static(provenance_multiple_4_global_yx.cd741e02b-ed40-47f8-aec9-5d914005a66e)) + 0]
Execution failed (configuration dumped)
ISO: undefined behaviour
DEFACTO: undefined behaviour

2.3.2 Q9. Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first?
U: ISO D: ISO-VS-DEFACTO
ISO: unclear - no? DEFACTO-USAGE: unclear (perhaps Linux/FreeBSD per-CPU variables? perhaps in sqlite?) DEFACTO-IMPL: compilers apparently assume no CERBERUS-DEFACTO: no CHERI: no TIS: no (fails with signed overflow (correctly so, if one assumes nothing about memory layout)) KCC: no – flags UB
[Question 3/15 of our What is C in practice? (Cerberus survey v2)8 relates to this.]
This is a variant of the §2.2.3 (p.16) provenance_basic_using_intptr_t_global_yx.c in which the constant offset is replaced by a subtraction (here after casting from pointer to integer type).

8 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

EXAMPLE (pointer_offset_from_subtraction_1_global.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = uy - ux;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         " offset=%"PRIiPTR" \n",ux,uy,offset);
  int *p = (int *)(ux + offset);
  int *q = &y;
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // is this free of undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

GCC-4.8-O0:
Addresses: &x=6294556 &y=6294552 offset=-4
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_offset_from_subtraction_1_global.c:9:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ uy-ux;
stack: main
pointer offset from subtraction 1 global.c:9:[kerne . . . as above
CLANG 36-O2- NO - STRICT- ALIASING :
l] warning: signed overflow. assert uy-ux . . . as above
CLANG 37-O2- NO - STRICT- ALIASING :
9223372036854775807; CLANG 37-UBSAN: . . . as above (modulo addresses)
stack: CLANG 37-ASAN: . . . as above (modulo addresses)
main TIS - INTERPRETER :
[value] Stopping at nth alarm [value] Analyzing a complete application starting at
[value] user error: main
Degeneration occurred: [value] Computing initial state
results are [value] Initial
not correct for lines of code that can be reached from state computed
the degeneration point. pointer offset from subtraction 1 auto.c:
KCC : 9:[kernel] warning: signed overflow. assert
Execution failed (configuration dumped) -9223372036854775808 uy-ux;
Error: UB-CEA5 stack:
Description: Computing pointer difference between two main
different objects. pointer offset from subtraction 1 auto.c:9:[kernel]
Type: Undefined behavior. warning: signed overflow. assert uy-ux
See also: 9223372036854775807;
C11 sec. 6.5.6:9, J.2:1 item 48 stack:
at main
main(pointer offset from subtraction 1 global.c:9) [value] Stopping at nth alarm
at [value] user error:
<file-scope>(<unknown>) Degeneration occurred:
ISO : unclear - no? results are
DEFACTO : used in practice but not supported in general not correct for lines of code that can be reached from
the degeneration point.
And again in an automatic-storage-duration version: KCC :
Execution failed (configuration dumped)
E XAMPLE (pointer_offset_from_subtraction_1_auto.c): Error: UB-CEA5
GCC -4.8-O0: Description: Computing pointer difference between two
Addresses: &x=140737488349684 &y=140737488349680 different objects.
offset=-4 Type: Undefined behavior.
x=1 y=11 *p=11 *q=11 See also:
GCC -4.9-O0: . . . as above C11 sec. 6.5.6:9, J.2:1 item 48
GCC -4.8-O2: . . . as above (modulo addresses) at
GCC -4.9-O2: . . . as above main(pointer offset from subtraction 1 auto.c:9)
GCC -5.3-O2: . . . as above at
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above (modulo ad- <file-scope>(<unknown>)
dresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above

CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at

We do not see the analysis and optimisation consequences seen for the previous example, so this experimental data does not force us to make this program have undefined behaviour.
None of the ISO standard text, DR260, and the GCC documentation discuss multiple-provenance pointers explicitly. They are consistent with either a multiple-provenance semantics or an aggressively single-provenance semantics that would regard this program as having undefined behaviour.
In practice this idiom is used in Linux and in FreeBSD for access to variables allocated by the per-CPU allocators9. The latter precomputes partially constructed pointers for CPU-local variables. The linker creates a region for CPU 0’s copy of the kernel per-CPU variables x, y, .... A corresponding region for each other CPU is created early in the boot process, before CPU bringup. Say these start at addresses &x_N for each CPU N. Then an array dpcpu_off[N] is initialised with &x_N - &x_0, and to access a per-CPU variable y on CPU N we add dpcpu_off[N] to &y_0 to get &y_N. The point here is to optimise access to these variables. There are not very many of them, but they are often used in critical paths, e.g. in scheduler context switching.

9 FreeBSD: _DPCPU_PTR, https://github.com/freebsd/freebsd/blob/master/sys/sys/pcpu.h
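The offset-table idiom just described can be sketched as follows. This is only an illustration under a simplifying assumption: all per-CPU regions are carved out of a single allocation, so every subtraction and addition stays inside one object and is uncontroversial in ISO C. The kernel idiom differs in exactly the respect that Q9 asks about, namely that the regions are separately created objects. The names NCPU, pcpu_region, regions, pcpu_init and pcpu_ptr are hypothetical; only dpcpu_off follows the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPU 4 /* hypothetical CPU count */

/* Hypothetical layout of one per-CPU region holding the variables x and y. */
struct pcpu_region { int x; int y; };

static struct pcpu_region *regions; /* assumption: one slab holds all regions */
static ptrdiff_t dpcpu_off[NCPU];   /* byte offset of CPU n's region from CPU 0's */

/* Create the regions and precompute the offset table; returns 0 on success. */
static int pcpu_init(void) {
  regions = calloc(NCPU, sizeof *regions);
  if (regions == NULL)
    return -1;
  for (int n = 0; n < NCPU; n++)
    dpcpu_off[n] = (unsigned char *)&regions[n] - (unsigned char *)&regions[0];
  return 0;
}

/* Given the address of CPU 0's copy of a variable, return CPU n's copy. */
static int *pcpu_ptr(int *cpu0_var, int n) {
  assert(n >= 0 && n < NCPU);
  return (int *)((unsigned char *)cpu0_var + dpcpu_off[n]);
}
```

After pcpu_init(), a call such as pcpu_ptr(&regions[0].y, 2) yields a pointer to CPU 2's copy of y using one table lookup and one addition, which is the access-path saving the idiom is after.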
main
[value] Computing initial state
[value] Initial state computed
pointer_offset_from_subtraction_1_malloc.c:6:[value] allocating variable malloc main l6

The following example does essentially this, and is very similar to pointer_offset_from_subtraction_1_global.c above. It differs in using malloc’d regions rather than global variables and in doing the subtraction at unsigned char * type rather than after casting to an integer type.
EXAMPLE (pointer_offset_from_subtraction_1_malloc.c):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
int main() {
  void *xp=malloc(sizeof(int)); // allocation P
  void *yp=malloc(sizeof(int)); // allocation Q
  *((int*)xp)=1;
  *((int*)yp)=2;
  ptrdiff_t offset=(unsigned char*)yp-(unsigned char*)xp;
  // provenance ?
  unsigned char *p1 = (unsigned char*)xp;// provenance P
  unsigned char *p2 = p1 + offset; // provenance ?
  int *p = (int*)p2;
  int *q = (int*)yp;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // is this free of undefined behaviour?
    printf("*xp=%d *yp=%d *p=%d *q=%d\n",
           *(int*)xp,*(int*)yp,*(int*)p,*(int*)q);
  }
  return 0;
}

pointer_offset_from_subtraction_1_malloc.c:7:[value] allocating variable malloc main l7
pointer_offset_from_subtraction_1_malloc.c:10:[kernel] warning: pointer subtraction: assert \base_addr((unsigned char *)yp) ≡ \base_addr((unsigned char *)xp);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEA5
GCC -4.8-O0: Description: Computing pointer difference between two
Addresses: p=0x801417060 q=0x801417060 different objects.
*xp=1 *yp=11 *p=11 *q=11 Type: Undefined behavior.
GCC -4.9-O0: . . . as above See also:
GCC -4.8-O2: . . . as above C11 sec. 6.5.6:9, J.2:1 item 48
GCC -4.9-O2: . . . as above at
GCC -5.3-O2: . . . as above main(pointer offset from subtraction 1 malloc.c:10)
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above at
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above

As before, we do not see an alias-analysis-based optimisation here. In previous tests we did see that for a version with a constant offset, but in this dataset we do not, as below. As usual, one should (of course) be cautious not to read too much into a lack of optimisation.

EXAMPLE (pointer_offset_constant_8_malloc.c):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
int main() {
  void *xp=malloc(sizeof(int)); // allocation P
  void *yp=malloc(sizeof(int)); // allocation Q
  *((int*)xp)=1;
  *((int*)yp)=2;
  ptrdiff_t offset = 8;
  // (unsigned char*)yp - (unsigned char*)xp;
  unsigned char *p1 = (unsigned char*)xp;// provenance P
  unsigned char *p2 = p1 + offset;
  int *p = (int*)p2;
  int *q = (int*)yp;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // is this free of undefined behaviour?
    printf("*xp=%d *yp=%d *p=%d *q=%d\n",
           *(int*)xp,*(int*)yp,*(int*)p,*(int*)q);
  }
  return 0;
}

pointer_offset_constant_8_malloc.c:7:[value] allocating variable malloc main l7
Addresses: p=
pointer_offset_constant_8_malloc.c:17:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: pointer_offset_constant_8_malloc.c:17 <- main
[value] Stopping at nth alarm
GCC -4.8-O0: [value] user error: Degeneration occurred:
Addresses: p=0x801417060 q=0x801417060
*xp=1 *yp=11 *p=11 *q=11 results are not correct for lines of code
GCC -4.9-O0: . . . as above that can be reached from the degeneration point.
GCC -4.8-O2: KCC :
Addresses: p=0x801417060 q=0x801417060 Addresses: p=[sym(0 @ allocated) + 8] q=[sym(1 @
*xp=1 *yp=2 *p=11 *q=2 allocated) + 0]
GCC -4.9-O2: . . . as above Execution failed (configuration dumped)
GCC -5.3-O2: . . . as above Error: UB-CEA1
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above Description: A pointer (or array subscript) outside the
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above bounds of an object.
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above Type: Undefined behavior.
CLANG 33-O0: See also:
Addresses: p=0x801417060 q=0x801417060 C11 sec. 6.5.6:8, J.2:1 item 46
*xp=1 *yp=11 *p=11 *q=11 at
CLANG 34-O0: . . . as above main(pointer offset constant 8 malloc.c:13)
CLANG 35-O0: . . . as above at
CLANG 36-O0: . . . as above <file-scope>(<unknown>)
CLANG 37-O0: . . . as above Error: UB-CEE3
CLANG 33-O2: . . . as above Description:
CLANG 34-O2: . . . as above Found pointer that refers outside the bounds of an
CLANG 35-O2: . . . as above object + 1.
CLANG 36-O2: . . . as above Type: Undefined behavior.
CLANG 37-O2: . . . as above See also: C11 sec.
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above 6.3.2.1:1, J.2:1 item 19
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above at
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above main(pointer offset constant 8 malloc.c:13)
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above at
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
CLANG 37-UBSAN: . . . as above (modulo addresses) Error: UB-CEE3
CLANG 37-ASAN: Description:
Addresses: p=0x60200000eff8 q=0x60200000efd0 Found pointer that refers outside the bounds of an
TIS - INTERPRETER : object + 1.
[value] Analyzing a complete application starting at Type: Undefined behavior.
main See also: C11 sec.
[value] Computing initial state 6.3.2.1:1, J.2:1 item 19
[value] Initial at
state computed main(pointer offset constant 8 malloc.c:14)
pointer offset constant 8 malloc.c:6:[val at
ue] allocating variable malloc main l6

<file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_offset_constant_8_malloc.c:16)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at printf(pointer_offset_constant_8_malloc.c:16)
at main(pointer_offset_constant_8_malloc.c:16)
at <file-scope>(<unknown>)

2.3.3 Q10. Presuming that one can have valid pointers with multiple provenances, does an inter-object pointer subtraction give a value with explicitly-unknown provenance or something more specific?
U: ISO
ISO: unclear – arguably N/A as the premise is false for ISO? DEFACTO-USAGE: unknown (not significant in normal code?) DEFACTO-IMPL: n/a (multiple-provenance not supported anyway?) CERBERUS-DEFACTO: no CHERI: no TIS: fails with signed overflow KCC: no – flags UB
The following example partly discriminates between the choices for the provenance of the result of an inter-object pointer subtraction (if such programs are not deemed to have undefined behaviour): either treating it as a value with explicitly-unknown provenance or one of the other two options. It uses an offset calculated between z and w to move from a pointer to x to a pointer to y. GCC does seem to assume that p and q cannot alias, suggesting that it isn’t using the explicitly-unknown provenance and might be consistent with the left-provenance or union-of-provenances model here.

EXAMPLE (pointer_offset_from_subtraction_2_global.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
#include <inttypes.h>
int w=4, z=3, y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offsetxy = uy - ux;
  intptr_t uz = (intptr_t)&z;
  intptr_t uw = (intptr_t)&w;
  intptr_t offsetzw = uw - uz;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         " offsetxy=%"PRIiPTR" \n",ux,uy,offsetxy);
  printf("Addresses: &z=%"PRIiPTR" &w=%"PRIiPTR\
         " offsetzw=%"PRIiPTR" \n",uz,uw,offsetzw);
  assert(offsetzw==offsetxy);
  int *p = (int *)(ux + offsetzw);
  int *q = &y;
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // is this free of undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
}

GCC-4.8-O0:
Addresses: &x=6294892 &y=6294888 offsetxy=-4
Addresses: &z=6294884 &w=6294880 offsetzw=-4
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2:
Addresses: &x=6294792 &y=6294796 offsetxy=4
Addresses: &z=6294800 &w=6294804 offsetzw=4
x=1 y=2 *p=11 *q=2
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2:
Addresses: &x=6294816 &y=6294820 offsetxy=4
Addresses: &z=6294824 &w=6294828 offsetzw=4
x=1 y=11 *p=11 *q=11
GCC-4.8-O2-NO-STRICT-ALIASING:
Addresses: &x=6294792 &y=6294796 offsetxy=4
Addresses: &z=6294800 &w=6294804 offsetzw=4
x=1 y=2 *p=11 *q=2
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING:
Addresses: &x=6294816 &y=6294820 offsetxy=4
Addresses: &z=6294824 &w=6294828 offsetzw=4
x=1 y=11 *p=11 *q=11
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_offset_from_subtraction_2_global.c:10:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ uy-ux;
stack: x=1 y=2 *p=11 *q=2
main GCC -4.9-O2: . . . as above
pointer offset from subtraction 2 global.c:10:[kern GCC -5.3-O2:
el] warning: signed overflow. assert uy-ux Addresses: &x=140737488349660 &y=140737488349656
9223372036854775807; offsetxy=-4
stack: Addresses: &z=140737488349652
main &w=140737488349648 offsetzw=-4
[value] Stopping at nth alarm x=1 y=11 *p=11 *q=11
[value] user error: GCC -4.8-O2- NO - STRICT- ALIASING :
Degeneration occurred: Addresses: &x=140737488349596 &y=140737488349592
results are offsetxy=-4
not correct for lines of code that can be reached from Addresses: &z=140737488349588
the degeneration point. &w=140737488349584 offsetzw=-4
KCC : x=1 y=2 *p=11 *q=2
Execution failed (configuration dumped) GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
Error: UB-CEA5 GCC -5.3-O2- NO - STRICT- ALIASING :
Description: Computing pointer difference between two Addresses: &x=140737488349612 &y=140737488349608
different objects. offsetxy=-4
Type: Undefined behavior. Addresses: &z=140737488349604
See also: &w=140737488349600 offsetzw=-4
C11 sec. 6.5.6:9, J.2:1 item 48 x=1 y=11 *p=11 *q=11
at CLANG 33-O0: . . . as above (modulo addresses)
main(pointer offset from subtraction 2 global.c:10) CLANG 34-O0: . . . as above
at CLANG 35-O0: . . . as above
<file-scope>(<unknown>) CLANG 36-O0: . . . as above
Error: UB-CEA5 CLANG 37-O0: . . . as above
Description: CLANG 33-O2: . . . as above (modulo addresses)
Computing pointer difference between two different CLANG 34-O2: . . . as above (modulo addresses)
objects. CLANG 35-O2: . . . as above
Type: Undefined behavior. CLANG 36-O2: . . . as above
See also: C11 sec. CLANG 37-O2: . . . as above
6.5.6:9, J.2:1 item 48 CLANG 33-O2- NO - STRICT- ALIASING : . . . as above (modulo ad-
at main(pointer offset from sub dresses)
traction 2 global.c:13) CLANG 34-O2- NO - STRICT- ALIASING : . . . as above (modulo ad-
at <file-scope>(<unknown>) dresses)
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main

ISO: unclear - undefined behaviour?

In this dataset none of the compilers appear to optimise based on reasoning about a lack of aliasing, though earlier experiments (with GCC 4.6.3-14 and 4.7.2-5) did.
An automatic storage-duration analogue:

EXAMPLE (pointer_offset_from_subtraction_2_auto.c):
GCC -4.8-O0: [value] Computing initial state
Addresses: &x=140737488349648 &y=140737488349652 [value] Initial
offsetxy=4 state computed
Addresses: &z=140737488349656 pointer offset from subtraction 2 auto.c:
&w=140737488349660 offsetzw=4 10:[kernel] warning: signed overflow. assert
x=1 y=11 *p=11 *q=11 -9223372036854775808 uy-ux;
GCC -4.9-O0: . . . as above stack:
GCC -4.8-O2: main
Addresses: &x=140737488349644 &y=140737488349640 pointer offset from subtraction 2 auto.c:10:[kernel
offsetxy=-4 ] warning: signed overflow. assert uy-ux
Addresses: &z=140737488349636 9223372036854775807;
&w=140737488349632 offsetzw=-4

  // are r and q now equivalent?
  *r = 11; // does this have defined behaviour?
  _Bool b = (r==q);
  printf("x=%i y=%i *r=%i (r==p)=%s\n",x,y,*r,
         b?"true":"false");
}

stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are
not correct for lines of code that can be reached from GCC -4.8-O0:

the degeneration point. x=1 y=11 *r=11 (r==p)=true


KCC : GCC -4.9-O0: . . . as above

Execution failed (configuration dumped) GCC -4.8-O2: . . . as above

Error: UB-CEA5 GCC -4.9-O2: . . . as above

Description: Computing pointer difference between two GCC -5.3-O2: . . . as above

different objects. GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above


Type: Undefined behavior. GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
See also: GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above

C11 sec. 6.5.6:9, J.2:1 item 48 CLANG 33-O0: . . . as above

at CLANG 34-O0: . . . as above

main(pointer offset from subtraction 2 auto.c:10) CLANG 35-O0: . . . as above

at CLANG 36-O0: . . . as above

<file-scope>(<unknown>) CLANG 37-O0: . . . as above

Error: UB-CEA5 CLANG 33-O2: . . . as above

Description: CLANG 34-O2: . . . as above

Computing pointer difference between two different CLANG 35-O2: . . . as above

objects. CLANG 36-O2: . . . as above

Type: Undefined behavior. CLANG 37-O2: . . . as above

See also: C11 sec. CLANG 33-O2- NO - STRICT- ALIASING : . . . as above

6.5.6:9, J.2:1 item 48 CLANG 34-O2- NO - STRICT- ALIASING : . . . as above

at main(pointer offset from sub CLANG 35-O2- NO - STRICT- ALIASING : . . . as above

traction 2 auto.c:13) CLANG 36-O2- NO - STRICT- ALIASING : . . . as above

at <file-scope>(<unknown>) CLANG 37-O2- NO - STRICT- ALIASING : . . . as above


CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_offset_xor_global.c:10:[value] warning: The following sub-expression cannot be evaluated:
i ^ j

2.3.4 Q11. Is the XOR linked list idiom supported?
U: ISO U: DEFACTO
ISO: unclear – no? DEFACTO-USAGE: unclear (not really used in practice?) DEFACTO-IMPL: unclear CERBERUS-DEFACTO: no CHERI: no TIS: no (fails at the pointer XOR) KCC: Execution failed (unclear why)
The classic XOR linked list algorithm (implementing a doubly linked list with only one pointer per node, by storing the XOR of two pointers) also makes essential use of multiple-provenance pointers. In this example we XOR the integer values from two pointers and XOR the result again with one of them.
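Separately from the test program, the full list idiom that it abstracts can be sketched as below. This is a minimal illustration, not code from any particular codebase: the names xnode, xor_link and xor_sum are hypothetical, and whether the pointers recovered from the XORed integers are usable is exactly what Q11 asks. The sketch relies on the mainstream de facto behaviour that a pointer survives a round trip through uintptr_t even when the integer was rebuilt by XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type: one link word instead of prev and next pointers. */
struct xnode {
  int value;
  uintptr_t link; /* (uintptr_t)prev ^ (uintptr_t)next */
};

/* Link an array of n nodes into an XOR list, with NULL beyond both ends. */
static void xor_link(struct xnode *nodes, size_t n) {
  for (size_t i = 0; i < n; i++) {
    struct xnode *prev = (i == 0) ? NULL : &nodes[i - 1];
    struct xnode *next = (i + 1 == n) ? NULL : &nodes[i + 1];
    nodes[i].link = (uintptr_t)prev ^ (uintptr_t)next;
  }
}

/* Walk the list from one end, summing the values. Each step recovers the
   neighbour by XORing the link word with the node we just came from. */
static int xor_sum(struct xnode *end) {
  int sum = 0;
  struct xnode *prev = NULL, *cur = end;
  while (cur != NULL) {
    sum += cur->value;
    struct xnode *next = (struct xnode *)(cur->link ^ (uintptr_t)prev);
    prev = cur;
    cur = next;
  }
  return sum;
}
```

The same xor_sum works from either end of the list, since the link word is symmetric in prev and next. Every pointer it dereferences after the first is built from two pointer-derived integers, which is why a single-provenance semantics has difficulty with it.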
EXAMPLE (pointer_offset_xor_global.c):
#include <stdio.h>
#include <inttypes.h>
int x=1;
int y=2;
int main() {
  int *p = &x;
  int *q = &y;
  uintptr_t i = (uintptr_t) p;
  uintptr_t j = (uintptr_t) q;
  uintptr_t k = i ^ j;
  uintptr_t l = k ^ i;
  int *r = (int *)l;

All sub-expressions with their values:
uintptr_t i {{ (uintptr_t)&x }}
uintptr_t j {{ (uintptr_t)&y }}
Stopping
stack: main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: unclear - undefined behaviour?
DEFACTO: unclear - not really used in practice? Could be defined behaviour in a multiple-provenance semantics

It is unclear whether this algorithm is important in modern practice. One respondent remarks that the XOR list implementation interacts badly with modern pipelines and the space saving is not a big win.
An automatic storage duration analogue:

EXAMPLE (pointer_offset_xor_auto.c):
GCC-4.8-O0:
x=1 y=11 *r=11 (r==p)=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_offset_xor_auto.c:9:[value] warning: The following sub-expression cannot be evaluated:
i ^ j
All sub-expressions with their values:
uintptr_t i {{ (uintptr_t)&x }}
uintptr_t j {{ (uintptr_t)&y }}
Stopping
stack: main
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

2.3.5 Q12. For arithmetic over provenanced integer values, is the provenance of the result invariant under plus/minus associativity?
U: ISO U: DEFACTO
ISO: unclear – we suggest yes? DEFACTO-USAGE: unclear - presume yes DEFACTO-IMPL: unclear - presume yes CERBERUS-DEFACTO: yes CHERI: yes for CHERI256; not always for CHERI128 TIS: no (first test ok; second test fails at the addition of pointers cast to uintptr_t) KCC: test not supported (Translation failed; unclear why)
Normal integer arithmetic or modular arithmetic satisfies various algebraic laws, e.g. a+(b−c) = (a+b)−c (which we call “plus/minus associativity”, in the absence of a standard name). Does that still hold for provenanced values? For C pointer arithmetic, addition of two pointers is a type error so there is no re-parenthesised variant of the §2.3.1 (p.23) examples with, e.g.
(&(x[0]) + &(y[1]))-&(y[0])
(in full: pointer_arith_algebraic_properties_1_global.c). But in semantics in which integer values also carry provenance data of some kind, we have the same question for analogous examples that do the arithmetic at uintptr_t type, e.g. asking whether the following two programs behave the same:

EXAMPLE (pointer_arith_algebraic_properties_2_global.c):
#include <stdio.h>
#include <inttypes.h>
int y[2], x[2];
int main() {
  int *p=(int*)(((uintptr_t)&(x[0])) +
    (((uintptr_t)&(y[1]))-((uintptr_t)&(y[0]))));
  *p = 11; // is this free of undefined behaviour?
  printf("x[1]=%d *p=%d\n",x[1],*p);
  return 0;
}

GCC-4.8-O0:
x[1]=11 *p=11
GCC -4.9-O0: . . . as above printf("x[1]=%d *p=%d\n",x[1],*p);
GCC -4.8-O2: . . . as above return 0;
}
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above GCC -4.8-O0:

GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above x[1]=11 *p=11


GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above GCC -4.9-O0: . . . as above
CLANG 33-O0: . . . as above GCC -4.8-O2: . . . as above
CLANG 34-O0: . . . as above GCC -4.9-O2: . . . as above

CLANG 35-O0: . . . as above GCC -5.3-O2: . . . as above

CLANG 36-O0: . . . as above GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above


CLANG 37-O0: . . . as above GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O2: . . . as above GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above

CLANG 34-O2: . . . as above CLANG 33-O0: . . . as above

CLANG 35-O2: . . . as above CLANG 34-O0: . . . as above

CLANG 36-O2: . . . as above CLANG 35-O0: . . . as above

CLANG 37-O2: . . . as above CLANG 36-O0: . . . as above

CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x[1]=11 *p=11
[value] done for function main
KCC:
Translation failed. Run kcc -d -o pointer_arith_algebraic_properties_2_global.c.kcc.out pointer_arith_algebraic_properties_2_global.c to see commands run.
sh: 1: pointer_arith_algebraic_properties_2_global.c.kcc.out: not found
DEFACTO: defined behaviour (x[1]=11 *p=11)
ISO: defined behaviour (x[1]=11 *p=11)

EXAMPLE (pointer_arith_algebraic_properties_3_global.c):
#include <stdio.h>
#include <inttypes.h>
int y[2], x[2];
int main() {
  int *p=(int*)(
    (((uintptr_t)&(x[0])) + ((uintptr_t)&(y[1])))
    -((uintptr_t)&(y[0])) );
  *p = 11; // is this free of undefined behaviour?
  //(equivalent to the &x[0]+(&(y[1])-&(y[0])) version?)

CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_arith_algebraic_properties_3_global.c:5:[value] warning: The following sub-expression cannot be evaluated:
(unsigned long)(x) + (unsigned long)(& y[1])
All sub-expressions with their values:
int * x {{ &x[0] }}
int * & y[1] {{ &y[1] }}
unsigned long (unsigned long)(x) {{ (unsigned long)&x }}
unsigned long (unsigned long)(& y[1]) {{ (unsigned long)&y[1] }}
int 1 {1}

Stopping
stack: main
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Translation failed. Run kcc -d -o pointer_arith_algebraic_properties_3_global.c.kcc.out pointer_arith_algebraic_properties_3_global.c to see commands run.
sh: 1: pointer_arith_algebraic_properties_3_global.c.kcc.out: not found
DEFACTO: unclear
ISO: unclear

Analogues with automatic storage duration: pointer_arith_algebraic_properties_2_auto.c and pointer_arith_algebraic_properties_3_auto.c.

GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
*p=11 *q=11
[value] done for function main

2.3.6 Multiple provenance semantics summarised

2.4 Pointer provenance via pointer representation copying
C permits the representation bytes of objects to be accessed, via unsigned char pointers, so whenever we introduce abstract values we have to consider the semantics of reading and writing of the associated representation bytes. In particular, we have to consider when manipulation of pointer value representations produces usable pointers, and with what attached provenance.
2.4.1 Q13. Can one make a usable copy of a pointer by copying its representation bytes using the library memcpy?

ISO: yes (not made explicit in ISO, but surely intended to be yes) DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: Execution failed (unclear why)

EXAMPLE (pointer_copy_memcpy.c):
#include <stdio.h>
#include <string.h>
int x=1;
int main() {
  int *p = &x;
  int *q;
  memcpy (&q, &p, sizeof p);
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (*p=11 *q=11)
ISO: defined behaviour (*p=11 *q=11)

This should be allowed in both de facto and ISO semantics.

2.4.2 Q14. Can one make a usable copy of a pointer by copying its representation bytes (unchanged) in user code?

U: ISO
ISO: not explicitly addressed in ISO – we suggest yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: not always TIS: yes KCC: Execution failed (unclear why)

EXAMPLE (pointer_copy_user_dataflow_direct_bytewise.c):
#include <stdio.h>
#include <string.h>
int x=1;
void user_memcpy(unsigned char* dest,
                 unsigned char *src, size_t n) {

  while (n > 0) {
    *dest = *src;
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;
  int *q;
  user_memcpy((unsigned char*)&q, (unsigned char*)&p,
              sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above

This should also certainly be allowed in the de facto semantics. People do reimplement memcpy, and we believe this works on most compilers and hardware.
The exceptions we are aware of are capability machines such as CHERI or IBM System/38 and descendants. In CHERI you have to copy pointers at pointer types for it to work properly, but capability loads and stores can operate generically, because the capability registers have tag bits. There is also some new tagged memory support for Oracle SPARC, to find invalid pointers.
Real memcpy implementations can be more complex. The glibc memcpy [10] involves copying byte-by-byte, as above, and also word-by-word and, using virtual memory manipulation, page-by-page. Word-by-word copying is not permitted by the ISO standard, as it violates the effective type rules, but should be permitted by our de facto semantics. Virtual memory manipulation is outside our scope at present.
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
*p=11 *q=11
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (*p=11 *q=11)
ISO: defined behaviour (*p=11 *q=11)

2.4.3 Q15. Can one make a usable copy of a pointer by copying its representation bytes by user code that indirectly computes the identity function on those bytes?

U: ISO D: ISO-VS-DEFACTO
ISO: unclear DEFACTO-USAGE: yes DEFACTO-IMPL: yes (presumably...) CERBERUS-DEFACTO: yes CHERI: no TIS: no (fails at the XOR of a pointer representation byte) KCC: Execution failed (unclear why)
[Question 5/15 of our What is C in practice? (Cerberus survey v2) [11] relates to this.]
For example, suppose one reads the bytes of a pointer representation pointing to some object, encrypts them, decrypts them, stores them as the representation of another pointer value, and tries to access the object. The following code is a simplified version of this, just using an XOR twice; one should imagine a more complex transform, with the transform and its inverse separated in the code and in time so that the compiler cannot analyse them.

[10] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memcpy.c;hb=HEAD
[11] www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

EXAMPLE (pointer_copy_user_dataflow_indirect_bytewise.c):
#include <stdio.h>
#include <string.h>
int x=1;
void user_memcpy2(unsigned char* dest,
                  unsigned char *src, size_t n) {
  while (n > 0) {
    *dest = ((*src) ^ 1) ^ 1;
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;

  int *q;
  user_memcpy2((unsigned char*)&q, (unsigned char*)&p,
               sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_copy_user_dataflow_indirect_bytewise.c:7:[value] warning: The following sub-expression cannot be evaluated:
(int)*src ^1
All sub-expressions with their values:
int (int)*src {{ garbled mix of &{x} (origin: Misaligned {pointer_copy_user_dataflow_indirect_bytewise.c:7}) }}
unsigned char *src {{ garbled mix of &{x} (origin: Misaligned {pointer_copy_user_dataflow_indirect_bytewise.c:7}) }}
unsigned char * src {{ (unsigned char *)&p }}
int 1 {1}
Stopping
stack: user_memcpy2 :: pointer_copy_user_dataflow_indirect_bytewise.c:16 <- main
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: unclear (*p=11 *q=11)
ISO: unclear (probably undefined behaviour?)

It is unclear whether this needs to be or can be allowed. Pages can and do get encrypted and compressed to disc, and a C semantics that dealt with virtual memory would have to support that, but it is not visible from normal C. One would not do this by tracking provenance via the disc, in any case, but instead more like our pointer IO semantics (§2.6, p.44): arbitrary (legal...) pointer values can be read in, and the point is that the compiler has to know that it does not know anything about them. People do sometimes do user-space paging, e.g. in user-space collection classes, but it is not mainstream.
In CHERI you cannot copy pointers in this way, and they haven’t yet found code that does this. (If you were copying int-by-int, it would be using the capability-aware instructions, so it would work.) This suggests that we could deem this undefined in the de facto standard, though they have not tried very much code yet.
As for the ISO standard semantics, DR260 is reasonably clear that the first of the three examples is allowed, writing “Note that using assignment or bitwise copying via memcpy or memmove of a determinate value makes the destination acquire the same determinate value.”. For the second and third, DR260 is ambiguous: one could read its special treatment of memcpy and memmove, coupled with its “[an implementation] may also treat pointers based on different origins as distinct even though they are bitwise identical” as implying that these have undefined behaviour. On the other hand, the standard’s 6.5p6 text on effective types suggests that at least user_memcpy (though perhaps not user_memcpy2) can

copy values of any effective type, including pointers: “[...] If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. [...]” (bold emphasis added).

2.4.4 Q16. Can one carry provenance through dataflow alone or also through control flow?

U: ISO U: DEFACTO
ISO: unclear DEFACTO-USAGE: unclear (not used in normal code?) DEFACTO-IMPL: unclear CERBERUS-DEFACTO: no CHERI: no TIS: no (fails at the switch on a pointer representation byte or bit access – intentionally so, given that this introduces nondeterminism) KCC: Execution failed (unclear why)

Our provenance examples so far have all only involved dataflow; we also have to ask if a usable pointer can be constructed via non-dataflow control-flow paths.
For example, consider a version of the previous indirect memcpy example (§2.4.3, p.36) with a control-flow choice on the value of the bytes:

EXAMPLE (pointer_copy_user_ctrlflow_bytewise.c):
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <limits.h>
int x=1;
unsigned char control_flow_copy(unsigned char c) {
  assert(UCHAR_MAX==255);
  switch (c) {
  case 0: return(0);
  case 1: return(1);
  case 2: return(2);
  ...
  case 255: return(255);
  }
}
void user_memcpy2(unsigned char* dest,
                  unsigned char *src, size_t n) {
  while (n > 0) {
    *dest = control_flow_copy(*src);
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;
  int *q;
  user_memcpy2((unsigned char*)&q, (unsigned char*)&p,
               sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_copy_user_ctrlflow_bytewise.c:266:1: warning: control may reach end of non-void function [-Wreturn-type]
}
^
1 warning generated.
*p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
pointer_copy_user_ctrlflow_bytewise.c:264:[kernel] warning: Body of function control_flow_copy falls-through. Adding a return statement
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_copy_user_ctrlflow_bytewise.c:8:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)0, (void *)((int)c));
stack: control_flow_copy :: pointer_copy_user_ctrlflow_bytewise.c:271 <- user_memcpy2 :: pointer_copy_user_ctrlflow_bytewise.c:280 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be

reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: undefined behaviour
ISO: unclear (probably undefined behaviour?)

Similarly, one can imagine copying a pointer via uintptr_t bit-by-bit via a control-flow choice for each bit (adapting provenance_basic_using_intptr_t_global_yx.c from §2.2.3 (p.16)):

EXAMPLE (pointer_copy_user_ctrlflow_bitwise.c):
#include <stdio.h>
#include <inttypes.h>
#include <limits.h>
int x=1;
int main() {
  int *p = &x;
  uintptr_t i = (uintptr_t)p;
  int uintptr_t_width = sizeof(uintptr_t) * CHAR_BIT;
  uintptr_t bit, j;
  int k;
  j=0;
  for (k=0; k<uintptr_t_width; k++) {
    bit = (i & (((uintptr_t)1) << k)) >> k;
    if (bit == 1)
      j = j | ((uintptr_t)1 << k);
    else
      j = j;
  }
  int *q = (int *)j;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning a variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning value of variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning a variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG34-O2: ... as above
CLANG35-O2:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning value of variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning a variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING:
pointer_copy_user_ctrlflow_bitwise.c:17:9: warning: explicitly assigning value of variable of type 'uintptr_t' (aka 'unsigned long') to itself [-Wself-assign]
j = j;
^
1 warning generated.
*p=11 *q=11
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main

[value] Computing initial state
[value] Initial state computed
pointer_copy_user_ctrlflow_bitwise.c:13:[value] warning: The following sub-expression cannot be evaluated:
i & ((unsigned long)1 << k)
All sub-expressions with their values:
unsigned long (unsigned long)1 {1}
unsigned long (unsigned long)1 << k {1}
uintptr_t i {{ (uintptr_t)&x }}
int k {0}
int 1 {1}
Stopping
stack: main
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: undefined behaviour
ISO: unclear (probably undefined behaviour?)

This is as opposed to a similar bitwise example with a dataflow path for each bit:

EXAMPLE (pointer_copy_user_dataflow_direct_bitwise.c):
#include <stdio.h>
#include <inttypes.h>
#include <limits.h>
int x=1;
int main() {
  int *p = &x;
  uintptr_t i = (uintptr_t)p;
  int uintptr_t_width = sizeof(uintptr_t) * CHAR_BIT;
  uintptr_t bit, j;
  int k;
  j=0;
  for (k=0; k<uintptr_t_width; k++) {
    bit = (i & (((uintptr_t)1) << k)) >> k;
    j = j | (bit << k);
  }
  int *q = (int *)j;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d *q=%d\n",*p,*q);
}

GCC-4.8-O0:
*p=11 *q=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_copy_user_dataflow_direct_bitwise.c:13:[value] warning: The following sub-expression cannot be evaluated:
i & ((unsigned long)1 << k)
All sub-expressions with their values:
unsigned long (unsigned long)1 {1}
unsigned long (unsigned long)1 << k {1}
uintptr_t i {{ (uintptr_t)&x }}
int k {0}
int 1 {1}
Stopping
stack: main
[value] user error: Degeneration occurred:

results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour
ISO: unclear (probably undefined behaviour?)

Finally, contrasting with the first two examples above, which recover all the concrete value information of the original pointer, we can consider a variant of the §2.1.1 (p.7) provenance_basic_using_intptr_t_global_yx.c example in which there is a control-flow choice based on partial information of the intended target pointer (here just whether q is null) and the concrete value information is obtained otherwise:

EXAMPLE (provenance_basic_mixed_global_offset+4.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         "\n",ux,uy);
  int *q = &y;
  if (q != NULL) {
    int *p = (int *)(ux + offset);
    if (memcmp(&p, &q, sizeof(p)) == 0) {
      *p = 11; // is this free of undefined behaviour?
      printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
    }
  }
}

GCC-4.8-O2:
Addresses: &x=6294440 &y=6294444
x=1 y=2 *p=11 *q=2
GCC-5.3-O2: ... as above (modulo addresses)
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: &x=
provenance_basic_mixed_global_offset+4.c:14:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_mixed_global_offset+4.c:14:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: undefined behaviour
ISO: unclear (probably undefined behaviour?)

EXAMPLE (provenance_basic_mixed_global_offset-4.c):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = -4;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         "\n",ux,uy);
  int *q = &y;
  if (q != NULL) {
    int *p = (int *)(ux + offset);
    if (memcmp(&p, &q, sizeof(p)) == 0) {
      *p = 11; // is this free of undefined behaviour?
      printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
    }
  }
}

GCC-4.8-O0:
Addresses: &x=6294516 &y=6294512
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-5.3-O2:
Addresses: &x=6294456 &y=6294460
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: &x=6294344 &y=6294340
x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above

CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: &x=
provenance_basic_mixed_global_offset-4.c:14:[kernel] warning: signed overflow. assert -9223372036854775808 ≤ ux+offset;
stack: main
provenance_basic_mixed_global_offset-4.c:14:[kernel] warning: signed overflow. assert ux+offset ≤ 9223372036854775807;
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

The test suite also includes the variant provenance_basic_mixed_global_offset-64.c and, with automatic storage duration: provenance_basic_mixed_auto_offset+4.c, provenance_basic_mixed_auto_offset-4.c, and provenance_basic_mixed_auto_offset-64.c.

2.5 Pointer provenance and union type punning
Type punning via unions, as discussed in §2.15.4 (p.80), gives an additional way of constructing pointer values, and so we have to consider how that interacts with the pointer provenance semantics.

2.5.1 Q17. Is type punning between integer and pointer values allowed?

U: ISO U: DEFACTO
ISO: unclear DEFACTO-USAGE: unclear – impl-def or yes? DEFACTO-IMPL: unclear – impl-def or yes? CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes
The following example (analogous to the roundtrip-via-uintptr_t example provenance_roundtrip_via_intptr_t.c of §2.2.1 (p.14)) constructs a pointer by casting a pointer to uintptr_t, storing that in a member of a union of that type, and then reading from a member of the union of pointer type.

EXAMPLE (provenance_union_punning_1_global.c):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
int x=1;
typedef union { uintptr_t ui; int *p; } un;
int main() {
  un u;
  int *px = &x;
  uintptr_t i = (uintptr_t)px;
  u.ui = i;
  int *p = u.p;
  printf("Addresses: p=%p &x=%p\n",(void*)p,(void*)&x);
  *p = 11; // is this free of undefined behaviour?
  printf("x=%d *p=%d\n",x,*p);
  return 0;
}

GCC-4.8-O0:
Addresses: p=0x600b48 &x=0x600b48
x=11 *p=11
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed

Addresses: p=
x=11 *p=11
[value] done for function main
KCC:
Addresses: p=[sym(3 @ static(provenance_union_punning_1_global.c07190f9a-6d2a-4cad-bff6-63d1398041f4)) + 0] &x=[sym(3 @ static(provenance_union_punning_1_global.c07190f9a-6d2a-4cad-bff6-63d1398041f4)) + 0]
x=11 *p=11
DEFACTO: implementation-defined
ISO: unclear

It is unclear whether this should be guaranteed to work. The ISO standard (see §2.15.4, p.80) says “the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type”, but says little about that reinterpretation. In GCC and Clang it appears to work: the above prints x=11 *p=11, suggesting that there the two types do have compatible representations, at least. What alias analysis might be assuming about this situation is unclear to us.
One systems researcher said that it is fairly common for implementations to satisfy this and for programmers to exploit it, though more hygienic C would include an explicit cast.

2.5.2 Q18. Does type punning between integer and pointer values preserve provenance?

U: ISO
ISO: unclear DEFACTO-USAGE: presume yes DEFACTO-IMPL: presume yes CERBERUS-DEFACTO: yes CHERI: yes TIS: example not supported (memcmp of pointer representations) KCC: Execution failed (unclear why)
For consistency with the rest of the provenance-tracking semantics, we imagine that at least the following example (analogous to the pathological provenance_basic_global_yx.c of §2.1.1 (p.7) but indirected via type punning) should have undefined behaviour:

EXAMPLE (provenance_union_punning_2_global_yx.c):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
int y=2, x=1;
typedef union { uintptr_t ui; int *p; } un;
int main() {
  un u;
  int *px = &x;
  uintptr_t i = (uintptr_t)px;
  i = i + sizeof(int);
  u.ui = i;
  int *p = u.p;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
  return 0;
}

GCC-4.8-O2:
Addresses: p=0x600ba4 q=0x600ba4
x=1 y=2 *p=11 *q=2
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_union_punning_2_global_yx.c:15:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_union_punning_2_global_yx.c:15 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: unclear
DEFACTO: undefined behaviour

EXAMPLE (provenance_union_punning_2_global_xy.c):
GCC-4.8-O0:
Addresses: p=0x600bec q=0x600bec
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-5.3-O2:
Addresses: p=0x600bb8 q=0x600bb0
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x600b40 q=0x600b40
x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above

CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: p=0x69d5c4 q=0x69d600
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_union_punning_2_global_xy.c:15:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_union_punning_2_global_xy.c:15 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

EXAMPLE (provenance_union_punning_2_auto_xy.c):
GCC-4.8-O0:
Addresses: p=0x600bec q=0x600bec
x=1 y=11 *p=11 *q=11
GCC-4.9-O0: ... as above
GCC-5.3-O2:
Addresses: p=0x600bb8 q=0x600bb0
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x600b40 q=0x600b40
x=1 y=11 *p=11 *q=11
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: p=0x69d5c4 q=0x69d600
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
provenance_union_punning_2_auto_xy.c:15:[kernel] warning: out of bounds read. assert \valid_read((char *)(&p)+(0 .. sizeof(p)-1));
stack: memcmp :: provenance_union_punning_2_auto_xy.c:15 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

A semantics that tracks provenance on integer values in memory will naturally do that.
Here GCC exhibits the otherwise-unsound optimisation, printing x=1 y=2 *p=11 *q=2.

2.6 Pointer provenance via IO

2.6.1 Q19. Can one make a usable pointer via IO?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes

CERBERUS-DEFACTO: yes CHERI: no TIS: test not supported (fopen library call) KCC: Execution failed (unclear why)

We now consider the extreme example of pointer provenance flowing via IO, if one writes the address of an object to a file and reads it back in. We give three versions: one using fprintf/fscanf and the %p format, one using fwrite/fread on the pointer representation bytes, and one converting the pointer to and from uintptr_t and using fprintf/fscanf on that value with the PRIuPTR/SCNuPTR formats. The first gives a syntactic indication of a potentially escaping pointer value, while the others (after preprocessing) do not.

EXAMPLE (provenance_via_io_percentp_global.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int x=1;
int main() {
  int *p = &x;
  FILE *f = fopen(
    "provenance_via_io_percentp_global.tmp","w+b");
  printf("Addresses: p=%p\n",(void*)p);
  // print pointer address to a file
  fprintf(f,"%p\n",(void*)p);
  rewind(f);
  void *rv;
  int n = fscanf(f,"%p\n",&rv);
  int *r = (int *)rv;
  if (n != 1) exit(EXIT_FAILURE);
  printf("Addresses: r=%p\n",(void*)r);
  // are r and p now equivalent?
  *r=12; // is this free of undefined behaviour?
  _Bool b1 = (r==p); // do they compare equal?
  _Bool b2 = (0==memcmp(&r,&p,sizeof(r)));//same reps?
  printf("x=%i *r=%i b1=%s b2=%s\n",x,*r,
         b1?"true":"false",b2?"true":"false");
}

GCC-4.8-O0:
Addresses: p=0x600e10
Addresses: r=0x600e10
x=12 *r=12 b1=true b2=true
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_via_io_percentp_global.c:8:[value] warning: Library function call. Stopping.
stack: fopen :: provenance_via_io_percentp_global.c:8 <- main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Addresses: p=[sym(3 @ static(provenance_via_io_percentp_global.ce93c921f-acd9-4331-9a5b-7c021c3b892a)) + 0]
Execution failed (configuration dumped)
ISO: defined behaviour

EXAMPLE (provenance_via_io_bytewise_global.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int x=1;
int main() {
  int *p = &x;
  FILE *f = fopen(
    "provenance_via_io_bytewise_global.tmp","w+b");
  printf("Addresses: p=%p\n",(void*)p);
  // output pointer address to a file
  int nw = fwrite(&p, 1, sizeof(int *), f);
  if (nw != sizeof(int *)) exit(EXIT_FAILURE);
  rewind(f);
  int *r;
  int nr = fread(&r, 1, sizeof(int *), f);
  if (nr != sizeof(int *)) exit(EXIT_FAILURE);
  printf("Addresses: r=%p\n",(void*)r);
  // are r and p now equivalent?
  *r=12; // is this free of undefined behaviour?
  _Bool b1 = (r==p); // do they compare equal?
  _Bool b2 = (0==memcmp(&r,&p,sizeof(r)));//same reps?

  printf("x=%i *r=%i b1=%s b2=%s\n",x,*r,
         b1?"true":"false",b2?"true":"false");
}

GCC-4.8-O0:
Addresses: p=0x600e18
Addresses: r=0x600e18
x=12 *r=12 b1=true b2=true
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above (modulo addresses)
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_via_io_bytewise_global.c:8:[value] warning: Library function call. Stopping.
stack: fopen :: provenance_via_io_bytewise_global.c:8 <- main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Addresses: p=[sym(3 @ static(provenance_via_io_bytewise_global.c88dd96f4-60b3-4258-8c36-9786d0aa1c10)) + 0]
Execution failed (configuration dumped)
ISO: defined behaviour

EXAMPLE (provenance_via_io_uintptr_t_global.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int x=1;
int main() {
  int *p = &x;
  uintptr_t i = (uintptr_t) p;
  FILE *f = fopen(
    "provenance_via_io_uintptr_t_global.tmp","w+b");
  printf("Addresses: i=%"PRIuPTR" \n",i);
  // print pointer address to a file
  fprintf(f,"%"PRIuPTR"\n",i);
  rewind(f);
  uintptr_t k;
  // read a pointer address from the file
  int n = fscanf(f,"%"SCNuPTR"\n",&k);
  if (n != 1) exit(EXIT_FAILURE);
  printf("Addresses: k=%"PRIuPTR"\n",k);
  int *r = (int *)k;
  // are r and p now equivalent?
  *r=12; // is this free of undefined behaviour?
  _Bool b1 = (r==p); // do they compare equal?
  _Bool b2 = (0==memcmp(&r,&p,sizeof(r)));//same reps?
  printf("x=%i *r=%i b1=%s b2=%s\n",x,*r,
         b1?"true":"false",b2?"true":"false");
}

GCC-4.8-O0:
Addresses: i=6295072
Addresses: k=6295072
x=12 *r=12 b1=true b2=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above

CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
provenance_via_io_uintptr_t_global.c:9:[value] warning: Library function call. Stopping.
stack: fopen :: provenance_via_io_uintptr_t_global.c:9 <- main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
ISO: defined behaviour

This is used in practice: in graphics code for marshalling/unmarshalling, at least using %p, and SCNuPTR and suchlike are used in xlib. Debuggers do this kind of thing too.

In the ISO standard, the text for fprintf and scanf for %p says that this should work: “If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined.” (modulo the usual remarks about “compare equal”), and the text for uintptr_t and the presence of SCNuPTR in inttypes.h imply the same there.

2.7 Q20. Can one make a usable pointer from a concrete address (of device memory)?

U: ISO
ISO: unclear DEFACTO-USAGE: yes (at least in embedded) DEFACTO-IMPL: yes (at least in embedded) CERBERUS-DEFACTO: yes (for implementation-defined device-memory addresses) CHERI: no TIS: test not informative (but correctly detects UB for the out-of-bounds write) KCC: Segmentation fault

C programs should normally not form pointers from particular concrete addresses. For example, the following should normally be considered to have undefined behaviour, as address 0xABC might not be mapped or, if it is, might alias with other data used by the runtime. By the ISO standard it does have undefined behaviour. Cyclone did not aim to support it (this example is adapted from [19, Ch. 2]). Note that our experimental data is (as usual) for execution in a user-space process in a system with virtual memory, for which that address is presumably not mapped to anything sensible, so one would not expect it to work; the results just illustrate how and where the failure is detected.

EXAMPLE (pointer_from_concrete_address_1.c):

int main() {
  // on systems where 0xABC is not a legal non-stack/heap
  // address, does this have undefined behaviour?
  *((int *)0xABC) = 123;
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
ASAN:SIGSEGV
=================================================================
==2779==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000abc (pc 0x00000047fb82 bp 0x7fffffffea10 sp 0x7fffffffea00 T0)
#0 0x47fb81 (pointer_from_concrete_address_1.c.clang37-ASAN.out+0x47fb81)
#1 0x40b88e (pointer_from_concrete_address_1.c.clang37-ASAN.out+0x40b88e)
#2 0x8006b9fff (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV

(pointer_from_concrete_address_1.c.clang37-ASAN.out+0x47fb81)
==2779==ABORTING
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_from_concrete_address_1.c:4:[kernel] warning: out of bounds write. assert \valid((int *)0xABC);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Segmentation fault
ISO: undefined behaviour
DEFACTO: implementation-defined whether undefined-behaviour or not

But in some circumstances it is idiomatic to use concrete addresses in C to access memory-mapped devices. For example, ARM documentation¹² states “In most ARM embedded systems, peripherals are located at specific addresses in memory. It is often convenient to map a C variable onto each register of a memory-mapped peripheral, and then read/write the register via a pointer. [...] The simplest way to implement memory-mapped variables is to use pointers to fixed addresses. If the memory is changeable by ‘external factors’ (for example, by some hardware), it must be labelled as volatile.” with an example similar to the following.

¹² Placing C variables at specific addresses to access memory-mapped peripherals, ARM Technical Support Knowledge Articles, http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka3750.html

EXAMPLE (pointer_from_concrete_address_2.c):

#define PORTBASE 0x40000000
unsigned int volatile * const port =
  (unsigned int *) PORTBASE;
int main() {
  unsigned int value = 0;
  // on systems where PORTBASE is a legal non-stack/heap
  // address, does this have defined behaviour?
  *port = value; /* write to port */
  value = *port; /* read from port */
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
ASAN:SIGSEGV
=================================================================
==3088==ERROR: AddressSanitizer: SEGV on unknown address 0x000040000000 (pc 0x00000047fbaf bp 0x7fffffffea10 sp 0x7fffffffe9e0 T0)
#0 0x47fbae (pointer_from_concrete_address_2.c.clang37-ASAN.out+0x47fbae)
#1 0x40b88e (pointer_from_concrete_address_2.c.clang37-ASAN.out+0x40b88e)
#2 0x8006b9fff (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (pointer_from_concrete_address_2.c.clang37-ASAN.out+0x47fbae)
==3088==ABORTING
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_from_concrete_address_2.c:8:[kernel] warning: out of bounds write. assert \valid(port);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:

results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Segmentation fault
ISO: undefined behaviour
DEFACTO: implementation-defined whether undefined-behaviour or not

2.8 Pointer provenance for other allocators

ISO C has a distinguished malloc, but operating system kernels have multiple allocators, e.g. the FreeBSD and Linux per-CPU allocators mentioned earlier. GCC has a function attribute __attribute__((malloc)) documented with: “This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. Using this attribute can improve optimization. Functions like malloc and calloc have this property because they return a pointer to uninitialized or zeroed-out storage. However, functions like realloc do not have this property, as they can return a pointer to storage containing pointers.” (https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html).

Ideally a de facto semantics would be able to treat all malloc-like functions uniformly; we do not currently support this. Do compilers special-case malloc in any way beyond what that text says?

2.9 Stability of pointer values

2.9.1 Q21. Are pointer values stable?

ISO: yes (modulo GCC debate) DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: Execution failed (unclear why)

We assume, in both de facto and ISO standard semantics, that pointer values are stable over time, as are the results of comparisons of them (modulo nondeterministic choices as to whether their provenance is taken into account in those comparisons).

This follows our understanding of normal implementations and our reading of the ISO standard, which says (6.2.4p2): “[...] An object exists, has a constant address,33) and retains its last-stored value throughout its lifetime. [...]” where footnote 33 is: “The term “constant address” means that two pointers to the object constructed at possibly different times will compare equal. The address may be different during two different executions of the same program.” Note, though, that this is contrary to one interpretation of the standard in a response to the GCC bug report mentioned above. This stability assumption rules out C implementations using a moving garbage collector.

For example, we believe the following should be guaranteed to print true:

EXAMPLE (pointer_stability_1.c):

#include <stdio.h>
#include <inttypes.h>
int main() {
  int x=1;
  uintptr_t i = (uintptr_t) &x;
  uintptr_t j = (uintptr_t) &x;
  // is this guaranteed to be true?
  _Bool b = (i==j);
  printf("(i==j)=%s\n",b?"true":"false");
  return 0;
}

GCC-4.8-O0:
(i==j)=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(i==j)=true
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour ((i==j)=true)
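
The stability question asked by pointer_stability_1.c at integer type can also be asked at pointer type and at the level of representation bytes. The following is only a minimal sketch in that spirit: it is not the text of the test files named in this section, and all function names are hypothetical. ISO C (6.2.4p2, footnote 33) clearly speaks only to the pointer-type comparison; that the integer and representation checks also succeed is an assumption about mainstream implementations, not a claim about the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers; the names are illustrative only. */

/* Two conversions of the same address to uintptr_t, as in pointer_stability_1.c. */
int stable_at_integer_type(void) {
    int x = 1;
    uintptr_t i = (uintptr_t)&x;
    uintptr_t j = (uintptr_t)&x;
    return i == j;
}

/* Two pointers to the same object, constructed at different times,
   compared with == at pointer type. */
int stable_at_pointer_type(void) {
    int x = 1;
    int *p = &x;
    x = 2;              /* an intervening store does not move the object */
    int *q = &x;
    return p == q;
}

/* The same two pointers compared by their representation bytes. */
int stable_at_representation(void) {
    int x = 1;
    int *p = &x;
    int *q = &x;
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Checks all three notions; aborts if any of them fails. */
void check_pointer_stability(void) {
    assert(stable_at_integer_type());
    assert(stable_at_pointer_type());
    assert(stable_at_representation());
}
```

Calling check_pointer_stability() from main would be expected to return silently on conventional implementations; an implementation with a moving garbage collector, or one with padding bits in pointer representations, could in principle fail the integer or representation checks.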

ISO: defined behaviour ((i==j)=true) (though debated)

(pointer_stability_2.c and pointer_stability_3.c are similar but with the equality at pointer type and with a pointer representation equality, respectively.)

2.10 Pointer Equality Comparison (with == and !=)

There are several notions of pointer equality which would coincide in a completely concrete semantics but which in a provenance-aware semantics can differ:

(a) comparison with ==
(b) comparison of their representations, e.g. with memcmp
(c) accessing the same memory
(d) giving rise to equally defined or undefined behaviour
(e) equivalent as far as alias analysis is concerned

As we note elsewhere, the standard appears to use “compare equal” to imply that the pointers are equally usable, but that is not the case. Our first examples show cases where two pointers are memcmp-equal but ==-unequal, and where they are memcmp- or ==-equal but accessing them is not equally defined.

Jones [24] mentions some architectures, now more-or-less exotic, in which (b) may not hold.

We say that two pointer values are equivalent if they are interchangeable, satisfying all of (a–e). And we say that a pointer value is usable if accesses using it access the right memory and do not give rise to undefined behaviour.

2.10.1 Q22. Can one do == comparison between pointers to objects of non-compatible types?

U: DEFACTO D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: unclear – should be impl-def? DEFACTO-IMPL: unclear – should be impl-def? CERBERUS-DEFACTO: yes CHERI: under debate TIS: yes KCC: yes

[Question 6/15 of our What is C in practice? (Cerberus survey v2)¹³ relates to this.]

¹³ www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

As we noted in §2.1.3 (p.14), the ISO standard explicitly permits == comparison between pointers to different objects of compatible types. 6.5.9 Equality operators allows comparison between any two pointers if

• “both operands are pointers to qualified or unqualified versions of compatible types;”
• “one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or”
• “one operand is a pointer and the other is a null pointer constant.”

As we saw in §2.1.2 (p.10), pointer comparison with == should be nondeterministically allowed to be provenance-aware or not.

It is not clear whether the restriction to compatible types is needed for typical modern implementations. It is also not clear whether == comparison between pointers to non-compatible types is used in practice, and similarly below for relational comparison with < etc.

For the following, GCC and Clang both give warnings; GCC says that this comparison without a cast is enabled by default, perhaps suggesting that it is used in the de facto standard corpus of code and hence that our de facto standard semantics should allow it.

EXAMPLE (pointer_comparison_eq_1_global.c):

#include <stdio.h>
#include <string.h>
int x=1;
float f=1.0;
int main() {
  int *p = &x;
  float *q = &f;
  _Bool b = (p == q); // free of undefined behaviour?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

GCC-4.8-O0:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O0:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.8-O2:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O2:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast

  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_1_global.c: In function ’main’:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_comparison_eq_1_global.c:8:16: warning: comparison of distinct pointer types (’int *’ and ’float *’) [-Wcompare-distinct-pointer-types]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
1 warning generated.
(p==q) = false
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(p==q) = false
[value] done for function main
KCC:
(p==q) = false
DEFACTO: implementation-defined
ISO: undefined behaviour

EXAMPLE (pointer_comparison_eq_1_auto.c):

GCC-4.8-O0:
pointer_comparison_eq_1_auto.c: In function ’main’:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O0:
pointer_comparison_eq_1_auto.c: In function ’main’:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.8-O2:
pointer_comparison_eq_1_auto.c: In function ’main’:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O2:
pointer_comparison_eq_1_auto.c: In function ’main’:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_1_auto.c: In function ’main’:

pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_1_auto.c: In function ’main’:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_comparison_eq_1_auto.c:8:16: warning: comparison of distinct pointer types (’int *’ and ’float *’) [-Wcompare-distinct-pointer-types]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
1 warning generated.
(p==q) = false
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(p==q) = false
[value] done for function main
KCC:
(p==q) = false
DEFACTO: implementation-defined
ISO: undefined behaviour

Compilers might conceivably optimise such comparisons (between pointers of non-compatible type) to false, but the following example shows that (at least in this case) GCC does not:

EXAMPLE (pointer_comparison_eq_2_global.c):

#include <stdio.h>
#include <string.h>
int x=1;
float f=1.0;
int main() {
  int *p = (int *)&f;
  float *q = &f;
  _Bool b = (p == q); // free of undefined behaviour?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

GCC-4.8-O0:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.9-O0:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.8-O2:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.9-O2:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^

(p==q) = true
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_2_global.c: In function ’main’:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_comparison_eq_2_global.c:8:16: warning: comparison of distinct pointer types (’int *’ and ’float *’) [-Wcompare-distinct-pointer-types]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
1 warning generated.
(p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(p==q) = true
[value] done for function main
KCC:
(p==q) = true
DEFACTO: implementation-defined
ISO: undefined behaviour

EXAMPLE (pointer_comparison_eq_2_auto.c):

GCC-4.8-O0:
pointer_comparison_eq_2_auto.c: In function ’main’:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.9-O0:
pointer_comparison_eq_2_auto.c: In function ’main’:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.8-O2:
pointer_comparison_eq_2_auto.c: In function ’main’:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-4.9-O2:
pointer_comparison_eq_2_auto.c: In function ’main’:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_2_auto.c: In function ’main’:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast [enabled by
default]
  _Bool b = (p == q); // free of undefined

behaviour?
               ^
(p==q) = true
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_2_auto.c: In function 'main':
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p == q); // free of undefined behaviour?
               ^
(p==q) = true
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_comparison_eq_2_auto.c:8:16: warning: comparison of distinct pointer types ('int *' and 'float *') [-Wcompare-distinct-pointer-types]
  _Bool b = (p == q); // free of undefined behaviour?
               ^
1 warning generated.
(p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(p==q) = true
[value] done for function main
KCC:
(p==q) = true
DEFACTO: implementation-defined
ISO: undefined behaviour

2.10.2 Q23. Can one do == comparison between pointers (to objects of compatible types) with different provenances that are not strictly within their original allocations?

ISO: yes DEFACTO-USAGE: unclear how much this is used DEFACTO-IMPL: yes (modulo §2.1.3 discussion) CERBERUS-DEFACTO: yes CHERI: ? TIS: fails with pointer_comparable, as expected KCC: yes

EXAMPLE (klw-itp14-2.c):
#include <stdio.h>
int x=1, y=2;
int main() {
  int *p = &x + 1;
  int *q = &y;
  _Bool b = (p == q); // free of undefined behaviour?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}
GCC-4.8-O0:
(p==q) = true
GCC-4.9-O0: ... as above
GCC-4.8-O2:
(p==q) = false
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
(p==q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
(p==q) = false
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial

state computed
klw-itp14-2.c:6:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
(p==q) = false

This example is from Krebbers et al. [32], as we discuss in §6.7. Their model forbids this, while our candidate de facto model should allow arbitrary pointer comparison.

2.10.3 Q24. Can one do == comparison of a pointer and (void*)-1?

U: ISO
ISO: unclear DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: ? TIS: fails with pointer_comparable (but needed for sqlite?) KCC: yes

EXAMPLE (besson_blazy_wilke_6.2.c):
#include <stdlib.h>
int main() {
  void *p = malloc(sizeof(int));
  _Bool b = (p == (void*)-1); // defined behaviour?
}
GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
besson_blazy_wilke_6.2.c:3:[value] allocating variable malloc main l3
besson_blazy_wilke_6.2.c:4:[kernel] warning: pointer comparison: assert \pointer_comparable(p, (void *)((int)(-1)));
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
ISO: unclear

This is from Besson et al. [9], as we discuss in §6.8. Their §6.2 notes that system calls such as mmap return -1 on error, and so one must be able to compare pointers against -1. Our test uses malloc as the source of the pointer, just to avoid dependence on sys/mman.h, even though malloc should not return -1. Their model permits the mmap analogue of this, apparently by building in the fact that mmap should return aligned values.
John Regehr observes that sqlite also compares against -2 and other error codes.
In a semantics in which == might respect provenance, both -1 values should be constructed in a provenance-free fashion, otherwise such a comparison might mistakenly give false.

2.11 Pointer Relational Comparison (with <, >, <=, or >=)

Here the ISO standard seems to be significantly more restrictive than common practice. First, there is a type constraint, as for ==: 6.5.8p2 “both operands are pointers to qualified or unqualified versions of compatible object types.”.
Then 6.5.8p5 allows comparison of pointers only to the same object (or one-past) or to members of the same array, structure, or union: 6.5.8p5 “When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger

subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.”
(Similarly to 6.5.6p7 for pointer arithmetic, 6.5.8p4 treats all non-array element objects as arrays of size one for this: 6.5.8p4 “For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.”)
This rules out the following comparisons, between pointers to two separately allocated objects and between a pointer to a structure member and one to a sub-member of another member, but some of these seem to be relied upon in practice.

2.11.1 Q25. Can one do relational comparison (with <, >, <=, or >=) of two pointers to separately allocated objects (of compatible object types)?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: impl-def or yes? DEFACTO-IMPL: impl-def or yes? CERBERUS-DEFACTO: yes CHERI: yes TIS: no (fails with pointer_comparable, intentionally) KCC: no (flags UB)
[Question 7/15 of our What is C in practice? (Cerberus survey v2)14 relates to this.]
14 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

EXAMPLE (pointer_comparison_rel_1_global.c):
#include <stdio.h>
int y = 2, x=1;
int main() {
  int *p = &x, *q = &y;
  _Bool b1 = (p < q); // defined behaviour?
  _Bool b2 = (p > q); // defined behaviour?
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  printf("(p<q) = %s (p>q) = %s\n",
         b1?"true":"false", b2?"true":"false");
}
GCC-4.8-O0:
Addresses: p=0x600b84 q=0x600b80
(p<q) = false (p>q) = true
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2:
Addresses: p=0x600b38 q=0x600b3c
(p<q) = true (p>q) = false
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x600b20 q=0x600b1c
(p<q) = false (p>q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_comparison_rel_1_global.c:5:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Addresses: p=[sym(4 @ static(pointer_comparison_rel_1_global.c6eb90942-c1aa-425c-a801-8c68858d1fa3)) + 0] q=[sym(3 @ static(pointer_comparison_rel_1_global.c6eb90942-c1aa-425c-a801-8c68858d1fa3)) + 0]
(p<q) = false (p>q) = false
Error: UB-CERL1
Description: Cannot compare pointers with different base objects using '<'.
Type: Undefined behavior.
See also: C11 sec. 6.5.8:5, J.2:1 item 53
at main(pointer_comparison_rel_1_global.c:5)

at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(pointer_comparison_rel_1_global.c:5)
at <file-scope>(<unknown>)
Error: UB-CERL3
Description: Cannot compare pointers with different base objects using '>'.
Type: Undefined behavior.
See also: C11 sec. 6.5.8:5, J.2:1 item 53
at main(pointer_comparison_rel_1_global.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(pointer_comparison_rel_1_global.c:6)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: undefined behaviour

And with automatic storage duration:

EXAMPLE (pointer_comparison_rel_1_auto.c):
GCC-4.8-O0:
Addresses: p=0x7fffffffea04 q=0x7fffffffea08
(p<q) = true (p>q) = false
GCC-4.9-O0: ... as above
GCC-4.8-O2:
Addresses: p=0x7fffffffea0c q=0x7fffffffea08
(p<q) = false (p>q) = true
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
Addresses: p=0x7fffffffea18 q=0x7fffffffea1c
(p<q) = true (p>q) = false
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN:
Addresses: p=0x7fffffffe950 q=0x7fffffffe940
(p<q) = false (p>q) = true
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_comparison_rel_1_auto.c:5:[kernel] warning: pointer comparison: assert \pointer_comparable((void *)p, (void *)q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CERL1
Description: Cannot compare pointers with different base objects using '<'.
Type: Undefined behavior.
See also: C11 sec. 6.5.8:5, J.2:1 item 53
at main(pointer_comparison_rel_1_auto.c:5)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9,

6.8, J.2:1 item 11
at main(pointer_comparison_rel_1_auto.c:5)
at <file-scope>(<unknown>)
Error: UB-CERL3
Description: Cannot compare pointers with different base objects using '>'.
Type: Undefined behavior.
See also: C11 sec. 6.5.8:5, J.2:1 item 53
at main(pointer_comparison_rel_1_auto.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(pointer_comparison_rel_1_auto.c:6)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: undefined behaviour

In practice, comparison of pointers to different objects seems to be used heavily, e.g. in memory allocators and for a lock order in Linux, and we believe the de facto semantics should allow it, leaving aside segmented architectures. Though one respondent reported for pointer_comparison_rel_1_global.c: “May produce inconsistent results in practice if p and q straddle the exact middle of the address space. We’ve run into practical problems with this. Cast to intptr_t first in the rare case you really need it.”.

2.11.2 Q26. Can one do relational comparison (with <, >, <=, or >=) of a pointer to a structure member and one to a sub-member of another member, of compatible object types?

U: ISO D: ISO-VS-DEFACTO
ISO: unclear - no? (subject to interpretation) DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: Execution failed (unclear why)

EXAMPLE (pointer_comparison_rel_substruct.c):
#include <stdio.h>
typedef struct { int i1; float f1; } st1;
typedef struct { int i2; st1 s2; } st2;
int main() {
  st2 s = {.i2=2, .s2={.i1=1, .f1=1.0 } };
  int *p = &(s.i2), *q = &(s.s2.i1);
  _Bool b = (p < q); // does this have defined behaviour?
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  printf("(p<q) = %s\n", b?"true":"false");
}
GCC-4.8-O0:
Addresses: p=0x7fffffffea00 q=0x7fffffffea04
(p<q) = true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
(p<q) = true
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (true)
ISO: undefined behaviour?

Whether this is allowed in the ISO standard depends on one’s interpretation of 6.5.8p5 “If the objects pointed to are members of the same aggregate object”. A literal reading suggests that it is not, as the object pointed to by q is not a member of the struct, but merely a part of a member of it.

2.11.3 Q27. Can one do relational comparison (with <, >, <=, or >=) of pointers to two members of a structure that have incompatible types?

U: DEFACTO D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: unclear - should be impl-def? DEFACTO-IMPL: unclear - should be impl-def? CERBERUS-DEFACTO: yes CHERI: under debate TIS: yes KCC: Execution failed (unclear why)

The ISO standard constraint also rules out comparison of pointers to two members of a structure with different types:

EXAMPLE (pointer_comparison_rel_different_type_members.c):
#include <stdio.h>
typedef struct { int i; float f; } st;
int main() {
  st s = {.i=1, .f=1.0 };
  int *p = &(s.i);
  float *q = &(s.f);
  _Bool b = (p < q); // does this have defined behaviour?
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  printf("(p<q) = %s\n", b?"true":"false");
}
GCC-4.8-O0:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9e0 q=0x7fffffffe9e4
(p<q) = true
GCC-4.9-O0:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9e0 q=0x7fffffffe9e4
(p<q) = true
GCC-4.8-O2:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9f0 q=0x7fffffffe9f4
(p<q) = true
GCC-4.9-O2:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9f0 q=0x7fffffffe9f4
(p<q) = true
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9d0 q=0x7fffffffe9d4
(p<q) = true
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_rel_different_type_members.c: In function 'main':
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types lacks a cast
  _Bool b = (p < q); // does this have defined behaviour?
               ^
Addresses: p=0x7fffffffe9d0 q=0x7fffffffe9d4
(p<q) = true
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
pointer_comparison_rel_different_type_members.c:7:16: warning: comparison of distinct pointer types ('int *' and 'float *') [-Wcompare-distinct-pointer-types]
  _Bool b = (p < q); // does this have defined behaviour?
               ^
1 warning generated.
Addresses: p=0x7fffffffe9f8 q=0x7fffffffe9fc
(p<q) = true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above

CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p=
(p<q) = true
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: implementation-defined
ISO: undefined behaviour

As for == comparison (pointer_comparison_eq_1_global.c, §2.10.1, p.50), this is presumably to let implementations use different representations for pointers to different types. In practice GCC gives the same warning, comparison of distinct pointer types lacks a cast [enabled by default], which weakly implies that this is used in practice and that our de facto semantics should allow it.

2.12 Null pointers

2.12.1 Q28. Can one make a null pointer by casting from a non-constant integer expression?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: yes (modulo segmented or multiple-address-space architectures) CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes
[Question 12/15 of our What is C in practice? (Cerberus survey v2)15 relates to this.]
15 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

The standard permits the construction of null pointers by casting from integer constant zero expressions, but not from other integer values that happen to be zero (6.3.2.3p3): “An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.66) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function. 66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.”

EXAMPLE (null_pointer_1.c):
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
int y=0;
int main() {
  assert(sizeof(long)==sizeof(int*));
  long x=0;
  int *p = (int *)x;
  // is the value of p a null pointer?
  _Bool b1 = (p == NULL);// guaranteed to be true?
  _Bool b2 = (p == &y); // guaranteed to be false?
  printf("(p==NULL)=%s (p==&y)=%s\n", b1?"true":"false",
         b2?"true":"false");
}
GCC-4.8-O0:
(p==NULL)=true (p==&y)=false
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(p==NULL)=true (p==&y)=false

[value] done for function main
KCC:
(p==NULL)=true (p==&y)=false
DEFACTO: implementation-defined (typically true/false)
ISO: defined behaviour (nondeterministic results)?

The situation in practice is not completely clear. The CHERI ASPLOS paper observes that “this distinction is difficult to support in modern compilers” and points to an LLVM mailing list thread16 that suggests that lots of code depends on being able to form null pointers from non-constant zero expressions. The comp.lang.c FAQ17 has an example claimed to show that in some cases the compiler will get it wrong if not given an explicit cast, but this is essentially just telling the compiler the right type. It would be useful to know of any current platforms in which the NULL pointer isn’t represented with a zero value (perhaps embedded systems?).
16 http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080288.html
17 http://c-faq.com/null/null2.html

2.12.2 Q29. Can one assume that all null pointers have the same representation?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: yes (modulo segmented or multiple-address-space architectures) CERBERUS-DEFACTO: iff the implementation-defined set of null pointer values is a singleton CHERI: yes? TIS: yes KCC: Execution failed (unclear why)

6.3.2.3p4 says this for == comparison: “Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null pointers shall compare equal.” but leaves open whether they have the same representation bytes.

EXAMPLE (null_pointer_2.c):
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>
int y=0;
int main() {
  assert(sizeof(int*)==sizeof(char*));
  int *p = NULL;
  char *q = NULL;
  // are two null pointers guaranteed to have the
  // same representation?
  _Bool b = (memcmp(&p, &q, sizeof(p))==0);
  printf("p=%p q=%p\n",(void*)p,(void*)q);
  printf("%s\n",b?"equal":"unequal");
}
GCC-4.8-O0:
p=0x0 q=0x0
equal
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
p=
equal
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: implementation-defined (typically equal)
ISO: defined behaviour but nondeterministic results?
Should be an implementation-defined set of null-pointer representations

A de facto semantics could base this on the implementation-defined set of null-pointer values. Or, even more simply and consistent with the desire for calloc to initialise memory that will be used as pointer values to the representation of NULL, just fix on zero.

2.12.3 Q30. Can null pointers be assumed to have all-zero representation bytes?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: yes (modulo segmented or multiple-address-space architectures) CERBERUS-DEFACTO: iff the implementation-defined set of null pointer values contains just zero CHERI: yes TIS: yes KCC: Execution failed (unclear why)

[Question 13/15 of our What is C in practice? (Cerberus survey v2; see footnote 18) relates to this.]

EXAMPLE (null_pointer_3.c):

#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
int y=0;
int main() {
  int *p = NULL;
  int **q = (int **) calloc(1,sizeof(int*));
  // is this guaranteed to be true?
  _Bool b = (memcmp(&p, q, sizeof(p))==0);
  printf("%s\n",b?"zero":"nonzero");
}

GCC-4.8-O0:
zero
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
FRAMAC SHARE/libc/stdlib.c:101:[value] allocating variable malloc calloc l101
zero
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: implementation-defined (typically zero)
ISO: defined behaviour but nondeterministic results

2.13 Pointer Arithmetic

The ISO standard permits only very limited pointer arithmetic, restricting the formation of pointer values.

First, there is arithmetic within an array: 6.5.6 Additive operators (6.5.6p{8,9}) permits one to add a pointer and integer (or subtract an integer from a pointer) only within the start and one past the end of an array object, inclusive. 6.5.6p7 adds “For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.”. Subtraction of two pointers is permitted only if both are in a similar range (and only if the result is representable in the result type).

Second, 6.3.2.3p7 says that one can do pointer arithmetic on character-type pointers to access representation bytes: “[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.”.

2.13.1 Q31. Can one construct out-of-bounds (by more than one) pointer values by pointer arithmetic (without undefined behaviour)?

U: DEFACTO   D: ISO-VS-DEFACTO
ISO: no   DEFACTO-USAGE: yes sometimes   DEFACTO-IMPL: yes sometimes but not in general   CERBERUS-DEFACTO: yes   CHERI: yes in 256-bit CHERI, not always in 128-bit CHERI   TIS: yes for first test; correctly found a bug in mis-edited second test   KCC: no (flags UB at pointer arithmetic)

[Question 9/15 of our What is C in practice? (Cerberus survey v2; see footnote 19) relates to this.]

In practice it seems to be common to transiently construct out-of-bounds pointer values, e.g. with (px +11) -10 rather than px + (11-10), as below, and we are not aware of examples where this will go wrong in standard implementations, at least for small deltas. There are cases where pointer arithmetic subtraction can overflow (see footnote 20). There might conceivably be an issue on some platforms if the transient value is not aligned and only aligned values are representable at the particular pointer type, or if the hardware is doing bounds checking, but both of those seem exotic at present. There are also cases where pointer arithmetic might wrap at values less than the obvious word size, e.g. for “near” or “huge” pointers on 8086 [53, §2.4], but it is not clear if any of these are current. We give examples involving pointers to an integer array and to representation bytes, and with both addition and subtraction.

18 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html
19 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html
20 http://sourceforge.net/p/png-mng/mailman/png-mng-implement/?viewmonth=201511

EXAMPLE (cheri_03_ii.c):

#include <stdio.h>
int main() {
  int x[2];
  int *p = &x[0];
  //is this free of undefined behaviour?
  int *q = p + 11;
  q = q - 10;
  *q = 1;
  printf("x[1]=%i *q=%i\n",x[1],*q);
}

GCC-4.8-O0:
x[1]=1 *q=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x[1]=1 *q=1
[value] done for function main
KCC:
x[1]=1 *q=1
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(cheri_03_ii.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(cheri_03_ii.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(cheri_03_ii.c:7)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: undefined behaviour

EXAMPLE (cheri_03_ii_char.c):

#include <stdio.h>
int main() {
  unsigned char x;
  unsigned char *p = &x;
  //is this free of undefined behaviour?
  unsigned char *q = p + 11;
  q = q - 10;
  *q = 1;
  printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);
}

SOURCES MISMATCH
GCC-4.8-O0:
x=0x0 *p=0x0 *q=0xea
GCC-4.9-O0: ... as above
GCC-4.8-O2:
cheri_03_ii_char.c: In function ’main’:
cheri_03_ii_char.c:9:9: warning: ’x’ is used uninitialized in this function [-Wuninitialized]
printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);

^
x=0x0 *p=0x0 *q=0x1
GCC-4.9-O2:
cheri_03_ii_char.c: In function ’main’:
cheri_03_ii_char.c:9:3: warning: ’x’ is used uninitialized in this function [-Wuninitialized]
printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);
^
x=0x0 *p=0x0 *q=0x1
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
cheri_03_ii_char.c: In function ’main’:
cheri_03_ii_char.c:9:9: warning: ’x’ is used uninitialized in this function [-Wuninitialized]
printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);
^
x=0x0 *p=0x0 *q=0x1
GCC-4.9-O2-NO-STRICT-ALIASING:
cheri_03_ii_char.c: In function ’main’:
cheri_03_ii_char.c:9:3: warning: ’x’ is used uninitialized in this function [-Wuninitialized]
printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);
^
x=0x0 *p=0x0 *q=0x1
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
x=0x0 *p=0x0 *q=0x1
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
x=0x0 *p=0x0 *q=0x0
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
x=0x0 *p=0x0 *q=0x1
CLANG37-ASAN:
=================================================================
==7727==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffe961 at pc 0x00000047fc70 bp 0x7fffffffe910 sp 0x7fffffffe908
WRITE of size 1 at 0x7fffffffe961 thread T0
#0 0x47fc6f (cheri_03_ii_char.c.clang37-ASAN.out+0x47fc6f)
#1 0x40b88e (cheri_03_ii_char.c.clang37-ASAN.out+0x40b88e)
#2 0x8006b9fff (<unknown module>)
Address 0x7fffffffe961 is located in stack of thread T0 at offset 33 in frame
#0 0x47fb4f (cheri_03_ii_char.c.clang37-ASAN.out+0x47fb4f)
This frame has 1 object(s):
[32, 33) ’x’ <== Memory access at offset 33 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (cheri_03_ii_char.c.clang37-ASAN.out+0x47fc6f)
Shadow bytes around the buggy address:
0x4ffffffffcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffcf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x4ffffffffd20: 00 00 00 00 00 00 00 00 f1 f1 f1 f1[01]f3 f3 f3
0x4ffffffffd30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x4ffffffffd70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==7727==ABORTING
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
cheri_03_ii_char.c:8:[kernel] warning: out of bounds write. assert \valid(q);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(cheri_03_ii_char.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(cheri_03_ii_char.c:6)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(cheri_03_ii_char.c:7)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(cheri_03_ii_char.c:9)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(cheri_03_ii_char.c:9)
at main(cheri_03_ii_char.c:9)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour

ISO: undefined behaviour

This is the II (“invalid intermediate”) idiom of the CHERI ASPLOS paper; the second example also involves the Sub (“pointer subtraction”) idiom and perhaps the IA (“performing integer arithmetic on pointers”) idiom (it’s not clear exactly what that is). All are widely observed in practice.

2.13.2 Q32. Can one form pointer values by pointer addition that overflows (without undefined behaviour)?

D: ISO-VS-DEFACTO
ISO: no   DEFACTO-USAGE: yes sometimes   DEFACTO-IMPL: yes sometimes but not in general   CERBERUS-DEFACTO: yes?   CHERI: ? yes in 256-bit CHERI, not always in 128-bit CHERI   TIS: yes   KCC: no (flags UB at pointer arithmetic)

EXAMPLE (pointer_add_wrap_1.c):

#include <stdio.h>
int main() {
  unsigned char x;
  unsigned char *p = &x;
  unsigned long long h = ( 1ull << 63 );
  //are the following free of undefined behaviour?
  unsigned char *q1 = p + h;
  unsigned char *q2 = q1 + h;
  printf("Addresses: p =%p q1=%p\n",
         (void*)p,(void*)q1);
  printf("Addresses: q2=%p h =0x%llx\n",
         (void*)q2,h);
}

GCC-4.8-O0:
Addresses: p =0x7fffffffea0f q1=0x80007fffffffea0f
Addresses: q2=0x7fffffffea0f h =0x8000000000000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: p =
Addresses: q2=
[value] done for function main
KCC:
Execution failed (configuration dumped)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(pointer_add_wrap_1.c:7)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_1.c:7)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_1.c:8)
at <file-scope>(<unknown>)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an

object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(pointer_add_wrap_1.c:8)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_1.c:9)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at printf(pointer_add_wrap_1.c:9)
at main(pointer_add_wrap_1.c:9)
at <file-scope>(<unknown>)
ISO: undefined behaviour

Obviously this presumes that constructing an out-of-bounds (by more than one) pointer value by pointer arithmetic, as per §2.13.1 (p.62), is itself allowed.

2.13.3 Q33. Can one assume pointer addition wraps on overflow?

U: DEFACTO
ISO: no   DEFACTO-USAGE: ?   DEFACTO-IMPL: ?   CERBERUS-DEFACTO: ?   CHERI: ?   TIS: no (or, if so, tis is not assuming a 64-bit address space). Unclear?   KCC: no (flags UB at pointer arithmetic)

EXAMPLE (pointer_add_wrap_2.c):

#include <stdio.h>
int main() {
  unsigned char x;
  unsigned char *p = &x;
  unsigned long long h = ( 1ull << 63 );
  //are the following free of undefined behaviour?
  unsigned char *q1 = p + h;
  unsigned char *q2 = q1 + h;
  *q2 = 1;
  printf("Addresses: p =%p q1=%p\n",
         (void*)p,(void*)q1);
  printf("Addresses: q2=%p h =0x%llx\n",
         (void*)q2,h);
  printf("x=0x%x *p=0x%x *q2=0x%x\n",x,*p,*q2);
}

GCC-4.8-O0:
Addresses: p =0x7fffffffea0f q1=0x80007fffffffea0f
Addresses: q2=0x7fffffffea0f h =0x8000000000000000
x=0x1 *p=0x1 *q2=0x1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_add_wrap_2.c:9:[kernel] warning: out of bounds write. assert \valid(q2);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(pointer_add_wrap_2.c:7)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_2.c:7)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_2.c:8)
at <file-scope>(<unknown>)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(pointer_add_wrap_2.c:8)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_2.c:9)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at main(pointer_add_wrap_2.c:9)
at <file-scope>(<unknown>)
Error: UB-EIO2
Description: Trying to write outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at main(pointer_add_wrap_2.c:9)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(pointer_add_wrap_2.c:10)
at <file-scope>(<unknown>)
Error: UB-CEE3
Description: Found pointer that refers outside the bounds of an object + 1.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at printf(pointer_add_wrap_2.c:10)
at main(pointer_add_wrap_2.c:10)
at <file-scope>(<unknown>)
ISO: undefined behaviour

This presumes that the overflowing pointer addition of the previous question (§2.13.2) is itself allowed.

2.13.4 Q34. Can one move among the members of a struct using representation-pointer arithmetic and casts?

U: ISO   D: ISO-VS-DEFACTO
ISO: unclear – impl-def?   DEFACTO-USAGE: yes   DEFACTO-IMPL: yes   CERBERUS-DEFACTO: yes   CHERI: yes   TIS: yes   KCC: no ((mistakenly) detects UB: A pointer (or array subscript) outside the bounds of an object)

The standard is ambiguous on the interaction between the allowable pointer arithmetic (on unsigned char* representation pointers) and subobjects. For example, consider:

EXAMPLE (cast_struct_inter_member_1.c):

#include <stdio.h>
#include <stddef.h>
typedef struct { float f; int i; } st;
int main() {
  st s = {.f=1.0, .i=1};
  int *pi = &(s.i);
  unsigned char *pci = ((unsigned char *)pi);
  unsigned char *pcf = (pci - offsetof(st,i))
    + offsetof(st,f);
  float *pf = (float *)pcf;
  *pf = 2.0; // is this free of undefined behaviour?
  printf("s.f=%f *pf=%f s.i=%i\n",s.f,*pf,s.i);
}

GCC-4.8-O0:
s.f=2.000000 *pf=2.000000 s.i=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s.f=2.000000 *pf=2.000000 s.i=1
[value] done for function main
KCC:
s.f=2.0000000000000000E0 *pf=2.0000000000000000E0 s.i=1
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(cast_struct_inter_member_1.c:8)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at main(cast_struct_inter_member_1.c:11)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at main(cast_struct_inter_member_1.c:12)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: unclear

This forms an unsigned char* pointer to the second member (i) of a struct, does arithmetic on that using offsetof to form an unsigned char* pointer to the first member, casts that into a pointer to the type of the first member (f), and uses that to write.

In practice we believe that this is all supported by most compilers and it is used in practice, e.g. as in the Container idiom of the CHERI ASPLOS paper, where they discuss container macros that take a pointer to a structure member and compute a pointer to the structure as a whole. They see it heavily used by one of the example programs they studied. We are told that Intel’s MPX compiler does not support the container macro idiom, while Linux, FreeBSD, and Windows all rely on it.

The standard says (6.3.2.3p7): “...When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.”. This licenses the construction of the unsigned char* pointer pci to the start of the representation of s.i (presuming that a structure member is itself an “object”, which itself is ambiguous in the standard), but allows it to be used only to access the representation of s.i.

The offsetof definition in stddef.h, 7.19p3, “[...] offsetof(type, member-designator) which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). [...]”, implies that the calculation of pcf gets the correct numerical address, but does not say that it can be used, e.g. to access the representation of s.f. As we saw in the discussion of provenance, the mere fact that a pointer has the correct address does not necessarily mean that it can be used to access that memory without giving rise to undefined behaviour.

Finally, if one deems pcf to be a legitimate char* pointer to the representation of s.f, then the standard says that it can be converted to a pointer to any object type if sufficiently aligned, which for float* it will be. 6.3.2.3p7: “A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer....”. But whether that pointer has the right value and is usable to access memory is left unclear.

2.13.5 Q35. Can one move between subobjects of the members of a struct using pointer arithmetic?

U: ISO   D: ISO-VS-DEFACTO
ISO: unclear   DEFACTO-USAGE: yes   DEFACTO-IMPL: yes   CERBERUS-DEFACTO: yes   CHERI: ?   TIS: guess yes, but tis appears not to support %td format   KCC: no (detects UB at the pointer arithmetic)

EXAMPLE (struct_inter_submember_1.c):

#include <stdio.h>
#include <stddef.h>
struct S { int a[3]; int b[3]; } s;
int main() {
  s.b[2]=10;
  ptrdiff_t d;
  d = &(s.b[2]) - &(s.a[0]); // defined behaviour?
  int *p;
  p = &(s.a[0]) + d; // defined behaviour?
  *p = 11; // defined behaviour?
  printf("d=%td s.b[2]=%d *p=%d\n",d,s.b[2],*p);
}

GCC-4.8-O0:
d=5 s.b[2]=11 *p=11
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct_inter_submember_1.c:11:[value] warning: format undefined or not supported (yet)
struct_inter_submember_1.c:11:[value] warning: assert(match format and arguments)
stack: printf :: struct_inter_submember_1.c:11 <- main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at main(struct_inter_submember_1.c:9)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1

item 47
at main(struct_inter_submember_1.c:10)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at main(struct_inter_submember_1.c:11)
at <file-scope>(<unknown>)
ISO: unclear

This is inspired by an example from Krebbers [29], as discussed in §6.11.

2.13.6 Q36. Can one implement offsetof using the addresses of members of a NULL struct pointer?

U: ISO
ISO: unclear   DEFACTO-USAGE: yes   DEFACTO-IMPL: yes   CERBERUS-DEFACTO: yes   CHERI: ?   TIS: unclear (the print seems to stop at the %p)   KCC: no (flags a null-dereference UB)

EXAMPLE (ubc_addr_null_1.c):

#include <stddef.h>
#include <inttypes.h>
#include <stdio.h>
struct s { uint8_t a; uint8_t b; };
int main () {
  struct s *f = NULL;
  uint8_t *p = &(f->b); // free of undefined behaviour?
  // and equal to the offsetof result?
  printf("p=%p offsetof(struct s,b)=0x%zx\n",
         (void*)p,offsetof(struct s, b));
}

GCC-4.8-O0:
p=0x1 offsetof(struct s,b)=0x1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
p=0x1 offsetof(struct s,b)=0x1
ubc_addr_null_1.c:7:21: runtime error: member access within null pointer of type ’struct s’
CLANG37-ASAN:
p=0x1 offsetof(struct s,b)=0x1
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
p=
[value] done for function main
KCC:
Execution failed (configuration dumped)
Error: UB-CER3
Description: Dereferencing a null pointer.
Type: Undefined behavior.
See also: C11 sec. 6.5.3.2:4, J.2:1 item 43
at main(ubc_addr_null_1.c:7)
at <file-scope>(<unknown>)
ISO: unclear

This seems to be a common idiom in practice. The test is inspired by examples from Regehr’s UB Canaries, as discussed in §6.18.

If one views p->x as syntactic sugar for (*p).x (as stated by Jones [24, p.982], but, interestingly, not the ISO standard) then this is undefined behaviour when p is null. CompCert seems to do this, while GCC seems to keep the -> at least as far as GIMPLE.

2.14 Casts between pointer types

Standard: The standard (6.3.2.3p{1–4,7,8}) identifies various circumstances in which conversion between pointer types is legal, with some rather weak constraints on the results:

1 “A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.”

2 “For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type; the values stored in the original and converted pointers shall compare equal.”

7 “A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.”

8 “A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.”

Paragraphs 3 and 4 relate to null pointers, as discussed in §2.12 (p.60). Paragraphs 5 and 6 relate to casts between pointer and integer types, as discussed in §2.2 (p.14). Footnote 68 just says that “correctly aligned” should be transitive.

This raises several questions. First, this “compare equal” is probably supposed to mean that the pointers are (in our sense discussed in §2.10, p.50) equivalent: that they not only compare equal with == but also are equally usable to access (the same) memory and have equal representations. We imagine that this is pre-DR260 text, when these concepts arguably coincided.

Second, the standard only covers roundtrips of size two, via one other pointer type and back. This seems curiously irregular: there seems to be no reason not to give a roundtrip property for longer roundtrips via multiple pointer types, and both our ISO and de facto standard semantics should allow that.

Third, (7) gives undefined behaviour for a conversion between object types where the result value is not aligned for the new type, while (1) allows such a conversion via (void *), albeit with no guarantee on the result.

Fourth, it gives no guarantees for the usability of pointers constructed by a combination of casts and arithmetic, as discussed in §2.13.4 (p.68).

The Friendly C proposal (Point 4) by Cuoq et al., discussed in §6.17, has a link (footnote 21) which points to C committee discussion (footnote 22) in which they considered interconvertibility of object and function pointers. POSIX apparently requires it, for dlsym.

2.14.1 Q37. Are usable pointers to a struct and to its first member interconvertible?

ISO: yes   DEFACTO-USAGE: yes   DEFACTO-IMPL: yes   CERBERUS-DEFACTO: yes   CHERI: yes   TIS: yes   KCC: yes

A Linux kernel developer says that they rely on this, and also that they use offsetof to move between members. If offsetof is not available, it is faked up (with subtraction between address-of a member reference off the null pointer).

EXAMPLE (cast_struct_and_first_member_1.c):

#include <stdio.h>
typedef struct { int i; float f; } st;
int main() {
  st s = {.i = 1, .f = 1.0};
  int *pi = &(s.i);
  st* p = (st*) pi; // free of undefined behaviour?
  p->f = 2.0; // and this?
  printf("s.f=%f p->f=%f\n",s.f,p->f);
}

GCC-4.8-O0:
s.f=2.000000 p->f=2.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
Additionally, 6.7.2.1 Structure and union specifiers li- 21 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/
censes conversions (in both directions) between pointers to 2008/n2605.pdf
structures and their initial members, and between unions and 22 Defect Report 195 in http://www.open-std.org/jtc1/sc22/

their members. wg21/docs/cwg_defects.html

CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s.f=2.000000 p->f=2.000000
[value] done for function main
KCC:
s.f=2.0000000000000000E0 p->f=2.0000000000000000E0
DEFACTO: defined behaviour
ISO: defined behaviour

This is allowed in the standard: 6.7.2.1p15 “Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.” (bold emphasis added).

2.14.2 Q38. Are usable pointers to a union and to its current member interconvertible?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes

EXAMPLE (cast_union_and_member_1.c):
#include <stdio.h>
typedef union { int i; float f; } un;
int main() {
  un u = {.i = 1};
  int *pi = &(u.i);
  un* p = (un*) pi; // free of undefined behaviour?
  p->f = 2.0; // and this?
  printf("u.f=%f p->f=%f\n",u.f,p->f);
}

GCC-4.8-O0:
u.f=2.000000 p->f=2.000000
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
u.f=2.000000 p->f=2.000000
[value] done for function main
KCC:
u.f=2.0000000000000000E0 p->f=2.0000000000000000E0
DEFACTO: defined behaviour
ISO: defined behaviour

The standard says: 6.7.2.1p16 “The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.” (bold emphasis added).

This is likewise allowed in practice and in the standard.

2.15 Accesses to related structure and union types

If one only accesses structures via assignment and member projections, the standard treats structure types abstractly. Type declarations create new types:

• 6.7.2.1p8 “The presence of a struct-declaration-list in a struct-or-union-specifier declares a new type, within a translation unit. [...]”
• 6.7.2.3p5 “Two declarations of structure, union, or enumerated types which are in different scopes or use different tags declare distinct types. Each declaration of a structure, union, or enumerated type which does not include a tag declares a distinct type.”;

accessing a structure member requires the name of a member of the type:
• 6.5.2.3p1 “The first operand of the . operator shall have an atomic, qualified, or unqualified structure or union type, and the second operand shall name a member of that type.”
• 6.5.2.3p2 “The first operand of the -> operator shall have type “pointer to atomic, qualified, or unqualified structure” or “pointer to atomic, qualified, or unqualified union”, and the second operand shall name a member of the type pointed to.”;

and assignment requires the left and right-hand-side types to be compatible:

• 6.5.16.1p1b2 “the left operand has an atomic, qualified, or unqualified version of a structure or union type compatible with the type of the right;”
• 6.5.16.1p1b3 “the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;”,

where (6.2.7p1) for two structure types to be compatible they have to be either the same or (if declared in separate translation units) very similar: broadly, with the same ordering, names, and compatible types of members.

But the standard permits several ways to break this type abstraction: conversion between pointers to object types, reading from a union of structures sharing a common initial sequence, and type punning by writing and reading different union members.

Most simply, one can initialise a structure by initialising its individual members at their underlying types:

EXAMPLE (struct_initialise_members.c):
#include <stdio.h>
void f(char* cp, float*fp) {
  *cp='A';
  *fp=1.0;
}
typedef struct { char c; float f; } st;
int main() {
  st s1;
  f(&s1.c, &s1.f);
  st s2;
  s2 = s1;
  printf("s2.c=0x%x s2.f=%f\n",s2.c,s2.f);
}

GCC-4.8-O0:
s2.c=0x41 s2.f=1.000000
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct_initialise_members.c:12:[value] warning: argument (int)s2.c has type int but format indicates unsigned int
[value] warning: Continuing analysis because this seems innocuous
s2.c=0x41 s2.f=1.000000
[value] done for function main
KCC:
s2.c=0x41 s2.f=1.0000000000000000E0
Error: UB-STDIO1
Description: 'printf': Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(struct_initialise_members.c:12)
at main(struct_initialise_members.c:12)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: defined behaviour

This suggests that isomorphic structs could be interchangeable as memory objects, at least if one can cast from one pointer type to the other. This is reasonable in the de facto
semantics, but the standard’s effective types (discussed in §4, p.133) make it false in the standard.

Even in the de facto semantics, isomorphic struct types are not directly interchangeable. The following example gives a static type error in GCC and Clang, and is clearly forbidden in the standard (for the two struct types to be compatible they have to be almost identical).

EXAMPLE (use_struct_isomorphic.c):
#include <stdio.h>
typedef struct { int i1; float f1; } st1;
typedef struct { int i2; float f2; } st2;
int main() {
  st1 s1 = {.i1 = 1, .f1 = 1.0 };
  st2 s2;
  s2 = s1;
  printf("s2.i2=%i2 s2.f2=%f\n",s2.i2,s2.f2);
}

GCC-4.8-O0:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.8-O0.out: not found
GCC-4.9-O0:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.9-O0.out: not found
GCC-4.8-O2:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.8-O2.out: not found
GCC-4.9-O2:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.9-O2.out: not found
GCC-5.3-O2:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2 {aka struct <anonymous>}' from type 'st1 {aka struct <anonymous>}'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-5.3-O2.out: not found
GCC-4.8-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.8-O2-no-strict-aliasing.out: not found
GCC-4.9-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2' from type 'st1'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-4.9-O2-no-strict-aliasing.out: not found
GCC-5.3-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c: In function 'main':
use_struct_isomorphic.c:7:6: error: incompatible types when assigning to type 'st2 {aka struct <anonymous>}' from type 'st1 {aka struct <anonymous>}'
  s2 = s1;
     ^
use_struct_isomorphic.c.gcc-5.3-O2-no-strict-aliasing.out: not found
CLANG33-O0:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang33-O0.out: not found
CLANG34-O0:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang34-O0.out: not found
CLANG35-O0:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang35-O0.out: not found
CLANG36-O0:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang36-O0.out: not found
CLANG37-O0:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang37-O0.out: not found
CLANG33-O2:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang33-O2.out: not found
CLANG34-O2:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang34-O2.out: not found
CLANG35-O2:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang35-O2.out: not found
CLANG36-O2:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang36-O2.out: not found
CLANG37-O2:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang37-O2.out: not found
CLANG33-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang33-O2-no-strict-aliasing.out: not found
CLANG34-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang34-O2-no-strict-aliasing.out: not found
CLANG35-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang35-O2-no-strict-aliasing.out: not found
CLANG36-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang36-O2-no-strict-aliasing.out: not found
CLANG37-O2-NO-STRICT-ALIASING:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang37-O2-no-strict-aliasing.out: not found
CLANG37-UBSAN:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang37-UBSAN.out: not found
CLANG37-ASAN:
use_struct_isomorphic.c:7:6: error: assigning to 'st2' from incompatible type 'st1'
  s2 = s1;
     ^
1 error generated.
use_struct_isomorphic.c.clang37-ASAN.out: not found
TIS-INTERPRETER:
use_struct_isomorphic.c:7:[kernel] failure: castTo st1 -> struct anonstruct st2 2
[kernel] user error: stopping on file "use_struct_isomorphic.c" that has errors. Add '-kernel-msg-key pp' for preprocessing command.
[kernel] Frama-C aborted: invalid user input.
KCC:
File: use_struct_isomorphic.c
Line: 7
Error: CV-TEAS1
Description: Incompatible types in assignment or function call arguments.
Type: Constraint violation.
See also: C11 sec. 6.5.16.1:1
Translation failed. Run kcc -d -o use_struct_isomorphic.c.kcc.out use_struct_isomorphic.c to see commands run.
sh: 1: use_struct_isomorphic.c.kcc.out: not found
DEFACTO: type error
ISO: type error

Most generally, 6.3.2.3p7 says that “A pointer to an object type may be converted to a pointer to a different object type”, if “the resulting pointer is correctly aligned”, otherwise undefined behaviour results. (6.5.4 Cast operators does not add any type restrictions to this.)

There are two interesting cases here: conversion to a char * pointer and conversion to a related structure type. In the former, 6.3.2.3p7 (as discussed in §2.14, p.71) goes on to specify enough about the value of the resulting pointer to make it usable for accessing the representation bytes of the original object. In the latter, the standard says little about the resulting value, but it might be used to access related structures without going via a union type:

2.15.1 Q39. Given two different structure types sharing a prefix of members that have compatible types, can one cast a usable pointer to an object of the first to a pointer to the second, that can be used to read and write members of that prefix (with strict-aliasing disabled and without packing variation)?

D: ISO-VS-DEFACTO
ISO: n/a (ISO does not specify semantics with strict aliasing disabled, and effective types forbid this) DEFACTO-USAGE: yes DEFACTO-IMPL: yes (with -fno-effective-types, at least) CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes (contrary to ISO effective types)

[Question 10/15 of our What is C in practice? (Cerberus survey v2)^23 relates to this.]

23 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

First we consider a case with two isomorphic structure types:

EXAMPLE (cast_struct_isomorphic.c):
#include <stdio.h>
typedef struct { int i1; float f1; } st1;
typedef struct { int i2; float f2; } st2;
int main() {
  st1 s1 = {.i1 = 1, .f1 = 1.0 };
  st2 *p2 = (st2 *) (&s1);// is this free of undef.beh.?
  p2->f2=2.0; // and this?
  printf("s1.f1=%f p2->f2=%f\n",s1.f1,p2->f2);
}

GCC-4.8-O0:
s1.f1=2.000000 p2->f2=2.000000
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s1.f1=2.000000 p2->f2=2.000000
[value] done for function main
KCC:
s1.f1=2.0000000000000000E0 p2->f2=2.0000000000000000E0
DEFACTO: defined behaviour
ISO: undefined behaviour

And now with a common prefix but differing after that:

EXAMPLE (cast_struct_same_prefix.c):
#include <stdio.h>
typedef struct { int i1; float f1; char c1; double d1; } st1;
typedef struct { int i2; float f2; double d2; char c2; } st2;
int main() {
  st1 s1 = {.i1 = 1, .f1 = 1.0, .c1 = 'a', .d1 = 1.0};
  st2 *p2 = (st2 *) (&s1);// is this free of undef.beh.?
  p2->f2=2.0; // and this?
  printf("s1.f1=%f p2->f2=%f\n",s1.f1,p2->f2);
}

GCC-4.8-O0:
s1.f1=2.000000 p2->f2=2.000000
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s1.f1=2.000000 p2->f2=2.000000
[value] done for function main
KCC:
s1.f1=2.0000000000000000E0 p2->f2=2.0000000000000000E0
DEFACTO: defined behaviour (with effective types switched off)
ISO: undefined behaviour

Several survey respondents reported that this idiom is both used and supported in practice, e.g. in some C object systems and in the Perl interpreter.

For it to work in implementations,

1. the offsets of f1 and f2 have to be equal,
2. the code emitted by the compiler for the f2 access has to be independent of the subsequent members of the structure (in particular, it cannot use an over-wide write that would only hit padding in one structure but hit data in the other). Or we need a more elaborate condition: the last member of the common prefix is only writable if it is aligned and sized such that wide writes will never be used (an implementation-defined property).
3. either the alignments of st1 and st2 have to be equal or the code emitted by the compiler for the f2 access has to be independent of the structure alignment (we imagine that the latter holds in practice), and
4. the compiler must not be doing some alias analysis that assumes that it is illegal.

For the offsets, the standard implies that within the scope of each compilation, there is a fixed layout for the members of each structure, and that this layout is available to the programmer via offsetof(type, member-designator), “the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type).” (7.19p3, in Common definitions <stddef.h>), and via the 6.5.3.4 sizeof and _Alignof operators. The C standard provides only weak constraints for these layout values^24; it does not guarantee that st1 and st2 have the same offsets for f1 and f2.^25

24 e.g. that they increase along a structure, per 6.7.2.1p15
25 DR074CR confirms this: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_074.html

In practice, however, these values are typically completely determined by the ABI, with constant sizes and alignments for the fundamental types and the algorithm “Each member is assigned to the lowest available offset with the appropriate alignment.” for structures, from the x86-64 Unix ABI [37]. There is similar text for Power [6], MIPS [45], and Visual Studio [38]. The ARM ABI [5] is an exception in that it does not clearly state this, but the wording suggests that the writers may well have had the same algorithm in mind. This algorithm will guarantee that the offsets are equal.

With respect to the (hypothetical) use of wide writes, the situation is unclear to us.

We should recall also that there are various compiler flags and pragmas to control packing, so it can (and does) happen that the same type (and code manipulating it) is compiled with different packing in different compilation units, relying on the programmer to not intermix them. We currently ignore this possibility but it should be relatively straightforward to add the packing flags to the structure name used within the semantics.

If one wanted to argue that this example should be illegal (e.g. to license an otherwise-unsound analysis), one might attempt to do so in terms of the effective types of 6.5p{6,7}. The key question here is whether one considers the effective type of a structure member to be simply the type of the member itself or also to involve the structure type that it is part of, which the text (with its ambiguous use of “object”) leaves unclear. In the former case the example would be allowed, while in the latter it would not. We return to this in §4 (p.133).

2.15.2 Q40. Can one read from the initial part of a union of structures sharing a common initial sequence via any union member (if the union type is visible)?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes (though they ask whether union type visibility is obscured by *&?) KCC: yes

Next we have 6.5.2.3p6, which licenses reading from a common initial sequence of two structure types which are members of a union type declaration: “One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.”

EXAMPLE (read_union_same_prefix_visible.c):
#include <stdio.h>
typedef struct { int i1; float f1; char c1; } st1;
typedef struct { int i2; float f2; double d2; } st2;
typedef union { st1 m1; st2 m2; } un;
int main() {
  un u = {.m1 = {.i1 = 1, .f1 = 1.0, .c1 = 'a'}};
  int i = u.m2.i2; // is this free of undef.beh.?
  printf("i=%i\n",i);
}

GCC-4.8-O0:
i=1
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
i=1
[value] done for function main
KCC:
i=1
DEFACTO: defined behaviour
ISO: defined behaviour

2.15.3 Q41. Is writing to the initial part of a union of structures sharing a common initial sequence allowed via any union member (if the union type is visible)?

U: DEFACTO
ISO: no DEFACTO-USAGE: unclear DEFACTO-IMPL: unclear CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes

We presume the above is restricted to reading to avoid the case in which a write to one structure type might overwrite what is padding there but not padding in the other structure type. We return to padding below.

EXAMPLE (write_union_same_prefix_visible.c):
#include <stdio.h>
typedef struct { int i1; float f1; char c1; } st1;
typedef struct { int i2; float f2; double d2; } st2;
typedef union { st1 m1; st2 m2; } un;
int main() {
  un u = {.m1 = {.i1 = 1, .f1 = 1.0, .c1 = 'a'}};
  u.m2.i2 = 2; // is this free of undef.beh.?
  printf("u.m1.i1=%i u.m2.i2=%i\n",u.m1.i1,u.m2.i2);
}

GCC-4.8-O0:
u.m1.i1=2 u.m2.i2=2
GCC-4.9-O0: . . . as above
GCC-4.8-O2: . . . as above
GCC-4.9-O2: . . . as above
GCC-5.3-O2: . . . as above
GCC-4.8-O2-NO-STRICT-ALIASING: . . . as above
GCC-4.9-O2-NO-STRICT-ALIASING: . . . as above
GCC-5.3-O2-NO-STRICT-ALIASING: . . . as above
CLANG33-O0: . . . as above
CLANG34-O0: . . . as above
CLANG35-O0: . . . as above
CLANG36-O0: . . . as above
CLANG37-O0: . . . as above
CLANG33-O2: . . . as above
CLANG34-O2: . . . as above
CLANG35-O2: . . . as above
CLANG36-O2: . . . as above
CLANG37-O2: . . . as above
CLANG33-O2-NO-STRICT-ALIASING: . . . as above
CLANG34-O2-NO-STRICT-ALIASING: . . . as above
CLANG35-O2-NO-STRICT-ALIASING: . . . as above
CLANG36-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-O2-NO-STRICT-ALIASING: . . . as above
CLANG37-UBSAN: . . . as above
CLANG37-ASAN: . . . as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
u.m1.i1=2 u.m2.i2=2
[value] done for function main
KCC:
u.m1.i1=2 u.m2.i2=2
DEFACTO: defined behaviour (under the ‘more elaborate condition’)
ISO: undefined behaviour

2.15.4 Q42. Is type punning by writing and reading different union members allowed (if the lvalue is syntactically obvious)?

U: DEFACTO D: ISO-VS-DEFACTO
ISO: yes DEFACTO-USAGE: yes (subject to GCC “syntactically obvious” notion) DEFACTO-IMPL: yes (subject to GCC “syntactically obvious” notion) CERBERUS-DEFACTO: yes? CHERI: yes TIS: yes KCC: Execution failed (unclear why)

[Question 15/15 of our What is C in practice? (Cerberus survey v2)^26 relates to this.]

26 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

And finally, in some cases subsuming the previous clause, 6.5.2.3p3 and Footnote 95 explicitly license much more general type punning for union members, allowing the representation of one member to be reinterpreted as another member.

• 6.5.2.3p3 “A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.”.
• Footnote 95) “If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.”

The GCC documentation^27 suggests that for this to work the union must be somehow syntactically visible in the access, in the construction of the lvalue, or in other words that GCC pays attention to more of the lvalue than just the lvalue type (at least with -fstrict-aliasing; without that, it’s not clear):

27 https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning

-fstrict-aliasing Allow the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations
based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

Pay special attention to code like this:

EXAMPLE (union_punning_gcc_1.c):
// adapted from GCC docs
#include <stdio.h>
union a_union {
  int i;
  double d;
};
int main() {
  union a_union t;
  t.d = 3.1415;
  int j = t.i; // is this defined behaviour?
  printf("j=%d\n",j);
}

GCC-4.8-O0:
j=-1065151889
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
union_punning_gcc_1.c:9:[kernel] warning: Floating-point constant 3.1415 is not represented exactly. Will use 0x1.921cac083126fp1. See documentation for option -warn-decimal-float
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
j=-1065151889
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (with implementation-defined value)
ISO: defined behaviour (with implementation-defined value)

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

EXAMPLE (union_punning_gcc_2.c):
// adapted from GCC docs
#include <stdio.h>
union a_union {
  int i;
  double d;
};
int main() {
  union a_union t;
  int* ip;
  t.d = 3.1415;
  ip = &t.i; // is this defined behaviour?
  int j = *ip; // is this defined behaviour?
  printf("j=%d\n",j);
}

GCC-4.8-O0:
j=-1065151889
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above

CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
union_punning_gcc_2.c:10:[kernel] warning: Floating-point constant 3.1415 is not represented exactly. Will use 0x1.921cac083126fp1. See documentation for option -warn-decimal-float
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
j=-1065151889
[value] done for function main
KCC:
Execution failed (configuration dumped)
DEFACTO: undefined behaviour
ISO: unclear (perhaps defined behaviour with implementation-defined value?)

See also the LLVM mailing list thread on the same topic: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/042034.html

Hence one should presumably regard both of these as giving undefined behaviour in the de facto semantics. The ISO standard text is unclear about whether it is allowed in the standard or not.

For reference: a GCC mailing list post28 observes that upcasts from int to union can go wrong in practice, and another29 says that GCC conforms to TC3 with respect to type punning through union accesses.

2.16 Pointer lifetime end

After the end of the lifetime of an object30, one can ask whether pointers to that object retain their values, or, in more detail, whether:

1. they can be compared (with == and !=) against other pointers,
2. they can be compared (with <, >, <=, or >=) against other pointers,
3. their representation bytes can be inspected and still contain their address values,
4. pointer arithmetic and member offset calculations can be performed,
5. they can be used to access a newer object that happens to be allocated at the same address, or
6. they can be used to access the memory that was used for the lifetime-ended object.

The ISO standard is clear that these are not allowed in a useful way: 6.2.4 Storage durations of objects says (6.2.4p2) “If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.”. More precisely, the first sentence makes 5 and 6 undefined behaviour. The second sentence means that 1, 2, 3, and 4 are not guaranteed to have useful results, but (in our reading, and in the absence of trap representations) the standard text does not make these operations undefined behaviour. Other authors differ on this point.

This side-effect of lifetime end on all pointer values that point to the object, wherever they may be in the abstract-machine state, is an unusual aspect of C when compared with other programming language definitions.

Note that there is no analogue of this “lifetime-end zap” in the standard text for pointers to objects stored within a malloc’d region when those objects are overwritten (with a strong update) with something of a different type; the lifetime-end zap is not sufficient to maintain the invariant that all extant pointer values point to something live of the appropriate type.

In practice the situation is less clear:

1. some debugging environments null out the pointer being freed (though presumably not other pointers to the same object)
2. one respondent notes “After a pointer is freed, its value is undefined. A fairly common optimisation is to reuse the stack slot used for a pointer in between it being freed and it having a defined value assigned to it.” though it is not clear whether this actually happens.

On the other hand, several respondents suggest that checking equality (with == or !=) against a pointer to an object whose lifetime has ended is used and is supported by implementations. One remarks that whether the object has gone out of scope or been free’d may be significant here, and so we give an example below for each.
In a TrustInSoft blog post31, Julian Cretin gives examples showing GCC giving surprising results for comparisons between lifetime-ended pointers. He argues that those pointers have indeterminate values and hence that any uses of them, even in a == comparison, give undefined behaviour. The first

28 https://gcc.gnu.org/ml/gcc/2010-01/msg00013.html
29 https://gcc.gnu.org/ml/gcc/2010-01/msg00027.html
30 For an object of thread storage duration, the lifetime ends at the termination of the thread (6.2.4p4). For an object of automatic storage duration (leaving aside those that “have a variable length array type” for the moment), the lifetime ends when “execution of that block ends in any way” (6.2.4p6). For an object of allocated storage duration, the lifetime ends at the deallocation of an associated free or realloc call (7.22.3p1).
31 http://trust-in-soft.com/dangling-pointer-indeterminate/

is clear in the ISO standard; the second is not, at least in our reading, especially in implementations where there are no trap representations at pointer types. The behaviour he observes for pointer comparison could also be explained by the semantics we envision that nondeterministically takes pointer provenance into account, without requiring an appeal to undefined behaviour. The behaviour of the corresponding integers (cast from pointers to uintptr_t) is less clear, but that could arguably be a compiler bug.

2.16.1 Q43. Can one inspect the value (e.g. by testing equality with ==) of a pointer to an object whose lifetime has ended (either at a free() or block exit)?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: yes (except in debugging environments) CERBERUS-DEFACTO: yes CHERI: yes TIS: no (warning of access to escaping addresses) KCC: no (flags UB)

[Question 8/15 of our What is C in practice? (Cerberus survey v2)32 relates to this.]

EXAMPLE (pointer_comparison_eq_zombie_1.c):
#include <stdio.h>
#include <stdlib.h>
int main() {
  int i=0;
  int *pj = (int *)(malloc(sizeof(int)));
  *pj=1;
  printf("(&i==pj)=%s\n",(&i==pj)?"true":"false");
  free(pj);
  printf("(&i==pj)=%s\n",(&i==pj)?"true":"false");
  // is the == comparison above defined behaviour?
  return 0;
}

GCC-4.8-O0:
(&i==pj)=false
(&i==pj)=false
GCC-4.9-O0: ... as above
GCC-4.8-O2:
pointer_comparison_eq_zombie_1.c: In function ’main’:
pointer_comparison_eq_zombie_1.c:8:7: warning: attempt to free a non-heap object ’i’ [-Wfree-nonheap-object]
free(pj);
^
(&i==pj)=false
(&i==pj)=false
GCC-4.9-O2:
pointer_comparison_eq_zombie_1.c: In function ’main’:
pointer_comparison_eq_zombie_1.c:8:3: warning: attempt to free a non-heap object ’i’ [-Wfree-nonheap-object]
free(pj);
^
(&i==pj)=false
(&i==pj)=false
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_zombie_1.c: In function ’main’:
pointer_comparison_eq_zombie_1.c:8:7: warning: attempt to free a non-heap object ’i’ [-Wfree-nonheap-object]
free(pj);
^
(&i==pj)=false
(&i==pj)=false
GCC-4.9-O2-NO-STRICT-ALIASING:
pointer_comparison_eq_zombie_1.c: In function ’main’:
pointer_comparison_eq_zombie_1.c:8:3: warning: attempt to free a non-heap object ’i’ [-Wfree-nonheap-object]
free(pj);
^
(&i==pj)=false
(&i==pj)=false
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
(&i==pj)=false
(&i==pj)=false
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
pointer_comparison_eq_zombie_1.c:5:[value] allocating variable __malloc_main_l5
(&i==pj)=false
pointer_comparison_eq_zombie_1.c:9:[kernel] warning:

32 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

accessing left-value that contains escaping addresses:
assert \dangling(&pj);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
(&i==pj)=false
(&i==pj)=false
Error: UB-CEE4
Description: Referring to an object outside of its lifetime.
Type: Undefined behavior.
See also: C11 sec. 6.2.4:2, J.2:1 item 9
at main(pointer_comparison_eq_zombie_1.c:9)
at <file-scope>(<unknown>)
DEFACTO: switchable
ISO: unclear -- nondeterministic or undefined behaviour

Here the comparison against pj after the free() is undefined behaviour according to the ISO standard. GCC -O2 gives a misleading warning about the free() itself (the warning goes away if one omits either printf() or with -O0); that might be a GCC bug.

EXAMPLE (pointer_comparison_eq_zombie_2.c):
#include <stdio.h>
#include <stdlib.h>
int main() {
  int i=0;
  int *pj;
  {
    int j=1;
    pj = &j;
    printf("(&i==pj)=%s\n",(&i==pj)?"true":"false");
  }
  printf("(&i==pj)=%s\n",(&i==pj)?"true":"false");
  // is the == comparison above defined behaviour?
  return 0;
}

GCC-4.8-O0:
(&i==pj)=false
(&i==pj)=false
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
(&i==pj)=false
pointer_comparison_eq_zombie_2.c:9:[value] warning: locals {j} escaping the scope of a block of main through pj
pointer_comparison_eq_zombie_2.c:11:[kernel] warning: accessing left-value that contains escaping addresses:
assert \dangling(&pj);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
(&i==pj)=false
(&i==pj)=false
Error: UB-CEE4
Description: Referring to an object outside of its lifetime.
Type: Undefined behavior.
See also: C11 sec. 6.2.4:2, J.2:1 item 9
at main(pointer_comparison_eq_zombie_2.c:11)
at <file-scope>(<unknown>)
DEFACTO: switchable

ISO: unclear -- nondeterministic or undefined behaviour

One could construct similar examples for the rest of the first four items above (relational comparison, access to representation bytes, and pointer arithmetic). We do not expect that the last two of the six (access to newly allocated objects or to now-deallocated memory) are used in practice, at least in non-malicious code.

2.16.2 Q44. Is the dynamic reuse of allocation addresses permitted?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes? CHERI: ? TIS: test not supported (tis fails with escaping address, even though it’s cast to intptr_t, perhaps intentionally due to nondeterminism?) KCC: mistakenly flags reference to an object outside its lifetime

EXAMPLE (compcertTSO-2.c):
#include <stdio.h>
#include <inttypes.h>
uintptr_t f() {
  int a;
  return (uintptr_t)&a; }
uintptr_t g() {
  int a;
  return (uintptr_t)&a; }
int main() {
  _Bool b = (f() == g()); // can this be true?
  printf("(f()==g())=%s\n",b?"true":"false");
}

GCC-4.8-O0:
compcertTSO-2.c: In function ’f’:
compcertTSO-2.c:5:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
compcertTSO-2.c: In function ’g’:
compcertTSO-2.c:8:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
(f()==g())=true
GCC-4.9-O0: ... as above
GCC-4.8-O2:
compcertTSO-2.c: In function ’f’:
compcertTSO-2.c:5:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
compcertTSO-2.c: In function ’g’:
compcertTSO-2.c:8:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
(f()==g())=false
GCC-4.9-O2: ... as above
GCC-5.3-O2:
compcertTSO-2.c: In function ’f’:
compcertTSO-2.c:5:10: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
compcertTSO-2.c: In function ’g’:
compcertTSO-2.c:8:10: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
(f()==g())=true
GCC-4.8-O2-NO-STRICT-ALIASING:
compcertTSO-2.c: In function ’f’:
compcertTSO-2.c:5:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
compcertTSO-2.c: In function ’g’:
compcertTSO-2.c:8:3: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
(f()==g())=false
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING:
compcertTSO-2.c: In function ’f’:
compcertTSO-2.c:5:10: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
compcertTSO-2.c: In function ’g’:
compcertTSO-2.c:8:10: warning: function returns address of local variable [-Wreturn-local-addr]
return (uintptr_t)&a; }
^
(f()==g())=true
CLANG33-O0:

(f()==g())=true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
(f()==g())=false
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
(f()==g())=true
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
compcertTSO-2.c:5:[value] warning: locals {a} escaping the scope of f through \result
compcertTSO-2.c:8:[value] warning: locals {a} escaping the scope of g through \result
compcertTSO-2.c:10:[kernel] warning: accessing left-value that contains escaping addresses:
assert \dangling(&tmp);
(tmp from f())
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
(f()==g())=true
Error: UB-CEE4
Description: Referring to an object outside of its lifetime.
Type: Undefined behavior.
See also: C11 sec. 6.2.4:2, J.2:1 item 9
at main(compcertTSO-2.c:10)
at <file-scope>(<unknown>)

This example is based on one from CompCertTSO, as discussed in §6.5. This version casts to uintptr_t to make the out-of-lifetime == comparison permitted (at least w.r.t. our reading of ISO), though GCC 4.8 -O2 still warns that the functions return addresses of local variables. One could write analogous tests using other constructs that expose the concrete address of a pointer value, e.g. casting to an integer type, examining the pointer representation bytes, or using printf with %p. The CompCertTSO example compcertTSO-1.c uses == on the pointer values directly because (as in CompCert 1.5) none of those are supported there, while CompCertTSO does allow that comparison.

2.17 Invalid Accesses

In the ISO standard, reads and writes to invalid pointers give undefined behaviour, and likewise in typical implementations. For a conventional C implementation, undefined behaviour for general invalid writes is essentially forced, given that they might (e.g.) write over return addresses on the stack. But accesses to NULL pointers and reads from an invalid pointer could conceivably be strengthened, as in the following two questions.

2.17.1 Q45. Can accesses via a null pointer be assumed to give runtime errors, rather than give rise to undefined behaviour?

ISO: no DEFACTO-USAGE: no? DEFACTO-IMPL: no? CERBERUS-DEFACTO: should flag UB CHERI: ? TIS: flags UB KCC: flags UB

EXAMPLE (null_pointer_4.c):
#include <stdio.h>
int main() {
  int x;
  // is this guaranteed to trap (rather than be
  // undefined behaviour)?
  x = *(int*)NULL;
  printf("x=%i\n",x);
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;

^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
x=0
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
x=-5472
CLANG33-O2-NO-STRICT-ALIASING:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
x=0
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
x=-5512
CLANG37-UBSAN:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
null_pointer_4.c:6:7: runtime error: load of null pointer of type ’int’
CLANG37-ASAN:
null_pointer_4.c:6:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
x = *(int*)NULL;
^
null_pointer_4.c:6:7: note: consider using __builtin_trap() or qualifying pointer with ’volatile’
1 warning generated.
ASAN:SIGSEGV
=================================================================
==36176==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000047fb84 bp 0x7fffffffea40 sp 0x7fffffffea30 T0)
#0 0x47fb83 (null_pointer_4.c.clang37-ASAN.out+0x47fb83)
#1 0x40b88e (null_pointer_4.c.clang37-ASAN.out+0x40b88e)
#2 0x8006b9fff (<unknown module>)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (null_pointer_4.c.clang37-ASAN.out+0x47fb83)
==36176==ABORTING
TIS-INTERPRETER:
[value] Analyzing a complete application starting at

main
[value] Computing initial state
[value] Initial state computed
null_pointer_4.c:6:[kernel] warning: out of bounds read. assert \valid_read((int *)0);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
File: null_pointer_4.c
Line: 6
Error: UB-CER3
Description: Dereferencing a null pointer.
Type: Undefined behavior.
See also: C11 sec. 6.5.3.2:4, J.2:1 item 43
Translation failed. Run kcc -d -o null_pointer_4.c.kcc.out null_pointer_4.c to see commands run.
sh: 1: null_pointer_4.c.kcc.out: not found
ISO: undefined behaviour

This is inspired by the fifth example of Wang et al. [53], discussed in §6.14.

2.17.2 Q46. Can reads via invalid pointers be assumed to give runtime errors or unspecified values, rather than undefined behaviour?

ISO: no DEFACTO-USAGE: no DEFACTO-IMPL: no CERBERUS-DEFACTO: no CHERI: ? TIS: flags UB KCC: reads some value, mistakenly not flagging UB

EXAMPLE (read_via_invalid_1.c):
#include <stdio.h>
int main() {
  int x;
  // is this free of undefined behaviour?
  x = *(int*)0x654321;
  printf("x=%i\n",x);
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
x=0
read_via_invalid_1.c:5:7: runtime error: load of misaligned address 0x000000654321 for type ’int’, which requires 4 byte alignment
0x000000654321: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
CLANG37-ASAN:
ASAN:SIGSEGV
=================================================================
==37103==ERROR: AddressSanitizer: SEGV on unknown address 0x000000654321 (pc 0x00000047fb8a bp 0x7fffffffea30 sp 0x7fffffffea20 T0)
#0 0x47fb89 (read_via_invalid_1.c.clang37-ASAN.out+0x47fb89)
#1 0x40b88e (read_via_invalid_1.c.clang37-ASAN.out+0x40b88e)
#2 0x8006b9fff (<unknown module>)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (read_via_invalid_1.c.clang37-ASAN.out+0x47fb89)
==37103==ABORTING
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
read_via_invalid_1.c:5:[kernel] warning:

out of bounds read. assert \valid_read((int *)0x654321);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
x=821003081
ISO: undefined behaviour

This is from the Friendly C proposal (Point 4) by Cuoq et al., discussed in §6.17. For such a semantics one would nonetheless want to identify a (different, not expressed in terms of undefined behaviour) sense in which such reads indicate programmer errors.

3. Abstract Unspecified Values

[Question 2/15 of our What is C in practice? (Cerberus survey v2)33 relates to uninitialised values.]

The ISO standard introduces:

• indeterminate values which are “either an unspecified value or a trap representation” (3.19.2),
• unspecified values, saying “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance. 2 NOTE An unspecified value cannot be a trap representation.” (3.19.3), and
• trap representations, “an object representation that need not represent a value of the object type” (3.19.4).

In the standard text, reading uninitialised values can give rise to undefined behaviour in two ways, either

1. if the type being read does have some trap representations in the particular implementation being used, or
2. if the last sentence of 6.3.2.1p2 applies (cf. the DR338 CR34): “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.”. This makes reading such lvalues undefined behaviour irrespective of the existence of trap representations.

For the de facto standard, as far as we can tell, trap representations can be neglected, and the last sentence of 6.3.2.1p2 has debatable force.

3.1 Trap Representations

In the ISO standard, trap representations are object representations that do not represent values of the object type; reading one, except through an lvalue of character type, is undefined behaviour. Note that this gives undefined behaviour to programs that merely read such a representation, even if they do not operate on it. Note also that this need not give rise to a hardware trap35; trap representations might simply licence some compiler optimisation, by imposing an obligation on the programmer to avoid them.

6.2.6.1p5 “Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.”. Footnote 50: “Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.”.

However, it is not clear that trap representations are significant in practice for current mainstream C implementations. For integer types it appears not:

• 6.2.6.1p5 makes clear that trap representations are particular concrete bit patterns, and in the most common integer type implementations there are no spare bits for integer types (see DR338 for similar reasoning), and
• the GCC documentation states “GCC supports only two’s complement integer types, and all bit patterns are ordinary values.”36. (This resolves 6.2.6.2p2 “Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value.”.)

It is sometimes suggested that trap representations exist to model Itanium’s NaT (“not a thing”) flag, e.g. in a Stack Overflow discussion37: “Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is “uninitialized” and that doesn’t correspond to a value in the type domain.” and “Itanium CPUs have a NaT (Not a Thing) flag for each integer

33 www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html
34 http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
35 3.19.5 Footnote 2 “[...] Note that fetching a trap representation might perform a trap but is not required to [...]”
36 https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
37 http://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior-in-c

register. The NaT Flag is used to control speculative execution and may linger in registers which aren't properly initialized before usage.". But that is at odds with this 6.2.6.1p5 text that makes clear that trap representations are storable concrete bit patterns.

If it were not for this 6.2.6.1p5 text, one might deem there to be shadow semantic state determining whether any value is a trap representation, analogous to the pointer provenance data discussed earlier, but we see no reason to introduce that.

For pointer types, one can imagine machines that check well-formedness of a pointer value when an address is loaded (e.g. into a particular kind of register), but this doesn't occur in the most common current hardware. We would be interested to hear of any cases where it does, or where a compiler internally uses an analysis about trap representations.

There is also the case of floating-point signalling NaNs. One respondent remarks that in general we wouldn't expect to get a trap by reading an uninitialised value unless the FP settings enable signalling NaNs, and that Intel FPUs can do that but Clang doesn't support them, and so arranges for there to never be signalling NaNs.

3.1.1 Q47. Can one reasonably assume that no types have trap representations?

U: DEFACTO  D: ISO-VS-DEFACTO
ISO: no
DEFACTO-USAGE: yes
DEFACTO-IMPL: yes for most integer types; debatable for _Bool, float, and pointer types
CERBERUS-DEFACTO: yes?
CHERI: yes
TIS: yes
KCC: no (flags UB indeterminate value used in expression)

The following example has undefined behaviour in the ISO standard if and only if the implementation has a trap representation for type int; one can also consider similar examples for any other object type (the address of i is taken, so the last sentence of 6.3.2.1p2 does not apply here).

EXAMPLE (trap_representation_1.c):

  int main() {
    int i;
    int *p = &i;
    int j=i; // is this free of undefined behaviour?
    // note that i is read but the value is not used
  }

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  trap_representation_1.c: In function 'main':
  trap_representation_1.c:4:7: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  int j=i; // is this free of undefined behaviour?
  ^
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  [value] done for function main
KCC:
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(trap_representation_1.c:4)
  at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: defined or undefined behaviour depending on implementation-defined presence of trap representations at this type

Do any current C implementations rely on concrete trap representations that are representable as bit patterns? The only possible case we are aware of is "signalling NaNs". Supposedly definitely not for Clang. Do any current C implementations rely on semantic shadow-state trap "representations"?

3.1.2 Q48. Does reading an uninitialised object give rise to undefined behaviour?

U: DEFACTO  D: ISO-VS-DEFACTO
ISO: in some cases, depending on trap representations and whether the address is taken
DEFACTO-USAGE: no
DEFACTO-IMPL: unclear – perhaps for _Bool and some float types, and on Itanium?
CERBERUS-DEFACTO: no

CHERI: no more than the base Clang implementation
TIS: no for some tests, yes for others (guess that reading uninitialised is not flagged as UB, but branching on one is, as nondeterministic)
KCC: yes (flags UB Indeterminate value used in an expression)

The real question is then whether compiler writers assume that reading an uninitialised value gives rise to undefined behaviour (not merely an unspecified value), and rely on that to permit optimisation.

EXAMPLE (trap_representation_2.c):

  int main() {
    int i;
    int j=i; // does this have undefined behaviour?
    // note that i is read but the value is not used
  }

GCC-4.8-O0:
  trap_representation_2.c: In function 'main':
  trap_representation_2.c:3:7: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  int j=i; // does this have undefined behaviour?
  ^
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  trap_representation_2.c:3:9: warning: variable 'i' is uninitialized when used here [-Wuninitialized]
  int j=i; // does this have undefined behaviour?
  ^
  trap_representation_2.c:2:8: note: initialize the variable 'i' to silence this warning
  int i;
  ^
  = 0
  1 warning generated.
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  [value] done for function main
KCC:
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(trap_representation_2.c:3)
  at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: undefined behaviour

In practice we suspect that this would be at odds with too much extant code. For example, it would mean that a partly initialised struct could not be copied by a function that reads and writes all its members.

Uninitialised memory is sometimes intentionally read as a source of entropy, e.g. in OpenSSL, but whether this happens at non-character type is unclear, and it is now widely agreed to be undesirable in any case (see the Xi Wang blog post (footnote 38), which notes the problems involved).

On the other hand, Chris Lattner's What Every C Programmer Should Know About Undefined Behavior #1/3 blog post (footnote 39) says without qualification that "use of an uninitialized variable" is undefined behaviour (though this is in an introductory section which might have been simplified for exposition). Looking at the LLVM IR generated from

EXAMPLE (trap_representation_3.c):

  int f() {
    int i,j;
    j=i;
    // int* ip=&i;
    return j;
  }

the front-end of Clang doesn't seem to be assuming undefined behaviour.

38 http://kqueue.org/blog/2012/06/25/more-randomness-or-less/
39 http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Besson et al. [9] seem to interpret the standard to mean that reading an uninitialised variable always gives rise to undefined behaviour, but it's not clear why.

A Frama-C blog post by Pascal Cuoq (footnote 40) gives examples which it argues show that GCC has to be considered as treating reads of an uninitialised int as undefined behaviour, not unspecified behaviour, and (in the second example below) even if its address is taken:

40 http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined

EXAMPLE (frama-c-2013-03-13-2.c):

  #include <stdio.h>

  int main(int c, char **v)
  {
    unsigned int j;
    if (c==4)
      j = 1;
    else
      j *= 2;
    // does this have undefined behaviour for c != 4 ?
    printf("j:%u ",j);
    printf("c:%d\n",c);
  }

GCC-4.8-O0:
  frama-c-2013-03-13-2.c: In function 'main':
  frama-c-2013-03-13-2.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  frama-c-2013-03-13-2.c:9:7: warning: 'j' may be used uninitialized in this function [-Wmaybe-uninitialized]
  j *= 2;
  ^
  j:0 c:1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  frama-c-2013-03-13-2.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  frama-c-2013-03-13-2.c:9:5: warning: variable 'j' is uninitialized when used here [-Wuninitialized]
  j *= 2;
  ^
  frama-c-2013-03-13-2.c:5:17: note: initialize the variable 'j' to silence this warning
  unsigned int j;
  ^
  = 0
  2 warnings generated.
  j:16 c:1
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
  frama-c-2013-03-13-2.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  frama-c-2013-03-13-2.c:9:5: warning: variable 'j' is uninitialized when used here [-Wuninitialized]
  j *= 2;
  ^
  frama-c-2013-03-13-2.c:5:17: note: initialize the variable 'j' to silence this warning
  unsigned int j;
  ^
  = 0
  2 warnings generated.
  j:0 c:1
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
  frama-c-2013-03-13-2.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  frama-c-2013-03-13-2.c:9:5: warning: variable 'j' is uninitialized when used here [-Wuninitialized]
  j *= 2;

  ^
  frama-c-2013-03-13-2.c:5:17: note: initialize the variable 'j' to silence this warning
  unsigned int j;
  ^
  = 0
  2 warnings generated.
  j:16 c:1
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  frama-c-2013-03-13-2.c:9:[kernel] warning: accessing uninitialized left-value: assert \initialized(&j);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  j:0 c:1
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(frama-c-2013-03-13-2.c:9)
  at <file-scope>(<unknown>)
DEFACTO: nondeterministic value for j
ISO: undefined behaviour

EXAMPLE (frama-c-2013-03-13-3.c):

  #include <stdio.h>

  int main(int c, char **v)
  {
    unsigned int j;
    unsigned int *p = &j;
    if (c==4)
      j = 1;
    else
      j *= 2;
    // does this have undefined behaviour for c != 4 ?
    printf("j:%u ",j);
    printf("c:%d\n",c);
  }

GCC-4.8-O0:
  frama-c-2013-03-13-3.c: In function 'main':
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  j:0 c:1
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  frama-c-2013-03-13-3.c: In function 'main':
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  frama-c-2013-03-13-3.c:10:7: warning: 'j' may be used uninitialized in this function [-Wmaybe-uninitialized]
  j *= 2;
  ^
  j:0 c:1
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  1 warning generated.
  j:16 c:1
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  1 warning generated.
  j:0 c:1
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above

CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  1 warning generated.
  j:16 c:1
CLANG37-ASAN:
  frama-c-2013-03-13-3.c:3:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v)
  ^
  1 warning generated.
  j:0 c:1
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  frama-c-2013-03-13-3.c:10:[kernel] warning: accessing uninitialized left-value: assert \initialized(&j);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  j:0 c:1
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(frama-c-2013-03-13-3.c:10)
  at <file-scope>(<unknown>)
DEFACTO: nondeterministic value for j
ISO: nondeterministic value for j

The same happens using unsigned char instead of int (footnote 41). But this behaviour is still consistent with a semantics that treats reads of uninitialised variables as giving a symbolic undefined value which arithmetic operations are strict in, which is a possible semantics not discussed in that blog post; it does not force a semantics giving global undefined behaviour.

41 EXAMPLE (frama-c-2013-03-13-3-uc.c):
GCC-4.8-O0:
  frama-c-2013-03-13-3-uc.c: In function 'main':
  frama-c-2013-03-13-3-uc.c:2:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v) {
  ^
  j:0 c:1
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  frama-c-2013-03-13-3-uc.c: In function 'main':
  frama-c-2013-03-13-3-uc.c:2:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v) {
  ^
  frama-c-2013-03-13-3-uc.c:8:7: warning: 'j' may be used uninitialized in this function [-Wmaybe-uninitialized]
  j *= 2;
  ^
  j:0 c:1
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  frama-c-2013-03-13-3-uc.c:2:24: warning: unused parameter 'v' [-Wunused-parameter]
  int main(int c, char **v) {
  ^
  1 warning generated.
  j:0 c:1
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  frama-c-2013-03-13-3-uc.c:8:[kernel] warning: accessing uninitialized left-value: assert \initialized(&j);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)
DEFACTO: nondeterministic value for j
ISO: nondeterministic value for j

Returning to the last sentence of 6.3.2.1p2, it is restricted in two ways: to objects of automatic storage duration, and moreover to those whose address is not taken. That makes the above trap_representation_2.c have undefined behaviour but the following example just reads an unspecified value (presuming that int has no trap representations).

EXAMPLE (trap_representation_1.c):

  int main() {
    int i;
    int *p = &i;
    int j=i; // is this free of undefined behaviour?
    // note that i is read but the value is not used
  }

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  trap_representation_1.c: In function 'main':
  trap_representation_1.c:4:7: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  int j=i; // is this free of undefined behaviour?
  ^
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  [value] done for function main
KCC:
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(trap_representation_1.c:4)
  at <file-scope>(<unknown>)
DEFACTO: defined behaviour
ISO: defined or undefined behaviour depending on implementation-defined presence of trap representations at this type

3.2 Unspecified Values

Standard  Unspecified values are introduced in the standard principally:

1. for otherwise-uninitialized objects with automatic storage duration (6.2.4p6 and 6.7.9p10), and
2. for the values of padding bytes on writes to structures or unions (6.2.6.1p6 "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) [...]" with Footnote 51: "Thus, for example, structure assignment need not copy any padding bits.").

In principle those two could have different semantics, but so far we see no reason to distinguish them.

The behaviour of an unspecified value is described as: "[...] valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance. [...]" (3.19.3).

Semantics  That standard text leaves several quite different semantic interpretations of unspecified values open:

1. the semantics could choose a concrete value nondeterministically (from among the set of valid values) for each unspecified value, at the time of the initialization or store (and keeping that concrete value stable thereafter), or

2. the semantics could include a symbolic constant representing an abstract unspecified value, allow that to occur in memory writes, and either
   (a) choose a concrete value nondeterministically each time such a constant is read from, or
   (b) propagate the abstract unspecified value through arithmetic, regarding all operations as strict (giving the unspecified-value result if any of their arguments are unspecified values). Then on a control-flow choice based on an unspecified value, it could either
       i. nondeterministically branch or
       ii. give undefined behaviour.
       And on any library call (or perhaps better any I/O system call?) involving an unspecified-value argument, it could either:
       A. nondeterministically choose a concrete value, or
       B. give undefined behaviour.
   Or it could have a per-representation-bit undefined-value constant rather than a per-abstract-value undefined-value constant (with the same sub-choices)
3. Or (as per Besson et al. [9]) pick a fresh symbolic value (per bit, byte, or value) and allow computation on that.

The following examples explore what one can assume about the behaviour of uninitialised variables. We use unsigned char in these examples so that there is no question of trap representations being involved. We take unspecified values directly from uninitialised variables with automatic storage duration, so the compiler can easily see that they are uninitialised, but they could equally be taken from reads of a computed pointer that happens to end up pointing at a structure padding byte. We also take the address of the uninitialised variable in each example to ensure the last sentence of 6.3.2.1p2 does not apply, though in our de facto semantics that makes no difference.

See the LLVM discussion of its undef and poison (footnote 42). And this LLVM thread about "poison": http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/081310.html

Chris Lattner's What Every C Programmer Should Know About Undefined Behavior #3/3 blog post (footnote 43) says that "Arithmetic that operates on undefined values is considered to produce a undefined value instead of producing undefined behavior." and "Arithmetic that dynamically executes an undefined operation (such as a signed integer overflow) generates a logical trap value which poisons any computation based on it, but that does not destroy your entire program. This means that logic downstream from the undefined operation may be affected, but that your entire program isn't destroyed. This is why the optimizer ends up deleting code that operates on uninitialized variables, for example.".

It also says "The optimizer does go to some effort to "do the right thing" when it is obvious what the programmer meant (such as code that does "*(int*)P" when P is a pointer to float). This helps in many common cases, but you really don't want to rely on this, and there are lots of examples that you might think are "obvious" that aren't after a long series of transformations have been applied to your code.", which suggests that it's a bit more liberal than one might imagine for type-based alias analysis?

42 http://llvm.org/docs/LangRef.html#undefined-values
43 http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html

3.2.1 Q49. Can library calls with unspecified-value arguments be assumed to execute with an arbitrary choice of a concrete value (not necessarily giving rise to undefined behaviour)?

U: ISO  D: ISO-VS-DEFACTO
ISO: unclear (unless one follows DR451)
DEFACTO-USAGE: yes
DEFACTO-IMPL: yes
CERBERUS-DEFACTO: yes
CHERI: no more than the base Clang implementation
TIS: no (warning unspecified value library call argument)
KCC: Execution failed (unclear why)

We start with this so that printf can be used in later examples.

EXAMPLE (unspecified_value_library_call_argument.c):

  #include <stdio.h>
  int main()
  {
    unsigned char c;
    unsigned char *p = &c;
    printf("char 0x%x\n",(unsigned int)c);
    // does this have defined behaviour?
  }

GCC-4.8-O0:
  char 0x0
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  unspecified_value_library_call_argument.c: In function 'main':
  unspecified_value_library_call_argument.c:6:9: warning: 'c' is used uninitialized in this function [-Wuninitialized]
  printf("char 0x%x\n",(unsigned int)c);
  ^
  char 0x0
GCC-4.9-O2:
  unspecified_value_library_call_argument.c: In function 'main':
  unspecified_value_library_call_argument.c:6:3: warning: 'c' is used uninitialized in this function [-Wuninitialized]
  printf("char 0x%x\n",(unsigned

  int)c);
  ^
  char 0x0
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
  unspecified_value_library_call_argument.c: In function 'main':
  unspecified_value_library_call_argument.c:6:9: warning: 'c' is used uninitialized in this function [-Wuninitialized]
  printf("char 0x%x\n",(unsigned int)c);
  ^
  char 0x0
GCC-4.9-O2-NO-STRICT-ALIASING:
  unspecified_value_library_call_argument.c: In function 'main':
  unspecified_value_library_call_argument.c:6:3: warning: 'c' is used uninitialized in this function [-Wuninitialized]
  printf("char 0x%x\n",(unsigned int)c);
  ^
  char 0x0
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  char 0x0
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  unspecified_value_library_call_argument.c:6:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  Execution failed (configuration dumped)
DEFACTO: nondeterministic value
ISO: unclear - nondeterministic value or (from DR451CR) undefined behaviour

GCC and Clang both print a zero value.

The CR to DR451, below (§3.2.3), implies that calling library functions on indeterminate values is undefined behaviour, but that seems too restrictive, e.g. preventing serialising a struct that contains padding or uninitialised members by printing it (byte-by-byte or member-by-member). And we don't see how it is exploitable by compilers.

We also have to consider library calls with unspecified-value arguments of pointer type; they should give undefined behaviour if the pointer is used for access, and perhaps could be deemed to give undefined behaviour whether or not the pointer is used.

3.2.2 Q50. Can control-flow choices based on unspecified values be assumed to make an arbitrary choice (not giving rise to undefined behaviour)?

U: ISO  U: DEFACTO
ISO: unclear - yes?
DEFACTO-USAGE: yes
DEFACTO-IMPL: unclear - yes?
CERBERUS-DEFACTO: yes
CHERI: yes
TIS: no
KCC: yes

EXAMPLE (unspecified_value_control_flow_choice.c):

  #include <stdio.h>
  int main()
  {
    unsigned char c;
    unsigned char *p = &c;
    if (c == 'a')
      printf("equal\n");
    else
      printf("nonequal\n");
    // does this have defined behaviour?
  }

GCC-4.8-O0:
  nonequal
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  unspecified_value_control_flow_choice.c: In function 'main':
  unspecified_value_control_flow_choice.c:6:9: warning: 'c' is used uninitialized in this function [-Wuninitialized]
  if (c == 'a')

  ^
  nonequal
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  nonequal
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  unspecified_value_control_flow_choice.c:6:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  nonequal
DEFACTO: defined behaviour (printing a nondeterministic value)
ISO: defined behaviour (printing a nondeterministic value)

One respondent remarks that Clang decides c is definitely not equal to 'a'; GCC appears to do the same. This is consistent with the documentation for the Clang internal undef: "undefined 'select' (and conditional branch) conditions can go either way, but they have to come from one of the two operands." (footnote 44).

44 http://llvm.org/docs/LangRef.html#undefined-values

An example from Joseph Myers, with a switch derived from several uninitialised _Bool values, suggests that compilers could do wild jumps if the values are not in {0, 1}, but he didn't observe GCC actually do that. If they do, and if such values are not regarded as trap representations (in which case the program would already have undefined behaviour due to the loads), then this question would have to be 'no'.

In the de facto standards this example seems to be permitted. The ISO standard does not address the question explicitly, but the value of c is unambiguously an unspecified value w.r.t. the standard, and 3.19.3p1 "unspecified value: valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance" implies that one should be able to make a comparison and branch based on it.

3.2.3 Q51. In the absence of any writes, is an unspecified value potentially unstable, i.e., can multiple usages of it give different values?

U: ISO
ISO: unclear - yes?
DEFACTO-USAGE: yes
DEFACTO-IMPL: yes
CERBERUS-DEFACTO: yes
CHERI: yes
TIS: test not supported – it seems printing the uninitialised value makes tis flag an error
KCC: flags UB indeterminate value in expression (also reports error for printing signed int with %x)

EXAMPLE (unspecified_value_stability.c):

  #include <stdio.h>
  int main() {
    // assume here that int has no trap representations and
    // that printing an unspecified value is not itself
    // undefined behaviour
    int i;
    int *p = &i;
    // can the following print different values?
    printf("i=0x%x\n",i);
    printf("i=0x%x\n",i);
    printf("i=0x%x\n",i);
    printf("i=0x%x\n",i);
  }

GCC-4.8-O0:
  i=0x0
  i=0x0
  i=0x0
  i=0x0
GCC-4.9-O0: ... as above
GCC-4.8-O2:
  unspecified_value_stability.c: In function 'main':
  unspecified_value_stability.c:9:9: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  printf("i=0x%x\n",i);

  ^
  i=0x0
  i=0x0
  i=0x0
  i=0x0
GCC-4.9-O2:
  unspecified_value_stability.c: In function 'main':
  unspecified_value_stability.c:9:3: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  printf("i=0x%x\n",i);
  ^
  i=0x0
  i=0x0
  i=0x0
  i=0x0
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
  unspecified_value_stability.c: In function 'main':
  unspecified_value_stability.c:9:9: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  printf("i=0x%x\n",i);
  ^
  i=0x0
  i=0x0
  i=0x0
  i=0x0
GCC-4.9-O2-NO-STRICT-ALIASING:
  unspecified_value_stability.c: In function 'main':
  unspecified_value_stability.c:9:3: warning: 'i' is used uninitialized in this function [-Wuninitialized]
  printf("i=0x%x\n",i);
  ^
  i=0x0
  i=0x0
  i=0x0
  i=0x0
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
  i=0x0
  i=0x0
  i=0x0
  i=0x0
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
  i=0xffffea88
  i=0x4007cd
  i=0x4007cd
  i=0x4007cd
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING:
  i=0xffffea60
  i=0x4007cd
  i=0x4007cd
  i=0x4007cd
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
  i=0x0
  i=0x0
  i=0x0
  i=0x0
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
  [value] Analyzing a complete application starting at main
  [value] Computing initial state
  [value] Initial state computed
  unspecified_value_stability.c:9:[kernel] warning: accessing uninitialized left-value: assert \initialized(&i);
  stack: main
  [value] Stopping at nth alarm
  [value] user error: Degeneration occurred:
  results are not correct for lines of code that can be reached from the degeneration point.
KCC:
  i=0x0
  i=0x0
  i=0x0
  i=0x0
  Error: UB-CEE2
  Description: Indeterminate value used in an expression.
  Type: Undefined behavior.
  See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
  at main(unspecified_value_stability.c:9)
  at <file-scope>(<unknown>)
  Error: UB-STDIO1
  Description: 'printf': Mismatch between the type expected by the

conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_stability.c:9)
at main(unspecified_value_stability.c:9)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_stability.c:10)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_stability.c:10)
at main(unspecified_value_stability.c:10)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_stability.c:11)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_stability.c:11)
at main(unspecified_value_stability.c:11)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_stability.c:12)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_stability.c:12)
at main(unspecified_value_stability.c:12)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour (printing nondeterministic values)
ISO: unclear - nondeterministic value or (from DR451CR) undefined behaviour

If we assume that printing an unspecified value is not itself undefined behaviour, we can test with this example. Note that in a semantics (like our Cerberus candidate de facto model) with a symbolic unspecified value, and in which operations are strict in unspecified-value-ness, this question only really makes sense for external library calls, as other (data-flow) uses of an unspecified value will result in the (unique) symbolic unspecified value, not in a nondeterministic choice of concrete values.
Both GCC and Clang warn that i is used uninitialized; Clang at -O2 sometimes prints distinct values (for example i=0xffffea88 followed by i=0x4007cd). That is the first time that we’ve seen instability in practice; it (under the above assumption) rules out (1).
This is consistent with the Clang internal undef documentation: “an ‘undef’ “variable” can arbitrarily change its value”⁴⁵.
45 http://llvm.org/docs/LangRef.html#undefined-values
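
For contrast with the uninitialised read tested above, the following is a minimal sketch of the well-defined counterpart: once the object has been given a value, two successive reads must agree, so any instability of the kind observed with Clang can only arise for the uninitialised read. The helper names `reads_are_stable` and `stable_after_initialisation` are hypothetical and are not part of the test suite discussed here.

```c
#include <stdbool.h>

/* Hypothetical helper (an assumption for illustration only): reads the
   object designated by p twice and reports whether the two reads agree. */
bool reads_are_stable(const int *p) {
    int first = *p;
    int second = *p;
    return first == second;
}

/* Well-defined use: the object is initialised before it is read, so the
   question of unspecified-value stability does not arise. */
bool stable_after_initialisation(void) {
    int i = 0x4007cd;   /* arbitrary concrete value */
    return reads_are_stable(&i);
}
```

Calling `reads_are_stable` on a pointer to an uninitialised automatic variable would instead pose exactly the question this section asks, and the sketch makes no claim about its result.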

DR 451 by Freek Wiedijk and Robbert Krebbers⁴⁶ asks about stability of uninitialised variables with automatic storage duration, and also about library calls with indeterminate values. Their questions and the committee responses are:

1 “Can an uninitialized variable with automatic storage duration (of a type that does not have trap values, whose address has been taken so 6.3.2.1p2 does not apply, and which is not volatile) change its value without direct action of the program?”. CR: yes

2 “If the answer to question 1 is “yes”, then how far can this kind of “instability” propagate?” CR: any operation performed on indeterminate values will have an indeterminate value as a result.
Note that this strictness is stronger than Clang’s documented behaviour, as we discuss in §3.2.4 (p.101).

3 “If “unstable” values can propagate through function arguments into a called function, can calling a C standard library function exhibit undefined behavior because of this?” CR: “library functions will exhibit undefined behavior when used on indeterminate values”.
Note that this means one cannot print an uninitialised value or padding byte. For our de facto semantics, we argue otherwise (cf. §3.2.1, p.96).

The CR also says “The committee agrees that this area would benefit from a new definition of something akin to a “wobbly” value and that this should be considered in any subsequent revision of this standard. The committee also notes that padding bytes within structures are possibly a distinct form of “wobbly” representation.”
The unspecified values of our de facto semantics seem to be serving the same role as those “wobbly” values.
See also §3.3.2 (p.122) for the question of whether padding bytes intrinsically hold unspecified values (even if concrete values are written over the top), and whether that varies between structs in malloc’d regions and those with automatic, static, and thread storage durations.
The observed behaviour forces this to be “yes”, and rules out the unspecified-value semantics in which a concrete value is chosen nondeterministically at allocation time.
The ISO semantics similarly has nondeterministic prints (unless one follows the DR451CR notion that a print of an unspecified value immediately gives undefined behaviour, which we do not).

46 http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm

3.2.4 Q52. Do operations on unspecified values result in unspecified values?

U: ISO U: DEFACTO
ISO: unclear - yes? DEFACTO-USAGE: unclear - yes? (though see some cases in which the LLVM docs give stronger guarantees, and [9]) DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: test not supported (fails either on first read of uninitialised value or on the arithmetic) KCC: flags UB indeterminate value in expression

EXAMPLE (unspecified_value_strictness_int.c):
#include <stdio.h>
int main() {
  int i;
  int *p = &i;
  int j = (i-i); // is this an unspecified value?
  _Bool b = (j==j); // can this be false?
  printf("b=%s\n",b?"true":"false");
}

GCC-4.8-O0:
b=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
unspecified_value_strictness_int.c:6:15: warning: self-comparison always evaluates to true [-Wtautological-compare]
_Bool b = (j==j); // can this be false?
^
1 warning generated.
b=true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
unspecified_value_strictness_int.c:6:15: warning: self-comparison always evaluates to true [-Wtautological-compare]
_Bool b = (j==j); // can this be false?
^
1 warning generated.
b=false
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
unspecified_value_strictness_int.c:6:15: warning:

self-comparison always evaluates to true [-Wtautological-compare]
_Bool b = (j==j); // can this be false?
^
1 warning generated.
b=true
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_strictness_int.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&i);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
b=true
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_strictness_int.c:5)
at <file-scope>(<unknown>)
ISO: unclear

GCC gives true and Clang at -O2 gives false (despite the Clang warning that a self-comparison always gives true, presumably a bug in Clang). This could be explained by taking subtraction on one or more unspecified values to give an unspecified value which can then be instantiated to any valid value.
For an unsigned char variant, both GCC and Clang give true:

EXAMPLE (unspecified_value_strictness_unsigned_char.c):
#include <stdio.h>
int main() {
  unsigned char c;
  unsigned char *p=&c;
  int j = (c-c); // is this an unspecified value?
  _Bool b = (j==j); // can this be false?
  printf("b=%s\n",b?"true":"false");
}

GCC-4.8-O0:
b=true
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
unspecified_value_strictness_unsigned_char.c:6:15: warning: self-comparison always evaluates to true [-Wtautological-compare]
_Bool b = (j==j); // can this be false?
^
1 warning generated.
b=true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_strictness_unsigned_char.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (printing nondeterministically

true or false)
ISO: unclear

For another test of whether arithmetic operators are strict w.r.t. unspecified values, consider:

EXAMPLE (unspecified_value_strictness_mod_1.c):
#include <stdio.h>
int main() {
  unsigned char c;
  unsigned char *p=&c;
  unsigned char c2 = (c % 2);
  // can reading c2 give something other than 0 or 1?
  printf("c=%i c2=%i\n",(int)c,(int)c2);
}

GCC-4.8-O0:
c=0 c2=0
GCC-4.9-O0: ... as above
GCC-4.8-O2:
unspecified_value_strictness_mod_1.c: In function ’main’:
unspecified_value_strictness_mod_1.c:5:17: warning: ’c’ is used uninitialized in this function [-Wuninitialized]
unsigned char c2 = (c % 2);
^
c=0 c2=0
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
c=0 c2=0
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_strictness_mod_1.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (printing nondeterministically true or false)
ISO: unclear

GCC and Clang both print c=0 c2=0 on x86 (though not on non-CHERI MIPS). Making the computation of c2 more complex by appending a +(1-c) makes them both print c=0 c2=1, weakly suggesting that they are not (in this instance) aggressively propagating unspecifiedness strictly through these arithmetic operators.

EXAMPLE (unspecified_value_strictness_mod_2.c):
#include <stdio.h>
int main() {
  unsigned char c;
  unsigned char *p=&c;
  unsigned char c2 = (c % 2) + (1-c);
  // can reading c2 give something other than 0 or 1?
  printf("c=%i c2=%i\n",(int)c,(int)c2);
}

GCC-4.8-O0:
c=0 c2=1
GCC-4.9-O0: ... as above
GCC-4.8-O2:
unspecified_value_strictness_mod_2.c: In function ’main’:
unspecified_value_strictness_mod_2.c:5:17: warning: ’c’ is used uninitialized in this function [-Wuninitialized]
unsigned char c2 = (c % 2) + (1-c);
^
c=0 c2=1
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
c=0 c2=1
CLANG34-O0: ... as above
CLANG35-O0: ... as above

CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_strictness_mod_2.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
DEFACTO: defined behaviour (printing nondeterministically true or false)
ISO: unclear

An LLVM developer remarks that different parts of LLVM assume that undef is propagated aggressively or that it represents an unknown particular number.
The Clang undef documentation below⁴⁷ suggests that their internal undef is a per-value not a per-bit entity, and any instance can be regarded as giving any bit pattern, but operations are not simply strict. Instead, if any resulting representation bit is unaffected by the choice of a concrete value for the undefs, the text suggests it is guaranteed to hold its “proper” value. Does the fact that they go to this trouble imply that it is needed for code found in the wild? The text does not mention correlations between bits; presumably those are simply lost. And is this affected by any value-range-analysis facts the compiler knows about the non-undef values involved?

  %A = add %X, undef
  %B = sub %X, undef
  %C = xor %X, undef
Safe:
  %A = undef
  %B = undef
  %C = undef

This is safe because all of the output bits are affected by the undef bits. Any output bit can have a zero or one depending on the input bits.

  %A = or %X, undef
  %B = and %X, undef
Safe:
  %A = -1
  %B = 0
Unsafe:
  %A = undef
  %B = undef

These logical operations have bits that are not always affected by the input. For example, if %X has a zero bit, then the output of the ‘and’ operation will always be a zero for that bit, no matter what the corresponding bit from the ‘undef’ is. As such, it is unsafe to optimize or assume that the result of the ‘and’ is ‘undef’. However, it is safe to assume that all bits of the ‘undef’ could be 0, and optimize the ‘and’ to 0. Likewise, it is safe to assume that all the bits of the ‘undef’ operand to the ‘or’ could be set, allowing the ‘or’ to be folded to -1.

47 http://llvm.org/docs/LangRef.html#undefined-values

3.2.5 Q53. Do bitwise operations on unspecified values result in unspecified values?

U: ISO U: DEFACTO
ISO: unclear - yes? DEFACTO-USAGE: unclear - yes? (as for previous question) DEFACTO-IMPL: ? CERBERUS-DEFACTO: yes CHERI: ? TIS: test not supported, similarly KCC: Execution failed (unclear why)

EXAMPLE (unspecified_value_strictness_and_1.c):
#include <stdio.h>
int main() {
  unsigned char c;
  unsigned char *p=&c;
  unsigned char c2 = (c | 1);
  unsigned char c3 = (c2 & 1);
  // does c3 hold an unspecified value (not 1)?
  printf("c=%i c2=%i c3=%i\n",(int)c,(int)c2,(int)c3);
}

TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_strictness_and_1.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:

Execution failed (configuration dumped)
DEFACTO: defined behaviour (printing a nondeterministic unsigned char value)
ISO: unclear

Refining the previous question, this tests whether bits of an unspecified value can be set and cleared individually to result in a specified value.

3.2.6 Q54. Must unspecified values be considered daemonically for identification of other possible undefined behaviours?

U: ISO
ISO: unclear – yes? DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: test not supported (any arithmetic on uninitialised values makes it flag an error?) KCC: (flags UB indeterminate value in expression)

EXAMPLE (unspecified_value_daemonic_1.c):
int main() {
  int i;
  int *p = &i;
  int j = i;
  int k = 1/j; // does this have undefined behaviour?
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2:
unspecified_value_daemonic_1.c: In function ’main’:
unspecified_value_daemonic_1.c:4:7: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
int j = i;
^
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
unspecified_value_daemonic_1.c:5:12: runtime error: division by zero
CLANG37-ASAN:
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_daemonic_1.c:5:[kernel] warning: accessing uninitialized left-value: assert \initialized(&j);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_daemonic_1.c:4)
at <file-scope>(<unknown>)
Error: UB-CEMX1
Description: Division by 0.
Type: Undefined behavior.
See also: C11 sec. 6.5.5:5, J.2:1 item 45
at main(unspecified_value_daemonic_1.c:5)
at <file-scope>(<unknown>)
DEFACTO: undefined behaviour
ISO: unclear, but should be undefined behaviour

Similarly, division by the Clang internal undef is considered to give rise to undefined behaviour⁴⁸.

48 http://llvm.org/docs/LangRef.html#undefined-values

3.2.7 Q55. Can a structure containing an unspecified-value member be copied as a whole?

U: ISO
ISO: unclear – yes? DEFACTO-USAGE: yes DEFACTO-

IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: yes

This and the following questions investigate whether the property of being an unspecified value is associated with arbitrary (possibly struct) C values, or with “leaf” (non-struct/non-union) values, or with individual bitfields, or with individual representation bytes of values, or with individual representation bits of values (see the later examples and LLVM documentation in §3.2.4 for the last).
It seems intuitively clear (though not specified in the ISO standard) that a structure value as a whole should not be allowed to be an unspecified value; instead one should have a struct containing unspecified values for each of its members (or hereditarily, for nested structs). It’s not clear that one can express a test that distinguishes the two in ISO C, however.
Consistent with this, forming a structure value should not be strict in unspecified-value-ness: in the following example, the read of the structure value from s1 and write to s2 should both be permitted, and should copy the value of i1=1. The read of the uninitialised member should not give rise to undefined behaviour (is this contrary to the last sentence of 6.3.2.1p2, or could the structure not “have been declared with the register storage class” in any case?). What s2.i2 holds after the structure copy depends on the rest of the unspecified-value semantics.

EXAMPLE (unspecified_value_struct_copy.c):
#include <stdio.h>
typedef struct { int i1; int i2; } st;
int main() {
  st s1;
  s1.i1 = 1;
  st s2;
  s2 = s1; // does this have defined behaviour?
  printf("s2.i1=%i\n",s2.i1);
}

GCC-4.8-O0:
s2.i1=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s2.i1=1
[value] done for function main
KCC:
s2.i1=1
DEFACTO: defined behaviour (s2.i1=1)
ISO: unclear, but should be defined behaviour (s2.i1=1)

Then there is a similar question for unions: can a union value as a whole be an unspecified value? Here there might be a real semantic difference, between an unspecified value as a whole and a union that contains a specific member which itself is an unspecified value. However, it’s again unclear whether there is a test in ISO C that distinguishes between them. Consider:

EXAMPLE (unspecified_value_union_1.c):
#include <stdio.h>
typedef union { int i; float f; } un;
int main() {
  un u;
  int j;
  u.i = j;
  // does u contain an unspecified union value, or an
  // i member that itself has an unspecified int value?
  int k;
  float g;
  k = *((int*)&u); //does this have defined behaviour?
  g = *((float*)&u);//does this have undefined behaviour?
}

49 95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a trap representation.

If those are both true, then u does not contain an unspecified union value, but rather it contains an i member which contains an unspecified int value. Because the two accesses to u are via int* and float* pointers, not via pointers to the union type, the type punning allowed by Footnote 95⁴⁹ does not apply. Then we were hoping that the effective type of the subobject addressed by (int*)&u would be int and hence that the 6.5p6 effective type rules would forbid the second access. But in fact 6.5p6 doesn’t treat subobjects properly

and the effective type is just the union type, and the second load is permitted.

3.2.8 Q56. Given multiple bitfields that may be in the same word, can one be a well-defined value while another is an unspecified value?

ISO: yes DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: ? TIS: yes KCC: yes

EXAMPLE (besson_blazy_wilke_bitfields_1u.c):
#include <stdio.h>
struct f {
  unsigned int a0 : 1; unsigned int a1 : 1;
} bf ;
int main() {
  unsigned int a;
  bf.a1 = 1;
  a = bf.a1;
  printf("a=%u\n",a);
}

GCC-4.8-O0:
a=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
a=1
[value] done for function main
KCC:
a=1
ISO: defined behaviour (a=1)

This example is from Besson et al. [10], discussed in §6.9. The obvious de facto standards semantics answer is “yes”, with a per-leaf-value unspecified value. However, Cerberus does not currently support bitfields, so our candidate formal model will likely not either.
The Besson et al. example suggests a per-bit property. The Clang undef documentation is a hybrid, with some per-bit reasoning but a per-leaf-value undef.

3.2.9 Q57. Are the representation bytes of an unspecified value themselves also unspecified values? (not an arbitrary choice of concrete byte values)

U: ISO U: DEFACTO
ISO: unclear DEFACTO-USAGE: unclear DEFACTO-IMPL: unclear CERBERUS-DEFACTO: yes? CHERI: unclear TIS: unclear – either reading or printing a representation byte of an uninitialised value makes it flag an error KCC: (flags indeterminate value used in an expression for this uninitialised unsigned char)

If so, then a bytewise hash or checksum computation involving them would produce an unspecified value (given the other answers above), or (in a more concrete semantics) would produce different results in different invocations, even if the value is not mutated in the meantime. It is not clear whether that is an issue in practice, and similarly for the padding bytes of structs.

EXAMPLE (unspecified_value_representation_bytes_1.c):
#include <stdio.h>
int main() {
  // assume here that the implementation-defined
  // representation of int has no trap representations
  int i;
  unsigned char c = * ((unsigned char*)(&i));
  // does c now hold an unspecified value?
  printf("i=0x%x c=0x%x\n",i,(int)c);
  printf("i=0x%x c=0x%x\n",i,(int)c);
}

GCC-4.8-O0:
i=0x0 c=0x0
i=0x0 c=0x0
GCC-4.9-O0: ... as above
GCC-4.8-O2:
unspecified_value_representation_bytes_1.c: In function ’main’:
unspecified_value_representation_bytes_1.c:8:9: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
printf("i=0x%x c=0x%x\n",i,(int)c);

^
unspecified_value_representation_bytes_1.c:6:17: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
unsigned char c = * ((unsigned char*)(&i));
^
i=0x8 c=0x8
i=0x8 c=0x8
GCC-4.9-O2:
unspecified_value_representation_bytes_1.c: In function ’main’:
unspecified_value_representation_bytes_1.c:8:3: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
printf("i=0x%x c=0x%x\n",i,(int)c);
^
unspecified_value_representation_bytes_1.c:6:17: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
unsigned char c = * ((unsigned char*)(&i));
^
i=0x8 c=0x8
i=0x8 c=0x8
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
unspecified_value_representation_bytes_1.c: In function ’main’:
unspecified_value_representation_bytes_1.c:8:9: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
printf("i=0x%x c=0x%x\n",i,(int)c);
^
unspecified_value_representation_bytes_1.c:6:17: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
unsigned char c = * ((unsigned char*)(&i));
^
i=0x8 c=0x8
i=0x8 c=0x8
GCC-4.9-O2-NO-STRICT-ALIASING:
unspecified_value_representation_bytes_1.c: In function ’main’:
unspecified_value_representation_bytes_1.c:8:3: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
printf("i=0x%x c=0x%x\n",i,(int)c);
^
unspecified_value_representation_bytes_1.c:6:17: warning: ’i’ is used uninitialized in this function [-Wuninitialized]
unsigned char c = * ((unsigned char*)(&i));
^
i=0x8 c=0x8
i=0x8 c=0x8
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
i=0x0 c=0x0
i=0x0 c=0x0
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_representation_bytes_1.c:8:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11

at [-Wuninitialized]
main(unspecified value representation bytes 1.c:8) printf("i=0x%x\n",i);
at ^
<file-scope>(<unknown>) i=0x0
Error: UB-STDIO1 i=0x0
Description: *cp=0x0
’printf’: Mismatch between the type expected by the *cp=0x0
conversion specifier %x and the type of the GCC -4.9-O2:
argument. unspecified value representation bytes 4.c: In function
Type: Undefined behavior. ’main’:
See also: C11 sec. unspecified value representation bytes 4.c:6:3:
7.21.6.1:9, J.2:1 item 153 warning: ’i’ is used uninitialized in this function
at [-Wuninitialized]
printf(unspecified value representation bytes 1.c:8) printf("i=0x%x\n",i);
^
at main(unspecified value representation bytes 1.c:8) i=0x0
i=0x0
at <file-scope>(<unknown>) *cp=0x0
DEFACTO : defined behaviour (printing nondeterministically *cp=0x0
true or false) GCC -5.3-O2: . . . as above
ISO : unclear GCC -4.8-O2- NO - STRICT- ALIASING :
unspecified value representation bytes 4.c: In function
’main’:
3.2.10 Q58. If one writes some but not all of the
unspecified value representation bytes 4.c:6:9:
representation bytes of an uninitialized value,
warning: ’i’ is used uninitialized in this function
do the other representation bytes still hold
[-Wuninitialized]
unspecified values?
printf("i=0x%x\n",i);
U: ISO U: DEFACTO ^
ISO : unclear DEFACTO - USAGE: unclear DEFACTO - i=0x0
IMPL: unclear CERBERUS - DEFACTO : yes CHERI : un- i=0x0
clear TIS: yes KCC: (flags indeterminate value used in *cp=0x0
an expression for this uninitialised unsigned char) *cp=0x0
GCC -4.9-O2- NO - STRICT- ALIASING :
E XAMPLE (unspecified_value_representation_bytes_4.c):
unspecified value representation bytes 4.c: In function
#include <stdio.h>
int main() { ’main’:
// assume here that the implementation-defined unspecified value representation bytes 4.c:6:3:
// representation of int has no trap representations warning: ’i’ is used uninitialized in this function
int i;
printf("i=0x%x\n",i); [-Wuninitialized]
printf("i=0x%x\n",i); printf("i=0x%x\n",i);
unsigned char *cp = (unsigned char*)(&i); ^
*(cp+1) = 0x22;
// does *cp now hold an unspecified value? i=0x0
printf("*cp=0x%x\n",*cp); i=0x0
printf("*cp=0x%x\n",*cp); *cp=0x0
}
*cp=0x0
GCC -4.8-O0: GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
i=0x0 CLANG 33-O0:
i=0x0 unspecified value representation bytes 4.c:6:21:
*cp=0x0 warning: variable ’i’ is uninitialized when used here
*cp=0x0 [-Wuninitialized]
GCC -4.9-O0: . . . as above printf("i=0x%x\n",i);
GCC -4.8-O2:
unspecified value representation bytes 4.c: In function ^
’main’: unspecified value representation bytes 4.c:5:8:
unspecified value representation bytes 4.c:6:9: note: initialize the variable ’i’ to silence this
warning: ’i’ is used uninitialized in this function

warning = 0
int i; 1 warning
^ generated.
= 0 i=0x0
1 warning i=0x0
generated. *cp=0x0
i=0x0 *cp=0x0
i=0x0 CLANG 37-ASAN: . . . as above
*cp=0x0 TIS - INTERPRETER :
*cp=0x0 [value] Analyzing a complete application starting at
CLANG 34-O0: . . . as above main
CLANG 35-O0: . . . as above [value] Computing initial state
CLANG 36-O0: . . . as above [value] Initial
CLANG 37-O0: . . . as above state computed
CLANG 33-O2: unspecified value representation bytes 4.
unspecified value representation bytes 4.c:6:21: c:6:[kernel] warning: accessing uninitialized
warning: variable ’i’ is uninitialized when used here left-value: assert \initialized(&i);
[-Wuninitialized]
printf("i=0x%x\n",i); stack: main
[value] Stopping at nth alarm
^ [value] user
unspecified value representation bytes 4.c:5:8: error: Degeneration occurred:
note: initialize the variable ’i’ to silence this
warning results are not correct for lines of code that can be
int i; reached from the degeneration point.
^ KCC :
= 0 i=0x0
1 warning i=0x0
generated. Execution failed (configuration dumped)
i=0x2200 Error: UB-CEE2
i=0x2200 Description: Indeterminate value used in an
*cp=0x0 expression.
*cp=0x0 Type: Undefined behavior.
CLANG 34-O2: . . . as above See also: C11 sec.
CLANG 35-O2: . . . as above 6.2.4, 6.7.9, 6.8, J.2:1 item 11
CLANG 36-O2: . . . as above at
CLANG 37-O2: . . . as above main(unspecified value representation bytes 4.c:6)
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above at
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above Error: UB-STDIO1
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above Description:
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above ’printf’: Mismatch between the type expected by the
CLANG 37-UBSAN: conversion specifier %x and the type of the
unspecified value representation bytes 4.c:6:21: argument.
warning: variable ’i’ is uninitialized when used here Type: Undefined behavior.
[-Wuninitialized] See also: C11 sec.
printf("i=0x%x\n",i); 7.21.6.1:9, J.2:1 item 153
at
^ printf(unspecified value representation bytes 4.c:6)
unspecified value representation bytes 4.c:5:8:
note: initialize the variable ’i’ to silence this at main(unspecified value representation bytes 4.c:6)
warning
int i; at <file-scope>(<unknown>)
^ Error: UB-CEE2

Description: printf("i=0x%x\n",i);
Indeterminate value used in an expression. printf("i=0x%x\n",i);
* (((unsigned char*)(&i))+1) = 0x22;
Type: // does i now hold an unspecified value?
Undefined behavior. printf("i=0x%x\n",i);
See also: C11 sec. 6.2.4, 6.7.9, printf("i=0x%x\n",i);
}
6.8, J.2:1 item 11
at main(unspecified value represent
ation bytes 4.c:7) GCC -4.8-O0:

at <file-scope>(<unknown>) i=0x0
Error: i=0x0
UB-STDIO1 i=0x2200
Description: ’printf’: Mismatch between the i=0x2200
type expected by the conversion specifier %x and the GCC -4.9-O0: . . . as above
type of the argument. GCC -4.8-O2:

Type: Undefined behavior. unspecified value representation bytes 2.c: In function


See ’main’:
also: C11 sec. 7.21.6.1:9, J.2:1 item 153 unspecified value representation bytes 2.c:6:9:
at warning: ’i’ is used uninitialized in this function
printf(unspecified value representation bytes 4.c:7) [-Wuninitialized]
printf("i=0x%x\n",i);
at main(unspecified value representation bytes 4.c:7) ^
i=0x0
at <file-scope>(<unknown>) i=0x0
Error: UB-STDIO1 i=0x2200
Description: i=0x2200
’printf’: Mismatch between the type expected by the GCC -4.9-O2:

conversion specifier %x and the type of the unspecified value representation bytes 2.c: In function
argument. ’main’:
Type: Undefined behavior. unspecified value representation bytes 2.c:6:3:
See also: C11 sec. warning: ’i’ is used uninitialized in this function
7.21.6.1:9, J.2:1 item 153 [-Wuninitialized]
at printf("i=0x%x\n",i);
printf(unspecified value representation bytes 4.c:11) ^
i=0x0
at main(unspecified value representation bytes 4.c:11) i=0x0
i=0x2200
at <file-scope>(<unknown>) i=0x2200
ISO : unclear GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING :
unspecified value representation bytes 2.c: In function
3.2.11 Q59. If one writes some but not all of the ’main’:
representation bytes of an uninitialized value, unspecified value representation bytes 2.c:6:9:
does a read of the whole value still give an warning: ’i’ is used uninitialized in this function
unspecified value? [-Wuninitialized]
U: ISO U: DEFACTO printf("i=0x%x\n",i);
ISO : unclear DEFACTO - USAGE: unclear DEFACTO - ^
IMPL: unclear CERBERUS - DEFACTO : yes CHERI : un- i=0x0
clear TIS: yes KCC: (flags indeterminate value used in i=0x0
an expression) i=0x2200
i=0x2200
E XAMPLE (unspecified_value_representation_bytes_2.c): GCC -4.9-O2- NO - STRICT- ALIASING :
#include <stdio.h> unspecified value representation bytes 2.c: In function
int main() { ’main’:
// assume here that the implementation-defined
// representation of int has no trap representations unspecified value representation bytes 2.c:6:3:
int i; warning: ’i’ is used uninitialized in this function

[-Wuninitialized] CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
printf("i=0x%x\n",i); CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
^ CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
i=0x0 CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
i=0x0 CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
i=0x2200 CLANG 37-UBSAN:
i=0x2200 unspecified value representation bytes 2.c:6:21:
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above warning: variable ’i’ is uninitialized when used here
CLANG 33-O0: [-Wuninitialized]
unspecified value representation bytes 2.c:6:21: printf("i=0x%x\n",i);
warning: variable ’i’ is uninitialized when used here
[-Wuninitialized] ^
printf("i=0x%x\n",i); unspecified value representation bytes 2.c:5:8:
note: initialize the variable ’i’ to silence this
^ warning
unspecified value representation bytes 2.c:5:8: int i;
note: initialize the variable ’i’ to silence this ^
warning = 0
int i; 1 warning
^ generated.
= 0 i=0x0
1 warning i=0x0
generated. i=0x2200
i=0x0 i=0x2200
i=0x0 CLANG 37-ASAN: . . . as above
i=0x2200 TIS - INTERPRETER :
i=0x2200 [value] Analyzing a complete application starting at
CLANG 34-O0: . . . as above main
CLANG 35-O0: . . . as above [value] Computing initial state
CLANG 36-O0: . . . as above [value] Initial
CLANG 37-O0: . . . as above state computed
CLANG 33-O2: unspecified value representation bytes 2.
unspecified value representation bytes 2.c:6:21: c:6:[kernel] warning: accessing uninitialized
warning: variable ’i’ is uninitialized when used here left-value: assert \initialized(&i);
[-Wuninitialized]
printf("i=0x%x\n",i); stack: main
[value] Stopping at nth alarm
^ [value] user
unspecified value representation bytes 2.c:5:8: error: Degeneration occurred:
note: initialize the variable ’i’ to silence this
warning results are not correct for lines of code that can be
int i; reached from the degeneration point.
^ KCC :
= 0 i=0x0
1 warning i=0x0
generated. i=0x0
i=0x2200 i=0x0
i=0x2200 Error: UB-CEE2
i=0x2200 Description: Indeterminate value used in an
i=0x2200 expression.
CLANG 34-O2: . . . as above Type: Undefined behavior.
CLANG 35-O2: . . . as above See also: C11 sec.
CLANG 36-O2: . . . as above 6.2.4, 6.7.9, 6.8, J.2:1 item 11
CLANG 37-O2: . . . as above at

main(unspecified value representation bytes 2.c:6) type expected by the conversion specifier %x and the
at type of the argument.
<file-scope>(<unknown>) Type: Undefined behavior.
Error: UB-STDIO1 See
Description: also: C11 sec. 7.21.6.1:9, J.2:1 item 153
’printf’: Mismatch between the type expected by the at
conversion specifier %x and the type of the printf(unspecified value representation bytes 2.c:10)
argument.
Type: Undefined behavior. at main(unspecified value representation bytes 2.c:10)
See also: C11 sec.
7.21.6.1:9, J.2:1 item 153 at <file-scope>(<unknown>)
at Error: UB-CEE2
printf(unspecified value representation bytes 2.c:6) Description:
Indeterminate value used in an expression.
at main(unspecified value representation bytes 2.c:6) Type:
Undefined behavior.
at <file-scope>(<unknown>) See also: C11 sec. 6.2.4, 6.7.9,
Error: UB-CEE2 6.8, J.2:1 item 11
Description: at main(unspecified value represent
Indeterminate value used in an expression. ation bytes 2.c:11)
Type: at <file-scope>(<unknown>)
Undefined behavior. Error:
See also: C11 sec. 6.2.4, 6.7.9, UB-STDIO1
6.8, J.2:1 item 11 Description: ’printf’: Mismatch between the
at main(unspecified value represent type expected by the conversion specifier %x and the
ation bytes 2.c:7) type of the argument.
at <file-scope>(<unknown>) Type: Undefined behavior.
Error: See
UB-STDIO1 also: C11 sec. 7.21.6.1:9, J.2:1 item 153
Description: ’printf’: Mismatch between the at
type expected by the conversion specifier %x and the printf(unspecified value representation bytes 2.c:11)
type of the argument.
Type: Undefined behavior. at main(unspecified value representation bytes 2.c:11)
See
also: C11 sec. 7.21.6.1:9, J.2:1 item 153 at <file-scope>(<unknown>)
at DEFACTO : defined behaviour (printing nondeterministic
printf(unspecified value representation bytes 2.c:7) values)
ISO : unclear
at main(unspecified value representation bytes 2.c:7)
If one comments out the first two printfs, neither give a
at <file-scope>(<unknown>) warning:
Error: UB-CEE2 E XAMPLE (unspecified_value_representation_bytes_3.c):
Description:
#include <stdio.h>
Indeterminate value used in an expression. int main() {
Type: // assume here that the implementation-defined
Undefined behavior. // representation of int has no trap representations
int i;
See also: C11 sec. 6.2.4, 6.7.9, // printf("i=0x%x\n",i);
6.8, J.2:1 item 11 // printf("i=0x%x\n",i);
at main(unspecified value represent * (((unsigned char*)(&i))+1) = 0x22;
// does i now hold an unspecified value?
ation bytes 2.c:10) printf("i=0x%x\n",i);
at <file-scope>(<unknown>) printf("i=0x%x\n",i);
Error: }
UB-STDIO1
Description: ’printf’: Mismatch between the GCC -4.8-O0:
i=0x2200

i=0x2200
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
unspecified_value_representation_bytes_3.c:10:[kernel] warning: accessing uninitialized left-value: assert \initialized(&i);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
i=0x0
i=0x0
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_representation_bytes_3.c:10)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: 'printf': Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_representation_bytes_3.c:10)
at main(unspecified_value_representation_bytes_3.c:10)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(unspecified_value_representation_bytes_3.c:11)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: 'printf': Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(unspecified_value_representation_bytes_3.c:11)
at main(unspecified_value_representation_bytes_3.c:11)
at <file-scope>(<unknown>)
ISO: unclear

These two observations weakly suggest that Clang forgets that any part of the int is an unspecified value after a write of one of the representation bytes.

3.3 Structure and Union Padding

[Question 1/15 of our What is C in practice? (Cerberus survey v2)⁵⁰ relates to structure padding]

⁵⁰ www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html

Standard The standard discusses two quite different kinds of padding: padding bits within the representation of integer types (6.2.6.2), and padding bytes in structures and unions. We focus here on the latter⁵¹.

⁵¹ In fact, in the implementations we are most familiar with, there seem to be no integer-type padding bits, and we neglect them in our semantics. The C99 Rationale [2, p.43] refers to a machine that implements a 32-bit signed integer type with two 16-bit signed integers, with one of those two sign bits being deemed a padding bit. That machine is not named, so it is hard to tell whether it still exists.

Padding can be added by an implementation between the members of a structure, or at the end of a structure or union, but not before the first member:

• 6.7.2.1p15 “[...] There may be unnamed padding within a structure object, but not at its beginning.”
• 6.7.2.1p17 “There may be unnamed padding at the end of a structure or union.”

Padding might be needed simply to ensure alignment:

(1) for performance, where some machine instructions are significantly faster when used on suitably aligned data than on misaligned data; or
(2) for correctness, where the machine instruction has the right width but must be suitably aligned to operate correctly (e.g. for some synchronisation instructions).

or to ensure that there is some spare space that the implementation is free to overwrite:

(a) for performance, where it is faster to use a wider machine memory access than the actual size of the data, and hence for the wider stores one has to allow spare space (otherwise the implementation would be wrong for concurrent accesses: just reading and writing back adjacent data would be incorrect); or
(b) for correctness, where the machine does not have an instruction that touches just the right width of footprint, and so again one needs spare space (e.g. again for some synchronisation instructions, though some cases of those are dealt with not by padding but by making the size of the relevant atomic type larger than one would expect from its precision).

We call these alignment padding and space padding respectively. There is also the space between the end of a union’s current member and the size of the maximally sized member of its union type. The standard does not refer to this as padding, writing instead (6.2.6.1p7) “...the bytes of the object representation that do not correspond to that member but do correspond to other members...”, but it behaves in a similar way; we call it union member padding.

It is also conceivable that the compiler would reserve space in a structure or union type for its own purposes, e.g. to store a runtime representation of the name of the most recently written union member, or other bounds-checking or debug information, which would appear to the programmer as padding but which they would have to take care never to overwrite; we call this metadata padding.

Usage For the current processors that we are familiar with, we are not aware of any cases of (b) that are not handled by fixing the type size. Simple code with GCC does not seem to exhibit (a) except for struct copying, but we expect that compilers using vector instructions for optimisation might well do so. It’s possible that implementations overwrite union member padding in a similar way. We would like more ground-truth data on all this.

Semantics Space padding is semantically more interesting than alignment padding, as the semantics has to permit the implementation to overwrite those padding bytes. There are two main options:

(i) regard the padding bytes as holding unspecified values throughout the lifetime of the object, or
(ii) write unspecified values to the padding bytes when any member of the object is written (or perhaps (ii′): when an adjacent member is written)

Standard The standard is unclear about which of these it chooses. On the one hand, we have:

• 6.2.6.1p6 “When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) [...]” Footnote 51: “Thus, for example, structure assignment need not copy any padding bits.”

which suggests (ii), with similar text for union member padding:

• 6.2.6.1p7 “When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.”

This is reiterated in J.1 Unspecified behavior p1: “The following are unspecified:”

...
• “The value of padding bytes when storing values in structures or unions (6.2.6.1).”
• “The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).”
...

Then the 6.7.9p10 text on initialization says that in some circumstances padding is initialized “to zero bits”: 6.7.9p10 “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

• if it has pointer type, it is initialized to a null pointer;

• if it has arithmetic type, it is initialized to (positive or unsigned) zero;
• if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
• if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;”

This suggests that one can sometimes depend on the values of padding bytes, and hence that in the absence of writes to the structure, they are stable.

Note that this text does not say anything about the value of padding for an object (of automatic, static, or thread storage duration) that is initialized explicitly. An oversight?

On the other hand, 7.24.4.1 The memcmp function implies that padding bytes within structures always hold unspecified values: Footnote 310 “The contents of “holes” used as padding for purposes of alignment within structure objects are indeterminate. Strings shorter than their allocated space and unions may also cause problems in comparison.” (even in the standard there are no trap representations here, so indeterminate values are unspecified values).

Reading uninitialised local variables one might perhaps take to be undefined behaviour, but reading padding bytes (at least bytewise) surely has to be allowed, even if completely nondeterministic or symbolic-undefined with strict computation. And should that strictness extend to making a structure value an undefined value if one of its members is? Surely not.

3.3.1 Q60. Can structure-copy copy padding?
U: ISO
ISO: unclear DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes? TIS: unclear (the test seems to fail on the first print) KCC: yes (though also reports %x error)

EXAMPLE (padding_struct_copy_1.c):
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
#include <inttypes.h>
typedef struct { char c; uint16_t u; } st;
int x;
void f(st* s2p, st* s1p) {
  *s2p=*s1p;
}
int main() {
  // check there is a padding byte between c and u
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  assert(offsetof(st,u)>offset_padding);
  st s1 = { .c = 'A', .u = 0x1234 };
  unsigned char *padding1 =
    (unsigned char*)(&s1) + offset_padding;
  // printf("*padding1=0x%x\n",(int)*padding1);
  *padding1 = 0xBA;
  printf("*padding1=0x%x\n",(int)*padding1);
  st s2;
  unsigned char *padding2 =
    (unsigned char*)(&s2) + offset_padding;
  // can this print something other than 0xBA then the
  // last line print 0xBA ?
  printf("*padding2=0x%x\n",(int)*padding2);//warn
  f(&s2,&s1); //s2 = s1;
  printf("*padding2=0x%x\n",(int)*padding2);
}

GCC-4.8-O0:
*padding1=0xba
*padding2=0xea
*padding2=0xba
GCC-4.9-O0: ... as above
GCC-4.8-O2:
padding_struct_copy_1.c: In function 'main':
padding_struct_copy_1.c:25:9: warning: '*((void *)&s2+1)' is used uninitialized in this function [-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0xba
GCC-4.9-O2:
padding_struct_copy_1.c: In function 'main':
padding_struct_copy_1.c:25:3: warning: '*((void *)&s2+1)' is used uninitialized in this function [-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0xba
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
padding_struct_copy_1.c: In function 'main':
padding_struct_copy_1.c:25:9: warning: '*((void *)&s2+1)' is used uninitialized in this function [-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0xba
GCC-4.9-O2-NO-STRICT-ALIASING:
padding_struct_copy_1.c: In function 'main':
padding_struct_copy_1.c:25:3: warning: '*((void *)&s2+1)' is used uninitialized in this function [-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba

*padding2=0x0 Degeneration occurred:
*padding2=0xba results are
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above not correct for lines of code that can be reached from
CLANG 33-O0: the degeneration point.
*padding1=0xba KCC :
*padding2=0xff *padding1=0xba
*padding2=0xba Execution failed (configuration dumped)
CLANG 34-O0: . . . as above Error: UB-STDIO1
CLANG 35-O0: . . . as above Description: ’printf’: Mismatch between the type
CLANG 36-O0: . . . as above expected by the conversion specifier %x and the type of
CLANG 37-O0: . . . as above the argument.
CLANG 33-O2: Type: Undefined behavior.
*padding1=0xba See also: C11
*padding2=0x0 sec. 7.21.6.1:9, J.2:1 item 153
*padding2=0xba at
CLANG 34-O2: . . . as above printf(padding struct copy 1.c:19)
CLANG 35-O2: . . . as above at
CLANG 36-O2: . . . as above main(padding struct copy 1.c:19)
CLANG 37-O2: . . . as above at
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above Error: UB-STDIO1
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above Description:
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above ’printf’: Mismatch between the type expected by the
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above conversion specifier %x and the type of the
CLANG 37-UBSAN: argument.
*padding1=0xba Type: Undefined behavior.
*padding2=0x1f See also: C11 sec.
*padding2=0xba 7.21.6.1:9, J.2:1 item 153
CLANG 37-ASAN: at
*padding1=0xba printf(padding struct copy 1.c:25)
*padding2=0x0 at
*padding2=0xba main(padding struct copy 1.c:25)
TIS - INTERPRETER : at
[value] Analyzing a complete application starting at <file-scope>(<unknown>)
main DEFACTO : defined behaviour (printing 0xBA then two
[value] Computing initial state nondeterministic values)
[value] Initial ISO : unclear
state computed
padding_struct_copy_1.c:19:[value] warning: argument (int)*padding1 has type int but format indicates unsigned int
[value] warning: Continuing analysis because this seems innocuous
padding_struct_copy_2.c is the same, but with the padding at the end of the struct:
EXAMPLE (padding_struct_copy_2.c):
GCC-4.8-O0:
*padding1=0xba
*padding1=0xba *padding2=0xff
*padding2=0xba
p GCC -4.9-O0: . . . as above
adding struct copy 1.c:25:[kernel] warning: accessing GCC -4.8-O2:

uninitialized left-value: assert padding struct copy 2.c: In function ’main’:


\initialized(padding2); padding struct copy 2.c:25:9: warning: ’*((void
stack: *)&s2+3)’ is used uninitialized in this function
main [-Wuninitialized]
[value] Stopping at nth alarm printf("*padding2=0x%x\n",(int)*pad
[value] user error: ding2);//warn
^

*padding1=0xba CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
*padding2=0x0 CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
*padding2=0xba CLANG 37-UBSAN:
GCC -4.9-O2: *padding1=0xba
padding struct copy 2.c: In function ’main’: *padding2=0x1
padding struct copy 2.c:25:3: warning: ’*((void *padding2=0xba
*)&s2+3)’ is used uninitialized in this function CLANG 37-ASAN:
[-Wuninitialized] *padding1=0xba
printf("*padding2=0x%x\n",(int)*pad *padding2=0x0
ding2);//warn *padding2=0xba
^ TIS - INTERPRETER :
*padding1=0xba [value] Analyzing a complete application starting at
*padding2=0x0 main
*padding2=0xba [value] Computing initial state
GCC -5.3-O2: . . . as above [value] Initial
GCC -4.8-O2- NO - STRICT- ALIASING : state computed
padding struct copy 2.c: In function ’main’: padding struct copy 2.c:19:[value]
padding struct copy 2.c:25:9: warning: ’*((void warning: argument (int)*padding1 has type int but format
*)&s2+3)’ is used uninitialized in this function indicates unsigned int
[-Wuninitialized] [value] warning: Continuing
printf("*padding2=0x%x\n",(int)*pad analysis because this seems innocuous
ding2);//warn
^ *padding1=0xba
*padding1=0xba
*padding2=0x0 p
*padding2=0xba adding struct copy 2.c:25:[kernel] warning: accessing
GCC -4.9-O2- NO - STRICT- ALIASING : uninitialized left-value: assert
padding struct copy 2.c: In function ’main’: \initialized(padding2);
padding struct copy 2.c:25:3: warning: ’*((void stack:
*)&s2+3)’ is used uninitialized in this function main
[-Wuninitialized] [value] Stopping at nth alarm
printf("*padding2=0x%x\n",(int)*pad [value] user error:
ding2);//warn Degeneration occurred:
^ results are
*padding1=0xba not correct for lines of code that can be reached from
*padding2=0x0 the degeneration point.
*padding2=0xba KCC :
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above *padding1=0xba
CLANG 33-O0: Execution failed (configuration dumped)
*padding1=0xba Error: UB-STDIO1
*padding2=0x0 Description: ’printf’: Mismatch between the type
*padding2=0xba expected by the conversion specifier %x and the type of
CLANG 34-O0: . . . as above the argument.
CLANG 35-O0: . . . as above Type: Undefined behavior.
CLANG 36-O0: . . . as above See also: C11
CLANG 37-O0: . . . as above sec. 7.21.6.1:9, J.2:1 item 153
CLANG 33-O2: . . . as above at
CLANG 34-O2: . . . as above printf(padding struct copy 2.c:19)
CLANG 35-O2: . . . as above at
CLANG 36-O2: . . . as above main(padding struct copy 2.c:19)
CLANG 37-O2: . . . as above at
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above Error: UB-STDIO1
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above Description:

’printf’: Mismatch between the type expected by the GCC -4.9-O2:
conversion specifier %x and the type of the padding struct members copy.c: In function ’main’:
argument. padding struct members copy.c:20:3: warning: ’*((void
Type: Undefined behavior. *)&s2+1)’ is used uninitialized in this function
See also: C11 sec. [-Wuninitialized]
7.21.6.1:9, J.2:1 item 153 printf("*padding2=0x%x\n",(int)*pad
at ding2);//warn
printf(padding struct copy 2.c:25) ^
at *padding1=0xba
main(padding struct copy 2.c:25) *padding2=0x0
at *padding2=0x0
<file-scope>(<unknown>) GCC -5.3-O2: . . . as above
However, slightly surprisingly, in the following example
neither GCC nor Clang appear to recognise that copying the
two members of the structure (with one-byte and two-byte
instructions) could be optimised to a single four-byte copy:

EXAMPLE (padding_struct_members_copy.c):
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
#include <inttypes.h>
typedef struct { char c; uint16_t u; } st;
int x;
int main() {
  // check there is a padding byte between c and u
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  assert(offsetof(st,u)>offset_padding);
  st s1 = { .c = 'A', .u = 0x1234 };
  unsigned char *padding1 =
    (unsigned char*)(&s1) + offset_padding;
  // printf("*padding1=0x%x\n",(int)*padding1);
  *padding1 = 0xBA;
  printf("*padding1=0x%x\n",(int)*padding1);
  st s2;
  unsigned char *padding2 =
    (unsigned char*)(&s2) + offset_padding;
  printf("*padding2=0x%x\n",(int)*padding2);//warn
  s2.c = s1.c;
  s2.u = s1.u;
  printf("*padding2=0x%x\n",(int)*padding2);
}

GCC -4.8-O0:
*padding1=0xba
*padding2=0xea
*padding2=0xea
GCC -4.9-O0: . . . as above
GCC -4.8-O2:
padding struct members copy.c: In function ’main’:
padding struct members copy.c:20:9: warning: ’*((void
*)&s2+1)’ is used uninitialized in this function
[-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0x0
GCC -4.8-O2- NO - STRICT- ALIASING :
padding struct members copy.c: In function ’main’:
padding struct members copy.c:20:9: warning: ’*((void
*)&s2+1)’ is used uninitialized in this function
[-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0x0
GCC -4.9-O2- NO - STRICT- ALIASING :
padding struct members copy.c: In function ’main’:
padding struct members copy.c:20:3: warning: ’*((void
*)&s2+1)’ is used uninitialized in this function
[-Wuninitialized]
printf("*padding2=0x%x\n",(int)*padding2);//warn
^
*padding1=0xba
*padding2=0x0
*padding2=0x0
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0:
*padding1=0xba
*padding2=0xff
*padding2=0xff
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2:
*padding1=0xba
*padding2=0x0
*padding2=0x0
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above

CLANG 36-O2- NO - STRICT- ALIASING : . . . as above ’printf’: Mismatch between the type expected by the
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above conversion specifier %x and the type of the
CLANG 37-UBSAN: argument.
*padding1=0xba Type: Undefined behavior.
*padding2=0x2f See also: C11 sec.
*padding2=0x2f 7.21.6.1:9, J.2:1 item 153
CLANG 37-ASAN: at
*padding1=0xba printf(padding struct members copy.c:20)
*padding2=0x0 at
*padding2=0x0 main(padding struct members copy.c:20)
TIS - INTERPRETER : at
[value] Analyzing a complete application starting at <file-scope>(<unknown>)
main DEFACTO : defined behaviour (printing 0xBA then two
[value] Computing initial state nondeterministic values)
[value] Initial ISO : unclear
state computed
padding struct members copy.c:16:[value] padding_struct_copy_3.c is similar except with the
warning: argument (int)*padding1 has type int but format copy in a separate function:
indicates unsigned int
[value] warning: Continuing E XAMPLE (padding_struct_copy_3.c):
analysis because this seems innocuous GCC -4.8-O0:
*padding1=0xba
*padding1=0xba *padding2=0xea
*padding2=0xea
p GCC -4.9-O0: . . . as above

adding struct members copy.c:20:[kernel] warning: GCC -4.8-O2:

accessing uninitialized left-value: assert padding struct copy 3.c: In function ’main’:
\initialized(padding2); padding struct copy 3.c:24:9: warning: ’*((void
stack: *)&s2+1)’ is used uninitialized in this function
main [-Wuninitialized]
[value] Stopping at nth alarm printf("*padding2=0x%x\n",(int)*pad
[value] user error: ding2);//warn
Degeneration occurred: ^
results are *padding1=0xba
not correct for lines of code that can be reached from *padding2=0x0
the degeneration point. *padding2=0x0
KCC : GCC -4.9-O2:

*padding1=0xba padding struct copy 3.c: In function ’main’:


Execution failed (configuration dumped) padding struct copy 3.c:24:3: warning: ’*((void
Error: UB-STDIO1 *)&s2+1)’ is used uninitialized in this function
Description: ’printf’: Mismatch between the type [-Wuninitialized]
expected by the conversion specifier %x and the type of printf("*padding2=0x%x\n",(int)*pad
the argument. ding2);//warn
Type: Undefined behavior. ^
See also: C11 *padding1=0xba
sec. 7.21.6.1:9, J.2:1 item 153 *padding2=0x0
at *padding2=0x0
printf(padding struct members copy.c:16) GCC -5.3-O2: . . . as above
at GCC -4.8-O2- NO - STRICT- ALIASING :

main(padding struct members copy.c:16) padding struct copy 3.c: In function ’main’:
at padding struct copy 3.c:24:9: warning: ’*((void
<file-scope>(<unknown>) *)&s2+1)’ is used uninitialized in this function
Error: UB-STDIO1 [-Wuninitialized]
Description: printf("*padding2=0x%x\n",(int)*pad
ding2);//warn

^ [value] warning: Continuing
*padding1=0xba analysis because this seems innocuous
*padding2=0x0
*padding2=0x0 *padding1=0xba
GCC -4.9-O2- NO - STRICT- ALIASING :
padding struct copy 3.c: In function ’main’: p
padding struct copy 3.c:24:3: warning: ’*((void adding struct copy 3.c:24:[kernel] warning: accessing
*)&s2+1)’ is used uninitialized in this function uninitialized left-value: assert
[-Wuninitialized] \initialized(padding2);
printf("*padding2=0x%x\n",(int)*pad stack:
ding2);//warn main
^ [value] Stopping at nth alarm
*padding1=0xba [value] user error:
*padding2=0x0 Degeneration occurred:
*padding2=0x0 results are
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above not correct for lines of code that can be reached from
CLANG 33-O0: the degeneration point.
*padding1=0xba KCC :
*padding2=0xff *padding1=0xba
*padding2=0xff Execution failed (configuration dumped)
CLANG 34-O0: . . . as above Error: UB-STDIO1
CLANG 35-O0: . . . as above Description: ’printf’: Mismatch between the type
CLANG 36-O0: . . . as above expected by the conversion specifier %x and the type of
CLANG 37-O0: . . . as above the argument.
CLANG 33-O2: Type: Undefined behavior.
*padding1=0xba See also: C11
*padding2=0x0 sec. 7.21.6.1:9, J.2:1 item 153
*padding2=0x0 at
CLANG 34-O2: . . . as above printf(padding struct copy 3.c:20)
CLANG 35-O2: . . . as above at
CLANG 36-O2: . . . as above main(padding struct copy 3.c:20)
CLANG 37-O2: . . . as above at
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above <file-scope>(<unknown>)
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above Error: UB-STDIO1
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above Description:
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above ’printf’: Mismatch between the type expected by the
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above conversion specifier %x and the type of the
CLANG 37-UBSAN: argument.
*padding1=0xba Type: Undefined behavior.
*padding2=0x2f See also: C11 sec.
*padding2=0x2f 7.21.6.1:9, J.2:1 item 153
CLANG 37-ASAN: at
*padding1=0xba printf(padding struct copy 3.c:24)
*padding2=0x0 at
*padding2=0x0 main(padding struct copy 3.c:24)
TIS - INTERPRETER : at
[value] Analyzing a complete application starting at <file-scope>(<unknown>)
main DEFACTO : defined behaviour (printing 0xBA then two
[value] Computing initial state nondeterministic values)
[value] Initial ISO : unclear
state computed
padding struct copy 3.c:20:[value]
warning: argument (int)*padding1 has type int but format
indicates unsigned int

Nonetheless, we presume that a reasonable compiler might
combine member writes, and that whether it does so might
depend on inlining and code motion, so that one cannot tell
locally and syntactically whether a write is “really” to a
single struct member or whether the padding might be affected
by combining it with writes of adjacent members.
Similarly, when we think about writing a struct member
to a malloc’d region, differentiating between a write of the
value qua the struct member and a write of the value simply
of its underlying type is problematic, as optimisations such
as inlining might convert the latter to the former. Unclear.

3.3.2 Q61. After an explicit write of a padding byte,
does that byte hold a well-defined value? (not an
unspecified value)
U: ISO U: DEFACTO
ISO : unclear DEFACTO - USAGE: unclear – well-defined
assumed for security leak prevention and CAS? DEFACTO -
IMPL: unclear – well-defined? CERBERUS - DEFACTO :
well-defined CHERI : well-defined? TIS : well-defined
(surprisingly so, given the previous test result) KCC: well-defined

EXAMPLE (padding_unspecified_value_1.c):
#include <stdio.h>
#include <stddef.h>
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p = ((unsigned char*)(&s))
      + offset_padding;
    *p = 'A';
    unsigned char c1 = *p;
    // does c1 hold 'A', not an unspecified value?
    printf("c1=%c\n",c1);
  }
  return 0;
}

GCC -4.8-O0:
c1=A
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed

c1=A

[value] done for function main
KCC :
c1=A
DEFACTO : defined behaviour (printing A)
ISO : unclear

The observations (of A) don’t constrain the answer to this
question.
In the ISO standard, for objects with static, thread, or
automatic storage durations, and leaving aside unions, for
each byte it’s fixed whether it’s a padding byte or not for
the lifetime of the object, and one could conceivably regard
the padding bytes as being unspecified values irrespective of
any explicit writes to them (for a union, the padding status
of a byte depends on which member the union “currently
contains”). But for objects with allocated storage duration,
that is at odds with the idea that a malloc’d region can be
reused.
In practice we imagine (though without data) that “wide
writes” for a single struct member only ever extend over the
preceding and following padding (or perhaps only the
following padding). Then the fact that concurrent access to
distinct members is allowed (§3.3.12, p.132) constrains wide
writes to not touch other members, at least in the absence of
sophisticated analysis. There is again an issue here if memcmp
or uniform hashing of structure representations is desired;
it is debatable under what circumstances one might reasonably
expect those to work.
There is also a security-relevant issue here: one might
want an assurance that potentially secret data does not leak
into reads from padding bytes, and hence might (a) explicitly
clear those bytes and (b) rely on the compiler not inferring
that those bytes contain unspecified values and hence
substituting, in place of reads, values that happen to be
found in registers.

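As a hedged illustration of the memcmp/hashing and leak concerns above, the following sketch compares and serialises a struct member by member, so that neither result depends on whatever the padding bytes happen to hold. The names st_equal, st_serialise and ST_WIRE_SIZE are hypothetical, and the struct simply mirrors the st type used in these tests; this is one conservative coding pattern under those assumptions, not a statement of what any compiler or the standard guarantees about padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the st type used in the tests above. */
typedef struct { char c; float f; int i; } st;

/* Hypothetical helper: member-wise equality that never reads padding.
   The float comparison follows the usual rules, so NaN != NaN. */
int st_equal(const st *a, const st *b) {
  assert(a != NULL && b != NULL);
  return a->c == b->c && a->f == b->f && a->i == b->i;
}

/* Size of the packed, padding-free encoding produced below. */
#define ST_WIRE_SIZE (sizeof(char) + sizeof(float) + sizeof(int))

/* Hypothetical helper: copy only the members into a caller-supplied
   buffer of at least ST_WIRE_SIZE bytes, so that no padding byte of
   *s is read or emitted. Returns the number of bytes written. */
size_t st_serialise(const st *s, unsigned char *out) {
  size_t n = 0;
  assert(s != NULL && out != NULL);
  memcpy(out + n, &s->c, sizeof s->c); n += sizeof s->c;
  memcpy(out + n, &s->f, sizeof s->f); n += sizeof s->f;
  memcpy(out + n, &s->i, sizeof s->i); n += sizeof s->i;
  return n;
}
```

Hashing or comparing the buffer produced by st_serialise, rather than the raw struct representation, sidesteps the question of what the padding contains, at the cost of an extra copy.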
3.3.3 Q62. After an explicit write of a padding byte
followed by a write to the whole structure, does
the padding byte hold a well-defined value? (not
an unspecified value)
U: ISO
ISO : unclear DEFACTO - USAGE: unspecified value
DEFACTO - IMPL: unspecified value CERBERUS -
DEFACTO : unspecified value CHERI : unspecified value
TIS : test not supported (tis bug, reported and fixed) KCC :
(reports error for printing signed int with %x)

EXAMPLE (padding_unspecified_value_2.c):
#include <stdio.h>
#include <stddef.h>
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p =
      ((unsigned char*)(&s)) + offset_padding;
    *p = 'B';
    s = (st){ .c='E', .f=1.0, .i=1};
    unsigned char c2 = *p;
    // does c2 hold 'B', not an unspecified value?
    printf("c2=0x%x\n",(int)c2);
  }
  return 0;
}

GCC -4.8-O0:
c2=0x42
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0:
c2=0x0
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding unspecified value 2.c:12:[kernel]
warning: undefined multiple accesses in expression.
assert \separated(& constr expr 0.c,
& constr expr 0);
stack: main
[value]
done for function main
KCC :
Execution failed (configuration dumped)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type
expected by the conversion specifier %x and the type of
the argument.
Type: Undefined behavior.
See also: C11
sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding unspecified value 2.c:15)
at
main(padding unspecified value 2.c:15)
at
<file-scope>(<unknown>)
DEFACTO : defined behaviour (printing a nondeterministic
value)
ISO : unclear (printing an unspecified value?)

Here we see reads both of B and of 0x0.
Changing the example to one in which the compiler might
naturally use a 4-byte copy, we sometimes see an overwrite
of the padding byte on the write of the struct value:

EXAMPLE (padding_unspecified_value_3.c):
#include <stdio.h>
#include <stddef.h>
#include <inttypes.h>
#include <assert.h>
typedef struct { char c; uint16_t u; } st;
int main() {
  // check there is a padding byte between c and u
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  assert(offsetof(st,u)>offset_padding);
  st s;
  unsigned char *p =
    ((unsigned char*)(&s)) + offset_padding;
  *p = 'B';
  s = (st){ .c='E', .u=1};
  unsigned char c = *p;
  // does c hold 'B', not an unspecified value?
  printf("c=0x%x\n",(int)c);
  return 0;
}

GCC -4.8-O0:
c=0x42
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0:
c=0xff
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2:
c=0x0
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN:
c=0x1f
CLANG 37-ASAN:
c=0x0
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding unspecified value 3.c:14:[kernel]
warning: undefined multiple accesses in expression.
assert \separated(& constr expr 0.c,
& constr expr 0);
stack: main
[value]
done for function main
KCC :
Execution failed (configuration dumped)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type
expected by the conversion specifier %x and the type of
the argument.
Type: Undefined behavior.
See also: C11
sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding unspecified value 3.c:17)
at
main(padding unspecified value 3.c:17)
at
<file-scope>(<unknown>)
DEFACTO : defined behaviour (printing a nondeterministic
value)
ISO : unclear (printing an unspecified value?)

and again here, copying another struct value on top as a
whole:

EXAMPLE (padding_unspecified_value_4.c):
#include <stdio.h>
#include <stddef.h>
#include <inttypes.h>
#include <assert.h>
typedef struct { char c; uint16_t u; } st;
int main() {
  // check there is a padding byte between c and u
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  assert(offsetof(st,u)>offset_padding);
  st s;
  unsigned char *p =
    ((unsigned char*)(&s)) + offset_padding;
  *p = 'B';
  st s2 = { .c='E', .u=1};
  s = s2;
  unsigned char c = *p;
  // does c hold 'B', not an unspecified value?
  printf("c=0x%x\n",(int)c);
  return 0;
}

GCC -4.8-O0:
c=0xea
GCC -4.9-O0: . . . as above
GCC -4.8-O2:
padding unspecified value 4.c: In function ’main’:
padding unspecified value 4.c:15:5: warning: ’*((void
*)&s+1)’ is used uninitialized in this function
[-Wuninitialized]
s = s2;
^
c=0x0
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0:
c=0x0
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above

CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding unspecified value 4.c:18:[kernel]
warning: accessing uninitialized left-value: assert
\initialized(&c);
stack: main
[value]
Stopping at nth alarm
[value] user error: Degeneration
occurred:
results are not correct
for lines of code that can be reached from the
degeneration point.
KCC :
Execution failed (configuration dumped)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type
expected by the conversion specifier %x and the type of
the argument.
Type: Undefined behavior.
See also: C11
sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding unspecified value 4.c:18)
at
main(padding unspecified value 4.c:18)
at
<file-scope>(<unknown>)
DEFACTO : defined behaviour (printing a nondeterministic
value)
ISO : unclear (printing an unspecified value?)

3.3.4 Q63. After an explicit write of a padding byte
followed by a write to adjacent members of the
structure, does the padding byte hold a
well-defined value? (not an unspecified value)
U: ISO U: DEFACTO
ISO : unclear DEFACTO - USAGE: unclear – unspecified
value? DEFACTO - IMPL: unclear – unspecified value?
CERBERUS - DEFACTO : unspecified value CHERI : unclear
– unspecified value? TIS: well-defined value KCC : well-defined value

EXAMPLE (padding_unspecified_value_7.c):
#include <stdio.h>
#include <stddef.h>
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p =
      ((unsigned char*)(&s)) + offset_padding;
    *p = 'C';
    s.c = 'A';
    s.f = 1.0;
    s.i = 42;
    unsigned char c3 = *p;
    // does c3 hold 'C', not an unspecified value?
    printf("c3=%c\n",c3);
  }
  return 0;
}

GCC -4.8-O0:
c3=C
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed

c3=C

[value] done for function main
KCC :

c3=C
DEFACTO : unspecified value
ISO : unclear (printing an unspecified value?)

3.3.5 Q64. After an explicit write of zero to a padding
byte followed by a write to adjacent members of
the structure, does the padding byte hold a
well-defined zero value? (not an unspecified
value)
U: ISO U: DEFACTO
ISO : unclear DEFACTO - USAGE: unclear DEFACTO -
IMPL: unclear CERBERUS - DEFACTO : unspecified value
CHERI : unspecified value TIS : well-defined zero KCC :
well-defined zero (though also reports %x error)

EXAMPLE (padding_unspecified_value_8.c):
#include <stdio.h>
#include <stddef.h>
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p =
      ((unsigned char*)(&s)) + offset_padding;
    *p = 0;
    s.c = 'A';
    s.f = 1.0;
    s.i = 42;
    unsigned char c3 = *p;
    // does c3 hold 0, not an unspecified value?
    printf("c3=0x%x\n",c3);
  }
  return 0;
}

GCC -4.8-O0:
c3=0x0
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding unspecified value 8.c:17:[value]
warning: argument (int)c3 has type int but format
indicates unsigned int
[value] warning: Continuing
analysis because this seems innocuous

c3=0x0

[value]
done for function main
KCC :
c3=0x0
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type
expected by the conversion specifier %x and the type of
the argument.
Type: Undefined behavior.
See also: C11
sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding unspecified value 8.c:17)
at
main(padding unspecified value 8.c:17)
at
<file-scope>(<unknown>)
DEFACTO : unspecified value
ISO : unclear (printing an unspecified value?)

(There was a typo c in an earlier version of this test.)
This is perhaps the most relevant of these cases in practice,
covering the case where the whole footprint of the struct
has been filled with zero before use, and also covering the
case where all members of the struct have been written (and
hence where compilers might coalesce the writes). By
requiring the explicit write to be of zero, compilers could
implement this either by preserving the in-memory padding
byte value or by writing a constant zero to it. Whether that
would be sound w.r.t. actual practice is unclear.
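The zero-fill use case for Q64 can be sketched as follows. This is only an illustration: st_first_padding_offset and st_init_zeroed are hypothetical helper names, and the struct mirrors the st type of padding_unspecified_value_8.c. Whether the padding byte is still zero after the member writes is exactly the question left open here, so the sketch makes no claim about it and only leaves the members in a defined state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { char c; float f; int i; } st;

/* Hypothetical helper: offset of the first byte after member c.
   That byte is padding only if offsetof(st, f) is larger than this. */
size_t st_first_padding_offset(void) {
  return offsetof(st, c) + sizeof(char);
}

/* Hypothetical helper: zero the whole footprint (members and padding),
   then write every member. A compiler might coalesce these member
   writes, which is where the question about the padding arises. */
void st_init_zeroed(st *s, char c, float f, int i) {
  assert(s != NULL);
  memset(s, 0, sizeof *s);
  s->c = c;
  s->f = f;
  s->i = i;
}
```

A caller that later applies memcmp to two structs initialised this way is relying on the padding staying zero, that is, on a "well-defined zero" answer to this question; comparing the members individually avoids that dependency.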
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above

3.3.6 Q65. After an explicit write of a padding byte
followed by a write to a non-adjacent member of
the whole structure, does the padding byte hold a
well-defined value? (not an unspecified value)
U: ISO U: DEFACTO
ISO : unclear DEFACTO - USAGE: well-defined value?

DEFACTO - IMPL: well-defined value? CERBERUS -
DEFACTO : well-defined value CHERI : well-defined value?
TIS : well-defined value KCC : well-defined value

EXAMPLE (padding_unspecified_value_5.c):
#include <stdio.h>
#include <stddef.h>
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p =
      ((unsigned char*)(&s)) + offset_padding;
    *p = 'C';
    s.i = 42;
    unsigned char c3 = *p;
    // does c3 hold 'C', not an unspecified value?
    printf("c3=%c\n",c3);
  }
  return 0;
}

GCC -4.8-O0:
c3=C
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed

c3=C

[value] done for function main
KCC :
c3=C
DEFACTO : defined behaviour (printing C)
ISO : unclear (printing an unspecified value?)

These observations (of C) don’t constrain the answer to this
question.

3.3.7 Q66. After an explicit write of a padding byte
followed by writes to adjacent members of the
whole structure, but accessed via pointers to the
members rather than via the structure, does the
padding byte hold a well-defined value? (not an
unspecified value)
U: ISO U: DEFACTO
ISO : unclear DEFACTO - USAGE: well-defined value?
DEFACTO - IMPL: well-defined value? CERBERUS -
DEFACTO : well-defined value CHERI : well-defined value?
TIS : well-defined value KCC : well-defined value

EXAMPLE (padding_unspecified_value_6.c):
#include <stdio.h>
#include <stddef.h>
void g(char *c, float *f) {
  *c='A';
  *f=1.0;
}
typedef struct { char c; float f; int i; } st;
int main() {
  // check there is a padding byte between c and f
  size_t offset_padding = offsetof(st,c)+sizeof(char);
  if (offsetof(st,f)>offset_padding) {
    st s;
    unsigned char *p =
      ((unsigned char*)(&s)) + offset_padding;
    *p = 'D';
    g(&s.c, &s.f);
    unsigned char c4 = *p;
    // does c4 hold 'D', not an unspecified value?
    printf("c4=%c\n",c4);
  }
  return 0;
}

GCC -4.8-O0:
c4=D
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above

CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed

c4=D

[value] done for function main
KCC :
c4=D
DEFACTO : defined behaviour (printing D)
ISO : unclear (printing an unspecified value?)

These observations (of D) don’t constrain the answer to
this question.

3.3.8 Q67. Can one use a malloc’d region for a union
that is just big enough to hold the subset of
members that will be used?
U: ISO U: DEFACTO D: ISO - VS - DEFACTO
ISO : unclear – no? DEFACTO - USAGE: yes? DEFACTO -
IMPL: yes? CERBERUS - DEFACTO : no CHERI : unclear?
TIS : yes KCC : no (flags UB Trying to write outside the
bounds of an object)
One of our respondents remarks that it is an acceptable
idiom, if one has a union but knows that only some of the
members will be used, to malloc something only big enough
for those members.

EXAMPLE (padding_subunion_1.c):
#include <stdio.h>
#include <stdlib.h>
typedef struct { char c1; } st1;
typedef struct { float f2; } st2;
typedef union { st1 s1; st2 s2; } un;
int main() {
  // is this free of undefined behaviour?
  un* u = (un*)malloc(sizeof(st1));
  u->s1.c1 = 'a';
  printf("u->s1.c1=0x%x\n",(int)u->s1.c1);
}

GCC -4.8-O0:
u->s1.c1=0x61
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding subunion 1.c:8:[value] allocating
variable malloc main l8
padding subunion 1.c:10:[value] warning: argument (int)u->s1.c1 has type int but
format indicates unsigned int
[value] warning:
Continuing analysis because this seems
innocuous

u->s1.c1=0x61

[value] done for function
main
KCC :
u->s1.c1=0x61
Error: UB-EIO2
Description: Trying to write outside the bounds of an
object.
Type: Undefined behavior.
See also: C11 sec.
6.5.6:8, J.2:1 item 47
at main(padding subunion 1.c:9)
at <file-scope>(<unknown>)

Error:
UB-STDIO1
Description: ’printf’: Mismatch between the
type expected by the conversion specifier %x and the
type of the argument.
Type: Undefined behavior.
See
also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding subunion 1.c:10)
at
main(padding subunion 1.c:10)
at
<file-scope>(<unknown>)
DEFACTO : undefined behaviour
ISO : unclear - undefined behaviour?

If that is supported, then presumably one can rely on the
compiler, for a union member write, not writing beyond the
footprint of that member:

EXAMPLE (padding_subunion_2.c):
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
typedef struct { char c1; } st1;
typedef struct { float f2; } st2;
typedef union { st1 s1; st2 s2; } un;
int main() {
  // check that st2 is bigger than st1
  // (otherwise the test is uninteresting)
  assert(sizeof(st2) > sizeof(st1));
  // is this free of undefined behaviour?
  unsigned char* p = malloc(sizeof(st1)+sizeof(int));
  un* pu = (un*)p;
  char *pc = (char*)(p + sizeof(st1));
  *pc='B';
  pu->s1.c1 = 'A';
  // is this guaranteed to read 'B'?
  unsigned char c = *pc;
  printf("c=0x%x\n",(int)c);
}

GCC -4.8-O0:
c=0x42
GCC -4.9-O0: . . . as above
GCC -4.8-O2: . . . as above
GCC -4.9-O2: . . . as above
GCC -5.3-O2: . . . as above
GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above
CLANG 33-O0: . . . as above
CLANG 34-O0: . . . as above
CLANG 35-O0: . . . as above
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above
CLANG 33-O2: . . . as above
CLANG 34-O2: . . . as above
CLANG 35-O2: . . . as above
CLANG 36-O2: . . . as above
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above
CLANG 37-ASAN: . . . as above
TIS - INTERPRETER :
[value] Analyzing a complete application starting at
main
[value] Computing initial state
[value] Initial
state computed
padding subunion 2.c:12:[value]
allocating variable malloc main l12
padding subunion 2.c:19:[value] warning: argument (int)c has type int but
format indicates unsigned int
[value] warning:
Continuing analysis because this seems
innocuous

c=0x42

[value] done for function main
KCC :
Execution failed (configuration dumped)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type
expected by the conversion specifier %x and the type of
the argument.
Type: Undefined behavior.
See also: C11
sec. 7.21.6.1:9, J.2:1 item 153
at
printf(padding subunion 2.c:19)
at
main(padding subunion 2.c:19)
at
<file-scope>(<unknown>)
DEFACTO : defined behaviour (printing a nondeterministic
value)
ISO : unclear

But that is at odds with the idea that after writing a union
member, the footprint of the union holds unspecified values
beyond the footprint of that member.
If one does want this to be allowed, should it be allowed
only when the lvalue is manifestly part of the union, or is it
just a fact about struct writes, that they are never widened
(very much or at all)?

3.3.9 More remarks on padding

One respondent remarks:
• The C frontend of Clang will make packed structs with i8 members wherever padding is needed (because the IR is too underspecified). So the mid-level optimisers don't know what's padding and what's not.
• A struct copy might really emit particular loads and stores for a small struct (rather than a memcpy); in that case it wouldn't copy the padding.
• Doing wide writes to narrow members was mostly an Alpha thing? Not sure on x86 if there are shorter encodings that do that. Something in LLVM "scalar evolution" optimisation might do this, but probably only when they know they're working over a bunch of members.
• He hasn't actually seen generic hash-all-the-bytes-of-a-struct code. Maybe for deduplication and content-addressable stores? Also for encrypting structs and doing CRCs. But the only code he knows of that cares about this uses byte arrays or packed structs. Another respondent remarks he thinks he has seen code that does something like this, in one of the SPEC CPU2006 benchmarks.

With respect to the semantic options outlined earlier, with (i), continuously unspecified values for padding bytes, c1 gets an unspecified value despite the fact that 'A' was just written to the address that c1 is read from. And c2, c3, and c4 are likewise all unspecified values.
With (ii), c1 is guaranteed to get 'A', but c2 gets an unspecified value, as the structure members are all written to after the write of *p='B'. c3 similarly gets an unspecified value due to the intervening write of s.i, despite the fact that i is not adjacent to the padding pointed to by p.
With (ii'), c2 gets an unspecified value but c3 is guaranteed to get 'C'.
Finally, with either (ii) or (ii'), we believe that c4 should be guaranteed to get 'D', unaffected by the writes within members of s that are performed by f (which might be in a different compilation unit).

For union member padding, we presume that the standard semantics should synthesise explicit writes of undefined values whenever a short member is written. But if compilers don't walk over that space, the concrete semantics need not, and both can leave it stable in between.

If compilers ever do write to structure padding, then this interacts with the use of a pointer to access a structure with a similar prefix, illustrated in Example cast_struct_same_prefix.c of §2.15.1 (p.77). The most plausible case seems to be for a compiler to make a wider-than-expected write starting at the base address of the member representation but continuing strictly beyond it, but the padding after a structure member is determined (in the common ABIs, as discussed above) by the alignment requirement of the subsequent member, so the structures would have to have similar prefixes up to one member past the last one used for write accesses.

There is also an interaction between padding and the definition of data races: should a programmer's access to padding be regarded as racing with a non-happens-before-related write to any member of the structure, or to an adjacent (or preceding) member of the structure?

Padding also relates to memcmp and to related functions, e.g. hash functions that hash all the representation bytes of a structure. The 7.24.4.1 memcmp text quoted above suggests that memcmp over structures that contain padding is not useful, and with (i), in our symbolic, strict interpretation of unspecified values (2b of §3.2, p.95) it (and hash functions) will return the unspecified value for all such. But it appears that in at least some cases in practice one relies on the padding having been initialised and not overwritten.

3.3.10 Q68. Can the user make a copy of a structure or union by copying just the representation bytes of its members and writing junk into the padding bytes?

ISO: yes? (though not made explicit) DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: yes CHERI: yes TIS: yes KCC: (fails with a mistaken OOB pointer UB)

We also have to ask whether the compiler can use padding bytes for its own purposes, e.g. to hold some array bounds information or dynamic representations of union tags. In other words, is it legal to copy a structure or union by copying just the representation bytes of its member(s), and writing junk into the padding bytes?

EXAMPLE (padding_struct_copy_of_representation_bytes.c):
#include <stdio.h>
#include <stddef.h>
#include <string.h>
typedef struct { char c; float f; } st;
int main() {
  st s1 = {.c = 'A', .f = 1.0 };
  st s2;
  memcpy(&(s2.c), &(s1.c), sizeof(char));
  memset(&(s2.c)+sizeof(char),'X',
         offsetof(st,f)-offsetof(st,c)-sizeof(char));
  memcpy(&(s2.f), &(s1.f), sizeof(float));
  //memset(&(s2.f)+sizeof(float),'Y',
  //       sizeof(st)-offsetof(st,f)-sizeof(float));
  // is s2 now a copy of s1?
  printf("s2.c=%c s2.f=%f\n",s2.c,s2.f);
}

GCC-4.8-O0:
s2.c=A s2.f=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above

GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s2.c=A s2.f=1.000000
[value] done for function main
KCC:
Execution failed (configuration dumped)
Error: UB-CEA1
Description: A pointer (or array subscript) outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 46
at memset(/var/lib/jenkins/jobs/c-semantics-rv-match/workspace/rv-match/c-semantics-plugin/src/main/ocaml/../c-semantics/x86_64-linux-gcc-glibc/src/string.c:31)
at main(padding_struct_copy_of_representation_bytes.c:9)
at <file-scope>(<unknown>)
Error: UB-CER4
Description: Dereferencing a pointer past the end of an array.
Type: Undefined behavior.
See also: C11 sec. 6.5.6:8, J.2:1 item 47
at memset(/var/lib/jenkins/jobs/c-semantics-rv-match/workspace/rv-match/c-semantics-plugin/src/main/ocaml/../c-semantics/x86_64-linux-gcc-glibc/src/string.c:31)
at main(padding_struct_copy_of_representation_bytes.c:9)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour (s2.c=A s2.f=1.000000)
ISO: defined behaviour (s2.c=A s2.f=1.000000)

We are not aware of any implementations that use padding bytes in that way, and for a de facto semantics it should be legal to copy a structure or union by just copying the member representation bytes.

3.3.11 Q69. Can one read an object as aligned words without regard for the fact that the object's extent may not include all of the last word?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: yes CERBERUS-DEFACTO: no? CHERI: ? TIS: no (flags OOB read) KCC: flags UB for a pointer conversion alignment (arguably correctly), UB for an effective type error (debatable), and an OOB read (mistaken)

[Question 14/15 of our What is C in practice? (Cerberus survey v2) relates to this; see footnote 52, www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html.]

This is a question from the CHERI ASPLOS paper, where they write: "This is used as an optimization for strlen() in FreeBSD libc. While this is undefined behavior in C, it works in systems with page-based memory protection mechanisms, but not in CHERI where objects have byte granularity. We have found this idiom only in FreeBSD's libc, as reported by valgrind."

EXAMPLE (cheri_08_last_word.c):
#include <assert.h>
#include <stdio.h>
#include <inttypes.h>
char c[5];
int main() {
  char *cp = &(c[0]);
  assert(sizeof(uint32_t) == 4);
  uint32_t x0 = *((uint32_t *)cp);
  // does this have defined behaviour?
  uint32_t x1 = *((uint32_t *)(cp+4));
  printf("x0=%x x1=%x\n",x0,x1);
}

GCC-4.8-O0:
x0=0 x1=0
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above

CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
cheri_08_last_word.c:10:[kernel] warning: out of bounds read. assert \valid_read((uint32_t *)(cp+4));
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred: results are not correct for lines of code that can be reached from the degeneration point.
KCC:
x0=0 x1=0
Error: UB-CCV11
Description: Conversion to a pointer type with a stricter alignment requirement (possibly undefined).
Type: Undefined behavior.
See also: C11 sec. 6.3.2.3:7, J.2:1 item 25
at main(cheri_08_last_word.c:8)
at <file-scope>(<unknown>)
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at main(cheri_08_last_word.c:8)
at <file-scope>(<unknown>)
Error: UB-CCV11
Description: Conversion to a pointer type with a stricter alignment requirement (possibly undefined).
Type: Undefined behavior.
See also: C11 sec. 6.3.2.3:7, J.2:1 item 25
at main(cheri_08_last_word.c:10)
at <file-scope>(<unknown>)
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at main(cheri_08_last_word.c:10)
at <file-scope>(<unknown>)
Error: UB-EIO7
Description: Reading outside the bounds of an object.
Type: Undefined behavior.
See also: C11 sec. 6.3.2.1:1, J.2:1 item 19
at main(cheri_08_last_word.c:10)
at <file-scope>(<unknown>)
ISO: undefined behaviour

3.3.12 Q70. Does concurrent access to two (non-bitfield) distinct members of a structure constitute a data race?

ISO: no DEFACTO-USAGE: no DEFACTO-IMPL: no CERBERUS-DEFACTO: no CHERI: no TIS: no concurrency support

This is part of the C11 concurrency model.
It puts an upper bound on the "wide writes" that a compiler might do for a struct member write: they cannot overlap any other members.

3.3.13 Q71. Does concurrent access to a structure member and a padding byte of that structure constitute a data race?

U: ISO U: DEFACTO
ISO: unclear DEFACTO-USAGE: unclear DEFACTO-IMPL: unclear CERBERUS-DEFACTO: unclear CHERI: unclear TIS: no concurrency support

It is hard to imagine that this will matter for any reasonable code, but any semantics will have to decide one way or the other, and it will impact the design of race detectors that aim to be complete.

3.3.14 Q72. Does concurrent (read or write) access to an unspecified value constitute a data race?

U: ISO U: DEFACTO
ISO: unclear DEFACTO-USAGE: unclear DEFACTO-IMPL: unclear CERBERUS-DEFACTO: unclear CHERI: unclear TIS: no concurrency support

One might conceivably want to allow this, to allow concurrent accesses to adjacent members of a struct to write unspecified values to padding without creating a bogus data race. It could be restricted to just padding bytes, but it is simpler to allow races on all unspecified-value accesses.
(Note that you don't see those accesses in a naive source semantics, but in a semantics in which writes to a member also write unspecified values to the adjacent padding on both sides, it matters, and in Core and the memory model those writes have to be there.)

4. Effective Types

Paragraphs 6.5p{6,7} of the standard introduce effective types. These were added to C in C99 to permit compilers to do optimisations driven by type-based alias analysis, by ruling out programs involving unannotated aliasing of references to different types (regarding them as having undefined behaviour). This is one of the less clear, less well-understood, and more controversial aspects of the standard, as one can see from various GCC and Linux Kernel mailing list threads (footnotes 53, 54, 55) and blog postings (footnotes 56, 57, 58, 59, 60). The type-based aliasing question of our preliminary survey was the only one which received a unanimous response: "don't know".

Footnotes:
53 https://gcc.gnu.org/ml/gcc/2010-01/msg00013.html
54 https://lkml.org/lkml/2003/2/26/158
55 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html
56 http://blog.regehr.org/archives/959
57 http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
58 http://davmac.wordpress.com/2010/02/26/c99-revisited/
59 http://dbp-consulting.com/tutorials/StrictAliasing.html
60 http://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories

Several major systems software projects, including the Linux Kernel, the FreeBSD Kernel, and PostgreSQL (though not Apache) disable type-based alias analysis with the -fno-strict-aliasing compiler flag [53]. Our de facto standard semantics should either simply follow that or have a corresponding switch; for the moment we go for the former.

Standard "6.5p6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
6.5p7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)
• a type compatible with the effective type of the object,
• a qualified version of a type compatible with the effective type of the object,
• a type that is the signed or unsigned type corresponding to the effective type of the object,
• a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
• an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
• a character type.
Footnote 87) Allocated objects have no declared type.
Footnote 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased."

As Footnote 87 says, allocated objects (from malloc, calloc, and presumably any fresh space from realloc) have no declared type, whereas objects with static, thread, or automatic storage durations have some declared type.
For the latter, 6.5p{6,7} say that the effective types are fixed and that their values can only be accessed by an lvalue that is similar ("compatible", modulo signedness and qualifiers), an aggregate or union containing such a type, or (to access its representation) a character type.
For the former, the effective type is determined by the type of the last write, or, if that is done by a memcpy,

memmove, or user-code char array copy, the effective type of the source.

4.1 Basic effective types

4.1.1 Q73. Can one do type punning between arbitrary types?

ISO: no DEFACTO-USAGE: yes, with -fno-strict-aliasing DEFACTO-IMPL: yes, with -fno-strict-aliasing CERBERUS-DEFACTO: ? CHERI: ? TIS: yes KCC: no (flags effective-type UB)

EXAMPLE (effective_type_1.c):
#include <stdio.h>
#include <inttypes.h>
#include <assert.h>
void f(uint32_t *p1, float *p2) {
  *p1 = 2;
  *p2 = 3.0; // does this have defined behaviour?
  printf("f: *p1 = %" PRIu32 "\n",*p1);
}
int main() {
  assert(sizeof(uint32_t)==sizeof(float));
  uint32_t i = 1;
  uint32_t *p1 = &i;
  float *p2;
  p2 = (float *)p1;
  f(p1, p2);
  printf("i=%" PRIu32 " *p1=%" PRIu32
         " *p2=%f\n",i,*p1,*p2);
}

GCC-4.8-O0:
f: *p1 = 1077936128
i=1077936128 *p1=1077936128 *p2=3.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2:
f: *p1 = 2
i=1077936128 *p1=1077936128 *p2=3.000000
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
f: *p1 = 1077936128
i=1077936128 *p1=1077936128 *p2=3.000000
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
f: *p1 = 2
i=1077936128 *p1=1077936128 *p2=3.000000
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING:
f: *p1 = 1077936128
i=1077936128 *p1=1077936128 *p2=3.000000
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
f: *p1 = 1077936128
i=1077936128 *p1=1077936128 *p2=3.000000
[value] done for function main
KCC:
Execution failed (configuration dumped)
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at f(effective_type_1.c:6)
at main(effective_type_1.c:15)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour iff -fno-strict-aliasing, with implementation-defined value for the first three prints
ISO: undefined behaviour

With -fstrict-aliasing (the default for GCC here), GCC assumes in the body of f that the write to *p2 cannot affect the value of *p1, printing 2 (instead of the integer value of the representation of 3.0 that would be the most recent write in a concrete semantics):

gcc-4.8 -O2 -fstrict-aliasing -std=c11 -pedantic -Wall -Wextra -pthread effective_types_13.c && ./a.out
f: *p1 = 2
i=1077936128 *p1=1077936128 *p2=3.000000

while with -fno-strict-aliasing (as used in the Linux kernel, among other places) it does not assume that:

gcc-4.8 -O2 -fno-strict-aliasing -std=c11 -pedantic -Wall -Wextra -pthread effective_types_13.c && ./a.out
f: *p1 = 1077936128

i=1077936128 *p1=1077936128 *p2=3.000000

The former behaviour can be explained by regarding the program as having undefined behaviour, due to the write of the uint32_t i with a float* lvalue.
We give another basic effective type example below, here just involving integer types and without the function call.

EXAMPLE (effective_type_10.c):
#include <stdio.h>
#include <stdint.h>
int main() {
  int32_t x;
  uint16_t y;
  x = 0x44332211;
  y = *(uint16_t *)&x; // defined behaviour?
  printf("x=%i y=0x%x\n",x,y);
}

GCC-4.8-O0:
x=1144201745 y=0x2211
GCC-4.9-O0: ... as above
GCC-4.8-O2:
effective_type_10.c: In function 'main':
effective_type_10.c:7:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
y = *(uint16_t *)&x; // defined behaviour?
^
x=1144201745 y=0x2211
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
x=1144201745 y=0x2211
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
effective_type_10.c:8:[value] warning: argument (int)y has type int but format indicates unsigned int
[value] warning: Continuing analysis because this seems innocuous
x=1144201745 y=0x2211
[value] done for function main
KCC:
x=1144201745 y=0x2211
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at main(effective_type_10.c:7)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: 'printf': Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(effective_type_10.c:8)
at main(effective_type_10.c:8)
at <file-scope>(<unknown>)
ISO: undefined behaviour

4.1.2 Q74. Can one do type punning between distinct but isomorphic structure types?

ISO: no DEFACTO-USAGE: yes, with -fno-strict-aliasing DEFACTO-IMPL: yes, with -fno-strict-aliasing CERBERUS-DEFACTO: ? CHERI: ? TIS: yes KCC: yes (contrary to ISO)

Similar compiler behaviour occurs with pointers to two distinct but isomorphic structure types:

EXAMPLE (effective_type_2.c):
#include <stdio.h>
typedef struct { int i1; } st1;
typedef struct { int i2; } st2;
void f(st1* s1p, st2* s2p) {

  s1p->i1 = 2;
  s2p->i2 = 3;
  printf("f: s1p->i1 = %i\n",s1p->i1);
}
int main() {
  st1 s = {.i1 = 1};
  st1 * s1p = &s;
  st2 * s2p;
  s2p = (st2*)s1p;
  f(s1p, s2p); // defined behaviour?
  printf("s.i1=%i s1p->i1=%i s2p->i2=%i\n",
         s.i1,s1p->i1,s2p->i2);
}

GCC-4.8-O0:
f: s1p->i1 = 3
s.i1=3 s1p->i1=3 s2p->i2=3
GCC-4.9-O0: ... as above
GCC-4.8-O2:
f: s1p->i1 = 2
s.i1=3 s1p->i1=3 s2p->i2=3
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
f: s1p->i1 = 3
s.i1=3 s1p->i1=3 s2p->i2=3
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
f: s1p->i1 = 3
s.i1=3 s1p->i1=3 s2p->i2=3
[value] done for function main
KCC:
f: s1p->i1 = 3
s.i1=3 s1p->i1=3 s2p->i2=3
DEFACTO: defined behaviour iff -fno-strict-aliasing
ISO: undefined behaviour

gcc-4.8 -O2 -fstrict-aliasing -std=c11 -pedantic -Wall -Wextra -pthread effective_types_12.c && ./a.out
f: s1p->i1 = 2
s.i1=3 s1p->i1=3 s2p->i2=3

gcc-4.8 -O2 -fno-strict-aliasing -std=c11 -pedantic -Wall -Wextra -pthread effective_types_12.c && ./a.out
f: s1p->i1 = 3
s.i1=3 s1p->i1=3 s2p->i2=3

4.2 Effective types and character arrays

4.2.1 Q75. Can an unsigned character array with static or automatic storage duration be used (in the same way as a malloc'd region) to hold values of other types?

D: ISO-VS-DEFACTO
ISO: no DEFACTO-USAGE: yes DEFACTO-IMPL: no (w.r.t. compiler respondents) CERBERUS-DEFACTO: yes (for -fno-strict-aliasing) CHERI: yes TIS: yes? (test not supported: fails to find stdalign.h) KCC: no (flags alignment and effective type errors, though the _Alignas makes the former incorrect)

[Question 11/15 of our What is C in practice? (Cerberus survey v2) relates to this; see footnote 61, www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html.]

A literal reading of the effective type rules prevents the use of an unsigned character array as a buffer to hold values of other types (as if it were an allocated region of storage). For example, the following has undefined behaviour due to a violation of 6.5p7 at the access to *fp. (Footnote 62: This reasoning presumes that the conversion of the (float *)c cast gives a usable result; the conversion is permitted by 6.3.2.3p7 but the standard text only guarantees a roundtrip property.)

EXAMPLE (effective_type_3.c):
#include <stdio.h>
#include <stdalign.h>
int main() {
  _Alignas(float) unsigned char c[sizeof(float)];
  float *fp = (float *)c;
  *fp=1.0; // does this have defined behaviour?
  printf("*fp=%f\n",*fp);
}

GCC-4.8-O0:
*fp=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above

GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[kernel] user error: failed to run: cc -C -E -isystem /home/john/tis-interpreter/tis-interpreter/share/frama-c/libc -dD -DTIS_INTERPRETER -dD -D__FRAMAC__ -nostdinc -D__FC_MACHDEP_X86_64 -I/home/john/tis-interpreter/tis-interpreter/share/frama-c/libc -o '/tmp/effective_type_3.c24c5f4.i' 'effective_type_3.c'
you may set the CPP environment variable to select the proper preprocessor command or use the option "-cpp-command".
[kernel] user error: stopping on file "effective_type_3.c" that has errors. Add '-kernel-msg-key pp' for preprocessing command.
[kernel] Frama-C aborted: invalid user input.
effective_type_3.c:2:22: fatal error: stdalign.h: No such file or directory
#include <stdalign.h>
^
compilation terminated.
KCC:
*fp=1.0000000000000000E0
Error: UB-CCV11
Description: Conversion to a pointer type with a stricter alignment requirement (possibly undefined).
Type: Undefined behavior.
See also: C11 sec. 6.3.2.3:7, J.2:1 item 25
at main(effective_type_3.c:5)
at <file-scope>(<unknown>)
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at main(effective_type_3.c:6)
at <file-scope>(<unknown>)
Error: UB-EIO10
Description: Type of lvalue not compatible with the effective type of the object being accessed.
Type: Undefined behavior.
See also: C11 sec. 6.5:7, J.2:1 item 37
at main(effective_type_3.c:7)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour iff -fno-strict-aliasing
ISO: undefined behaviour

In the de facto semantics we imagine this should be allowed.
Even bytewise copying of a value via such a buffer leads to unusable results in the standard:

EXAMPLE (effective_type_4.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdalign.h>
int main() {
  _Alignas(float) unsigned char c[sizeof(float)];
  // c has effective type char array
  float f=1.0;
  memcpy((void*)c, (const void*)(&f), sizeof(float));
  // c still has effective type char array
  float *fp = (float *) malloc(sizeof(float));
  // the malloc'd region initially has no effective type
  memcpy((void*)fp, (const void*)c, sizeof(float));
  // does the following have defined behaviour?
  // (the ISO text says the malloc'd region has effective type
  // unsigned char array, not float, and hence that
  // the following read has undefined behaviour)
  float g = *fp;
  printf("g=%f\n",g);
}

SOURCES MISMATCH
GCC-4.8-O0:
g=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above

GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above ten into allocated regions. Suppose we write a single member
CLANG 33-O0: . . . as above of a structure into a fresh allocated region, then does
CLANG 34-O0: . . . as above
(i) the footprint of the member take on an effective type as
CLANG 35-O0: . . . as above
the type of that struct member, or
CLANG 36-O0: . . . as above
CLANG 37-O0: . . . as above (ii) the footprint of the member take on an effective type of
CLANG 33-O2: . . . as above the type of that structure member annotated as coming
CLANG 34-O2: . . . as above from that member of that structure type, or
CLANG 35-O2: . . . as above (iii) the footprint of the whole structure take on the structure
CLANG 36-O2: . . . as above type as its effective type?
CLANG 37-O2: . . . as above
CLANG 33-O2- NO - STRICT- ALIASING : . . . as above 4.3.1 Q76. After writing a structure to a malloc’d
CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
region, can its members be accessed via
CLANG 35-O2- NO - STRICT- ALIASING : . . . as above
pointers of the individual member types?
CLANG 36-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-O2- NO - STRICT- ALIASING : . . . as above
CLANG 37-UBSAN: . . . as above ISO : yes DEFACTO - USAGE: yes DEFACTO - IMPL: yes
CLANG 37-ASAN: . . . as above CERBERUS - DEFACTO : yes CHERI : yes TIS : yes KCC :
TIS - INTERPRETER : yes
[kernel] user error: failed to run: cc -C -E -isystem This is uncontroversial.
/home/john/tis-interpreter/tis-interpreter/share/frama-c
E XAMPLE (effective_type_5.c):
/libc -dD -DTIS INTERPRETER -dD -D FRAMAC -nostdinc
#include <stdio.h>
-D FC MACHDEP X86 64 -I/home/john/tis-interpreter/tis-i
#include <stdlib.h>
nterpreter/share/frama-c/libc -o #include <stddef.h>
’/tmp/effective type 4.cd3bc8b.i’ ’effective type 4.c’ #include <assert.h>
typedef struct { char c1; float f1; } st1;
int main() {
you may set the CPP environment void *p = malloc(sizeof(st1)); assert (p != NULL);
variable to select the proper preprocessor command or st1 s1 = { .c1=’A’, .f1=1.0};
*((st1 *)p) = s1;
use the option "-cpp-command".
float *pf = &(((st1 *)p)->f1);
[kernel] user error: // is this free of undefined behaviour?
stopping on file "effective type 4.c" that has float f = *pf;
printf("f=%f\n",f);
}
errors. Add ’-kernel-msg-key pp’ for
preprocessing command. GCC -4.8-O0:

[kernel] Frama-C aborted: invalid f=1.000000


user input. GCC -4.9-O0: . . . as above
effective type 4.c:4:22: fatal error: stdalign.h: No GCC -4.8-O2: . . . as above
such file or directory GCC -4.9-O2: . . . as above

#include <stdalign.h> GCC -5.3-O2: . . . as above


GCC -4.8-O2- NO - STRICT- ALIASING : . . . as above
^ GCC -4.9-O2- NO - STRICT- ALIASING : . . . as above
compilation terminated. GCC -5.3-O2- NO - STRICT- ALIASING : . . . as above

KCC : CLANG 33-O0: . . . as above

Execution failed (configuration dumped) CLANG 34-O0: . . . as above

DEFACTO : defined behaviour iff -no-strict-aliasing CLANG 35-O0: . . . as above

ISO : undefined behaviour CLANG 36-O0: . . . as above


CLANG 37-O0: . . . as above

This seems to be unsupportable for a systems programming CLANG 33-O2: . . . as above

language: a character array and malloc’d region should be CLANG 34-O2: . . . as above

interchangeably usable, and this too should be allowed in CLANG 35-O2: . . . as above

the de facto standard semantics. CLANG 36-O2: . . . as above


CLANG 37-O2: . . . as above
4.3 Effective types and subobjects CLANG 33-O2- NO - STRICT- ALIASING : . . . as above
Another difficulty with the standard text relates to the treat- CLANG 34-O2- NO - STRICT- ALIASING : . . . as above
ment of subobjects: members of structures and unions writ- CLANG 35-O2- NO - STRICT- ALIASING : . . . as above

CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
effective_type_5.c:7:[value] allocating variable malloc main l7
f=1.000000
[value] done for function main
KCC:
f=1.0000000000000000E0
DEFACTO: defined behaviour
ISO: defined behaviour

4.3.2 Q77. Can a non-character value be read from an uninitialised malloc'd region?
D: ISO-VS-DEFACTO
ISO: no  DEFACTO-USAGE: yes (for -fno-strict-aliasing)  DEFACTO-IMPL: yes (for -fno-strict-aliasing)  CERBERUS-DEFACTO: yes (for -fno-strict-aliasing)  CHERI: yes (for -fno-strict-aliasing)  TIS: no  KCC: no (looks like you can read but not print; flags UB "Indeterminate value used in an expression")

EXAMPLE (effective_type_6.c):
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>
int main() {
  void *p = malloc(sizeof(float)); assert (p != NULL);
  // is this free of undefined behaviour?
  float f = *((float *)p);
  printf("f=%f\n",f);
}

GCC-4.8-O0:
f=0.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
f=-0.372549
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
effective_type_6.c:6:[value] allocating variable malloc main l6
effective_type_6.c:9:[kernel] warning: accessing uninitialized left-value: assert \initialized(&f);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
f=0.0000000000000000E-1
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(effective_type_6.c:8)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour iff -fno-strict-aliasing, reading an unspecified value
ISO: undefined behaviour

The effective type rules seem to deem this undefined behaviour.

4.3.3 Q78. After writing one member of a structure to a malloc'd region, can its other members be read?
D: ISO-VS-DEFACTO
ISO: no  DEFACTO-USAGE: yes (for -fno-strict-aliasing)  DEFACTO-IMPL: yes (for -fno-strict-aliasing)  CERBERUS-DEFACTO: yes (for -fno-strict-aliasing)  CHERI: yes (for -fno-strict-aliasing)  TIS: no (similarly?)  KCC: no (flags UB "Indeterminate value used in an expression")

EXAMPLE (effective_type_7.c):
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>
typedef struct { char c1; float f1; } st1;
int main() {
  void *p = malloc(sizeof(st1)); assert (p != NULL);
  ((st1 *)p)->c1 = 'A';
  // is this free of undefined behaviour?
  float f = ((st1 *)p)->f1;
  printf("f=%f\n",f);
}

GCC-4.8-O0:
f=0.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
f=-0.372549
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
effective_type_7.c:7:[value] allocating variable malloc main l7
effective_type_7.c:11:[kernel] warning: accessing uninitialized left-value: assert \initialized(&f);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
f=0.0000000000000000E-1
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(effective_type_7.c:10)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour iff -fno-strict-aliasing
ISO: undefined behaviour

If the write should be considered as affecting the effective type of the footprint of the entire structure, then it would change the answer to effective_type_5.c here. It seems unlikely but not impossible that such an interpretation is desirable.

4.3.4 Q79. After writing one member of a structure to a malloc'd region, can a member of another structure, with footprint overlapping that of the first structure, be written?
U: ISO  D: ISO-VS-DEFACTO
ISO: unclear  DEFACTO-USAGE: yes (for -fno-strict-aliasing)  DEFACTO-IMPL: yes (for -fno-strict-aliasing)  CERBERUS-DEFACTO: yes (for -fno-strict-aliasing)  CHERI: yes  TIS: yes  KCC: yes

EXAMPLE (effective_type_8.c):
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>
typedef struct { char c1; float f1; } st1;
typedef struct { char c2; float f2; } st2;
int main() {
  assert(sizeof(st1)==sizeof(st2));
  assert(offsetof(st1,c1)==offsetof(st2,c2));
  assert(offsetof(st1,f1)==offsetof(st2,f2));
  void *p = malloc(sizeof(st1)); assert (p != NULL);
  ((st1 *)p)->c1 = 'A';

  // is this free of undefined behaviour?
  ((st2 *)p)->f2 = 1.0;
  printf("((st2 *)p)->f2=%f\n",((st2 *)p)->f2);
}

GCC-4.8-O0:
((st2 *)p)->f2=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
effective_type_8.c:11:[value] allocating variable malloc main l11
((st2 *)p)->f2=1.000000
[value] done for function main
KCC:
((st2 *)p)->f2=1.0000000000000000E0
DEFACTO: defined behaviour
ISO: unclear

Again this is exploring the effective type of the footprint of the structure type used to form the lvalue.

4.3.5 Q80. After writing a structure to a malloc'd region, can its members be accessed via a pointer to a different structure type that has the same leaf member type at the same offset?
D: ISO-VS-DEFACTO
ISO: no  DEFACTO-USAGE: yes (for -fno-strict-aliasing)  DEFACTO-IMPL: yes (for -fno-strict-aliasing)  CERBERUS-DEFACTO: yes (for -fno-strict-aliasing)  CHERI: yes iff -fno-strict-aliasing  TIS: yes  KCC: yes

EXAMPLE (effective_type_9.c):
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>
typedef struct { char c1; float f1; } st1;
typedef struct { char c2; float f2; } st2;
int main() {
  assert(sizeof(st1)==sizeof(st2));
  assert(offsetof(st1,c1)==offsetof(st2,c2));
  assert(offsetof(st1,f1)==offsetof(st2,f2));
  void *p = malloc(sizeof(st1)); assert (p != NULL);
  st1 s1 = { .c1='A', .f1=1.0};
  *((st1 *)p) = s1;
  // is this free of undefined behaviour?
  float f = ((st2 *)p)->f2;
  printf("f=%f\n",f);
}

GCC-4.8-O0:
f=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main

[value] Computing initial state
[value] Initial state computed
effective_type_9.c:11:[value] allocating variable malloc main l11
f=1.000000
[value] done for function main
KCC:
f=1.0000000000000000E0
DEFACTO: defined behaviour iff -fno-strict-aliasing
ISO: undefined behaviour

The standard seems to deem this undefined behaviour.

4.3.6 Q81. Can one access two objects, within a malloc'd region, that have overlapping but non-identical footprint?
U: ISO  D: ISO-VS-DEFACTO
ISO: unclear - no?  DEFACTO-USAGE: yes (for -fno-strict-aliasing)  DEFACTO-IMPL: yes (for -fno-strict-aliasing; no without)  CERBERUS-DEFACTO: yes (for -fno-strict-aliasing)  CHERI: yes iff -fno-strict-aliasing  TIS: yes  KCC: yes

Robbert Krebbers asks on the GCC list (footnote 63) whether “GCC uses 6.5.16.1p3 of the C11 standard as a license to perform certain optimizations. If so, could anyone provide me an example program. In particular, I am interested about the ‘then the overlap shall be exact’ part of 6.5.16.1p3: ‘If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.’”. Richard Biener replies with this example (rewritten here to print the result), saying that it will be optimised to print 1 and that this is basically effective-type reasoning.

EXAMPLE (krebbers_biener_1.c):
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
struct X { int i; int j; };
int foo (struct X *p, struct X *q) {
  // does this have defined behaviour?
  q->j = 1;
  p->i = 0;
  return q->j;
}
int main() {
  assert(sizeof(struct X) == 2 * sizeof(int));
  unsigned char *p = malloc(3 * sizeof(int));
  printf("%i\n", foo ((struct X*)(p + sizeof(int)),
                      (struct X*)p));
}

GCC-4.8-O0:
0
GCC-4.9-O0: ... as above
GCC-4.8-O2:
1
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
0
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2:
1
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING:
0
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
krebbers_biener_1.c:13:[value] allocating variable malloc main l13
0
[value] done for function main
KCC:
0
ISO: unclear

63 https://gcc.gnu.org/ml/gcc/2015-03/msg00083.html

5. Other Questions
5.1 Q82. Given a const-qualified pointer to an object defined with a non-const-qualified type, can the pointer be cast to a non-const-qualified pointer and used to mutate the object?
ISO: yes  DEFACTO-USAGE: yes  DEFACTO-IMPL: yes  CERBERUS-DEFACTO: yes  CHERI: no  TIS: yes  KCC: yes

This is the Deconst idiom from the CHERI ASPLOS paper, where they write: “Deconst refers to programs that remove the const qualifier from a pointer. This will break with any implementation that enforces the const at run time. 6.7.3.4 states: If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with nonconst-qualified type, the behavior is undefined. This means that such removal is permitted unless the object identified by the pointer is declared const, but this guarantee is very hard to make statically and the removal can violate programmer intent. We would like to be able to make a const pointer a guarantee that nothing that receives the pointer may write to the resulting memory. This allows const pointers to be passed across security-domain boundaries.”

The current standard text is 6.7.3p6 “If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.133)” and, in Annex L, “All undefined behavior shall be limited to bounded undefined behavior, except for the following which are permitted to result in critical undefined behavior: [...] An attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type (6.7.3).”

EXAMPLE (cheri_01_deconst.c):
#include <stdio.h>
int main() {
  int x=0;
  const int *p = (const int *)&x;
  //are the next two lines free of undefined behaviour?
  int *q = (int*)p;
  *q = 1;
  printf("x=%i *p=%i *q=%i\n",x,*p,*q);
}

GCC-4.8-O0:
x=1 *p=1 *q=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x=1 *p=1 *q=1
[value] done for function main
KCC:
x=1 *p=1 *q=1
DEFACTO: defined behaviour
ISO: defined behaviour

5.2 Q83. Can char and unsigned char be assumed to be 8-bit bytes?
ISO: no  DEFACTO-USAGE: yes  DEFACTO-IMPL: yes  CERBERUS-DEFACTO: yes?

5.3 Q84. Can one assume two's-complement arithmetic?
ISO: no  DEFACTO-USAGE: yes  DEFACTO-IMPL: yes  CERBERUS-DEFACTO: yes?

5.4 Q85. In the absence of floating point, can one assume that no base types have multiple representations of the same value?
U: DEFACTO
ISO: no  DEFACTO-USAGE: yes?  DEFACTO-IMPL: yes? (or perhaps pointer values?)  CERBERUS-DEFACTO: yes?
This is not necessarily true for CHERI pointers, at least.

Where there are multiple representations, one has to consider the extent to which the representation bytes are stable.

6. Related Work
In this section we discuss some of the related work in a moderately in-depth way. For work that involves a model, a verification tool, or an implementation of much of C, a fully detailed comparison would involve going through each of our earlier questions one by one, considering both the intended semantics and any observable results for the test cases. This would require an extended discussion with the authors of each work, which at the time of writing we have only just embarked on, though we do have experimental data and have had some limited discussion (including survey responses) for a few systems. Instead, here we consider the related work as it is described in the literature (with a subsection for each paper or group of papers), focussing on the motivating examples they give and checking whether they suggest additional questions.

We first consider several lines of work building memory models for C to support mechanised formal reasoning in a proof assistant. We begin with the fully concrete model used by Norrish, who aimed to make (aspects of) the ISO C90 standard precise:
• C formalised in HOL; Norrish; PhD thesis 1998 [43], §6.1

Tuch et al. develop a concrete model used for the seL4 verification, aiming to provide a model that is sound for the particular C used in that work (a particular compiler and underlying architecture) rather than a model for either ISO or de facto standards in general.
• A unified memory model for pointers; Tuch, Klein; LPAR 2005 [50], §6.2
• Types, bytes, and separation logic; Tuch, Klein, Norrish; POPL 2007 [48], §6.3

Work by several groups on verified compilation has produced a number of models. These too are not trying to exactly capture either the ISO or the de facto standards in general, but rather to provide a semantics for the C-like language of some particular verified compiler, that justifies or eases reasoning about its compiler transformations. Most of these models are abstract, based on a block-ID/offset notion; the later work in this line aims at supporting more low-level programming idioms.
• Formal verification of a C-like memory model and its uses for verifying program transformations; Leroy and Blazy; JAR 2008 [34], §6.4
• CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency; Ševčík, Vafeiadis, Zappa Nardelli, Jagannathan, Sewell; POPL 2011, JACM 2013 [51, 52], §6.5
• The CompCert Memory Model, Version 2; Leroy, Appel, Blazy, Stewart; INRIA RR-7987 2012 [33], §6.6
• Formal C semantics: CompCert and the C standard; Krebbers, Leroy, and Wiedijk; ITP 2014 [32], §6.7
• A Precise and Abstract Memory Model for C using Symbolic Values; Besson, Blazy, and Wilke; APLAS 2014 [9], §6.8
• A Concrete Memory Model for CompCert; Besson, Blazy, Wilke; ITP 2015 [10], §6.9
• A formal C memory model supporting integer-pointer casts; Kang, Hur, Mansky, Garbuzov, Zdancewic, Vafeiadis; PLDI 2015 [25], §6.10

Work by Krebbers and by Krebbers and Wiedijk aims at a semantics “corresponding to a significant part of [...] the C11 standard, as well as technology to enable verification of C programs in a standards compliant and compiler independent way”:
• The C standard formalized in Coq; Krebbers; PhD thesis 2015 [29] and also [27, 28, 30, 31], §6.11

Ellison et al. give another semantics for a substantial fragment of C, expressed as a rewrite system in the K framework rather than within an interactive prover:
• An Executable Formal Semantics of C with Applications; Ellison and Roşu; POPL 2012 [18], and also [21, 22], §6.12

Cohen et al. describe the model used in their VCC system:
• A precise yet efficient memory model for C; Cohen, Moskał, Tobies, Schulte; SSV 2009 [15], §6.13

A number of papers and blog posts look at undefined behaviour in C (much but not all of which concerns the memory and pointer behaviour we focus on here) from a systems point of view, without mathematical models:
• Undefined Behavior: What Happened to My Code?; Wang, Chen, Cheung, Jia, Zeldovich, Kaashoek; APSys 2012 [53] and Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior; Wang, Zeldovich, Kaashoek, Solar-Lezama; SOSP 13 [54], §6.14
• Beyond the PDP-11: Architectural support for a memory-safe C abstract machine; Chisnall et al.; ASPLOS 2015 [14], §6.15
• What every C programmer should know about undefined behavior; Lattner; Blog post 2011, §6.16
• Proposal for a Friendly Dialect of C; Cuoq, Flatt, Regehr; Blog post 2014, §6.17
• UB Canaries; Regehr; Blog post 2015, §6.18

For completeness we mention early work on sequential C semantics, by Gurevich and Huggins [20], Cook and Subramanian [16], Papaspyrou [46], Bofinger [13], Black and Windley [11, 12], and Anderson [4].

On the concurrency side, Batty et al. [8] formalised the concurrency aspects of the ISO C/C++11 standards during the standardisation process, with the resulting mathematical models and standard prose in close correspondence; this was later extended and related to the IBM POWER hardware model [7, 47], and used for compiler testing by Morisset et al. [39].

Then there are very extensive literatures on static and dynamic analysis, symbolic execution, model-checking, and formal verification for C, and systems-oriented work on bug-finding tools, including tools such as Valgrind [42], the Clang sanitisers, and the Csmith tool of Yang et al. [57], which aims to generate programs that cover a large subset of C while avoiding undefined and unspecified behaviors. Yet another line of related work includes C-like languages that provide additional safety guarantees, such as Cyclone [23], and tools for hardening C execution, such as Softbound [40], and many more. We cannot begin to summarise all of these here, but each implicitly embodies some notion of C semantics.

Our work on Cerberus began with Justus Matthiesen's undergraduate and MPhil project dissertations [35, 36].

6.1 C formalised in HOL; Norrish; PhD thesis 1998
This model [43] (the basis also for the expression determinacy proof of [44]) adopts an almost fully concrete model, in which memory is a map from addresses to concrete 8-bit byte values (together with a map saying which addresses have been initialised). These bit-sequences are interpreted as values when read, including a check that “the bytes read out of memory constitute a valid value for the given type” [43, §3.3.2]. Pointer values are allowed to point one-past any allocated address, but there is no notion of provenance.

6.2 A unified memory model for pointers; Tuch, Klein; LPAR 2005
This paper [50] aims at a “heap abstraction that allows for effective reasoning about both typed and untyped views of the heap and the effects of updates on the heap”. It describes a model consisting concretely of a map from addresses to concrete bitvector values (word32) together with a map from addresses to optional source-language types. The Isabelle/HOL types corresponding to those have to be equipped with maps to and from their concrete representations and with sizes. For programs that respect this type information, a heap abstraction lets the concrete heap be viewed as a collection of heaps, one for each type; this supports formal reasoning that exploits type-based lack-of-aliasing properties.

6.3 Types, bytes, and separation logic; Tuch, Klein, Norrish; POPL 2007
This paper [48] presents a memory model for C intended to support formal verification of C systems code by mechanised interactive proof, following automated program-logic verification condition generation (VCG) for a translation of the C source program and its semantics into a prover (Isabelle/HOL). The paper includes example verifications of a simple list reversal and the L4 kernel memory allocator, and the model was used for the seL4 verification [26]. More details are in Tuch's 2008 PhD thesis [49]. The paper presents “a formal model of memory that both captures the low-level features of C’s pointers and memory, and that forms the basis for an expressive implementation of separation logic.” However, the work targets C code written with verification in mind, rather than systems code found in the wild, and it targets that code as compiled for a specific architecture. That permits a number of simplifications w.r.t. general C ([26, §4.3], [48, §3]):
• syntactically, expressions are restricted to be largely side-effect-free; this and other restrictions make the evaluation order deterministic;
• the C implementation-defined behaviour choices can be fixed based on the intended compiler and machine architecture; and
• some unspecified behaviours are handled by automatically inserting guards when translating into the prover, covering “division by zero, dereferencing the null pointer, and dereferencing an improperly aligned pointer”. Any verification has to show that these hold whenever they are encountered.

The basic memory model is completely concrete, similar to that of Tuch and Klein [50]: a heap memory state is a total function from addresses (word32) to bytes (word8). Each language type has an associated Isabelle/HOL type in a type class recording its representation functions to and from byte sequences, a type-name tag, and size and alignment information.

There is no allocation ID or other provenance information, and whether the model is sound w.r.t. the behaviour of the specific compiler (GCC) used for seL4 for our test cases involving provenance (if indeed those are supported by their translation into the prover and VCG) is unclear from the paper.

The model does not support structs whose members have their address taken or which involve padding, or local variables whose address is taken [48, §4.1].

The model also contains a history variable mapping addresses to optional source-language types, with proof annotations updating this added by the verifier. Above this concrete model the paper builds an abstraction of multiple typed heaps and a separation logic.

The first example is a C program with well-defined but nondeterministic behaviour w.r.t. the ISO standard that is excluded by their syntactic restrictions:

EXAMPLE (tkn-1.c):
#include <stdio.h>
int i = 0, a[2] = {0,0};
int f(void) {
  i++;
  return i; }
/* will print either 0 or 1 */
int main(void) {
  a[i] = f();
  printf("%i\n",a[0]); }

GCC-4.8-O0:
1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
0
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
0
[value] done for function main
KCC:
0

(adapted to print the result rather than return it).

The second and third examples illustrate what can be verified in this system; they also illustrate the specific “low-level features of C’s pointers and memory” that this model supports. The second is an in-place linked list reverse, for lists which contain no data beyond the link pointer:

EXAMPLE (tkn-2.c):
#include <stdio.h>
typedef unsigned long word_t;

word_t reverse(word_t *i) {
  word_t j = 0;
  while (i) {
    word_t *k = (word_t*)*i;
    *i = j;
    j = (word_t)i;
    i = k;
  }
  return j;
}
int main() {
  word_t a[3];
  a[0] = (word_t) &a[1];
  a[1] = (word_t) &a[2];
  a[2] = (word_t) 0;
  word_t b;
  printf("a[0]=%lu a[1]=%lu a[2]=%lu\n",
         a[0],a[1],a[2]);
  b = reverse(a);
  printf("a[0]=%lu a[1]=%lu a[2]=%lu b=%lu\n",
         a[0],a[1],a[2],b);
}

GCC-4.8-O0:
a[0]=140737488349752 a[1]=140737488349760 a[2]=0
a[0]=0 a[1]=140737488349744 a[2]=140737488349752 b=140737488349760
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
a[0]=140737488349720 a[1]=140737488349728 a[2]=0
a[0]=0 a[1]=140737488349712 a[2]=140737488349720 b=140737488349728
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
a[0]=140737488349752 a[1]=140737488349760 a[2]=0
a[0]=0 a[1]=140737488349744 a[2]=140737488349752 b=140737488349760
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2:
a[0]=140737488349736 a[1]=140737488349744 a[2]=0
a[0]=0 a[1]=140737488349728 a[2]=140737488349736 b=140737488349744
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above

146 2016/3/17
CLANG33-O2-NO-STRICT-ALIASING:
a[0]=140737488349720 a[1]=140737488349728 a[2]=0
a[0]=0 a[1]=140737488349712 a[2]=140737488349720 b=140737488349728
CLANG34-O2-NO-STRICT-ALIASING:
a[0]=140737488349704 a[1]=140737488349712 a[2]=0
a[0]=0 a[1]=140737488349696 a[2]=140737488349704 b=140737488349712
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
a[0]=140737488349752 a[1]=140737488349760 a[2]=0
a[0]=0 a[1]=140737488349744 a[2]=140737488349752 b=140737488349760
CLANG37-ASAN:
a[0]=140737488349480 a[1]=140737488349488 a[2]=0
a[0]=0 a[1]=140737488349472 a[2]=140737488349480 b=140737488349488
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
a[0]=
a[0]=0 a[1]=
[value] done for function main
KCC:
Execution failed (configuration dumped)

(adapted with a typedef to capture the prose definition of word_t, though to unsigned long rather than their unsigned int to match the types of the 64-bit machine used to run the example, and with the main() usage added).

The third is an allocation function:

EXAMPLE (tkn-3.c):
#include <stdio.h>
#include <stdlib.h>

typedef unsigned long word_t;

word_t* kfree_list;

void * alloc(word_t size) {
  word_t *prev, *curr, *tmp;
  word_t i;
  size = size >= 1024 ? size : 1024;
  for (prev = (word_t*) &kfree_list, curr = kfree_list;
       curr;
       prev = curr, curr = (word_t*) *curr) {
    if (!((word_t) curr & (size - 1))) {
      tmp = (word_t*) *curr;
      for (i = 1; tmp && (i < size / 1024); i++) {
        if ((word_t) tmp != ((word_t) curr + 1024*i)) {
          tmp = 0;
          break;
        };
        tmp = (word_t*) *tmp;
      }
      if (tmp) {
        *prev = (word_t) tmp;
        for (i = 0; i < (size / sizeof(word_t)); i++) {
          curr[i] = 0;
        }
        return curr;
      }
    }
  }
  return 0;
}

void print_free_list(word_t* p) {
  word_t* q = p;
  printf("free list: ");
  while (q != NULL) {
    printf("%p ",(void*)q);
    q = (word_t*) *q;
  }
  printf("%p\n",(void*)q);
}
int main() {
  int n=10; // number of blocks
  void *r = malloc(1024*(n+1));
  // crudely force r to be 1024-byte-aligned
  if (((word_t)r & (1024-1)) != 0)
    r = (void*)((((word_t)r) & ~((word_t)(1024-1)))
                + (word_t)1024);
  // initialise the internal next-block pointers
  int i;
  for (i=0; i < n-1; i++)
    *((word_t *)((word_t)r+i*1024))
      = (word_t)r+(i+1)*1024;
  *(word_t *)((word_t)r+(n-1)*1024) = 0;
  kfree_list = (word_t *)r;
  // try some allocations
  print_free_list(kfree_list);
  void *a, *b, *c;
  a = alloc(1024); // should succeed
  b = alloc(2048); // should succeed
  c = alloc(65536);// should fail
  printf("a=%p b=%p c=%p\n",a,b,c);
  print_free_list(kfree_list);
}

GCC-4.8-O0:
free list: 0x801417000 0x801417400 0x801417800
0x801417c00 0x801418000 0x801418400 0x801418800
0x801418c00 0x801419000 0x801419400 0x0
a=0x801417000 b=0x801417800 c=0x0
free list: 0x801417400 0x801418000
0x801418400 0x801418800 0x801418c00 0x801419000
0x801419400 0x0
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above

CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
free list: 0x801c17000 0x801c17400 0x801c17800
0x801c17c00 0x801c18000 0x801c18400 0x801c18800
0x801c18c00 0x801c19000 0x801c19400 0x0
a=0x801c17000 b=0x801c17800 c=0x0
free list: 0x801c17400 0x801c18000
0x801c18400 0x801c18800 0x801c18c00 0x801c19000
0x801c19400 0x0
CLANG37-ASAN:
free list: 0x62600000c400 0x62600000c800 0x62600000cc00
0x62600000d000 0x62600000d400 0x62600000d800
0x62600000dc00 0x62600000e000 0x62600000e400
0x62600000e800 0x0
a=0x62600000c400 b=0x62600000c800 c=0x0
free list: 0x62600000d000 0x62600000d400
0x62600000d800 0x62600000dc00 0x62600000e000
0x62600000e400 0x62600000e800 0x0
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
tkn-3.c:48:[value] allocating variable __malloc_main_l48
tkn-3.c:50:[value] warning: The following sub-expression cannot be evaluated:
(unsigned long)r & (unsigned long)(1024 - 1)
All sub-expressions with their values:
unsigned long (unsigned long)(1024 - 1) {1023}
unsigned long (unsigned long)r {{ (unsigned long)&__malloc_main_l48 }}
int 1024 - 1 {1023}
void * r {{ (void *)&__malloc_main_l48 }}
int 1 {1}
int 1024 {1024}
Stopping
stack: main
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

This is adapted similarly with a specific choice of word_t and with a usage example, and also to fix an error: the paper has word_t* prev, curr, tmp; which should be word_t *prev, *curr, *tmp; (in their supplementary pdf proof document the three declarations are separated out, so this seems to be a typo introduced when they typeset the code).

6.4 Formal verification of a C-like memory model and its uses for verifying program transformations; Leroy and Blazy; JAR 2008

The early CompCert memory model, as described by Leroy and Blazy [34], is rather abstract from our de facto standards point of view.

They present both an axiomatisation and a “concrete model” [34, §4] that satisfies it. The main focus is the establishment of the memory injection machinery used in the CompCert compiler correctness proof to relate memory contents across compilation phases. Their concrete model has a memory state consisting of: a block ID counter; blocks with unique non-reused IDs; a boolean for each block ID saying whether it has been deallocated; the bounds (in Z) for each block, supplied as arguments to the allocation operation; and an optional abstract typed value (option (memtype val)) for each block ID. The memory types (int8signed, int8unsigned, int16signed, int16unsigned, int32, float32, float64) have sizes and alignment restrictions (in numbers of bytes). The values are “defined as the discriminated union of 32-bit integers int(n), 64-bit double-precision floating-point numbers float(f), memory locations ptr(b, i) where b is a memory block reference and i a byte offset within this block, and the constant undef representing an undefined value such as the value of an uninitialized variable”.

In this semantics the IDs are used to give a strong provenance semantics, e.g. with == pointer comparison comparing the IDs, but more concrete manipulations of pointers and memory are not supported. In particular:

• pointer values do not contain anything corresponding to the numeric address of a pointer value in a conventional C implementation. They therefore cannot be meaningfully cast to integer types.
• there is no support for manipulation of the representation bytes of values. For the integer and floating-point types that would need a relatively straightforward adaptation of their store function, at least given a fixed implementation-defined representation. But for pointer values, because there is no address information, it would require more radical change.
• there is (correspondingly) no modelling of the layout and padding of C struct and union types.

It is important to note that the CompCert C semantics is intended to be the semantics of a particular implementation (that of the CompCert compiler), rather than a semantics that captures the envelope of all behaviour permitted by any particular version of the ISO or de facto standards; in that sense their goals are quite different from ours.

6.5 CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency; Ševčík, Vafeiadis, Zappa Nardelli, Jagannathan, Sewell; POPL 2011, JACM 2013

CompCertTSO [51, 52] is a verified compiler for a C-like language with x86 TSO concurrency. The development started with that of CompCert 1.5, and the sequential aspects of the behaviour of pointers and memory are broadly as above, but there are some interesting differences.

In the relaxed-memory TSO setting, the lifetime of an allocation becomes a more involved concept, as an allocation or free event may be in the local write buffer of the thread performing it before becoming visible to other threads. To prevent this complicating the compiler correctness proof, CompCertTSO relaxed the ISO-like restriction of pointer == comparison to pointers to live blocks, allowing comparison of arbitrary pointer values. In turn, to be sound w.r.t. the behaviour of a reasonable implementation, which will often reuse memory for allocations that are separated in time, this means the semantics had to permit both true and false for the example below [52, §3.4], which we use in §2.16.2.

EXAMPLE (compcertTSO-1.c):
#include <stdio.h>
int* f() {
  int a;
  return &a; }
int* g() {
  int a;
  return &a; }
int main() {
  _Bool b = (f() == g()); // can this be true?
  printf("(f()==g())=%s\n",b?"true":"false");
}

The CompCertTSO back-end semantics and correctness proof also supported finite memory [52, §3.4], in which “allocation can fail and in which pointer values in the running machine-code implementation can be numerically equal to their values in the semantics”, with the back-end allocations all at concrete addresses in a single block (ID 0), but the concrete values and representations of pointers were not exposed in the source language.

6.6 The CompCert Memory Model, Version 2; Leroy, Appel, Blazy, Stewart; INRIA RR-7987 2012

This paper [33] describes an updated memory model for CompCert, introduced in CompCert 1.7 and refined in CompCert 1.11. The principal changes are support for byte-level manipulations of integers and floats (while keeping pointer representations abstract) and the introduction of per-byte permissions on memory.

This paper writes (§3.1) “The CompCert memory model version 1 correctly models the memory behaviour of C programs that conform to the ISO C99 standard.”, but this is not entirely correct according to our reading of the ISO standards. For example, the C99 and C11 text on effective types [1, 3, 6.5p6] licenses copying values as arrays of character type, e.g. as in our §2.4.2 with example pointer_copy_user_dataflow_direct_bytewise.c, but that earlier CompCert memory model does not. Indeed, given that ISO C99 is not defined in a mathematically rigorous way, and the absence of any proof or test-based evaluation, the exact force of the claim is unclear.

Their §3.1 also gives two idioms which the CompCert memory model version 1 permits which they say ISO C99 does not. The first is roundtrip casts of one pointer type to another and back, e.g.

EXAMPLE (compcertMMv2-1.c):
#include <stdio.h>
int main() {
  int x=3;
  *((int *) (float *)&x) = 4;
  printf("x=%i\n",x);
}

GCC-4.8-O0:
x=4
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above

CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
x=4
[value] done for function main
KCC:
x=4

As we discuss in §2.14, this is permitted in C11, and also in C99 [3, 6.3.2.3p7], [1, 6.3.2.3p7], for pointers to object types (as opposed to pointers to function types) if the intermediate value is correctly aligned. In practice it seems reasonably common for implementations to use the same representation for all pointer types; there it could be allowed in general.

The second is a consequence of the use of concrete byte-count offsets for access within a block and of the particular layout algorithm used, which makes the following two examples well-defined.

EXAMPLE (compcertMMv2-2.c):
#include <stdio.h>
struct { int x, y, z; } s;
int main() {
  s.y = 41;
  ((int *) &s)[1] = 42;
  printf("s.y=%i ",s.y);
  *((int *) ((char *) &s + sizeof(int))) = 43;
  printf("s.y=%i\n",s.y);
}

GCC-4.8-O0:
s.y=42 s.y=43
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
s.y=42
s.y=43
[value] done for function main
KCC:
s.y=42 s.y=43

EXAMPLE (compcertMMv2-3.c):
#include <stdio.h>
union point3d {
  struct { int x, y, z; } s;
  int d[3];
};
int main() {
  union point3d p;
  p.s.y = 42;
  int w;
  w = p.d[1];
  printf("w=%i\n",w);
}

GCC-4.8-O0:
w=42
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above

CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
w=42
[value] done for function main
KCC:
w=42

The first relies (in the last assignment) on the absence of padding between the ints in the struct, which may often be true but is certainly not guaranteed by ISO; both examples rely on struct and array layout corresponding, and the second also relies on union type punning, which we discuss in §2.15.4.

Their §3.2 discusses several limitations of the CompCert memory model version 1. The first three involve bytewise access to the representations of integers and floats:

EXAMPLE (compcertMMv2-4.c):
#include <stdio.h>
unsigned int bswap(unsigned int x) {
  union { unsigned int i; char c[4];} src, dst;
  int n;
  src.i=x;
  dst.c[3]=src.c[0]; dst.c[2]=src.c[1];
  dst.c[1]=src.c[2]; dst.c[0]=src.c[3];
  return dst.i;
}
int main() {
  unsigned int x=0x11223344;
  unsigned int y;
  y = bswap(x);
  printf("y=0x%x\n",y);
}

GCC-4.8-O0:
y=0x44332211
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
y=0x44332211
[value] done for function main
KCC:
y=0x44332211

EXAMPLE (compcertMMv2-5.c):
#include <stdio.h>
float fabs_single(float x) {
  union { float f; unsigned int i; } u;
  u.f = x;
  u.i = u.i & 0x7FFFFFFF;
  return u.f;
}
int main() {
  float f=-1.0;
  float g;
  g = fabs_single(f);
  printf("g=%f\n",g);
}

GCC-4.8-O0:
g=1.000000
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above

GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
g=1.000000
[value] done for function main
KCC:
Execution failed (configuration dumped)

(we omit the third, which is broadly similar). The model proposed in this Leroy et al. 2012 paper permits these two, essentially by building particular representation choices into the load and store functions and by shifting to a memory state that stores bytes that can be Undef, a concrete 8-bit byte value, or the nth byte of an abstract pointer.

The last is a bytewise user memcpy

EXAMPLE (compcertMMv2-6.c):
#include <stdio.h>
void* memcpy(void *dest,const void *src,size_t n) {
  unsigned long i;
  for (i=0; i<n; i++)
    ((char *)dest)[i] = ((const char *) src)[i];
  return dest;
}
int main () {
  int x[2], y[2];
  x[0] = 0; x[1] = 1;
  memcpy(y, x, sizeof(x));
  printf("y[0]=%i y[1]=%i\n",y[0],y[1]);
}

GCC-4.8-O0:
y[0]=0 y[1]=1
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN:
/tmp/compcertMMv2-6-681522.o: In function ‘memcpy’:
compcertMMv2-6.c:(.text+0x0): multiple definition of ‘memcpy’
/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information.
/usr/local/llvm37/bin/../lib/clang/3.7.1/lib/freebsd/libclang_rt.asan-x86_64.a(asan_interceptors.cc.o):/usr/local/poudriere/ports/brooks/devel/llvm37/work/llvm-3.7.1.src/tools/compiler-rt/lib/asan/asan_interceptors.cc:(.text+0x3cda0): first defined here
clang-3.7: error: linker command failed with exit code 1 (use -v to see invocation)
compcertMMv2-6.c.clang37-ASAN.out: not found
TIS-INTERPRETER:
compcertMMv2-6.c:2:[kernel] warning: def'n of func memcpy at compcertMMv2-6.c:2 (sum 710282) conflicts with the one at FRAMAC_SHARE/libc/string.c:32 (sum 31246452); keeping the one at FRAMAC_SHARE/libc/string.c:32.
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
y[0]=0 y[1]=1
[value] done for function main

KCC:
File: compcertMMv2-6.c
Line: 13
Error: UB-TIN2
Description: Multiple external definitions for memcpy.
Type: Undefined behavior.
See also: C11 sec. 6.9:5, J.2:1 item 84
y[0]=0 y[1]=1

similar to the pointer_copy_user_dataflow_direct_bytewise.c example we discuss in our §2.4.2; it is not supported by the CompCert memory model version 2.

The paper also adds a fine-grained access control permission mechanism, aimed both at the separation-logic verification of CompCert C programs in Appel’s Verified Software Toolchain project and at supporting compiler optimisations for const globals.

6.7 Formal C semantics: CompCert and the C standard; Krebbers, Leroy, and Wiedijk; ITP 2014

This paper [32] extends CompCert 1.12 to bring it closer to something that could be soundly described by Krebbers’s Formalin C semantics. It adds support (in CompCert 1.13) for:

• comparison with end-of-array pointers, and
• byte-wise pointer copy.

The motivating example is a user-code memcpy implementation, essentially the same as our pointer_copy_user_dataflow_direct_bytewise.c in §2.4.2:

EXAMPLE (klw-itp14-1.c):
void my_memcpy(void *dest, void *src, int n) {
  unsigned char *p = dest, *q = src, *end = p + n;
  while (p < end) // end may be end-of-array
    *p++ = *q++;
}
int main() {
  struct S { short x; short *r; } s = { 10, &s.x }, s2;
  my_memcpy(&s2, &s, sizeof(struct S));
  return *(s2.r);
}

GCC-4.8-O0:
GCC-4.9-O0: ... as above
GCC-4.8-O2: ... as above
GCC-4.9-O2: ... as above
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2: ... as above
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN: ... as above
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
KCC:
Execution failed (configuration dumped)
ISO: defined behaviour

Krebbers et al. note “In CompCert 1.12, this program has undefined behavior, for two reasons: the comparison p < end that involves an end-of-array pointer, and the byte-wise reads of the pointer s.r”. They go on to relax those restrictions slightly. For comparison (at least for equality comparison – whether they also mean to include relational comparison is unclear), they write:

• Comparison of pointers in the same block is defined only if both are weakly valid. A pointer is weakly valid if it is valid or end-of-array
• Comparison of pointers with different block identifiers is defined for valid pointers only.

They argue that this is “more sensible than the naive reading of the C standard because it is stable under compilation”. We agree that this stability property would be desirable, but the downside, that comparison becomes more partial, is potentially significant. It is already somewhat partial in the ISO standard, but arguably not in important de facto standards. Whether code in practice does comparisons of pointers with different provenances that are not strictly within their original allocations is unknown (we guess it is uncommon but does occur). This suggests another question for §2.10, added in §2.10.2 with the following example.

EXAMPLE (klw-itp14-2.c):
#include <stdio.h>
int x=1, y=2;
int main() {
  int *p = &x + 1;
  int *q = &y;
  _Bool b = (p == q); // free of undefined behaviour?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

Turning to bytewise reads and writes of pointer values, the CompCert 1.12 they describe [32, §3] stores “integer and floating point values by sequences of numeric bytes, but pointer values and uninitialized memory by symbolic bytes”:

Inductive memval: Type :=
  | Undef: memval
  | Byte: byte -> memval
  | Pointer: block -> int -> nat -> memval

where Pointer b i n is the n’th byte of a pointer with block ID b and offset i, but pointer values could only be read or written as complete sequences.

They extend CompCert values with a corresponding symbolic pointer byte constructor, Vptrfrag: block -> int -> nat -> val (they also needed an additional memval constructor, PointerPad, to represent the upper bytes of an in-memory representation of a Vptrfrag, determined by sign-extension in the implementation).

These two extensions are enough to support user-defined bytewise memcpy, but arithmetic on those byte values is given undefined behaviour, so it will not support examples such as our pointer_copy_user_dataflow_indirect_bytewise.c, §2.4.3. They remark “Reading a pointer byte from memory, adding 0 to it, and writing it back remains undefined behavior. It would be tempting give an ad-hoc semantics to such corner cases, but that will result in a loss of algebraic properties like associativity”.

6.8 A Precise and Abstract Memory Model for C using Symbolic Values; Besson, Blazy, and Wilke; APLAS 2014

This paper [9] aims at a semantics in which reading uninitialised variables and “low-level pointer operations” (by which they mean manipulations of unused pointer bits) have well-defined behaviour, not the undefined behaviour of the ISO standard, without “resorting to a concrete representation of pointers as machine integers”.

They give two motivating examples. The first [9, Fig. 1 and §6.3] reads an uninitialised variable, OR’s it with 1 and writes it (we adapt their example to split the calculation of status and add the calculation of b and printfs). They state that this occurs in practice, in an implementation of memalign, but do not explain why the author of this code wants to preserve some bits of an uninitialised variable.

EXAMPLE (besson_blazy_wilkie_Fig_1_adapted.c):
#include <stdio.h>
int set(int p, int flag) {
  return p | (1 << flag); }
int isset(int p, int flag) {
  return (p & (1 << flag)) != 0; }
int main() {
  int status;
  printf("status=0x%x\n",status);
  status = set(status,0);
  _Bool b = isset(status,0);
  printf("status=0x%x b=%s\n",status,b?"true":"false");
  return isset(status,0); }

GCC-4.8-O0:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:9: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
         ^
status=0x0
status=0x1 b=true
GCC-4.9-O0:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:3: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
   ^
status=0x0
status=0x1 b=true
GCC-4.8-O2:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:9: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
         ^
status=0x0
status=0x1 b=true
GCC-4.9-O2:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:3: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
   ^
status=0x0
status=0x1 b=true
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:9: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
         ^
status=0x0
status=0x1 b=true

154 2016/3/17
GCC-4.9-O2-NO-STRICT-ALIASING:
besson_blazy_wilkie_Fig_1_adapted.c: In function 'main':
besson_blazy_wilkie_Fig_1_adapted.c:8:3: warning: 'status' is used uninitialized in this function [-Wuninitialized]
   printf("status=0x%x\n",status);
   ^
status=0x0
status=0x1 b=true
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0:
besson_blazy_wilkie_Fig_1_adapted.c:8:26: warning: variable 'status' is uninitialized when used here [-Wuninitialized]
  printf("status=0x%x\n",status);
                         ^
besson_blazy_wilkie_Fig_1_adapted.c:7:13: note: initialize the variable 'status' to silence this warning
  int status;
            ^
             = 0
1 warning generated.
status=0x0
status=0x1 b=true
CLANG34-O0: ... as above
CLANG35-O0: ... as above
CLANG36-O0: ... as above
CLANG37-O0: ... as above
CLANG33-O2:
besson_blazy_wilkie_Fig_1_adapted.c:8:26: warning: variable 'status' is uninitialized when used here [-Wuninitialized]
  printf("status=0x%x\n",status);
                         ^
besson_blazy_wilkie_Fig_1_adapted.c:7:13: note: initialize the variable 'status' to silence this warning
  int status;
            ^
             = 0
1 warning generated.
status=0xffffea78
status=0xffffffff b=true
CLANG34-O2: ... as above
CLANG35-O2: ... as above
CLANG36-O2: ... as above
CLANG37-O2: ... as above
CLANG33-O2-NO-STRICT-ALIASING:
besson_blazy_wilkie_Fig_1_adapted.c:8:26: warning: variable 'status' is uninitialized when used here [-Wuninitialized]
  printf("status=0x%x\n",status);
                         ^
besson_blazy_wilkie_Fig_1_adapted.c:7:13: note: initialize the variable 'status' to silence this warning
  int status;
            ^
             = 0
1 warning generated.
status=0xffffea50
status=0xffffffff b=true
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above
CLANG37-UBSAN:
besson_blazy_wilkie_Fig_1_adapted.c:8:26: warning: variable 'status' is uninitialized when used here [-Wuninitialized]
  printf("status=0x%x\n",status);
                         ^
besson_blazy_wilkie_Fig_1_adapted.c:7:13: note: initialize the variable 'status' to silence this warning
  int status;
            ^
             = 0
1 warning generated.
status=0x0
status=0x1 b=true
CLANG37-ASAN: ... as above
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
besson_blazy_wilkie_Fig_1_adapted.c:8:[kernel] warning: accessing uninitialized left-value: assert \initialized(&status);
stack: main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from

the degeneration point.
KCC:
status=0x0
status=0x1 b=true
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(besson_blazy_wilkie_Fig_1_adapted.c:8)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(besson_blazy_wilkie_Fig_1_adapted.c:8)
at main(besson_blazy_wilkie_Fig_1_adapted.c:8)
at <file-scope>(<unknown>)
Error: UB-CEE2
Description: Indeterminate value used in an expression.
Type: Undefined behavior.
See also: C11 sec. 6.2.4, 6.7.9, 6.8, J.2:1 item 11
at main(besson_blazy_wilkie_Fig_1_adapted.c:9)
at <file-scope>(<unknown>)
Error: UB-STDIO1
Description: ’printf’: Mismatch between the type expected by the conversion specifier %x and the type of the argument.
Type: Undefined behavior.
See also: C11 sec. 7.21.6.1:9, J.2:1 item 153
at printf(besson_blazy_wilkie_Fig_1_adapted.c:11)
at main(besson_blazy_wilkie_Fig_1_adapted.c:11)
at <file-scope>(<unknown>)
DEFACTO: defined behaviour (printing nondeterministic values)
ISO: unclear

With respect to the ISO standard, Besson et al. write (their §2.1, “Access to Uninitialised Variables”):

    The C standard states that any read access to uninitialised memory triggers undefined behaviours [10, section 6.7.8, §10]: “If an object that has automatic storage duration is not initialised explicitly, its value is indeterminate.” Here, “indeterminate” means that the behaviour is undefined.

Here their [10] refers to the C99 standard [1], but what they say does not seem to be exactly supported by that text. Appendix J.2 of C99 says that behaviour is undefined if “The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8).” but (a) this refers only to objects with automatic storage duration, and (b) Appendix J is “informative” not “normative”, and it is not clear how (even for those) the listed subsections imply undefined behaviour. In any case, in C11 the standard text for this has been changed, and is as we describe in §3. In our reading of ISO C11 the example has undefined behaviour by 6.3.2.1p2, because the address of status is not taken.

With respect to the de facto standards, this is essentially the question we discuss in §3.2.4, with examples such as unspecified_value_strictness_int.c, of whether various operations are strict w.r.t. unspecified values. Our choice for our candidate de facto semantics model is to make all operations strict, which will not permit this idiom but will permit compiler optimisations that propagate undef through operations.

Besson et al. give a symbolic semantics that permits this example. Their model has symbolic values (a grammar of unary, binary, conditional, and cast operations), some extra alignment knowledge, and symbolic byte-n-of symbolic values. These are normalised to a concrete value when reading/writing or making a control-flow choice. It has been exercised on the Doug Lea allocator, NaCl crypto, and CompCert benchmarks.

Note that “symbolic” is used in two senses: these are symbolic identifiers that are eventually resolved to concrete values, not to be confused with the symbolic undef which is a distinguished single constructor of a value type, (roughly) as used in the LLVM implementation, in the CompCert memory models, and in our de facto standard model.

Note also that a semantics that nondeterministically picks a value at each read of an undefined value would also permit the above motivating example.

Their second motivating example uses the low-order bits of a pointer (to store a hash of the pointer as a hardening technique, apparently based on a technique in Doug Lea’s allocator):

EXAMPLE (besson_blazy_wilkie_Fig_2.c):
#include <inttypes.h>
#include <stdlib.h>
char hash(void *ptr);
char hash(void *ptr) {
  char h=0;
  unsigned int i;
  for (i=0;i<sizeof(ptr);i++)
    h = h ^ *((char *)ptr+i);
  return h; }
int main(){
  int *p = (int *) malloc(sizeof(int));
  *p = 0;
  int *q = (int *) ((uintptr_t) p | (hash(p) & 0xF));
  int *r = (int *) (((uintptr_t) q >> 4) << 4);
  return *r; }

(They assume that pointers are 4-byte values and malloc returns a 16-byte aligned value.) This is essentially just like our earlier pointer-bitmask examples, e.g. provenance_tag_bits_via_uintptr_t_1.c, in §2.2.4.

Their §6.2 notes that system calls such as mmap return -1 on error, and so one must be able to compare pointers against -1. We add a question and test for this in §2.10.3.

EXAMPLE (besson_blazy_wilke_6.2.c):
#include <stdlib.h>
int main() {
  void *p = malloc(sizeof(int));
  _Bool b = (p == (void*)-1); // defined behaviour?
}

In §6.4 they give another example that contains a potentially inter-allocation pointer relational comparison, from a memmove implementation found in practice:

void* memmove(void *s1, const void *s2, size_t n) {
  char * dest = (char *) s1;
  const char * src = (const char *) s2;
  if ( dest <= src )
    while ( n-- ) { *dest++ = *src++; }
  else {
    src += n; dest += n;
    while ( n-- ) { *--dest = *--src; }
  }
  return s1;
}

This seems to be an issue for their semantics because there is no way to resolve the conditional control-flow choice. They write “In other words, a program whose control-flow depends on the memory layout has an undefined behaviour. This dependance on the memory layout (e.g. on the memory allocator) is a portability bug that is detected by our semantics.”. As we discuss in §2.11.1 with pointer_comparison_rel_1_global.c, such comparisons are undefined behaviour w.r.t. ISO but should be allowed in many de facto semantics; we believe that for real OS code the above has to be permitted and that it is not really a portability bug.

6.9 A Concrete Memory Model for CompCert; Besson, Blazy, Wilke; ITP 2015

This paper [10] shows that the model of [9] described in the preceding subsection (§6.8) is an abstraction of the CompCert model, and that the CompCert front-end correctness proof (from CompCert C to Cminor) can be adapted to the new model.

As additional motivation, they mention the CompCert treatment of bitfields, which are translated away (to bitwise operations) by a non-verified elaboration pass before the formally verified front-end. Together with the strictness of arithmetic and logical operations w.r.t. the CompCert symbolic undef, this means that for structs containing multiple bitfields that the translation represents in the same back-end word, one cannot set one bitfield at a time.

Their example Fig. 1(a) (adapted below to print the result rather than return it) refines an earlier question about whether unspecified-value-ness is a per-leaf-value property, a per-byte property, or a per-bit property; we include this in §3.2.8.

EXAMPLE (besson_blazy_wilke_bitfields_1u.c):
#include <stdio.h>
struct f {
  unsigned int a0 : 1; unsigned int a1 : 1;
} bf ;
int main() {
  unsigned int a;
  bf.a1 = 1;
  a = bf.a1;
  printf("a=%u\n",a);
}

The example above has been adapted in another way from the original version [10, Fig. 1(a)]: the latter had bitfields a0 and a1 declared as simple ints. In C11 (6.7.2p5, and similarly in C99), for bitfields it is implementation-defined whether int designates signed int or unsigned int. For the int version (besson_blazy_wilke_bitfields_1.c), GCC-4.8 -O2 warns of overflow in implicit constant conversion, as one might expect when storing the value 1 in a signed bitfield of size 1, and that conversion results in a print of -1. We avoid this complexity, which is not relevant for this example, by using unsigned int bitfields.

Their discussion of unspecified values implicitly assumes that they should be stable, as they write in their §3.1: “For instance, consider two uninitialised char variables x and y. Expressions x-x and x-y both construct the symbolic expression undef-undef, which does not normalise. However we would like x-x to normalise to 0, since whatever the value stored in memory for x, say v, the result of v-v should always be 0.”. This is at odds with our understanding of the de facto standards and experimental observations in §3.2.3, where we see unstable uninitialised values in Clang. It may or may not be sound w.r.t. the CompCert optimisations, but other compilers may optimise usages of an undef value to uses of values that happen to be left in registers.

Inter-block pointer relational comparison is not supported, which is also at odds with our de facto standards understanding, as we discuss in the previous subsection. They write: “The normalisation of e w.r.t. a memory m returns a
value v if and only if the side-effect free expression e evaluates to v for every concrete mapping cm : block → B32 of blocks to concrete 32 bits addresses which are compatible with the block-based memory m”.

They identified a glitch w.r.t. pointer wraparound in CompCert: in the semantics used for all phases of the verification, successive incrementing of a pointer to an allocated region will never produce something that compares equal to NULL, while in the final implementation (compiled in a non-verified way from the CompCert assembly with semantic values to machine code), it will. They write that this was fixed in the CompCert trunk by making the comparison of a pointer with NULL defined behaviour only if the pointer is either within or one-past its allocation. This is tighter than the ISO semantics and our understanding of the de facto semantics, both of which allow such comparisons freely.

Their model supports finite memory and an allocation operation that can fail. CompCertTSO [51, 52] also had those properties (though it is not discussed in [10]). Referring to a fully concrete memory model in which allocations return non-deterministic currently-fresh pointers, they write (§2.3) “However, this model lacks an essential property of CompCert’s semantics: determinism. For instance, with a fully concrete memory model, allocating a memory chunk returns a non-deterministic pointer – one of the many that does not overlap with an already allocated chunk. In CompCert, the allocation returns a block that is computed in a deterministic fashion. Determinism is instrumental for the simulation proofs of the compiler passes and its absence is a show stopper.” The CompCertTSO development shows that allocation nondeterminism can be accommodated in such a proof: its memory model was not fully concrete, but it did have nondeterministic allocation. The proof separated out the threadwise semantics of each thread from its interactions with memory, thus keeping the former deterministic and allowing relatively straightforward adaptations of many CompCert compiler-phase proofs [52, §4.2–4.4].

6.10 A formal C memory model supporting integer-pointer casts; Kang, Hur, Mansky, Garbuzov, Zdancewic, Vafeiadis; PLDI 2015

This paper [25] is also focussed on C compiler verification: it aims to support casts between pointers and integers and arithmetic over them, in the way a fully concrete model does, while simultaneously making a range of compiler optimisations sound and verifiable, in the way that the abstract block-ID/offset models do.

Their motivating example is:

int f(void) {        int f(void) {        int f(void) {
  int a = 0;           int a = 0;
  g();           →     g();           →     g();
  return a;            return 0;            return 0;
}                    }                    }

where g is an unknown external function, for which they would like the compiler to be able to deduce that a is not known to g() in the code on the left, and hence that constant propagation can soundly convert it to the middle code. This is not sound in a fully concrete model, as g() might happen to write to whatever address the semantics chooses for a. They would also like to permit the removal of the now-unused allocation of a to give the code on the right. This optimisation is also not sound in general in a fully concrete model: in a finite-memory semantics g() might attempt to allocate enough memory to exactly exhaust the available memory of the right-hand code, giving a non-error behaviour that the middle code cannot match.

Their approach (their quasi-concrete model) adapts the abstract block-ID/offset model: blocks are created as abstract, and only when/if a pointer to a block is first cast to an integer is a concrete address chosen and associated to the block. A memory state is a map from block IDs to blocks. A block (v, p, n, c) has a boolean flag v indicating whether it is valid or has been freed, a natural-number size n, an n-tuple of values c, and a p that is either a concrete int32 address or an undef indicating that the block is still abstract (this should not be confused with the C unspecified values or LLVM undef). Values are a disjoint union of concrete int32 values and block-ID/offset pairs.

This justifies the above optimisations: because the address of a is not cast to an integer type before the call to g(), the block is still abstract. Hence, pointers to it cannot be forged within g(), justifying the first optimisation, and it has not yet consumed any of the finite-memory address space, justifying the second.

Our candidate de facto model will treat this rather differently. As discussed in §2.2.3, while we permit meaningful casts between pointers and integers, we associate provenance information with integer values, and one cannot normally forge a valid pointer from an arbitrary integer. That should make the first optimisation sound, but we need to consider two “abnormal” cases.

First, there is access to device memory via concrete addresses (see §2.7). This is simple: the implementation-defined range of device-memory addresses we propose, guaranteed to be disjoint from normal C-accessible memory, means the compiler can still soundly assume a lack of aliasing, with other accesses via concrete unprovenanced addresses giving rise to undefined behaviour.

Second, there are accesses via pointers read in from IO (see §2.6). For IO done in a controlled fashion, e.g. with the scanf %p, we previously proposed tagging such pointer values with a wildcard provenance, indicating that they might alias with any other pointer. That would disallow the first optimisation above (g() could read in a concrete address that happened to be equal to that of a and then use it to mutate a). But one could refine the proposal in the spirit of the Kang et al. paper, but for IO-escape rather than cast-to-integer escape: dynamically marking block IDs (aka provenances) which might have escaped to the outside world, and
letting the wildcard-provenance pointers produced by input alias only with those. This tagging would not have to be at pointer-to-integer cast time; it could be as late as the actual IO. For IO done in an uncontrolled bytewise fashion, one could do something similar: for output, dynamically marking block IDs for which any value tagged with that provenance might have escaped, and for use of pointer values obtained from bytewise input (which is essentially the same as casts from arbitrary unprovenanced integers) treating them as having that wildcard provenance. What mainstream compilers currently do in these cases is an interesting question.

Our candidate de facto model, as currently envisaged, will not licence the second optimisation in general: it is a finite-memory model which will nondeterministically allocate memory from the finite address space at each allocation site and free it at each block kill. Kang et al. are essentially arranging for the memory they wish to optimise away to be in a separate and unbounded region (following CompCertTSO in this, as they say). They argue that this will still permit common optimisation cases, but from a mainstream compiler point of view it seems more likely that compilers will do such optimisations whether or not they are sound in the strong sense implied by the example (they may remove allocations even if their addresses are taken and concretely manipulated), and that the real challenge is to understand some more subtle sense in which they are sound.

Their §3.2 has an interesting argument against models in which (in our terms) integer values derived from pointers carry provenance information. They write that this prevents the optimisation below:

a = (a - b) + (2 * b - b);
q = (ptr) a;                  →    q = (ptr) a;
*q = 123;                          *q = 123;

“Suppose the variable b contains an integer with permission to access some valid block l, and a contains an integer without any permission that is equal to the concrete address of the block l. Then the source program successfully stores 123 into the block l because q has the relevant permission, whereas the target program fails because q does not have the permission.”

To make a concrete test case, we need to construct such a numerically correct but unprovenanced a value programmatically. This is difficult, especially if one wishes to avoid the questions of provenance for IO mentioned above, so we simply use a constant value appropriate to one particular implementation and platform.

EXAMPLE (khmgzv-1.c):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
int x=0;
int main() {
  uintptr_t b = (uintptr_t) &x;
  uintptr_t a = 0x60102C;
  printf("Addresses: b=0x%" PRIXPTR " a=0x%" PRIXPTR
         "\n",b,a);
  if (memcmp(&b, &a, sizeof(b)) == 0) {
    a = (a - b) + (2 * b - b);
    int *q = (int *) a;
    *q = 123; // does this have undefined behaviour?
    printf("*((int*)b=%d *q=%d\n",*((int*)b),*q);
  }
  return 0;
}

GCC-4.8-O0:
Addresses: b=0x600BF0 a=0x60102C
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above (modulo addresses)
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: b=0x
khmgzv-1.c:10:[kernel] warning: out of bounds read. assert \valid_read((char *)(&b)+(0 .. sizeof(b)-1));
stack: memcmp :: khmgzv-1.c:10 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code
that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

[TODO: RE-EXAMINE THIS (depending on the integer provenance semantics, we might forbid the original behaviour)] (taking the result provenance of binary arithmetic operations of a provenanced and unprovenanced argument to be that of the former) but will forbid the following:

EXAMPLE (khmgzv-2.c):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
int x=0;
int main() {
  uintptr_t b = (uintptr_t) &x;
  uintptr_t a = 0x60102C;
  printf("Addresses: b=0x%" PRIXPTR " a=0x%" PRIXPTR
         "\n",b,a);
  if (memcmp(&b, &a, sizeof(b)) == 0) {
    int *q = (int *) a;
    *q = 123; // does this have undefined behaviour?
    printf("*((int*)b=%d *q=%d\n",*((int*)b),*q);
  }
  return 0;
}

GCC-4.8-O0:
Addresses: b=0x600BD0 a=0x60102C
GCC-4.9-O0: ... as above (modulo addresses)
GCC-4.8-O2: ... as above (modulo addresses)
GCC-4.9-O2: ... as above (modulo addresses)
GCC-5.3-O2: ... as above
GCC-4.8-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-4.9-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
GCC-5.3-O2-NO-STRICT-ALIASING: ... as above
CLANG33-O0: ... as above (modulo addresses)
CLANG34-O0: ... as above
CLANG35-O0: ... as above (modulo addresses)
CLANG36-O0: ... as above
CLANG37-O0: ... as above (modulo addresses)
CLANG33-O2: ... as above (modulo addresses)
CLANG34-O2: ... as above
CLANG35-O2: ... as above (modulo addresses)
CLANG36-O2: ... as above
CLANG37-O2: ... as above (modulo addresses)
CLANG33-O2-NO-STRICT-ALIASING: ... as above
CLANG34-O2-NO-STRICT-ALIASING: ... as above
CLANG35-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG36-O2-NO-STRICT-ALIASING: ... as above
CLANG37-O2-NO-STRICT-ALIASING: ... as above (modulo addresses)
CLANG37-UBSAN: ... as above (modulo addresses)
CLANG37-ASAN: ... as above (modulo addresses)
TIS-INTERPRETER:
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
Addresses: b=0x
khmgzv-2.c:10:[kernel] warning: out of bounds read. assert \valid_read((char *)(&b)+(0 .. sizeof(b)-1));
stack: memcmp :: khmgzv-2.c:10 <- main
[value] Stopping at nth alarm
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
KCC:
Execution failed (configuration dumped)

In §3.5 Kang et al. point out that in an abstract block-ID/offset model, with integer values a disjoint union of honest integers and abstract pointer values, arithmetic optimisations on integers must be limited to exclude examples such as this:

                               t = a + b;
d1 = a + (b - c1);      →      d1 = t - c1;
d2 = a + (b - c2);             d2 = t - c2;

where a and b happen to be abstract pointer values (or, in our candidate de facto model, integer values with nonempty provenance), as the result of the addition on the right will give rise to undefined behaviour (or, in our model, perhaps an unprovenanced value – leaving aside the multiple-provenance possibility of §2.3). This also holds for our candidate model, but (as above) it is more of an issue for compiler verification, using the same model across such optimisation phases, than for a source-language definition.

6.11 The C standard formalized in Coq; Krebbers; PhD thesis 2015

Krebbers, partly in collaboration with Wiedijk, has developed a semantics in Coq for a substantial fragment of C, in their CH2O project [27–31]. We discuss the version presented in Krebbers’ 2015 PhD thesis [29]. Starting with an abstract-syntax representation of C produced by the FrontC parser (based on the version used by CompCert version 2.2, in turn based on the CIL FrontC parser [41]), this work is based on a translation into CH2O Core C, which is equipped with a type system, an operational semantics, an executable version of that, an axiomatic semantics for reasoning about programs, and machinery for refinements, with metatheory proved in Coq relating these.
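
As a concrete aside on the §3.5 reassociation of Kang et al. discussed in §6.10, the following is a minimal sketch (ours, not taken from their paper) that computes d = a + (b - c) in both the source form and the shared-subexpression form on uintptr_t values derived from pointers into one array. The helper names direct, shared and forms_agree are hypothetical. We assume the usual case in which uintptr_t is at least as wide as unsigned int, so that its arithmetic wraps modulo 2^N; under that assumption the two forms agree numerically even when a + b wraps. The difficulty discussed in §6.10 therefore concerns the abstract pointer value (or provenance) of the intermediate t, not the numeric result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the source form, d = a + (b - c). */
static uintptr_t direct(uintptr_t a, uintptr_t b, uintptr_t c) {
    return a + (b - c);
}

/* Hypothetical helper: the optimised form, t = a + b; d = t - c.
   t may wrap around, which is well defined for unsigned types. */
static uintptr_t shared(uintptr_t a, uintptr_t b, uintptr_t c) {
    uintptr_t t = a + b;
    return t - c;
}

/* Returns 1 if both forms agree numerically for integers derived
   from pointers into the same array, and 0 otherwise. */
static int forms_agree(void) {
    static int x[4];
    uintptr_t a  = (uintptr_t)&x[0];
    uintptr_t b  = (uintptr_t)&x[1];
    uintptr_t c1 = (uintptr_t)&x[0];
    uintptr_t c2 = (uintptr_t)&x[1];
    return direct(a, b, c1) == shared(a, b, c1)
        && direct(a, b, c2) == shared(a, b, c2);
}
```

Here the c1 case yields the integer address of x[1] and the c2 case that of x[0]; whether either integer can be cast back to a usable pointer is exactly the provenance question at issue, and this sketch does not attempt to answer it.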

Krebbers writes: “The goal of the CH2O project is to develop a formal version of the non-concurrent fragment of the C11 standard that is usable in proof assistants.” [29, p.5] and “It makes the standard utterly precise.” [29, p.4], but the reality is more nuanced: for the aspects of C that it covers, CH2O is more like a maximally strict interpretation of the ISO C11 standard, as discussed in [29, Ch.2]: “CH2O errs on the side of caution: it makes certain behaviors undefined that some people deem defined according to the standard”. It aims thereby to be sound w.r.t. any compiler that conforms to the ISO standard, but at the cost of excluding some programs that others would deem legitimate; it is not attempting to reflect the de facto standards.

The memory model is basically an abstract one, in terms of abstract object identifiers rather than numerical addresses. These identifiers correspond to the provenances suggested by DR #260, as we discuss in §2.1. However, if (as we imagine) it follows Krebbers, Leroy, and Wiedijk [32] (discussed in §6.7) in making pointer equality comparison defined only for “valid” pointers, not one-past pointers, some of our examples there will have undefined behaviour in this semantics.

Casting pointers to integer types and back (see our §2.2) is not supported: “The CH2O semantics uses an abstract memory model with symbolic pointer values and therefore fails to account for pointer to integer casts. Casting a pointer to an integer, and vice versa, has undefined behavior in the CH2O semantics.” [29, 2.6.2 Integer representations of pointers].

It differs from earlier abstract memory models in associating a tree-structured object (corresponding to the C data type structure) rather than a vector of bytes with each object ID, and pointer values therefore include paths through those trees rather than offsets within such vectors.

They aim throughout at a semantics that takes effective types into account (and also using this tree structure for that), while our candidate de facto model aims at C compiled with -fno-strict-aliasing.

Pointer manipulation (relational comparison, subtraction, and addition) appears to be permitted only within the same leaf subobject (or one-past for the latter two). For example, [29, p.102]:

struct S { int a[3]; int b[3]; } s1, s2;
s1.a == s2.b;   // OK, neither of the two pointers is end-of-array
s1.a == s1.b+3; // OK, same object
s1.a == s2.b+3; // Undefined, different objects,
                // s2.b+3 end-of-array
s1.a <= s1.b;   // OK, <= into the same object
s1.a <= s2.a;   // Undefined, <= with different objects

and [29, p.66]:

struct S { int a[3]; int b[3]; } s;
s.a - s.b;       // Undefined, different array objects
(s.a + 3) - s.b; // Undefined, different array objects
(s.a + 3) - s.a; // OK, same array objects

Pointer addition has undefined behaviour when it goes more than one-past the (presumably sub)object [29, p.103]. We add a question for the different-subobject-array case in §2.13.5.

Pointer casts give undefined behaviour if they “break dynamic typing”, e.g. [29, p.103]:

int x;
(short*)(void*)&x;              // Undefined, int* cast to short*
(int*)((unsigned char*)&x + 1); // Undefined, ill-aligned

This seems stricter than ISO; see our §2.14 discussion.

There is support for bytewise manipulation of the representation bytes of C values, with symbolic “bit i of pointer value p” values, presumably permitting pointer values to be copied bytewise but not supporting arithmetic on them.

The treatment of type punning and unions [29, §2.5.6 Type-punning] seems to aim at the GCC interpretation, cf. our discussion in §2.15.4. They make the following example disallowed [29, p.28], following that GCC text, though a literal reading of the ISO text might suggest otherwise.

short g(int *p, short *q) {
  short z = *q; *p = 10; return z;
}
union int_or_short { int x; short y; } u = { .y = 3 };
int *p = &u.x;   // p points to the x variant of u
short *q = &u.y; // q points to the y variant of u
return g(p, q);  // g is called with aliased pointers p,q

Their pointer values include a bit saying whether they can be used for type punning; see [29, p.66,80,81]:

union U { int x; short y; } u = { .x = 3 };
short *p = &u.y;  // a frozen version of the pointer
                  // &u.y is stored
printf("%d", *p); // type-punning via a frozen pointer
                  // -> undefined

and

union U { int x; short y; } u = { .x = 3 };
printf("%d", u.y);

At the end of an object lifetime, they make all pointers to that object indeterminate [29, §2.5.7 Indeterminate memory and pointers, p.30]; see our discussion in §2.16.1. They also assert that “using an indeterminate pointer in pointer arithmetic and pointer comparisons also yields undefined behavior”, but the justification of that w.r.t. the ISO text is not clear to us. The [29, p.30] example:

int *p = malloc(sizeof(int)); assert (p != NULL);
free(p);
int *q = malloc(sizeof(int)); assert (q != NULL);
if (p == q) {//undefined, p indeterminate due to the free
  *q = 10;
  *p = 14;
  printf("%d\n", *q);//p and q alias, expected to print 14
}

seems intended to justify it, but that could be explained in other ways, e.g. by giving a nondeterministic result to
such a comparison, coupled with the manifest undefined behaviour of the *p=14 in a provenance-aware semantics.

The C99 Rationale [2, p.49, l.22–33] does introduce the notion of an invalid pointer and says that any use of it gives rise to undefined behaviour. It justifies this with a “hypothetical segmented architecture” in which arrays might be represented using multiple segments, where pointer comparison involves some metadata that might no longer exist after an object has been deallocated. We would like to know whether such implementations actually exist.

[29, §2.6.1 Integer representations of indeterminate memory] relates to our §3.2.1 and following.

For indeterminate values, they say [29, p.104]: “Branching on an indeterminate value has undefined behavior.” See our §3.2.2.

Their “implementation environment” specifies sizes and alignments (and hence struct layout in the normal ABI way, see [29, p.138]), with explicit modelling of padding bytes, but “In our tree based memory model we enforce that padding bytes always have an indeterminate value” [29, p.27].

[29, §2.5.8 End-of-array pointers] relates to our §2.1.3. The [29, p.36] example:

int x = 30, y = 31;
int *p = &x + 1, *q = &y;
intptr_t i = (intptr_t)p, j = (intptr_t)q;
printf("%ld %ld %d\n", i, j, i == j);

(reported by them as a GCC bug, and fixed from 4.7.1 to 4.8) suggests another possible question we add in §2.2.5: Can equality testing on integers, for integers derived from pointer values, be affected by their provenance?

In [29, p.63] they suggest that reading from abstract memory may affect its effective type information, with the example below in which the member of a union is left unresolved until the read [29, p.77]. This is not clearly mandated by the ISO text, by our reading thereof. But as our modelling is aiming at the -fno-strict-aliasing case, the point is moot as far as comparison goes.

short g(int *p, short *q) {
  short z = *q; *p = 10; return z;
}
int main() {
  union int_or_short { int x; short y; } u;
  // initialize u with zeros, the variant of u remains
  // unspecified
  for (size_t i = 0; i < sizeof(u); i++)
    ((unsigned char*)&u)[i] = 0;
  return g(&u.x, &u.y);
}

In [29, p.194] the discussion of Kang et al. [25] has this amusing example:

int x = 0, *p = 0;
for (uintptr_t i = 0; ; i++) {
  if (i == (uintptr_t)&x) {
    p = (int*)i;
    break;
  }
}
*p = 15;
printf("%d\n", x);

We ran CH2O on our tests (test run from 2016-02-01, ch2o github checkout 64d98faf7631252524230c859a4fc3bb4767f6e2 from Tue Nov 17 14:10:57 2015). Most tests (all except those for around 11 questions) were not supported in this version, many due to missing features in the CH2O printf and standard libraries.

6.12 An Executable Formal Semantics of C with Applications; Ellison and Roşu; POPL 2012

This paper [18] describes a semantics for a substantial fragment of C expressed in the K rewriting logic, explained in more detail in Ellison’s 2012 PhD thesis [22] and extended by Hathhorn et al. [21]. The authors claim to give “the first complete formal semantics of the C programming language” [22, Abstract], but again the reality is more nuanced.

The memory model is described as a map [18, §4.3] from block IDs to blocks with a size (in bytes) and a sequence of bytes of that size. In the rewriting setting those bytes are not necessarily ground numbers, and pointer values are represented essentially as a pair of a block ID and a numeric offset within the block, encoded e.g. as sym(B) + O where sym seems to be a fresh function symbol, B is a block ID, and O is an offset. Pointer values are themselves represented in memory with symbolic bytes, e.g. as a list subObject(sym(B)+O, 0), ..., subObject(sym(B)+O, 3) [22, p.81] (the sym of the paper seems to correspond to the loc of the thesis). This is very broadly similar to the CompCert memory model of Krebbers et al. discussed in §6.7. It is considerably more abstract than either the ISO or de facto standards, e.g. in the fact that pointers are not associated to concrete addresses, and so cannot be meaningfully cast to integer types.

Uninitialised values can be represented in memory with another function symbol, Unknown(N) [22, p.82] (where N is the bitwidth).

Hathhorn et al. [21] extend KCC with additional machinery for detecting undefined behaviour. The basic memory model is as above. They “use a trap representation wherever the standard allows one to be used” [21, §3.4], which (as they observe) leads to more undefinedness; it is significantly different from the de facto standards. They also add a record of the last-stored type of memory values, for effective-type checks (though see the experimental data below). Then there is additional provenance-related metadata attached to pointer values:

• “the union variant a pointer or lvalue expression is based on so we can mark the section of memory not overlapping with the active variant as unspecified”
• “the size of an array that a pointer is based on and its current offset into the array in order to catch violations
162 2016/3/17
  dealing with undefined pointer arithmetic and out-of-bounds pointer
  dereferences”
• “when a pointer can be traced back to the value stored in some
  restrict-qualified pointer variable”
• “a pointer’s alignment”

KCC detected two potential alignment errors in earlier versions of our tests.
But it gave ‘Execution failed’, with no further details, for the tests of 20
of our questions; ‘Translation failed’ for one; segfaulted at runtime for one;
and gave results contrary to our reading of the ISO standard for at least 6:
it exhibited a very strict semantics for reading uninitialised values (but not
for padding bytes), and permitted some tests that ISO effective types forbid.

6.13 A precise yet efficient memory model for C; SSV 2009; Cohen, Moskal,
     Tobies, Schulte

Cohen et al. [15] describe a model implicit in their “Verifying C Compiler”.
This translates annotated C code into BoogiePL; the verification condition
generator Boogie takes BoogiePL as input, and feeds the generated verification
conditions into the Z3 SMT solver. The main focus of the paper is on capturing
type-based aliasing properties, though they do not refer to the C99/C11
effective types; they relate a fully concrete model to one in which memory is
a “collection of typed objects”. There is no discussion of provenance, of
reading uninitialised memory, or of undefined behaviour in general.

6.14 Undefined Behavior: What Happened to My Code?; Wang, Chen, Cheung, Jia,
     Zeldovich, Kaashoek; APSys 2012, and Towards Optimization-Safe Systems:
     Analyzing the Impact of Undefined Behavior. Wang, Zeldovich, Kaashoek,
     Solar-Lezama; SOSP 2013

The first of these two papers [53] “investigates whether bugs due to
programmers using constructs with undefined behavior happen in practice”.
Similarly to our position that the de facto standards differ significantly
from the ISO standard, they write “Our results show that programmers do use
undefined behavior in real-world systems, including the Linux kernel and the
PostgreSQL database, and that some cases result in serious bugs.”

The investigation consists of a collection of 7 such cases, taken from
PostgreSQL, the Linux kernel, and FreeBSD, each with a code snippet, and a
preliminary evaluation of the combined cost of three optimisation-limiting
compiler flags used by some of these:

  -fno-strict-overflow
  -fno-delete-null-pointer-checks
  -fno-strict-aliasing

Their first three examples relate to the arithmetic undefined behaviours,
which are not our focus in this document: division by zero, oversized shifts,
and signed integer overflow.

Their fourth example involves formation of pointers that are (more than one)
beyond their original allocation, which can occur in some bounds-checking
code. We discuss this in §2.13.

Their fifth example is one where dereferencing a null pointer was expected to
cause a kernel oops, but where GCC removes a program-order-later null-pointer
check based on such dereferences being undefined behaviour. Our candidate de
facto model follows ISO in this respect, but conceivably one could strengthen
the behaviour of null-pointer dereferences to definitely trap rather than be
undefined behaviour. It is not clear how widely that would be feasible. We add
a question to §2.17 for this.

Their sixth example involves integer type aliasing, with a write of a uint16_t
struct member followed by a read at type int (within a Linux-kernel memcpy).
This is an effective-type question, as we discuss in §4.1.

Their seventh example is an intentional read of uninitialised memory in an
attempt to produce entropy, as we discuss in §3.1.2.

The second of these two papers [54] describes a tool, STACK, to identify some
instances of what they term “unstable code”: “code that is unexpectedly
discarded by compiler optimizations due to undefined behavior in the program”.
They give six motivating examples, where an optimising compiler might remove
the body of a conditional, in most cases based on reasoning that it could only
be executed in the presence of undefined behaviour:

                      if (p + 100 < p)
  {p dereferencable}  if (!p)
                      if (x + 100 < x)
  {x non-negative}    if (x + 100 < 0)
                      if (!(1 << x))
                      if (abs(x) < 0)

Their tool detects cases where their (solver-based) optimiser optimises based
on ten undefined-behaviour conditions, which we reproduce in Fig. 2. It found
significant numbers of bugs in real systems code and many instances of
unstable code across a snapshot of all Debian packages.

These ten conditions are (as the authors note) sufficient for undefined
behaviour but do not characterise it in general; they are very specific.
Looking at them in more detail:

• their (1) identifies pointer addition overflow but not the ISO-forbidden
  more-than-one out-of-bounds pointer arithmetic (this suggests another test,
  below);
• their (2,6) identify null pointer dereference and out-of-bounds array access
  but not other illegal pointer dereferences.
• their (3,4,5,7) are arithmetic issues, which are not our focus in this
  document

          Construct                              Sufficient condition                   Undefined behavior
Language  (1)  p + x                             p∞ + x∞ ∉ [0, 2^n − 1]                 pointer overflow
          (2)  *p                                p = NULL                               null pointer dereference
          (3)  x op_s y                          x∞ op_s y∞ ∉ [−2^(n−1), 2^(n−1) − 1]   signed integer overflow
          (4)  x / y, x % y                      y = 0                                  division by zero
          (5)  x << y, x >> y                    y < 0 ∨ y ≥ n                          oversized shift
          (6)  a[x]                              x < 0 ∨ x ≥ ARRAY_SIZE(a)              buffer overflow
Library   (7)  abs(x)                            x = −2^(n−1)                           absolute value overflow
          (8)  memcpy(dst, src, len)             |dst − src| < len                      overlapping memory copy
          (9)  use q after free(p)               alias(p, q)                            use after free
          (10) use q after p′ := realloc(p, ...) alias(p, q) ∧ p′ ≠ NULL                use after realloc

A list of sufficient (though not necessary) conditions for undefined behavior
in certain C constructs [3, §J.2]. Here p, p′, q are n-bit pointers; x, y are
n-bit integers; a is an array, the capacity of which is denoted as
ARRAY_SIZE(a); op_s refers to binary operators +, -, *, /, % over signed
integers; x∞ means to consider x as infinitely ranged; NULL is the null
pointer; alias(p, q) predicates whether p and q point to the same object.

Figure 2. Reproduced from Wang et al. [54, Fig. 3]
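
As an illustrative sketch (not taken from [54]), three of the sufficient
conditions of Fig. 2 can be written as C predicates that are themselves free of
undefined behaviour: row (3) for op = +, row (5), and row (8). The function
names add_overflows, shift_oversized, and copy_overlaps are hypothetical, and
row (8) is phrased over integer addresses, which assumes that the
implementation-defined pointer-to-uintptr_t conversion yields comparable
addresses.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Row (3), for op = +: true iff the mathematical sum x + y lies outside
   [INT_MIN, INT_MAX]. The check never performs the overflowing addition. */
bool add_overflows(int x, int y) {
    if (y > 0) return x > INT_MAX - y;
    if (y < 0) return x < INT_MIN - y;
    return false;
}

/* Row (5): true iff shifting an n-bit integer by y is an oversized shift. */
bool shift_oversized(int y, int n) {
    assert(n > 0);
    return y < 0 || y >= n;
}

/* Row (8): true iff |dst - src| < len, with the addresses taken as integers. */
bool copy_overlaps(uintptr_t dst, uintptr_t src, size_t len) {
    uintptr_t distance = dst > src ? dst - src : src - dst;
    return distance < len;
}
```

For instance, the motivating check if (x + 100 < x) could instead be written
as if (add_overflows(x, 100)), which an optimiser cannot discard by appeal to
signed overflow being undefined.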

• their (8), overlapping memory copy, refers to the ISO memcpy text: “If
  copying takes place between objects that overlap, the behavior is undefined”
  [3, §7.24.2.1]. In Cerberus this library call can be implemented in C except
  that it needs this explicit undefined-behaviour check.
• their (9,10) identify use-after-free and use-after-realloc cases, which are
  clearly forbidden in both ISO and de facto standards.

We add two questions following §2.13.1 (p.62), first just forming a pointer
value by arithmetic that overflows (on an architecture with 64-bit pointer
representations), and then a test that makes an access using such a pointer
value.

6.15 Beyond the PDP-11: Architectural support for a memory-safe C abstract
     machine; Chisnall et al.; ASPLOS 2015

The following examples give simple forms of the “difficult idioms” listed in
this paper [14]. The data there shows that most of these idioms occur often in
practice and hence that those (mostly) should be allowed in a semantics for a
de facto standard C, while a CHERI C semantics will be tighter in some
respects.

“DECONST refers to programs that remove the const qualifier from a pointer”.
We used the following example in §5.1.

EXAMPLE (cheri_01_deconst.c):
  #include <stdio.h>
  int main() {
    int x=0;
    const int *p = (const int *)&x;
    //are the next two lines free of undefined behaviour?
    int *q = (int*)p;
    *q = 1;
    printf("x=%i *p=%i *q=%i\n",x,*p,*q);
  }

[TODO: fix up the following (cf David’s email)]

“CONTAINER describes behavior in a macro common in the Linux, BSD, and Windows
kernels that, given a pointer to a structure member, returns a pointer to the
enclosing structure”. This is essentially the question of §2.13.4.

EXAMPLE (cheri_02_container.c):
  #include <stdio.h>
  #include <stddef.h>
  typedef struct { int i; float f; int j; } st;
  int main() {
    st s = {.i=1, .f=2.0, .j=3};
    int *pj = &(s.j);
    char *pcj = ((char *)pj);
    char *pcst = (pcj - (offsetof(st,j)-offsetof(st,i)));
    //are these two lines free of undefined behaviour?
    st *ps = (st *)pcst;
    ps->f = 22.0;
    printf("s.i=%i s.f=%f s.j=%i ps->f=%f\n",s.i,s.f,s.j,
           ps->f);
  }

“II refers to computation of invalid intermediate results. [...] This case
refers to pointer arithmetic where the end result is within the bounds of an
object, but intermediate results are not”. We used the next two tests in
§2.13.1.

EXAMPLE (cheri_03_ii.c):
  #include <stdio.h>
  int main() {
    int x[2];
    int *p = &x[0];
    //is this free of undefined behaviour?
    int *q = p + 11;
    q = q - 10;
    *q = 1;
    printf("x[1]=%i *q=%i\n",x[1],*q);
  }

EXAMPLE (cheri_03_ii_char.c):
  #include <stdio.h>
  int main() {
    unsigned char x;
    unsigned char *p = &x;
    //is this free of undefined behaviour?
    unsigned char *q = p + 11;
    q = q - 10;
    *q = 1;
    printf("x=0x%x *p=0x%x *q=0x%x\n",x,*p,*q);
  }

“INT refers to storing a pointer in an integer variable in memory –
implementation-defined behavior in C. [...] Disallowing this behavior makes
accurate garbage collection possible, as the compiler can statically track
every pointer use”. These are the examples we used in §2.2.2:

EXAMPLE (provenance_roundtrip_via_intptr_t.c):
  #include <stdio.h>
  #include <inttypes.h>
  int x=1;
  int main() {
    int *p = &x;
    intptr_t i = (intptr_t)p;
    int *q = (int *)i;
    *q = 11; // is this free of undefined behaviour?
    printf("*p=%d *q=%d\n",*p,*q);
  }

EXAMPLE (provenance_roundtrip_via_unsigned_long.c):
  #include <stdio.h>
  int x=1;
  int main() {
    int *p = &x;
    unsigned long i = (unsigned long)p;
    int *q = (int *)i;
    *q = 11; // is this free of undefined behaviour?
    printf("*p=%d *q=%d\n",*p,*q);
  }

“IA refers to performing integer arithmetic on pointers – such as storing a
pointer in an integer value and then performing arbitrary arithmetic on it.
This is a more general case of the Int idiom and relies on the same
implementation-defined behavior”. This is essentially a combination of II and
Int.

EXAMPLE (cheri_05_ia.c):
  #include <stdio.h>
  #include <inttypes.h>
  int main() {
    int x=0;
    int *px = &x;
    uintptr_t ql = (uintptr_t)px;
    ql = ql + 287343;
    ql = ql - 287343;
    int *q = (int *)ql;
    *q = 1;
    printf("x=%i *px=%i *q=%i\n",x,*px,*q);
  }

“MASK refers to simple masking of pointers. For example, to store some other
data in the low bits”. This is the test below from §2.2.4.

EXAMPLE (provenance_tag_bits_via_uintptr_t_1.c):
  #include <assert.h>
  #include <stdio.h>
  #include <stdint.h>
  int x=1;
  int main() {
    int *p = &x;
    // cast &x to an integer
    uintptr_t i = (uintptr_t) p;
    // check the bottom two bits of an int* are not used
    assert(_Alignof(int) >= 4);
    assert((i & 3u) == 0u);
    // construct an integer like &x with low-order bit set
    i = i | 1u;
    // cast back to a pointer
    int *q = (int *) i; // defined behaviour?
    // cast to integer and mask out the low-order two bits
    uintptr_t j = ((uintptr_t)q) & ~((uintptr_t)3u);
    // cast back to a pointer
    int *r = (int *) j;
    // are r and p now equivalent?
    *r = 11; // defined behaviour?
    _Bool b = (r==p);
    printf("x=%i *r=%i (r==p)=%s\n",x,*r,b?"true":"false");
  }

“WIDE refers to storing a pointer in an integer variable of a smaller size.
This is undefined according to the C specification, but may work if you are
able to guarantee that pointers are within a certain range, for example by
allocating memory with malloc and the MAP_32BIT flag. Code using this idiom is
broken by existing implementations, and most likely reflects bugs in the code.
We were surprised to see examples of this in programs that we inspected, but
fortunately it is sufficiently rare that fixing all of the cases would be easy
in these codebases.” This seems sufficiently pathological that we do not
include a question for it.

EXAMPLE (cheri_07_wide.c):
  #include <stdio.h>
  #include <inttypes.h>
  #include <limits.h>
  #include <assert.h>
  int x=1;
  int main() {
    int *p = &x;
    uintptr_t i = (uintptr_t) p;
    assert( i <= UINT_MAX);
    unsigned int j = (unsigned int)i;
    uintptr_t k = (uintptr_t)j;
    int *q = (int *)k;
    *q = 2;
    printf("i=0x%"PRIxPTR" UINT_MAX=0x%x ULONG_MAX=0x%lx\n",
           i,UINT_MAX,ULONG_MAX);
    printf("x=%i *q=%i\n",x,*q);
  }

“LAST WORD refers to accessing an object as aligned words without regard for
the fact that the object’s extent may not include all of the last word. This
is used as an optimization for strlen() in FreeBSD libc. While this is
undefined behavior in C, it works in systems with page-based memory protection
mechanisms, but not in CHERI where objects have byte granularity. We have
found this idiom only in FreeBSD’s libc, as reported by valgrind”. This is the
example we used in §3.3.11.

EXAMPLE (cheri_08_last_word.c):
  #include <assert.h>
  #include <stdio.h>
  #include <inttypes.h>
  char c[5];
  int main() {
    char *cp = &(c[0]);
    assert(sizeof(uint32_t) == 4);
    uint32_t x0 = *((uint32_t *)cp);
    // does this have defined behaviour?
    uint32_t x1 = *((uint32_t *)(cp+4));
    printf("x0=%x x1=%x\n",x0,x1);
  }

6.16 What every C programmer should know about undefined behavior; Lattner;
     Blog post 2011

Part 1 of this three-part blog post by Chris Lattner⁶⁴ discusses how six forms
of undefined behaviour permit desirable compiler optimisation:

• Use of an uninitialized variable
  As we discuss in §3, in ISO C11 this does not always give rise to undefined
  behaviour. The motivation given by Lattner for treating this as undefined
  behaviour would apply equally to a semantics in which reading uninitialised
  variables gives unspecified values.
• Signed integer overflow
• Oversized Shift Amounts
  These two are both integer arithmetic undefined behaviours, which are not
  our focus in this document.
• Dereferences of Wild Pointers and Out of Bounds Array Accesses
• Dereferencing a NULL Pointer
  These are both discussed in the next subsection (§6.17, point 4).
• Violating Type Rules
  This explains the motivation for type-based alias analysis, but for our
  candidate de facto memory model we focus on the -fno-strict-aliasing case.

Part 3 of this series lists some cases where Clang adopts a stronger semantics
than ISO, including:

“2 Arithmetic that operates on undefined values is considered to produce a
undefined value instead of producing undefined behavior.”

“3 Arithmetic that dynamically executes an undefined operation (such as a
signed integer overflow) generates a logical trap value which poisons any
computation based on it, but that does not destroy your entire program.”

6.17 Proposal for a Friendly Dialect of C; Cuoq, Flatt, Regehr; Blog post 2014

This blog post⁶⁵ makes an initial proposal for a more predictable dialect of
C. They write: “As a starting point, we imagine that friendly C is like the
current C standard, but replacing many occurrences of ‘X has undefined
behavior’ with ‘X results in an unspecified value’. That adjustment alone can
produce a much friendlier language. In other cases, we may be forced to refer
to machine-specific details that are not features of the C abstract machine,
and we are OK with that.” and list 14 features, as below. Many of these relate
to integer arithmetic undefined behaviours, which are not our focus in this
document. In the other direction, the blog post does not discuss most of our
memory-model questions.

1 The value of a pointer to an object whose lifetime has ended remains the
  same as it was when the object was alive.
    This would change the ISO “no” to a “yes” for our question in §2.16.1.
2 Signed integer overflow results in two’s complement wrapping behavior at the
  bitwidth of the promoted type.
    Integer arithmetic UB. This could be accommodated in the Cerberus
    semantics with an easy change to the elaboration function.
3 Shift by negative or shift-past-bitwidth produces an unspecified result.
    Integer arithmetic UB. This could be accommodated in the Cerberus
    semantics with an easy change to the elaboration function.
4 Reading from an invalid pointer either traps or produces an unspecified
  value. In particular, all but the most arcane hardware platforms can produce
  a trap when dereferencing a null pointer, and the compiler should preserve
  this behavior.
    See §2.17.2.
    For null pointers, on many platforms one could require them to definitely
    give a runtime failure, as per our question in §2.17.1.
5 Division-related overflows either produce an unspecified result or else a
  machine-specific trap occurs.
    Integer arithmetic UB. This could be accommodated in the Cerberus
    semantics with an easy change to the elaboration function.

⁶⁴ http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
⁶⁵ http://blog.regehr.org/archives/1180 and followup http://blog.regehr.org/archives/1287

6 If possible, we want math- and memory-related traps to be treated as
  externally visible side-effects that must not be reordered with respect to
  other externally visible side-effects (much less be assumed to be
  impossible), but we recognize this may result in significant runtime
  overhead in some cases.
    The impact of 4–6 on optimisations that involve code motion isn’t clear to
    us.
7 The result of any signed left-shift is the same as if the left-hand shift
  argument was cast to unsigned, the shift performed, and the result cast back
  to the signed type.
    Integer arithmetic UB. This could be accommodated in the Cerberus
    semantics with an easy change to the elaboration function.
8 A read from uninitialized storage returns an unspecified value.
    This is our question in §3.1.2. Though exactly how Friendly-C unspecified
    values should behave, e.g. w.r.t. strictness and our other §3.2 questions,
    is not stated.
9 It is permissible to compute out-of-bounds pointer values including
  performing pointer arithmetic on the null pointer. This works as if the
  pointers had been cast to uintptr_t. However, the translation from pointer
  math to integer math is not completely straightforward since incrementing a
  pointer by one is equivalent to incrementing the integer-typed variable by
  the size of the pointed-to type.
    The first part is our question from §2.13.1. The second is handled in
    Cerberus by the elaboration.
10 The strict aliasing rules simply do not exist: the representations of
  integers, floating-point values and pointers can be accessed with different
  types.
    This matches our candidate de facto memory model choice to focus on the
    -fno-strict-aliasing behaviour.
11 A data race results in unspecified behavior. Informally, we expect that the
  result of a data race is the same as in C99: threads are compiled
  independently and then data races have a result that is dictated by the
  details of the underlying scheduler and memory system. Sequentially
  consistent behavior may not be assumed when data races occur.
    This is rather unclear: what does this usage of “unspecified behaviour”
    mean?
12 memcpy() is implemented by memmove(). Additionally, both functions are
  no-ops when asked to copy zero bytes, regardless of the validity of their
  pointer arguments.
    This is a library undefined-behaviour issue; we’ve so far not looked into
    those.
13 The compiler is granted no additional optimization power when it is able to
  infer that a pointer is invalid. In other words, the compiler is obligated
  to assume that any pointer might be valid at any time, and to generate code
  accordingly. The compiler retains the ability to optimize away pointer
  dereferences that it can prove are redundant or otherwise useless.
    The force of this is unclear, especially w.r.t. provenance-based alias
    analysis.
14 When a non-void function returns without returning a value, an unspecified
  result is returned to the caller.
    This is presumably also an easy elaboration change.

6.18 UB Canaries; Regehr; Blog post 2015

This blog post⁶⁶ by John Regehr gives “a collection of canaries for undefined
behavior: little test programs that automate the process of determining
whether a given compiler configuration is willing to exploit particular UBs.”,
together with the results for several versions of GCC and LLVM.

The first two examples (addr_null_p1.c and addr_null_p2.c) test whether one
can use the address of members of a NULL struct pointer in place of offsetof.
We add an example to §2.13.6 for this.

array_oob_p1.c contains a straightforward out-of-bounds array-read undefined
behaviour (the question for the canaries is whether compilers aggressively
exploit that). array_oob_p2.c is similar.

The dangling_pointer_p1.c, dangling_pointer_p2.c, and dangling_pointer_p3.c
examples check whether compilers optimise based on an assumption that an
out-of-lifetime pointer is distinct from another pointer, after the end of a
block scope, a realloc, and a free respectively. See our §2.16.1, where we
give block-end and free tests.

The int_min_mod_minus_1_p1.c tests INT_MIN % -1. Not being a memory object
question, this is not in the scope of this note.

memcpy_overlap_p1.c tests random memcpy’s, presumably to check whether the
compiler exploits the [3, §7.24.2.1] statement that overlapping memcpy’s
(unlike overlapping memmove’s) give undefined behaviour. We could add another
question, asking whether such a memcpy gives a well-defined copy, unspecified
values in the target footprint, or undefined behaviour.

modify_string_literal_p1.c tries to modify a string literal, undefined
behaviour by [3, §6.4.5p7].

pointer_casts_p1.c tries to cast away a const from a pointer and write using
the result.

pointer_casts_p2.c tries to use a non-volatile pointer to mutate a volatile
int; we have not considered volatile in this note.

⁶⁶ http://blog.regehr.org/archives/1234
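
To illustrate the memcpy/memmove distinction that memcpy_overlap_p1.c probes,
the following is a minimal sketch of an overlapping copy within one array. The
helper name shift_right and its capacity parameter are hypothetical, introduced
only for this illustration. Because source and destination overlap whenever
0 < shift < len, the sketch uses memmove; the same call with memcpy would be
undefined behaviour by [3, §7.24.2.1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the first len bytes of buf to start at offset
   shift within the same array of `capacity` bytes. The regions overlap
   whenever 0 < shift < len, so memmove is required here. */
void shift_right(char *buf, size_t len, size_t shift, size_t capacity) {
    /* Precondition: the destination range must fit inside the array. */
    assert(shift <= capacity && len <= capacity - shift);
    memmove(buf + shift, buf, len);
}
```

For an 8-byte array initialised from "abcdef", shift_right(buf, 6, 2, 8)
leaves the bytes a b a b c d e f in the array (with no terminating null).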

shift_by_bitwidth_p1.c tests whether “it’s OK
to shift an integer by its bitwidth and the result is 0”;
an arithmetic property we do not consider in this note.
signed_integer_overflow_p1.c, signed_integer_overflow_p2.c,
signed_left_shift_p1.c, and signed_left_shift_p2.c are similarly outside our
scope here.
strict_aliasing_p1.c is a basic effective-types type
punning question, as in our §4.1.1.
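
For contrast with the pointer-cast style of type punning that such a test
exercises, here is a minimal sketch of the memcpy-based alternative, which
copies the object representation byte by byte and so does not depend on the
effective-type rules. The function name float_bits is hypothetical, and the
example assumes that float is a 32-bit type; the particular bit patterns
observed additionally depend on the floating-point format (IEEE 754 binary32
on most current platforms).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Copy the object representation of f into an integer of the same size.
   memcpy accesses the bytes as character type, unlike reading
   *(uint32_t *)&f, which is the kind of access effective types restrict. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the IEEE 754 assumption, float_bits(1.0f) is 0x3f800000.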
uninitialized_variable_p1.c, uninitialized_variable_p2.c, and
uninitialized_variable_p3.c involve stability, strictness, and control-flow
choices of unspecified values, as in our Questions 50, 51, and 52.
uninitialized_variable_p4.c asks whether a comparison x < INT_MIN, where x is
uninitialised, is guaranteed false. If all operations on unspecified values
give unspecified values (cf. our Question 52) then the answer to this would be
no. uninitialized_variable_p5.c is similar but for >.

References

[1] Programming Languages – C. December 1999. Second edition. ISO/IEC
    9899:1999 (E).
[2] Rationale for international standard – programming languages – C,
    revision 5.10, April 2003.
    www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf.
[3] Programming Languages – C. 2011. ISO/IEC 9899:2011. A non-final but recent
    version is available at www.open-std.org/jtc1/sc22/wg14/docs/n1539.pdf.
[4] Lars Ole Anderson. Program Analysis and Specialization for the C
    Programming Language. PhD thesis, DIKU, University of Copenhagen, 1994.
[5] ARM. Procedure call standard for the ARM architecture, November 2012. ARM
    IHI 0042E, current through ABI release 2.09.
[6] Ryan S. Arnold, Greg Davis, Brian Deitrich, Michael Eager, Emil Medve,
    Steven J. Munroe, Joseph S. Myers, Steve Papacharalambous, Anmol P.
    Paralkar, Katherine Stewart, and Edmar Wienskoski. DRAFT: Power
    Architecture 32-bit Application Binary Interface, Supplement 1.0 - Linux
    and Embedded, April 2011.
[7] M. Batty, K. Memarian, S. Owens, S. Sarkar, and P. Sewell. Clarifying and
    compiling C/C++ concurrency: from C++11 to POWER. In Proceedings of the
    39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming
    languages, POPL ’12, pages 509–520, New York, NY, USA, 2012. ACM.
[8] M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. Mathematizing C++
    concurrency. In Proc. POPL, 2011.
[9] F. Besson, S. Blazy, and P. Wilke. A precise and abstract memory model for
    C using symbolic values. In APLAS, 2014.
[10] F. Besson, S. Blazy, and P. Wilke. A concrete memory model for CompCert.
    In Proc. ITP, 2015.
[11] Paul E. Black and Phillip J. Windley. Inference rules for programming
    languages with side effects in expressions. In Proceedings of the 9th
    International Conference on Theorem Proving in Higher Order Logics,
    TPHOLs ’96, pages 51–60, London, UK, 1996. Springer-Verlag.
[12] Paul E. Black and Phillip J. Windley. Formal verification of secure
    programs in the presence of side effects. In Proceedings of the
    Thirty-First Annual Hawaii International Conference on System Sciences -
    Volume 3, HICSS ’98, pages 327–, Washington, DC, USA, 1998. IEEE Computer
    Society.
[13] Mark Boffinger. Reasoning about C programs. PhD thesis, University of
    Queensland, 1998.
[14] David Chisnall, Colin Rothwell, Brooks Davis, Robert N.M. Watson,
    Jonathan Woodruff, Munraj Vadera, Simon W. Moore, Peter G. Neumann, and
    Michael Roe. Beyond the PDP-11: Processor support for a memory-safe C
    abstract machine. In Proceedings of the Fifteenth Edition of ASPLOS on
    Architectural Support for Programming Languages and Operating Systems,
    New York, NY, USA, 2015. ACM.
[15] E. Cohen, M. Moskal, S. Tobies, and W. Schulte. A precise yet efficient
    memory model for C. Electron. Notes Theor. Comput. Sci. (SSV 2009),
    254:85–103, October 2009.
[16] J. Cook and S. Subramanian. A formal semantics for C in Nqthm. Technical
    Report 517D, Trusted Information Systems, October, 1994.
[17] J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic. Hardbound:
    Architectural support for spatial safety of the C programming language.
    In Proc. ASPLOS, 2008.
[18] C. Ellison and G. Roşu. An executable formal semantics of C with
    applications. In Proc. POPL, 2012.
[19] Daniel Joseph Grossman. Safe Programming at the C Level of Abstraction.
    PhD thesis, Ithaca, NY, USA, 2003. AAI3104470.
[20] Y. Gurevich and J. K. Huggins. The semantics of the C programming
    language. In Proc. CSL ’92, 1993.
[21] C. Hathhorn, C. Ellison, and G. Roşu. Defining the undefinedness of C.
    In Proc. PLDI, 2015.
[22] Charles McEwen Ellison III. A Formal Semantics of C with Applications.
    PhD thesis, University of Illinois at Urbana-Champaign, 2012.
[23] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and
    Y. Wang. Cyclone: A safe dialect of C. In Proc. USENIX ATC, 2002.
[24] Derek M. Jones. The new C standard: An economic and cultural commentary.
    http://www.coding-guidelines.com/cbook/. Accessed 2014-06-16.
[25] J. Kang, C.-K. Hur, W. Mansky, D. Garbuzov, S. Zdancewic, and
    V. Vafeiadis. A formal C memory model supporting integer-pointer casts.
    In Proc. PLDI, 2015.
[26] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David
    Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski,
    Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4:
    Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS
    22nd Symposium on Operating Systems Principles, SOSP ’09, pages 207–220,
    New York, NY, USA, 2009. ACM.
[27] R. Krebbers. Aliasing restrictions of C11 formalized in Coq. In Proc.
    CPP, LNCS 8307, 2013.
[28] R. Krebbers. An operational and axiomatic semantics for non-determinism
    and sequence points in C. In Proc. POPL, 2014.
[29] R. Krebbers. The C standard formalized in Coq. PhD thesis, Radboud
    University Nijmegen, December 2015.
[30] R. Krebbers and F. Wiedijk. Separation logic for non-local control flow
    and block scope variables. In FoSSaCS, 2013.
[31] R. Krebbers and F. Wiedijk. A typed C11 semantics for interactive theorem
    proving. In Proc. CPP, 2015.
[32] Robbert Krebbers, Xavier Leroy, and Freek Wiedijk. Formal C semantics:
    CompCert and the C standard. In Interactive Theorem Proving - 5th
    International Conference, ITP 2014, Held as Part of the Vienna Summer of
    Logic, VSL 2014, Vienna, Austria, July 14-17, 2014. Proceedings, pages
    543–548, 2014.
[33] X. Leroy, A. W. Appel, S. Blazy, and G. Stewart. The CompCert memory
    model, version 2. Research report RR-7987, INRIA, June 2012.

[34] X. Leroy and S. Blazy. Formal verification of a C-like memory model and
    its uses for verifying program transformations. Journal of Automated
    Reasoning, 41(1):1–31, 2008.
[35] Justus Matthiesen. Mathematizing the C programming language, May 2011.
    University of Cambridge Computer Science Tripos Part II project
    dissertation.
[36] Justus Matthiesen. Elaborating C, June 2012. University of Cambridge
    Computer Science ACS MPhil dissertation.
[37] Michael Matz, Jan Hubička, Andreas Jaeger, and Mark Mitchell (Eds.).
    System V Application Binary Interface, AMD64 Architecture Processor
    Supplement, Draft Version 0.99.6, October 2013.
[38] Microsoft. Visual Studio 2013, Aggregates and Unions.
    http://msdn.microsoft.com/en-us/library/9dbwhz68.aspx, 2013. Accessed
    2014-06-16.
[39] R. Morisset, P. Pawan, and F. Zappa Nardelli. Compiler testing via a
    theory of sound optimisations in the C11/C++11 memory model. In Proc.
    PLDI, 2013.
[40] S. Nagarakatte, J. Zhao, M. M.K. Martin, and S. Zdancewic. SoftBound:
    highly compatible and complete spatial memory safety for C. In Proc.
    PLDI, 2009.
[41] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate
    language and tools for analysis and transformation of C programs. In
    Proc. CC, 2002.
[42] N. Nethercote and J. Seward. Valgrind: A framework for heavyweight
    dynamic binary instrumentation. In PLDI, 2007.
[43] M. Norrish. C formalised in HOL. Technical Report UCAM-CL-TR-453,
    U. Cambridge, Computer Laboratory, 1998.
[44] M. Norrish. Deterministic expressions in C. In ESOP, 1999.
[45] Santa Cruz Operation. SYSTEM V APPLICATION BINARY INTERFACE, MIPS RISC
    Processor Supplement, 3rd Edition, February 1996.
[46] N. S. Papaspyrou. A formal semantics for the C programming language. PhD
    thesis, National Technical University of Athens, 1998.
[47] S. Sarkar, K. Memarian, S. Owens, M. Batty, P. Sewell, L. Maranget,
    J. Alglave, and D. Williams. Synchronising C/C++ and POWER. In PLDI ’12:
    Proceedings of the 33rd ACM SIGPLAN conference on Programming Language
    Design and Implementation, pages 311–322. ACM Press, June 2012.
[48] H. Tuch, G. Klein, and M. Norrish. Types, bytes, and separation logic. In
    Proc. POPL, 2007.
[49] Harvey Tuch. Formal Memory Models for Verifying C Systems Code. PhD
    thesis, UNSW, Sydney, Australia, August 2008.
[50] Harvey Tuch and Gerwin Klein. A unified memory model for pointers. In
    Proceedings of the 12th International Conference on Logic for
    Programming, Artificial Intelligence and Reasoning, pages 474–488,
    Montego Bay, Jamaica, December 2005.
[51] J. Ševčík, V. Vafeiadis, F. Zappa Nardelli, S. Jagannathan, and
    P. Sewell. Relaxed-memory concurrency and verified compilation. In
    Proceedings of POPL 2011: the 38th Annual ACM SIGPLAN-SIGACT Symposium on
    Principles of Programming Languages, pages 43–54, 2011.
[52] J. Ševčík, V. Vafeiadis, F. Zappa Nardelli, S. Jagannathan, and
    P. Sewell. CompCertTSO: A verified compiler for relaxed-memory
    concurrency. J. ACM, 60(3), June 2013.
[53] X. Wang, H. Chen, A. Cheung, Z. Jia, N. Zeldovich, and M. F. Kaashoek.
    Undefined behavior: what happened to my code? In Proc. APSYS, 2012.
[54] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama. Towards
    optimization-safe systems: Analyzing the impact of undefined behavior. In
    Proc. SOSP, 2013.
[55] R. N. M. Watson, P. G. Neumann, J. Woodruff, M. Roe, J. Anderson,
    D. Chisnall, B. Davis, A. Joannou, B. Laurie, S. W. Moore, S. J. Murdoch,
    and R. Norton. Capability hardware enhanced RISC instructions: CHERI
    instruction-set architecture. Technical Report UCAM-CL-TR-864, University
    of Cambridge, Computer Laboratory, November 2015.
[56] R. N. M. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson,
    D. Chisnall, N. H. Dave, B. Davis, K. Gudka, B. Laurie, S. J. Murdoch,
    R. Norton, M. Roe, S. Son, and M. Vadera. CHERI: A hybrid
    capability-system architecture for scalable software
    compartmentalization. In IEEE Symposium on Security and Privacy, SP, 2015.
[57] X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs
    in C compilers. In Proc. PLDI, 2011.

Index
besson_blazy_wilke_6.2.c, 55, 157
besson_blazy_wilke_bitfields_1.c, 157
besson_blazy_wilke_bitfields_1u.c, 107, 157
besson_blazy_wilkie_Fig_1_adapted.c, 154

cast_struct_and_first_member_1.c, 72
cast_struct_inter_member_1.c, 69
cast_struct_isomorphic.c, 77
cast_struct_same_prefix.c, 78
cast_union_and_member_1.c, 73
cheri_01_deconst.c, 143, 164
cheri_02_container.c, 164
cheri_03_ii.c, 63, 164
cheri_03_ii_char.c, 63, 164
cheri_05_ia.c, 165
cheri_07_wide.c, 165
cheri_08_last_word.c, 131, 166
compcertMMv2-1.c, 149
compcertMMv2-2.c, 150
compcertMMv2-3.c, 150
compcertMMv2-4.c, 151
compcertMMv2-5.c, 151
compcertMMv2-6.c, 152
compcertTSO-1.c, 149
compcertTSO-2.c, 85

effective_type_1.c, 134
effective_type_10.c, 135
effective_type_2.c, 135
effective_type_3.c, 136
effective_type_4.c, 137
effective_type_5.c, 138
effective_type_6.c, 139
effective_type_7.c, 140
effective_type_8.c, 140
effective_type_9.c, 141

frama-c-2013-03-13-2.c, 92
frama-c-2013-03-13-3-uc.c, 94
frama-c-2013-03-13-3.c, 93

khmgzv-1.c, 159
khmgzv-2.c, 160
klw-itp14-1.c, 153
klw-itp14-2.c, 54, 153
krebbers_biener_1.c, 142

null_pointer_1.c, 60
null_pointer_2.c, 61
null_pointer_3.c, 62
null_pointer_4.c, 86

padding_struct_copy_1.c, 116
padding_struct_copy_2.c, 117
padding_struct_copy_3.c, 120
padding_struct_copy_of_representation_bytes.c, 130
padding_struct_members_copy.c, 119
padding_subunion_1.c, 128
padding_subunion_2.c, 129
padding_unspecified_value_1.c, 122
padding_unspecified_value_2.c, 123
padding_unspecified_value_3.c, 123
padding_unspecified_value_4.c, 124
padding_unspecified_value_5.c, 127
padding_unspecified_value_6.c, 127
padding_unspecified_value_7.c, 125
padding_unspecified_value_8.c, 126
pointer_add_wrap_1.c, 66
pointer_add_wrap_2.c, 67
pointer_arith_algebraic_properties_2_global.c, 33
pointer_arith_algebraic_properties_3_global.c, 34
pointer_comparison_eq_1_auto.c, 51
pointer_comparison_eq_1_global.c, 50
pointer_comparison_eq_2_auto.c, 53
pointer_comparison_eq_2_global.c, 52
pointer_comparison_eq_zombie_1.c, 83
pointer_comparison_eq_zombie_2.c, 84
pointer_comparison_rel_1_auto.c, 57
pointer_comparison_rel_1_global.c, 56, 157
pointer_comparison_rel_different_type_members.c, 59
pointer_comparison_rel_substruct.c, 58
pointer_copy_memcpy.c, 35
pointer_copy_user_ctrlflow_bitwise.c, 39
pointer_copy_user_ctrlflow_bytewise.c, 38
pointer_copy_user_ctrlflow_bytewise_abbrev.c, 38
pointer_copy_user_dataflow_direct_bitwise.c, 40
pointer_copy_user_dataflow_direct_bytewise.c, 35, 149, 153
pointer_copy_user_dataflow_indirect_bytewise.c, 36, 154
pointer_from_concrete_address_1.c, 47
pointer_from_concrete_address_2.c, 48
pointer_offset_constant_8_malloc.c, 28
pointer_offset_from_subtraction_1_auto.c, 27
pointer_offset_from_subtraction_1_global.c, 26
pointer_offset_from_subtraction_1_malloc.c, 28
pointer_offset_from_subtraction_2_auto.c, 31
pointer_offset_from_subtraction_2_global.c, 30
pointer_offset_xor_auto.c, 33
pointer_offset_xor_global.c, 32

pointer_stability_1.c, 49
provenance_basic_auto_xy.c, 10
provenance_basic_auto_yx.c, 9
provenance_basic_global_xy.c, 8
provenance_basic_global_yx.c, 8
provenance_basic_mixed_global_offset+4.c, 41
provenance_basic_mixed_global_offset-4.c, 41
provenance_basic_using_intptr_t_auto_xy.c, 19
provenance_basic_using_intptr_t_auto_yx.c, 18
provenance_basic_using_intptr_t_auto_yx_offset-16.c, 19
provenance_basic_using_intptr_t_global_xy.c, 17
provenance_basic_using_intptr_t_global_xy_offset64.c, 18
provenance_basic_using_intptr_t_global_yx.c, 16
provenance_basic_using_intptr_t_malloc_offset_8.c, 20
provenance_equality_auto_cu_yx_a.c, 14
provenance_equality_auto_cu_yx_b.c, 14
provenance_equality_auto_fn_yx.c, 13
provenance_equality_auto_yx.c, 13
provenance_equality_global_cu_xy_a.c, 13
provenance_equality_global_cu_xy_b.c, 13
provenance_equality_global_fn_xy.c, 12
provenance_equality_global_fn_yx.c, 11
provenance_equality_global_xy.c, 11
provenance_equality_global_yx.c, 11
provenance_equality_uintptr_t_global_xy.c, 22
provenance_equality_uintptr_t_global_yx.c, 22
provenance_multiple_1_global.c, 23
provenance_multiple_2_global.c, 24
provenance_multiple_3_global_yx.c, 24
provenance_multiple_4_global_yx.c, 25
provenance_roundtrip_via_intptr_t.c, 14, 165
provenance_roundtrip_via_unsigned_long.c, 15, 165
provenance_tag_bits_via_uintptr_t_1.c, 21, 157, 165
provenance_union_punning_1_global.c, 42
provenance_union_punning_2_auto_xy.c, 44
provenance_union_punning_2_global_xy.c, 43
provenance_union_punning_2_global_yx.c, 43
provenance_via_io_bytewise_global.c, 45
provenance_via_io_percentp_global.c, 45
provenance_via_io_uintptr_t_global.c, 46

read_union_same_prefix_visible.c, 79
read_via_invalid_1.c, 88

struct_initialise_members.c, 74
struct_inter_submember_1.c, 70

tkn-1.c, 146
tkn-2.c, 146
tkn-3.c, 147
trap_representation_1.c, 90, 95
trap_representation_2.c, 91
trap_representation_3.c, 91

ubc_addr_null_1.c, 71
union_punning_gcc_1.c, 81
union_punning_gcc_2.c, 81
unspecified_value_control_flow_choice.c, 97
unspecified_value_daemonic_1.c, 105
unspecified_value_library_call_argument.c, 96
unspecified_value_representation_bytes_1.c, 107
unspecified_value_representation_bytes_2.c, 111
unspecified_value_representation_bytes_3.c, 113
unspecified_value_representation_bytes_4.c, 109
unspecified_value_stability.c, 98
unspecified_value_strictness_and_1.c, 104
unspecified_value_strictness_int.c, 101, 156
unspecified_value_strictness_mod_1.c, 103
unspecified_value_strictness_mod_2.c, 103
unspecified_value_strictness_unsigned_char.c, 102
unspecified_value_struct_copy.c, 106
unspecified_value_union_1.c, 106
use_struct_isomorphic.c, 75

write_union_same_prefix_visible.c, 80

