Found while working on a custom Debian testing-based container (using C compiler: ‘Debian clang version 19.1.7 (1+b1)’) with Clang sanitizers enabled for #6746:
When trying to access the contents of a zero-length vector using INTEGER(...) or REAL(...) or another accessor, R may return an invalid pointer (0x1). The C standard says that passing an invalid pointer to memcpy() is undefined behaviour, even though in practice nothing breaks (memcpy sees n=0 and doesn't dereference it).
If nothing breaks, what's the risk? One of the CRAN special checks (clang-UBSAN or 0len) might pick this up too. It is far-fetched, but a compiler might also optimize away a chunk of code deemed to cause undefined behaviour.
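One possible way to avoid handing such a pointer to memcpy() is to skip the call when there is nothing to copy. The sketch below is only an illustration: memcpy_nonempty is a hypothetical helper name, not something that exists in data.table or in the R API, and I have not checked whether this alone satisfies the sanitizer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of data.table or R): copy n bytes only when
 * n > 0, so that a possibly invalid pointer obtained from a zero-length
 * vector is never passed on to memcpy(). */
static void *memcpy_nonempty(void *dest, const void *src, size_t n)
{
    if (n > 0) {
        memcpy(dest, src, n);
    }
    return dest;
}
```

At a call site such as the one in copyAsPlain, an equivalent and arguably safer form would be to guard the whole statement, e.g. `if (n) memcpy(INTEGER(ans), INTEGER(x), n*sizeof(int));`, so that the accessors are not even evaluated for a zero-length vector.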
Running test id 173.1
dogroups.c:541:39: runtime error: load of misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
#0 0x7f2efbaec116 in growVector /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c
#1 0x7f2efbae90d5 in dogroups /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c:409:66
(gdb) frame 4
#4 0x00007fc6d4aec117 in growVector (x=0x52500661b038, newlen=newlen@entry=3) at dogroups.c:543
543 case CPLXSXP: memcpy(COMPLEX(newx), COMPLEX(x), len*SIZEOF(x)); break;
(gdb) p Rf_xlength(x)
$4 = 0
(gdb) p Rf_xlength(newx)
$5 = 3
(gdb) call Rf_PrintValue(R_GlobalContext->call)
`[.data.table`(DT, , B[B > 3], by = A)
Running test id 893.5
utils.c:233:12: runtime error: store to misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
#0 0x7f2efbbfd5f9 in copyAsPlain /work/data.table.Rcheck/00_pkg_src/data.table/src/utils.c:233:5
#1 0x7f2efbbefbf9 in subsetDT /work/data.table.Rcheck/00_pkg_src/data.table/src/subset.c:317:30
(gdb) frame 4
#4 0x00007fc6d4bfd5fa in copyAsPlain (x=x@entry=0x525003c44b68) at utils.c:233
233 memcpy(INTEGER(ans), INTEGER(x), n*sizeof(int)); // covered by 10:1 after test 178
(gdb) p Rf_xlength(x)
$8 = 0
(gdb) call Rf_PrintValue(R_GlobalContext->call)
`[.data.table`(head(DT, nr), , seq_len(if (nc == 0) ncol(DT) else nc),
with = FALSE)
Running test id 2150.21
dogroups.c:540:39: runtime error: load of misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
#0 0x7f2efbaec116 in growVector /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c
#1 0x7f2efbb46d4e in allocateDT /work/data.table.Rcheck/00_pkg_src/data.table/src/freadR.c:501:36
#2 0x7f2efbb2f967 in freadMain /work/data.table.Rcheck/00_pkg_src/data.table/src/fread.c:2666:7
#3 0x7f2efbb42306 in freadR /work/data.table.Rcheck/00_pkg_src/data.table/src/freadR.c:222:3
(gdb) frame 4
#4 0x00007fc6d4aec117 in growVector (x=x@entry=0x525004e417e8, newlen=newlen@entry=1024)
at dogroups.c:543
543 case CPLXSXP: memcpy(COMPLEX(newx), COMPLEX(x), len*SIZEOF(x)); break;
(gdb) p Rf_xlength(x)
$11 = 0
(gdb) call Rf_PrintValue(R_GlobalContext->call)
fread("c1\n2018-01-31 03:16:57")
(Yes, that case CPLXSXP: looks a bit strange. clang must have merged the branches into one with different length arguments to memcpy().)