growVector, copyAsPlain may formally cause undefined behaviour with 0-length vectors #6819

@aitap

Found while working on a custom Debian testing-based container (using C compiler: ‘Debian clang version 19.1.7 (1+b1)’) with Clang sanitizers enabled for #6746:

When the contents of a zero-length vector are accessed using INTEGER(...), REAL(...), or another accessor, R may return an invalid pointer (0x1). The C standard says that passing an invalid pointer to memcpy() is undefined behaviour even when n=0, although in practice nothing breaks (memcpy sees n=0 and doesn't dereference the pointer).

If nothing breaks, what's the risk? One of the CRAN special checks (clang-UBSAN or 0len) might pick this up too. More far-fetched: a compiler might optimize away a chunk of code it deems to cause undefined behaviour.
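One way to sidestep the problem is to skip the copy entirely when there is nothing to copy. The following is only a minimal sketch in plain C, without R's headers: `grow_int_buffer` is a hypothetical stand-in for the pattern used in growVector, not data.table's actual code, and it uses `calloc` where the real code allocates an R vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the growVector pattern (not data.table code):
 * allocate a zero-filled buffer of newlen ints and copy the first len
 * elements of old into it. When len == 0, old is never passed to memcpy(),
 * so it does not matter whether it is a valid pointer. */
static int *grow_int_buffer(const int *old, size_t len, size_t newlen)
{
    assert(newlen >= len);                 /* growing, never shrinking */
    int *grown = calloc(newlen, sizeof *grown);
    if (grown == NULL) return NULL;        /* allocation failed */
    if (len > 0)                           /* guard: skip memcpy for empty input */
        memcpy(grown, old, len * sizeof *grown);
    return grown;
}
```

In the R code the equivalent guard would presumably go around the whole `memcpy(INTEGER(newx), INTEGER(x), ...)` call, so that the accessor is not called on the zero-length vector at all; whether that alone satisfies the sanitizer has not been checked here.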

Running test id 173.1

dogroups.c:541:39: runtime error: load of misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
    #0 0x7f2efbaec116 in growVector /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c
    #1 0x7f2efbae90d5 in dogroups /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c:409:66

(gdb) frame 4
#4  0x00007fc6d4aec117 in growVector (x=0x52500661b038, newlen=newlen@entry=3) at dogroups.c:543
543       case CPLXSXP: memcpy(COMPLEX(newx), COMPLEX(x), len*SIZEOF(x)); break;
(gdb) p Rf_xlength(x)
$4 = 0
(gdb) p Rf_xlength(newx)
$5 = 3
(gdb) call Rf_PrintValue(R_GlobalContext->call)
`[.data.table`(DT, , B[B > 3], by = A)

Running test id 893.5

utils.c:233:12: runtime error: store to misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
    #0 0x7f2efbbfd5f9 in copyAsPlain /work/data.table.Rcheck/00_pkg_src/data.table/src/utils.c:233:5
    #1 0x7f2efbbefbf9 in subsetDT /work/data.table.Rcheck/00_pkg_src/data.table/src/subset.c:317:30

(gdb) frame 4
#4  0x00007fc6d4bfd5fa in copyAsPlain (x=x@entry=0x525003c44b68) at utils.c:233
233         memcpy(INTEGER(ans), INTEGER(x), n*sizeof(int));             // covered by 10:1 after test 178
(gdb) p Rf_xlength(x)
$8 = 0
(gdb) call Rf_PrintValue(R_GlobalContext->call)
`[.data.table`(head(DT, nr), , seq_len(if (nc == 0) ncol(DT) else nc),
    with = FALSE)

Running test id 2150.21

dogroups.c:540:39: runtime error: load of misaligned address 0x000000000001 for type 'int *', which requires 4 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
    #0 0x7f2efbaec116 in growVector /work/data.table.Rcheck/00_pkg_src/data.table/src/dogroups.c
    #1 0x7f2efbb46d4e in allocateDT /work/data.table.Rcheck/00_pkg_src/data.table/src/freadR.c:501:36
    #2 0x7f2efbb2f967 in freadMain /work/data.table.Rcheck/00_pkg_src/data.table/src/fread.c:2666:7
    #3 0x7f2efbb42306 in freadR /work/data.table.Rcheck/00_pkg_src/data.table/src/freadR.c:222:3

(gdb) frame 4
#4  0x00007fc6d4aec117 in growVector (x=x@entry=0x525004e417e8, newlen=newlen@entry=1024)
    at dogroups.c:543
543       case CPLXSXP: memcpy(COMPLEX(newx), COMPLEX(x), len*SIZEOF(x)); break;
(gdb) p Rf_xlength(x)
$11 = 0
(gdb) call Rf_PrintValue(R_GlobalContext->call)
fread("c1\n2018-01-31 03:16:57")

(Yes, that case CPLXSXP: looks a bit strange. clang must have merged the branches into one with different length arguments to memcpy().)
