Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Savedata is not endianness portable #2693

Open
nurupo opened this issue Feb 17, 2024 · 3 comments
Open

Savedata is not endianness portable #2693

nurupo opened this issue Feb 17, 2024 · 3 comments
Labels
bug Bug fix for the user, not a fix to a build script P1 High priority
Milestone

Comments

@nurupo
Copy link
Member

nurupo commented Feb 17, 2024

Savedata created on a little-endian systems will not load on big-endian systems and the other way around.

I have looked into save_compatibility_test failing on s390x and found a couple of issues that I would like to document:

  1. Toxcore uses little-endian to host and host to little-endian conversion functions. For example, lendian_bytes_to_host32() in tox_load():

    c-toxcore/toxcore/tox.c

    Lines 704 to 709 in 710eb67

    memcpy(data32, data, sizeof(uint32_t));
    lendian_bytes_to_host32(data32 + 1, data + sizeof(uint32_t));
    if (data32[0] != 0 || data32[1] != STATE_COOKIE_GLOBAL) {
    return -1;
    }

    They function such that if WORDS_BIGENDIAN is defined, they assume the host is big-endian, otherwise it's little endian:

    c-toxcore/toxcore/state.c

    Lines 127 to 136 in 710eb67

    void lendian_bytes_to_host32(uint32_t *dest, const uint8_t *lendian)
    {
    uint32_t d;
    memcpy(&d, lendian, sizeof(uint32_t));
    #ifdef WORDS_BIGENDIAN
    d = ((d << 8) & 0xFF00FF00) | ((d >> 8) & 0xFF00FF);
    d = (d << 16) | (d >> 16);
    #endif /* WORDS_BIGENDIAN */
    *dest = d;
    }

    However, WORDS_BIGENDIAN is never defined by anything on a big-endian system, so those functions always assume the host is little-endian and just memcpy() the data as is, without any conversion. This results in little-endian systems storing and reading integers in the little-endian order, and big-endian systems storing and reading integers in the big-endian order.

    Trying to load a savedata created on amd64 on a s390x system will result in the tox_load() code linked above returning -1 on line 708 as the savedata file's magic number won't match due to the wrong endianness, with tox_new() returning TOX_ERR_NEW_LOAD_BAD_FORMAT to the user, as evident by save_compatibility_test failing on s390s with:

    ./auto_tests/save_compatibility_test.c:87: failed `tox': failed to create tox, error number: 9
    

    size_t index = 0;
    Tox_Err_New err;
    Tox *tox = tox_new_log(&options, &err, &index);
    ck_assert_msg(tox, "failed to create tox, error number: %d", err);

  2. If we fix the previous issue by defining WORDS_BIGENDIAN on a big-endian system, e.g. by adding the following to CMakeLists.txt:

    include(TestBigEndian)
    test_big_endian(IS_BIG_ENDIAN)
    if(IS_BIG_ENDIAN)
      add_definitions(-DWORDS_BIGENDIAN)
    endif()

    then the magic number matches and the code proceeds further in parsing the savedata, however the s390x save_compatibility_test then fails with:

    [#0] ERROR state.c:41   state_load:     state file garbled: 01ce != 01ce
    

    uint32_t cookie_type;
    lendian_bytes_to_host32(&cookie_type, data + sizeof(uint32_t));
    data += size_head;
    length -= size_head;
    if (length < length_sub) {
    /* file truncated */
    LOGGER_ERROR(log, "state file too short: %u < %u", length, length_sub);
    return -1;
    }
    if (lendian_to_host16(cookie_type >> 16) != cookie_inner) {
    /* something is not matching up in a bad way, give up */
    LOGGER_ERROR(log, "state file garbled: %04x != %04x", cookie_type >> 16, cookie_inner);
    return -1;
    }
    const uint16_t type = lendian_to_host16(cookie_type & 0xFFFF);

    The issue here is that the code does the same endianness conversion twice. On line 28 it converts little-endian to host 32-bit. This will convert the little-endian to the big-endian representation. But then on the line 39 it performs a little-endian to host conversion again, mistakenly converting those 16 bits from the host endianness (which is big-endian! this is why calling lendian_to_host16() on it produces non-portable result, as it's not lendian to begin with) to the little endian. Then the comparison fails because left side is little-endian now and the right side is big-endian. (Then there are the lower 16-bits being converted on like 45, which also is unnecessary and will produce an incorrect result).

    The saving code does the conversion twice too:

    host_to_lendian_bytes32(data, (host_to_lendian16(cookie_type) << 16) | host_to_lendian16(section_type));

    which are no-ops on little endian but produce unexpected result on big-endian.

Fixing this would break the savedata format on big-endian systems, changing it in non backwards-compatible way. There have been talk of switching savedata to using msgpack, which would also be a breaking change, so it might make sense to keep the current broken behavior for now and do all the savedata breaking changes together.

Here is a small snippet to reproduce the issue
$ git checkout 710eb674a50f17201bb7556d389d82d2133d98de
$ sudo docker run -it --rm -v "$PWD":/toxcore debian:bookworm /bin/bash
# dpkg --add-architecture s390x
# apt-get update
# apt-get install -y --no-install-recommends binutils-s390x-linux-gnu gcc-s390x-linux-gnu g++-s390x-linux-gnu qemu-user-static
# apt-get install -y libsodium-dev:s390x libopus-dev:s390x libvpx-dev:s390x pkg-config:s390x
# apt-get install -y cmake make
# echo "
    SET(CMAKE_SYSTEM_NAME Linux)

    SET(CMAKE_C_COMPILER   /usr/bin/s390x-linux-gnu-gcc)
    SET(CMAKE_CXX_COMPILER /usr/bin/s390x-linux-gnu-g++)
    SET(CMAKE_AR           /usr/bin/s390x-linux-gnu-ar)

    SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
    SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
    SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

    SET(CROSSCOMPILING_EMULATOR /usr/bin/qemu-s390x-static)
" > /usr/local/share/s390x-linux-gnu.cmake
# export PKG_CONFIG_LIBDIR=/usr/lib/s390x-linux-gnu/pkgconfig:/usr/share/pkgconfig
# cp -a /toxcore/ ~/toxcore
# cd ~/toxcore/
# rm -rf _build/
# cmake -B_build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/usr/local/share/s390x-linux-gnu.cmake -DENABLE_SHARED=ON -DENABLE_STATIC=OFF -DAUTOTEST=ON -DBOOTSTRAP_DAEMON=OFF
# cmake --build _build --parallel "$(nproc)" --target auto_save_compatibility_test
# cd _build/
# ./auto_tests/auto_save_compatibility_test 
@nurupo nurupo added bug Bug fix for the user, not a fix to a build script P1 High priority labels Feb 17, 2024
@nurupo nurupo added this to the v0.3.0 milestone Feb 17, 2024
@iphydf
Copy link
Member

iphydf commented Feb 17, 2024

Why 0.3.0? Can this not be fixed earlier?

@iphydf
Copy link
Member

iphydf commented Feb 17, 2024

I see, your comment says big endian systems will break. True, but does anyone actually use toxcore on that? Is it worth delaying the fix by months/years?

nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
@nurupo nurupo modified the milestones: v0.3.0, v0.2.x Feb 17, 2024
@nurupo
Copy link
Member Author

nurupo commented Feb 17, 2024

@iphydf well, that was before we had the discussion that it could be done without possibly breaking things and with Tox_Options flag(s).

There you go, updated the milestone. Not sure if v0.2.x or v0.2.20 is more appropriate, feel free to change it if I got it wrong.

nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 18, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
@iphydf iphydf modified the milestones: v0.2.x, v0.2.20 Feb 21, 2024
@iphydf iphydf modified the milestones: v0.2.20, v0.2.21 Nov 6, 2024
@iphydf iphydf modified the milestones: v0.2.21, v0.2.x Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fix for the user, not a fix to a build script P1 High priority
Projects
None yet
Development

No branches or pull requests

2 participants