Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors while running container built on Docker Mac for armhf #6

Open
leshik opened this issue Aug 7, 2019 · 15 comments
Open

Errors while running container built on Docker Mac for armhf #6

leshik opened this issue Aug 7, 2019 · 15 comments
Assignees

Comments

@leshik
Copy link

leshik commented Aug 7, 2019

Consider the following Dockerfile to be built for armhf as an example:

FROM debian:buster-slim

env USER root

RUN apt-get update \
 && apt-get install --no-install-recommends -y tightvncserver xfonts-base \
 && mkdir /root/.vnc \
 && echo test | vncpasswd -f >/root/.vnc/passwd \
 && chmod 600 /root/.vnc/passwd

CMD vncserver

If you then try to run the resulting image, either on Docker Mac or on Raspberry Pi, VNC won't start, complaining:

Fatal server error:
could not open default font 'fixed'

If the image were built on Raspberry Pi, there would be no error and VNC would start successfully. The image built on Mac can be fixed on Raspberry Pi by manually executing dpkg-reconfigure xfonts-base before launching vncserver (might be somehow related to fontconfig.) There is no fix that works in QEMU, however. ARM64 doesn't suffer from this issue.

Sorry for not having a simpler example, this one was taken from the real world scenario.

@tuonga
Copy link
Collaborator

tuonga commented Aug 9, 2019

@leshik I don't think this is a QEMU problem. I believe the issue here is the missing fonts.dir file in /usr/share/fonts/X11/misc. Suggest you run the mkfontdir command which should create fonts.dir.

/usr/share/fonts/X11/misc# mkfontdir

@leshik
Copy link
Author

leshik commented Aug 9, 2019

@tuonga maybe, but the problem here that it breaks only in QEMU. On real ARM hardware the same Dockerfile works correctly, be it armhf or any other.

@tuonga
Copy link
Collaborator

tuonga commented Aug 10, 2019

@leshik there are obviously differences between QEMU and real hardware. I guess the main question is what's keeping the fonts.dir file from being created. Since you mentioned OSX also requires manual intervention, there may be other factors at play.

@leshik
Copy link
Author

leshik commented Aug 10, 2019

@tuonga yes, what can I do to help troubleshooting the issue?

ps: I also have another one where the pre-built java app doesn't work if packed for ARM on Docker Mac, but it's probably different and not that easy to reproduce, so let's figure out what's wrong with this one first.

@leshik
Copy link
Author

leshik commented Aug 10, 2019

@tuonga looks like it's /usr/bin/mkfontscale which fails to generate the correct fonts.dir file. As it's ELF binary and strace doesn't work in QEMU I'm not sure what to do next.

@tuonga
Copy link
Collaborator

tuonga commented Aug 10, 2019

@leshik can you try adding mkfontdir to your startup sequence and see if that works around the issue? There is also a new version of QEMU coming out within a week or so. Let's test again and see what happens.

PS: may have to debug mkfontscale from source.

@leshik
Copy link
Author

leshik commented Aug 10, 2019

@tuonga adding to startup doesn't help, it's not generating anything under QEMU (but does on real hardware).

@tuonga
Copy link
Collaborator

tuonga commented Aug 10, 2019

@leshik That's interesting. I was able to get vncserver to run after running mkfontdir. Can you try manually from the command line? The following sequence worked for me.

$ docker run -it timtsai2018/hello bash
root@8e4dff74eac6:/# cd /usr/share/fonts/X11/misc/
root@8e4dff74eac6:/usr/share/fonts/X11/misc# vncserver
/usr/bin/xauth:  file /root/.Xauthority does not exist
Couldn't start Xtightvnc; trying default font path.
Please set correct fontPath in the vncserver script.
Couldn't start Xtightvnc process.

10/08/19 05:04:41 Xvnc version TightVNC-1.3.9
10/08/19 05:04:41 Copyright (C) 2000-2007 TightVNC Group
10/08/19 05:04:41 Copyright (C) 1999 AT&T Laboratories Cambridge
10/08/19 05:04:41 All Rights Reserved.
10/08/19 05:04:41 See http://www.tightvnc.com/ for information on TightVNC
10/08/19 05:04:41 Desktop name 'X' (8e4dff74eac6:1)
10/08/19 05:04:41 Protocol versions supported: 3.3, 3.7, 3.8, 3.7t, 3.8t
10/08/19 05:04:41 Listening for VNC connections on TCP port 5901
Font directory '/usr/share/fonts/X11/Type1/' not found - ignoring
Font directory '/usr/share/fonts/X11/75dpi/' not found - ignoring
Font directory '/usr/share/fonts/X11/100dpi/' not found - ignoring

Fatal server error:
could not open default font 'fixed'
10/08/19 05:04:43 Xvnc version TightVNC-1.3.9
10/08/19 05:04:43 Copyright (C) 2000-2007 TightVNC Group
10/08/19 05:04:43 Copyright (C) 1999 AT&T Laboratories Cambridge
10/08/19 05:04:43 All Rights Reserved.
10/08/19 05:04:43 See http://www.tightvnc.com/ for information on TightVNC
10/08/19 05:04:43 Desktop name 'X' (8e4dff74eac6:1)
10/08/19 05:04:43 Protocol versions supported: 3.3, 3.7, 3.8, 3.7t, 3.8t
10/08/19 05:04:43 Listening for VNC connections on TCP port 5901
Font directory '/usr/share/fonts/X11/Speedo/' not found - ignoring
Font directory '/usr/share/fonts/X11/Type1/' not found - ignoring
Font directory '/usr/share/fonts/X11/75dpi/' not found - ignoring
Font directory '/usr/share/fonts/X11/100dpi/' not found - ignoring

Fatal server error:
could not open default font 'fixed'
 
root@8e4dff74eac6:/usr/share/fonts/X11/misc# mkfontdir
root@8e4dff74eac6:/usr/share/fonts/X11/misc# vncserver

New 'X' desktop is 8e4dff74eac6:1

Creating default startup script /root/.vnc/xstartup
Starting applications specified in /root/.vnc/xstartup
Log file is /root/.vnc/8e4dff74eac6:1.log

root@8e4dff74eac6:/usr/share/fonts/X11/misc# uname -a
Linux 8e4dff74eac6 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 armv7l GNU/Linux

@leshik
Copy link
Author

leshik commented Aug 10, 2019

@tuonga

Oh, that's weird. Turned out it might be not QEMU issue but Docker Mac or moby instead. Something strange is happening. I was able to reproduce it with much simpler setup, here is the Dockerfile:

FROM debian:buster-slim
RUN apt-get update \
 && apt-get install --no-install-recommends -y xfonts-base

If you build an image with it, then run it and switch to /usr/share/fonts/X11/misc, you'll find there is no fonts.dir file. In this case, running mkfontdir manually fixes the issue.

BUT !!!

If you start just from bare image (docker run -it --platform armhf debian:buster-slim), then do all the steps manually (i.e. installing xfonts-base), then no matter how many times you run mkfontdir, it always generates an incorrect fonts.dir file which is 2-bytes in length and contains just 0 and the line feed.

@tuonga
Copy link
Collaborator

tuonga commented Aug 10, 2019

@leshik good info. I can take a look next week.

tuonga pushed a commit that referenced this issue Aug 15, 2019
When emulating irqchip in qemu, such as following command:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -smp 4 -hda /home/test/test.img
-machine kernel-irqchip=off --enable-kvm -vnc :0 -device edu -monitor stdio

We will get a crash with following asan output:

(qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index 35 out of bounds for type 'int [24]'
=================================================================
==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61b000003114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
WRITE of size 4 at 0x61b000003114 thread T4
    #0 0x5579e3c7a80e in ioapic_eoi_broadcast /home/test/qemu5/qemu/hw/intc/ioapic.c:266
    #1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
    #2 0x5579e3c720a7 in apic_mem_write /home/test/qemu5/qemu/hw/intc/apic.c:802
    #3 0x5579e3b1e31a in memory_region_write_accessor /home/test/qemu5/qemu/memory.c:503
    #4 0x5579e3b1e6a2 in access_with_adjusted_size /home/test/qemu5/qemu/memory.c:569
    #5 0x5579e3b28d77 in memory_region_dispatch_write /home/test/qemu5/qemu/memory.c:1497
    #6 0x5579e3a1b36b in flatview_write_continue /home/test/qemu5/qemu/exec.c:3323
    #7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
    #8 0x5579e3a1bcb1 in address_space_write /home/test/qemu5/qemu/exec.c:3452
    #9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
    #10 0x5579e3b8b979 in kvm_cpu_exec /home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
    #11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn /home/test/qemu5/qemu/cpus.c:1287
    #12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
    #13 0x7fd0146376da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e

This is because in ioapic_eoi_broadcast function, we uses 'vector' to
index the 's->irq_eoi'. To fix this, we should uses the irq number.

Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
tuonga pushed a commit that referenced this issue Aug 15, 2019
The test aarch64 kernel is in an array defined with
 unsigned char aarch64_kernel[] = { [...] }

which means it could be any size; currently it's quite small.
However we write it to a file using init_bootfile(), which
writes exactly 512 bytes to the file. This will break if
we ever end up with a kernel larger than that, and will
read garbage off the end of the array in the current setup
where the kernel is smaller.

Make init_bootfile() take an argument giving the length of
the data to write. This allows us to use it for all architectures
(previously s390 had a special-purpose init_bootfile_s390x
which hardcoded the file to write so it could write the
correct length). We assert that the x86 bootfile really is
exactly 512 bytes as it should be (and as we were previously
just assuming it was).

This was detected by the clang-7 asan:
==15607==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a796f51d20 at pc 0x55a796b89c2f bp 0x7ffc58e89160 sp 0x7ffc58e88908
READ of size 512 at 0x55a796f51d20 thread T0
    #0 0x55a796b89c2e in fwrite (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0xb0c2e)
    #1 0x55a796c46492 in init_bootfile /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:99:5
    #2 0x55a796c46492 in test_migrate_start /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:593
    #3 0x55a796c44101 in test_baddest /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:854:9
    #4 0x7f906ffd3cc9  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72cc9)
    #5 0x7f906ffd3bfa  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
    #6 0x7f906ffd3bfa  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
    #7 0x7f906ffd3ea1 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ea1)
    #8 0x7f906ffd3ec0 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ec0)
    #9 0x55a796c43707 in main /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:1187:11
    #10 0x7f906e9abb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
    #11 0x55a796b6c2d9 in _start (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0x932d9)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
tuonga pushed a commit that referenced this issue Aug 15, 2019
Reading the RX_DATA register when the RX_FIFO is empty triggers
an abort. This can be easily reproduced:

  $ qemu-system-arm -M emcraft-sf2 -monitor stdio -S
  QEMU 4.0.50 monitor - type 'help' for more information
  (qemu) x 0x40001010
  Aborted (core dumped)

  (gdb) bt
  #1  0x00007f035874f895 in abort () at /lib64/libc.so.6
  #2  0x00005628686591ff in fifo8_pop (fifo=0x56286a9a4c68) at util/fifo8.c:66
  #3  0x00005628683e0b8e in fifo32_pop (fifo=0x56286a9a4c68) at include/qemu/fifo32.h:137
  #4  0x00005628683e0efb in spi_read (opaque=0x56286a9a4850, addr=4, size=4) at hw/ssi/mss-spi.c:168
  #5  0x0000562867f96801 in memory_region_read_accessor (mr=0x56286a9a4b60, addr=16, value=0x7ffeecb0c5c8, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
  #6  0x0000562867f96cdb in access_with_adjusted_size (addr=16, value=0x7ffeecb0c5c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x562867f967c3 <memory_region_read_accessor>, mr=0x56286a9a4b60, attrs=...) at memory.c:569
  #7  0x0000562867f99940 in memory_region_dispatch_read1 (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1420
  #8  0x0000562867f99a08 in memory_region_dispatch_read (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1447
  #9  0x0000562867f38721 in flatview_read_continue (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, addr1=16, l=4, mr=0x56286a9a4b60) at exec.c:3385
  #10 0x0000562867f38874 in flatview_read (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3423
  #11 0x0000562867f388ea in address_space_read_full (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3436
  #12 0x0000562867f389c5 in address_space_rw (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=false) at exec.c:3466
  #13 0x0000562867f3bdd7 in cpu_memory_rw_debug (cpu=0x56286aa19d00, addr=1073745936, buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=0) at exec.c:3976
  #14 0x000056286811ed51 in memory_dump (mon=0x56286a8c32d0, count=1, format=120, wsize=4, addr=1073745936, is_physical=0) at monitor/misc.c:730
  #15 0x000056286811eff1 in hmp_memory_dump (mon=0x56286a8c32d0, qdict=0x56286b15c400) at monitor/misc.c:785
  #16 0x00005628684740ee in handle_hmp_command (mon=0x56286a8c32d0, cmdline=0x56286a8caeb2 "0x40001010") at monitor/hmp.c:1082

From the datasheet "Actel SmartFusion Microcontroller Subsystem
User's Guide" Rev.1, Table 13-3 "SPI Register Summary", this
register has a reset value of 0.

Check the FIFO is not empty before accessing it, else log an
error message.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
tuonga pushed a commit that referenced this issue Aug 15, 2019
In the previous commit we fixed a crash when the guest read a
register that pop from an empty FIFO.
By auditing the repository, we found another similar use with
an easy way to reproduce:

  $ qemu-system-aarch64 -M xlnx-zcu102 -monitor stdio -S
  QEMU 4.0.50 monitor - type 'help' for more information
  (qemu) xp/b 0xfd4a0134
  Aborted (core dumped)

  (gdb) bt
  #0  0x00007f6936dea57f in raise () at /lib64/libc.so.6
  #1  0x00007f6936dd4895 in abort () at /lib64/libc.so.6
  #2  0x0000561ad32975ec in xlnx_dp_aux_pop_rx_fifo (s=0x7f692babee70) at hw/display/xlnx_dp.c:431
  #3  0x0000561ad3297dc0 in xlnx_dp_read (opaque=0x7f692babee70, offset=77, size=4) at hw/display/xlnx_dp.c:667
  #4  0x0000561ad321b896 in memory_region_read_accessor (mr=0x7f692babf620, addr=308, value=0x7ffe05c1db88, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
  #5  0x0000561ad321bd70 in access_with_adjusted_size (addr=308, value=0x7ffe05c1db88, size=1, access_size_min=4, access_size_max=4, access_fn=0x561ad321b858 <memory_region_read_accessor>, mr=0x7f692babf620, attrs=...) at memory.c:569
  #6  0x0000561ad321e9d5 in memory_region_dispatch_read1 (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1420
  #7  0x0000561ad321ea9d in memory_region_dispatch_read (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1447
  #8  0x0000561ad31bd742 in flatview_read_continue (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1, addr1=308, l=1, mr=0x7f692babf620) at exec.c:3385
  #9  0x0000561ad31bd895 in flatview_read (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3423
  #10 0x0000561ad31bd90b in address_space_read_full (as=0x561ad5bb3020, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3436
  #11 0x0000561ad33b1c42 in address_space_read (len=1, buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", attrs=..., addr=4249485620, as=0x561ad5bb3020) at include/exec/memory.h:2131
  #12 0x0000561ad33b1c42 in memory_dump (mon=0x561ad59c4530, count=1, format=120, wsize=1, addr=4249485620, is_physical=1) at monitor/misc.c:723
  #13 0x0000561ad33b1fc1 in hmp_physical_memory_dump (mon=0x561ad59c4530, qdict=0x561ad6c6fd00) at monitor/misc.c:795
  #14 0x0000561ad37b4a9f in handle_hmp_command (mon=0x561ad59c4530, cmdline=0x561ad59d0f22 "/b 0x00000000fd4a0134") at monitor/hmp.c:1082

Fix by checking the FIFO is not empty before popping from it.

The datasheet is not clear about the reset value of this register,
we choose to return '0'.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
tuonga pushed a commit that referenced this issue Aug 15, 2019
When creating the admin queue in nvme_init() the variable that
holds the number of queues created is modified before actual
queue creation. This is a problem because if creating the queue
fails then the variable is left in inconsistent state. This was
actually observed when I tried to hotplug a nvme disk. The
control got to nvme_file_open() which called nvme_init() which
failed and thus nvme_close() was called which in turn called
nvme_free_queue_pair() with queue being NULL. This lead to an
instant crash:

  #0  0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164
  #1  0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729
  #2  0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781
  #3  0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291
  #4  0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551
  #5  0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063
  #6  0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712
  #7  0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011
  #8  0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156
  #9  0x000055d9507cb635 in blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389
  #10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602

Signed-off-by: Michal Privoznik <[email protected]>
Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
tuonga pushed a commit that referenced this issue Aug 15, 2019
commit a6f230c move blockbackend back to main AioContext on unplug. It set the AioContext of
SCSIDevice to the main AioContex, but s->ctx is still the iothread AioContex(if the scsi controller
is configure with iothread). So if there are having in-flight requests during unplug, a failing assertion
happend. The bt is below:
(gdb) bt
#0  0x0000ffff86aacbd0 in raise () from /lib64/libc.so.6
#1  0x0000ffff86aadf7c in abort () from /lib64/libc.so.6
#2  0x0000ffff86aa6124 in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000ffff86aa61a4 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000529118 in virtio_scsi_ctx_check (d=<optimized out>, s=<optimized out>, s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:246
#5  0x0000000000529ec4 in virtio_scsi_handle_cmd_req_prepare (s=0x2779ec00, req=0xffff740397d0) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:559
#6  0x000000000052a228 in virtio_scsi_handle_cmd_vq (s=0x2779ec00, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:603
#7  0x000000000052afa8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:59
#8  0x000000000054d94c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>) at /home/qemu-4.0.0/hw/virtio/virtio.c:2452

assert(blk_get_aio_context(d->conf.blk) == s->ctx) failed.

To avoid assertion failed,  moving the "if" after qdev_simple_device_unplug_cb.

In addition, to avoid another qemu crash below, add aio_disable_external before
qdev_simple_device_unplug_cb, which disable the further processing of external clients
when doing qdev_simple_device_unplug_cb.
(gdb) bt
#0  scsi_req_unref (req=0xffff6802c6f0) at hw/scsi/scsi-bus.c:1283
#1  0x00000000005294a4 in virtio_scsi_handle_cmd_req_submit (req=<optimized out>,
    s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:589
#2  0x000000000052a2a8 in virtio_scsi_handle_cmd_vq (s=s@entry=0x9c90e90,
    vq=vq@entry=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:625
#3  0x000000000052afd8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>,
    vq=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:60
#4  0x000000000054d97c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>)
    at /home/qemu-4.0.0/hw/virtio/virtio.c:2447
#5  0x00000000009b204c in run_poll_handlers_once (ctx=ctx@entry=0x6efea40,
    timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:521
#6  0x00000000009b2b64 in run_poll_handlers (ctx=ctx@entry=0x6efea40,
    max_ns=max_ns@entry=4000, timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:559
#7  0x00000000009b2ca0 in try_poll_mode (ctx=ctx@entry=0x6efea40, timeout=0xffff7d7f7308,
    timeout@entry=0xffff7d7f7348) at util/aio-posix.c:594
#8  0x00000000009b31b8 in aio_poll (ctx=0x6efea40, blocking=blocking@entry=true)
    at util/aio-posix.c:636
#9  0x00000000006973cc in iothread_run (opaque=0x6ebd800) at iothread.c:75
#10 0x00000000009b592c in qemu_thread_start (args=0x6efef60) at util/qemu-thread-posix.c:502
#11 0x0000ffff8057f8bc in start_thread () from /lib64/libpthread.so.0
#12 0x0000ffff804e5f8c in thread_start () from /lib64/libc.so.6
(gdb) p bus
$1 = (SCSIBus *) 0x0

Signed-off-by: Zhengui li <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
@tuonga
Copy link
Collaborator

tuonga commented Aug 15, 2019

@leshik I believe ultimately the issue is this:

https://bugs.launchpad.net/qemu/+bug/1805913 (also see the link in comment #5).

I was able to distill the problem into the exact same behaviour where readdir() returns NULL with the errno EOVERFLOW.

The best solution is probably going back to glibc-2.27.

@leshik
Copy link
Author

leshik commented Aug 18, 2019

@tuonga yeah, I can confirm it works fine on ubuntu:18.04 which ships with glibc-2.27.
Any chance this could be fixed somehow on qemu part? While we can go to older glibc for now, this might not be the case in future...

@tuonga
Copy link
Collaborator

tuonga commented Aug 18, 2019

@leshik this problem is tracked in upstream qemu already and is best handled there. https://bugs.launchpad.net/qemu/+bug/1805913

@leshik
Copy link
Author

leshik commented Aug 18, 2019

@tuonga yes, I've seen that. There is also a patch for glibc at the end of this thread which might help in the meantime for OSes that ship with glibc > 2.27.

@tuonga
Copy link
Collaborator

tuonga commented Aug 18, 2019

@leshik good info. it seems like a hack but I will look further.

tuonga pushed a commit that referenced this issue Aug 26, 2019
If interface_count is NO_INTERFACE_INFO, let's not access the arrays
out-of-bounds.

==994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x625000243930 at pc 0x5642068086a8 bp 0x7f0b6f9ffa50 sp 0x7f0b6f9ffa40
READ of size 1 at 0x625000243930 thread T0
    #0 0x5642068086a7 in usbredir_check_bulk_receiving /home/elmarco/src/qemu/hw/usb/redirect.c:1503
    #1 0x56420681301c in usbredir_post_load /home/elmarco/src/qemu/hw/usb/redirect.c:2154
    #2 0x5642068a56c2 in vmstate_load_state /home/elmarco/src/qemu/migration/vmstate.c:168
    #3 0x56420688e2ac in vmstate_load /home/elmarco/src/qemu/migration/savevm.c:829
    #4 0x5642068980cb in qemu_loadvm_section_start_full /home/elmarco/src/qemu/migration/savevm.c:2211
    #5 0x564206899645 in qemu_loadvm_state_main /home/elmarco/src/qemu/migration/savevm.c:2395
    #6 0x5642068998cf in qemu_loadvm_state /home/elmarco/src/qemu/migration/savevm.c:2467
    #7 0x56420685f3e9 in process_incoming_migration_co /home/elmarco/src/qemu/migration/migration.c:449
    #8 0x564207106c47 in coroutine_trampoline /home/elmarco/src/qemu/util/coroutine-ucontext.c:115
    #9 0x7f0c0604e37f  (/lib64/libc.so.6+0x4d37f)

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Reviewed-by: Li Qiang <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>
tuonga pushed a commit that referenced this issue Aug 26, 2019
There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Andrey Shinkevich <[email protected]>
Message-Id: <1565625509-404969-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants