Arbitrary crashes during darktable use with CR3 #17816

Open
NohWayJose opened this issue Nov 10, 2024 · 28 comments
Comments

@NohWayJose

Describe the bug

I recently started using Darktable after I bought a new Canon R6 mk2 - it has CR3 RAW files.
Pretty much every session Darktable crashes catastrophically, in that I cannot force close the frozen app, and all I can do is reboot the OS.

Even kill -9 <pid> doesn't work and I have to reboot every time. I can drag the Darktable window around and resize it but I can't interact with the GUI.

I cannot identify any specific action that causes the hang. I've kept the system monitor open to keep an eye on memory & CPU load but have not noticed a large spike (it may be that I missed it, rather than that it didn't happen). I am not sure what I should post: the system log, or stdout (I could start Darktable from the command line and pipe the output to a text file, so it survives a reboot).

To my untrained eye, I wonder whether it's a memory leak - so, independent of what I'm doing, but then again often, everything else continues to work?

Please let me know what (and how) I need to capture to let you see what the cause is.

Please hurry, I want to process my new photos! ;-P

Thanks

Steps to reproduce

Open Darktable -> edit some (indeterminate number) of CR3s -> it crashes but doesn't take down the OS every time (does sometimes) but to purge the Darktable process I have to reboot.

Expected behavior

Darktable works reliably and doesn't arbitrarily crash

Logfile | Screenshot | Screencast

Please advise how to collect what you need to see

Commit

I don't know how to find out

Where did you obtain darktable from?

distro packaging

darktable version

4.8.1_91.9

What OS are you using?

Linux

What is the version of your OS?

OpenSuSE Tumbleweed

Describe your system?

Operating System: openSUSE Tumbleweed 20241107
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.7.0
Qt Version: 6.8.0
Kernel Version: 6.11.6-2-default (64-bit)
Graphics Platform: Wayland
Processors: 32 × 13th Gen Intel® Core™ i9-13900K
Memory: 62.6 GiB of RAM
Graphics Processor: AMD Radeon RX 6650 XT
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z790 AORUS ELITE AX

OpenCL
Number of platforms                               0
ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0

Also OpenGL (EGL) & (GLX), Vulkan, Wayland - too much to paste (unless needed?)

2 x acer UHD 4k2k 28" screens

Are you using OpenCL GPU in darktable?

I don't know

If yes, what is the GPU card and driver?

01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1) (prog-if 00 [Normal decode])
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (prog-if 00 [Normal decode])
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6650 XT / 6700S / 6800S] (rev c1) (prog-if 00 [VGA controller])
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

Don't know - What's a Lua script?

@NohWayJose
Author

Just checked and I downloaded 4.8._81.9 from your repo

@ralfbrown
Collaborator

Does this happen only with CR3s or with other file types as well? (CR3-only would point to LibRaw.)

How much swap space does your system have? Runaway memory usage can force the entire user interface to be swapped out, resulting in an apparent freeze (it'll come back eventually as the UI code swaps back in, but that can take several minutes). You can try

ulimit -m $MEM -v $MEM

in a terminal window before starting darktable from that window, where $MEM should be around 3/4 of your physical memory size in kilobytes, e.g. "12000000" for a 16GB system. If the problem is runaway memory usage by darktable, this should cause it to fail some operations or possibly crash, but would prevent the system freeze. Note that "ulimit" isn't enforced on all systems, so it might also do nothing....
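
The suggestion above can be sketched as a small shell helper. This is only an illustration under assumptions: the function names `three_quarters_kb`, `mem_total_kb` and `run_limited` are made up here, the memory size is read from `/proc/meminfo` (Linux-specific), and only `-v` is set because the resident-set limit behind `ulimit -m` is generally not honored on current Linux kernels. The limit is applied in a subshell so the terminal itself keeps its original limits.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of darktable.

# Print three quarters of a kilobyte count (integer arithmetic).
three_quarters_kb() {
    echo $(( $1 * 3 / 4 ))
}

# Print the total physical memory in kB as reported by the kernel.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Run a command with its virtual memory capped at 3/4 of physical RAM.
# The subshell keeps the lowered limit away from the calling shell.
run_limited() {
    local limit
    limit=$(three_quarters_kb "$(mem_total_kb)")
    ( ulimit -v "$limit" && exec "$@" )
}

# Example (commented out so the sketch has no side effects):
# run_limited darktable
```

With this arithmetic, a value of 16000000 kB yields the 12000000 mentioned above, and 64000000 kB yields 48000000.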

@NohWayJose
Author

ulimit is installed on my system and so I presume it's enforced?
I've got 64GB Ram, so I tried (as me, not sudo):
ulimit -m 48000000 -v 48000000
which returned:
bash: ulimit: max memory size: cannot modify limit: Operation not permitted

Did I do that right ?

@pehar1

pehar1 commented Nov 13, 2024

To my knowledge ulimit needs sudo privileges.

@ralfbrown
Collaborator

User accounts should be able to reduce limits; that works just fine for me. Raising limits, on the other hand, does require elevated privileges, and attempting to do so from a user account gives me exactly the error message you reported.

So it's likely that your system already includes a per-process restriction on memory. What does "ulimit -a" report for "max memory size" and "virtual memory"?

@NohWayJose
Author

NohWayJose commented Nov 13, 2024

tranquility:~ # sudo su -
tranquility:~ # ulimit -m 48000000 -v 48000000
tranquility:~ # 
tranquility:~ # ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 255482
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) 48000000
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 255482
virtual memory              (kbytes, -v) 48000000
file locks                          (-x) unlimited

@NohWayJose
Author

I did that remotely via ssh, so can't see its effect on Darktable, if any, until this evening (UK time).

@NohWayJose
Author

It just crashed again but this time a system dialogue popped up first, saying that Firefox (at that time quite a few instances, each with at least 5-10 tabs open but none actively being used) had crashed. Shortly after that Darktable went. I have reopened Firefox to post this but Darktable won't close and is hovering around in the background like a passive zombie. I tried opening a new instance, to see what would happen and, as I expected it popped up that database lock dialogue. I closed that and am about to log out and in again and if that doesn't kill the zombie, reboot.

For the record
ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 255671
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 255671
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
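
This listing shows both memory limits as unlimited, which would be expected if it was taken in a different shell from the root session where the limits were lowered: a `ulimit` setting applies only to the shell that sets it and to processes started from that shell. A minimal sketch of that scoping follows. It uses the open-files limit purely because lowering it is harmless, and the function name `limit_scope_demo` is made up for illustration.

```shell
#!/usr/bin/env bash
# Show that a limit lowered inside a subshell does not leak into the parent.
limit_scope_demo() {
    local before inner after
    before=$(ulimit -S -n)
    # The command substitution runs in its own subshell; only the soft
    # limit of that subshell is lowered.
    inner=$( ulimit -S -n 64; ulimit -S -n )
    after=$(ulimit -S -n)
    echo "before=$before inner=$inner after=$after"
}

limit_scope_demo
```

So for a memory limit to affect darktable, it would have to be set in the same terminal from which darktable is then launched.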

@pehar1

pehar1 commented Nov 18, 2024

Firefox (at that time quite a few instances, each with at least 5-10 tabs open but none actively being used) had crashed. Shortly after that Darktable went.

and

2 x acer UHD 4k2k 28" screens

This sounds like you might have a problem with GPU memory allocation.
Please check whether you have OpenCL enabled: preferences -> processing -> activate OpenCL support
What are your settings in preferences -> processing, sections CPU/Memory and OpenCL?
Did you read https://darktable-org.github.io/dtdocs/en/preferences-settings/processing/ ?

But on the other hand :

OpenCL
Number of platforms 0

???

@NohWayJose
Author

CPU/Memory was 'small', now 'large'
OpenCL is weird
activate OpenCL support - 'not available'
OpenCL scheduling profile - 'not available'
use all device memory - 'not available'
but
OpenCL Drivers
Intel GPU - checked
NVIDIA CUDA - checked
RustiCL - checked
Apple & MS (OpenCLOn12) both greyed

@NohWayJose
Author

And having freshly rebooted (after zypper dup) and changed Darktable CPU/Memory to Large, it crashed when I changed collection. Now have to reboot again!!!

I bought a reasonably capable AMD GPU (AMD Radeon RX 6650 XT), to avoid all the Nvidia driver nonsense I used to have to go through. Any thoughts as to how I can get OpenCL working? (I presume it's useful for graphics processing performance and will speed Darktable up - or have I got the wrong end of the stick?)

@ralfbrown
Collaborator

OpenCL Drivers
Intel GPU - checked
NVIDIA CUDA - checked
RustiCL - checked
Apple & MS (OpenCLOn12) both greyed

Those settings only control which drivers darktable is allowed to use, if they are available.
For AMD, you need to install the ROCm OpenCL package, which on most distros is separate from the graphics drivers proper. Yes, depending on the mix of modules you use and relative speed of CPU and GPU, you can get up to a 10x speedup.

But if you're getting crashes even without using OpenCL, it's not an issue with graphics memory being used for compute....
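
As a rough way to check whether any OpenCL driver is registered at all, one can count the ICD files that the OpenCL ICD loader reads. This is a sketch under assumptions: `/etc/OpenCL/vendors` is the conventional registry directory for the ICD loader on Linux, but a distribution may configure a different path, and the helper name `count_icds` is invented here.

```shell
#!/usr/bin/env bash
# Count *.icd files in an ICD registry directory.
# The default path is an assumption; pass another directory as $1 if needed.
count_icds() {
    local dir=${1:-/etc/OpenCL/vendors}
    local n=0 f
    for f in "$dir"/*.icd; do
        # When nothing matches, the glob stays literal and -e is false.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "registered OpenCL ICD files: $(count_icds)"
```

A count of 0 would be consistent with the "Number of platforms 0" line in the clinfo output in the original report. After installing a driver, running `clinfo` again, or darktable's own `darktable-cltest`, should show whether a platform is now detected.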

@NohWayJose
Author

NohWayJose commented Nov 18, 2024

A minor update to my earlier report that it was impossible to get rid of the hung Darktable: I came back to my home PC after working for several hours on my work laptop, and when I tried to kill the hung Darktable, this time it actually closed!

zypper se ROCm
Loading repository data...
Reading installed packages...

S  | Name            | Summary                                                              | Type
---+-----------------+----------------------------------------------------------------------+--------
i  | procmail        | A program for local e-mail delivery                                  | package
   | procmeter       | Utility to display current system parameters                         | package
   | procmeter-devel | Development files for the procmeter system parameter display program | package
   | procmon         | Trace the syscall activity on the system                             | package

which seem unrelated

Anything relevant here?...

zypper se OpenCL
Loading repository data...
Reading installed packages...

S  | Name                            | Summary                                                                  | Type
---+---------------------------------+--------------------------------------------------------------------------+--------
   | armnn-opencl                    | Arm NN SDK enables machine learning workloads on power-efficient devices | package
   | armnn-opencl-devel              | Development headers and libraries for armnn                              | package
   | intel-opencl                    | Intel Graphics Compute Runtime for OpenCL                                | package
   | intel-opencl-devel              | Headers for the Intel Graphics Compute Runtime OpenCL Driver             | package
   | libarmnn33-opencl               | libarmnn from armnn                                                      | package
   | libarmnnBasePipeServer33-opencl | libarmnnBasePipeServer from armnn                                        | package
   | libarmnnSerializer33-opencl     | libarmnnSerializer from armnn                                            | package
   | libarmnnTestUtils3-opencl       | libarmnnTestUtils from armnn                                             | package
   | libarmnnTfLiteParser24-opencl   | libarmnnTfLiteParser from armnn                                          | package
   | libopencl-clang14               | A wrapper library around clang                                           | package
i  | libOpenCL1                      | OpenCL ICD Bindings                                                      | package
   | libOpenCL1-32bit                | OpenCL ICD Bindings                                                      | package
   | libtimelineDecoder33-opencl     | libtimelineDecoder from armnn                                            | package
   | libtimelineDecoderJson33-opencl | libtimelineDecoderJson from armnn                                        | package
   | Mesa-libOpenCL                  | Mesa OpenCL implementation (Clover)                                      | package
   | Mesa-libOpenCL-debuginfo        | Debug information for package Mesa-libOpenCL                             | package
   | Mesa-libRusticlOpenCL           | Mesa OpenCL implementation (Rusticl)                                     | package
   | Mesa-libRusticlOpenCL-debuginfo | Debug information for package Mesa-libRusticlOpenCL                      | package
   | opencl-cpp-headers              | OpenCL C++ headers                                                       | package
   | opencl-headers                  | OpenCL (Open Computing Language) headers                                 | package

@NohWayJose
Author

Darktable_crash_jounalctl.log
Hopefully this shows what happens when it crashes

@ralfbrown
Collaborator

The backtrace in that log says it crashed in libsqlite when called from dt_is_tag_attached in src/common/tags.c. But that's a really simple lookup which returns at most one record.....

@pehar1

pehar1 commented Nov 19, 2024

Perhaps a debug log could shed some light on this. Could you start darktable from the command line with the debug option -d common, provoke the crash and post the terminal output?
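
One possible way to capture that output so that it survives a reboot is to duplicate it into a file with `tee`. The sketch below is an assumption about workflow, not a darktable feature: the helper name `run_logged` and the log path in the example are hypothetical, and only the `-d common` option comes from the request above.

```shell
#!/usr/bin/env bash
# Run a command, showing its stdout and stderr on the terminal while also
# appending both to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local log=$1
    shift
    "$@" 2>&1 | tee -a "$log"
}

# Example (commented out; requires darktable to be installed):
# run_logged "$HOME/darktable-debug.log" darktable -d common
```

If the whole system hangs, the last few lines may not have reached the disk, so the log could end slightly before the actual failure.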

@NohWayJose
Author

@NohWayJose
Author

NohWayJose commented Nov 19, 2024

Might be irrelevant but I notice it reports a lot of errors about the HEIF format images (rather than JPGs). I shoot RAW (CR3) with a concurrent HEIF image. Perhaps I should just shoot CR3+JPG?

Also possibly irrelevant: When Darktable crashes htop & ps -A just sit there doing nothing
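
A process that ignores `kill -9`, together with tools such as `ps` hanging, is often a sign of processes stuck in uninterruptible sleep (state `D`), typically waiting inside the kernel for I/O or a driver. That is only a guess for this thread, not something the logs confirm. If `ps` still responds, a filter like the following lists such processes; the helper name `filter_dstate` is made up for illustration.

```shell
#!/usr/bin/env bash
# Read "PID STAT COMMAND" lines on stdin and print those whose state
# starts with D (uninterruptible sleep).
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Example (commented out; prints nothing when no process is in state D):
# ps -eo pid=,stat=,comm= | filter_dstate
```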

@pehar1

pehar1 commented Nov 19, 2024

I shoot RAW (CR3) with a concurrent HEIF image.

In the log we have 124 messages like
Failed to read HEIF file [/..... .cr3]: Invalid input: Unexpected end of file
but this should not cause a crash. Also not the numerous messages
[dt_imageio_large_thumbnail] error: The thumbnail image is not in JPEG format, and DT was built without neither GraphicsMagick or ImageMagick. Please rebuild DT with GraphicsMagick or ImageMagick support enabled.

GraphicsMagick or ImageMagick support is optional (when building the application).

Does this happen only with CR3s or with other file types as well? (CR3-only would point to LibRaw.)

We don't yet have an answer to this question from @ralfbrown.

Is it possible to share an image and the corresponding xmp that cause the crash? That might make it possible to reproduce the crash.

Perhaps I should just shoot CR3+JPG?

Just try and see what happens ...

@NohWayJose
Author

@NohWayJose
Author

Crashed while manipulating a jpg (Coke Zero can)

@pehar1

pehar1 commented Nov 21, 2024

Crashed while manipulating a jpg (Coke Zero can)

Sorry, but I don't understand. The debug log refers to /data/2024/Nov/24-11-20/20241120-2158-0763AD.cr3; the jpg (20241120-2158-0763AC.jpg) does not appear in it. The same goes for the xmp, which was created from the cr3:

xmpMM:DerivedFrom="20241120-2158-0763AD.cr3"

To try to reproduce we would need the file 20241120-2158-0763AD.cr3.

With current master and a fresh .config the jpg you provided works without any problems for me. Tried with OpenCL enabled and disabled and with a large number of different modules.

Just a wild guess: you reported Firefox also crashing, htop hanging, ps hanging, and the need for a reboot to get rid of the darktable process(es). And the logs and backtrace you provided don't show a systematic pattern. Are you absolutely sure that the hardware (RAM) is working properly? Many OSes offer the option of running memtest from the bootloader. I remember similar unsystematic behavior that I observed years ago on an older test system. In that case, a small address range of a RAM module showed errors. I was able to localize the faulty hardware with memtest.

@ralfbrown
Collaborator

Bad hardware (most likely memory) is also my conclusion from the totality of reports on this thread.

@jenshannoschwalm
Collaborator

I had such problems years ago a few weeks before my PSU finally died. Using dt with opencl led to higher drain and bang :-)

@NohWayJose
Author

NohWayJose commented Nov 25, 2024

I ran memtest86+ for 23½hrs and no errors reported

PSU quite recent and over specified for the system load, so doubt it's that.

@NohWayJose
Author

I'm running DT also with OpenSUSE Tumbleweed on an old Linux converted MacBook Pro from 2015, with no crashes.

@victoryforce
Collaborator

I ran memtest86+ for 23½hrs and no errors reported

PSU quite recent and over specified for the system load, so doubt it's that.

Well, that's good, but why does Firefox crash then?

@pehar1

pehar1 commented Nov 25, 2024

running DT also with OpenSUSE Tumbleweed on an old Linux converted MacBook Pro from 2015, with no crashes

... and this might also point in the direction of a hardware problem.

I doubt that this will change anything, but it might be worth a try.
Are you able to "reproduce" your crashes with a fresh (empty) .config? For example, by starting darktable from the command line with the options

--library /home/<user>/.config/darktable-crashtest/library.db --configdir /home/<user>/.config/darktable-crashtest/ --cachedir /home/<user>/.cache/darktable-crashtest/ -d common

You need to replace <user> with your username.
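
Put together as one command, the options above might look like the sketch below. The `darktable-crashtest` directory names and the options themselves are taken from the comment; the helper name `crashtest_args` is invented, and `$HOME` is assumed to be the same directory as `/home/<user>`.

```shell
#!/usr/bin/env bash
# Print the command-line options for a throwaway darktable profile,
# one per line, rooted at the given home directory (default: $HOME).
crashtest_args() {
    local home=${1:-$HOME}
    printf '%s\n' \
        --library "$home/.config/darktable-crashtest/library.db" \
        --configdir "$home/.config/darktable-crashtest/" \
        --cachedir "$home/.cache/darktable-crashtest/" \
        -d common
}

# Example (commented out; requires darktable to be installed):
# mapfile -t args < <(crashtest_args)
# darktable "${args[@]}"
```

Because the test profile lives in its own directories, the regular library and configuration are left untouched.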
