ipu3-cio2: crash on printing device topology when CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y #5
Perhaps some uninitialised variables are being used. cio2_subdev_get_fmt certainly shouldn't be calling itself recursively like that ... It's an odd code path, but I guess there is a potential recursive loop if cio2_subdev_get_fmt() somehow ends up calling itself. This needs more investigation, with some debug prints in here:
Thanks for the comment! It does indeed seem to be called recursively. Lines starting with
1234 /*
1235  * cio2_subdev_get_fmt - Handle get format by pads subdev method
1236  * @sd : pointer to v4l2 subdev structure
1237  * @cfg: V4L2 subdev pad config
1238  * @fmt: pointer to v4l2 subdev format structure
1239  * return -EINVAL or zero on success
1240  */
1241 static int cio2_subdev_get_fmt(struct v4l2_subdev *sd,
1242 			       struct v4l2_subdev_pad_config *cfg,
1243 			       struct v4l2_subdev_format *fmt)
1244 {
1245 	struct cio2_queue *q = container_of(sd, struct cio2_queue, subdev);
1246 	struct v4l2_subdev_format format;
1247 	int ret;
1248
1249 	pr_info("DEBUG: %s() called\n", __func__);
1250 	pr_info("DEBUG: msleep()\n");
1251 	msleep(1000);
1252
1253 	if (fmt->which == V4L2_SUBDEV_FORMAT_TRY) {
1254 		pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1255 		fmt->format = *v4l2_subdev_get_try_format(sd, cfg, fmt->pad);
1256 		return 0;
1257 	}
1258
1259 	pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1260
1261 	if (fmt->pad == CIO2_PAD_SINK) {
1262 		pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1263 		format.which = V4L2_SUBDEV_FORMAT_ACTIVE;
1264 		ret = v4l2_subdev_call(sd, pad, get_fmt, NULL,
1265 				       &format);
1266
1267 		if (ret) {
1268 			pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1269 			return ret;
1270 		}
1271 		/* update colorspace etc */
1272 		q->subdev_fmt.colorspace = format.format.colorspace;
1273 		q->subdev_fmt.ycbcr_enc = format.format.ycbcr_enc;
1274 		q->subdev_fmt.quantization = format.format.quantization;
1275 		q->subdev_fmt.xfer_func = format.format.xfer_func;
1276 	}
1277
1278 	pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1279
1280 	fmt->format = q->subdev_fmt;
1281
1282 	return 0;
1283 }
When the "Memory initialization" option is CONFIG_INIT_STACK_NONE (weakest) or CONFIG_GCC_PLUGIN_STRUCTLEAK_USER (weak), the dmesg output looks like the following:
When the "Memory initialization" option is CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF (strong) or CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL (very strong), on the other hand, the dmesg output looks like the following:
I'll send this to the mailing list when I next have time.
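One possible reading of the difference between the init options, sketched as a small user-space C model rather than driver code. It assumes that CIO2_PAD_SINK is 0 and that v4l2_subdev_call(sd, pad, get_fmt, ...) on the same sd lands back in cio2_subdev_get_fmt(); neither is confirmed in this thread. The names model_fmt, model_get_fmt, stack_fill and MODEL_MAX_DEPTH are hypothetical. The memset emulates what the stack holds on entry: 0x00 for the zero-init options, any other byte for leftover garbage.

```c
#include <string.h>

#define MODEL_PAD_SINK  0   /* assumption: mirrors CIO2_PAD_SINK == 0 */
#define MODEL_MAX_DEPTH 8   /* safety cap so the model always terminates */

/* Hypothetical stand-in for the fields of struct v4l2_subdev_format used here. */
struct model_fmt {
	unsigned int which;
	unsigned int pad;
};

/*
 * Model of the get_fmt code path. stack_fill is the byte pattern the local
 * "format" holds on entry. Returns the recursion depth that was reached.
 */
static int model_get_fmt(const struct model_fmt *fmt, int stack_fill, int depth)
{
	struct model_fmt format;

	/* Emulate the initial stack contents of the local variable. */
	memset(&format, stack_fill, sizeof(format));

	if (depth >= MODEL_MAX_DEPTH)
		return depth;	/* the real code has no such cap */

	if (fmt->pad == MODEL_PAD_SINK) {
		format.which = 1;	/* only .which is set; .pad keeps its entry value */
		/* Models the call into the same subdev's get_fmt with &format. */
		return model_get_fmt(&format, stack_fill, depth + 1);
	}

	return depth;
}
```

With a request whose pad is 0, model_get_fmt(&req, 0xAA, 0) stops after one nested call because the garbage pad is non-zero, while model_get_fmt(&req, 0x00, 0) keeps re-entering the sink branch until it hits MODEL_MAX_DEPTH. If the assumptions hold, that would match a hang or oops only with the zero-initializing options.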
Is there anything interesting in the stack trace? What's the entry point into the recursion? Also, have you trimmed the media graph? I was expecting more information there too. For example, this is the output of media-ctl -p on my IPU3 device (not a Surface):
Do you mean the kernel log that can be obtained by
On the other hand, on v5.4 LTS, it causes
I'm not sure why there is such a difference.
It repeats these lines:
1259 	pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1260
1261 	if (fmt->pad == CIO2_PAD_SINK) {
1262 		pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1263 		format.which = V4L2_SUBDEV_FORMAT_ACTIVE;
1264 		ret = v4l2_subdev_call(sd, pad, get_fmt, NULL,
1265 				       &format);
1266
1267 		if (ret) {
1268 			pr_info("DEBUG: Passed %s() %d\n", __func__, __LINE__);
1269 			return ret;
1270 		}
So, it looks like the following loop is happening there:
Ah, yes. I omitted the output for the case that works as expected. The full output is available here (the one I posted in linux-surface/linux-surface#91 before). For the case that caused the system hang, what I posted is the full output.
Regarding the media graph: they're different. That's something for you to explore. Look at the paste from above:
And compare that against the media graph you had 'before'.
Those changes (like the lack of the ov8865 sink) are crucial pieces of information with regard to this bug.
Ah, sorry. For this issue, I tested with all the sensor drivers and the bridge driver unloaded. Yes, this issue happens even without the sensor drivers and bridge driver. So, this issue may be reproducible on any PC equipped with IPU3 when
For the record, here is the full output when the sensor drivers and bridge driver aren't loaded:
Sorry for taking so long. I've just sent a mail to the linux-media mailing list with what I know so far. For the record, the URL:
For the record, patches are available here:
On Arch Linux with the latest stable kernel (5.8.5-arch1-1), printing the device topology causes the system to hang. No journal log is available after the hang. This issue prevents libcamera from working when trying to capture images.
On Arch Linux with the latest LTS kernel (5.4.61-1-lts), it causes a kernel oops (but no hang):
Below is a more detailed log.
log
$ media-ctl -d /dev/media0 -p
Media controller API version 5.4.61

Media device information
------------------------
driver          ipu3-cio2
model           Intel IPU3 CIO2
serial
bus info        PCI:0000:00:14.3
hw revision     0x0
driver version  5.4.61

Device topology
- entity 1: ipu3-csi2 0 (2 pads, 2 links)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev2
        pad0: Sink
zsh: segmentation fault  media-ctl -d /dev/media0 -p
There is no issue on Ubuntu with v5.8.4 (https://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack/log/?h=cod/mainline/v5.8.4). This tree is almost the same as upstream, and thus also almost the same as Arch's kernel. So, I suspected that the cause of the hang might be a difference in the kernel config.
That turned out to be the case: when I built the kernel with CONFIG_INIT_STACK_NONE=y, no hang occurred.
Arch sets the kernel config option "Initialize kernel stack variables at function entry" to CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y ("zero-init anything passed by reference (very strong)"). On the other hand, Ubuntu sets it to CONFIG_INIT_STACK_NONE=y ("no automatic initialization (weakest)").
So, does this mean that the ipu3-cio2 driver hit areas that shouldn't be hit?