Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 10 deadlock demo timeout solution: tasks attempt to give/return mutexes that have never been taken #5

Open
neilbalch opened this issue Jun 27, 2021 · 5 comments

Comments

@neilbalch
Copy link

I noticed while going through this (excellent BTW!) tutorial series that the ESP32 was resetting repeatedly during the execution of the deadlock demo timeout solution.

Serial output:
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:2
load:0x3ffe6100,len:0x424
load:0x4004c000,len:0x844
load:0x40050000,len:0x2460
entry 0x4004c180
�
---FreeRTOS Deadlock Demo---
Task A took mutex 1
Task B took mutex 2
Task A timed out waiting for mutex 2
Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception).
Debug exception reason: Stack canary watchpoint triggered (Task A)
Core  0 register dump:
PC      : 0x00001004  PS      : 0x3ffc75b9  A0      : 0x80028fb7  A1      : 0x8002d7c4
A2      : 0x3ffc7330  A3      : 0x00000054  A4      : 0x00000000  A5      : 0x00000080
A6      : 0x3ffc7500  A7      : 0x00000003  A8      : 0x80028d3d  A9      : 0x00000000  
A10     : 0x00000000  A11     : 0x00001800  A12     : 0x00000000  A13     : 0x80028d3d  
A14     : 0x3ffc7350  A15     : 0x00000054  SAR     : 0x00000002  EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x3f005c0c  LCOUNT  : 0x400275ac  


Backtrace:0x00001001:0x8002d7c4 |<-CORRUPTED




ELF file SHA256: 0000000000000000

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x400261c0
SPIWP:0xee
mode:DIO, clock div:2
load:0x3ffe6100,len:0x424
load:0x4004c000,len:0x844
load:0x40050000,len:0x2460
entry 0x4004c180
Guru Meditation Error: Core  0 panic'ed (Cache error). 

Core  0 register dump:
PC      : 0x40099949  PS      : 0x00060d34  A0      : 0x800883ae  A1      : 0x3fffe2a0  
A2      : 0x02000000  A3      : 0x00000000  A4      : 0x02000000  A5      : 0xfffffffc
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x6000e000  
A10     : 0x007dff2d  A11     : 0x02000000  A12     : 0x00000030  A13     : 0x00000030
A14     : 0x00231d69  A15     : 0x00000000  SAR     : 0x00000007  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x00000030  LEND    : 0x00000030  LCOUNT  : 0x40026008


Backtrace:0x40099946:0x3fffe2a00x400883ab:0x3fffe2b0 0x40088d4b:0x3fffe2d0 0x40088f49:0x3fffe2f0 0x40025fd4:0x3fffe320 0x40050bbe:0x3fffe340 0x40050e53:0x3fffe370 0x4004c1c1:0x3fffe4a0 0x40011371:0x3fffe560 0x4000f788:0x3fffe6f0 0x400072a5:0x3fffe710 




ELF file SHA256: 0000000000000000

Rebooting...

As opposed to what's in this repo, I realized that this version of the function is required to avoid the core reset: (same change for Task B as well)

// Task A (high priority)
void doTaskA(void *parameters) {
  // Loop forever
  while (1) {
    // Take mutex 1
    if (xSemaphoreTake(mutex_1, mutex_timeout) == pdTRUE) {
      // Say we took mutex 1 and wait (to force deadlock)
      Serial.println("Task A took mutex 1");
      vTaskDelay(1 / portTICK_PERIOD_MS);

      // Take mutex 2
      if (xSemaphoreTake(mutex_2, mutex_timeout) == pdTRUE) {
        // Say we took mutex 2
        Serial.println("Task A took mutex 2");

        // Critical section protected by 2 mutexes
        Serial.println("Task A doing some work");
        vTaskDelay(500 / portTICK_PERIOD_MS);

        // Give back mutex
        xSemaphoreGive(mutex_2);
      } else {
        Serial.println("Task A timed out waiting for mutex 2");
      }

      // Give back mutex
      xSemaphoreGive(mutex_1);
    } else {
      Serial.println("Task A timed out waiting for mutex 1");
    }

    // Wait to let the other task execute
    Serial.println("Task A going to sleep");
    vTaskDelay(500 / portTICK_PERIOD_MS);
  }
}

I can submit a PR to fix this if you're interested? Love your work here, Shawn!

@ShawnHymel
Copy link
Owner

Apologies for the super long delay. I'm just now getting a chance to go through emails and GitHub issues.

Thank you for the fix! I'd like to keep the demos used in the video exactly as they appear in the video (if possible). I don't remember running into the reset issue on my ESP32 in the deadlock timeout demo. Do you have any idea why it was happening for you?

@neilbalch
Copy link
Author

neilbalch commented Nov 8, 2021

Honestly, I can't say I remember as of now 🤦. Looking back at it now, I'm pretty sure the only change I ended up making from your stock solution was to move the mutex_n give instructions inside the if statements.

I don't have the time to try and re-run through my debugging steps right now, but I think this still makes sense from a theoretical perspective. Looking at the serial output and original code, Task A attempts to give mutex_2 immediately after Task A timed out waiting for mutex 2. Unless I misunderstand the usage of a mutex, this should be an invalid (and not desired) operation. (the mutex should only be given back when the task was actually able to take it successfully)

Although, I may very well be missing something important here. For reference, my version of the tutorial is in my repository here: github.com/neilbalch/ESP32-FreeRTOS-Tutorial

@ShawnHymel
Copy link
Owner

Interesting! I thought that in FreeRTOS, if you "gave" a mutex that had not been taken, nothing would happen. That's the behavior I saw in my example, but it looks like you're seeing resets.

From what I can tell, your resets are happening because of a stack overflow (Task A is filling up the allocated stack to the point the canary watchpoint is tripped). Does the reset happen right away, or does it take some time to occur when you run the program? Did you change anything else other than moving the xSemaphoreGive() calls to inside the if statements?

@neilbalch
Copy link
Author

neilbalch commented Nov 8, 2021

If I remember correctly, and based on the serial output I provided in the original issue text, the resets started occurring immediately after an MCU reset/program cycle. I can't remember how or why, but I thought I originally attributed that reset exception message to the task being in a permanently blocked state as it waited for the mutex to be "given," but your explanation now seems more proper.

As for a list of changes, I believe the only thing I did was move the mutex give instructions. The full code I used is in my repo here.

@wanggaoteng
Copy link

I also have the same problem, my board is esp32-s3.
Thank you for providing the correct code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants