Hardware Watchdog

Boaz Feldboim edited this page Jan 29, 2022 · 4 revisions

The device contains a hardware watchdog, not to be confused with the Internet watchdog. This watchdog makes sure that the device is working properly: when the device hangs, the watchdog restarts it. It was one of the latest modifications to the device and is a rather major modification to the hardware.

After working with the device for a long time, I found that it becomes unstable after about a week. The instability caused the device to stop responding to HTTP requests, or to respond only intermittently. It was observed with both the wired ethernet module and the WiFi module, but more often with WiFi. I wasn't able to find anything wrong in my code, though it is not inconceivable that the problem is there. The problem may just as well be in the libraries. Instead of spending a lot of time working out which resource is leaking, I preferred to add a hardware circuit that acts as a watchdog.

So if the device hangs, it will restart. During a restart, power to the processor, micro SD adapter and ethernet adapter is turned off for about three seconds and then restored. This happens without turning off the router or the modem. There is also a task that causes a planned restart once every few days. The time between restarts is set in the CONFIG.TXT file. The configuration value names are HardResetTime and HardResetPeriodDays. The default values cause a hard reset every 3 days at 3:00 (see CONFIG.TXT). It is also possible to initiate a hard reset by pressing the buttons on the front panel (see Controlling the Device Manually).
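The planned restart schedule can be illustrated with a small Python sketch. This is not the firmware code. The function name `next_hard_reset`, the `H:MM` string format for HardResetTime and the rule "the first occurrence of HardResetTime on the day that is HardResetPeriodDays after the last boot" are all assumptions made for illustration; the real format and rule are defined by CONFIG.TXT and the firmware.

```python
from datetime import datetime, timedelta


def next_hard_reset(last_boot: datetime,
                    reset_time: str = "3:00",
                    period_days: int = 3) -> datetime:
    """Return the assumed time of the next planned hard reset.

    reset_time and period_days stand in for HardResetTime and
    HardResetPeriodDays; the "H:MM" format is an assumption.
    """
    hour, minute = (int(part) for part in reset_time.split(":"))
    target_day = last_boot + timedelta(days=period_days)
    # Snap to the configured time of day on the target day.
    return target_day.replace(hour=hour, minute=minute,
                              second=0, microsecond=0)


if __name__ == "__main__":
    print(next_hard_reset(datetime(2022, 1, 29, 10, 15)))
```

With the default values, a device that booted on January 29 at 10:15 would, under these assumptions, be restarted on February 1 at 3:00.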

Following is the electrical diagram of the watchdog:
[Image: watchdog electrical diagram]

The central component is the TLC556CN, which is the CMOS version of a pair of 555 timers. The first timer is wired to act as a multivibrator. The second timer is wired to act as a one-shot. There is a task that changes the state of the WD line (GPIO32) every 500 ms. Whenever the state of WD changes from LOW to HIGH, capacitor C6 of the first timer is discharged. This prevents the first timer from changing state from HIGH to LOW. If the task that changes the WD state stops, C6 charges up and causes the output of the first timer to change from HIGH to LOW. This triggers the second timer, which causes relay K1 to change position and disconnect power from the processor, micro SD adapter and ethernet adapter. When the second timer returns to its steady state, power is restored.
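The interaction between the WD task and the first timer can be modeled in a few lines of Python. This is only a toy model of the behavior described above, not the firmware and not a circuit simulation. The class and function names are hypothetical, and the 3000 ms timeout is an assumed value, since the real timeout depends on C6 and the surrounding components.

```python
class WatchdogModel:
    """Toy model of the first timer: it trips when no LOW-to-HIGH
    edge on WD arrives within timeout_ms (an assumed value)."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_wd = False
        self.ms_since_edge = 0
        self.tripped = False

    def step(self, wd_level: bool, elapsed_ms: int) -> bool:
        if wd_level and not self.last_wd:
            self.ms_since_edge = 0          # rising edge discharges C6
        else:
            self.ms_since_edge += elapsed_ms  # C6 keeps charging
        self.last_wd = wd_level
        if self.ms_since_edge >= self.timeout_ms:
            self.tripped = True             # output goes HIGH to LOW
        return self.tripped


def simulate(task_alive_ms: int, total_ms: int,
             toggle_ms: int = 500, timeout_ms: int = 3000):
    """Toggle WD every toggle_ms until task_alive_ms, then freeze it.
    Return the time in ms at which the model trips, or None."""
    wd = WatchdogModel(timeout_ms)
    level = False
    for now in range(toggle_ms, total_ms + 1, toggle_ms):
        if now <= task_alive_ms:
            level = not level               # the WD task is still running
        if wd.step(level, toggle_ms):
            return now
    return None


if __name__ == "__main__":
    print(simulate(task_alive_ms=10_000, total_ms=10_000))
    print(simulate(task_alive_ms=2_000, total_ms=10_000))
```

As long as the task keeps toggling WD, the model never trips. Once the task stops, the model trips one timeout period after the last rising edge, regardless of whether WD was left LOW or HIGH.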

The initialization code of the device expects to find WDOP (GPIO 33), which is also the output of the first timer, in the LOW state. WDOP is LOW when it triggers the second timer, which disconnects the power. If the processor reboots because of, for example, an unhandled exception, the state of WDOP will be HIGH. In that case the code simply enters an infinite loop. With WD no longer changing state, the first timer triggers the second timer, and this causes a power off/power on cycle.
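The startup decision described above can be sketched as follows. The function `startup_check` and its three callable parameters are hypothetical names used only for illustration. In real firmware the `hang` step would be an endless loop that never toggles WD; here it is a callable so that the sketch can be run and tested.

```python
def startup_check(read_wdop, start_wd_task, hang) -> str:
    """Decide at boot whether this start follows a power cycle.

    read_wdop:     returns True when WDOP is HIGH (hypothetical helper)
    start_wd_task: starts the task that toggles WD (hypothetical helper)
    hang:          stands in for the firmware's infinite loop
    """
    if read_wdop():
        # WDOP is HIGH: the processor rebooted without a power cycle.
        # Do not start toggling WD, so the watchdog cuts the power.
        hang()
        return "hang"
    # WDOP is LOW: the watchdog has just cycled the power.
    start_wd_task()
    return "run"


if __name__ == "__main__":
    print(startup_check(lambda: False, lambda: None, lambda: None))
    print(startup_check(lambda: True, lambda: None, lambda: None))
```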

This behavior of the WD and WDOP lines, together with the fact that the first timer is wired to act as a multivibrator, requires that the time the first timer remains in the LOW state is longer than the time the second timer is in its HIGH state (its non-steady state). On the other hand, the time the first timer spends in the HIGH state should be relatively short, because it is desirable that power is turned off soon after WD stops changing state. This is why C6 is charged through R19, which has a relatively low resistance (68K), and discharged through R14, which has a relatively high resistance (470K). D8 is required so that C6 is discharged only through R14 and not through R19. R18 is required to lower the threshold levels slightly, because D8 lowers the maximum charge voltage of C6.
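The effect of the two resistor values can be estimated with the textbook RC formula. The sketch below is an idealized approximation only: it assumes the standard 555 thresholds of 1/3 and 2/3 of the supply voltage and ignores the voltage drop of D8 and the threshold shift introduced by R18, so the real times will differ. The value of C6 is not given here, so it is left as a parameter and only the ratio between the two times is computed.

```python
import math

R19_OHMS = 68e3    # charge path
R14_OHMS = 470e3   # discharge path


def rc_time(r_ohms: float, c_farads: float,
            v_start: float, v_end: float, v_target: float) -> float:
    """Time for an RC node to go from v_start to v_end while
    heading exponentially toward v_target (voltages as fractions of Vcc)."""
    return r_ohms * c_farads * math.log((v_target - v_start) / (v_target - v_end))


def high_time(c_farads: float) -> float:
    """Idealized output HIGH time: C6 charges from 1/3 to 2/3 Vcc via R19."""
    return rc_time(R19_OHMS, c_farads, 1 / 3, 2 / 3, 1.0)


def low_time(c_farads: float) -> float:
    """Idealized output LOW time: C6 discharges from 2/3 to 1/3 Vcc via R14."""
    return rc_time(R14_OHMS, c_farads, 2 / 3, 1 / 3, 0.0)


def low_to_high_ratio() -> float:
    """The capacitance cancels out, leaving the resistor ratio."""
    c = 1e-6  # any value; it cancels in the ratio
    return low_time(c) / high_time(c)


if __name__ == "__main__":
    print(round(low_to_high_ratio(), 2))
```

Under these simplifying assumptions the LOW time is about 6.9 times the HIGH time (470K / 68K), which matches the intent of a short HIGH period and a long LOW period.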

It was found that after recovering from an unhandled exception or any other major failure, it is best to disconnect the power for several seconds. Otherwise, some hardware (usually the micro SD adapter) might not function properly until power is turned off and on. The first timer is wired to act as a multivibrator so that the power off/power on cycle is repeated in case the restart did not succeed for some reason.

When new code is being loaded to the processor, the task that triggers the watchdog is not running. To prevent the watchdog from disconnecting power from the processor, there is a DIP switch (S1) that disables the second timer. So when loading code to the processor, it is important to set S1 to the ON position and set it back to OFF once the code has finished loading. Likewise, when starting a debug session, set S1 to ON and wait for the debugger to break in the startup code. Soon after resuming the code, set S1 back to OFF.

When the device is turned on, WDOP is immediately set to HIGH. This causes the initialization code to enter an infinite loop, and soon after that a power off/power on cycle occurs. This is why, a few seconds after the device is turned on, power is turned off and then on again after a few more seconds.