Application Watchdog

Since 0.5.0:

A Watchdog Timer is designed to rescue your device should an unexpected problem prevent code from running. This could be the device locking or or freezing due to a bug in code, accessing a shared resource incorrectly, corrupting memory, and other causes.

Device OS includes a software-based watchdog, ApplicationWatchdog, that is based on a FreeRTOS thread. It theoretically can help when user application enters an infinite loop. However, it does not guard against the more problematic things like deadlock caused by accessing a mutex from multiple threads with thread swapping disabled, infinite loop with interrupts disabled, or an unpredictable hang caused by memory corruption. Only a hardware watchdog can handle those situations.

The application note AN023 Watchdog Timers has information about hardware watchdog timers, and hardware and software designs for the TPL5010 and AB1805.

// PROTOTYPES
ApplicationWatchdog(unsigned timeout_ms, 
    std::function<void(void)> fn, 
    unsigned stack_size=DEFAULT_STACK_SIZE);

ApplicationWatchdog(std::chrono::milliseconds ms, 
    std::function<void(void)> fn, 
    unsigned stack_size=DEFAULT_STACK_SIZE);

// EXAMPLE USAGE
// Global variable to hold the watchdog object pointer
ApplicationWatchdog *wd;

void watchdogHandler() {
  // Do as little as possible in this function, preferably just
  // calling System.reset().
  // Do not attempt to Particle.publish(), use Cellular.command()
  // or similar functions. You can save data to a retained variable
  // here safetly so you know the watchdog triggered when you 
  // restart.
  // In 2.0.0 and later, RESET_NO_WAIT prevents notifying the cloud of a pending reset
  System.reset(RESET_NO_WAIT);
}

void setup() {
  // Start watchdog. Reset the system after 60 seconds if 
  // the application is unresponsive.
  wd = new ApplicationWatchdog(60000, watchdogHandler, 1536);
}

void loop() {
  while (some_long_process_within_loop) {
    ApplicationWatchdog::checkin(); // resets the AWDT count
  }
}
// AWDT count reset automatically after loop() ends

A default stack_size of 512 is used for the thread. stack_size is an optional parameter. The stack can be made larger or smaller as needed. This is generally too small, and it's best to use a minimum of 1536 bytes. If not enough stack memory is allocated, the application will crash due to a Stack Overflow. The RGB LED will flash a red SOS pattern, followed by 13 blinks.

The application watchdog requires interrupts to be active in order to function. Enabling the hardware watchdog in combination with this is recommended, so that the system resets in the event that interrupts are not firing.

The Particle.process() function calls ApplicationWatchdog::checkin() internally, so you can also use that to service the application watchdog.


Your watchdog handler should have the prototype:

void myWatchdogHandler(void);

You should generally not try to do anything other than call System.reset() or perhaps set some retained variables in your application watchdog callback. In particular:

  • Do not call any cloud functions like Particle.publish() or even Particle.disconnect().
  • Do not call Cellular.command().

Calling these functions will likely cause the system to deadlock and not reset.

Note: waitFor and waitUntil do not tickle the application watchdog. If the condition you are waiting for is longer than the application watchdog timeout, the device will reset.

Since 1.5.0:

You can also specify a value using chrono literals, for example: wd = new ApplicationWatchdog(60s, System.reset) for 60 seconds.