Investigating random core panics on Pycom/ESP32 devices

Investigating the Guru Meditation Error: Core 1 panic'ed (LoadProhibited) crashes

In the topic Stabilität und längere Testzeiträume des Terkin-Datenloggers ("Stability and longer test periods of the Terkin datalogger"), we observed a LoadProhibited core dump after running one of our FiPy devices for several hours.

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x401e5b2f  PS      : 0x00060a30  A0      : 0x80104957  A1      : 0x3ffe3190
A2      : 0x3f9530f4  A3      : 0x00000000  A4      : 0x00000d77  A5      : 0x00000002
A6      : 0x3ffca944  A7      : 0x00000455  A8      : 0x00000000  A9      : 0x00000001
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x3ffc3500  A13     : 0x3ffc34fc
A14     : 0x00000000  A15     : 0x00000006  SAR     : 0x0000001c  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x4009dae8  LEND    : 0x4009daf3  LCOUNT  : 0x00000000

Backtrace: 0x401e5b2f:0x3ffe3190 0x40104954:0x3ffe31b0 0x400fc5b5:0x3ffe3210 0x400f8ced:0x3ffe3240 0x400fb09c:0x3ffe3260 0x400fb0b9:0x3ffe32b0 0x400f8ced:0x3ffe32d0 0x400f8d31:0x3ffe32f0 0x400e75a6:0x3ffe3320 0x400ef644:0x3ffe3340 0x400de74d:0x3ffe33d0
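
As a starting point for reading a dump like the one above, here is a minimal sketch that pulls the register values and backtrace program counters out of the panic text and builds an addr2line command line for them. The function names and the ELF path are hypothetical, and the sketch assumes the toolchain's `xtensa-esp32-elf-addr2line` is on the PATH and that the ELF file matching the flashed firmware is still available.

```python
import re

# Xtensa EXCCAUSE codes for the load/store "prohibited" exceptions.
EXCCAUSE_NAMES = {28: "LoadProhibited", 29: "StoreProhibited"}

# Matches entries like "EXCVADDR: 0x00000000" or "PC      : 0x401e5b2f".
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]{8})\b")
# Matches backtrace frames of the form "0x<pc>:0x<sp>".
FRAME_RE = re.compile(r"0x([0-9a-fA-F]{8}):0x([0-9a-fA-F]{8})")


def parse_registers(dump):
    """Return a dict mapping register names to integer values."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(dump)}


def parse_backtrace(line):
    """Return the program counters of all frames, innermost first."""
    return [int(pc, 16) for pc, _sp in FRAME_RE.findall(line)]


def addr2line_command(pcs, elf_path):
    """Build the argument list for resolving the addresses to source lines.

    elf_path is a placeholder: it must point at the ELF file that belongs
    to the exact firmware image running on the device.
    """
    return ["xtensa-esp32-elf-addr2line", "-pfiaC", "-e", elf_path] + [hex(pc) for pc in pcs]
```

For the dump above, EXCCAUSE is 0x1c (28, LoadProhibited) and EXCVADDR is 0x00000000, which is consistent with a read through a null pointer. The resulting argument list can be handed to `subprocess.run()` once the matching ELF file is at hand.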

More references can be found here.


To hazard a guess

The problem seems to be related to the multithreading subsystem when the system is under pressure and/or when interrupts are involved.

Our investigations into these random crashes led us in several directions. We have been able to confirm the crashes on different Pycom devices, namely the FiPy, the GPy and the LoPy4.

  1. There is the well-known LoadProhibited core panic fault occurring when uploading files either through the REPL using rshell or through FTP using lftp, see also

    Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
    
  2. There are memory corruption faults when operating in dual-core mode, see also

    Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed)
    Guru Meditation Error: Core  1 panic'ed (IllegalInstruction). Exception was unhandled.
    Memory dump at 0x4020fd14: bad00bad bad00bad bad00bad
    

We have been trying to nail down the problems by building our own firmware images and handing them out to people in order to get more feedback about these issues, see also Pycom inofficial firmware bakery.

However, things got even worse for some users. This leads us to conclude that some errors have been masked by others and are only triggered in certain edge cases.

From our research so far, we are still convinced that

However, we haven’t been able to come up with a minimal example program that reproduces the issues, like Thomas Rogg did with his findings around ESP32-WROVER is fundamentally broken (but is being fixed by Espressif now) – neonious Basics.

If you are brave enough to get into the details of Random memory corruption faults on ESP32-WROVER rev.1 and rev.2 when running in dual-core mode, you will recognize that there are silicon bugs involved on this journey as well.

Thoughts about recentness of the precompiled/proprietary libraries

After looking at Merge branch 'bugfix/coex_lc_protect' into 'master' · espressif/esp-idf@c960bcb · GitHub, it occurred to us that we should probably always use the most recent versions of the precompiled/proprietary libraries “newlib”, “esp32-wifi-lib” and “esp32-bt-lib”.

They are located within the ESP-IDF directory at their respective paths:

  • components/newlib/lib
  • components/esp32/lib
  • components/bt/lib
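
To see which of these precompiled archives actually differ between two ESP-IDF checkouts, something along the following lines could help. It is only a sketch: the function names are made up for illustration, and it assumes the libraries sit as `*.a` files directly inside the three directories listed above.

```python
import hashlib
from pathlib import Path

# Directories named in the post, relative to the ESP-IDF root.
LIB_DIRS = ("components/newlib/lib", "components/esp32/lib", "components/bt/lib")


def archive_hashes(idf_root):
    """Map each relative '*.a' path under LIB_DIRS to its SHA-256 digest."""
    root = Path(idf_root)
    hashes = {}
    for lib_dir in LIB_DIRS:
        directory = root / lib_dir
        if not directory.is_dir():
            continue  # Directory missing in this checkout; nothing to hash.
        for archive in sorted(directory.glob("*.a")):
            digest = hashlib.sha256(archive.read_bytes()).hexdigest()
            hashes[archive.relative_to(root).as_posix()] = digest
    return hashes


def differing_archives(old_root, new_root):
    """Return relative paths that differ or exist in only one of the trees."""
    old, new = archive_hashes(old_root), archive_hashes(new_root)
    return sorted(path for path in old.keys() | new.keys() if old.get(path) != new.get(path))
```

Calling `differing_archives()` with the ESP-IDF tree used by the firmware build and a more recent checkout gives a quick list of libraries worth a closer look.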

Thoughts about other recent fixes coming from Espressif

After looking at Investigating core panics with BLE on Pycom devices, we are also thinking about adding this into the mix.

The files changed within that commit are:

  • components/bt/bt.c
  • components/esp32/esp_adapter.c
  • components/freertos/include/freertos/portable.h
  • components/newlib/locks.c

To lighten up the conversation, I would like to divert it into a totally different direction here and point you to some reading about possible influences on silicon through ionizing particles, exposure to electromagnetic fields (EMF), and the topics of electromagnetic interference (EMI) [also called radio-frequency interference (RFI)] and electromagnetic compatibility (EMC).

There are nice stories to be found at:

Enjoy reading.

Back to work. In the meantime, we became even more active on the Pycom user forum and read up on the ESP-IDF in order to get additional insights into the problem.

While earlier releases of the Pycom firmware have apparently been reasonably stable, we want to recap that things obviously became more prone to random crashes with the advent of Firmware Release v1.20.1, which brought in dual-core mode by upgrading to [IDF V3.2 and] MicroPython 1.11, where it is enabled by default [1].


  1. esp32/boards: Enable dual core support by default. · micropython/micropython@92149c8 · GitHub


In order to get an even broader picture, we also started looking at things within Genuine MicroPython development.


We’ve built upon many of the aspects listed above and released another round of unofficial firmware images called 1.20.1.r1-0.6.0-vanilla-dragonfly, based on Pycom’s 1.20.1.r1.

We are collecting the results in Dragonfly firmware for Pycom/ESP32.
