PRECISERR hard fault at random times when using LwIP and Ethernet

ngrigoriadis · ‎2024-01-31

Hi, I am developing a project on an STM32H7 microcontroller that involves Ethernet connectivity. More specifically, it is an embedded display running FreeRTOS v2 code that is generated automatically by TouchGFX. I used CubeMX to enable the Ethernet peripheral, to configure LwIP with a static IP address for my board, and to set the MPU configuration. After generating the code, I added a section to the linker script (lwip_sec) and placed the DMA descriptors, along with the RX_pool buffer, in RAM_D2. The peripheral works and I can ping my board correctly, but at random times it raises a precise bus fault (PRECISERR) hard fault. So I would like to ask a question about the linker script: is there anything else that needs to be done there when a new section is added (for example, changing the length of a RAM region) in order to avoid the hard fault, or is the error unrelated to the linker script?

I followed the same steps as in this video: https://www.youtube.com/watch?v=sQ3rgQNGKV4. The only difference is that I work with TouchGFX, whereas the video uses the LVGL library.

KDJEM.1 · ‎2024-02-02

Hello @ngrigoriadis ,

I recommend taking a look at "How to create project for STM32H7 with Ethernet an... - STMicroelectronics Community"; it may help you.

Please note that if the cache is enabled, the following order is required: configure the MPU first, then enable the cache.

Configuring the MPU before enabling the caches is recommended because the MPU settings can affect the behavior of the caches.

Also, I advise you to start from an available example on GitHub.

I hope this helps you solve your issue.

Thank you.

Kaouthar


ngrigoriadis · ‎2024-02-02

Hello @KDJEM.1,

I checked the link you posted, but my setup differs from that example: my screen is a Riverdi display. The MPU configuration in the example is not compatible with mine, and the descriptors are at different addresses. I changed the main file generated by TouchGFX Designer so that the MPU configuration is set before the caches are enabled. However, it didn't work. The code comes only from TouchGFX and CubeMX; I didn't write a single line except in the linker script and in ethernetif.c, where I declared the rx_pool buffer. Maybe the generated code has problems that need to be fixed, I don't know. :\

Bob S · ‎2024-02-02

First rule of debugging faults - have you looked at the fault register contents to see exactly what caused the fault? There are several (many) posts on this forum about this. And if you are using CubeIDE it even has a fault analyzer tab. For precise faults that should give you the PC value of the instruction that generated the fault.
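As a rough illustration of what "looking at the fault register contents" means, here is a minimal sketch that decodes the BusFault bits of a CFSR value. The bit positions are the ARMv7-M ones; the helper names (`cfsr_precise_addr_valid`, `cfsr_print_busfault`) are made up for this example, and on a target the two inputs would be read from `SCB->CFSR` and `SCB->BFAR`.

```c
#include <stdint.h>
#include <stdio.h>

/* BusFault bits inside SCB->CFSR on ARMv7-M (Cortex-M7). */
#define CFSR_IBUSERR     (1u << 8)   /* instruction fetch bus error       */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error            */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error          */
#define CFSR_UNSTKERR    (1u << 11)  /* fault while unstacking on return  */
#define CFSR_STKERR      (1u << 12)  /* fault while stacking on entry     */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address   */

/* Hypothetical helper: 1 when BFAR can be trusted for a precise bus fault. */
static int cfsr_precise_addr_valid(uint32_t cfsr)
{
    return ((cfsr & CFSR_PRECISERR) != 0u) && ((cfsr & CFSR_BFARVALID) != 0u);
}

/* Hypothetical helper: print every BusFault flag that is set. */
static void cfsr_print_busfault(uint32_t cfsr, uint32_t bfar)
{
    if (cfsr & CFSR_IBUSERR)     printf("IBUSERR\n");
    if (cfsr & CFSR_PRECISERR)   printf("PRECISERR\n");
    if (cfsr & CFSR_IMPRECISERR) printf("IMPRECISERR\n");
    if (cfsr & CFSR_UNSTKERR)    printf("UNSTKERR\n");
    if (cfsr & CFSR_STKERR)      printf("STKERR\n");
    if (cfsr_precise_addr_valid(cfsr)) {
        printf("BFAR = 0x%08lX\n", (unsigned long)bfar);
    }
}
```

BFAR is only meaningful while BFARVALID is set, so it is worth checking that bit before trusting the address.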

ngrigoriadis · ‎2024-02-04

Hello @Bob S,

Yes, I have investigated the CFSR register, and it always reports a PRECISERR error with a BFAR address pointing to reserved memory locations. I cannot work out how to find the exact location of the problem; as I mentioned before, I am using code generated by both TouchGFX Designer and CubeMX (LwIP & Ethernet). I also tried raising the ETH interrupt priority, because when the fault appears the debug view shows the ETH peripheral's interrupt handler among the functions called before the error occurs. However, that didn't solve the problem at all. I should mention that the address in the BFAR register is different each time I change something in the code, for instance when I increase the stack size of the task or TCPIP_STACK_SIZE in the lwipopts.h file. My MPU_Config function looks like this:

void MPU_Config(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct = {0};

  /* Disables the MPU */
  HAL_MPU_Disable();

  /** Initializes and configures the Region and the memory to be protected
  */
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER1;
  MPU_InitStruct.BaseAddress = 0x30000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_1KB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /** Initializes and configures the Region and the memory to be protected
  */
  MPU_InitStruct.Number = MPU_REGION_NUMBER2;
  MPU_InitStruct.BaseAddress = 0x30004000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_16KB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);
  /* Enables the MPU */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);

}

I don't know whether I should declare another region in order to enable access to other areas of the RAM.

ngrigoriadis · ‎2024-02-05

Looking at the BFAR register and the memory map of my microcontroller, the BFAR address clearly lies inside the FMC NAND Flash memory region, which is why I am getting this PRECISERR error. However, I cannot find what is causing the access.

Bob S · ‎2024-02-05

My knowledge of both H7 and TouchGFX is minimal. Are you executing code/instructions from the FMC, or is it only storing data (fonts, graphics, etc.)? If the BFAR is pointing there, it sounds like you have code there - so what assembly instruction is at that address in the NAND Flash? See what the instruction is and what values the registers have. See if you can spot any corrupted data or a NULL pointer dereference.

Hmmmm - if the fault occurs in the ETH interrupt handler (or any interrupt), then increasing the Ethernet TASK stack will not help. Under FreeRTOS, interrupts do not use the task's stack pointer (psp); they use the system stack (msp). And that is set by the linker file (see prvPortStartFirstTask() - it loads the msp from the initial stack pointer stored in the vector table).

Tesla DeLorean · ‎2024-02-05

Precise errors come from READs.

The FMC will generate an error in the decode if the address is outside expected sizes configured in the peripheral.

NAND memory tends not to use many address bits, but mainly one for command/data differentiation.

When you memcpy() out of NAND data space you're reading a FIFO for the block requested.

Is this memory range expected? Look critically at the code that touched the address (just prior to PC) and where the subroutine was called from at a higher level (LR)

Put in an ASSERT() or sanity check in the code path that's doing this so you can work it backward to the source of the problem.
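One possible shape for such a sanity check is a range test on the pointer before it is dereferenced. The sketch below is only an illustration: the RAM table assumes an STM32H743-style memory map (DTCM, AXI SRAM, SRAM1..3, SRAM4), which may not match the part used in this thread, and `addr_in_known_ram` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;  /* first valid address   */
    uint32_t size;  /* region length, bytes  */
} mem_range_t;

/* ASSUMPTION: STM32H743-style RAM map; adjust to your device's manual. */
static const mem_range_t k_ram_ranges[] = {
    { 0x20000000u, 128u * 1024u },  /* DTCM             */
    { 0x24000000u, 512u * 1024u },  /* AXI SRAM (D1)    */
    { 0x30000000u, 288u * 1024u },  /* SRAM1..3 (D2)    */
    { 0x38000000u,  64u * 1024u },  /* SRAM4 (D3)       */
};

/* Returns 1 if [addr, addr + len) lies entirely inside one known RAM range. */
static int addr_in_known_ram(uint32_t addr, uint32_t len)
{
    size_t count = sizeof k_ram_ranges / sizeof k_ram_ranges[0];
    for (size_t i = 0; i < count; i++) {
        uint32_t base = k_ram_ranges[i].base;
        uint32_t size = k_ram_ranges[i].size;
        /* Written to avoid overflow in addr + len. */
        if (addr >= base && len <= size && (addr - base) <= (size - len)) {
            return 1;
        }
    }
    return 0;
}
```

On the target, something like `assert(addr_in_known_ram((uint32_t)(uintptr_t)p, n));` just before the suspicious access would stop execution at the caller that passed the bad pointer, instead of at the later bus fault.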

Have a HardFault Handler that outputs actionable data. That way you don't need a debugger, and end-users can report failures in a way you can fix or understand.
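A minimal sketch of the reporting part of such a handler follows. It formats the eight words the Cortex-M core pushes on exception entry (R0-R3, R12, LR, PC, xPSR) together with CFSR and BFAR. The function name `fault_frame_format` and the output layout are assumptions for illustration; the assembly shim that picks MSP or PSP (bit 2 of the EXC_RETURN value in LR) and passes the frame pointer is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Order of the registers stacked by the core on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Hypothetical helper: write a one-line fault report into out.
 * Returns the snprintf result (length that was or would have been written). */
static int fault_frame_format(const uint32_t frame[FRAME_WORDS],
                              uint32_t cfsr, uint32_t bfar,
                              char *out, size_t out_len)
{
    return snprintf(out, out_len,
                    "PC=%08lX LR=%08lX PSR=%08lX R0=%08lX R1=%08lX "
                    "R2=%08lX R3=%08lX R12=%08lX CFSR=%08lX BFAR=%08lX",
                    (unsigned long)frame[FRAME_PC],
                    (unsigned long)frame[FRAME_LR],
                    (unsigned long)frame[FRAME_XPSR],
                    (unsigned long)frame[FRAME_R0],
                    (unsigned long)frame[FRAME_R1],
                    (unsigned long)frame[FRAME_R2],
                    (unsigned long)frame[FRAME_R3],
                    (unsigned long)frame[FRAME_R12],
                    (unsigned long)cfsr,
                    (unsigned long)bfar);
}
```

The resulting string can be sent over a UART or stored for later; the stacked PC identifies the faulting instruction and the stacked LR the caller.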


ngrigoriadis · ‎2024-02-22

Hello,

I found the problem: it was stack corruption. The functions that appeared in the debug view in CubeIDE were running on a thread whose stack size is INTERFACE_THREAD_STACK_SIZE. It was initially defined as 350, and I increased it. Furthermore, I increased configTOTAL_HEAP_SIZE in the FreeRTOSConfig.h file and followed @KDJEM.1's suggestion to call MPU_Config() before enabling the caches (I and D). With these changes, I managed to overcome the problem.

I would like to thank everyone who replied to my topic and was willing to help me.
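For anyone sizing a task stack after a similar fault, the usual way to see how close a stack came to overflowing is a "high water mark": fill the stack with a known byte at creation and later count how much of the fill is still intact. FreeRTOS offers this through `uxTaskGetStackHighWaterMark()` and can also check for overflow via `configCHECK_FOR_STACK_OVERFLOW`. The sketch below only illustrates the idea on a plain buffer, assuming a descending stack; the helper names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* same fill value FreeRTOS uses for task stacks */

/* Hypothetical helper: paint the whole stack area before the task starts. */
static void stack_paint(uint8_t *stack_low, size_t len)
{
    memset(stack_low, (int)STACK_FILL_BYTE, len);
}

/* Hypothetical helper: bytes never touched, counted from the low end.
 * A descending stack grows toward stack_low, so the untouched fill
 * bytes at the bottom are the remaining headroom. */
static size_t stack_unused_bytes(const uint8_t *stack_low, size_t len)
{
    size_t n = 0;
    while (n < len && stack_low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result near zero means the stack was nearly or completely exhausted at some point and should be enlarged.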