
32F746GDISCOVERY: help with the hard fault handler

Jackiii1989
Associate III

Hi,

 

I am developing an application on the 32F746GDISCOVERY board. The application reads data from a Python script that sends certain strings to the MCU over USB, which is connected to USART1. The data from USART1 is then passed through a queue to Screen1View, where it is displayed in a TextArea field. Once this is done, the MCU sends a reply back to the Python script, and then a new command is executed.

The application runs fine at first, but after a few seconds the hard fault handler is called. Here is the call stack:

[Image: debugger call stack at the hard fault]

I currently do not know what is located at address 0x30302d32. Does anyone have any insight into this?

Also, looking at the memory map, the address 0xffffffed falls in the Cortex-M7 internal registers region, so there should not be a function call to it, right?
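For context on the 0xffffffed entry: on ARMv7-M cores such as the Cortex-M7, a value like 0xFFFFFFED in LR is most likely an EXC_RETURN code, which the core loads into LR on exception entry. Debuggers typically show it as a marker frame between the fault handler and the interrupted code, not as a real call. The sketch below decodes the relevant bits as defined in the ARMv7-M architecture (bit 2: stack pointer, bit 3: mode, bit 4: frame type); the helper names are hypothetical and not part of HAL or CMSIS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an ARMv7-M EXC_RETURN value (loaded into LR on exception entry).
 * Only the bits that matter for reading a debugger call stack are decoded. */
typedef struct {
    bool thread_mode;    /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool uses_psp;       /* bit 2: 1 = frame is on the PSP (task stack), 0 = MSP */
    bool extended_frame; /* bit 4: 0 = frame also holds FPU registers */
} exc_return_info_t;

/* Valid ARMv7-M EXC_RETURN values have bits [31:5] all set
 * (0xFFFFFFE1, 0xFFFFFFE9, 0xFFFFFFED, 0xFFFFFFF1, 0xFFFFFFF9, 0xFFFFFFFD). */
static bool is_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u;
}

static exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info;
    info.thread_mode    = (lr & (1u << 3)) != 0u;
    info.uses_psp       = (lr & (1u << 2)) != 0u;
    info.extended_frame = (lr & (1u << 4)) == 0u;
    return info;
}
```

With 0xFFFFFFED all three flags come out true: the exception was taken from Thread mode, on the process stack (PSP, which FreeRTOS uses for task stacks), with an FPU-extended frame. That would suggest the fault happened while a task, rather than an interrupt handler, was running.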

I have found a relatively similar post about this here, but it suggests increasing the stack size. The task's stack size is the default 4096 bytes, which I would say is more than enough memory.

Can anyone help me with this? Thanks in advance.

 

hellohello
ST Employee

Hello @Jackiii1989 and welcome to the community!

 

What have you tried to debug?

What happens when you increase the stack size, as the post suggests?

What state is your board on when the error occurs? Waiting for Python script? Receiving the string? Sending response?

Do you have any way of identifying the function you are in when the issue happens? With a decompiler maybe?


> Hello @Jackiii1989 and welcome to the community!

Hi, thanks for welcoming me to the community.

> What have you tried to debug?


Sorry for my incomplete post. I am still trying to find the bug in my application. I am experimenting with TouchGFX and trying to understand how to develop an application with it. In TouchGFX Designer I have created 4 text areas, which I am trying to update from the Python script. When I press the USER button on the Discovery board, the communication starts: the Discovery board sends a command via USART to the Python script, and the Python script replies with a string. When this response is received, it is passed to Screen1View, where the data is extracted and the text areas are updated. After the view has updated the text areas, a new command is sent to the PC. It is a closed loop between the Python script and the Discovery board.


> What happens when you increase the stack size, as the post suggests? What state is your board in when the error occurs?

I have increased the stack size to 5120 bytes and the error still occurs, and that is already a huge stack for an embedded application. The error is also very sporadic. Sometimes the board somehow misses the command from the Python script and just waits. When I press the USER button again, it starts from the beginning. I can do this a couple of times before a hard fault happens. Other times it jumps directly into the hard fault after the timeout, as shown in the picture above.

> Waiting for Python script? Receiving the string? Sending response?

The Python script simply listens for a command, responds to it and then listens again. The serial library has a timeout: if no data is received within that time, the script prints a timeout warning. Here is the code:

```python
# ser (serial wrapper), dev1..dev4, create_cmd and bcolors are defined
# elsewhere in the script.
command = ""
prev_command = ""
count = 0
waiting_on_start = 1
while True:
    # wait for the command
    response = ser.read_command()

    # read_command() returns these strings on timeout / decode failure
    if response == "timeout" or response == "decodeError":
        print(f"{bcolors.WARNING}{response}{bcolors.ENDC}")
        count = 0
        continue

    # pick the reply according to which device the MCU asked for
    if response.find(dev1) >= 0:
        command = create_cmd(dev1)
    elif response.find(dev2) >= 0:
        command = create_cmd(dev2)
    elif response.find(dev3) >= 0:
        command = create_cmd(dev3)
    elif response.find(dev4) >= 0:
        command = create_cmd(dev4)

    ser.send_command(command)
    count += 1
    if prev_command != command:
        prev_command = command
    print(f" {bcolors.OKBLUE}msg received!, count:{count}{bcolors.ENDC}")
    # sleep(0.05)
```

Here is the output of the Python script:

[Image: Python script output]

Here is also an output where the Discovery board jumped into the hard fault after the first timeout:

[Image: Python script output, with the board faulting after the first timeout]


> Do you have any way of identifying the function you are in when the issue happens? With a decompiler maybe?


I am using the arm-none-eabi-gcc and arm-none-eabi-c++ compilers for the C and C++ code, respectively. I think the fault always happens while TouchGFX_Task is running. However, I cannot find out exactly where in the code it happens, because TouchGFX declares the function as virtual, and finding out which implementation the call is linked to is a bit of a challenge for me.

I am transferring the data to TouchGFX_Task with a queue, as shown below:

 

 

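```c
#include <string.h>     // strncpy
#include "cmsis_os2.h"  // osMessageQueue* API
#include "main.h"       // HAL types (CubeMX project)

// MAX_MESSAGE_SIZE, huart1 and uartDebugQueueHandle are defined elsewhere in the project.

typedef struct{
	int size;
	char Data[MAX_MESSAGE_SIZE];
}UartData_t;

uint8_t RxData[MAX_MESSAGE_SIZE];
UartData_t* uartData_debug_msg;  // not assigned to a buffer anywhere in this snippet

// Called by the HAL when a receive-to-idle transfer ends; Size = bytes received.
void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size){

  if(osMessageQueueGetSpace(uartDebugQueueHandle)>0){
   // copies at most Size bytes; no NUL is appended if RxData has none in that range
   strncpy(uartData_debug_msg->Data, (char*)RxData, Size);
   uartData_debug_msg->size = Size-2;  // assumes the frame ends in "\r\n"
   // &uartData_debug_msg: the queue copies the pointer value, not the struct contents
   osMessageQueuePut(uartDebugQueueHandle, &uartData_debug_msg, 0, 0);
  }
  // re-arm reception for the next frame
  HAL_UARTEx_ReceiveToIdle_IT(&huart1, RxData, MAX_MESSAGE_SIZE);
}
```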

 

 

What is the bottleneck on how much data can be transferred to the GUI task? I know about this page:

https://support.touchgfx.com/docs/basic-concepts/performance

If the data is not updated during this time, it just sends the same frame, so it should not be a problem.



Thanks for the help!

 

Hello,

 

"Also looking at the memory map, the address 0xffffffed is associated with the Cortex M7 internal registers and there should not be a function call to it, right?"
As far as I know, the registers can be used as cache memory.

 

You only have a log for received messages.
I think the timeout can happen if the Python script doesn't receive anything for a long time, or if the Python script is trying to send something and there is no receiver ready.
So a log entry for when sending a message has finished could be helpful.

 

You could try to remove compiler optimization.

 

I don't have many ideas here; maybe you can share your project, as we have that Discovery board available.

 

"What is the bottleneck on how much data can be transferred to the GUI task?"
Sending strings won't be an issue: we have to deal with sending 480*800*3 bytes 60 times per second, so I think one string per second is OK.

 

Regards,

 

@GaetanGodart 

 

Hello,
I apologize for my late response; I have been a bit busy lately.


"As far as I know, the registers can be used as cache memory."

Okay, interesting. In the memory map (Chapter 4), that region is only labelled "Cortex-M7 Internal Peripherals".

"You only have a log for received messages.
I think the timeout can happen if the Python script doesn't receive anything for a long time, or if the Python script is trying to send something and there is no receiver ready."

I think the first explanation is the reason for the timeout: the Python script is not receiving anything, so the timeout is triggered. That happens because the STM32F746 goes into the hard fault handler and becomes unresponsive. My question is why this is happening. What am I doing wrong in the code?

 

"You could try to remove compiler optimization."

I played with the optimisation flags and disabled optimisation (-O0), but it did not help.

 

"I don't have many ideas here; maybe you can share your project, as we have that Discovery board available."

If you can help me find the bug and understand what is wrong, I will be happy to share the project. I could not upload the files to the post, so I created a shared folder myself. I have made some updates to the TouchGFX project: I moved some code from Screen1View to a separate task called StartConsumerTask, where the data extraction happens. From this task the extracted data should be sent to Screen1View to be displayed on the screen, but I have left that part out for now to narrow down the problem. To start the communication, first run the Python script, then press the USER button on the board flashed with the TouchGFX code. Thanks!

Shared folder:

https://drive.google.com/drive/folders/13bt0pabCeKZUSY40b6lj8PBWxU4mG9B0?usp=drive_link 

If there is anything I can do to help, please let me know.

 

>>I currently do not know what is located at the address 0x30302d32. Does anyone have any insights on this?

The stack is getting trashed with ASCII string data, i.e. "2-00".
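The decoding is easy to check: the Cortex-M7 runs little-endian, so the lowest byte of the word is the first character in memory. A small sketch; word_to_ascii_le() is a hypothetical helper, and non-printable bytes are shown as '.'.

```c
#include <stdint.h>

/* Show a 32-bit word as the 4 characters it occupies in little-endian memory,
 * lowest-addressed byte first. Non-printable bytes become '.'. */
static void word_to_ascii_le(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)((word >> (8 * i)) & 0xFFu);
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

0x30302d32 gives "2-00", while a genuine flash address such as 0x08019166 gives "f...", which is a quick way to tell text that ended up in a register from a real code address.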

Look for where you have ridiculously large auto/local variables or arrays.

Add some stack depth checks so you can isolate / bisect the issue, i.e. fill the stack with a pattern and determine a low-water mark for how far it descends.
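A host-runnable sketch of that idea, assuming a downward-growing stack (as on Cortex-M) and an arbitrary fill byte of 0xA5; stack_paint() and stack_unused_bytes() are hypothetical helpers, not HAL or FreeRTOS functions. For FreeRTOS tasks, the kernel already paints task stacks with 0xA5 when its stack checking or high-water-mark support is enabled, and reports the result through uxTaskGetStackHighWaterMark() (in words) or, via CMSIS-RTOS2, osThreadGetStackSpace() (in bytes), so those calls may be enough in practice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary marker byte (FreeRTOS happens to use the same value). */
#define STACK_FILL_BYTE 0xA5u

/* Paint the whole stack region before the code under test starts using it. */
static void stack_paint(uint8_t *stack_base, size_t stack_size)
{
    memset(stack_base, STACK_FILL_BYTE, stack_size);
}

/* Cortex-M stacks grow downward, from stack_base + stack_size toward
 * stack_base, so bytes that were never used stay painted at the low end.
 * Returns how many such bytes remain: the space the stack never reached. */
static size_t stack_unused_bytes(const uint8_t *stack_base, size_t stack_size)
{
    size_t n = 0;
    while (n < stack_size && stack_base[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

Calling stack_unused_bytes() periodically (or just before the fault is expected) shows whether the low-water mark is creeping toward zero. If real stack data happens to equal the fill byte right at the boundary, the result is slightly optimistic, so a small safety margin is sensible.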

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

> Stack getting trashed with ASCII string data,ie "2-00"

+1 to "2-00", but not necessarily on the stack: it may be not just a return to trashed stack content, but also an indirect jump through a corrupted function pointer.

Look at the disassembly (best mixed with the C source) a couple of instructions before the offending address (0x8019166) to find out which one of these it was.
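To get the relevant addresses without relying on the debugger's call-stack view, a common approach is to read the registers the core stacked on fault entry: the stacked PC is where execution ended up (0x30302d32 here), and the stacked LR often identifies the caller. A minimal sketch, assuming the ARMv7-M basic frame layout and GCC; hard_fault_report() is a hypothetical name, and the HardFault_Handler shown would replace the one CubeMX generates in stm32f7xx_it.c, since there can be only one definition. The ARM-only part is guarded so the frame decoding can also be built and checked on a PC.

```c
#include <stdint.h>

/* Registers the core pushes on exception entry (ARMv7-M basic frame).
 * With an FPU-extended frame, S0-S15 and FPSCR follow xPSR. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address: usually the faulting instruction */
    uint32_t xpsr;
} stacked_frame_t;

static stacked_frame_t read_stacked_frame(const uint32_t *sp)
{
    stacked_frame_t f;
    f.r0 = sp[0];
    f.r1 = sp[1];
    f.r2 = sp[2];
    f.r3 = sp[3];
    f.r12 = sp[4];
    f.lr = sp[5];
    f.pc = sp[6];
    f.xpsr = sp[7];
    return f;
}

#if defined(__arm__) && defined(__ARM_ARCH_7EM__)
/* Hypothetical C-level handler: set a breakpoint in the loop and inspect
 * frame.pc / frame.lr in the debugger. */
__attribute__((used)) void hard_fault_report(const uint32_t *sp)
{
    volatile stacked_frame_t frame = read_stacked_frame(sp);
    (void)frame;
    for (;;) {
    }
}

/* EXC_RETURN bit 2 tells which stack holds the frame; pass it in r0. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4            \n"
        "ite   eq                \n"
        "mrseq r0, msp           \n"
        "mrsne r0, psp           \n"
        "b     hard_fault_report \n");
}
#endif
```

If frame.pc holds the ASCII-looking value, the disassembly just before the address in frame.lr (or in the debugger's last valid frame) is where to look for the return or indirect branch that consumed it.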

JW

Hello @Jackiii1989 ,

 

I am not able to download your Python script: Google won't let me download it because it cannot run an anti-virus scan on it.

Can you share it here directly as code inside a message, or is it too big? (It seemed to be multiple files and folders, including macros, etc.)

 

Regards,

Gaetan Godart
Software engineer at ST (TouchGFX)

Hello @GaetanGodart,

Thank you for your support. I wanted to share the TouchGFX project and the Python script, but unfortunately the forum does not let me attach any compressed archive (WinRAR, WinZip or 7-Zip); it rejects them automatically. I will try to find another solution or try again tomorrow.

GitHub, Google Drive (as a browsable folder, not a ZIP blob), Microsoft OneDrive, ...

The chances that people will want to build and debug an entire project are pretty low.

Work backward from the fault, trying to identify the triggering behavior and test for it in prior code execution.

Instrument your code so you understand the flow dynamics to the point of failure, and add checks for heap and stack integrity.

Look for unbounded string operations, and add checking code around them against the size of the structures you're moving them into.
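Applied to the receive callback posted earlier in the thread, a bounded copy could look like the sketch below. Assumptions: MAX_MESSAGE_SIZE is a placeholder value, frames end in optional CR/LF characters (the original `Size-2` suggests a trailing "\r\n"), and uart_msg_fill() is a hypothetical helper. It truncates oversized frames, always NUL-terminates Data, and cannot produce a negative size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MESSAGE_SIZE 64  /* placeholder; use the project's own value */

typedef struct {
    int  size;                    /* payload length without trailing CR/LF */
    char Data[MAX_MESSAGE_SIZE];  /* always NUL-terminated */
} UartData_t;

/* Copy a received UART frame into msg without ever writing past Data[]. */
static void uart_msg_fill(UartData_t *msg, const uint8_t *rx, uint16_t rx_len)
{
    size_t n = rx_len;
    if (n > sizeof msg->Data - 1) {
        n = sizeof msg->Data - 1;  /* truncate instead of overflowing */
    }
    memcpy(msg->Data, rx, n);
    msg->Data[n] = '\0';

    /* Strip trailing CR/LF only if present; `Size - 2` would go negative
     * for frames shorter than two bytes. */
    while (n > 0 && (msg->Data[n - 1] == '\n' || msg->Data[n - 1] == '\r')) {
        msg->Data[--n] = '\0';
    }
    msg->size = (int)n;
}
```

In the callback, the message could then live in a local variable and be copied into the queue by value, e.g. `UartData_t msg; uart_msg_fill(&msg, RxData, Size); osMessageQueuePut(uartDebugQueueHandle, &msg, 0, 0);`, with the queue created as `osMessageQueueNew(8, sizeof(UartData_t), NULL)` (the depth of 8 is arbitrary). Each queued message then carries its own copy of the data instead of a pointer to one shared buffer that the ISR could overwrite while the consumer task is still parsing it.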
