2019-03-03 02:51 PM
I am developing software for a system (using the HAL API) that communicates over CAN, and I recently ran into some issues related to thread safety. I am using an STM32L496ZGT6 MCU.
Some background info:
My main "thread" runs cyclically at a predetermined tick rate.
I have an interrupt handler for CAN Rx which puts the message received (using HAL_CAN_GetRxMessage) in a SW Rx Fifo that my main thread then uses (main thread basically decodes the message payload based on message id and stores into variables).
My main thread also puts messages into a SW Tx Fifo cyclically or on requests. This SW Tx Fifo is used to send messages using HAL_CAN_AddTxMessage.
So whenever a message is pushed onto the SW Tx Fifo there is an attempt to send it by HAL_CAN_AddTxMessage. However since there are only 3 Tx mailboxes (for my device) it might happen quite often that not every message pushed can be transmitted immediately.
Therefore, in the background I also have an interrupt handler for CAN Tx which requests transmission of the messages stored in the SW Tx Fifo.
I am protecting my SW Rx and Tx Fifos with a simple binary semaphore to make sure they do not get corrupted between the main thread and the ISRs, even though that is not likely with a circular Fifo buffer.
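To illustrate why a circular Fifo with exactly one producer and one consumer is hard to corrupt, here is a minimal C11 sketch. It is not the actual project code, and it swaps the binary semaphore for per-side indices: only the producer writes head and only the consumer writes tail. The names can_frame_t, can_fifo_t, fifo_push, fifo_pop and FIFO_CAPACITY are hypothetical and not part of the HAL.

```c
#include <assert.h>      /* static_assert */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_CAPACITY 8u /* hypothetical depth; must be a power of two */
static_assert((FIFO_CAPACITY & (FIFO_CAPACITY - 1u)) == 0u,
              "FIFO_CAPACITY must be a power of two");

/* Hypothetical frame type for this sketch (not a HAL type). */
typedef struct {
    uint32_t id;
    uint8_t  dlc;
    uint8_t  data[8];
} can_frame_t;

typedef struct {
    can_frame_t slots[FIFO_CAPACITY];
    atomic_uint head; /* written only by the producer */
    atomic_uint tail; /* written only by the consumer */
} can_fifo_t;

/* Producer side (e.g. the Rx ISR). Returns false if the Fifo is full. */
static bool fifo_push(can_fifo_t *f, const can_frame_t *frame)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head - tail == FIFO_CAPACITY) {
        return false;
    }
    f->slots[head % FIFO_CAPACITY] = *frame;
    /* Publish the new head only after the slot has been filled. */
    atomic_store_explicit(&f->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (e.g. the main loop). Returns false if the Fifo is empty. */
static bool fifo_pop(can_fifo_t *f, can_frame_t *out)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail == head) {
        return false;
    }
    *out = f->slots[tail % FIFO_CAPACITY];
    /* Release the slot only after the frame has been copied out. */
    atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
    return true;
}
```

This only holds while each side sticks to its role. If both the main thread and an ISR pop from the same Fifo, as happens with a Tx Fifo that is drained from two places, the sketch no longer applies and some form of locking or interrupt masking is needed again.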
So, to the issue. I noticed that since the interrupts (Tx and Rx) can preempt the main thread, I run into problems if the main thread is inside HAL_CAN_AddTxMessage at some unfortunate point when an interrupt fires. The result is various interesting things such as corrupt messages being sent onto the bus, e.g. random message ids or invalid CAN message formats.
So I started looking at the HAL API functions, and to me it seems that there is nothing to prevent the peripheral handle (in this case CAN_HandleTypeDef) from being used at the same time by two different threads. I.e., registers could become corrupted during read/write operations between different threads. At first glance I thought that CAN_HandleTypeDef.State would be used to prevent this, since it could quite easily serve as some sort of semaphore/mutex, but it is not.
Is there anything I am missing, or how am I supposed to use the HAL API without risking problems with thread-safety?
In my case I could solve the problem by making sure the main thread does not call any HAL functions and that no ISR can preempt another one. So instead of having the main thread attempt HAL_CAN_AddTxMessage whenever a message is added to the SW Tx Fifo, it would trigger a software interrupt for CAN Tx (using NVIC_SetPendingIRQ) and let the interrupt handler run HAL_CAN_AddTxMessage as described above.
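A rough sketch of that idea, written so that it also runs on a PC. The hw_* functions are hypothetical stand-ins invented for this example; the comments name the real call each one would map to on target (CAN1_TX_IRQn is assumed to be the Tx interrupt in use). The queue holds only message ids to keep the example short.

```c
#include <assert.h>  /* static_assert */
#include <stdbool.h>
#include <stdint.h>

#define TX_QUEUE_LEN 8u /* hypothetical depth; must be a power of two */
#define HW_MAILBOXES 3u /* three Tx mailboxes, as on this device */
static_assert((TX_QUEUE_LEN & (TX_QUEUE_LEN - 1u)) == 0u,
              "TX_QUEUE_LEN must be a power of two");

/* --- Host-side stand-ins (assumptions for this sketch, not HAL code) --- */
static uint32_t hw_busy;        /* mailboxes currently occupied */
static uint32_t hw_sent;        /* frames handed to the "hardware" */
static bool     tx_irq_pending; /* models the NVIC pending bit */

/* On target: HAL_CAN_GetTxMailboxesFreeLevel(&hcan) */
static uint32_t hw_free_mailboxes(void) { return HW_MAILBOXES - hw_busy; }

/* On target: HAL_CAN_AddTxMessage(&hcan, &header, data, &mailbox) */
static void hw_add_tx_message(uint32_t id) { (void)id; hw_busy++; hw_sent++; }

/* On target: NVIC_SetPendingIRQ(CAN1_TX_IRQn) */
static void hw_pend_tx_irq(void) { tx_irq_pending = true; }

/* --- SW Tx queue: main produces, the Tx ISR is the only consumer ------- */
static volatile uint32_t tx_queue[TX_QUEUE_LEN];
static volatile uint32_t tx_head; /* written only by main */
static volatile uint32_t tx_tail; /* written only by the ISR */

/* Called from the main loop only: enqueue and kick the ISR.
 * No mailbox access happens here. */
static bool can_send(uint32_t id)
{
    uint32_t head = tx_head;
    if (head - tx_tail == TX_QUEUE_LEN) {
        return false; /* queue full */
    }
    tx_queue[head % TX_QUEUE_LEN] = id;
    tx_head = head + 1u;
    hw_pend_tx_irq();
    return true;
}

/* Body of the CAN Tx interrupt handler: the only place that touches
 * the mailboxes, whether it was triggered by hardware or pended by main. */
static void can_tx_isr(void)
{
    tx_irq_pending = false;
    while (tx_tail != tx_head && hw_free_mailboxes() > 0u) {
        uint32_t tail = tx_tail;
        hw_add_tx_message(tx_queue[tail % TX_QUEUE_LEN]);
        tx_tail = tail + 1u;
    }
}
```

With this split, the Tx-complete interrupt and the software-pended interrupt end up in the same handler, so the mailbox registers are only ever written from one execution context. It still assumes the Rx interrupt cannot preempt the Tx handler while it is using the same handle.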
2019-03-05 08:30 AM
I tried to find where in the HAL CAN API this issue may appear, but I can't really find any non-atomic writes to the CAN message id register etc. Does anyone have an idea, or has anyone experienced something similar?
2019-03-08 12:29 AM
Hi nellemannen,
I have two questions:
1- Are you using two different threads that access the CAN instance, or only one thread that manages your Tx and Rx in your main?
2- Just to confirm, are you using the Tx interrupt?
If yes, could you test with polling as follows:
uint32_t FreeTxMbox = 0;
.
.
/* Wait for a free Tx mailbox */
do
{
    FreeTxMbox = HAL_CAN_GetTxMailboxesFreeLevel(&Handle);
} while (FreeTxMbox == 0);
/* TxData is assumed to be a uint8_t array, so it is passed without & */
HAL_CAN_AddTxMessage(&Handle, &TxHeader, TxData, &TxMailbox);
B.R.
STM32
2019-03-09 04:35 AM
Hi,
1. Yes, the CAN handle is accessed in the main thread, and it is used in the Tx/Rx interrupts as well.
2. I use the Tx interrupt for all 3 mailboxes. Whenever I attempt a Tx, either in main or from an interrupt, I check that the free level is > 0; otherwise I don't attempt the Tx. So I'm not sure how polling would solve my issue.
Is there any risk in using the handle in both the Tx and Rx interrupts in case one preempts the other? Since I have a semaphore, I could apply it to any HAL driver handle to make it thread-safe. In that case, what is a good way to make sure that I don't miss my interrupt, i.e. that it can trigger after the semaphore has been released? I could have it trigger a SW interrupt as I described in my original post. Is that the best way?
2019-03-11 02:10 AM
Hi,
The goal of testing with polling is not to solve the issue but at least to help identify the problem.
As I understand it, you don't have two concurrent threads that access the CAN handle (which is where we would talk about thread safety); you have one thread and two interrupts, where one interrupt can preempt the other, and here we are talking about a race condition. Maybe you can confirm this by doing the test without an RTOS (send CAN frames periodically in a timer interrupt with the same period as your RTOS thread).
Before this, you could also play with the preemption priority and sub-priority of the CAN Tx and Rx interrupts.
With best Regards,
STM32
2019-03-11 03:10 AM
Thanks for the answer. Maybe I was a bit unclear in my post: I am not using an RTOS.