CAN Driver bug (?!)

Beck.Karl-Michael
Associate III
Posted on April 16, 2015 at 16:42

I think I found a bug in the CAN driver. Can someone explain this to me:

a) Receiving

1. In the example application HAL_CAN_Receive_IT is called from HAL_CAN_RxCpltCallback

2. Looking in the implementation of the driver we see that HAL_CAN_RxCpltCallback is called from CAN_Receive_IT

3. CAN_Receive_IT is called from the HAL_CAN_IRQHandler

4. Going back to HAL_CAN_Receive_IT looking at the implementation we see it uses __HAL_LOCK which exclusively locks the CAN_HandleTypeDef

5. So conclusion: HAL_CAN_Receive_IT is called from an interrupt and exclusively locks the CAN_HandleTypeDef

b) Sending

1. An application will call either HAL_CAN_Transmit_IT or HAL_CAN_Transmit in the main routine.

2. Looking at the implementation we see both methods also use __HAL_LOCK

c) The Bug occurs

1. So an application sends a CAN message in the main routine

2. Then the Rx interrupt occurs while the main task is inside HAL_CAN_Transmit_IT or HAL_CAN_Transmit. (Remember, the CAN_HandleTypeDef is locked: b.2)

3. Now the Rx interrupt also tries to lock (a.4)

4. Of course this will fail because the handle is already locked by b.2

d) The resulting phenomena

1. The sample application will call its error handler, going to while(1); Of course this is not acceptable.

2. If we ignore the error and look at the implementation of HAL_CAN_Receive_IT, we see that it does not do much apart from enabling the Rx interrupts.

   => So if we ignore the error, we will get no more interrupts for incoming messages.
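To make the sequence in a) to d) concrete, here is a minimal host-side model of the collision. It is only a sketch: all `mock_*` names and types are hypothetical stand-ins, not the real ST definitions, and it assumes that the lock check returns a busy status on contention and that the driver disables the Rx interrupt before the callback runs (as the behaviour described above suggests).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the HAL types; NOT the real ST definitions. */
typedef enum { MOCK_OK = 0, MOCK_BUSY = 2 } mock_status_t;
typedef enum { MOCK_UNLOCKED = 0, MOCK_LOCKED = 1 } mock_lock_t;

typedef struct {
    mock_lock_t lock;      /* single lock shared by the Tx and Rx paths */
    bool rx_irq_enabled;   /* models the Rx "message pending" interrupt enable */
} mock_can_handle_t;

/* Models HAL_CAN_Receive_IT: take the lock, re-enable the Rx interrupt. */
mock_status_t mock_receive_it(mock_can_handle_t *h)
{
    assert(h != NULL);
    if (h->lock == MOCK_LOCKED) {
        return MOCK_BUSY;          /* assumed behaviour of the lock macro */
    }
    h->lock = MOCK_LOCKED;
    h->rx_irq_enabled = true;
    h->lock = MOCK_UNLOCKED;
    return MOCK_OK;
}

/* Models the Rx interrupt: the driver turns the Rx interrupt off (assumed),
   then the application callback tries to re-arm reception. */
mock_status_t mock_rx_isr(mock_can_handle_t *h)
{
    assert(h != NULL);
    h->rx_irq_enabled = false;
    return mock_receive_it(h);
}

/* Models a transmit call in the main routine that is preempted by the
   Rx interrupt while it still holds the lock. Returns the ISR's result. */
mock_status_t mock_transmit_preempted_by_rx(mock_can_handle_t *h)
{
    assert(h != NULL);
    h->lock = MOCK_LOCKED;                      /* step b.2 */
    mock_status_t isr_result = mock_rx_isr(h);  /* step c.2: interrupt fires */
    h->lock = MOCK_UNLOCKED;
    return isr_result;
}
```

In this model, `mock_transmit_preempted_by_rx` returns the busy status and leaves `rx_irq_enabled` false, which corresponds to d.2: reception stays off until something calls the receive function again outside the locked window.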

Of course it's very unlikely that the bug occurs in the sample application with low bus load. But I have it here in my system, and I'm still trying to figure out how to deal with this Rx acknowledge done by HAL_CAN_Receive_IT. But looking at the above, there is no solution, just a fundamental bug inside the CAN driver. Actually there should be separate lock handles for Rx and Tx. Then it should work.

Did I miss something? The conclusion would be just not to use the CAN driver and write another one... or maybe modify it to use two separate lock objects....

I'm still not sure if I use the driver correctly, but following the example code provided, one falls into this trap.

2 REPLIES
rlisario9
Associate
Posted on August 05, 2015 at 16:30

Dear,

We have the same problem.

Is it possible to get a corrected CAN driver from ST?

Regards,

Rinaldo


Beck.Karl-Michael
Associate III
Posted on July 14, 2016 at 16:04

Hi Rinaldo,

I'm glad someone observed the same.

I guess it would be possible to fix the implementation by introducing two separate lock states, one for reception and one for transmission.

However, for now, as a workaround I did the following:

- If the problem occurs in the interrupt, I set a bool variable to true in the reception interrupt.

- In the application "task", immediately after sending, if the variable is set I call HAL_CAN_Receive_IT again and receive the message from the driver to process it in my application.
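The two steps above could look roughly like the following sketch. It is written against a mocked driver so it can be compiled on a PC: `drv_can_t`, `drv_receive_it`, `app_rx_complete_callback` and `app_retry_rearm_after_send` are hypothetical names for illustration, and `drv_receive_it` only models the behaviour described in this thread (busy status while locked, otherwise re-enable the Rx interrupt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of the driver; not the real HAL types. */
typedef enum { DRV_OK = 0, DRV_BUSY = 2 } drv_status_t;

typedef struct {
    bool locked;           /* shared handle lock, held during transmit */
    bool rx_irq_enabled;   /* Rx interrupt enable, re-armed by drv_receive_it */
} drv_can_t;

/* Set in the Rx callback (interrupt context), cleared in the main task. */
static volatile bool rx_rearm_pending = false;

/* Stand-in for the driver's receive-with-interrupt call. */
drv_status_t drv_receive_it(drv_can_t *can)
{
    assert(can != NULL);
    if (can->locked) {
        return DRV_BUSY;
    }
    can->rx_irq_enabled = true;
    return DRV_OK;
}

/* Rx-complete callback (interrupt context): try to re-arm reception and
   remember the failure instead of jumping into an error handler. */
void app_rx_complete_callback(drv_can_t *can)
{
    assert(can != NULL);
    if (drv_receive_it(can) != DRV_OK) {
        rx_rearm_pending = true;
    }
}

/* Main task: call right after the transmit function has returned.
   Returns true if a deferred re-arm was carried out. */
bool app_retry_rearm_after_send(drv_can_t *can)
{
    assert(can != NULL);
    if (!rx_rearm_pending) {
        return false;
    }
    if (drv_receive_it(can) != DRV_OK) {
        return false;      /* still locked; keep the flag and try again later */
    }
    rx_rearm_pending = false;
    return true;
}
```

The flag is `volatile` because it is written in interrupt context and read in the main task. The sketch only re-arms reception; fetching and processing the stored message is left out.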

Of course, as with every workaround, there is also a drawback: if the application takes too long to finish sending and fetch the message from the driver in the "main task", a CAN message will get lost. (Or whatever the driver does if a CAN message is received before the previous message was picked up by the application.)

So far I haven't observed this problem. I guess the time slice between finishing sending and receiving in the main task is too short for the problem to occur.

However, it may pop up again once we port some real-time processing to the application. Then we will have a higher-priority interrupt which may block the main task long enough for the problem to occur.

The proper solution would be to allow reception in the reception interrupt even while sending is active (by introducing a read/write lock as stated above).
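As a rough illustration of the separate-lock idea, here is a host-side sketch with one lock per direction. It is not ST's driver and not a tested patch: every `split_*` name is hypothetical, and it ignores any other shared state (for example a common state or error field) that a real driver change would also have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the real HAL handle. */
typedef enum { SPLIT_OK = 0, SPLIT_BUSY = 2 } split_status_t;

typedef struct {
    bool tx_locked;        /* taken only by the transmit path */
    bool rx_locked;        /* taken only by the receive path */
    bool rx_irq_enabled;   /* Rx interrupt enable */
} split_can_t;

/* Take a lock if it is free, otherwise report busy. */
static split_status_t try_lock(bool *lock)
{
    if (*lock) {
        return SPLIT_BUSY;
    }
    *lock = true;
    return SPLIT_OK;
}

/* Transmit path (main routine): holds only the Tx lock while sending. */
split_status_t split_transmit_begin(split_can_t *can)
{
    assert(can != NULL);
    return try_lock(&can->tx_locked);
}

void split_transmit_end(split_can_t *can)
{
    assert(can != NULL);
    can->tx_locked = false;
}

/* Receive re-arm (interrupt context): uses only the Rx lock, so an
   ongoing transmit no longer blocks it. */
split_status_t split_receive_it(split_can_t *can)
{
    assert(can != NULL);
    if (try_lock(&can->rx_locked) != SPLIT_OK) {
        return SPLIT_BUSY;
    }
    can->rx_irq_enabled = true;
    can->rx_locked = false;
    return SPLIT_OK;
}
```

With this split, a call to `split_receive_it` between `split_transmit_begin` and `split_transmit_end` succeeds, while a second concurrent transmit is still rejected as busy.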

Another possibility, without touching the driver, would be to use a software interrupt for sending messages that has a higher priority than the reception interrupt. Then a reception interrupt can never occur while sending.

I'm a bit angry about this issue. You have a nice, handy system which works properly in the demo software, but under real-life conditions (lots of traffic) you run into such problems.