STM32G474 FDCAN TX buffer cancellation finish IRQ does not trigger

Johannes · ‎2023-09-12

Hello Everyone
I am struggling to get STM32G474 FDCAN working, especially error handling.

Normal transmission works fine. I use TX queue mode (not TX FIFO). I can write messages to each of the 3 message buffers. The messages are sent, I get a Transmission Occurred interrupt, and so I know each message was sent successfully. All fine.

If I disconnect the bus (no peer), messages are not acknowledged by another CAN node, so the message is NOT sent properly (as expected).

I am working in "auto retransmission = disabled" (DAR) mode.
When I put a message into a TX buffer, the message is sent with an error and disappears from that TX buffer.

There are two registers:
FDCAN_TXBRP (transmit buffer request pending)
FDCAN_TXBCF (cancellation finished)

I can see FDCAN_TXBRP [TRP] is cleared (the message is no longer pending: it was faulty, and with no retry the hardware is done with it, so the transmit request is cleared).

I can see FDCAN_TXBCF[CF] is set (!) Transmit cancellation is finished. Obviously the hardware has canceled the faulty transmission and has set TXBCF.

I have FDCAN_IE [TCFE] (transmit cancellation finished interrupt enable) enabled.

This interrupt never triggers, even though FDCAN_TXBCF is set.

How to reproduce this:

enable FDCAN
use DAR (auto retransmission disable)
use TXQUEUE mode (not TX Fifo)
enable FDCAN_IE[TCFE] transmit cancel finish interrupt enable
enable FDCAN_TXBCIE transmit cancellation finished interrupt enable register for each transmit buffer
use an open CAN-Bus (no peer)
try to send messages by copying them into the message buffer
add a transmission request by setting the appropriate bit in FDCAN_TXBAR
observe FDCAN_TXBRP, see that the request pending bit is 1 and then switches to 0
observe FDCAN_TXBCF, see that transmit cancellation finish bit is set
IRQ never triggered.

Has anyone experienced the same? Is there no way to get an IRQ for messages that are not sent because of errors and automatically disappear from the TX buffer?

At the moment I only get IRQs on successful message sent, but not on failed tx attempts.
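
For anyone who wants to detect the failed attempts without the interrupt, the register behaviour described above can be polled instead. This is only a minimal sketch, not a confirmed fix for the missing IRQ. It assumes the registers have already been read into plain uint32_t snapshots. FDCAN_TXBTO (transmission occurred) is not mentioned above; it is the companion register to FDCAN_TXBCF in the reference manual. The enum, the function name and NUM_TX_BUFFERS are made up for illustration.

```c
#include <stdint.h>

/* Assumption: 3 TX buffers, as described for the STM32G474 above. */
#define NUM_TX_BUFFERS 3u

/* Hypothetical outcome of one TX buffer, derived from register snapshots. */
typedef enum {
    TX_IDLE,     /* no request pending and no result flag set */
    TX_PENDING,  /* TXBRP bit set: request still pending */
    TX_SENT,     /* TXBTO bit set: transmission occurred */
    TX_FAILED    /* TXBCF bit set without TXBTO: dropped without success */
} tx_outcome_t;

/* Classify buffer `idx` from snapshots of FDCAN_TXBRP, FDCAN_TXBTO and
 * FDCAN_TXBCF. Out-of-range indices are reported as TX_IDLE. */
tx_outcome_t classify_tx_buffer(uint32_t txbrp, uint32_t txbto,
                                uint32_t txbcf, unsigned idx)
{
    if (idx >= NUM_TX_BUFFERS) {
        return TX_IDLE;
    }
    uint32_t mask = (uint32_t)1u << idx;
    if (txbrp & mask) {
        return TX_PENDING;
    }
    if (txbto & mask) {
        return TX_SENT;   /* also covers "sent in spite of cancellation" */
    }
    if (txbcf & mask) {
        return TX_FAILED; /* the DAR case from this post: TRP cleared, CF set */
    }
    return TX_IDLE;
}
```

Calling this from a periodic tick for each buffer index would at least report the TX_FAILED case that currently produces no interrupt.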

Johannes

joh06937 · ‎2023-10-05

I'm dealing with something similar (issues sending under DAR mode while only device on bus; when ACKed everything is fine). I also don't see any transmit cancellation finished IRQs. Right now I'm getting those notifications via the arbitration protocol error interrupt (FDCAN_IR->PEA). If you're using a different baudrate for data than arbitration, you'll want PED.

That being said, even though I get a number of IRQs for the failure to ACK the message (as expected), I also see my system stop transmitting after between 4 and 16 messages (seems to vary). I set FDCAN_TXBAR, but nothing happens after that. So while my suggestion might get you a little further, it also might just give you another roadblock. :\

Of course, this post is a month old, so maybe you've figured this out or worked around it somehow. If you are successfully up and running, I'd love to know how you fixed your issue!

Johannes · ‎2023-10-08

Unfortunately I have not solved the issue. I use a workaround for now: I set the system to "auto retry". Messages in the transmit mailboxes are not dropped and the system tries to send them out indefinitely. Once the bus comes back to life, the messages in the mailboxes are sent.
The downside of this: There are very old messages staying in the mailboxes. And when the bus comes back, these old messages are sent.

unicyclebloke · ‎2024-04-10

Were you able to resolve this? I have hit the same problem with the PEA interrupt. There is also some curious interaction with the debugger which I don't understand.

I am sending two messages to a motor controller every 100ms (a test). This normally works just fine but I turn off power to the controller to simulate a fault. That's when I get the PEA interrupts. That works for a short while (16 messages or so) and then I stop getting interrupts.

If I put a break point on the ISR, I can see that the Transmit Error Counter increments with each error. It seems to go up by 8 each time, which is odd. It stops incrementing when it reaches 128. Now the odd thing is that I can keep pressing continue in the ISR, and fresh messages are written to the bus, resulting in another interrupt. But if I remove the breakpoint and continue, there are no further interrupts. I don't understand this. It doesn't appear to be a race condition in the software. If I send only one message on each 100ms tick, it seems that I do get the interrupts after all, but I'll have to see if I can repeat that. I tried reading the error counter register to reset it, but this does not reset the TEC field.

I'm using a logic analyser to view the bus and noticed that the ends of some messages are marked as "NAK" and others as "Error". The "Error" messages show the CAN TX pin (the STM32 pin not the transceiver CANH) low for a few bit times after the message has been written. The "NAK" messages show the CAN TX pin high after the message has been written. I don't understand what causes these two behaviours but expected only NAKs.

Salyzyn · ‎2024-04-11

STM32G491 here. I am also apparently losing CAN TX completion/abort interrupts, with DAR off, with or without errors at the point of failure, and I have no solution yet. What I do experience is occasional errors, enough that after ~3000 messages I hit the problem. I believe these errors depend on the electrical health of the bus, so on a clear channel you can run forever. My hypothesis is that my driver code is not performing a proper recovery and completion action for some of the errors. After seeing these postings, I believe there is some action for PEA and PED that I need to perform; currently there is nothing in that callback.

Many of your issues can be explained. The +8 is documented: it weighs TEC issues higher, while recovery is -1 for each idle. Single stepping likely results in an idle bus, which would explain why transmission continues after you press continue in the ISR. TEC, REC, Passive and CEL cannot be reset manually, except by resetting the entire FDCAN peripheral (meaning both CAN1 and CAN2, so a WIDE net) via RCC. These counters are by design not resettable via software, to prevent denial of service to the CAN bus by errant software.
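
The "16 messages or so" reported earlier falls straight out of that arithmetic. As a small worked sketch, assuming TEC rises by 8 per failed transmission with no decrements in between, and taking 128 as the value where it was seen to stop (the macro and function names are hypothetical):

```c
#include <stdint.h>

#define TEC_STEP_PER_TX_ERROR 8u    /* increment per failed frame, as observed in the thread */
#define TEC_ERROR_PASSIVE     128u  /* value at which the counter stopped in the thread */

/* Number of consecutive failed transmissions needed to take TEC from
 * `start` to the error-passive threshold. Assumes every attempt fails
 * and nothing decrements the counter in between. */
unsigned failures_until_error_passive(unsigned start)
{
    if (start >= TEC_ERROR_PASSIVE) {
        return 0u;
    }
    unsigned remaining = TEC_ERROR_PASSIVE - start;
    /* Round up: a partial step still needs one more failed frame. */
    return (remaining + TEC_STEP_PER_TX_ERROR - 1u) / TEC_STEP_PER_TX_ERROR;
}
```

Starting from a TEC of 0 this gives 16 failed frames. A counter that has not fully decayed from earlier errors gives fewer, which may be why the number of messages before the stall varies.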

joh06937 · ‎2024-04-11

It probably doesn't help either of you, but my issues came down to the way I was using the CAN buffers. Essentially my application is fine with things being sent single file using a single buffer, so I was attempting to reuse the same buffer each time. I discovered that if I updated a buffer and marked it to be sent immediately after the arbitration error condition occurred, I would get into my weird state of the peripheral halting after a few errors. Once I switched to using two buffers, my issue went away.
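
A rough sketch of the "don't reuse the buffer that just failed" idea, in case it is useful. This is an illustration of the approach, not the code from my project. It assumes a snapshot of FDCAN_TXBRP is passed in as a uint32_t, that at most 32 buffers exist, and that the caller remembers the last index it used (-1 if none). The function name is made up.

```c
#include <stdint.h>

/* Pick a TX buffer that is not pending in `txbrp` and is not the one used
 * for the previous frame. Returns the buffer index, or -1 if no such
 * buffer is free right now. `num_buffers` must be between 1 and 32. */
int pick_tx_buffer(uint32_t txbrp, int last_idx, int num_buffers)
{
    for (int i = 0; i < num_buffers; ++i) {
        if (i == last_idx) {
            continue; /* never reuse the previous buffer */
        }
        if ((txbrp & ((uint32_t)1u << i)) == 0u) {
            return i; /* request-pending bit clear: buffer is free */
        }
    }
    return -1;
}
```

With two buffers this simply alternates 0, 1, 0, 1 as long as the other buffer is free. A return of -1 means the caller should wait and try again rather than fall back to the buffer it just used.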

unicyclebloke · ‎2024-04-11

Thanks for the reply. I learned a lot more about the CAN standard and error handling overnight, so my ignorance is less, and I now understand how TEC works. Everything works as expected until TEC reaches 0x80 and the device switches to Passive Error state. The funny thing I'm seeing now is that I get no further interrupts, as if the device were Bus Off. Unless I've missed an interrupt flag I need in this case.

I'm able to recover the bus by resetting the peripheral as you say. I was a bit surprised at the need, but it's not a burden.

Salyzyn · ‎2024-04-11

Look at https://community.st.com/t5/stm32-mcus-products/stm32h7-fdcan-has-lost-the-automatic-bus-off-recovery-mechanism/td-p/187400

Salyzyn · ‎2024-08-27

I solved the problem in my case. Once the STM32G FDCAN hardware switched to passive mode, as notified by HAL_FDCAN_ErrorCallback with a TEC of 128 or more, we had to add a delay of at least 10 bit times (20us at 500Kbps) before hitting TXBAR. Otherwise the FDCAN transmitter would just ignore the request and would interrupt us for neither success nor failure. If one checks the active transmissions, the bits are all clear. Our product adds a 50% engineering margin to this delay. With that we progressed from 16 to 32 attempted commands before the Bus Off condition, at which point we applied the bus off recovery mechanism mentioned above. Following this regime, we could also quickly start working again once the bus condition was restored.
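
The delay arithmetic, for anyone running at a different bit rate. This is a sketch of the calculation only, not our driver code, and the function name and parameters are made up for illustration. How the delay is actually generated (timer, busy wait, RTOS tick) is left to the application.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds to wait before writing TXBAR: `bit_times` bit periods at
 * `bitrate_bps`, plus `margin_percent` extra, rounded up to a whole
 * microsecond. Intended for realistic values; very large inputs could
 * overflow the 64-bit intermediate or the 32-bit result. */
uint32_t txbar_guard_delay_us(uint32_t bitrate_bps, uint32_t bit_times,
                              uint32_t margin_percent)
{
    assert(bitrate_bps > 0u);
    uint64_t numerator = (uint64_t)bit_times * 1000000u
                         * (100u + (uint64_t)margin_percent);
    uint64_t denominator = (uint64_t)bitrate_bps * 100u;
    return (uint32_t)((numerator + denominator - 1u) / denominator);
}
```

For 500 kbit/s and 10 bit times this gives 20 us with no margin and 30 us with the 50% margin.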