2022-09-19 3:45 PM
I've got a weird situation.
We've got a CAN bus running between our controller and a CAT ECU, with a couple of PCs monitoring the traffic at about 80 messages per second. We can log a couple of million packets with zero errors. (That's not the weird part.)
Next, buddy plugs in his Dell laptop with his Kvaser CAN tap, and very occasionally (about 200 in 2 million) our controller sees form errors.
Buddy has two Kvaser interfaces, one with two CAN channels and one with a single channel. Swapping between them makes no difference.
If we remove our controller from the bus, no form errors are reported.
After much screwing around, we figured out that our controller was flagging a form error at the ACK bit: the ACK, which follows the CRC delimiter, was arriving slightly early. It turns out that buddy's Dell laptop was set up for 8 time quanta per bit, whereas the rest of the bus was using 16 time quanta. The Dell had big, sloppy bits and everybody else had tight timing.
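To put "big sloppy bits" into numbers: at 250 kbit/s a bit is 4 µs no matter how many time quanta it is split into, but the quantum itself, which is the smallest step a node can place its sample point or resynchronize by, doubles in length at 8 TQ per bit. A minimal sketch, assuming only the 250 kbit/s rate from this thread (the Dell's actual sample point and SJW aren't given, so they aren't modelled):

```python
BAUD = 250_000  # bit rate used on this bus


def tq_ns(baud: int, tq_per_bit: int) -> float:
    """Length of one time quantum in nanoseconds."""
    return 1e9 / (baud * tq_per_bit)


for n in (16, 8):
    # One TQ is the resolution of the node's bit timing, so its share of
    # the bit shows how coarse that node's timing decisions are.
    print(f"{n:2d} TQ/bit: {tq_ns(BAUD, n):.0f} ns per TQ, "
          f"{100 / n:.2f}% of a bit")
```

With 16 TQ each quantum is 250 ns (6.25% of a bit); with 8 TQ it is 500 ns (12.5%).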
Correcting the time quanta setting on the Dell fixed the issue; we're back to zero errors in many millions of packets.
But it was always the STM32 (our controller) that noticed the form error; none of the other devices on the bus ever did. It always showed up as a long ACK bit, which is a good indication of a CRC delimiter violation (the CRC delimiter must be recessive, but it was being stomped by an early ACK). Looking at scope traces of the offending packet and its retransmission, you'd be hard pressed to see any difference in the waveforms (I stared at many, many of them). They all look the same, apart from a very slightly early ACK from someone on the bus.
I have my sample point set at 75%: SJW = 4, BS1 = 11, BS2 = 4, 250,000 baud (1 + 11 + 4 = 16 TQ per bit, sampled after TQ 12). I wouldn't expect a slightly early ACK from the Dell to encroach on the 75% sample point of the CRC delimiter. The scope says no; the form error from bxCAN says yes.
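As a sanity check of those settings, here is a sketch that encodes them into the bxCAN CAN_BTR layout (each field stored minus one) and works out how early a dominant ACK would have to start to be sampled inside the CRC delimiter. The 36 MHz APB1 clock is a hypothetical value (the post doesn't say which STM32 or clock is used), and propagation delay is ignored:

```python
# Hypothetical clock: CAN peripheral on APB1 at 36 MHz (not stated in the thread).
PCLK1_HZ = 36_000_000
BAUD = 250_000
SJW, BS1, BS2 = 4, 11, 4  # from the post, in time quanta


def bxcan_timing(pclk_hz, baud, sjw, bs1, bs2):
    # Field ranges allowed by the bxCAN CAN_BTR register.
    assert 1 <= sjw <= 4 and 1 <= bs1 <= 16 and 1 <= bs2 <= 8
    tq_per_bit = 1 + bs1 + bs2  # SYNC_SEG is always 1 TQ
    prescaler, rem = divmod(pclk_hz, baud * tq_per_bit)
    if rem or not 1 <= prescaler <= 1024:
        raise ValueError("no exact prescaler for this clock and bit rate")
    # CAN_BTR stores each field minus one:
    # BRP bits 9:0, TS1 bits 19:16, TS2 bits 22:20, SJW bits 25:24.
    btr = ((prescaler - 1)
           | ((bs1 - 1) << 16)
           | ((bs2 - 1) << 20)
           | ((sjw - 1) << 24))
    tq_ns = 1e9 / baud / tq_per_bit
    return {
        "btr": btr,
        "tq_ns": tq_ns,
        "sample_point": (1 + bs1) / tq_per_bit,
        # The CRC delimiter is sampled at the end of BS1. A dominant ACK that
        # starts before then is read as dominant (a form error), so it must
        # begin more than BS2 early to cause one.
        "ack_early_limit_ns": bs2 * tq_ns,
    }


t = bxcan_timing(PCLK1_HZ, BAUD, SJW, BS1, BS2)
print(f"CAN_BTR = 0x{t['btr']:08X}, TQ = {t['tq_ns']:.0f} ns, "
      f"sample point = {t['sample_point']:.0%}, "
      f"ACK must be > {t['ack_early_limit_ns']:.0f} ns early")
```

Under these assumptions the ACK would need to start more than 1 µs (the 4 TQ of BS2) ahead of the nominal ACK slot. A dominant edge landing inside BS1 also triggers a resynchronization that lengthens BS1 by up to SJW, but because the bus stays dominant the delayed sample still reads dominant, so BS2 is the threshold.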
No specific question as such; is there something I'm overlooking?
Opinions?
Thanks,
Andrei
2022-09-22 5:47 AM
Hello Andrei,
From what you have described, I think it might be caused by some marginal oscillator drift in your controller which, combined with the bad settings on the Dell and possibly a "late" resynchronization, leads to exceeding the maximum timing error that can be corrected, i.e. 4 TQ (your SJW). In that case, due to the accumulated error, the sample point of the CRC delimiter bit would indeed fall into the early ACK bit on the bus, which would lead to the form error you saw.
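To gauge how much oscillator deviation your configuration can absorb, the sketch below evaluates the two oscillator-tolerance conditions commonly quoted from Bosch's "The Configuration of the CAN Bit Timing" for NBT = 16, PHASE_SEG2 = 4, SJW = 4. bxCAN's BS1 lumps PROP_SEG and PHASE_SEG1 together, so the split is a hypothetical one; any split leaving PHASE_SEG1 at 4 TQ or more gives the same result here, because condition 1 then depends only on PHASE_SEG2:

```python
def osc_tolerance(nbt, phase_seg1, phase_seg2, sjw):
    """Maximum oscillator deviation df (as a fraction) allowed by the two
    classic CAN bit-timing conditions. All arguments are in time quanta."""
    # Condition 1: sampling must stay correct over up to 13 bit times
    # without a resynchronization edge.
    cond1 = min(phase_seg1, phase_seg2) / (2 * (13 * nbt - phase_seg2))
    # Condition 2: SJW must absorb the drift accumulated over up to 10 bit
    # times between resync edges, with two nodes drifting apart (2 x 10).
    cond2 = sjw / (20 * nbt)
    return min(cond1, cond2), cond1, cond2


NBT, PS2, SJW = 16, 4, 4
PS1 = 4  # hypothetical split of BS1 = 11; any PS1 >= 4 gives the same limit
best, c1, c2 = osc_tolerance(NBT, PS1, PS2, SJW)
print(f"condition 1: {c1:.3%}, condition 2: {c2:.3%}, limit: {best:.3%}")
```

Both conditions come out on the order of 1% (about 0.98% and 1.25%). Whether drift of that size, combined with the Dell's coarser timing, is plausible depends on the controller's clock source, which the thread doesn't state.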
But without further details and/or waveforms of the problematic frames, it is really hard to say...
Jaroslav
