2023-03-15 09:39 AM
I've been doing the majority of my FDCAN development with 1 or 2 nodes.
Recently, I have expanded my "network" to 7 nodes. 4 of these nodes send data at the same time.
(I mean, they send at 1-second intervals from power-up, and they are all powered on simultaneously.)
With the increase in senders on my network (from 2 to 4), I noticed that one of them will "fill up" its Tx FIFO and throw an error.
I know, and understand, the concept of "wait for the buffers to purge before putting more in them".
However, at 1 Hz, filling up the Tx FIFO is ridiculous. Also, it never "frees" itself up.
I started looking a little deeper and discovered I can get the last error:
FDCAN_ProtocolStatusTypeDef psr;
HAL_FDCAN_GetProtocolStatus(&HFDCAN, &psr);
printf("Last error code %#lx\r\n", psr.LastErrorCode);
The error ends up being:
#define FDCAN_PROTOCOL_ERROR_BIT1 ((uint32_t)0x00000004U) /*!< Bit 1 (recessive) error */
If I alter this heartbeat on two of the devices to (even) just 1001 ms the problem doesn't occur.
So, it definitely seems like some kind of bus contention.
I took a good long read through AN5348.
The only thing that seemed to be a possibility is:
HAL_FDCAN_EnableTxDelayCompensation()
I went ahead and enabled this:
if (HAL_FDCAN_ConfigTxDelayCompensation(&HFDCAN, 5, 0) != HAL_OK)
{
    printf("HAL_FDCAN_ConfigTxDelayCompensation error\n\r");
    Error_Handler();
}
if (HAL_FDCAN_EnableTxDelayCompensation(&HFDCAN) != HAL_OK)
{
    printf("HAL_FDCAN_EnableTxDelayCompensation error\n\r");
    Error_Handler();
}
My current settings are:
HFDCAN.Init.FrameFormat = FDCAN_FRAME_FD_BRS;
HFDCAN.Init.Mode = FDCAN_MODE_NORMAL;
HFDCAN.Init.AutoRetransmission = ENABLE;
HFDCAN.Init.TransmitPause = DISABLE;
HFDCAN.Init.ProtocolException = DISABLE;
HFDCAN.Init.NominalPrescaler = 8;
HFDCAN.Init.NominalSyncJumpWidth = 3;
HFDCAN.Init.NominalTimeSeg1 = 11;
HFDCAN.Init.NominalTimeSeg2 = 3;
HFDCAN.Init.DataPrescaler = 8;
HFDCAN.Init.DataSyncJumpWidth = 3;
HFDCAN.Init.DataTimeSeg1 = 11;
HFDCAN.Init.DataTimeSeg2 = 3;
HFDCAN.Init.StdFiltersNbr = 28;
HFDCAN.Init.ExtFiltersNbr = 0;
HFDCAN.Init.TxFifoQueueMode = FDCAN_TX_FIFO_OPERATION;
AN5348 does not cover "Transmit Pause", so I don't know what it does, and enabling it did not help.
Tx delay compensation doesn't seem to help either. Then again, if they all have the same value (5), maybe they are all still in perfect sync?
I am looking for more understanding and remediation ideas. Appreciate any help on this.
2023-03-16 02:47 AM
What about using the automatic retransmission feature? Clear the DAR bit in CCCR.
2023-03-16 04:54 AM
I updated my question with my configuration. I am using auto retransmission.
2023-03-16 05:36 AM
Okay.
Disclaimer: I'm also just learning FDCAN, so I'm not 100% sure about some of the following... ;)
You seem to assume that CAN bus has some built in logic to "sync" anything, which is not the case.
It's mostly only the physical layer.
If you let 2 STM32s transmit at exactly the same time, there will be errors due to collisions, which you have to handle. The transmitter can detect them because it listens while transmitting, so it sees if another node puts out dominant while it outputs recessive.
Even with auto retransmission enabled, both STM32s just fire again like crazy, with the same problem.
So I would turn off auto retransmission, then build some logic to trigger a retransmission:
if a transmission fails, abort the TX buffer, save the data, and retry after some time.
Now, if you are again using 2 of the same STM32 as senders, collisions will happen again, so maybe add some randomness to the time you wait between retransmissions.
Transmit Pause: it does not hurt to enable it; it lets the CAN transmitter wait 2 bit times before the next transmission.
2023-03-16 06:26 AM
I don't assume CAN can "sync" anything. I'm not sure where that came from. I do assume that when a message is put into the FIFO and cannot be sent right away, the controller sends it when it can, with no other handling needed. So you can put 3 messages into the buffer and the controller sends them as the bus allows. In my case, it never sends them. If I put multiple messages on the bus and the controller has to wait a moment, it should still send them.
At the moment, there is no way to catch this error UNTIL the Tx FIFO is full (at least with my current code). Then the buffer is never flushed. It's as if the CAN controller has up and quit its job.
2023-03-16 07:19 AM
I agree, that's how it should work.
You could always check FIFO levels with HAL_FDCAN_GetTxFifoFreeLevel().
You can also "manually" free a FIFO: first get the TX buffer it uses with HAL_FDCAN_GetLatestTxFifoQRequestBuffer(), then call HAL_FDCAN_AbortTxRequest().
I'm just playing around with that, here's some stuff triggered via UART commands:
/* TX message */
case UCMND_RCAN_TXMSG:
{
    FDCAN_HandleTypeDef *phCanTx = &hCan1;
    char szTxMsg[64] = { 0xA0 };
    uint32_t u32StrLen = 0;
    uint32_t u32DlcTx = 0;
    uint32_t u32TxId = 0;
    uint32_t u32TxBuf = 0;
    uint32_t u32TickStart;
    /* CAN 1 or 2 ? */
    if( u8Uart3RxBuf[UART_CMND_BYTE_REGVAL1] == '2' )
    {
        phCanTx = &hCan2;
        /* ???? */
        u32TxBuf = 1;
        uart_printf("FDCAN2 TX\n\r");
    }
    else uart_printf("FDCAN1 TX\n\r");
    /* %63s: never copy more than szTxMsg can hold (63 chars + terminator) */
    sscanf((char *)u8Uart3RxBuf, "%*s %*s %lX %63s", &u32TxId, &szTxMsg[0]);
    uart_printf("u32TxId = 0x %08lX\n\r", u32TxId);
    u32StrLen = (uint32_t)strlen(szTxMsg);
    uart_printf("szTxMsg[%lu/64] = %s\n\r", u32StrLen, szTxMsg);
    /* length code */
    if( u32StrLen > 48 ) u32DlcTx = FDCAN_DLC_BYTES_64;
    else if( u32StrLen > 32 ) u32DlcTx = FDCAN_DLC_BYTES_48;
    else if( u32StrLen > 24 ) u32DlcTx = FDCAN_DLC_BYTES_32;
    else if( u32StrLen > 20 ) u32DlcTx = FDCAN_DLC_BYTES_24;
    else if( u32StrLen > 16 ) u32DlcTx = FDCAN_DLC_BYTES_20;
    else if( u32StrLen > 12 ) u32DlcTx = FDCAN_DLC_BYTES_16;
    else if( u32StrLen > 8 ) u32DlcTx = FDCAN_DLC_BYTES_12;
    else u32DlcTx = u32StrLen << 16;
    /* TX FIFO queue */
    if( u8Uart3RxBuf[UART_CMND_BYTE_REGVAL2] == 'f' )
    {
        if( u8Uart3RxBuf[UART_CMND_BYTE_REGVAL3] == 'x' || u32TxId > CAN_FILTER_STD_BIT_MASK )
        {
            uart_printf("FIFO extended, DLC = %08lX\n\r", u32DlcTx);
            CanTxPrepDataExt(&sCanTxHdr, u32TxId, u32DlcTx, 0x1F);
        }
        else
        {
            uart_printf("FIFO standard, DLC = %08lX\n\r", u32DlcTx);
            CanTxPrepDataStd(&sCanTxHdr, u32TxId, u32DlcTx, 0x2F);
        }
        /* message to FIFO queue */
        if( HAL_FDCAN_GetTxFifoFreeLevel(phCanTx) == 0 ) u8Val = HAL_ERROR;
        else u8Val = CanTxMsg2FifoQ(phCanTx, &sCanTxHdr, (uint8_t *)&szTxMsg[0]);
        if( u8Val != HAL_OK ) uart_printf("# ERR: CanTxMsg2FifoQ()\n\r");
        else
        {
            uint32_t u32TxFifoRqst = HAL_FDCAN_GetLatestTxFifoQRequestBuffer(phCanTx);
            u32TickStart = HAL_GetTick();
            /* AutoRetransmission disabled DAR = 1 */
            if( phCanTx->Instance->CCCR & FDCAN_CCCR_DAR )
            {
                while( ((phCanTx->Instance->TXBTO & u32TxFifoRqst) == 0) &&
                       ((HAL_GetTick() - u32TickStart) < CAN_FD_TIMEOUT_TX_MS ) );
                uart_printf("TXBTO = %08lX\n\r", phCanTx->Instance->TXBTO);
                if( (phCanTx->Instance->TXBTO & u32TxFifoRqst) == 0 )
                {
                    uart_printf("# ERR: Timeout TXBTO, calling HAL_FDCAN_AbortTxRequest()\n\r");
                    /* abort */
                    HAL_FDCAN_AbortTxRequest(phCanTx, u32TxFifoRqst);
                }
                else
                    uart_printf("... buffer TX occurred\n\r");
            }
            /* AutoRetransmission enabled DAR = 0 */
            else
            {
                while( HAL_FDCAN_IsTxBufferMessagePending(phCanTx, u32TxFifoRqst) &&
                       ((HAL_GetTick() - u32TickStart) < CAN_FD_TIMEOUT_TX_MS ) );
                if( HAL_FDCAN_IsTxBufferMessagePending(phCanTx, u32TxFifoRqst) )
                {
                    uart_printf("# ERR: Timeout HAL_FDCAN_IsTxBufferMessagePending(), calling HAL_FDCAN_AbortTxRequest()\n\r");
                    /* abort */
                    HAL_FDCAN_AbortTxRequest(phCanTx, u32TxFifoRqst);
                }
            }
        }
        uart_printf("TXBRP = %08lX\n\r", phCanTx->Instance->TXBRP);
        uart_printf("TxFifo Free Level = %lu\n\r", HAL_FDCAN_GetTxFifoFreeLevel(phCanTx));
    }
    /* TX buffer */
    else
    {
        if( u8Uart3RxBuf[UART_CMND_BYTE_REGVAL2] == 'x' || u32TxId > CAN_FILTER_STD_BIT_MASK )
        {
            uart_printf("BUF extended, DLC = %08lX\n\r", u32DlcTx);
            CanTxPrepDataExt(&sCanTxHdr, u32TxId, u32DlcTx, 0x3B);
        }
        else
        {
            uart_printf("BUF standard, DLC = %08lX\n\r", u32DlcTx);
            CanTxPrepDataStd(&sCanTxHdr, u32TxId, u32DlcTx, 0x4B);
        }
        /* message to TX buffer */
        CanTxMsg(phCanTx, &sCanTxHdr, (uint8_t *)&szTxMsg[0], u32TxBuf);
        uint32_t u32TxBufIdx = (uint32_t)1 << u32TxBuf;
        u32TickStart = HAL_GetTick();
        /* AutoRetransmission disabled DAR = 1 */
        if( phCanTx->Instance->CCCR & FDCAN_CCCR_DAR )
        {
            while( ((phCanTx->Instance->TXBTO & u32TxBufIdx) == 0) &&
                   ((HAL_GetTick() - u32TickStart) < CAN_FD_TIMEOUT_TX_MS ) );
            uart_printf("TXBTO = %08lX\n\r", phCanTx->Instance->TXBTO);
            if( (phCanTx->Instance->TXBTO & u32TxBufIdx) == 0 )
            {
                uart_printf("# ERR: Timeout TXBTO, calling HAL_FDCAN_AbortTxRequest()\n\r");
                /* abort */
                HAL_FDCAN_AbortTxRequest(phCanTx, u32TxBufIdx);
            }
            else
                uart_printf("... buffer TX occurred\n\r");
        }
        /* AutoRetransmission enabled DAR = 0 */
        else
        {
            while( HAL_FDCAN_IsTxBufferMessagePending(phCanTx, u32TxBufIdx) &&
                   ((HAL_GetTick() - u32TickStart) < CAN_FD_TIMEOUT_TX_MS ) );
            uart_printf("TXBRP = %08lX\n\r", phCanTx->Instance->TXBRP);
            if( HAL_FDCAN_IsTxBufferMessagePending(phCanTx, u32TxBufIdx) )
            {
                uart_printf("# ERR: Timeout HAL_FDCAN_IsTxBufferMessagePending(), calling HAL_FDCAN_AbortTxRequest()\n\r");
                /* abort */
                HAL_FDCAN_AbortTxRequest(phCanTx, u32TxBufIdx);
            }
            else
                uart_printf("... no more TxBuffer Message Pending\n\r");
        }
    }
    break;
}
Maybe my error analysis will also help; it checks PSR and ECR:
/* Protocol Status */
case UCMND_RCAN_PSR:
{
    FDCAN_HandleTypeDef *phCan = &hCan1;
    if( u8Uart3RxBuf[UART_CMND_BYTE_REGVAL1] == '2' )
    {
        phCan = &hCan2;
        uart_printf("FDCAN2->PSR\n\r");
    }
    else
    {
        uart_printf("FDCAN1->PSR\n\r");
    }
    u32Val = phCan->Instance->PSR;
    uart_printf("PSR = %08lX\n\r", u32Val);
    uart_printf("\tTDCV = %lu mtq\n\r", ((u32Val & FDCAN_PSR_TDCV_Msk) >> FDCAN_PSR_TDCV_Pos));
    uart_printf("\tPXE = %c\n\r", ((u32Val & FDCAN_PSR_PXE) ? '1' : '0') );
    uart_printf("\tREDL = %c\n\r", ((u32Val & FDCAN_PSR_REDL) ? '1' : '0') );
    uart_printf("\tRBRS = %c\n\r", ((u32Val & FDCAN_PSR_RBRS) ? '1' : '0') );
    uart_printf("\tRESI = %c\n\r", ((u32Val & FDCAN_PSR_RESI) ? '1' : '0') );
    u32Val2 = ((u32Val & FDCAN_PSR_DLEC_Msk) >> FDCAN_PSR_DLEC_Pos);
    uart_printf("\tDLEC = %02lX -> error: ", u32Val2);
    if( u32Val2 == 0 ) uart_printf("none\n\r");
    else if( u32Val2 == 1 ) uart_printf("stuff\n\r");
    else if( u32Val2 == 2 ) uart_printf("format\n\r");
    else if( u32Val2 == 3 ) uart_printf("no ACK\n\r");
    else if( u32Val2 == 4 ) uart_printf("bit 1 (recessive)\n\r");
    else if( u32Val2 == 5 ) uart_printf("bit 0 (dominant)\n\r");
    else if( u32Val2 == 6 ) uart_printf("CRC RX\n\r");
    else if( u32Val2 == 7 ) uart_printf("no change\n\r");
    uart_printf("\tBO = %c\n\r", ((u32Val & FDCAN_PSR_BO) ? '1' : '0') );
    uart_printf("\tEW = %c\n\r", ((u32Val & FDCAN_PSR_EW) ? '1' : '0') );
    uart_printf("\tEP = %c\n\r", ((u32Val & FDCAN_PSR_EP) ? '1' : '0') );
    u32Val2 = ((u32Val & FDCAN_PSR_ACT_Msk) >> FDCAN_PSR_ACT_Pos);
    uart_printf("\tACT = %02lX ", u32Val2);
    if( u32Val2 == 0 ) uart_printf("syncing COM\n\r");
    else if( u32Val2 == 1 ) uart_printf("idle\n\r");
    else if( u32Val2 == 2 ) uart_printf("receiving\n\r");
    else if( u32Val2 == 3 ) uart_printf("transmitting\n\r");
    u32Val2 = ((u32Val & FDCAN_PSR_LEC_Msk) >> FDCAN_PSR_LEC_Pos);
    uart_printf("\tLEC = %02lX -> error: ", u32Val2);
    if( u32Val2 == 0 ) uart_printf("none\n\r");
    else if( u32Val2 == 1 ) uart_printf("stuff\n\r");
    else if( u32Val2 == 2 ) uart_printf("format\n\r");
    else if( u32Val2 == 3 ) uart_printf("no ACK\n\r");
    else if( u32Val2 == 4 ) uart_printf("bit 1 (recessive)\n\r");
    else if( u32Val2 == 5 ) uart_printf("bit 0 (dominant)\n\r");
    else if( u32Val2 == 6 ) uart_printf("CRC RX\n\r");
    else if( u32Val2 == 7 ) uart_printf("no change\n\r");
    u32Val = phCan->Instance->ECR;
    uart_printf("ECR = %08lX\n\r", u32Val);
    uart_printf("\tCEL = %lu CAN error log\n\r", ((u32Val & FDCAN_ECR_CEL_Msk) >> FDCAN_ECR_CEL_Pos));
    uart_printf("\tRP = %s\n\r", ((u32Val & FDCAN_ECR_RP) ? "1 RX err >= 128" : "0") );
    uart_printf("\tREC = %lu RX err\n\r", ((u32Val & FDCAN_ECR_REC_Msk) >> FDCAN_ECR_REC_Pos));
    uart_printf("\tTEC = %lu TX err\n\r", ((u32Val & FDCAN_ECR_TEC_Msk) >> FDCAN_ECR_TEC_Pos));
    break;
}
2023-03-16 07:53 AM
Right. I've been able to check for protocol-level errors after sending. I am seeing an error code of 0x4 (this is in my original post).
When that occurs, I abort the transaction and I restart the CAN interface.
Unfortunately, this does not solve the problem. The hardware is non-recoverable.
HAL_FDCAN_GetProtocolStatus(&HFDCAN, &psr);
if (psr.LastErrorCode > 0 && psr.LastErrorCode < 7)
{
    printf("Protocol error: %#lx\r\n", psr.LastErrorCode);
    HAL_FDCAN_AbortTxRequest(&HFDCAN, HFDCAN.LatestTxFifoQRequest);
    // stops and restarts CANFD
    start_can();
    return;
}
2023-03-16 08:31 AM
> The hardware is non-recoverable.
I can recover the FDCAN by calling HAL_FDCAN_DeInit(&hCan1), then MX_FDCAN1_Init();
My FDCANs were also hanging after failed transmissions, especially with auto retransmit and no timeout/limit.
2023-03-16 08:58 AM
Correct. That does recover the bus. For me, I have an outer loop that then continues to send on the exact same period. I need to implement a method to skew the start of that outer loop. Even a 1 ms delay solves the problem. So I am working on that, and it should be a solution (in addition to catching the bus errors IF they do occur).