2026-02-02 1:07 PM
Hi,
I have been working on a project that is based on the ST25R3911b Discovery Board including building prototype firmware from the discovery kit's firmware. I have made sure not to edit any of the RFAL or HAL library files.
Recently I brought this prototype in for a radiated immunity pre-scan. During this testing, I discovered a bug that I had not seen before.
The process gets stuck in an infinite loop in the rfalTransceiveRunBlockingTx() function (found in rfal_st25r3911b.c).
Specifically on line 1267, while( (rfalIsTransceiveInTx() && (ret == ERR_BUSY)) );
In the rfalTransceiveTx() function, the process enters the RFAL_TXRX_STATE_TX_WAIT_TXE case and checks for interrupts. However, there are no interrupts to process, so the process breaks without updating gRFAL.TxRx.state.
It appears to me that the interrupt from the NFC chip was either not sent or not received correctly. When the do-while loop checks "(ret == ERR_BUSY)", the result is always busy, because no interrupt is received and the state never changes.
It looks like a sanity timer was added in newer versions of RFAL, but I am mainly interested to know if this issue was seen before. It is unclear to me whether or not this could potentially be a hardware issue or if it's just firmware.
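To illustrate the kind of sanity timer meant here, below is a minimal sketch of a bounded blocking wait. It is not the RFAL implementation: every name in it (SketchStatus, sketchRunBlockingTx, the tick and poll callbacks, the simulated helpers) is hypothetical, and it only shows how a lost I_txe could end in a timeout instead of an endless loop.

```c
#include <stdint.h>

/* Hypothetical status codes for this sketch; RFAL uses its own ERR_* values. */
typedef enum { SKETCH_OK = 0, SKETCH_BUSY, SKETCH_TIMEOUT } SketchStatus;

/* Hypothetical callbacks standing in for the platform tick and the TX worker. */
typedef uint32_t     (*SketchTickFn)(void);  /* monotonically increasing ms  */
typedef SketchStatus (*SketchPollFn)(void);  /* SKETCH_BUSY until TX is done */

/* Poll the worker until it stops reporting busy or timeoutMs has elapsed. */
SketchStatus sketchRunBlockingTx(SketchPollFn poll, SketchTickFn tick,
                                 uint32_t timeoutMs)
{
    const uint32_t start = tick();
    SketchStatus ret;

    do {
        ret = poll();
        /* Unsigned subtraction stays correct across tick-counter wraparound. */
        if ((ret == SKETCH_BUSY) && ((uint32_t)(tick() - start) >= timeoutMs)) {
            return SKETCH_TIMEOUT;  /* the expected interrupt never arrived */
        }
    } while (ret == SKETCH_BUSY);

    return ret;
}

/* Simulated stand-ins so the sketch can be exercised off-target. */
static uint32_t fakeNowMs;
static uint32_t fakePollsLeft;

static uint32_t fakeTick(void) { return fakeNowMs++; }

static SketchStatus fakePoll(void)
{
    if (fakePollsLeft == 0u) {
        return SKETCH_OK;
    }
    fakePollsLeft--;
    return SKETCH_BUSY;
}

/* Run the bounded wait against a worker that stays busy for busyPolls calls. */
SketchStatus sketchSimulate(uint32_t busyPolls, uint32_t timeoutMs)
{
    fakeNowMs = 0u;
    fakePollsLeft = busyPolls;
    return sketchRunBlockingTx(fakePoll, fakeTick, timeoutMs);
}
```

With these stand-ins, a worker that finishes after a few polls returns SKETCH_OK, while one that stays busy past the limit returns SKETCH_TIMEOUT. On real hardware the caller would still have to decide how to recover after a timeout, which this sketch leaves open.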
Some extra notes:
The system works most of the time, so I am confident that an additional system is not causing the problem.
The problem is intermittent and I have no way to consistently reproduce it.
I can step out of the loop just fine and things work properly from there on.
Would anyone be able to shed some light on this?
Thank you,
Kaitlyn
2026-02-02 11:40 PM
Hi,
could you provide the RFAL version being used (see RFAL_VERSION in rfal_rf.h)?
Rgds
BT
2026-02-03 12:58 AM
Hi Kaitlyn,
BR, Ulysses
2026-02-05 12:57 PM
Hi,
I apologize for the delayed response. I am struggling to reproduce the issue after setting up a logic analyzer. I will add an update later when I am able to provide the data.
I did get a chance to test a unit that was not used for immunity testing and I see the same issue there, so I doubt that those tests caused any issue.
@Brian TIDAL The RFAL version is stated to be v2.0.2. The RFAL files were ported from the ST25R3911b firmware.
2026-02-05 11:51 PM
Hi,
RFAL v2.0.2 is rather old, but looking through the release notes of the latest v4.0.0 I didn't spot any fixed bugs that sound like what you are encountering. You could try updating to the latest RFAL to see if that fixes it, but my hopes are limited, and some APIs have certainly changed since then.
Best to analyze based on a trace.
Ulysses
2026-02-06 11:36 AM
Hi,
Attached is the logic analyzer data of the SPI and IRQ communication in Saleae Logic 2 format as well as binary and csv formats. Also included is the register dump at the time of a hang.
I did not see any glaring issues with the data. Some things of note:
I do not know the exact timestamp of the lock-up. The registers are dumped after detecting the infinite loop, so possibly it is the last burst of communication or maybe somewhere in the last few. I do not see anything that may tell me when it happened.
tx_en is set to 1 every time the operation control register is read.
There are 5 interrupts after the last I_txe. The first is a no-response timer expiry, and the rest are direct command termination interrupts.
One thing that jumps out to me is that the chip communication breaks down after the last 5 interrupts are read. The only operation being done after lock-up is calibrating the antenna and the system should continue to do so consistently. Maybe there is something here that I am missing, but I do not see anything strange.
My team has suggested trying to update the RFAL libraries, so I will work on that and see what happens as well as investigate the antenna calibration.
If anyone else has some insight or notices anything I missed, please let me know.
Thank you,
Kaitlyn
2026-02-09 12:07 AM
Hi Kaitlyn,
I think the problem is the Calibrate Antenna command, which your software seems to issue asynchronously. The sequence towards the end triggers the sending of an NFCV frame and then, without waiting for I_txe or I_rxs/I_rxe/I_nre, triggers the calibrate command. Such a sequence likely voids the transmission, and the software then waits forever for I_txe.
The calibrate function should only be called synchronously while there is no ongoing transmit/reception.
Typically it is sufficient to call it once at startup. Only if you are targeting a changing environment should you try calling it once in a while.
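One way to enforce this is to treat calibration as a request that is only serviced while the transceiver is idle. The following is a minimal sketch under that assumption; the RfSketch structure and both functions are hypothetical and not part of RFAL, and the actual calibrate call is represented only by a counter.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for this sketch; none of this is RFAL API. */
typedef struct {
    bool     transceiveBusy;   /* true from TX start until RX end, no-response or error */
    bool     calibratePending; /* set by whichever code wants a recalibration           */
    unsigned calibrations;     /* counts calibrations performed (stand-in for the call) */
} RfSketch;

/* Record that a calibration is wanted; never touches the chip directly. */
void rfSketchRequestCalibration(RfSketch *rf)
{
    rf->calibratePending = true;
}

/* Call from the main loop. Runs a pending calibration only while no
 * transmit/reception is ongoing. Returns true if a calibration ran. */
bool rfSketchService(RfSketch *rf)
{
    if (rf->calibratePending && !rf->transceiveBusy) {
        /* On target, the antenna calibration call would be issued here. */
        rf->calibrations++;
        rf->calibratePending = false;
        return true;
    }
    return false;
}
```

A request made in the middle of a transceive stays pending and is carried out on the first service call after the transceive has finished, so the calibrate command can never land between the start of transmission and I_txe.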
BR, Ulysses
2026-02-10 11:54 AM
Hi Ulysses,
You are right. I checked and saw a bug in the code that handled when the antenna should be calibrated. I have adjusted it and have done some long term testing and I have not seen the issue anymore.
Thank you very much for the help.
Best regards,
Kaitlyn