ST25R3911B RFAL Library - i_err bit remains high - infinite loop

fredj0207 · ‎2023-12-15

Hello,

I'm using RFAL v2.6.0 with the ST25R3911B and I have an issue with the IRQ handling.

With an ISO14443 Type A tag, the IRQ line remains high, so the while loop never exits.

In this condition, the i_err bit of the main interrupt register is '1', while the two other interrupt registers (Timer and NFC interrupt register, and Error and wake-up interrupt register) have all of their bits at '0'.

Reading these registers in a loop does not clear the i_err bit.

The last register contents before the lock-up are:

- Main interrupt register: 0x33 -> i_rxs, i_rxe, i_tim, i_err

- Timer and NFC interrupt register: 0x20

- Error and wake-up interrupt register: 0x00 (should be non-zero, since i_err is set)
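For readers following the register values, here is a minimal C sketch that turns one interrupt register byte into bit names. The main register names follow the 0x33 decode above; the names for the other two registers are assumed from the ST25R3911B datasheet register map and should be checked against it. `decode_irq_byte` and the three tables are hypothetical helpers, not RFAL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit names, MSB (bit 7) first. The main register names agree with the
 * 0x33 decode in this post; the other two tables are assumptions taken
 * from the datasheet and should be verified before relying on them. */
const char *const MAIN_BITS[8] = {
    "i_osc", "i_wl", "i_rxs", "i_rxe", "i_txe", "i_col", "i_tim", "i_err"
};
const char *const TIMER_NFC_BITS[8] = {
    "i_dct", "i_nre", "i_gpe", "i_eon", "i_eof", "i_cac", "i_cat", "i_nfct"
};
const char *const ERROR_WUP_BITS[8] = {
    "i_crc", "i_par", "i_err2", "i_err1", "i_wt", "i_wam", "i_wph", "i_wcap"
};

/* Write the names of the set bits of `value` into `out`, separated by
 * single spaces, and return how many bits were set. */
int decode_irq_byte(uint8_t value, const char *const names[8],
                    char *out, size_t out_len)
{
    int count = 0;

    assert(names != NULL && out != NULL && out_len > 0);
    out[0] = '\0';

    for (int bit = 7; bit >= 0; bit--) {
        if ((value >> bit) & 1u) {
            if (count > 0) {
                strncat(out, " ", out_len - strlen(out) - 1);
            }
            /* names[] is MSB first, so bit 7 maps to index 0. */
            strncat(out, names[7 - bit], out_len - strlen(out) - 1);
            count++;
        }
    }
    return count;
}
```

Calling `decode_irq_byte(0x33, MAIN_BITS, buf, sizeof buf)` fills `buf` with `i_rxs i_rxe i_tim i_err`, which matches the main register decode listed above.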

fredj0207 · ‎2023-12-18

Hi Ulysses,

Yes, you are right.

However, when I first tried this condition, I also saw other, rarer cases where iregs[2] was non-zero (as expected, with a CRC error for instance) and reading the registers still failed to clear the main register. Only iregs[2] was cleared by the read operation.

I will send you traces if I can get them one day :)

Regards,

Frederic

fredj0207 · ‎2024-01-02

Hi Ulysses,

First, may the New Year bring you everything you wish.

I have another issue with the code above when using the rfalT2TPollerWrite function: it always results in a CRC error.

I traced the IRQ registers while sending the command and got the following sequences:

With my final code (no condition on iregs[2]):

I (68421) DEBUG: (HEAP:171052) (1704198943:194265000) [fart_scan.c:166] Found tag: 0414
I (68441) DEBUG: (HEAP:171052) (1704198943:208944000) [st25r3911_interrupt.c:119] 08, 00, 00
I (68441) DEBUG: (HEAP:171052) (1704198943:213070000) [st25r3911_interrupt.c:119] 31, 00, 80
I (68451) TRACE: (HEAP:171052) (1704198943:220982000) [rfid.c:567] [MSL-RFID] Rfal error (T2 write), sts=21

With your proposal (condition iregs[2] == 0), I get stuck in the while loop:

I (57131) DEBUG: (HEAP:171048) (1704199559:580346000) [st25r3911_interrupt.c:119] 32, 20, 00
I (57141) DEBUG: (HEAP:171000) (1704199559:589881000) [fart_scan.c:166] Found tag: 0414
I (57151) DEBUG: (HEAP:171000) (1704199559:604509000) [st25r3911_interrupt.c:119] 08, 00, 00
I (57161) DEBUG: (HEAP:171000) (1704199559:608634000) [st25r3911_interrupt.c:119] 31, 00, 80
I (57161) DEBUG: (HEAP:171000) (1704199559:616370000) [st25r3911_interrupt.c:119] 03, 20, 80
I (57171) DEBUG: (HEAP:171000) (1704199559:625484000) [st25r3911_interrupt.c:119] 01, 00, 80
I (57181) DEBUG: (HEAP:171000) (1704199559:634599000) [st25r3911_interrupt.c:119] 01, 00, 80
I (57191) DEBUG: (HEAP:171000) (1704199559:643715000) [st25r3911_interrupt.c:119] 01, 00, 80
...

With the while condition removed (and no extra code to clear the register), it works and I get the following:

I (60701) DEBUG: (HEAP:171180) (1704197378:880586000) [fart_scan.c:166] Found tag: 0414
I (60711) DEBUG: (HEAP:171180) (1704197378:895260000) [st25r3911_interrupt.c:119] 08, 00, 00
I (60721) DEBUG: (HEAP:171180) (1704197378:899392000) [st25r3911_interrupt.c:119] 31, 00, 80
I (60721) DEBUG: (HEAP:171180) (1704197378:907161000) [st25r3911_interrupt.c:119] 03, 20, 80

As far as I understand, the CRC error may be expected for Type 2 operations, since ACK/NACK are only 4 bits long. The fix we made is therefore not correct, because we lose the FIFO content in this case. Any ideas?

Brian TIDAL · ‎2024-01-02

Hi Frederic,

The 4-bit ACK/NACK in Type 2 causes an I_crc and/or an I_par interrupt, which is internally converted into an RFAL_ERR_INCOMPLETE_BYTE return code. The rfalT2TPollerWrite function then checks that this incomplete byte is an ACK. Thus, an I_crc is expected behavior when a 4-bit ACK/NACK is received.
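The check described above can be sketched roughly as follows. This is an illustration only, not the RFAL source: `xfer_status_t`, `t2t_write_acked` and the `T2T_*` constants are hypothetical names, and the sketch assumes the Type 2 ACK is the 4-bit value 1010b (0xA) carried in the low nibble of the received byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define T2T_ACK       0x0Au  /* assumed 4-bit ACK value (1010b)        */
#define T2T_ACK_MASK  0x0Fu  /* only the low nibble carries ACK/NACK   */
#define T2T_ACK_BITS  4u     /* ACK/NACK frames are 4 bits long        */

/* Hypothetical stand-in for the transceive return codes. */
typedef enum {
    XFER_OK,
    XFER_INCOMPLETE_BYTE,  /* fewer than 8 bits received (CRC/parity IRQ) */
    XFER_CRC_ERROR
} xfer_status_t;

/* Return true only when the response to a write is a 4-bit ACK.
 * An "incomplete byte" status is the expected outcome here, because a
 * 4-bit frame carries no CRC and cannot pass the CRC check. */
bool t2t_write_acked(xfer_status_t status, const uint8_t *rx, size_t rx_bits)
{
    assert(rx != NULL);

    if (status != XFER_INCOMPLETE_BYTE) {
        return false;
    }
    if (rx_bits != T2T_ACK_BITS) {
        return false;
    }
    return (rx[0] & T2T_ACK_MASK) == T2T_ACK;
}
```

The point of the sketch is that the received nibble must still be available after the CRC interrupt: if the interrupt handling discards the FIFO content, the ACK comparison has nothing to look at.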

It would really be helpful to have an SPI trace (CLK/MISO/MOSI/CS+ ST25R3911B_IRQ) to better understand your issue.

Some additional questions:

- Have you tried on different boards?
- What are the Vdd and Vdd_io values on your custom board?
- Is your custom board powered by a battery or by the mains?

Rgds

BT


fredj0207 · ‎2024-01-03

Hello Brian,

As I said before, unfortunately I do not have an advanced scope to capture the SPI trace, and there are no test points on the custom board. I only have a Total Phase Beagle. If I have time, I will try to get data.

- We have been using the ST25R3911B or ST25R3916 for 3 years now on different custom board versions.

We have always had a problem with the while loop in the RFAL (for NFC Type 2 tags). The only way to get the library to work was to remove the while loop.

In the latest firmware version, I decided to remove the IRQ handling, which can be problematic on Espressif, and use polling instead. I still have the same issue with NFC Type 2 tags.

- Vdd is 5V.

- Vdd_io is 3.3V.

- The board is powered by the mains (12 V, then a two-stage regulator, plus a CMCC filter).

Regards,

Fred

Grégoire Poulain · ‎2024-01-05

Dear Fred,

This behavior is not known, and the only efficient way to proceed is to retrieve an SPI trace (incl. the IRQ line) of the occurrence, so that we can perform a technical analysis.

Meanwhile let me share some additional details:

Loop inside the RFAL:

The loop is intended to cope with limitations of certain MCUs. Depending on the configuration (level-triggered vs. edge-triggered), if one IRQ occurs while the previous one is being read, the ISR may never be called again. This matter is covered in detail in the RFAL User Manual (UM2890), Section 6.9.

Can your system have the ST25R3911B IRQ configured as level-triggered?
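To illustrate the idea behind such a loop, here is a minimal sketch, not the RFAL source. It keeps reading the interrupt registers while the IRQ pin is high, so an interrupt raised during a read is not lost on an edge-triggered input. The `irq_port_t` hooks, `drain_irqs`, the simulated `fake_chip_t`, and the `max_reads` cap are all assumptions added for illustration; the cap only exists so the sketch terminates in the stuck-pin case reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hooks: a real port would sample the GPIO and
 * perform an SPI burst read of the three interrupt registers. */
typedef struct {
    bool     (*irq_pin_is_high)(void *ctx);
    uint32_t (*read_irq_registers)(void *ctx); /* main | timer<<8 | error<<16 */
    void      *ctx;
} irq_port_t;

/* Read interrupt registers until the IRQ pin goes low, OR-ing the results.
 * *stuck is set when the pin is still high after max_reads reads. */
uint32_t drain_irqs(const irq_port_t *port, unsigned max_reads, bool *stuck)
{
    uint32_t status = 0;
    unsigned reads = 0;

    assert(port != NULL && stuck != NULL);

    while (port->irq_pin_is_high(port->ctx)) {
        if (reads == max_reads) {
            *stuck = true;   /* pin never released: report instead of spinning */
            return status;
        }
        status |= port->read_irq_registers(port->ctx);
        reads++;
    }
    *stuck = false;
    return status;
}

/* Simulated chip: each read returns (and consumes) the next pending value. */
typedef struct {
    uint32_t pending[4];
    size_t   count;
    size_t   next;
    bool     pin_stuck_high;  /* models the lock-up described in the thread */
} fake_chip_t;

bool fake_pin(void *ctx)
{
    const fake_chip_t *chip = (const fake_chip_t *)ctx;
    return chip->pin_stuck_high || (chip->next < chip->count);
}

uint32_t fake_read(void *ctx)
{
    fake_chip_t *chip = (fake_chip_t *)ctx;
    return (chip->next < chip->count) ? chip->pending[chip->next++] : 0u;
}
```

With a level-triggered input, the ISR would simply be entered again while the line stays high, so a single read per ISR call would be enough; the loop matters for edge-triggered inputs.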

HW vs SW Chip Select:

Usage of HW chip select can be problematic; this is also covered in the RFAL User Manual (UM2890), Section 6.8.
Can you confirm that the right configuration is deployed?
"set ST25R_COM_SINGLETXRX configuration and clear the APIs: platformSpiSelect() and platformSpiDeselect()"

Also, when chip select is handled by the HW, it may violate the required SPI timings (tNCSL / tNCSH).
Only with an SPI trace will we be able to verify the timings.

Hope it helps.
We look forward to an SPI trace of the occurrence.

Kind regards
GP

fredj0207 · ‎2024-01-05

Hi Gregory,

Thanks for the information.

HW vs SW Chip Select:

Yes, we are using the right configuration, and CS is hardware-managed.

As I discussed with Brian by private message, I can imagine we have some SPI timing violation. Why not?

But that cannot explain why we are unable to recover by sending another read.

How else can you explain that the read operation works 95% of the time, yet once the issue occurs, it is impossible to clear the status no matter how many times the register is read? At the same time, sending other commands through SPI (such as the clear command) works fine. This means SPI is not stuck, so one of the read operations should succeed in clearing the register.

My feeling (maybe I'm wrong) is that we are not looking in the right place. I worked at ST, Atmel, Inside and Starchip for a long time, and this looks more like a rare timing race issue in HW.

In any case, I will send SPI traces as soon as possible.

Regards,

Frederic

Ulysses HERNIOSUS · ‎2024-01-08

Hi,

I read in this thread about the Total Phase Beagle. It looks like it only provides the data packets plus timings, but no waveforms.

We have often seen issues that are only visible in the waveforms, so a logic analyzer is typically superior for this use case. If possible, try to get one as well.

BR, Ulysses