ST25R3918: Infinite loop in CE mode (FeliCa) during communication when RF field is lost

Tyama.1
Associate II

Hello,

I am using the ST25R3918 in NFC-F (FeliCa) Card Emulation (CE) mode, controlled by an MCU via RFAL (based on STEVAL-25R3916B_V2.1.0).

 

[Issue Description]

The system works correctly under normal conditions: the ST25R3918 receives commands like "Write Without Encryption" or "Read Without Encryption" from a reader and sends back the appropriate responses.

However, if the reader is moved away from the ST25R3918 (resulting in RF field loss) during the communication process, the MCU firmware falls into an infinite loop.

 

[Analysis Details]

The loop occurs within the do-while block of demoTransceiveBlocking() in card_emulation.c. The call stack is as follows: rfalNfcWorker() -> rfalWorker() -> rfalRunTransceiveWorker() -> rfalTransceiveRx()

Inside rfalTransceiveRx(), the state gRFAL.TxRx.state remains stuck at RFAL_TXRX_STATE_RX_WAIT_RXS. The function st25r3916GetInterrupt() is called with the following mask: (ST25R3916_IRQ_MASK_RXS | ST25R3916_IRQ_MASK_NRE | ST25R3916_IRQ_MASK_EOF)

Since no interrupts are triggered, it returns ST25R3916_IRQ_MASK_NONE, so rfalTransceiveRx() breaks out and re-enters the same state on the next cycle without making any progress.

I expected the ST25R3916_IRQ_MASK_NRE (No Response Error) to trigger a timeout, but it never occurs when the field is lost.

[Questions]

  1. Is it expected behavior that the NRE interrupt is not triggered when the RF field is lost in CE mode?

  2. What is the recommended way to handle this "field loss during transceive" scenario in RFAL to avoid an infinite loop? Should I explicitly monitor the External Field Event (e.g., ST25R3916_IRQ_MASK_EXT_COL or similar) to break the loop?

Any advice or insights would be greatly appreciated.

Thank you.

 

 

 

Grégoire Poulain
ST Employee

Hi,

Thank you for your detailed analysis.

In general, could you describe your test setup and procedure in more detail?

  • Setup
    You mention an ST25R3918 STEVAL, GUI controlled, and some APIs used in the embedded demos.
    Could you please clarify which demo SW you use and whether any adaptations have been made to the driver/demo code?

  • Removal procedure
    Is the behavior seen every time you remove the Reader's RF carrier during card emulation, or does it require a specific procedure/sequence?

To your questions:

  1. The NRE (No Response Timer Expired) IRQ occurs when there is a timeout.
    In CE/Listen mode there are no timeouts: a card/listener shall wait endlessly for an upcoming Reader/Poller command (as long as the RF carrier is present).
    So yes, this is expected behavior.

  2. Normally the EOF interrupt should occur and a LINK_LOSS error should be reported.
    It is unclear why this is not triggering on your system. Is it possible to share an SPI trace of the occurrence for analysis?

    Also, it is possible to externally monitor the presence of the external field via rfalIsExtFieldOn() and force the deactivation. Ideally this should not be required, and the LINK_LOSS error should be reported.
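
For illustration, the external monitoring could look roughly like the sketch below. It only models the decision taken after each worker iteration of a polling loop such as the one in demoTransceiveBlocking(). The names LoopAction and nextLoopAction are hypothetical and not part of RFAL; on a real target the extFieldOn argument would come from rfalIsExtFieldOn().

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch (not RFAL ReturnCode values). */
typedef enum {
    LOOP_CONTINUE,   /* transceive still busy and field present: keep polling */
    LOOP_FINISHED,   /* transceive completed: leave the loop normally         */
    LOOP_LINK_LOSS   /* field gone while still busy: abort and deactivate     */
} LoopAction;

/* Decide what the blocking loop should do after one worker iteration.
 *   transceiveBusy: true while the transceive status is still "busy"
 *   extFieldOn:     result of a field check such as rfalIsExtFieldOn()       */
static LoopAction nextLoopAction(bool transceiveBusy, bool extFieldOn)
{
    if (!transceiveBusy) {
        return LOOP_FINISHED;
    }
    return extFieldOn ? LOOP_CONTINUE : LOOP_LINK_LOSS;
}
```

In the do-while loop, the application would evaluate this after each worker call and, on LOOP_LINK_LOSS, leave the loop and restart listening instead of waiting for an interrupt that can no longer arrive.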

Kind regards
GP

Hi,

Thank you for your prompt and clear explanation regarding the NRE behavior in CE mode.

To answer your questions about my setup and the issue:

1. Environment & Software

Hardware: Custom MCU board connected to ST25R3918 via SPI.

Source Code: Based on STEVAL-25R3916B_V2.1.0. Specifically, I am using the RFAL stack and card_emulation.c from the ST25R3916Demo project.

Modifications:

rfal_platform.h: Adapted to my MCU (STM32L4 series equivalent HAL).

st25r3916.c: Commented out st25r3916CheckChipID( NULL ) in the initialize function.

card_emulation.c: Only modified the IDm value.

Other parts of the RFAL stack remain unchanged.

2. Procedure to Reproduce
The communication sequence is as follows (repeated every 100ms):

Reader sends "Write Without Encryption".

ST25R3918 sends response.

Reader sends "Read Without Encryption".

ST25R3918 sends response.

The infinite loop occurs randomly (approx. 1 out of 100 trials) when the reader is removed during this cycle. It is not 100% reproducible but happens frequently during stress tests.

3. SPI Trace Analysis
I managed to capture a brief SPI trace right when the issue occurs.
(S: MOSI from MCU to ST25R3918 / R: MISO from ST25R3918 to MCU)

The last recorded transactions were:

  1. S: 0x5A (Read Interrupt Status)

  2. R: 0x00, 0x08, 0x00, 0x00 (Result: EOF interrupt detected?)

  3. S: 0xC4 (Close RX?)

  4. S: 0x5A (Read Interrupt Status again)

  5. R: 0x08, 0x00, 0x00, 0x00

After step 5, no further SPI activity occurs and no IRQ is triggered, even though the MCU remains in the do-while loop in demoTransceiveBlocking().

It seems the EOF interrupt (0x08) was actually reported by the chip, but for some reason, the RFAL state machine (RFAL_TXRX_STATE_RX_WAIT_RXS) does not seem to consume it or transition to an error state.
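
As a rough aid for reading such a trace, the four status bytes can be packed into one 32-bit word and compared against the IRQ masks. This is only a sketch: the two mask values are inferred from the 0x0808 value (TXE and EOF both set) reported later in this thread, the byte order (first byte read in the lowest 8 bits) is an assumption, and irqWordFromBytes is a hypothetical helper, not a driver function. All of it should be checked against st25r3916_irq.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask values inferred from the 0x0808 reading in this thread (assumption). */
#define IRQ_MASK_TXE  ((uint32_t)0x00000008U)  /* end of transmission */
#define IRQ_MASK_EOF  ((uint32_t)0x00000800U)  /* external field off  */

/* Pack the four IRQ status bytes, in the order they were read over SPI,
 * into one 32-bit word with the first byte in the lowest 8 bits (assumed). */
static uint32_t irqWordFromBytes(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Under these assumptions, the first read (0x00, 0x08, 0x00, 0x00) decodes to EOF only, and the second read (0x08, 0x00, 0x00, 0x00) decodes to TXE only.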

Could you please advise why the RFAL might get stuck even when the EOF bit is present in the interrupt status?

Apologies, I forgot to include the link to the software package I am using. It is STSW-ST25R018: https://www.st.com/en/embedded-software/stsw-st25r018.html

Grégoire Poulain
ST Employee

Hi,

Thank you for the clarification of the setup and sequence.

From the SPI data you collected, it seems that the EOF IRQ occurs immediately before the transmission (C4) of the card response.
To avoid triggering the transmission of the card response in the absence of the Reader RF field, the operation just before it is precisely the check for the RF carrier, rfalIsExtFieldOn().

If the transmit command is issued and the RF carrier is lost during the SPI operation, the response may not actually be transmitted. Nevertheless, the EOF shall be processed in the next state (RFAL_TXRX_STATE_TX_WAIT_TXE) and the LINK_LOSS error reported.

Questions:

  • How often is the RFAL Worker executed?
  • What is the SPI speed used?
  • For further analysis of the occurrence a complete SPI trace (with all signals, including IRQ and timings) would be required

Kind regards
GP 

Thank you for your insightful feedback. I have conducted further investigations based on your comments.

1. Answers to your questions:

SPI Speed: 800 kHz.

RFAL Worker frequency: It is called within a main loop as frequently as possible (polling).

SPI Trace: Unfortunately, capturing a complete trace with all signals (timings/IRQ) is difficult in my current environment, so I have focused on code-level debugging.


2. New Findings on the Potential Root Cause:
Based on my analysis, I suspect the following scenario might be causing the infinite loop. When the issue occurs, I observed a specific interrupt state in rfal_rfst25r3916.c within rfalTransceiveTx() under the RFAL_TXRX_STATE_TX_WAIT_TXE case:

When calling:
irqs = st25r3916GetInterrupt( (ST25R3916_IRQ_MASK_FWL | ST25R3916_IRQ_MASK_TXE | ST25R3916_IRQ_MASK_EOF) );

The returned value of irqs is 0x0808. This indicates that both ST25R3916_IRQ_MASK_TXE and ST25R3916_IRQ_MASK_EOF are set simultaneously.


In the current RFAL implementation:

if( (irqs & ST25R3916_IRQ_MASK_TXE) != 0U )
{
    gRFAL.TxRx.state = RFAL_TXRX_STATE_TX_DONE;
}

Since the TXE condition is evaluated first and transitions the state to TX_DONE, the EOF (field loss) flag is effectively ignored. Consequently, the state machine proceeds to RFAL_TXRX_STATE_RX_WAIT_RXS in rfalTransceiveRx(). However, since the reader has already been removed, no further interrupts occur, leading to the infinite loop in demoTransceiveBlocking().

3. Proposed Solution:
To prevent this, I am considering the following modifications in RFAL_TXRX_STATE_TX_WAIT_TXE:

Check for ST25R3916_IRQ_MASK_EOF before TXE to catch the field loss immediately and return an error.

Alternatively, implementing a timeout mechanism within the do-while loop in the application layer.
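
A minimal sketch of the first option, written as a standalone function so the ordering can be checked in isolation. The TxState enum and txWaitTxeNextState are hypothetical stand-ins, not the RFAL state names or code, and the mask values are the ones implied by the 0x0808 reading above. It assumes a system where a field-off event during transmission always means the link is lost.

```c
#include <stdint.h>

/* Mask values as implied by the 0x0808 reading (TXE | EOF); assumption. */
#define IRQ_MASK_TXE  ((uint32_t)0x00000008U)
#define IRQ_MASK_EOF  ((uint32_t)0x00000800U)

/* Hypothetical stand-ins for the RFAL states involved. */
typedef enum {
    TX_STATE_WAIT_TXE,        /* no relevant IRQ yet: stay in this state   */
    TX_STATE_DONE,            /* transmission ended with the field present */
    TX_STATE_FAIL_LINK_LOSS   /* field lost: report an error to the caller */
} TxState;

/* Next state for the "wait for TXE" step, with EOF checked before TXE so a
 * field loss is not hidden when both flags arrive in the same read. */
static TxState txWaitTxeNextState(uint32_t irqs)
{
    if ((irqs & IRQ_MASK_EOF) != 0U) {
        return TX_STATE_FAIL_LINK_LOSS;
    }
    if ((irqs & IRQ_MASK_TXE) != 0U) {
        return TX_STATE_DONE;
    }
    return TX_STATE_WAIT_TXE;
}
```

With irqs equal to 0x0808 this ordering yields the link-loss state, whereas checking TXE first would yield the done state and continue into reception.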

Grégoire Poulain
ST Employee

Hi,

Thank you, indeed your observation and analysis seem consistent.
Below are a few aspects to consider:

  • Active P2P
    In the driver, the TXE information is prioritized over the Field information because in AP2P (ACM mode) each device's field alternates constantly, producing EON/EOF.
    If AP2P is not required on your system, such an adaptation may be OK.

  • Timeout mechanism
    The RFAL has a fail-safe / timeout mechanism which avoids such occurrences; see the sanity timer gRFAL.tmr.txRx.
    This mechanism is disabled in CE/Listen mode because, as previously described, a listener shall wait endlessly for a Reader command.
    Once again, if a maximum time is known / can be defined for your application/system, then you may adapt the logic within rfalPrepareTransceive() to start the sanity timer even when no FWT/timeout is defined.

  • SPI speed
    800 kHz is quite slow, particularly for CE/Listen mode operation, which is impaired by this factor.

    T3T is not as stringent as T2T or T4T, but if feasible it is strongly recommended to significantly increase the SPI speed.
    This will improve CE response/transition times and narrow the chances of such race conditions in event processing.
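
Where changing rfalPrepareTransceive() is not desirable, a similar bound could be applied in the application's polling loop instead. The sketch below shows only the elapsed-time check. sanityTimeoutExpired is a hypothetical helper, the millisecond tick source is assumed to be a free-running 32-bit counter (for example HAL_GetTick() on an STM32 HAL), and the timeout value itself has to come from the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least timeoutMs milliseconds have passed since startMs.
 * Unsigned subtraction keeps the result correct across tick wrap-around. */
static bool sanityTimeoutExpired(uint32_t nowMs, uint32_t startMs, uint32_t timeoutMs)
{
    return (uint32_t)(nowMs - startMs) >= timeoutMs;
}
```

The loop would record the tick when the transceive starts and abort with an application-level timeout error once this function returns true.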

Hope it helps

Kind regards
GP

Hi,

Thank you very much for the clear and detailed explanation.

Your points regarding the prioritization of TXE due to AP2P (ACM mode) and the possibility of enabling the sanity timer (gRFAL.tmr.txRx) even in CE mode are extremely helpful.

I will proceed with the following actions based on your recommendations:

Increase the SPI speed to improve response times and reduce the likelihood of race conditions.

Implement a timeout/sanity timer within the RFAL or application layer, as our system does not require indefinite waiting in CE mode.

Adjust the IRQ handling logic in rfalTransceiveTx to better suit our specific requirements, since AP2P is not used in our application.

I truly appreciate your support and the deep dive into the RFAL behavior.

Best regards,