STM32F7xx/STM32F4xx: SPI3 RXNE is erroneously set after enabling the peripheral

nickbeth · ‎2024-05-06

We encountered an issue with the SPI3 peripheral on F4xx and F7xx series chips: the RXNE bit of the status register is set right after the SPI peripheral is re-configured and enabled again.
This should be impossible, as no transaction has been performed at that point, and reading the DR confirms it: random data is read. We have reason to believe this is a hardware bug. No other SPI peripheral exhibits this behavior; it only happens on SPI3.

We always use the SPI peripheral in 8-bit frame mode, with software slave management and with various clock divider settings (which do not affect the behavior).

We have always assumed that RXNE should be `0` at the beginning of a transaction, all the more so because we empty the RX queue at the end of each transaction. To enforce this assumption, we placed an `assert((SPIx->SR & SPI_SR_RXNE) == 0)` at the beginning of the transaction code.
We started to notice random reboots of our boards when running code compiled in debug mode. We later realized that the above assertion was being triggered by our sensors' code. That's when we decided to take the issue seriously and look into it.

We have found a reliable way to trigger the erroneous behavior on F7xx chips:

1. Configure the SPI3 peripheral to CPOL = 0, CPHA = 0 (mode 0)
2. Perform a transaction (write followed by read), e.g. retrieve a sample from a sensor
3. Re-configure the SPI3 peripheral to CPOL = 1, CPHA = 1 (mode 3)
4. After enabling the SPI3 peripheral, the SR register holds the value `0x203` (RXNE flag set, FRLVL set to 1/4)
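
To make the `0x203` value easier to read, here is a minimal sketch that splits an SR value into its fields. The bit positions are those of the FIFO-based SPI on the F7 (the F4 SPI has no FRLVL/FTLVL fields); `spi_sr_fields` and `spi_sr_decode` are hypothetical helper names, not part of CMSIS or the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* SPIx_SR bit layout on the FIFO-based SPI of the STM32F7. */
#define SR_RXNE       (1u << 0)  /* RX buffer not empty */
#define SR_TXE        (1u << 1)  /* TX buffer empty */
#define SR_BSY        (1u << 7)  /* transfer in progress */
#define SR_FRLVL_POS  9u         /* RX FIFO level, bits 10:9 */
#define SR_FTLVL_POS  11u        /* TX FIFO level, bits 12:11 */

/* Hypothetical helper type, used only for this illustration. */
typedef struct {
    bool rxne;
    bool txe;
    bool bsy;
    unsigned frlvl; /* 0 = empty, 1 = 1/4, 2 = 1/2, 3 = full */
    unsigned ftlvl; /* same encoding, for the TX FIFO */
} spi_sr_fields;

/* Split a raw SR value into the fields that matter for this issue. */
spi_sr_fields spi_sr_decode(uint32_t sr)
{
    spi_sr_fields f;
    f.rxne  = (sr & SR_RXNE) != 0;
    f.txe   = (sr & SR_TXE) != 0;
    f.bsy   = (sr & SR_BSY) != 0;
    f.frlvl = (unsigned)((sr >> SR_FRLVL_POS) & 3u);
    f.ftlvl = (unsigned)((sr >> SR_FTLVL_POS) & 3u);
    return f;
}
```

For `0x203` this gives RXNE and TXE set, BSY clear, FRLVL at 1/4 and FTLVL empty, i.e. the peripheral reports one stale byte in the RX FIFO before any clock has been generated.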

We've had the assertion trigger on F4xx chips as well, although rarely. The above steps do not seem to reproduce it on F4xx chips.
We have ensured that the steps we perform when reconfiguring the SPI peripheral follow the directions from the user manual and the programming manual.

We have been able to reproduce the issue with the official HAL libraries; the behavior is exactly the same. This effectively rules out a bad implementation of the SPI driver on our side.
Stepping through the HAL code with a debugger shows that the SR register holds the `0x203` value after enabling the SPI peripheral, just like in our driver. Because of how the HAL functions are implemented, the RXNE flag is never checked when transmitting, and the RX queue is flushed afterwards, so the error is never checked nor caught.

We have opened an internal issue, where you can find our code and our proposed workaround to this issue: https://git.skywarder.eu/avn/swd/skyward-boardcore/-/merge_requests/235

I have attached a minimal entrypoint that is able to reproduce the issue. The entrypoint was developed for a STM32F767ZI Nucleo development board. Note that no sensor is required to trigger the bug, only the board alone.

nickbeth · ‎2024-05-08

Would love to get someone from ST to take a look at this. @SofLit, not sure if you're the right person to tag; if you're not, could you please redirect this to whoever is? Thanks!

SofLit · ‎2024-05-08

I think @Petr Sladecek may help you on this.


Petr Sladecek · ‎2024-05-14

Hello,

please note that the older versions of the SPI used on both STM32F4 and STM32F7 don't feature automatic flushing of the data registers or FIFOs when the SPI is disabled. The user has to handle the flushing in SW before the SPI is disabled and reconfigured. Only an IP HW reset performs the flushing. I suggest studying AN5543.
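
As an illustration of such a software flush, here is a minimal sketch of a bounded drain loop. It is written against hypothetical register-access hooks (`spi_rx_port`) so that it can be exercised on a host; on target the hooks would simply wrap reads of `SPI3->SR` and byte-wide reads of `SPI3->DR`. The `fake_spi` model is a deliberate simplification for testing, not a description of the real FIFO.

```c
#include <stdint.h>

#define SR_RXNE        (1u << 0)  /* RX buffer not empty */
#define SR_FRLVL_MASK  (3u << 9)  /* RX FIFO level, bits 10:9 (F7 only) */

/* Hypothetical hooks: on target they would read SPI3->SR and SPI3->DR. */
typedef struct {
    uint32_t (*read_sr)(void *ctx);
    uint8_t  (*read_dr8)(void *ctx);
    void     *ctx;
} spi_rx_port;

/* Read DR until SR reports an empty RX path.
 * Returns the number of bytes discarded, or -1 if more than
 * max_reads reads would be needed (so the loop cannot hang). */
int spi_drain_rx(const spi_rx_port *p, int max_reads)
{
    int discarded = 0;
    while (p->read_sr(p->ctx) & (SR_RXNE | SR_FRLVL_MASK)) {
        if (discarded >= max_reads) {
            return -1;
        }
        (void)p->read_dr8(p->ctx);
        discarded++;
    }
    return discarded;
}

/* Host-side fake: an RX FIFO holding `level` stale bytes (0..3). */
typedef struct {
    unsigned level;
} fake_spi;

uint32_t fake_read_sr(void *ctx)
{
    fake_spi *f = ctx;
    uint32_t sr = 0x2u; /* TXE always set in this simple model */
    if (f->level != 0u) {
        sr |= SR_RXNE | ((f->level & 3u) << 9);
    }
    return sr;
}

uint8_t fake_read_dr8(void *ctx)
{
    fake_spi *f = ctx;
    if (f->level != 0u) {
        f->level--; /* each read pops one byte */
    }
    return 0xA5u;
}
```

Returning the number of discarded bytes makes it easy to log when stale data was actually present, which is the situation reported in this thread.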

Best regards,

Petr

nickbeth · ‎2024-05-16

Hello @Petr Sladecek,

We have double-checked that we flush the RX buffer before disabling the SPI peripheral, and we have also read AN5543 carefully, but we cannot spot any mistake on our side. I would like to stress again that this only happens on SPI3, not on any other SPI peripheral on the chip.

Furthermore, the same behavior is observed when using ST's HAL libraries, which do perform the software flush before the SPI is disabled at the end of the HAL_SPI_Transfer and HAL_SPI_Receive functions. That should rule out a bad implementation on our side and point towards a hardware bug.

Please run the minimal reproducible sample I attached to the first post under a debugger and read the value of the status register at every step.

Petr Sladecek · ‎2024-06-04

Hello,

it would be helpful to see your code handling the SPI configuration and the communication itself. It is not even clear whether SPI3 is configured in master mode (I suppose so). It is also not quite clear what "Perform a transaction (write followed by read)" means. Do you apply a sequence of unidirectional transmit-only and receive-only modes, or do you even use bidirectional mode on a single data line? Or is the SPI in full-duplex configuration all the time? What is the protocol with the sensor? What can you observe on the SPI bus? Is the number of clock pulses as expected?

Usually the easiest way is to use full-duplex mode, ignoring all input data while the master sends something out (a command to the sensor) and sending dummy data while it reads something in (data from the sensor). This ensures full control over the number of data frames transacted by the master overall and prevents any unexpected data reception, especially at a problematic termination of receive-only mode, which is the most critical case for these obsolete SPI design versions. For a trivial sequence like send command + read data, there is no need to use the complex, universal HAL functions. A more straightforward way in such a case is to use polling and direct access to SPI_DR (e.g. via LL drivers).
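
The approach described above could be sketched roughly as follows. This is only an illustration: `spi_port` and its hooks are hypothetical names standing in for direct register access, and the dummy byte `0xFF` is an assumption about what the sensor tolerates. On an F7 target the hooks would wrap byte-wide accesses to SPI_DR, with FRXTH set so that RXNE is raised for every received byte. The `fake_dev` model only exists so the logic can be checked on a host.

```c
#include <stddef.h>
#include <stdint.h>

#define SR_RXNE  (1u << 0)  /* RX buffer not empty */
#define SR_TXE   (1u << 1)  /* TX buffer empty */

/* Hypothetical hooks standing in for direct SPIx->SR / SPIx->DR access. */
typedef struct {
    uint32_t (*read_sr)(void *ctx);
    uint8_t  (*read_dr8)(void *ctx);
    void     (*write_dr8)(void *ctx, uint8_t v);
    void     *ctx;
} spi_port;

/* Poll SR until `flag` is set; 0 on success, -1 after `spins` polls. */
int spi_wait_flag(const spi_port *p, uint32_t flag, unsigned spins)
{
    while (spins--) {
        if (p->read_sr(p->ctx) & flag) {
            return 0;
        }
    }
    return -1;
}

/* One full-duplex frame: each byte clocked out yields exactly one byte in. */
int spi_xfer_byte(const spi_port *p, uint8_t out, uint8_t *in)
{
    if (spi_wait_flag(p, SR_TXE, 10000u) != 0) {
        return -1;
    }
    p->write_dr8(p->ctx, out);
    if (spi_wait_flag(p, SR_RXNE, 10000u) != 0) {
        return -1;
    }
    uint8_t got = p->read_dr8(p->ctx); /* always read, even if unused */
    if (in != NULL) {
        *in = got;
    }
    return 0;
}

/* Send a command (input ignored), then clock in n bytes with dummy output. */
int spi_cmd_read(const spi_port *p, uint8_t cmd, uint8_t *buf, size_t n)
{
    if (spi_xfer_byte(p, cmd, NULL) != 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (spi_xfer_byte(p, 0xFFu, &buf[i]) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Host-side fake sensor: answers every frame with a running counter. */
typedef struct {
    uint8_t  next;    /* value returned for the next frame */
    uint8_t  rx;      /* byte waiting in the RX buffer */
    int      rx_full; /* 1 while rx has not been read */
    unsigned frames;  /* number of frames clocked so far */
} fake_dev;

uint32_t fake_sr(void *c)
{
    fake_dev *d = c;
    return SR_TXE | (d->rx_full ? SR_RXNE : 0u);
}

uint8_t fake_rd(void *c)
{
    fake_dev *d = c;
    d->rx_full = 0;
    return d->rx;
}

void fake_wr(void *c, uint8_t v)
{
    fake_dev *d = c;
    (void)v;
    d->rx = d->next++;
    d->rx_full = 1;
    d->frames++;
}
```

Because every written byte is paired with exactly one read, the master knows how many frames were clocked and the RX path is empty when the sequence ends, which is the property the receive-only mode makes hard to guarantee.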