2025-08-10 5:34 AM
Hi all,
I recently ran into a nasty bug in the HAL implementation.
I am using an STM32F103 with the CubeMX 6.12.1 HAL (although the SPI C source is dated 2016).
SPI is configured as a full-duplex slave (using HW NSS, but that doesn't matter).
Sometimes I have to abort a transaction, and after the abort I got a HardFault.
The sequence to reproduce the bug is:
1. Start SPI Rx over Interrupt
2. Abort Rx
3. Start SPI Tx over interrupt
4. Wait until all bytes are received
After some digging, I found that the Abort handler does not clear the SPI_CR2_TXEIE and SPI_CR2_RXNEIE flags in the SPIx_CR2 register.
As a result, when a Tx is started after an aborted Rx, hspi->RxISR is reset to NULL, but the opposite direction's interrupt-enable flag stays set.
Then, on completion of any byte, the condition at stm32f1xx_hal_spi.c:2447 evaluates true, but hspi->RxISR is already NULL, and calling it raises a HardFault.
My proposal is:
1. Clear the xxIE flags during Abort (to fix this exact bug)
2. Use HAL_SPI_ErrorCallback instead of NULL wherever hspi->RxISR or hspi->TxISR is put into a state in which its interrupt is unexpected (to make any similar bug easier to find if one appears)
A patch that solves the issue is attached to this message.
2025-08-10 6:23 AM
Are you on the latest library? I feel like the abort logic is implemented here:
RXNEIE gets cleared here:
2025-08-10 8:13 AM - edited 2025-08-10 8:44 AM
Aha, I get your point, but the RxISR and TxISR functions get called ONLY when some event occurs after Abort has been called. If Abort is called because the transaction has REALLY been stopped by the master, these functions will never be called, and the issue remains.
Bottom line: I have the lines you mentioned in my version, but the issue still exists.
What I mean is that line 3957 is only reached through a sequence like this:
1. A transfer is active
2. Abort is called (the handler is set to RxAbort or TxAbort)
3. A byte is transferred after the Abort
4. The SPI interrupt fires and the general handler calls Tx/RxAbort through the pointer set in step 2
5. RXNEIE is cleared properly.
.... But who guarantees that the master performs step 3 after step 2? For example, we may have a bad connection, or the master may be unexpectedly reset by its WDT, and I would rather not have to guess every reason why some device external to my code might misbehave. Am I wrong? :)
For reference, this is my case:
1. On the falling edge of NSS I start receiving
2. On the rising edge of NSS I abort any transaction that is still in progress
Therefore, if my master resets or I start debugging it, the following sequence (sometimes) occurs:
1. Tx transfer is ongoing
2. The master is reset by the debugger
3. After initialization it drives the NSS line HIGH.
4. I abort any current transaction
5. No more bytes are clocked by the master, so no RXNE or TXE events arrive and the SPI Abort ISR is never called
6. The master starts a transfer with a falling edge on NSS
7. I start receiving a new byte. SPI_TransmitIT resets TxISR to NULL
8. The ISR handler sees that TXEIE is set and tries to call TxISR, but it has been NULL since step 7.
2025-08-10 8:25 AM - edited 2025-08-10 8:43 AM
(deleted, edited previous)
2025-08-10 11:49 AM
Yep, seems like something is a bit off. If it times out waiting for the state to change to abort, it should probably reset RXNEIE, but it doesn't.
2025-08-10 1:38 PM
There is no timeout in _IT mode (at least in slave mode). In any case, I think it's better to be sure that an external action cannot put the MCU into a HardFault.
BTW, from your links I just discovered that the code is on GitHub :) Should I open a PR there, or will your team fix it themselves? :)