How to (successfully) apply a mutex for queueing SPI jobs?

NSR · ‎2021-03-08

I'm trying to implement a queue system for external storage attached to the SPI.

I have three parts:

Add an item to the queue

Process an item from the queue

HAL_SPI_RxCpltCallback (my implementation to override the weak default implementation)

In doing so I've found that I need to disable interrupts while I update the queue and then re-enable them. The update is some six C statements: four to fill in the structure in the current array element, one to move the pointer to the next array element (we'll call this the job index), and one to determine the workload remaining. The process item function is called to start the process when the current element pointer has the value 1.
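One variation on the disable/enable pair described above is to save the interrupt mask state on entry and restore it on exit, so the block never re-enables interrupts that were already off when it was entered. This is a minimal sketch, assuming the CMSIS intrinsics __get_PRIMASK(), __disable_irq() and __set_PRIMASK() on the target. The wrapper names, the simulated mask variable and add_job() are hypothetical stand-ins so that it also builds on a host.

```c
#include <stdint.h>

/* Host-side stand-ins (hypothetical) so the sketch builds off-target.
   On a Cortex-M build, replace each with the CMSIS intrinsic in the comment. */
static uint32_t sim_primask;                                      /* 0 = interrupts enabled */
static uint32_t irq_get_mask(void)       { return sim_primask; }  /* __get_PRIMASK()  */
static void     irq_disable(void)        { sim_primask = 1u; }    /* __disable_irq()  */
static void     irq_set_mask(uint32_t m) { sim_primask = m; }     /* __set_PRIMASK(m) */

/* Enter a critical section and return the mask state that was in force. */
static uint32_t critical_enter(void)
{
    uint32_t previous = irq_get_mask();
    irq_disable();
    return previous;
}

/* Leave the critical section, restoring the earlier mask state. */
static void critical_exit(uint32_t previous)
{
    irq_set_mask(previous);
}

/* Hypothetical stand-in for the job index described in the post. */
static volatile uint32_t job_index;

static void add_job(void)
{
    uint32_t state = critical_enter();
    job_index++;              /* keep the protected region as short as possible */
    critical_exit(state);
}
```

This only changes how the mask is released. It still blocks every interrupt for the duration of the update, so it does not by itself address the stalls described later in the thread.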

The process item has a separate pointer indicating the job it is working on; we'll call this the current index. This part sends out the address to the SPI via HAL_SPI_Transmit and then calls HAL_SPI_Receive_IT to receive the corresponding number of bytes from the SPI, triggering the appropriate interrupt when complete.

The callback updates the current index and, if it is then the same as the job index, resets both to zero to avoid overrunning the array. The array is plenty big enough, and I can see from the traffic that it gets nowhere near its actual size. If the two index pointers are different, i.e. job index > current index, then the next process item is initiated.

Sometimes everything goes a little wrong: the process stops reading in the middle of a HAL_SPI_Receive_IT transaction. For example, I know that the job was 96x 16-bit words, and hspi->RxXferSize == 96, hspi->RxXferCount == 72, 42 bytes have been read (or 21x 16-bit words). Or I change the prescaler and I get clock stretching on the transmit read command on the address.

I'm pretty certain that the __disable_irq() and __enable_irq() calls that surround the five operations in my job adding function are the culprits, and I would welcome suggestions for a better way to do this. The global interrupt handling was added because I was fairly confident that a completion interrupt was being triggered while control was within this block, causing the processIndex to be reset while the jobIndex retained its value. However, this method is probably overkill and is likely the cause of the SPI transaction stalling.
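One commonly used alternative to masking interrupts is a single-producer, single-consumer ring buffer in which the job-adding code only ever writes the head index and the completion callback only ever writes the tail index, so neither side can reset the other's index. The sketch below is a hedged illustration of that idea, not the poster's implementation. The type names, field names and QUEUE_LEN are assumptions, and it relies on C11 <stdatomic.h> being available in the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_LEN 16u  /* hypothetical depth; must be a power of two */

typedef struct {
    uint32_t address;  /* storage address to read from */
    uint16_t words;    /* number of 16-bit words to read */
} spi_job_t;

typedef struct {
    spi_job_t   slots[QUEUE_LEN];
    atomic_uint head;  /* written only by the producer (job-adding code) */
    atomic_uint tail;  /* written only by the consumer (completion callback) */
} spi_queue_t;

/* Producer side: returns false if the queue is full. */
static bool queue_push(spi_queue_t *q, spi_job_t job)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QUEUE_LEN) {
        return false;
    }
    q->slots[head % QUEUE_LEN] = job;
    /* Publish the slot only after it has been fully written. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: copy out the oldest job without removing it. */
static bool queue_peek(spi_queue_t *q, spi_job_t *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) {
        return false;  /* nothing queued */
    }
    *out = q->slots[tail % QUEUE_LEN];
    return true;
}

/* Consumer side: retire the oldest job once its transfer has completed. */
static void queue_pop(spi_queue_t *q)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
}
```

The indices run freely and are reduced modulo QUEUE_LEN on use, so nothing ever resets them to zero. In this scheme the completion callback would call queue_pop() and then queue_peek() to decide whether to start another transfer, while the job-adding code would start a transfer itself only when none is in flight. How that "in flight" state is tracked is left out of the sketch.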

NSR · ‎2021-03-10

OK, after much reading and debugging (tracing interrupts through code is somewhat erratic), I think I've found the solution, which I'll share for other people out there.

Using HAL_SPI_Receive_IT() would appear far from ideal. I'm running the SPI at 15 MHz with a 480 MHz core clock, which suggests that instead of the single interrupt at the end that I was expecting, it is in fact generating some 1,875,000 interrupts per second, one for every byte. I learned this after finally finding the SPI interrupt fault code (SPI_SR_OVR / SPI_IT_OVR, the overrun flag being set), which is similarly discussed in https://community.st.com/s/question/0D50X0000C3Aein/overrun-flag-when-using-spi-in-interrupt-mode-halspireceiveit
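The arithmetic behind that figure can be written out as a small sketch. It assumes one interrupt per received frame and uses the clock values from the post. The function names are hypothetical.

```c
#include <stdint.h>

/* Interrupts per second if one interrupt fires per received frame. */
static uint32_t irq_rate_hz(uint32_t spi_clock_hz, uint32_t bits_per_frame)
{
    return spi_clock_hz / bits_per_frame;
}

/* Core clock cycles available between two consecutive interrupts. */
static uint32_t cycles_per_irq(uint32_t core_clock_hz, uint32_t irq_hz)
{
    return core_clock_hz / irq_hz;
}
```

With a 15 MHz SPI clock and 8-bit frames, irq_rate_hz() gives 1,875,000, and cycles_per_irq() at 480 MHz gives 256 core cycles per interrupt. That budget has to cover interrupt entry, the handler itself and interrupt exit. If the frames were 16 bits wide, the same arithmetic would give 937,500 interrupts per second and 512 cycles each.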

The solution for now would seem to be to use blocking mode to move things along while we explore DMA and possibly the more advanced options offered by QSPI.

Tesla DeLorean · ‎2021-03-10

Guessing H7, DMA would be less burdensome for sure, but needs cache coherency to be addressed.
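As a rough illustration of what addressing cache coherency tends to involve on a Cortex-M7: the data cache works in 32-byte lines, so cache maintenance on a DMA receive buffer has to cover whole lines. The sketch below only computes the line-aligned range for a buffer. The type and function names are hypothetical, and the example address is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start;   /* buffer start rounded down to a cache-line boundary */
    size_t    length;  /* span rounded up to a whole number of cache lines */
} cache_range_t;

/* Compute the cache-line-aligned range that fully covers [addr, addr + len). */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    cache_range_t r;
    uintptr_t end;

    r.start  = addr & mask;
    end      = (addr + len + DCACHE_LINE - 1u) & mask;
    r.length = (size_t)(end - r.start);
    return r;
}
```

For example, a 192-byte buffer (96x 16-bit words) starting at 0x24000010 yields a start of 0x24000000 and a length of 224 bytes. On the target, that range would be passed to the CMSIS routine SCB_InvalidateDCache_by_Addr() after a DMA receive completes and before the CPU reads the data. Because invalidating a line also discards anything else stored in it, DMA buffers are usually aligned to 32 bytes and padded to a multiple of 32 bytes so that they do not share lines with other variables.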


NSR · ‎2021-03-10

While I agree that DMA would be less burdensome, I had originally looked at the interrupt method because of the potentially frequent operations where only a 16- or 32-bit word is read; it seems like an awful waste of time to set up DMA for instances such as this. I hadn't anticipated the overhead of generating an interrupt for every byte. Maybe this would better suit a mix of DMA and blocking techniques, depending on the amount of data to read, moving away from interrupts and towards DMA.
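A mix like that could be sketched as a size-based choice between the two transfer styles. Everything named here is hypothetical, and the threshold is a placeholder to be replaced with a value measured on the target. The second function estimates how many core cycles a transfer occupies on the wire, using the clock values mentioned earlier in the thread.

```c
#include <stdint.h>

typedef enum { XFER_BLOCKING, XFER_DMA } xfer_mode_t;

#define DMA_THRESHOLD_WORDS 8u  /* hypothetical break-even point; measure on target */

/* Pick blocking mode for short reads and DMA for longer ones. */
static xfer_mode_t choose_mode(uint16_t words)
{
    return (words < DMA_THRESHOLD_WORDS) ? XFER_BLOCKING : XFER_DMA;
}

/* Core clock cycles spent clocking the data over the SPI bus. */
static uint64_t wire_cycles(uint32_t words, uint32_t bits_per_word,
                            uint32_t core_hz, uint32_t spi_hz)
{
    return ((uint64_t)words * bits_per_word * core_hz) / spi_hz;
}
```

At 15 MHz SPI and a 480 MHz core, a single 16-bit word takes 512 core cycles on the wire (roughly 1 µs), so a blocking HAL_SPI_Receive() call costs little for reads that short. The 96-word job from the first post takes 49,152 cycles, which is where handing the transfer to HAL_SPI_Receive_DMA() is more likely to pay off.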

When you mention that cache coherency needs to be addressed, does this imply that I need to, or should, use the cache area in conjunction with memory-mapped mode? This would be a setback of sorts, as I am currently using the ITCM and DTCM memory areas to process data. I'm currently trying to get my head around the memory-mapped mode, as this could potentially reduce complexity, although at what cost? The HAL / LL manual doesn't make it easy either, as it has replaced the QSPI section with OSPI. While I accept that there are no LL drivers for the QSPI, there is still HAL, and I don't have access to any OSPI functionality using STM32CubeIDE for the STM32H743.