
STM32F7 - using SPI (slave) with DMA to set a flag (I.E. no interrupts)

Richard Tarbell_2
Associate II

We are trying to use SPI1 as a slave to receive 11 bytes of data per transaction.

I.E. SPI1_RX --> my_data_array[11]

When an 11-byte transaction is received, we want the DMA to raise a flag that we will check. We can't afford to use interrupts: this is a motor control application, and other interrupts can disturb our 50 µs current control loop.

When I try to configure this as a project in CubeMX, the "DMA Global Interrupt" checkbox comes in the project as pre-checked, and I cannot uncheck it. Does this mean that we MUST use an interrupt to signal that the DMA is done transferring?

16 REPLIES

No, you should not call any HAL functions at all. Here is the rule again: You can have either strict timing requirements or CubeMX/HAL in one project, but not both.

Setting up SPI with DMA is just a few lines of code.

  1. Enable the GPIO peripheral clock in RCC
  2. Enable DMA clock in RCC
  3. Enable SPI clock in RCC
  4. set GPIO alternate function number in GPIO->AFR[] for the SPI pins
  5. set GPIO alternate mode in GPIO->MODER for the SPI pins
  6. set the DMA stream peripheral address register (PAR) to the address of the SPI receive register (&SPI->DR)
  7. set the memory address register (M0AR) to the address of the memory buffer
  8. set the transfer size (NDTR)
  9. set the channel number (CHSEL), MINC and EN in the DMA stream control register
  10. set RXDMAEN in SPI->CR2
  11. set clock phase, polarity (when needed), and SPE in SPI->CR1

That's it, I think: 11 simple register writes. You don't even have to bother with masking the bits, as the reset state of each register is known (listed in the reference manual).
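Steps 6 to 11 above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the poster's code: the two register structs are hypothetical stand-ins for the CMSIS `DMA_Stream_TypeDef` and `SPI_TypeDef` so that the sketch compiles off-target, the `EX_` bit definitions are my reading of the STM32F7 reference manual and should be checked against your part, and the channel number (CHSEL) is written to the DMA stream control register. Clock and pin setup (steps 1 to 5) is omitted.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the CMSIS register blocks, reduced to the
   leading registers used here. On the MCU, include "stm32f7xx.h" and use
   the real DMA stream and SPI instances instead. */
typedef struct { volatile uint32_t CR, NDTR, PAR, M0AR; } dma_stream_regs;
typedef struct { volatile uint32_t CR1, CR2, SR, DR; } spi_regs;

/* Bit positions assumed from the STM32F7 reference manual; verify them. */
#define EX_DMA_CR_EN        (1u << 0)   /* stream enable */
#define EX_DMA_CR_MINC      (1u << 10)  /* memory address increment */
#define EX_DMA_CR_CHSEL_POS 25u         /* channel selection field */
#define EX_SPI_CR1_SPE      (1u << 6)   /* SPI enable */
#define EX_SPI_CR2_RXDMAEN  (1u << 0)   /* RX DMA request enable */
#define EX_SPI_CR2_DS_8BIT  (7u << 8)   /* 8-bit data size */
#define EX_SPI_CR2_FRXTH    (1u << 12)  /* RXNE threshold: one byte */

/* Plain assignments, relying on known reset values as described above.
   MSTR stays 0 (slave) and no DMA interrupt enable bit is set. */
static void spi_slave_rx_dma_setup(dma_stream_regs *dma, spi_regs *spi,
                                   volatile uint8_t *buf, uint16_t len,
                                   uint32_t channel, uint32_t spi_mode_bits)
{
    dma->PAR  = (uint32_t)(uintptr_t)&spi->DR;          /* step 6 */
    dma->M0AR = (uint32_t)(uintptr_t)buf;               /* step 7 */
    dma->NDTR = len;                                    /* step 8 */
    dma->CR   = (channel << EX_DMA_CR_CHSEL_POS)        /* step 9 */
              | EX_DMA_CR_MINC | EX_DMA_CR_EN;
    spi->CR2  = EX_SPI_CR2_FRXTH | EX_SPI_CR2_DS_8BIT   /* step 10 */
              | EX_SPI_CR2_RXDMAEN;
    spi->CR1  = spi_mode_bits | EX_SPI_CR1_SPE;         /* step 11 */
}
```

On the MCU the same six assignments would be made directly on the CMSIS register blocks, for whichever DMA stream and channel the reference manual maps to SPI1_RX.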

To re-enable DMA after receiving, write the appropriate values to LIFCR or HIFCR, and set the DMA stream control register again.
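The transfer-complete flag is set in the DMA status register whether or not the corresponding interrupt is enabled, so the main loop can simply poll it. A minimal sketch, assuming the receive stream is stream 0 of its controller (its flags then sit in the low bits of LISR/LIFCR; other streams use other bit positions) and using hypothetical function names. The registers are passed as pointers so the logic can be exercised off-target:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stream-0 layout; check the reference manual for your stream. */
#define EX_DMA_SxCR_EN      (1u << 0)  /* stream enable bit in SxCR */
#define EX_DMA_S0_TCIF      (1u << 5)  /* TCIF0 in LISR, CTCIF0 in LIFCR */
#define EX_DMA_S0_ALL_FLAGS 0x3Du      /* FEIF0, DMEIF0, TEIF0, HTIF0, TCIF0 */

/* True once the stream has transferred all NDTR items. */
static bool dma_rx_done(const volatile uint32_t *lisr)
{
    return (*lisr & EX_DMA_S0_TCIF) != 0u;
}

/* Clear the stream's flags, then enable the stream again.
   Call this only after the received bytes have been consumed or copied,
   because the next transfer overwrites the same buffer. */
static void dma_rx_rearm(volatile uint32_t *lifcr, volatile uint32_t *sxcr)
{
    *lifcr = EX_DMA_S0_ALL_FLAGS;  /* writing 1 clears the matching flag */
    *sxcr |= EX_DMA_SxCR_EN;       /* hardware cleared EN when the transfer ended */
}
```

On the MCU, and only if SPI1_RX is routed to DMA2 stream 0 on your part, the calls would be `dma_rx_done(&DMA2->LISR)` and `dma_rx_rearm(&DMA2->LIFCR, &DMA2_Stream0->CR)`.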

Richard Tarbell_2
Associate II

So, we have most of it working! We have an external processor sending the 11-byte message to the ST (through DMA2-RX), and then the ST adds 1 to all elements and sends it back to the external processor (through DMA1-TX).

Now, the only issue that I see is that the main processor seems to always be "behind by 1", in terms of the samples that it currently has. Is there a "one-transaction delay" in the DMA in the ST? We do NOT have double-buffering enabled.

You might have an issue with the L1 cache.

The L1 cache sits between the CPU core and the memories, so all code and data accesses made by the program are going through the cache, but DMA accesses bypass this cache. The scenario might be something like this:

  1. DMA receive is started, incoming data is copied from the SPI receiver to memory.
  2. The program is busy doing things, the cache is occupied by all sorts of data.
  3. DMA flag is set, the program starts to examine the contents of the receive buffer.
  4. One line is evicted from the cache, and the buffer contents are read into this cache line. Because the buffer was not in the cache (see step 2), the read fetches the fresh data that DMA wrote in step 1, so you are lucky this time: the program has valid data to work on.
  5. The program starts to write data into the transmit buffer.
  6. Another cache line is evicted, and allocated for the write buffer.
  7. Because the default caching mode is write-back write-allocate, outgoing data is accumulated in the cache.
  8. The program finishes writing the transmit buffer, and starts the SPI DMA transmission.
  9. DMA reads the data from memory and copies it to the SPI data register.
  10. The program starts doing other stuff with another set of data.
  11. The contents of the transmit buffer are eventually written back from the cache to memory, when the cache line is needed for something else.
  12. In the next iteration of the above, the transmit buffer holds the data from the previous transaction.

There are two ways to solve this.

Set some memory aside for DMA buffers, and mark it as non-cacheable. I prefer this one, because it has to be set up only once, requires almost no attention later, and its effects can be restricted to the necessary memory area.

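struct {
  uint8_t dma_rx[16];
  uint8_t dma_tx[16];
} nocache __attribute__ ((aligned (32)));
 
	MPU->RBAR = ((uint32_t)&nocache) | MPU_RBAR_VALID_Msk; // using region slot 0
	MPU->RASR =
			MPU_RASR_XN_Msk            | // 1: Instruction fetches disabled
			(3u << MPU_RASR_AP_Pos)    | // Full access
			(4u << MPU_RASR_SIZE_Pos)  | // 32 bytes
			MPU_RASR_ENABLE_Msk        |
			0; // TEX,C,B,S are 0 meaning strongly ordered, this is the safest thing
	// The region only takes effect once the MPU itself is enabled (skip this
	// if it is already done elsewhere). PRIVDEFENA keeps the default memory
	// map for all addresses outside the region.
	MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
	__DSB();
	__ISB();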

The struct holding the DMA buffers must be padded and aligned to a power of 2, and at least to 32 bytes. If the size is changed, adjust the alignment attribute and the MPU_RASR_SIZE field accordingly. The full documentation of the MPU registers is in the PM0253 STM32F7 Series and STM32H7 Series Cortex®-M7 processor programming manual.

Cleaning and invalidating the cache when needed. This one can get tricky, especially for a DMA receive transaction.

There are 2 basic operations.

  • Clean (or flush) writes back all modified data from cache to memory.
  • Invalidate discards the contents of the cache without writing it to memory.

Preparing for a transmit is straightforward, just clean the cache before starting DMA.

Receiving is tricky, because invalidating means that a couple of preceding memory writes are lost. Therefore the cache should be cleaned immediately before invalidating, and it must be ensured that the receive buffer does not get accidentally cached. It must be aligned and padded to cache lines, cleaned before starting to receive, and not touched during receive. There are lots of ways it can go wrong.
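To see why alignment and padding matter, it helps to compute which cache lines a buffer actually touches. The sketch below is my own illustration with hypothetical names; it assumes the 32-byte data cache line of the Cortex-M7 and only does the address arithmetic. On target, the resulting range would be handed to the CMSIS-Core functions `SCB_CleanDCache_by_Addr` (before a transmit) or `SCB_InvalidateDCache_by_Addr` (after a receive); check their exact parameter types in your CMSIS version.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_DCACHE_LINE 32u  /* assumed Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;  /* address of the first cache line touched */
    size_t    len;    /* length in bytes, a multiple of the line size */
} cache_span;

/* Expand [addr, addr + len) outward to whole cache lines. Anything else
   living in those lines is cleaned or invalidated along with the buffer. */
static cache_span cache_span_for(const void *addr, size_t len)
{
    const uintptr_t mask  = ~(uintptr_t)(EX_DCACHE_LINE - 1u);
    const uintptr_t first = (uintptr_t)addr & mask;
    const uintptr_t last  = ((uintptr_t)addr + len + EX_DCACHE_LINE - 1u) & mask;
    cache_span span = { first, (size_t)(last - first) };
    return span;
}
```

An 11-byte buffer that starts on a line boundary occupies one line, but the same buffer starting 28 bytes into a line spans two lines and shares both with neighbouring variables. Those neighbours are what an invalidate can silently roll back, which is the reason for aligning and padding the receive buffer to whole cache lines.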

If I change the size of dma_rx and dma_tx to 128, how do I do that?

Just follow this, change the alignment in the struct declaration, and the value that goes into MPU_RASR_SIZE.

The struct holding the DMA buffers must be padded and aligned to a power of 2, and at least to 32 bytes. If the size is changed, adjust the alignment attribute and the MPU_RASR_SIZE field accordingly. The full documentation of the MPU registers is in the PM0253 STM32F7 Series and STM32H7 Series Cortex®-M7 processor programming manual.

struct {
  uint8_t dma_rx[128];
  uint8_t dma_tx[128];
} nocache __attribute__ ((aligned (256)));
 
MPU->RBAR = ((uint32_t)&nocache) | MPU_RBAR_VALID_Msk; // using region slot 0
MPU->RASR =
    MPU_RASR_XN_Msk | // 1: Instruction fetches disabled
    (3u << MPU_RASR_AP_Pos) | // Full access
    (4u << MPU_RASR_SIZE_Pos) | // 256 bytes
    MPU_RASR_ENABLE_Msk |
    0; // TEX,C,B,S are 0 meaning strongly ordered, this is the safest thing

How should I change (4u << MPU_RASR_SIZE_Pos)?

Find the documentation of the MPU RASR register in the programming manual.
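For reference, the ARMv7-M MPU encodes the region size as 2^(SIZE+1) bytes, which is why 4 stands for 32 bytes in the original example; a 256-byte region therefore needs SIZE = 7. A small helper (hypothetical name, not part of CMSIS) makes the relation explicit:

```c
#include <stdint.h>

/* Returns the MPU RASR SIZE field for a region of region_bytes bytes,
   or -1 if the size is not a power of two of at least 32 bytes.
   The encoding is: region size = 2^(SIZE + 1) bytes. */
static int mpu_rasr_size_field(uint32_t region_bytes)
{
    if (region_bytes < 32u || (region_bytes & (region_bytes - 1u)) != 0u) {
        return -1;
    }
    int log2_bytes = 0;
    while ((region_bytes >> log2_bytes) != 1u) {
        log2_bytes++;
    }
    return log2_bytes - 1;
}
```

With two 128-byte buffers the struct is 256 bytes, so the alignment attribute is 256 and the RASR line becomes `(7u << MPU_RASR_SIZE_Pos)`.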