Richard Tarbell_2
Associate II
December 15, 2019
Question

STM32F7 - using SPI (slave) with DMA to set a flag (I.E. no interrupts)

We are trying to use SPI1 as a slave to receive 11 bytes of data per transaction.

I.E. SPI1_RX --> my_data_array[11]

When an 11-byte transaction is received, we want the DMA to raise a flag that we will check (we can't afford to use interrupts, as this is for a motor control application, and other interrupts can mess with our 50usec current control loop).

When I try to configure this as a project in CubeMX, the "DMA Global Interrupt" checkbox comes in the project as pre-checked, and I cannot uncheck it. Does this mean that we MUST use an interrupt to signal that the DMA is done transferring?

This topic has been closed for replies.

9 replies

waclawek.jan
Super User
December 15, 2019

> When I try to configure this as a project in CubeMX, the "DMA Global Interrupt" checkbox comes in the project as pre-checked, and I cannot uncheck it. Does this mean that we MUST use an interrupt to signal that the DMA is done transferring?

No. It just means that Cube/CubeMX is written this way.

By using a "library" you are bound to use whatever facilities the "library"'s authors meant you to use.

The DMA interrupt does not need to be enabled, and you can check the DMA Transfer Complete flag DMA_LISR/HISR.TCIFx any time later. Read the DMA chapter in RM.
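
A minimal sketch of polling the transfer-complete flag with the DMA interrupt left disabled. The helper name dma_take_flag is made up for illustration, and it takes plain register pointers so it does not depend on any particular header. The bit position shown (TCIF0 at bit 5 of DMA_LISR, CTCIF0 at bit 5 of DMA_LIFCR) applies to stream 0 only and is an assumption about which stream is in use; check the DMA chapter of the RM for your stream.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCIF0 / CTCIF0: bit 5 of DMA_LISR / DMA_LIFCR (stream 0 only, assumed here). */
#define TCIF0_MASK (1u << 5)

/* Hypothetical helper: returns true if the flag selected by `mask` is set in
 * the status register, and clears it by writing the same bit to the flag
 * clear register. Returns false (and writes nothing) if the flag is not set. */
bool dma_take_flag(volatile uint32_t *isr, volatile uint32_t *ifcr, uint32_t mask)
{
    if ((*isr & mask) == 0u) {
        return false;          /* transfer not complete yet */
    }
    *ifcr = mask;              /* write 1 to clear; other flags are untouched */
    return true;
}
```

On the target this could be called once per control-loop pass, for example as dma_take_flag(&DMA2->LISR, &DMA2->LIFCR, DMA_LISR_TCIF0), assuming the CMSIS device header and stream 0.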

JW

RMcCa
Senior II
December 15, 2019

The DMA always sets its transfer complete flag when the NDTR reaches zero. You can read the status register and the NDTR at any time. It's up to you whether it causes an interrupt or not and how the DMA behaves when the transfer is complete.

Ditch HAL and set the registers yourself.

berendi
Principal
December 15, 2019

There is an important rule: You can have either strict timing requirements or CubeMX/HAL in one project, but not both.

Read the SPI functional description/Data transmission and reception procedures chapter in the reference manual, it has a section on Communication using DMA, and follow the instructions there.

Richard Tarbell_2
Associate II
December 16, 2019

Thank you ALL for your help - I've got progress!

So right now, I AM receiving the first 11-byte transaction from the SPI slave (great!) - I DO see my_data[11] update with correct values. After this, when a second transaction is sent, my data buffer (my_data[11]) does not change, even though the SPI1 registers do change. This tells me that the SPI is still receiving data, but the DMA itself isn't transferring it.

I'm not using interrupts for this (at least not yet). I also see that my DMA2 -> S1NDTR changed from 11 to 0 after the first transaction. When I change this back to 11 using the debugger, it still does not perform additional DMA transactions.

--> I believe I'm not "resetting" the DMA properly, after a transaction has been performed.

berendi
Principal
December 16, 2019

Try resetting the DMA interrupt flags (LISR or HISR)

waclawek.jan
Super User
December 16, 2019

> Try resetting the DMA interrupt flags (LISR or HISR)

... using LIFCR/HIFCR.

JW

Richard Tarbell_2
Associate II
December 17, 2019

Thank you for your help again! OKAY, I have reset the DMA interrupt flags by writing "1"s to the appropriate LIFCR register positions.

--> When I call HAL_DMA_Start() the first time at code startup, everything works, and my return status is HAL_OK.

--> After my first DMA transaction, I clear the DMA interrupt flags, and I call HAL_DMA_Start() again. This time, it returns HAL_BUSY, so the stream never enables again. Should I keep calling HAL_DMA_Start() until I get a HAL_OK status?

berendi
Principal
December 17, 2019

No, you should not call any HAL functions at all. Here is the rule again: You can have either strict timing requirements or CubeMX/HAL in one project, but not both.

Setting up SPI with DMA is just a few lines of code.

  1. Enable the GPIO peripheral clock in RCC
  2. Enable DMA clock in RCC
  3. Enable SPI clock in RCC
  4. set GPIO alternate function number in GPIO->AFR[] for the SPI pins
  5. set GPIO alternate mode in GPIO->MODER for the SPI pins
  6. set the DMA stream peripheral address register (PAR) to the address of the SPI receive register (&SPI->DR)
  7. set the memory address register (M0AR) to the address of the memory buffer
  8. set the transfer size (NDTR)
  9. set the channel number (CHSEL), MINC and EN in the DMA stream control register
  10. set RXDMAEN in SPI->CR2
  11. set clock phase, polarity (when needed), and SPE in SPI->CR1

That's it I think, 11 simple register writes. You don't even have to bother with masking the bits, as the reset state of each register is known (listed in the reference manual).

To re-enable DMA after receiving, write the appropriate values to LIFCR or HIFCR, and set the DMA stream control register again.
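
The DMA part of the list (the PAR, M0AR, NDTR and stream control register writes) and the re-enable step can be sketched as below. This is an illustration under assumptions, not tested on hardware: dma_stream_regs is a hypothetical stand-in for the first four registers of a DMA stream (on the target, the CMSIS DMA_Stream_TypeDef would be used instead), the function names are made up, and the bit positions (EN bit 0, MINC bit 10, CHSEL starting at bit 25) should be checked against the reference manual. Rewriting NDTR before re-enabling is an extra precaution added here, not something the steps above require.

```c
#include <stdint.h>

/* Hypothetical mirror of the leading registers of one DMA stream. */
typedef struct {
    volatile uint32_t CR;    /* stream configuration register */
    volatile uint32_t NDTR;  /* number of data items to transfer */
    volatile uint32_t PAR;   /* peripheral address */
    volatile uint32_t M0AR;  /* memory 0 address */
} dma_stream_regs;

#define SX_CR_EN        (1u << 0)
#define SX_CR_MINC      (1u << 10)
#define SX_CR_CHSEL(n)  ((uint32_t)(n) << 25)

/* Peripheral-to-memory, byte-wide, memory increment, no interrupts.
 * DIR, PSIZE and MSIZE are left at 0, which selects exactly that. */
void dma_rx_setup(dma_stream_regs *s, uint32_t periph_addr, uint32_t mem_addr,
                  uint16_t count, unsigned channel)
{
    s->PAR  = periph_addr;   /* e.g. address of the SPI data register */
    s->M0AR = mem_addr;      /* receive buffer */
    s->NDTR = count;         /* 11 in this thread */
    s->CR   = SX_CR_CHSEL(channel) | SX_CR_MINC | SX_CR_EN;
}

/* Re-arm after a completed transfer: make sure the stream is disabled,
 * clear its flags through LIFCR/HIFCR, reload the count, enable again. */
void dma_rx_rearm(dma_stream_regs *s, volatile uint32_t *ifcr,
                  uint32_t clear_mask, uint16_t count)
{
    s->CR &= ~SX_CR_EN;
    while ((s->CR & SX_CR_EN) != 0u) {
        /* wait until the hardware reports the stream as disabled */
    }
    *ifcr   = clear_mask;    /* all flag bits of this stream */
    s->NDTR = count;
    s->CR  |= SX_CR_EN;
}
```

For stream 0 the flag bits in LIFCR are bits 0, 2, 3, 4 and 5, so clear_mask would be 0x3D. Which stream and channel serve SPI1_RX is listed in the DMA request mapping table of the reference manual.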

waclawek.jan
Super User
December 17, 2019

I don't Cube.

OTOH, Cube is open source, so you can figure this out yourself.

Note, that you are already not using it as it was intended.

JW

RMcCa
Senior II
December 17, 2019

Yeah, don't know about hal, but certainly the busy status is trying to tell you something. Saying what? Think about it for a while and study the reference manual.

You could try polling for it to clear, but methinks you will be waiting for a very long time.

Richard Tarbell_2
Associate II
December 20, 2019

SO, we have most of it working! We have an external processor sending the 11 byte message to the ST (through DMA2-RX), and then the ST adds a 1 to all elements, and sends it back to the external processor (through DMA1-TX).

Now, the only issue that I see is that the main processor seems to always be "behind by 1", in terms of the samples that it currently has. Is there a "one-transaction delay" in the DMA in the ST? We do NOT have double-buffering enabled.

berendi
Principal
December 20, 2019

You might have an issue with the L1 cache.

The L1 cache sits between the CPU core and the memories, so all code and data accesses made by the program are going through the cache, but DMA accesses bypass this cache. The scenario might be something like this:

  1. DMA receive is started, incoming data is copied from the SPI receiver to memory.
  2. The program is busy doing things, the cache is occupied by all sorts of data.
  3. DMA flag is set, the program starts to examine the contents of the receive buffer.
  4. One line is evicted from the cache, and the buffer contents are read into this cache line. Because of (1.) you are lucky this time, the program has valid data to work on.
  5. The program starts to write data into the transmit buffer.
  6. Another cache line is evicted, and allocated for the write buffer.
  7. Because the default caching mode is write-back write-allocate, outgoing data is accumulated in the cache.
  8. The program finishes writing the transmit buffer, and starts the SPI DMA transmission.
  9. DMA reads the data from memory and copies it to the SPI data register.
  10. The program starts doing other stuff with another set of data.
  11. The contents of the transmit buffer are eventually written back from the cache to memory, when the cache line is needed for something else.
  12. In the next iteration of the above, the transmit buffer holds the data from the previous transaction.

There are two ways to solve this.

Set some memory aside for DMA buffers, and mark it as non-cacheable. I prefer this one, because it has to be set up once, and requires almost no attention later, and its effects can be restricted to the necessary memory area.

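struct {
	uint8_t dma_rx[16];
	uint8_t dma_tx[16];
} nocache __attribute__ ((aligned (32))); // 32 bytes in total, aligned to its own size

	MPU->RBAR = ((uint32_t)&nocache) | MPU_RBAR_VALID_Msk; // using region slot 0
	MPU->RASR =
			MPU_RASR_XN_Msk | // 1: Instruction fetches disabled
			(3u << MPU_RASR_AP_Pos) | // Full access
			(4u << MPU_RASR_SIZE_Pos) | // 32 bytes
			MPU_RASR_ENABLE_Msk |
			0; // TEX,C,B,S are 0 meaning strongly ordered, this is the safest thing
	// If the MPU is not already enabled elsewhere, enable it and keep the
	// default memory map for everything outside the region:
	MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
	__DSB();
	__ISB();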

The struct holding the DMA buffers must be padded and aligned to a power of 2, and at least to 32 bytes. If the size is changed, adjust the alignment attribute and the MPU_RASR_SIZE field accordingly. The full documentation of the MPU registers is in the PM0253 STM32F7 Series and STM32H7 Series Cortex®-M7 processor programming manual.
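
To make the size adjustment concrete: the SIZE field encodes a region of 2^(SIZE+1) bytes, so 32 bytes gives 4 as in the code above. The helper below is a small sketch (mpu_size_field is a made-up name, not a CMSIS function) that computes the field value and rejects sizes that are not a power of two or are smaller than 32 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: returns the MPU_RASR SIZE field value for a region of
 * `bytes` bytes, or -1 if `bytes` is not a power of two of at least 32.
 * The region size is 2^(SIZE+1), so SIZE = log2(bytes) - 1. */
int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int log2 = 0;
    while ((bytes >>= 1) != 0u) {
        log2++;
    }
    return log2 - 1;
}
```

For example, two 128-byte buffers make a 256-byte struct, which would need aligned (256) and a SIZE field of 7 in place of 4.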

Cleaning and invalidating the cache when needed. This one can get tricky, especially at a DMA receive transaction.

There are 2 basic operations.

  • Clean (or flush) writes back all modified data from cache to memory.
  • Invalidate discards the contents of the cache without writing it to memory.

Preparing for a transmit is straightforward, just clean the cache before starting DMA.

Receiving is tricky, because invalidating means that a couple of preceding memory writes are lost. Therefore the cache should be cleaned immediately before invalidating, and it must be ensured that the receive buffer does not get accidentally cached. It must be aligned and padded to cache lines, cleaned before starting to receive, and not touched during receive. There are lots of ways it can go wrong.
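
A small sketch of the alignment arithmetic behind this. Cache maintenance works on whole 32-byte lines on the Cortex-M7, so any clean or invalidate covers the full lines a buffer touches, including whatever else shares them. The names cache_range and cache_span are invented for this example.

```c
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 data cache line size in bytes */

/* Hypothetical type: a line-aligned address range. */
typedef struct {
    uint32_t start;  /* rounded down to a cache line boundary */
    uint32_t size;   /* multiple of the cache line size */
} cache_range;

/* Returns the smallest cache-line-aligned range covering [addr, addr+len). */
cache_range cache_span(uint32_t addr, uint32_t len)
{
    cache_range r;
    uint32_t end = (addr + len + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u);
    r.start = addr & ~(DCACHE_LINE - 1u);
    r.size  = end - r.start;
    return r;
}
```

An 11-byte buffer at an unaligned address can straddle two lines, so invalidating it would also discard pending writes to its neighbours; that is why the buffer should be aligned and padded. On the target, the resulting range could be passed to the CMSIS functions SCB_CleanDCache_by_Addr() before starting a transmit and SCB_InvalidateDCache_by_Addr() after a receive completes; their exact parameter types differ between CMSIS versions, so check core_cm7.h.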

ZYubi
Associate III
February 19, 2020

If I change the size of dma_rx and dma_tx to 128, how do I do it?