SPI slave MISO delay on STM32F401

PRobe · ‎2019-04-09

I am attempting to communicate between ESP32 and STM32F401 using SPI Mode 0.

I am achieving approximately 6MHz reliably and see data corruption at around 8MHz.

Both MCUs are using DMA.

I can improve performance by dropping the APB1 prescaler from 2 to 1, changing APB1 from 36MHz to 72MHz (naughty, I know).

The working theory is that we are failing at 8MHz through SCK/MISO violations, see:

https://docs.espressif.com/projects/esp-idf/en/latest/api-reference/peripherals/spi_master.html#speed-and-timing-considerations

Looking at the STM32 datasheet, however, MISO is set initially from NSS and then from SCK, with tsu(NSS) of 4*APB1 periods and Tv(SO) of max 17nS.

I have measured NSS to SCK as 180nS+

What does not make sense to me is that:

- NSS setup > 180nS but the issue seems improved by increasing APB1.

- Tv(SO) = 17nS should give me >=20MHz bus speed.

Also, the regular MISO output delay Tv(SO) of 17nS is shorter than one APB1 period of 28nS.
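To put numbers on the figures quoted above, here is a minimal sketch of the arithmetic. The helper names are made up for illustration, and the SCK bound assumes the master samples half an SCK period after the edge that launches the slave's data, ignoring master setup time and trace delay, so treat it as an optimistic ceiling rather than a datasheet figure.

```c
#include <assert.h>

/* Period of a clock in nanoseconds. */
static double period_ns(double freq_hz)
{
    assert(freq_hz > 0.0);
    return 1e9 / freq_hz;
}

/* NSS setup time, taken as 4 APB1 periods as quoted in the post. */
static double nss_setup_ns(double apb1_hz)
{
    return 4.0 * period_ns(apb1_hz);
}

/*
 * Upper bound on SCK if the slave's output valid time were the only limit.
 * Assumption: tv_so_ns must fit inside half an SCK period; master setup
 * time and propagation delay are ignored.
 */
static double max_sck_hz_from_tv(double tv_so_ns)
{
    assert(tv_so_ns > 0.0);
    return 1e9 / (2.0 * tv_so_ns);
}
```

With APB1 at 36MHz this gives a period of about 27.8nS and an NSS setup requirement of about 111nS, which the measured 180nS satisfies. A 17nS Tv(SO) on its own would allow roughly 29MHz under the stated assumption, so neither figure explains a failure at 8MHz.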

I could accept that, once loaded, the STM output is driven by SCK. But to load the output shift register in the first place we must cross from the APB1 clock domain to the SCK domain, and hence incur delays?

That is to say, the STM output is fed by DMA (no software delays) but the bytes/words still have to be loaded regularly into the shift register - where is this delay shown?

Any pointers/corrections gratefully received!

S.Ma · ‎2019-04-10

I've got 12MHz SPI between 1 master and 12 slave MCUs (all running at SYSCLK 48MHz): STM32L4 with DMA and EXTI on NSS as GPIO. There may be an issue with the CPHA/CPOL settings. Check the SCK/MISO/MOSI/NSS lines on an oscilloscope, and check the setup and hold times. Which clock edge is used to output data, and which to latch incoming data? When the communication fails, what exactly happens?

PRobe · ‎2019-04-11

Hi,

Yes thanks. I cannot do much right now as I do not have the board but all the above are classic setup issues.

Will post once I have measurements.

PRobe · ‎2019-04-11

Hi,

Sorry, missed your post.

I am still not getting your answer.

The holding register is fed from DMA. The DMA/RAM transaction can take 8*SCK. Plenty of time.

The holding to shift register transaction must cross from APB1 to SCK however. There must be a gate to resolve metastability issues.

I cannot find that delay in the AC characteristics. Indeed the only figure I found was 17nS setup delay which may be true bit to bit (when driven from SCK) but from byte to byte (i.e. in one bit time)?

Maybe they have two shift registers, that would work - the holding to shift load could happen from SCK then.

Anyway I wanted to double check my reading whilst I wait for hardware.

PRobe · ‎2019-04-16

Added some printf() and set up a logic analyser.

I am using HAL throughout but have examined the registers.

STM32F401CB, SPI2, APB1 = 36 MHz

ESP SCK = 8.3MHz (measured on logic analyser)

The STM NSS low->high ISR is used to set a semaphore and allow the STM to repeat.

ESP32 is triggered by user input so no race conditions there.

The STM32 SLAVE process sits in a loop as follows:

(1) Waits for TXE=1, BSY=0 but times out after 600mS (my application assumed that transmission was complete on NSS=1)

(2) HAL_SPI_TransmitReceive_DMA()

Which sets up DMA etc (RX: CH0.STREAM3, TX: CH0.STREAM4)

(3) Displays NDTR

(4) Waits for the NSS semaphore to be released

(5) Repeat
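The test in step (1) can be written as a pure function of the status register value. This is only a sketch: the bit positions (TXE = bit 1, BSY = bit 7 of SPI_SR) are taken from memory of the STM32F4 reference manual and should be checked against it, and the macro and function names are hypothetical, chosen so they do not clash with the CMSIS definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SPI_SR bit positions on STM32F4: TXE = bit 1, BSY = bit 7. */
#define SR_TXE_MASK (1u << 1)
#define SR_BSY_MASK (1u << 7)

/* Returns 1 when the transmit buffer is empty and the peripheral is not busy. */
static int spi_sr_is_idle(uint32_t sr)
{
    return ((sr & SR_TXE_MASK) != 0u) && ((sr & SR_BSY_MASK) == 0u);
}
```

On the target the argument would be the value read from the SPI2 status register, polled until the function returns 1 or the 600mS timeout expires.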

On first STM SPI loop (sending message: 'MESSA') (5 characters)

(1) TXE=1, BSY=0. Continues.

(2) HAL_SPI_TransmitReceive_DMA() call completes (ESP has not sent)

(3) NDTR = 4 (ESP has not sent yet, holding register is loaded immediately on SPE)

(4) ESP sends. NSS semaphore is released

(5) ESP received: "MMESS"

(5) Logic Analyser shows: 'MMESAG'

(1) TXE=0, BSY=0. Timeout expires

(2) HAL_SPI_TransmitReceive_DMA() completes

(3) NDTR = 5

Bit count looks good.

NSS=1 (approximately 0.12 uS after SCK 1->0)

So typically the first character is repeated. I find other patterns where several bytes may be repeated/not loaded. This then predisposes the test to fail as NDTR!=txLength-1 immediately following SPI setup.

NDTR should = txLength-1 as the holding register should be loaded immediately on SPE=1.

We have >=1 byte left over though.
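The observed pattern is consistent with one late reload of the holding register. The toy model below is not documented STM32 behaviour. It assumes that when the holding register has not been refilled in time, the previous byte is shifted out again and the unsent byte stays queued. The function name and the `missed` array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of a slave transmitter (an assumption, not silicon behaviour).
 * tx, tx_len : bytes queued for transmission
 * missed     : missed[i] != 0 means the reload before wire slot i was late
 * wire       : receives one byte per slot clocked by the master
 * slots      : number of bytes the master clocks
 * Returns the number of queued bytes that were never sent.
 */
static size_t simulate_slave_tx(const char *tx, size_t tx_len,
                                const unsigned char *missed,
                                char *wire, size_t slots)
{
    size_t next = 0; /* index of the next queued byte */
    char last = 0;   /* byte most recently shifted out */
    size_t i;

    assert(tx != NULL && missed != NULL && wire != NULL);
    for (i = 0; i < slots; i++) {
        /* Slot 0 always loads: the first byte is preloaded when SPE is set. */
        if (next < tx_len && (i == 0 || !missed[i])) {
            last = tx[next++];
        }
        wire[i] = last; /* a late reload repeats the previous byte */
    }
    return tx_len - next;
}
```

Feeding it "MESSA" with a single late reload before the second slot yields "MMESS" on the wire and one byte left over, matching what the ESP received.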

The issue seems improved when ESP SCK = 7.35 MHz

So...

There appears to be a problem loading SPI2 as SCK approaches APB1/4.

I note that the worked example only attempts 2MHz, and even then uses SPI1, whose APB2 is not limited to 42MHz.

How fast should I be able to run as SPI slave relative to its APBn?
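For scale, this sketch converts the measured SCK rates into APB1 cycles available per frame. It does not say how many cycles the peripheral and DMA actually need, which is the open question. It only shows how small the budget is. The function names are made up for illustration.

```c
#include <assert.h>

/* APB clock cycles that elapse while one frame of `bits` bits is shifted. */
static double apb_cycles_per_frame(double apb_hz, double sck_hz, unsigned bits)
{
    assert(apb_hz > 0.0 && sck_hz > 0.0 && bits > 0u);
    return apb_hz / sck_hz * (double)bits;
}

/* Ratio of the APB clock to SCK. */
static double apb_to_sck_ratio(double apb_hz, double sck_hz)
{
    assert(apb_hz > 0.0 && sck_hz > 0.0);
    return apb_hz / sck_hz;
}
```

With APB1 at 36MHz, an 8-bit frame at 8.3MHz lasts about 34.7 APB1 cycles (ratio about 4.3), and at 7.35MHz about 39.2 cycles (ratio about 4.9). Passing 16 for `bits` doubles the budget per DMA request.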

PRobe · ‎2019-04-16

PS:

DMA tx priority = DMA_PRIORITY_HIGH

DMA rx priority = DMA_PRIORITY_LOW

I haven't set any other DMA up but perhaps FLASH pre-fetch.

Resetting the SPI won't fix this issue as the error happens once transmission starts.

I would like to know how to reset SPI and its holding register though, as things stand once a transaction messes up recovery seems painful.

waclawek.jan · ‎2019-04-16

OK so here simply DMA can't keep up with updating the SPI_DR. Things to consider might include

- transfer 16 bits rather than 8
- use SPI on the faster bus
- avoid accessing the APB bus (i.e. peripherals on it) during transfer by processor or other busmasters
- if you need duplex, split Rx and Tx to two SPIs on two separate buses
- use a faster mcu

> (sending message: 'MESSA') (5 characters)

> (5) Logic Analyser shows: 'MMESAG'

How can this be possible?

JW

S.Ma · ‎2019-04-16

RCC should have a reset SPI bit

This will enable you to reconfigure the SPI and decide what will be the first data to send out as slave.

That data known at NSS rise/fall edge?

Then use DMA cyclic mode to loop over the transmit buffer.

Same priority use DMA cyclic to loop over the receive buffer.

Process the receive buffer at NSS rise edge (EXTI) interrupt.

TX and RX events are not at the same time, TX is before SCK starts, RX is when SCK pulses are over.

PRobe · ‎2019-04-17

I agree. I wanted to understand the calculation involved though.

I know what I have added to APB but don't understand (cannot find datasheet reference) why this is failing so low.

I have set highest priority so why am I not getting the transfer?

I cannot have (say) the FLASH prefetch trashing my transfers. As it stands I cannot see how to control this except by brute force (run very slow) and soak test.

My particular concern is that whilst I may be able to hack something for SPI transfers (e.g. checksum) I will soon be writing to SD & that will be rather more unforgiving.

Is there a calculation?

> (sending message: 'MESSA') (5 characters)

> (5) Logic Analyser shows: 'MMESAG'

My bad, it's not. I was sending 5 characters of 'MESSAGE' and cut and pasted from code.

Haven't figured out how to edit a post yet.

PRobe · ‎2019-04-17

Thanks.

'RCC should have a reset SPI bit'

Found RCC_APB2RSTR so I can put belt n braces on.
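A reset pulse might look like the sketch below. It rests on an assumption to check against the reference manual: since SPI2 sits on APB1, its reset bit should be SPI2RST, bit 14 of RCC_APB1RSTR, while RCC_APB2RSTR carries the bit for SPI1. The helper names are hypothetical, and the register is passed as a pointer so the same code can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of SPI2RST in RCC_APB1RSTR (bit 14); verify in the manual. */
#define SPI2RST_MASK (1u << 14)

/* Hold the peripheral in reset by setting its bit, leaving other bits alone. */
static void rcc_force_reset(volatile uint32_t *rstr, uint32_t mask)
{
    assert(rstr != 0);
    *rstr |= mask;
}

/* Release the peripheral from reset by clearing its bit. */
static void rcc_release_reset(volatile uint32_t *rstr, uint32_t mask)
{
    assert(rstr != 0);
    *rstr &= ~mask;
}
```

On the target the pointer would be the address of the APB1 reset register, with force followed by release. The SPI registers return to their reset values afterwards, so the peripheral has to be reconfigured before the next transfer.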

'That data known at NSS rise/fall edge?

Then use DMA cyclic mode to loop over the transmit buffer.'

Not entirely sure I understand.

The slave has some variable data to return. The slave will have prepared that data before NSS=0.

Cyclic does not really help as the slave is repeating bytes.

Mostly the first byte but I have seen others - especially as bit rate is cranked up.

'TX and RX events are not at the same time, TX is before SCK starts, RX is when SCK pulses are over'

Indeed. So who is blocking my DMA cycle (if that is the problem)?

I can test by exhaustion but hope that a datasheet reference can be found.

rdlaner · ‎2024-03-18

@PRobe I know this issue is quite old, but I'm running into nearly the exact same problem. Did you ever find a solution or justification in a datasheet?

Thanks!