STM32G431 SPI MCU <--> MCU Communication

CBerg · ‎2023-05-05

I have two PCBs, both with STM32G431 MCUs, and the two MCUs should communicate with each other via SPI. Both PCBs are assembled in a chassis; the distance between them is approx. 10 cm. For maximum signal integrity I have chosen LVDS as the hardware layer for the board-to-board connection. I use Texas Instruments SN65LVDS1 and SN65LVDT2 drivers/receivers. Some might consider this complete overkill, but for me it’s worth the effort.

The LVDS itself works pretty well, as shown in the screenshot below: A signal delay of 8 ns from LVDS Driver IC input to LVDS Receiver Output is IMHO more than good enough for a reliable hardware layer, especially at that slow speed.

Screenshot. SPI Clock signal Master LVDS Driver Input (yellow), Slave LVDS Receiver Output (blue)

Both MCUs should exchange 20 bytes of data at 5 kHz, but so far I could not convince the two STM32s to establish reliable SPI communication with each other. The setup for master and slave is: 1, 2, 4 or 8 MHz clock, CPOL 0, CPHA 1, 8-bit Motorola frame format, MSB first, no CRC, slave NSS as hardware input, master CSn in software. Both instances use DMA, the master in “Normal” mode, the slave in “Circular” mode. The main issue is: the received array on the master is shifted by 3 or 4 bytes.
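For orientation, the bus load implied by these numbers can be worked out as below. This is a minimal sketch, not code from the project: the function names `frame_time_ns` and `idle_time_ns` are made up for illustration, and it assumes back-to-back bytes with no chip-select setup time or inter-byte gaps.

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds needed to clock `bytes` bytes at SPI clock `sck_hz`.
   Assumption: bytes are sent back to back, no CS setup or inter-byte gap. */
uint32_t frame_time_ns(uint32_t bytes, uint32_t sck_hz)
{
    assert(sck_hz > 0u);
    uint64_t bits = (uint64_t)bytes * 8u;
    return (uint32_t)((bits * 1000000000ull) / sck_hz);
}

/* Idle time per exchange period in nanoseconds; negative if the frame
   does not fit into one period at the given exchange rate. */
int64_t idle_time_ns(uint32_t bytes, uint32_t sck_hz, uint32_t rate_hz)
{
    assert(rate_hz > 0u);
    int64_t period_ns = 1000000000ll / (int64_t)rate_hz;
    return period_ns - (int64_t)frame_time_ns(bytes, sck_hz);
}
```

With 20 bytes at 5 kHz (a 200 µs period), a 1 MHz clock keeps the bus busy for 160 µs and leaves 40 µs of slack; at 8 MHz the frame takes 20 µs.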

The slave initiates the SPI communication by sending an “I am ready” signal to the master via UART; the master then starts the communication after a delay of 25 ms. The slave initializes the SPI before sending the “ready” signal and waits for incoming transmissions.

Both MCUs use >>HAL_SPI_TransmitReceive_DMA(…)<<; as said before, the master in Normal mode and the slave in Circular mode. The data received by the slave are in sync, but the data received by the master are shifted.
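One way to quantify such a shift while debugging is to let the slave send a known, non-repeating test pattern and compare it with the master's receive buffer. The helper below is a hypothetical sketch (the name `find_frame_rotation` is not from the post or from HAL), and it assumes the shift shows up as a pure rotation of the 20-byte frame, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the smallest s in 0..len-1 such that rx[(i + s) % len] == expected[i]
   for every i, i.e. the frame arrives delayed by s bytes.
   Returns -1 if rx is not a rotation of expected (or len is 0). */
int find_frame_rotation(const uint8_t *expected, const uint8_t *rx, size_t len)
{
    assert(expected != NULL && rx != NULL);
    for (size_t s = 0; s < len; ++s) {
        size_t i = 0;
        while (i < len && rx[(i + s) % len] == expected[i]) {
            ++i;
        }
        if (i == len) {
            return (int)s;
        }
    }
    return -1;
}
```

A result of 0 means the buffers are in sync, 3 or 4 would match the shift described above, and -1 means the data is corrupted rather than merely shifted.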

Under normal circumstances I’d say: OK, that’s a speed-related issue, let’s slow down the clock. But it does not matter how fast the SPI clock is; the issue is the same at 1 Mbit/s, 2 Mbit/s or 8 Mbit/s.

I am pretty sure it’s not a hardware issue. Even at 8 MHz SPI clock, the delay created by the LVDS ICs (8 ns one way, or 16 ns in total for data returning from slave to master) is negligible if the SPI shifts the data on the rising edge and samples it on the falling edge, as the duration of one clock cycle is 125 ns.
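The timing argument can be written out as a small calculation. This is only a sketch under stated assumptions: the function names are invented here, and the slave's clock-to-output delay and the master's input setup time, which the post does not give, are left out and would have to be subtracted from the result.

```c
#include <assert.h>

/* Half of one SPI clock period in nanoseconds. With CPOL 0 / CPHA 1 data
   is shifted on the rising edge and sampled on the falling edge, so this
   is the window available between shift and sample. */
double half_period_ns(double sck_hz)
{
    assert(sck_hz > 0.0);
    return 0.5e9 / sck_hz;
}

/* Remaining margin on the slave-to-master data line after the clock has
   travelled to the slave and the data has travelled back.
   Assumption: slave output delay and master setup time are ignored. */
double miso_margin_ns(double sck_hz, double round_trip_ns)
{
    return half_period_ns(sck_hz) - round_trip_ns;
}
```

At 8 MHz the half period is 62.5 ns, so a 16 ns round trip leaves 46.5 ns; at 1 MHz the same delay leaves 484 ns.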

As posted here: https://stackoverflow.com/questions/76127056/stm32-g07-full-duplex-spi-master-slave-rx-array-on-master-is-shifted I experienced the same issue with a test PCB with an STM32G070, which I created to test that board-to-board connection via LVDS. This leads me to the conclusion that this is not a hardware-related issue, but must be an issue with the HAL drivers …

Unlike in the Stack Overflow post, the behavior on the STM32G431 is that if I press the reset button on the slave, the communication is in sync for a few cycles, but then the bytes shift pretty soon and the shift remains stable.

/rant The HAL SPI DMA implementation is pretty disappointing for me so far. I simply don’t get why it’s not possible to sync the slave on the falling NSS edge. SPI works without issues for me as master, but trying to use it as slave has turned out to be a nightmare so far …

My suspicion is that something is going wrong with the DMA. It’s just a guess, and I cannot provide proof for it, but I’d say the DMA on the master is “a bit sleepy”, because the data received on the slave are in sync.

How can I solve this? Any help would be highly appreciated, because this issue is driving me crazy! I have used SPI in several applications as master, e.g. to communicate with flash memory, DACs or a WIZnet Ethernet IC, and all is working well. But using an STM32 as SPI slave seems to be a real issue …

Thanks

CBerg · ‎2023-05-08

@waclawek.jan I mentioned NSS as a side note, because I stumbled upon the "NSS not working on the master" problem when working with bare-metal programming / LL drivers. The reason was that the NSS pin on the master is configured as an "Open Drain" GPIO when you use hardware output NSS on the master. Therefore you need a pull-up resistor to 3V3. Otherwise the NSS line will stay continuously low and the slave will not detect it when it is set to "NSS Input".

waclawek.jan · ‎2023-05-08

> this project is just an intermediate solution

Yes, there are occasions, where quick and dirty solutions make sense. I speak strongly against using loopdelays and printf() in microcontrollers, yet I do use them in such quick and dirty experiments. I also see merits of Arduino (my main problem with it is the missed opportunity to educate users, again using rudimentary examples and well-written tutorials).

> The Reason for that was, that the NSS on the Master is configured as "Open Drain" GPIO

Oh.

The hardware implementation of NSS for Master in STM32 SPI is confusing and mostly useless. Using open-drain is intended for multi-master cases, where NSS pulled down by first master asserting the line would switch other masters to slave.

JW

CBerg · ‎2023-05-08

Oh, you can use the NSS on the STM32s (in this case a G431), and you can make it really quick when using bare-metal programming. You can make the NSS behave like a "normal CSn" if needed, provided you have that pull-up. The trick is to disable SPI at the end of the transmission by resetting CR1->SPE, e.g. in the ISR for the DMA Tx/Rx (whatever you need). Even NSSP works as intended. But you need a pull-up; if you don't have one, you will need either a PCB redesign or GPIO/software NSS.

Regarding SPI NSS speed in HAL vs. bare metal: I measured that with the scope on a G070 (my test board for this project), and the time between pulling NSS low and the start of the first clock is about 760-780 MCU cycles when using HAL (roughly 12 µs at 64 MHz), which corresponds to approx. 2 human weeks on an MCU. This is the price of convenience: HAL_SPI_TransmitReceive_DMA() makes numerous checks before actually starting the peripheral(s), and e.g. reconfigures the DMA each time.
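To put the cycle count into time units, the conversion is a one-liner. A minimal sketch follows; the name `cycles_to_ns` is made up here, and 64 MHz is the core clock stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a number of core clock cycles into nanoseconds (rounded down). */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t core_hz)
{
    assert(core_hz > 0u);
    return (cycles * 1000000000ull) / core_hz;
}
```

For 760 to 780 cycles at 64 MHz this gives about 11.9 to 12.2 µs, which is around 6 % of the 200 µs period of a 5 kHz exchange.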

We thought about stripping down the HAL functions and replacing them with our own, making just the bare minimum of register changes needed, as we wanted to send the same packet size over and over again. But that got ugly pretty quickly, because the HAL functions are pretty nested and you need a while to understand the concept behind them.

With bare-metal programming, using only the registers, you can shrink this time down to a few nanoseconds on both ends (pulling low and releasing). But that was not the best solution for the project. From a pure software perspective: yes, it would have been best to dig down deep, open that can of worms and replace the HAL SPI functionality. But that would have busted the project (time) budget.

waclawek.jan · ‎2023-05-08

> The trick is to disable SPI at the end of the transmission by resetting the CR1->SPE, e.g. in the ISR for the DMA Tx/Rx (whatever you need).

Yes; but at the same place with the same effort you can also toggle a pin - and that can be push-pull. It can even be the same pin you've used for NSS, just set as GPIO output, so no hw redesign is needed.

In other words, there's no point in using hardware NSS in master (except the rare cases where NSSP or the TI mode is needed).
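The pin-toggling approach can be sketched on a host machine with a stand-in for the GPIO set/reset register. Everything named here is an assumption for illustration: `MockGpio` replaces the real port registers, `CS_PIN` is an arbitrary pin number, and the only hardware fact relied on is that in the STM32 BSRR register bits 0..15 set a pin and bits 16..31 reset it.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for a GPIO port; on the target this would be the
   real memory-mapped BSRR register, not a plain struct member. */
typedef struct {
    volatile uint32_t BSRR;
} MockGpio;

#define CS_PIN 4u /* hypothetical pin number, not taken from the thread */

/* Drive the chip-select line low (select the slave): BSRR reset half. */
void cs_low(MockGpio *port)
{
    assert(port != 0);
    port->BSRR = 1u << (CS_PIN + 16u);
}

/* Drive the chip-select line high (release the slave): BSRR set half. */
void cs_high(MockGpio *port)
{
    assert(port != 0);
    port->BSRR = 1u << CS_PIN;
}
```

On the target, `cs_low()` would be called right before starting the transfer and `cs_high()` in the transfer-complete handler, with the pin configured as a push-pull output so no external pull-up is required.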

> time between pulling the NSS low and start of the first clock is about 760-780 MCU cycles when using HAL

Hummm. That's way worse than what I'd expect even for Cube/HAL. I suspect you have compiler optimizations switched off.

JW

CBerg · ‎2023-05-08

>>I suspect you have compiler optimizations switched off.<<

yes, that was measured in DEBUG mode.

But even in release, there is really a lot going on in "HAL_SPI_TransmitReceive_DMA()", which will need a fair amount of MCU cycles.

>>Yes; but at the same place with the same effort...<< Actually it is a few nanoseconds faster if you let the peripheral do that. But it's nothing I'd fight about. If you don't have a pull-up, you can still reach good speed with GPIO (I tested that).