Implementing an SPI slave mechanism with STM32 using cube and HAL

IHACH.1 · ‎2020-02-23

Hello all,

I'm trying to implement an SPI slave on an STM32 microcontroller. The master is an FPGA.

I want to be able to read and write different registers in the micro.

The FPGA generates a repeating read command with the following structure (there is also some I2C in the system, which is why the message looks a bit like I2C): read/write command, register, B0, B1, B2, B3, B4.

The slave will be ready for SPI communication, and when it gets a message on the line it would receive the initial 2 bytes to understand whether this is a read or a write, and when they are completed it would immediately continue to receive another 5 bytes (for a write) or send the register data to the master (for a read). The slave micro runs at 32 MHz, and the SPI clock sent by the master is 2 MHz, so I have high confidence that the slave can decide on the next action in time to receive/transmit the next 5 bytes.

The problem is that the slave responds to a read command only on the following read command, not immediately...

Could anyone assist me?

attaching the relevant code:

in my main:

HAL_SPI_Receive_IT(s, (uint8_t *)&RX_buf, 2);

Once it completes, I use HAL_SPI_RxCpltCallback to continue (sorry for the partial code here, I tried to spare the small details):

if (read)

HAL_SPI_Transmit(SPI_PROC,(uint8_t *)&TX_buf,5,1000);

// enabling SPI bus for the next transmission

HAL_SPI_Receive_IT(SPI_PROC, (uint8_t *)&RX_buf, 2);

if (write)

HAL_SPI_Transmit(SPI_PROC,(uint8_t *)&RX_buf,5,1000);

// enabling SPI bus for the next transmission

HAL_SPI_Receive_IT(SPI_PROC, (uint8_t *)&RX_buf, 2);

I also had some trouble with the callback sometimes not being entered, but I worked around it so that I could focus on the read problem...

any help would be appreciated

thanks

Edo

berendi · ‎2020-02-23

> it would immediately continue

No.

It would receive an interrupt request, finish what it is doing (1-2 system clock cycles), push registers to the stack (maybe 15-20 cycles, what MCU do you have?), then HAL takes its time to figure out what it is supposed to do, which requires an awful lot of time, calls your callback, your callback then decides what to do (hopefully fast), calls HAL again (painfully slow), and only then will the first byte of the answer land in the SPI data register. I'd guess the whole process takes something in the order of 10 microseconds. Note that the scope snapshots in the linked thread are from a 168 MHz system. Moreover, your callback calls blocking functions, which is considered bad practice.

Meanwhile, the master continues clocking the SPI bus, and the slave controller starts shifting out whatever it finds in its data register half an SPI clock cycle (8 system clock cycles) after the last bit is received. The MCU has not even fetched the first instruction of the interrupt handler yet.
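As a quick check of the cycle budget, here is a minimal sketch that assumes the 32 MHz system clock and 2 MHz SPI clock from the question. The function names are made up for illustration and are not part of HAL.

```c
#include <assert.h>
#include <stdint.h>

/* System clock cycles per SPI clock period.
   Assumes sys_hz is an exact multiple of spi_hz. */
static uint32_t cycles_per_spi_clock(uint32_t sys_hz, uint32_t spi_hz)
{
    assert(spi_hz != 0u && sys_hz % spi_hz == 0u);
    return sys_hz / spi_hz;
}

/* Cycles in half an SPI clock period: the gap between sampling the
   last bit of one byte and shifting out the first bit of the next. */
static uint32_t cycles_per_half_spi_clock(uint32_t sys_hz, uint32_t spi_hz)
{
    return cycles_per_spi_clock(sys_hz, spi_hz) / 2u;
}

/* Cycles the CPU gets per transferred byte (8 SPI clocks). */
static uint32_t cycles_per_byte(uint32_t sys_hz, uint32_t spi_hz)
{
    return 8u * cycles_per_spi_clock(sys_hz, spi_hz);
}
```

With 32000000 and 2000000 as inputs this gives 16 cycles per SPI clock, 8 cycles for the half period, and 128 cycles per byte.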

At 32 MHz system clock and 2 MHz SPI clock, a byte is transmitted every 128 clock cycles. A carefully optimized interrupt handler might handle it in 80-100 cycles (using HAL is out of the question), and might have the answer ready for the 4th byte of the transaction. But I would maybe make the transaction 10 bytes long (master sends 3 dummy bytes after the address) using a 5-byte DMA buffer in circular mode:

In the 1st half-transfer interrupt, after the 2nd byte is received, prepare the first 2 bytes of the answer in the buffer.
In the 1st transfer-complete interrupt, copy the rest of the answer to the buffer.
In the 2nd half-transfer interrupt, do nothing, just clear the interrupt flags.
In the 2nd transfer-complete interrupt, process the received data if it's a write request.
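The four steps above can be sketched as a host-side model. This is only an illustration under stated assumptions: the command encodings, the register file, and all function names are hypothetical, separate 5-byte RX and TX buffers are assumed, and the byte loop is an idealized stand-in for the circular DMA (it ignores TX prefetch and any FIFO). On real hardware the two handlers would be called from the DMA half-transfer and transfer-complete interrupts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN   5u     /* DMA buffer length suggested above */
#define HALF_LEN  2u     /* bytes transferred when the half-transfer event fires */
#define FRAME_LEN 10u    /* 2 header + 3 dummy + 5 data bytes */
#define NUM_REGS  4u     /* hypothetical register count */
#define CMD_READ  0x01u  /* hypothetical command encoding */
#define CMD_WRITE 0x02u  /* hypothetical command encoding */

static uint8_t regs[NUM_REGS][BUF_LEN]; /* hypothetical 5-byte registers */
static uint8_t rx_buf[BUF_LEN];         /* filled by the circular RX DMA */
static uint8_t tx_buf[BUF_LEN];         /* drained by the circular TX DMA */
static uint8_t cmd, reg;
static unsigned pass;                   /* 0: header + dummies, 1: data */

static void on_half_transfer(void)
{
    if (pass == 0u) {
        /* Header is in: latch it and stage the first 2 answer bytes. */
        cmd = rx_buf[0];
        reg = (uint8_t)(rx_buf[1] % NUM_REGS); /* keep the index in range */
        if (cmd == CMD_READ)
            memcpy(tx_buf, regs[reg], HALF_LEN);
    }
    /* Second pass: only the interrupt flag would be cleared here. */
}

static void on_transfer_complete(void)
{
    if (pass == 0u) {
        /* Slots 2..4 have been sent once: stage the rest of the answer. */
        if (cmd == CMD_READ)
            memcpy(tx_buf + HALF_LEN, regs[reg] + HALF_LEN, BUF_LEN - HALF_LEN);
        pass = 1u;
    } else {
        /* Whole frame done: commit the 5 data bytes of a write request. */
        if (cmd == CMD_WRITE)
            memcpy(regs[reg], rx_buf, BUF_LEN);
        pass = 0u;
    }
}

/* Idealized model of one 10-byte transaction over the circular buffers. */
static void simulate_transaction(const uint8_t mosi[FRAME_LEN], uint8_t miso[FRAME_LEN])
{
    for (unsigned i = 0; i < FRAME_LEN; i++) {
        unsigned idx = i % BUF_LEN;
        miso[i] = tx_buf[idx];
        rx_buf[idx] = mosi[i];
        if (idx == HALF_LEN - 1u)
            on_half_transfer();
        else if (idx == BUF_LEN - 1u)
            on_transfer_complete();
    }
}
```

In this model a read frame returns the register contents in bytes 6 to 10 of the same transaction, and a write frame is committed only after the last byte has arrived.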

S.Ma · ‎2020-02-23

It's tricky due to some SW workarounds.

Do use DMA on cyclic TX and RX buffers.

Implement an EXTI interrupt on NSS as GPIO (rising edge).

When NSS goes high, the slave processes the payload in the interrupt and resets the SPI/DMA (as an I2C Start would).

Use 2 transactions, one for write and one for read (as in I2C reading a memory).

That would work with fewer ISRs and a minimized HAL penalty.
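A minimal host-side sketch of that idea, assuming hypothetical command encodings and a hypothetical register file (none of these names come from HAL). The function stands for the body of the NSS rising-edge handler: it gets the bytes the RX DMA captured during the transaction that just ended, and for a read it preloads the buffer the TX DMA would send in the next transaction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN  5u     /* B0..B4 from the original post */
#define NUM_REGS  4u     /* hypothetical register count */
#define CMD_READ  0x01u  /* hypothetical command encoding */
#define CMD_WRITE 0x02u  /* hypothetical command encoding */

static uint8_t regs[NUM_REGS][DATA_LEN]; /* hypothetical 5-byte registers */
static uint8_t tx_buf[DATA_LEN];         /* sent during the NEXT transaction */

/* Body of the NSS rising-edge handler: rx points at the captured bytes,
   len is how many bytes were clocked in before NSS went high. */
static void on_nss_rising(const uint8_t *rx, size_t len)
{
    if (len < 2u || rx[1] >= NUM_REGS)
        return; /* ignore short frames and out-of-range registers */

    if (rx[0] == CMD_WRITE && len >= 2u + DATA_LEN)
        memcpy(regs[rx[1]], rx + 2, DATA_LEN);
    else if (rx[0] == CMD_READ)
        memcpy(tx_buf, regs[rx[1]], DATA_LEN);

    /* On real hardware: reset the SPI and re-arm the RX/TX DMA here. */
}
```

The master would then send a 2-byte read request, raise NSS, and fetch the 5 data bytes in a second transaction, much like an I2C memory read.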

IHACH.1 · ‎2020-02-25

thanks.

But it doesn't solve my problem when I want to implement a read command (SPI master wants to read a certain register from the SPI slave), since I want the read to happen in the same packet the message was sent in (1st byte read/write command, 2nd byte register address, 3rd-7th bytes the slave's register value). So, as written before, I'm counting on the fact that the slave micro is much faster than the SPI clk, so that it would manage to transmit on the MISO line at the third SPI clk cycle.

berendi · ‎2020-02-25

> it would manage to transmit on the MISO line at the third SPI clk cycle

SPI has neither start and stop bits like UART nor the restart-address-ack sequence like I2C.

The last bit of one byte is immediately followed by the first bit of the next byte.

It has no chance of working unless you throttle back the SPI clock to a ridiculously slow speed.

RISC architectures like ARM need a couple of clock cycles to do something useful.

S.Ma · ‎2020-02-25

Which STM32? Which SPI IP (the one with the 32-bit FIFO)?

If you are really worried about SW latency and timing, you can certainly use DMA in non-cyclic mode.

If you can guarantee that the answer will be ready faster than one byte transfer, do what other chips (including SPI memories) do:

Expect a dummy byte to prepare your answer, or pause the communication from the master side for the answer to be ready.
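A rough way to size the dummy phase, as a sketch only: divide the slave's worst-case latency by the cycles per byte and round up. The function names are hypothetical, and the estimate ignores TX prefetch and FIFO depth, so treat the result as a lower bound.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_LEN 2u /* command byte + register byte, from the original post */

/* Dummy bytes needed to cover a given slave latency,
   rounded up to whole bytes. */
static uint32_t dummy_bytes_needed(uint32_t latency_cycles, uint32_t cycles_per_byte)
{
    assert(cycles_per_byte != 0u);
    return (latency_cycles + cycles_per_byte - 1u) / cycles_per_byte;
}

/* Zero-based index of the first answer byte within the frame. */
static uint32_t answer_offset(uint32_t dummy_bytes)
{
    return HEADER_LEN + dummy_bytes;
}
```

At 128 cycles per byte (32 MHz system clock, 2 MHz SPI clock), a 100-cycle handler needs 1 dummy byte, while a 320-cycle path (10 microseconds at 32 MHz, used here only as an illustrative input) needs 3.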