Low-Level SPI Transmit

Steven Keeter · ‎2020-04-01

Hello, I am using the STM32H745 (on the DISCO board) with the HAL driver to transmit and receive SPI2 data without problems. However, as many people have complained online, the HAL overhead for sending a single byte is too long.

// Send Convert for Temp Signal

txbyte[0] = 0x48;

HAL_GPIO_WritePin(GPIOA, GPIO_PIN_15, GPIO_PIN_RESET);

// This section, which I hoped would replace the HAL transmit below, doesn't seem to work

// SPI2->CR1 = 0x1201;

// SPI2->CR2 = 0x1;

// while(!(SPI2->SR & SPI_SR_TXP));

// *(volatile uint8_t *)&CEC->TXDR = txbyte[0];

// This works fine

HAL_SPI_Transmit(&hspi2, (uint8_t *)&txbyte,1,10);

I'm basically set up as a master doing single-byte transfers. Can anyone help?

BTW, here's the init structure:

hspi2.Instance = SPI2;

hspi2.Init.Mode = SPI_MODE_MASTER;

hspi2.Init.Direction = SPI_DIRECTION_2LINES;

hspi2.Init.DataSize = SPI_DATASIZE_8BIT;

hspi2.Init.CLKPolarity = SPI_POLARITY_LOW;

hspi2.Init.CLKPhase = SPI_PHASE_1EDGE;

hspi2.Init.NSS = SPI_NSS_SOFT;

hspi2.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_8;

hspi2.Init.FirstBit = SPI_FIRSTBIT_MSB;

hspi2.Init.TIMode = SPI_TIMODE_DISABLE;

hspi2.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLE;

hspi2.Init.CRCPolynomial = 0x0;

hspi2.Init.NSSPMode = SPI_NSS_PULSE_DISABLE;

hspi2.Init.NSSPolarity = SPI_NSS_POLARITY_LOW;

hspi2.Init.FifoThreshold = SPI_FIFO_THRESHOLD_01DATA;

hspi2.Init.TxCRCInitializationPattern = SPI_CRC_INITIALIZATION_ALL_ZERO_PATTERN;

hspi2.Init.RxCRCInitializationPattern = SPI_CRC_INITIALIZATION_ALL_ZERO_PATTERN;

hspi2.Init.MasterSSIdleness = SPI_MASTER_SS_IDLENESS_00CYCLE;

hspi2.Init.MasterInterDataIdleness = SPI_MASTER_INTERDATA_IDLENESS_00CYCLE;

hspi2.Init.MasterReceiverAutoSusp = SPI_MASTER_RX_AUTOSUSP_DISABLE;

hspi2.Init.MasterKeepIOState = SPI_MASTER_KEEP_IO_STATE_DISABLE;

hspi2.Init.IOSwap = SPI_IO_SWAP_DISABLE;

if (HAL_SPI_Init(&hspi2) != HAL_OK)

{

Error_Handler();

}

TDK · ‎2020-04-01

The H7 SPI peripheral is a complex beast. You can follow the logic path through the HAL_SPI_Transmit routines and implement that logic yourself if you want. The LL_SPI_* functions may help.

> // while(!(SPI2->SR & SPI_SR_TXP));

> // *(volatile uint8_t *)&CEC->TXDR = txbyte[0];

What is SPI_SR_TXP? What is CEC and CEC->TXDR? These are not SPI peripheral defines in stm32h7xx.h.

You'll need to enable clocks, initialize pins, and do everything else that HAL is doing for you.

If you feel a post has answered your question, please click "Accept as Solution".

Steven Keeter · ‎2020-04-02

Here's the define from stm32h7xx_hal_spi.h:

#define SPI_FLAG_TXP SPI_SR_TXP /* SPI status flag : Tx-Packet space available flag */

The CEC register is for HDMI. I shouldn't be using it; I was trying to write to TXDR of the hspi instance.

I walked through the HAL function and then copied most of that code into my function, but for some unknown reason no transmit happens. Must be missing something. Since the hspi structure and peripheral clocks are all set and working with the HAL transmits, I assume I can use minimal code to execute a transmit.

TDK · ‎2020-04-02

You're right, I was looking at STM32F4 defines.

I can tell you that the code you wrote should work fine, assuming the peripheral is in the right state. You're going to have to look at registers and figure it out if you want to break it down into less code. Watch out for RX overruns, as your code isn't reading any data.
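For reference, the order of operations for a single-byte blocking transmit can be sketched as below. This is a minimal sketch, not the HAL source: `spi_regs_t`, the `X_*` macros, `wait_flag` and `spi_send_byte` are hypothetical names, the struct is only a host-side stand-in for the CMSIS `SPI_TypeDef` (on target you would pass `SPI2` and use the CMSIS `SPI_CR1_*` / `SPI_SR_*` defines), and the bit positions are my reading of the H7 register map, so check them against the reference manual. Compared with the commented-out attempt, TSIZE is written while SPE is still 0, SPE and CSTART are set in two separate writes (which is the order HAL_SPI_Transmit appears to use), and the byte goes to the SPI's own TXDR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMSIS SPI_TypeDef so the sketch builds on a
   host. Only the registers used here are modelled; the layout does not match
   the hardware offsets. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;
    volatile uint32_t IFCR;
    volatile uint32_t TXDR;
} spi_regs_t;

/* Bit positions assumed from the STM32H7 SPI register map (verify in the RM). */
#define X_CR1_SPE        (1u << 0)   /* peripheral enable */
#define X_CR1_CSTART     (1u << 9)   /* master transfer start */
#define X_CR2_TSIZE_MASK 0xFFFFu     /* number of frames in this transfer */
#define X_SR_TXP         (1u << 1)   /* space available in TX FIFO */
#define X_SR_EOT         (1u << 3)   /* end of transfer */
#define X_IFCR_EOTC      (1u << 3)   /* clear EOT */
#define X_IFCR_TXTFC     (1u << 4)   /* clear TXTF */

/* Poll SR for any bit in mask, giving up after a bounded number of spins. */
static bool wait_flag(const spi_regs_t *spi, uint32_t mask, uint32_t spins)
{
    while (spins--) {
        if (spi->SR & mask) {
            return true;
        }
    }
    return false;
}

/* Send one byte; returns false if TXP or EOT never showed up. */
static bool spi_send_byte(spi_regs_t *spi, uint8_t byte, uint32_t spins)
{
    assert(spi != NULL);

    /* 1. Program the transfer size while the peripheral is still disabled. */
    spi->CR2 = (spi->CR2 & ~X_CR2_TSIZE_MASK) | 1u;
    /* 2. Enable, then start, as two separate writes. */
    spi->CR1 |= X_CR1_SPE;
    spi->CR1 |= X_CR1_CSTART;

    /* 3. Wait for FIFO space, then push the byte with an 8-bit access. */
    bool ok = wait_flag(spi, X_SR_TXP, spins);
    if (ok) {
        *(volatile uint8_t *)&spi->TXDR = byte;
        /* 4. Wait until the frame has actually left the shift register. */
        ok = wait_flag(spi, X_SR_EOT, spins);
    }

    /* 5. Clear the end-of-transfer flags and disable again. */
    spi->IFCR = X_IFCR_EOTC | X_IFCR_TXTFC;
    spi->CR1 &= ~X_CR1_SPE;
    return ok;
}
```

On target the call would look like `spi_send_byte(SPI2, 0x48, timeout)` with the real register type. The bounded spin count is a design choice so that a peripheral in the wrong state shows up as a `false` return instead of a hang.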


Steven Keeter · ‎2020-04-02

Thanks for the feedback. My real issue is that I need to read from 64 slave devices, and currently I'm doing it one at a time. I'm wondering if I should use four SPI ports (SPI1, SPI2, SPI3, SPI4) and DMA. Basically I send one byte to each slave and then receive 3 bytes from that slave. Shouldn't I be able to use interrupt-driven transmit and receive to greatly improve the data throughput?

berendi · ‎2020-04-02

STM32H7 SPI peripherals have 8-byte FIFOs, which can comfortably store all 3 received bytes.

Interrupt driven transmit and receive would actually reduce throughput because of the time needed to enter and exit different interrupt handlers each time.

A combination of DMA and timers can increase throughput, though. How are the CS lines connected? Can you connect all of them to a single GPIO port, e.g. PB0 to PB15 (any other GPIO port would do, as long as they are all on the same one)? Or are they connected through an external multiplexer?

Steven Keeter · ‎2020-04-02

So, here's some more info and another idea:

I can select any or all CS lines of the 64 slaves through 4 GPIO port expanders (16 lines on each), which are SPI controlled.
The slave MISO lines are also multiplexed (16 per mux), also SPI controlled.
So, I'm thinking I could use 4 SPI ports on the STM32H7 in simplex receive-only mode. Then I could use one master SPI port to drive data out and the clock to all slaves, including the Rx SPI ports, and the other four SPI ports would receive the 3 bytes simultaneously. I think this would reduce the time to acquire data by a factor of four.

What do you think?

Basically, I send one common command to all slaves simultaneously to execute an A/D conversion, then when the data is ready I clock it back in groups of four channels. Besides this, I'm not sure what other hooks the STM32H7 has, and it may then be necessary to use an FPGA to acquire the MISO data simultaneously. I don't really want to go there, so I'm hoping the 4x improvement method will work. BTW, the A/D conversion takes 330 µs and the max SPI clock rate is 10 MHz. If the return data on each SPI Rx port is skewed a bit in time, that's OK as long as it's done before the 330 µs has expired.
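To put rough numbers on that budget, here is a small back-of-the-envelope sketch. It only counts SCK clocks for the 1 command byte plus 3 data bytes per slave at the 10 MHz limit mentioned above; the function names are hypothetical, and CS selection through the port expanders, mux switching and driver overhead are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds to clock `bytes` 8-bit frames back to back at sck_hz. */
static uint64_t spi_frame_time_ns(uint32_t bytes, uint32_t sck_hz)
{
    assert(sck_hz != 0u);
    return ((uint64_t)bytes * 8u * 1000000000u) / sck_hz;
}

/* Total read-out time when `slaves` devices are split evenly across `ports`
   SPI ports that are clocked in parallel. */
static uint64_t readout_time_ns(uint32_t slaves, uint32_t ports,
                                uint32_t bytes_per_slave, uint32_t sck_hz)
{
    assert(ports != 0u);
    uint32_t per_port = (slaves + ports - 1u) / ports;  /* round up */
    return per_port * spi_frame_time_ns(bytes_per_slave, sck_hz);
}
```

With these inputs, `readout_time_ns(64, 1, 4, 10000000)` gives 204800 ns and `readout_time_ns(64, 4, 4, 10000000)` gives 51200 ns, so both fit inside the 330 µs conversion window on paper. The real margin depends on the per-transfer overhead that this estimate ignores.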

berendi · ‎2020-04-03

1. You need 16 CS lines only, each controlling 4 slaves. You are always going to do the same thing to 4 slaves in parallel, aren't you? Do they need different setup parameters?
2. MISO lines can be just tied together in groups of 16, no need to multiplex them. This assumes they are well-behaved SPI slaves that switch their output to high impedance when not selected. It could pose a problem, though, when all 16 chip selects are enabled at once to send the start command.
3. Are you sure that a single MOSI / SCK line can drive 64 slaves at once at 10 MHz? Or even 16 slaves?
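The wiring described here (16 CS lines, each shared by 4 slaves, with MISO tied in 4 groups of 16) could be indexed as in the sketch below. The slave numbering, the type and function names, and the active-low CS polarity are all assumptions made for illustration; nothing here comes from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define CS_LINES    16u  /* chip-select lines, each shared by 4 slaves */
#define MISO_GROUPS 4u   /* one MISO bus (and one SPI receiver) per group */

typedef struct {
    uint8_t cs_line;     /* which of the 16 CS lines selects this slave */
    uint8_t miso_group;  /* which of the 4 MISO buses it answers on */
} slave_route_t;

/* Map a slave number 0..63 to its CS line and MISO group (assumed numbering:
   slaves 0..15 sit on group 0, 16..31 on group 1, and so on). */
static slave_route_t slave_route(uint8_t slave)
{
    assert(slave < CS_LINES * MISO_GROUPS);
    slave_route_t r = { (uint8_t)(slave % CS_LINES),
                        (uint8_t)(slave / CS_LINES) };
    return r;
}

/* 16-bit word for the port expander with exactly one CS asserted,
   assuming active-low chip selects. */
static uint16_t cs_word_active_low(uint8_t cs_line)
{
    assert(cs_line < CS_LINES);
    return (uint16_t)~(1u << cs_line);
}
```

With this layout, asserting one CS line selects one slave in each MISO group, so only one device drives each MISO bus at a time.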

I think it can be fully automated using a few timers and lots of DMA channels. Of course you don't have to do it with DMA, you can as well dedicate the CM4 core to handle SPI traffic if it does not have much else to do, and you don't want to dive into DMA and timer synchronization.

Connect the port expander to SPI1, as it has only one data direction available, so it would be output only. You can use the hardware NSS line on this one, then you can just write a single 16 bit value into the data register to set the outputs, the rest will be handled by the hardware.

For the rest, I need to know a bit more about the communication protocol. Can you link the datasheet?

As to the issues in 2. and 3. above, would it be acceptable if the start command is sent out in 16 turns to the slaves? Does the I/O expander support 10 MHz too, or is it slower? You can link its datasheet too.

Looks like an interesting problem, and I don't have much to do these days anyway.

Steven Keeter · ‎2020-04-07

Thanks for the suggestions. I have the same concern about tying all the MISO lines together, as they will all be driving the line at the same time (at least for some finite time). I'm not sure about the number of loads on MOSI and SCK (each has Cin = 6 pF), but the distance to each is a few inches. The M4 is busy with host communication. I think a suitable solution will be to use four SPI ports: one in full-duplex master mode to send commands and receive data from 16 devices, and three others in simplex Rx-Only mode. The Rx-Only ports will use the main master's clock as their clock, etc. I am considering this rather than QSPI mode because I understand QSPI ports must be half-duplex. And yes, all of the SPI slave devices support 10+ MHz, but I may have to throttle SCK because of load impedance. BTW, I need all of the slaves to receive the command at the same time.

Steven Keeter · ‎2020-04-13

Hello,

I'm now attempting to let SPI1 be the master and SPI2 be a slave in DMA mode. I have SPI2 configured with NSS as a hardware input and am driving it from an output port. The master transmits one byte and receives three bytes. I set NSS low for the slave right before the receive on SPI1 is called. I get a DMA interrupt but no data. If I use HAL_SPI_Receive_IT() on SPI2 instead of HAL_SPI_Receive_DMA(), the data comes in OK. Can someone please tell me what code I'm missing?

Related code attached.