Optimization breaks SPI code on L562

Harvey White · 2025-09-28

L562 processor. SPI1. Blocking mode. Two sequential writes, second write variably returns error.

code:

// uses CS if USE_CS is set to CS_ACTIVE (default)
// CS is not used if USE_CS is set to CS_INACTIVE

uint32_t HAL_SPI::Send(
						uint8_t* cmdptr,
						uint32_t  cmd_count,
						uint8_t* dataptr,
						uint32_t  data_count,
						bool USE_CS,
						enum SPI_TRANSFER_MODE Override)
{

	enum SPI_TRANSFER_MODE		NEW_mode = OUT_mode;
	int							error_code;

//	int							old_divisor;

	// ********************************** check input parameters***********************************
	// need to check overrides
	switch (Override)
	{
		case SPI_DMA:			// TRANSMIT IS DMA FROM BUFFER, RECEIVE IS IRQ QUEUE (MAY CHANGE)
		case SPI_IRQ:			// TRANSMIT IS IRQ DRIVEN
		case SPI_BLOCKING:		// TRANSMIT AND RECEIVE ARE BYTE BLOCKING
		{
			NEW_mode = Override;
			break;
		}
		default:
		{
			NEW_mode = OUT_mode;
			// leave mode as is, cannot override
		}
	}

	switch (NEW_mode)
	{
		case SPI_BLOCKING:		// TRANSMIT AND RECEIVE ARE BYTE BLOCKING
		{
			time_start = _MICROSECOND_TIMER.Instance->CNT;
			// always semaphore protected
			if (!dedicated) xSemaphoreTake(busy, portMAX_DELAY);
			// direct call to STMICRO driver, blocking mode
			hspi->Instance->CR1 &= ~SPI_BAUDRATEPRESCALER_256;// reset to highest baudrate
			hspi->Instance->CR1 |= transmit_divisor;

			if (RESET.port != nullptr) RESET.set();				// make sure reset is high
			if (USE_CS) RESET_CS();								// CS low if needed
			/*
				HAL_OK       = 0x00,
				HAL_ERROR    = 0x01,
				HAL_BUSY     = 0x02,
				HAL_TIMEOUT  = 0x03
 	 	 	 */


			if ((cmdptr != nullptr) && (cmd_count != 0))
			{
				// command pointer says to send data with A0 down
				if (A0.port != nullptr) A0.reset();
				result = HAL_SPI_Transmit(hspi, cmdptr, cmd_count, timeout);
				// place to add delay if available
			}


			if ((dataptr != nullptr) && (data_count != 0))
			{
				// data pointer says to send data with A0 high
				if (A0.port != nullptr) A0.set();
				result = HAL_SPI_Transmit(hspi, dataptr, data_count, timeout);
				// place to add delay if available
//				while (hspi->State == HAL_SPI_STATE_BUSY_TX)
//				{
//					HAL_Delay(1);
//				}
			}
			if (USE_CS) SET_CS();								// CS high if needed
			// error reporting
			ERROR_CODE = hspi->ErrorCode;
			ERROR_REASON = (enum COMM_ERROR_TYPE) ERROR_CODE;
			operations++;
			// record errors if needed
			switch (result)
			{
				case HAL_ERROR:    					//= 0x01U,
				{
					fail++;
					break;
				}
				case HAL_BUSY:     					//= 0x02U,
				{
					fail++;
					break;
				}
				case HAL_TIMEOUT:  					//= 0x03U
				{
					fail++;
					break;
				}
				case HAL_OK:
				{
					// was OK
					succeed++;
					break;
				}
			}

			// return semaphore for next operation
			if (!dedicated) xSemaphoreGive(busy);
			time_stop = _MICROSECOND_TIMER.Instance->CNT;
			delta_time = time_stop - time_start;
			break;
		}

Using optimization for debug, this code works perfectly.

Using G0 (no optimization), the code fails either at line #70 after executing line #61 (write two bytes), or the second call to the routine at line #61 (writing one byte per call).

This is a display driver routine. Using version 1.19.0

The rest of the code in this routine simply returns HAL_OK.

Ghofrane GSOURI · 2025-09-29

Hello @Harvey White

The issue stems from your custom application code, not from CubeMX or the IDE.

It is most likely related to timing, or to the SPI peripheral not being ready between sequential writes. It becomes apparent when you change the optimization level because the execution speed changes.

To address this, add small delays after toggling the CS and A0 pins, and ensure the SPI state is ready before each transfer.
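As an illustration of the "ensure the SPI state is ready" part of this suggestion, here is a minimal host-side sketch of a bounded wait. `FakeSpi`, `FakeClock`, and `wait_until_ready` are hypothetical names invented for this example; on the target the state would presumably come from `HAL_SPI_GetState()` (compared against `HAL_SPI_STATE_READY`) and the time from `HAL_GetTick()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins so the sketch compiles on a host.
// On target, HAL_SPI_GetState() and HAL_GetTick() would take their place.
enum class SpiState { Ready, BusyTx };

struct FakeSpi {
    int busy_polls;  // number of polls that still report "busy"
    SpiState state() {
        if (busy_polls > 0) { --busy_polls; return SpiState::BusyTx; }
        return SpiState::Ready;
    }
};

struct FakeClock {
    std::uint32_t now_ms = 0;
    std::uint32_t tick() { return now_ms++; }  // advances 1 ms per call
};

// Poll until the peripheral reports Ready or timeout_ms ticks have elapsed.
// Returns true if the peripheral became ready, false on timeout.
template <typename Spi, typename Clock>
bool wait_until_ready(Spi& spi, Clock& clock, std::uint32_t timeout_ms) {
    assert(timeout_ms > 0);
    const std::uint32_t start = clock.tick();
    while (spi.state() != SpiState::Ready) {
        // Unsigned subtraction stays correct across tick wrap-around.
        if (clock.tick() - start >= timeout_ms) return false;
    }
    return true;
}
```

A bounded wait like this, placed before each transmit, reports a stuck peripheral instead of hanging forever.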

THX

Ghofrane

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

Saket_Om · 2025-10-02

Hello @Harvey White

@Harvey White wrote:
Using G0 (no optimization), the code fails either at line #70 after executing line #61 (write two bytes), or the second call to the routine at line #61 (writing one byte per call).

So, what issues, exactly, do you encounter? What do you mean by "fails"?

What tests/investigations/debugging have you done to find out what's going on?


Harvey White · 2025-10-02

Broadly, the issue is that it doesn't work.

Specifically, with G0 optimization the first write (these are all writes to a display, the display cannot be read) returns 0, the second write returns 1 (unspecified error).

With Optimization for debug, the program does not return an error from the second write.

Further, any optimization level other than debug gives the same result as G0.

On display writes that do not use data (command only), the SPI driver returns to the calling display driver, and the command portion (always needed in this scenario) fails with the same error.

I remember putting a test to see if the driver is busy and waiting until it's not busy between the command and data writes. This did not fix the problem.

The display driver writes either a command, or command and data as needed.
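In the routine as posted, `result` is assigned by both transmit calls and `hspi->ErrorCode` is read only once at the end, so a command-phase status is overwritten whenever a data phase follows. A minimal sketch of recording each phase separately is below. All names (`Status`, `PhaseResult`, `SendReport`, `send_with_report`) are hypothetical, and the two callbacks stand in for `HAL_SPI_Transmit()` and for reading the driver's error word.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical mirror of the four HAL status values quoted in the original post.
enum class Status : std::uint8_t { Ok = 0, Error = 1, Busy = 2, Timeout = 3 };

struct PhaseResult {
    bool attempted = false;        // was this phase executed at all?
    Status status = Status::Ok;    // status returned by the transmit call
    std::uint32_t error_code = 0;  // error word sampled right after the call
};

struct SendReport {
    PhaseResult command;
    PhaseResult data;
    bool ok() const {
        return (!command.attempted || command.status == Status::Ok) &&
               (!data.attempted || data.status == Status::Ok);
    }
};

// Stand-ins for the blocking transmit call and for reading the error word.
using Transmit = std::function<Status(const std::uint8_t*, std::uint32_t)>;
using ReadError = std::function<std::uint32_t()>;

SendReport send_with_report(const std::uint8_t* cmd, std::uint32_t cmd_count,
                            const std::uint8_t* data, std::uint32_t data_count,
                            const Transmit& transmit, const ReadError& read_error) {
    assert(transmit && read_error);
    SendReport report;
    if (cmd != nullptr && cmd_count != 0) {
        report.command.attempted = true;
        report.command.status = transmit(cmd, cmd_count);
        report.command.error_code = read_error();  // sample before the next call
    }
    if (data != nullptr && data_count != 0) {
        report.data.attempted = true;
        report.data.status = transmit(data, data_count);
        report.data.error_code = read_error();
    }
    return report;
}
```

With a report like this, the caller can tell which of the two writes failed and what the error word was at that moment, which is the information asked for earlier in the thread.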

Thanks

Saket_Om · 2025-10-03

Hello @Harvey White

Did you check the SPI transfer without OS?


Harvey White · 2025-10-03

Considering the complexity of the code, it's too dependent on an operating system.

I did put (it's FreeRTOS) a critical section around the whole troublesome section. I'd assume in that case the operating system is being nice and quiet during the transfer.

I'm beginning to suspect that the calling structure for the ST drivers may be the problem, but I have no idea how or why. I'm trying to rewrite a display driver with as little code between the driver and the SPI interface as possible (eliminating the HAL_SPI driver).

This will take a while.

It's not a question of delay between calls to the hal driver. I can change that.

Have the hal SPI drivers been tested with different levels of optimization?

MM..1 · 2025-10-03

First, the option isn't -G but -g; I never change it, and I think it isn't well tested at all levels. The normal optimization option is -O.

Second, using SPI in blocking mode isn't ideal; it is the worst mode for displays.

Also, your question doesn't show the call, so what sizes and timeouts do you use? It seems you use a global timeout, next miss...

FYI, -g changes only the ELF size; the MCU code is the same.

Pavel A. · 2025-10-05

Do you mean -Og?

MM..1 · 2025-10-05

No, -g0 through -g3.

Harvey White · 2025-10-05

I use project/properties/optimization to pick the mode rather than relying on memory. -Og is Optimize for Debug.
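Because the -g options (debug information level) and the -O options (optimization level) are easy to mix up, it can help to let the build report what it was actually compiled with. The sketch below assumes a GCC-compatible toolchain, where `__OPTIMIZE__` is predefined in optimizing compilations and `__OPTIMIZE_SIZE__` when optimizing for size. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Returns a short label describing how this translation unit was optimized.
// Relies on GCC/Clang predefined macros; a compiler that defines neither
// macro is reported as "not optimized".
inline const char* build_optimization() {
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}

// Convenience check: true only when the label says no optimization was applied.
inline bool built_without_optimization() {
    const char* label = build_optimization();
    assert(label != nullptr);
    return std::strcmp(label, "not optimized") == 0;
}
```

Logging this label once at startup in the driver's translation unit would confirm which setting a failing build really used, independent of what the project properties dialog shows.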

I'm aware that blocking mode is not best for displays, and it doesn't matter for now. I'm looking for working operation rather than speed. The driver has IRQ and DMA modes as well. I do consider the blocking mode to be valid, and if it doesn't work, something is broken, IMHO.

I'm currently working on a driver configuration that eliminates some code, which may be a workaround.

What I do not understand is how changing the optimization level causes two calls to the blocking driver to either work or not work.

Putting taskENTER_CRITICAL and taskEXIT_CRITICAL around the two calls does not help, so I conclude that it's not the OS interrupting the actual transfer.
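For reference, the experiment described here can be sketched as a scoped guard, so the exit call cannot be skipped on an early return. This is a host-side illustration only: `enter_critical()` and `exit_critical()` are hypothetical stand-ins for `taskENTER_CRITICAL()` and `taskEXIT_CRITICAL()`, and the two callables stand in for the command and data transmit calls.

```cpp
#include <cassert>

// Hypothetical stand-ins for taskENTER_CRITICAL()/taskEXIT_CRITICAL().
// The counter only tracks nesting depth so the sketch can run on a host.
static int g_critical_depth = 0;
inline void enter_critical() { ++g_critical_depth; }
inline void exit_critical() {
    assert(g_critical_depth > 0);
    --g_critical_depth;
}

// Scoped guard: enters on construction, exits on destruction.
class CriticalSection {
public:
    CriticalSection() { enter_critical(); }
    ~CriticalSection() { exit_critical(); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;
};

// Runs two back-to-back operations inside one critical section and
// reports whether both returned true.
template <typename First, typename Second>
bool run_both_guarded(First first, Second second) {
    CriticalSection guard;
    const bool first_ok = first();
    const bool second_ok = second();
    return first_ok && second_ok;
}
```

One caveat, stated as an assumption about the configuration rather than a diagnosis: the blocking HAL calls measure their timeout with the HAL tick, so if the tick interrupt is masked while the critical section is held, the timeout may not advance during the transfer.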

This perhaps points to the difficulty that hal level drivers have with an operating system, but hal2 fixes that, right? Fully tested and integrated with FreeRTOS and other OS?

Still working on the changed code.