
H7 OCTOSPI HyperRAM data throughput changing with compilation

LCE
Principal

Heyho,

I've been using the H733 (custom board) / H735 (eval kit) with Infineon's HyperRAM S70KL1281 / S70KL1282 at 100 MHz for some time now. Everything works great, except for one very annoying thing:

  • the data throughput from / to HyperRAM seems to depend on the build, even though the OCTOSPI peripheral configuration was not changed
  • after some builds it's about 178 MB/s, after others only 54 MB/s
  • the data throughput is constant for a given build, no matter whether I call the test function at MCU power-up or during operation with all other peripherals running
  • no caching anywhere
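For scale, a rough upper bound: HyperBus moves one byte per clock edge on its 8-bit DQ bus and transfers data on both edges. Assuming the 100 MHz above is the HyperBus clock and taking 1 MB = 10^6 bytes, the raw peak is 200 MB/s before any command/address or latency overhead. The helper names in this sketch are made up:

```c
/* Raw peak HyperBus bandwidth in bytes/s: 8 DQ lines = 1 byte per clock edge,
 * data on both edges (DDR). Command/address phase and access latency are
 * ignored, so sustained throughput is always lower than this. */
static double hyperbus_peak_bytes_per_s(double clk_hz)
{
    const double bytes_per_edge = 1.0;
    const double edges_per_clock = 2.0;
    return clk_hz * edges_per_clock * bytes_per_edge;
}

/* Measured throughput as a fraction of the raw peak (both in bytes/s). */
static double hyperbus_efficiency(double measured_bytes_per_s, double clk_hz)
{
    return measured_bytes_per_s / hyperbus_peak_bytes_per_s(clk_hz);
}
```

With these numbers, 178 MB/s is about 89 % of the raw peak and 54 MB/s about 27 %, so the slow builds reach only about 30 % of the fast builds' throughput.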

I'm pretty sure the timing measurements are not "faulty": I use the DWT cycle counter and disable all interrupts around the for loops.
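A minimal sketch of the cycle-count arithmetic, with hypothetical helper names. It assumes the DWT cycle counter has already been enabled (TRCENA in CoreDebug->DEMCR, CYCCNTENA in DWT->CTRL) and that cpu_hz is the real core clock. On the H7 that is SYSCLK divided by the D1CPRE prescaler, so it matches the HAL_RCC_GetSysClockFreq() value used in the test only when D1CPRE is 1. Unsigned subtraction survives one counter wrap, and at 400 MHz the 32-bit counter wraps only every ~10.7 s, while a 16 MiB pass at 54 MB/s takes about 0.31 s.

```c
#include <stdint.h>

/* Elapsed CPU cycles between two DWT->CYCCNT samples.
 * Unsigned 32-bit subtraction stays correct across one counter wrap. */
static uint32_t cyc_elapsed(uint32_t cyc_start, uint32_t cyc_stop)
{
    return cyc_stop - cyc_start;
}

/* Throughput in MB/s (1 MB = 1e6 bytes) for 'bytes' moved in 'cycles' CPU cycles. */
static double throughput_mb_per_s(uint32_t bytes, uint32_t cycles, double cpu_hz)
{
    if (cycles == 0u) {
        return 0.0; /* nothing measured; avoid division by zero */
    }
    const double seconds = (double)cycles / cpu_hz;
    return (double)bytes / seconds / 1e6;
}
```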

  • Is there something wrong in my test function?
  • Is it maybe "only" how the for loop / iteration is compiled?
  • right now I can't get the high speed back, so I have no map / list file of a fast build
  • my scope here is too old and slow to check the signal lines
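On the question of how the loops are compiled: the test accesses HyperRAM through a plain uint32_t pointer, so the compiler is free to unroll, reorder or combine those accesses differently from build to build. A sketch of the UP loops through a volatile pointer (function names hypothetical) forces exactly one 32-bit access per element, in program order, at any optimization level. It does not control how the core's write buffer or the AXI interconnect merges the accesses afterwards, and whether it changes the measured throughput here is untested.

```c
#include <stdint.h>

/* Write an up-counting pattern through a volatile pointer: the compiler must
 * emit one 32-bit store per element, in program order. */
static void hyp_fill_up(volatile uint32_t *mem, uint32_t words)
{
    for (uint32_t i = 0u; i < words; i++) {
        mem[i] = i;
    }
}

/* Read the pattern back (one 32-bit load per element) and count mismatches. */
static uint32_t hyp_check_up(const volatile uint32_t *mem, uint32_t words)
{
    uint32_t errors = 0u;
    for (uint32_t i = 0u; i < words; i++) {
        if (mem[i] != i) {
            errors++;
        }
    }
    return errors;
}
```

On the target this would be called as, for example, hyp_fill_up((volatile uint32_t *)OCTOSPI1_BASE, u32MaxLen), with the same DWT->CYCCNT bracketing and interrupt disabling as in the test function.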

Here's the test function, first writing to HyperRAM, then reading:

/* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
/* OCTOSPI HyperRAM test
 */
#define HYPER_TEST_UART		1

uint32_t OspiHypRamTest(uint8_t u8CountDown)
{
	uint32_t i = 0;
	uint32_t u32Val = 0xFFFFFFFF;
	uint32_t u32MaxLen = (uint32_t)((uint32_t)OSPI_HYPERRAM_END_ADDR / 4);
	uint32_t u32Errors = 0;
	uint32_t u32Data = 0;
	uint32_t u32CycStart = 0;
	uint32_t u32Cycles = 0;
	float flClockMHz = (float)HAL_RCC_GetSysClockFreq() / 1E6;
	float flVal = 0.0f;
	uint32_t *pu32MemAddr = NULL;

	if( 	 OCTOSPI1 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI1_BASE;
	else if( OCTOSPI2 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI2_BASE;

#if HYPER_TEST_UART
	uart_printf("\n\r+++++++++++++++++++++++++++++++++++++++++++++++++\n\r");
	uart_printf("OCTOSPI HyperRAM test, memory mapped, IRQs OFF\n\rcounting ");
	if( 0 == u8CountDown ) uart_printf("UP, start with 0\n\r\n\r");
	else uart_printf("DOWN, start with %08lX\n\r\n\r", u32Val);

	uart_printf("writing bytes: %lu\n\r", (uint32_t)OSPI_HYPERRAM_END_ADDR);
#endif

__DSB();
__disable_irq();

/* write complete HyperRAM */
	/* UP - should be faster */
	if( 0 == u8CountDown )
	{
		u32CycStart = DWT->CYCCNT;
		for( i = 0; i < u32MaxLen; i++ )
		{
			pu32MemAddr[i] = i;
		}
		__DMB();
		__DSB();
		u32Cycles = DWT->CYCCNT;
	}
	/* DOWN */
	else
	{
		u32Val = 0xFFFFFFFF;
		u32CycStart = DWT->CYCCNT;
		for( i = 0; i < u32MaxLen; i++ )
		{
			pu32MemAddr[i] = u32Val;
			u32Val--;
		}
		__DMB();
		__DSB();
		u32Cycles = DWT->CYCCNT;
	}

__enable_irq();
__DSB();

	u32Cycles -= u32CycStart;

	flVal = (float)u32Cycles / flClockMHz;
	flOspiRamSpeedMBpsMmWr = (float)OSPI_HYPERRAM_END_ADDR / flVal;
	flOspiRamSpeedMBpsMmWr *= (float)MEGA_CORRECTION;

#if HYPER_TEST_UART
	uart_printf("%lu CPU cycles = %.1f ms\n\r", u32Cycles, (flVal / 1000.0f));
	uart_printf("\n\r-> %.2f MB/s (%.0f Mbit/s) WRITE\n\r\n\r", flOspiRamSpeedMBpsMmWr, (8.0f * flOspiRamSpeedMBpsMmWr));

	uart_printf("reading & comparing bytes: %lu\n\r", (uint32_t)OSPI_HYPERRAM_END_ADDR);
#endif

__DSB();

	if( 	 OCTOSPI1 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI1_BASE;
	else if( OCTOSPI2 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI2_BASE;

__disable_irq();
__DSB();

/* read complete HyperRAM and compare */
	/* UP - should be faster */
	if( 0 == u8CountDown )
	{
		u32CycStart = DWT->CYCCNT;
		for( i = 0; i < u32MaxLen; i++ )
		{
			u32Data = pu32MemAddr[i];
			if( u32Data != i ) u32Errors++;
		}
		__DMB();
		__DSB();

		u32Cycles = DWT->CYCCNT;
	}
	/* DOWN */
	else
	{
		u32Val = 0xFFFFFFFF;
		u32CycStart = DWT->CYCCNT;
		for( i = 0; i < u32MaxLen; i++ )
		{
			u32Data = pu32MemAddr[i];
			if( u32Data != (u32Val - i) ) u32Errors++;
		}
		__DMB();
		__DSB();

		u32Cycles = DWT->CYCCNT;
	}
__enable_irq();

	u32Cycles -= u32CycStart;

	flVal = (float)u32Cycles / flClockMHz;
	flOspiRamSpeedMBpsMmRd = (float)OSPI_HYPERRAM_END_ADDR / flVal;
	flOspiRamSpeedMBpsMmRd *= (float)MEGA_CORRECTION;

#if HYPER_TEST_UART
	uart_printf("%lu CPU cycles = %.1f ms\n\r", u32Cycles, (flVal / 1000.0f));
	uart_printf("\n\r-> %.2f MB/s (%.0f Mbit/s) READ\n\r", flOspiRamSpeedMBpsMmRd, (8.0f * flOspiRamSpeedMBpsMmRd));

	if( 0 == u32Errors ) uart_printf("\n\rNULL errors\n\r");
	else uart_printf("\n\r# ERR: u32Errors = %lu\n\r", u32Errors);
	uart_printf("-------------------------------------------------\n\r");
#endif

	return u32Errors;
}

Does anybody have any ideas?

Thanks in advance!

STOne-32
ST Employee

Dear @LCE,

Thanks for the interesting use case. Could you detail the exact IDE/compiler environment, so that we can try to reproduce it at our end and then analyze it together with @KDJEM.1?

Ciao

STOne-32. 

LCE
Principal

I'm using

- H735 EVK or H733 custom board

- STM32CubeIDE Version: 1.10.1

- optimization FAST

- CPU clock 400 MHz

- OSPI 100 MHz

- HyperRAM setup via direct register access (no difference compared to the HAL setup)

LCE
Principal

I just got the "fast" version again. Maybe there are some bus issues in the background, depending on the UART usage:

UART 3 is used for debugging, in TX DMA mode.

The output function uart_printf() fills the TX DMA buffer; at the start it only waits for previous transfers to finish, checking TC and other flags in the function UartDbgDmaTxWait().

When I call UartDbgDmaTxWait() after each uart_printf() around OspiHypRamTest(), I get the high speed, at least for now...
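The body of UartDbgDmaTxWait() isn't shown, so this is only a sketch of the kind of check it might do, with hypothetical names. The debug UART is idle once the TX DMA stream has been disabled by hardware at the end of a normal-mode transfer (EN, bit 0 of DMA_SxCR) and the USART has shifted out the last frame (TC, bit 6 of USART_ISR). Polling for that before the measured loops keeps the debug UART's DMA traffic from overlapping the HyperRAM test. The registers are passed in as pointers so the logic can also be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the STM32H7 reference manual. */
#define DMA_SXCR_EN_BIT  (1u << 0)  /* DMA_SxCR.EN: cleared by HW at end of a normal-mode transfer */
#define USART_ISR_TC_BIT (1u << 6)  /* USART_ISR.TC: transmission complete */

/* True when the TX DMA stream is off and the last frame has left the USART. */
static bool dbg_uart_tx_idle(const volatile uint32_t *dma_sxcr,
                             const volatile uint32_t *usart_isr)
{
    return ((*dma_sxcr & DMA_SXCR_EN_BIT) == 0u) &&
           ((*usart_isr & USART_ISR_TC_BIT) != 0u);
}

/* Poll up to max_polls times; returns false on timeout so the caller can report it. */
static bool dbg_uart_wait_idle(const volatile uint32_t *dma_sxcr,
                               const volatile uint32_t *usart_isr,
                               uint32_t max_polls)
{
    for (uint32_t n = 0u; n < max_polls; n++) {
        if (dbg_uart_tx_idle(dma_sxcr, usart_isr)) {
            return true;
        }
    }
    return false;
}
```

On the target this could be called as dbg_uart_wait_idle(&DMA1_Stream0->CR, &USART3->ISR, 1000000u), where DMA1_Stream0 stands in for whichever stream actually serves USART3 TX.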

The question remains: before I did that, why were the results sometimes fast and sometimes slow, without any change to the OCTOSPI peripheral or the test function?

LCE
Principal

I also compared the assembly in the list files of the slow and fast versions:

the important loops that write, read and compare HyperRAM data while interrupts are disabled look basically the same.