2024-10-16 12:52 AM
Heyho,
I've been using the H733 (custom board) / H735 (eval kit) with Infineon's HyperRAM S70KL1281 / S70KL1282 at 100 MHz for some time now. Everything works great, except for one very annoying thing: the measured memory-mapped speed sometimes comes out "fast" and sometimes "slow" after a new compilation, without any change to the OCTOSPI setup or the test function.
I'm pretty sure it's not a faulty timing measurement: I use the cycle counter and disable all interrupts around the for loops.
Here's the test function, first writing to HyperRAM, then reading:
/* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
/* OCTOSPI HyperRAM test
*/
#define HYPER_TEST_UART 1
uint32_t OspiHypRamTest(uint8_t u8CountDown)
{
    uint32_t i = 0;
    uint32_t u32Val = 0xFFFFFFFF;
    uint32_t u32MaxLen = (uint32_t)((uint32_t)OSPI_HYPERRAM_END_ADDR / 4);
    uint32_t u32Errors = 0;
    uint32_t u32Data = 0;
    uint32_t u32CycStart = 0;
    uint32_t u32Cycles = 0;
    float flClockMHz = (float)HAL_RCC_GetSysClockFreq() / 1E6;
    float flVal = 0.0f;
    uint32_t *pu32MemAddr = NULL;

    if( OCTOSPI1 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI1_BASE;
    else if( OCTOSPI2 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI2_BASE;

#if HYPER_TEST_UART
    uart_printf("\n\r+++++++++++++++++++++++++++++++++++++++++++++++++\n\r");
    uart_printf("OCTOSPI HyperRAM test, memory mapped, IRQs OFF\n\rcounting ");
    if( 0 == u8CountDown ) uart_printf("UP, start with 0\n\r\n\r");
    else uart_printf("DOWN, start with %08lX\n\r\n\r", u32Val);
    uart_printf("writing bytes: %lu\n\r", (uint32_t)OSPI_HYPERRAM_END_ADDR);
#endif

    __DSB();
    __disable_irq();

    /* write complete HyperRAM */
    /* UP - should be faster */
    if( 0 == u8CountDown )
    {
        u32CycStart = DWT->CYCCNT;
        for( i = 0; i < u32MaxLen; i++ )
        {
            pu32MemAddr[i] = i;
        }
        __DMB();
        __DSB();
        u32Cycles = DWT->CYCCNT;
    }
    /* DOWN */
    else
    {
        u32Val = 0xFFFFFFFF;
        u32CycStart = DWT->CYCCNT;
        for( i = 0; i < u32MaxLen; i++ )
        {
            pu32MemAddr[i] = u32Val;
            u32Val--;
        }
        __DMB();
        __DSB();
        u32Cycles = DWT->CYCCNT;
    }
    __enable_irq();
    __DSB();

    u32Cycles -= u32CycStart;
    flVal = (float)u32Cycles / flClockMHz;
    flOspiRamSpeedMBpsMmWr = (float)OSPI_HYPERRAM_END_ADDR / flVal;
    flOspiRamSpeedMBpsMmWr *= (float)MEGA_CORRECTION;

#if HYPER_TEST_UART
    uart_printf("%lu CPU cycles = %.1f ms\n\r", u32Cycles, (flVal / 1000.0f));
    uart_printf("\n\r-> %.2f MB/s (%.0f Mbit/s) WRITE\n\r\n\r", flOspiRamSpeedMBpsMmWr, (8.0f * flOspiRamSpeedMBpsMmWr));
    uart_printf("reading & comparing bytes: %lu\n\r", (uint32_t)OSPI_HYPERRAM_END_ADDR);
#endif

    __DSB();
    if( OCTOSPI1 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI1_BASE;
    else if( OCTOSPI2 == pOspiHypRam ) pu32MemAddr = (uint32_t *)OCTOSPI2_BASE;
    __disable_irq();
    __DSB();

    /* read complete HyperRAM and compare */
    /* UP - should be faster */
    if( 0 == u8CountDown )
    {
        u32CycStart = DWT->CYCCNT;
        for( i = 0; i < u32MaxLen; i++ )
        {
            u32Data = pu32MemAddr[i];
            if( u32Data != i ) u32Errors++;
        }
        __DMB();
        __DSB();
        u32Cycles = DWT->CYCCNT;
    }
    /* DOWN */
    else
    {
        u32Val = 0xFFFFFFFF;
        u32CycStart = DWT->CYCCNT;
        for( i = 0; i < u32MaxLen; i++ )
        {
            u32Data = pu32MemAddr[i];
            if( u32Data != (u32Val - i) ) u32Errors++;
        }
        __DMB();
        __DSB();
        u32Cycles = DWT->CYCCNT;
    }
    __enable_irq();

    u32Cycles -= u32CycStart;
    flVal = (float)u32Cycles / flClockMHz;
    flOspiRamSpeedMBpsMmRd = (float)OSPI_HYPERRAM_END_ADDR / flVal;
    flOspiRamSpeedMBpsMmRd *= (float)MEGA_CORRECTION;

#if HYPER_TEST_UART
    uart_printf("%lu CPU cycles = %.1f ms\n\r", u32Cycles, (flVal / 1000.0f));
    uart_printf("\n\r-> %.2f MB/s (%.0f Mbit/s) READ\n\r", flOspiRamSpeedMBpsMmRd, (8.0f * flOspiRamSpeedMBpsMmRd));
    if( 0 == u32Errors ) uart_printf("\n\rNULL errors\n\r");
    else uart_printf("\n\r# ERR: u32Errors = %lu\n\r", u32Errors);
    uart_printf("-------------------------------------------------\n\r");
#endif

    return u32Errors;
}
Anybody any ideas?
Thanks in advance!
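For readers following the arithmetic in the test function: dividing the cycle count by the clock in MHz gives elapsed microseconds, and bytes per microsecond is the same number as MB/s (with 1 MB = 10^6 bytes). A minimal stand-alone sketch of just that step is below. The name `throughput_mbps` is made up for illustration, and `MEGA_CORRECTION` is left out because its value is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original test function.
 * Converts a transfer size and a CPU cycle count into MB/s
 * (1 MB = 10^6 bytes). cpu_mhz is the core clock in MHz. */
static float throughput_mbps(uint32_t bytes, uint32_t cycles, float cpu_mhz)
{
    /* cycles / (cycles per microsecond) = elapsed microseconds */
    float micros = (float)cycles / cpu_mhz;

    /* bytes per microsecond is numerically equal to MB/s */
    return (float)bytes / micros;
}
```

With purely illustrative numbers (not measurements from this thread), 16777216 bytes in 67108864 cycles at 400 MHz works out to about 100 MB/s.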
2024-10-16 01:10 AM
2024-10-16 02:02 AM
I'm using
- H735 EVK or H733 custom board
- STM32CubeIDE Version: 1.10.1
- optimization FAST
- CPU clock 400 MHz
- OSPI 100 MHz
- HyperRAM setup via direct register access (doesn't make a difference to HAL setup)
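As a reference point for the numbers above, here is a rough upper bound on the bus throughput, assuming the HyperRAM is attached over an 8-bit HyperBus interface that transfers data on both clock edges (DDR). The helper name `hyperbus_peak_mbps` is made up for this sketch. The bound ignores command/address overhead, initial latency and refresh, so memory-mapped CPU accesses will land below it.

```c
#include <stdint.h>

/* Hypothetical helper: theoretical peak data rate in MB/s for a DDR bus.
 * clk_mhz   - bus clock in MHz
 * bus_bytes - bus width in bytes (1 for an 8-bit HyperBus, assumed here) */
static uint32_t hyperbus_peak_mbps(uint32_t clk_mhz, uint32_t bus_bytes)
{
    /* DDR: two transfers per clock cycle */
    return clk_mhz * 2u * bus_bytes;
}
```

For the 100 MHz OSPI clock listed above and a 1-byte bus this gives 200 MB/s.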
2024-10-16 02:34 AM
I just got the "fast" version again. Maybe there are some bus issues in the background, depending on the UART use:
UART 3 is used for debugging, in TX DMA mode.
The output function uart_printf() fills the TX DMA buffer and only waits at the beginning for previous transfers to finish, by checking TC and other stuff with a function UartDbgDmaTxWait().
When I put UartDbgDmaTxWait() after each uart_printf() around OspiHypRamTest() I get the high speed - for now at least...
The question remains: before I did that, why did I sometimes get fast and sometimes slow results, without changing anything concerning the OCTOSPI peripheral or the test function?
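The internals of UartDbgDmaTxWait() are not shown here, so the following is only a generic sketch of the "wait until a status flag is set before starting the timed section" pattern, written so it can be compiled on a host. The name `wait_flag_set` and the poll limit are assumptions. On the STM32H7 the register pointer would presumably be something like `&USART3->ISR` with the `USART_ISR_TC` mask, but that is my assumption, not the code used above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll a status register until any bit in 'mask'
 * is set, giving up after 'max_polls' reads so a stuck peripheral
 * cannot hang the caller. Returns true if the flag was seen. */
static bool wait_flag_set(const volatile uint32_t *reg, uint32_t mask,
                          uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*reg & mask) != 0u) {
            return true;    /* flag set: previous transfer finished */
        }
    }
    return false;           /* timed out */
}
```

Calling such a wait right before `__disable_irq()` makes sure no debug output is still being moved by DMA while the HyperRAM loops are being timed.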
2024-10-16 02:39 AM
I also compared the assembler in the list files, between slow / fast version:
the important loops reading / writing HyperRAM and comparing - while the interrupts are disabled - basically look the same
2024-10-16 05:40 AM
Hello LCE,
Have you considered providing protection if
u32Cycles -= u32CycStart;
wraps around? Perhaps that would account for the two consistent values...
Regards,
Dave
2024-10-16 06:25 AM
That's not necessary with (C's ?) unsigned integer math.
(I think I did that before, it didn't change anything.)
That would only explain the values at start-up, a rather defined time, but I also get the exact same timing values if I start OspiHypRamTest() by UART anytime the application is running.
And I checked also with the 1 ms SysTick, giving the same results.
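To illustrate the wrap-around point: unsigned subtraction in C is performed modulo 2^32 for a 32-bit type, so the difference of two reads of a free-running 32-bit counter is still correct if the counter wrapped once in between. A minimal sketch (the function name `cycles_elapsed` is made up for this example):

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit counter.
 * The unsigned subtraction wraps modulo 2^32, so a single counter
 * overflow between 'start' and 'end' is handled without extra code. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, start = 0xFFFFFFF0 and end = 0x00000010 gives 0x20 cycles. The result is only meaningful if the measured interval is shorter than one full counter period, which is about 10.7 s at 400 MHz.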
2024-10-16 04:02 PM
So what is different in "compilation"? Debug vs Release? Optimization?
2024-10-16 10:02 PM
> So what is different in "compilation"? Debug vs Release? Optimization?
That would be too easy and too obvious! ;)
No, that happens with a new compilation with no change of release / debug mode or optimization settings.
And even without any change of the relevant HyperRam init and test files.
So my guess is that it can only be something happening in the background that uses the same bus as OCTOSPI.
The test is performed at start-up; the only things using the buses "in the background" until then are:
See above, the UART is my best guess for now, as it is using DMA and the AXI SRAM, where also OCTOSPI is connected. And as said above, waiting until UART3 TX DMA was finished already helped.
I'll keep an eye on this with my next compilations...
2024-10-17 02:29 PM
Dear @LCE ,
Are you able to localize the difference in cycle counts to WRITE, to READ, or to both operations? I suspect a kind of code alignment effect in the Cortex-M7 that propagates over the 64-bit AXI to the memory-mapped OctoSPI. If possible, please share the assembly generated in both cases (fast and slow) for the small loops (it should be 5 to 6 assembly lines each). Thank you for your help.
Cheers,
STOne-32
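One way to check the alignment idea is to take the start addresses of the loops from the list files of the fast and the slow build and compare where they fall inside an 8-byte window (the 64-bit bus width mentioned above) or a 32-byte window (the Cortex-M7 cache line size). A minimal sketch follows. The helper name `offset_in_window` is made up, and the address in the usage note is purely illustrative.

```c
#include <stdint.h>

/* Hypothetical helper: byte offset of an address inside an aligned
 * window. 'window' must be a power of two (e.g. 8 or 32). */
static uint32_t offset_in_window(uint32_t addr, uint32_t window)
{
    return addr & (window - 1u);
}
```

For an illustrative address 0x08001234 this yields offset 4 in an 8-byte window and offset 20 in a 32-byte window. If the loop offsets differ between the fast and slow builds, that would support the alignment hypothesis.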