
STM32U5xx: OCTALSPI (as QSPI) fails for faster speed, with 1V8 it fails more drastically

tjaekel
Senior III

This is more a bug report, not a question.

Background:

I want to use OCTALSPI as QuadSPI (QSPI), esp. with VDD set to 1V8. I want to get (at least) 30 MHz SCLK working on QSPI.

But I cannot go faster than 12.195 MHz, which corresponds to an OCTALSPI clock divider of 13.

Symptoms:

VDD = 3V3:

down to clock divider = 13 - all looks fine
clock divider lower (for faster speed): looks different on SCLK but might be OK (still reasonable, even with gaps)
the minimum for clock divider with VDD = 3V3 is: 6 (resulting in 27.778 MHz - I want to see 30 MHz working)
anything faster (smaller clock divider) FAILS: there is no SCLK anymore
VDD = 1V8:

all fine down to clock divider = 13 - the same as 3V3
clock divider = 12 - FAILS completely! no SCLK anymore!
with VDD = 1V8: I cannot even reach the same speed as with 3V3
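For reference, the relation between kernel clock, divider and nominal SCLK can be sketched as below. This is only an illustration: the function names are hypothetical, and the 160 MHz kernel clock is the value reported later in this thread. The frequencies quoted in the post appear to be scope readings, so they may differ slightly from these nominal values.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal SCLK for a given kernel clock and divider (divider >= 1).
 * Hypothetical helper, not part of the HAL. */
uint32_t ospi_sclk_hz(uint32_t kernel_hz, uint32_t divider)
{
    assert(divider >= 1u);
    return kernel_hz / divider;
}

/* Smallest divider whose nominal SCLK does not exceed target_hz. */
uint32_t ospi_min_divider(uint32_t kernel_hz, uint32_t target_hz)
{
    assert(target_hz > 0u);
    uint32_t div = (kernel_hz + target_hz - 1u) / target_hz; /* ceiling division */
    return (div == 0u) ? 1u : div;
}
```

With a 160 MHz kernel clock, a 30 MHz target gives a divider of 6 (nominally about 26.7 MHz); a divider of 5 would already be 32 MHz.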
Details on FW:

I use the standard HAL functions. My QSPI ends up in calling "HAL_QSPI_Transmit()". This is running in "polling mode": the data is written to FIFO register with looping and waiting until it was sent (no INT, no DMA):

   status = OSPI_WaitFlagStateUntilTimeout(hospi, HAL_OSPI_FLAG_FT, SET, tickstart, Timeout);

The OCTALSPI clock source is:

PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_OSPI;

PeriphClkInit.OspiClockSelection = RCC_OSPICLKSOURCE_SYSCLK;

Waveforms:

STM32U5A5_QSPI_issues3.png

STM32U5A5_QSPI_issues4.png

STM32U5A5_QSPI_issues1.png

STM32U5A5_QSPI_issues2.png

STM32U5A5_QSPI_issues5.png

VDD = 1V8:
It fails immediately with CLKDIV = 12 - where it still worked (somewhat) with 3V3!

What is wrong?

The goal is:

QSPI with VDD = 1V8 and 30 MHz
the SCLK should look constant (no gaps, continuous)
if gaps are on SCLK - caused by "SW polling mode" - OK:
how to change and use DMA based functions?

Accepted Solutions

OK, I am closing this ticket.

Observations:

  1. The "byte bursts" depend on Debug vs. Release code:
    if I set optimization to -g0 and -O3, it happens later.
    But it still happens: with longer QSPI transactions, or at faster speeds, the following data words come as "byte bursts" (not as word bursts, no matter what the FIFO setting is).
  2. The GPIO Speed setting has FOR SURE a dramatic impact on whether it works or not (setting Speed to "fastest" makes it work at faster QSPI speeds; a slower GPIO speed makes the QSPI fail).

I am closing this ticket, even though I still have no clue how to avoid the "byte bursts" or why the GPIO speed setting matters so much for seeing an SCK signal on the scope.

FBL
ST Employee

Hello @tjaekel 

Would you provide more details about your hardware setup? What is the full path of your selected clock source (source of SYSCLK)? Also, for 1V8, did you enable High Speed Low Voltage (HSLV)?

 

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

KDJEM.1
ST Employee

Hello @tjaekel ,

Could you please give more details about the issue:

- Which STM32U5 device are you using?

- Are you using an ST board or a custom board?

Note that the OCTOSPI frequency depends on the load capacitance CL. Please refer to the OCTOSPI characteristics table in the device datasheet and check all constraints.

KDJEM1_0-1707473455055.png

Please take a look at an OCTOSPI example; it may help you.

Thank you.

Kaouthar


Thank you.

It is a NUCLEO-U5A5ZJ-Q board.
There is just a scope connected, not any chip (no load).

I will check the manual today to see what HSLV is.

I do not change the voltage range (no dynamic voltage scaling used).
QSPI is in SDR mode (not DDR/DTR, no Hyperbus).

It FAILS still, the same way as before (I have studied datasheet and RM and tried several things).

BTW: it fails on a STM32U5A5 MCU a bit earlier compared to a STM32U575 MCU:

  • on STM32U575, with VDD = 3V3: I can lower the OCTOSPI divider one step more: U575 is a bit better than U5A5
  • but on VDD = 1V8 - both fail in the same way (same OCTOSPI divider fails)

STM32U575 is slightly better in terms of OCTOSPI speed.

I tried:

  1. use SYSCLK or PLLQ, both set to 160 MHz - no difference (see below Remark-1)
  2. use HSLV - no difference
  3. use LL_SYSCFG_EnableVddCompensationCell(); - no difference

Remark-1

I see in the RM that U5A5 and U575 also have PLL2 and PLL3. And STM32CubeMX lets me configure PLL2 and PLL3.

RM says: PLLxQ for QSPI (only) can also be as high as 200 MHz. But it is not possible to set 200 MHz on the NUCLEO board with the 16 MHz OSC. I would need to use PLL2Q (with 200 MHz).

But the HAL drivers for U5A5, U575 (U5xx) support only ONE PLL (nothing with PLL2 or PLL3). Is it another issue that the HAL drivers for U5xx do not support PLL2, PLL3?

I can confirm:

  • SYS clock is 160 MHz (or PLLQ) - used for QSPI as clock source
  • I run voltage range 1 (I have checked by reading back via HAL_PWREx_GetVoltageRange() >> 16)
  • I am not changing to use DVS (dynamic voltage scaling)
  • "overclocking" the MCU, e.g. with 192 MHz: MCU works still (also USB), but now the OCTALSPI fails already on the "just working" DIV setting (now 13 fails which was working with 160 MHz - issue seems to be in OCTOSPI)
  • running VDD = 1V8 makes it worse (OCTOSPI fails already on lower speeds, I cannot use smaller DIV),
    on VDD = 1V8 it fails immediately on faster speed

HSLV config

As mentioned in the STM documentation: HSLV is very risky! It can damage the chip, e.g. if HSLV is enabled but VDD = 3V3.
The NUCLEO boards have a VDD jumper! So the FW has to "follow" whatever VDD is configured. I do this via the ADC by measuring Vrefint, and only if the result is below 1.9V do I enable HSLV on the QSPI pins. But no difference in speed.
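The supply check described above can be sketched roughly as follows. This is a minimal, hardware-independent illustration, not the code from the project: the function names are hypothetical, the calibration value and the voltage at which it was taken must come from the device datasheet, and both readings are assumed to use the same ADC resolution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Estimate the analog supply in mV from a VREFINT conversion.
 * cal_vdda_mv : supply voltage at which the factory calibration was taken
 * vrefint_cal : factory calibration reading of VREFINT
 * vrefint_raw : current ADC reading of VREFINT (same resolution)
 * All three values are supplied by the caller (assumptions in this sketch). */
uint32_t vdda_from_vrefint_mv(uint32_t cal_vdda_mv,
                              uint32_t vrefint_cal,
                              uint32_t vrefint_raw)
{
    assert(vrefint_raw > 0u);
    return (cal_vdda_mv * vrefint_cal) / vrefint_raw;
}

/* HSLV is only considered when the supply is below 1.9 V,
 * the threshold used in the post. */
bool hslv_allowed(uint32_t vdda_mv)
{
    return vdda_mv < 1900u;
}
```

The idea is that VREFINT is constant, so a larger raw reading means a lower supply. Only when `hslv_allowed()` returns true would the FW go on to enable HSLV for the QSPI pins.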

Project and details

Find the details in my project, on GitHub:

https://github.com/tjaekel/NUCLEO-U5A5JZ-Q_QSPI 

Other Remark

The fact that the data words are spread out now (and I see gaps in SCLK) might be obvious: I use data transfer via OSPI_WriteReadTransaction() which sends all in indirect mode. All is based on SW polling (checking the FIFO status) and FW can be slow and cause these gaps (but fine).

But:
I have checked if I can use OCTOSPI (as QSPI) with DMA (in indirect mode). It looks to me, it is not possible (at least not mentioned/documented if OCTOSPI can generate a DMA event).

Conclusion

I am frustrated. I changed to STM32U5xx because of the QSPI support (even though it lacks some features, like "regular SPI"). Now I realize that it does not work as specified (in the datasheet), e.g. to get 93 MHz SCLK in voltage range 1, even on VDD = 1V8. I get just 27 MHz maximum (and only with 3V3, not 1V8).

It does not work for me as I need (1V8 and 30 MHz).

And the Errata document is already pretty long for OCTOSPI (10 entries already). Maybe you have to add a new one and correct the datasheet and RM for a speed limitation on OCTOSPI.   😉 LOL

More joking

I guess, you have a timing constraints violation in your MCU RTL. Send me your RTL and I could debug.
Or does STM solder "slow corner" (yield) parts on NUCLEO board?

 

Sorry, I take back the comment about CubeMX and not being able to use PLL2: CubeMX generates code for PLL2 and it can be used/compiled.

Just:

  • using HSE as clock source for PLL2 FAILS: code returns a TIME_OUT
  • using MSI is OK, but:
  • the QSPI fails in the same way: not able to see a higher SCLK speed on OCTOSPI

My clock config code for QSPI is this:

void HAL_OSPI_MspInit(OSPI_HandleTypeDef* hospi)
{
  GPIO_InitTypeDef GPIO_InitStruct = {0};
  RCC_PeriphCLKInitTypeDef PeriphClkInit = {0};
  if(hospi->Instance==OCTOSPI1)
  {
  /** Initializes the peripherals clock
  */
#if 0
    PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_OSPI;
    PeriphClkInit.OspiClockSelection = RCC_OSPICLKSOURCE_SYSCLK;	//RCC_OSPICLKSOURCE_PLL1;	//RCC_OSPICLKSOURCE_SYSCLK; 160 MHz, could be max. 200 MHz
    if (HAL_RCCEx_PeriphCLKConfig(&PeriphClkInit) != HAL_OK)
    {
      Error_Handler();
    }
#else
    PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_OSPI;
    PeriphClkInit.OspiClockSelection = RCC_OSPICLKSOURCE_PLL2;
    PeriphClkInit.PLL2.PLL2Source = RCC_PLLSOURCE_MSI;		//HSE fails with TIME_OUT!!!!!
    PeriphClkInit.PLL2.PLL2M = 1;
    PeriphClkInit.PLL2.PLL2N = 50;		//40 = 160 MHz, 50 = 200 MHz
    PeriphClkInit.PLL2.PLL2P = 2;
    PeriphClkInit.PLL2.PLL2Q = 1;
    PeriphClkInit.PLL2.PLL2R = 2;
    PeriphClkInit.PLL2.PLL2RGE = RCC_PLLVCIRANGE_0;
    PeriphClkInit.PLL2.PLL2FRACN = 0;
    PeriphClkInit.PLL2.PLL2ClockOut = RCC_PLL2_DIVQ;
    if (HAL_RCCEx_PeriphCLKConfig(&PeriphClkInit) != HAL_OK)
    {
       Error_Handler();
    }
#endif
//...
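As a sanity check on the PLL2 values above, the output frequency can be computed as below. This is only a sketch with a hypothetical helper name. The 4 MHz MSI input is inferred from the code comments ("40 = 160 MHz, 50 = 200 MHz" with M = 1 and Q = 1); it is not stated explicitly in the post.

```c
#include <assert.h>
#include <stdint.h>

/* PLL output = (source / M) * N / output divider.
 * m, n and out_div are the actual divider/multiplier values,
 * as passed in the PLL2M, PLL2N and PLL2Q fields above. */
uint32_t pll_output_hz(uint32_t src_hz, uint32_t m, uint32_t n, uint32_t out_div)
{
    assert(m >= 1u && out_div >= 1u);
    /* 64-bit intermediate so src_hz * n cannot overflow */
    return (uint32_t)(((uint64_t)src_hz * n) / m / out_div);
}
```

With an assumed 4 MHz source, M = 1 and Q = 1, N = 40 gives 160 MHz and N = 50 gives 200 MHz, which matches the comment in the code.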

Even if I configure PLL2Q for 200 MHz (as mentioned in RM, for OCTOSPI only, PLL2N = 50) - no speed improvement, it fails in the same way (everything faster than OCTOSPI DIV = 12 fails on VDD = 1V8, using U575 right now).

So, the maximum QSPI SCLK speed I can get is 12.5 MHz (with "overclocking" PLL2Q to 200 MHz, "allowed" per RM) - on VDD = 1V8.

BTW: the project is based on a QSPI example project (found inside CubeMX folder, for NUCLEO board).

How to get 30 MHz SCLK on VDD = 1V8?

 

Yes, no way to go above 12.5 MHz.

Even setting PLL2Q to 200 MHz (as mentioned in RM) - FAILS!
(SCLK stalls later, in the data transmission part: CMD, ADDR, ALT are still OK, but there is no SCLK during the QSPI write data sequence)

I've found a "trick": GPIO speed setting

When I configure the GPIO speed to 3 (slew rate) - I can reach 83.3 MHz on VDD = 1V8. OK, great!
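For readers unfamiliar with the setting: the GPIO speed is a 2-bit field per pin in the port's OSPEEDR register, and 3 is the highest value (in the HAL this corresponds to `GPIO_SPEED_FREQ_VERY_HIGH` in `GPIO_InitStruct.Speed`). The sketch below only shows the bit manipulation on a plain integer; the helper name is hypothetical and no hardware register is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Return a new OSPEEDR value with the 2-bit speed field of one pin updated.
 * pin   : 0..15
 * speed : 0 (slowest) .. 3 (fastest slew rate) */
uint32_t ospeedr_set(uint32_t ospeedr, uint32_t pin, uint32_t speed)
{
    assert(pin < 16u && speed <= 3u);
    uint32_t shift = pin * 2u;
    return (ospeedr & ~(3u << shift)) | (speed << shift);
}
```

On real hardware the same value would normally be set through `HAL_GPIO_Init()` for each QSPI pin (SCLK, NCS, IO0..IO3) rather than by writing the register directly.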

STM32U5A5_QSPI_issues8.png

But:

  • I have seen issues with cross-talk when GPIO speed is 3 (fastest slew rate): I saw some "glitches" (cross-talk) on the NCS signal (wrong pulses). Not sure if the cross-talk happens inside the chip or on my external wiring.
  • Why are the words coming in byte bursts? I have configured for 32bit. The slower the speed (larger DIV), e.g. OCTALSPI DIV = 16 - it starts later when it changes to these "byte bursts".
  • I am expecting to see gaps between 32bit words (due to polling status for free FIFO) - but not gaps when 32bit words are sent (expecting 8 clocks without a gap)

STM32U5A5_QSPI_issues9.png

Very strange behavior.

I tried changing the FIFO threshold - no change: the 32bit words are still split into byte burst transfers. WHY?
(How to get 32bit words shifted out with 8 clocks without a gap within a word? It is OK to have gaps between words, but I do not expect them within a word.)

BTW: setting FIFO threshold to 32 (OCTALSPI FIFO has 32 BYTES!) - FAILS: the transaction never completes and all data words are missing:

STM32U5A5_QSPI_issues14.png

It is a bit obvious: FIFO is 32 BYTES!: if I wait for free space in FIFO, just one byte potentially free again - but I send always 32bit words (4 bytes) - it would overflow the FIFO.

RM says on this topic something like this: "if you write a data word beyond the FIFO size - this data word is lost". OK, but it fails for me immediately with the first data word written: even the very first word is not completed! (At that point the FIFO should be empty, assuming CMD, ADDR and ALT do not go via this data FIFO. And even if they did: there is enough space in the FIFO, I have 1 byte CMD, 4 byte ADDR, 3 byte ALT.)
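The reasoning about threshold versus word size can be written down as a small model. This is an assumption-laden sketch, not the peripheral's documented behaviour: it assumes that in indirect write mode the threshold flag is set whenever at least `threshold` bytes are free, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OSPI_FIFO_BYTES 32u  /* FIFO size as stated in the post */

/* Assumed model: flag is set when free_bytes >= threshold (1..32). */
bool ftf_is_set(unsigned free_bytes, unsigned threshold)
{
    assert(threshold >= 1u && threshold <= OSPI_FIFO_BYTES);
    assert(free_bytes <= OSPI_FIFO_BYTES);
    return free_bytes >= threshold;
}

/* A write of write_bytes is guaranteed to fit whenever the flag is set
 * only if the threshold is at least as large as the write. */
bool write_fits_when_ftf_set(unsigned threshold, unsigned write_bytes)
{
    return threshold >= write_bytes;
}
```

Under this model a threshold of 1 does not guarantee room for a 32bit write, while a threshold of 4 does. A threshold of 32 would set the flag only when the FIFO is completely empty, so the model alone does not explain why the very first word fails.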

The OCTOSPI is "so strange"....

Conclusion for now:

In order to get more than 12 Mbps on OCTALSPI, on VDD = 1V8 - I have to use the GPIO Speed setting (set to fastest slew rate). But why?