[SOLVED] How can I fix frequency sensitivity in QUADSPI communication?

JohnS · ‎2022-03-18

I have an STM32F7xx MCU connected to a Winbond W25Q01. The datasheet says the chip will run at up to 133 MHz. If I set the MCU clock to 108 MHz and the QUADSPI clock prescaler to 0, so that both run at the same frequency, the chip initializes and communicates without problems.

However, if I set the MCU to 216 MHz and the QUADSPI clock prescaler to 1, so that the QUADSPI runs at 108 MHz, the chip won't initialize and I can't communicate with it.

If I set the MCU to 216 MHz and the prescaler to 2, so that the QUADSPI clock runs at 72 MHz, it initializes without problems.

Summary table:

MCU freq [MHz] | Prescaler | QUADSPI freq [MHz] | Result
108            | 0         | 108                | pass
216            | 1         | 108                | FAIL
197            | 1         | 98.5               | FAIL
155.5          | 1         | 77.75              | pass
108            | 1         | 54                 | pass
216            | 2         | 72                 | pass
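
The QUADSPI frequencies in the table follow from dividing the input clock by PRESCALER + 1. A minimal sketch in C that reproduces those figures, assuming the QUADSPI is clocked directly from the MCU frequency listed (the function name is illustrative, not part of any ST library):

```c
#include <stdint.h>

/* QUADSPI clock in Hz for a given input clock and PRESCALER field value.
 * The divider is PRESCALER + 1, so PRESCALER = 0 passes the clock through. */
static double qspi_clock_hz(double kernel_hz, uint8_t prescaler)
{
    return kernel_hz / (double)(prescaler + 1u);
}
```

For example, `qspi_clock_hz(216e6, 2)` corresponds to the last row of the table (72 MHz).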

I believe the failure mode is that I get a corrupted response when I query the status register, so the code hangs thinking that the chip is busy when in reality it is not. But I can't explain why it works when the MCU and QUADSPI both run at 108 MHz, yet fails when only the QUADSPI runs at that speed.

Any ideas?

JohnS · ‎2022-04-08

SOLVED: I also had to set CSHT to 6 cycles, but the big change was enabling "Sample Shifting Half Cycle". With it enabled, the flash runs at 108 MHz with the processor at 216 MHz. I believe this also explains why it ran with both at 108 MHz: with the MCU at 216 MHz there is a shift in timing between the two clocks that doesn't exist when they run at the same speed. I hope this helps someone else in the future. Thank you to the community for all the information; it was very helpful in getting to the root cause.
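
For readers who want to see what these settings amount to at register level, here is a hedged sketch. It works on plain 32-bit values so it can be tried on a host. The bit positions are as I read them in the STM32F7 reference manual and should be verified for your exact part, and the helper names are made up for this example:

```c
#include <stdint.h>

/* Bit positions as I read them in the STM32F7 reference manual;
 * verify against the RM for your part before relying on them. */
#define QSPI_CR_SSHIFT_POS     4u   /* QUADSPI_CR.SSHIFT: sample half a CLK cycle later */
#define QSPI_CR_PRESCALER_POS  24u  /* QUADSPI_CR.PRESCALER[7:0] */
#define QSPI_DCR_CSHT_POS      8u   /* QUADSPI_DCR.CSHT[2:0], field value = cycles - 1 */

/* Return cr with PRESCALER replaced and SSHIFT set or cleared. */
static uint32_t qspi_cr_apply(uint32_t cr, uint8_t prescaler, int half_cycle_shift)
{
    cr &= ~((0xFFu << QSPI_CR_PRESCALER_POS) | (1u << QSPI_CR_SSHIFT_POS));
    cr |= (uint32_t)prescaler << QSPI_CR_PRESCALER_POS;
    if (half_cycle_shift) {
        cr |= 1u << QSPI_CR_SSHIFT_POS;
    }
    return cr;
}

/* Return dcr with CSHT set so that nCS stays high for cs_high_cycles
 * clock cycles (clamped to the 1..8 range the 3-bit field can express). */
static uint32_t qspi_dcr_apply_csht(uint32_t dcr, unsigned cs_high_cycles)
{
    if (cs_high_cycles < 1u) cs_high_cycles = 1u;
    if (cs_high_cycles > 8u) cs_high_cycles = 8u;
    dcr &= ~(0x7u << QSPI_DCR_CSHT_POS);
    dcr |= (cs_high_cycles - 1u) << QSPI_DCR_CSHT_POS;
    return dcr;
}
```

The working configuration described above would be `qspi_cr_apply(cr, 1, 1)` and `qspi_dcr_apply_csht(dcr, 6)`. With the ST HAL, I believe the equivalent is setting `Init.SampleShifting = QSPI_SAMPLE_SHIFTING_HALFCYCLE` and `Init.ChipSelectHighTime = QSPI_CS_HIGH_TIME_6_CYCLE` before calling `HAL_QSPI_Init()`.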


Andreas Bolsch · ‎2022-03-19

Probably you're right at (or slightly beyond) hardware limits:

1) The flash's datasheet does *not* simply imply operation up to 133 MHz. The maximum clock speed depends on the VCC value, capacitive loading (and board layout) and on the instruction(!) used! The "133 MHz" is just a marketing buzzword, because it can be achieved only under extremely specific conditions.

2) E.g. the /CS deselect time for the flash is 10 ns min. Since the state machine in the QSPI interface derives all timings from its single input clock, the maximum clock is limited to 100 MHz when CSHT in QUADSPI_DCR is set to 0 (i.e. one clock cycle).

3) Don't look at clock cycle time only, as this silently assumes a perfect 50% duty cycle. That's not realistic. In particular, the QSPI interface does *NOT* always give a 50% duty cycle (see remark in RM).
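
To put numbers on point 3, here is a rough sketch of the high and low times of the QSPI clock. It rests on an assumption taken from my reading of the reference manual remark: for an odd division ratio (even PRESCALER value), the extra input-clock cycle goes to the low phase. The type and function names are hypothetical, and a real measurement with a scope takes precedence over this arithmetic:

```c
#include <stdint.h>

typedef struct {
    double high_ns;  /* time CLK spends high per period */
    double low_ns;   /* time CLK spends low per period */
} clk_phases;

/* Assumption: with an odd divider the clock stays low one input-clock
 * cycle longer than it stays high; with divider 1 the input clock is
 * passed through and is taken to be symmetric. */
static clk_phases qspi_clk_phases(double kernel_hz, uint8_t prescaler)
{
    unsigned div = (unsigned)prescaler + 1u;
    double t_ns = 1e9 / kernel_hz;   /* one input-clock period */
    clk_phases p;

    if (div == 1u) {
        p.high_ns = t_ns / 2.0;
        p.low_ns = t_ns / 2.0;
    } else {
        unsigned high_cycles = div / 2u;
        unsigned low_cycles = div - high_cycles;
        p.high_ns = high_cycles * t_ns;
        p.low_ns = low_cycles * t_ns;
    }
    return p;
}
```

Under that assumption, 216 MHz with prescaler 2 gives a 72 MHz clock whose high phase is only one 216 MHz period (about 4.6 ns) long, so the shortest phase is as short as at 108 MHz even though the period is longer.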

If you want to get close to the limits, the PCB design will require a lot of effort, and you would have to check *ALL* timing requirements in the finest detail. Even temperature and VCC variations ...

So, be conservative and stay well below the datasheet limits!

BTW: Did you actually *measure* the precise clock figures above? Did you check signal waveforms with a 1GHz scope? Skew between signals?

Tesla DeLorean · ‎2022-03-19

Most of my stuff is fixtured with sockets and flying wires, so 66 MHz tends to be the ceiling there, and I don't code loaders to be super aggressive. I tend to find the point where it breaks and then back off, so I've seen results similar to what you're reporting. As I can't tune or control the circuit details, I've not focused much on specifics. Watch also that the specs tend to use MHz and MBps somewhat interchangeably when describing data bandwidth. The DTR/DDR modes can often buy you bandwidth at lower clock rates, so it's often a matter of finding the sweet spot where signal integrity works. Watch duty cycles, and edge/phase settings.

As @Andreas Bolsch indicates, as frequency goes up problems increase as you're into transmission-line theory and things need to be well matched so as not to get relative skewing, or ringing/standing-waves in the lines.

On the configurable side is the slew rate/drive at each end. At the STM32 end this is OSPEEDR, and for such short traces and relatively low-load devices you should be able to back the speed settings off a notch or two and reduce the amount of energy dumped into the lines. The Winbond part also has drive settings at its end. For eMMC devices we typically add series resistors in the 27/33R range into matched trace lengths.
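
As an illustration of backing the pin speed off a notch, here is a small sketch of the two-bits-per-pin encoding in GPIOx_OSPEEDR (0 = low, 1 = medium, 2 = high, 3 = very high on the F7, as far as I know). It operates on a plain value rather than the real register so it can be checked on a host, and the function name is made up for this example:

```c
#include <stdint.h>

/* Return ospeedr with the 2-bit speed field of one pin (0..15) replaced.
 * speed: 0 = low, 1 = medium, 2 = high, 3 = very high. */
static uint32_t ospeedr_set(uint32_t ospeedr, unsigned pin, unsigned speed)
{
    unsigned shift = (pin & 15u) * 2u;

    ospeedr &= ~(3u << shift);                    /* clear the old setting */
    ospeedr |= (uint32_t)(speed & 3u) << shift;   /* write the new one */
    return ospeedr;
}
```

On target this would be something like `GPIOB->OSPEEDR = ospeedr_set(GPIOB->OSPEEDR, 2, 2);` for each QSPI pin, where the port and pin number depend on your board.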

For some of the Micron devices the number of dummy cycles increases with speed, giving the device a chance to get the initial prefetch from the array started.

I seem to recall the status from the Winbond repeats, so dummy cycles might be usable to skip the first access. Also the internal registers might have different timing from the burst optimized array transfers, and I'm pretty sure when writing them you should busy spin-loop them as you would for other writes.
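
A busy spin-loop does not have to hang forever when the status read is corrupted. Below is a minimal sketch of a bounded poll of the BUSY bit (bit 0 of Status Register-1 on Winbond parts). The callback type, the result enum and the fake reader are all hypothetical names for this example; a real reader would issue the Read Status Register-1 command (0x05) over QUADSPI:

```c
#include <stdbool.h>
#include <stdint.h>

#define W25Q_SR1_BUSY 0x01u   /* bit 0 of Status Register-1 */

/* Hypothetical callback supplied by the application: reads Status
 * Register-1 into *sr and returns false on a bus error. */
typedef bool (*read_sr1_fn)(uint8_t *sr, void *ctx);

typedef enum { FLASH_READY, FLASH_TIMEOUT, FLASH_BUS_ERROR } flash_wait_result;

/* Poll BUSY at most max_polls times instead of spinning indefinitely. */
static flash_wait_result flash_wait_ready(read_sr1_fn read_sr1, void *ctx,
                                          uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; ++i) {
        uint8_t sr = 0;
        if (!read_sr1(&sr, ctx)) {
            return FLASH_BUS_ERROR;
        }
        if ((sr & W25Q_SR1_BUSY) == 0u) {
            return FLASH_READY;
        }
    }
    return FLASH_TIMEOUT;
}

/* Fake reader for host-side experiments: ctx points to the number of
 * polls that should still report BUSY. */
static bool fake_read_sr1(uint8_t *sr, void *ctx)
{
    unsigned *busy_polls_left = (unsigned *)ctx;

    if (*busy_polls_left > 0u) {
        --*busy_polls_left;
        *sr = W25Q_SR1_BUSY;
    } else {
        *sr = 0u;
    }
    return true;
}
```

A `FLASH_TIMEOUT` result at a high clock that disappears at a lower clock points to a timing problem on the status read rather than a flash that is genuinely busy.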


JohnS · ‎2022-03-21

@Andreas Bolsch & @Community member , thank you for the insight. @Andreas Bolsch , I did not measure these with my scope; I recorded what was configured/reported by the MCU. I'll have to examine the board to determine where I can connect the scope to probe the clock signal. It sounds like that is an important piece of information for digging deeper into this.

It also sounds like I need to do a little math on the CSHT to make sure that it's hitting the correct deselect time. Is there a drawback or issue of having too high of a CSHT, aside from it creating small delays in your ability to transmit messages?
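
The CSHT math can be sketched as follows: find the smallest number of QSPI clock cycles that lasts at least the required deselect time, then write that number minus one into the CSHT field (since CSHT = 0 means one cycle). The function name is illustrative, and the deselect time passed in must come from the flash datasheet:

```c
#include <stdint.h>

/* Smallest number of QSPI clock cycles lasting at least t_min_ns.
 * Integer arithmetic: cycles = ceil(t_min_ns * qspi_hz / 1e9), minimum 1. */
static unsigned csht_cycles_needed(uint32_t t_min_ns, uint32_t qspi_hz)
{
    uint64_t num = (uint64_t)t_min_ns * qspi_hz;
    unsigned cycles = (unsigned)((num + 999999999ull) / 1000000000ull);

    return cycles < 1u ? 1u : cycles;
}
```

With the 10 ns figure mentioned earlier in the thread, this gives 1 cycle at 100 MHz and below, and 2 cycles at 108 MHz. A longer requirement scales the same way: an illustrative 50 ns would need 6 cycles at 108 MHz.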

I did read that the Winbond part can adjust its drivers, so I need to dig deeper into the datasheet on that as well.

Once again, thank you for your insight and taking the time to share your experience!

Andreas Bolsch · ‎2022-03-21

I wouldn't blindly trust the clock settings reported by CubeMX; there are too many bugs and oddities there. The clock tree *is* complicated, hence verification is mandatory.

Increasing CSHT by one has rather little impact on speed. E.g. for read accesses in memory-mapped mode on consecutive addresses it gives virtually no penalty at all, as in this case CS remains asserted anyway.

Indirect reads and writes are always affected, but one additional clock of CS high time is equivalent to the time for one bit (SPI mode) or one nibble (QPI mode), so in most cases a very tiny fraction of the whole operation.
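
A quick way to see how small that fraction is, as a sketch. The cycle counts in the note below are assumptions for illustration only (8 command clocks, 24 address clocks, 8 dummy clocks, then 256 bytes on four data lines at 2 clocks per byte), not figures from this thread:

```c
/* Fraction of the total time (transaction plus extra nCS high time)
 * that is spent on extra_cs_cycles additional cycles of nCS high time. */
static double cs_overhead_fraction(unsigned transaction_cycles,
                                   unsigned extra_cs_cycles)
{
    return (double)extra_cs_cycles /
           (double)(transaction_cycles + extra_cs_cycles);
}
```

Under those assumptions a 256-byte read takes 8 + 24 + 8 + 512 = 552 clocks, so `cs_overhead_fraction(552, 1)` comes out below 0.2%.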
