2024-11-07 10:47 AM
Hello,
I want to execute code located in an external 16-bit NOR flash, e.g. on an STM32H7x3 evaluation board.
I can run the NOR flash demo (erase, write, read) successfully, but have no idea how to build code for it and program that code into the flash.
After two days of googling I found plenty of examples for xSPI, but none seems to suit this type of flash.
Can somebody point me in the right direction, e.g. how to configure the linker script and how to program the code using the debugger or STM32CubeProgrammer? Any example code or application notes (ANs)?
Thank you
2024-12-02 12:37 PM
I'm running the benchmarks on this H753-EVAL2 hardware. I'll check the FMC NOR flash initialization and get back to you. Thank you.
2024-12-02 12:44 PM
For speed, check the caching and MPU_Config()
Generally, external buses and memory are markedly slower than internal memory. The external bus is going to run at 100 MHz or less, versus 400-480 MHz for the MCU.
2024-12-02 12:58 PM
NOR flash is exceedingly slow to erase/write, and generally doesn't allow reading from it (e.g. executing code) while doing so.
The QSPI flash could be used as mass storage with 4 KB sectors, e.g. with FATFS.
The preferred storage would be a large NAND flash in the form of a microSD card or an eMMC chip, where much of the complexity of wear management and erased-block pools is handled by the device's own hardware.
2024-12-02 10:33 PM
@Tesla DeLorean wrote:
For speed, check the caching and MPU_Config()
Generally external buses and memory are markedly slower than internal memory.
I set the MPU region to non-cacheable and the performance dropped significantly.
Is there any reason why code execution from a parallel 16-bit/70 ns NOR should be slower than from an 8-bit/50 MHz DTR QSPI?
2024-12-03 04:52 AM
Forget about the FFT for the moment, as it's a bit tricky: there are many files (many dependencies) to relocate to the correct region.
Try running a simple algorithm where you know how it works and how to map it.
Also check the FMC timings: are they optimal for the memory?
2024-12-03 04:53 AM - edited 2024-12-03 05:03 AM
Hello,
I'm trying to understand the FMC NOR initialization as done in the FMC_NOR demo provided by CubeMX for the STM32H743_EVAL2 board.
I debugged the code on the STM32H753_EVAL2 board.
In the demo, the MCU clock is set to 400 MHz and HCLK to 200 MHz.
The NOR timing parameters are configured as follows:
NOR_Timing.AddressSetupTime = 9;
NOR_Timing.AddressHoldTime = 1;
NOR_Timing.DataSetupTime = 5;
NOR_Timing.BusTurnAroundDuration = 4;
NOR_Timing.CLKDivision = 4;
NOR_Timing.DataLatency = 2;
NOR_Timing.AccessMode = FMC_ACCESS_MODE_B;
For Mode 2/B, the read timing of the H7 FMC is specified in the reference manual RM0433.
As far as I understand this timing, data is read by the H7 MCU at the rising edge of /NE (0->1).
According to the board schematic, the EVAL2 Rev E is equipped with an MT28EW128ABA1LPC-0SIT NOR flash. Its datasheet specifies an access time of 70 ns, so data is valid at most 70 ns after the falling edge of /NE (1->0).
According to the H753ii datasheet DS12117 Rev 9, table 163, the H7 requires a data setup time of at least 11 ns.
So, as I understand it, the total read transaction time for a 70 ns flash must be at least 70 ns + 11 ns = 81 ns.
In the demo, the FMC is configured with ADDSET + DATAST = 9 + 5 = 14 HCLK cycles, i.e. 14 * 5 ns = 70 ns.
Using a scope, I measure ~75 ns for the memory read transaction (/NE 1->0 to 0->1) and approx. 60-65 ns from /NE low to data valid.
Are my understanding and calculation of the FMC NOR flash read timing and parameters correct?
If so, the demo setup only works for a 60 ns flash. Am I right?
BTW:
According to the document "how-to-configure-the-fmc-peripheral-to-interface-an-stm32-mcu", the timing parameters for operating the MT28EW flash should be calculated as:
tACC = 60 ns
tHCLK = 1 / 200 MHz = 5 ns
ADDSET = tACC / tHCLK = 60 ns / 5 ns = 12 HCLK cycles
DATAST = tWP / tHCLK = 35 ns / 5 ns = 7 HCLK cycles
ADDSET + DATAST = 12 + 7 = 19 HCLK cycles -> 95 ns
Who is right?
Thank you
2024-12-03 10:50 AM
@regjoe wrote:
Is there any reason why code execution from a parallel 16-bit/70 ns NOR should be slower than from an 8-bit/50 MHz DTR QSPI?
In most parallel NOR flash datasheets I found that a "Page Read" feature is supported.
This seems not to be available in the H7 NOR controller, only in the NAND controller. IMHO a page read could speed up a cache fill operation. Also, the xSPI controller and flash seem to be optimized for sequential read operations.
Could this be the reason why the FFT demo running from parallel NOR is 2x slower than from DQSPI NOR?
Or is the QSPI/OSPI peripheral specifically optimized for code execution, while the FMC NOR interface is not intended to be used for it?
I wonder why I cannot find any STM32 application that runs code from parallel NOR.
Thanks
2024-12-04 02:43 AM
Regarding the scope screenshot (showing fetches of code located in the parallel NOR flash):
The measured cycle time for a sequence of random read accesses is approx. 75 ns, assuming each /OE pulse is one 16-bit data transfer.
This means the data transfer rate is approx. 2 B / 75 ns ≈ 26.7 MB/s, about 1/4 of the theoretical maximum of 100 MB/s for dual-quad SPI at 50 MHz DTR.
I guess this is the main reason why the FFT code execution from parallel flash is 2x slower than from DQSPI.
2024-12-06 07:06 AM
Hello,
To speed up sequential read access, most parallel NOR flash devices implement the so-called Asynchronous Page Read mode.
The first read from an address is comparatively slow, e.g. 70 ns. If the following address is in the same page, subsequent reads are much faster, e.g. 15-25 ns (see https://community.infineon.com/t5/Knowledge-Base-Articles/Initial-Access-Time-and-Page-Access-Time-in-NOR-Flash/ta-p/254636#).
Some flash devices implement a so-called Synchronous Read Mode. This requires additional signals, and it seems that such devices are only available from Infineon and are quite expensive.
Unfortunately, it seems that the H7 does not implement the Asynchronous Page Read mode. At least I cannot find an appropriate timing in the datasheets, and even if it exists, I don't know how to configure the FMC to support this feature.
Any idea? Is Page Read Mode not available in H7 devices?
2024-12-15 05:59 AM
OK, I got confirmation from ST that the burst feature is not supported in asynchronous mode. This explains why code execution from asynchronous parallel NOR is slower than from xSPI NOR, and why it is not recommended.
I think I'll keep using ST's 2 MB flash MCUs because of the faster code execution from internal flash, and probably use a parallel NOR for scattered const data. As already mentioned, the dual QSPI is already used in non-memory-mapped mode for writing log data.