Nasty little "Cache New" Nut from the lightning fast STM32H7

flyer31
Senior

In my short and happy LED Hello World example from last week, I use two calculation loops for the LED blinking delays. Playing around with this, I noticed a strange effect: on about every 5th to 10th compile + test try, after even modest changes (I pimped it up with MCO output, and it was even sufficient to change the configuration setting of MCO1 from 0 to 2 ... but it did not happen if I did this during runtime in the watch window ...), the nice symmetric LED blinking dared to convert to a crazily pronounced and asymmetric flashing mode ... . This got me extremely upset.

At first I thought it might be some crazily hidden issue of the compiler, or the optimizer, or even the chip ... . To rule out the compiler I downloaded the new Keil 5.26, just to recognize to my disappointment, after 1 hour of early morning download time, that it is Win64 ... . In a disappointed attempt to save the life of my good old WinXP PC, I then decided to give AC6-Eclipse a further try, but after several crazy hours fighting with the ST-Link connection in Eclipse I gave up. I suddenly had the ingenious idea that Keil might have supplied only the installer as Win64, while the software itself stayed Win32 ... . Motivated like this I went to a Win10 PC and installed the new Keil, and was happy that the folder structure looked so similar to my familiar Keil installation ... but after I copied the new Keil folder to my good old WinXP PC, I had to recognize that Keil has ended its Win32 era ... very dramatic :). Only when I tested around on the new Win10 PC with Keil 5.26 was I somehow relieved to see that this strange blink flash asymmetry happened also with the new Keil ... . And only then did I start looking at the addresses of the assembly lines of my delay loop in more detail.

[Poetic side remark, maybe better skip this :)] On the marvelous beach of knowledge, and in the misty and finely grained beach part of virtual knowledge, K&R for sure left a deep imprint with their ingenious C. But whether Eclipse will really be some lasting imprint there, or only some huge sand castle which will be blown away by the wind in some time, the next ten years will show. I must admit that Eclipse really is impressively nice for Android programming ... . But after I came back to Keil after my early morning Eclipse experience yesterday, I was positively impressed how such complicated relationships as controller programming can be presented so nicely, clearly and simply as in the Keil environment. (The exception is their 1 hour download time by now, and only because of this stupid CMSIS stuff ... the only good thing about this is its unspellable name, it somehow comes close to a nice shortcut for cumbersome.)

... but after all this, I got the solution ... it depends on the start and end addresses of the generated assembly code. The loop has exactly 6 assembly lines with about 10-20 bytes ... (at opt level 4 it reduces to 4 assembly lines, as the STR and LDR commands for the global variable are then cut away). If these 6 lines cross a 32-byte page boundary, then the loop time increases by a factor of 5...20 ... . So the loops starting at addresses like 5C0, but also 58C or 5BA, were fast loops (lightning fast 2.5 nsec per command at 400 MHz), but the loops starting at addresses like 5D6 or 59E were crazily slow ... really a factor of 5...20 slower ... .
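To make the boundary arithmetic concrete, here is a minimal sketch in plain C that checks whether a code range spans two 32-byte blocks and how many padding bytes would push it to the next block start. The function names and the 16-byte loop length in the usage note are assumptions for illustration (the post only says "about 10-20 bytes"), and the sketch models nothing but the address arithmetic, not the real flash or prefetch timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [start, start + length)
 * touches more than one aligned block of block_size bytes.
 * block_size must be nonzero. */
bool crosses_block(uint32_t start, uint32_t length, uint32_t block_size)
{
    if (length == 0u) {
        return false;
    }
    uint32_t first_block = start / block_size;
    uint32_t last_block  = (start + length - 1u) / block_size;
    return first_block != last_block;
}

/* Hypothetical helper: number of padding bytes needed in front of
 * start so that the code begins on the next aligned block start
 * (0 if start is already aligned). */
uint32_t padding_to_align(uint32_t start, uint32_t block_size)
{
    return (block_size - (start % block_size)) % block_size;
}
```

With an assumed loop length of 16 bytes, `crosses_block(0x5C0, 16, 32)` is false, while `crosses_block(0x59E, 16, 32)` is true and `padding_to_align(0x59E, 32)` gives 2, so two bytes of padding would move that loop to 0x5A0.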

To solve this issue, it would be nice to have an ALIGN command for the code ... . But unfortunately I did not find one ... . There is an ALIGN command in the Keil assembler (automatically inserting NOPs in front of such for loops), but not for C code as far as I can see. On the Internet you find reports about other compilers with the possibility to define "#pragma align 32" or so, but unfortunately not in my beloved Keil ... . I will post a forum question to them about this, although if they come with this on their crazy new Win64 horse, it really will kill my good old WinXP working PC ... . Or does anybody have some ingenious idea how to solve this issue with standard C (maybe it will work with inline assembly and ALIGN 32, I will have to try ...)?
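One possible direction, as a hedged sketch only: GCC and Clang-style compilers accept an `aligned` attribute on functions, which places the function entry on a 32-byte boundary. Whether the Keil compiler used in this post accepts it is an assumption that is not verified here, and the function name `delay_loop` is made up for illustration. Note also that this aligns the function entry, not the loop itself, so the loop still starts a few bytes later after the prologue.

```c
#include <stdint.h>

/* Assumption: GCC/Clang-style attributes. Not verified for the Keil
 * toolchain discussed in this post.
 * aligned(32): place the function entry on a 32-byte boundary.
 * noinline:    keep the loop in this function, so the alignment is
 *              not lost by inlining into the caller. */
__attribute__((aligned(32), noinline))
uint32_t delay_loop(uint32_t count)
{
    /* volatile keeps the compiler from optimizing the empty loop away */
    volatile uint32_t i;

    for (i = 0u; i < count; i++) {
        /* busy wait */
    }
    return i;
}
```

A call like `delay_loop(1000000u)` would then replace the inline for loop. If the toolchain rejects the attribute, an assembly routine with ALIGN 32 in front of the loop label remains the fallback.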

A nice point about CubeMX is the clock config sheet. Design and idea look copied from the ingenious STM32F4 sheet STM32F4xx_Clock_Configuration_V1.0.1.xls ... and I would be much happier to have such an XLS also for the STM32H7, but at least they pimped it up very nicely. To pimp up this blog, I insert a code snippet below from the "nice and easy code fan group" for the RCC initialisation in those 90% of applications where you have an external xtal/oscillator and do not want to waste microseconds of controller startup time and kBytes of code memory on the blown up RCC configuration code from CubeMX. Only, they seem to have some "secret info" in there ... I did not find the max clock restrictions for HPRE and D1PRE-D3PRE anywhere in the RM or DS (but maybe I did not look carefully enough ... this would be important info there of course) ... except in the CubeMX clock configuration overview sheet.

If you put the enclosed SystemClock_Config function and the SystemInit function into some C startup file, then together with the "startup_stm32h743xx.s" assembly file, all is done for a good and slim start into STM32H7 programming (though I would STRONGLY recommend also adding the file stm32h7xx_it.c ... just please strip the last two interrupts with HAL invocation there ... I can really only be baffled how the HAL people dare to link their blown up code segments to such a basic file).

Some requests for the STM32H7 manual people:

  • Please specify the 100 MHz and 200 MHz limits in the register descriptions (e.g. in RM 8.7.6 RCC_D1CFGR) ... I finally found them in Figure 40 ... but it would be nice to have them in the CFGR register descriptions if possible (or to refer to Fig. 40 from the CFGR register descriptions ... this probably makes sense anyway ... Fig. 40 is not easy to find, but very useful).
  • Please check the terms "rcc_hclk1,2,3,4" - they seem to be identical ... I think for the reader it would be much better to use only rcc_hclk in the complete manual, otherwise this is a bit confusing.

2 REPLIES
flyer31
Senior

... I have now solved the issue with a small DelayWithoutTimer function written in assembly code (see Keil forum). There the ALIGN 32 can be used, and now all is fine ... no further surprises.

flyer31
Senior

... meanwhile I found the miracle command: SCB_EnableICache().

If you call this at the start of main, then there are no more such problems ... the for loops then run reliably at full speed, no matter whether they extend across a flash page boundary. (Do not ask me what happens if frequent interrupts occur ... I will have to check this out later ... but I hope to get by without frequent interrupts anyway.)