
Extend SysTick to 32 bit / Interrupt-safe DCACHE handling

flyer31
Senior

In my software, I usually extend the 24-bit SysTick counter to 32 bits with the trick shown in the code segment below. For fast access to volatile interrupt data I usually prefer a do-while construct like the one in "GetSysTickCompleteVal", because then I do not need to fight with disabling / enabling interrupts.

This works perfectly on STM32F4.

To my surprise this does NOT work on STM32H7, even if the DCACHE is disabled. I was very relieved when I found that I can fix this by just adding __DSB() in my access function (but it took me several hours of Googling around DCACHE to get this idea).

I looked for several hours through the cache descriptions in the STM32H7 Programming Manual (plus the ARM Cortex-M7 TRM and Generic User Guide; this morning I also found ARM AN321), but all of this is terribly cumbersome and strangely described.

Generally I am happy now with my __DSB(). But I would like to know whether there are other possibilities to solve this. Does anybody know?

As an alternative to __DSB(), could I possibly use one of these strange (poorly described) cache maintenance operations ICIMVAU, DCIMVAC, DCISW, DCCMVAU, DCCMVAC, DCCSW, DCCIMVAC, DCCISW for this purpose? The ARM Cortex-M7 Generic User Guide also discusses the possibility of "Force Write-Through" for data. This sounds nice, but is it possible? (Could I e.g. define the volatile variable vuiSysTickValBase in such a way that "write-through" / cache disabled is enforced for this variable in particular, or would this be inefficient or complete nonsense?)

Code example to extend the SysTick timer to 32 bits (this example can easily be modified for a larger bit count, or for dropping e.g. the last 8 bits if nsec resolution for SysTick is not required):

volatile unsigned int vuiSysTickValBase;
 
// 1.6M = 200MHz / 125Hz ... this generates a SysTick handler interrupt
// every 8ms (125Hz) (CPU AHB at maximum frequency of 200MHz)
#define SYSTICK_COUNTS  1600000

void SysTickInit(void){
  SysTick->LOAD  = SYSTICK_COUNTS - 1;
  // IP_SysTick: interrupt priority constant, defined elsewhere in the project
  NVIC_SetPriority (SysTick_IRQn, IP_SysTick);
  // STM32H7: Y code chips do NOT allow AHB/8, see errata, better use AHB
  SysTick->CTRL  = SysTick_CTRL_CLKSOURCE_Msk |
                   SysTick_CTRL_TICKINT_Msk   |
                   SysTick_CTRL_ENABLE_Msk;
}
 
void SysTick_Handler(void){
  // subtract! (SysTick is counting down...)
  vuiSysTickValBase -= SYSTICK_COUNTS;
}
 
// To get the "total 32 bit" value in an "interrupt-safe" way, the
// following code was fine on STM32F4:
unsigned int GetSysTickCompleteVal(){ 
  unsigned int uiBase, uiBase1, ui1; 
  do{ 
    uiBase= vuiSysTickValBase; 
    ui1= SysTick->VAL; 
    uiBase1= vuiSysTickValBase; 
  } while (uiBase != uiBase1);
  return uiBase + ui1;
}
 
// On STM32H7 this does NOT work ... if the interrupt arrives within the do loop,
// this will very frequently give wrong values (old uiBase value combined with
// current SysTick VAL ... ).
// On STM32H7 this does not even work if DCACHE is not enabled.
// In my tests this __DSB() construct turned out to be required on STM32H7!
unsigned int GetSysTickCompleteVal_H7(){ 
  unsigned int uiBase, uiBase1, ui1; 
  do{ 
    uiBase= vuiSysTickValBase; 
    ui1= SysTick->VAL;
    __DSB(); 
    uiBase1= vuiSysTickValBase; 
  } while (uiBase != uiBase1);
  return uiBase + ui1;
}

6 REPLIES
alister
Lead

Following because I'm interested that you'd need a data sync barrier.

Does the decreasing tick count ever confuse? I'd make it increasing by adding SYSTICK_COUNTS in the interrupt and subtracting VAL in the Get function.
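The up-counting variant suggested here could look roughly like the sketch below. It is written for a host PC, so `sim_systick_val` is a hypothetical stand-in for SysTick->VAL, and `tick_handler_up` / `get_tick_up` are illustrative names, not part of the original code. The read-compare loop is the same as in GetSysTickCompleteVal, so on a Cortex-M7 target the same barrier question applies.

```c
#include <stdint.h>

#define SYSTICK_COUNTS 1600000u

/* Hypothetical stand-in for SysTick->VAL so the sketch compiles on a host. */
volatile uint32_t sim_systick_val = SYSTICK_COUNTS - 1u;

/* Base value that now counts UP by one reload period per interrupt. */
volatile uint32_t tick_base_up = 0u;

/* Would be the body of SysTick_Handler on the target. */
void tick_handler_up(void) {
  tick_base_up += SYSTICK_COUNTS;
}

/* Monotonically increasing 32-bit tick: base plus ticks elapsed in this period. */
uint32_t get_tick_up(void) {
  uint32_t base, base1, val;
  do {
    base = tick_base_up;
    val = sim_systick_val;
    /* On the STM32H7 the original poster needed a __DSB() at this point. */
    base1 = tick_base_up;
  } while (base != base1);
  /* VAL counts down from SYSTICK_COUNTS - 1 to 0, so invert it. */
  return base + (SYSTICK_COUNTS - 1u - val);
}
```

With this layout the returned value grows with time, which makes "end time reached" comparisons easier to read, but it does not remove the need to read base and VAL consistently.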

flyer31
Senior

You are perfectly right, it depends a bit on the application whether this disturbs at all or not. (It depends on whether you check the end of a delay by comparing "upwards" or "downwards", i.e. in the same direction as the counting or in the reverse direction.)

But you can get such problems in both ways, whether this timer is counting up or down is not the crucial thing.

Now it is counting down. This counter is split into my artificial "base value", which mainly controls the "high 8 bit", and the SysTick VAL, which only influences the "low 24 bit".

If the error happens, it will be around a SysTick VAL underflow: the base value will still be the old one, but the SysTick VAL will already be the reloaded value. So e.g. if the base value is the (arbitrary) value 0x87654321 before rollover, then the low 24 bit will count (you won't see every tick in software, so I show here only about every 5th-10th tick, but this does not really matter) e.g. 0x00000A, 0x000004, 0x1869FE, 0x1869F9, ... (The SysTick reload value of 1600000 is 0x186A00, so the base value will change to 0x87654321 - 0x00186A00 = 0x874CD921.)

The correct "CompleteVal" values would then be (e.g. on STM32F4 WITHOUT DSB, but on STM32H7 only WITH DSB):

  • ...
  • 0x8765432B (=0x87654321 + 0x0000A)
  • 0x87654325 (=0x87654321 + 0x00004)
  • 0x8765431F (=0x874CD921 + 0x1869FE)
  • 0x8765431A (=0x874CD921 + 0x1869F9)
  • ...

The failure sequence would be (on STM32H7 WITHOUT DSB):

  • ...
  • 0x8765432B (=0x87654321 + 0x0000A)
  • 0x87654325 (=0x87654321 + 0x00004)
  • 0x877DAD1F (=0x87654321 + 0x1869FE)
  • 0x8765431A (=0x874CD921 + 0x1869F9)
  • ...

The problem here is the spike value 0x877DAD1F, which results from combining the "old base value" (before rollover) with the already "new SysTick VAL" (after rollover).
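The arithmetic behind the two sequences can be checked with a few lines of host-side C. This is only a sketch of the numbers quoted above; `combine` and `base_after_rollover` are hypothetical helper names.

```c
#include <stdint.h>

#define SYSTICK_COUNTS 0x186A00u /* 1600000 */

/* What GetSysTickCompleteVal returns for a given base / VAL pair. */
uint32_t combine(uint32_t base, uint32_t val) {
  return base + val;
}

/* What the SysTick handler does to the base value on each underflow. */
uint32_t base_after_rollover(uint32_t base) {
  return base - SYSTICK_COUNTS;
}

/* Correct read after rollover: new base + new VAL  -> 0x8765431F */
uint32_t good_value(void) {
  return combine(base_after_rollover(0x87654321u), 0x1869FEu);
}

/* Faulty read: OLD base + new VAL -> spike 0x877DAD1F */
uint32_t spike_value(void) {
  return combine(0x87654321u, 0x1869FEu);
}
```

The spike is exactly one reload period (0x186A00) above the correct value, which is what a stale base value combined with a fresh VAL would produce.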

Most times I use delay checks where I calculate the end timepoint at the start of the delay and then always compare the current tick against it. In this case the wrong second sequence usually does not create a "catastrophic delay failure", as a TOO HIGH value does not matter in this checking direction.

BUT: it does matter in the very rare case that the "base value" additionally rolls over. In controller programming this is an even larger danger, as it might happen only VERY seldom, and finding such "VERY seldom delay problems" later can be very difficult (a very famous error of this kind occurred at "dear Microsoft" with Win 98: there the operating system crashed / hung after about 49.7 days of continuous operation, because it had an internal 32-bit millisecond counter which overflowed after that time, and they did not check what happens on rollover, so the delay would start hanging).

But sometimes, luckily, I also use delay checks where I store the delay start time in a fixed variable and then check for the delay end by evaluating the "CompleteVal" on every check. In this case the delay finishes immediately in this error condition, so luckily this error showed up very often and I recognized it immediately.

(It really is VERY important for reliable controller programming to check such delay functionality EXTREMELY carefully, and best to always do delay programming in a perfectly tested class environment. Otherwise, if you have some "accidental delay problem" in the final testing of a larger piece of software, it is nearly impossible to identify the problem.)
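The "store the start time and evaluate the difference" style can be sketched as below for a down-counting 32-bit tick. The function name `delay_expired_down` is hypothetical, and the unsigned subtraction is a common rollover-safe idiom, not necessarily the exact check used in the original project. It also shows why the spike value ends such a delay immediately: the difference wraps to a very large number.

```c
#include <stdint.h>

/*
 * Rollover-safe delay check for a tick that counts DOWN.
 * start: tick value stored when the delay began
 * now:   current tick value (e.g. from GetSysTickCompleteVal)
 * delay: requested delay in ticks
 * Returns 1 when at least 'delay' ticks have elapsed, else 0.
 */
int delay_expired_down(uint32_t start, uint32_t now, uint32_t delay) {
  /* Unsigned subtraction wraps modulo 2^32, so a base rollover is harmless. */
  uint32_t elapsed = start - now;
  return elapsed >= delay;
}
```

For example, with start = 0x87654330 the spike reading 0x877DAD1F gives elapsed = 0xFFE79611, so any realistic delay is reported as finished at once.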

flyer31
Senior

PS: Side remark: The ARM people are also quite lazy in not describing this problem in detail; I think it concerns quite a lot of ARM applications.

If you Google around, you will possibly find the technical support info enclosed as PDF ("ARM: Cortex-M3/M4 Interrupts Happening Twice?") (I preferred to append the PDF, as these large companies often have the bad habit of just closing down links if the content is in any way strange / "not on line").

And this definitely IS a strange hint: there they do NOT recommend using DSB, but instead recommend "filthy tricks" to get around a similar problem. Interesting anyway. But it is also interesting how they try to shift the responsibility for such things to the chip manufacturer. If they thought one step further, they could themselves come to the "ingenious idea" of checking what their own SysTick timer does in such a case.

flyer31
Senior

PS: I already tried such "filthy tricks" as inserting more commands in the SysTick interrupt handler, but this did NOT help (in fact I have about 10 more commands in this handler anyway; the software given in my post is only a very abbreviated version to show the crucial things).

I would just be very interested to know whether DSB is the "standard recommended way" in this case, or whether there are further possibilities through these other cache control commands. And I would like to understand why this problem persists if I switch off the DCACHE (i.e. if I do NOT switch on the DCACHE in my software).

Nice catch, thanks for sharing.

What I believe may happen here is that the vuiSysTickValBase variable is placed in memory tagged as Normal (in the MPU), so the processor can speculatively reorder its access across the access to SysTick->VAL. In the Cortex-M3/M4 (and M0/M0+) the Normal/Device/Strongly-ordered tags are ignored and memory accesses are never reordered; however, Cortex-M7 is different in this regard, as this is one of the ways to achieve higher perceived performance.

So, IMO, one way to deal with this - besides the barrier - might be placing the vuiSysTickValBase into a Device-tagged memory - either by creating such an area in one of the SRAMs using the MPU, or simply by using a register of some unused peripheral.
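As a rough illustration of the MPU route, the sketch below builds the attribute word for a small Device-type region. The bit positions follow the ARMv7-M MPU_RASR layout; the macro and function names are hypothetical, and the target-side register writes are only outlined in the comment, so treat this as a starting point to check against the Cortex-M7 documentation rather than a tested configuration.

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR fields (local, hypothetical macro names). */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region size = 2^(n+1) bytes */
#define RASR_B        (1u << 16)
#define RASR_C        (1u << 17)
#define RASR_TEX(n)   ((uint32_t)(n) << 19)
#define RASR_AP(n)    ((uint32_t)(n) << 24)
#define RASR_XN       (1u << 28)

/*
 * Attribute word for a Device region of 2^size_log2 bytes
 * (minimum 32 bytes, i.e. size_log2 = 5):
 * TEX=0, C=0, B=1 selects Device memory; AP=3 is full access; XN = no execute.
 */
uint32_t rasr_device_region(uint32_t size_log2) {
  return RASR_XN | RASR_AP(3u) | RASR_TEX(0u) | RASR_B |
         RASR_SIZE(size_log2 - 1u) | RASR_ENABLE;
}

/*
 * On the target (CMSIS), the value would be applied roughly like this,
 * with the region base aligned to the region size:
 *   MPU->RNR  = region_number;
 *   MPU->RBAR = region_base_address;
 *   MPU->RASR = rasr_device_region(5u);
 *   MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
 *   __DSB(); __ISB();
 */
```

The variable would then have to be placed inside that region, for example through a dedicated linker section, which is an assumption about the project setup and not something shown in this thread.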

Corollary: performance comes at a price.

JW

This is interesting and it could be correct. Thank you for the explanation, though it is hard for me to understand it completely. At least I just tried the CRC->IDR register for my vuiSysTickValBase (not using the CRC for anything else of course, and with the CRC enabled in AHB4), and with this CRC->IDR it works WITHOUT DSB on the STM32H750.

But using such an "unused device register" for such a purpose really is a bit too much for my nerves :). I would be anxious that at some later time I might accidentally use the register and then suddenly run into terrible problems (having already forgotten this "ingenious trick usage").

... googling for "device-tagged memory STM32H7" I now found a quite specific STM32H7 FAQ article https://community.st.com/s/article/FAQ-DMA-is-not-working-on-STM32H7-devices ... but specific does not mean "readable" ... this really is still quite hard stuff there... .

Is there an easy way to define some variables / some memory space as device-tagged? (A scatter file in Keil would be ok; I needed this already to place memory outside the "automatic zero" memory range. A simple attribute would of course be even nicer.) If it gets more complicated, maybe you can give a small example?

(In fact I am surprised that it should be even possible to define memory as device-tagged - I would expect something like this could be organized only by the chip manufacturer in chip design phase...).

Do you see any other possibility to avoid this? Is device-tagged memory the only way to go, or could I e.g. also try this "write-through" memory approach? (But I think my problem really is NOT linked to the DCACHE at all, as it persists if I run the software without DCACHE.)

If I stay with my original DSB solution: can you give me a hint how much performance I will lose by this? What would be the typical delay that such a DSB command introduces?

PS: One "slight sin" needs to be confessed: As I still stick on Keil 32bit on my WinXP PC, I use STM32H743 (100Pin) device setting, to program an STM32H750 (100Pin) device on my test board ... . Don't tell me that this could be the reason ... (I only just now recognized when I looked at the IROM/IRAM settings in the Keil Target Project page... still wondering whether there is some simple way to create "device-tagged memory" - this really sounds somehow a bit magic =) ). If I have some time I could also try on Nucleo H743, but I really do not expect changes, as so far all running nicely, and my software still very small (only 6kByte ROM, 2kByte RAM usage...).

PPS: Looking a bit at the first part of ARM AN321 concerning this memory barrier / memory ordering stuff, I am now considering really keeping this base variable in the CRC->IDR register, as this register really seems to be "free to use" even if CRC calculation is used. Maybe I will redefine the CRC typedef so that IDR gets a special name; then the compiler should warn me if anybody (or I) tries to do something else with this IDR. In ARM AN321 they strictly do NOT recommend defining SRAM as "non-normal", and elsewhere on the Internet some people say that a DSB at frequently executed places can delay execution by possibly hundreds of processor cycles each time. Both of these I really want to avoid (and this GetSysTickCompleteVal function is of course invoked quite heavily in my code).

... but if you (or somebody else) have some further / different remarks, I would be very happy to know... .