Precise Time Measurement in STM32MP1 with DWT CYCCNT

aadiljaleel · 2021-09-01

I am using the OSD32MP1 (based on the STM32MP157c) in Production Mode, with OpenSTLinux on the Cortex-A7 and FreeRTOS on the Cortex-M4. One of the tasks is to timestamp ADC data acquired by the M4 at very high speed, very precisely (think in the order of nanoseconds to microseconds). Note that only the time difference between measurements is important.

The on-chip RTC is available (it is assigned to the A7, but its registers are accessible to the M4). However, its sub-second resolution is only about 4 ms (PREDIV_S is 255, i.e. 1/256 s per step; see the Reference Manual for details), so it is not good enough.

A few Stack Overflow posts led me to use DWT_CYCCNT, i.e. the CPU cycle counter, to measure the time difference. The relevant portions of the code are as follows:

On the M4 side:

typedef struct tTimeStamp
{
    uint32_t nCPUFreq;
    uint32_t nCPUCycles;
    ...
}tTimeStamp;
    
...
 
tTimeStamp oTimeStamp;
 
...
 
oTimeStamp.nCPUCycles = DWT->CYCCNT;
oTimeStamp.nCPUFreq = HAL_RCC_GetSystemCoreClockFreq();

The last 2 statements run inside the FreeRTOS task right before the ADC values are read. The timestamps, along with other data, are handed over to the A7.
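For completeness: CYCCNT only counts once the DWT unit and its cycle counter have been enabled, which the Stack Overflow posts mentioned above cover. With the CMSIS headers this is usually done by setting CoreDebug_DEMCR_TRCENA_Msk in CoreDebug->DEMCR, clearing DWT->CYCCNT and setting DWT_CTRL_CYCCNTENA_Msk in DWT->CTRL. The sketch below writes the same bits through pointers so the logic can also be checked off-target. The function name cyccnt_enable is hypothetical; the addresses are the standard ARMv7-M ones, and whether the M4 debug logic is usable in your boot configuration is an assumption to verify against the STM32MP1 documentation.

```c
#include <stdint.h>

/* Bit masks as defined by the ARMv7-M architecture. */
#define DEMCR_TRCENA        (1UL << 24) /* enables the DWT and ITM units */
#define DWT_CTRL_CYCCNTENA  (1UL << 0)  /* starts the cycle counter      */

/* Standard ARMv7-M register addresses (Cortex-M4). */
#define DEMCR_ADDR       0xE000EDFCUL
#define DWT_CTRL_ADDR    0xE0001000UL
#define DWT_CYCCNT_ADDR  0xE0001004UL

/* Hypothetical helper: enable CYCCNT and reset it to 0.
 * On target, call it once before the scheduler starts, e.g.
 *   cyccnt_enable((volatile uint32_t *)DEMCR_ADDR,
 *                 (volatile uint32_t *)DWT_CTRL_ADDR,
 *                 (volatile uint32_t *)DWT_CYCCNT_ADDR);
 */
void cyccnt_enable(volatile uint32_t *demcr,
                   volatile uint32_t *dwt_ctrl,
                   volatile uint32_t *dwt_cyccnt)
{
    *demcr |= DEMCR_TRCENA;          /* DWT registers need TRCENA first  */
    *dwt_cyccnt = 0;                 /* start counting from zero         */
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA; /* counter now increments per cycle */
}
```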

On the A7 side (assuming one tTimeStamp captured at time T0 and another at time T1):

#include <stdint.h>

// Second to NanoSecond Conversion
#define SECTONS 1000000000.0

...

double ComputeTimeDiffNS(tTimeStamp oTS0, tTimeStamp oTS1)
{
    // to avoid reporting time diff at t0
    // and in case CPU frequency changes
    if (oTS0.nCPUFreq != oTS1.nCPUFreq)
        return -1;

    // Unsigned 32-bit subtraction is done modulo 2^32, so this is also
    // correct if CYCCNT overflowed (at most once) between T0 and T1.
    uint32_t nCyclesDiff = oTS1.nCPUCycles - oTS0.nCPUCycles;

    // double instead of float: a float has a 24-bit mantissa and cannot
    // keep ns precision over the full 32-bit cycle range.
    return (double)nCyclesDiff * SECTONS / (double)oTS0.nCPUFreq;
}

Is this the correct method to measure a very precise time difference using DWT->CYCCNT and HAL_RCC_GetSystemCoreClockFreq()? Is there a better, more precise method?

The above method gives me twice the time it should. While reading DWT->CYCCNT, I also toggle a pin and measure the interval between toggles with a logic analyzer. Say this time, tActual, is 2 ms. However, the above formula, i.e. CPU_Cycles / CPU_Frequency, returns tMeasured = 4 ms.

This seems to suggest that the formula should be CPU_Cycles / (2 * CPU_Frequency). So either the frequency needs to be doubled or the cycles need to be halved.

In the readouts, nCPUFreq is 208878528 (the maximum allowed per the Reference Manual is 209000000), so it must be correct and cannot be multiplied by 2.
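A quick sanity check for this kind of mismatch is to compare the raw CYCCNT delta with the number of cycles the logic-analyzer interval implies: at 208 878 528 Hz, 2 ms corresponds to about 417 757 cycles and 4 ms to about 835 514. If the raw delta matches the analyzer, the counter is fine and the problem lies further along the data path. The helper name is hypothetical; it is a plain integer conversion rounded to the nearest cycle.

```c
#include <stdint.h>

/* Hypothetical helper: expected CYCCNT delta for an interval measured
 * externally (e.g. with a logic analyzer), rounded to the nearest cycle.
 * interval_ns * freq_hz fits in 64 bits for intervals up to ~88 s
 * at ~209 MHz. */
uint64_t expected_cycles(uint64_t interval_ns, uint32_t freq_hz)
{
    return (interval_ns * freq_hz + 500000000ULL) / 1000000000ULL;
}
```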

CPU_Cycles could be divided by 2, but wouldn't that suggest the CPU goes through 2 cycles per clock cycle? Is that possible (the CPU counting on both rising and falling edges)?
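Independently of the factor of 2, the difference itself can be computed without a separate overflow branch and without float rounding, using integer arithmetic only. A minimal sketch with hypothetical helper names, assuming fewer than 2^32 cycles elapse between the two reads, which at 208 878 528 Hz means intervals shorter than about 20.56 s:

```c
#include <stdint.h>

/* Wrap-safe cycle delta: unsigned 32-bit subtraction is performed
 * modulo 2^32, so one CYCCNT overflow between the reads is harmless. */
uint32_t cycles_between(uint32_t c0, uint32_t c1)
{
    return c1 - c0;
}

/* Cycles to nanoseconds, rounded to nearest. cycles < 2^32 and
 * 1e9 < 2^30, so the 64-bit product cannot overflow. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t freq_hz)
{
    return ((uint64_t)cycles * 1000000000ULL + freq_hz / 2) / freq_hz;
}
```

For example, a read of 0xFFFFFF00 followed by 0x00000100 gives a delta of 512 cycles, about 2451 ns at 208 878 528 Hz.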

Originally asked at Stack Overflow

PatrickF · 2021-09-01

Hi,

I cannot answer for the DWT (it should behave as ARM specifies it).

Maybe the Cortex-M4 is stalled in WFI by FreeRTOS (so its clock is stopped for some time). But in that case, you should have found a lower value, not a higher one.

The value you found for the M4 frequency seems OK (on our boards we see a similar value, 208.877930 MHz; we have found a bug in the HAL frequency computation when using Frac-N, which can lead to a very small error, but not a factor of 2). As another option, Linux should give you the right Cortex-M4 frequency (e.g. "cat /sys/kernel/debug/clk/clk_summary | grep ck_mcu").
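On the A7 side, the ck_mcu rate can also be read from that file programmatically. This is a sketch under assumptions: debugfs must be mounted and readable (typically root only), and the column order "clock enable_cnt prepare_cnt protect_cnt rate ..." is what recent kernels print but may differ between kernel versions. The function names are hypothetical. The ppm helper shows that the HAL value from the question and the value quoted above differ by under 3 ppm, a tiny error, nowhere near a factor of 2.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one clk_summary line. ASSUMED column order:
 * clock enable_cnt prepare_cnt protect_cnt rate ...
 * Returns 0 and stores the rate in Hz if the line is for clock 'clk'. */
int parse_clk_rate(const char *line, const char *clk, unsigned long *rate_hz)
{
    char name[64];
    unsigned en, prep, prot;
    unsigned long rate;
    if (sscanf(line, "%63s %u %u %u %lu", name, &en, &prep, &prot, &rate) != 5)
        return -1;              /* header, separator or malformed line */
    if (strcmp(name, clk) != 0)
        return -1;              /* some other clock */
    *rate_hz = rate;
    return 0;
}

/* Scan the whole file (e.g. /sys/kernel/debug/clk/clk_summary). */
int read_clk_rate(const char *path, const char *clk, unsigned long *rate_hz)
{
    char line[256];
    int rc = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (rc != 0 && fgets(line, sizeof line, f) != NULL)
        rc = parse_clk_rate(line, clk, rate_hz);
    fclose(f);
    return rc;
}

/* Relative difference between two frequencies, in ppm. */
double ppm_diff(uint32_t f_measured, uint32_t f_reference)
{
    return ((double)f_measured - (double)f_reference) / (double)f_reference * 1e6;
}
```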

For the RTC, note that in our software we set PREDIV_S to 32767 to get a higher sub-second resolution (but still in the 30 µs range).
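The sub-second step follows from the synchronous prescaler. Assuming PREDIV_A is chosen so that ck_spre is 1 Hz (for example LSE at 32.768 kHz with PREDIV_A = 127 and PREDIV_S = 255, or PREDIV_A = 0 and PREDIV_S = 32767), one step lasts 1/(PREDIV_S + 1) s: about 3.9 ms for 255 and about 30.5 µs for 32767. A hypothetical helper:

```c
#include <stdint.h>

/* RTC sub-second resolution in ns, ASSUMING ck_spre = 1 Hz:
 * the sub-second counter then takes PREDIV_S + 1 steps per second. */
double rtc_subsec_resolution_ns(uint32_t prediv_s)
{
    return 1e9 / (double)(prediv_s + 1u);
}
```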

Maybe STGENR is another option, independent of the Cortex-M4 frequency.

STGEN runs by default on HSI 64 MHz, which gives you a resolution of about 15.6 ns, but HSI is not a high-precision oscillator (±1%).

Alternatively, STGEN can run on HSE 24 MHz, which is more precise (a few tens of ppm) but gives a resolution of about 42 ns.
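The resolution/accuracy trade-off can be put in numbers: the tick period is 1/f, while the worst-case error accumulated over an interval scales with the oscillator tolerance. Over a 1 ms interval, ±1 % on HSI means up to 10 µs of error, far more than one 15.6 ns tick. The 50 ppm used for HSE in the usage note is only an example of "a few tens of ppm"; the helper names are hypothetical.

```c
/* Tick period, in ns, of a counter clocked at f_hz. */
double tick_ns(double f_hz)
{
    return 1e9 / f_hz;
}

/* Worst-case error, in ns, accumulated over an interval by a clock
 * with the given tolerance in ppm (1 % = 10 000 ppm). */
double drift_ns(double interval_ns, double tolerance_ppm)
{
    return interval_ns * tolerance_ppm * 1e-6;
}
```

For a 1 ms interval: HSI at 64 MHz gives a 15.625 ns tick but up to about 10 000 ns of drift; HSE at 24 MHz with an assumed 50 ppm gives a 41.7 ns tick and about 50 ns of drift.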

See also this post: https://community.st.com/s/question/0D53W00000oXAqhSAG/how-can-i-get-access-to-m4-timers-from-a7-linux-is-it-possible-

As STGEN is read from the Cortex-M4 over the AXI bus through asynchronous bridges, each read must suffer a few nanoseconds of additional latency.

Regards.

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.


KnarfB · 2021-09-01

If your goal is periodic ADC conversion, the most precise method is to use a hardware timer event to trigger the ADC conversions without any SW intervention. Moreover, automatically triggered DMA transfers of the ADC results can lower the SW load of the conversion process. This method is very precise, and no time-stamping is needed.
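With a timer-triggered ADC, the time of each conversion follows from its index. A sketch, assuming an STM32 general-purpose timer in up-counting mode triggers the ADC on each update event, i.e. every (PSC + 1) * (ARR + 1) timer-clock ticks; the helper names and the 200 MHz timer clock in the usage note are hypothetical:

```c
#include <stdint.h>

/* Timer-clock ticks between two update events (up-counting mode). */
uint64_t trig_period_ticks(uint32_t psc, uint32_t arr)
{
    return ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
}

/* Time of the n-th triggered conversion relative to the first, in ns
 * (truncated). Whole seconds and the remainder are converted separately
 * so the intermediate products stay within 64 bits. */
uint64_t sample_time_ns(uint64_t n, uint32_t psc, uint32_t arr, uint32_t f_tim_hz)
{
    uint64_t ticks = n * trig_period_ticks(psc, arr);
    return (ticks / f_tim_hz) * 1000000000ULL
         + ((ticks % f_tim_hz) * 1000000000ULL) / f_tim_hz;
}
```

For example, with a 200 MHz timer clock, PSC = 0 and ARR = 199 the ADC is triggered every 200 ticks (1 µs), so sample 1000 lies 1 ms after sample 0.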

If you are still going for time-stamping, CYCCNT or a hardware timer counter register may be used. Two remarks on your code:

The relevant SW statements should be encapsulated in a critical section (or run with interrupts disabled) to prevent task scheduling or interrupts from delaying the time stamps. But this may affect overall performance.
If you test for oTS0.nCPUFreq != oTS1.nCPUFreq, you are assuming the frequency can change. But if so, an intermediate frequency change between the two measurement points may slip through and remain undetected.

aadiljaleel · 2021-09-01

Hi, thanks for your response. The goal is time-stamping. The main problem I am currently facing is that Time != CYCCNT/Freq but rather CYCCNT/(2*Freq).

Regarding your two points: thanks! I will see to them once everything else works.

aadiljaleel · 2021-09-02

Hi @PatrickF,

Thanks for your reply! It was helpful: it reinforced my belief that the DWT should behave as ARM specifies.

It turned out the problem was a very consistent packet drop between the M4 and the A7, exactly by a factor of 2. I wasted too much time looking in other directions, but at the end of the day, I learnt the importance of a packet counter.
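A per-packet sequence number makes this kind of drop visible immediately. A minimal receiver-side sketch, assuming the M4 adds a free-running 32-bit counter to each packet (this field is not part of the tTimeStamp above; all names are hypothetical) and that packets arrive in order. With every other packet dropped, each call would report a gap of 1.

```c
#include <stdint.h>

/* Hypothetical receiver-side state for detecting lost packets. */
typedef struct {
    uint32_t next_seq; /* sequence number expected next          */
    uint32_t dropped;  /* total packets lost so far              */
    int      started;  /* 0 until the first packet has been seen */
} seq_tracker_t;

/* Returns how many packets are missing just before this one.
 * The subtraction is modulo 2^32, so counter wrap-around is handled;
 * out-of-order or duplicate packets are NOT handled. */
uint32_t seq_track(seq_tracker_t *t, uint32_t seq)
{
    uint32_t gap = 0;
    if (t->started)
        gap = seq - t->next_seq;
    t->started = 1;
    t->next_seq = seq + 1u;
    t->dropped += gap;
    return gap;
}
```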

I will try to take some high-precision readings with DWT->CYCCNT. Hopefully, it will do the job. Otherwise I will switch to your suggestion of STGENR (thanks for sharing the knowledge!).

PatrickF · 2021-09-02

Thanks for the feedback.

I was confident that DWT was really counting what it should.

Regards.
