STM32G4 FMAC Filter Performance Benchmarking

evgenmlnk
Associate II

Dear community,

I'm currently working on a student project where we need to analyze the performance of the FMAC unit of the STM32G4. My idea is to implement an FIR filter and measure how many clock cycles the FMAC operation takes to filter a predefined set of test data. I also plan to measure the execution time. After that, I intend to compare the results by implementing the same FIR filter in software using the CMSIS library.

I found AN4841, which presents performance measurements of the CMSIS FIR filter on two different STM32 platforms.

Since I've never done this kind of performance benchmarking before, my question is: how are the number of cycles and the duration measured? Should we use the SysTick timer or the DWT unit for time measurement?

Thank you!

Accepted Solutions
BarryWhit
Senior III

I second TDK's benchmarking advice - measuring duration over many iterations and calculating the average time per iteration is more precise than trying to measure a single iteration. That said, it depends on how long a single iteration takes. What you're really trying to do is to ensure that the measurement overhead is negligible compared to the processing time, so that your measurement isn't skewed.
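
The averaging idea can be sketched roughly as follows. This is only an illustration, not code from the thread: `read_cycles_fn`, `work_fn`, `measure_overhead`, `average_cycles`, and the `sim_*` helpers are hypothetical names. On the target, `read_cycles` would be assumed to return a free-running cycle counter such as `DWT->CYCCNT`; here a simulated counter stands in so the logic can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback types: read_cycles returns a free-running 32-bit
 * cycle counter, work runs one iteration of the code under test. */
typedef uint32_t (*read_cycles_fn)(void);
typedef void (*work_fn)(void);

/* Cost of two back-to-back counter reads, i.e. the measurement overhead. */
static uint32_t measure_overhead(read_cycles_fn read_cycles)
{
    uint32_t start = read_cycles();
    return read_cycles() - start;
}

/* Average cycles per iteration over `iterations` runs. The counter-read
 * overhead is subtracted once; loop overhead is NOT removed and becomes
 * part of the per-iteration figure on real hardware. */
static uint32_t average_cycles(read_cycles_fn read_cycles, work_fn work,
                               uint32_t iterations)
{
    assert(iterations > 0u);

    uint32_t overhead = measure_overhead(read_cycles);

    uint32_t start = read_cycles();
    for (uint32_t i = 0u; i < iterations; ++i) {
        work();
    }
    uint32_t total = read_cycles() - start;

    total = (total > overhead) ? (total - overhead) : 0u;
    return total / iterations;
}

/* Host-side stand-ins (assumptions for illustration only): every counter
 * read costs 4 simulated cycles, every work call costs 100. */
static uint32_t sim_counter;

static uint32_t sim_read_cycles(void)
{
    sim_counter += 4u;
    return sim_counter;
}

static void sim_work(void)
{
    sim_counter += 100u;
}
```

With the simulated helpers, `average_cycles(sim_read_cycles, sim_work, 1000u)` reports the cost of one `sim_work` call. On hardware, the more iterations you time, the smaller the share of the total that comes from the fixed overhead of reading the counter.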

 

An alternative way to measure time deltas with reasonable accuracy is to use the SWV trace capability. Essentially, you use the cycle timestamp that's logged when calling ITM_SendChar to capture timing information. It's not as accurate (if you look at the ITM_SendChar implementation, it does more than a simple register access), but as long as you're measuring something sufficiently long, the relative error can be made negligible.

See the YouTube video "STM32CubeIDE Advanced Debug Features: Part 3" for a walk-through of how to do this.

This entire playlist is worth your time if you're relatively new to STM32/Cube/Embedded.

 

Good luck with your project. Remember that student projects are as much about learning from failure as they are about doing something actually useful. You can get a lot of mileage out of doing a solid analysis of why things you tried didn't pan out - so consider taking some chances along the way (The wisdom of this advice strongly depends on your institutional context, of course, but I hope what I'm saying is true wherever you are).

 

- If someone's post helped resolve your issue, please thank them by clicking "Accept as Solution".
- Please post an update with details once you've solved your issue. Your experience may help others.


2 REPLIES
TDK
Guru

DWT->CYCCNT is going to give you the highest precision. One tick per cycle. Use that if you can. You will get a few cycles of overhead but it should be quite accurate.

Timers can also be used with the same precision if you use the right timer and the right setup.
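
As a rough illustration of what "the right setup" might involve, here is a minimal sketch. It assumes the timer is clocked at the core frequency and divides by (prescaler register value + 1), so a prescaler of 0 gives one tick per cycle. The helper names `timer_elapsed16` and `timer_ticks_to_cycles` are made up for this example. A 16-bit timer wraps every 65536 ticks, so the measured interval has to be shorter than that, or a 32-bit timer is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 16-bit up-counter.
 * The cast back to uint16_t makes the result correct across a single
 * wraparound from 0xFFFF to 0x0000. */
static uint16_t timer_elapsed16(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert timer ticks to core cycles, ASSUMING the timer input clock
 * equals the core clock and the prescaler divides by (psc + 1). */
static uint64_t timer_ticks_to_cycles(uint32_t ticks, uint16_t psc)
{
    return (uint64_t)ticks * ((uint32_t)psc + 1u);
}
```

For example, with a prescaler value of 0, a start reading of 65530 and an end reading of 4 correspond to 10 ticks, i.e. 10 core cycles under the assumptions above.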

The best way to reduce the effect of overhead on the measurement is to time many iterations of the calculation instead of only one or two.

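// Enable the cycle counter once at startup; it is disabled after reset.
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

uint32_t start = DWT->CYCCNT;

// do stuff here

uint32_t cycle_count = DWT->CYCCNT - start;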
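
To get from a cycle count to a duration, divide by the core clock frequency. The sketch below is a host-runnable illustration with hypothetical helper names (`elapsed_cycles`, `cycles_to_us`). The 170 MHz figure used in the example is an assumed clock setting, so substitute whatever your project actually runs at. At that frequency a 32-bit cycle counter wraps roughly every 25 seconds. The unsigned subtraction stays correct across one wrap, but not across two.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle delta between two counter reads. Unsigned 32-bit arithmetic is
 * modulo 2^32, so the result is right even if the counter wrapped once. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static double cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0u);
    return (double)cycles * 1e6 / (double)core_hz;
}
```

For example, 170 cycles at an assumed 170 MHz core clock correspond to 1 microsecond.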

 

If you feel a post has answered your question, please click "Accept as Solution".