2016-01-09 11:36 AM
Hi.
I tried putting code in the CCM RAM (STM32F303) as described in AN4296, and I was very impressed: it ran more than 30% faster than the same code in flash. The application note (AN4296) says it is not recommended to place both code and data together in the CCM, because there would be a risk of "collision". Does the same problem apply if I place a constant look-up table in the CCM alongside the code? At the moment the table is stored in flash.
Thanks
2016-01-10 09:48 PM
I followed the steps described in
. I use Keil, and the procedure there is very simple. Only a few changes in the scatter file are needed, and with __attribute__((section("ccmram"))) before a function definition you tell the linker where the function is going to be placed.
2016-01-11 08:26 AM
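A minimal sketch of the section-attribute approach from the post above. The section name "ccmram" comes from that post; the macro CCM_FUNC and the function sum4 are hypothetical names used only for illustration, and the scatter-file changes that map the section to the CCM address range are assumed to be in place and are not shown.

```c
#include <stdint.h>

/* Apply the section attribute only when building for an ARM target.
 * On a host compiler the macro expands to nothing, so the file still builds. */
#if defined(__arm__) || defined(__ARMCC_VERSION)
#define CCM_FUNC __attribute__((section("ccmram")))
#else
#define CCM_FUNC
#endif

/* Hypothetical time-critical function: sums four 16-bit samples.
 * With the attribute, the linker places its code in the "ccmram" section;
 * where that section ends up is decided by the scatter file. */
CCM_FUNC int32_t sum4(const int16_t *x)
{
    return (int32_t)x[0] + x[1] + x[2] + x[3];
}
```

Only functions tagged with the macro move to the CCM; everything else stays where the default linker rules put it, which keeps the small CCM for the hot code.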
It would have been nice if the CCM RAM had some double-buffered addresses for DMA. For really efficient DSP, processed data could then be moved efficiently in and out of the CCM RAM to wherever it is needed.
2016-01-11 08:38 AM
The point is that the CCM is unfettered by DMA contention; why would you need that for executing code? The regular SRAM is the same single-cycle stuff as the CCM, and I can't see a significant downside to using that.
The CCM is already small enough; you don't want to clutter it up with large blocks of data streaming in/out.
2016-01-11 10:05 AM
I set a breakpoint on the first function in the while (1) loop and set the breakpoint's "pass count" to 500. Then I stopped the program at the breakpoint and reset the "Cycle Count", which is part of the "Cortex Status" tool in the debugger. That way I can see how much time 500 cycles take. I repeated that for 4 different cases. Below are the results for one program cycle:
- code from CCM without optimization: 10,5 us
- code from CCM with optimization: 7,2 us
- code from flash without optimization: 17,3 us
- code from flash with optimization: 11,0 us
I confirmed those times (cycle counts) with ETM trace. In the profiler you can see exactly how much time something takes. For both cases where the code was running from flash, the times were the same. For the CCM I couldn't verify the times with ETM, because ETM works only if the code is in flash.
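The arithmetic behind these figures can be sketched as follows. Both helper names are hypothetical, and the core clock is a parameter because the thread does not state which frequency was used.

```c
#include <stdint.h>

/* Microseconds per loop pass, from a debugger cycle count taken over
 * `passes` passes. core_mhz is an assumption supplied by the caller:
 * the clock frequency is not given in the thread. */
double us_per_pass(uint32_t total_cycles, uint32_t passes, double core_mhz)
{
    return (double)total_cycles / ((double)passes * core_mhz);
}

/* Percentage of execution time saved by the CCM build,
 * relative to the flash build of the same code. */
double percent_time_saved(double flash_us, double ccm_us)
{
    return 100.0 * (flash_us - ccm_us) / flash_us;
}
```

With the times posted above, percent_time_saved(17.3, 10.5) comes out at roughly 39 % and percent_time_saved(11.0, 7.2) at roughly 35 %, which is consistent with the "more than 30%" reported in the opening post.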
2016-01-12 01:31 PM
I was talking about processing data as fast as possible (DSP), not necessarily about executing program code. I'm surprised you see DMA as an obstacle even when it is not used with the CCM, so why limit it in the first place to just the core program? Some devices already have a very limited amount of SRAM.