Super real-time application optimization

drobison · ‎2012-09-28

Posted on September 28, 2012 at 19:28

Hello, I've been wondering about upgrading from an ST10 to the STM32. It's imperative that certain things are as close to instantaneous as possible. I think I've identified two bottlenecks and two solutions. Tell me what you think. Any further optimization ideas are welcome.

Bottleneck 1:

An interrupt service routine running from flash will suffer from instruction fetch latency.

Solution 1:

Load the interrupt service routine into SRAM at program initialization and point the interrupt vector at the SRAM copy, so that the instructions in the ISR execute at 168 MHz with no memory latency or wait states.
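A minimal sketch of the vector redirection part of this idea, written so that it compiles on a host machine. The helper name `relocate_vector` and the table layout are hypothetical. It assumes the ISR body itself has already been copied to SRAM (normally the job of the linker script and startup code), and it leaves out the final hardware step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clone `count` vector entries from the flash table
 * into a RAM table, then redirect entry `index` to a handler in SRAM. */
void relocate_vector(const uintptr_t *flash_table, uintptr_t *ram_table,
                     size_t count, size_t index, uintptr_t ram_handler)
{
    assert(index < count);

    /* Keep every other vector pointing where it pointed before. */
    memcpy(ram_table, flash_table, count * sizeof(uintptr_t));

    /* Only the chosen interrupt now jumps to the SRAM copy of the ISR. */
    ram_table[index] = ram_handler;

    /* On a Cortex-M3/M4 the core would then be pointed at the RAM table by
     * writing its suitably aligned address to SCB->VTOR. That register
     * write is omitted here so the sketch stays hardware independent. */
}
```

Whether this actually removes all latency depends on the bus and memory layout of the specific part, so it is worth measuring rather than assuming.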

Bottleneck 2:

With a multiplexed ADC, I have to wait until every analog signal has been read before the program can react. This means the program can only change its outputs every (conversion time) * (number of analog signals) seconds.
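The product described here can be written out as a small helper. The function name and the example figures in the note below are hypothetical, not taken from any datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Time, in nanoseconds, for a multiplexed ADC to convert every channel once,
 * assuming one conversion after another with no gaps between them. */
uint64_t adc_scan_time_ns(uint32_t conversion_time_ns, uint32_t channels)
{
    assert(channels > 0u);
    return (uint64_t)conversion_time_ns * channels;
}
```

For example, eight channels at an assumed 1000 ns per conversion would give a full scan every 8000 ns.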

Solution 2:

Turn the analog signals into PWM signals before they reach the CPU, then read all the PWM inputs in parallel through the timer capture inputs. If the PWM runs at about 20 kHz, then I feel comfortable updating all the outputs roughly every 50 us.

Maybe I could run the PWM frequency up to 200 kHz and get all the analog inputs into the CPU every 5 us?
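A rough sketch of the arithmetic behind these figures. It also shows how many timer counts fit in one PWM period, which limits how finely a capture input can resolve the duty cycle. The 168 MHz timer clock used in the note below is an assumption; the real capture timer clock depends on the part and its clock tree.

```c
#include <assert.h>
#include <stdint.h>

/* PWM period in nanoseconds for a PWM frequency given in Hz. */
uint32_t pwm_period_ns(uint32_t pwm_hz)
{
    assert(pwm_hz > 0u);
    return 1000000000u / pwm_hz;
}

/* Number of timer counts in one PWM period. This is the number of distinct
 * duty-cycle values a capture input could tell apart. */
uint32_t capture_counts_per_period(uint32_t timer_hz, uint32_t pwm_hz)
{
    assert(pwm_hz > 0u);
    return timer_hz / pwm_hz;
}
```

With an assumed 168 MHz timer clock, 20 kHz PWM gives a 50 us period and 8400 counts, while 200 kHz gives a 5 us period but only 840 counts, which is fewer than 10 bits of resolution. The faster update comes at the cost of measurement precision.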

#adc-latency-flash-optimization

drobison · ‎2012-09-28

Posted on September 28, 2012 at 20:10

I was looking at the STM32 reference manual and found something ironic. At a frequency of 120 MHz the flash requires 6 wait states, so it is going no faster than the ST10 running at 20 MHz and fetching instructions from my ROM chips.

20 MHz = 120 MHz / 6 (wait states)
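A small sketch of this estimate. It assumes the usual convention that a flash read with N wait states takes N + 1 CPU cycles; the function name is hypothetical and the wait-state counts are illustrative, not taken from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Raw flash reads per second when each read costs (wait_states + 1) CPU
 * cycles. This counts flash reads, not executed instructions. */
uint32_t flash_access_rate_hz(uint32_t cpu_hz, uint32_t wait_states)
{
    assert(wait_states < 16u); /* sanity bound for this sketch */
    return cpu_hz / (wait_states + 1u);
}
```

Under that convention, 120 MHz with six wait states works out to roughly 17 MHz of raw reads, and five wait states gives exactly 20 MHz.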

aqueisser · ‎2012-09-28

Posted on September 28, 2012 at 21:30

The STM32 has a flash prefetch buffer that accelerates running from flash. I think it's a 64-bit-wide fetch, and I forget how deep it is, but the basic idea is to keep the core from waiting for flash.

Here's a great doc that can explain it a lot better than me:

http://www.hitex.com/fileadmin/pdf/insiders-guides/stm32/isg-stm32-v18d-scr.pdf

Andrew

drobison · ‎2012-09-28

Posted on September 28, 2012 at 22:35

I am incredulous. I read the section on the prefetch. It does not come to an explicit conclusion regarding how fast the instructions will be executed with help from the prefetch. Can it really be 168MHz?

Tesla DeLorean · ‎2012-09-28

Posted on September 28, 2012 at 22:40

Flash speed in these parts hasn't advanced much in the last 10+ years; it's nominally 35-40 ns in my experience. Fujitsu claims to have a faster native solution.

ARM9 cores tend to mask this slowness with caches, or TCM SRAM.

For the ST Mx cores there is some buffering. The STM32F2 and STM32F4 have something called ART, a hybrid cache/buffer/prefetch mechanism that masks the slowness of the flash array and plays on its bigger width. At least one of the ART implementations has a critical path issue that causes the prefetch to provide the wrong instruction.
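A back-of-the-envelope sketch of why the wider flash read matters for straight-line code. The line width, wait-state count, and instruction size are parameters, and the values in the note below are assumptions for illustration. The helper name is hypothetical, and the model ignores branches, literal loads, and bus contention, so it is an upper bound at best.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on sequential instruction fetches per second when each flash
 * read returns line_bits of code and costs (wait_states + 1) CPU cycles.
 * The result is capped at one instruction per CPU cycle. */
uint32_t sequential_fetch_limit_hz(uint32_t cpu_hz, uint32_t wait_states,
                                   uint32_t line_bits, uint32_t instr_bits)
{
    assert(instr_bits > 0u && line_bits >= instr_bits);
    assert(wait_states < 16u); /* sanity bound for this sketch */

    uint64_t instr_per_line = line_bits / instr_bits;
    uint64_t rate = (uint64_t)cpu_hz * instr_per_line / (wait_states + 1u);

    return rate > cpu_hz ? cpu_hz : (uint32_t)rate;
}
```

With an assumed 128-bit line and five wait states at 168 MHz, 16-bit instructions could in principle be supplied at the full core rate, while a stream of 32-bit instructions would top out around 112 MHz unless buffering or caching hides the difference.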

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Problem with STM32F407 RAM&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&currentviews=472

Not sure it's 6 wait states for 120 MHz


Tesla DeLorean · ‎2012-09-28

Posted on September 28, 2012 at 22:45

Can it really be 168MHz?

The pipeline clocks at this speed, but it will be slowed by AHB and APB accesses and by full write buffers. The speed of execution will not be wholly predictable; assume your load, store, branch, and divide instructions will not be single-cycle events.
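Since the timing is not wholly predictable, measuring it is the practical route. Below is a minimal sketch of the arithmetic around a free-running 32-bit cycle counter. On Cortex-M3/M4 parts that implement it, the DWT cycle counter (DWT->CYCCNT) is one such counter; reading it is left out so the code compiles on a host, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two readings of a free-running 32-bit counter. Unsigned
 * subtraction gives the right answer across a single wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds at the given core clock. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz > 0u);
    return (uint64_t)cycles * 1000000000u / cpu_hz;
}
```

Taking one reading on ISR entry and one on exit, then comparing flash and SRAM builds, would show how much the memory placement actually matters for a given routine.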
