HOW ABOUT AN STM32F7/H7 WITH SOME DSP ACCELERATORS

Garnett.Robert · ‎2017-03-13

Posted on March 14, 2017 at 02:00

Hi,

I love the STM32 series processors. I have used F4s and F7s. The ADCs are great, particularly when used in dual or triple mode. The timers are great and the general purpose comms are great, but what would be really good would be some DSP accelerator peripherals that offload the inevitable CPU-time-consuming loops that DSP requires. I recently purchased an AD SHARC board which has all of this stuff, but using it requires a PhD in computer science to understand it, and it seems to be targeted at audio using sigma-delta ADCs, which aren't ideal for process control applications where the bandwidth includes a constant DC component. What would be really nice to bolt on to an F4 or F7 or the new H7 would be:

- FFT Accelerator

- FIR Accelerator

- IIR Accelerator

These could be configured to use a dedicated RAM pool separate from the ARM processor's RAM.

I have used the F4s and F7s for FFTs, FIRs and IIRs, but it is very easy to run out of memory, and using external SDRAM slows everything down considerably.

In some cases you can use 16-bit integers, which helps with calculation speed and memory, but fixed point is a pain to scale correctly and it does not have the dynamic range of single-precision float. Also, the bounds checking essential for some fixed-point algorithms often uses up a lot of the speed gains of fixed point anyway. I have also used the CMSIS DSP library, but found that it has size limitations and lacks the flexibility of hand-coded stuff, where you can fold different DSP operations into each other to save memory and time. And of course CMSIS DSP still ties up the CPU considerably.
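The scaling and bounds-checking overhead described above can be illustrated with a Q15 multiply and add. This is only a minimal sketch, not code from the post or from CMSIS DSP: the function names `q15_saturate`, `q15_mul` and `q15_add` are hypothetical, and it assumes the compiler performs an arithmetic right shift on negative values (as GCC, Clang and the Arm compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Q15 format: real value = raw / 32768, range [-1.0, +1.0 - 2^-15]. */

/* Clamp a 32-bit intermediate result to the 16-bit Q15 range.
   This is the "bounds checking" that eats into the fixed-point speed gain. */
static int16_t q15_saturate(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Q15 * Q15 -> Q15 with round-to-nearest and saturation.
   Assumption: >> on a negative int32_t is an arithmetic shift. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* Q30 intermediate */
    product += (int32_t)1 << 14;               /* rounding offset  */
    return q15_saturate(product >> 15);        /* rescale to Q15   */
}

/* Q15 + Q15 -> Q15 with saturation instead of wrap-around. */
static int16_t q15_add(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a + (int32_t)b);
}
```

For example, `q15_mul(16384, 16384)` is 0.5 * 0.5 and yields 8192 (0.25), while `q15_mul(-32768, -32768)` would be +1.0, which does not fit in Q15 and is clamped to 32767. Every operation needs this kind of rescale and clamp, which is the scaling burden floating point avoids.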

I use uVision as my development platform and use the HAL drivers. I sometimes struggle with these, as the terminology used in the software does not always link up well with the processor reference manual. I also use the STM32CubeMX bizzo, as I find this makes grappling with the IO to peripheral mapping much less of a pain. I also use FreeRTOS and lwIP together without much trouble. I reckon DSP-type peripherals would fit in the HAL driver system very nicely.

I'm sure the clever people at ST could come up with some nice DSP stuff. It would enhance the current ST MCU range which is already excellent, by an order of magnitude.

Regards

Old Guy From Toon

AvaTar · ‎2017-03-15

Posted on March 15, 2017 at 11:11

DSP instructions and DSP peripherals do not really make up a DSP.

What distinguishes a DSP from a general-purpose MCU like a Cortex M is:

- VLIW
- separate parallel bus systems
- usually native fixed-point support (in hardware)

The first two points mean that a DSP loads and executes multiple instructions at once, and can access the required resources (including RAM sections) at once, i.e. concurrently. The Cortex M bus system doesn't support anything like that.

DSPs are designed to do this (including FFTs and IIR/FIR filter operations) in real-time, with just the required precision. Thus the fixed-point format, which originated in the DSP world.

TI, BTW, has a nice DSP portfolio as well.

While a Cortex M can never keep up with such a specialized device, the toolchains and environments are usually quite 'special' as well.

waclawek.jan · ‎2017-03-15

Posted on March 15, 2017 at 11:31

I have used the F4s and F7s for FFTs, FIRs and IIRs, but it is very easy to run out of memory, and using external SDRAM slows everything down considerably.

What sort of memory requirements are you talking about? The F4s/F7s have a rather generous amount of RAM for real-time tasks: a few hundred MHz, minus various sorts of overhead, per a few hundred kB means you have only a few hundred instructions to spend on each byte/word of data (which has to include acquisition, processing and final store/disposal/decimation/whatever action). Less than that would limit you to only trivial algorithms.
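One way to read this estimate is as a cycle budget per byte, assuming the whole RAM is filled with fresh data and fully processed once per second. The sketch below makes that assumption explicit; the function name `cycles_per_byte` and the 512 KB figure in the usage note are illustrative, not taken from the post.

```c
#include <assert.h>

/* CPU cycles available per byte when a buffer of ram_bytes is filled
   and completely processed once every period_s seconds.
   Overhead (interrupts, wait states, DMA contention) is ignored here,
   so the real budget is lower. */
static double cycles_per_byte(double cpu_hz, double ram_bytes, double period_s)
{
    assert(cpu_hz > 0.0 && ram_bytes > 0.0 && period_s > 0.0);
    return cpu_hz * period_s / ram_bytes;
}
```

With a 216 MHz clock and an assumed 512 KB of RAM turned over once per second, `cycles_per_byte(216e6, 512.0 * 1024.0, 1.0)` comes to roughly 412 cycles per byte, which matches the "few hundred instructions" order of magnitude.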

Or are you talking about off-line data processing? For that, IMO, Cortex-A or other architectures are more appropriate.

Also, I believe for most algorithms it's more appropriate (thus faster) to use asynchronous memories rather than synchronous, but I am no DSP expert.

JW

T J · ‎2017-03-15

Posted on March 16, 2017 at 02:03

You could consider enhancing your own PCB with an FPGA connected to the 'F7 32-bit FMC bus.

Then you could build your DSP functions in hardware and still use the F7 DMAs to get it done fast.

Garnett.Robert · ‎2017-03-16

Posted on March 16, 2017 at 08:46

Hi Gents,

Thanks for your comments.

I will take all these in turn, but before I start: I am asking the question of ST in this forum as they don't seem to have a place where humble customers such as myself can suggest ideas for future products. It seems to me that TI and AD both have excellent offerings for DSP, but it doesn't seem to be on ST's radar. It amazes me that, given the ubiquity of FFTs, FIRs and IIRs, any chip maker worth their salt would not add this functionality to their offerings. No, we don't need another DSP processor; we just want some silicon that does these fairly simple tasks in an efficient way. The ARM is a fantastic processor, and ST's peripherals are rather super, but the ARM just needs a little help when it comes to basic real-time DSP. I am sorry if I gave the impression that I have a current problem with implementing DSP, and I appreciate all of your comments.

1. AvaTar. Have a look at:

http://www.analog.com/en/products/processors-dsp/sharc/ADSP-SC5html

It has an ARM with DSP peripherals as well as two DSP cores. The accelerators are separate entities from the DSP cores. These devices have FFT, FIR and DSP accelerators. The ARM can offload these mundane DSP tasks to the accelerators. The DSP cores don't get a look-in for a lot of jobs the accelerators can do. Trouble is, SHARCs cost a fortune and, like I said, you need a PhD, which I haven't got and at 62 don't really have time to get. I would like to input a frame of data from the ADC, point the FFT accelerator at it, enable interrupts, and when it's ready I get the answer whilst the ARM core kicks back and has a latte.

I don't get your comments about fixed point, as audio and wide dynamic range signals don't work well with fixed point (it's more for graphics). And with the innovation in floating-point processors (the new STM32 H7 series does double precision), why would I want to be masochistic and use fixed point? Floating point is getting very fast and very cheap. Yes, I'll use fixed point if I have to, but if I don't, why bother? SHARCs do fixed and floating point in their DSP cores, and I think they have a floating-point processor in the ARM subsystem.

I know that DSP cores require different toolchains and a different mindset; that's why I don't want to use them.

2. Waclawek.Jan

A 4096-bin FFT requires, as a starting point, 4096 * 4 * 2 bytes for complex single precision (4 bytes each for the real and imaginary parts). That's 32 KB for a single channel. For two channels that's 64 KB. Add the DMA ADC buffer of 4096 * 4 bytes. If the algorithm cannot be in-place, add another 64 KB. Memory gets chewed up pretty quickly.
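The arithmetic above can be written out as a small budget calculator. This is a sketch using the figures from the post (4096 bins, 4-byte floats, a 4-byte-per-sample ADC buffer); the function names `fft_buffer_bytes` and `fft_total_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_FLOAT32 4u  /* single precision */

/* One complex single-precision FFT buffer: real + imaginary per bin. */
static size_t fft_buffer_bytes(size_t bins)
{
    return bins * 2u * BYTES_PER_FLOAT32;
}

/* Total RAM for `channels` FFT work buffers, one ADC DMA buffer of
   `bins` samples, and (if the FFT is not in-place) a second set of
   output buffers the same size as the work buffers. */
static size_t fft_total_bytes(size_t bins, size_t channels,
                              size_t adc_sample_bytes, int in_place)
{
    assert(bins > 0 && channels > 0);
    size_t work = channels * fft_buffer_bytes(bins);
    size_t adc  = bins * adc_sample_bytes;
    size_t out  = in_place ? 0u : work;
    return work + adc + out;
}
```

With these numbers, `fft_total_bytes(4096, 2, 4, 1)` gives 81920 bytes (80 KB) for the in-place case and `fft_total_bytes(4096, 2, 4, 0)` gives 147456 bytes (144 KB) when separate output buffers are needed, before any filter coefficients, stacks or network buffers are counted.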

I am doing things in real time. I have developed a time-of-flight anemometer and use pulse compression to achieve good signal-to-noise and time resolution. The STM32F7 running at 216 MHz with two matched filters of 1024 bins can run at 10 Hz with around 80% CPU utilization. That's with two channels: one north and one south. If the STM32F7 had a FIR accelerator I could do it a lot faster. I could also use an overlap-add/save FFT implementation of the matched filter, which might be better than a straight FIR.
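For readers unfamiliar with pulse compression, a matched filter as a straight FIR is a sliding correlation of the received signal against the transmitted reference. The following is a generic sketch of that technique, not the anemometer's actual code; `matched_filter` and `peak_index` are hypothetical names, and the buffers are plain floats.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form matched filter (sliding correlation):
   out[n] = sum over k of ref[k] * rx[n + k],
   for n = 0 .. rx_len - ref_len.
   `out` must hold rx_len - ref_len + 1 values. */
static void matched_filter(const float *rx, size_t rx_len,
                           const float *ref, size_t ref_len,
                           float *out)
{
    assert(ref_len > 0 && rx_len >= ref_len);
    for (size_t n = 0; n + ref_len <= rx_len; ++n) {
        float acc = 0.0f;
        for (size_t k = 0; k < ref_len; ++k) {
            acc += ref[k] * rx[n + k];   /* one multiply-accumulate */
        }
        out[n] = acc;
    }
}

/* Index of the largest output; this is the estimated arrival offset
   of the reference pulse within rx. */
static size_t peak_index(const float *x, size_t len)
{
    assert(len > 0);
    size_t best = 0;
    for (size_t i = 1; i < len; ++i) {
        if (x[i] > x[best]) {
            best = i;
        }
    }
    return best;
}
```

Each output sample costs `ref_len` multiply-accumulates, so a 1024-tap reference needs 1024 of them per output point and per channel. That inner loop is the work a FIR accelerator would take off the CPU, and it is also why an overlap-add or overlap-save FFT implementation can become attractive for long references.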

3. Marsh.Nick

I could do this, but I would have to learn about FPGAs and I'm getting a bit past that. I reckon that some DSP accelerators bolted on to the good ol' ST range would bridge the gap between the general purpose ARM and full DSP/FPGA.

My real problem is I live in a remote country town in Australia, a country which is better known for its funny accent and cargo-cult mentality with respect to technology than for developing sophisticated DSP applications, so I have to learn everything on my own.

We did have a university down our way, but the government got rid of the engineering part because they felt it was wasted on us country folk. Also my old electronics lecturer electrocuted himself in his garage, so I don't even have him to talk to now.

Regards

Rob

Tesla DeLorean · ‎2017-03-16

Posted on March 16, 2017 at 09:28

I think the truck is driven by very large customers, i.e. Apple or Whirlpool say jump, ST asks how high.

Here is as good a place as any, your local sales rep is likely the best.

Not sure if what you want is of general utility; a lot of people struggle with a simple TIM peripheral application. Not saying it doesn't have utility, but the silicon real estate cost is borne by everyone. I suspect what you want has a large gate and power consumption cost.

I'm a bit dubious of the JPEG core and the patent and licensing box that opens, but clearly the IP Camera folks have a large market/mindshare.

Nesrine M_O · ‎2017-03-16

Posted on March 16, 2017 at 09:54

Hi Garnett.Robert,

We are pleased to hear about your interest in our STM32 products.

It is important for us to know the interests and suggestions of our customers, so please feel free to share your experience at any time. We are happy to hear them.

About your request for DSP accelerators: I have raised this with our team for checking, and I will come back to you when I have an update.

-Nesrine-

waclawek.jan · ‎2017-03-16

Posted on March 16, 2017 at 10:10

Rob,

I have developed a time of flight anemometer

Yes, I *am* impressed. But... you've managed to squeeze it in, haven't you? And you ask for what, an extra 20% of functionality? Face it, what you are doing is about top notch; going beyond that requires investment... in learning FPGAs or those obscure DSPs, for example.

Talking about numbers... A maskset for silicon manufacturing costs on the order of a few M$ for the oh-not-that-cutting-edge-anymore technology which is appropriate for these MCUs. That's $1 plus in the price of each chip if they sell a few million, so it's just logical that that's what they are going to ask as a ransom for making a new chip just to add some extra functionality. That's if everything works on the first try, which is not guaranteed either, given the complexity. Note that while there are hundreds of STM32 models now available, the number of masksets in use is almost an order of magnitude less. They are so expensive that it pays off to sell chips with disabled function areas for a fraction of the price of their full-featured counterparts, until a customer comes around asking for those millions of pieces where making a new optimized maskset will pay off. Adding silicon-intensive portions also pushes it to better technology, which again drives mask costs up exponentially.

This is why I have to +1 everything Clive, Nick and AvaTar said above.

One more remark: I am not a DSP guy, but I understand most DSP algorithms can't be divided efficiently. In your case, though, 'south and north' sound like at least partially separable tasks, so you might want to investigate splitting the workload across separate 'F7s.

Also my old electronics lecturer electrocuted himself in his garage, so I don't even have him to talk to now.

I am really sorry to hear that, but you have us here, at least for the funny (albeit different) accent and some talk.

Jan

AvaTar · ‎2017-03-16

Posted on March 16, 2017 at 11:04

I am asking the question of ST in this forum as they don't seem to have a place where humble customers such as myself can suggest ideas for future products.

You can surely make it here, but what ST makes of it is another question.

1. AvaTar. Have a look at:

http://www.analog.com/en/products/processors-dsp/sharc/ADSP-SC5html

It has an ARM with DSP peripherals as well as two DSP cores. ...

I know similar chip combinations from TI, up to their 'Sitara' Cortex A. Dual-processor solutions have their pitfalls as well. Debugging such a thing is usually a PITA if the IDE doesn't take care of both cores at once. And even then ...

I don't get your comments about fixed point as audio and wide dynamic range signals don't work well with fixed point, ...

AFAIK, the fixed-point format was 'invented' by DSP providers, and is usually implemented in hardware, i.e. an FPU (as in 'Fixed-Point Unit'). The dynamic range covers the standard DSP use case of real-time audio processing, and trades dynamic range for precision. Any higher precision costs more silicon and more money. They implement what they need to meet their numerical stability and audio THD requirements.

I know that DSP cores require different toolchains and a different mindset; that's why I don't want to use them.

That, and often with a hefty price and sketchy support.

waclawek.jan · ‎2017-03-16

Posted on March 17, 2017 at 00:39

and sketchy support.

I know that you know that but let's just stress it: that boils down to money, again.

JW