
STM32F3 CMSIS DSP - Consequences of number representation (q7, q15, q31, f32)?

Question asked by rau.darren on Mar 14, 2016
Latest reply on Mar 14, 2016 by Clive One
I am working through the concepts of DSP on the STM32 and have run into the question of which number format I should use in my DSP chain (q7, q15, q31 or f32).

I have tried searching books and articles and have not found an answer that explains the consequences/pros/cons of each.

I am still fairly new to using fixed-point / floating-point operations so there may be some obvious things I am overlooking.

I have samples coming in from the SDADC peripheral on the STM32F3 in differential mode, which produces 16-bit samples (int16_t).

Given the above information, what should I be doing to get the most efficiency out of the DSP chain? Can I feed these values directly into a CMSIS biquad filter (q15 functions) and use the results in my current code? Do I need any special normalization techniques to avoid truncation issues?
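To make the "feed int16_t directly into q15" question concrete, here is a minimal sketch of how a q15 value relates to a raw 16-bit integer. CMSIS-DSP defines q15_t as int16_t, read as a fraction in [-1, 1), so a signed ADC code can be viewed as q15 without any arithmetic. The names q15_sketch_t, adc_to_q15, q15_to_real and real_to_q15 are hypothetical helpers written for illustration; they are not CMSIS functions, and the truncating, saturating conversion is an assumption about what a reasonable float-to-q15 step looks like.

```c
#include <stdint.h>

/* Hypothetical alias standing in for the CMSIS q15_t type (which is int16_t). */
typedef int16_t q15_sketch_t;

/* Viewing a signed ADC code as q15 needs no arithmetic: the bits are the same,
 * only the interpretation changes (integer code -> fraction code / 2^15). */
static inline q15_sketch_t adc_to_q15(int16_t raw)
{
    return (q15_sketch_t)raw;
}

/* Real value represented by a q15 number: x / 32768, in [-1, 1). */
static inline float q15_to_real(q15_sketch_t x)
{
    return (float)x / 32768.0f;
}

/* Float to q15 with saturation; the cast truncates toward zero (assumed rounding mode). */
static inline q15_sketch_t real_to_q15(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) {
        return INT16_MAX;   /* +1.0 is not representable, clamp to the largest q15 */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;
    }
    return (q15_sketch_t)scaled;
}
```

Under this view a full-scale ADC code already uses the whole q15 range, so any filter gain above 1 can saturate or overflow the output unless headroom is left.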

From what I understand, I will get better SNR if I use float values, because they avoid the rounding and quantization errors that accumulate through a fixed-point filter, but I can trade this off as required (float is more accurate, fixed is faster).
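As a rough yardstick for the SNR side of this tradeoff, the textbook signal-to-quantization-noise ratio of an ideal N-bit uniform quantizer driven by a full-scale sine can be written down. This only describes how well each format can store a signal; it says nothing about the noise a particular filter structure adds internally, so treat the numbers as upper bounds under that assumption.

```latex
\mathrm{SQNR}_{\mathrm{dB}} \approx 6.02\,N + 1.76
\qquad\Rightarrow\qquad
\begin{aligned}
\text{q7}\;(N=8)   &: \approx 49.9\ \mathrm{dB} \\
\text{q15}\;(N=16) &: \approx 98.1\ \mathrm{dB} \\
\text{q31}\;(N=32) &: \approx 194.4\ \mathrm{dB}
\end{aligned}
```

An f32 value carries 24 bits of significand precision, but that precision is relative to the magnitude of each value rather than to a fixed full scale, so it does not map onto a single figure in the same way.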

I have seen two examples: one uses the q15 data type, and the other uses the CMSIS float-to-fixed and fixed-to-float conversion functions. But since I am starting off with int16_t values from the ADC, I am not sure which path I should consider.

Currently, as the first step in my chain, I offset and scale the ADC samples into a float variable (to represent my quantity in the appropriate units). So I can either do the fixed-point processing up front, before that conversion, or do the processing after it, either by using the f32 functions directly or by using CMSIS to convert to and from fixed point so that I can use the fixed-point functions.
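The offset-and-scale step described above can be sketched as follows. The function name adc_block_to_units and the parameters offset_counts and units_per_count are hypothetical placeholders for whatever calibration values the application uses. Because a biquad is linear, in exact arithmetic the scale factor could be applied on either side of the filter, which is what makes the "fixed point first, convert afterwards" ordering possible at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a block of raw ADC codes to engineering units:
 *   out[i] = (raw[i] - offset_counts) * units_per_count
 * offset_counts and units_per_count are assumed calibration values. */
static void adc_block_to_units(const int16_t *raw, float *out, size_t n,
                               float offset_counts, float units_per_count)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = ((float)raw[i] - offset_counts) * units_per_count;
    }
}
```

The resulting float block could then be passed to f32 processing, or converted to a fixed-point format if the fixed-point functions are preferred.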

Can anyone offer any further insight into this as to which way would be "best practice" and what tradeoffs could be made?  I think fixed point is a little more demanding since you will have to keep track of the magnitudes of the values to get the best accuracy.  Are there normalization functions or other techniques that can be used to make this easier?
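One common way to "keep track of the magnitudes" is block normalization: measure how many unused bits of headroom a block has, shift the block up by that amount, and carry the shift count alongside the data so it can be undone later. The sketch below is a self-contained illustration under that assumption; block_headroom_bits and normalize_block are hypothetical names, not CMSIS functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of left shifts that can be applied to every sample in the block
 * without the largest magnitude exceeding the int16_t positive range. */
static int block_headroom_bits(const int16_t *x, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = x[i];
        int32_t mag = (v < 0) ? -v : v;   /* 32-bit so that -32768 is handled */
        if (mag > peak) {
            peak = mag;
        }
    }
    if (peak == 0) {
        return 15;                        /* all-zero block: any shift is safe */
    }
    int bits = 0;
    while ((peak << (bits + 1)) <= 32767) {
        bits++;
    }
    return bits;
}

/* Scale the block up in place to use the available headroom and return the
 * shift that was applied, so the caller can compensate later. */
static int normalize_block(int16_t *x, size_t n)
{
    int shift = block_headroom_bits(x, n);
    for (size_t i = 0; i < n; i++) {
        /* Multiply instead of << because left-shifting a negative value is undefined in C. */
        x[i] = (int16_t)(x[i] * (1 << shift));
    }
    return shift;
}
```

Normalizing the input this way uses up all the headroom, so a stage with gain above 1 would then need its output scaled back down; the returned shift count is what lets the true magnitude be recovered at the end of the chain.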

If I extended the question to include another stage after the filtering (such as an FFT), what would be the best data type to use for the filter and transform stages?

I can understand and draw out the DSP signal chain diagram that I want to implement, but I am just not sure what the best implementation would be (which data types to use and in which order to do things).
