I have a question about the CMSIS-DSP library source (arm_mult_q31.c)

yh · 2022-11-07

Hi

I found something strange in the CMSIS-DSP source.

I am using an STM32F429 (little endian) with CMSIS-DSP library V1.5.1.

In the CMSIS-DSP library (arm_mult_q31.c):

void arm_mult_q31( q31_t * pSrcA, q31_t * pSrcB, q31_t * pDst, uint32_t blockSize)
{
  ...
  while (blkCnt > 0U)
  {
    /* C = A * B */
    /* Multiply the inputs and then store the results in the destination buffer. */
    inA1 = *pSrcA++;
    inA2 = *pSrcA++;
    inA3 = *pSrcA++;
    inA4 = *pSrcA++;
    inB1 = *pSrcB++;
    inB2 = *pSrcB++;
    inB3 = *pSrcB++;
    inB4 = *pSrcB++;

    out1 = ((q63_t)inA1 * inB1) >> 32;
    out2 = ((q63_t)inA2 * inB2) >> 32;
    out3 = ((q63_t)inA3 * inB3) >> 32;
    out4 = ((q63_t)inA4 * inB4) >> 32;

    out1 = __SSAT(out1, 31);
    out2 = __SSAT(out2, 31);
    out3 = __SSAT(out3, 31);
    out4 = __SSAT(out4, 31);

    *pDst++ = out1 << 1U;
    *pDst++ = out2 << 1U;
    *pDst++ = out3 << 1U;
    *pDst++ = out4 << 1U;

    /* Decrement the blockSize loop counter */
    blkCnt--;
  }
  ...
}
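For anyone who wants to run the arithmetic on a desktop machine, here is a minimal single-element sketch of what the unrolled loop computes. `__SSAT` is a CMSIS intrinsic, so it is replaced by a hypothetical portable helper `ssat()`; `mult_q31_ref()` is likewise an illustrative name, not a CMSIS function. The sketch assumes `>>` on a negative `int64_t` is an arithmetic shift, as it is on GCC, Clang and the Arm compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t;
typedef int64_t q63_t;

/* Hypothetical portable stand-in for the CMSIS __SSAT intrinsic:
   clamp x to the range of a signed integer that is 'bits' wide. */
static q31_t ssat(q63_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    const q63_t max = ((q63_t)1 << (bits - 1)) - 1;
    const q63_t min = -max - 1;
    if (x > max) return (q31_t)max;
    if (x < min) return (q31_t)min;
    return (q31_t)x;
}

/* One element of the loop body: C = A * B in Q31. */
static q31_t mult_q31_ref(q31_t a, q31_t b)
{
    /* The full product has 62 fractional bits; keep the high 32 bits
       (assumes an arithmetic right shift for negative products). */
    q63_t out = ((q63_t)a * b) >> 32;
    /* Clamp to 31 bits so that the doubling below cannot overflow. */
    out = ssat(out, 31);
    /* Same as 'out << 1', written as '* 2' because left-shifting a
       negative signed value is undefined behaviour in C. */
    return (q31_t)(out * 2);
}
```

With these helpers, 0.5 × 0.5 (0x40000000 × 0x40000000) gives 0x20000000, i.e. 0.25; 0x02 × 0x03 gives 0; and -1 × -1 (INT32_MIN × INT32_MIN) saturates to 0x7FFFFFFE instead of wrapping around.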

For little endian, I think it should be modified as follows. Is my thinking wrong?

Little endian case:

//out1 = ((q63_t)inA1 * inB1) >> 32;
//out2 = ((q63_t)inA2 * inB2) >> 32;
//out3 = ((q63_t)inA3 * inB3) >> 32;
//out4 = ((q63_t)inA4 * inB4) >> 32;

out1 = ((q63_t)inA1 * inB1);
out2 = ((q63_t)inA2 * inB2);
out3 = ((q63_t)inA3 * inB3);
out4 = ((q63_t)inA4 * inB4);

out1 = __SSAT(out1, 31);
out2 = __SSAT(out2, 31);
out3 = __SSAT(out3, 31);
out4 = __SSAT(out4, 31);

//*pDst++ = out1 << 1U;
//*pDst++ = out2 << 1U;
//*pDst++ = out3 << 1U;
//*pDst++ = out4 << 1U;
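To see what the proposed change does to genuine fractional inputs, here is a hypothetical one-element sketch of it. The proposal comments out the store lines without replacing them, so the sketch assumes `out` is stored unchanged; `sat31()` and `mult_q31_proposed()` are illustrative names only.

```c
#include <stdint.h>

/* Clamp to the signed 31-bit range, like __SSAT(x, 31). */
static int32_t sat31(int64_t x)
{
    const int64_t max = ((int64_t)1 << 30) - 1;
    const int64_t min = -((int64_t)1 << 30);
    return (int32_t)(x > max ? max : (x < min ? min : x));
}

/* Proposed variant: saturate the raw 64-bit product with no shift.
   Assumption: the result is stored as-is, without the final '<< 1'. */
static int32_t mult_q31_proposed(int32_t a, int32_t b)
{
    return sat31((int64_t)a * b);
}
```

With 0x02 and 0x03 it returns 6, which looks right if the values are read as integers. For 0.5 × 0.5 (0x40000000 × 0x40000000), however, it saturates to 0x3FFFFFFF, just under 0.5 instead of 0.25: the variant behaves like an integer multiply, not a Q31 one.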

Thanks in advance!


Piranha · 2022-11-08

https://en.wikipedia.org/wiki/Q_(number_format)

Those data types represent fractions in the range [-1, +1); q31_t stores them as N/2^31. Therefore the numbers are aligned to the left (highest) bits, and after the multiplication we also need to keep the left (highest) part of the 64-bit result. This is pure arithmetic and it is not affected by endianness: we just need the logically highest bits, regardless of their order in memory.
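A small sketch of the point that endianness only changes where the bytes sit in memory, not what `>> 32` computes. The helper names are hypothetical, and the sketch assumes 8-bit bytes and a plain little- or big-endian host. `high_word_from_memory()` reads the same 32 bits out of the stored bytes and has to know the byte order to find them; the shift does not.

```c
#include <stdint.h>
#include <string.h>

/* Logical high 32 bits: defined by the value, identical on every host. */
static uint32_t high_word_by_shift(uint64_t p)
{
    return (uint32_t)(p >> 32);
}

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* The same high word, located in memory: its byte positions depend on
   the host's byte order (assumes 8-bit bytes). */
static uint32_t high_word_from_memory(uint64_t p)
{
    unsigned char bytes[8];
    memcpy(bytes, &p, sizeof bytes);
    uint32_t hi = 0;
    for (int i = 0; i < 4; i++) {
        /* Collect the most significant byte of the high word first. */
        int idx = host_is_little_endian() ? 7 - i : i;
        hi = (hi << 8) | bytes[idx];
    }
    return hi;
}
```

Both functions return 0x01234567 for 0x0123456789ABCDEF on either byte order; only the memory-based one needs to care which order it is.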

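The SSAT instruction and the 1-bit left shift of the result are also necessary for the math to be correct: the product of two Q31 numbers has 62 fractional bits, so after the right shift by 32 only 30 remain, and the left shift restores the Q31 scaling. SSAT clamps the single overflowing case (-1 × -1 = +1, which Q31 cannot represent) before that shift.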
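The role of each step, written out for one element as a derivation (a and b are the raw q31_t integers; >> 32 is taken as floor division by 2^32, which is what an arithmetic right shift does):

```latex
\[
A = \frac{a}{2^{31}}, \qquad B = \frac{b}{2^{31}}, \qquad a, b \in [-2^{31},\, 2^{31}-1]
\]
\[
A B = \frac{a b}{2^{62}} = \frac{a b / 2^{31}}{2^{31}}
\approx \frac{2 \left\lfloor a b / 2^{32} \right\rfloor}{2^{31}}
\]
\[
a = b = -2^{31}: \quad \left\lfloor a b / 2^{32} \right\rfloor = 2^{30},
\qquad 2 \cdot 2^{30} = 2^{31} > 2^{31} - 1
\]
```

So >> 32 followed by << 1 yields the Q31 integer with its lowest bit cleared, and the only input pair whose result does not fit is -1 × -1; __SSAT(out, 31) clamps 2^30 to 2^30 - 1 beforehand, giving 0x7FFFFFFE instead of wrapping to 0x80000000 (-1.0).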

yh · 2022-11-09

Thank you for your answers.

I studied the Q number format.

I think this code does not work correctly.

For example, on little endian:

inA1 = 0x02;
inB1 = 0x03;

In the original source:

out1 = ((q63_t)inA1 * inB1) >> 32;
=> (0x00000000 00000006 >> 32)
=> 0x00

I actually compiled it and checked the result (result: 0x00).
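Reading the two inputs as Q31 fractions rather than integers shows why 0 is the expected output. A minimal sketch; `q31_to_double()` and `q31_resolution()` are hypothetical helpers, and double is used only because it represents these small values exactly.

```c
#include <stdint.h>

/* Value of a Q31 integer as a fraction: x / 2^31. */
static double q31_to_double(int32_t x)
{
    return (double)x / 2147483648.0; /* 2^31 */
}

/* Smallest non-zero Q31 step (one LSB): 2^-31. */
static double q31_resolution(void)
{
    return 1.0 / 2147483648.0;
}
```

0x02 and 0x03 stand for 2/2^31 ≈ 9.3e-10 and 3/2^31 ≈ 1.4e-9. Their product, 6/2^62 ≈ 1.3e-18, is about 3.6 × 10^8 times smaller than the Q31 step of 2^-31 ≈ 4.7e-10, so truncating it to Q31 gives 0.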

I think it should be modified as below:

out1 = (((q63_t)inA1 * inB1)<<32) >> 32;
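What the proposed expression keeps can be checked with a sketch that extracts the same bits without undefined behaviour (in C, << 32 on a negative or too-large int64_t is undefined). `low_word_of_product()` is a hypothetical name; on common compilers the proposed expression produces these same 32 bits, sign-extended to 64.

```c
#include <stdint.h>

/* The low 32 bits of the 64-bit product, i.e. what (p << 32) >> 32 keeps
   before sign extension. Computed on unsigned types, so it is well defined. */
static uint32_t low_word_of_product(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * b);
    return (uint32_t)p;
}
```

It returns 6 for the 0x02 × 0x03 example, but 0 for 0.5 × 0.5 (0x40000000 × 0x40000000), because 2^60 has no bits in its low word. The expression performs integer multiplication modulo 2^32, which is why it only looks right for small integer inputs.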

I'm not sure if this is correct.

What are your thoughts on this?

Piranha · 2022-11-10

You are still not thinking about fractions.

Let's take a simple example with fractional numbers that have a precision of a single decimal digit after the decimal separator: A=0,2; B=0,3. The multiplication results in C=0,06. If we again reduce the resolution to a single decimal digit after the separator, we get C=0,0. Note that, because it is a fraction, the rightmost digits are the ones we need to remove. Also, in this case we ignored correct rounding and just dropped the additional digits. And, by the way, the original code does not do any rounding either.
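The decimal analogy can be written as fixed-point code with one decimal digit, storing 0,2 as 2 and 0,3 as 3 (scale factor 10). The function names are illustrative, and the rounding variant is included only for comparison, since the original code simply drops the extra digits.

```c
/* Values are stored in tenths: 2 means 0,2. Non-negative inputs are
   assumed, because C integer division truncates toward zero. */
static int mul_tenths_truncate(int a, int b)
{
    /* a * b is in hundredths; drop the extra (rightmost) digit. */
    return (a * b) / 10;
}

static int mul_tenths_round(int a, int b)
{
    /* Add half the weight of the dropped digit before truncating. */
    return (a * b + 5) / 10;
}
```

With truncation, 0,2 × 0,3 gives 0,0 as described above; rounding to nearest would give 0,1. Likewise 0,5 × 0,5 = 0,25 becomes 0,2 truncated or 0,3 rounded.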

Now let's take your example, but reduce it to the Q7/Q15 data types (in ARM's notation).

A = 0x02 = 0x02 / 2^7 = 2 / 128 = 0,015625

B = 0x03 = 0x03 / 2^7 = 3 / 128 = 0,0234375

C = A * B = 0,015625 * 0,0234375 = 0,0003662109375

C = A * B = (2 / 2^7) * (3 / 2^7) = (2 * 3) / (2^7 * 2^7) = 6 / 2^14 = 0,0003662109375

To represent this fraction as a Q15 type, we need to multiply the numerator and denominator by 2 (left shift by 1 bit).

C = (6 / 2^14) * (2 / 2) = 12 / 2^15 = 0x000C / 2^15 = 0,0003662109375

Therefore it is represented in Q15 format as 0x000C. Converting it back to the Q7 format means taking the highest byte, which is 0x00 = 0 / 128 = 0,0. This is the correct result, because the resolution of the Q7 format is 1 / 128 = 0,0078125, and at that resolution the product 0,0003662109375 is represented as 0.
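The Q7/Q15 steps above can be reproduced with a small sketch. The function names are hypothetical, and the clamp in `q7_mul_to_q15()` plays the role that __SSAT plays in the Q31 code; this mirrors the reasoning in the reply, not the actual CMSIS arm_mult_q7() implementation.

```c
#include <stdint.h>

/* Q7: value = x / 2^7. The raw product of two Q7 numbers has 14 fractional
   bits; doubling it (the left shift by 1) gives Q15. */
static int16_t q7_mul_to_q15(int8_t a, int8_t b)
{
    int32_t p = (int32_t)a * b * 2;
    if (p > INT16_MAX) p = INT16_MAX; /* only -1 * -1 = +1 can overflow */
    return (int16_t)p;
}

/* Back to Q7: keep the highest byte (truncation, no rounding).
   Assumes an arithmetic right shift for negative values. */
static int8_t q15_to_q7(int16_t x)
{
    return (int8_t)(x >> 8);
}
```

2 × 3 gives 0x000C in Q15 and 0 after taking the high byte, while 0.5 × 0.5 (0x40 × 0x40) gives 0x2000 in Q15 and 0x20, i.e. 0.25, in Q7.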

yh · 2022-11-10

Thank you very much for your explanation.

I will study more on this.

Thank you.