STM32G4 series: fastest way to add 32 16-bit signed integers
I have the following function to sum 32 16-bit signed integers, which is as close to optimal as I think is possible:

    int16_t sum32elements2(int16_t vals[]) {
        uint32_t s1 = __SADD16(*(uint32_t*)&vals[0], *(uint32_t*)&vals[2]);
        uint32_t s2 = __SADD16(*(uint32_t*)&val...
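
For checking a packed reduction like this off-target, a host-side model can help. The sketch below is not the function above. It is a minimal illustration that assumes a wrapped 16-bit result is acceptable (as the `int16_t` return type suggests). The names `sadd16_model`, `load_pair`, `sum32_packed_model` and `sum32_scalar` are hypothetical. `sadd16_model` only imitates the lane-wise wrapping add performed by the CMSIS `__SADD16` intrinsic, and `memcpy` replaces the pointer casts to avoid strict-aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of __SADD16: the low and high 16-bit
   lanes are added independently, each wrapping modulo 2^16. */
static uint32_t sadd16_model(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;          /* low lane, carry discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* high lane, carry shifted out */
    return hi | lo;
}

/* Load two adjacent int16_t values as one 32-bit word.
   memcpy avoids the aliasing issues of *(uint32_t*)&vals[i]. */
static uint32_t load_pair(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Packed sum: accumulate 16 word-sized pairs lane-wise, then fold the
   two lanes together. The result wraps modulo 2^16. */
int16_t sum32_packed_model(const int16_t vals[32])
{
    uint32_t acc = 0;
    for (int i = 0; i < 32; i += 2) {
        acc = sadd16_model(acc, load_pair(&vals[i]));
    }
    /* The fold is symmetric in the two lanes, so byte order does not matter.
       Converting an out-of-range uint16_t to int16_t is implementation-defined
       before C23; common compilers wrap, which is assumed here. */
    return (int16_t)(uint16_t)((acc & 0xFFFFu) + (acc >> 16));
}

/* Plain scalar reference with the same wrapping behaviour, for comparison. */
int16_t sum32_scalar(const int16_t vals[32])
{
    int32_t s = 0;
    for (int i = 0; i < 32; i++) {
        s += vals[i];
    }
    return (int16_t)(uint16_t)(uint32_t)s;
}
```

On the target, `sadd16_model` would presumably be replaced by the real `__SADD16` intrinsic. Comparing the packed result against `sum32_scalar` on a few inputs, including ones that overflow 16 bits, is a cheap way to confirm that a faster variant still computes the same sum.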