Optimized Multiplies for Cosmic Compiler?
‎2017-11-11 02:27 PM
Are there optimized basic math functions available for the STM8, such as 8 bit by 8 bit multiply, 8 bit by 16 bit multiply, and similar?
My application has a time critical 8 bit by 16 bit multiply. The Cosmic compiler seems to always default to a 16 bit by 16 bit multiply, which is slower. I wrote an inline assembly macro that runs in about 2/3 the time of the compiler's output, but it was very tedious to write. I would rather not do this for every math function.
Any helpful information would be appreciated.
I include my macro here, in case anyone else finds it useful:
// macro to perform an optimized UI8 by UI16 multiply:
// uint16_t RESULT_UI16 = (uint8_t)X_UI8 * (uint16_t)Y_UI16;
// Note: All arguments must be declared '@tiny'
// macro assumes that no overflow occurs
// '_asm()' loads an 8 bit argument into reg A or a 16 bit argument into reg X
#define MULT_8x16(X_UI8, Y_UI16, RESULT_UI16) {\
_asm("LDW Y,X\n SWAPW X\n",(uint16_t)Y_UI16);\
_asm("MUL X,A\n SWAPW X\n PUSHW X\n LDW X,Y\n", (uint8_t)X_UI8);\
_asm("MUL X,A\n ADDW X,($1,SP)\n POPW Y\n");\
_asm("CLRW Y\n LD YL,A\n LDW (Y),X\n",(@tiny uint16_t*)(&RESULT_UI16));\
}
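For checking results on a host, the arithmetic the macro performs can be modelled in portable C as two 8x8->16 partial products. This is only a sketch: the name mul8x16_ref is made up for illustration, it agrees with the macro only under the macro's stated no-overflow assumption (the model wraps modulo 2^16 instead), and there is no expectation that a compiler turns it into the same instruction sequence.

```c
#include <stdint.h>

/* Hypothetical reference model of the MULT_8x16 macro (name is illustrative).
 * Splits the 16-bit operand into bytes, forms two 8x8->16 partial products,
 * and combines them modulo 2^16. */
uint16_t mul8x16_ref(uint8_t x, uint16_t y)
{
    uint16_t lo = (uint16_t)((uint16_t)x * (uint8_t)(y & 0xFFu)); /* x * low byte  */
    uint16_t hi = (uint16_t)((uint16_t)x * (uint8_t)(y >> 8));    /* x * high byte */

    /* Only the low byte of the high partial product can contribute to a
     * 16-bit result; anything above that is overflow. */
    return (uint16_t)(lo + (uint16_t)((hi & 0xFFu) << 8));
}
```

Calling mul8x16_ref(63, 1023) exercises the largest 6 bit by 10 bit case, which still fits in 16 bits.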
#compiler #math
Note: this post was migrated and contained many threaded conversations; some content may be missing.
‎2017-11-12 06:40 AM
C has conversion rank and integer promotion rules that apply when the operand types differ. Casting the variables does not necessarily mean the * operator won't convert them to something else. When things need to be optimized down to the core cycle level, it makes sense to drop to assembly, as in this particular case. Check the compiler's ANSI math library for a specific function (one that does not use *), in case it exists; this requires some reading of the compiler documentation.
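The promotion point can be checked with a small standard C sketch. The function names are illustrative only; the behaviour shown is what the C integer promotion rules require, not anything specific to Cosmic.

```c
#include <stdint.h>
#include <stddef.h>

/* Size of the type in which uint8_t * uint8_t is evaluated.
 * Both operands undergo integer promotion, so this equals sizeof(int),
 * regardless of the casts written on the operands. */
size_t mul_u8_result_size(void)
{
    uint8_t a = 3, b = 5;
    return sizeof((uint8_t)a * (uint8_t)b);
}

/* The product of two uint8_t values does not wrap at 8 bits,
 * because the multiplication is carried out in int.
 * Returns 1 when 20 * 20 evaluates to 400 (which exceeds 255). */
int mul_u8_no_wrap(void)
{
    uint8_t a = 20, b = 20;
    return (a * b) == 400;
}
```

So a cast to uint8_t narrows the operand's value, but the multiply itself is still specified in terms of int.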
‎2017-11-13 05:54 AM
Yes, C has its rules on types. But the compiler is still free to optimize, as long as the observable behaviour is the same. E.g. 8-bit types will always be promoted to at least 16-bit types by the rules. But compilers will still use 8x8->16 multiplication where they can.
Philipp
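As a sketch of the kind of code where that freedom applies (the function name mul8x8 is made up, and whether a particular compiler actually emits a single hardware multiply here is not guaranteed):

```c
#include <stdint.h>

/* The operands are formally widened to at least 16 bits before the
 * multiply. Since both are at most 255, the product is at most 65025
 * and fits in uint16_t, so a single 8x8->16 multiply instruction gives
 * the same observable result as the wider multiply. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    /* The cast keeps the arithmetic unsigned on targets with 16-bit int. */
    return (uint16_t)((uint16_t)a * b);
}
```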
‎2017-11-13 09:33 PM
In my opinion, the arithmetic in COSMIC CXSTM8 is of very high quality. However, for DSP this may not be enough.
‎2017-11-14 03:37 AM
I agree that the Cosmic compiler does well in all of the head-to-head comparisons I have seen, but I don't think I would refer to simple multiplication as 'DSP'.
The reason for this post was that I was hoping that someone official or unofficial had identified common operations that could be sped up and then created optimized functions or macros to perform those operations. An application note would be great.
‎2017-11-14 03:43 AM
I assume that, with any compiler, someone would have looked into common operations, especially multiplications, and how to speed them up. Multiplications can be quite time-intensive, and are important both in benchmarks and real-world applications. That 8x16->16 multiplication where neither operand is a constant is not treated as a special case probably means that it was not considered particularly common or important. Not even SDCC has such an optimization.
But if you provide examples from real-world code, where such an optimization matters a lot, requesting the feature from compiler developers might result in it getting implemented.
Philipp
‎2017-11-14 05:46 AM
1. I agree that the compilers should do their best to optimize multiplies. However, here, they seem to have failed.
2. Why would 8x16 be any less common than any other multiply? In an embedded application on an 8 bit micro, where every byte of RAM and every CPU cycle counts, why would you NOT have a mix of 8 bit and 16 bit integers? A prudent coder uses the minimum precision practical, with the assumption that the compiler will too.
My particular application should be VERY common. I am scaling the ADC reading (10 bits) with the least precision necessary (6 bits).
3. It doesn't matter whether either factor is constant or not - multiplying by an arbitrary constant is the same as multiplying by a variable.
4. Asking for a new version of the compiler is not really a practical solution.
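The scaling case from point 2 can be sketched in plain C as follows. The helper name scale_adc and the two limit macros are hypothetical; only the 10 bit and 6 bit widths come from the post. The point of the sketch is the range check: 1023 * 63 = 64449, so the product always fits in 16 bits.

```c
#include <stdint.h>
#include <assert.h>

#define ADC_MAX   1023u  /* largest 10-bit ADC reading */
#define GAIN_MAX    63u  /* largest 6-bit scale factor */

/* Hypothetical helper: scale a 10-bit ADC reading by a 6-bit gain.
 * 1023 * 63 = 64449 < 65536, so no overflow can occur in 16 bits. */
uint16_t scale_adc(uint16_t adc, uint8_t gain)
{
    assert(adc <= ADC_MAX && gain <= GAIN_MAX);
    return (uint16_t)(adc * (uint16_t)gain);
}
```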
‎2017-11-14 05:47 AM
I think it's because of the standard. According to the rules of the C language, before the arithmetic operation is performed, both operands are converted to the longer of the two types, and the result is then truncated to the result type. Special behavior in the calculations is not mandatory, but some compilers support customizing it.
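The "compute wide, then truncate" behaviour can be illustrated with a short standard C sketch. The function name is made up, and the explicit 32-bit intermediate is there only to make the truncation visible; a compiler does not have to compute the upper bits if they are discarded.

```c
#include <stdint.h>

/* Multiply in a 32-bit type, then truncate to the 16-bit result type.
 * Conversion to an unsigned type reduces the value modulo 2^16. */
uint16_t mul_truncate(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a * b;   /* full product, cannot overflow */
    return (uint16_t)wide;             /* keep only the low 16 bits */
}
```

For example, 300 * 300 = 90000, and 90000 - 65536 = 24464 is what remains after truncation.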
‎2017-11-14 06:07 AM
Do not blame the developers, please. Building a compiler is not a trivial task. In any case, you can always write a speed-critical section of code in assembler, matched to the architecture of the processor. You can also purchase the compiler from IAR, which optimizes more deeply.
‎2017-11-14 06:13 AM
4:
It might not be the fastest solution, but I'd still consider it practical. If the compiler developers get feedback indicating that 8x16->16 multiplication is speed-critical for some of their users, optimizations for it are likely to get implemented.
2.,3:
I worked on the multiplication optimizations in SDCC. To do so, I needed a good understanding of which multiplications are common, and looked at code for 8-bit µCs from various sources. However, I have to admit that my analysis of which type of multiplications are common was mostly based on code from RTOSes, benchmarks, standard library functions. You might be more familiar with other code.
8x16->16 multiplications a) are less common and b) can't be done that much faster than 16x16->16 (yes, there is potential for optimization, but not as much as in 8x8->16 vs 16x16->16 or 16x16->32 vs 32x32->32). Most multiplications arise from accesses to arrays of structs/unions. If the programmer uses a cheap index type, such as uint_fast8_t, these are 8x8->16. If the programmer just uses int, these are 16x8->16. Naturally, the 8-bit operand is always a constant. The constant known at compile time allows some optimizations that otherwise would not be possible. 16x16->16 multiplications happen once in a while, and there is some code that makes heavy use of 32-bit multiplications (in many variants of operand sizes) in matrix multiplications. 8x16 with both operands being non-constant is less common. So are multiplications with 64-bit operands.
Philipp
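A minimal sketch of the array-of-structs case described above, assuming a hypothetical 6-byte record type. It shows why a compile-time constant factor helps: the multiply by 6 can be rewritten as two shifts and an add, which is not possible when both operands are variables. This is not a description of what SDCC or Cosmic actually emit.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 6-byte record, as might be stored in an array. */
struct sample {
    uint8_t bytes[6];
};

/* Byte offset of element i: an 8-bit index times a compile-time constant. */
uint16_t offset_mul(uint8_t i)
{
    return (uint16_t)((uint16_t)i * (uint16_t)sizeof(struct sample));
}

/* Same offset without a multiply: 6*i = 4*i + 2*i. This rewrite is only
 * available because the factor 6 is known at compile time. */
uint16_t offset_shift_add(uint8_t i)
{
    uint16_t w = i;
    return (uint16_t)((w << 2) + (w << 1));
}
```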