2008-04-09 12:25 AM
ST7 and the new STM8A microcontroller family
2008-02-26 12:06 AM
Dear forum members,
The introduction of the new STM8A microcontroller family raises some questions about the future of the ST7 family. According to the STMicroelectronics announcement, the "real silicon" will be delivered in the first half of this year, so there is time to take the appropriate actions.
The STM8A uses a new single-wire interface module (SWIM) for debugging, so new tools will be needed. I think that they will be cheap and that a new STVD edition will help programmers port old code to the new platform. It seems that STM8A is an enhanced ST7, so this transition may be easy and perhaps inevitable, because STM8A should be upward compatible with the STM32 family, as is the case within the Freescale Flexis family.
The automotive market will be the first marketing target, so the ST7 mid and high range will see the competition. The ST7Lite and ST7Fox families will perhaps survive because of their small resources (less than 8K of Flash for code) and price. I think that now the only safe way is to use the C tools and the STM C library, because they can grant a painless transition.
And you, what do you think of this new microcontroller family?
EtaPhi
2008-02-28 08:11 PM
We developed the compiler for this family, and gave hints to make it more "C friendly".
Based on this experience, I would say:
>> It seems that STM8A is an enhanced ST7...
It's still a CISC, 8-bit, accumulator-based MCU, but, apart from that, it's pretty new. Anyway, this is mostly a matter of perception...
>> I think that now the only safe way is to use the C tools and the STM C library, because they can grant a painless transition.
Always use C :) Seriously, the asm compatibility is extremely limited, and, even in C, there is a bit of porting from one family to another (interrupt vectors...).
>> And you, what do you think of this new microcontroller family?
In my trials, the code is >30% smaller than on ST7, and it executes ~4 times faster, which is quite an impressive performance.
Luca (Cosmic)
2008-02-28 11:26 PM
Luca,
your results are very interesting... A 4x speed increase is reasonable: a 3x frequency clock gain (8MHz vs 24 MHz) plus the benefits of a 3-stage pipeline. What impresses me is the code reduction, because I feared that the new pre-code bytes that were added to support the new features could result in a larger code size.
Perhaps the new stack-relative addressing mode helped a lot the C compiler :-), however the MOV instruction, the 16-bit hardware multiplication and division (and obviously 16-bit addition) are a bonus even for the asm folks, like me.
The more I read the preliminary documentation, the more I think that this new family will replace the old ST7 micros in any design whose code size is greater than 8 KB.
EtaPhi.
2008-02-29 01:51 AM
>> A 4x speed increase is reasonable: a 3x frequency clock gain (8MHz vs 24 MHz) plus the benefits of a 3-stage pipeline.
The 4x speed increase is with the same clock: it basically comes from the fact that the ST7 had no prefetch buffer (it read one byte at a time), while the STM8 reads 32 bits at a time.
>> Perhaps the new stack-relative addressing mode helped a lot the C compiler..
Most of the improvement comes from the fact that the X and Y registers are now 16 bits, but what you mention is true as well.
I didn't realize you are an "asm folk": what advantages do you see in it?
Ciao,
Luca
2008-02-29 09:19 PM
Luca,
I'm on the asm side because I write firmware for fun. If I were a professional, I would use C with some inline asm, because C has some limitations, mainly concerning the goto statement. To avoid spaghetti code, the jump instructions were reserved to compilers and direct stack manipulation was considered an evil to eradicate. There are however some cases where these evils let me write cleaner code.
For instance, my code is always arranged in concurrent tasks: one task handles the human-machine interface (buttons, displays, etc.), another handles the background work, another the communication with the other devices. My first implementation of this scheme used co-routines, a thing that no C compiler can handle without some asm code.
This C example shows how co-routines A, B and C work. Each co-routine owns a stack share (32 bytes, in this example) and saves its stack pointer in a global variable during context switches. This means that g_SpA holds the stack pointer of co-routine A, g_SpB that of co-routine B and g_SpC that of co-routine C. The co-routine A source is:
Code:
unsigned char g_SpA;
extern unsigned char g_SpB;

static void scheduler(void)
{
    _asm {
        LD A,S
        LD g_SpA,A
        LD A,g_SpB
        LD S,A
    }
    /* notice: the closing brace inserts a RET instruction */
}

/* pragma no_return */
void CoRoutineA(void)
{
    /* stack initialization */
    /* only for this co-routine as the others are set in main() */
    _asm {
        RSP
    }
    /* other initialization */
    /* ... */
    for ( ; ; ) {
        /* some code */
        /* ... */
        scheduler();
        /* ... */
        /* a wait pattern, condition is changed externally e.g. by an interrupt handler */
        while ( condition ) {
            scheduler();
        }
        /* some other code */
        /* ... */
        scheduler();
    }
}

The code skeleton for the other co-routines is the same, while the main is:
Code:
/* pragma no_return */
void main(void)
{
    /* some common initialization */
    /* ... */
    /* CoRoutineB & CoRoutineC initialization */
    g_SpB = 0xDF;
    g_SpC = 0xBF;
    /* Start Co-Routines */
    CoRoutineA();
}

This code is cleaner than a single infinite loop in main(), since CoRoutineA knows only g_SpB (this information can even be hidden by a macro) and the interaction points are defined by its include files. Each wait stops only one co-routine while the others are still running.
scheduler() does the "black magic" by manipulating the stack to get a fast context switch (only 17 clock cycles on an ST7 device). It works because each scheduler() call pushes the return address on the stack.
This was my first implementation. It has the fastest context switch, but also the following drawbacks:
- bigger code (each scheduler() call is 3 bytes long)
- it prevents further compiler optimizations, since the call makes the registers dirty
- fixed scheduling: CoRoutineA "calls" CoRoutineB that "calls" CoRoutineC that "calls" CoRoutineA that ...
Now my projects use a slightly slower context switch that has some advantages. The scheduler() function is now:
Code:
inline void scheduler(void)
{
    _asm {
        TRAP
    }
}

This means that now every CoRoutine (or Task) knows nothing about the others and that the "black magic" is located in a single module. A software interrupt now switches task execution, and only one byte is needed to activate the next task. A clever compiler can recognize that the register contents are preserved, so that it can generate faster and smaller code.
The software interrupt handler gives a hint on how tasks can be frozen and woken up. Here is the code:
Code:
ContextSwitch
    LD A,S
    LD (TaskSp,Y),A
    LD Y,(NextTask,Y)
    LD A,(TaskSp,Y)
    LD S,A
    IRET

Now there are two vectors that are located in page 0:
TaskSp[Y] holds the stack pointer of task Y;
NextTask[Y] holds the id of the next task to run after task Y.
The code exploits a limitation of the ST7 core that makes the Y register slower than the X one. This means that in my code the Y register always holds the current task id and nothing more, because it is slower. Since TRAP does not push Y on the stack, as the new STM8 does, each task stack can be smaller.
Since NextTask[Y] is stored in RAM, it is now possible to freeze a task and resume it later by changing the value of the entry that "points" to it.
This simple yet powerful cooperative multitasking needs the compiler's help (don't touch Y, otherwise...) and its initialization is not so simple, because a stack frame has to be created for each task. These drawbacks are however confined to a single "black magic" library that Cosmic can provide like .
There is another reason that keeps me on the asm side. This reason is related to the state machines I write. The C way of writing these state machines makes use of a state variable and a switch-case construct. I prefer to use a function pointer that I change instead of the state variable (this means that my state variable is the function pointer). Sometimes the states are a few lines of code, so they can all fit within a page. I exploit this feature by changing only the lower byte of the function pointer (you may bet that many compile-time directives check this feature :) ). My code is therefore smaller and faster, and sometimes it is rearranged to make space so that all steps fit in a page.
I don't know whether the Cosmic compiler is that smart. If it were, I would have no excuse for not using it!
Ciao
EtaPhi
2008-03-12 12:41 AM
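The NextTask table described in EtaPhi's post can be modelled in portable C to show how changing one entry freezes or resumes a task. This is only a sketch under stated assumptions, not EtaPhi's library: the names next_task, switch_task, freeze_task and resume_task are hypothetical, three tasks are assumed, and the stack pointer swap done by the real ContextSwitch handler is left out because it cannot be expressed in standard C.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_COUNT 3u   /* assumption: three tasks with ids 0, 1 and 2 */

/* next_task[i] is the id of the task that runs after task i
   (the role of NextTask[Y] in the post). Initially a ring 0 -> 1 -> 2 -> 0. */
static uint8_t next_task[TASK_COUNT] = { 1u, 2u, 0u };

/* Id of the running task; the post keeps this value in the Y register. */
static uint8_t current_task = 0u;

/* Models only the table lookup of ContextSwitch: follow the link to the next task. */
uint8_t switch_task(void)
{
    current_task = next_task[current_task];
    return current_task;
}

/* Walk the ring from the running task and return the task whose entry
   points to id, or TASK_COUNT if id is not part of the ring. */
static uint8_t find_predecessor(uint8_t id)
{
    uint8_t p = current_task;
    uint8_t steps;

    for (steps = 0u; steps < TASK_COUNT; steps++) {
        if (next_task[p] == id) {
            return p;
        }
        p = next_task[p];
    }
    return TASK_COUNT;
}

/* Freeze a task by making its predecessor skip it.
   Returns 1 on success, 0 if the task is already frozen or is the last one running. */
int freeze_task(uint8_t id)
{
    uint8_t pred;

    assert(id < TASK_COUNT);
    pred = find_predecessor(id);
    if (pred == TASK_COUNT || pred == id) {
        return 0;
    }
    next_task[pred] = next_task[id];
    return 1;
}

/* Resume a frozen task by linking it back into the ring right after task 'after'.
   The caller must pass a frozen id and a running 'after'; this sketch does not check that. */
void resume_task(uint8_t id, uint8_t after)
{
    assert(id < TASK_COUNT && after < TASK_COUNT && id != after);
    next_task[id] = next_task[after];
    next_task[after] = id;
}
```

With the initial ring, calling freeze_task(1) while task 0 is running makes the following switches alternate between tasks 2 and 0; resume_task(1, 0) later puts task 1 back between 0 and 2. Each operation writes only one or two table entries, which is why the scheme stays cheap on an 8-bit core.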
_luca and EtaPhi,
I would like to draw your attention to a new forum we just opened specifically on the STM8A. All the stuff related to this new µC family can be discussed there.
Best Regards
Alexander
2008-04-09 12:25 AM
The speed improvement is caused by improvements in the microarchitecture: the STM8 core needs fewer clock cycles than the ST7 for the same instruction. This is due to the prefetch, the 3-stage pipeline and improvements in the microcode.
DIV is a nice advantage in arithmetic operations, because a software routine no longer needs to be linked, which saves code size and improves performance. There are many more advantages, of course. Therefore, STM8 will be the future. But that does not mean ST7 will disappear now. It will be kept to support running projects, as well as new developments.
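To give an idea of what a linked software division routine involves, here is a minimal sketch of a 16-bit unsigned shift-and-subtract division in C. It is only an illustration: soft_div16 is a hypothetical name, and this is not the routine that any particular ST7 compiler library actually links. On a core without a divide instruction, a loop of this kind runs once per result bit, which is the code size and cycle cost that a hardware DIV removes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a vendor library routine:
   16-bit unsigned division by shift and subtract (restoring division).
   Returns the quotient; stores the remainder if 'remainder' is not NULL. */
uint16_t soft_div16(uint16_t dividend, uint16_t divisor, uint16_t *remainder)
{
    uint16_t quotient = 0u;
    uint32_t rem = 0u;      /* wider than 16 bits so the shift cannot overflow */
    int bit;

    assert(divisor != 0u);  /* division by zero is not handled in this sketch */

    /* Process the dividend from its most significant bit down to bit 0. */
    for (bit = 15; bit >= 0; bit--) {
        /* Bring the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((uint32_t)(dividend >> bit) & 1u);
        /* If the divisor fits, subtract it and set this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint16_t)(1u << bit);
        }
    }

    if (remainder != NULL) {
        *remainder = (uint16_t)rem;
    }
    return quotient;
}
```

The loop always runs 16 iterations, each with a shift, a compare and a conditional subtract, so the cost is fixed regardless of the operands.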