2010-09-07 03:11 AM
STM32F103: C faster than ASM?
2011-05-17 05:06 AM
It is a common fallacy that, simply by writing in assembler, your code will somehow magically become much faster and/or smaller!
Assembler is not a magic bullet - it is just a tool and, therefore, is only as good as the person using the tool! Modern optimising compilers really are very good - so, if you want to beat a modern optimising compiler, you are going to have to be an exceptionally good assembler programmer! If you can't understand the assembler that the compiler produces, that probably means that it is better at assembler than you are - and, thus, it's not surprising that its code is faster than yours!
2011-05-17 05:06 AM
I think that:
```asm
MOV  r2,#0x01
BICS r1,r2,r1,LSR #8
BEQ  0x080005E4
```
is the equivalent of:
```asm
TST  r1,#0x100
BNE  0x080005E4
```
2011-05-17 05:06 AM
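A quick way to convince yourself of that equivalence is to model the flag logic in C. This is a sketch that models only the branch decision, not cycle counts, and the function names are made up for illustration. BICS with r2 = 1 computes 1 AND NOT (r1 >> 8), which is zero (Z set, BEQ taken) exactly when bit 8 of r1 is set; TST r1,#0x100 clears Z (BNE taken) in exactly the same case. The model also shows the practical difference: the BICS version overwrites r1 and needs r2 as scratch, while TST only updates the flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of:  MOV r2,#0x01 ; BICS r1,r2,r1,LSR #8 ; BEQ target
   BICS writes r2 AND NOT (r1 >> 8) back into r1 and sets Z if it is 0. */
bool bics_branch_taken(uint32_t r1)
{
    uint32_t r2 = 0x01u;
    uint32_t result = r2 & ~(r1 >> 8);   /* new value of r1 */
    bool z = (result == 0u);
    return z;                            /* BEQ is taken when Z is set */
}

/* Hypothetical model of:  TST r1,#0x100 ; BNE target
   TST computes r1 AND 0x100, discards it, and sets Z if it is 0. */
bool tst_branch_taken(uint32_t r1)
{
    bool z = ((r1 & 0x100u) == 0u);
    return !z;                           /* BNE is taken when Z is clear */
}

/* Returns true if both models make the same branch decision for every
   sample value. */
bool models_agree(const uint32_t *samples, int count)
{
    for (int i = 0; i < count; i++) {
        if (bics_branch_taken(samples[i]) != tst_branch_taken(samples[i]))
            return false;
    }
    return true;
}
```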
And in C, a preload can be done like this:
```c
void sendDataC(unsigned long *dat)
{
    TIM3->CR1 = 0x1;                                  // Enable TIM3

    //----------------Startbit (Low)---------------
    GPIO_Port->BSRR = dat[0];
    unsigned long nPreload = dat[1];                  // preload next word while waiting
    while ((Toggle_Port->IDR & Toggle_Pin) != 0);     // wait for pin low

    //----------------Data 1-----------------------
    GPIO_Port->BSRR = nPreload;
    nPreload = dat[2];
    while ((Toggle_Port->IDR & Toggle_Pin) == 0);     // wait for pin high

    //----------------Data 2-----------------------
    GPIO_Port->BSRR = nPreload;
    nPreload = dat[3];
    while ((Toggle_Port->IDR & Toggle_Pin) != 0);     // wait for pin low

    //----------------Data 3-----------------------
    GPIO_Port->BSRR = nPreload;
    while ((Toggle_Port->IDR & Toggle_Pin) == 0);     // wait for pin high

    TIM3->CR1 = 0x00;                                 // Disable TIM3
}
```
2011-05-17 05:06 AM
If proximity to the rising/falling edge is important, you really shouldn't be fluffing around loading and incrementing the index before outputting the data. The key thing the compiler did was remove your "r0 += 4" code. The output would land closer to the edge if you loaded the output value before entering the spin loop.
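The same preload-then-spin idea extends to any number of words. The following is a host-runnable illustration only: write_bsrr() and read_idr() are hypothetical stand-ins for GPIO_Port->BSRR and Toggle_Port->IDR, and the fake IDR simply toggles bit 8 on every read so the spin loops terminate. What matters is the ordering: the only work between leaving a spin loop and the store is the store itself, and fetching the next word happens afterwards, while the pin is still heading for its next edge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOGGLE_PIN 0x100u   /* bit 8, as tested by TST r4,#0x100 */

/* Host stand-ins (hypothetical). On the target these would be plain
   accesses to GPIO_Port->BSRR and Toggle_Port->IDR. */
static uint32_t bsrr_log[16];
static size_t bsrr_count;
static uint32_t fake_idr;

static void write_bsrr(uint32_t value)
{
    assert(bsrr_count < sizeof bsrr_log / sizeof bsrr_log[0]);
    bsrr_log[bsrr_count++] = value;
}

static uint32_t read_idr(void)
{
    fake_idr ^= TOGGLE_PIN;      /* pretend the clock pin toggles on each read */
    return fake_idr;
}

/* Spin until the toggle pin reaches the requested level. */
static inline void wait_for_level(int high)
{
    if (high)
        while ((read_idr() & TOGGLE_PIN) == 0) { }
    else
        while ((read_idr() & TOGGLE_PIN) != 0) { }
}

/* Send n words on alternating edges, starting with a rising edge and
   finishing with one more edge, mirroring while0..while4 in the ASM. */
void send_words(const uint32_t *dat, size_t n)
{
    assert(n > 0);
    uint32_t next = dat[0];          /* preload before the first edge */
    int wait_high = 1;

    for (size_t i = 0; i < n; i++) {
        wait_for_level(wait_high);
        write_bsrr(next);            /* output as soon as the edge is seen */
        if (i + 1 < n)
            next = dat[i + 1];       /* preload while waiting for the next edge */
        wait_high = !wait_high;
    }
    wait_for_level(wait_high);       /* final edge before stopping the timer */
}
```

The loop bookkeeping sits after each store, so it eats into the time before the next edge rather than the edge-to-store delay; whether a particular compiler actually keeps it there is worth confirming in the disassembly.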
```asm
__asm void sendDataASM(unsigned long *data)
{
;****************************Init Registers****************************************
        PUSH    {r4-r7}             ; r4-r7 are callee-saved under the AAPCS
        LDR     r1,=0x40011810      ; GPIOE->BSRR
        LDR     r3,=0x40011c08      ; GPIOF->IDR
        LDR     r5,=0x40000400      ; TIM3->CR1
        MOV     r6,#0x01
        MOV     r7,#0x00
        STRH    r6,[r5,#0x00]       ; Enable TIM3
        LDR     r2,[r0,#0x00]       ; Data Start, Preload
;----------------Wait for the first edge------
while0  LDR     r4,[r3,#0x00]
        TST     r4,#0x100
        BEQ     while0
;****************************Now send Bit for Bit, synchronized by TIM3************
;----------------Startbit (Low)---------------
        STR     r2,[r1,#0x00]       ; Out Data Start
        LDR     r2,[r0,#0x04]       ; Data 1, Preload
while1  LDR     r4,[r3,#0x00]
        TST     r4,#0x100
        BNE     while1
;----------------Data 1-----------------------
        STR     r2,[r1,#0x00]       ; Out Data 1
        LDR     r2,[r0,#0x08]       ; Data 2, Preload
while2  LDR     r4,[r3,#0x00]
        TST     r4,#0x100
        BEQ     while2
;----------------Data 2-----------------------
        STR     r2,[r1,#0x00]       ; Out Data 2
        LDR     r2,[r0,#0x0C]       ; Data 3, Preload
while3  LDR     r4,[r3,#0x00]
        TST     r4,#0x100
        BNE     while3
;----------------Data 3-----------------------
        STR     r2,[r1,#0x00]       ; Out Data 3
while4  LDR     r4,[r3,#0x00]
        TST     r4,#0x100
        BEQ     while4
;****************************Sending done, Stop Timer and Jump back****************
        STRH    r7,[r5,#0x00]       ; Disable TIM3
        POP     {r4-r7}             ; restore callee-saved registers
        BX      lr
}
```

If the data/clock edge placement is critical, and you *have* to bit-bang it in software, you'd be better off driving a pair of GPIOs together and either using a software-calibrated spin loop or a high-resolution free-running counter to handle the mark/space ratio.
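For the free-running-counter approach, one possibility on a Cortex-M3 such as the STM32F103 is the DWT cycle counter (DWT->CYCCNT, enabled through the TRCENA bit in CoreDebug->DEMCR and CYCCNTENA in DWT->CTRL in the CMSIS headers). The sketch below keeps the hardware behind hypothetical read_counter and write_bsrr callbacks so the timing logic can run on a host, and it assumes both pins sit on the same port so one BSRR write moves them together. Each word is scheduled at an absolute deadline (the running sum of its tick counts since the start) rather than "N ticks after the previous store", so loop overhead does not accumulate across the frame, and the unsigned subtraction keeps the comparison correct when the 32-bit counter wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*counter_fn)(void);   /* hypothetical, e.g. wraps DWT->CYCCNT */
typedef void (*output_fn)(uint32_t);    /* hypothetical, e.g. writes GPIOx->BSRR */

/* True once at least `due` ticks have passed since `start`. Unsigned
   subtraction makes this correct across a 32-bit counter wrap, provided
   the real elapsed time stays below 2^32 ticks. */
bool deadline_reached(uint32_t start, uint32_t now, uint32_t due)
{
    return (uint32_t)(now - start) >= due;
}

/* Output words[i] once ticks[0] + ... + ticks[i] have elapsed since the
   call. Choosing the per-word tick counts sets the mark/space ratio of
   the data and clock pins driven by each BSRR word. */
void send_timed(const uint32_t *words, const uint32_t *ticks, size_t n,
                counter_fn read_counter, output_fn write_bsrr)
{
    assert(read_counter != NULL && write_bsrr != NULL);
    uint32_t t0 = read_counter();
    uint32_t due = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t next = words[i];        /* load before spinning */
        due += ticks[i];                 /* absolute deadline, so no drift */
        while (!deadline_reached(t0, read_counter(), due)) { }
        write_bsrr(next);
    }
}
```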