STM32F103 C faster than ASM?

darkfirefighter · ‎2010-09-07

Posted on September 07, 2010 at 12:11

Andrew Neil · ‎2011-05-17

Posted on May 17, 2011 at 14:06

It is a common fallacy that, simply by writing in assembler, your code will somehow magically become much faster and/or smaller!

Assembler is not a magic bullet - it is just a tool and, therefore, is only as good as the person using the tool!

Modern optimising compilers really are very good - so, if you want to beat a modern optimising compiler, you are going to have to be an exceptionally good assembler programmer!

If you can't understand the assembler that the compiler produces, that probably means that it is better at assembler than you are - and, thus, it's not surprising that its code is faster than yours!

stforum2 · ‎2011-05-17

Posted on May 17, 2011 at 14:06

I think that:

MOV r2,#0x01

BICS r1,r2,r1,LSR #8

BEQ 0x080005E4

is the equivalent of:

TST r1,#0x100

BNE 0x080005E4
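One way to check the equivalence claimed above is to model both sequences in plain C and compare the branch decisions. This is only a host-side sketch: the function names are made up for illustration, and it models the flag logic only (BICS sets Z when r2 AND NOT (r1 >> 8) is zero, TST sets Z when r1 AND 0x100 is zero), not timing or code size.

```c
#include <assert.h>
#include <stdint.h>

/* Models: MOV r2,#0x01 ; BICS r1,r2,r1,LSR #8 ; BEQ target
   BICS computes r2 AND NOT (r1 >> 8) and sets Z when the result is 0.
   BEQ is taken when Z is set. (Hypothetical helper name.) */
int branch_taken_bics(uint32_t r1)
{
    uint32_t r2 = 0x01u;
    uint32_t result = r2 & ~(r1 >> 8);
    return result == 0u;
}

/* Models: TST r1,#0x100 ; BNE target
   TST computes r1 AND 0x100 and sets Z when the result is 0.
   BNE is taken when Z is clear. (Hypothetical helper name.) */
int branch_taken_tst(uint32_t r1)
{
    return (r1 & 0x100u) != 0u;
}

/* Counts inputs on which the two models disagree. Every 16-bit pattern
   is tried twice: once with the upper half clear, once with it set. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0u;
    for (uint32_t v = 0u; v <= 0xFFFFu; ++v) {
        uint32_t hi = v | 0xFFFF0000u;
        if (branch_taken_bics(v) != branch_taken_tst(v)) {
            ++mismatches;
        }
        if (branch_taken_bics(hi) != branch_taken_tst(hi)) {
            ++mismatches;
        }
    }
    return mismatches;
}

/* Convenience check: aborts if the two sequences ever disagree. */
void self_check(void)
{
    assert(count_mismatches() == 0u);
}
```

Both models take the branch exactly when bit 8 of r1 is set. One difference the model does not show: BICS overwrites r1 with its result, while TST leaves r1 untouched.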

stforum2 · ‎2011-05-17

Posted on May 17, 2011 at 14:06

And in C, a preload can be done like this:

void sendDataC(unsigned long *dat)

{

TIM3->CR1 = 0x1; //Enable TIM3

//----------------Startbit (Low)---------------

GPIO_Port->BSRR = dat[0];

unsigned long nPreload = dat[1];

while((Toggle_Port->IDR & Toggle_Pin) != 0); //Wait while the toggle pin is high

//----------------Data 1-----------------------

GPIO_Port->BSRR = nPreload;

nPreload = dat[2];

while((Toggle_Port->IDR & Toggle_Pin) == 0);

//----------------Data 2-----------------------

GPIO_Port->BSRR = nPreload;

nPreload = dat[3];

while((Toggle_Port->IDR & Toggle_Pin) != 0); //Wait while the toggle pin is high

//----------------Data 3-----------------------

GPIO_Port->BSRR = nPreload;

while((Toggle_Port->IDR & Toggle_Pin) == 0);

TIM3->CR1 = 0x00; //Disable TIM3

}

Tesla DeLorean · ‎2011-05-17

Posted on May 17, 2011 at 14:06

If proximity to the rising/falling edge is important, you really shouldn't be fluffing around loading and incrementing the index before outputting the data. The key thing the compiler did was remove your "r0 += 4" code. Placement to the edge would be quicker if you loaded the output value before entering the spin loop.

__asm void sendDataASM(unsigned long *data){

;****************************Init Registers****************************************

LDR r1,=0x40011810 ;GPIOE->BSRR

LDR r3,=0x40011c08 ;GPIOF->IDR

LDR r5,=0x40000400 ;TIM3->CR1

MOV r6,#0x01

MOV r7,#0x00

STRH r6,[r5,#0x00] ;Enable TIM3

LDR r2,[r0,#0x00] ; Data Start, Preload

;----------------Wait for the first edge------

while0 LDR r4,[r3,#0x00]

TST r4,#0x100

BEQ while0

;****************************Now send Bit for Bit, synchronized by TIM3************

;----------------Startbit (Low)---------------

STR r2,[r1,#0x00] ; Out Data Start

LDR r2,[r0,#0x04] ; Data 1, Preload

while1 LDR r4,[r3,#0x00]

TST r4,#0x100

BNE while1

;----------------Data 1-----------------------

STR r2,[r1,#0x00] ; Out Data 1

LDR r2,[r0,#0x08] ; Data 2, Preload

while2 LDR r4,[r3,#0x00]

TST r4,#0x100

BEQ while2

;----------------Data 2-----------------------

STR r2,[r1,#0x00] ; Out Data 2

LDR r2,[r0,#0x0C] ; Data 3, Preload

while3 LDR r4,[r3,#0x00]

TST r4,#0x100

BNE while3

;----------------Data 3-----------------------

STR r2,[r1,#0x00] ; Out Data 3

while4 LDR r4,[r3,#0x00]

TST r4,#0x100

BEQ while4

;****************************Sending done, Stop Timer and Jump back****************

STRH r7,[r5,#0x00] ;Disable TIM3

BX lr

}

If the data/clock edge placement is critical, and you *have* to bit-bang it in software, you'd be better off driving a pair of GPIOs together, and either have a software-calibrated spin-loop, or use a high-resolution free-running counter to handle the mark/space ratio.
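As a rough illustration of the free-running counter idea, here is a host-side sketch in C. Everything hardware-related is a stand-in: sim_counter, read_counter() and sim_output are hypothetical names that simulate a 32-bit counter and an output pin, and on a real part you would read an actual timer or cycle counter and write the GPIO register instead. The point it shows is the wrap-safe elapsed-time comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running counter (assumption: 32 bits wide, here it
   advances by one tick per read; real hardware runs on its own). */
uint32_t sim_counter;

/* Simulated output pin state (stand-in for a GPIO write). */
uint32_t sim_output;

uint32_t read_counter(void)
{
    return sim_counter++;
}

/* Spin until at least `ticks` counts have elapsed since `start`.
   Unsigned subtraction wraps modulo 2^32, so the comparison stays
   correct even when the counter overflows during the wait. */
void wait_ticks(uint32_t start, uint32_t ticks)
{
    while ((uint32_t)(read_counter() - start) < ticks) {
        /* busy-wait */
    }
}

/* Emit one mark/space pair. Both deadlines are measured from the same
   start value, so delay in the first wait does not accumulate into
   the second edge. */
void pulse(uint32_t mark_ticks, uint32_t space_ticks)
{
    uint32_t t0 = read_counter();
    sim_output = 1u;                           /* mark begins  */
    wait_ticks(t0, mark_ticks);
    sim_output = 0u;                           /* space begins */
    wait_ticks(t0, mark_ticks + space_ticks);
}

/* Convenience check: a wait that straddles the counter overflow. */
void self_check(void)
{
    sim_counter = 0xFFFFFFF0u;
    uint32_t start = read_counter();
    wait_ticks(start, 32u);
    assert((uint32_t)(sim_counter - start) > 32u);
}
```

Measuring every edge from one reference reading, rather than restarting the count after each edge, is a design choice that keeps the mark/space ratio tied to the counter instead of to loop overhead.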
