cancel
Showing results for 
Search instead for 
Did you mean: 

Help needed with STM32 ASM function

herbert
Associate III

Hallo All,

I am currently stuck and need your help. The following function counts the no. of 1 in a 32 bit word on an STM32F407. Why does it fail for values which have their bit 31 set? Precisely, it ignores bit 31 and in these cases is off by one.

   .global countBits
.text
.syntax unified
countBits:
mov r3, r0
mov r2, #32
mov r0, #0
next:
lsls r3,r3,#1
it mi
addmi r0, #1
subs r2, #1
bne next
bx lr

I would be glad if anyone could point me to the error which obviously I do not see.

Thanks a lot
Herby

1 ACCEPTED SOLUTION

Accepted Solutions
STM_Thirty2
ST Employee
lsls r3,r3,#1

 This instruction is shifting the value in R3 to the left by 1. If the 31st bit is set on the initial input number  then you are shifting it out and it is lost.

If you feel a post has answered your question, please click "Accept as Solution"

View solution in original post

4 REPLIES 4
STM_Thirty2
ST Employee
lsls r3,r3,#1

 This instruction is shifting the value in R3 to the left by 1. If the 31st bit is set on the initial input number  then you are shifting it out and it is lost.

If you feel a post has answered your question, please click "Accept as Solution"

You're using the sign bit, which is already in the high order bit at the first iteration

   .global countBits
   .text
   .syntax unified
countBits:
   mov  r3, r0
   mov  r0, #0
   orrs r3, r3
next:
   it mi
   addmi r0, #1
   lsls r3,r3,#1
   bne next
   bx lr

There are other faster ways of doing a population count.

popcount32:
	mov.w	r1, #$55555555
	and.w	r1, r1, r0, lsr #1
	subs	r0, r0, r1
	mov.w	r1, #$33333333
	and.w	r1, r1, r0, lsr #2
	and.w	r0, r0, #$33333333
	add	r0, r1
	add.w	r0, r0, r0, lsr #4
	and.w	r0, r0, #$F0F0F0F
	add.w	r0, r0, r0, lsr #8
	add.w	r0, r0, r0, lsr #16
	and.w	r0, r0, #$3F
	bx	lr
Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Thanks you for your input. Yes, I know and normally use the Gillies-Miller method for sideways addition. I was just drafting this other solution to make some time measuring experiments to show to my students how much faster sideways addition is.

Herby

If you know where the bits are going to lay you can change the shift direction so the looping method can "finish early". The loop counting wastes time in all situations.

I'd probably have opted for the shift into carry and use ADC r0,#0 type method and avoided the conditional execution entirely.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..