2024-03-12 11:39 AM
Hallo All,
I am currently stuck and need your help. The following function counts the no. of 1 in a 32 bit word on an STM32F407. Why does it fail for values which have their bit 31 set? Precisely, it ignores bit 31 and in these cases is off by one.
.global countBits
.text
.syntax unified
countBits:
mov r3, r0
mov r2, #32
mov r0, #0
next:
lsls r3,r3,#1
it mi
addmi r0, #1
subs r2, #1
bne next
bx lr
I would be glad if anyone could point me to the error which obviously I do not see.
Thanks a lot
Herby
Solved! Go to Solution.
2024-03-12 12:15 PM - edited 2024-03-12 12:16 PM
lsls r3,r3,#1
This instruction is shifting the value in R3 to the left by 1. If the 31st bit is set on the initial input number then you are shifting it out and it is lost.
2024-03-12 12:15 PM - edited 2024-03-12 12:16 PM
lsls r3,r3,#1
This instruction is shifting the value in R3 to the left by 1. If the 31st bit is set on the initial input number then you are shifting it out and it is lost.
2024-03-12 12:32 PM
You're using the sign bit, which is already in the high order bit at the first iteration
.global countBits
.text
.syntax unified
countBits:
mov r3, r0
mov r0, #0
orrs r3, r3
next:
it mi
addmi r0, #1
lsls r3,r3,#1
bne next
bx lr
There are other faster ways of doing a population count.
popcount32:
mov.w r1, #$55555555
and.w r1, r1, r0, lsr #1
subs r0, r0, r1
mov.w r1, #$33333333
and.w r1, r1, r0, lsr #2
and.w r0, r0, #$33333333
add r0, r1
add.w r0, r0, r0, lsr #4
and.w r0, r0, #$F0F0F0F
add.w r0, r0, r0, lsr #8
add.w r0, r0, r0, lsr #16
and.w r0, r0, #$3F
bx lr
2024-03-13 10:32 AM
Thanks you for your input. Yes, I know and normally use the Gillies-Miller method for sideways addition. I was just drafting this other solution to make some time measuring experiments to show to my students how much faster sideways addition is.
Herby
2024-03-13 01:12 PM
If you know where the bits are going to lay you can change the shift direction so the looping method can "finish early". The loop counting wastes time in all situations.
I'd probably have opted for the shift into carry and use ADC r0,#0 type method and avoided the conditional execution entirely.