2019-10-24 04:42 AM
I want to manipulate the program counter in an assembler function, branching down a list of instructions by a programmable amount. This:
.syntax unified
.global myasm
myasm: PUSH {R0,R1,r2,r4,R5}
ADD R15,#4
is rejected by the assembler
.syntax unified
.global myasm
myasm: PUSH {R0,R1,r2,r4,R5}
mov r5,r2,lsl #2
ADD R15,r5
is accepted but isn't doing what I want. I assume that R15 increments by 4 for each instruction so I'm making sure that I'm always adding multiples of 4.
This is a technique I've used on other processors from the PDP11 onwards, but the ARM crashes as soon as the ADD instruction executes.
Any help appreciated
2019-10-24 06:32 AM
> I assume that R15 increments by 4 for each instruction
No. Cortex-M cores don't execute ARM instructions.
The Cortex-M cores execute the Thumb-2 instruction set. It has a 16-bit instruction word, but some instructions take one instruction word and some take two, i.e. some are 2 bytes and others are 4 bytes (the mix differs between Cortex-M0/M0+ and Cortex-M3/M4/M7).
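To make the 2-byte/4-byte point concrete, here is a minimal C sketch of how the length of a Thumb instruction can be read off its first halfword. It relies on the encoding rule in the architecture manuals: a first halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. The function name `thumb_insn_size` is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of the Thumb instruction whose
   first halfword is hw1. Bits [15:11] equal to 0b11101, 0b11110 or
   0b11111 mark the first half of a 32-bit instruction; every other
   value is a complete 16-bit instruction. */
static unsigned thumb_insn_size(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;   /* bits [15:11], range 0..31 */
    return (top5 >= 0x1Du) ? 4u : 2u;
}
```

So the PC does not advance by a fixed 4 per instruction: a BL (first halfword 0xF000) occupies 4 bytes, while a register-to-register ADD or a BX occupies 2.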
Thumb also severely restricts what you can do with PC/R15. Your basic option is to use BX, but you might be pleasantly surprised by the TBB/TBH instructions.
Read the ARMv7-M Architecture Reference Manual (unless you intend to use a Cortex-M0/M0+, in which case it's ARMv6-M).
JW
2019-10-24 10:20 AM
Jan
Thanks for the pointer. TBH can be made to do what I want, albeit in a clunky way compared to just directly adding an offset to the program counter. I've got my code working in a test program; however, in the main program it causes problems. I suspect that, despite pushing and popping all relevant registers, the assembler code is conflicting with pipelining, caching or something similar.
2019-10-24 10:38 AM
Perhaps you can BL to the next instruction, ADD to LR, and then BX LR out of it.
2019-10-24 02:42 PM
"Perhaps you can BL to the next instruction, ADD to LR, and then BX LR out of it."
Wonderful bit of lateral thinking :) Unfortunately LR is like PC - you can't assign to it. It appears you can add a small literal
add Lr,#n
but you can't do
add LR,R2
2019-10-24 03:11 PM
But you can do this
ADD R2, LR
BX R2
What's the target here? A CM0(+)?
2019-10-24 03:19 PM
>>Wonderful bit of lateral thinking =)
I don't even need to think outside the box, I am the box, and the surfaces see in all directions... inside and out
0000001A F000 F800   BL  .+4
0000001E 4472        ADD R2, LR
00000020 4710        BX  R2
Scaling LR probably isn't going to be helpful
2019-10-25 01:17 AM
IT WORKS!!
This code implements what in old PDP11 terminology was called a transfer vector. It allows a variable number of bytes to be copied without any loop or loop testing.
memcpy() seems to be appallingly slow. A list of C pointer copies *p++=*q++; is very much faster, but can't deal with the situation where the number of elements to copy is variable. This assembler code can. Thanks to Jan and Clive for the help. Tested on STM32H743.
/*
Routine to copy bytes without any loop
R0 - source address
R1 - destination address
R2 - number of bytes to copy
example code allows 0-10 bytes to be copied
versions can be built to copy shorts, words etc all without looping
*/
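mycpy: PUSH {R4-R5,lr}
bl next @ LR = address of 'next' with bit 0 set (Thumb state)
next: mov r4,#104 @ length of the jump if zero bytes to copy: from 'next' to the POP
mov r5,r2,lsl #3 @ each copy is an LDRB/STRB pair of two words - 8 bytes
sub r4,r5 @ shorten the jump by 8 bytes per byte to copy (no range check on R2)
add r4,lr @ add in the current return address; the offset is even so bit 0 stays set
bx r4 @ branch into the copy list at the computed address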
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
LDRb r5,[r0],#1
strb r5,[r1],#1
POP {R4-R5,lr}
mov r0,#0
bx lr
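For comparison, the same "jump into a run of unrolled copies" idea can be sketched in plain C with a switch whose cases fall through. This is only an illustration and not the code tested above: the name `copy_upto10` is made up, and whether the compiler turns the switch into a table branch (TBB/TBH) or a compare chain depends on the toolchain and optimisation settings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of the transfer vector: copy n bytes (0..10)
   with no loop. The switch enters the unrolled list at the case for n
   and every case falls through to the ones below it, so exactly n
   byte copies execute. */
static void copy_upto10(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 10);                        /* the list only holds 10 copies */
    switch (n) {
    case 10: *dst++ = *src++; /* fall through */
    case 9:  *dst++ = *src++; /* fall through */
    case 8:  *dst++ = *src++; /* fall through */
    case 7:  *dst++ = *src++; /* fall through */
    case 6:  *dst++ = *src++; /* fall through */
    case 5:  *dst++ = *src++; /* fall through */
    case 4:  *dst++ = *src++; /* fall through */
    case 3:  *dst++ = *src++; /* fall through */
    case 2:  *dst++ = *src++; /* fall through */
    case 1:  *dst++ = *src++; /* fall through */
    case 0:  break;
    }
}
```

Unlike the assembler routine, this version checks the count (in debug builds) before jumping, so an oversized n cannot land outside the list.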
2019-10-25 01:49 AM
Couldn't shifting the count by 1 and IT be used to break this down to a byte-halfword-word-two_words read, up to 15 bytes?
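A rough C sketch of that decomposition, assuming counts of 0 to 15: each bit of the count selects one fixed-size copy of 1, 2, 4 or 8 bytes. It tests the bits directly instead of shifting them into the carry flag for IT blocks, and it uses fixed-size memcpy calls only as the portable way to express an unaligned access; a compiler will often turn those into single loads and stores, but that is not guaranteed. The name `copy_upto15` is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy n bytes (0..15) as at most one byte, one
   halfword, one word and one double word, chosen by the bits of n. */
static void copy_upto15(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 15);
    if (n & 1) { memcpy(dst, src, 1); dst += 1; src += 1; }  /* byte       */
    if (n & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }  /* halfword   */
    if (n & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }  /* word       */
    if (n & 8) { memcpy(dst, src, 8); }                      /* two words  */
}
```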
JW
2019-10-25 02:01 AM
"Couldn't shifting the count by 1 and IT be used to break this down to a byte-halfword-word-two_words read, up to 15 bytes?"
Yes, there are lots of games to play now that the concept is working. This version deals with bytes and doesn't care about word boundaries in either the source or the destination. By adding extra testing etc. you can transfer bytes up to the word boundary, then words up to the last complete word, then bytes again for the remainder. The challenge is to optimise very short transfers, which are the most typical in my application, while still doing long ones efficiently.
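The bytes / words / bytes split can be sketched in C as below. This is a sketch under assumptions rather than the final routine: it aligns on the destination only, uses plain loops for readability (each loop could be replaced by a computed jump into an unrolled list), and goes through a fixed 4-byte memcpy for the word step so that an unaligned source stays legal C. The name `copy_head_body_tail` is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the three-phase copy. */
static void copy_head_body_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Head: single bytes until the destination is word aligned. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }
    /* Body: whole words up to the last complete word. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);     /* source may still be unaligned */
        memcpy(dst, &w, 4);     /* destination is word aligned here */
        dst += 4;
        src += 4;
        n -= 4;
    }
    /* Tail: the remaining 0..3 bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

For the very short transfers the head and tail dominate, which is where the loop-free jump earns its keep; the word loop only matters for the long ones.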