Writing assembly code in a C program to turn On/Off GPIO pin

Xxoyo.1 · ‎2020-11-22

Hello everyone, I am trying to measure how much time a function takes to execute. I cannot use the timers because they add some delay to the system. My other option is to use GPIO pins.

To do this, I

SET a pin at the start of the function

RESET a pin at the end of the function

I then use an oscilloscope to measure how long the pin stays high.

To turn the GPIO on and off without too much delay, I used inline assembly so that only a few instructions are needed to set and clear the pin.

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef* hadc)		//called every 4 us.
{
	__asm__ volatile (
			//turn on GPIO E2
"LDR		R1,		=0x40021014;"
"LDR		R0,	[R1];"
"ORR.W		R0,	#0x0004;"	//only set pin 2
"STR		R0,	[R1];"		//write data to memory
 
			//write data to data memory
"LDR		R3,		=0xD0007260;"
"LDR		R2,	[R3];"
"LDR   		R2,=0xFF000000;"
"STR		R2,	[R3];"		//write data to memory
 
 
			//turn off  GPIO E2
//"LDR		R5,		=0x40021014;"
//"LDR		R4,	[R5];"
"AND.W		R0,  #0xFFFFFFFB;"	//only clear pin 2
"STR		R0,  [R1];"			//write data to memory
	);
 
 
 
}

However the generated code looks like the image below...

With this generated code, the GPIO pin, observed on the oscilloscope, produces pulses of varying width, as shown in the image below.

From my assembly code above, the pulse width should be more or less identical since I am running the same number of instructions each cycle. But I do not know why I cannot get the compiler to follow the assembly code that I wrote in my C program.

Ultimately, I am trying to get very lightweight code for turning the GPIO pin on and off so that I can accurately measure the time taken by a function. The code I am aiming for would look like the following:

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef* hadc)		//called every 4 us.
{
 
//read GPIO E ODR address
__asm__ (
	"LDR		R1,		=0x40021014;"
	"LDR		R0,	[R1];"
);
 
//Turn on GPIO E2
__asm__ (
	"ORR.W		R0,	#0x0004;"	//only set pin 2
	"STR		R0,	[R1];"		//write data to memory
);
 
 
//example function 
BSP_LCD_DrawPixel(120, 60, LCD_COLOR_BLACK);
 
// Turn off GPIO E2
__asm__ (
		"AND.W		R0,  #0xFFFFFFFB;"	//only clear pin 2
		"STR		R0,  [R1];"			//write data to memory
);
 
}

Lastly, my compiler is set to Optimization Fast and I am using Atollic TrueSTUDIO. I have only one interrupt in the system, which is the ADC conversion complete interrupt.

If you need any other information, please let me know. Thanks for reading!

EDIT1: Does anyone know why the compiled assembly code is not the same as my inline assembly code? I would prefer this part of the assembly code not to be touched or modified by compiler optimization while still using the optimize-for-speed option.

Radosław · ‎2020-11-22

Which core?

If you use CubeMX, do not expect optimal code.

Writing assembly is not necessary; the same thing in C will be just as good, or even better.

Still, using the timer would be the best idea.

KnarfB · ‎2020-11-22

A lightweight approach is register-level programming, for example:

LD2_GPIO_Port->BSRR = LD2_Pin;
HAL_UART_Transmit(&huart2, &ch, 1, 0 );
LD2_GPIO_Port->BRR = LD2_Pin;

to switch an LED on before the function call and off after it. This translates to efficient assembly code. For switching on:

ldr     r3, [pc, #60]   ; (0x8000584 <main+108>)
mov.w   r2, #256        ; 0x100
str     r2, [r3, #24]

even at -O0 optimization. You may use instruction level debug-stepping to check the generated code (or objdump if you dare).

ARM Thumb-2 cannot encode an arbitrary 32-bit constant in a single instruction. Such constants are stored at the end of the function and loaded relative to the PC (the first instruction above).

The BSRR and BRR registers are write-only, so there is no need for a read-modify-write cycle as there is with the ODR register.

The write value (1<<8) is loaded to r2 in the second instruction. It is finally written in the 3rd instruction.

Note that r3 holds the base address of the GPIO register block and #24 is the offset of the register within that block. At higher optimization levels, the base address may be read only once and reused many times.

I don't know why your pulses vary. Maybe caching, instruction folding, ... depending on your MCU (which one?).

Xxoyo.1 · ‎2020-11-22

Thank you for your quick reply. I am using the STM32F429ZI Discovery board running at 168 MHz.

May I ask how I can store the GPIO PE2 address in my program so that the compiler puts it at an address such as 0x8000584? Do I define a const address in C or in assembly?

ldr r3, [pc, #60] ; (0x8000584 <main+108>)

Radosław · ‎2020-11-22

Then use the DWT register; it is part of the debug unit.

KnarfB · ‎2020-11-23

GPIOE->BSRR = GPIO_BSRR_BS_2;

...

GPIOE->BSRR = GPIO_BSRR_BR_2;
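
A minimal sketch of the masks involved, assuming the STM32F4 GPIO layout in which BSRR is a single 32-bit write-only register: bits 0..15 set the matching pin and bits 16..31 reset it. The FakeGpio type and the helper names are hypothetical and exist only so the example can be built on a PC; on the target the store would go to GPIOE->BSRR.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIO register block, reduced to BSRR,
   so the sketch builds off-target. On the MCU use GPIOE->BSRR. */
typedef struct {
    volatile uint32_t BSRR;
} FakeGpio;

/* BSRR bits 0..15: writing 1 sets the corresponding output pin. */
static inline uint32_t bsrr_set_mask(unsigned pin)
{
    return (uint32_t)1u << pin;
}

/* BSRR bits 16..31: writing 1 resets the corresponding output pin. */
static inline uint32_t bsrr_reset_mask(unsigned pin)
{
    return (uint32_t)1u << (pin + 16u);
}

/* Each helper is a single store; zero bits leave the other pins untouched,
   so no read-modify-write of ODR is needed. */
static inline void pin_high(FakeGpio *port, unsigned pin)
{
    port->BSRR = bsrr_set_mask(pin);
}

static inline void pin_low(FakeGpio *port, unsigned pin)
{
    port->BSRR = bsrr_reset_mask(pin);
}
```

For PE2 this gives 0x00000004 for the set write and 0x00040000 for the reset write, which should match the values of GPIO_BSRR_BS_2 and GPIO_BSRR_BR_2.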

Xxoyo.1 · ‎2020-11-23

Thank you for your suggestion, I shall take a look at this DWT register!

Xxoyo.1 · ‎2020-11-23

Hello, I have checked the datasheet for the MCU on my STM32F429 Discovery board (link below):

https://www.st.com/resource/en/datasheet/stm32f429zi.pdf

Unfortunately, it seems this MCU doesn't support the DWT feature that you mentioned. Thanks for your help though.

Radosław · ‎2020-11-23

Read the reference manual (RM) for the MCU, not the documentation for the board, or better, the Cortex-M4 manual. This feature belongs to the core (M3, M4, M7, ...).

KnarfB · ‎2020-11-23

DWT is a feature of the Cortex-M4 core and your MCU has it.

Enable cycle counter first:

  ITM->LAR = 0xC5ACCE55;
  CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
  DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

and read it out

uint32_t tick = DWT->CYCCNT;
...
uint32_t tock = DWT->CYCCNT;
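
One way to turn the two samples into a duration, as a sketch: subtract them as unsigned 32-bit values and scale by the core clock. The helper names are hypothetical, and the 168 MHz figure in the note below is taken from the clock speed mentioned earlier in the thread, assuming the core really runs at that frequency.

```c
#include <stdint.h>

/* Cycles between two CYCCNT samples. Unsigned subtraction stays correct
   across a single wrap of the 32-bit counter. */
static inline uint32_t cycles_elapsed(uint32_t tick, uint32_t tock)
{
    return tock - tick;
}

/* Convert a cycle count to nanoseconds for a core clock given in Hz.
   The multiplication is done in 64 bits to avoid overflow. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / core_hz;
}
```

For example, cycles_to_ns(cycles_elapsed(tick, tock), 168000000u) gives the elapsed time in nanoseconds. At 168 MHz the counter wraps roughly every 25 seconds, so this only works for intervals shorter than that. The two reads of CYCCNT cost a few cycles themselves, which can be estimated by timing an empty section.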