2015-08-20 12:26 AM
Hello
I am currently working with the STM32F3 Discovery board. I wrote a basic C program to check whether it runs at 72 MHz, which is about 14 ns per clock cycle. Fortunately it does run at this frequency, but when I toggle just one arbitrary pin in an infinite while loop, turning it ON or OFF takes 660 ns.
I then checked with an oscilloscope how long each command takes. The values might not be exact, but here is what I found: a while loop alone (only the check) = 120 ns, a "for" loop alone (only the check) = 160 ns; one turn of a while loop = 280 ns and one turn of a for loop = 320 ns, which means each turn takes 160 ns. asm("NOP") = 14 ns (one cycle at the 72 MHz core frequency, as expected), and an empty function call = 225 ns.
I don't know whether these values differ with other compilers or on other computers, but I am not even close to 72 MHz. The shortest interrupt period I could get was 2 microseconds. Is there any way to reduce these values?
#include "stm32f30x.h"
#include <common.h>
void InitializeTimer(void);
void EnableTimerInterrupt(void);
void TIM2_IRQHandler(void);
GPIO_InitTypeDef configuration;
void dly(int dly_nmbr);
int main(void)
{
GPIO_InitTypeDef GPIO_InitStructure;
//NVIC_InitTypeDef NVIC_InitStructure;
RCC_ClocksTypeDef RCC_ClockFreq;
RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOA, ENABLE);
/* Configure PA10 in output pushpull mode */
/***** I called the init_pin function from common.c because it was easier,
but I guess it takes some time too *********/
init_pin(GPIOA,GPIO_Pin_10,GPIO_Mode_OUT,GPIO_Speed_50MHz,GPIO_OType_PP,GPIO_PuPd_NOPULL);
/**** I also tried the usual way to configure the pins ****/
//configuration.GPIO_Pin = GPIO_Pin_10;
//configuration.GPIO_Mode = GPIO_Mode_OUT;
//configuration.GPIO_OType = GPIO_OType_PP;
//configuration.GPIO_Speed = GPIO_Speed_50MHz;
//configuration.GPIO_PuPd = GPIO_PuPd_NOPULL;
//GPIO_Init(GPIOA, &configuration);
//InitializeTimer();
/*** This function fills the RCC_ClockFreq structure with the current
frequencies of different on chip clocks (for debug purpose) **************/
RCC_GetClocksFreq(&RCC_ClockFreq);
/* Enable Clock Security System(CSS): this will generate an NMI exception
when HSE clock fails *****************************************************/
RCC_ClockSecuritySystemCmd(ENABLE);
/* Enable and configure RCC global IRQ channel, will be used to manage HSE ready
and PLL ready interrupts.
These interrupts are enabled in stm32f0xx_it.c file **********************/
/* I disabled the interrupts to maximize program speed */
// NVIC_InitStructure.NVIC_IRQChannel = RCC_IRQn;
// NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
// NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
// NVIC_Init(&NVIC_InitStructure);
/************* Output SYSCLK on MCO pin (PA8) ************/
/****************** Enable the GPIOA Clock ******************/
RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOA, ENABLE);
/* MCO pin configuration: PA8 */
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_8;
GPIO_Init(GPIOA, &GPIO_InitStructure);
/* Output System Clock on MCO pin */
RCC_MCOConfig(RCC_MCOSource_SYSCLK);
/************** This is the toggling part. I tried each command here ***************/
while (1)
{
// volatile int i = 0;
// dly(0);
/* I called an empty function, dly, before and after the toggle
to measure the function call time */
GPIO_Toggle(GPIOA,GPIO_Pin_10);
// dly(0);
// while(i<=0)
// {
// i++;
// }
// for(i=0;i<=0;i++);
}
}
/*Empty Function*/
void dly(int dly_nmbr)
{
}
/************** Interrupt Functions **************/
// void TIM2_IRQHandler()
// {
// if (TIM_GetITStatus(TIM2, TIM_IT_Update) != RESET)
// {
// TIM_ClearITPendingBit(TIM2, TIM_IT_Update);
// }
// }
// void InitializeTimer()
// {
// RCC_APB1PeriphClockCmd(RCC_APB1Periph_TIM2, ENABLE);
//
// TIM_TimeBaseInitTypeDef timerInitStructure;
// timerInitStructure.TIM_Prescaler = 0;
// timerInitStructure.TIM_CounterMode = TIM_CounterMode_CenterAligned2;
// timerInitStructure.TIM_Period = 2;
// timerInitStructure.TIM_ClockDivision = 0x0000;
// timerInitStructure.TIM_RepetitionCounter = 100;
// TIM_TimeBaseInit(TIM2, &timerInitStructure);
//
// TIM_Cmd(TIM2, ENABLE);
// TIM_ITConfig(TIM2, TIM_IT_Update, ENABLE);
// EnableTimerInterrupt();
// }
// void EnableTimerInterrupt()
// {
// NVIC_InitTypeDef nvicStructure;
// nvicStructure.NVIC_IRQChannel = TIM2_IRQn;
// nvicStructure.NVIC_IRQChannelCmd = ENABLE;
// // TIM2->ARR = 1;
// // TIM2->PSC = 1;
// NVIC_Init(&nvicStructure);
//
// }
/****************** EOF ********************/
2015-08-20 12:46 AM
If you want to really understand these kinds of issues, you need to look at the assembler code generated by the C compiler, and read the ARM documentation on how many cycles each instruction needs.
The STM32F3 uses a Cortex-M4 core. Most instructions on the M4 run in 1 cycle; loads/stores take 2. There are many more details to understand to get the full picture. Running code from flash generally incurs wait states; running very time-critical code from SRAM can avoid this. Library calls such as GPIO_Toggle() and TIM_GetITStatus() take more cycles because of function call overhead, unless you can get your compiler to inline them. However, they are usually preferable nevertheless, except in extremely performance-sensitive code.
Normally, on STM32 MCUs it is seldom necessary to rely on the CPU speed to get fast bit-banging, since there are so many flexible peripherals that can handle most I/O needs. For example, a timer with PWM should be able to toggle a GPIO every clock cycle.
2015-08-20 03:37 AM
As far as I know, there are several internal buses with different speeds: the I/O ports sit on one, and instructions are fetched over another.
I saw a test somewhere where people measured the response times of ARM CPUs. If you are lucky you can find it on the net; I don't have a link.
2015-08-20 04:14 AM
Yeah, thanks for the answer. I tried writing the code in asm. As you said, using libraries costs extra cycles: when I use GPIOx->ODR ^= GPIO_Pin_Number; instead of GPIO_Toggle(), the speed increases greatly. Since I will need functions to read and send data and to use interrupts, I don't think asm code suits me. I tried using SPI or I2C to send the data; they are faster, but there is another problem: I need some delays in the clock signal, and we cannot insert specific delays inside an SPI or I2C transfer. But I don't know how to make it work with the PWM module at MHz frequencies. To produce a regular clock signal, the PWM duty cycle must be 50%, as far as I know. So how can I do it with PWM?
2015-08-20 05:28 AM
First:
Compiler optimization makes a huge difference. The same Nucleo code for toggling pins can run up to 20x faster with a different compiler (IAR ARM vs. Atollic), so make sure you use the highest optimization settings, with no size constraints. Second, put more commands in the while loop body, because the while(1) jump back costs at least one instruction per pass; this is just a simple trick. Third, don't use the standard GPIO toggle functions; go and look at what they are doing:
void GPIO_SetBits(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin)
{
/* Check the parameters */
assert_param(IS_GPIO_ALL_PERIPH(GPIOx));
assert_param(IS_GPIO_PIN(GPIO_Pin));
GPIOx->BSRR = GPIO_Pin;
}
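They usually check whether your input is valid, which takes a lot of time (assert_param normally compiles to nothing unless USE_FULL_ASSERT is defined, so the function call itself is also part of the cost). I always use the lowest-level implementation possible; for the F3 that is:
GPIOA->BRR = GPIO_Pin_8;   // CS low
GPIOA->BSRR = GPIO_Pin_8;  // CS high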
And I usually get the maximum stated toggling frequency; with the STM32F7 I can get 100 MHz, no problem (IAR ARM).
An interrupt takes over 100 ns before the CPU gets to execute any code (again, running on an STM32F429 at 216 MHz).
I just made a simple test on an STM32F7 running at 200 MHz:
GPIOB->ODR ^= GPIO_PIN_1; runs at 5 MHz.
Now, faster code:
GPIOB->BSRR = GPIO_PIN_1;
GPIOB->BSRR = GPIO_PIN_1<<16;
This one runs at 99.99 MHz (it's exactly 1/2 the core speed).
Now, using the normal HAL GPIO function:
HAL_GPIO_TogglePin(GPIOB,GPIO_PIN_1);
This only reaches 1 MHz.
Now, let's take the fastest code and build it with no optimization. Result: it runs at 33.33 MHz (100 MHz with the highest optimization), so it appears to be executing three times as much code. I will not post the disassembly here.