STM32L1xx performance question

klemen
Associate II
Posted on January 04, 2014 at 13:57

Dear Sir/Madam,

In the past I have been using an 8-bit AVR (ATmega328P) for my application. The application uses the following peripherals:
- I2C peripheral
- USART peripheral

The data is read from the I2C device (slave) using fast mode (400 kHz), then some calculations are performed and the results are transferred to the PC over the serial link. The AVR was mounted on a custom board and uses an external 16 MHz oscillator.

I wanted to improve the performance of my application and decided to use the STM32L1xx chip (because of its low power consumption), which can achieve a core clock of 32 MHz. I have transferred my application to the STM32L152 discovery board successfully, but have observed that the program cannot achieve the same speed as the AVR chip. I have tested the I2C communication with the oscilloscope and the clock frequency is indeed 400 kHz. My duty cycle is 50%. The core clock settings of the ARM are:

*=============================================================================
* System Clock Configuration
*=============================================================================
* System Clock source | PLL(HSI)
*----------------------------------------------------------------------------- 
* SYSCLK | 32000000 Hz
*----------------------------------------------------------------------------- 
* HCLK | 32000000 Hz
*----------------------------------------------------------------------------- 
* AHB Prescaler | 1
*----------------------------------------------------------------------------- 
* APB1 Prescaler | 1
*----------------------------------------------------------------------------- 
* APB2 Prescaler | 1
*----------------------------------------------------------------------------- 
* HSE Frequency | 8000000 Hz
*----------------------------------------------------------------------------- 
* PLL DIV | 2
*----------------------------------------------------------------------------- 
* PLL MUL | 4
*----------------------------------------------------------------------------- 
* VDD | 3.3 V
*----------------------------------------------------------------------------- 
* Vcore | 1.8 V (Range 1)
*----------------------------------------------------------------------------- 
* Flash Latency | 1 WS
*----------------------------------------------------------------------------- 
* Require 48MHz for USB clock | Disabled
*----------------------------------------------------------------------------- 
*=============================================================================

Does anybody have similar experience? I do not understand how execution can be slower on the ARM processor (faster clock + 32-bit). Please tell me if you need any more information.

Thank you very much for any help/suggestions and best regards,

K

#mco
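
As a rough sanity check of the clock table above, the PLL arithmetic can be reproduced on a PC. This is only a sketch: it assumes the PLL really is fed by the 16 MHz HSI, as the "PLL(HSI)" row states, and the function name `pll_sysclk_hz` is made up for illustration. It says nothing about what the chip actually runs at, which can only be confirmed by measurement.

```python
# Sanity check of the PLL arithmetic in the clock configuration table.
# Assumption: SYSCLK = PLL input * PLL MUL / PLL DIV, with the PLL input
# being the 16 MHz HSI as the "PLL(HSI)" row of the table says.

HSI_HZ = 16_000_000  # STM32L1 internal high-speed oscillator
HSE_HZ = 8_000_000   # value from the "HSE Frequency" row of the table


def pll_sysclk_hz(source_hz: int, pll_mul: int, pll_div: int) -> int:
    """Return SYSCLK in Hz when the PLL output drives the system clock."""
    return source_hz * pll_mul // pll_div


# Settings from the table: PLL MUL = 4, PLL DIV = 2.
sysclk_from_hsi = pll_sysclk_hz(HSI_HZ, pll_mul=4, pll_div=2)
sysclk_from_hse = pll_sysclk_hz(HSE_HZ, pll_mul=4, pll_div=2)

print(f"PLL fed by HSI: {sysclk_from_hsi / 1e6:.0f} MHz")
print(f"PLL fed by HSE: {sysclk_from_hse / 1e6:.0f} MHz")
```

With the HSI as input the table is self-consistent at 32 MHz. The same multiplier and divider applied to an 8 MHz input would give only half of that, so the result depends on which source is actually selected at run time.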

9 REPLIES

Posted on January 06, 2014 at 13:27

Program execution speed - whatever it means - depends on a zillion of factors (and, in case of the 32-bitters, a dozen of zillions 🙂 ), of which clock frequency and data bus width are only two, not necessarily the most important.

Unless you have made some obvious error, like leaving compiler optimizations switched off, it would require a rather deep investigation into the code structure etc. to answer your question.

JW

os_kopernika
Associate II
Posted on January 06, 2014 at 15:27

I would suggest you verify your HCLK clock frequency. If it is not 32 MHz, then there is not much to discuss.

chen
Associate II
Posted on January 06, 2014 at 15:50

Hi

'' have observed that the program cannot achieve the same speed as the AVR chip''

is not an accurate or scientific measure of the speeds of the 2 processors.

''Program execution speed - whatever it means - depends on a zillion of factors''

Indeed - I suspect that the apparent slowness of the STM32 is actually due to delay loops in the STM32 I2C and USART drivers but I could be wrong.

ilmars
Associate II
Posted on January 06, 2014 at 15:51

If your application is all about just UART and I2C peripherals, then most probably you waste all the CPU time in library calls where you could use register access or just DMA, possibly achieving sub-1% CPU load.

Anyway, you should describe your application in more detail. Performance profiling that points out the code parts which take most of the CPU time will help too.

klemen
Associate II
Posted on January 07, 2014 at 15:00

Hello,

thank you for all your answers and suggestions. I admit I was rather vague in my explanation...

Anyway, today I have debugged the code (or rather three separate modules, i.e., serial communication, I2C communication and calculation) using the oscilloscope. I wrapped a pin toggle function around my three modules on both microcontrollers and compared the results (execution time):

[Image 0690X000006054CQAQ.png: measured execution time per module, AVR vs. STM32]

The only deviation between the two controllers is in the serial communication (both initialized to 115200 bps). It seems that the ARM baud rate is behaving as initialized. Strangely, according to the measured time the baud rate of the AVR is 460800 bps, even though it was also defined as 115200 bps. I have yet to find out why this is so. I am using the STM32L152 discovery and the AVR chip is on a standalone custom built board and the Tx and Rx pins are connected to a serial transceiver (which offers a 460800 bps maximum baud rate). I do not see how this could affect the configured baud rate!?

Please tell me, how can I check the HCLK clock frequency properly?

Regarding the DMA - I do not know a lot about DMA. As far as I have read about it, I think I do not need DMA, since I have to wait for the data to be ready (using a peripheral that has its own processor to fill its data buffer) every loop iteration anyway. Or am I wrong?

Thank you and best regards,

K

ilmars
Associate II
Posted on January 08, 2014 at 10:59

When you profile CPU performance, you should not include communication delays. Perhaps instead of using an IRQ you busy-wait for the UART TX buffer to be empty and as a result are getting such numbers. If you change the UART baud rate to 300 bps then your ''CPU performance'' figures will get way worse, right? This is nothing close to a CPU performance evaluation 🙂
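
To put numbers on the point above, here is a minimal sketch of how long a blocking transmit occupies the wire at different baud rates. The payload size of 100 bytes is a hypothetical value chosen for illustration, and the 10 bits per frame assume 8N1 framing (1 start bit, 8 data bits, 1 stop bit), which the thread does not state explicitly.

```python
def uart_tx_time_s(n_bytes: int, baud: int, bits_per_frame: int = 10) -> float:
    """Time on the wire for n_bytes; 10 bits per frame assumes 8N1 framing."""
    return n_bytes * bits_per_frame / baud


PAYLOAD_BYTES = 100  # hypothetical payload size, not taken from the thread

for baud in (300, 115_200, 460_800):
    t_ms = uart_tx_time_s(PAYLOAD_BYTES, baud) * 1e3
    print(f"{baud:>7} bps: {t_ms:9.3f} ms")
```

A busy-waiting sender spends this whole time inside the transmit routine regardless of how fast the core is clocked, which is why the figure scales with the baud rate and not with HCLK.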

os_kopernika
Associate II
Posted on January 08, 2014 at 11:23

''Please tell me, how can I check the HCLK clock frequency properly?''

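Let's try once again:

Search for MCO (the microcontroller clock output, which lets you route a clock to a pin and measure it with the oscilloscope).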

chen
Associate II
Posted on January 08, 2014 at 11:53

Hi K

I think you have found all the issues.

You have shown that raw processing power of the STM32 is better than the Atmel:

Calc time : Atmel - 6,608ms vs STM32 3,478ms

You have worked out why the Atmel SEEMS faster:

USART send time : Atmel 2,553ms vs STM32 12,360ms

(The Atmel does not seem to be complying with RS232 baud rate!)

''the AVR chip is on a standalone custom built board and the Tx and Rx pins are connected to a serial transceiver''

USB transceiver by any chance? USB virtual COM ports can be faster than the set baud rate!
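
The two USART timings quoted above can be turned into an implied line rate. This is a back-of-the-envelope sketch under several assumptions: the comma in the figures is a decimal separator (so 12,360ms means 12.360 ms), both boards send the same number of bits per loop, the pin toggle spans the whole time on the wire, and the STM32 really transmits at 115200 bps.

```python
# Figures quoted in the thread, read with a decimal comma (assumption).
STM32_TX_MS = 12.360
AVR_TX_MS = 2.553
STM32_BAUD = 115_200  # assumed to be the true STM32 line rate

# Bits on the wire per loop, if the STM32 measurement covers the whole frame.
bits_sent = STM32_TX_MS * 1e-3 * STM32_BAUD

# Line rate the AVR would need to push the same bits in its measured time.
implied_avr_baud = bits_sent / (AVR_TX_MS * 1e-3)

# Ratio of the two measured durations.
speedup = STM32_TX_MS / AVR_TX_MS

print(f"bits per loop (estimate): {bits_sent:.0f}")
print(f"implied AVR line rate:    {implied_avr_baud:.0f} bps")
print(f"measured time ratio:      {speedup:.2f}x")
```

Under these assumptions the ratio comes out somewhat above the factor of 4 that separates 115200 bps from 460800 bps, so the two pin-toggle windows may not be measuring exactly the same thing. Checking the bit period directly on the Tx pin would settle it.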

''Please tell me, how can I check the HCLK clock frequency properly?''

By all means - check the clock for your own peace of mind.

''Regarding the DMA - I do not know a lot about DMA. As far as I have read about it, I think I do not need DMA, since I have to wait for the data to be ready (using a peripheral that has its own processor to fill its data buffer) every loop iteration anyway. Or am I wrong?''

You are right - DMA is best for transferring large(ish) blocks of data when the data is ready.

It is no magic bullet. Most people do not realize it can actually stop the processor!

In order for DMA to do its job, it needs full access to the address and data bus. Whichever starts first (processor or DMA) has priority over the bus, meaning that if DMA starts first it can block the processor core. Yes, it depends on the bus topology.

Yes, DMA is faster than code loops. (However, DMA is specific to each processor peripheral core, can be tricky to set up and usually involves IRQs, meaning that sometimes the extra complexity outweighs the simple code loop.)

Use appropriately.
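
For a feel of what is at stake in the DMA/IRQ question, the loop time can be estimated for a blocking transmit versus a transmit that runs in the background while the next calculation proceeds. This is an idealized sketch: it uses only the STM32 calculation and USART times quoted above (read with a decimal comma), leaves out the I2C time because it is not quoted in the text, and ignores interrupt overhead and bus contention. The function names are made up for illustration.

```python
# STM32 figures quoted in the thread, read with a decimal comma (assumption).
CALC_MS = 3.478
UART_TX_MS = 12.360


def blocking_loop_ms(calc_ms: float, tx_ms: float) -> float:
    """CPU busy-waits on the UART, so the two phases run back to back."""
    return calc_ms + tx_ms


def overlapped_loop_ms(calc_ms: float, tx_ms: float) -> float:
    """Transmit runs in the background (IRQ or DMA) during the calculation,
    so the longer of the two phases sets the loop period."""
    return max(calc_ms, tx_ms)


print(f"blocking:   {blocking_loop_ms(CALC_MS, UART_TX_MS):.3f} ms per loop")
print(f"overlapped: {overlapped_loop_ms(CALC_MS, UART_TX_MS):.3f} ms per loop")
```

In this simplified model the transmit time dominates either way, so raising the baud rate would shorten the loop more than overlapping would. Whether the extra complexity of DMA is worth it depends on the parts of the loop not modelled here.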

klemen
Associate II
Posted on January 09, 2014 at 12:07

Hello,

regarding the CPU performance, I admit I stated this wrong. Actually I don't really care about benchmarking the CPU performance. My reference is the overall, let's say, ''loop'' execution time.

Based on the test, it is clear that the difference is in the serial communication, although I do not know how this is possible, since the baud rates are initialized to the same value + the receiver end is initialized to the same baud. And the data on the receiver side seems ok!

I will check the clock directly on the Tx pin.

Thank you and best regards,

K