on 2024-08-27 04:19 AM
This article guides you through running the Dhrystone benchmark on an STM32L5 microcontroller. For this article, we use a NUCLEO-L552ZE-Q board, which has an STM32L552ZET6QU MCU that integrates an Arm® Cortex®-M33 CPU. The code can be easily ported to other STM32 microcontrollers. This STM32 has a Dhrystone score of 165 DMIPS, or 1.5 DMIPS/MHz.
Dhrystone is a synthetic computing benchmark developed in 1984 by Reinhold P. Weicker to measure the combined performance of a processor and its compiler. It reports the number of benchmark iterations completed per second (Dhrystones per second), which is commonly normalized to DMIPS (Dhrystone million instructions per second).
The benchmark program consists of functions that perform a variety of operations common in system programming, such as string manipulation, function calls, assignment statements, and arithmetic and logical operations.
Keil® uVision MDK Arm v5.38.0.0
Arm compiler v6.17/ v6.21
STM32CubeMX v6.9.1
Tera Term 4.105
STM32L5xx_DFP for MDK Arm
STM32Cube MCU Package for STM32L5 series v1.5.0 to generate the HAL/LL code using STM32CubeMX
USB-A to micro USB data cable
NOTE: DMIPS scores sometimes differ with MDK-Arm and Arm compiler versions; you may have to download the exact version listed to get matching scores. We use the Keil® IDE for this article to get the exact scores as mentioned on the Arm® website.
Let us start this demonstration by creating a new project for the STM32L552ZET6QU using STM32CubeMX and configuring the clocks, timer, and LPUART. Here are the steps to perform the initial configurations.
Open STM32CubeMX
Navigate to [File] -> [New Project]
Search for STM32L552ZET6Q and then click on [Start Project]
A pop-up appears for TrustZone activation. Create the project with TrustZone deactivated to keep it simple.
Next, let us configure the clock peripherals. Make sure that the core is running at the maximum frequency of 110 MHz. This can be done by going into the [Clock Configuration] tab and entering 110 in the HCLK box. STM32CubeMX automatically calculates the values for the PLL multipliers and prescalers.
NOTE: The DMIPS score scales linearly with the clock frequency, so the DMIPS/MHz value is the same at any clock frequency. It is therefore not necessary to run the test at the maximum clock frequency.
Back in [Pinout & Configuration], enable LPUART1 under the [Connectivity] menu. By default, LPUART1 (pins PG7/PG8) is connected to the STLINK Virtual COM port to print out the debug messages from the MCU to the PC.
The Virtual COM port settings are 115200 bps, 8-bit data, no parity, 1 stop bit, and no flow control. By default, LPUART1_RX and LPUART1_TX appear on PC0 and PC1, and they have to be redirected to PG8 and PG7.
Tip: There is a shortcut to do this by hovering the mouse pointer on the signal you want to move. Hold down CTRL + left mouse button and move it to the desired alternate function pin that blinks in black color.
Enable the ICACHE under the [System Core] tab and choose the 1-way (direct mapped cache) mode.
A timer peripheral can be used to measure the execution time for the DMIPS score. Alternatively, we can use the SysTick timer or the DWT (Data Watchpoint and Trace) cycle counter; all of these implementations are discussed in this article.
Enable a timer under the [Timers] tab and select TIM5 (you can choose any other 32-bit timer as well). Change the clock source to [Internal Clock] and Channel1 to [Output Compare No Output], and continue with the default parameter settings.
Go to the [Project Manager] tab, choose a name for the project, change the Toolchain/IDE to MDK-ARM, and finally click on [Generate Code].
After clicking on generate code, click on [Open Project] and the Keil® IDE should open.
Now, you have to create three files: dhrystone.c, dhrystone2.c, and dhrystone.h.
You can add new files to the project as follows:
Right-click Application/User/Core and choose [Add new item to group Application/User/Core]
Click on the .c file and enter the name (dhrystone.c) of the file and make sure that you place the files in the right folders
.c files are present in Project_name\Core\Src folder and .h files are present in Project_name\Core\Inc folder
Repeat the same steps to create the dhrystone2.c and dhrystone.h files (make sure you select Header File .h for dhrystone.h).
The code to add between the respective user code comments in the main.c and main.h files is listed below. The code for dhrystone.c, dhrystone2.c and dhrystone.h can be found under the CORE->SRC folder in this link.
/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include "dhrystone.h"
/* USER CODE END Includes */
/* USER CODE BEGIN PD */
#define SYSTICK_LOAD 0xFFFFFF
#define SYSTICK_START 7
#define OVERFLOW_VAR 0xFFFFFFFF
/* USER CODE END PD */
/* USER CODE BEGIN PV */
uint8_t receive[15];
volatile uint64_t tmp1;
/* USER CODE END PV */
/* USER CODE BEGIN PM */
#if USE_DWT
volatile unsigned int *DWT_CYCCNT = (volatile unsigned int *)0xE0001004;
volatile unsigned int *DWT_CONTROL = (volatile unsigned int *)0xE0001000;
#endif
/* USER CODE END PM */
/* USER CODE BEGIN 2 */
start_Test();
/* USER CODE END 2 */
/* USER CODE BEGIN 4 */
void Serial_PutString(uint8_t *p_string)
{
uint16_t length = 0;
while (p_string[length] != '\0')
{
length++;
}
HAL_UART_Transmit(&hlpuart1, p_string, length, 0xFFFF);
}
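void Serial_GetString()
{
uint8_t i = 0;
/* Read one byte at a time until a carriage return arrives or the buffer is full */
do
{
HAL_UART_Receive(&hlpuart1, receive + i, 1, 0xFFFF);
i++;
}
while (receive[i - 1] != '\r' && i < sizeof(receive));
}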
uint32_t strcpy_cust(char* strDest, char* strSrc)
{
int32_t cnt=-1;
do
{
cnt++;
*(strDest+cnt)=*(strSrc+cnt);
}
while(*(strDest+cnt)!='\0');
return ++cnt;
}
uint32_t strcmp_cust(char* str1, char* str2)
{
int32_t ret = 0;
while (!(ret = *(unsigned char *) str1 - *(unsigned char *) str2) && *str2) ++str1, ++str2;
if (ret < 0)
ret = -1;
else if (ret > 0)
ret = 1 ;
return ret;
}
void memcpy_cust (char *d,char *s, uint32_t l)
{
while (l--) *d++ = *s++;
}
uint32_t startTimer()
{
#if USE_DWT // use the DWT cycle counter
*DWT_CONTROL |= DWT_CTRL_CYCCNTENA_Msk; // enable the cycle counter
*DWT_CYCCNT = 0;
return *DWT_CYCCNT;
#elif USE_TIMER // use the TIM5 peripheral
HAL_TIM_Base_Start(&htim5);
TIM5->CNT = 0;
return TIM5->CNT;
#else // use SysTick (default)
HAL_SuspendTick();
SysTick->CTRL = 0;
uwTick = 0;
SysTick->VAL = 0;
SysTick->LOAD = SYSTICK_LOAD;
SysTick->CTRL = SYSTICK_START; // core clock source, interrupt enabled, counter enabled
return 0;
#endif
}
uint32_t stopTimer()
{
#if USE_DWT
*DWT_CONTROL &= ~DWT_CTRL_CYCCNTENA_Msk; // clear the enable bit to stop the cycle counter
return *DWT_CYCCNT;
#elif USE_TIMER
HAL_TIM_Base_Stop(&htim5);
return TIM5->CNT;
#else
SysTick->CTRL = 0;
tmp1 = ((uint64_t)uwTick * SYSTICK_LOAD);
tmp1 = tmp1 + (SYSTICK_LOAD - SysTick->VAL);
if (tmp1 > OVERFLOW_VAR)
{
/* The cycle count does not fit in 32 bits: return 0 so that end time = start time signals an overflow */
return 0;
}
else
{
return (uint32_t)tmp1;
}
#endif
}
/* USER CODE END 4 */
/* Exported types ------------------------------------------------------------*/
/* USER CODE BEGIN ET */
extern UART_HandleTypeDef hlpuart1;
extern TIM_HandleTypeDef htim5;
/* USER CODE END ET */
/* USER CODE BEGIN EM */
//#define USE_DWT 1 //enable debug watch counter
//#define USE_TIMER 1 //enable TIMER
/* USER CODE END EM */
/* USER CODE BEGIN EFP */
void Serial_PutString(uint8_t*);
void Serial_GetString();
uint32_t strcpy_cust(char* strDest, char* strSrc);
uint32_t strcmp_cust(char* str1, char* str2);
void memcpy_cust (char *d,char *s, uint32_t l);
uint32_t startTimer();
uint32_t stopTimer();
/* USER CODE END EFP */
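On Cortex-M cores, the DWT cycle counter only runs once the TRCENA bit of the DEMCR register is set. A debugger usually sets this bit, but it may be clear when the board runs standalone. The following is a minimal sketch of the enable and stop sequence, with the registers passed in as pointers so that it can also be exercised on a PC. The `cycle_counter_regs` type, the function names, and the two bit macros are hypothetical names for illustration; the addresses in the comments are the architectural locations of DEMCR, DWT_CTRL, and DWT_CYCCNT.

```c
#include <stdint.h>

#define TRCENA_BIT    (UINT32_C(1) << 24) /* DEMCR bit 24: enables DWT and ITM */
#define CYCCNTENA_BIT (UINT32_C(1) << 0)  /* DWT_CTRL bit 0: enables CYCCNT */

/* Hypothetical helper type: pointers to the three registers involved.
   On the target these are 0xE000EDFC (DEMCR), 0xE0001000 (DWT_CTRL)
   and 0xE0001004 (DWT_CYCCNT). */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} cycle_counter_regs;

/* Enable trace, clear the counter, then start counting. */
void cycle_counter_start(const cycle_counter_regs *r)
{
    *r->demcr |= TRCENA_BIT;
    *r->cyccnt = 0;
    *r->ctrl |= CYCCNTENA_BIT;
}

/* Stop counting (other DWT_CTRL bits are preserved) and return the count. */
uint32_t cycle_counter_stop(const cycle_counter_regs *r)
{
    *r->ctrl &= ~CYCCNTENA_BIT;
    return *r->cyccnt;
}
```

On the target, the three pointers would be initialized with the addresses given in the comment; the CMSIS headers also expose the same registers through their own structures.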
Make sure that the NUCLEO-L552ZE-Q_BSP and STM32L5xx_DFP packs are installed and up to date. We use version 6.21 of the compiler. You can add and use older compiler versions by following this documentation.
Right-click on the project name and select [Options for Target DMIPS_L5_TEST].
Make sure you have chosen the right compiler version under the tab [Target].
Under the tab [C/C++ (AC6)], set the [Optimization] level to -O3 and add this line in the [Misc Controls] box to disable function inlining and link-time optimization.
-fno-common -fno-inline-functions -fno-lto
Compile the project by using the shortcut key F7 or selecting [Project] -> [Build Target]. The project should compile with no errors and warnings.
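You can calculate the Dhrystone timings using three methods: the SysTick timer, the DWT cycle counter, or a timer peripheral (TIM5 in this article).
By default, the program uses the SysTick timer to calculate the timing. You can change this by uncommenting the required macro (USE_DWT or USE_TIMER) in the main.h file.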
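In the SysTick path, stopTimer() in main.c combines the number of SysTick interrupts (uwTick) with the current value of the 24-bit down-counter. The sketch below expresses the same idea as a pure function. The function name and parameters are hypothetical, and it assumes that uwTick increases by one per SysTick interrupt (the HAL default). It counts LOAD + 1 cycles per reload period, which is how the SysTick reload period is defined, whereas the listing multiplies by SYSTICK_LOAD, a difference of one cycle per reload.

```c
#include <stdint.h>

/* wraps: number of SysTick reloads counted so far
   load:  value written to SysTick->LOAD (0xFFFFFF in this article)
   val:   current SysTick->VAL, a down-counter in the range [0, load] */
uint64_t systick_elapsed_cycles(uint32_t wraps, uint32_t load, uint32_t val)
{
    uint64_t period = (uint64_t)load + 1u;         /* cycles per reload period */
    return (uint64_t)wraps * period + (load - val); /* full periods + partial period */
}
```

The result is kept in 64 bits so that the caller can decide how to handle a count that no longer fits in 32 bits.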
If you are using the DWT cycle counter or the timer peripheral, take care that the counter register does not overflow, because no overflow handling is implemented. Keep the number of runs at a reasonable value. Overflow handling could be implemented with timer interrupts, but we keep it simple here: the HAL functions take a considerable number of clock cycles when an interrupt is triggered, which might have an impact on the result.
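To judge what a reasonable number of runs is, it helps to know how long a counter can run before it wraps. The sketch below, with hypothetical function names, estimates that limit for a counter of a given width and shows the usual unsigned subtraction for a free-running 32-bit counter. The subtraction stays correct across a single wrap only while the measured interval is shorter than 2^32 cycles, so it does not remove the limit.

```c
#include <stdint.h>

/* Longest interval, in seconds, that a counter of the given width
   (1 to 63 bits) can measure when clocked at clock_hz. */
double counter_max_seconds(unsigned bits, double clock_hz)
{
    return (double)(1ULL << bits) / clock_hz;
}

/* Elapsed cycles between two reads of a free-running 32-bit up-counter.
   Unsigned arithmetic is modulo 2^32, so one wrap between the reads is fine. */
uint32_t elapsed_cycles_u32(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For a 32-bit counter at 110 MHz this gives roughly 39 seconds, which also applies to the value returned by stopTimer() because its return type is 32 bits wide.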
Connect your board to the PC using a USB cable. Load the binary file by using the shortcut key F8 or selecting [Flash] -> [Download]. The programming should complete without any errors.
Open Tera Term, choose the COM port that your board is connected to, and then press OK. On a Windows machine, you can find the port number in [Device Manager] under Ports.
The COM port terminal opens and now you have to make sure the serial port settings are correct.
Verify this by selecting [Setup] -> [Serial Port] and checking that the settings are 115200 bps, 8-bit data, no parity, 1 stop bit, and no flow control.
Now, reset the board by pushing the black reset button on the board. You should see a text prompt in the Tera Term terminal window asking you to select the number of runs.
We choose the value 3000000 and then press enter. Now, the execution starts and takes some time to display the results.
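The number of runs arrives over the serial port as ASCII digits followed by a carriage return. How dhrystone.c converts it is not shown in this article; the following is only a sketch of one way to do it, with a hypothetical function name, that rejects empty input and values that do not fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts the leading decimal digits of buf into a number. Parsing stops at
   the first non-digit (for example the carriage return) or after len bytes.
   Returns 0 if there is no digit or the value does not fit in 32 bits. */
uint32_t parse_runs(const uint8_t *buf, size_t len)
{
    uint64_t value = 0;
    size_t i = 0;

    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        value = value * 10u + (uint32_t)(buf[i] - '0');
        if (value > UINT32_MAX) {
            return 0; /* too large for a 32-bit run counter */
        }
        i++;
    }
    return (i == 0) ? 0 : (uint32_t)value;
}
```

A return value of 0 can then be treated as invalid input and the prompt repeated.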
We obtained a DMIPS value of 165.6 (~165), which matches the value specified on the ST website for this MCU.
The Dhrystone score counts only the number of program iteration completions per second, allowing individual machines to perform this calculation in a machine-specific way.
Another common representation of the Dhrystone benchmark is DMIPS (Dhrystone MIPS), obtained by dividing the Dhrystone score by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine).
DMIPS/MHz allows for easier comparison of CPUs running at different clock rates.
For example, "2000 Dhrystones per second" indicates that the program iterates 2000 times in a one-second period.
Dhrystones per second = number of runs / execution time
DMIPS = Dhrystones_per_second / 1757.0
DMIPS / MHz = (DMIPS * 10^6) / CLOCK_SPEED_OF_MCU, with CLOCK_SPEED_OF_MCU in Hz
NOTE: Adjust the number of runs so that the benchmark executes for more than 2 seconds.
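The three formulas above can be sketched as a small C function. The structure and function names are hypothetical; the inputs are the number of runs, the measured cycle count (as returned by stopTimer()), and the core clock in Hz.

```c
#include <stdint.h>

#define VAX_DHRYSTONES_PER_SECOND 1757.0 /* VAX 11/780 reference score */

typedef struct {
    double dhrystones_per_second;
    double dmips;
    double dmips_per_mhz;
} dhrystone_result;

/* runs: number of benchmark iterations, cycles: measured core clock cycles,
   clock_hz: core clock frequency in Hz */
dhrystone_result dhrystone_score(uint32_t runs, uint32_t cycles, uint32_t clock_hz)
{
    dhrystone_result r;
    double seconds = (double)cycles / (double)clock_hz; /* execution time */

    r.dhrystones_per_second = (double)runs / seconds;
    r.dmips = r.dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND;
    r.dmips_per_mhz = r.dmips * 1e6 / (double)clock_hz;
    return r;
}
```

As a check of the arithmetic, 289905 runs completed in one second at 110 MHz correspond to 165 DMIPS and 1.5 DMIPS/MHz.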
Separate compilation of files: the Dhrystone program is implemented in two separate files, as required by the official documentation, and you should not merge the two files into one
No procedure merging/function inlining
Default results are those without "Register" declarations for certain variables.
You can enable the register variable declarations for some benchmark variables by uncommenting the #define REG_ENABLE macro in the dhrystone.h file
Compiler optimizations are allowed but should be indicated
Multifile compilation (optimization across source files, such as link-time optimization) is prohibited
Dhrystone code must be executed for at least two seconds, although longer is generally better. Arm recommends that you run Dhrystone 10 times with varying iteration counts, with each run taking at least 20 seconds
It is common to get different DMIPS results for different compiler and IDE versions and for different compiler settings
It does not take multiprocessing into account as it is a single-threaded C program
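The minimum run times in the rules above can be turned into an iteration count once the expected score is known. The sketch below uses hypothetical function names and assumes the nominal figures quoted in this article (1.5 DMIPS/MHz at 110 MHz); the real score depends on the compiler and settings, so treat the result as an estimate.

```c
#include <stdint.h>

#define VAX_DHRYSTONES_PER_SECOND 1757.0 /* VAX 11/780 reference score */

/* Expected Dhrystones per second for a core with the given score and clock. */
double expected_dhrystones_per_second(double dmips_per_mhz, double clock_mhz)
{
    return dmips_per_mhz * clock_mhz * VAX_DHRYSTONES_PER_SECOND;
}

/* Number of runs needed for the benchmark to last about target_seconds. */
uint32_t runs_for_duration(double target_seconds, double dmips_per_mhz, double clock_mhz)
{
    double runs = target_seconds * expected_dhrystones_per_second(dmips_per_mhz, clock_mhz);
    return (uint32_t)(runs + 0.5); /* round to the nearest integer */
}

/* Expected execution time in seconds for a given number of runs. */
double expected_seconds(uint32_t runs, double dmips_per_mhz, double clock_mhz)
{
    return (double)runs / expected_dhrystones_per_second(dmips_per_mhz, clock_mhz);
}
```

Under these assumptions, a 20-second run needs about 5.8 million iterations, and the 3000000 runs used in this article take roughly 10 seconds.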
While Dhrystone's simplicity continues to attract users, its lack of alignment with modern architecture and coding practices raises questions about its effectiveness in reflecting real-world performance. Arm® emphasizes the limitations of Dhrystone and recommends more representative benchmarks like EEMBC, CoreMark, HINT, stream, SPEC, and bytemark.
When compiling Dhrystone for an OS-hosted device, all required functions are provided by the standard C library. Alternatively, you can use your own versions of the strcpy() and strcmp() functions
When running Dhrystone on a bare-metal system, before each run you must initialize the processor by resetting, clearing, and enabling the caches, TLBs, BTBs, and other microarchitectural features such as branch prediction.
While designed to benchmark processors, Dhrystone heavily utilizes optimized C library functions provided by compilers. This raises questions about its accuracy, as the results might reflect compiler-specific optimizations of these functions rather than pure processor performance.
Enable the caches (instruction and data) if they exist
The flash ART Accelerator or flash prefetch buffer has to be enabled if it is not enabled automatically
If the core contains TCM (tightly coupled memory) such as ITCM and DTCM, the linker script has to be adjusted to place the code into ITCM and the data into DTCM, which run with zero wait states. For example: STM32H753ZI with an Arm® Cortex®-M7 CPU.
I have included custom strcpy_cust, strcmp_cust and memcpy_cust functions. You can replace these custom functions with the built-in string.h functions in the dhrystone.c and dhrystone2.c files if you find a difference in the actual scores when using a different microcontroller.
It is advised to use the built-in string.h functions for this implementation.
There are many versions of the Dhrystone program, but the most used today is version 2.1. STMicroelectronics, Inc and Arm® both quote figures for the C version of Dhrystone 2.1. The original Dhrystone code is free to all users and is available for download at this link.