How to run a Dhrystone/DMIPS benchmark on an STM32

Lionking
ST Employee

Introduction

This article guides you through running Dhrystone tests on an STM32L5 microcontroller. It uses a NUCLEO-L552ZE-Q board, whose STM32L552ZET6QU MCU integrates an Arm® Cortex®-M33 CPU. The code can easily be ported to other STM32 microcontrollers as well. At 110 MHz, this STM32 achieves a score of 165 DMIPS, that is, 1.5 DMIPS/MHz.

Dhrystone is a synthetic computing benchmark developed in 1984 by Reinhold P. Weicker. It measures the combined performance of a processor and its compiler. The result is expressed as a number of MIPS (million instructions per second).

The benchmark program consists of functions that perform operations common in system programming. These include string manipulation, function calls, assignment statements, and arithmetic and logical operations.

 

1. Hardware and software prerequisites

NOTE: DMIPS scores can differ between MDK-Arm and Arm Compiler versions, so you may need to download the exact versions listed to obtain matching scores. This article uses the Keil® IDE so that the scores match those given on the Arm® website.

 

2. Development

Let us start this demonstration by creating a new project for the STM32L552ZET6QU using STM32CubeMX and configuring the clocks, timer, and LPUART. Here are the steps to perform the initial configurations.

  • Open STM32CubeMX

  • Navigate to [File] -> [New Project]

  • Search for STM32L552ZET6Q and then click on [Start Project]

  • A pop-up asks whether to activate TrustZone. To keep things simple, create the project with TrustZone deactivated

Next, configure the clock tree. Make sure that the core runs at its maximum frequency of 110 MHz. To do so, open the [Clock Configuration] tab and enter 110 in the HCLK box. STM32CubeMX automatically calculates the PLL multipliers and prescalers.

 

NOTE: The DMIPS score scales linearly with the clock frequency. The DMIPS/MHz value is therefore the same at any clock frequency, and you do not need to run the test at the maximum frequency.


Back in [Pinout & Configuration], enable LPUART1 under the [Connectivity] menu. On this board, LPUART1 (pins PG7/PG8) is connected by default to the STLINK Virtual COM port. This port is used to print debug messages from the MCU on the PC.

The Virtual COM port settings are 115200 bps, 8-bit data, no parity, 1 stop bit, and no flow control. By default, STM32CubeMX assigns LPUART1_RX and LPUART1_TX to PC0 and PC1, so you must move them to PG8 and PG7 respectively.
Tip: To move a signal quickly, hover the mouse pointer over it and hold down CTRL and the left mouse button. Then drag it to the desired alternate-function pin, which blinks in black.


Under the [System Core] tab, enable ICACHE and select 1-way (direct-mapped cache) mode.


The execution time used to calculate the DMIPS score can be measured in three ways: with a timer peripheral, with the SysTick timer, or with the DWT (data watchpoint and trace) unit. This article covers all three implementations.

Under the [Timers] tab, enable TIM5 (any other 32-bit timer also works). Set the clock source to [Internal Clock] and Channel1 to [Output Compare No Output], and keep the default parameter settings.
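Whichever counter is selected, stopTimer() returns a raw number of counter ticks rather than a time. Here is a minimal sketch of the conversion to seconds. It assumes the counter is clocked at timer_clk_hz / (prescaler + 1); for TIM5 left at its default prescaler of 0, this is simply the timer clock. `ticks_to_seconds` is a hypothetical helper and is not part of the project code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a raw counter value into seconds, assuming the
 * counter is clocked at timer_clk_hz / (prescaler + 1). */
double ticks_to_seconds(uint32_t ticks, uint32_t timer_clk_hz, uint32_t prescaler)
{
  assert(timer_clk_hz > 0U);
  /* counter frequency = timer clock / (PSC + 1) */
  double counter_hz = (double)timer_clk_hz / ((double)prescaler + 1.0);
  return (double)ticks / counter_hz;
}
```

For example, 110,000,000 ticks of a 110 MHz timer with a prescaler of 0 correspond to 1 second.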


Go to the [Project Manager] tab and enter a name for the project. Change the Toolchain/IDE to MDK-ARM, then click [Generate Code].


When code generation completes, click [Open Project] to open the project in the Keil® IDE.

Next, create three files: dhrystone.c, dhrystone2.c, and dhrystone.h.

You can add new files to the project as follows:

  • Right-click Application/User/Core and choose [Add new item to group Application/User/Core]


  • Select the .c file type and enter the file name (dhrystone.c). Make sure the file is placed in the right folder

  • .c files go in the Project_name\Core\Src folder, and .h files go in the Project_name\Core\Inc folder

  • Repeat these steps to create dhrystone2.c and dhrystone.h (select Header File (.h) for the latter).

The code to add between the corresponding USER CODE comments in main.c and main.h is listed below. The code for dhrystone.c, dhrystone2.c, and dhrystone.h can be found in the CORE->SRC folder at this link.

 

2.1. Main.c file

/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include "dhrystone.h"
/* USER CODE END Includes */

/* USER CODE BEGIN PD */
#define SYSTICK_LOAD  0xFFFFFFU   /* SysTick reload value (maximum 24-bit count) */
#define SYSTICK_START 7U          /* CLKSOURCE = CPU clock, TICKINT = 1, ENABLE = 1 */
#define OVERFLOW_VAR  0xFFFFFFFFU /* largest cycle count that fits in 32 bits */
/* USER CODE END PD */

/* USER CODE BEGIN PV */
uint8_t receive[15];    /* characters typed by the user (number of runs) */
volatile uint64_t tmp1; /* 64-bit cycle count for the SysTick method */
/* USER CODE END PV */

/* USER CODE BEGIN 2 */
  start_Test();
  /* USER CODE END 2 */

/* USER CODE BEGIN 4 */
/* Send a NUL-terminated string over LPUART1 (STLINK Virtual COM port) */
void Serial_PutString(uint8_t *p_string)
{
  uint16_t length = 0;

  while (p_string[length] != '\0')
  {
    length++;
  }
  HAL_UART_Transmit(&hlpuart1, p_string, length, 0xFFFF);
}

/* Receive characters until carriage return (13) or until the buffer is full */
void Serial_GetString(void)
{
  uint8_t i = 0;

  do
  {
    HAL_UART_Receive(&hlpuart1, receive + i, 1, 0xFFFF);
    i++;
  }
  while ((receive[i - 1] != '\r') && (i < (uint8_t)(sizeof(receive) - 1U)));
  receive[i] = '\0'; /* keep the buffer NUL-terminated */
}

/* Copy strSrc, including its NUL terminator; return the number of bytes copied */
uint32_t strcpy_cust(char *strDest, char *strSrc)
{
  int32_t cnt = -1;

  do
  {
    cnt++;
    strDest[cnt] = strSrc[cnt];
  }
  while (strDest[cnt] != '\0');

  return (uint32_t)(cnt + 1);
}

/* Compare two strings; return -1, 0, or 1 (signed, like strcmp()) */
int32_t strcmp_cust(char *str1, char *str2)
{
  int32_t ret = 0;

  while (!(ret = *(unsigned char *)str1 - *(unsigned char *)str2) && *str2)
  {
    ++str1;
    ++str2;
  }

  if (ret < 0)
  {
    ret = -1;
  }
  else if (ret > 0)
  {
    ret = 1;
  }
  return ret;
}

/* Copy l bytes from s to d */
void memcpy_cust(char *d, char *s, uint32_t l)
{
  while (l--)
  {
    *d++ = *s++;
  }
}

/* Reset and start the selected counter; return its value at start */
uint32_t startTimer(void)
{
#if defined(USE_DWT)      /* DWT cycle counter */
  CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* enable DWT, also without a debugger */
  DWT->CYCCNT = 0;
  DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
  return DWT->CYCCNT;

#elif defined(USE_TIMER)  /* 32-bit TIM5 */
  __HAL_TIM_SET_COUNTER(&htim5, 0);
  HAL_TIM_Base_Start(&htim5);
  return __HAL_TIM_GET_COUNTER(&htim5);

#else                     /* SysTick (default) */
  HAL_SuspendTick();      /* stop the 1 ms HAL time base */
  SysTick->CTRL = 0;
  uwTick = 0;             /* now counts SysTick wrap-arounds */
  SysTick->VAL = 0;
  SysTick->LOAD = SYSTICK_LOAD;
  SysTick->CTRL = SYSTICK_START;
  return 0;
#endif
}

/* Stop the selected counter and return the elapsed count (0 signals an overflow) */
uint32_t stopTimer(void)
{
#if defined(USE_DWT)
  DWT->CTRL &= ~DWT_CTRL_CYCCNTENA_Msk; /* clear only the enable bit */
  return DWT->CYCCNT;

#elif defined(USE_TIMER)
  HAL_TIM_Base_Stop(&htim5);
  return __HAL_TIM_GET_COUNTER(&htim5);

#else
  SysTick->CTRL = 0;

  /* each wrap-around lasts SYSTICK_LOAD + 1 cycles */
  tmp1 = (uint64_t)uwTick * (SYSTICK_LOAD + 1U);
  tmp1 = tmp1 + (SYSTICK_LOAD - SysTick->VAL);
  if (tmp1 > OVERFLOW_VAR)
  {
    /* the elapsed cycle count does not fit in 32 bits */
    return 0;
  }
  return (uint32_t)tmp1;
#endif
}
/* USER CODE END 4 */

2.2. Main.h file

/* Exported types ------------------------------------------------------------*/
/* USER CODE BEGIN ET */
extern UART_HandleTypeDef hlpuart1;
extern TIM_HandleTypeDef htim5;
/* USER CODE END ET */

/* USER CODE BEGIN EM */
/* SysTick is used by default; uncomment ONE of these to select another method */
//#define USE_DWT 1   //use the DWT cycle counter
//#define USE_TIMER 1 //use TIM5
/* USER CODE END EM */

/* USER CODE BEGIN EFP */
void Serial_PutString(uint8_t *p_string);
void Serial_GetString(void);
uint32_t strcpy_cust(char *strDest, char *strSrc);
int32_t strcmp_cust(char *str1, char *str2);
void memcpy_cust(char *d, char *s, uint32_t l);
uint32_t startTimer(void);
uint32_t stopTimer(void);
/* USER CODE END EFP */
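The code above only collects the characters typed in the terminal. Converting them into a number of runs is not part of main.c. Purely as an illustration, here is a sketch of a parser for a buffer filled by Serial_GetString(), which holds decimal digits terminated by a carriage return. `parse_runs` is a hypothetical name and may differ from what dhrystone.c actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser: convert decimal digits terminated by '\r' or '\0' into a
 * run count. Returns 0 for an empty entry, a non-digit character, or a value
 * that does not fit in 32 bits. */
uint32_t parse_runs(const uint8_t *buf, size_t len)
{
  uint64_t value = 0;
  size_t i;

  assert(buf != NULL);
  for (i = 0; i < len; i++)
  {
    uint8_t c = buf[i];

    if ((c == '\r') || (c == '\0'))
    {
      break; /* end of the typed number */
    }
    if ((c < '0') || (c > '9'))
    {
      return 0; /* reject anything that is not a decimal digit */
    }
    value = (value * 10U) + (uint64_t)(c - '0');
    if (value > UINT32_MAX)
    {
      return 0; /* too large for a 32-bit run counter */
    }
  }
  return (uint32_t)value;
}
```

For example, a buffer containing "3000000\r" yields 3000000 runs. The parser rejects malformed input rather than silently truncating it, so a typing mistake cannot start an unexpectedly long or short benchmark.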

Make sure that the NUCLEO-L552ZE-Q_BSP and STM32L5xx_DFP packs are installed and up to date. This article uses version 6.21 of the Arm Compiler. To add and use older compiler versions, follow this documentation.

 

Right-click the project name and select [Options for Target DMIPS_L5_TEST].


Make sure you have chosen the right compiler version under the tab [Target].


Under the [C/C++ (AC6)] tab, set the [Optimization] level to -O3. Then add the following line to the [Misc Controls] box: -fno-inline-functions disables function inlining, and -fno-lto disables link-time optimization.

           -fno-common -fno-inline-functions -fno-lto

 

Build the project with the F7 shortcut key or the [Build] toolbar button. The project should build with no errors or warnings.


3. Methods to calculate the execution time

You can calculate the Dhrystone timings using three methods:

  1. SysTick timer
  2. 32-bit internal timer
  3. Data watchpoint and trace (DWT) unit

By default, the program uses the SysTick timer to measure the execution time. To use another method, uncomment the corresponding macro (USE_DWT or USE_TIMER) in the main.h file.
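The default SysTick method works as follows:

- The 24-bit SysTick counter counts down from SYSTICK_LOAD and wraps every SYSTICK_LOAD + 1 = 2^24 CPU cycles.
- Each wrap raises the SysTick interrupt, whose HAL handler increments uwTick (by one with the default HAL tick frequency).
- The elapsed cycle count is therefore the number of wraps times 2^24, plus the distance counted down in the current period.

The host-side sketch below reproduces only this arithmetic. `systick_elapsed_cycles` is a hypothetical name, and its two parameters stand in for uwTick and SysTick->VAL.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_LOAD 0xFFFFFFU /* 24-bit reload value, as in main.c */

/* Hypothetical helper: elapsed CPU cycles measured with SysTick.
 * wraps       - number of SysTick interrupts since start (uwTick)
 * current_val - SysTick->VAL read after stopping the counter
 * Returns 0 if the result does not fit in 32 bits. */
uint32_t systick_elapsed_cycles(uint32_t wraps, uint32_t current_val)
{
  assert(current_val <= SYSTICK_LOAD);
  /* each full period lasts SYSTICK_LOAD + 1 cycles, because the count includes 0 */
  uint64_t cycles = (uint64_t)wraps * (SYSTICK_LOAD + 1U)
                  + (uint64_t)(SYSTICK_LOAD - current_val);
  return (cycles > UINT32_MAX) ? 0U : (uint32_t)cycles;
}
```

Multiplying by SYSTICK_LOAD instead of SYSTICK_LOAD + 1 would lose one cycle per wrap. That error is small over a benchmark run, but it is easy to avoid.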

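  • The DWT cycle counter runs only when trace is enabled by the TRCENA bit of the DEMCR register. A debugger sets this bit, which is why the DWT method works in debug mode. Without a debugger, set the bit in software first, for example with CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;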

If you use the DWT counter or the timer peripheral, make sure the 32-bit counter register does not overflow. No overflow handling is implemented, so keep the number of runs at a reasonable value. Overflow handling could be added with timer interrupts, but it is left out here for simplicity. Also, the HAL interrupt handlers take a considerable number of clock cycles, which might affect the result.

  • Use a 32-bit timer that runs at HCLK clock frequency
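To choose a safe number of runs for the DWT or timer method, estimate when a 32-bit counter clocked at HCLK overflows. At 110 MHz, 2^32 cycles last about 39 seconds. The sketch below turns that limit into a maximum number of runs based on an expected score. The function names are hypothetical, and you must supply the expected DMIPS value yourself (for example, the 165 DMIPS quoted for this MCU).

```c
#include <assert.h>
#include <stdint.h>

#define VAX_DHRYSTONES_PER_SECOND 1757.0 /* Dhrystones/s of the VAX 11/780 (1 DMIPS) */

/* Hypothetical: time until a 32-bit counter running at counter_hz overflows. */
double counter_overflow_seconds(double counter_hz)
{
  assert(counter_hz > 0.0);
  return 4294967296.0 / counter_hz; /* 2^32 counts */
}

/* Hypothetical: largest number of runs that still fits in the counter,
 * given an expected score in DMIPS. */
uint32_t max_runs_before_overflow(double counter_hz, double expected_dmips)
{
  assert(expected_dmips > 0.0);
  double dhrystones_per_second = expected_dmips * VAX_DHRYSTONES_PER_SECOND;
  return (uint32_t)(counter_overflow_seconds(counter_hz) * dhrystones_per_second);
}
```

At 110 MHz and about 165 DMIPS, the limit is roughly 11.3 million runs. The 3,000,000 runs used in the Results section therefore stay well below it.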

 

4. Results

Connect your board to the PC with a USB cable. Download the program with the F8 shortcut key or the [Download] toolbar button. Programming should complete without errors.

Open Tera Term and select the COM port that your board is connected to. On Windows, you can find it under [Ports (COM & LPT)] in Device Manager. Then press OK.

The terminal window opens. Select [Setup] -> [Serial Port] and check that the settings match those of the Virtual COM port: 115200 bps, 8-bit data, no parity, 1 stop bit, and no flow control.

 

Now, reset the board by pressing the black reset button. A text prompt appears in the Tera Term window asking you to enter the number of runs.

Here we enter 3000000 and press Enter. The benchmark then starts, and the results are displayed once it completes, which takes some time.

The measured score is 165.6 DMIPS (about 165), which matches the value specified on the ST website for this MCU.

 

5. How scores are calculated

  • The Dhrystone score is the number of program iterations completed per second. Each machine may perform this measurement in its own machine-specific way

  • Another common representation of the Dhrystone benchmark is the DMIPS (Dhrystone MIPS) obtained when the Dhrystone score is divided by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine)

  • DMIPS/MHz allows easier comparison of CPUs running at different clock rates

  • For example, "2000 Dhrystones per second" indicates that the program iterates 2000 times in a one-second period

Dhrystones per second = number of runs / execution time (in seconds)

DMIPS = Dhrystones per second / 1757.0

DMIPS/MHz = (DMIPS * 10^6) / CLOCK_SPEED_OF_MCU (in Hz)
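The three formulas can be combined in one small function. It takes the number of runs, the elapsed cycle count returned by stopTimer(), and the clock frequency. This is a host-side sketch with hypothetical names (`dhrystone_scores`, `struct dhry_scores`), not the code used in dhrystone.c. It assumes the elapsed count is in CPU clock cycles, as with SysTick, the DWT counter, or a timer clocked at HCLK.

```c
#include <assert.h>
#include <stdint.h>

#define VAX_DHRYSTONES_PER_SECOND 1757.0 /* Dhrystones/s of the VAX 11/780 (1 DMIPS) */

struct dhry_scores
{
  double seconds;               /* execution time */
  double dhrystones_per_second; /* runs / execution time */
  double dmips;                 /* Dhrystones per second / 1757 */
  double dmips_per_mhz;         /* DMIPS * 10^6 / clock frequency in Hz */
};

/* Hypothetical helper: compute the scores from a cycle count measured at clock_hz. */
struct dhry_scores dhrystone_scores(uint32_t runs, uint32_t elapsed_cycles, double clock_hz)
{
  struct dhry_scores s;

  assert(elapsed_cycles > 0U && clock_hz > 0.0);
  s.seconds = (double)elapsed_cycles / clock_hz;
  s.dhrystones_per_second = (double)runs / s.seconds;
  s.dmips = s.dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND;
  s.dmips_per_mhz = (s.dmips * 1e6) / clock_hz;
  return s;
}
```

For instance, take illustrative (not measured) inputs of 3,000,000 runs and 1,134,000,000 cycles at 110 MHz. The function returns about 10.31 s, 165.6 DMIPS, and 1.51 DMIPS/MHz.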

NOTE: Choose the number of runs so that the benchmark executes for more than 2 seconds.
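To find a suitable number of runs, invert the formulas: runs = duration × DMIPS × 1757. The sketch below assumes you already have an estimate of the score (for example, the 165 DMIPS quoted for this MCU). It rounds up to a whole number of runs. `min_runs_for_duration` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: smallest run count that keeps the benchmark busy for at least
 * min_seconds, given an estimated score in DMIPS. */
uint32_t min_runs_for_duration(double min_seconds, double expected_dmips)
{
  assert(min_seconds > 0.0 && expected_dmips > 0.0);
  double runs = min_seconds * expected_dmips * 1757.0; /* Dhrystones needed */
  uint32_t whole = (uint32_t)runs;                     /* truncate ... */
  return ((double)whole < runs) ? whole + 1U : whole;  /* ... then round up */
}
```

With 165 DMIPS, 2 seconds requires at least 579,810 runs. A 20-second run, as recommended by Arm, requires at least 5,798,100 runs.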

 

6. Rules to be considered while calculating Dhrystone

  • Separate compilation of files: according to the official documentation, the Dhrystone program is implemented in two separate files, and you must not merge them into one

  • No procedure merging/function inlining

  • Default results are those without "Register" declarations for certain variables.

  • You can enable the register variable declarations for some benchmark variables by uncommenting the #define REG_ENABLE macro in the dhrystone.h file

  • Compiler optimizations are allowed but should be indicated

  • Multifile compilations, which optimize across both source files at once (for example, with link-time optimization), are prohibited

  • The Dhrystone code must run for at least two seconds, and longer is generally better. Arm recommends running Dhrystone 10 times with varying iteration counts, with each run taking at least 20 seconds

 

7. Points to consider

  • It is common to get different DMIPS results for different compiler and IDE versions and for different compiler settings

  • Dhrystone does not take multiprocessing into account, as it is a single-threaded C program

While Dhrystone's simplicity continues to attract users, it is poorly aligned with modern architectures and coding practices. This raises questions about how well it reflects real-world performance. Arm® emphasizes the limitations of Dhrystone and recommends more representative benchmarks such as EEMBC, CoreMark, HINT, STREAM, SPEC, and BYTEmark.

When compiling Dhrystone for an OS-hosted device, all required functions are provided by the standard C library. Alternatively, you can use your own versions of the strcpy() and strcmp() functions.

When running Dhrystone on a bare-metal system, before each run you must initialize the processor by resetting, clearing, and enabling the caches, TLBs, BTBs, and other microarchitectural features such as branch prediction.

Although Dhrystone is designed to benchmark processors, it relies heavily on C library functions that compilers provide in optimized form. This raises questions about its accuracy, as the results might reflect compiler-specific optimizations of these functions rather than pure processor performance.

 

8. Porting to other microcontrollers

  • Enable the caches (instruction and data), if present

  • Enable the flash ART Accelerator or flash prefetch buffer if it is not enabled automatically

Some devices contain tightly coupled memories (TCMs) such as ITCM and DTCM, for example the STM32H753ZI with its Arm® Cortex®-M7 CPU. On these devices, adjust the linker script to place the code in ITCM and the data in DTCM, which are accessed with zero wait states.

I have included custom strcpy_cust, strcmp_cust, and memcpy_cust functions in main.c. If your scores differ from the expected values on another microcontroller, you can replace the string.h functions in dhrystone.c and dhrystone2.c with these custom functions.
For this implementation, however, it is advised to use the standard string.h functions.
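If you switch to the custom functions, keep in mind that Dhrystone uses the sign of the comparison result. In the Dhrystone 2.1 reference source, Func_2 tests strcmp(...) > 0 on the strings "DHRYSTONE PROGRAM, 1'ST STRING" and "DHRYSTONE PROGRAM, 2'ND STRING". A replacement must therefore return a signed value: with a uint32_t return type, -1 becomes 0xFFFFFFFF and compares as greater than 0. The host-side check below illustrates this with a copy of the strcmp_cust algorithm. `strcmp_signed` and `strcmp_unsigned` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same algorithm as strcmp_cust, returning a signed -1, 0, or 1 */
int32_t strcmp_signed(const char *str1, const char *str2)
{
  int32_t ret = 0;

  assert(str1 != NULL && str2 != NULL);
  /* advance while the characters match and the end of str2 is not reached */
  while (!(ret = *(const unsigned char *)str1 - *(const unsigned char *)str2) && *str2)
  {
    ++str1;
    ++str2;
  }
  return (ret < 0) ? -1 : ((ret > 0) ? 1 : 0);
}

/* What a caller sees if the same result is returned through a uint32_t */
uint32_t strcmp_unsigned(const char *str1, const char *str2)
{
  return (uint32_t)strcmp_signed(str1, str2);
}
```

With the unsigned variant, the "1'ST" string would wrongly compare as greater than the "2'ND" string.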

 

Version history
Last update: 2024-08-27 02:58 AM