on 2024-08-27 05:30 PM
This article shows how to run the CoreMark® benchmark on an STM32H5 microcontroller. It uses a NUCLEO-H563ZI board, whose STM32H563ZIT6 MCU integrates an Arm® Cortex®-M33 CPU. The code can be easily ported to other STM32 microcontrollers as well. The STM32H563/73 has a CoreMark® score of 1023 when running at 250 MHz (4.092 CoreMark®/MHz).
CoreMark® is a platform-independent benchmark program written in ANSI C. It uses diverse algorithms (lists, matrix, CRC) to generate a single score for comparing MCU and CPU performance. Introduced in 2009 by Shay Gal-On at EEMBC, CoreMark® was created to replace the Dhrystone benchmark as the industry standard for performance evaluation of microcontrollers and CPUs.
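The relationship between the raw score and the normalized CoreMark®/MHz figure is simple arithmetic, and a minimal sketch may make it concrete. The function names below are hypothetical and are not part of the CoreMark sources; the sketch assumes that the score scales linearly with the core clock.

```c
/* Normalize a CoreMark score (iterations per second) by the core clock. */
double coremark_per_mhz(double score, double core_clock_mhz)
{
    if (core_clock_mhz <= 0.0) {
        return 0.0; /* guard against an invalid clock value */
    }
    return score / core_clock_mhz;
}

/* Estimate the score at another clock, assuming linear scaling. */
double expected_score_at(double per_mhz, double core_clock_mhz)
{
    if (core_clock_mhz <= 0.0) {
        return 0.0;
    }
    return per_mhz * core_clock_mhz;
}
```

With the figures quoted above, coremark_per_mhz(1023.0, 250.0) gives roughly 4.092, and the same ratio can be used to estimate the score at a lower clock.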
CoreMark's compact size enables it to fit seamlessly within a processor's cache. This makes it suitable for performance evaluation across a broad spectrum of processors, encompassing both low-end and high-end devices.
Let us start this demonstration by creating a new project for the STM32H563ZIT6 using STM32CubeMX and configuring the clocks and UART. Here are the steps to perform the initial configurations.
Open STM32CubeMX
[File] -> [New Project]
Search for STM32H563ZIT6 and then click on [Start Project]
A pop-up appears for TrustZone activation. Create the project with TrustZone deactivated for simplicity.
Configure the clocks. Make sure that the core is running at the maximum frequency of 250 MHz. This can be done by going into the [Clock Configuration] tab and entering 250 in the HCLK box. STM32CubeMX automatically calculates the values for the PLL multipliers and prescalers to generate this 250 MHz system clock.
NOTE: The CoreMark score scales linearly with the clock frequency, so the normalized result (CoreMark/MHz) can be obtained without running the test at the maximum clock frequency.
Back in [Pinout & Configuration], enable USART3 under the connectivity menu. On this board, USART3 (pins PD8/PD9) is connected to the STLINK Virtual COM port, which carries the debug messages from the MCU to the PC.
The Virtual COM port settings are 115200 bps, 8-bit data, no parity, 1 stop bit, and no flow control. By default, USART3_RX and USART3_TX appear on PC4 and PB10, and they have to be moved to PD9 and PD8. There is a shortcut to do this: hover the mouse pointer over the pin signal that you want to move, hold CTRL + left mouse button, and drag it to the desired alternate function pin, which blinks in black.
Enable [ICache] under the [System Core] tab and choose a 2-way set associative cache.
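STM32CubeMX and the HAL work out the baud rate register value for you, but a short sketch shows how close the actual rate is to 115200 bps. It assumes oversampling by 16, a USART prescaler of 1, and a 250 MHz USART kernel clock; these are assumptions about the clock tree rather than values confirmed by this article, and the function names are hypothetical.

```c
#include <stdint.h>

/* Divider for oversampling by 16: kernel clock / baud, rounded to nearest. */
uint32_t usart_brr_over16(uint32_t kernel_clock_hz, uint32_t baud)
{
    if (baud == 0u) {
        return 0u; /* invalid request */
    }
    return (kernel_clock_hz + baud / 2u) / baud;
}

/* Baud rate actually produced by a given divider (integer part). */
uint32_t usart_actual_baud(uint32_t kernel_clock_hz, uint32_t brr)
{
    if (brr == 0u) {
        return 0u;
    }
    return kernel_clock_hz / brr;
}
```

Under these assumptions the divider is 2170 and the resulting rate is about 115207 bps, well within the tolerance of a typical UART receiver.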
This implementation uses the built-in SysTick timer to calculate the time for processing the benchmark program.
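As a rough illustration of SysTick-based timing, the sketch below follows the function names of CoreMark's porting layer (start_time, stop_time, get_time, time_in_secs). The counter port_tick_ms, the 1 ms tick period, and the simplified types are assumptions made for this example; the port used in this article may differ.

```c
#include <stdint.h>

#define TICKS_PER_SEC 1000u /* assumption: SysTick fires every 1 ms */

/* Hypothetical millisecond counter. On target, the SysTick interrupt
 * would increment it, or HAL_GetTick() could be read instead. */
volatile uint32_t port_tick_ms;

static uint32_t start_ticks;
static uint32_t stop_ticks;

void start_time(void) { start_ticks = port_tick_ms; }
void stop_time(void)  { stop_ticks = port_tick_ms; }

/* Unsigned subtraction stays correct across one counter wraparound. */
uint32_t get_time(void) { return stop_ticks - start_ticks; }

/* Convert a tick count to seconds for the final report. */
double time_in_secs(uint32_t ticks)
{
    return (double)ticks / (double)TICKS_PER_SEC;
}
```

The benchmark calls start_time() before the timed loop and stop_time() after it, then divides the iteration count by time_in_secs(get_time()) to obtain the score.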
Then go to the [Project Manager] tab and choose a name for the project. Change the Toolchain/IDE to MDK-ARM, increase the stack size to 0x2000, and finally click on [Generate Code].
Note: Make sure to download the STM32Cube FW_H5 firmware package by following the steps below.
Help -> Manage embedded software packages -> "STM32H5" under STM32 MCU packages-> Under STM32H5 install the MCU package.
After clicking on generate code, click on [Open Project] and the Keil® IDE should open.
Now, you have to start adding files and code to the project tree. Add the source and header files with the names:
You can add new files to the project as follows:
Right-click Application/User/Core and select [Add new item to group Application/User/Core]
Select the .c file type and enter the name of the file (for example, core_main.c). Make sure that you place the files in the right folders.
The .c files belong in the Project_name\Core\Src folder and the .h files in the Project_name\Core\Inc folder.
Similarly, create the remaining required .c and .h files and place them in their respective folders.
The code to add between the respective user code comments for the main.c file is mentioned below. The code for all the other files can be found under CORE->SRC in this link.
/* USER CODE BEGIN PFP */
extern int main2(void);
/* USER CODE END PFP */
/* USER CODE BEGIN 2 */
__HAL_FLASH_PREFETCH_BUFFER_ENABLE();
main2();
/* USER CODE END 2 */
Make sure that the STM32H5xx_DFP pack is installed and up to date. We use the compiler version 6.16. You can add and use older compiler versions by following this article.
Note: The official CoreMark® results for the STM32H5 are published by STMicroelectronics on the EEMBC website, where the compiler name and version are listed as Arm® clang compiler v6.16. Tests can be performed on the latest Arm compiler version as well.
Note that scores obtained with other compiler versions may differ considerably from those obtained with the listed version. If you would like to reproduce the published results exactly, it is recommended to use the same compiler version.
Right-click on the project name and select [Options for Target ‘coremark_h563’]
Make sure you have chosen the right compiler version under the tab [Target].
Set the compiler optimization level to -Ofast under the tab ‘C/C++ (AC6)’ and add this line (-mcpu=cortex-m33 -Omax) in the Misc Controls box. Make sure that you enable the [Link-Time Optimization] option and add ‘-Omax’ in the Misc controls field under the Linker tab.
Then compile the project by using the shortcut key F7 or clicking on the button shown below. The project should be compiled with no errors or warnings.
We are running 36000 iterations. The number of iterations can be changed by modifying the ITERATIONS macro in the core_portme.c file.
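When changing ITERATIONS, it helps to estimate the run time first, since the CoreMark run rules require the benchmark to execute for at least 10 seconds for a valid result. The helpers below are a hypothetical sketch and not part of the CoreMark sources; the expected score is whatever you anticipate for your target.

```c
#include <stdint.h>

#define COREMARK_MIN_RUN_SECONDS 10.0 /* minimum run time for a valid result */

/* Estimated duration of a run, given the expected score (iterations/s). */
double estimated_run_seconds(uint32_t iterations, double expected_score)
{
    if (expected_score <= 0.0) {
        return 0.0;
    }
    return (double)iterations / expected_score;
}

/* Smallest iteration count that keeps the run at or above the minimum. */
uint32_t min_iterations(double expected_score)
{
    if (expected_score <= 0.0) {
        return 0u;
    }
    double needed = expected_score * COREMARK_MIN_RUN_SECONDS;
    uint32_t whole = (uint32_t)needed;
    /* Round up when the product is not a whole number. */
    return (needed > (double)whole) ? whole + 1u : whole;
}
```

For a score near 1023, the 36000 iterations used here correspond to roughly 35 seconds, comfortably above the 10230 iterations needed to reach the 10-second minimum.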
Connect your board to the PC using a USB cable. Then, load the binary file by using the shortcut key F8 or clicking on the button shown below. The programming should be completed without any errors.
Now, open Tera Term, choose the COM port that your board is connected to, and press OK. On Windows, you can find the port number under Ports in the [Device Manager] control panel.
The COM port terminal opens. Next, make sure that the serial port settings are correct.
Select setup -> serial port and verify the settings, using the image below as a reference.
Now, reset the board by pushing the black reset button on the board. The CoreMark tests start running and take about 36 seconds to complete, after which the results are displayed.
As you can see, we have obtained a CoreMark® score of 1023, which is the same as the benchmark published on the EEMBC website.
Enable the caches (instruction and data) if they exist.
The flash ART Accelerator or flash prefetch buffer must be enabled if it is not enabled automatically.
If the core contains TCM (Tightly coupled memories) like ITCM and DTCM, then the linker script has to be adjusted to place the code in ITCM and data in DTCM with zero wait states.
For example: STM32H753ZI with an Arm® Cortex®-M7 CPU.
CoreMark results are posted on the EEMBC website. The important point to remember is that not all MCUs are tested using the same IDE. Each published result specifies the IDE, compiler version, memories used, and other compiler settings, and with these details you can obtain the same CoreMark score as published.
Update the compiler, optimization, and memory location details in the core_portme.h macro definitions to match the IDE and the compiler optimization options you are using.
Macro values for parameters such as SEED_METHOD, MEM_METHOD, multithread configurations can be changed in the core_portme.h file.
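As an illustration, a core_portme.h fragment covering these macros could look like the sketch below. The macro names come from the CoreMark porting layer; the values chosen here are assumptions for the example and are not taken from the port used in this article. The numeric constants are repeated from coremark.h, as far as we know, only so that the fragment compiles on its own.

```c
/* Selector values normally provided by coremark.h (repeated for this sketch). */
#ifndef SEED_ARG
#define SEED_ARG      0
#define SEED_FUNC     1
#define SEED_VOLATILE 2
#endif
#ifndef MEM_STATIC
#define MEM_STATIC 0
#define MEM_MALLOC 1
#define MEM_STACK  2
#endif

/* Strings reported with the results; placeholder values for illustration. */
#define COMPILER_VERSION "Arm Compiler 6.16"
#define COMPILER_FLAGS   "-Omax -mcpu=cortex-m33"
#define MEM_LOCATION     "STACK"

/* Assumed choices: seeds read from volatile variables, data on the stack. */
#define SEED_METHOD SEED_VOLATILE
#define MEM_METHOD  MEM_STACK

/* Single execution context on a bare-metal MCU. */
#define MULTITHREAD 1
```

Whatever values you choose, keep the reported strings in line with the actual build settings so that the printed results describe the configuration that was really measured.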
In the ee_printf.c file, change the extern declaration of the huart handle variable and the handle passed to the HAL_UART_Transmit function to match the UART you are using in your application.
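One way to keep the UART choice in a single place is to route the output through a small transmit hook. The sketch below is a hypothetical design, not the code in ee_printf.c; all names in it are invented for illustration. On target, the hook would typically be a thin wrapper around HAL_UART_Transmit with the handle of the UART you selected. Here a host-side capture buffer stands in for the UART so that the sketch can run off-target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transmit hook: returns 0 on success, nonzero on failure.
 * On target it could wrap, for example:
 *   HAL_UART_Transmit(&huart3, data, (uint16_t)len, HAL_MAX_DELAY);
 */
typedef int (*port_tx_fn)(const uint8_t *data, size_t len);

static port_tx_fn port_tx;

void port_set_tx(port_tx_fn fn) { port_tx = fn; }

/* Send a NUL-terminated string; returns bytes sent, or -1 on error. */
int port_write_string(const char *s)
{
    size_t len = strlen(s);
    if (port_tx == NULL || port_tx((const uint8_t *)s, len) != 0) {
        return -1;
    }
    return (int)len;
}

/* Host-side stand-in for the UART: records everything that is sent. */
static char capture[64];
static size_t capture_len;

int capture_tx(const uint8_t *data, size_t len)
{
    if (capture_len + len >= sizeof capture) {
        return -1; /* would overflow the capture buffer */
    }
    memcpy(capture + capture_len, data, len);
    capture_len += len;
    capture[capture_len] = '\0';
    return 0;
}

const char *captured_text(void) { return capture; }
```

Switching to another UART then only requires installing a different hook with port_set_tx, without touching the formatting code.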
It is common to get different CoreMark® results for different compiler and IDE versions and for different compiler settings.
Make sure that you have a valid Keil® µVision IDE license, as link-time optimization does not work with the free version and results in compilation errors.