2024-08-22 09:20 AM
Hi All,
I have a design that works in STM32CubeIDE but fails when I build it with VScode and the STM32 plug-in. The strange thing is that everything in the design works except the UART TX DMA.
To narrow down the issue I created a small design with just the UART TX DMA function. I built the design in STM32CubeIDE and all was OK. I then took the STM32CubeIDE ioc file and created a new (cmake) design for VScode. I copied all the Core/Driver files and startup_stm32h743xx.s to the VScode project, and again the UART TX DMA was not working.
To make sure the above procedure was sound, I repeated it with the same design but used interrupts instead of DMA. In both STM32CubeIDE and VScode that code worked as expected.
The source files/drivers/startup/etc. are all the same, but the produced elf/map/hex files are different. Both STM32CubeIDE and VScode use the same gcc version (12.3.1), and it looks like most of the gcc flags are the same:
VScode:
H:\ST\STM32CubeCLT_1.15.1\GNU-tools-for-STM32\bin\arm-none-eabi-gcc.exe -DDEBUG -DSTM32H743xx -DUSE_HAL_DRIVER -IH:/test/STM32/testnode/blink/cmake/stm32cubemx/../../Core/Inc -IH:/test/STM32/testnode/blink/cmake/stm32cubemx/../../Drivers/STM32H7xx_HAL_Driver/Inc -IH:/test/STM32/testnode/blink/cmake/stm32cubemx/../../Drivers/STM32H7xx_HAL_Driver/Inc/Legacy -IH:/test/STM32/testnode/blink/cmake/stm32cubemx/../../Drivers/CMSIS/Device/ST/STM32H7xx/Include -IH:/test/STM32/testnode/blink/cmake/stm32cubemx/../../Drivers/CMSIS/Include -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard -Wall -Wextra -Wpedantic -fdata-sections -ffunction-sections -O0 -g3 -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard -Wall -Wextra -Wpedantic -fdata-sections -ffunction-sections -O0 -g3 -g -std=gnu11 -MD -MT CMakeFiles/blink.dir/Core/Src/main.c.obj -MF CMakeFiles\blink.dir\Core\Src\main.c.obj.d -o CMakeFiles/blink.dir/Core/Src/main.c.obj -c H:/test/STM32/testnode/blink/Core/Src/main.c
STM32CubeIDE:
arm-none-eabi-gcc "../Core/Src/main.c" -mcpu=cortex-m7 -std=gnu11 -g3 -DDEBUG -DUSE_HAL_DRIVER -DSTM32H743xx -c -I../Core/Inc -I../Drivers/STM32H7xx_HAL_Driver/Inc -I../Drivers/STM32H7xx_HAL_Driver/Inc/Legacy -I../Drivers/CMSIS/Device/ST/STM32H7xx/Include -I../Drivers/CMSIS/Include -O0 -ffunction-sections -fdata-sections -Wall -fstack-usage -fcyclomatic-complexity -MMD -MP -MF"Core/Src/main.d" -MT"Core/Src/main.o" --specs=nano.specs -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb -o "Core/Src/main.o"
Before I start digging deeper, does anybody have an idea what can make the same code fail in VScode but not in STM32CubeIDE?
Thanks,
Hans.
2024-08-23 12:19 PM
After some more digging I noticed that STM32CubeIDE and the VScode STM32 plug-in use different linker scripts. The one used by the VScode plug-in is from 2019, whereas the STM32CubeIDE one is from 2024. Look at the _estack memory location below:
// 2019 STM32 Plugin
_estack = ORIGIN(DTCMRAM) + LENGTH(DTCMRAM); /* end of RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 2048K
DTCMRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x24000000, LENGTH = 512K
RAM_D2 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
RAM_D3 (xrw) : ORIGIN = 0x38000000, LENGTH = 64K
ITCMRAM (xrw) : ORIGIN = 0x00000000, LENGTH = 64K
}
// 2024 STM32CubeIDE
_estack = ORIGIN(RAM_D1) + LENGTH(RAM_D1); /* end of RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 2048K
DTCMRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
RAM_D1 (xrw) : ORIGIN = 0x24000000, LENGTH = 512K
RAM_D2 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
RAM_D3 (xrw) : ORIGIN = 0x38000000, LENGTH = 64K
ITCMRAM (xrw) : ORIGIN = 0x00000000, LENGTH = 64K
}
Changing _estack to point at RAM (the region at 0x24000000) fixed the issue, and my UART TX DMA now works as expected.
Did somebody forget to update the STM32 plug-in linker script? If so, you owe me a day's work :thinking_face:
Regards,
Hans.
2024-08-23 01:53 PM - edited 2024-08-23 01:54 PM
There should be more differences in the linker script that cause the problem with DMA. The stack in DTCM RAM alone is probably not the culprit. Where are the data and bss sections?
2024-08-23 11:50 PM
This is only marginally related to the VScode extension. It's an H7 thing. Here is a detailed explanation:
https://community.st.com/t5/stm32-mcus/dma-is-not-working-on-stm32h7-devices/ta-p/49498
The linker script solves it because it moves all your data into RAM_D1, also known as AXI SRAM. This works because the "normal" DMA1/DMA2 controllers can access all SRAMs except DTCM. The H7 splits the hardware into domains, and you need to study them a little to understand the limitations.
However, DTCM RAM normally has a faster access time, being tightly coupled to the core. The only DMA engine that can access it is MDMA, which can be chained with the normal DMAs, but that is quite complicated.
If you want to make use of all the SRAMs available, you'd need to learn a bit more about linker scripts and how to place variables from your code in specific SRAMs (with special __attribute__ specifiers in your code).
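As a rough illustration of the __attribute__ approach mentioned above, here is a minimal sketch. The section name .dma_buffer, the DMA_BUFFER macro, and fill_tx_buffer() are made up for this example and are not part of any ST-generated file; the linker script fragment in the comment is likewise only an assumption about how such a section could be routed to RAM_D2.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Hypothetical section name. It must match an output section that the
 * linker script routes to a DMA-reachable RAM, for example:
 *
 *   .dma_buffer (NOLOAD) :
 *   {
 *     . = ALIGN(32);
 *     *(.dma_buffer)
 *     *(.dma_buffer*)
 *     . = ALIGN(32);
 *   } >RAM_D2
 */
#if defined(__arm__)
#define DMA_BUFFER __attribute__((section(".dma_buffer"), aligned(32)))
#else
/* Host builds keep only the alignment so the file still compiles. */
#define DMA_BUFFER __attribute__((aligned(32)))
#endif

#define TX_BUFFER_SIZE 32u

/* A NOLOAD section is not initialized by the startup code, so the
 * buffer has to be filled at run time before the first transfer. */
DMA_BUFFER uint8_t txbuffer[TX_BUFFER_SIZE];

static_assert(TX_BUFFER_SIZE % 32u == 0u,
              "keep the buffer a whole number of 32-byte cache lines");

/* Fill the buffer with an incrementing pattern; returns the byte count. */
uint32_t fill_tx_buffer(uint8_t first)
{
    for (uint32_t i = 0u; i < TX_BUFFER_SIZE; ++i) {
        txbuffer[i] = (uint8_t)(first + i);
    }
    return TX_BUFFER_SIZE;
}
```

After linking, the map file should show txbuffer at an address in the 0x30000000 range if the section was picked up as intended.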
2024-08-24 01:27 AM
Hi Pavel,
Yes, you are correct: in the 2024 script .bss and .data are also placed at 0x24000000 (see below), but for my simple test case just changing _estack was sufficient.
Hi Aefth,
Thanks, that is a great link. Unfortunately I did not spend much time looking at the DMA, as the design was working fine in STM32CubeIDE and all the examples on the web used the same HAL calls.
I was incorrect in attributing this to the VScode extension, as the linker script is generated by STM32CubeMX (latest 6.12.0). If you select STM32CubeIDE as the toolchain you get the correct(?) 2024 script, and for the Makefile and CMake options you get the incorrect(?) 2019 script.
I have no problem digging into linker scripts and have done so in the past. However, one could argue about whether this is really necessary: STM32CubeMX is supposed to give you push-button peripheral configuration, so it should work out of the box for a simple UART. I think this is a simple oversight on ST's part and I hope they fix it soon.
Regards,
Hans.
2024 Linker Script:
/*
******************************************************************************
**
** File : LinkerScript.ld
**
** Author : STM32CubeIDE
**
** Abstract : Linker script for STM32H7 series
** 2048Kbytes FLASH and 1056Kbytes RAM
**
** Set heap size, stack size and stack location according
** to application requirements.
**
** Set memory bank area and size if external memory is used.
**
** Target : STMicroelectronics STM32
**
** Distribution: The file is distributed as is, without any warranty
** of any kind.
**
*****************************************************************************
** @attention
**
** Copyright (c) 2024 STMicroelectronics.
** All rights reserved.
**
** This software is licensed under terms that can be found in the LICENSE file
** in the root directory of this software component.
** If no LICENSE file comes with this software, it is provided AS-IS.
**
****************************************************************************
*/
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = ORIGIN(RAM_D1) + LENGTH(RAM_D1); /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 2048K
DTCMRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
RAM_D1 (xrw) : ORIGIN = 0x24000000, LENGTH = 512K
RAM_D2 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
RAM_D3 (xrw) : ORIGIN = 0x38000000, LENGTH = 64K
ITCMRAM (xrw) : ORIGIN = 0x00000000, LENGTH = 64K
}
/* Define output sections */
SECTIONS
{
/* The startup code goes first into FLASH */
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
} >FLASH
/* The program code and other data goes into FLASH */
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
/* Constant data goes into FLASH */
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
.ARM.extab (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
*(.ARM.extab* .gnu.linkonce.armextab.*)
} >FLASH
.ARM (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} >FLASH
.preinit_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} >FLASH
.init_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} >FLASH
.fini_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
} >FLASH
/* used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections goes into RAM, load LMA copy after code */
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
*(.RamFunc) /* .RamFunc sections */
*(.RamFunc*) /* .RamFunc* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >RAM_D1 AT> FLASH
/* Uninitialized data section */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss section */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >RAM_D1
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM_D1
/* Remove information from the standard libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
2019 Linker script:
/*
******************************************************************************
**
** File : LinkerScript.ld
**
** Author : STM32CubeMX
**
** Abstract : Linker script for STM32H743VITx series
** 2048Kbytes FLASH and 1056Kbytes RAM
**
** Set heap size, stack size and stack location according
** to application requirements.
**
** Set memory bank area and size if external memory is used.
**
** Target : STMicroelectronics STM32
**
** Distribution: The file is distributed “as is,” without any warranty
** of any kind.
**
*****************************************************************************
** @attention
**
** <h2><center>© COPYRIGHT(c) 2019 STMicroelectronics</center></h2>
**
** Redistribution and use in source and binary forms, with or without modification,
** are permitted provided that the following conditions are met:
** 1. Redistributions of source code must retain the above copyright notice,
** this list of conditions and the following disclaimer.
** 2. Redistributions in binary form must reproduce the above copyright notice,
** this list of conditions and the following disclaimer in the documentation
** and/or other materials provided with the distribution.
** 3. Neither the name of STMicroelectronics nor the names of its contributors
** may be used to endorse or promote products derived from this software
** without specific prior written permission.
**
** THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
** AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
** IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
** DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
** FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
** DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
** SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
** CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
** OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
** OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**
*****************************************************************************
*/
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = ORIGIN(DTCMRAM) + LENGTH(DTCMRAM); /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
DTCMRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x24000000, LENGTH = 512K
RAM_D2 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
RAM_D3 (xrw) : ORIGIN = 0x38000000, LENGTH = 64K
ITCMRAM (xrw) : ORIGIN = 0x00000000, LENGTH = 64K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 2048K
}
/* Define output sections */
SECTIONS
{
/* The startup code goes first into FLASH */
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
} >FLASH
/* The program code and other data goes into FLASH */
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
/* Constant data goes into FLASH */
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
.ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
.ARM : {
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} >FLASH
.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} >FLASH
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} >FLASH
.fini_array :
{
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
} >FLASH
/* used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections goes into RAM, load LMA copy after code */
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >DTCMRAM AT> FLASH
/* Uninitialized data section */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss secion */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >DTCMRAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >DTCMRAM
/* Remove information from the standard libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
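The practical difference between the two scripts above is where .data, .bss and the stack end up: DTCMRAM in the 2019 script, RAM_D1 in the 2024 one. A small sketch of a check that flags buffers DMA1/DMA2 cannot reach is shown here. The region table is copied from the MEMORY blocks above; the names mem_region_t, find_region and dma12_can_access are hypothetical, and the "reachable" flags follow the statement in this thread that DMA1/DMA2 can access all SRAMs except the tightly coupled memories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    origin;
    uint32_t    length;
    bool        dma12_ok;   /* reachable by the "normal" DMA1/DMA2 */
} mem_region_t;

/* RAM regions as listed in the MEMORY block of both linker scripts. */
static const mem_region_t regions[] = {
    { "ITCMRAM", 0x00000000u,  64u * 1024u, false },
    { "DTCMRAM", 0x20000000u, 128u * 1024u, false },
    { "RAM_D1",  0x24000000u, 512u * 1024u, true  },
    { "RAM_D2",  0x30000000u, 288u * 1024u, true  },
    { "RAM_D3",  0x38000000u,  64u * 1024u, true  },
};

/* Return the RAM region containing addr, or NULL if there is none. */
const mem_region_t *find_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; ++i) {
        if (addr >= regions[i].origin &&
            addr - regions[i].origin < regions[i].length) {
            return &regions[i];
        }
    }
    return NULL;
}

/* True if the whole buffer [addr, addr + len) lies in one region
 * that DMA1/DMA2 can reach. */
bool dma12_can_access(uint32_t addr, uint32_t len)
{
    const mem_region_t *r = find_region(addr);
    if (r == NULL || !r->dma12_ok || len == 0u) {
        return false;
    }
    uint64_t end = (uint64_t)addr + len;
    return end <= (uint64_t)r->origin + r->length;
}
```

On the target, a call such as dma12_can_access((uint32_t)txbuffer, 16) in a debug assertion before starting the transfer would have caught a buffer living in DTCMRAM.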
2024-08-25 02:57 AM
As per AEfth's response (the solution), anybody using an STM32H7xx needs to read this article if they are planning to use DMA:
https://community.st.com/t5/stm32-mcus/dma-is-not-working-on-stm32h7-devices/ta-p/49498
Unfortunately DMA does not (always) work out of the box even though STM32CubeMX gives you the impression it is all configured.
To try the solutions from this article I connected the USART1 TX pin to the UART5 RX pin and enabled DMA on both. I then used a terminal to send characters via DMA from USART1 to UART5. All the solutions in the article worked on my STM32H743VITx chip.
If I enable the D-cache (SCB_EnableDCache()) I get invalid characters on UART5; with it disabled all is OK. This is solution 1 in the article, but it works as-is only with the 2024 linker script; for the 2019 script you also need to make the RAM_D1 region changes described in solution 1. Disabling the D-cache is unlikely to be an acceptable solution, as most users (including me) picked the H7 for performance reasons.
Next I tried solution 3 (D-cache enabled plus cache maintenance), and this also worked. I simply placed a SCB_CleanDCache_by_Addr() call before the DMA transfer:
SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)txbuffer) & ~(uint32_t)0x1F), 16+32);
if (HAL_UART_Transmit_DMA(&huart1, txbuffer, 16)!=HAL_OK) ...
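The call above rounds the buffer address down to a 32-byte boundary and pads the length by a fixed 32 bytes, which covers the 16-byte buffer wherever it starts. A minimal sketch of computing the exact cache-line range instead is given here; cache_range_t and cache_line_range are hypothetical helper names, and the 32-byte line size is the Cortex-M7 D-cache line size.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uint32_t addr;   /* start address rounded down to a cache line */
    int32_t  size;   /* byte count covering whole cache lines      */
} cache_range_t;

/* Expand [addr, addr + len) to whole cache lines. */
cache_range_t cache_line_range(uint32_t addr, uint32_t len)
{
    const uint32_t mask  = DCACHE_LINE_SIZE - 1u;
    const uint32_t start = addr & ~mask;
    const uint32_t end   = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (int32_t)(end - start) };
    return r;
}
```

On the target the result could be passed on as SCB_CleanDCache_by_Addr((uint32_t *)r.addr, r.size), using the same cast as the call above; the exact pointer type expected may depend on the CMSIS version in the project.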
The final option, solution 2 (which is what I am going to use), is to use the MPU to create a small non-cacheable memory hole for the DMA buffers in RAM_D2. This is easy to do using STM32CubeMX and also worked.
The only change I had to make was to initialize the DMA buffer at run time, as there is no default initialization (as mentioned in the article).
Regards,
Hans.
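For readers curious what the MPU option (solution 2) amounts to at register level, here is a rough sketch that builds the ARMv7-M MPU_RASR value for a non-cacheable region. The 32 KB size and the attribute choice (TEX=1, C=0, B=0, not shareable, full access, no execute) are assumptions made for illustration, not the exact settings from the article or from STM32CubeMX; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR fields used here. The S, C and B bits (18..16)
 * are left clear, which together with TEX=1 selects normal,
 * non-cacheable memory. */
#define RASR_ENABLE    (1u << 0)
#define RASR_SIZE_POS  1u          /* SIZE field: log2(bytes) - 1 */
#define RASR_TEX_POS   19u
#define RASR_AP_POS    24u
#define RASR_AP_FULL   3u          /* read/write for all privilege levels */
#define RASR_XN        (1u << 28)  /* no instruction fetch from region */

/* log2 of a power-of-two size of at least 32 bytes, 0 if invalid. */
static uint32_t log2_pow2(uint32_t v)
{
    if (v < 32u || (v & (v - 1u)) != 0u) {
        return 0u;
    }
    uint32_t n = 0u;
    while ((v >> n) != 1u) {
        ++n;
    }
    return n;
}

/* MPU_RASR value for a non-cacheable region, 0 if the size is invalid. */
uint32_t mpu_rasr_noncacheable(uint32_t size_bytes)
{
    uint32_t n = log2_pow2(size_bytes);
    if (n == 0u) {
        return 0u;
    }
    return RASR_XN
         | (RASR_AP_FULL << RASR_AP_POS)
         | (1u << RASR_TEX_POS)
         | ((n - 1u) << RASR_SIZE_POS)
         | RASR_ENABLE;
}

/* An MPU region base address must be aligned to the region size. */
bool mpu_base_is_valid(uint32_t base, uint32_t size_bytes)
{
    return log2_pow2(size_bytes) != 0u && (base & (size_bytes - 1u)) == 0u;
}
```

With these assumptions, a 32 KB hole at the start of RAM_D2 (0x30000000) passes the alignment check; the DMA buffers then have to be placed inside that hole by the linker script.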