2023-12-22 05:29 AM - edited 2023-12-22 12:45 PM
Hello everyone!
Background
We are working on an STM32G474VET6-based custom board for a specific motor control application. The project requires a custom bootloader over FD-CAN with a 500 kbit/s arbitration bit rate and a 2 Mbit/s data bit rate, with the BRS option turned on. The bootloader is done and working fine.
The board supports three CAN buses, so we use the FDCAN1, FDCAN2 and FDCAN3 peripherals. All CAN buses have the same bus configuration and are not interconnected.
Everything works perfectly when only the application runs on the uC. As soon as we add the bootloader, which runs on FDCAN1, the same application that was running all three CANs (adjusted to be bootloadable, of course: offset in flash, vector table adjusted) hangs if any message is received or transmitted on either FDCAN2 or FDCAN3. If there is no traffic on FDCAN2 or FDCAN3, everything works as expected. Besides the CAN peripherals, we are also running ADC1 and ADC2 with injected and regular channels, triggered by center-aligned PWM timers, a UART for debug purposes, ...
Anyway, the application boots completely normally and works OK until we receive something on FDCAN2 or FDCAN3; then things fall apart. Mind that FDCAN2 and FDCAN3 work perfectly without the bootloader, so I suspect something around the transition from bootloader to application.
FDCAN Driver
The FDCAN driver source files are attached to this post. They show how the peripherals are configured and used.
Transition from bootloader to application
Before entering the application, the bootloader de-initializes and resets every peripheral it uses. The FDCAN peripheral is de-initialized, so when the application boots it should find FDCAN in the same clean state as in the standalone application.
De-init routine before entering application:
    // De-Initialize cryptographic library
    if ( CMOX_INIT_SUCCESS != cmox_finalize( NULL ))
    {
        status = eBOOT_ERROR;
    }

    // De-init gpio
    gpio_deinit();

    // De-init CAN
    can_deinit( eCAN_MASTER );

    // De-init clocks
    HAL_RCC_DeInit();

    // De-init HAL
    HAL_DeInit();

    // Disable systick
    SysTick->CTRL &= ~SysTick_CTRL_ENABLE_Msk;
Problem with FDCAN on application with bootloader support
Questions
Any recommendations are highly welcome. We're eager to hear your suggestions and answers!
BR, Žiga
2023-12-23 02:30 AM
Update (23.12.2023): I speculated that the bootloader might be messing up the FDCAN peripheral, since it works without any problem in the standalone application. So I made a test with a bootloader so simple that it does only one thing: jump to the application. Before that, I flashed the bootloadable application to the proper flash location using STM32CubeProgrammer.
Even with that simple bootloader, FDCAN2 and FDCAN3 still do not work in the bootable application! That means the bootloader has nothing to do with the FDCAN problem, as it doesn't touch anything at all. The problem must be in the application that is prepared for the bootloader.
With that said, I'll start searching for issues on the application side.
BR, Žiga
2023-12-23 11:29 AM
Well, personally I wouldn't tear down the entire system.
Have some plan to transferring control. Define the expectations on each side, there's little reason to do initialization TWICE, and it can be disruptive. If the clocks and peripheral are all running properly there's no need to kill them. If the clocks are already optimal/maximal don't run SystemClock_Config() AGAIN
What you want to stop is interrupts at their source (peripheral register), and expectations in HAL initialization/instances you hold in RAM.
Make sure the Vector Table is correctly selected on the App side, perhaps by using the symbol for it in the SystemInit() of SCB->VTOR
Do not transfer control from an Interrupt Handler or Call-Back. Watch for RTOS and User/System execution context.
2023-12-24 01:16 AM
Hi Tesla DeLorean,
thank you for your comments/recommendations!
"Well, personally I wouldn't tear down the entire system."
That is my plan as well. I'll strip things out of the application until CAN starts to work.
"Have some plan to transferring control. Define the expectations on each side, there's little reason to do initialization TWICE, and it can be disruptive. If the clocks and peripheral are all running properly there's no need to kill them. If the clocks are already optimal/maximal don't run SystemClock_Config() AGAIN"
That is an interesting way of thinking. I had never thought about not re-initializing everything the application needs. It makes perfect sense, but it also has a downside: the application must initialize only what is not already initialized, which tightly couples the bootloader and application code. For example, if the bootloader interface changes (from FDCAN to whatever), the application code has to be adapted, because it now needs to initialize FDCAN itself, whereas previously the bootloader did that. And there are many such real-life scenarios that can happen...
"Make sure the Vector Table is correctly selected on the App side, perhaps by using the symbol for it in the SystemInit() of SCB->VTOR"
This is OK, as otherwise nothing would work. As already mentioned, the ADC, USART and other peripherals trigger interrupts, and apart from FDCAN everything works as expected.
"Do not transfer control from an Interrupt Handler or Call-Back."
Yes, that is also taken care of. The jump to the application is done from the main loop. Right before the jump I also call "__disable_irq()".
Can you maybe talk more about "Watch for RTOS and User/System execution context"?
What do you mean by that? Do you have any good reference on it that you can share?
Overall, you focused more on the general boot-app mechanism, but I'm more suspicious of the FDCAN peripheral, as interrupts are triggered when they are disabled.
Do you think the problem is actually at the boot-app level and only manifests at the FDCAN peripheral level? So the FDCAN peripheral itself has nothing to do with these issues?
Thank you!
BR, Žiga
2023-12-26 11:33 PM
Well, I finally got to the problem and the solution.
I completely took the application code apart, so that all that was left was the CAN code, sending a dummy message on all three FDCANs. The result was the same: receiving or transmitting any packet on either FDCAN2 or FDCAN3 resulted in abnormal interrupt triggering (as explained above). I couldn't make the application boot and work normally.
The false interrupt triggering got me thinking about the interrupt vector offset settings, and that turned out to be the root cause of the problem that manifests as abnormal FDCAN behaviour (triggering some IRQ that it should not). After inspecting the VTOR settings, I found that nothing looked wrong, as everything was according to ST recommendations. Looking into system_stm32g4xx.c:
    /************************* Miscellaneous Configuration ************************/
    /* Note: Following vector table addresses must be defined in line with linker
             configuration. */
    /*!< Uncomment the following line if you need to relocate the vector table
         anywhere in Flash or Sram, else the vector table is kept at the automatic
         remap of boot address selected */
    // ZIGA: Enable user vector table settings
    #define USER_VECT_TAB_ADDRESS

    #if defined(USER_VECT_TAB_ADDRESS)
    /*!< Uncomment the following line if you need to relocate your vector Table
         in Sram else user remap will be done in Flash. */
    /* #define VECT_TAB_SRAM */
    #if defined(VECT_TAB_SRAM)
    #define VECT_TAB_BASE_ADDRESS   SRAM_BASE       /*!< Vector Table base address field.
                                                         This value must be a multiple of 0x200. */
    #define VECT_TAB_OFFSET         0x00000000U     /*!< Vector Table base offset field.
                                                         This value must be a multiple of 0x200. */
    #else
    #define VECT_TAB_BASE_ADDRESS   FLASH_BASE      /*!< Vector Table base address field.
                                                         This value must be a multiple of 0x200. */
    // ZIGA: Added vector offsetting
    #ifndef __BOOTLOADER_SUPPORT__
    #define VECT_TAB_OFFSET         0x0U            /*!< Vector Table base offset field. This value must be a multiple of 0x100. */
    #else
    #define VECT_TAB_OFFSET         0x8100U         /*!< Vector Table base offset field. This value must be a multiple of 0x100. */
    #endif
    #endif /* VECT_TAB_SRAM */
    #endif /* USER_VECT_TAB_ADDRESS */
    /******************************************************************************/
So I have it set up as:
SCB->VTOR = VECT_TAB_BASE_ADDRESS | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
which results in VTOR=0x08008100, and that address is the start of the application.
Notice the comment:
/*!< Vector Table base offset field. This value must be a multiple of 0x100. */
As everything seemed to be OK with the VTOR settings, I didn't pay much attention to that section and moved on to investigate other parts of the boot-app system.
But then I started to play with the VTOR settings, knowing that other ST uCs require VTOR to be a multiple of 0x200 (STM32F4/L4) or 0x400 (STM32H7). It turns out that changing the VTOR offset from a multiple of 0x100 to a multiple of 0x200 solves the problem that was causing the strange FDCAN behaviour. Now the bootable application works the same as the standalone application.
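A minimal host-side sketch of the alignment rule that would explain this, as I understand it from the ARM Cortex-M4 documentation: the vector table must be aligned to the number of table entries rounded up to the next power of two, times 4 bytes, with a minimum of 128 bytes. The helper names below are hypothetical, and the count of 102 device interrupts for the STM32G474 is an assumption that should be checked against the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System exceptions that precede the device IRQs in a Cortex-M vector table. */
#define CORTEX_M_SYSTEM_EXCEPTIONS 16u

/* Minimum VTOR alignment on Cortex-M4, in bytes (32 words). */
#define VTOR_MIN_ALIGNMENT 128u

/* Hypothetical helper: required vector table alignment in bytes for a
 * device with num_irqs interrupts (Cortex-M4 NVIC supports up to 240). */
uint32_t vtor_required_alignment(uint32_t num_irqs)
{
    assert(num_irqs <= 240u);

    uint32_t entries = CORTEX_M_SYSTEM_EXCEPTIONS + num_irqs;
    uint32_t rounded = 1u;

    /* Round the entry count up to the next power of two. */
    while (rounded < entries) {
        rounded <<= 1;
    }

    uint32_t bytes = rounded * 4u; /* 4 bytes per vector */
    return (bytes < VTOR_MIN_ALIGNMENT) ? VTOR_MIN_ALIGNMENT : bytes;
}

/* Hypothetical helper: true if table_addr is a multiple of alignment
 * (alignment must be a power of two). */
bool vtor_is_aligned(uint32_t table_addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (table_addr & (alignment - 1u)) == 0u;
}

/* Hypothetical helper: next address at or above table_addr that satisfies
 * the alignment (alignment must be a power of two). */
uint32_t vtor_align_up(uint32_t table_addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (table_addr + alignment - 1u) & ~(alignment - 1u);
}
```

With the assumed 102 interrupts, the table has 16 + 102 = 118 entries, which rounds up to 128 entries, so vtor_required_alignment(102) gives 512 (0x200). The address 0x08008100 fails vtor_is_aligned() for that alignment, and vtor_align_up() moves it to 0x08008200, which matches the observation that a multiple of 0x200 works.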
So my questions now are:
@Tesla DeLorean I'm eager to hear from you!
Thanks in advance!
BR, Žiga