Strange behavior with USB CDC as virtual COM port on NUCLEO-L476RG

Benjamin Brammer · 2020-01-07

Hello everybody,

I am using the NUCLEO-L476RG to send simple ASCII text over UART (RS232 adapter) or over USB CDC as a virtual COM port. UART communication works seamlessly, but when I use the USB CDC library functions from STM, something strange happens that I cannot explain:

I am sending ASCII text at regular intervals in the following format: ABP;***;***;***\n

The * characters stand for different numbers. Normally I would see this kind of text flow on a terminal like hterm:

ABP;***;***;***\n

... and so on

But with USB CDC I occasionally see this:

ABP;***;***;***\n

ABP;ABP;***;***;***\n

ABP;***;***;***\n

ABP;ABP;***;***;***\n

When I halt the debugger to check whether my character array is corrupted, it isn't. So my assumption is that something is not working correctly in the USB CDC device library from STM. This is the code that gets invoked when I transmit the ASCII text:

sprintf(Communication.measure,"ABP;%ld;%ld;%ld\n",Patient.Systolic, Patient.Diastolic, Patient.MAP);
if(System.Mode == UART)
{
	LPUART_transmit_message(Communication.measure);
}
else if(System.Mode == USB)
{
	CDC_Transmit_FS(Communication.measure, strlen(Communication.measure));
}

As you can see, I prepare the array in exactly the same way for the USB CDC version as for the UART version, yet only the UART version works seamlessly. The big difference between the two is that for USB CDC I only use the device library functions provided by STM, not my own.

Has anybody experienced something similar? Or does anybody have a good hint where to look for the problem?
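One thing that may be worth ruling out here: in the CubeMX-generated usbd_cdc_if.c, CDC_Transmit_FS() returns USBD_BUSY when the previous transfer has not finished, and it hands the buffer pointer to the USB stack without copying it. The snippet above ignores the return value. Below is a minimal sketch of a retry wrapper, not a confirmed fix. To keep it buildable on a PC it takes the transmit function as a pointer, and the names transmit_with_retry, fake_cdc_transmit, SKETCH_OK and SKETCH_BUSY are made up for illustration (the two constants only mirror USBD_OK and USBD_BUSY).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes that mirror USBD_OK / USBD_BUSY from the ST stack. */
enum { SKETCH_OK = 0, SKETCH_BUSY = 1 };

/* Same shape as CDC_Transmit_FS(uint8_t *Buf, uint16_t Len). */
typedef uint8_t (*tx_fn_t)(uint8_t *buf, uint16_t len);

/* Calls tx() until it accepts the buffer or max_tries is reached.
   Returns the number of attempts used, or 0 if the buffer was never accepted. */
unsigned transmit_with_retry(tx_fn_t tx, uint8_t *buf, uint16_t len,
                             unsigned max_tries)
{
    for (unsigned attempt = 1; attempt <= max_tries; ++attempt) {
        if (tx(buf, len) == SKETCH_OK) {
            return attempt;
        }
    }
    return 0;
}

/* Stand-in for CDC_Transmit_FS(): reports busy for the first few calls. */
unsigned fake_busy_calls = 2;

uint8_t fake_cdc_transmit(uint8_t *buf, uint16_t len)
{
    (void)buf;
    (void)len;
    if (fake_busy_calls > 0) {
        --fake_busy_calls;
        return SKETCH_BUSY;
    }
    return SKETCH_OK;
}
```

In the firmware, CDC_Transmit_FS itself would be passed as tx. Because the buffer is not copied, Communication.measure should also not be rewritten by the next sprintf() until the transfer has completed.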

best regards

Benjamin

Benjamin Brammer · 2020-01-21

Hey Bob,

Thanks for your answer. I am not sure how to do this properly. Can you give me some advice on how to track that error down? As far as I can see from the memory map, the region from 0x0010 0000 to 0x0800 0000 is reserved; that is what you meant, right? I suppose that only increasing the stack and heap size won't solve this, right?

Maybe I made a notation error, since the PC in the screenshot says 0x8405e2a... ?

Bob S · 2020-01-21

> As far as I can see from the memory map, the region from 0x0010 0000 to 0x0800 0000 is reserved; that is what you meant, right?

Not quite. The **valid** FLASH memory region goes from 0x0800 0000 to 0x080F FFFF (that is 1 MByte, or 0x0010 0000 bytes long). The contents of the PC (0x0840 5e2a) are outside this range and inside a range that the data sheet shows as "RESERVED".

Your debugging IDE should have a window somewhere that shows the call stack trace. The Eclipse-based IDE I use has this in a window called "Debug". It looks like you may be using the Keil IDE or some other IDE that I am not familiar with, so I can't tell you where to look for this info. Try single stepping your program from the start of main(). Step into the first function call (possibly HAL_Init()). Then look around your IDE for something that shows the functions main() and HAL_Init(). That is the window you want to look at when your code hits the fault handler. You can (or should be able to) click on each function name in the call chain and have the IDE show you where the CPU was in each function. That will hopefully show you where a function call was made through a pointer, and therefore which pointer was corrupted. The hard part will be figuring out how that pointer got corrupted.
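As a rough sketch of that range check (the function and macro names are made up for illustration, and the constants are simply the FLASH range quoted above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Main FLASH range quoted above: 0x0800 0000 .. 0x080F FFFF (1 MByte). */
#define SKETCH_FLASH_START 0x08000000u
#define SKETCH_FLASH_SIZE  0x00100000u

/* Returns true if addr lies inside the main FLASH range. */
bool addr_in_flash(uint32_t addr)
{
    return addr >= SKETCH_FLASH_START &&
           (addr - SKETCH_FLASH_START) < SKETCH_FLASH_SIZE;
}
```

With these constants, addr_in_flash(0x08405e2a) is false, so a PC with that value cannot be pointing at real code.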

Benjamin Brammer · 2020-01-21

Hey Bob,

I use STM32CubeIDE, so it is Eclipse-based. Of course I know the "Debug" view for debugging my processor's application. But I think you mean the Debug view that shows me the thread and core I use and which disassembly is presented, right?

Benjamin Brammer · 2020-01-21

Ah.. I managed to force another hardfault. This time an imprecise data access violation:

When I click on "Open editor on fault location", I jump to the following code in the HAL_LPTIM_OnePulse_Stop_IT(..) function:

/* Change the TIM state*/
  hlptim->State = HAL_LPTIM_STATE_READY;

A jump to the LR ends in Nirvana...

Might this be a problem with interrupt priority? This is my NVIC setting:

Bob S · 2020-01-22

Probably not your NVIC priorities, unless the actual issue is something like a memory management fault when you were already in the hard fault handler or debug monitor.

Where does "jump to the LR" come into play? At the end of that function as the return? The source line you show doesn't (shouldn't) have any kind of jump. When you are in the debugger after the fault, you can inspect the variables in the various functions up and down the call chain. So look at "hlptim" and see if it actually points to one of your lptim1 or lptim2 structures. You may also need to look at the previous line of code.

Now that I look again, the call chain looks a bit suspicious, as it lists only "0x200" as the function that was running before the HAL_LPTIM_OnePulse_Stop_IT() function was called. It has been a while since I've looked at call chain info from a fault handler, so I don't remember if that is normal. It is NOT normal if I set a breakpoint in an interrupt function: there I can see the entire call stack from main() up through the interrupt/signal handler call and the interrupt functions. So this may indicate an issue before the timer interrupt fired.

Benjamin Brammer · 2020-01-23

Hey Bob,

In the Fault Analyzer tab it is possible to jump to different addresses by clicking on the buttons in the upper right corner. You can jump either to the C code which caused the problem or to the disassembly view at the PC or LR. But the important information is in the image I attached, which shows the register contents at the time of the fault exception.

You mean 0x200 as the register content of r1? But r1 to r12 are general-purpose registers for data operations, so this could be anything, right?

Bob S · 2020-01-23

No,

The "0x200" that I am talking about is the one in the "Thread #1" call stack in the "Debug" tab in the upper left of the screen shot. There should be a function name there. Well, actually there should be something like "<signal handler called...>" BELOW (i.e. before) the line where HAL_LPTIM_OnePulse_Stop_IT() is shown, and then a function name below that. Something like this:

Note the <signal handler> line just before USART2_IRQHandler(). In fact, just like the UART code in my example shows USART2_IRQHandler(), then HAL_UART_IRQHandler(), then UART_EndTransmit_IT(), I would expect to see in your case something like LPTIM1_IRQHandler(), then HAL_LPTIM_IRQHandler(), and *THEN* the call to HAL_LPTIM_OnePulse_Stop_IT().

The register contents shown in your screen shot are practically useless without the context of the code that was running when the hard fault occurred (which is not shown). You should be able to click on a function name in the call stack ("Debug" tab in upper left of your screen) and have the IDE jump to that line. You can then examine the contents of the variables used in that function by hovering over them. Do that and see what the contents of the "hlptim" variable are in the HAL_LPTIM_OnePulse_Stop_IT() function.

Benjamin Brammer · 2020-01-23

I managed to force the error again:

But I don't know what this tells me about the error..

When I click on 0x200, I get the following screen:

Bob S · 2020-01-23

Sigh.... OK, maybe I was not as clear as I thought I was. So, for the 3rd time, being as explicit as I can be:

1. Run the code until it generates this fault.
2. Click on the HAL_LPTIM_OnePulse_Stop_IT() function in the "Debug" tab in the upper left corner of your screen. This will (should) show the source code for that function in the IDE.
3. In the source code window, hover the cursor over the "hlptim" local variable inside the HAL_LPTIM_OnePulse_Stop_IT() function and see what value it has.
4. Is the value of the "hlptim" local variable equal to the address of one of your global "hlptim1" or "hlptim2" structures? NOTE that the name of your actual LPTIM handle structures may be different. If it is not "hlptim1" or "hlptim2", then look in your source for a declaration of type LPTIM_HandleTypeDef. Once you know the name of the global variable, you can get its address from your map file. Hint: it should be 0x200X XXXX, where "X" can be any hex digit.
5. Still in the source code window: if the value of "hlptim" is valid, then look at the contents of the structure to which it points. Specifically, look at the "Instance" member. This should be the address of either LPTIM1 or LPTIM2 (whichever one you are using). This is either 0x4000 9400 for LPTIM2 or 0x4000 7C00 for LPTIM1.

It would also possibly be helpful to show what line in HAL_LPTIM_OnePulse_Stop_IT() was the current line when the fault happened (it should be the highlighted line when you click on the function name in the "Debug" tab).

I realize I am focusing on that one function, and that one variable inside that one function. That may not be the cause. If that variable looks rational, then you are back to basic debugging rules. How often does this happen? For example, does it always happen 30 seconds after you start running? Always after some external event? Or just (apparently) totally at random? So far, the fault always appears to happen while in the HAL_LPTIM_OnePulse_Stop_IT() function. Is that always the case? Try adding code to HAL_LPTIM_OnePulse_Stop_IT() that checks the value of hlptim and hits a breakpoint if it is not valid. You may have a stray pointer somewhere that is overwriting memory, or you may be writing past the end of an array. So we are back where we started: look for sprintf(), strcpy() and similar functions. Change every one of them to snprintf() or strncpy() or the like to make sure you don't ever write past the end of a buffer.
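A minimal sketch of such a check is below. So that it builds outside the project, it uses a stand-in struct instead of the real LPTIM_HandleTypeDef; the names lptim_handle_sketch_t, hlptim1_sketch, hlptim2_sketch and lptim_handle_is_valid are made up for illustration, and the two base addresses are the LPTIM1/LPTIM2 values mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LPTIM_HandleTypeDef; the real code would use
   the HAL type and the handles generated for the project. */
typedef struct {
    uintptr_t Instance;  /* peripheral base address */
    int       State;
} lptim_handle_sketch_t;

lptim_handle_sketch_t hlptim1_sketch = { 0x40007C00u, 0 };  /* LPTIM1 */
lptim_handle_sketch_t hlptim2_sketch = { 0x40009400u, 0 };  /* LPTIM2 */

/* A handle is accepted only if it is the address of one of the two known
   globals AND its Instance member still holds an LPTIM base address.
   The pointer is compared first, so a bogus pointer is never dereferenced. */
bool lptim_handle_is_valid(const lptim_handle_sketch_t *h)
{
    if (h != &hlptim1_sketch && h != &hlptim2_sketch) {
        return false;
    }
    return h->Instance == 0x40007C00u || h->Instance == 0x40009400u;
}
```

In the firmware, the check would sit at the top of HAL_LPTIM_OnePulse_Stop_IT(), with something like the CMSIS __BKPT(0) intrinsic (or a plain breakpoint on the failing branch) to stop the debugger before the bad pointer is used.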

On the other hand, if the CPU really *IS* executing from 0x0000 0200 when it somehow calls HAL_LPTIM_OnePulse_Stop_IT(), that probably means that somewhere you have a HAL structure with a function pointer that has not been set (or, again, has been corrupted/overwritten). So instead of calling a function, it ends up jumping to 0x0000 0000 (HAL sets all function pointers to NULL when they are not used). So maybe set a breakpoint at 0x0000 0200 and see what happens. It might trigger the interrupt during normal

Benjamin Brammer · 2020-01-24

Hello Bob,

I am sorry that I don't understand right away what you are trying to tell me, but I am thankful that you still take the time to tell me precisely what to do.

My LPTIM_HandleTypeDef handles have the following values:

hlptim1.instance = 0x40007c00 / resides in RAM at 0x20003094

hlptim2.instance = 0x40009400 / resides in RAM at 0x200038cc

When the hardfault occurs, and this only happens sometimes when I have measurements running and attach the USB cable, the local LPTIM_HandleTypeDef handle has the value 0xffffffe9, but I cannot see any values inside the .instance struct:

The fault always occurs at the same code line in HAL_LPTIM_OnePulse_Stop_IT(LPTIM_HandleTypeDef *hlptim), which I already mentioned:

/* Change the TIM state*/
  hlptim->State = HAL_LPTIM_STATE_READY;

So am I making the correct assumption that this suspicious address of 0xffffffe9 for the local handle is not good?

I also have no sprintf() and strcpy functions..
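For reference, 0xffffffe9 is not an address in RAM, but it does match the bit pattern of a Cortex-M4 EXC_RETURN code, the special value the core places in LR on exception entry. Whether that is really where the value comes from is an assumption and is not confirmed anywhere in this thread. A minimal sketch that decodes such a value follows; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is one of the EXC_RETURN codes defined for a Cortex-M4 with FPU. */
bool looks_like_exc_return(uint32_t v)
{
    switch (v) {
    case 0xFFFFFFF1u:  /* handler mode, MSP, basic frame    */
    case 0xFFFFFFF9u:  /* thread mode,  MSP, basic frame    */
    case 0xFFFFFFFDu:  /* thread mode,  PSP, basic frame    */
    case 0xFFFFFFE1u:  /* handler mode, MSP, extended frame */
    case 0xFFFFFFE9u:  /* thread mode,  MSP, extended frame */
    case 0xFFFFFFEDu:  /* thread mode,  PSP, extended frame */
        return true;
    default:
        return false;
    }
}

/* Bit 3 set: the exception returns to thread mode. */
bool exc_return_to_thread(uint32_t v)    { return (v & (1u << 3)) != 0; }

/* Bit 2 set: the return stack is the process stack (PSP). */
bool exc_return_uses_psp(uint32_t v)     { return (v & (1u << 2)) != 0; }

/* Bit 4 clear: the stacked frame includes floating-point registers. */
bool exc_return_has_fp_frame(uint32_t v) { return (v & (1u << 4)) == 0; }
```

Decoded this way, 0xffffffe9 reads as "return to thread mode, main stack, extended (FPU) frame". If the debugger shows that value as hlptim, it may be displaying a register or stack slot from exception entry rather than a real handle address.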