
Bringing up a new board with STM32L443 hangs in the HAL initialization routines.

KiptonM
Lead

It was hanging in many of the initialization routines. There was a problem with the clock: I had an external 16 MHz crystal, so I switched back to the internal oscillator and it got past that one for now.

It got through MX_GPIO_Init(), MX_DMA_Init(), and MX_ADC1_Init().

It had a problem with TIM2 where it sets up the GPIO for PWM, so I commented out that initialization for now.

I find it easier to debug if I can write out to the UART. Then I have a written, saved record of the variable states, and that helps find problems more quickly.

Now I am on  MX_USART1_UART_Init();

It appears to be failing on this one as soon as I single step out of the NVIC_EncodePriority inline function.

/**
  \brief   Encode Priority
  \details Encodes the priority for an interrupt with the given priority group,
           preemptive priority value, and subpriority value.
           In case of a conflict between priority grouping and available
           priority bits (__NVIC_PRIO_BITS), the smallest possible priority group is set.
  \param [in]     PriorityGroup  Used priority group.
  \param [in]   PreemptPriority  Preemptive priority value (starting from 0).
  \param [in]       SubPriority  Subpriority value (starting from 0).
  \return                        Encoded priority. Value can be used in the function \ref NVIC_SetPriority().
 */
__STATIC_INLINE uint32_t NVIC_EncodePriority (uint32_t PriorityGroup, uint32_t PreemptPriority, uint32_t SubPriority)
{
  uint32_t PriorityGroupTmp = (PriorityGroup & (uint32_t)0x07UL);   /* only values 0..7 are used          */
  uint32_t PreemptPriorityBits;
  uint32_t SubPriorityBits;
 
  PreemptPriorityBits = ((7UL - PriorityGroupTmp) > (uint32_t)(__NVIC_PRIO_BITS)) ? (uint32_t)(__NVIC_PRIO_BITS) : (uint32_t)(7UL - PriorityGroupTmp);
  SubPriorityBits     = ((PriorityGroupTmp + (uint32_t)(__NVIC_PRIO_BITS)) < (uint32_t)7UL) ? (uint32_t)0UL : (uint32_t)((PriorityGroupTmp - 7UL) + (uint32_t)(__NVIC_PRIO_BITS));
 
  return (
           ((PreemptPriority & (uint32_t)((1UL << (PreemptPriorityBits)) - 1UL)) << SubPriorityBits) |
           ((SubPriority     & (uint32_t)((1UL << (SubPriorityBits    )) - 1UL)))
         );
}

It comes in with PriorityGroup = 3

PreemptPriority = 0

SubPriority = 0

It creates:

PriorityGroupTmp = 3

PreemptPriorityBits = 4

SubPriorityBits = 0

When I exit this routine with a single step, the debugger stops working. The only thing I can do is Terminate.
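
For reference, the values reported above can be reproduced off-target. This is a minimal host-side sketch, assuming 4 implemented priority bits (the value I believe the STM32L4 device header assigns to __NVIC_PRIO_BITS). The name encode_priority is hypothetical, chosen so it does not clash with CMSIS; the body only mirrors the arithmetic of the routine quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 priority bits are implemented, as on STM32L4 parts. */
#define PRIO_BITS 4u

/* Host-side copy of the CMSIS NVIC_EncodePriority arithmetic (hypothetical name). */
static uint32_t encode_priority(uint32_t group, uint32_t preempt, uint32_t sub)
{
    uint32_t g = group & 0x07u;  /* only values 0..7 are used */

    /* Split the implemented bits between preempt priority and subpriority. */
    uint32_t preempt_bits = ((7u - g) > PRIO_BITS) ? PRIO_BITS : (7u - g);
    uint32_t sub_bits = ((g + PRIO_BITS) < 7u) ? 0u : ((g - 7u) + PRIO_BITS);

    /* The two fields can never use more bits than the hardware implements. */
    assert(preempt_bits + sub_bits <= PRIO_BITS);

    return ((preempt & ((1u << preempt_bits) - 1u)) << sub_bits) |
           (sub & ((1u << sub_bits) - 1u));
}
```

With group 3 and both priorities 0, encode_priority(3, 0, 0) returns 0, with 4 preempt bits and 0 subpriority bits, which matches what the debugger showed. That suggests the encode step itself is behaving and the hang comes from something that happens afterwards.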

If I suspend, it says 0x1fff2ea4, but I do not know how to interpret that.

Are there any ideas where I should look to get this working?

Kip

1 ACCEPTED SOLUTION

Accepted Solutions
KiptonM
Lead

I had two boards built. #1 had 19 cold solder joints on the processor, everything else looked good.

#2 had 2 cold solder joints on the processor. I fixed it and that is the one I have been testing.

I made the test as simple as possible. I toggled two pins in a while(1) loop. It quit in just under 1 ms. And I could not recover with the debugger. I had to terminate the debugger and start it again.

So trying to get an idea of what could possibly be going on I activated the MCO (Master Clock Output) and set it at 16 MHz.

Everything stopped at the same time.

And it was pretty consistent at about 960 µs after the MCO started, according to the oscilloscope.

Today #1 came back with the cold solder joints fixed, and it works. No problems with it quitting in the first millisecond. It has been running now for about 30 minutes while I was on a conference call.

It has to be a hardware issue. The power supply is not very accurate, but on the board that crashed, the current draw was 0.03 A at 24V (there is a switching power supply on the board.) The board that works is reading 0.02 A at 24V. Power supplies are not known for having the most accurate current sensors. I am not putting a lot of trust in it, but it is a data point.

I sent it back and they are going to look at the processor pins again more carefully. The processor has the finest pin spacing on the board. Most everything else is SO parts and 0805 or 0603 with the exception of the L6474H.


8 REPLIES 8
Bob S
Principal

0x1fff2ea4 is in the "system memory" section, i.e. the built-in bootloader. How do you have the BOOT0 pin wired?
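
To make that address concrete, here is a small sketch that sorts an address into a memory region. The ranges are my recollection of the STM32L43x memory map (256 KB flash variant assumed) and should be checked against the reference manual before relying on them. The names region_t and classify_address are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;    /* first address in the region */
    uint32_t end;      /* last address in the region (inclusive) */
    const char *name;
} region_t;

/* Assumed ranges for an STM32L43x part with 256 KB flash; verify in the manual. */
static const region_t regions[] = {
    { 0x08000000u, 0x0803FFFFu, "main flash" },
    { 0x10000000u, 0x10003FFFu, "SRAM2" },
    { 0x1FFF0000u, 0x1FFF6FFFu, "system memory (bootloader)" },
    { 0x20000000u, 0x2000BFFFu, "SRAM1" },
};

/* Return the name of the region containing addr, or "unknown". */
static const char *classify_address(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; ++i) {
        assert(regions[i].start <= regions[i].end);
        if (addr >= regions[i].start && addr <= regions[i].end) {
            return regions[i].name;
        }
    }
    return "unknown";
}
```

Under those assumed ranges, 0x1fff2ea4 falls in the system memory entry, so a PC at that address means the core is running ROM bootloader code rather than the application in flash.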

KiptonM
Lead

BOOT0 is brought out to a header and is pulled down with a 100K resistor. In the future I may use it to load software in the field.

Once I am running it should not jump into the bootloader.

I am programming and debugging with the ST-LINK V2 (a.k.a. Blue Pill)

Thanks for the links, I am working through them.

I decided to toggle some pins just to get something to work, and the high level bounced around a lot. So I looked at the power supply and it was rock solid, right at 3.3 V.

After thinking I was crazy, I looked at the board under the microscope and discovered 19 of the 48 processor pins had cold solder joints. I sent it back to the soldering lady. Unfortunately she was out yesterday and today.

She had made a second board and I found two pins with cold solder joints which I was able to fix.

I just got it powered up with a pin toggling loop. It appears to toggle pins for 929.4 us, then quits. I suspect something is off with the SysTick, since that is on a 1 ms interrupt.
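
As a rough sanity check on that timing, here is a sketch of the arithmetic. It assumes a 16 MHz core clock and a tick configured for SystemCoreClock / 1000 cycles, which is what the HAL's default 1 ms tick uses; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time for a down-counter of reload_ticks cycles to expire, in nanoseconds. */
static uint64_t systick_period_ns(uint32_t core_clock_hz, uint32_t reload_ticks)
{
    assert(core_clock_hz != 0u);
    return ((uint64_t)reload_ticks * 1000000000ull) / core_clock_hz;
}

/* Time between the tick being started and the first observed pin activity,
   if the failure really coincides with the first tick interrupt. */
static int64_t startup_gap_ns(uint64_t period_ns, uint64_t observed_ns)
{
    return (int64_t)period_ns - (int64_t)observed_ns;
}
```

With 16000 cycles at 16 MHz the period is 1,000,000 ns, and subtracting the observed 929,400 ns leaves about 70,600 ns. That would be the time spent between starting the tick and the first pin toggle, which seems plausible for the remaining initialization code, so the measurement is at least consistent with the first SysTick interrupt being the trigger.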

I see one of the links is about the interrupt vector. I will start reading your links to see. The issue is not addressed in the first one, so I will continue reading.

Thank you for the links. When I figure it out I will post again with what happened.

KiptonM
Lead

The second link is to one of my posts from July. :)

I know. I put it there to point out that you've already had problems with the mcu starting with the System memory mapped at 0x0000'0000 (a.k.a. default bootloader, maybe due to the empty-FLASH mechanism) spuriously.

As you cling to Cube, my guess is that the 1 ms of operation you're observing is stopped by the SysTick interrupt erroneously executing from the system memory. Observe the content of PC (break and look at disasm) at that moment, and observe what's at 0x0000'0000.

JW
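
One way to act on that suggestion is to copy the first 16 words at 0x0000'0000 out of the debugger's memory view and check where the SysTick entry points. The sketch below runs on a host and assumes main flash at 0x08000000 with 256 KB (check against your part); the function names are hypothetical. SysTick is exception number 15 on Cortex-M, so its handler address is word 15 of the vector table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_BASE_ADDR      0x08000000u  /* assumption: main flash base */
#define FLASH_SIZE_BYTES     0x00040000u  /* assumption: 256 KB part */
#define SYSTICK_VECTOR_INDEX 15u          /* SysTick is exception number 15 */

/* A valid handler entry has the Thumb bit set and points into main flash. */
static int handler_in_flash(uint32_t vector)
{
    uint32_t addr = vector & ~1u;  /* strip the Thumb bit to get the address */
    return (vector & 1u) != 0u &&
           addr >= FLASH_BASE_ADDR &&
           addr < FLASH_BASE_ADDR + FLASH_SIZE_BYTES;
}

/* Check the SysTick entry of a vector table snapshot of n_words words. */
static int systick_vector_ok(const uint32_t *table, size_t n_words)
{
    assert(table != NULL);
    if (n_words <= SYSTICK_VECTOR_INDEX) {
        return 0;  /* snapshot too short to contain the SysTick entry */
    }
    return handler_in_flash(table[SYSTICK_VECTOR_INDEX]);
}
```

If the entry read from address 0 is a 0x1FFFxxxx value instead of a 0x0800xxxx one, system memory is mapped at 0 and the first tick interrupt would vector into the bootloader, which would fit the 0x1fff2ea4 seen earlier.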

KiptonM
Lead

The latest: it is not about the Empty Check. I am able to program it and run it in the debugger, and I do have the connect with reset option set up in the IDE.

The program does start and I can single step through some of it. It now appears to quit, and is not recoverable in the debugger, after what my oscilloscope measures as 930 µs, which is probably 1 ms after it starts. That is when the first SysTick interrupt should fire, so it makes me think it has something to do with that.

I did post about this on another board and processor in July and I do not remember what I did to fix it.

Here was the recommendation then.

You say the debugger crashes. Do you get any useful message from it?

In the Debug Config Debugger tab you can set the ST-LINK freq. Try setting to a low value to compensate for possibly weak ST-LINK connection. There is also a Misc/Log to file checkbox which you might check.

I did change the speed from auto to 8000 kHz.

When the debugger crashes, it will not let me reset, step, or look at variables. I have to terminate the debugger, then start it again.

I am going to look for the Misc/Log to file, and see if I can find it.
