cancel
Showing results for 
Search instead for 
Did you mean: 

STM32H7 USB driver fails highest optimization (-O3)

People (and me) 'complain' about, compiling a project with USB HAL drivers with highest optimization (GCC with -O3) does not work. You had to fall back to -O2.

Sure, I found the reason and easy to fix:

The low-level driver, file 'stm32h7xx_ll_usb.c' has a lot of missing volatile (or __IO) definitions.

Example:

uint32_t USB_ReadInterrupts(USB_OTG_GlobalTypeDef *USBx)

{

 uint32_t tmpreg;

 tmpreg = USBx->GINTSTS;

 tmpreg &= USBx->GINTMSK;

 return tmpreg;

}

'tmpreg' is used on an USB peripheral register which are properly defined as __IO (volatile).

The variables used on such __IO registers should be also defined as __IO.

Or: make sure on loops where the 'count' is used, it is also a volatile (otherwise a loop could be optimized to nothing by compiler if such variable is not used, actually).

Example:

static HAL_StatusTypeDef USB_CoreReset(USB_OTG_GlobalTypeDef *USBx)

{

 uint32_t count = 0U;

 /* Wait for AHB master IDLE state. */

 do

 {

   if (++count > 200000U)

   {

     return HAL_TIMEOUT;

   }

 }

So, I have added on several places the volatile as additional declaration, e.g.:

uint32_t USB_ReadDevInEPInterrupt(USB_OTG_GlobalTypeDef *USBx, uint8_t epnum)

{

 uint32_t USBx_BASE = (uint32_t)USBx;

 volatile uint32_t tmpreg, msk, emp;

 msk = USBx_DEVICE->DIEPMSK;

 emp = USBx_DEVICE->DIEPEMPMSK;

 msk |= ((emp >> (epnum & EP_ADDR_MSK)) & 0x1U) << 7;

 tmpreg = USBx_INEP((uint32_t)epnum)->DIEPINT & msk;

 return tmpreg;

}

So, check all these 'tmpreg' variables and add volatile to it (where it is used on an __IO registers).

BTW:

There is also one tmpreg defined as static, w/o any need to do so:

HAL_StatusTypeDef USB_HC_StartXfer(USB_OTG_GlobalTypeDef *USBx, USB_OTG_HCTypeDef *hc, uint8_t dma)

{

 uint32_t USBx_BASE = (uint32_t)USBx;

 uint32_t ch_num = (uint32_t)hc->ch_num;

 /*static*/ __IO uint32_t tmpreg = 0U;

Also check if count variables used in 'empty' waiting loops should be augmented with volatile.

After these changes - I can compile and run with optimization level -O3.

I have attached here my modified USB LL drivers for STM32H7xx.

4 REPLIES 4
Pavel A.
Evangelist III

IMHO this should not be a problem...

Example:
 
uint32_t USB_ReadInterrupts(USB_OTG_GlobalTypeDef *USBx)
{
 uint32_t tmpreg;
 tmpreg = USBx->GINTSTS;
 tmpreg &= USBx->GINTMSK;
 return tmpreg;
}

The tmpreg variable should not be volatile here and in similar context.

static HAL_StatusTypeDef USB_CoreReset(USB_OTG_GlobalTypeDef *USBx)
{
 uint32_t count = 0U;
 /* Wait for AHB master IDLE state. */
 do
 {
   if (++count > 200000U)
   {
     return HAL_TIMEOUT;
   }
 }

In this case, yes. Even better, use a decent microsecond delay instead of this /*expletive*/ (a timer...)

-- pa

If all these volatile are really needed? Maybe not. But for sure there is an issue with this code:

Try to run USB (BTW: on any MCU) compiled with -O3 - it will fail.

After adding this to the code - it works

From the program correctness standpoint, the only accesses that have to be volatile are those which access the peripheral registers. Comparing the "corrected" functions' asm/disasm with with or without volatile (and in various -O settings) should reveal whether those accesses have been incorrectly optimized out or not. I doubt that they were, in any setting.

OTOH, unnecessary volatile accesses introduce delays (similarly as lower optimization settings), which may be necessary to cope with timing problems in the Synopsys IP.

Volatile also ensures the peripheral registers are accessed in the correct order, but again for that it's enough that the peripheral registers themselves are tagged so, not the temporary variables. Given we are talking here about CM7 which is superscalar, peripheral registers have to be also placed in Device region by MPU, so that's maybe one more thing to check.

Loopdelay is the school example for using volatile, although as school examples often do, for the wrong reason. But the real culprit here is the Synopsys IP not signaling correctly end of reset thus requiring any delay. That delay of course should be written properly.

JW

flyer31
Senior

If you use STM32H750, then to my experience better add 10msec delay loop after clock start, before you start the USB ... .

Though STM32H743 should NOT need this to my experience.

(But I anyway hate O3 optimization level, as it usually gives only very small gains, but the assembly gets that much optimized, that debugging gets difficult ... at least concerning Keil ARMCC to my experience).