32F417: How to block USB_FS ISR from a shared resource (serial FLASH, on SPI)

PHolt.1 · ‎2022-01-09

I have a serial FLASH, a part of which implements a Windows file system (removable block device). Another part of the FLASH is for data logging etc usage.

All this is working, but the ISR which implements the USB block device must be blocked out of the SPI function during data logging, obviously, otherwise it could jump in during one of the data logging accesses; the SPI and the FLASH are obviously not re-entrant!

I have tried disabling USB interrupts, via one of these methods

zeroing bit 0 of GAHBCFG
zeroing all of GINTMSK

and restoring these registers after SPI is finished. I am assuming they are fully readable...

Both methods work equally: 100% for protecting the data logging side but there is a tiny chance of USB accesses (file writes, 2MB files) getting corrupted; probability is around the PPM level but not zero, and only if there is heavy data logging FLASH usage. USB file reads are 100% reliable.

USB ints are potentially disabled for 1 FLASH page write which is 18ms. The Windows host is fine with that - so long as I delay about 10ms after each FLASH page write (during logging) to give USB a chance to get in. I also need a bit of a delay to avoid data corruption on the USB VCP which is used for debugs; not surprising.

Another method is for the data logging side to set a flag when using SPI, and when the USB ISR sees this, it returns USBD_BUSY (instead of USBD_OK) and stays out of the SPI. Same principle used in say a USB to serial converter, when the UART TX buffer is full.

This also works 100% for protecting the data logging side, but USB written files are heavily corrupted. I can't see why.

Can anyone see a reason for this? Are the two registers above not quite what they seem in the RM? The text says GAHBCFG should be set up just once and not touched, but the Cube IDE generated sample code does exactly the opposite!

Some more info posted here

https://www.eevblog.com/forum/microcontrollers/cleanest-way-to-block-usb-interrupts/msg3925778/#msg3925778

but suggestions are not useful or practical at this point in time, and nobody seems to know about the USB implementation on this chip.

Thank you in advance.

PHolt.1 · ‎2022-01-10

There is a bug in the ST code in the higher-up driver. It fails to support any return code other than USBD_OK.

usbd_msc_scsi.c

waclawek.jan · ‎2022-01-10

Why would that be a bug? Where are return codes of callbacks in USBD_StorageTypeDef documented, and why would they be USBD_***?

JW

waclawek.jan · ‎2022-01-10

In this post, I'm not going to discuss merits of disabling USB interrupts, as that has consequences on several layers - at the lowermost one it's OK, simply not handling endpoints leads to NAKs on the bus and unless the input FIFO is overflown there's no harm here; but how do various hosts cope with unresponsive endpoints beyond a couple of frames is up to the hosts' discretion.

A slightly more refined approach would disable individual OTG interrupts selectively, or handle things directly at the endpoint level. Consequences as per host behaviour are the same, though.

So, the following are only technical remarks:

> I have tried disabling USB interrupts, via one of these methods

> zeroing bit 0 of GAHBCFG

> zeroing all of GINTMSK

Both are equivalent, see the OTG_xx Interrupts subchapter of USB/OTG chapter in RM, and the Interrupt hierarchy figure therein.

> The text says GAHBCFG should be set up just once and not touched

The same text says also: This register mainly contains AHB system-related configuration parameters.. It might've been clearer, but obviously GAHBCFG.GINTMSK is *not* AHB-interface-related, so that bit can be modified freely.

Nonetheless, disabling interrupts at the source level involves a gotcha in that there may be already a pending interrupt in the pipeline leading to NVIC. In other words, there may be several instructions after the instruction which disables the interrupt, which can still be interrupted. This has the same root cause as the "late interrupt clear causes ISR to reenter" problem. It's better do disable interrupts at the NVIC level - NVIC is part of the processor, and writes to NVIC registers act immediately on NVIC and even present an implicit instruction barrier.

JW

PHolt.1 · ‎2022-01-10

Thanks for the tip about the NVIC, JW. However we are instead trying to do it "the proper way" which is returning a BUSY status.

To answer your question:

This is the usbd_storage_if.c generated by Cube for MSC that suggests the return code for the Write or Read functions should be USBD_OK or USBD_FAIL. The USBD error codes are positive values but the SCSI_ProcessWrite() in usbd_msc_scsi.c only tests for negative return values and consequently will ignore any error. This is surely a bug!

Even the very latest ST code has the same bug.

waclawek.jan · ‎2022-01-10

> This is the usbd_storage_if.c generated by Cube for MSC that suggests the return code for the Write

> or Read functions should be USBD_OK or USBD_FAIL.

So that's some code generated by CubeMX? How does that suggestion look like? I don't use CubeIDE/CubeMX.

As far as I can tell, the example projects in Cube clearly indicate that 0 is OK and -1 is fail, e.g. here https://github.com/STMicroelectronics/STM32CubeF4/blob/4aba24d78fef03d797a82b258f37dbc84728bbb5/Projects/STM324xG_EVAL/Applications/USB_Device/MSC_Standalone/Src/usbd_storage.c#L169

JW

@Amel NASRI , can somebody please look at this?

PHolt.1 · ‎2022-01-11

I could not get the USBD_BUSY method to work. It almost looks like the USB host (a win7-64 machine) doesn't like to see a BUSY for 20ms (the FLASH write time). USB is just very unreliable. Same method has worked on other systems e.g. Atmel.

So I went back to disabling interrupts and currently I am achieving 100% reliability using the bit 0 of GAHBCFG method.

I started looking at the NVIC and after an hour of reading the RM gave up. I would need to read the current value, disable the USB global interrupt, and then restore the current value. I found macros in the ST code for enabling or disabling the interrupt in the NVIC and they are obvious enough but could not be sure re the code for reading the current setting first. I am OK with a delay of a number of instructions so this was not worth pursuing.

The BUSY method should work because disabling interrupts is a right bodge : - ) I am sure many have been down this road before and just silently fixed it.

waclawek.jan · ‎2022-01-11

> I started looking at the NVIC and after an hour of reading the RM gave up.

NVIC is not described in RM, but in the Cortex-M4 UM (as NVIC is part of the core).

I did not mean global interrupt disable, but selectively disable interrupts from the OTG module in NVIC:

NVIC_DisbleIRQ(OTG_FS_IRQn);

NVIC_EnableIRQ(OTG_FS_IRQn);

> It almost looks like the USB host (a win7-64 machine) doesn't like to see a BUSY for 20ms (the FLASH write time).

There is no BUSY in basic USB, nor in MSC/BOT. There is BUSY in SCSI, but that's not propagated directly through MSC/BOT status back to host, for whatever reason MSC/BOT authors deemed to be reasonable. The MSC/BOT status passed from device to host in CSW can be Command Passed ("good status"), Command Failed or Phase Error. Phase Error leads to Reset Recovery which you don't want, and Command Failed does not have a specified timing/retry mechanism in MSC/BOT, so it's entirely up to the host, how will it cope.

> Same method has worked on other systems e.g. Atmel.

As you don't know what exactly that method meant at Atmel (whatever Atmel means in this context) nor what exactly that means in STM32/Cube, you can't say it's "same method".

As I've said, I would handle things at the endpoint level. While host may still decide to be impatient and time out, the general wait mechanism in USB at data packet level is the NAK mechanism (see USB 2.0 5.3.2, An endpoint can inform the host that it is busy by responding with NAK. NAKs are not used as a retire condition for returning an IRP to a software client. Any number of NAKs can be encountered during the processing of a given IRP. A NAK response to a transaction does not constitute an error and is not counted as one of the three errors described above.) This appears to be widely accepted by hosts and drivers therein. It means, that you don't re-enable an Out endpoint for Rx (or don't write data for Tx to In endpoint and enable it) until whatever transaction with the slow memory is finished, using whatever mechanism (e.g. buffering and leaving the USB interrupt, the transaction to be finished by "main" or RTOS later); attempts of host to send more data to Out endpoint or to poll In endpoint are NAKed by the USB hardware of device.

Yes, this is not something you can click in Cube or fix by adding a few selected lines. Cube, as any library, inevitably caters only for a miniscule fraction of all possible usage cases, arguably the "most common" ones. OTOH, Cube is not the only option. There are open-source and third-party solutions out there, and there are consultants available.

My 2 eurocents.

JW

PHolt.1 · ‎2022-01-11

Thank you.

I found those two macros and can see how they use the value OTG_FS_IRQn to index to the right register and then go to a field within that register, but got a bit stuck on how to read the current status and be sure I am actually doing that, given the various gotchas with reading 32F4 registers which may contain write-only bits.

FWIW, there appears to be no mileage in trying to disable some of the USB interrupts, because there is no evident separation between USB CDC and USB MSC in what interrupts are used. But maybe you have ideas...

Regarding the USB return stuff:

We may have "wires crossed". My approach was to return a SCSI sense code of with a key of UNIT_ATTENTION and an additional sense code of NOT_READY_TO_READY_CHANGE. This is what has worked with another composite USB mass storage device implemented with an Atmel 8 bit AVR (XMEGA) using Atmel's ASF libraries.

waclawek.jan · ‎2022-01-11

>> NVIC_DisbleIRQ(OTG_FS_IRQn);

>> NVIC_EnableIRQ(OTG_FS_IRQn);

> I found those two macros

Technically, they are not macros but static inline functions.

> got a bit stuck on how to read the current status

Both NVIC_ISER and NVIC_ICER read returns the enable status of given interrupt, see description of these registers in PM0214.

> FWIW, there appears to be no mileage in trying to disable some of the USB interrupts, because

> there is no evident separation between USB CDC and USB MSC in what interrupts are used. But

> maybe you have ideas...

They use different endpoints and you can enable/disable interrupts per endpoint. I would strongly discourage this, though, due to the Synopsys OTG module's rather peculiar common Rx FIFO.

> My approach was to return a SCSI sense code of with a key of UNIT_ATTENTION and an additional sense code of NOT_READY_TO_READY_CHANGE.

I don't grok SCSI. But the main caveat here is, that SCSI is used under MSC/BOT, and there's no inherent method in MSC/BOT to return SCSI status, nor is there a normative prescription in what handling MSC/BOT Command Failed status should result. In other words, hosts may choose to remain ignorant and not to investigate what was the reason for Command Failed and then the device has no way to convey SCSI status to the host. In yet another words, this may work with windows and may not work with some other host. That may be good enough for you, though. Feel free to port that method under Cube.

> Atmel's ASF libraries

Oh those. Those are cuboid enough to make me ignore them entirely.

JW