
USB FS TxFIFO problem

kinovicf
Associate II
Posted on August 28, 2014 at 14:22

Hello,

I tested the ST VCP example on the STM32F4-DISCOVERY kit. After fixing some problems in the code (a permanent empty-FIFO interrupt, the size of USB_MAX_STR_DESC_SIZ, ZLP handling), I got a transmission speed of about 500 kB/s from the USB device to the PC. However, I would like to use plain USB bulk communication without VCP, and I would like to reach the maximum FS speed (reaching the maximum speed would show that the USB interface is well written).

I would like to reach 1024 kB/s, so the USB device should transmit 1024 bytes (16 packets of 64 B) through the TxFIFO during each 1 ms frame. The reference manual suggests this is possible on USB FS (... OTG_FS core is able to fill in the 1.25 Kbyte RAM buffer very efficiently ...). I created code derived from the VCP and HID demos. The code sends a data block every second through IN endpoint EP1 and monitors the USB interrupts and routines. The data is read by an application using libusb. Communication works, but transmission is not fully efficient. I get the following behavior for the settings below:

Here is the legend for the traces:

f = SOF interrupt

[x,y] = x-length, y-packets by USB_OTG_EPStartXfer()

[0,1] = null packet by USB_OTG_EPStartXfer()

i6 = In TxFIFOEmpty interrupt

i1 = In TX complete interrupt

(x...y) = monitoring of DCD_WriteEmptyTxFifo(), x=begin fifo available space, y=final fifo available space, .=calling of USB_OTG_WritePacket()

A) sending 512B, TX1_FIFO_FS_SIZE=128 -> 128*4=512B=8packets*64

fff[512:8]i6(128.......0)i6(64.48)fi1[0:1]i1fff

B) sending 512B, TX1_FIFO_FS_SIZE=160 -> 160*4=640B=10packets*64

fff[512:8]i6(160.......0)i6(80.64)_i1[0:1]fi1ff

C) sending 512B, TX1_FIFO_FS_SIZE=256 -> 256*4=1024B=16packets*64

fff[512:8]i6(256.......0)i6(160.0)i1[0:1]i1fff

D) sending 1024B, TX1_FIFO_FS_SIZE=128 -> 128*4=512B=8packets*64

fff[1024:16]i6(128.......0)i6(64...0)i6(64...0)fi6(64...0)i1[0:1]i1fff

E) sending 1024B, TX1_FIFO_FS_SIZE=256 -> 256*4=1024B=16packets*64

fff[1024:16]i6(256.......0)i6(160.0)i6(160.0)i6(160.0)i6(160.0)i6(160.0)i6(160.0)i6(160.0)i6(160.0)fff

(USB stops working)

It is noticeable that when writing 512 bytes into a TxFIFO that has space for 512 bytes or more, only 7 packets * 64 = 448 bytes are written (after the 7th packet, zero space is reported as available). I didn't find a limit for the EP1 TxFIFO size in the manual. There is only a note in the Device programming model chapter about a 3-bit internal core packet counter, which limits each FIFO to max. 8 packets. So I am not able to write 8 packets, let alone 16 packets, into the FIFO. Because of this limitation I can't send 1024 B within one 1 ms frame.

When I try to send more than 512 bytes through a TxFIFO with more than 512 bytes of space, USB FS communication stops working.
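One thing that may be worth checking here is the total FIFO budget. The 1.25 Kbyte RAM quoted from the reference manual is 320 32-bit words, and the RX FIFO and all TX FIFOs are carved out of that same RAM. The sketch below only does this arithmetic. The function names are made up for illustration, and the RX = 128 and TX0 = 32 word sizes in the usage note are placeholder values, not taken from this thread, so check your own usb_conf.h. Whether an over-budget allocation is really what stops USB in case E) is an assumption, not something confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1.25 Kbyte of FIFO RAM / 4 bytes per word */
#define OTG_FS_FIFO_RAM_WORDS 320

/* Number of max-size packets that fit into a FIFO of fifo_words words. */
uint32_t fifo_packets(uint32_t fifo_words, uint32_t max_packet_bytes)
{
    assert(max_packet_bytes > 0);
    return (fifo_words * 4u) / max_packet_bytes;
}

/* Words left in the shared FIFO RAM after allocating the RX FIFO and
 * n_tx TX FIFOs. A negative result means the allocation is over budget. */
int32_t fifo_words_left(uint32_t rx_words, const uint32_t *tx_words, size_t n_tx)
{
    int32_t left = OTG_FS_FIFO_RAM_WORDS - (int32_t)rx_words;
    for (size_t i = 0; i < n_tx; i++) {
        left -= (int32_t)tx_words[i];
    }
    return left;
}
```

With the placeholder values RX = 128 and TX0 = 32, a TX1 FIFO of 128 words leaves 32 words free, while a TX1 FIFO of 256 words is 96 words over budget.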

I would like to ask whether it is possible to transmit 1.25 kB during 1 ms, as written for USB FS, and how. Maybe I am missing some settings. I would be grateful for any comment.

Nelup

#usb-fs-txfifo-problem-max_speed #usb-fs-hs-txfifo-speed #usb-fs-txfifo
7 REPLIES
tsuneo
Senior
Posted on August 28, 2014 at 19:58

> fff[512:8]  

It looks like you are assuming that the bulk IN transfer would always start at the beginning of a frame. But that isn’t true.

While the bulk IN endpoint is empty, the SIE (USB engine) responds to IN transactions from the host with NAK. The host repeats 40 - 100 IN transactions / frame while the endpoint is NAKing. The bulk transfer may then start at any phase of a frame, immediately after the firmware writes a transfer to the IN endpoint, asynchronously to SOF timing.

Actually, fff[512:8] should be represented as fff - (variable gap) - [512:8]

As the start timing is not fixed within a frame, you are misled to the wrong conclusion.

a) call USB_OTG_EPStartXfer() in the SOF ISR, to fix the start timing

OR

b) Capture a timer to measure the interval from "[512:8]" to "i1"
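Option b) could be sketched as follows: capture a free-running 32-bit timer when the transfer is started and again in the TX complete interrupt, then convert the tick difference into a throughput figure. This is a minimal sketch with hypothetical helper names. How the timer is read, and its tick rate, depend on your setup and are assumptions here.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two captures of a free-running 32-bit counter.
 * Unsigned subtraction handles a single counter wrap-around. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Throughput in kB/s (1 kB = 1000 bytes) for `bytes` moved in `ticks`
 * timer ticks, with the timer running at tick_hz. */
uint32_t throughput_kBps(uint32_t bytes, uint32_t ticks, uint32_t tick_hz)
{
    assert(tick_hz > 0);
    if (ticks == 0) {
        return 0; /* interval too short to measure */
    }
    return (uint32_t)(((uint64_t)bytes * tick_hz) / ticks / 1000u);
}
```

For example, 1024 bytes measured over 1000 ticks of a 1 MHz timer (1 ms) gives 1024 kB/s.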

I believe you’ve already gotten "1024 bytes (16 packets of 64B) during 1 ms frame" in your C) setup ;)

Tsuneo

kinovicf
Associate II
Posted on August 29, 2014 at 15:15

Hello Tsuneo,

Thank you for your comment. I am not saying that SOF is needed to start a transfer. The letter 'f' is only console debug output telling me that a SOF interrupt occurred, and I know from the logic analyzer that it occurs every 1 ms (every USB interrupt pulses an LED pin). There are 1000 'f' letters between transactions because of the 1 s transmit period. You can see that 'f' sometimes occurs during a transmission. I assigned a letter to every interrupt flag to see what is happening.

I am able to reach almost 1 MB/s (without console debugging, of course), but it is not efficient: on the analyzer I see many "fifo-empty" interrupts during 1 ms, because the TxFIFO is not used effectively.

Here you can see a transmission of 2048 bytes (TX1_FIFO_FS_SIZE=128 -> 128*4=512B=8packets*64):

fff[2048:32]i6(128.......0)i6(64...0)i6(64...0)i6(64...0)i6(64...0)fi6(64...0)i6(64...0)i6(64...0)i6(64...0)i6(64.48)_fi1[0:1]i1fff

The 'f' letters can be ignored; they only represent SOF interrupts. [2048:32] means that the application called DCD_EP_Tx() -> USB_OTG_EPStartXfer() and that the IN endpoint transmission was enabled (the empty-FIFO interrupt was enabled too). "i6" then marks each occurrence of the empty-FIFO interrupt, in which data is placed into the FIFO (each dot '.' means one 64 B packet). You can see that 128 words (512 B = 8 packets) of space are available at the beginning, but only 7 packets (448 B) are written (0 space at the end). Then the empty-FIFO interrupt occurs again because the FIFO is half empty: 64 words (256 B = 4 packets) are available, but only 3 packets (3 dots = 192 B) are written. And so on. Because the empty-FIFO interrupt occurs when half of the FIFO is empty, only 3 packets are refilled each time (10 i6 interrupts during the transmission of 2048 bytes). I am not able to increase the FIFO size, or rather, I can't write more packets into it. There is 4 kB of RAM available for each EP, but it can't be used. I am a little disappointed by this behavior.
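As a rough cross-check of the trace, the number of empty-FIFO interrupts can be estimated from the burst sizes seen in the log. This is a minimal sketch, assuming the first interrupt writes up to 7 packets and every later one up to 3, as observed above. The function name and parameters are made up for illustration and do not come from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many empty-FIFO interrupts are needed to push total_packets
 * into the FIFO when the first interrupt can write first_burst packets
 * and each following interrupt can write refill_burst packets. */
uint32_t count_fifo_empty_irqs(uint32_t total_packets,
                               uint32_t first_burst,
                               uint32_t refill_burst)
{
    assert(first_burst > 0 && refill_burst > 0);

    uint32_t irqs = 0;
    uint32_t remaining = total_packets;
    uint32_t burst = first_burst;

    while (remaining > 0) {
        uint32_t n = (remaining < burst) ? remaining : burst;
        remaining -= n;
        irqs++;
        burst = refill_burst; /* later interrupts only refill a half-empty FIFO */
    }
    return irqs;
}
```

For 32 packets (2048 bytes) this gives 10 interrupts, for 16 packets (1024 bytes) it gives 4, and for 8 packets (512 bytes) it gives 2, which matches the number of i6 entries in the traces with TX1_FIFO_FS_SIZE=128.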

Maybe I am doing something wrong. If there is a more effective way, I would like to see an example. The VCP demo example also generates many interrupts per ms, and resizing its TX1_FIFO_FS_SIZE stops USB from working.

Nelup

tsuneo
Senior
Posted on August 31, 2014 at 10:49

> 64 words (256B = 4packets) is available, but only 3 packets (3xdot=192B) are written.

Ah, it’s a coding mistake in the library. Fix this part of usb_dcd_int.c and you’ll see the behavior you expect, at least on this issue ;)

\STM32_USB-Host-Device_Lib_V2.1.0\Libraries\STM32_USB_OTG_Driver\src\usb_dcd_int.c
static uint32_t DCD_WriteEmptyTxFifo(USB_OTG_CORE_HANDLE *pdev, uint32_t epnum)
..
//while (txstatus.b.txfspcavail > len32b && // replace this line with next line
while (txstatus.b.txfspcavail >= len32b &&
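The effect of this one-character change can be illustrated with a small model of the write loop. This is only a sketch of an idealized FIFO in which every packet written reduces the free space by exactly its size in words. The function is hypothetical and is not part of the ST library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the FIFO fill loop: returns how many packets get written
 * when space_words are free, each packet needs packet_words, and
 * packets_pending packets are waiting. `strict` selects the original
 * '>' comparison, otherwise the corrected '>=' is used. */
uint32_t packets_written(uint32_t space_words, uint32_t packet_words,
                         uint32_t packets_pending, bool strict)
{
    assert(packet_words > 0);

    uint32_t written = 0;
    while (written < packets_pending &&
           (strict ? (space_words > packet_words)
                   : (space_words >= packet_words))) {
        space_words -= packet_words; /* ideal FIFO: space shrinks by one packet */
        written++;
    }
    return written;
}
```

In this model, 64 free words and 16-word (64 B) packets yield 3 packets with '>' and 4 packets with '>=', because the strict comparison refuses to write the packet that would fill the FIFO exactly.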

Tsuneo
kinovicf
Associate II
Posted on September 01, 2014 at 14:03

Hello Tsuneo,

I had already found this bug, and the behaviour above is with the fix applied. Here is my part of the DCD_WriteEmptyTxFifo() code with debug printf:

len32b = (len + 3) / 4;
txstatus.d32 = USB_OTG_READ_REG32(&pdev->regs.INEP_REGS[epnum]->DTXFSTS);
debug_usb_dcd("(%d", txstatus.b.txfspcavail);
while (txstatus.b.txfspcavail >= len32b &&
       ep->xfer_count < ep->xfer_len &&
       ep->xfer_len != 0) {
    /* Write the FIFO */
    len = ep->xfer_len - ep->xfer_count;
    if (len > ep->maxpacket) {
        len = ep->maxpacket;
    }
    len32b = (len + 3) / 4;
    debug_usb_dcd(".");
    USB_OTG_WritePacket(pdev, ep->xfer_buff, epnum, len);
    ep->xfer_buff += len;
    ep->xfer_count += len;
    txstatus.d32 = USB_OTG_READ_REG32(&pdev->regs.INEP_REGS[epnum]->DTXFSTS);
}
if (epnum > 0) {
    debug_usb_dcd("%d)", txstatus.b.txfspcavail);
}
if (ep->xfer_count >= ep->xfer_len) {
    debug_usb_dcd("_");
    fifoemptymsk = 1 << epnum;
    USB_OTG_MODIFY_REG32(&pdev->regs.DREGS->DIEPEMPMSK, fifoemptymsk, 0);
}

You can see that the debug value "%d)" is the available space at the end, and it is zero! The TxFIFO behavior is strange.

Nelup
tsuneo
Senior
Posted on September 01, 2014 at 16:54

Though the focus of the discussion has moved from the bus side to the performance of the TX FIFO, I measured the bulk IN transfer speed on a full-speed bus

- DCD_EP_Tx() on STM32_USB-Host-Device_Lib_V2.1.0 (**1) and F4 discovery board.

- monitored on a hardware bus analyzer

FS Bulk IN transfer speed, depending on the type of host controller

- OHCI:           15 full-size packets / frame

- UHCI:           17 packets / frame

- EHCI over hub : 17 packets / frame

No NAK was found between the bulk transactions in any of the above cases.

That is, on the bus side, the STM32F4 USB engine is quick enough to satisfy host IN requests.
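For reference, the packets-per-frame figures above translate into byte rates as follows. This is a minimal sketch with a hypothetical function name, assuming 64-byte full-size packets and 1000 frames per second on a full-speed bus.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second on a full-speed bus (1 ms frames) when the host
 * completes packets_per_frame full-size bulk packets in every frame. */
uint32_t fs_bulk_bytes_per_sec(uint32_t packets_per_frame,
                               uint32_t max_packet_bytes)
{
    assert(max_packet_bytes > 0 && max_packet_bytes <= 64); /* FS bulk limit */
    return packets_per_frame * max_packet_bytes * 1000u;
}
```

With 64-byte packets, 15 packets / frame is 960,000 bytes/s, 16 packets / frame is 1,024,000 bytes/s, and 17 packets / frame is 1,088,000 bytes/s.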

The additional fix above doesn’t affect the bus transfer speed at all.

(**1) applied this fix, as you did ;)

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flooded%20empty%20FIFO%20interrupts%20still%20make%20USB%20devices%20crawl

Also,

usb_conf.h

#define TX1_FIFO_FS_SIZE   128

Tsuneo

kinovicf
Associate II
Posted on September 03, 2014 at 09:37

Hello Tsuneo,

Thanks for the response. Yes, I agree that USB FS is able to transmit at full speed, but there are many USB interrupts during the transmission. Let me try to formulate my questions better:

1) Is it possible to use more than 128 words for a TxFIFO of the USB FS core?

2) Is the TxFIFO behavior above (available space - 1) normal?

3) Could the USB HS core with the internal FS PHY and DMA be a better-performing solution (more CPU time for other tasks)?

Thanks.

Nelup

kinovicf
Associate II
Posted on September 29, 2014 at 15:04

Hello.

I did some more testing of USB bulk transmit speed on an IN endpoint. I slightly improved the reading application (libusb, async) and got the following maximum speeds, together with the USB interrupt frequency (the CPU used differs according to the hardware we have available):

a) Using USB-FS core, internal FS PHY on STM32F407 (168MHz)

1023 kB/s by 4095B ... usb interrupt 6.2 kHz (2.61% cpu time)

b) Using USB-HS core, external HS PHY ISP1705AET on STM32F207 (120MHz)

10513 kB/s by 4095B ... usb interrupt 81.6 kHz (26.85% cpu time)

10580 kB/s by 4095B + DMA ... usb interrupt 9.29 kHz (1.74% cpu time)

c) Using USB-HS core, internal FS PHY on STM32F429I (168MHz!)

1023 kB/s by 4095B ... usb interrupt 15.7 kHz (3.76% cpu time)

1023 kB/s by 4095B + DMA ... usb interrupt 1.25 kHz (0.23% cpu time)

I reached 10 MB/s using USB-HS with an external PHY. I think DMA is needed, otherwise the USB interrupt load is too heavy.

Using USB-HS with the internal FS PHY together with DMA is very interesting (maximum FS speed of 1 MB/s with minimal USB interrupt load).

But using USB-HS with the internal FS PHY without DMA generates more interrupts (15.7 kHz, 3.76% cpu) than using USB-FS (6.2 kHz, 2.61% cpu). This is a little strange. The interrupts are shorter, since less data is moved during each one, but it is the opposite of what I would have expected from the HS TxFIFO.
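The "shorter interrupts" observation can be put into numbers by dividing the CPU share and the data rate by the interrupt frequency. This is a minimal sketch with hypothetical helper names. It assumes the quoted CPU percentage is spent entirely inside the USB interrupt handler and that 1 kB = 1000 bytes.

```c
#include <assert.h>

/* Average time spent per USB interrupt in microseconds, given the CPU
 * share in percent and the interrupt frequency in Hz. */
double irq_cost_us(double cpu_percent, double irq_hz)
{
    assert(irq_hz > 0.0);
    return (cpu_percent / 100.0) / irq_hz * 1e6;
}

/* Average number of payload bytes moved per USB interrupt. */
double bytes_per_irq(double rate_kBps, double irq_hz)
{
    assert(irq_hz > 0.0);
    return rate_kBps * 1000.0 / irq_hz;
}
```

With the figures above, case a) works out to about 4.2 us and about 165 bytes per interrupt, while case c) without DMA works out to about 2.4 us and about 65 bytes per interrupt.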

On the STM32F429I, 180 MHz can't be used for the system clock, because USB would not get a valid 48 MHz clock.
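The clock constraint can be checked with a little arithmetic. The sketch below assumes the usual STM32F4 main PLL structure as I understand it: SYSCLK = VCO / P with P in {2, 4, 6, 8}, the 48 MHz USB clock = VCO / Q with Q in 2..15, and a VCO output of at most 432 MHz. Treat these limits as assumptions and verify them against the reference manual. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PLL_VCO_MAX_MHZ 432u /* assumed upper VCO limit */
#define USB_CLK_MHZ      48u

/* Returns a PLL Q divider that yields exactly 48 MHz for the given
 * system clock, or 0 if no P/Q combination works. */
uint32_t usb_q_for_sysclk(uint32_t sysclk_mhz)
{
    static const uint32_t p_values[] = {2u, 4u, 6u, 8u};

    assert(sysclk_mhz > 0);
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t vco = sysclk_mhz * p_values[i];
        if (vco > PLL_VCO_MAX_MHZ) {
            continue; /* VCO would be out of range */
        }
        if (vco % USB_CLK_MHZ == 0u) {
            uint32_t q = vco / USB_CLK_MHZ;
            if (q >= 2u && q <= 15u) {
                return q;
            }
        }
    }
    return 0u;
}
```

Under these assumptions, 168 MHz works with VCO = 336 MHz and Q = 7, while 180 MHz would need VCO = 360 MHz and Q = 7.5, which is not an integer divider.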

Nelup