
USB CDC how to increase speed (baud rate) ?

Associate III

Hello everyone,

I am working on a signal analyser device, which needs to transmit data serially as fast as possible (ideally 1 Mbaud).

I am doing the following loop to transmit:



uint8_t buf_high[1] = {124}; // '|'
uint8_t buf_low[1] = {48}; // '0'
while(buffer[0] == 'c'){
    if(HAL_GPIO_ReadPin(GPIOA, GDO0_Pin) == 1){
        // Retry until the USB stack accepts the byte (non-zero return means busy)
        while(CDC_Transmit_FS(buf_high, 1));
        while(CDC_Transmit_FS(buf_low, 1));
    }
}



But I only see around 56000 bytes in 10 seconds, so 5600 bytes/s = 44800 bit/s.

Is there a way to set the baud rate in USB CDC? I don't see any configuration for that in CubeMX. My MCU is an STM32F042F6P6.

I would really appreciate some help, since I cannot believe that a simple Arduino UNO reaches higher baud rates (1 Mbaud) than my MCU, which has a 48 MHz clock compared to 16 MHz. There has to be a way of transmitting serial data faster.

Pavel A.
Evangelist III

Sending one byte packets over USB is not optimal. Please see here for example how this can be done. Maybe, get this software as a prototype.


Thanks for the reply.

So I reduced the Tx buffer size from 512 bytes to 64 bytes. This sped up the transmission significantly. I want to understand why this is.

I understand that USB full speed transmits a frame of max 1023 bytes (with overhead) every 1 ms. So it seems that what is happening in my code is that I am just placing 1 byte in each frame, so the transmission is slower. However, changing the Tx buffer size from 512 to 64 bytes speeds up the transmission. I am left to conclude, now, that what is actually happening is the following:

When CDC_Transmit(buf, 1) is called, the Tx buffer size (64 bytes) is allocated somewhere in the next frame. So out of the 1023 bytes, blocks of 64 bytes are reserved for the transmission. It's these blocks that are the reason why increasing the Tx buffer size slows down the transmission (fewer blocks per frame).

Basically, I would like to understand what the Tx buffer of the CDC actually does in the context of the USB frame, so I can understand the speed limits.

This makes sense to me, but if you, Mr Pavel, could confirm that my thinking is on the right track, it would be great, thanks.

Also I appreciate the link to complex projects for signal analysers, but I must have my communication be USB-Serial communication, so those projects are different from what I see.

USB full speed has a max packet size of 64 bytes.

Changing packet size from 512 to 64 should slow down the transmission, so something else may be going on.

Note that CDC_Transmit_FS happens in the background and you must wait for it to complete before sending the next buffer if you care about packets being dropped.

A good page for learning about USB internals is here:



Yes I was using the mechanism while(CDC_Transmit_FS) to avoid packets being dropped.

But changing the packet size from 512 to 64 means I changed the USB CDC Tx Buffer Size in CubeMX from 512 to 64. The actual instruction, CDC_Transmit_FS(buf, 1), still sends just a single byte. This is what is going on, but then why would it speed up? I assume it's because this buffer gets transmitted when it's full, therefore 64 bytes would fill up quicker. Is this about correct?

Pavel A.
Evangelist III

I don't understand how a size of 512 could work at all with a FS device - as TDK noted, 64 bytes is the max. size. When you call CDC_Transmit with one byte size, this means sending a packet of 1 byte payload of 64 possible. This is perfectly valid (for example for a terminal app where a human types using one finger) but not optimal. If you have Teraterm on Windows, compare what it sends over USB when one types in the terminal window vs. sending a binary file. Try to buffer and send at once as much data as possible, to fill more payload in the data packets.


Lead II

When you call Transmit, a packet is sent. Your packets are 1 byte long, which limits the transfer speed. The Tx buffer defined in the usbd_cdc_if.c file is not used at all, so its size doesn't matter. Also, the maximum usable size of the Rx buffer with full-speed USB is 64 bytes. To speed up your transfers, send the data in 64-byte packets (or 63-byte, to avoid the problem that occurs if the USB stack does not properly handle ZLP). You may also call Transmit to send a bigger amount of data - it will be split into 64-byte packets anyway (internally by the USB stack). The maximum you may easily get should be close to 64 kB/s.
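The advice above (collect bytes and hand them over in chunks of up to 63 bytes instead of one at a time) could look roughly like the following sketch. It is not taken from the thread: batcher_t, batcher_put, batcher_flush and the mock_send stand-in are hypothetical names, and the transmit hook is assumed to follow the CDC_Transmit_FS convention of returning 0 when the data was accepted and non-zero when busy. The sketch also assumes the hook has finished with the data when it returns; with an asynchronous driver, two alternating chunk buffers would be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte under the 64-byte full-speed bulk packet, so no ZLP is needed. */
#define CHUNK_SIZE 63u

/* Hypothetical transmit hook: 0 = accepted, non-zero = busy
   (same return convention as CDC_Transmit_FS). */
typedef uint8_t (*tx_fn)(uint8_t *data, uint16_t len);

typedef struct {
    uint8_t  data[CHUNK_SIZE];
    uint16_t used;   /* number of bytes waiting in data[] */
    tx_fn    send;
} batcher_t;

static void batcher_init(batcher_t *b, tx_fn send)
{
    b->used = 0;
    b->send = send;
}

/* Hand the pending bytes to the transmit hook, retrying while it is busy. */
static void batcher_flush(batcher_t *b)
{
    if (b->used == 0) {
        return;
    }
    while (b->send(b->data, b->used) != 0) {
        /* busy: try again */
    }
    b->used = 0;
}

/* Queue one byte; a transfer is started only when a full chunk is collected. */
static void batcher_put(batcher_t *b, uint8_t byte)
{
    assert(b->used < CHUNK_SIZE);
    b->data[b->used++] = byte;
    if (b->used == CHUNK_SIZE) {
        batcher_flush(b);
    }
}

/* Host-side stand-in for the USB driver (assumption, for running on a PC):
   it only counts calls and bytes. */
static unsigned mock_calls, mock_bytes;

static uint8_t mock_send(uint8_t *data, uint16_t len)
{
    (void)data;
    mock_calls++;
    mock_bytes += len;
    return 0;
}
```

On the target, batcher_init(&b, CDC_Transmit_FS) would replace the mock, and batcher_flush would be called when the stream stops so a partial chunk is not left behind.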

> The maximum you may easily get should be close to 64 kB/s.

There's no limit on the number of Bulk transfers per frame, so the maximum (if both Host and Device do their best, there are no other devices transferring on the same tree etc.) is given just by the raw bit rate and protocol overhead, i.e. around 1 MB/s.

There's no guaranteed bandwidth for Bulk transfers either.
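To put rough numbers on "raw bit rate and protocol overhead", here is a small back-of-the-envelope sketch. The figures are assumptions on my side, not measurements from this thread: 1500 bytes of bus time per 1 ms frame (12 Mbit/s) and 13 bytes of overhead per bulk transaction, as used in the full-speed bulk table of the USB 2.0 specification. It ignores SOF, bit stuffing and host scheduling, so real throughput will be lower.

```c
#include <assert.h>
#include <stdint.h>

#define FS_BYTES_PER_FRAME 1500u /* 12 Mbit/s * 1 ms / 8 bits */
#define BULK_OVERHEAD      13u   /* assumed per-transaction overhead in bytes */

/* Upper bound on bulk transactions that fit in one 1 ms frame. */
static uint32_t bulk_transfers_per_frame(uint32_t payload)
{
    assert(payload >= 1u && payload <= 64u); /* full-speed bulk packet limit */
    return FS_BYTES_PER_FRAME / (payload + BULK_OVERHEAD);
}

/* Matching upper bound on payload throughput in bytes/s (1000 frames/s). */
static uint32_t bulk_bytes_per_second(uint32_t payload)
{
    return bulk_transfers_per_frame(payload) * payload * 1000u;
}
```

With these assumptions, 64-byte packets give at most 19 transactions per frame (about 1.2 MB/s), while 1-byte packets give at most 107 per frame (about 107 kB/s), which is why packet size matters far more than any baud rate setting.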


Associate III

Hello again everyone,
I managed to achieve my goals regarding speed. I believe I understand what is happening better now.

> The maximum you may easily get should be close to 64 kB/s.

This is not true. The way you get more than that is easy: just call CDC_Transmit_FS(buf, 512) for example. This way, the STM32 puts as many 64 byte "bulk transfer packets" (out of the 512) as it can in the next available frame. The limit you were describing, of 64kB/s, is if only a single 64byte bulk transfer happens per frame. As waclawek.jan said, this is not the limit, and you can keep putting the 64 byte bulk transfer packets on the frame (up until around its 1024 byte max).

Now, the key for the CDC_Transmit_FS(buf, 512) call to put as many packets as possible in the next frame is simple: the USB CDC Tx Buffer Size parameter matters. It actually (from what my results show) tells the function when to start pushing packets to the queue. Therefore, to achieve MAXIMUM CDC SPEED, this value should be the minimum, 64 bytes. Otherwise, calling CDC_Transmit_FS(buf, 512) with USB CDC Tx Buffer Size=1024 doesn't transmit anything; only when called for the second time would it transmit.


I used a double-buffering technique with an ISR to transmit and sample concurrently, at a 12.5 us sampling period. Here is the code:

#define USB_BUFFER_SIZE  256

volatile uint8_t bufferA[USB_BUFFER_SIZE];
volatile uint8_t bufferB[USB_BUFFER_SIZE];
volatile uint8_t* currentBuffer = bufferA;
volatile uint8_t* transmitBuffer = NULL;
volatile int bufferIndex = 0;
volatile uint8_t bufferReady = 0;

void ISR_Sampler(void) {
    // Sample the pin
    uint8_t pin_state = HAL_GPIO_ReadPin(GPIOA, GDO0_Pin);

    // Store in the current buffer: '|' (124) for high, '0' (48) for low
    currentBuffer[bufferIndex] = pin_state ? 124 : 48;

    // Increment buffer index and check if full
    bufferIndex++;
    if (bufferIndex >= USB_BUFFER_SIZE) {
        // Swap buffers
        transmitBuffer = currentBuffer;
        currentBuffer = (currentBuffer == bufferA) ? bufferB : bufferA;
        bufferIndex = 0;
        bufferReady = 1;
    }
}

// TIM2 update interrupt: take one sample per timer period
void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim){
    if(htim == &htim2){
        ISR_Sampler(); // assumed: the callback body is missing from the post
    }
}

// Excerpt from int main(void): stream full buffers until an 's' command arrives
else if(buffer[0] == 'c' && buffer[1] == 'o' && buffer[2] == 'n' && buffer[3] == 't'){ //init Tx values, except modulation and bitrate
    while(!(buffer[0] == 's')){
        if (bufferReady == 1) {
            CDC_Transmit_FS((uint8_t*)transmitBuffer, USB_BUFFER_SIZE);
            bufferReady = 0;
        }
    }
    HAL_GPIO_WritePin(GPIOA, GDO0_Pin, GPIO_PIN_RESET);
}

static void MX_TIM2_Init(void)
{

  /* USER CODE BEGIN TIM2_Init 0 */

  /* USER CODE END TIM2_Init 0 */

  TIM_ClockConfigTypeDef sClockSourceConfig = {0};
  TIM_MasterConfigTypeDef sMasterConfig = {0};

  /* USER CODE BEGIN TIM2_Init 1 */

  /* USER CODE END TIM2_Init 1 */
  htim2.Instance = TIM2;
  htim2.Init.Prescaler = 0;
  htim2.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim2.Init.Period = 600-1; /* 48 MHz / 600 = 80 kHz, i.e. 12.5 us */
  htim2.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  htim2.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;
  if (HAL_TIM_Base_Init(&htim2) != HAL_OK)
  {
    Error_Handler();
  }
  sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
  if (HAL_TIM_ConfigClockSource(&htim2, &sClockSourceConfig) != HAL_OK)
  {
    Error_Handler();
  }
  sMasterConfig.MasterOutputTrigger = TIM_TRGO_RESET;
  sMasterConfig.MasterSlaveMode = TIM_MASTERSLAVEMODE_DISABLE;
  if (HAL_TIMEx_MasterConfigSynchronization(&htim2, &sMasterConfig) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN TIM2_Init 2 */

  /* USER CODE END TIM2_Init 2 */

}
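
For anyone who wants to try the double-buffering idea from the code above on a PC before putting it in an interrupt, here is a minimal self-contained sketch. It is not my firmware: pingpong_t, pp_push and pp_take are hypothetical names, and the overrun counter is an addition that shows when the consumer (the USB side) is too slow. It does not protect against the producer interrupting pp_take halfway, so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_SIZE 256u

typedef struct {
    uint8_t           buf[2][PP_SIZE];
    volatile uint8_t  fill;     /* index of the buffer being written */
    volatile uint16_t index;    /* next free slot in that buffer */
    volatile uint8_t  ready;    /* 1 when the other buffer holds a full block */
    volatile uint32_t overruns; /* blocks lost because they were not taken in time */
} pingpong_t;

/* Producer side (would run in the timer interrupt): store one sample. */
static void pp_push(pingpong_t *p, uint8_t sample)
{
    assert(p != NULL);
    p->buf[p->fill][p->index++] = sample;
    if (p->index >= PP_SIZE) {
        if (p->ready) {
            p->overruns++; /* previous block was never taken */
        }
        p->fill ^= 1u;     /* swap buffers */
        p->index = 0;
        p->ready = 1;
    }
}

/* Consumer side (would run in the main loop):
   returns the full block, or NULL if none is ready. */
static const uint8_t *pp_take(pingpong_t *p)
{
    assert(p != NULL);
    if (!p->ready) {
        return NULL;
    }
    p->ready = 0;
    return p->buf[p->fill ^ 1u];
}
```

In the main loop, a non-NULL result from pp_take would be passed to CDC_Transmit_FS together with PP_SIZE; a growing overruns count means samples are being produced faster than they are sent.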


The device receiving the USB serial communication is an Android phone. Here is the graph it plots, sampling a signal (this part of the signal is the preamble) with a bit pulse width of 400 us:


WhatsApp Image 2023-12-18 at 14.59.11.jpeg

I'm very pleased with these results, as I was starting to think an Arduino Uno was faster at serial communication than an STM32F042F6P6.

The resulting bit rate is 8 bits * (1 / 12.5 us) = 640000 bit/s, i.e. the equivalent of 640000 baud.
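
The arithmetic behind that number can be checked with a few lines of C. This is just a sketch: it assumes TIM2 is clocked at 48 MHz with Prescaler = 0 and Period = 600-1 as in the timer setup above, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Timer update rate in Hz: clock / ((prescaler + 1) * (period + 1)). */
static uint32_t timer_update_hz(uint32_t clk_hz, uint32_t prescaler, uint32_t period)
{
    assert(clk_hz > 0u);
    return clk_hz / ((prescaler + 1u) * (period + 1u));
}

/* Serial-equivalent bit rate when every sample is sent as one 8-bit character. */
static uint32_t stream_bits_per_second(uint32_t samples_per_second)
{
    return samples_per_second * 8u;
}
```

timer_update_hz(48000000, 0, 599) gives 80000 samples per second (one every 12.5 us), and stream_bits_per_second(80000) gives 640000 bit/s.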

I would be capable of transmitting (sampling) even faster, though. I was sending 256 bytes per frame. That is a limit of ~2 Mbaud in theory, until CDC_Transmit_FS returns BUSY. And I could have pushed it further towards the 1 MB/s limit of USB FS, that is, until my 48 MHz clock gives out. But my requirements were met: I surpassed the 26 us resolution of the Arduino UNO.