cancel
Showing results for 
Search instead for 
Did you mean: 

Optimizing instructions for using a very frequent ISR along with USB CDC

LLope.31
Associate III

Hello everyone,

 

I have a complex problem which I would like help from the memory experts and instruction optimization, if possible.

 

I am trying to receive data using CDC_Receive_FS() call back, and, concurrently, process the data using an ISR which is running every 300 clock cycles (the chip has a 48MHz clock).

The issue arises when this ISR becomes too fast. Anything below ~350 clock cycles overwhelms the USB callback/USB in general, and the communication fails. It works well with 500 cycle ISR.

I have a few questions, firstly, is it that the processing in the ISR consumes too many of the 350 cycles, or is it that the FREQUENCY of the ISR is INTERRUPTING the USB too much?

Here is the full code, which has a circular buffer for the incoming data, and flow control. The flow control is a synchronous packet that gets sent every received packet (this might be what is causing the larger consumption of cycles in the USB side):

static int8_t CDC_Receive_FS(uint8_t* Buf, uint32_t *Len)
{
  /* USER CODE BEGIN 6 */
  USBD_CDC_SetRxBuffer(&hUsbDeviceFS, &Buf[0]);
  uint8_t len = (uint8_t) *Len; // Get length


  if(transmissionEnable == 1){
	  uint16_t tempHeadPos = rxBufferHeadPos; // Increment temp head pos while writing, then update main variable when complete
	  for (uint32_t i = 0; i < len; i++) {
		rxBuffer[tempHeadPos] = Buf[i];
		tempHeadPos = (uint16_t)((uint16_t)(tempHeadPos + 1) % HL_RX_BUFFER_SIZE);
		if (tempHeadPos == rxBufferTailPos) {
		  return USBD_FAIL;
		}
	  }
	  rxBufferHeadPos = tempHeadPos;
	  CDC_SendRxBufferStatus_FS();
  }
  else{
	  memcpy(bulk_packet, Buf, len);
  }


  USBD_CDC_ReceivePacket(&hUsbDeviceFS); //indicates that the device is ready to receive more packets

  return (USBD_OK);
  /* USER CODE END 6 */
}


uint8_t CDC_ReadRxBuffer_FS(uint8_t* Buf, uint16_t Len) {
	uint16_t bytesAvailable = CDC_GetRxBufferBytesAvailable_FS();

	if (bytesAvailable < Len)
	return USB_CDC_RX_BUFFER_NO_DATA;

	for (uint8_t i = 0; i < Len; i++) {
		Buf[i] = rxBuffer[rxBufferTailPos];
		rxBufferTailPos = (uint16_t)((uint16_t)(rxBufferTailPos + 1) % HL_RX_BUFFER_SIZE);
	}

	return USB_CDC_RX_BUFFER_OK;
}

uint16_t CDC_GetRxBufferBytesAvailable_FS() {
	return (uint16_t)(rxBufferHeadPos - rxBufferTailPos) % HL_RX_BUFFER_SIZE;
}

void CDC_SendRxBufferStatus_FS(){
	uint8_t packet[64];
	uint16_t bytesAvailable;

	// Header
	packet[0] = 'B'; // Example header byte 1
	packet[1] = 'S'; // Example header byte 2

	// Container Size
	bytesAvailable = CDC_GetRxBufferBytesAvailable_FS();
	packet[2] = (uint8_t)(bytesAvailable >> 8); // High byte
	packet[3] = (uint8_t)(bytesAvailable & 0xFF); // Low byte

	// Padding
	for (int i = 4; i < 64; i++) {
		packet[i] = 0x00;
	}

	// Send Packet
	CDC_Transmit_FS(packet, 64);
}

void ISR_Sampler_writting() {
    static uint8_t bitIndex = 7; // Start from the MSB
    static uint8_t currentByte = 0; // The current byte to be written to the pin

    // Check if there is data available in the buffer
    if (CDC_GetRxBufferBytesAvailable_FS() > 0) {
        // If this is the first bit of the byte, read the next byte from the buffer
        if (bitIndex == 7) {
            CDC_ReadRxBuffer_FS(&currentByte, 1); // Assume function returns only 1 byte
        }

        // Write the current bit to the pin
        if (currentByte & (1 << bitIndex)) {
            //HAL_GPIO_WritePin(GPIOA, GDO0_Pin, GPIO_PIN_SET); // Set pin high
        	GPIOA->BSRR = (1 << 3);  // Set pin 3 (GDO0_Pin) high
        } else {
            //HAL_GPIO_WritePin(GPIOA, GDO0_Pin, GPIO_PIN_RESET);
        	GPIOA->BSRR = (1 << (3 + 16));// Reset pin 3 (GDO0_Pin) low
        }

        // Decrement bit index and check if we have processed the whole byte
        if (bitIndex == 0) {
            bitIndex = 7; // Reset bit index back to the MSB for the next byte
        } else {
            bitIndex--;
        }
    }
}



void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim){
	if(htim == &htim2){
		if(transmissionEnable == 1){
			ISR_Sampler_writting();
		}
		else if(transmissionEnable == 2){
			ISR_Sampler_raw();
		}
	}
}

static void MX_TIM2_Init(void)
{

  /* USER CODE BEGIN TIM2_Init 0 */

  /* USER CODE END TIM2_Init 0 */

  TIM_ClockConfigTypeDef sClockSourceConfig = {0};
  TIM_MasterConfigTypeDef sMasterConfig = {0};

  /* USER CODE BEGIN TIM2_Init 1 */

  /* USER CODE END TIM2_Init 1 */
  htim2.Instance = TIM2;
  htim2.Init.Prescaler = 0;
  htim2.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim2.Init.Period = 350-1;
  htim2.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  htim2.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;
  if (HAL_TIM_Base_Init(&htim2) != HAL_OK)
  {
    Error_Handler();
  }
  sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
  if (HAL_TIM_ConfigClockSource(&htim2, &sClockSourceConfig) != HAL_OK)
  {
    Error_Handler();
  }
  sMasterConfig.MasterOutputTrigger = TIM_TRGO_RESET;
  sMasterConfig.MasterSlaveMode = TIM_MASTERSLAVEMODE_DISABLE;
  if (HAL_TIMEx_MasterConfigSynchronization(&htim2, &sMasterConfig) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN TIM2_Init 2 */

  /* USER CODE END TIM2_Init 2 */

}

 I am not including the main(). Assume the TIM2 is running and with the callback ISR running. I am not including the code from the host, because it is an Android device, the code is in Java, and it's NOT relevant. Assume the host is putting out a bulkPacket of some bytes on every frame, according to the required flow. Assume the flow control works and the buffer never underflows or overflows. This has all been tested.

What I am asking for is help regarding optimizing this concurrence between the CDC callback when a new bulkTransfer packet is found, and the TIM2 processing happening fast, and them both fighting for the 350 cycles that are available between one ISR processing. Context about the processing being done: it's simply writing samples onto the GPIO 3. This whole project is about transfering signals as samples from an Android phone to a stm32 to be applied. So the PRIORITY of the ISR has to be ABOVE everything else, otherwise the signal is not procesed in time, and is CORRUPTED. I put the NVIC priority of the ISR above the USB priority.

 

Thank you and I await patiently for expertise regarding instruction optimization.

Luis

5 REPLIES 5
gbm
Lead II

Simply set the lower priority for USB interrupt. If your MCU is Cortex-M0 based, use priority 3 for USB, priority 0 for timer. USB is not time-critical, the service may be delayed by even few ms, so any time-critical actions may safely preempt USB interrupt. The only problem is that you must plan the communication/synchronization between the CDC and timer carefully. Maybe a better idea would be to use DMA for GPIO control (possible on F0 series, impossible on any M0+-based MCU).

TDK
Guru

350 cycles is not much, especially using HAL functions. Consider writing your own IRQ handler function directly. This is your best chance.

> is it that the processing in the ISR consumes too many of the 350 cycles, or is it that the FREQUENCY of the ISR is INTERRUPTING the USB too much?

Likely the processing is taking too long. Frequency alone shouldn't matter.

 

If you feel a post has answered your question, please click "Accept as Solution".

Ok, good to know that USB can be interrupted, and that is not the issue. Actually, the USB comms never fail now that I placed USB at lower priority 3.

It's the ISR that is taking up all of the execution... but you say 350 cycles is not much, but I am not using any HAL functions:

uint8_t * rxBuffer; // Receive packet buffer
volatile uint16_t rxBufferHeadPos = 0; // Receive buffer write position
volatile uint16_t rxBufferTailPos = 0; // Receive buffer read position

void ISR_Sampler_writing() {
    static uint8_t bitIndex = 7; // Start from the MSB
    static uint8_t currentByte = 0; // The current byte to be written to the pin

    // Check if there is data available in the buffer
    if (CDC_GetRxBufferBytesAvailable_FS() > 0) {
        // If this is the first bit of the byte, read the next byte from the buffer
        if (bitIndex == 7) {
            CDC_ReadRxBuffer_FS(&currentByte, 1); // Assume function returns only 1 byte
        }

        // Write the current bit to the pin
        if (currentByte & (1 << bitIndex)) {
            //HAL_GPIO_WritePin(GPIOA, GDO0_Pin, GPIO_PIN_SET); // Set pin high
        	GPIOA->BSRR = (1 << 3);  // Set pin 3 (GDO0_Pin) high
        } else {
            //HAL_GPIO_WritePin(GPIOA, GDO0_Pin, GPIO_PIN_RESET);
        	GPIOA->BSRR = (1 << (3 + 16));// Reset pin 3 (GDO0_Pin) low
        }

        // Decrement bit index and check if we have processed the whole byte
        if (bitIndex == 0) {
            bitIndex = 7; // Reset bit index back to the MSB for the next byte
        } else {
            bitIndex--;
        }
    }
}



void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim){
	if(htim == &htim2){
		ISR_Sampler_writing();
	}
}

uint8_t CDC_ReadRxBuffer_FS(uint8_t* Buf, uint16_t Len) {
	uint16_t bytesAvailable = CDC_GetRxBufferBytesAvailable_FS();

	if (bytesAvailable < Len)
	return USB_CDC_RX_BUFFER_NO_DATA;

	for (uint8_t i = 0; i < Len; i++) {
		Buf[i] = rxBuffer[rxBufferTailPos];
		rxBufferTailPos = (uint16_t)((uint16_t)(rxBufferTailPos + 1) % HL_RX_BUFFER_SIZE);
	}

	return USB_CDC_RX_BUFFER_OK;
}

uint16_t CDC_GetRxBufferBytesAvailable_FS() {
	return (uint16_t)(rxBufferHeadPos - rxBufferTailPos) % HL_RX_BUFFER_SIZE;
}

 This is all my ISR is doing. It's reading a byte from an address, shifting it, & with 1, and writing that value to the GPIO . I even used register level GPIO write instead of HAL. Shouldn't this take like 10 cycles, plus some more for the extra if statements and whatnot? Why is this taking more than 300?

gbm
Lead II

Actually you are using HAL timer interrupt redirection which adds at least 50 cycles to the interrupt handling. Define your own timer ISR instead of going through HAL.

> but I am not using any HAL functions:

You're using HAL_TIM_PeriodElapsedCallback. That's not the IRQ handler, it's called from it. Look at your stm32_*_it.c file for the actual handlers.

> Shouldn't this take like 10 cycles, plus some more for the extra if statements and whatnot? Why is this taking more than 300?

Look at the assembly code. What you have posted will take way more than 10 cycles. Step through using DWT->CYCCNT to time it exactly.

If you feel a post has answered your question, please click "Accept as Solution".