How is I/O accomplished with Cube-AI models running locally on an STM32G4?

KHarb.1 · ‎2024-06-03

I haven’t used AI models before and I’m trying to figure out the best way to get one running via CubeAI. I can use UART I/O to a PC to generate a model, but once I port the model and run it on an STM32, how do I communicate with it? I see the UART selection in the config options. Is this what it uses for I/O with the locally run model?

If so, is there another way I can feed it a data stream?

Muhammed Güler · ‎2024-06-03

My workflow for running AI on an MCU is:
Write data collection software for the PC that can easily gather several thousand samples.
Write an MCU task that feeds sensor data to the PC software.
Collect the data and train the network.
Load the trained network onto the MCU and test whether it gives the results I expect.
Rather than streaming data continuously, I prefer to position the sensor where I want it and capture a value when I press a button.
I prefer USB CDC for data transfer because it also lets me charge my battery-powered devices.
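As a rough illustration of the data collection step, here is a minimal sketch of how one button-triggered sample could be turned into a CSV line for the PC software to log. The function name `format_sample_csv`, the three-decimal format, and the CSV layout are all assumptions made for this example; nothing here comes from CubeAI or from the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats n sensor values as "v0,v1,...\r\n" into out.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if the buffer is too small. */
int format_sample_csv(const float *values, size_t n, char *out, size_t out_sz)
{
    assert(values != NULL && out != NULL);

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        /* Prefix every value except the first with a comma. */
        int w = snprintf(out + used, out_sz - used, "%s%.3f",
                         (i == 0) ? "" : ",", (double)values[i]);
        if (w < 0 || (size_t)w >= out_sz - used) {
            return -1;  /* output would be truncated */
        }
        used += (size_t)w;
    }

    /* Terminate the record so the PC side can split on line endings. */
    int w = snprintf(out + used, out_sz - used, "\r\n");
    if (w < 0 || (size_t)w >= out_sz - used) {
        return -1;
    }
    return (int)(used + (size_t)w);
}
```

On the MCU you would call this when the button is pressed and hand the resulting buffer to whatever transmit call your USB CDC or UART driver provides. Note that some embedded C libraries (newlib-nano, for instance) need float support for printf enabled at link time before `%f` produces output.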

KHarb.1 · ‎2024-06-03

When running on the MCU, can you share the method by which your trained network is fed data?

Muhammed Güler · ‎2024-06-05

I did it a long time ago at my old company, and I don't remember exactly what I did. But when I trained a simple network with Python Keras and gave it to CubeMX, it generated the following code.

static int ai_boostrap(const uint8_t *obj, uint8_t *ram_addr,
    ai_size ram_sz, ai_handle w_addr, ai_handle act_addr[])
{
....
#else
  for (int idx=0; idx < AI_NETWORK_IN_NUM; idx++) {
    ai_input[idx].data = data_ins[idx];
  }
#endif

static int ai_run()
{
  ai_i32 batch;

  batch = ai_rel_network_run(network, ai_input, ai_output);
  if (batch != 1) {
    ai_log_err(ai_rel_network_get_error(network),
        "ai_rel_network_run");
    return -1;
  }

  return 0;
}
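Reading the generated code above, the network takes its input from whatever buffer `ai_input[idx].data` points to at the moment `ai_run()` is called. So feeding the model appears to come down to copying your samples into that buffer before each run and reading the output buffer afterwards. Here is a minimal, self-contained sketch of that pattern. Everything prefixed `demo_` or `DEMO_` is a hypothetical stand-in so the example compiles on its own, and the stub "network" just sums its inputs; in a real project the buffers and the run call come from the code CubeMX generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_IN_SIZE  6  /* hypothetical: number of input features */
#define DEMO_OUT_SIZE 1  /* hypothetical: number of outputs */

/* Hypothetical stand-in for the generated buffer descriptor.
 * Only the data pointer matters for this sketch. */
typedef struct {
    void *data;
} demo_buffer;

static float in_data[DEMO_IN_SIZE];
static float out_data[DEMO_OUT_SIZE];
static demo_buffer demo_input  = { in_data };
static demo_buffer demo_output = { out_data };

/* Stub "network": sums the inputs into the single output. Returns the
 * number of batches processed (always 1 here), mirroring the
 * `batch != 1` check in the generated ai_run(). */
static int demo_network_run(const demo_buffer *in, demo_buffer *out)
{
    const float *x = (const float *)in->data;
    float *y = (float *)out->data;
    float sum = 0.0f;
    for (size_t i = 0; i < DEMO_IN_SIZE; i++) {
        sum += x[i];
    }
    y[0] = sum;
    return 1;
}

/* Copy one sample into the input buffer, run the network, and read the
 * result. Returns 0 on success, -1 on a size mismatch or run failure. */
int demo_infer(const float *sample, size_t n, float *result)
{
    assert(sample != NULL && result != NULL);
    if (n != DEMO_IN_SIZE) {
        return -1;
    }
    memcpy(demo_input.data, sample, n * sizeof(float));
    if (demo_network_run(&demo_input, &demo_output) != 1) {
        return -1;
    }
    *result = ((const float *)demo_output.data)[0];
    return 0;
}
```

The source of the sample does not matter to the network: it could come from a sensor driver, an ADC, or bytes received over UART or USB CDC, as long as it ends up in the input buffer in the layout and data type the model was trained with.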

Below is code from an AI project I built on an ESP32 using TensorFlow Lite Micro.

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
    return;
  }

  // Obtain pointers to the model's input and output tensors.
  input = interpreter->input(0);
  output = interpreter->output(0);

  // Keep track of how many inferences we have performed.
  inference_count = 0;
  while (1) {
    osDelay(1);
    // put your main code here, to run repeatedly:
    xSemaphoreTake(SemTagReading, portMAX_DELAY);
    sensor.takeMeasurementsWithBulb();
    InputRegisters[0] = sensor.getR();
    InputRegisters[1] = sensor.getS();
    InputRegisters[2] = sensor.getT();
    InputRegisters[3] = sensor.getU();
    InputRegisters[4] = sensor.getV();
    InputRegisters[5] = sensor.getW();
    //for (int i = 0; i < 6; i++) Finput[i] = InputRegisters[i];
    for (int i = 0; i < 6; i++) input->data.f[i] = InputRegisters[i];
    // Run inference, and report any error
    TfLiteStatus invoke_status = interpreter->Invoke();
    if (invoke_status != kTfLiteOk) {
      TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed\n");
      return;
    }

I don't have an STM32 board to test the first snippet, but I know the ESP32 version works.
There are minor differences between the two, but I believe you can work them out with a little experimenting.
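On the original question about feeding the model a data stream: one common approach is to collect incoming samples into a fixed-size window and run inference each time the window fills. The sketch below shows only the windowing part. `sample_window`, `window_push`, `window_reset`, and `WIN_LEN` are hypothetical names for this example, and the window length would need to match your model's input size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WIN_LEN 4  /* hypothetical: must match the model's input size */

typedef struct {
    float  buf[WIN_LEN];
    size_t count;  /* number of valid samples currently in buf */
} sample_window;

/* Append one sample to the window. Returns true once the window holds
 * WIN_LEN samples; the caller should then run inference on w->buf and
 * call window_reset(). Samples pushed while the window is already full
 * are dropped. */
bool window_push(sample_window *w, float sample)
{
    assert(w != NULL);
    if (w->count < WIN_LEN) {
        w->buf[w->count++] = sample;
    }
    return w->count == WIN_LEN;
}

/* Empty the window so the next inference starts from fresh samples. */
void window_reset(sample_window *w)
{
    assert(w != NULL);
    w->count = 0;
}
```

In use, `window_push` would be called from wherever samples arrive (a UART receive handler, an ADC callback, a sensor polling task). When it returns true, copy `buf` into the network's input buffer, run the network, and reset the window.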