2024-12-03 01:06 AM
Hi everyone,
I am working on implementing an image classification model on an STM32 board using the X-CUBE-AI extension. I am relatively new to embedded software development; my background is primarily in AI and deep learning. While I have managed to generate the necessary .c and .h files for my model using the X-CUBE-AI extension, I am having trouble providing input to the model from an image stored on an SD card. As far as I understand from the generated code, I have to modify the acquire_and_process_data and post_process functions, which I have written down below. I have also provided the output of my code. In the main function, only the MX_X_CUBE_AI_Process function runs in the while loop for the AI process.
Here’s a brief summary of what I’ve done so far:
The code:
int acquire_and_process_data(ai_i8* data[])
{
    /* fill the inputs of the c-model
    for (int idx=0; idx < AI_NETWORK_IN_NUM; idx++ )
    {
        data[idx] = ....
    }
    */
    uint8_t imageBuffer[64 * 64 * 3]; // Example with 64x64 resolution in RGB format

    // Read the BMP file
    FRESULT result = ReadBMPFile("image.bmp", imageBuffer, sizeof(imageBuffer));
    if (result != FR_OK) {
        const char *msg = "Error reading BMP file.\r\n";
        HAL_UART_Transmit(&huart1, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
        return -1;
    }

    // Allocate memory for data[0] with size 64x64x3
    data[0] = (ai_i8*)malloc(64 * 64 * 3 * sizeof(ai_i8));
    if (data[0] == NULL) {
        const char *msg = "Memory allocation failed.\r\n";
        HAL_UART_Transmit(&huart1, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
        return -1;
    }

    // Copy the interleaved R, G, B pixel bytes sequentially into data[0]
    for (int i = 0; i < 64 * 64 * 3; i++) {
        data[0][i] = (ai_i8)imageBuffer[i];
    }
    return 0;
}
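One thing I am not sure about (ReadBMPFile is my own helper, so I need to double-check what it actually returns): 24-bit BMP files store pixel rows bottom-up and channels in BGR order, so if the helper returns the raw BMP pixel area, the buffer is not in the top-down RGB layout most models expect. A minimal sketch of the conversion, assuming a 64x64 image (at this width each row is 192 bytes, a multiple of 4, so no BMP row padding applies):

```c
#include <stdint.h>
#include <stddef.h>

#define IMG_W 64
#define IMG_H 64

/* Convert raw 24-bit BMP pixel data (bottom-up rows, BGR channel order)
 * into top-down RGB order. src and dst must each hold IMG_W*IMG_H*3 bytes. */
void bmp_bgr_to_rgb_topdown(const uint8_t *src, uint8_t *dst)
{
    for (int y = 0; y < IMG_H; y++) {
        /* BMP row 0 is the bottom image row, so read rows in reverse */
        const uint8_t *src_row = src + (size_t)(IMG_H - 1 - y) * IMG_W * 3;
        uint8_t *dst_row = dst + (size_t)y * IMG_W * 3;
        for (int x = 0; x < IMG_W; x++) {
            dst_row[x * 3 + 0] = src_row[x * 3 + 2]; /* R from BMP byte 2 */
            dst_row[x * 3 + 1] = src_row[x * 3 + 1]; /* G from BMP byte 1 */
            dst_row[x * 3 + 2] = src_row[x * 3 + 0]; /* B from BMP byte 0 */
        }
    }
}
```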
int post_process(ai_i8* data[])
{
    /* process the predictions
    for (int idx=0; idx < AI_NETWORK_OUT_NUM; idx++ )
    {
        data[idx] = ....
    }
    */
    if (data[0] == NULL) {
        const char *msg = "data[0] is NULL\r\n";
        HAL_UART_Transmit(&huart1, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
        return -1; // Error condition
    }

    // Define class names
    const char* class_names[] = {"Pizza", "Steak", "Sushi"};
    const int num_classes = sizeof(class_names) / sizeof(class_names[0]);

    // Process the results from the AI network
    float* predictions = (float*)data[0]; // Cast the output to float

    // 1. Find the maximum value and its index
    float max_value = predictions[0];
    int max_index = 0;
    for (int i = 1; i < num_classes; i++) {
        if (predictions[i] > max_value) {
            max_value = predictions[i];
            max_index = i;
        }
    }

    // 2. Send the predicted class via UART
    char uart_message[50]; // Buffer to hold the message
    snprintf(uart_message, sizeof(uart_message), "Predicted class: %s\r\n", class_names[max_index]);
    HAL_UART_Transmit(&huart1, (uint8_t*)uart_message, strlen(uart_message), HAL_MAX_DELAY);

    // 3. Send all elements of the prediction array via UART
    const char *hdr = "Predictions:\r\n";
    HAL_UART_Transmit(&huart1, (uint8_t*)hdr, strlen(hdr), HAL_MAX_DELAY);
    for (int i = 0; i < num_classes; i++) {
        char prediction_message[50];
        snprintf(prediction_message, sizeof(prediction_message), "Prediction[%d]: %.2f\r\n", i, predictions[i]);
        HAL_UART_Transmit(&huart1, (uint8_t*)prediction_message, strlen(prediction_message), HAL_MAX_DELAY);
    }
    return 0; // Operation complete
}
OUTPUT
FATFS mounted successfully. // from sd card test function
Opened BMP file: image.bmp // from sd card image read function
Read 12288 bytes from BMP file: image.bmp // from sd card image read function
File closed successfully.
Predicted class: Pizza //which is wrong
Prediction[0]: 4.32
Prediction[1]: -3.78
Prediction[2]: -0.45
2024-12-11 01:12 PM
I'm facing a similar problem. Did you find a solution?
2024-12-12 10:11 PM - edited 2024-12-12 10:11 PM
Unfortunately I have not found a solution or any useful source yet. I am still searching. I have tried reading the image from RAM and using a different function that I found on the internet (given below). Now the model returns the same probability for each class.
void RGB24_to_Float_Asym(uint8_t *pSrc, uint8_t *pDst, uint32_t pixels){
    struct rgb
    {
        uint8_t r, g, b;
    };
    struct rgb *pivot = (struct rgb *) pSrc;
    float dummy;
    uint16_t j, x = 0;
    for (uint32_t i = 0; i < pixels; i++)
    {
        // Write each channel as a 4-byte float, in B, G, R order
        dummy = ((float)(pivot[i].b)) / 255.0F;
        uint8_t* pr = (uint8_t*)(void*)&dummy;
        for (j = 0; j < 4; j++) {
            pDst[x++] = pr[j];
        }
        dummy = ((float)(pivot[i].g)) / 255.0F;
        uint8_t* pg = (uint8_t*)(void*)&dummy;
        for (j = 0; j < 4; j++) {
            pDst[x++] = pg[j];
        }
        dummy = ((float)(pivot[i].r)) / 255.0F;
        uint8_t* pb = (uint8_t*)(void*)&dummy;
        for (j = 0; j < 4; j++) {
            pDst[x++] = pb[j];
        }
    }
}
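As a side note, since the Cortex-M cores are little-endian, I think the byte-by-byte float copy above could be replaced by writing floats directly, assuming the destination buffer is 4-byte aligned (the X-CUBE-AI input buffers declared with AI_ALIGNED are). This sketch keeps the same B, G, R channel order as the function above:

```c
#include <stdint.h>

/* Scale interleaved 8-bit RGB pixels to float [0,1] and store them
 * directly as floats in B, G, R order. pDst must hold pixels*3 floats. */
void RGB24_to_Float_Simple(const uint8_t *pSrc, float *pDst, uint32_t pixels)
{
    for (uint32_t i = 0; i < pixels; i++) {
        const uint8_t *px = &pSrc[i * 3]; /* px[0]=R, px[1]=G, px[2]=B */
        pDst[i * 3 + 0] = (float)px[2] / 255.0f; /* B */
        pDst[i * 3 + 1] = (float)px[1] / 255.0f; /* G */
        pDst[i * 3 + 2] = (float)px[0] / 255.0f; /* R */
    }
}
```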
void AI_Output_Display(uint8_t* AI_out_data){
    uint16_t i, j, x = 0;
    float dummyfloat;
    uint32_t u32dummy;
    for (i = 0; i < 10; i++) {
        uint8_t* p = (uint8_t*)(void*)&dummyfloat;
        // Reassemble a little-endian 32-bit word from four output bytes
        u32dummy = (uint8_t)AI_out_data[x + 3];
        u32dummy = (u32dummy << 8) | (uint8_t)AI_out_data[x + 2];
        u32dummy = (u32dummy << 8) | (uint8_t)AI_out_data[x + 1];
        u32dummy = (u32dummy << 8) | (uint8_t)AI_out_data[x];
        x += 4;
        for (j = 0; j < 4; j++) {
            p[j] = u32dummy >> (8 * j);
        }
        predictionval[i] = dummyfloat * 100;
    }
    Bubblesort();
}

static void Bubblesort(void){
    int total_count, counter, counter1, swap_rank;
    float swap_var;
    total_count = 10;
    for (counter = 0; counter < 10; counter++) {
        class_name_index[counter] = counter;
    }
    for (counter = 0; counter < total_count - 1; counter++) {
        for (counter1 = 0; counter1 < total_count - counter - 1; counter1++) {
            if (predictionval[counter1] > predictionval[counter1 + 1]) {
                swap_var = predictionval[counter1];
                predictionval[counter1] = predictionval[counter1 + 1];
                predictionval[counter1 + 1] = swap_var;
                swap_rank = class_name_index[counter1];
                class_name_index[counter1] = class_name_index[counter1 + 1];
                class_name_index[counter1 + 1] = swap_rank;
            }
        }
    }
}
2024-12-13 02:34 AM
The problem is in the Application Template. Write your own code instead (or perhaps I simply don't know how to use the Application Template properly).
Code from X-CUBE-AI Docs:
#include <stdio.h>
#include "network.h"
#include "network_data.h"

/* Global handle to reference the instantiated C-model */
static ai_handle network = AI_HANDLE_NULL;

/* Global c-array to handle the activations buffer */
AI_ALIGNED(32)
static ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

/* Array to store the data of the input tensor */
AI_ALIGNED(32)
static ai_float in_data[AI_NETWORK_IN_1_SIZE];
/* or static ai_u8 in_data[AI_NETWORK_IN_1_SIZE_BYTES]; */

/* c-array to store the data of the output tensor */
AI_ALIGNED(32)
static ai_float out_data[AI_NETWORK_OUT_1_SIZE];
/* or static ai_u8 out_data[AI_NETWORK_OUT_1_SIZE_BYTES]; */

/* Array of pointers to manage the model's input/output tensors */
static ai_buffer *ai_input;
static ai_buffer *ai_output;

/*
 * Bootstrap
 */
int aiInit(void) {
    ai_error err;

    /* Create and initialize the c-model */
    const ai_handle acts[] = { activations };
    err = ai_network_create_and_init(&network, acts, NULL);
    if (err.type != AI_ERROR_NONE) { ... };

    /* Retrieve pointers to the model's input/output tensors */
    ai_input = ai_network_inputs_get(network, NULL);
    ai_output = ai_network_outputs_get(network, NULL);

    return 0;
}

/*
 * Run inference
 */
int aiRun(const void *in_data, void *out_data) {
    ai_i32 n_batch;
    ai_error err;

    /* 1 - Update IO handlers with the data payload */
    ai_input[0].data = AI_HANDLE_PTR(in_data);
    ai_output[0].data = AI_HANDLE_PTR(out_data);

    /* 2 - Perform the inference */
    n_batch = ai_network_run(network, &ai_input[0], &ai_output[0]);
    if (n_batch != 1) {
        err = ai_network_get_error(network);
        ...
    };

    return 0;
}

/*
 * Example of main loop function
 */
void main_loop() {
    aiInit();
    while (1) {
        /* 1 - Acquire, pre-process and fill the input buffers */
        acquire_and_process_data(in_data);

        /* 2 - Call inference engine */
        aiRun(in_data, out_data);

        /* 3 - Post-process the predictions */
        post_process(out_data);
    }
}
2024-12-13 03:40 AM
Thank you very much. The examples I have seen so far were using their own templates; I think I understand why. I will try it as soon as possible and report back whether it works for me.
2024-12-13 05:58 AM - edited 2024-12-13 06:01 AM
Hello @OguzhanBalci,
Thank you for your help on this subject.
Could you please explain what you mean by "the problem is in the application template"? What exactly do you think is the problem here, and is there something you would like to see instead, so that I can pass it on to the dev team?
Thanks
Julian
2024-12-13 06:21 AM
As far as I understand, he is talking about the built-in functions used in the app_x-cube-ai.c file.
2024-12-14 01:37 AM - edited 2024-12-14 06:13 AM
Yes
2024-12-17 07:46 AM - edited 2024-12-18 12:27 AM
Hello @EmreTuncer,
Here is an answer to your questions:
1. The image preprocessing depends entirely on the preprocessing used to train the model. The data format and dimensions should be exactly the same for both training and inference. I can't provide more specific advice because I don't know how you trained your model, but the key points are: image dimensions (height * width), channel format (RGB, BGR, grayscale), data format (int8, uint8), and whether the channel is first or last.
2. To ensure you are doing it correctly, you can compare your code to the ModelZoo application code for STM32H7. It works the same way, except that the data comes from a camera instead of an SD card. Ultimately, the data is loaded into the model input buffer before inference.
3. You can use the gdb command "restore" to fill a buffer with the binary data of your choice. You can also use the command "dump" to download a buffer's content into a binary file. Tip: If you save the binary buffer content in a .data file, Gimp will be able to read this file as an image. By selecting the right data format, you will be able to visually check the image contained in the buffer.
4. There is a similar implementation in the FP-AI-VISION1 package. Please note that this package is deprecated; however, you can still extract the code you need from it.
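For illustration, here is a minimal sketch of the channel-last vs channel-first conversion mentioned in point 1. This is a generic example, not specific to your model; whether you need it depends entirely on the layout your training framework used:

```c
#include <stdint.h>

/* Convert an interleaved HWC (channel-last) image of height h, width w
 * and c channels into planar CHW (channel-first) layout. */
void hwc_to_chw(const uint8_t *hwc, uint8_t *chw, int h, int w, int c)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int k = 0; k < c; k++)
                chw[k * h * w + y * w + x] = hwc[(y * w + x) * c + k];
}
```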
Sorry for the late reply.
Guillaume