2025-04-10 7:44 AM - edited 2025-04-15 9:33 AM
Hi everyone,
I'm trying to run inference with a TFLite model on a Nucleo-N657X0-Q board.
The issue I'm facing is that the network returns a constant output for constant inputs, but the value of this constant changes each time I restart the application. When the input varies, the output varies as well, but the returned values are not consistent with those obtained during the validation process.
The model was quantized using a post-training quantization approach.
I'm using AXISRAM4 and AXISRAM5, which are correctly initialized and enabled in main.c.
I attach the .ioc file, the app_x-cube-ai.c file and the quantized TFLite model for reference.
Thanks in advance for any suggestions!
2025-04-10 9:25 AM
Hello @LisaB,
Did you use any arguments/options other than the default ones in X-CUBE-AI?
At first glance, it looks like a cache issue.
Have a good day,
Julian
2025-04-10 10:29 AM
Hi @Julian E.,
thanks for your reply!
As you can see in the attached picture, I used the default options for X-CUBE-AI.
I thought about a cache problem too, so I added the call
SCB_CleanInvalidateDCache_by_Addr(buffer_in, nn_in_len)
before the network initialization and the call
SCB_InvalidateDCache_by_Addr(buffer_out, nn_out_len)
after the network inference, but that did not solve the problem (you can find this in the previously attached app_x-cube-ai.c file).
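To make the placement explicit, the relevant part looks roughly like this (a simplified sketch of my code; network_init() and network_run() are only placeholders standing in for the generated X-CUBE-AI entry points in app_x-cube-ai.c):
SCB_CleanInvalidateDCache_by_Addr((void *)buffer_in, nn_in_len); // push the freshly written input from the D-cache to physical RAM
network_init(); // placeholder for the generated network initialization
network_run(); // placeholder for running one inference
SCB_InvalidateDCache_by_Addr((void *)buffer_out, nn_out_len); // discard stale cache lines before the CPU reads the output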
2025-04-11 1:24 AM
Hello @LisaB ,
Can you copy/paste the "Extra Neural Art options"? I don't think I can see them all in the screenshot.
Did you use the --cache-maintenance option?
Have a good day,
Julian
2025-04-11 1:27 AM
Hi @Julian E.,
these are the options:
--all-buffers-info --mvei --no-hw-sw-parallelism --cache-maintenance --Oalt-sched --native-float --enable-virtual-mem-pools --Omax-ca-pipe 4 --Oshuffle-dma --Ocache-opt.
I didn't change anything; they are the default ones.
2025-04-11 7:42 AM
Hello @LisaB,
Can you share your whole project in a .zip file, please?
We don't see anything wrong.
Have a good day,
Julian
2025-04-11 9:42 AM - edited 2025-04-15 9:33 AM
Hi @Julian E.,
I attach the whole project.
If I insert a breakpoint at line 142 of app_x-cube-ai.c, the value pointed to by buffer_out (the quantized value returned by the first output neuron after inference) changes each time I restart the debug session.
Many thanks for your interest in my topic.
Lisa
2025-04-14 4:19 AM
Hello Lisa,
From what you sent, we can easily reproduce the issue; thanks for providing the project.
Based on what has been provided, the main issue is that the weights of the network are not initialized (we do not have access to the weights initializer file, I think).
In your case, you requested that the weights be stored in internal RAM (684.601 kB in npuRAM4 and npuRAM5). Since this memory is initialized neither by the code nor by an external debugger access, the weights used are "the random values sitting there when the RAM was powered up", which leads to seemingly random outputs at each restart.
When you run a "validation on target" using the tools, the weights are actually loaded onto the board by the debugger after the RAMs are powered up, which ensures correct outputs.
For your application, you should instead store the weights in flash (and fetch them from flash at inference time) or copy them from flash to the internal RAMs at startup.
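For instance, the flash-to-RAM copy could look roughly like this at startup (a sketch only: the weights blob symbol, its split between the two RAMs and the sizes are placeholders to adapt to your generated memory layout; only the npuRAM base addresses are the ones used in this thread):
#include <string.h>
#include <stdint.h>
#include "main.h" // brings in the device/CMSIS headers providing the SCB_* cache helpers
extern const uint8_t network_weights_flash[]; // placeholder: weights blob linked into flash
#define NPU_RAM4_WEIGHTS_SIZE (448u * 1024u) // placeholder split, adapt to the real layout
#define NPU_RAM5_WEIGHTS_SIZE (237u * 1024u) // placeholder: remainder of the ~684.6 kB
void copy_weights_to_npu_ram(void)
{
  uint8_t *ram4 = (uint8_t *)0x34270000; // npuRAM4 base address
  uint8_t *ram5 = (uint8_t *)0x342E0000; // npuRAM5 base address
  // Copy the weights from flash into the NPU RAMs before the first inference.
  memcpy(ram4, network_weights_flash, NPU_RAM4_WEIGHTS_SIZE);
  memcpy(ram5, network_weights_flash + NPU_RAM4_WEIGHTS_SIZE, NPU_RAM5_WEIGHTS_SIZE);
  // Make sure the copies reach physical memory, not only the D-cache.
  SCB_CleanInvalidateDCache_by_Addr((void *)ram4, NPU_RAM4_WEIGHTS_SIZE);
  SCB_CleanInvalidateDCache_by_Addr((void *)ram5, NPU_RAM5_WEIGHTS_SIZE);
}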
Anyway, for testing purposes in your project, you can pretend that the weights are all zeros by adding the following extra initialization step before your `for` loop that sets the inputs:
memset((void *)0x34270000, 0, 448*1024); // Set npuRAM4 contents to 0
memset((void *)0x342E0000, 0, 448*1024); // Set npuRAM5 contents to 0
SCB_CleanInvalidateDCache_by_Addr((void *)0x34270000, 448*1024); // Ensure the zeros are written to physical npuRAM4, not only to the D-cache
SCB_CleanInvalidateDCache_by_Addr((void *)0x342E0000, 448*1024); // Ensure the zeros are written to physical npuRAM5, not only to the D-cache
You will see that the outputs are now the same at every reset.
Keep us informed if there is still an issue.
Best regards.
2025-04-14 4:45 AM
Hello again, Lisa,
Just out of curiosity, we noticed that the last "dense" layer of your model has a particular quantization pattern (per-channel).
Could you share how you ended up with this quantization configuration, please?
Thanks a lot, have a nice day.
2025-04-14 7:51 AM - edited 2025-04-15 9:33 AM
Hi @SlothGrill,
thanks for your response!
I've tried to enable the external flash by configuring the XSPI2 peripheral, but something is still missing, since the execution remains stuck in the inference while loop...
I attach the Python script that I used to generate, train and quantize the ResNet model!
Lisa