
X-CUBE-AI: External memory corruption during inference on STM32U5

wwoods93
Associate II

Hello,

We are experiencing corruption of our external IS66WVO32M8 PSRAM while executing X-CUBE-AI-generated code on our target device, an STM32U585AIIxQ. The corruption occurs consistently while the model is running (i.e. during the generated ai_*_run() function) but cannot be reproduced outside of it. Please see the issue description below.

 

Context

We are using the STMicroelectronics.X-CUBE-AI package version 10.2.0 (STM32CubeMX version 6.16) and running the .a static library generated by the CubeMX tool on our STM32U585AIIxQ target. The model is a Keras neural network that has been successfully analyzed and "validated on desktop" using the X-CUBE-AI GUI in CubeMX. The model requires 1.5 MB of RAM, which we provide with an external PSRAM device (ISSI IS66WVO32M8DALL Octal PSRAM). Everything appears to be configured correctly, and we are able to initialize the model and run inference using the interface provided in the generated X-CUBE-AI/App/app_x-cube-ai.c/.h files.
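For reference, the way we bring the network up and invoke it follows the generated template closely. Below is a simplified sketch, assuming the default network name "network" (so the generated functions carry the ai_network_ prefix) and with our PSRAM placement reduced to a hard-coded address purely for illustration:

```c
#include "network.h"

/* Activation pool placed in the memory-mapped external PSRAM.
 * 0x70000000 is the OCTOSPI2 memory-mapped base on our board; the
 * exact placement in our project comes from the linker script. */
#define PSRAM_POOL0_ADDR  ((ai_handle)0x70000000UL)

static ai_handle network = AI_HANDLE_NULL;

static int model_init(void)
{
  const ai_handle activations[] = { PSRAM_POOL0_ADDR };

  /* Bind the default (built-in) weights and the external activation
   * pool to the network instance. */
  ai_error err = ai_network_create_and_init(&network, activations, NULL);
  return (err.type == AI_ERROR_NONE) ? 0 : -1;
}

static int model_run(void)
{
  ai_buffer *inputs  = ai_network_inputs_get(network, NULL);
  ai_buffer *outputs = ai_network_outputs_get(network, NULL);

  /* Our PSRAM read/write tests pass immediately before this call... */
  ai_i32 batches = ai_network_run(network, inputs, outputs);
  /* ...and immediately after it, yet pool0 is corrupted during it. */

  return (batches == 1) ? 0 : -1;
}
```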

 

Problem

We are consistently receiving incorrect output from the neural network and can see that our external PSRAM, which serves as working memory for the x-cube-ai module, is being corrupted during execution of the inference (the ai_*_run() generated function). Due to the amount of RAM required for the model (1.5MB), this device is our only viable solution for volatile memory on our current board. The PSRAM is correctly configured in memory-mapped mode and is able to pass rigorous read/write tests both directly before and directly after the call to ai_*_run().
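The read/write test mentioned above is essentially an address-derived pattern check over the memory-mapped region, run immediately before and immediately after the inference call. A minimal sketch (the base address and size shown are illustrative placeholders for our project's values):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative values: base and size of the memory-mapped PSRAM
 * region used as the activation pool (~1.5 MB). */
#define PSRAM_TEST_BASE   ((volatile uint32_t *)0x70000000UL)
#define PSRAM_TEST_WORDS  (1536u * 1024u / 4u)

static bool psram_pattern_test(void)
{
  /* Write an address-derived pattern... */
  for (uint32_t i = 0; i < PSRAM_TEST_WORDS; i++) {
    PSRAM_TEST_BASE[i] = 0xA5A50000u ^ i;
  }
  /* ...then read it back and verify every word. */
  for (uint32_t i = 0; i < PSRAM_TEST_WORDS; i++) {
    if (PSRAM_TEST_BASE[i] != (0xA5A50000u ^ i)) {
      return false;   /* read/write failure detected */
    }
  }
  return true;
}
```

This test passes both before and after the call to ai_*_run(); only during the inference itself does the memory come back corrupted.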

We have run the "on target" validation through CubeMX as well as our own application code, with identical results. By setting a breakpoint directly after the X-CUBE-AI code returns from the call into the library that executes the model, we consistently see corruption of the PSRAM buffer ("pool0") that the X-CUBE-AI code uses for activations.

Based on the above, it appears the PSRAM is unable to handle the quantity or size of the accesses performed by the X-CUBE-AI model code, but we are unable to observe these accesses and our debugging options are limited. We have tried a wide array of known-good PSRAM configurations and have been able to reduce the corruption, but not eliminate it. We have had some success disabling DTR for memory-mapped access and reducing the IS66WVO32M8's OCTOSPI2 clock from 100 MHz to 50 MHz (OCTOSPI kernel clock sourced from PLL2Q at 200 MHz), with diminishing returns from reductions much below 50 MHz. Still, we are unable to remove the corruption completely.
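Concretely, those two mitigations amount to something like the following in our OCTOSPI2 setup (a rough sketch against the HAL OSPI driver; the function names here are ours, and the device-specific command details such as opcode, instruction size, dummy cycles and DQS setting are omitted):

```c
#include "stm32u5xx_hal.h"

extern OSPI_HandleTypeDef hospi2;   /* OCTOSPI2 handle generated by CubeMX */

/* OCTOSPI2 kernel clock = PLL2Q = 200 MHz.
 * ClockPrescaler = 2 -> 100 MHz (original), 4 -> 50 MHz (current). */
#define PSRAM_OSPI_PRESCALER   4u

static void psram_reduce_clock(void)
{
  /* Re-init OCTOSPI2 with the larger prescaler (200 MHz / 4 = 50 MHz). */
  hospi2.Init.ClockPrescaler = PSRAM_OSPI_PRESCALER;
  (void)HAL_OSPI_Init(&hospi2);
}

/* Sketch of the memory-mapped read command with DTR disabled on the
 * address and data phases (STR operation). A matching
 * HAL_OSPI_OPTYPE_WRITE_CFG command and the HAL_OSPI_MemoryMapped()
 * call follow in the real driver. */
static void psram_configure_str_read(void)
{
  OSPI_RegularCmdTypeDef cmd = {0};

  cmd.OperationType      = HAL_OSPI_OPTYPE_READ_CFG;
  cmd.InstructionMode    = HAL_OSPI_INSTRUCTION_8_LINES;
  cmd.InstructionDtrMode = HAL_OSPI_INSTRUCTION_DTR_DISABLE;
  cmd.AddressMode        = HAL_OSPI_ADDRESS_8_LINES;
  cmd.AddressSize        = HAL_OSPI_ADDRESS_32_BITS;
  cmd.AddressDtrMode     = HAL_OSPI_ADDRESS_DTR_DISABLE;
  cmd.DataMode           = HAL_OSPI_DATA_8_LINES;
  cmd.DataDtrMode        = HAL_OSPI_DATA_DTR_DISABLE;   /* DTR disabled */
  /* cmd.Instruction / cmd.InstructionSize / cmd.DummyCycles / cmd.DQSMode:
   * per the IS66WVO32M8 datasheet, omitted here. */

  (void)HAL_OSPI_Command(&hospi2, &cmd, 5000U);
}
```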

 

Questions

As stated above, we have strong evidence that our PSRAM device and driver, along with our integration of the X-CUBE-AI-converted neural network C library, are correct and functioning properly. The problem occurs consistently inside the model execution and has not (thus far) been reproducible elsewhere. We are therefore primarily interested in how the generated model code interacts with the external memory. A few things we are trying to ascertain:

 

  • Is memory-mapped access by the module indeed our problem?
  • What types of accesses are being performed by the generated code? (Presumably memory-mapped only, but anything we can learn about access size and frequency, accesses to multiple OCTOSPI devices sequentially, etc. would be helpful.)
  • Closely related to the above, is there a way to see the C code (function implementations) compiled into the .a library containing the model?
  • Is there anything that can be done from the X-CUBE-AI side that could potentially mitigate this? Would we stand a better chance with the model quantized/compressed/optimized in a particular way? Are there other settings, or anything under the hood, that could help? Again, we have tried many combinations here without success.
  • Is there any known issue with the STM32U585 target or IS66WVO32M8 PSRAM device that could be contributing? Or alternatively, an example of this combination working? There are several errata for our target device related to OCTOSPI, but ostensibly none are applicable to our use case. 

 

Answers to any or all of the above would be greatly appreciated, and we are more than happy to provide additional information upon request.

 

Thanks very much for your time,

 

Wilson

 

 

 
