STM32H7 Ethernet + DMA + Cache Explained + LwIP + Without Any OS: an explanation to help save your time

English language:

STM32H7 Ethernet + DMA + Cache Explained + LwIP (EN) 

One of the most common sources of confusion when working with STM32H7 microcontrollers and Ethernet (via LwIP + HAL ETH) is data cache (D-Cache) behavior.

When a DMA master such as the Ethernet MAC DMA is in use and the D-Cache is enabled (as it is in many STM32H7 projects, for example when it is switched on in CubeMX), you must manage cache coherency manually. Otherwise, the DMA peripheral and the CPU will not agree on what data is in RAM.

I encountered this exact issue myself and spent significant time identifying the root cause. Since many others using STM32H7 are likely to face this same problem, I'm sharing this explanation to help save your time and effort.

Problem:

If you fill a pbuf->payload buffer with data and send it using HAL_ETH_Transmit() without cleaning the D-Cache, the most recent data may exist only in the CPU's cache, and the DMA will read stale or uninitialized data from RAM.

Result: corrupted Ethernet frames, broken ping replies, UDP issues, and silent communication errors.

Solution: Clean the D-Cache

Before calling HAL_ETH_Transmit(), you must clean the cache (write it back to RAM) so the DMA can see the up-to-date data. This should be done inside the low_level_output() function, typically located in ethernetif.c. If CubeMX overwrites this file, you can wrap the function by renaming the original to low_level_output_internal() and calling it from a user-defined low_level_output() placed in a separate file such as ethernetif_user.c.

Warning: This cache clean step is necessary whether or not you use an RTOS. In FreeRTOS-based projects, the D-Cache is still active and must be manually synchronized before DMA transactions, just as in bare-metal applications.

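/* Expand the payload range to whole 32-byte cache lines. */
uint32_t addr = (uint32_t)p->payload;
uint32_t size = p->len;   /* length of this pbuf only, not the whole chain */
uint32_t aligned_addr = addr & ~0x1FU;                                  /* round start down */
uint32_t aligned_size = ((addr + size + 31U) & ~0x1FU) - aligned_addr;  /* round end up */

/* Write dirty cache lines back to RAM so the Ethernet DMA reads current data. */
SCB_CleanDCache_by_Addr((uint32_t *)aligned_addr, (int32_t)aligned_size);

HAL_ETH_Transmit(&heth, &TxConfig, ETH_DMA_TRANSMIT_TIMEOUT);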
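
To make the alignment arithmetic easier to check, here is a minimal host-side sketch of the same rounding, assuming the 32-byte D-Cache line of the Cortex-M7. The names cache_range_t and cache_align_range() are hypothetical helpers introduced only for illustration; they are not part of HAL, CMSIS, or LwIP.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* Cortex-M7 D-Cache line size in bytes */

/* Hypothetical result type: a range made of whole cache lines. */
typedef struct {
    uint32_t addr;  /* start address, rounded down to a line boundary */
    uint32_t size;  /* length in bytes, a multiple of CACHE_LINE_SIZE */
} cache_range_t;

/* Hypothetical helper: expand [addr, addr + size) to whole cache lines. */
static cache_range_t cache_align_range(uint32_t addr, uint32_t size)
{
    cache_range_t r;
    uint32_t mask = CACHE_LINE_SIZE - 1U;
    uint32_t end  = (addr + size + mask) & ~mask;  /* round the end up */

    r.addr = addr & ~mask;                         /* round the start down */
    r.size = end - r.addr;
    return r;
}
```

For example, a 100-byte payload at the (made-up) address 0x30040002 expands to the range starting at 0x30040000 with a length of 128 bytes, which are the values that would be passed to SCB_CleanDCache_by_Addr().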

Summary: When to use which?

Direction          Cache operation
CPU ➔ DMA (Tx)     SCB_CleanDCache_by_Addr()
DMA ➔ CPU (Rx)     SCB_InvalidateDCache_by_Addr()

Turkish language:


STM32H7 Ethernet + DMA + Cache Explained + LwIP (TR)

One of the most critical problems encountered when using Ethernet (LwIP + HAL ETH) on STM32H7 microcontrollers is the mismatch between the data cache (D-Cache) and the DMA.

If the D-Cache is enabled (it usually is in STM32H7 projects), you have to synchronize the cache manually when working with DMA. Otherwise the DMA reads old data from RAM while the CPU sees the new data in the cache.

I ran into this problem myself, and finding the solution took time. I am sharing this post to guide other developers who may hit the same issue.

Problem:

If you wrote data into p->payload and handed it to the DMA with HAL_ETH_Transmit(), but SCB_CleanDCache_by_Addr() was not called:

  • The CPU has written the data only to the cache

  • The DMA fetches old or empty data from RAM

Result: corrupted ping replies, empty UDP packets, silent communication errors

Solution: Clean the Cache

You must clean the cache before the DMA transmission. This code snippet should be added to the low_level_output() function in ethernetif.c. However, because CubeMX regenerates this file frequently, it is advisable to rename the function to low_level_output_internal() and move the real low_level_output() function to a user file such as ethernetif_user.c:

Warning: This clean step is required regardless of whether an RTOS is used. In FreeRTOS-based projects the D-Cache is also active, and manual synchronization is required before DMA operations.

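/* Expand the payload range to whole 32-byte cache lines. */
uint32_t addr = (uint32_t)p->payload;
uint32_t size = p->len;   /* length of this pbuf only, not the whole chain */
uint32_t aligned_addr = addr & ~0x1FU;                                  /* round start down */
uint32_t aligned_size = ((addr + size + 31U) & ~0x1FU) - aligned_addr;  /* round end up */

/* Write dirty cache lines back to RAM so the Ethernet DMA reads current data. */
SCB_CleanDCache_by_Addr((uint32_t *)aligned_addr, (int32_t)aligned_size);

HAL_ETH_Transmit(&heth, &TxConfig, ETH_DMA_TRANSMIT_TIMEOUT);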

Summary: When to use which?

Data direction     Operation to perform
CPU ➔ DMA (Tx)     SCB_CleanDCache_by_Addr()
DMA ➔ CPU (Rx)     SCB_InvalidateDCache_by_Addr()

This simple but vital detail is essential for Ethernet communication on the STM32H7 to work reliably.



How to create a project for STM32H7 with Ethernet and LwIP stack working 

STM32H7 Series, Ethernet, LwIP, D-Cache, RAM, DMA

5 REPLIES
SFleming
Associate

Thank you for sharing.

Do you have an example of the changes needed in low_level_input()?

According to your post, SCB_InvalidateDCache_by_Addr() would be called there.

Ping is broken on an H753ZI with LwIP code generated by STM32CubeIDE 1.18.0.

Also, is any specific DMA or BDMA configuration required in CubeMX?

I have already reviewed and tested the following article, with no success:

How to create a project for STM32H7 with Ethernet and LwIP stack working 

 

Answer (English):

Thank you for your question and detailed notes.

I also experienced similar issues on STM32H7 and can confirm that CubeMX does not always correctly generate or link the required LwIP files and configurations. Sometimes changes made in CubeMX UI (e.g., enabling ICMP or setting heap pointer) are not reflected in the generated code.

Recommendation:

Use git diff or another comparison tool after each CubeMX generation to ensure your LwIP configuration and buffer placements have been applied properly in the codebase. Some flags like MEM_SIZE, LWIP_RAM_HEAP_POINTER, NO_SYS, and even ICMP settings are often ignored silently by CubeMX.


Regarding low_level_input() and SCB_InvalidateDCache_by_Addr():

You do not need to add SCB_InvalidateDCache_by_Addr() manually if your project uses HAL_ETH_RxLinkCallback() correctly.

The HAL driver in stm32h7xx_hal_eth.c calls this callback for each received buffer, and the CubeMX-generated implementation of it (typically in ethernetif.c) already performs cache invalidation on the received buffer. To confirm this:

  • Put a breakpoint inside HAL_ETH_RxLinkCallback()

  • Then ping the board

  • If the breakpoint hits, you're safe — cache invalidation is already handled.
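
For readers who want to see what the Rx-side invalidation amounts to, here is a hedged host-side sketch. It is not the CubeMX-generated callback. The helpers rx_buffer_is_line_aligned() and rx_invalidate() are hypothetical names, and SCB_InvalidateDCache_by_Addr() is replaced by a stand-in that only records its arguments so the code compiles off-target. The sketch assumes a 32-byte cache line and checks that the receive buffer covers whole lines, because invalidating a partially used line can discard unrelated data that shares it.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* Cortex-M7 D-Cache line size in bytes */

/* Host-side stand-in (assumption) for the CMSIS function; it only records
 * the last request. On target, the real CMSIS function is used instead. */
static uintptr_t last_inv_addr;
static int32_t   last_inv_size;
static void SCB_InvalidateDCache_by_Addr(uint32_t *addr, int32_t dsize)
{
    last_inv_addr = (uintptr_t)addr;
    last_inv_size = dsize;
}

/* Hypothetical helper: returns 1 if the buffer starts on a line boundary and
 * its capacity is a whole number of lines, 0 otherwise. */
static int rx_buffer_is_line_aligned(const uint8_t *buff, uint32_t capacity)
{
    return (((uintptr_t)buff % CACHE_LINE_SIZE) == 0U) &&
           ((capacity % CACHE_LINE_SIZE) == 0U);
}

/* Hypothetical helper: invalidate the cache lines holding a received frame
 * of `length` bytes. Returns 0 on success, -1 if the buffer is unsuitable. */
static int rx_invalidate(uint8_t *buff, uint32_t capacity, uint16_t length)
{
    uint32_t rounded;

    if (!rx_buffer_is_line_aligned(buff, capacity) || (length > capacity)) {
        return -1;
    }
    /* Round the frame length up to whole lines; this stays within capacity
     * because capacity is itself a multiple of the line size. */
    rounded = ((uint32_t)length + CACHE_LINE_SIZE - 1U) & ~(CACHE_LINE_SIZE - 1U);
    SCB_InvalidateDCache_by_Addr((uint32_t *)(void *)buff, (int32_t)rounded);
    return 0;
}
```

With a line-aligned buffer of 1536 bytes and a 100-byte frame, the sketch requests invalidation of 128 bytes; a buffer that starts 2 bytes into a line is rejected.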


On USE_HAL_ETH_REGISTER_CALLBACKS

Set USE_HAL_ETH_REGISTER_CALLBACKS to 0, not 1.
Why? When it is set to 1, you must manually register all callbacks, including Rx and Tx. If you do not do that explicitly, they will not be called and nothing will work. Setting it to 0 makes the HAL call the default callback functions directly, so no registration is needed.
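
As a small illustration, the setting lives in stm32h7xx_hal_conf.h. The compile-time guard below is a hypothetical addition (for instance at the top of a user file such as ethernetif_user.c), offered as one way to notice if regenerated code changes the value; it is not something CubeMX emits.

```c
/* stm32h7xx_hal_conf.h (excerpt): the HAL calls the default callbacks directly. */
#define USE_HAL_ETH_REGISTER_CALLBACKS 0U

/* Hypothetical guard in user code: stop the build if the setting is changed. */
#if (USE_HAL_ETH_REGISTER_CALLBACKS != 0U)
#error "USE_HAL_ETH_REGISTER_CALLBACKS must be 0 for this setup"
#endif
```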


Finally: low_level_output() cache clean

Before calling HAL_ETH_Transmit(), you must add:

SCB_CleanDCache_by_Addr(...);

Otherwise, the buffer written by the CPU is only in cache and not visible to the DMA.
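
The wrapper idea from the original post can be sketched as follows, under stated assumptions. The struct pbuf, struct netif, err_t, SCB_CleanDCache_by_Addr() and low_level_output_internal() definitions here are simplified host-side stand-ins so the example compiles on a PC; on target they come from LwIP, CMSIS, and the renamed CubeMX function. Unlike the short snippet in the post, this sketch walks the whole pbuf chain, since a frame may be split across several pbufs.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Host-side stand-ins (assumptions, not the real LwIP/CMSIS types) ---- */
struct pbuf { struct pbuf *next; void *payload; uint16_t tot_len; uint16_t len; };
struct netif { int unused; };
typedef int8_t err_t;
#define ERR_OK 0

static int     clean_calls;  /* number of clean requests seen by the stand-in */
static int32_t clean_bytes;  /* total bytes requested for cleaning */
static void SCB_CleanDCache_by_Addr(uint32_t *addr, int32_t dsize)
{
    (void)addr;
    clean_calls++;
    clean_bytes += dsize;
}

/* Stands in for the CubeMX-generated function after it has been renamed. */
static err_t low_level_output_internal(struct netif *netif, struct pbuf *p)
{
    (void)netif;
    (void)p;
    return ERR_OK;
}

/* ---- Sketch of the user-defined wrapper ---- */
err_t low_level_output(struct netif *netif, struct pbuf *p)
{
    const uintptr_t mask = 31U;  /* 32-byte cache line minus one */

    /* Clean every buffer in the chain, expanded to whole cache lines. */
    for (struct pbuf *q = p; q != NULL; q = q->next) {
        uintptr_t start = (uintptr_t)q->payload & ~mask;
        uintptr_t end   = ((uintptr_t)q->payload + q->len + mask) & ~mask;
        SCB_CleanDCache_by_Addr((uint32_t *)start, (int32_t)(end - start));
    }
    /* Hand over to the original transmit path, which calls HAL_ETH_Transmit(). */
    return low_level_output_internal(netif, p);
}
```

Keeping the clean step in a separate user file means a CubeMX regeneration only requires re-applying the rename of the generated function, not the cache handling itself.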

 

 

Hello @Ahmet Yasin CİVAN 

Thank you for sharing. Before pasting text, though, please make sure that all of it is in English.

Your text mixes English and Turkish.

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.

Hello,

Thank you for the feedback. Initially, I included both English and Turkish versions in my response — English first, then Turkish — to support Turkish-speaking developers who may be facing the same STM32H7 Ethernet-related issue.

However, after your note, I removed the Turkish section and kept only the English version to align with the forum’s guidelines.

Appreciate your understanding!

Best regards,
Ahmet Yasin CİVAN

OK, no problem, but in that case it is better to point out that two languages have been used in the post.

It is also better to differentiate them with different colors.

Example:

English language:

.....

Turkish language:

....
