memcpy corrupts data far away IN FRONT OF destination while parameters are within bounds.

Claegtun · 2024-02-10

Greetings,

This is a revival of my earlier question, refocused on the actual line of code itself. I also thought I might have better luck on this board.

We have a problem where a certain memcpy call overwrites memory elsewhere. This does not seem to be a simple overflow, because the corrupted location lies before the destination, i.e. at a lower address ('behind' rather than 'ahead'). Furthermore, when I look at the values of all the relevant parameters, and even at the registers at the assembly level, everything seems to be within bounds and in order.

The context of the memcpy is that we are copying the bytes of custom protobuf packets from the USB RX buffer in the CDC_Receive_HS callback into our own FIFO ring buffer.

What happens is that the memory at 0x2400162c, which happens to be hUsbDeviceHS.ep_out[1].rem_length, starts to get corrupted, and the corruption then spreads through all the memory after it, all the way to some important variables in rx_buffer, which break the code with a hard fault. The corruption seems to be endless: if it weren't for the hard fault, it would keep going to the end of the .bss section, and maybe even to the end of the whole D1 region.

After setting a watchpoint on 0x2400162c, I found which line of code was corrupting the data, but it is still unclear why. The line of source in question is:

memcpy(&rx_buffer.data[rx_buffer.back], &Buf[0], *Len);

in the CDC_Receive_HS callback where

rx_buffer.data is a 2048-byte array,
rx_buffer.back is 1135,
uint8_t* Buf and uint32_t* Len are arguments of the callback,
and *Len is 113.
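With back = 1135 and *Len = 113 the copy ends at index 1247, well inside the 2048-byte array. The memcpy line above does not, however, handle the wrap-around of a ring buffer, so a packet arriving near the end of data[] would run past data[2047]. That would overflow forwards, so it does not explain the backward corruption described here, but it is worth ruling out. Below is a minimal sketch of a wrap-aware push, assuming a hypothetical ring_buffer_t with front/back/count fields (the real rx_buffer type is not shown in the post). The entry assert catches a corrupted back index before memcpy ever uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 2048u  /* matches the 2048-byte rx_buffer.data */

/* Hypothetical layout; the actual rx_buffer type is an assumption. */
typedef struct {
    uint8_t  data[RX_BUFFER_SIZE];
    uint32_t front;  /* index of the oldest unread byte (advanced by the reader) */
    uint32_t back;   /* index where the next byte is written */
    uint32_t count;  /* number of unread bytes */
} ring_buffer_t;

/* Append up to len bytes; returns how many were stored (the excess is dropped). */
uint32_t ring_push(ring_buffer_t *rb, const uint8_t *src, uint32_t len) {
    /* Catch a corrupted index before it is used as a memcpy destination. */
    assert(rb->back < RX_BUFFER_SIZE && rb->count <= RX_BUFFER_SIZE);

    uint32_t free_space = RX_BUFFER_SIZE - rb->count;
    if (len > free_space) {
        len = free_space;  /* never overwrite unread data */
    }

    uint32_t first = RX_BUFFER_SIZE - rb->back;  /* room before the wrap point */
    if (first > len) {
        first = len;
    }
    memcpy(&rb->data[rb->back], src, first);         /* up to the end of data[] */
    memcpy(&rb->data[0], src + first, len - first);  /* wrapped remainder, may be 0 */

    rb->back = (rb->back + len) % RX_BUFFER_SIZE;
    rb->count += len;
    return len;
}
```

In the callback this would replace the single memcpy with something like ring_push(&rx_buffer, Buf, *Len), under the assumption that rx_buffer has the fields above.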

Everything seems to be within bounds. The exact line of assembly is:

08013662 strb.w r4, [r3, #1]!

where

r4 is 0xef,
and r3 is 0x24002c06, which points to rx_buffer.data[rx_buffer.back].

So far, this seems to be the storing half of the memcpy, and everything again seems to be in bounds.
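strb.w r4, [r3, #1]! is a pre-indexed store with writeback: it stores the low byte of r4 at r3 + 1 and then writes r3 + 1 back into r3. Depending on exactly when the core halted, r3 = 0x24002c06 is either the address just written or the one before it. Either way, the store lands about 5594 bytes (0x15DA) above the watched address 0x2400162c, which is what makes the watchpoint attribution so puzzling. The sketch below models that addressing mode with plain integers and checks whether an address falls inside the 113-byte destination range. It is only an illustration; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "strb.w Rt, [Rn, #imm]!": the effective address is Rn + imm,
   and that same address is written back into Rn. Returns the store address. */
uint32_t strb_preindexed_address(uint32_t *rn, int32_t imm) {
    *rn = *rn + (uint32_t)imm;  /* unsigned wrap-around also handles negative imm */
    return *rn;
}

/* True if addr lies inside [dst, dst + len). If addr < dst, the unsigned
   subtraction wraps to a large value, so the check correctly fails. */
bool addr_in_range(uint32_t addr, uint32_t dst, uint32_t len) {
    return addr - dst < len;
}
```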

Furthermore, I am fairly sure that this watchpoint break is the corruption event: when I continue the debugger (F8), it keeps hitting the watchpoint, and .rem_length (a uint32_t) accumulates bytes in little-endian order (the corruption seems to be byte-wise) until all four bytes are overwritten. It then hits the hard-fault breakpoint, by which point the corruption has spread to everything after it. I even put watchpoints on variables a bit further ahead to confirm the behaviour. So it does not seem to be a legitimate write to the .rem_length member.

The corrupting data also looks like random garbage, not the bytes of our custom packet, which contains a lot of repetition and 0x00s. The first non-zero byte written to 0x2400162c is sometimes 0xa1 or 0x5e, preceded by a varying number of 0x00s, so the location ends up as 0x...a100, 0x...a10000, 0x...a1, etc. I have not experimented enough to tell how consistent this is.
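The values above follow directly from little-endian byte order: a byte-wise writer walking upwards through rem_length fills the least significant byte first, so one 0x00 followed by 0xa1 reads back as 0x0000a100. A small sketch of that interpretation (the function name is illustrative, not from any library):

```c
#include <stdint.h>

/* Rebuild a uint32_t from four bytes in memory order, little-endian:
   b[0] (lowest address) is the least significant byte. */
uint32_t le32_from_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

For example, bytes {0x00, 0x00, 0xa1, 0x00} give 0x00a10000, matching the "0x...a10000" case.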

Also, an aside about the IDE: when I scroll up in the Disassembly view where the watchpoint has been hit, the blue arrow disappears and the address next to the strb.w line changes from 08013662 to 08013663, so I am not actually sure where the watchpoint is hit. Also, in one of the attachments, note the push two lines above. At first I thought the stack was overflowing there, but the stack pointer was 0x2407fb60, well within ._user_heap_stack.

This seems absolutely bizarre. Even @TDK can't find why. I am going to post on other forums (Stack Exchange, etc.) and maybe try some different things, like copying *Len to a local variable.

One caveat I didn't mention in the older thread is that the project is in C++, although ST's USB code, like the HAL code, is written in C. I am wondering whether mixing C and C++ breaks something, since the CDC_Receive_HS function is in a .c file and usbd_cdc_if.h is wrapped in extern "C". Would a temporary solution be to use a 'safer' C++ alternative to memcpy, if that is even possible given that the file is written in C (maybe by rewriting it as .cpp)? Or maybe use memory-to-memory DMA (which we may do anyway as an optimisation), or a simple for-loop?
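For the for-loop idea, here is a minimal sketch of a plain byte copy that reads the length once into a local, as suggested above. copy_bytes is a hypothetical name. Two hedged notes: extern "C" only changes linkage (it turns off C++ name mangling), so by itself it should not change how memcpy() behaves; and GCC at higher optimisation levels may recognise a loop like this and turn it back into a memcpy() call (the -fno-tree-loop-distribute-patterns option disables that transformation).

```c
#include <stdint.h>

/* Plain byte-by-byte copy as a debugging stand-in for memcpy().
   The length is read once, so a later change to *len cannot affect the loop. */
void copy_bytes(uint8_t *dst, const uint8_t *src, const uint32_t *len) {
    const uint32_t n = *len;  /* snapshot the length once */
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

In the callback this would be called as copy_bytes(&rx_buffer.data[rx_buffer.back], &Buf[0], Len). Keep in mind that changing the copy routine also changes timing, so a problem disappearing afterwards is not proof that memcpy() was at fault.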

I understand that this is more of a unique bug-fixing problem. This is the first time I have had to track down memory corruption/overflow of this complexity, so any guidance is appreciated.

Thanks,

Claegtun · 2024-02-10

Partially solved in the original thread.

https://community.st.com/t5/stm32-mcus-embedded-software/memory-is-corrupted-from-husbdevicehs-ep-out-1-rem-length/m-p/638148

tjaekel · 2024-02-11

Hard to believe that memcpy() has a bug. It is used all over the place, so it would fail in other places as well.

I suspect a different issue: stack size, or an INT handler doing something wrong...

If an INT happens, it needs a fair amount of stack (to save the registers). If the stack is too small, or the INT handler (whatever is done during the INT) corrupts the stack, then memcpy() could fail, especially if the registers saved during the INT are corrupted: popping all the registers back from a corrupted stack lets memcpy() keep going with wrong register contents.
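To put "a lot of stack" into numbers: on Cortex-M, exception entry makes the hardware push a basic frame of 8 words, or an extended frame of 26 words when the FPU context is active, plus possibly one alignment word. The sketch below assumes a Cortex-M7 with the FPU enabled, as on STM32H7 parts (the D1 region and USB HS suggest an H7, but that is an assumption). The struct names and the bound function are illustrative, and the bound ignores the handler's own locals and callee-saved pushes, which come on top.

```c
#include <stdint.h>

/* Basic exception frame pushed by hardware, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

/* Extended frame when the FPU context is stacked:
   basic frame, then S0-S15, FPSCR and one reserved word. */
typedef struct {
    basic_frame_t basic;
    uint32_t s[16];
    uint32_t fpscr;
    uint32_t reserved;
} extended_frame_t;

_Static_assert(sizeof(basic_frame_t) == 32, "basic frame is 8 words");
_Static_assert(sizeof(extended_frame_t) == 104, "extended frame is 26 words");

/* Rough upper bound on hardware-stacked bytes for `depth` nested interrupts,
   counting a possible 4-byte alignment word per frame. */
uint32_t stacked_bytes_upper_bound(uint32_t depth, int fpu_active) {
    uint32_t frame = fpu_active ? (uint32_t)sizeof(extended_frame_t)
                                : (uint32_t)sizeof(basic_frame_t);
    return depth * (frame + 4u);
}
```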

Regarding the correlation between "the corruption happens during memcpy()" and the watchpoint being triggered: I would not rely on the watchpoint. If an INT kicks in, the watchpoint might still "assume" that the memcpy() code is executing (but it is not: the INT handler code is).

If you change memcpy() to your own copy loop, it changes the timing: the INT (whose handler may be the root cause) now has a different timing relation, so things look different. Sometimes adding code and shuffling code around can seem to "solve" a problem, but only because the corruption now hits a different memory location, maybe one that does not affect the "moved code". I have had projects where a simple printf() "solved an issue".

You could try these things:

Increase the stack size: maybe you are running at the bottom of the stack, and when an INT happens, it corrupts other data in memory (other sections before the stack region; the stack might overflow during an INT).
Encapsulate the memcpy() with __disable_irq() and __enable_irq(): is your watchpoint now triggered on some other code line (not in memcpy() anymore)?
If it now complains that the instructions/code after memcpy() seem to be faulty, that would be a clear indication that your INT handler corrupts the memory.
Or disable the USB INT while you do memcpy().
Watch the stack usage (with "stack coloring") and check the memory layout: is something before the stack damaged? Do you find a register set to a wrong memory address (especially after an INT has interrupted)?
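"Stack coloring" means filling the unused stack with a known pattern early on and later checking how much of it is still intact. A minimal sketch follows, tested here on an ordinary buffer standing in for the stack region. On target, the region would be the stack area below the current stack pointer, painted at startup; the 0xA5 fill value and the function names are arbitrary choices for illustration. Because the Cortex-M stack grows downwards, untouched bytes remain at the low end. A live value that happens to equal the fill byte at the boundary makes the estimate slightly optimistic.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT_BYTE 0xA5u  /* arbitrary fill value (assumption) */

/* Fill the whole region with the paint byte. */
void stack_paint(uint8_t *region, size_t size) {
    for (size_t i = 0; i < size; i++) {
        region[i] = STACK_PAINT_BYTE;
    }
}

/* Count untouched bytes from the lowest address upwards. The stack grows
   down, so this is the headroom that has never been used. */
size_t stack_unused_bytes(const uint8_t *region, size_t size) {
    size_t n = 0;
    while (n < size && region[n] == STACK_PAINT_BYTE) {
        n++;
    }
    return n;
}
```

If the unused count drops to zero (or close to it) after the corruption appears, that points towards a stack overflow rather than memcpy().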

Claegtun · 2024-02-12

Edit: I see your point now.

Thanks,