2025-06-12 7:00 AM
This is just a friendly PSA/warning to others who may be using this code: do not name any global/extern variable instances "context" if you use hardware-accelerated vector rendering. If you do, you'll probably run into strange hard faults or data corruption issues.
The function `stencil_buffer_set_prealloc()` writes four bytes to the global symbol named "context". Because of how library linkage works, that write lands on my variable instead of the library's intended "context" variable, and the build produces no errors or warnings about the collision. In my code the library function writes the value 0x202CCC5C; from the value (and the behavior when I mess with it) this looks like a pointer to some memory in RAM. It appears to be related to vector font rendering (or possibly all vector rendering).
See this relevant Stack Overflow answer for details about how this problem arises. I was not aware of this pitfall, and from now on I will be religiously marking globals static and giving them module-specific names (I am usually pretty good about this already, but deadlines...).
I had a struct named "context" in one of my files to group variables for easier viewing in the debugger, and I forgot to mark it static even though I only used it inside that file. I also happened to have a bool at the top of the struct to indicate that my module was initialized. Thanks to struct padding, the four-byte write landed entirely on that bool and the padding after it, so no other data in the struct was modified. Every byte of the address it writes is non-zero, so the flag just held a non-zero value and the overwrite had no impact on my logic. I was blissfully unaware of this for months, and for the last several weeks I wasn't even using the flag, since I had switched to computing whether I am initialized from other state. As far as I knew this was just an unused flag variable, so I removed it while refactoring.
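To illustrate why the overwrite stayed hidden, here is a minimal sketch of the layout effect. The struct, its members, and the function names are hypothetical stand-ins (not my real module), and it assumes a typical ABI where `bool` is one byte and `uint32_t` is 4-byte aligned, so three padding bytes follow the flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the module struct: a bool first, then a
 * 4-byte-aligned member. On typical ABIs the compiler inserts three
 * padding bytes after the bool. */
struct module_state {
    bool     initialized;
    uint32_t counter;
};

/* Simulates the library's stray 4-byte write to the start of the struct. */
static void stray_write(void *target)
{
    const uint32_t value = 0x202CCC5Cu;  /* the value I observed */
    memcpy(target, &value, sizeof value);
}

/* True if the member after the padding is untouched by the stray write. */
bool counter_survives_stray_write(void)
{
    struct module_state s = { .initialized = false, .counter = 1234u };
    stray_write(&s);
    return s.counter == 1234u;
}

/* True if the byte holding the flag is non-zero after the stray write.
 * The byte is read through memcpy rather than as a bool, because a bool
 * object holding a value other than 0 or 1 is not safe to read directly. */
bool flag_byte_nonzero_after_stray_write(void)
{
    struct module_state s = { .initialized = false, .counter = 0u };
    unsigned char first_byte;
    stray_write(&s);
    memcpy(&first_byte, &s, 1);
    return first_byte != 0;
}
```

With this layout the write covers only the flag and its padding, which is why nothing else in the struct looked wrong. Put a larger object at offset 0 instead and the same four bytes corrupt real data.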
Unfortunately for me, I then put a TX_MUTEX at the top of the struct, and about lost my sanity as I slowly discovered that merely initializing the mutex was causing a hard fault. Of course I made this change alongside many other refactoring changes to the module, so naturally I spent hours ruling out everything else that actually makes sense (stack overflows, out-of-bounds array accesses, etc.) before I figured out what was going on.
In my case I got hard faults if this memory was modified. After some more debugging I noticed that if I skipped over all of the tx_mutex_create() initialization code except for the part where it memsets the struct to 0, my code would run, but all of my vector-rendered text vanished. I then noticed that my entire mutex was already zero except for the first four bytes. When I saw that the hex value was a valid RAM address, it immediately made me think of a pointer variable guarded by a NULL check. With that clue I put a hardware watchpoint on my struct and discovered the culprit.
Up to that point I was thinking that my mutex init was somehow corrupting the nema library's memory, not that it was stealing mine. I am honestly shocked that a closed-source library would use simple names like "context" for variables with external linkage. I looked through libnemagfx.a and found some other problematic names: "stencil" and possibly "lut" (it isn't linked from other .o files in the lib, but it is shown as a global). There are also "context_" and "stencil_" globals (all of these are defined in nema_vg_context.o).
I fixed this by making my struct static, and also by renaming it to something else.
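For anyone who wants the shape of the fix, here is a minimal sketch. The names `display_mgr_state`, `display_mgr_init()` and `display_mgr_is_initialized()` are made up for illustration and are not from my actual code or from the library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before (external linkage, collides with the library's global):
 *
 *     struct { bool initialized; uint32_t counter; } context;
 */

/* After: `static` gives the object internal linkage, so the linker can
 * never match it against a symbol in another object file or archive.
 * The module-specific name is a second line of defense in case the
 * `static` is ever dropped again. */
static struct {
    bool     initialized;
    uint32_t counter;
} display_mgr_state;

void display_mgr_init(void)
{
    display_mgr_state.counter = 0u;
    display_mgr_state.initialized = true;
}

bool display_mgr_is_initialized(void)
{
    return display_mgr_state.initialized;
}
```

Either change alone would have avoided the collision in my case; doing both is cheap insurance.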
IMHO this qualifies as a bug. Hopefully this comes up for anyone who runs into this in the future and saves some time.