Is there a bug in the Newlib heap on the 32F417 Cube IDE library?

PHolt.1 · ‎2023-04-29

I posted details here

https://www.eevblog.com/forum/programming/help-needed-with-some-heap-test-code/

It works if you allocate the same size block each time, or if you start big and allocate progressively smaller ones. But if you start small and allocate progressively bigger ones, it fails at around half the available heap area.

Each block is freed right after allocation.
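For anyone who wants to reproduce the pattern, here is a minimal sketch of the "start small, grow each time" test described above. The function name heap_grow_test and its parameters are made up for illustration and are not taken from my actual test code; the sizes you pass in are up to you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate and immediately free blocks of
 * first, first+step, first+2*step, ... bytes, never exceeding last.
 * Returns 0 if every malloc() succeeded, otherwise the size that failed. */
size_t heap_grow_test(size_t first, size_t step, size_t last)
{
    assert(first > 0 && step > 0 && first <= last);

    for (size_t size = first; ; size += step) {
        char *p = malloc(size);
        if (p == NULL)
            return size;            /* first request that could not be met */
        memset(p, 0xA5, size);      /* touch the block so it is really used */
        free(p);                    /* freed before the next, larger request */

        if (last - size < step)     /* next size would exceed last: stop */
            break;
    }
    return 0;
}
```

On the target you would call it with last set to roughly the heap size and print the returned value; a non-zero result well below the heap size is the failure described here.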

I was suspecting a bug in _sbrk but the one I use also came from ST and is actually identical to every other one I've seen for embedded. Some look different in the way they pick up the heap base address and top (using symbols defined in the linkfile) and some stupidly limit the heap at the current value of SP, but all are basically the same as mine:

// This is used by malloc().
// The original Newlib version of this, on which the ST code was based
// https://github.com/zephyrproject-rtos/zephyr/blob/main/lib/libc/newlib/libc-hooks.c
// allowed the heap to go all the way up to the current SP value, which is stupid.
// This one sets the limit at the base (lowest memory address) of the stack area.
 
#include <errno.h>      // errno, ENOMEM
#include <sys/types.h>  // caddr_t

caddr_t _sbrk(int incr)
{
 
	// These two are defined in the linkfile
	extern char end asm("_end");	// end of BSS
	extern char top asm("_top");	// base of the general stack
 
	static char *heap_end;			// this gets initialised to NULL by C convention
	char *prev_heap_end;			// this gets initialised on 1st call here
 
	// This sets heap_end to end of BSS, on the first call to _sbrk
	if (heap_end == NULL)
		heap_end = &end;
 
	prev_heap_end = heap_end;
 
	// top = top of RAM minus size of stack
	if ( (heap_end + incr) > &top )
	{
		errno = ENOMEM;				// not apparently used by anything
		return (caddr_t) -1;
	}
 
	heap_end += incr;
 
	return (caddr_t) prev_heap_end;
 
}
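
To check the bounds logic of the _sbrk above in isolation, it can be modelled on a PC against a static array. This is only a sketch: sim_sbrk, sim_heap_used and the 1024-byte arena are invented for illustration, it compares offsets rather than raw pointers, and the lower-bound check on negative increments is my addition and is not in the ST code.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_HEAP_SIZE 1024             /* assumed arena size, illustration only */

static char  sim_heap[SIM_HEAP_SIZE];  /* stands in for the RAM from _end to _top */
static char *sim_heap_end;             /* NULL until the first call, as in _sbrk */

/* Same idea as _sbrk: hand out the old break and advance it, or
 * return (void *)-1 and leave the break unchanged if it would not fit. */
void *sim_sbrk(ptrdiff_t incr)
{
    if (sim_heap_end == NULL)
        sim_heap_end = sim_heap;

    char *prev = sim_heap_end;
    ptrdiff_t used = sim_heap_end - sim_heap;
    assert(used >= 0 && used <= SIM_HEAP_SIZE);

    if (incr > (ptrdiff_t)SIM_HEAP_SIZE - used || incr < -used)
        return (void *)-1;             /* request refused, nothing changes */

    sim_heap_end += incr;
    return prev;
}

/* Bytes currently between the arena base and the break. */
size_t sim_heap_used(void)
{
    return sim_heap_end ? (size_t)(sim_heap_end - sim_heap) : 0;
}
```

The point of the model is that a refused request leaves the break where it was, so a smaller request afterwards can still succeed as long as it fits in what remains.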

My malloc() and free() functions are mutexed to make them thread safe. The original ST libc.a had empty stubs for mutex functions but since the whole libc.a was not "weak" it had to be weakened with objcopy and then proper calls to FreeRTOS mutexes could be implemented. Accordingly, the _sbrk is also mutex protected. However I am running this test single threaded.

ST do not supply the source for libc.a (and there are many libc.a libs to choose from according to the CPU and whether newlib-nano etc) but I found various candidate sources. Unfortunately all heap sources are massive, 10k lines plus.

Pavel A. · ‎2023-04-30

Do you see the _sbrk failing at all? The default _sbrk in ST templates is simplistic; various STM32s have several RAMs, so the stack and the malloc heap can indeed be in non-contiguous areas.

ST really needs to publish a good app note on how they build newlib and from which source base, as there are already several forks or clones.

PHolt.1 · ‎2023-04-30

Yes; I posted further data in the original link.

Whenever malloc() fails, _sbrk() failed immediately before. I think that is how it is supposed to work.

14388: malloc sbrk, incr=32768, ret=ffffffff

14390: malloc failed, bk=29900

ST supply a huge number of libc.a variants, and while one can rule out most of these as the one actually used, one is still left with a few candidates. Then there is a load of source code around the internet. For example, I found that the printf lib is a newlib one dated 1990; I was able to establish that by comparing the Cube disassembly with the source code. I haven't identified the heap code for sure. This is from my own doc:

The suggested candidate source code

https://github.com/devkitPro/newlib/blob/master/newlib/libc/stdlib/_mallocr.c

appears to not be the one (unless fixed since) because the problem described here

https://stackoverflow.com/questions/39088598/malloc-in-newlib-does-it-waste-memory-after-one-big-failure-allocation/76138157#76138157

(which does refer to the above source) is not present in ours. A malloc of > heap space fails but subsequent reduced mallocs work fine. A more likely candidate sourcecode is the “nano”

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/nano-mallocr.c;h=13b72c99ffd7007b53c2e3270a56da237857742a;hb=HEAD

but this has another issue:

https://www.eevblog.com/forum/programming/help-needed-with-some-heap-test-code/msg4838741/#msg4838741

the conditions for which remain to be clarified.

Also, I have not selected "nano" in my build options.

The problem is that the bug suggested today in the eevblog thread would be quite serious.

PHolt.1 · ‎2023-04-30

Problem solved - see above link.

One has to allocate (and free) the biggest possible block, at product startup. The heap code then works as it should afterwards.
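
A minimal sketch of that startup step, assuming you do not know the exact largest block in advance and search downwards for it. The name heap_prime and its parameters are hypothetical, not from my project. It relies on the behaviour noted earlier in this thread, that an oversized malloc fails but subsequent smaller ones still work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical startup helper: find the largest block malloc() will hand
 * out, searching downwards from max_size in steps of step, free it again,
 * and return its size. Returns 0 if nothing could be allocated. */
size_t heap_prime(size_t max_size, size_t step)
{
    assert(step > 0);

    for (size_t size = max_size; size > 0;
         size = (size > step) ? size - step : 0) {
        void *p = malloc(size);
        if (p != NULL) {
            free(p);                /* give it straight back to the heap */
            return size;            /* size of the block that was obtained */
        }
    }
    return 0;
}
```

Call it once, before anything else uses the heap, with max_size set to the size of the heap region from the linkfile. The return value is also a handy sanity check on how much heap is really available.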

Re the ST libc.a sources and documenting which are used, I found this

https://nadler.com/embedded/newlibAndFreeRTOS.html

which suggests that Cube MX has access to the sources and compiles them according to options like configUSE_NEWLIB_REENTRANT. Having spent so much time replacing e.g. the crappy newlib printf family in libc.a (which wasn't "weak", so I had to weaken it with objcopy first), this totally amazed me. Is it really true that MX builds these libs according to FreeRTOSConfig.h? The sources are not in c:\ST; only the compiled libs are there.