I don't have any material I can share. This is a space in which I work commercially, and ST provides me with no material assistance/compensation for my work here. I tend to build mine using the HAL directly, with GNU/GCC and make.
I would suggest initially making a standalone application to bring up, read, write and erase the flash. Have this be instrumented so you can test and validate functionality. Once you have that working and debugged, migrate it to an External Loader. The External Loaders are a lot harder to debug, and are more like a Windows DLL in their operation, with a number of defined entry points to implement specific functionality, and without main() or interrupt support. They can use whatever hardware you have on your board, so they can output diagnostics/telemetry to an available UART, indicate success/failure with RED/GREEN LEDs, or whatever you choose.
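As a rough illustration of the "instrumented application first" approach, here is a minimal sketch of an erase/blank-check/write/read-back self-test. Everything in it is an assumption for illustration: the function names (flash_init, flash_erase_sector, flash_write, flash_read, flash_selftest), the geometry, and the RAM array standing in for the external flash so it can run on a PC. On real hardware the four driver bodies would be replaced with your QSPI/OSPI code, and printf would be retargeted to a UART.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical geometry; take the real values from the flash datasheet. */
#define FLASH_SIZE   (64u * 1024u)
#define SECTOR_SIZE  (4u * 1024u)

/* RAM stand-in for the external flash so the sketch runs on a host.
   On target, the driver bodies below would talk to the real part. */
static uint8_t sim_flash[FLASH_SIZE];

int flash_init(void)
{
    assert(FLASH_SIZE % SECTOR_SIZE == 0);
    memset(sim_flash, 0xFF, sizeof sim_flash); /* start from an erased state */
    return 0;
}

int flash_erase_sector(uint32_t addr)
{
    if (addr >= FLASH_SIZE) return -1;
    addr -= addr % SECTOR_SIZE;                /* align down to sector start */
    memset(&sim_flash[addr], 0xFF, SECTOR_SIZE);
    return 0;
}

int flash_write(uint32_t addr, const uint8_t *src, uint32_t len)
{
    if (addr > FLASH_SIZE || len > FLASH_SIZE - addr) return -1;
    for (uint32_t i = 0; i < len; i++)
        sim_flash[addr + i] &= src[i];         /* NOR programming only clears bits */
    return 0;
}

int flash_read(uint32_t addr, uint8_t *dst, uint32_t len)
{
    if (addr > FLASH_SIZE || len > FLASH_SIZE - addr) return -1;
    memcpy(dst, &sim_flash[addr], len);
    return 0;
}

/* Returns 1 if every byte in buf is 0xFF (erased), otherwise 0. */
static int is_blank(const uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        if (buf[i] != 0xFF) return 0;
    return 1;
}

/* Instrumented self-test: returns 0 on success, or the number of the
   step that failed so the fault can be located from the log. */
int flash_selftest(void)
{
    uint8_t pattern[256], readback[256];
    const uint32_t addr = SECTOR_SIZE;         /* test in the second sector */
    int step = 0;

    for (uint32_t i = 0; i < sizeof pattern; i++)
        pattern[i] = (uint8_t)(i ^ 0xA5u);

    step++; if (flash_init() != 0) goto fail;
    step++; if (flash_erase_sector(addr) != 0) goto fail;
    step++; if (flash_read(addr, readback, sizeof readback) != 0) goto fail;
    step++; if (!is_blank(readback, sizeof readback)) goto fail;
    step++; if (flash_write(addr, pattern, sizeof pattern) != 0) goto fail;
    step++; if (flash_read(addr, readback, sizeof readback) != 0) goto fail;
    step++; if (memcmp(pattern, readback, sizeof pattern) != 0) goto fail;
    step++; if (flash_erase_sector(addr) != 0) goto fail;
    step++; if (flash_read(addr, readback, sizeof readback) != 0) goto fail;
    step++; if (!is_blank(readback, sizeof readback)) goto fail;

    printf("flash self-test: PASS\n");
    return 0;

fail:
    printf("flash self-test: FAIL at step %d\n", step);
    return step;
}
```

Call flash_selftest() from main() in the test application. Once it passes on the real part, the same four driver functions can be called from the loader's entry points, which keeps the code you debugged in the application and the code running in the loader the same.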
There are some relatively poor examples in the STM32CubeProgrammer file tree. You might also want to look at Keil's FLASH / FLM implementation, as it might have better documentation and examples, and it is what ST's model is derived from.