Curious about the distance between your devices. I doubt that you have added line buffers, but a need for range extension would heighten interest and forum response. (You don't report any signal corruption - have you measured your BER, i.e. bit error rate?)
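If no BER figure is at hand, one rough way to get one is to send a known pattern, read it back, and count the mismatched bits. The following is only a minimal sketch under that assumption; the function names and buffers are hypothetical and are not taken from your code or from any STM32 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between the pattern sent (tx)
 * and the pattern read back (rx). Names are hypothetical. */
size_t count_bit_errors(const uint8_t *tx, const uint8_t *rx, size_t len)
{
    size_t errors = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(tx[i] ^ rx[i]); /* 1 bits mark mismatches */
        while (diff) {
            errors += diff & 1u;
            diff >>= 1;
        }
    }
    return errors;
}

/* Bit error rate = mismatched bits / total bits transferred. */
double bit_error_rate(const uint8_t *tx, const uint8_t *rx, size_t len)
{
    assert(len > 0);
    return (double)count_bit_errors(tx, rx, len) / (double)(len * 8u);
}
```

A long run at each cable length of interest gives a far more meaningful number than a short burst.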
Your most recent post notes that "the master instructs slave to construct." This does ensure current data - at the cost of "turn-around time" and perhaps leaving the slaves "too idle."
Lastly - I assume you gate the NSS line with the STM32's GPIOs to "CS" the individual slaves. Have you tried experimenting with a single master-slave pair - just to eliminate this gating as a factor?
CPLD to manage multiple slaves:
It appears that your technique and ours are substantially similar. We did one thing that may be of interest/value - while awaiting new, "multi-bit" SPI serial flash devices, we employed the CPLD to manage multiple SPI serial flash chips, operated in parallel.
May we ask if you use a "decode" function (like the HC138: 3 in -> 8 out), or decode the end address via a unique SPI bit pattern sent prior to remote SPI "engagement"?
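To make the two alternatives concrete, here is a small behavioural sketch of each. It is not your CPLD logic (which we have not seen) nor ours; the function names are hypothetical, and the second option assumes a one-byte address is sent ahead of the payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1: HC138-style decode (3 in -> 8 out).
 * Returns the eight select lines as a bit mask, bit n = output Yn.
 * Outputs are active low: only the addressed line goes to 0,
 * and all lines stay high while the decoder is disabled. */
uint8_t hc138_outputs(uint8_t addr, bool enabled)
{
    assert(addr < 8u);
    if (!enabled) {
        return 0xFFu;
    }
    return (uint8_t)~(1u << addr);
}

/* Option 2: address carried in the SPI stream itself.
 * Each slave compares the first byte received with its own
 * (hypothetical) address and engages only on a match. */
bool frame_selects_slave(uint8_t address_byte, uint8_t my_address)
{
    return address_byte == my_address;
}
```

The decoder costs three GPIO plus the part (or CPLD terms) but selects a slave with no extra SPI traffic; the in-band address saves pins but adds at least one byte of turn-around to every exchange.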