When the SPI is in slave mode (clocked externally, NSS low) and the Tx buffer is empty (SPI_SR.TXE is set), writing a byte/halfword to SPI_DR during the SCK period in which the first bit should be output (the bit period bounded by two consecutive edges opposite to the sampling edge) may cause the master to receive a corrupted first bit.
This is because, whenever in that period the write occurs, the first bit is driven onto MISO immediately, which can violate the setup time to the sampling edge or even put the first bit on the line only after the sampling edge.
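To make the window concrete, here is a toy timing model of the first bit in mode 3 (CPOL=1, CPHA=1). It only encodes the behaviour described above (a write while TXE is set drives the first bit immediately); the function name, the arbitrary time units and the receiver setup parameter t_su are hypothetical, and real silicon timing is not modelled.

```c
#include <assert.h>

/* Outcome for the first bit in this toy model. */
typedef enum {
    FIRST_BIT_OK,               /* stable on MISO for at least t_su before sampling */
    FIRST_BIT_SETUP_VIOLATION,  /* driven before the sampling edge, but too late */
    FIRST_BIT_AFTER_SAMPLE      /* driven at/after the sampling edge: stale level sampled */
} FirstBit;

/* Mode 3 (CPOL=1, CPHA=1), times in arbitrary units:
   t_out    first falling (output) edge after NSS goes low
   t_sample first rising (sampling) edge
   t_write  moment SPI_DR is written while TXE is set
   t_su     receiver setup time (hypothetical parameter)
   Assumption taken from the report: a write while TXE is set drives the
   first bit onto MISO immediately. A write before the output edge is
   assumed to be on the line by that edge, so the bit is treated as valid
   from the later of t_out and t_write. Only writes up to the end of the
   first bit period are meaningful here. */
FirstBit classify_first_bit(long t_out, long t_sample, long t_write, long t_su)
{
    assert(t_out < t_sample);
    long t_drive = (t_write > t_out) ? t_write : t_out;

    if (t_drive >= t_sample)
        return FIRST_BIT_AFTER_SAMPLE;
    if (t_sample - t_drive < t_su)
        return FIRST_BIT_SETUP_VIOLATION;
    return FIRST_BIT_OK;
}
```

For example, with t_out = 0, t_sample = 500 and t_su = 20, a write at t = 490 leaves only 10 units of setup (FIRST_BIT_SETUP_VIOLATION), and a write at t = 510 lands after the sampling edge (FIRST_BIT_AFTER_SAMPLE).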
I have tested only the CPOL=1, CPHA=1 combination (SPI mode 3).
The attached code reproduces the problem on an STM32F4 DISCO board with PB10-PB12 shorted (GPIO output -> NSS), PB11-PB13 shorted (GPIO output -> SCK) and PB14-PB15 shorted (MISO -> MOSI). The status bits are stored edge by edge in the tss array, to be inspected in a debugger once execution reaches the while(1) loop. The delays between signal edges are deliberately long to rule out any slow-APB/fast-SCK issues; the startup code does not touch the clock configuration, so the chip runs from the HSI with the default RCC settings.
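The reproduction procedure can be sketched as follows. This is not the attached code: the Hooks structure, the names run_frame and write_after_edge, and the host stubs are hypothetical, chosen so that the mode 3 edge sequencing and the edge-by-edge capture into tss can be checked off-target. On the DISCO board the hooks would instead drive PB10/PB11, write SPI_DR and read SPI_SR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks. On the DISCO board they would drive PB10
   (-> NSS) and PB11 (-> SCK), write SPI_DR and read SPI_SR; here they are
   function pointers so the sequencing can be checked on a host. */
typedef struct {
    void (*set_nss)(int level);
    void (*set_sck)(int level);
    void (*write_dr)(uint8_t value);
    uint16_t (*read_status)(void);
    void (*delay)(void);          /* deliberately long delay between edges */
} Hooks;

/* Common work after every edge: optionally write DR, wait, take a snapshot. */
static size_t after_edge(const Hooks *h, int edge, int write_after_edge,
                         uint8_t value, uint16_t *tss, size_t n, size_t cap)
{
    if (edge == write_after_edge)
        h->write_dr(value);
    h->delay();
    if (n < cap)
        tss[n++] = h->read_status();
    return n;
}

/* Bit-bang one 8-bit frame in mode 3 (CPOL=1, CPHA=1). Edge 0 is NSS going
   low; edges 1, 3, ..., 15 are falling (output) edges and 2, 4, ..., 16 are
   rising (sampling) edges. write_after_edge = 1 places the DR write inside
   the first-bit window. Returns the number of snapshots stored (17 if cap
   allows). */
size_t run_frame(const Hooks *h, uint8_t value, int write_after_edge,
                 uint16_t *tss, size_t cap)
{
    assert(h != NULL && (tss != NULL || cap == 0));
    size_t n = 0;
    int edge = 0;

    h->set_sck(1);                 /* CPOL=1: SCK idles high */
    h->set_nss(0);                 /* select the slave */
    n = after_edge(h, edge++, write_after_edge, value, tss, n, cap);
    for (int bit = 0; bit < 8; ++bit) {
        h->set_sck(0);             /* output edge */
        n = after_edge(h, edge++, write_after_edge, value, tss, n, cap);
        h->set_sck(1);             /* sampling edge */
        n = after_edge(h, edge++, write_after_edge, value, tss, n, cap);
    }
    h->set_nss(1);                 /* deselect */
    return n;
}

/* Host stubs: status = (DR written << 2) | (NSS << 1) | SCK, standing in
   for a real SPI_SR read. */
static int g_nss = 1, g_sck = 1, g_writes = 0;
static uint8_t g_last_dr = 0;
static void stub_set_nss(int level) { g_nss = level; }
static void stub_set_sck(int level) { g_sck = level; }
static void stub_write_dr(uint8_t v) { g_last_dr = v; ++g_writes; }
static uint16_t stub_status(void)
{
    return (uint16_t)(((g_writes > 0) << 2) | (g_nss << 1) | g_sck);
}
static void stub_delay(void) { }

static const Hooks host_hooks = {
    stub_set_nss, stub_set_sck, stub_write_dr, stub_status, stub_delay
};
```

With the host stubs, run_frame(&host_hooks, 0xA5, 1, tss, 17) leaves tss[1] == 0x4 (DR written, NSS low, SCK low after the first output edge) and tss[2] == 0x5 (first sampling edge), which is the kind of edge-by-edge trace described above.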
The attached spreadsheet shows how the signals and data change from one SCK edge to the next (an empty bit cell means 0, which IMO is easier to read). I am happy to explain further if needed.