I'm not convinced you are approaching the problem correctly.
Ask yourself why you need this level of guidance to succeed in a task.
The DMA and POLLED use cases for SPI aren't hugely different. You aren't changing the chip select across the transfer, and you're using the machine to iterate a block of data over the channel rather than walking it manually.
Find other SPI DMA examples and APPLY them here.
Generally you'll want the Write/Read together, as the writes generate the clocks and the interface is symmetrical. When READing blocks from the media, break the transfer down into 512-byte units so your output buffer of dummy bytes doesn't get too large. Using two DMA channels uses more resources and is more cumbersome, but again this is the path you have chosen.
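As a rough sketch of the 512-byte chunking described above, something like the following could work. This is not code for any specific STM32: `spi_xfer_fn`, `spi_read_chunked` and `mock_xfer` are hypothetical names, the 0xFF dummy value is an assumption (check what your media expects), and on real hardware the transfer hook would start the TX and RX DMA and wait for the RX side to complete, with chip select held low by the caller throughout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPI_CHUNK_SIZE 512u  /* one 512-byte unit per transfer */

/* Hypothetical full-duplex hook: clocks out len bytes from tx while
 * capturing len bytes into rx. Returns 0 on success.
 * On a real part this would kick off the TX and RX DMA together. */
typedef int (*spi_xfer_fn)(const uint8_t *tx, uint8_t *rx, size_t len);

/* Read len bytes by sending dummy bytes, at most SPI_CHUNK_SIZE at a time,
 * so the dummy buffer stays fixed at 512 bytes regardless of len. */
int spi_read_chunked(spi_xfer_fn xfer, uint8_t *dst, size_t len)
{
    static uint8_t dummy[SPI_CHUNK_SIZE];
    memset(dummy, 0xFF, sizeof dummy);   /* assumed idle/dummy value */

    while (len > 0) {
        size_t n = (len < SPI_CHUNK_SIZE) ? len : SPI_CHUNK_SIZE;
        int rc = xfer(dummy, dst, n);    /* write and read happen together */
        if (rc != 0)
            return rc;
        dst += n;
        len -= n;
    }
    return 0;
}

/* Host-side stand-in for the hardware, only to exercise the chunking. */
size_t mock_calls = 0;
size_t mock_max_len = 0;

int mock_xfer(const uint8_t *tx, uint8_t *rx, size_t len)
{
    mock_calls++;
    if (len > mock_max_len)
        mock_max_len = len;
    for (size_t i = 0; i < len; i++) {
        if (tx[i] != 0xFF)
            return -1;                   /* dummy bytes expected on TX */
        rx[i] = 0xA5;                    /* pretend data from the media */
    }
    return 0;
}
```

Swap `mock_xfer` for your own DMA-backed routine; the loop itself doesn't care whether the transfer underneath is polled or DMA, which is rather the point.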
See this example, or a similar one for the unspecified STM32 being used in each case.