2018-05-20 12:28 PM
On an STM32F103 nucleo board I occasionally find my program locked in a loop waiting for the SPI2's BSY bit to become cleared.
Honestly, I have no idea what is happening here, and I am struggling to track the issue down. What I have found out so far:
Errata: There might be the possibility of a silicon issue. I found a couple of related discussions:
Optimization: The issue arises depending on the level of optimization requested from the (official) ARM-GCC. The code in question is:
static void txbuf(uint8_t *data, uint16_t n)
{
    while (n--) {
        while (!(SPI2->SR & SPI_SR_TXE));
        SPI2->DR = *data++;
    }
}
With optimization turned off, everything works. With -O1 or -O2 enabled, I found that the SPI peripheral actually generates 16 clocks per iteration of the loop. In the disassembly, the write to DR translates to
08002008: strh r2, [r4, #12]
where r4 is the SPI2 base address and r2 is the byte to transmit. Every time I single-step this instruction, I can see 16 clocks on the oscilloscope. I also verified that DFF is 0. Setting DFF to 1 causes it to output 32 clocks per instruction.
SPI clock: The SPI2 peripheral is clocked from APB1, which is running at 36MHz, with no additional prescaling beyond the SPI's minimum baud rate divider of 2. System clock is 72MHz. The SPI clock thus is 18MHz, which I can also observe on the oscilloscope. Choosing a larger prescaler makes everything work at any level of optimization.
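For reference, a minimal sketch of the clock arithmetic above, assuming the usual STM32F1 mapping where BR[2:0] in SPI_CR1 divides the APB clock by 2^(BR+1). The helper name spi_clock_hz is made up for illustration and is not part of any ST header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: SPI bit clock for a given APB clock and BR[2:0] value.
 * Assumes BR = 0 selects f_PCLK/2 and BR = 7 selects f_PCLK/256. */
static uint32_t spi_clock_hz(uint32_t pclk_hz, unsigned br)
{
    assert(br <= 7u);               /* BR is a 3-bit field */
    return pclk_hz >> (br + 1u);    /* divide by 2^(BR+1) */
}
```

With the 36MHz APB1 clock, spi_clock_hz(36000000u, 0) gives 18MHz and spi_clock_hz(36000000u, 6) gives 281.25kHz.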
Update May 21
https://community.st.com/people/Waclawek.Jan
I am using GCC, too, from the official ARM release. The assembly looks almost the same for any level of optimization chosen:
8002522: 8923 ldrh r3, [r4, #8]
8002524: f013 0f02 tst.w r3, #2
8002528: d0fb beq.n 8002522 <command+0x62>
800252a: f812 3b01 ldrb.w r3, [r2], #1
800252e: 81a3 strh r3, [r4, #12]
8002530: 42aa cmp r2, r5
8002532: d1f6 bne.n 8002522 <command+0x62>
r4 is the SPI2 memory base and r2 is the buffer to be transmitted. It first spins until the TXE bit becomes set, then transfers one byte at a time from memory to the DR. The loop I got stuck in then looks like this:
255 while (!(SPI2->SR & SPI_SR_TXE));
08002818: ldr r2, [pc, #164] ; (0x80028c0 <sd_enable+172>)
0800281a: ldrh r3, [r2, #8]
0800281c: lsls r1, r3, #30
0800281e: bpl.n 0x800281a <sd_enable+6>
256 while (SPI2->SR & SPI_SR_BSY);
08002820: ldr r2, [pc, #156] ; (0x80028c0 <sd_enable+172>)
08002822: ldrh r3, [r2, #8]
08002824: lsls r3, r3, #24
08002826: bmi.n 0x8002822 <sd_enable+14> <--- Stuck here
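While debugging, it may help to bound the wait so the program can report the register state instead of hanging. The following is only a sketch: wait_bits_clear is a hypothetical helper, the poll count is arbitrary, and it assumes the status register is declared as a 16-bit volatile (as the ldrh in the listing suggests; newer CMSIS headers may use 32 bits, in which case the pointer type needs adjusting).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the original post: poll a status register
 * until all bits in `mask` read as zero, giving up after `max_polls` reads. */
static bool wait_bits_clear(const volatile uint16_t *sr, uint16_t mask,
                            uint32_t max_polls)
{
    assert(sr != NULL);
    while (max_polls--) {
        if ((*sr & mask) == 0u)
            return true;    /* flag cleared, e.g. BSY went low */
    }
    return false;           /* still set: report instead of hanging */
}
```

On the target this could be called as wait_bits_clear(&SPI2->SR, SPI_SR_BSY, 100000u), dumping SR, CR1 and CR2 when it returns false.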
SPI configuration: I think I found out something else. In this application the SPI2 is first configured at 280kHz or so. Some communication then goes on successfully, including the waiting for the BSY flag.
The SPI is then reconfigured to a faster speed like so:
SPI2->CR1 &= ~SPI_CR1_SPE;
SPI2->CR1 &= ~SPI_CR1_BR;
SPI2->CR1 |= SPI_CR1_SPE | BR_TRANS;
And then the weird things begin.
However, when I insert a NOP instruction between lines 2 and 3 everything is fine. The assembly again looks unsuspicious:
732 SPI2->CR1 &= ~SPI_CR1_SPE;
080029b0: ldrh r2, [r3, #0]
080029b2: bic.w r2, r2, #64 ; 0x40
080029b6: lsls r2, r2, #16
080029b8: lsrs r2, r2, #16
080029ba: strh r2, [r3, #0]
733 SPI2->CR1 &= ~SPI_CR1_BR;
080029bc: ldrh r2, [r3, #0]
080029be: bic.w r2, r2, #56 ; 0x38
080029c2: lsls r2, r2, #16
080029c4: lsrs r2, r2, #16
080029c6: strh r2, [r3, #0]
734 SPI2->CR1 |= SPI_CR1_SPE | BR_TRANS;
080029c8: ldrh r2, [r3, #0]
080029ca: orr.w r2, r2, #64 ; 0x40
080029ce: strh r2, [r3, #0]
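One variation that could be tried is to drain the peripheral before disabling it and to write the new BR bits and SPE in a single store. As I read the reference manual, its disable procedure asks for TXE set and BSY clear before SPE is cleared. The sketch below uses a hypothetical two-register view (spi_regs_t) and a made-up function name so that it can be compiled off-target; the bit masks are the ones visible in the disassembly. Whether this avoids the lock-up described here is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit masks as seen in the disassembly (STM32F1 SPI). */
#define CR1_BR_MASK 0x0038u /* BR[2:0], bits 5:3 */
#define CR1_SPE     0x0040u /* bit 6 */
#define SR_TXE      0x0002u /* bit 1 */
#define SR_BSY      0x0080u /* bit 7 */

/* Hypothetical minimal register view; on the target the CMSIS
 * SPI_TypeDef would take its place. */
typedef struct {
    volatile uint16_t cr1;
    volatile uint16_t sr;
} spi_regs_t;

/* Sketch: drain, disable, then write the new BR and SPE in one store. */
static void spi_set_baud(spi_regs_t *spi, uint16_t br_bits)
{
    assert(spi != NULL);
    assert((br_bits & (uint16_t)~CR1_BR_MASK) == 0u); /* only BR bits allowed */

    while (!(spi->sr & SR_TXE)) { } /* transmit buffer empty */
    while (spi->sr & SR_BSY) { }    /* last frame fully shifted out */

    uint16_t cr1 = spi->cr1;
    spi->cr1 = (uint16_t)(cr1 & ~CR1_SPE);                      /* disable */
    cr1 = (uint16_t)((cr1 & ~CR1_BR_MASK) | br_bits | CR1_SPE);
    spi->cr1 = cr1;                                             /* new rate + enable */
}
```

Note that the wait on BSY in this sketch is unbounded, so it would hang in the same way if BSY never clears.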
If the system clock is twice as fast as the peripheral clock (SYSCLK is 72MHz, APB1 clock is 36MHz) are there any synchronization constraints?
Any clues appreciated. I will update this with further findings as appropriate...
#stm32f103 #spi
2018-05-20 02:21 PM
So, at the end of the day, you see twice as many clocks transmitted as you expect?
And how does this relate to the problem with BSY?
Please read out and post the content of the SPI registers and the disassembly (ideally mixed with source; I don't know how to achieve that in ARM's tools, I use gcc) of the relevant parts of the program (including the wait for busy).
JW
2018-05-20 04:12 PM
unsigned char SPI_t::transfer_receive(unsigned short data) {
    char RxSPI;
    // Clear_SPI1_nSS_Out();
    while (!(hspi1.Instance->SR & SPI_FLAG_TXE))
        ;
    *((__IO uint8_t *)&hspi1.Instance->DR) = data; // force the SPI to transceive 8 bit
    while (!(hspi1.Instance->SR & SPI_FLAG_TXE)) // wait to leave the first buffer
        ;
    while ((hspi1.Instance->SR & SPI_FLAG_BSY)) // wait to leave the chip
        ;
    while ((hspi1.Instance->SR & SPI_FLAG_RXNE)) // read out all bytes in Rx (usually double buffered)
        RxSPI = hspi1.Instance->DR; // read all Rx bytes, the last one is yours.
    // Set_SPI1_nSS_Out();
    return RxSPI;
}