Help with configuring DMA (with interrupts) for SPI on STM32L4R5 using assembly code

CHowa · 2019-07-28

I'm new to Cortex-M, STM chips, and the STM32L4R5 in particular, and am having problems getting this working. I have used ARM chips (since ARM2 :-), Atmel <x|mega>AVR and the Raspberry Pi Cortex-A chips...

I seem to be the only person who prefers assembly code over C / Cube / HAL etc... but that's a different matter :)

I think I have SPI configured correctly, because writing to SPI_DR works (I see SPI_SCK go active and SPI_MOSI shows the correct bits on the oscilloscope). [The STM32L4R5 is the master. The slave is silent. 2 MHz clock. 16-bit transfers. LSB first. Data captured on the leading (rising) clock edge. No NSS line used. Using SPI1 and DMA1_CH1.] But if I set things up for a DMA transfer from memory to SPI and enable DMA and SPI, nothing gets transferred.

1. enable GPIOA, DMA and SPI clocks in RCC.
2. set up SYSCLK and PLL in RCC to get 64 MHz.
3. set GPIO_MODE to AF for PA5 and PA7 and set GPIO_AFRL to AF5 for both pins.
4. set bits LSBFIRST | BR_DIV32 | MSTR | SSM | SSI in SPI1->CR1
5. set bit DS16 in SPI1->CR2
6. enable DMA1_CH1 interrupts in NVIC->ISER0 (bit 11)
7. set bits MSIZE_16 | PSIZE_16 | MINC | DIR | TCIE in DMA1->CCR1
8. store the destination address (SPI1 + SPI_DR) in DMA1->CPAR1
9. store DMAREQ_ID_SPI_TX (11) in DMAMUX->C0CR
10. send two 16-bit data chunks by:
    1. setting SPE in SPI1->CR1
    2. storing a half-word in SPI1->DR
    3. immediately storing another half-word in SPI1->DR
    4. waiting until FTLVL and then BSY in SPI1->SR are clear
    5. clearing SPE in SPI1->CR1
11. store the data source (memory) address in DMA1->CMAR1
12. store the count [n (half-words)] in DMA1->CNDTR1
13. set bit EN in DMA1->CCR1
14. set bit TXDMAEN in SPI1->CR2
15. set bit SPE in SPI1->CR1
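As a cross-check of the bit arithmetic in the SPI1->CR1, SPI1->CR2 and DMA1->CCR1 steps, here is a small C sketch that computes the values those steps should produce. The bit positions are my reading of the STM32L4+ reference manual (RM0432); the macro and function names are local to the sketch, not CMSIS. It assumes SPI1's APB2 clock is the full 64 MHz, which is what the 2 MHz SCK in the post implies. Comparing these numbers with what the debugger reads back from the real registers is a quick way to rule out a wrong bit definition.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as read from RM0432; names are local to this sketch. */
#define SPI_CR1_MSTR      (1u << 2)
#define SPI_CR1_BR_POS    3u
#define SPI_CR1_LSBFIRST  (1u << 7)
#define SPI_CR1_SSI       (1u << 8)
#define SPI_CR1_SSM       (1u << 9)
#define SPI_CR2_DS_POS    8u

#define DMA_CCR_EN        (1u << 0)
#define DMA_CCR_TCIE      (1u << 1)
#define DMA_CCR_DIR       (1u << 4)   /* 1 = read from memory, write to peripheral */
#define DMA_CCR_MINC      (1u << 7)
#define DMA_CCR_PSIZE_16  (1u << 8)   /* PSIZE[1:0] = 01 */
#define DMA_CCR_MSIZE_16  (1u << 10)  /* MSIZE[1:0] = 01 */

/* Master, software NSS held high (SSM + SSI), LSB first.
   BR selects SCK = fPCLK / 2^(BR+1). */
static uint32_t spi_cr1_value(uint32_t br)
{
    assert(br <= 7u);
    return SPI_CR1_LSBFIRST | (br << SPI_CR1_BR_POS) | SPI_CR1_MSTR
         | SPI_CR1_SSM | SPI_CR1_SSI;
}

/* DS[3:0] holds (frame size - 1); frames shorter than 4 bits are not allowed. */
static uint32_t spi_cr2_value(uint32_t frame_bits)
{
    assert(frame_bits >= 4u && frame_bits <= 16u);
    return (frame_bits - 1u) << SPI_CR2_DS_POS;
}

/* Memory -> peripheral, 16 bits on both sides, memory address incremented,
   transfer-complete interrupt. EN is left clear: it is set last. */
static uint32_t dma_ccr_value(void)
{
    return DMA_CCR_MSIZE_16 | DMA_CCR_PSIZE_16 | DMA_CCR_MINC
         | DMA_CCR_DIR | DMA_CCR_TCIE;
}

/* SCK frequency for a given APB2 clock and BR field. */
static uint32_t spi_sck_hz(uint32_t pclk_hz, uint32_t br)
{
    return pclk_hz >> (br + 1u);
}
```

With BR = 4 this gives CR1 = 0x3A4, CR2 = 0xF00 and CCR1 = 0x592 (0x593 once EN is set), and 64 MHz / 32 = 2 MHz on SCK.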

At this point I expect the DMA to send the data from memory to the SPI, and SPI to activate its clock and MOSI lines. But nothing happens.

At the end of the transfer I expect the DMA1_Channel1_handler to get called and do:

clear the interrupt flag by writing a '1' (CTCIF1) to DMA1->IFCR
clear bit EN in DMA1->CCR1
write a new memory address in DMA1->CMAR1
rewrite the count to DMA1->CNDTR1
set bit EN in DMA1->CCR1
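A C sketch of the handler body described above. It assumes the DMA registers are modelled as a struct laid out like DMA1's ISR, IFCR and channel 1 registers (offsets 0x00 to 0x14); `next_block` and `dma1_ch1_rearm` are hypothetical names. On the chip this would be called from the DMA1 channel 1 interrupt handler with a pointer to the real peripheral. The reason EN has to be cleared first is that CNDTR can only be written, and CMAR should only be written, while the channel is disabled.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_IFCR_CTCIF1 (1u << 1)
#define DMA_CCR_EN      (1u << 0)

/* Mock of DMA1 up to the channel 1 registers, in register order. */
typedef struct {
    volatile uint32_t ISR, IFCR;
    volatile uint32_t CCR1, CNDTR1, CPAR1, CMAR1;
} dma_regs;

/* Hypothetical description of the next buffer to send. */
typedef struct { uint32_t addr; uint16_t halfwords; } next_block;

static void dma1_ch1_rearm(dma_regs *dma, next_block blk)
{
    assert(dma != 0 && blk.halfwords != 0u);  /* CNDTR = 0 would transfer nothing */
    dma->IFCR = DMA_IFCR_CTCIF1;   /* write 1 to clear the transfer-complete flag */
    dma->CCR1 &= ~DMA_CCR_EN;      /* CMAR/CNDTR may only be changed while EN = 0 */
    dma->CMAR1 = blk.addr;         /* next source buffer */
    dma->CNDTR1 = blk.halfwords;   /* counts transfers (half-words here), not bytes */
    dma->CCR1 |= DMA_CCR_EN;       /* restart the channel */
}
```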

And finally terminate with:

write a dummy data half-word to SPI1->DR
clear bit EN in DMA1->CCR1
wait until FTLVL and then BSY in SPI1->SR are clear
clear bit SPE in SPI1->CR1
clear bit TXDMAEN in SPI1->CR2
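The same termination sequence as a C sketch, with the dummy half-word write left out. `spi_regs` mirrors the first four SPI registers and `spi_dma_stop` is a hypothetical name; the volatile members make the two polling loops reread SR on every pass, as they must on the real chip. As far as I can tell, this order (DMA channel off, wait for FTLVL then BSY, SPE off, TXDMAEN off) is also the closing sequence the reference manual recommends. Its SPI disable procedure additionally drains the RX FIFO in full-duplex mode, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define SPI_CR1_SPE        (1u << 6)
#define SPI_CR2_TXDMAEN    (1u << 1)
#define SPI_SR_BSY         (1u << 7)
#define SPI_SR_FTLVL_MASK  (3u << 11)  /* TX FIFO level, 00 = empty */
#define DMA_CCR_EN         (1u << 0)

/* Mock of CR1, CR2, SR, DR (offsets 0x00 to 0x0C on the chip). */
typedef struct { volatile uint32_t CR1, CR2, SR, DR; } spi_regs;

static void spi_dma_stop(spi_regs *spi, volatile uint32_t *dma_ccr)
{
    assert(spi != 0 && dma_ccr != 0);
    *dma_ccr &= ~DMA_CCR_EN;                  /* stop servicing SPI TX requests */
    while (spi->SR & SPI_SR_FTLVL_MASK) { }   /* wait until the TX FIFO is empty */
    while (spi->SR & SPI_SR_BSY) { }          /* then until the last frame is out */
    spi->CR1 &= ~SPI_CR1_SPE;                 /* disable the SPI */
    spi->CR2 &= ~SPI_CR2_TXDMAEN;             /* finally drop the DMA request enable */
}
```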

Assuming all my register base addresses, offsets, bit definitions etc. are correct -- is there any obvious step I've left out? If not, any ideas on how I can proceed with testing/debugging? I can't see what's going wrong, but the DMA request from the SPI (raised when the TXE bit is set in SPI1->SR) doesn't seem to be arriving at the DMA.
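One way to debug this is to halt after the last step, read every register the transfer depends on (for example with gdb's `x/wx` on the addresses from the reference manual) and check each enable bit in turn. The C sketch below encodes that checklist; the struct and function names are hypothetical and the bit positions are my reading of RM0432. A value that doesn't read back as written, such as DMAMUX1_C0CR not showing 11, is a strong hint that the peripheral's clock is off, since the RM says register accesses are not supported while the peripheral clock is not active.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied out by hand while the core is halted. */
typedef struct {
    uint32_t rcc_ahb1enr, rcc_ahb2enr, rcc_apb2enr;
    uint32_t dmamux_c0cr, dma_ccr1, spi_cr1, spi_cr2;
} spi_dma_snapshot;

/* Returns a description of the first missing enable, or 0 if all look set. */
static const char *first_problem(const spi_dma_snapshot *s)
{
    assert(s != 0);
    if (!(s->rcc_ahb1enr & (1u << 0)))   return "RCC_AHB1ENR.DMA1EN is clear";
    if (!(s->rcc_ahb1enr & (1u << 2)))   return "RCC_AHB1ENR.DMAMUX1EN is clear";
    if (!(s->rcc_ahb2enr & (1u << 0)))   return "RCC_AHB2ENR.GPIOAEN is clear";
    if (!(s->rcc_apb2enr & (1u << 12)))  return "RCC_APB2ENR.SPI1EN is clear";
    if ((s->dmamux_c0cr & 0x7Fu) != 11u) return "DMAMUX1_C0CR.DMAREQ_ID did not read back as 11";
    if (!(s->dma_ccr1 & (1u << 0)))      return "DMA1_CCR1.EN is clear";
    if (!(s->spi_cr2 & (1u << 1)))       return "SPI1_CR2.TXDMAEN is clear";
    if (!(s->spi_cr1 & (1u << 6)))       return "SPI1_CR1.SPE is clear";
    return 0;
}
```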

I'm using arm-none-eabi-gdb with openocd 0.10.0.+dev-00921-g263deb38 on a Mac. The chip is on a Nucleo-L4R5ZI board.

Thanks for any pointers...

thanks4opensource · 2019-07-28

> Mmmm. STM documentation is perfectly clear -- if you already know precisely what it's trying to say.

We might be saying the same thing, but I agree that it's clear if you already know the answers -- and therefore probably don't need the documentation in the first place.

My major complaint is the lack of cross references to required prerequisites. Sections on how to program peripherals don't point back to the RCC peripheral clock enable registers, which need to be set to make the peripheral work in the first place. Sure, once you know this it's easy, but the first time through? (Unless you've read the entire manual and remember what was in the RCC section hundreds of pages earlier.) BTW, NXP does much better in this regard.

There's also the RCC clock configuration diagram which uses different nomenclature and provides no references to the registers and bits which control the various muxes, etc. And for the absolute worst example of missing/hidden crucial documentation, see my second post in https://community.st.com/s/question/0D50X0000B8hz9vSQA/how-to-do-a-spi-communication-on-stm32f7-without-dma?t=1564374099538 and related links.

> JW already found the problem: I simply hadn't enabled the DMAMUX clock. So, the recipe is simply the first list in the first post. Except line 1 should be:
>
> 1. enable GPIOA, DMA1, DMAMUX and SPI1 clocks in RCC.

Yes, thanks. I'd misinterpreted what you'd written and thought there were further problems to be solved.
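For reference, the corrected first step as a C sketch. The RCC bit positions (DMA1EN and DMAMUX1EN in RCC_AHB1ENR, GPIOAEN in RCC_AHB2ENR, SPI1EN in RCC_APB2ENR) are my reading of RM0432, and `enable_spi_dma_clocks` is a hypothetical name. The struct is a host-side mock that just groups the three registers; on the chip they sit at separate offsets inside RCC. The final read-back is a common precaution so the enable has taken effect before the first access to the peripheral.

```c
#include <assert.h>
#include <stdint.h>

#define RCC_AHB1ENR_DMA1EN     (1u << 0)
#define RCC_AHB1ENR_DMAMUX1EN  (1u << 2)
#define RCC_AHB2ENR_GPIOAEN    (1u << 0)
#define RCC_APB2ENR_SPI1EN     (1u << 12)

/* Mock grouping of three RCC enable registers (not their real layout). */
typedef struct { volatile uint32_t AHB1ENR, AHB2ENR, APB2ENR; } rcc_enables;

static void enable_spi_dma_clocks(rcc_enables *rcc)
{
    assert(rcc != 0);
    rcc->AHB1ENR |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN; /* DMAMUX1EN was the missing bit */
    rcc->AHB2ENR |= RCC_AHB2ENR_GPIOAEN;
    rcc->APB2ENR |= RCC_APB2ENR_SPI1EN;
    (void)rcc->APB2ENR;  /* read back before touching the peripherals */
}
```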

> If you look at C code as written by Thompson and Ritchie (Lions' Commentary on UNIX 6th Edition -- or something like that) and compare it with autogenerated stuff intended for consumption by a compiler... well. It's different.

My complaint isn't even with the auto-generated code from CubeMX, etc. -- I gave up on that soon after it produced code for me with hundreds of lines of boilerplate, yet after a day of debugging discovered that I had to manually insert several calls to "init" functions for various subsystems which were explicitly specified in the GUI configuration (and code for which was included in the sources it generated). How those calls belonged in a section labeled "Insert User Code Here" instead of the auto-generated initialization section of main() was beyond me.

No, it's with the HAL and even the so-called "low-level" ("LL") libraries, which are the only examples, auto-generated or not, that STM currently distributes. Their design requires user code to populate large data structures (on the stack, or even worse in static memory) which are passed to library routines that merely unpack them to do the actual work of setting hardware registers. This doesn't even provide much benefit in terms of the claimed portability (between ST MCUs only, of course), because chip-specific knowledge is still required to use the structs correctly. Beyond this are the layers upon layers of indirection, macros, wrappers, etc. I find this bloat and inefficiency to be an order of magnitude worse than anything which can be gained by going from C to assembly. And I partly disagree with Clive Two.Zero about assembly being deprecated due to its increased development cost. That's certainly true, but my reason is different: except in very rare, isolated instances, I find that assembly produces no improvement at all over modern C/C++ compilers, rather than an improvement that isn't economically worthwhile.

> Have a look at the book I linked in my second post. It's now free and still almost all relevant. It's an elegant, well-written little book.

Thanks for the link to Peter Cockerell's book (and to him for distributing it freely), but nice as it is as an introduction to assembly coding, I found it fairly out of date for the current Cortex Thumb/Thumb2 chips. The changes when they went from the original 32-bit instruction set to 16-bit Thumb (and 32-bit Thumb2), including the required removal of the condition codes from most of the instructions, are significant. I've found references such as the "ARMv7-M Architecture Reference Manual", the "ARM® Cortex®-M4 Processor Technical Reference Manual", and the "Procedure Call Standard for the ARM® Architecture" (among others) to be necessary and, although not perfect, sufficient.

> Then I read that the Cortex-M (which has its interrupt system, LR and even the bloody vector table messed up -- because it's specialised for fast interrupt handling)

Yes, when I started writing for ARM chips I was very naive about this. I was under the mistaken impression that ARMs were RISC chips that executed all instructions in one clock cycle. As I said before, ARMv7-M has some very complex instructions. When I found out that interrupts take 12-15 cycles (on M0+ to M4) I had to re-architect some time-critical code from interrupts to polling, despite my strong design preference for the former.

All very ironic when you read the published claims that current ARM chips have best-in-class interrupt performance, and that this was a primary design goal when they developed the architecture.

CHowa · 2019-07-29

OK, this thread has strayed a little, but I don't mind. I also agree with you completely. So there are at least two of us :) But I can see the temptation to pull your leg a bit... ;)

The STM32L4R reference manual is over 2000 pages (while the Cockerell book is under 200 pages), but you're not expected to read all of it! Unfortunately, you do need to read all of it in order to determine which parts you do or don't need to read :) So you start at the beginning. Maybe you'll get the hang of things and find you can skip parts...

It turns out... the chip has memory [2], it has flash [3], some access protection [4], the chip powers up [5], etc., etc. All of this has sensible defaults and you don't need to do anything. After over 200 pages you come to something important:

6. Reset and clock control (RCC)

...

When exiting Standby mode, all registers in the VCORE domain are set to their reset value. Registers outside the VCORE domain (RTC, WKUP, IWDG, and Standby/Shutdown modes control) are not impacted.

[aha. good. so?]

Low-power mode security reset

To prevent that critical applications mistakenly enter a low-power mode, two low-power mode security resets are available. If enabled in option bytes, the resets are generated in the following conditions:

...

[yeah, yeah...]

Four different clock sources can be used to drive the system clock (SYSCLK):

HSI16 (high speed internal) 16 MHz RC oscillator clock
MSI (multispeed internal) RC oscillator clock
HSE oscillator clock, from 4 to 48 MHz
PLL clock
The MSI is used as system clock source after startup from Reset, configured at 4 MHz.

[so, you switch it on, and it will run. Albeit slowly. OK - PLL will be worth checking out...]

Each clock source can be switched on or off independently when it is not used, to optimize power consumption.
Several prescalers can be used to configure the AHB frequency, the APB1 and APB2 domains. The maximum frequency of the AHB, the APB1 and the APB2 domains is 120 MHz.

[uhuh. ok]
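The thread doesn't give the PLL settings used to reach 64 MHz. As a worked example under an assumed configuration (MSI at its 4 MHz reset frequency as PLL input, M = 1, N = 32, R = 2), the main PLL's R output is f_in / M × N / R = 4 MHz × 32 / 2 = 64 MHz, comfortably below the 120 MHz limit quoted above. `pll_r_hz` is a hypothetical helper; the allowed M, N and R values and the VCO limits should be checked in the RCC_PLLCFGR description and the datasheet, and the flash wait states raised before SYSCLK is switched over.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_AHB_HZ 120000000u  /* AHB/APB limit quoted from the reference manual */

/* Main PLL R output: (f_in / M) * N / R. Computed in 64 bits to avoid overflow. */
static uint32_t pll_r_hz(uint32_t f_in_hz, uint32_t m, uint32_t n, uint32_t r)
{
    assert(m != 0u && r != 0u);
    uint64_t vco_hz = (uint64_t)f_in_hz / m * n;  /* VCO output frequency */
    return (uint32_t)(vco_hz / r);
}
```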

When the MSI clock is auto-trimmed with the LSE, it can be used by the USB OTG FS device.
When available, the HSI48 48 MHz clock can be coupled to the clock recovery system allowing adequate clock connection for the USB OTG FS (Crystal less solution).

[splendid]

The SAI1 and SAI2 clocks which are derived (selected by software) from one of the following sources:
- an external clock mapped on SAI1_EXTCLK for SAI1 and SAI2_EXTCLK for SAI2
- PLLSAI1 VCO (PLLSAI1CLK)
- PLLSAI2 VCO (PLLSAI2CLK)
- main PLL VCO (PLLSAI3CLK)
- HSI16 clock

[ummm. later. maybe]

For full details about the internal and external clock source characteristics, please refer to the “Electrical characteristics” section in your device datasheet.
The ADC clock can be derived from the AHB clock of the ADC bus interface, divided by a programmable factor (1, 2 or 4). When the programmable factor is ‘1’, the AHB prescaler must be equal to ‘1’.

[ho hum. yawn.]

The HSERDY flag in the Clock control register (RCC_CR) indicates if the HSE oscillator is stable or not. At startup, the clock is not released until this bit is set by hardware. An interrupt can be generated if enabled in the Clock interrupt enable register (RCC_CIER).

[my board doesn't have an HSE oscillator ...]

other clocks, PLL, calibration, clock "security", ADC clock, RTC, timers, watchdog, ... [break for dinner] ... clock-out capability ...

The MSI and HSI16 oscillator both have dedicated user-accessible calibration bits for this purpose.
The basic concept consists in providing a relative measurement (e.g. the HSI16/LSE ratio): the precision is therefore closely related to the ratio between the two clock sources. The higher the ratio is, the better the measurement will be.
If LSE is not available, HSE/32 will be the better option in order to reach the most precise calibration possible.
It is however not possible to have a good enough resolution when the MSI clock is low (typically below 1 MHz). In this case, it is advised to:

- accumulate the results of several captures in a row
- use the timer’s input capture prescaler (up to 1 capture every 8 periods)
- use the RTC wakeup interrupt signal (when the RTC is clocked by the LSE) as the input for the channel1 input capture. This improves the measurement precision. For this purpose the RTC wakeup interrupt must be enabled.

[snore]

peripheral clock enable register (RCC_AHBxENR, RCC_APBxENRy),

Each peripheral clock can be enabled by the xxxxEN bit of the RCC_AHBxENR, RCC_APBxENRy registers.

When the peripheral clock is not active, the peripheral registers read or write accesses are not supported.

low power modes,

AHB and APB peripheral clocks, including DMA clock, can be disabled by software.
Sleep and Low Power Sleep modes stops the CPU clock. The memory interface clocks (Flash, SRAM1, SRAM2 and SRAM3 interfaces) can be stopped by software during sleep mode. The AHB to APB bridge clocks are disabled by hardware during Sleep mode when all the clocks of the peripherals connected to them are disabled.
Stop modes (Stop 0, Stop 1 and Stop 2) stops all the clocks in the VCORE domain and disables the three PLL, the HSI16, the MSI and the HSE oscillators.
All U(S)ARTs, LPUARTs and I2Cs have the capability to enable the HSI16 oscillator even when the MCU is in Stop mode (if HSI16 is selected as the clock source for that peripheral).
All U(S)ARTs and LPUARTs can also be driven by the LSE oscillator when the system is in Stop mode (if LSE is selected as clock source for that peripheral) and the LSE oscillator is enabled (LSEON). In that case the LSE remains always ON in Stop mode (they do not have the capability to turn on the LSE oscillator).
Standby and Shutdown modes stops all the clocks in the VCORE domain and disables the PLL, the HSI16, the MSI and the HSE oscillators.

There you go. Can't ask for clearer than that, can you? You didn't miss "When the peripheral clock is not active, the peripheral registers read or write accesses are not supported." on p.234, did you? You didn't maybe just skip to 8. GPIO to switch your LED on and off?

Now, at the beginning of each peripheral section they could write:

Note, in order to use this peripheral you must:

1. enable its clock by setting bit xxxx in RCC_AHB1ENR [see section 6.4.16]
2. enable the clock for PORTx [see section ...] (for peripherals that use pins)
3. set bits *** in the ALTERNATE FUNCTIONS registers (see Datasheet and Programming Manual for your chip)
4. enable the peripheral by setting the ...EN bit in the peripheral's control register (...CRx)

Something like that. But that would make things too easy and boring.
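Applied to this thread's SPI1 pins, the alternate-function item boils down to one read-modify-write per register per pin. A C sketch, assuming a mock struct holding only GPIOA's MODER and AFRL (offsets 0x00 and 0x20 on the chip) and a hypothetical `gpio_set_af` helper. MODER has two bits per pin (10 = alternate function) and AFRL four bits per pin for pins 0 to 7; PA5 and PA7 use AF5 as in the first post.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_MODE_AF 2u  /* MODER: 00 input, 01 output, 10 alternate function, 11 analog */

/* Mock of the two GPIO registers involved. */
typedef struct { volatile uint32_t MODER, AFRL; } gpio_af_regs;

static void gpio_set_af(gpio_af_regs *gpio, unsigned pin, uint32_t af)
{
    assert(gpio != 0 && pin < 8u && af < 16u);  /* AFRL only covers pins 0..7 */
    gpio->MODER = (gpio->MODER & ~(3u << (2u * pin))) | (GPIO_MODE_AF << (2u * pin));
    gpio->AFRL  = (gpio->AFRL & ~(0xFu << (4u * pin))) | (af << (4u * pin));
}
```

Called as `gpio_set_af(&gpioa, 5, 5)` and `gpio_set_af(&gpioa, 7, 5)`, after GPIOAEN has been set in RCC.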

CHowa · 2019-07-29

"Yes, when I started writing for ARM chips I was very naive about this. I was under the mistaken impression that ARMs were RISC chips that executed all instructions in one clock cycle. As i said before, ARM-v7m has some very complex instructions."

Well, that's why I keep mentioning the book. The ARM3 had about 20 instructions, all of which executed in one cycle except for load/store operations and branches. And MUL -- but MUL is nice to have.

"When I found out that interrupts take 12-15 cycles (on M0+ to M4) I had to re-architect some time-critical code from interrupts to polling despite my strong design preference for the former."

Actually, it's a bit odd. I think the Cortex-A [more or less a microprocessor for running a computer with operating system etc] still does the old jump to interrupt handler thing (where you have banked registers) and leaves you to push/pop if you need to. But the Cortex-M [for microcontrollers] gives you no choice about stacking loads of registers, assuming you're using AAPCS, I suppose. I'd have expected it to be the other way around.
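For reference, the registers the Cortex-M hardware pushes on exception entry (the basic frame, without floating-point state) are R0-R3, R12, LR, the return address and xPSR. Apart from the return address and xPSR, those are exactly the registers the AAPCS allows a called function to clobber, which is what lets a handler be an ordinary AAPCS function; R4-R11 are only saved if the handler itself uses them. The struct below just documents that layout as seen from the stack pointer at entry; the type name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* where execution resumes after the handler */
    uint32_t xpsr;
} exception_frame;

static_assert(sizeof(exception_frame) == 32, "eight words are stacked");
static_assert(offsetof(exception_frame, pc) == 24, "PC is the seventh word");
```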

waclawek.jan · 2019-07-29

Actually, this is written in every example for every peripheral in the Snippets (which is basically what Appendix A in the respective RMs contains).

Snippets are available for F0 and L0 only.

JW

thanks4opensource · 2019-07-29

> Now, at the beginning of each peripheral section they could write:
>
> Note, in order to use this peripheral you must:
> 1. enable its clock by setting bit xxxx in RCC_AHB1ENR [see section 6.4.16]
> 2. enable the clock for PORTx [see section ...] (for peripherals that use pins)
> 3. set bits *** in the ALTERNATE FUNCTIONS registers (see Datasheet and Programming Manual for your chip)
> 4. enable the peripheral by setting the ...EN bit in the peripheral's control register (...CRx)

Exactly.

> Something like that. But that would make things too easy and boring.

And make it easier to write code without using Cube{MX,IDE} and/or the HAL/LL libraries, thus potentially lowering the customer's commitment to the ST ecosystem, and also letting them spec lower end ST chips with less FLASH and RAM because their code is more efficient and smaller in size. Just sayin'. :)

BTW, here's an NXP example, from "UM10800 LPC82x User manual Rev. 1.2 — 5 October 2016". Don't get me wrong -- NXP has their own crazinesses and omissions, but at least they get this part right:

> 14.3 Basic configuration
>
> Configure SPI0/1 using the following registers:
>
> - In the SYSAHBCLKCTRL register, set bit 11 and 12 (Table 35) to enable the clock to the register interface.
> - Clear the SPI0/1 peripheral resets using the PRESETCTRL register (Table 23).
> - Enable/disable the SPI0/1 interrupts in interrupt slots #0 and 1 in the NVIC.
> - Configure the SPI0/1 pin functions through the switch matrix. See Section 14.4.
> - The peripheral clock for both SPIs is the system clock (see Figure 5 “Clock generation”).

(Doesn't show up here when reformatted/quoted, but all of those "Table ***", "Section x.y", and "Figure n" texts are clickable links to the appropriate pages in the PDF document. Definitely way too easy and boring.)

thanks4opensource · 2019-07-29

Yes, the "Snippets" (Appendix A A.1 Code examples) help. They should be included in all of the reference manuals.

The only L0 or F0 chip I've used is the STM32L031, so I only have "RM0377 Reference manual, Ultra-low-power STM32L0x1 advanced Arm®-based 32-bit MCUs" right now. It has:

> A.1 Introduction
>
> This appendix shows the code examples of the sequence described in this Reference Manual.
>
> These code examples are extracted from the STM32L0xx Snippet firmware package STM32SnippetsL0 available on www.st.com.

So maybe the code examples in the firmware package (which I don't have) are more complete, but in RM0377 itself there are things like:

> A.17 SPI code example
>
> A.17.1 SPI master configuration code example
>
> /* (1) Master selection, BR: Fpclk/256, CPOL and CPHA at zero (rising first edge) */
> /* (2) Slave select output enabled, RXNE IT, 8-bit Rx fifo */
> /* (3) Enable SPI1 */
> SPI1->CR1 = SPI_CR1_MSTR | SPI_CR1_BR; /* (1) */
> SPI1->CR2 = SPI_CR2_SSOE | SPI_CR2_RXNEIE; /* (2) */
> SPI1->CR1 |= SPI_CR1_SPE; /* (3) */

Note that there's no mention of the SPI1EN bit in RCC_APB2ENR. See my post above with the example from an NXP reference manual for comparison.
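For comparison, the RM0377 snippet with the missing step added in front. This is a host-side sketch: the CR1/CR2 pair and the APB2ENR word are plain variables standing in for the registers, the macro values are my reading of RM0377 (SPI1EN is bit 12 of RCC_APB2ENR on the L0), and `spi1_master_init` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define RCC_APB2ENR_SPI1EN (1u << 12)
#define SPI_CR1_MSTR       (1u << 2)
#define SPI_CR1_BR         (7u << 3)   /* BR[2:0] = 111: Fpclk / 256 */
#define SPI_CR1_SPE        (1u << 6)
#define SPI_CR2_SSOE       (1u << 2)
#define SPI_CR2_RXNEIE     (1u << 6)

/* Mock of the two SPI control registers. */
typedef struct { volatile uint32_t CR1, CR2; } spi_cfg_regs;

static void spi1_master_init(volatile uint32_t *rcc_apb2enr, spi_cfg_regs *spi1)
{
    assert(rcc_apb2enr != 0 && spi1 != 0);
    *rcc_apb2enr |= RCC_APB2ENR_SPI1EN;         /* (0) the step the snippet leaves out */
    spi1->CR1 = SPI_CR1_MSTR | SPI_CR1_BR;      /* (1) master, slowest baud rate */
    spi1->CR2 = SPI_CR2_SSOE | SPI_CR2_RXNEIE;  /* (2) SS output, RXNE interrupt */
    spi1->CR1 |= SPI_CR1_SPE;                   /* (3) enable SPI1 */
}
```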

waclawek.jan · 2019-07-29

I know both the examples in Appendix A there and the Snippets themselves have their deficiencies. It's just that that's IMO the way to go. The Snippets initiative was cut short by ST management, in favour of the cuboids.

I know the NXP manuals. Maybe that's a way, too (but then, the chapters of the NXP manuals suffer from heavy copypastitis and in places lack rudimentary information, too). Nonetheless, concise examples have to be provided, too. And application notes, in high quality and quantity. What's given as appnotes now is ridiculous.

JW

thanks4opensource · 2019-07-29

I agree on all points.