2019-07-28 08:08 AM
I'm new to Cortex-M, STM chips, and the STM32L4R5 in particular, and am having problems getting this working. I have used ARM chips (since ARM2 :-), Atmel <x|mega>AVR and the Raspberry Pi Cortex-A chips...
I seem to be the only person who prefers assembly code over C / Cube / HAL etc... but that's a different matter :)
I think I have SPI configured correctly, because writing to SPI_DR works (I see SPI_SCK go active and SPI_MOSI shows the correct bits on the oscilloscope). [The STM32L4R5 is the master. The slave is silent. 2 MHz clock. 16-bit transfers. LSB first. clock on leading, rising edge. No NSS line used. Using SPI1 and DMA1_CH1]. But if I set up things for DMA transfer from memory to SPI and enable DMA and SPI, nothing gets transferred.
At this point I expect the DMA to send the data from memory to the SPI, and SPI to activate its clock and MOSI lines. But nothing happens.
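For readers following along in C rather than assembly, here is a minimal sketch of the memory-to-SPI setup described above. It is not the poster's code. The struct layouts and the function name `spi_tx_dma_start` are hypothetical stand-ins so the logic can be run on a host. On the target the structs would be overlaid on DMA1 channel 1 and SPI1, and the bit positions should be checked against the reference manual. Clock enables, GPIO setup and request routing are assumed to be done elsewhere.

```c
#include <stdint.h>

/* DMA_CCRx bits (channel configuration register). */
#define DMA_CCR_EN        (1u << 0)   /* channel enable */
#define DMA_CCR_TCIE      (1u << 1)   /* transfer-complete interrupt enable */
#define DMA_CCR_TEIE      (1u << 3)   /* transfer-error interrupt enable */
#define DMA_CCR_DIR       (1u << 4)   /* 1 = read from memory, write to peripheral */
#define DMA_CCR_MINC      (1u << 7)   /* increment memory address per item */
#define DMA_CCR_PSIZE_16  (1u << 8)   /* peripheral access size = 16 bits */
#define DMA_CCR_MSIZE_16  (1u << 10)  /* memory access size = 16 bits */

#define SPI_CR1_SPE       (1u << 6)   /* SPI enable */
#define SPI_CR2_TXDMAEN   (1u << 1)   /* raise a DMA request whenever TXE is set */

/* Hypothetical register views (assumption: same order as the hardware). */
typedef struct {
    volatile uint32_t CCR;
    volatile uint32_t CNDTR;
    volatile uint32_t CPAR;
    volatile uint32_t CMAR;
} dma_channel_t;

typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;
    volatile uint16_t DR;      /* accessed as 16 bits for 16-bit frames */
    uint16_t          reserved;
} spi_t;

/* Configure one DMA channel to feed `count` 16-bit words into the SPI. */
void spi_tx_dma_start(dma_channel_t *ch, spi_t *spi,
                      const uint16_t *buf, uint16_t count)
{
    ch->CCR &= ~DMA_CCR_EN;                      /* reconfigure only while disabled */
    ch->CPAR  = (uint32_t)(uintptr_t)&spi->DR;   /* destination: SPI data register */
    ch->CMAR  = (uint32_t)(uintptr_t)buf;        /* source: buffer in memory */
    ch->CNDTR = count;                           /* number of items to move */
    ch->CCR   = DMA_CCR_DIR | DMA_CCR_MINC |
                DMA_CCR_PSIZE_16 | DMA_CCR_MSIZE_16 |
                DMA_CCR_TCIE | DMA_CCR_TEIE;
    ch->CCR  |= DMA_CCR_EN;                      /* arm the channel */
    spi->CR2 |= SPI_CR2_TXDMAEN;                 /* let TXE generate requests */
    spi->CR1 |= SPI_CR1_SPE;                     /* finally enable the SPI */
}
```

The order (channel armed before TXDMAEN, SPE last) is one commonly recommended sequence. If nothing moves after this, the request path between the SPI and the DMA is the next thing to examine.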
At the end of the transfer I expect the DMA1_Channel1_handler to get called and do:
And finally terminate with:
Assuming all my register base addresses, offsets and bit definitions etc are correct -- is there any obvious step I've left out? If not, any ideas as to how I can proceed with testing/debugging? I can't see what's going wrong, but the dma_req from the SPI (due to TXE bit being set in SPI1->SR) doesn't seem to be arriving at the DMA.
I'm using arm-none-eabi-gdb with openocd 0.10.0.+dev-00921-g263deb38 on a Mac. The chip is on a Nucleo-L4R5ZI board.
Thanks for any pointers...
2019-07-28 10:42 AM
>>I seem to be the only person who prefers assembly code over C / Cube / HAL etc... but that's a different matter
Well, I think it is more a case of economics: people paying for work tend to be interested in the speed of development and the completion of functional goals, not in whether the code is small/fast.
>> I have used ARM chips (since ARM2 :-),
My younger brother got one of the first Archimedes systems, I have the VLSI chip manuals still, and had already mastered 6502, Z80, 68K and assorted 808x assemblers. The flash programming side of my STM32 boot loaders uses assembler, basically to contain dependencies, keeping things fast and small, and easy to copy into RAM.
You'll likely need to compare/contrast what the C libraries are doing, and decompose the sequences.
The DMA is driven by the TXE bit.
Prove that SPI is working in a polled mode, i.e. reading SPI1->SR and writing SPI1->DR.
Then check the trigger paths for the DMA, and that it is not flagging errors/faults, and the address advances.
Mostly bit settings and control paths.
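A polled transmit of the kind suggested above could look roughly like the following sketch. The `spi_t` layout, the function name `spi_send_polled` and the spin limit are assumptions made for illustration. The spin limit is only there so that a dead SPI shows up as an error code instead of a hang.

```c
#include <stddef.h>
#include <stdint.h>

#define SPI_SR_TXE  (1u << 1)   /* transmit buffer empty */
#define SPI_SR_BSY  (1u << 7)   /* SPI busy shifting data */

/* Hypothetical register view (assumption: same order as the hardware). */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;
    volatile uint16_t DR;      /* accessed as 16 bits for 16-bit frames */
    uint16_t          reserved;
} spi_t;

/* Send `count` 16-bit words by polling TXE.
 * Returns 0 on success, -1 if a flag did not change within `max_spins`. */
int spi_send_polled(spi_t *spi, const uint16_t *buf, size_t count,
                    uint32_t max_spins)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t spins = max_spins;
        while (!(spi->SR & SPI_SR_TXE)) {    /* wait for room in the Tx buffer */
            if (spins-- == 0) {
                return -1;
            }
        }
        spi->DR = buf[i];                    /* this write starts the clock */
    }

    uint32_t spins = max_spins;
    while (spi->SR & SPI_SR_BSY) {           /* wait for the last frame to leave */
        if (spins-- == 0) {
            return -1;
        }
    }
    return 0;
}
```

If this works and the DMA version does not, the SPI side is probably fine and the fault lies somewhere between the TXE request and the DMA channel.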
2019-07-28 10:53 AM
> enable GPIOA, DMA and SPI clocks in RCC.
Does this include DMAMUX clock?
> if I set up things for DMA transfer from memory to SPI and enable DMA and SPI, nothing gets transferred.
Then interrupts are secondary. You need the SPI to move first, and for that, the Tx DMA has to be triggered and has to transfer data into SPI1->DR.
You can always read back the content of all the registers you've set and check that they hold the values you expect. You can also watch the DMA status register for a possible fault and check whether the relevant NDTR changes as expected.
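One way to turn that check into something mechanical is sketched below. The enum and the function `dma_diagnose` are hypothetical. The inputs are values read back from DMA_ISR, DMA_CCRx and DMA_CNDTRx (for example from gdb), and the flag layout assumed is four bits per channel: GIF, TCIF, HTIF, TEIF.

```c
#include <stdint.h>

#define DMA_CCR_EN  (1u << 0)   /* channel enable bit in DMA_CCRx */

typedef enum {
    DMA_DIAG_ERROR,        /* TEIF set: bus or address fault */
    DMA_DIAG_DISABLED,     /* EN bit not set */
    DMA_DIAG_COMPLETE,     /* TCIF set: all items transferred */
    DMA_DIAG_NO_REQUEST,   /* enabled, but the counter never moved */
    DMA_DIAG_IN_PROGRESS   /* counter has moved, transfer not finished */
} dma_diag_t;

/* Classify a channel's state from read-back register values.
 * `channel` is 1-based, as in the reference manual. */
dma_diag_t dma_diagnose(uint32_t isr, uint32_t ccr, uint32_t cndtr,
                        uint32_t programmed_count, unsigned channel)
{
    uint32_t flags = (isr >> (4u * (channel - 1u))) & 0xFu;
    uint32_t tcif  = flags & (1u << 1);
    uint32_t teif  = flags & (1u << 3);

    if (teif) {
        return DMA_DIAG_ERROR;       /* checked first: an error also clears EN */
    }
    if (!(ccr & DMA_CCR_EN)) {
        return DMA_DIAG_DISABLED;
    }
    if (tcif) {
        return DMA_DIAG_COMPLETE;
    }
    if (cndtr == programmed_count) {
        return DMA_DIAG_NO_REQUEST;  /* no request has reached the channel */
    }
    return DMA_DIAG_IN_PROGRESS;
}
```

An enabled channel with no error flag and an unchanged counter points at the request path (peripheral DMA enable bit, request routing, clocks), not at the channel configuration itself.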
JW
2019-07-28 11:46 AM
"My younger brother got one of the first Archimedes systems"
I had an Archimedes A305 (no hard drive...) and learnt assembler from Peter Cockerell's ARM Assembly Language Programming book.
ARM code (and the whole system) was elegant in those days...
"I have the VLSI chip manuals still."
Me too. They were posted for free on request.
2019-07-28 11:53 AM
"Does this include DMAMUX clock?"
No, of course not. <sigh>
Well spotted. Thanks. My first DMA transfer goes through now. Nothing after that, but that's just normal debugging ...
2019-07-28 12:23 PM
> No, of course not. [DMAMUX clock]
That's why I always recommend reading back the registers' content as the first debugging step. Guess why.
> I had an Archimedes A305 (no hard drive...)
Used one at the university, BASIC only. Been charmed. Does that count?
JW
2019-07-28 12:54 PM
"That's why I always recommend to read back the registers' content as the first debugging step."
Wouldn't have helped in this case since I'd have read back what I was expecting to see. It was about 3 a.m. though... :\ And today I thought "I've done that" (TM)
"Used one at the university, BASIC only. Been charmed. Does that count?"
Of course -- if you've been corrupted and think that a windowed multi-tasking operating system, with anti-aliased text/graphics in a windowing system that was nicer than the Macintosh's etc etc etc, should be blazing fast, boot in a couple of seconds and fit in a 4 Mbyte (?) ROM (including all the fonts and decent vector graphics and text editors etc etc etc). And this was 30 years ago.
... I'd better stop. We minority quasi-dead systems aficionados all sound like lunatics.
But the reason it was so good, was that a) Sophie Wilson and Steve Furber were brilliant and b) a lot of the system was hand-crafted assembly code.
2019-07-28 01:20 PM
"Well I think it is more a case of economics, people paying for work tend to be interested in the speed of development, and completion of functional goals, not that the code is small/fast."
You're right, of course. And I'm not going to fight windmills.
But...
[OK, the windmill attacked me first!]
No. I am successfully resisting the urge to explain (and prove!) why the world is wrong...
The world can write 5 Gbyte updates to my laptop's operating system (which seemingly only added animated emojis -- but you pay for that by losing your ESC key...) and I, an artist, will craft operating systems that fit in 15 k, boot in a couple of micro-seconds and contain a LISP interpreter. Or whatever :)
2019-07-28 02:09 PM
I'm following this thread with interest, as using SPI with DMA (I already have it working without) on several STM32 MCUs is one of my next tasks. Should be easy, but my experience has been that nothing ever is with STM libraries and documentation. Please post your solution when you find it.
For me the issue is not C versus assembly, but direct/simple/efficient C compared to the indirect, bloated, obfuscated, inefficient HAL examples that are the only ones currently provided by ST. And the reference manuals, which, if they were more complete, better written, and error-free would make example code unnecessary.
BTW, I too started my programming career in assembly, on chips and systems long pre-dating ARM. I've played around with ARM assembly and intend to use it in the limited instances where it can provide performance benefits. "ARM code (and the whole system) was elegant in those days..."? I wish I had been involved with it then. I wrote a partial simulator for the Cortex-M4 Thumb2 instruction set, and the deeper I got into it the more my reaction was, "They call this a reduced instruction set architecture???"
2019-07-28 03:14 PM
"Should be easy, but experience has been nothing ever is with STM libraries and documentation. Please post your solution when you find it."
Mmmm. STM documentation is perfectly clear -- if you already know precisely what it's trying to say.
JW already found the problem: I simply hadn't enabled the DMAMUX clock. So, the recipe is simply the first list in the first post. Except line 1 should be:
After that it works properly. (I just had a little copy/paste bug, where I'd omitted to modify the pasted line...)
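For anyone doing the same in C, the fix amounts to something like the sketch below. This is not the poster's original line. The bit positions (DMA1EN = bit 0 and DMAMUX1EN = bit 2 of RCC_AHB1ENR), the 7-bit request ID field and the request number used for SPI1_TX are assumptions that should be verified against the STM32L4+ reference manual. The function names are hypothetical, and the registers are passed as pointers so the logic can be exercised on a host.

```c
#include <stdint.h>

/* Assumed RCC_AHB1ENR bits on the STM32L4+ family. */
#define RCC_AHB1ENR_DMA1EN     (1u << 0)
#define RCC_AHB1ENR_DMAMUX1EN  (1u << 2)

/* Assumed DMAMUX request line for SPI1_TX; verify in the request table. */
#define DMAMUX_REQ_SPI1_TX     11u

/* Assumed width of the DMAREQ_ID field in DMAMUX_CxCR. */
#define DMAMUX_CCR_DMAREQ_ID_MASK  0x7Fu

/* Enable both the DMA1 and the DMAMUX1 clock. Without the latter,
 * writes to the DMAMUX are ignored and no request reaches the DMA. */
void dma1_clocks_enable(volatile uint32_t *rcc_ahb1enr)
{
    *rcc_ahb1enr |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;
    (void)*rcc_ahb1enr;   /* read back: short delay before first access */
}

/* Route a peripheral request to one DMAMUX channel, leaving other bits alone. */
void dmamux_route(volatile uint32_t *dmamux_cxcr, uint32_t request_id)
{
    uint32_t v = *dmamux_cxcr;
    v &= ~DMAMUX_CCR_DMAREQ_ID_MASK;
    v |= request_id & DMAMUX_CCR_DMAREQ_ID_MASK;
    *dmamux_cxcr = v;
}
```

On the target, `dmamux_route` would be called with the address of the DMAMUX channel register that feeds DMA1 channel 1 and with `DMAMUX_REQ_SPI1_TX`. Reading that register back afterwards is a quick way to notice a missing clock, since an unclocked peripheral typically reads as zero.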
"but direct/simple/efficient C compared to the indirect, bloated, obfuscated, inefficient HAL examples"
If you look at C code as written by Thompson and Ritchie (Lions' Commentary on UNIX 6th Edition -- or something like that) and compare it with autogenerated stuff intended for consumption by a compiler... well. It's different.
As to the reference manuals... I think they could be improved.
' I wish I had been involved with it then. I wrote a partial simulator for the Cortex-M4 Thumb2 instruction set, and the deeper I got into it the more my reaction was, "They call this a reduced instruction set architecture???"'
Have a look at the book I linked in my second post. It's now free and still almost all relevant. It's an elegant, well-written little book.
I would love to know whether Wilson and Furber have any opinions on current ARM architectures and Thumb instruction set etc. Oh. I just see that there are interviews with Sophie Wilson on youtube. I'll have a look in a minute...
For example, on the ARM2 and ARM3, you could put the FIQ (fast interrupt request) handler code at the FIQ location in the vector table, because FIQ was the last entry in the table. So your interrupt handler would start executing immediately. And it has banked versions of r8-r14 so it doesn't need to push any registers on the stack.
Then I read that the Cortex-M (which has its interrupt system, LR and even the bloody vector table messed up -- because it's specialised for fast interrupt handling) has an interrupt latency of 12 (possibly 29) cycles on entry and 10 (27) on exit. Oh, and that's if there are no (flash) wait states -- which there are...! So my FIQ handler could load a register, think about it, flip some bits, write a register and return -- and the Cortex-M would still be stacking registers which I'm not even going to be modifying! Oh well, that's 30 years of progress for you.
But the ARM2 didn't have a 4 MSPS 12 bit ADC built in. So I'm still happy :)