F4 DMA2 errata - how to handle that
‎2019-05-23 11:59 PM
Hello!
We chose the STM32F407/427 for our product a couple of years ago.
It fits in terms of speed, and also offers the possibility of using DMA for multiple peripherals.
Now, as we have found out the hard way, the DMA2 fails while handling ADCs and SDIO.
As of now, I have found the F4 errata, and also the great writeup at https://blog.frankvh.com/2012/01/13/stm32f2xx-stm32f4xx-dma-maximum-transactions/ .
Obviously, this happens quite randomly, and we only got to it after debugging a set of field returns. During development and testing it might have occurred, but it was evidently missed (SDIO failures were attributed to the card being worn out, etc.).
Anyway, the product is out, and I need to find a working FW change to update it
and to remove the major DMA issue.
I have searched the forum, but not much useful has been found.
To describe the situation:
on DMA2, there is
- ADC1 - Stream4 CH0, ADC digitizing three channels, once before the DMA Stream is done
- ADC3 - Stream0 CH2, ADC digitizing eight channels, 32 times before the DMA Stream is done (we use it for oversampling calculations)
- SDIO - Stream3 CH4
ADC1 and ADC3 are TIMer triggered (TRGO), repeatedly. Both ADCs do convert multiple channels, and the DMA2 neatly stores each of the data values to RAM, where it is picked up inside the DMA Stream complete IRQ.
Specifically, the ability to use DMA for ADC work and also for SDIO was one of the main drivers for choosing the F4 at the time
(including the RAM size back then - today I would go for H7, of course, but the HW is out there already).
SDIO is a classic X*512B transfer (multiblock), standard DMA2 code as in the examples (with all the unrelated bugfixes ever done - the SDIO is also a weird critter, but that is an unrelated story).
Now for the set of questions:
- Obviously the DMA2 bug is in silicon, so a FW workaround is required
- Errata suggests to use only one single DMA Stream at a time
- either (1) move the ADC handling to MCU, that is to use IRQ and not DMA
- or (2) disable ADC DMA access just when SDIO takes over DMA, and re-enable it afterwards - is this even possible ?
Variant (1) is not great at all, as there will be an IRQ call for every finished conversion, and there is also possible uncertainty as to which ADC input channel the data in the register actually belongs.
On the other hand, I suppose this is the only way not to lose any ADC samples,
yet I will lose precious MCU cycles.
Please help me out with (2):
"(2) disable DMA2 for ADC during SDIO DMA access"
How to precisely do that ?
I would expect that I'd call the DMA_Cmd(DMA2_StreamXXX, DISABLE);
Will this finish the DMA transfer half-way, or will it wait for the whole transfer to go through ?
I suppose I will also need to wait for confirmation that the stream is really disabled; how do I do that?
And finally, how to re-enable the DMA2 for the ADCs ?
Just by issuing the "Stream enable" DMA_Cmd ? or is there more required (to clear some flags, or what) ?
(note that the ADC will keep converting regardless of the DMA Stream state, as all the ADCs are continuously TRGO triggered).
Or,
would it be wiser to disable the ADCs themselves ?
Or, do both and disable ADCs and the DMA2 for ADCs ?
That would already be too much overhead, and it would lose samples . . .
So, in general, any advice is very welcome,
and I'm sure other F4 users will benefit from this as well.
regards,
a.
---
EDIT: an invalid bus name was given; I deleted that line to avoid confusion. Otherwise, the content stays the same.
Labels: ADC, DMA, SDIO-SDMMC, STM32F4 Series
‎2019-05-24 01:08 AM
This is a primarily user-driven forum, with casual ST presence, so I wouldn't hold my breath waiting for a qualified answer. And, ST is not famous for bravura when dealing with their hardware bugs - acknowledging them in errata basically means for them it's done.
The proper way to stop and restart a DMA transfer is described in detail in the RM's DMA chapter, in subchapter 10.3.14 DMA transfer suspension. What you need to do is
- stop the stream by setting SxCR.EN=0
- read back SxCR in a loop until SxCR.EN reads back as 0 (this may take a couple of cycles, depending on whether you use FIFO or not, and depending on the load on the target RAM's bus)
- read out the respective NDTR to find out how many data items remain, and thus how many have already been transferred
- write the remaining number back to NDTR
- add to the memory pointer the number of bytes occupied by the already transferred data, if you want continuous data in the buffer
- restart DMA by setting SxCR.EN=1
Here I have just restated the procedure described in the RM. Note that if you are using circular DMA, there's no way to do this while keeping the data in the buffer continuous - the DMA can't be "suspended", just restarted.
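The steps above can be sketched in C roughly as follows. This is a minimal sketch, not tested on silicon: `dma_stream_regs`, `dma_stream_suspend` and `dma_stream_resume` are hypothetical names, and the struct is only a stand-in for the stream registers (SxCR, SxNDTR, SxM0AR) so that the logic can be exercised on a host. It assumes a non-circular stream with memory increment enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA stream's registers. On a real F4 the
 * vendor header provides an equivalent register block; this one exists only
 * to keep the sketch self-contained. */
typedef struct {
    volatile uint32_t CR;    /* SxCR: bit 0 is EN */
    volatile uint32_t NDTR;  /* SxNDTR: data items still to be transferred */
    volatile uint32_t M0AR;  /* SxM0AR: memory address */
} dma_stream_regs;

#define DMA_SXCR_EN (1u << 0)

/* Stop the stream, wait until the hardware confirms it, and return the
 * number of data items that were still outstanding. */
uint32_t dma_stream_suspend(dma_stream_regs *s)
{
    assert(s != NULL);
    s->CR &= ~DMA_SXCR_EN;              /* request the stop */
    while (s->CR & DMA_SXCR_EN) { }     /* EN reads back 0 once really stopped */
    return s->NDTR;                     /* remaining items, not transferred ones */
}

/* Re-arm the stream for the remaining items so the buffer stays continuous.
 * total_items is the length originally programmed, item_size is in bytes. */
void dma_stream_resume(dma_stream_regs *s, uint32_t buffer_base,
                       uint32_t total_items, uint32_t remaining,
                       uint32_t item_size)
{
    assert(s != NULL);
    assert(remaining <= total_items);
    uint32_t done = total_items - remaining;
    s->M0AR = buffer_base + done * item_size;  /* skip data already stored */
    s->NDTR = remaining;
    /* On real hardware the stream's event flags in LIFCR/HIFCR have to be
     * cleared at this point, before EN is set again. */
    s->CR |= DMA_SXCR_EN;
}
```

For a 256-item halfword buffer suspended with 100 items left, the resume call would point the memory address 312 bytes past the buffer start and reload NDTR with 100.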
And I wouldn't recommend trying to suspend the ADC DMA at the trigger-from-ADC side; that may hit other DMA bugs not acknowledged by ST: https://community.st.com/s/feed/0D50X00009XkW5oSAF .
JW
‎2019-05-24 03:14 AM
Awesome, thank you for the quick and comprehensive reply.
As I read it, the overhead of dealing with the DMA is not small, and it might just be another reason to consider dropping the DMA for the ADCs altogether (and using IRQ). Oh well.
No, circular DMA is not used. The ADC is triggered periodically by a timer, and the DMA is not triggering it, merely capturing the data (ADC DR -> SRAM1). The capture buffer is just set to be longer, so we can get multiple ADC passes into the buffer before the DMA-completed IRQ is fired.
This used to work well, and it was absolutely great how the F4 simplified the continuous ADC readout.
Moving to a plain ADC IRQ method is a step back, yet it seems to be the only safe solution here.
I'm glad to be able to clear up my thoughts on this, the previous topic reply helped a lot!
Also, in the case of an ADC with multiple input channels in its conversion sequence, it won't make much sense to restart the DMA where it previously left off, as the ADC will finish the list regardless of whether the DMA is running or not.
It seems the ADC will need to be restarted as well, and this will be really hard with the TRGO applied.
BTW, not that it makes much more sense now, but is there a guarantee that doing this (DMA pause) will be enough to circumvent the issue?
Never mind. The people who designed it have most likely left ST by now, or the work was outsourced, so again . . . never mind.
Jan, thanks again.
‎2019-05-28 12:27 AM
Here is the final solution that I have implemented.
The DMA2 simply does not work properly if three peripherals are being used and one of them is SDIO.
I am not sure how it fits the errata, as all the peripherals are on APB, but the SDIO has a FIFO, so it just might fit it a bit.
In any case, it is super easy to prove that the DMA2 with 2xADC + 1xSDIO breaks the data.
Specifically, it typically stalls in the middle of a transfer, and the Stream TC IRQ does not fire.
This is nicely visible in the raw data (SDIO read), where bytes have been read from the card, but not all 512 bytes, just a short piece (like 83 bytes or so, depending on when the DMA Stream stalls).
I've tried a lot. Changing the DMA streams. Pausing the DMA for ADCs.
All to no avail.
To repeat, we are using the ADC1, ADC3 and SDIO in the product.
ADCs are TRGO and do transfer the data via DMA2.
SDIO obviously uses DMA2 as well.
I wanted to use all of them at once (at moments), but to no avail due to the DMA2 bug.
I have discussed the approach to the DMA2 issue with our HDL guys, who do write the (unrelated) DMA IP cores used in FPGAs in our products.
Basically, they explained to me how the ST design possibly got messed up
and what would be the only safe approach from the IP core designer's perspective (see below - and it worked).
In short, one can say that IP core designers are in general totally cut off from real life and the final use of their cores,
so it is no surprise that ST has this major error in the core block. It also primarily points out that the testing was inadequate.
Well, that discussion did explain a lot to me; I would recommend this (talking to IP guys) to everyone, it is an eye opener.
In fact, if we had direct contact with the HDL/RTL designer at ST, we might be able to figure out a better solution.
For sure there is a living person with a name and address who wrote the code and did the design.
But as far as I know how corporations work, this will never happen.
I know how it goes in the corporations I used to work for - on principle, the management blocked any such request coming from our customers; I have never understood why.
The one and only solution is to do the following (chronologically) whenever an SDIO block read / write is required:
1] stop the ADC1 and ADC3 streams and wait for it to stop
2] disable DMA requests for ADCs
3] completely RESET the DMA2 peripheral before the SDIO DMA2 Stream is set,
4] stop ADC1 and ADC3 peripheral,
5] then completely RESET the DMA2
6] only now the SDIO DMA for RX or TX could be set up ! also make sure the setup is not interrupted (use critical section there)
6a] make absolutely sure to enable and clear the DMA IRQ flags in a proper way (note the difference between IT_TC and FLAG_TCIF and which registers each is used for - I have noticed that various example code has this messed up !)
7] now, after the DMA transfer is done (TC IRQ):
8] completely RESET the DMA2 peripheral
9] enable ADC1 and its DMA
10] enable ADC3 and its DMA
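Regarding step 6a, the distinction can be sketched at register level: the transfer-complete interrupt enable (TCIE) sits in the stream's own SxCR, while the transfer-complete status flag (TCIFx) sits in the shared LISR/HISR and is cleared through the same bit position in LIFCR/HIFCR. The helper names below are hypothetical and are not the SPL macros; the bit positions follow the register layout in the reference manual as I recall it, so check them against the RM before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt ENABLE bit: lives in the per-stream control register SxCR. */
#define DMA_SXCR_TCIE (1u << 4)

/* Bit offset of a stream's flag group inside LISR/HISR (and LIFCR/HIFCR).
 * Streams 0..3 use the low registers, 4..7 the high ones, same layout. */
unsigned dma_flag_shift(unsigned stream)
{
    static const unsigned shift[4] = { 0u, 6u, 16u, 22u };
    assert(stream < 8u);
    return shift[stream & 3u];
}

/* 1 if the stream's flags are in HISR/HIFCR, 0 if they are in LISR/LIFCR. */
int dma_flag_in_high_reg(unsigned stream)
{
    assert(stream < 8u);
    return stream >= 4u;
}

/* Mask of the transfer-complete STATUS flag (TCIFx) of a stream. TCIF is
 * bit 5 within the stream's flag group; writing the same mask to
 * LIFCR/HIFCR clears it (CTCIFx). */
uint32_t dma_tcif_mask(unsigned stream)
{
    return (uint32_t)1u << (dma_flag_shift(stream) + 5u);
}

/* Mask of all five event flags of a stream (FEIF, DMEIF, TEIF, HTIF, TCIF),
 * i.e. bits 0, 2, 3, 4 and 5 of the group. */
uint32_t dma_all_flags_mask(unsigned stream)
{
    return (uint32_t)0x3Du << dma_flag_shift(stream);
}
```

For the SDIO stream used here (Stream3) this gives a TCIF mask of bit 27 in the low registers, whereas the interrupt enable stays at bit 4 of that stream's SxCR; mixing the two up is the kind of mistake step 6a warns about.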
Now you have to repeat this for every new SDIO access.
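The sequence above could be sketched like this. It is only an illustration of the ordering, not the code running in the product: `sdio_window_hw`, `sdio_window_open`, `sdio_window_close` and the `reinit_adc_streams` callback are hypothetical names, and the registers are passed in as pointers so the sketch can be run on a host. The assumed bit positions are RCC_AHB1RSTR.DMA2RST = bit 22, ADC_CR2.ADON = bit 0, ADC_CR2.DMA = bit 8 and SxCR.EN = bit 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCC_AHB1RSTR_DMA2RST (1u << 22)
#define ADC_CR2_ADON         (1u << 0)
#define ADC_CR2_DMA          (1u << 8)
#define DMA_SXCR_EN          (1u << 0)

/* Hypothetical bundle of the registers touched by the sequence. */
typedef struct {
    volatile uint32_t *rcc_ahb1rstr;      /* RCC_AHB1RSTR */
    volatile uint32_t *adc_cr2[2];        /* ADC1_CR2, ADC3_CR2 */
    volatile uint32_t *adc_stream_cr[2];  /* SxCR of the two ADC streams */
} sdio_window_hw;

/* Pulse the DMA2 reset bit. This wipes the configuration of ALL streams. */
void dma2_reset(const sdio_window_hw *hw)
{
    *hw->rcc_ahb1rstr |= RCC_AHB1RSTR_DMA2RST;
    *hw->rcc_ahb1rstr &= ~RCC_AHB1RSTR_DMA2RST;
}

/* Steps 1..5: get the ADCs off DMA2 before the SDIO stream is set up. */
void sdio_window_open(const sdio_window_hw *hw)
{
    assert(hw != NULL);
    for (size_t i = 0; i < 2; i++) {              /* 1] stop streams, wait */
        *hw->adc_stream_cr[i] &= ~DMA_SXCR_EN;
        while (*hw->adc_stream_cr[i] & DMA_SXCR_EN) { }
    }
    for (size_t i = 0; i < 2; i++)                /* 2] no more ADC requests */
        *hw->adc_cr2[i] &= ~ADC_CR2_DMA;
    dma2_reset(hw);                               /* 3] */
    for (size_t i = 0; i < 2; i++)                /* 4] stop the ADCs */
        *hw->adc_cr2[i] &= ~ADC_CR2_ADON;
    dma2_reset(hw);                               /* 5] */
    /* 6], 6a]: the caller now sets up the SDIO stream in a critical section. */
}

/* Steps 8..10: called after the SDIO transfer-complete IRQ (step 7). */
void sdio_window_close(const sdio_window_hw *hw,
                       void (*reinit_adc_streams)(void))
{
    assert(hw != NULL);
    dma2_reset(hw);                               /* 8] */
    if (reinit_adc_streams != NULL)
        reinit_adc_streams();   /* the reset wiped the ADC stream setup */
    for (size_t i = 0; i < 2; i++)                /* 9], 10] */
        *hw->adc_cr2[i] |= ADC_CR2_ADON | ADC_CR2_DMA;
}
```

The sketch keeps both DMA2 resets exactly where the list puts them. Since a peripheral reset clears every stream register, the ADC streams have to be fully re-programmed before the ADCs are re-enabled, which is what the callback stands for.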
Obviously this is as bad as it gets in terms of performance,
yet using the ADCs in IRQ mode is much, much worse, so we have to live with that, thank you, ST employees.
Yet this is the only rock-solid solution that we have found.
Thankfully, the ADCs are not sampled continuously, but with time spacing between bursts (periodic, timer triggered).
I certainly hope that in F7 (which we use now) and H7, there will not be any issue like this.
If so, I'd be very, very sad. I'm going to find out soon.
-
I hope this will help others in the same situation.
‎2019-05-28 01:18 AM
I didn't realize that SDIO is *not* an AHB peripheral in the 'F4. In this case, the erratum does not apply.
The steps you described are extensive and I am sure not all of them are needed.
I assume you have FIFO switched on for SDIO, do you? Isn't there a DMA FIFO error thrown for the SDIO DMA when it stops?
Read out and show us the content of the SDIO-stream DMA registers.
JW
‎2019-05-28 02:39 AM
I now realize that you are talking about SDIO *read* - in which case it wouldn't be DMA FIFO error, but SDIO FIFO overflow - was there any of that?
My point is, that SDIO might present quite a load on the APB2 bus, even if clocked at maximum (is it?) and then on the DMA; depending on how exactly the SDIO handles its FIFO and DMA requests. The ADC/DMA is an additional load on the same bus. Also, there might be additional contention from CPU accesses onto APB2 peripherals, and/or contention on the RAM side of the equation (i.e. where is the CPU stack located? Is there any other DMA traffic (including the DMA1, ETH and OTG_HS) into the same SRAM where the SDIO data are dumped? etc.)
I understand that once you've reached a working solution you are not willing to dig further; but others reading this thread might want to consider all these.
JW
‎2019-05-28 04:08 AM
I am surely willing to discuss that. Most of the time, the most valuable information is available right on the forums, so it is best to make it as complete as possible.
I know the steps are extensive and I'm not happy about it.
If any of them is removed, the device will eventually fail. Weird, I know.
There were two general issues: DMA stopping part-way through (not finishing the transfer) and SDIO failing with RXOVERR (as read in SDIO IRQ), but sometimes the SDIO did not report any error in the register and DMA was just dead (I have a timeout, 16sec, while waiting for the TC IRQ, and looking at the data buffer in RAM, it was partially filled).
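The failure modes described above (RXOVERR reported, or no error at all and a dead DMA) can be told apart with a small classification helper along these lines. This is a sketch under assumptions: the enum and function names are hypothetical, the inputs are whatever the IRQ handlers and timeout logic already collect, and RXOVERR is taken to be bit 5 of SDIO_STA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SDIO_STA_RXOVERR (1u << 5)  /* receive FIFO overrun flag in SDIO_STA */

typedef enum {
    XFER_PENDING,      /* still waiting, timeout not reached */
    XFER_OK,           /* DMA transfer-complete IRQ seen, no overrun */
    XFER_RX_OVERRUN,   /* SDIO reported RXOVERR */
    XFER_DMA_STALLED   /* no TC IRQ, no SDIO error, timeout expired */
} xfer_status;

/* Classify one SDIO read from the observations available to the driver. */
xfer_status sdio_read_classify(bool tc_irq_seen, uint32_t sdio_sta,
                               uint32_t elapsed_ms, uint32_t timeout_ms)
{
    assert(timeout_ms > 0u);
    if (sdio_sta & SDIO_STA_RXOVERR)
        return XFER_RX_OVERRUN;       /* DMA stopped draining the SDIO FIFO */
    if (tc_irq_seen)
        return XFER_OK;
    if (elapsed_ms >= timeout_ms)
        return XFER_DMA_STALLED;      /* the silent, partially filled case */
    return XFER_PENDING;
}
```

Logging which of the two error outcomes occurs, and how often, makes it easier to compare the experimental workarounds against each other.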
Originally, once this happened, to fix it the "ignorant way" (i.e. not fixing the true issue), I had to go through an SDIO peripheral/driver reinit, as that was the only way to get it running again. Yet that meant remounting the FAT, which took time, so this whole approach is no good.
Regarding the RXOVERR - The DMA2 registers do not show anything of value, as the RXOVERR flag comes from the SDIO register (STA).
Obviously it points to the situation that DMA Stream stopped taking data from SDIO FIFO.
But why ?
Of course, I'm sticking to the SDIO read, where I have seen truly the most of the failures.
But I can't say the SDIO write used to work - it did fail as well, same symptoms.
OK, to describe the implementation:
Each of the ADC1 and ADC3 conversion is triggered by TIMx.
ADC1 has 3 channels to convert, and DMA2 stores it to "uint16_t aligned(4) adcraw1[1 * 3];" (declaration simplified but obvious).
ADC3 has 8 channels to convert, and DMA2 stores it to "uint16_t aligned(4) adc3raw[32 * 8];" (declaration simplified but obvious).
The trick with ADC3 is that we get the Stream TC IRQ for it only once the 32*8 values are ready; then the decimation math is applied.
This approach is crucial for getting the highest ENOB from the noisy 12-bit ADC.
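To illustrate the buffer layout, here is a minimal sketch of a per-channel reduction over the 32*8 buffer. The actual decimation math used in the product is not shown in this thread; a plain rounded average is assumed here only as a placeholder, and `adc3_decimate` is a hypothetical name. It also assumes the DMA stores each 8-channel scan as 8 consecutive halfwords, one scan after another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC3_CHANNELS 8u
#define ADC3_PASSES   32u

/* Reduce 32 interleaved scans of 8 channels to one value per channel.
 * raw[pass * 8 + ch] holds the sample of channel ch from scan number pass. */
void adc3_decimate(const uint16_t raw[ADC3_PASSES * ADC3_CHANNELS],
                   uint16_t out[ADC3_CHANNELS])
{
    assert(raw != NULL && out != NULL);
    for (size_t ch = 0; ch < ADC3_CHANNELS; ch++) {
        uint32_t sum = 0u;   /* 32 * 4095 fits easily in 32 bits */
        for (size_t pass = 0; pass < ADC3_PASSES; pass++)
            sum += raw[pass * ADC3_CHANNELS + ch];
        /* Rounded average; a real oversampling scheme might keep extra bits. */
        out[ch] = (uint16_t)((sum + ADC3_PASSES / 2u) / ADC3_PASSES);
    }
}
```

This would be called from the ADC3 stream's TC IRQ (or from a task it wakes), once per 32 scans.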
The DMA2 Stream simply has to receive its amount of data and then fire a TC IRQ; there is nothing that should go wrong.
ADC1 conversion gets triggered once per 1 msec, and ADC3 once per 78 usec.
So this is pretty slow, and should not present a heavy load.
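As a quick back-of-the-envelope check of those rates (a sketch with hypothetical helper names): 3 channels every 1 ms is 3,000 halfword transfers per second for ADC1, 8 channels every 78 us is roughly 102,564 per second for ADC3, and the ADC3 TC IRQ fires every 32 * 78 us = 2,496 us.

```c
#include <assert.h>
#include <stdint.h>

/* DMA transfers per second caused by one timer-triggered ADC scan. */
uint32_t transfers_per_second(uint32_t channels, uint32_t trigger_period_us)
{
    assert(trigger_period_us > 0u);
    return (uint32_t)(((uint64_t)channels * 1000000u) / trigger_period_us);
}

/* Time between transfer-complete IRQs when the buffer holds several scans. */
uint32_t tc_irq_period_us(uint32_t passes, uint32_t trigger_period_us)
{
    return passes * trigger_period_us;
}
```

Together that is on the order of 105,000 halfword transfers per second, which supports the point that the ADC traffic by itself is light.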
Having just both ADC1 and ADC3, running over DMA2, all works well for months, no errors showing up.
Once the SDIO is added to the mix, the issue pops up. And quite frequently.
This happens regardless of the SDIO bus clock and other SDIO bus settings (which are also limited by the other errata, but I've tried all the combinations).
SDIO DMA transfer is made to always work on 4byte aligned boundary (via a dedicated buffer in SRAM1).
To complete the information:
F4 MCU runs on a top valid speed (168MHz), and the AHB/APB bus clocks are also maxed out (yet within their limitations).
Master SDIO clock (PLL output) is 48MHz, and then the divider used is >3 (e.g. 16MHz or lower).
SDIO is using FIFO and the DMA Stream config is standard for the SDIO transfer (not different than most of the examples anywhere).
I do think the FIFO is a hard part of the SDIO and can't be avoided (switched off).
MCU FLASH is accelerated using ART.
ETH MAC is active (100Mbit/s) and ETH-DMA is used exclusively for SRAM2. Note: observed the DMA2 failure regardless if the ETH is connected or not.
Some other peripherals are used (SPI, USART, GPIO) but none of it uses DMA.
There is no use of USB, CAN or any other stuff.
The DMA1 is not initialized / used. DMA2 exclusively uses SRAM1.
CPU stack is in SRAM1.
The CPU spends most of its time in a control loop, where mostly RAM is accessed and few peripherals, if any, are accessed.
The contention idea is interesting - the CPU uses SRAM1,2 and CCMRAM quite a lot, yet that was never ever a problem while running just the ADCs via DMA2.
Adding the SDIO might be issue there, but I suppose this would clear (won't apply) if the SDIO bus clock is dropped down - and I went as slow as cca 400kHz and it did not do any difference from this perspective.
The bonus part:
I have tried various ways on how to pause/stop DMA2 for ADCs and a few other alternative approaches.
It was notable that the SDIO eventually failed in all such experimental cases, but at different rates.
For example, I tried to wait for ADCx Streams to complete before stopping the Stream (waited for the TC IRQ).
By doing so, the DMA2 always failed for the subsequent SDIO transfer.
Just by removing this wait, and simply forcing the Stream to disable, the DMA2 SDIO Stream failed only one third of the time.
Of course, adding a wait for the Stream to actually stop (reading the DMA2 register) after disabling the Stream did help a bit, but did not fix the thing.
The only sure way was to completely reset the DMA2 core and re-init it.
I have it running solid for a couple of days and all is well.
BTW. thanks for the help, appreciated.
regards,
a.
‎2019-05-28 05:59 AM
Much of these symptoms are strange, but then there may be also coincidences.
> I went as slow as cca 400kHz and it did not do any difference from this perspective.
This mostly excludes the idea of bus contentions and FIFO overflows.
Does the SDIO DMA stream use FIFO? Does it use bursts on either peripheral or memory side? Is the memory buffer address fixed or may it change between transfers? Could you make sure the memory address is 16-aligned?
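For the alignment question, a buffer can be forced onto a 16-byte boundary and checked like this. It is a sketch assuming a C11 compiler; `sdio_bounce` and `is_aligned` are hypothetical names. The presumed reason for asking is that a 4-beat word burst covers 16 bytes, and a 16-aligned buffer guarantees such a burst never straddles a 1 KB address boundary.

```c
#include <assert.h>
#include <stdint.h>

#define SD_BLOCK_SIZE 512u

/* Hypothetical bounce buffer for one SD block, placed on a 16-byte boundary. */
_Alignas(16) uint8_t sdio_bounce[SD_BLOCK_SIZE];

/* Return 1 if p is a multiple of alignment (which must be a power of two). */
int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return ((uintptr_t)p & (alignment - 1u)) == 0u;
}
```

Checking the address actually programmed into the stream's memory address register with such a helper, right before each transfer, would also show whether the address ever changes between transfers.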
JW