2024-07-19 10:01 AM
Flashing a program to our custom board with openocd and an STLink (the one on a Nucleo F446 or L474) frequently fails when there is CAN bus activity. What is causing this?
Our board has BOOT0 tied to ground. The SWD connection is ground, data, and clock, and connects to the STLink pins on the Nucleo board. For a while the flashing would "usually" work; occasionally there would be a failure to erase a block, or a range of blocks, or a failure to write. With more units on the CAN bus, it always fails. For a while I thought the processor flash had gone bad, until I stumbled on it working "better" when some of the other units were taken off the bus. That led to determining that flashing works 100% of the time if the unit being flashed is the only unit on the bus (i.e. it is powered from the CAN cable, as that carries power as well as the CAN signals).
The docs talk about the built-in routine for flashing, where the standard peripheral pins are checked for data, etc. That suggests that it is somehow involved in the above. If so, maybe there is a way to avoid it.
Further up the chain is a program-update-over-CAN scheme that I have in place. A load-over-CAN program is located in low-address flash, and upon boot it waits a short time for messages that start an interactive loader sequence with a PC connected through a CAN gateway. Some failures to complete the loading correctly have the "feel" that the above issue may be involved. The first step is to understand why CAN activity is causing the openocd/STLink flashing failures.
2024-07-19 10:24 AM
I'd say, ground issues.
Try to connect all devices on the CAN bus and others sharing the same ground - including/starting with the PC to which the STLink is connected - using short and thick wires to one common ground point in star fashion.
Also, generally make sure that you have adequate return (ground or other DC) in your reasonably short SWD cable (15cm or so), with ground separating SWDIO and SWCLK. "Occasionally failing programming" is not OK.
JW
2024-08-07 06:24 PM
[waclawek.jan: Thanks for the response. I thought I sent a reply, but I don't see it. Here is a summary of it, plus an update.]
These boards are on battery modules that are connected in series. The processors on these boards are isolated from the CAN bus, so one would not expect a ground loop problem. Currently, the modules are not interconnected so connecting grounds can be done, however that hasn't helped.
Since the initial post, many experiments have been tried to identify the source of the problem. E.g. with and without the CAN bus connected; ground jumpers; single boards, ... We don't find anything repeatable that leads to a solution. A colleague with an identical setup (other than the size of the cells in the battery module) has encountered a similar problem. With my setup, I found that changing from phone type cable to CAT5 for the CAN bus made a big improvement, however, sometimes openocd doesn't connect.
The colleague's setup fails even more often. Sometimes it works 100%, then later openocd will not connect, and the setup hasn't changed. Yesterday, we could not get his setup to work, even though the day before it was working 100% and nothing had changed. Today he reports it works, however he did note that the A/C was off, so the room was a few degrees warmer. (I noted that the newspaper said Mercury was in retrograde ;))
AN2606 page 335 has the following--
– For the USB-DFU interface, the CRS (clock recovery system) is not
correctly configured and this may lead to random USB
communication errors (depending on temperature and voltage). In
most case communication error will manifest by a "Stall" response
to setup packets.
This might explain the works/not-works behavior, but our processors report version 0x10, not the 0xff that the paragraph applies to.
Overall, it is quite baffling.
2024-08-08 01:07 AM
Which STM32 in particular?
Much of what you say points to hardware (okay, ground loop mostly excluded), but some of it may also be software interaction?
Contemplate trying some of (one at a time):
- different SWD cable - a short one (max. 15cm), flat, with every other wire being GND
- add NRST to the SWD connection
- use CubeProgrammer or STLink-Utility instead of OpenOCD
- use STLinkV3 instead of STLinkV2
- in the test item, use a program (plain blinky perhaps) which does not have the CAN bootloader and/or CAN interaction at all (while other boards are still creating traffic on the CAN bus)
all under the worst conditions, to see if there is any improvement.
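To compare these variations one at a time, it may help to repeat the same flash attempt many times and count the failures, since a single pass or fail says little about an intermittent fault. The following is only a minimal sketch: the function name `run_trials` is made up for illustration, and it assumes the flashing tool exits with a non-zero status when programming fails.

```python
import subprocess


def run_trials(cmd, trials=20, timeout_s=60):
    """Run the same command `trials` times.

    Returns (ok_count, errors), where errors is a list of
    (trial_index, last_output_line) for every failed attempt.
    """
    ok_count = 0
    errors = []
    for i in range(trials):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            errors.append((i, "timeout"))
            continue
        if result.returncode == 0:
            ok_count += 1
        else:
            # Keep only the last line of output as a short failure reason.
            lines = (result.stderr or result.stdout).strip().splitlines()
            errors.append((i, lines[-1] if lines else "no output"))
    return ok_count, errors


# Example use (the image name firmware.elf is a placeholder):
#   cmd = ["openocd", "-f", "interface/stlink.cfg", "-f", "target/stm32l4x.cfg",
#          "-c", "program firmware.elf verify reset exit"]
#   ok, errors = run_trials(cmd, trials=20)
#   print(f"{ok}/20 succeeded", errors)
```

Running the same number of trials for each change (cable, NRST, programmer, firmware) gives a failure rate per configuration rather than an impression.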
Your last remark left me baffled (way less than the Mercury one): how is USB-DFU interface related to SWD?
JW
2024-08-08 10:39 AM
waclawek.jan,
Thanks for the thoughts. Here are a few responses; I think the last holds some promise--
Processor: STM32L431RCT6
SWD cable: length 16 cm
Two days ago my colleague's setup had an F3 board as the STLink. The openocd connection was failing on 100% of attempts. I had him separate the three wires in the cable. It worked! So, we figured that crosstalk was indeed the problem. The next day, the same setup, the same in all respects, was failing 100% of the time, but today it is working.
STLinkV3 versus STLinkV2--
I've used both, and both have been intermittent. We also did updates to the STLink on the F4Discovery, F3Discovery, and Nucleo F446RE to see if that had an effect.
CAN traffic:
For the original post, it appeared that CAN traffic was the culprit, but since then it has turned out not to be a direct cause. We have tried setups with no CAN traffic, and setups with a single board and no CAN connection.
use CubeProgrammer or STLink-Utility instead of OpenOCD--
For openocd, the problem was observed with versions 9, 10, 11, and 12. Currently, we have openocd v12 on all the machines and OSs. The OSs that were tried are Ubuntu 16 native, Ubuntu 16 under VMware on Win 10, and Ubuntu 22 native.
We haven't tried STLink-Utility so that is worth some effort.
add NRST to the SWD connection--
This one will require tacking a header pin for reset onto a via on the board, and it looks doable, at least as an experiment. I suspect that at some point in the SWD/openocd startup the SWD data or clock pins float and pick up noise through stray capacitive coupling, and that a reset would take care of what appears to be random failure.
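Once NRST is wired to the STLink, openocd can be asked to hold the target in reset while it connects, and the SWD clock can be lowered for the test. The sketch below only builds the command line; it assumes openocd 0.11 or later (for the `adapter speed` spelling) and the stock `interface/stlink.cfg` and `target/stm32l4x.cfg` scripts. The helper name and the image name are placeholders, and the stock target script may still adjust the clock on its own.

```python
def openocd_flash_cmd(image, use_nrst=True, swd_khz=480):
    """Build an openocd command line for one flash attempt.

    image    -- path to the firmware image (placeholder name in examples)
    use_nrst -- if True, ask openocd to connect with NRST asserted
    swd_khz  -- requested SWD clock in kHz
    """
    cmd = [
        "openocd",
        "-f", "interface/stlink.cfg",
        "-f", "target/stm32l4x.cfg",
        # Request a lower SWD clock than the default for this experiment.
        "-c", f"adapter speed {swd_khz}",
    ]
    if use_nrst:
        # Only NRST is wired (no TRST); keep the debug port usable while
        # reset is asserted, and assert reset during the initial connect.
        cmd += ["-c", "reset_config srst_only srst_nogate connect_assert_srst"]
    cmd += ["-c", f"program {image} verify reset exit"]
    return cmd


# Example use:
#   import subprocess
#   subprocess.run(openocd_flash_cmd("firmware.elf"))
```

Comparing runs with `use_nrst=True` and `use_nrst=False` on the same board would show whether connecting under reset changes the failure rate.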
Thanks again for the thoughts.
2024-08-08 11:00 PM
Sometimes the ground loops are not that apparent at first.
I would check:
- cable shield connections: I've seen people using isolated interfaces while the cable shields were connected to the devices' cases on both sides
- mind that the programming adapter itself might produce a ground loop, depending on what the device is connected to / powered from, etc.
- sometimes another piece of test equipment, such as a generator or a scope, might still be connected and cause a ground loop