Complex SPI messaging between multiple STM32s

darcy · 2009-01-28

Posted on January 28, 2009 at 15:09

Hi all,

I’m having a bit of difficulty trying to work out how to organise a single-master, many-slave arrangement of STM32 processors.

We have a single SPI master and 11 SPI slaves (every processor is an STM32F103).

Transmit and receive are both handled by dedicated DMA channels (i.e. not switched to other peripherals) and the ‘automatic’ CRC16 is to be generated by the SPI peripheral.

There are a number of issues I’m trying to work through, having previously spent all my time working on ARM7s with 8/16-word FIFOs and built-in half/full/timeout notifications.

Part of my problem is that I have a preconceived idea of how this should work, based on how I designed it on the old ARM7 system (same number of processors). There are most definitely ways to approach this that I haven't thought of...

Thanks very much in advance 🙂

Current problems:

1. The master and slaves are not aware of the packet size they are about to receive until they receive the first word of the packet (the first word of every packet specifies the entire packet size, excluding the auto-generated CRC16).
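As a sketch of the framing described in problem 1, assuming 16-bit SPI words, a hypothetical MAX_PACKET_WORDS limit, and a length word that counts itself but not the CRC16, building a packet and interpreting the first received word might look like this. The function names are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet layout: word 0 holds the total packet length in
 * 16-bit words (including word 0 itself, excluding the hardware CRC16),
 * followed by the payload words. */
#define MAX_PACKET_WORDS 64u

/* Build a packet into out[]; returns the number of words written (the SPI
 * peripheral appends the CRC), or 0 if the payload does not fit. */
size_t build_packet(uint16_t *out, const uint16_t *payload, size_t payload_words)
{
    size_t total = payload_words + 1u;          /* +1 for the length word */
    if (total > MAX_PACKET_WORDS)
        return 0;
    out[0] = (uint16_t)total;
    if (payload_words > 0)
        memcpy(&out[1], payload, payload_words * sizeof payload[0]);
    return total;
}

/* Called on the receive side once the first word has arrived: returns how
 * many further data words are still expected, or -1 if the length word is
 * implausible. */
int remaining_after_length_word(uint16_t length_word)
{
    if (length_word < 1u || length_word > MAX_PACKET_WORDS)
        return -1;
    return (int)length_word - 1;
}
```

The value returned by remaining_after_length_word() is what would have to reach the receive DMA count mid-transfer, which is the crux of the problem.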

2. This method of message exchange is SO much more involved than it was on the ARM7 using FIFOs and interrupts every 8-16 words – there MUST be an easier way.

Limitations:

1. The packet ACKs need to be exchanged in the same master/slave session as the original packets

2. Devices are addressed sequentially in a repeating cycle: 1-11, then 1-11, then 1-11...
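One way to read these two limitations together is a fixed schedule in which each slave gets a data exchange followed immediately by its ACK exchange before the master moves on. The sketch below models only that ordering; session_t and next_session() are hypothetical names, and the actual device select (done through a CPLD, as described later in the thread) is left out.

```c
#include <stdint.h>

#define NUM_SLAVES 11u

/* Hypothetical session phases: the data exchange with a slave is followed by
 * the ACK exchange with the same slave, so ACKs stay in the same
 * master/slave session as the packets they acknowledge. */
typedef enum { PHASE_DATA, PHASE_ACK } phase_t;

typedef struct {
    uint8_t slave;   /* 1..NUM_SLAVES */
    phase_t phase;
} session_t;

/* Advance to the next exchange: DATA -> ACK on the same slave, then on to
 * the next slave, wrapping from 11 back to 1. */
session_t next_session(session_t s)
{
    if (s.phase == PHASE_DATA) {
        s.phase = PHASE_ACK;
    } else {
        s.phase = PHASE_DATA;
        s.slave = (uint8_t)(s.slave % NUM_SLAVES + 1u);
    }
    return s;
}
```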

(Desired) sequence of events:

1. The master selects one of the slaves to communicate with

2. The master loads its DMA buffer and begins SPI transfer with CRC calculation enabled

3. The slave begins to transmit any queued packets it may have preloaded into its DMA transmit buffer, with CRC calculation enabled

4a. The slave starts to receive the master’s message and pulls out the first word so it knows how many words it should expect before message completion

4b. The master begins to receive the slave’s message and pulls out the first word so it knows how many words it should expect

Variation #1 (the master message is shorter than the slave message):

5a. The master transmit DMA empty interrupt triggers, also triggering the CRC16 to be transmitted to the slave

5b. The master disables CRC, points the DMA at a 0x0000 circular buffer, and begins to transmit as many words as are required to complete reception from slave (having picked up the incoming length during step 4b)

5c. The slave receive DMA full interrupt triggers (the DMA buffer size having somehow been set back at step 4a)

5d. The slave disables the receive DMA (preserving the buffer contents)

6a. The slave transmit DMA empty interrupt triggers, also triggering the CRC16 to be transmitted to the master

6b. The master receive DMA full interrupt triggers (the DMA buffer size having somehow been set back at 4b)

7a. The slave validates the CRC of the master’s message

7b. The master validates the CRC of the slave’s message

7c. The master and slave each transfer their received packets to the message handler for processing from the main thread

7d. The slave generates an ACK message, and reconfigures the slave transmit DMA channel

7e. The master generates an ACK message, and reconfigures the master transmit DMA channel

8. The master and slave essentially start at step 1 again to exchange acknowledgements, stopping at 7b

Variation #2 (the master message is longer than the slave message):

Not worth writing out...

peter2 · 2011-05-17

Posted on May 17, 2011 at 13:00

Hi,

We have recently implemented SPI communication on the STM32.

I don't think your Variation #1 will work the way you expect. As far as I know, it is not possible to change the DMA buffer size while a transfer is running, and I think you will run into problems with triggering the CRC transmission.

It would be better to set up a fixed packet size first. After this partial packet has been transmitted, you can set up the DMA and buffers on both sides for the rest of the packet. After that transmission is over, the master could send the ACK frame. Since the ACK frame presumably has a fixed size, you should make the initial part of the packet the same size as the ACK frame. With this setup you can receive a packet and an ACK frame without having to stick them together all the time, but you need a header which identifies the kind of message: data packet or ACK frame.

Besides this, we had a lot of trouble setting up reliable SPI communication anyway. There was only one configuration that worked in our application: the data has to be latched on the second edge of the clock, and the clock has to be a positive one. That's what I remember at the moment.

There also seems to be a bug in the CRC calculation. While we are communicating with the second slave, the CRC calculation on the first slave seems to keep running even though its select pin is not asserted. As a result we get a CRC error on the first transmission to the first slave after communicating with the second one. As a workaround, we calculate the CRC "by hand" whenever a message arrives with a CRC error. Luckily, we only communicate with the second slave during wakeup.
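To illustrate the constraint peter2 describes: on the STM32F1, a DMA channel's CNDTR is read-only while the channel is enabled (and CMAR should not be rewritten then either), so changing the length means stopping the channel first. The sketch below uses a host-side stand-in struct, dma_channel_t, in place of the device header's register definitions. Any SPI word that arrives while the channel is disabled is not moved by DMA and would have to be handled separately.

```c
#include <stdint.h>

/* Host-side stand-in for one STM32F1 DMA channel's registers (in firmware
 * this would be the device header's channel register block). Only the
 * fields used here are modelled. */
typedef struct {
    volatile uint32_t CCR;    /* bit 0 = EN (channel enable) */
    volatile uint32_t CNDTR;  /* number of data items left to transfer */
    volatile uint32_t CPAR;   /* peripheral address */
    volatile uint32_t CMAR;   /* memory address */
} dma_channel_t;

#define DMA_CCR_EN_BIT 0x1u

/* A new length can only be loaded with a disable / write / re-enable
 * sequence. Returns the number of items the channel had not yet moved. */
uint32_t dma_restart(dma_channel_t *ch, uint32_t mem_addr, uint16_t count)
{
    ch->CCR &= ~DMA_CCR_EN_BIT;        /* stop the channel */
    uint32_t left = ch->CNDTR;         /* what was still outstanding */
    ch->CMAR  = mem_addr;
    ch->CNDTR = count;
    ch->CCR  |= DMA_CCR_EN_BIT;        /* restart with the new length */
    return left;
}
```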
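A minimal sketch of the two-phase receive suggested above, assuming a hypothetical three-word header that is also the complete ACK frame (type, total length, acknowledged sequence number). Once the fixed-size first DMA transfer completes, the header alone decides how long the second transfer must be. The message type codes and field order are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical fixed header, sized to match a complete ACK frame:
 *   word 0: message type (ACK or DATA)
 *   word 1: total packet length in words, header included
 *   word 2: sequence number being acknowledged (used by both types) */
#define HEADER_WORDS     3u
#define MAX_PACKET_WORDS 64u

enum { MSG_ACK = 0x0A0Au, MSG_DATA = 0x0D0Du };

/* After the fixed-size first phase completes, decide how many more words the
 * second DMA phase must move. Returns false for an unknown type or bad length. */
bool second_phase_words(const uint16_t hdr[HEADER_WORDS], uint16_t *remaining)
{
    switch (hdr[0]) {
    case MSG_ACK:
        *remaining = 0;                  /* an ACK fits entirely in the header */
        return hdr[1] == HEADER_WORDS;
    case MSG_DATA:
        if (hdr[1] <= HEADER_WORDS || hdr[1] > MAX_PACKET_WORDS)
            return false;
        *remaining = (uint16_t)(hdr[1] - HEADER_WORDS);
        return true;
    default:
        return false;
    }
}
```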
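On the STM32F1 the clock format is set by two bits in SPI_CR1: CPHA (bit 0), where 1 means data is captured on the second clock edge, and CPOL (bit 1), where 1 means SCK idles high. The post fixes the edge, but "a positive clock" could be read either way, so the sketch below leaves polarity as a parameter and simply checks that master and slave agree. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Clock-format bits in the STM32F1 SPI_CR1 register. */
#define SPI_CR1_CPHA_BIT (1u << 0)  /* 1: data captured on the second clock edge */
#define SPI_CR1_CPOL_BIT (1u << 1)  /* 1: SCK idles high */

/* Build the CR1 clock-format bits; polarity is left to the caller. */
uint16_t spi_clock_bits(bool idle_high, bool second_edge)
{
    uint16_t cr1 = 0;
    if (idle_high)   cr1 |= SPI_CR1_CPOL_BIT;
    if (second_edge) cr1 |= SPI_CR1_CPHA_BIT;
    return cr1;
}

/* Master and slaves must agree on both bits; compare only those two. */
bool clock_formats_match(uint16_t master_cr1, uint16_t slave_cr1)
{
    const uint16_t mask = SPI_CR1_CPOL_BIT | SPI_CR1_CPHA_BIT;
    return (master_cr1 & mask) == (slave_cr1 & mask);
}
```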
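If the CRC has to be recomputed "by hand" as in the workaround above, the software version must match the SPI peripheral's calculation. The sketch below assumes 16-bit frames sent MSB first, a CRC register that starts at 0 when CRCEN is set, and no reflection or final XOR; poly is whatever is programmed into SPI_CRCPR (its reset value is 0x0007; 0x1021 in the test is just an example). It is worth checking against a real capture before relying on it.

```c
#include <stdint.h>
#include <stddef.h>

/* Software model of the SPI CRC in 16-bit frame mode, under the assumptions
 * above. poly is the value programmed in SPI_CRCPR. */
uint16_t spi_crc16(const uint16_t *frames, size_t n, uint16_t poly)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        crc ^= frames[i];                       /* bring in one 16-bit frame */
        for (int bit = 0; bit < 16; bit++)      /* shift it through, MSB first */
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ poly)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
```

Feeding a packet followed by its own CRC through the same function gives 0, which is a convenient receive-side check.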

darcy · 2011-05-17

Posted on May 17, 2011 at 13:00

Hmmm... the ARM7s certainly had their advantages with some of this stuff.

Thanks for your thoughts on that...

Another method we've just started to consider is to drop the SPI CRC16 completely and use the hardware CRC32 unit instead. We've been forcing ourselves to use the CRC16 simply because it's there, yet it is creating additional problems, and by the time messages get back up to the application layer it has probably slowed things down more than the CRC32 would.
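For reference, the STM32F1 CRC unit uses the CRC-32 polynomial 0x04C11DB7, resets its data register to 0xFFFFFFFF, processes one 32-bit word per write to CRC_DR MSB first, and applies no reflection or final XOR. A software model like the one below is handy for checking packets off-target; stm32_crc32() is a hypothetical name. Because the F1 unit only accepts whole 32-bit words, a packet of 16-bit SPI words would need an even word count (or a padding word) before its CRC32 is calculated.

```c
#include <stdint.h>
#include <stddef.h>

/* Software model of the STM32F1 CRC unit: polynomial 0x04C11DB7, register
 * reset to 0xFFFFFFFF, one 32-bit word per write, MSB first, no reflection
 * and no final XOR. */
uint32_t stm32_crc32(const uint32_t *words, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= words[i];
        for (int bit = 0; bit < 32; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}
```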

- The first word in a packet specifies the packet length

- A packet header contains the ACK for the previous received packet

- The device's transmit DMA size is set to the packet size including CRC32

- The device's receive DMA is set to a size greater than the max expected packet size.

- The slave's NSS line is asserted (pulled low) by the master, instructing the slave to construct its packet (allowing us to pass back the most up-to-date info, which is created every few hundred µs)

- Transmission in both directions takes place

- The master finishes transmission (DMA empty INT) and checks whether the number of bytes received has reached the slave's packet length. If not, it continues sending 0x0000 to the slave until the packet is complete

- The master finishes transmission again (DMA empty INT) and processes the receive packet buffer

- If the slave finishes transmission before the master then it just sends 0x0000

- When the master finishes transmission it deasserts the NSS line (pulls it high), telling the slave that transmission has finished
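The master-side bookkeeping in the steps above reduces to one question when the transmit DMA empties: how many 0x0000 filler words are still needed to clock in the rest of the slave's packet? A sketch, assuming lengths counted in 16-bit SPI words and the slave's length word covering its whole frame including the CRC32; session_counts_t and filler_words_needed() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-session counters on the master. Both sides transmit at
 * once, so after the master's own packet has gone out it may still need to
 * clock in the remainder of the slave's packet. */
typedef struct {
    uint16_t rx_expected;   /* from the slave's first word; 0 = not known */
    uint16_t rx_received;   /* words received so far */
} session_counts_t;

/* How many 0x0000 filler words the master must still send once its own
 * transmit DMA has emptied. 0 means the session can end (NSS high). */
uint16_t filler_words_needed(const session_counts_t *s)
{
    if (s->rx_expected == 0 || s->rx_received >= s->rx_expected)
        return 0;
    return (uint16_t)(s->rx_expected - s->rx_received);
}
```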

This configuration allows the master to instruct the slaves to construct their packets, so we receive up-to-date info. It also caters for scenarios where clock pulses may be missed, removing the need for timeouts (the master pulling NSS high again effectively acts as the timeout). It also means we can have variable-length packets containing multiple messages, so when we only have a short message to send we're not wasting bandwidth...
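The NSS-as-timeout idea can be sketched on the receiving side: with the receive DMA started at a size larger than any packet (RX_BUF_WORDS here, an assumed value), the number of words actually received when NSS goes high is that size minus the channel's remaining CNDTR count. A packet whose length word claims more than was received was cut short. The names are hypothetical, and the CRC32 check would follow this step.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_WORDS 128u  /* assumed: larger than the largest expected packet */

/* Hypothetical end-of-session check, run when NSS goes high. cndtr_left is
 * the receive DMA channel's remaining count at that moment. Returns true and
 * the packet length if a complete packet was received; trailing 0x0000
 * filler words beyond the declared length are ignored. */
bool rx_complete(const uint16_t *rx_buf, uint32_t cndtr_left, uint16_t *pkt_words)
{
    if (cndtr_left >= RX_BUF_WORDS)
        return false;                  /* nothing received at all */
    uint32_t received = RX_BUF_WORDS - cndtr_left;
    uint16_t declared = rx_buf[0];     /* first word = packet length */
    if (declared == 0 || declared > received)
        return false;                  /* truncated or corrupt length word */
    *pkt_words = declared;
    return true;
}
```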

jj · 2011-05-17

Posted on May 17, 2011 at 13:00

@darcy-

Thank you. Well answered/queried.

Remote range extension while maintaining data rate:

Our slave-remotes were "off-board" - some nearly 5 meters distant. We used differential lin

jj · 2011-05-17

Posted on May 17, 2011 at 13:00

Interesting - thank you both for your detailed posts.

As we in technology know, device selection most often involves "trade-offs". STM32 is highly attractive in many areas, but sometimes the detail is not so... (unfortunate that one must "dig deep" before such discovery)

Curious as to the distance between your devices? Doubt that you have added line buffers but range extension would heighten interest/forum response. (you don't identify signal corruption - wonder about your BER)

Most recent post notes that "the master instructs slave to construct." And this does ensure current data - at the cost of "turn-around time" and perhaps leaving the slaves "too idle."

Lastly - I assume you gate the NSS line with the STM32's GPIO to "CS" individual slaves. Have you tried experimenting with a single master-slave pair, just to eliminate this gating as a factor?

darcy · 2011-05-17

Posted on May 17, 2011 at 13:00

Quote:

Curious as to the distance between your devices? Doubt that you have added line buffers but range extension would heighten interest/forum response. (you don't identify signal corruption - wonder about your BER)

Hi, we've used line buffers to drive each side of our board, with each side having either five or six STM32s. The whole board is around 180 x 200 mm, so travel distance isn't substantial... As we don't have the hardware for this revision of the board yet, I have no idea about the BER. We can sacrifice a bit of speed for reliability; 10 Mbps would do the trick nicely.

Quote:

Most recent post notes that "the master instructs slave to construct." And this does ensure current data - at the cost of "turn-around time" and perhaps leaving the slaves "too idle."

I'm not sure what you mean by "too idle". They've got quite a lot of external control work going on when they're not passing back info to the master. The turnaround time would be however long it takes to build the packet from the pre-constructed messages and run it through the CRC32.

Quote:

Lastly - I assume you gate the NSS line with the STM32's GPIO to "CS" individual slaves. Have you tried experimenting with a single master-slave pair, just to eliminate this gating as a factor?

I'm not really sure what is being asked here... We set up the device select by telling a CPLD which device is about to be driven; the NSS line is then driven in hardware by the master.

darcy · 2011-05-17

Posted on May 17, 2011 at 13:00

Quote:

CPLD to manage multiple slaves:

Appears your/our technique substantially similar. We did one thing that may be of interest/value - while awaiting new, "multi-bit" SPI serial flash devices, we employed the CPLD to manage multiple SPI serial flash chips operated in parallel.

That's quite a cool concept... I'll store that one away 🙂 We've just used an SD card via SDIO for large data storage and internal flash for device configuration/calibration.

Quote:

May we ask if you use a "decode" (like HC138: 3 in -> 8 out) function or decode the end address via a unique SPI bit pattern sent prior to remote SPI "engagement?"

Basically yes, we use 5 in, lots out... 🙂 We're able to select individual devices or groups of devices depending on requirements at the time (e.g. broadcast messages)