
Need for Speed Using STM32 SPI with DMA to W5500.

CK.3
Associate III

I am porting code from an ATMega128 to an STM32F103 to get faster throughput to a WIZNET W5500, starting from the WebServer.C example for the ATMega128. I am using the HAL because I am new to STM32. The throughput is already faster than on the ATMega128, but I need more speed. I changed the SPI transfers from polling to DMA, but the DMA version turned out slower than polling. I expected DMA to be about twice as fast. Both the DMA TX and RX channels are enabled and connected to SPI2 on an STM32F103C8T6. I don't know where I am going wrong.

Thank you in advance.

9 REPLIES
TDK
Guru

The SPI clock frequency will be the same in polling or DMA mode. Polling mode has less overhead so it will be slightly faster in general, with the big downside that it occupies your CPU during the process. In instances where the CPU can't keep up, DMA will be faster.

Likely your timing scheme is faulty, and/or the amount of data you are sending is insignificant compared to the overhead/setup time. Sending 1-2 bytes at a time is not going to be particularly efficient with either method.
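To put rough numbers on the overhead argument, here is a minimal sketch that estimates how much of a transfer is spent clocking payload bits. It assumes a 16 MHz SPI clock and a fixed 5 µs per-transfer cost (HAL call, DMA setup, chip-select handling); that 5 µs is a placeholder, not a measurement, and the function names are made up for illustration. The 3 header bytes correspond to the W5500 SPI frame, which starts with a 2-byte address and a 1-byte control field.

```c
/* Estimated time in microseconds to move `payload` bytes over SPI.
 * header:      bytes clocked before the payload (3 for a W5500 frame).
 * overhead_us: assumed fixed cost per transfer, independent of size. */
static double transfer_us(unsigned payload, unsigned header,
                          double spi_hz, double overhead_us)
{
    double wire_us = (double)(payload + header) * 8.0 * 1e6 / spi_hz;
    return overhead_us + wire_us;
}

/* Fraction of the total transfer time spent clocking payload bits. */
static double efficiency(unsigned payload, unsigned header,
                         double spi_hz, double overhead_us)
{
    double payload_us = (double)payload * 8.0 * 1e6 / spi_hz;
    return payload_us / transfer_us(payload, header, spi_hz, overhead_us);
}
```

Under these assumptions, `efficiency(1, 3, 16e6, 5.0)` comes out at about 7%, while `efficiency(167, 3, 16e6, 5.0)` is about 93%. The real overhead has to be measured on the target, but the trend holds: the fixed cost dominates small transfers whether they are polled or driven by DMA.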

If you feel a post has answered your question, please click "Accept as Solution".
Georgy Moshkin
Senior II

A few years ago I had a project with the W5500. I remember analyzing the speed with the WIZnet AX1 loopback utility to find slowdowns, and then fixing them. I used only a single connection, to emulate a faster UART over LAN. You can try examining the data lines with a scope to check whether there are significant periods of "silence".

Disappointed with crowdfunding projects? Make a lasting, meaningful impact as a Tech Sponsor instead: Visit TechSponsor.io to Start Your Journey!
CK.3
Associate III

The SPI clock is 16 MHz. The system scans 26 sensors over I2C for 8-bit data and transmits the readings via the W5500 150 times a second. The transmitted string is 167 characters long, for which I have allocated a 256-character buffer. Besides the scan, the other sub-functions are a belt speed calculation and a timer interrupt that generates the scan timing depending on belt speed. Two PWM channels are driven from another single timer. These are the only tasks the MCU has during scan mode, the main load being the LAN operation. I am able to transmit only 134 strings per second against the required 150.

Thanks to TDK and Georgy for their suggestions.

What is receiving all that data? I have a feeling that I got much greater speeds. Make sure you do not close the TCP connection. Maybe group the data into larger blocks (26 sensors * 4 blocks sent in a single packet), if some latency is acceptable.

On the PC side I used this:

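bool sendGetTcp(uint8_t *data, DWORD sendLen, DWORD needGetLen)
{
    // Send the request; report the Winsock error code on failure.
    int ret = send(sockTRGClient, (char *)&data[0], sendLen, 0);

    std::wcout<<"SEND ret="<<ret<<std::endl;

    if (ret<0){
        ret=WSAGetLastError();
        std::wcout<<"err="<<ret<<std::endl;
        return false;
    }

    // recv() may return fewer bytes than requested, so loop until
    // needGetLen bytes have arrived.
    DWORD ii=0;

    while (ii<needGetLen)
    {
        int cnt=recv( sockTRGClient, (char *)&data[ii], needGetLen-ii, 0 );
        if (cnt<=0) {return false;}   // 0: peer closed the connection, <0: socket error
        ii=ii+cnt;
    }

    return true;
}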
 
// ...
 
        dummy[0]=mode;
        dummy[1]=source;
 
         sendGetTcp((uint8_t*)&dummy,2,0);
 
// ...
         sendGetTcp((uint8_t*)&dummy,2,1024*4*4);
 
            int i;
            for (i=0; i<1024*4; i++)
            {
                int32_t i32=(int32_t) ((dummy[i*4+0]<<0)|
                                       (dummy[i*4+1]<<8)|
                                       (dummy[i*4+2]<<16)|
                                       (dummy[i*4+3]<<24));
// ...

And sending on MCU side:

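int32_t xsend(uint8_t sn, uint8_t * buf, uint16_t size)
{
    uint16_t size2;
    uint16_t freesize;
    uint8_t status;

    // Never try to send more than the socket's TX buffer can hold.
    size2 = size;
    if (size2 > getSn_TxMAX(sn)) { size2 = getSn_TxMAX(sn); }

    // Wait until the TX buffer has room, bailing out if the connection drops.
    do
    {
        freesize = getSn_TX_FSR(sn);
        status = getSn_SR(sn);
        if ((status != SOCK_ESTABLISHED) && (status != SOCK_CLOSE_WAIT))
        {
            close(sn);
            return SOCKERR_SOCKSTATUS;
        }
    }
    while (freesize < size2);

    // Copy the data into the W5500 TX buffer and issue the SEND command.
    wiz_send_data(sn, buf, size2);
    setSn_CR(sn, Sn_CR_SEND);
    while (getSn_CR(sn)) {};   // wait until the command register clears

    // Wait for SEND_OK, again watching for a dropped connection.
    while ((getSn_IR(sn) & Sn_IR_SENDOK) != Sn_IR_SENDOK)
    {
        status = getSn_SR(sn);
        if ((status != SOCK_ESTABLISHED) && (status != SOCK_CLOSE_WAIT))
        {
            close(sn);
            return SOCKERR_SOCKSTATUS;
        }
    }

    setSn_IR(sn, Sn_IR_SENDOK);   // clear the SEND_OK flag

    return size2;
}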
 
// ...
 
int32_t sendall(uint8_t sn, uint8_t * buf, uint16_t size)
{
    uint16_t sentsize = 0;
    int32_t ret;

    while (size != sentsize)
    {
        ret = xsend(sn, &buf[sentsize], size - sentsize);

        if (ret < 0)
        {
            close(sn);
            return ret;
        }

        sentsize += ret; // Don't care SOCKERR_BUSY, because it is zero.
    }

    return 0;
}
 
// ...
 
 if ((tcpbuf[0]==0)||(tcpbuf[0]==1)) {sendall(0,(uint8_t *)&fftBuffer,4096*4);}

I am not sure whether this is the latest source code. I remember the idea was to open a TCP connection once and then keep sending/receiving. There was also a reconnection mechanism for the case where the LAN cable was loosely connected; this case was not handled in any of the W5500 examples I copy-pasted from.

TDK
Guru

Are you sure the SPI is the bottleneck here and not the I2C? You will need to buffer data, as packets can be delayed. Consider bunching up data and sending fewer packets with more data per packet. This will surely be more efficient.
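A back-of-the-envelope comparison of the two buses, as a sketch only. It assumes a 400 kHz I2C clock and that each sensor is read with a typical register read (address + register, repeated start, address + one data byte), which is roughly 4 bytes of 9 clocks each. The actual transaction shape depends on the sensors and is not stated in the thread. The function names are made up for illustration.

```c
/* Microseconds needed to clock `bits` bits at `hz`. */
static double bus_us(unsigned long bits, double hz)
{
    return (double)bits * 1e6 / hz;
}

/* Assumed I2C register read: addr(W) + reg + addr(R) + data,
 * 9 clocks per byte (8 data bits + ACK/NACK). START/STOP are ignored. */
#define I2C_BITS_PER_READ (4ul * 9ul)

/* Bus time for one scan of all sensors, one register read each. */
static double i2c_scan_us(unsigned sensors, double i2c_hz)
{
    return bus_us((unsigned long)sensors * I2C_BITS_PER_READ, i2c_hz);
}

/* Wire time for one string over SPI, payload bytes only. */
static double spi_string_us(unsigned bytes, double spi_hz)
{
    return bus_us((unsigned long)bytes * 8ul, spi_hz);
}

/* Fraction of each second the bus is busy at the given repetition rate. */
static double busy_fraction(double us_per_pass, unsigned passes_per_s)
{
    return us_per_pass * (double)passes_per_s / 1e6;
}
```

With 26 sensors at 400 kHz, `i2c_scan_us(26, 400e3)` is 2340 µs, so 150 scans per second keep the I2C bus busy about 35% of the time. One 167-byte string at 16 MHz is 83.5 µs of SPI wire time, or about 1.3% at 150 strings per second. These figures ignore clock stretching, driver overhead and the W5500 register accesses, but they suggest the raw SPI clock is unlikely to be the limit.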

I have gotten well over 2000 TCP packets per second out of the W5500 being controlled by an STM32F4.

CK.3
Associate III

The I2C is operating at 400 kHz, while the SPI operates at 16 MHz, so maybe the I2C is what is delaying or missing the packet transmission. As TDK suggested, I may have to consider concatenating the packets, but I have not done that before. Any hints or examples, please?

The LAN reframing may also have to be done, as suggested by Georgy.

CK.3
Associate III

As suggested, I tried concatenating strings to send them as larger packets, but this resulted in a continuous flow of data (the same data repeated) without any end.

sprintf(temp_buffer,"%d/%d/%d %d:%d:%d Dt:%d/%d/%d %d %u Ar: %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d/%d %d %d \r\n",date_read,month_read,year_read,hours_read,minutes_read,seconds_read,kilo,meter,centi,speed,tacho_pulse,PU_Array1_up,PU_Array1_dn,PU_Array2_up,PU_Array2_dn,PU_Array3_up,PU_Array3_dn,PU_Array4_up,PU_Array4_dn,PU_Array5_up,PU_Array5_dn,PU_Array6_up,PU_Array6_dn,PU_Array7_up,PU_Array7_dn,PU_Array8_up,PU_Array8_dn,PU_Array9_up,PU_Array9_dn,PU_Array10_up,PU_Array10_dn,PU_Array11_up,PU_Array11_dn,PU_Array12_up,PU_Array12_dn,PU_Array13_up,PU_Array13_dn);

strcat(buffer,temp_buffer); // uint8_t buffer[1024], uint8_t temp_buffer[200]

delay(5);

if(strlen(buffer) > 800) //considering each string 176 char long;
{
    send(0,buffer,strlen(buffer));
}

where

int send(uint8_t sock,uint8_t *buffer,int buflen)
{
    int ptr,offaddr,realaddr,txsize;

    // Make sure the TX Free Size Register is available;
    txsize = SPI_Read(SO_TX_FSR); //get the free TX memory size
    txsize = (((txsize & 0x00FF) << 8 ) + SPI_Read(SO_TX_FSR + 1)); // gets 2 bytes (integer).

    // Read the Tx Write Pointer
    ptr = SPI_Read(S0_TX_WR);
    offaddr = (((ptr & 0x00FF) << 8 ) + SPI_Read(S0_TX_WR + 1));

    while(buflen)
    {
        buflen--;
        realaddr = offaddr & TX_BUF_MASK; //Calculate the real W5500 physical Tx Buffer addrs
        SPI_WriteBuf(realaddr,*buffer); // Copy the application data to the W5500 Tx Buffer
        offaddr++;
        buffer++;
    }

    SPI_Write(S0_TX_WR,(offaddr & 0xFF00) >> 8 ); // Increase the S0_TX_WR value, points to next
    SPI_Write(S0_TX_WR + 1,(offaddr & 0x00FF));
    SPI_Write(S0_CR,CR_SEND); // Now Send the SEND command
    while(SPI_Read(S0_CR)); // Wait for Sending Process

    return 1;
}

Is my concatenation wrong, or is my send() function unable to handle it?
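Judging only from the snippet as posted, one likely cause of the repeated data is that buffer is never emptied after send(). Once strlen(buffer) exceeds 800 it stays above 800, so every later pass appends another record and sends the whole, still growing, string again until the 1024-byte array overflows. The following is a minimal sketch of a batch that tracks its own length and resets it on every flush. The names batch_t, batch_append, batch_flush and fake_send are hypothetical, and fake_send only stands in for the real send() so the logic can be tried on a PC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_CAPACITY  1024u  /* same size as buffer[] in the post */
#define BATCH_THRESHOLD 800u   /* flush once more than this is queued */

/* Same shape as the send() posted above. */
typedef int (*send_fn)(uint8_t sock, uint8_t *data, int len);

typedef struct {
    uint8_t data[BATCH_CAPACITY];
    size_t  len;               /* bytes currently queued */
} batch_t;

/* Hand the queued bytes to tx() and, crucially, empty the batch. */
static void batch_flush(batch_t *b, uint8_t sock, send_fn tx)
{
    if (b->len > 0) {
        tx(sock, b->data, (int)b->len);
        b->len = 0;
    }
}

/* Queue one record. Returns the number of flushes performed (0, 1 or 2),
 * or -1 if the record is larger than the whole batch. */
static int batch_append(batch_t *b, const char *rec, size_t rec_len,
                        uint8_t sock, send_fn tx)
{
    int flushes = 0;
    if (rec_len > BATCH_CAPACITY) return -1;
    if (b->len + rec_len > BATCH_CAPACITY) {   /* no room: send what we have */
        batch_flush(b, sock, tx);
        flushes++;
    }
    memcpy(b->data + b->len, rec, rec_len);
    b->len += rec_len;
    if (b->len > BATCH_THRESHOLD) {            /* enough queued: send it */
        batch_flush(b, sock, tx);
        flushes++;
    }
    return flushes;
}

/* Stand-in for the real send() so the sketch can be exercised on a PC. */
static size_t fake_sent_bytes;
static int    fake_send_calls;
static int fake_send(uint8_t sock, uint8_t *data, int len)
{
    (void)sock; (void)data;
    fake_sent_bytes += (size_t)len;
    fake_send_calls++;
    return 1;
}
```

If the strcat approach is kept instead, the smallest change with the same effect is to set buffer[0] = '\0' right after the send() call, and to make sure buffer starts out as an empty string before the first strcat. Separately, send() hands SPI_WriteBuf one byte per call. If each call is a complete W5500 SPI frame (2 address bytes, 1 control byte, then the data), most of the bytes on the wire are header, and writing the whole buffer in one burst frame would be considerably shorter.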

From the code you've posted, it is unclear how the main loop calls are arranged. If you do not want to post the whole code, you can contact me directly by the email in my profile.

I would suggest the following approach:

First, ensure that you can get the I2C working alone, performing continuous readout at the required data frame rate. You can toggle a GPIO after each full sensor readout and check this pin's output frequency on a scope.

Second, debug the W5500 speed alone, without the I2C. In my projects I use the WIZnet AX1 loopback utility: it sends data from the PC over LAN to the STM32+W5500, and my firmware sends this data back to the PC. The AX1 utility shows the speed and checks that the data is not corrupted.

Usually this is not required if you already have experience doing it. One way or another, at this stage you have code that is well tested with each peripheral separately and needs to be combined so that both work together.

If something goes wrong, you can define a few GPIO outputs and add pin toggles to analyze timings on a two-channel scope:

HAL_GPIO_WritePin(GPIOE, GPIO_PIN_9, GPIO_PIN_SET);
// ... W5500 SPI related TX/RX function calls ... PE9 connected to scope channel 1
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_9, GPIO_PIN_RESET);
 
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_10, GPIO_PIN_SET);
// ... I2C related TX/RX function calls .. PE10 connected to scope channel 2
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_10, GPIO_PIN_RESET);
 
void HAL_Some_Interrupt(*hsomething)
{
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_8, GPIO_PIN_SET);
// ... check if your code takes too much time inside interrupt ... connect to scope when needed
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_8, GPIO_PIN_RESET);
}

You can define more GPIO outputs to investigate lengthy blocking code.

I2C readout is probably more or less deterministic in time, so it may work as a continuous readout. SPI+W5500 may be a little more complicated, mainly because of unexpected network slowdowns.

The best approach is probably I2C with circular DMA for continuous readouts, where the I2C DMA interrupt copies data to another, larger buffer using memcpy or MEM2MEM. The main loop checks the current state of the large buffer and, if there is enough data, sends it to the SPI+W5500. For best performance at least one of those things must be non-blocking (either I2C+DMA or SPI+DMA). For a commercial product I would perform further testing: sending random data from I2C, disconnecting and reconnecting I2C sensors, reconnecting the LAN cable, limiting the LAN speed to see whether the double-buffering mechanism skips I2C frames properly, etc.
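The "larger buffer" between the I2C interrupt and the main loop could take the shape of a small single-producer, single-consumer ring of sensor frames. This is only a sketch of one possible layout, not the code used in the project described above: FRAME_BYTES, RING_FRAMES and all function names are assumptions, the DMA setup itself is not shown, and C11 atomics are used for the indices so that the interrupt and the main loop see each other's updates in order. When the ring is full the newest frame is skipped and counted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 26u  /* assumed: one 8-bit reading per sensor */
#define RING_FRAMES 8u   /* assumed depth; must be a power of two */

typedef struct {
    uint8_t     frames[RING_FRAMES][FRAME_BYTES];
    atomic_uint head;     /* advanced only by the producer (interrupt) */
    atomic_uint tail;     /* advanced only by the consumer (main loop) */
    atomic_uint dropped;  /* frames skipped because the ring was full */
} frame_ring_t;

static void ring_init(frame_ring_t *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
    atomic_init(&r->dropped, 0u);
}

/* Producer side: call from the I2C DMA-complete interrupt. */
static bool ring_push(frame_ring_t *r, const uint8_t *frame)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_FRAMES) {          /* full: skip this frame */
        atomic_fetch_add_explicit(&r->dropped, 1u, memory_order_relaxed);
        return false;
    }
    memcpy(r->frames[head % RING_FRAMES], frame, FRAME_BYTES);
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: call from the main loop before sending to the W5500. */
static bool ring_pop(frame_ring_t *r, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return false;            /* nothing queued */
    memcpy(out, r->frames[tail % RING_FRAMES], FRAME_BYTES);
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Number of frames currently queued. */
static unsigned ring_count(frame_ring_t *r)
{
    return atomic_load(&r->head) - atomic_load(&r->tail);
}

/* Number of frames skipped so far. */
static unsigned ring_dropped(frame_ring_t *r)
{
    return atomic_load(&r->dropped);
}
```

The main loop can poll ring_count() and pop several frames into one TCP payload once enough are queued, while ring_dropped() gives a direct way to check during testing whether a slow network is causing frames to be skipped.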

CK.3
Associate III

I tried everything as suggested and monitored it on the scope. The timer interrupt was firing at least 4000 times a second, with the tacho interrupt for speed and 2 PWM channels running in the background, in addition to the I2C scanning the 32 sensors at least 150 times a second.

I finally achieved it by compromising: the timer interrupt was reduced from 4000 to 2000 times a second, and the resolution was degraded by scanning the sensors 120 times a second instead of 150.

The I2C RTC is now scanned only in the absence of data.

Thanks TDK and Georgy for being a pillar of support.