where is the STM32F107xx IEEE 1588 PTPd package?

Dennis Chou · ‎2018-02-16

Posted on February 16, 2018 at 18:55

Where can I download the STM32F107xx lwIP stack with IEEE 1588 PTPd support? I have read the application notes AN3102 and AN3411, and AN3411 appears to indicate there is a PTPd port for this line of processors.

Thanks

Dennis

#stm32f107xx-ptp-ieee-1588

Tesla DeLorean · ‎2019-05-06

Yes the discussion comes up on a recurrent basis. There was a thread a few months back where the timing and dividers were discussed, perhaps in the F7 or H7 vein. I need to search some more.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · ‎2019-05-06

https://community.st.com/s/question/0D50X0000ALw0M2SQJ/can-i-use-the-stm32f4-ptp-clock-without-an-rmii-clock

I'm not 100% convinced by all the math in here, but it was the most interesting PTP-related stuff in recent memory.


Dennis Chou · ‎2019-05-06

Thank you everyone for chipping in. Really appreciate it.

VTrue · ‎2019-05-14

Did you succeed in implementing PTP on the F7? Can you share an example?

Tesla DeLorean · ‎2019-05-14

https://community.st.com/s/question/0D50X0000AntJizSQE/ieee1588-ptp-how-to-calculate-addend-app-note-unclear

The mechanics of the PTP unit are consistent across the STM32 F2/F4/F7 range as I recall.


VTrue · ‎2019-05-15

Thank you. But I need a complete PTP project that uses the HAL library. It is not for commercial use, only for precision time measurement and management.

cwparker · ‎2019-05-15

The F7/H7 add some additional PTP register features that are not in the F4, but for a PTP implementation they are essentially the same. The biggest difference is the clock speed. I'm hoping to get some spare time (no pun intended) to write up a standalone example for Cube. I have an implementation working, but it is embedded with all the other code for the application it is being used for. Here is the initialization code for the PTP registers, which may be useful. For the F7, just replace the F4's 168 MHz with the F7's clock (most likely 200 or 400 MHz). For me, it made the most sense to use the addend (the "add in" value) that gives the most accuracy while staying below 2^32. It's just an initial starting point, since the PTP algorithm will change it anyway.

void ptp_init()
//
// see "Programming steps for system time generation initialization" in Ref manual
// note that it specifies 0d43 for subsecond increment, which is 43 decimal (see below)
// for fine adjustment control:
//    a subsecond increment (ETH->PTPSSIR) of 43 => ~49.94148 MHz (2^31 / 43) or 20.0234 ns tick
//    thus the add in value (ETH->PTPTSAR) == (2^32 / (168 MHz / 49.94148 MHz))
//                                         == ~1276768000.672 ~= 1276768001
// if we choose a subsecond increment of 16, we get: (2^31 / 16) = 134.217728 MHz or 7.45058 ns tick
//    and an addin of ((2^63 / 16) / 168 MHz) == ~3431314001.81
//                                            == ~3431314002
// if we choose a subsecond increment of 13, we get: (2^31 / 13) = 165.191049846 MHz or 6.053596735 ns tick
//    and an addin of ((2^63 / 13) / 168 MHz) == ~4223155694.530575
//                                            == ~4223155695 = 0xFBB83DEF
{
	ETH->MACIMR = ETH_MACIMR_TSTIM | ETH_MACIMR_PMTIM;	// disable time stamp (and PMT) interrupt
	ETH->PTPSSIR = 13;					// subsecond (constant) increment
	ETH->PTPTSAR = 4223155695;		// add in register, rollover increments subsecond register
	// flag uP to set the add in register (ETH->PTPTSAR), TSARU must be clear before setting
	while ((ETH->PTPTSCR & ETH_PTPTSCR_TSARU) != 0) ;
	// enable time stamps and fine adjustment
	ETH->PTPTSCR |= (ETH_PTPTSCR_TSE | ETH_PTPTSCR_TSARU | ETH_PTPTSCR_TSFCU);
	ETH->PTPTSHUR = 0;					// time stamp high update
	ETH->PTPTSLUR = 0;					// time stamp low update
	// flag uP to init time, TSSTI must be zero before setting
	while ((ETH->PTPTSCR & ETH_PTPTSCR_TSSTI) != 0) ;
	ETH->PTPTSCR |= ETH_PTPTSCR_TSSTI;	// time stamp initialize
	ETH->PTPTSCR |= ETH_PTPTSSR_TSSARFE;	// enable time stamps for all received Ethernet packets
 
// output pulse timing select
//	ETH->PTPPPSCR = 0;		// PPS @ 1 Hz
//	ETH->PTPPPSCR = 1;		// PPS @ 2 Hz
//	ETH->PTPPPSCR = 3;		// PPS @ 8 Hz
	ETH->PTPPPSCR = 11;		// PPS @ 2048 Hz
//	ETH->PTPPPSCR = 13;		// PPS @ 8192 Hz
 
	// for HW pulse output on pin PTP_SYNC_OUT (triggers initially as soon as interrupt active)
	ETH->PTPTTLR = 0;
	ETH->PTPTTHR = 0;
}
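
The addend arithmetic in the comments above can be checked with a small helper. This is only a sketch: the name ptp_addend is hypothetical (it is not part of any ST library), and it assumes the 2^31 binary subsecond mode used in this post, where the addend is 2^63 / (increment * HCLK), rounded to the nearest integer.

```c
#include <stdint.h>

/* Hypothetical helper (not an ST API): addend for fine-update mode,
 * assuming binary (2^31) subsecond rollover.
 *   addend = 2^32 * ((2^31 / ssir) / hclk) = 2^63 / (ssir * hclk)
 * Returns 0 when the inputs are invalid or the result does not fit
 * in the 32-bit addend register. */
static uint32_t ptp_addend(uint32_t hclk_hz, uint32_t ssir)
{
    if (hclk_hz == 0u || ssir == 0u) {
        return 0u;
    }
    uint64_t den = (uint64_t)hclk_hz * ssir;
    /* round to nearest; 2^63 + den/2 cannot overflow 64 bits */
    uint64_t q = ((1ull << 63) + den / 2u) / den;
    return (q > 0xFFFFFFFFull) ? 0u : (uint32_t)q;
}
```

For a 168 MHz clock this reproduces the three values in the comments (increments of 43, 16 and 13). An increment of 12 would need an addend above 2^32, which is why 13 is the smallest usable increment at that clock.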

In my application, the PPS output is used to trigger timer 2 (thus synchronizing timer 2 on each of the PTP devices). You can also use it to hook up a scope to each device and visually see how well synchronized they are.

Here's the code that actually updates the PTP clock based on the PTP packets between master and slave:

// update local PTP clock
	ptp_time_t		time_err;
 
// latency = ((t2-t1) + (t4-t3)) / 2;
//         = ((t2-t3) + (t4-t1)) / 2;
// where:
//		t1 = ptp_sync_sent (remote time)
//		t2 = ptp_sync_recv (local time)
//		t3 = ptp_delay_sent (local time) == ptp_sent when we get here
//		t4 = ptp_delay_recv (remote time)
// note: for PTP, all times use subsecond counter << 1 (.LowPart)
	int32_t t2_t3 = (int32_t)(ptp_sync_recv.LowPart - ptp_sent.LowPart);
	int32_t t4_t1 = (int32_t)(ptp_delay_recv.LowPart - ptp_sync_sent.LowPart);
	int32_t latency = (t2_t3 + t4_t1)/2;
 
...
 
	// calculate dt between local time and master
//	// dt = (now - sync_recv) + latency
//	// new_master = sync_sent + dt
//	ptp_time_t		time_now;
//	ptp_get_master_time(&time_now);
//	ptp_subtract(&time_dt, &time_now, &ptp_sync_recv);
//	ptp_offset(&time_dt, latency + ptp_offset);
//	ptp_add(&new_master, &ptp_sync_sent, &time_dt);
//	// dt = (sync_sent + (now - sync_recv) + latency) - now
//	ptp_subtract(&time_dt, &new_master, &time_now);
// simplify to:
//	// dt = (sync_sent - sync_recv) + latency
//	ptp_subtract(&time_dt, &ptp_sync_sent, &ptp_sync_recv);
//	ptp_offset(&time_dt, latency + ptp_offset);
//	// get dt error (due to difference between master and slave clock frequencies)
//	ptp_subtract(&time_err, &time_dt, &ptp_master_dt);
// ultimately simplify to:
	ptp_subtract(&time_err, &ptp_sync_sent, &ptp_sync_recv);
	ptp_offset(&time_err, latency + ptp_offset);
 
	// update PTP clock addin to compensate for difference in frequencies
	int32_t raw_err = (int32_t)time_err.LowPart;
	if (((time_err.HighPart == 0) && (raw_err >= 0)) ||
	    ((time_err.HighPart == -1) && (raw_err < 0)))
	{ // only update addin if previous sample to compare to
//		float err = ((float)raw_err / 2.0f) * (1000000.0f / dt);
		float err = (float)raw_err * 500000.0f / (float)dt; // counts / sec
		// d_addin = (2^32 / 168000000) * (err/13)
		//         = (2^32 / (13 * 168000000)) * err
		//         = 1.9665601172161172161172161172161 * err
		float d_addin = 1.96656f * err;
		float tc = (abs(raw_err) > (ptp_lock_threshold<<1))? 0.8f : 0.96f;
		ptp_filtered_addin = d_addin - tc*(d_addin - ptp_filtered_addin);
		ETH->PTPTSAR += (int32_t)ptp_filtered_addin;
 
		// flag uP to set the add in register (ETH->PTPTSAR), TSARU must be zero before setting
		while ((ETH->PTPTSCR & ETH_PTPTSCR_TSARU) != 0) ;			
		ETH->PTPTSCR |= ETH_PTPTSCR_TSARU;
	}
 
	// update local time to master time
	if (time_err.HighPart >= 0) 
	{
		ETH->PTPTSLUR = (raw_err >> 1);
		ETH->PTPTSHUR = time_err.HighPart;
	} else {
		ETH->PTPTSLUR = 0x80000000 | ((0-raw_err) >> 1);
		ETH->PTPTSHUR = (0 - time_err.HighPart) - 1;
	}
 
	// flag uP to update master time, TSSTU must be zero before setting
	while ((ETH->PTPTSCR & ETH_PTPTSCR_TSSTU) != 0) ;
	ETH->PTPTSCR |= ETH_PTPTSCR_TSSTU;

I should point out that I'm using the 2^31 subsecond mode (and shift it one bit to get "signed" values) since it makes my calculations easier. I'm not using a PTP aware switch. PTP (IEEE 1588) aware switches expect a decimal subsecond count (and aren't worth the $ if you're only using a single switch on your LAN and don't need every last drop of synchronization).
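
For anyone who needs nanoseconds (for example to compare against a decimal-mode device), the conversion from the 2^31 binary subsecond count is straightforward. A minimal sketch; the function names are made up for illustration:

```c
#include <stdint.h>

/* Binary subsecond count (0 .. 2^31-1, one second == 2^31 counts)
 * to nanoseconds (0 .. 999999999). */
static uint32_t subsec_to_ns(uint32_t subsec)
{
    return (uint32_t)(((uint64_t)subsec * 1000000000u) >> 31);
}

/* Nanoseconds (0 .. 999999999) to binary subsecond count. */
static uint32_t ns_to_subsec(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns << 31) / 1000000000u);
}
```

Both directions truncate, so a round trip can lose up to one count (about 0.47 ns).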

Be aware that clock generation for the uP affects the synchronization quite a bit as well. I did initial development using Nucleo-F429ZI boards (24 of them) and measured a clock dither (oscillation, not frequency difference - there's that too) of over 100 ns per second between two boards using the onboard clock (from 8 MHz crystal for the STLink). I've observed similar dither using Discovery boards. Our original prototype boards for our application used a similar crystal circuit and had the same issue. The new boards have a more accurate clock circuit because of this.

Hope this helps.

Clint

cwparker · ‎2019-05-17

I realized that I left out some code (see previous post - nested) that you'll need to understand what I posted. Where I had the ellipses, you'll need:

	uint32_t now = VM_CLOCK;
	int32_t dt = now - ptp_last_update;
	ptp_last_update = now;

Where VM_CLOCK is defined to be TIM5->CNT and timer 5 is setup to have a 1 us tick (1,000,000 counts per second). This is used in:

	float err = (float)raw_err * 500000.0f / (float)dt; // counts / sec

and is the elapsed time in the slave since the last PTP update (it is not the 'dt' in the comments about latency). The above statement adjusts the error based on the actual time between samples (which for my application is roughly 50 ms).
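
To make the scaling easier to follow, here is the same arithmetic pulled out into standalone functions. It is a sketch in double precision with hypothetical names, assuming raw_err is in units of 2^-32 s (the subsecond counter shifted left by one) and dt is in microseconds, as in the code above:

```c
#include <stdint.h>

/* Error rate in subsecond counts per second:
 * raw_err / 2 gives subsecond counts, 1e6 / dt_us scales to one second. */
static double rate_error(int32_t raw_err, int32_t dt_us)
{
    return (double)raw_err * 500000.0 / (double)dt_us;
}

/* Addend change per (subsecond count / second) of rate error:
 * subsecond rate = hclk * addend * ssir / 2^32, so
 * d_addend = d_rate * 2^32 / (hclk * ssir). */
static double addend_gain(uint32_t hclk_hz, uint32_t ssir)
{
    return 4294967296.0 / ((double)hclk_hz * (double)ssir);
}

/* First-order smoothing, same form as the ptp_filtered_addin update:
 * result = (1 - tc) * sample + tc * prev. */
static double smooth(double prev, double sample, double tc)
{
    return sample - tc * (sample - prev);
}
```

With 168 MHz and an increment of 13 the gain comes out at about 1.96656, matching the constant in the code. A raw_err of 200 over a 50 ms interval is a rate error of 2000 counts per second.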

anotherandrew · ‎2019-05-17

Thank you for this; I'm going to compare it with my own work. I haven't yet been able to figure out why my PTP implementation "explodes" when it goes to apply the one-way delay.

What is "sub-second update mode"? Is that the STMicro "fine update" mode? I believe it is, as that's the mode that uses the addend mode.

Are you using the ptpd code or something homegrown for your ptp implementation?

cwparker · ‎2019-05-20

First, if you didn't notice my other post about the missing 'dt' code from my example above, please look at it.

My application only needed to synchronize clocks between multiple pieces of hardware on a LAN that uses only one (non-PTP-aware) switch. As such, it was much easier not to use the ptpd code, which implements many IEEE 1588 features I didn't need (not to mention I find that code, and the IEEE specification, very hard to wrap my head around).

My application already had peer-to-peer Ethernet comms implemented and it was very easy to add the PTP messages. In my case, there is a predetermined master which knows who the slaves are (so essentially no discovery process). To eliminate differences in switching latency, the master transmits the 'Sync' message directly to each slave (no broadcast), saving the transmit time stamp from the ST hardware (t1 - master PTP clock). The slave then records the hardware time stamp of when the 'Sync' is received (t2 - slave PTP clock) and, when it sends the 'Delay_Req', records the hardware time stamp for transmit (t3 - slave PTP clock). The master then records the hardware time stamp for the receipt of the 'Delay_Req' (t4 - master PTP clock) and sends a 'Delay_Resp' packet back to the slave containing t1 and t4. The slave then syncs its PTP clock using the code in the previous message, using t1, t2, t3, t4 and dt (the time interval since the last update).
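
Stripped of the register handling, the exchange boils down to two lines of arithmetic. The sketch below uses plain 64-bit nanosecond values and hypothetical names, and assumes the path delay is the same in both directions; the real code works on the hardware time stamp format instead:

```c
#include <stdint.h>

typedef struct {
    int64_t delay_ns;       /* one-way path delay (assumed symmetric) */
    int64_t correction_ns;  /* value to add to the slave clock */
} ptp_result_t;

/* t1: Sync sent (master clock)       t2: Sync received (slave clock)
 * t3: Delay_Req sent (slave clock)   t4: Delay_Req received (master clock) */
static ptp_result_t ptp_compute(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    ptp_result_t r;
    r.delay_ns = ((t2 - t1) + (t4 - t3)) / 2;
    /* master time at reception is t1 + delay; the slave read t2 */
    r.correction_ns = (t1 - t2) + r.delay_ns;
    return r;
}
```

As an example, with a slave running 1000 ns ahead of the master and a true one-way delay of 300 ns, t1 = 10000, t2 = 11300, t3 = 12000 and t4 = 11300 give a delay of 300 ns and a correction of -1000 ns.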

The only part missing from the code in the previous message (other than the actual message send/receive code) is the message filtering done to eliminate packets with excessive latency introduced by the switch. The master transmits a new 'Sync' to each slave (staggered) every 50 ms. The slave calculates the latency (based on t1, t2, t3, and t4) after getting the 'Delay_Resp' message and determines whether that latency is too high (an outlier). If it is, it isn't used for the PTP calculation. That code is the most complex of all the PTP code and I can't go into it here. The idea is that by throwing out the outlier packets, you get a nearly minimal round trip through the switch, thus eliminating most of the asymmetry from the switch. With a good clock circuit for the uP, we've been able to sync 24 nodes all to well under +/- 100 ns (with ptp_offset = 0; ptp_offset can be set to another value to improve this, but that requires calibration, and it is essentially the OutboundLatency correction described in AN3411). Directly connecting two nodes, we see results similar to what is presented in AN3411, but that is not realistic for our application, which requires up to 24 nodes to be synchronized. A PTP-aware switch would most likely give better results, but the cost is prohibitive for our application.
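
The actual filter is not shown in this thread, so the following is not that code. It is only a sketch of one simple way to reject outliers: keep a short window of recent latencies and accept a sample only if it is within a margin of the window minimum. The names, the window size and the margin are all assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define LAT_WINDOW 16u   /* assumed window length */

typedef struct {
    int32_t  samples[LAT_WINDOW];
    unsigned count;      /* valid entries, saturates at LAT_WINDOW */
    unsigned next;       /* ring buffer write index */
} latency_filter_t;

static void latency_filter_init(latency_filter_t *f)
{
    f->count = 0u;
    f->next = 0u;
}

/* Records the sample and returns true if it is within 'margin' of the
 * smallest latency seen in the window (including this sample). */
static bool latency_filter_accept(latency_filter_t *f, int32_t latency,
                                  int32_t margin)
{
    int32_t min = latency;
    for (unsigned i = 0u; i < f->count; i++) {
        if (f->samples[i] < min) {
            min = f->samples[i];
        }
    }
    f->samples[f->next] = latency;
    f->next = (f->next + 1u) % LAT_WINDOW;
    if (f->count < LAT_WINDOW) {
        f->count++;
    }
    return (latency - min) <= margin;
}
```

Because rejected samples are still recorded, the window minimum follows slow changes in the path, while a burst of delayed packets is skipped until a fast one comes through again.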

I will again finish by stressing, as they did in AN3411, that the clock circuit for the STM32Fx plays a big role in how accurate a synchronization you can get. The default crystal circuit on Discovery and Nucleo boards has a lot of low-frequency 'dither/drift' that really affects the PTP results.