Tobe
Senior III
March 19, 2024
Question

Compiler optimized program has errors


While I was trying to debug one problem, another one started giving me a hard time.

One function call is not shown, and another one has the wrong name ("writeFLASHDoubleWord" should be "waitForBusy").

The breakpoint is also missed.

 

Originally the code worked, but then I moved to the new prototype. I copied the project and made a few minor changes, mainly just the pins.

I had some hard faults before, which seemingly happened at "CLEAR_BIT(FLASH->CR, FLASH_CR_PG);" after writing to flash. But after seeing the above, I doubt this!


Lead II
March 19, 2024

When you see a "wrong" call trace or strange jumps through your code, my first thought would be that the memory is corrupted.

Any write operation that goes to a wrong location or beyond the buffer size (e.g. when filling a buffer) can cause this. There are many reasons why you might see a "strange" call trace:

  • the code is corrupted ("it jumps strangely")
  • the stack is corrupted ("it does not return from a function call correctly, e.g. the restored registers are wrong")
  • the stack is too small ("a called function writes outside the stack region")
  • you have not enabled something that is needed ("the code assumes everything is enabled and does not check for errors")
  • your malloc (heap) region overflows ("malloc is called without checking whether it succeeded, and the code keeps going with a bad pointer")
  • with the debug compile option, the code image contains additional debug info (helper code and metadata):
    even this can be damaged (and give wrong reports)
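One way to check the "stack is too small" case is stack painting: fill the unused stack region with a known pattern early at startup, and later count how much of the pattern was never overwritten. Below is a minimal host-side sketch, assuming a full-descending stack (as on Cortex-M) and a hypothetical fill pattern. On the target, the region bounds would come from your linker script; the plain array here only stands in for that region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xC5C5C5C5u   /* hypothetical pattern, unlikely to appear in real data */

/* Fill the region with the pattern. On the target this would run at startup
 * over the part of the stack region below the current stack pointer. */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* A full-descending stack grows from the top of the region toward base,
 * so count intact pattern words starting at base. The result is the
 * headroom that has never been used; 0 means the stack reached (or went
 * past) the bottom of its region. */
static size_t stack_headroom_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL)
        n++;
    return n;
}
```

Checking the headroom periodically (or after exercising the suspicious code path) shows whether the stack ever came close to its limit.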

If the debug trace looks "wrong" or "strange", I would first assume that the memory content is corrupted.

When you say "I had some hardfault before": have you fixed these issues? Or did they disappear just by adding more code? If so, the root cause is still there, just hidden, and now it is hitting you again.

I assume you have a bug somewhere in your code. Check all writes, memory handling, memory sizes (e.g. heap for malloc, stack size), code writing to memory (buffers), whether everything needed is enabled, and whether you handle errors properly (not ignoring issues, not continuing when something has already gone wrong)...
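For the "check all writes" and "handle errors properly" points, here is a sketch of a bounds-checked write that refuses to go out of range and reports it, so the caller can stop instead of carrying on with corrupted memory. checked_write() is a hypothetical helper, not something from the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes to dst[offset] only if the whole copy
 * fits into a buffer of cap bytes. Returns 0 on success, or -1 (and writes
 * nothing) if the copy would run past the end of the buffer. */
static int checked_write(uint8_t *dst, size_t cap, size_t offset,
                         const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);      /* programming error, not a runtime condition */
    if (offset > cap || len > cap - offset)  /* written this way so offset + len cannot overflow */
        return -1;
    memcpy(dst + offset, src, len);
    return 0;
}
```

The important part is on the caller's side: when checked_write() returns -1, report the error and stop, rather than continuing as if the data had been written.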

Happy debugging... do a code review.

Tobe (Author)
Senior III
March 19, 2024

I always try to find the root cause, but sometimes it's hard. When a problem disappears without me knowing why, that is the worst thing for me. My RAM is only 7% used, and I do not use malloc (at least not in my code).

 

I had some minor warnings, which I cleaned up. I also tried other hardware, which shows the same error.

 

Lead II
March 19, 2024

Code Review!

Go over your code and verify that everything is reasonable, e.g. the right buffer sizes, the right lengths when using them, enough stack allocated...

Also check for mismatched types, especially where casts are involved. Example:

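```c
#include <stdint.h>

uint8_t b[10];                  /* 10 bytes */

void f(uint32_t *b, int i)      /* writes i 32-bit words */
{
 while(i--)
  *b++ = i;
}

int main(void)
{
 /* BUG: writes 10 words (40 bytes) into a 10-byte array; the cast hides
    the type mismatch, and b is not even guaranteed to be 4-byte aligned */
 f((uint32_t *)b, 10);
 return 0;
}
```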

This is definitely a bug! f() writes 10 32-bit words (40 bytes), way beyond the 10 bytes allocated for b[]. This code silently corrupts memory, and something very strange can happen.
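A corrected version of the example, as a sketch: declare the array with the element type that is actually written, and derive the element count from the array itself so the length cannot disagree with the allocation. ARRAY_LEN is a hypothetical helper macro. Without the (uint32_t *) cast, GCC would at least warn about the incompatible pointer type, which is why casts like that deserve extra scrutiny in a review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element count of a real array (does not work on a pointer) */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static uint32_t b[10];   /* element type now matches what f() writes */

/* Write count values (count - 1 down to 0), like the original loop,
 * but with the length given in elements of the destination type */
static void f(uint32_t *dst, size_t count)
{
    while (count--)
        *dst++ = (uint32_t)count;
}
```

Usage: f(b, ARRAY_LEN(b)); stays within the 10 elements of b even if the array size is changed later.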

waclawek.jan
Super User
March 20, 2024

Are you talking about inputWriteIndex? Where else in the program do you use/change it?

JW

Tobe (Author)
Senior III
March 20, 2024

It's only used in the interrupt handler "newMessageIRQ()". I just checked.

waclawek.jan
Super User
March 20, 2024

You can use a data breakpoint (watchpoint) to find out where a variable gets written.

JW

Tobe (Author)
Senior III
March 20, 2024

No, the debugger won't really work with -O2. The program runs fine with -Og. This is quite an annoying error.

Andrew Neil
Super User
March 20, 2024

Have you tried it (a data breakpoint/watchpoint)?

The debugger still has full access to the chip's resources; with optimization, the problem is mostly in lining up source lines with the executable code.

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work.

waclawek.jan
Super User
March 20, 2024

I've just looked at the timestamps in that screenshot in the 2024-03-19 11:54 PM post, and they are consecutive.

This means your conclusion that the index is incremented by 2 is incorrect.

What IMO happens is that the messages are buffered in the CAN controller, and you read them out incorrectly, late, or in the wrong order.
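If several frames can be waiting in the controller's Rx FIFO, one way to avoid reading them late or out of order is to drain the FIFO in a loop, oldest first, each time the interrupt runs, instead of reading exactly one frame per interrupt. This is a host-side sketch with a simulated FIFO; sim_fifo_t and the depth of 3 are assumptions standing in for the real peripheral. On the STM32 itself, the fill level would come from the peripheral (e.g. via the HAL's HAL_FDCAN_GetRxFifoFillLevel()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_FIFO_DEPTH 3u   /* assumed depth of the hardware Rx FIFO */

/* Hypothetical stand-ins for the peripheral and the frame type */
typedef struct { uint32_t id; } frame_t;

typedef struct {
    frame_t slots[SIM_FIFO_DEPTH];
    size_t  get;    /* index of the oldest waiting frame */
    size_t  fill;   /* number of frames waiting */
} sim_fifo_t;

/* Remove the oldest frame; returns 0 on success, -1 if the FIFO is empty */
static int sim_fifo_pop(sim_fifo_t *f, frame_t *out)
{
    if (f->fill == 0)
        return -1;
    *out = f->slots[f->get];
    f->get = (f->get + 1) % SIM_FIFO_DEPTH;
    f->fill--;
    return 0;
}

/* Copy every waiting frame, oldest first, into dst (at most cap frames).
 * Anything that does not fit stays in the FIFO for the next call.
 * Returns the number of frames copied. */
static size_t drain_fifo(sim_fifo_t *f, frame_t *dst, size_t cap)
{
    size_t n = 0;
    while (n < cap && sim_fifo_pop(f, &dst[n]) == 0)
        n++;
    return n;
}
```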

JW

Tobe (Author)
Senior III
March 20, 2024

I just found out that the interrupt fires twice. But I don't know why; it does not happen when there is no optimization. Placing breakpoints does not work at all!

This is quite a tough one.

waclawek.jan
Super User
March 20, 2024

Maybe this?

JW

Tobe (Author)
Senior III
March 21, 2024

No, there is a lot of code behind that part in the interrupt.

I have another finding: when the TFE flag is set, the interrupt is executed a second time, even though the TFE interrupt IS NOT enabled. How in the world?

 

```c
void newMessageIRQ(void)
{
	unsigned long nowMicros = micros();
	print(100);

	if(dataFlowControl() != 0)
		return;

	// Print IE byte by byte (MSB first)
	print((uint8_t) (hfdcan1.Instance->IE >> 3*8));
	print((uint8_t) (hfdcan1.Instance->IE >> 2*8));
	print((uint8_t) (hfdcan1.Instance->IE >> 1*8));
	print((uint8_t) (hfdcan1.Instance->IE >> 0*8));

	print(101);

	// Print IR byte by byte (MSB first)
	print((uint8_t) (hfdcan1.Instance->IR >> 3*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 2*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 1*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 0*8));

	// TODO low prio: make another version of this, that filters out ids below x beforehand
	print(TK_FDCAN_GetRx0Message(&hfdcan1, &inputBuffer[inputWriteIndex]));
	print(102);

	// Print IR again after reading the message
	print((uint8_t) (hfdcan1.Instance->IR >> 3*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 2*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 1*8));
	print((uint8_t) (hfdcan1.Instance->IR >> 0*8));

	// Clear TFE (note: SET_BIT is a read-modify-write on a write-1-to-clear register)
	SET_BIT(hfdcan1.Instance->IR, FDCAN_IR_TFE);

	// Bootloader does not use ids below ID_REQUEST and 1764 is from diagnose software (dont use it!)
	if(inputBuffer[inputWriteIndex].id < ID_REQUEST || inputBuffer[inputWriteIndex].id == 1764){
		print(99);
		return;
	}

	print(103);
	// Control data
	if(inputBuffer[inputWriteIndex].id < ID_DATA_START)
	{
		lastControlID      = (uint32_t) inputBuffer[inputWriteIndex].id;
		lastControlMode    = (uint32_t) inputBuffer[inputWriteIndex].data[0];
		lastAdditionalData = (uint32_t) inputBuffer[inputWriteIndex].data[3];
	}

	print(104);

	lastBusActivityMicros = nowMicros; //TODO: DELETE VARIABLE (is not used)
	inputBuffer[inputWriteIndex].timeMicros = nowMicros;
	print(105);

	inputFrameCount++;
	inputWriteIndex++;
	print(106);
	delayMicroseconds(150);
	if(inputWriteIndex >= INPUT_BUFFER_SIZE){
		inputWriteIndex = 0;
		inputOverFlow = true;
	}
	delayMicroseconds(150);
	print(inputReadIndex);
	print(inputWriteIndex);
	print(inputOverFlow);
	delayMicroseconds(150);
}
```

 

It prints the following (register values only: IE, IR, and IR after the FDCAN routine; in decimal, one digit per byte):

When TFE is set in IR:

0001 / 0021 / 0021 (*) / 0001 / 0020 / 0020 (for every CAN frame)

When TFE is cleared in IR (in the interrupt):

0001 / 0021 / 0021 (first CAN frame)

0001 / 0003 / 0003 (second CAN frame; 3 means new message + buffer full)

 

How can this interrupt fire when it is not enabled?
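One detail in the ISR above worth checking: in the CMSIS/ST headers, SET_BIT(REG, BIT) expands to a read-modify-write (REG |= BIT). The FDCAN IR flags are cleared by writing 1, so writing back the value that was just read clears every flag that happens to be set at that moment, not only TFE. Whether this explains the double interrupt is not confirmed here. The host-side sketch below only simulates a write-1-to-clear register to show the difference between the two ways of clearing; the bit positions (0 for RF0N, 9 for TFE) are assumptions chosen to match the printed 0021 value.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_RF0N (1u << 0)   /* assumed: new message in Rx FIFO 0 */
#define FLAG_TFE  (1u << 9)   /* assumed: Tx FIFO empty */

/* Simulated write-1-to-clear register: writing 1 clears that bit,
 * writing 0 leaves it unchanged */
typedef struct { uint32_t flags; } w1c_reg_t;

static uint32_t reg_read(const w1c_reg_t *r) { return r->flags; }
static void reg_write(w1c_reg_t *r, uint32_t value) { r->flags &= ~value; }

/* Equivalent of SET_BIT(IR, TFE): read, OR in TFE, write everything back */
static void clear_tfe_rmw(w1c_reg_t *r)
{
    reg_write(r, reg_read(r) | FLAG_TFE);
}

/* Plain write of only the bit that should be cleared */
static void clear_tfe_direct(w1c_reg_t *r)
{
    reg_write(r, FLAG_TFE);
}
```

With the read-modify-write, a pending RF0N is lost together with TFE; with the direct write (on the target: hfdcan1.Instance->IR = FDCAN_IR_TFE;) only TFE is cleared.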

 

Andrew Neil
Super User
March 21, 2024

@Tobe wrote:

there is a lot of code behind that part in the interrupt.


That's generally a Bad Thing: ISRs should be kept as short as possible.
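Following that advice, a common pattern is to let the ISR only copy the frame into a ring buffer and do the rest (ID filtering, control data, printing, delays) in the main loop. Below is a minimal single-producer/single-consumer sketch using C11 atomics. It also makes the sharing of the indices between ISR and main loop explicit: plain non-volatile shared variables are a classic source of code that works at -Og but misbehaves at -O2. ring_t, frame_t, ring_push(), ring_pop() and RING_SIZE are hypothetical names, not from the original code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u   /* hypothetical; must be a power of two (see below) */

typedef struct {
    uint32_t id;
    uint8_t  data[8];
    uint32_t timeMicros;
} frame_t;

/* Single producer (ISR) / single consumer (main loop). head and tail are
 * free-running counters; because RING_SIZE is a power of two, head - tail
 * is the fill level even after the unsigned counters wrap around. */
typedef struct {
    frame_t     buf[RING_SIZE];
    atomic_uint head;   /* advanced only by the ISR */
    atomic_uint tail;   /* advanced only by the main loop */
} ring_t;

/* Called from the ISR: copy one frame in, nothing else. Returns false if
 * the ring is full (the frame is dropped instead of overwriting data). */
static bool ring_push(ring_t *r, const frame_t *f)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->buf[head % RING_SIZE] = *f;
    /* release: the frame is stored before the consumer can see the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called from the main loop: take the oldest frame, if any. */
static bool ring_pop(ring_t *r, frame_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->buf[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Roughly, newMessageIRQ() would then only read the frame, stamp the time and call ring_push() (setting an overflow flag when it returns false), while the main loop calls ring_pop() until it returns false and does the ID checks and control-data handling there.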

 

Placing breakpoints does not work at all! This is quite a tough one.

Although, if it's timing-related, it's possible (likely?) that breakpoints wouldn't help anyhow.

Tough indeed!

Maybe try toggling some pins to observe on a scope or a logic analyser?

 
