2025-01-02 08:45 AM - edited 2025-01-03 06:33 AM
Hi Folks and Happy New Year!
I've been trying to track down a problem I'm having with my project for several months now. Essentially I'm getting these red-ish lines through portions of my buttons, but only when (I presume) they are drawn or redrawn: when the screen first loads, when they are pressed, or when, as in this example, they have text drawn over them that updates.
In the case of the above picture, I've been able to get the lines to appear more consistently by layering some text over top of some of the buttons and then updating that text with each tick where it says "Tick Counter: xxxx". I managed to luck out and have it appear on the "Cancel" of the cancel button this time, though it probably only happens about 1 out of 10 times when the screen loads.
These lines have been appearing pretty consistently throughout my development on the project, and they seem (though I'm still not completely sure) to be happening regardless of my application code. The reason I say this is that I've commented out large portions of my application code and had them still appear. Frustratingly, they are not consistent. I can do a build, have them appear, and then insert a trivial line of code (even a NOP), and then they will disappear, only to reappear again after I add a few more lines.
This really seems to point to some sort of issue where I'm writing off the edge of an array or something (it seems that parts of my frame buffer are getting inadvertently overwritten? But then why just buttons? And why does it still happen when I essentially remove my application code?)
I've included my TOUCHGFXHAL code that has my driver code in it.
Details on my project:
MCU: STM32L496VGTN
Display: NHD-2.8-240320AF-CSXP-FT (ST7789 Controller) connected via FMC 16-bit parallel
External Flash: MT25QL128A using quad-spi with a custom loader
Firmware details: No RTOS (Bare Metal/super-loop)
Touch GFX Configuration:
Other possible helpful details:
Any tips, tricks, or advice towards debugging this issue would be greatly appreciated. You might say figuring this out has become my new year's resolution!
2025-01-27 07:14 AM
I'm not sure if I've fully tested all of your suggestions, or if I addressed them all in my last post, but just tried to make sure that the QSPI flash data is reliably transferring to the display by repeatedly writing one of the button bitmaps straight from memory to the display:
void displayBitmap(uint16_t bitmapId)
{
    // Retrieve the bitmap
    touchgfx::Bitmap bitmap = touchgfx::Bitmap(bitmapId);

    // Get the pixel data and dimensions
    const uint8_t* pixelData = bitmap.getData();
    uint16_t width = bitmap.getWidth();
    uint16_t height = bitmap.getHeight();

    // Configure display to receive data for the bitmap
    setLCDwindow(100, 100, width, height); // Set display window to the bitmap size
    LCD_IO_WriteReg(RAMWR); // Write to display RAM

    // Transfer the bitmap data to the display via FMC
    for (uint32_t y = 0; y < height; y++)
    {
        for (uint32_t x = 0; x < width; x++)
        {
            uint32_t pixel = ((uint32_t*)pixelData)[y * width + x];
            uint16_t rgb565 = ((pixel >> 8) & 0xF800) | // Extract red (bits 16-23), shift, and mask for 5 bits
                              ((pixel >> 5) & 0x07E0) | // Extract green (bits 8-15), shift, and mask for 6 bits
                              ((pixel >> 3) & 0x001F);  // Extract blue (bits 0-7), shift, and mask for 5 bits
            LCD_IO_WriteData( rgb565 ); // Write each pixel to the display
        }
    }
}
I commented out all the TouchGFX code and am just running this 10x per second:
displayBitmap(BITMAP_BUTTON_140X100_ACTIVE_ID);
The button's image draws to the screen with no problems, artifacts or distortions. I'm not sure if this is fully testing the QSPI flash connection/drivers, but figured it was worth a try.
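The per-pixel conversion in displayBitmap() can also be pulled out into a small pure function and checked on a PC, which helps rule out the bit manipulation itself as a source of wrong colors. This is only a sketch: it assumes the bitmap really is stored as ARGB8888 (as the loop above does), and the name argb8888_to_rgb565 is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: convert one ARGB8888 pixel (0xAARRGGBB) to RGB565.
 * Alpha is ignored; each channel is truncated to its top 5/6/5 bits. */
static inline uint16_t argb8888_to_rgb565(uint32_t pixel)
{
    return (uint16_t)(((pixel >> 8) & 0xF800u) |  /* red:   bits 23..19 -> 15..11 */
                      ((pixel >> 5) & 0x07E0u) |  /* green: bits 15..10 -> 10..5  */
                      ((pixel >> 3) & 0x001Fu));  /* blue:  bits 7..3   -> 4..0   */
}
```

Pure red, green and blue inputs (0xFFFF0000, 0xFF00FF00, 0xFF0000FF) should come out as 0xF800, 0x07E0 and 0x001F respectively.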
To your other suggestion of flushing the DMA prior to writing the display in flushFrameBuffer(), this doesn't seem to make any difference, though I probably need to do some more testing there. In this case I implemented it this way:
void TouchGFXHAL::flushFrameBuffer(const touchgfx::Rect& rect)
{
    dma.flush();
    uint16_t* frameBuffer = getTFTFrameBuffer();
    setLCDwindow(rect.x, rect.y, rect.width, rect.height);
    LCD_IO_WriteReg(RAMWR);
    for (int32_t y = rect.y; y < rect.bottom(); y++) {
        for (int32_t x = rect.x; x < rect.right(); x++) {
            *FMC_BANK1_MEM = frameBuffer[y * TFT_WIDTH + x];
        }
    }
    TouchGFXGeneratedHAL::flushFrameBuffer(rect);
}
Again, I'm not sure if this was what you had in mind, so feel free to redirect me as necessary. Thank you!
2025-01-28 12:42 AM
This was indeed what I meant.
I don't know what to chase here. You write in your original post that you have verified the errors are in the frame buffer. Have you verified this by dumping memory, or what do you mean by that statement?
In that case, looking at the code that transfers to the display is probably barking up the wrong tree.
Have you looked at how large the Rect you get is, and how long the data transfer takes? You could, for example, pull a GPIO high while transferring, that would show you if you miss frames. If you are ever close to your VSync speed, that might cause problems.
You could try modifying your test to cache the bitmap in RAM, but again, if you are entirely sure the problem is also present in your framebuffer, this will probably also show no improvement.
You should probably change the inner loop to do this:
*FMC_BANK1_MEM = frameBuffer[y * TFT_WIDTH + x];
__DSB();
DSB synchronizes memory, so the program will not continue running until all operations that affect memory, including caches, branch prediction and so on, have completed. That is advisable to do for FMC in any case.
2025-01-28 06:25 AM
@mathiasmarkussen - I verified that the errant pixels were in the frame buffer by putting a breakpoint at the end of my transfer-framebuffer-to-display routine and then stepping the code until I saw one of the errors, and then reading out the frame buffer using the memory browser at the location that I see the artifact at:
For example:
Which I was able to determine with a little guessing to be at line 229 in my display, so in the memory browser it looks like:
0xF992 is a magenta color, the same color I'm seeing on my display (and checking the rest of the colors before and after, they are also consistent with what I'm seeing on the display):
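This kind of check can be made a bit less guess-based with two small helpers: one that gives the byte offset of a pixel inside a 16 bpp frame buffer, and one set that unpacks an RGB565 word into 8-bit channels. The sketch below is plain C with hypothetical names; it assumes a linear, row-major RGB565 buffer like the one indexed as frameBuffer[y * TFT_WIDTH + x] earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) from the start of a row-major 16 bpp buffer. */
static inline size_t fb_byte_offset(uint32_t x, uint32_t y, uint32_t width)
{
    return ((size_t)y * width + x) * sizeof(uint16_t);
}

/* Expand the 5-bit red field to 8 bits (top bits replicated into the low bits). */
static inline uint8_t rgb565_red8(uint16_t c)
{
    uint8_t r5 = (uint8_t)((c >> 11) & 0x1Fu);
    return (uint8_t)((r5 << 3) | (r5 >> 2));
}

/* Expand the 6-bit green field to 8 bits. */
static inline uint8_t rgb565_green8(uint16_t c)
{
    uint8_t g6 = (uint8_t)((c >> 5) & 0x3Fu);
    return (uint8_t)((g6 << 2) | (g6 >> 4));
}

/* Expand the 5-bit blue field to 8 bits. */
static inline uint8_t rgb565_blue8(uint16_t c)
{
    uint8_t b5 = (uint8_t)(c & 0x1Fu);
    return (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

For the value above, 0xF992 unpacks to R=255, G=48, B=148, a pinkish magenta, which fits the color of the artifact.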
To your point about transfer time, if I perform the transfer using the CPU:
for (int32_t i = 0; i < TFT_WIDTH*TFT_HEIGHT; i++) {
    LCD_IO_WriteData(currFbBase[i]);
}
Then it takes about 70% of all available computational time (obviously not sustainable). The on-time of the pulse below represents the amount of time required to perform the transfer.
Using DMA2D transfers (using the code in the post above), I can get this down to less than 10%, which is obviously the way to go...
But what is interesting (or perhaps obvious) is that no matter what I use, I still get the artifacts, which further reinforces your point that the problem is occurring before we transfer the image, and that the transfer code isn't the issue. So then the next question I suppose would be what do I look at next? TouchGFX manages everything from a rendering perspective, so how would I be messing that up? Memory buffer overrun? Blowing my stack? These both seem unlikely since I've managed to comment out nearly all of my application code and still get the artifacts.
I did also try:
for (int32_t i = 0; i < TFT_WIDTH*TFT_HEIGHT; i++) {
    LCD_IO_WriteData(currFbBase[i]);
    __DSB();
}
But that had seemingly no effect. Was that how I was supposed to use the __DSB() statement? (Thank you for your continued support on this!)
2025-01-29 02:10 AM
That was indeed what I meant regarding the __DSB() statement.
Looking at the picture in your original post, it seems that the errors span the text in the cancel button, but the error is not present in the pixels that belong to the text. To me, that indicates that the error is in the bitmap that is loaded from flash, and not the rendering.
That leaves DMA2D and the flash itself.
Do you have room in internal flash for one of your buttons? You could use the L8 format or RGB compression if needed. You could try that and see if you still get errors in the other buttons, but not the one in internal flash - that would indicate that the problem is with your external flash. I would guess it is a signal integrity issue rather than an error in your code.
You could also try to decrease the clock speed of the flash significantly and see if the errors still appear at a similar rate.
2025-01-29 01:04 PM
Thanks for that observation. I'm definitely getting artifacts over the buttons where I invalidate that centrally located text area, but you're absolutely right: the one I picked to show what I was seeing didn't have any text over it. The whole purpose of drawing the text area over the buttons was to force invalidation and make the artifacts show up more consistently, so that they could be easily detected when they happen.
Per your suggestion (as well as @JTP1's), I commented out the linker assignments that push all the graphical assets to the external flash:
...
/*ExtFlashSection :
{
*(ExtFlashSection ExtFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >QUADSPI
FontFlashSection :
{
*(FontFlashSection FontFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >QUADSPI
TextFlashSection :
{
*(TextFlashSection TextFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >QUADSPI*/
}
and the artifacts do go away.
I then started digging into my QUADSPI MX settings and driver code.
The first thing I noticed is that I have my QUADSPI pre-scaler set to 2, and with my main clock running at 80MHz, I assume this would mean that it was running at 20MHz, and my oscilloscope agrees. Setting the prescaler to 0 makes it run at 80MHz, and interestingly (or maybe frustratingly) the artifacts go away. Not sure what's going on there, as I would expect that if I were having a timing issue, running faster would make things much worse.
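For reference, the STM32L4 reference manual gives the QUADSPI clock as the kernel clock divided by (PRESCALER + 1). Under that formula a prescaler of 2 at 80 MHz would be about 26.7 MHz, and 20 MHz would correspond to a prescaler of 3, so it may be worth double-checking which value actually ends up in the register. A minimal sketch of the arithmetic (the function name is hypothetical, and it assumes the QUADSPI kernel clock equals the 80 MHz system clock):

```c
#include <stdint.h>

/* QUADSPI output clock for a given kernel clock and PRESCALER field value.
 * Assumption: F_CLK = F_kernel / (PRESCALER + 1), integer division. */
static inline uint32_t qspi_clock_hz(uint32_t kernel_hz, uint8_t prescaler)
{
    return kernel_hz / ((uint32_t)prescaler + 1u);
}
```

With an 80 MHz kernel clock this gives 80 MHz for prescaler 0, 40 MHz for 1 and 20 MHz for 3.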
I did also note something in my QUADSPI driver code that didn't make much sense, namely in the QSPI_Configuration function:
/*Enable quad mode and set dummy cycles count*/
uint8_t QSPI_Configuration(void) {
    QSPI_CommandTypeDef sCommand;
    uint16_t reg;

    // Command Code: READ VOLATILE CONFIGURATION REGISTER (Command: 0x85)
    sCommand.Instruction = READ_VOLATILE_CONFIG_REG_CMD;
    // Command-Address-Data (Single Line SPI): 1-0-1
    sCommand.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    sCommand.AddressMode = QSPI_ADDRESS_NONE;
    sCommand.DataMode = QSPI_DATA_1_LINE;
    // Address Bytes: None (not required)
    sCommand.AddressSize = 0; // No address for this command
    // Dummy Clock Cycles: 0
    sCommand.DummyCycles = 0;
    // Other
    sCommand.AlternateByteMode = QSPI_ALTERNATE_BYTES_NONE;
    sCommand.DdrMode = QSPI_DDR_MODE_DISABLE;
    sCommand.DdrHoldHalfCycle = QSPI_DDR_HHC_ANALOG_DELAY;
    sCommand.SIOOMode = QSPI_SIOO_INST_EVERY_CMD; // Instruction sent with every command
    // Number of data bytes to read: 1
    sCommand.NbData = 1;

    if (HAL_QSPI_Command(&hqspi, &sCommand, HAL_QPSI_TIMEOUT_DEFAULT_VALUE)
            != HAL_OK) {
        return HAL_ERROR;
    }
    if (HAL_QSPI_Receive(&hqspi, (uint8_t*)(&reg), HAL_QPSI_TIMEOUT_DEFAULT_VALUE) != HAL_OK) {
        return HAL_ERROR;
    }
    if (QSPI_WriteEnable() != HAL_OK) {
        return HAL_ERROR;
    }

    /*set dummy cycles*/
    MODIFY_REG(reg, 0xF0F0, ((DUMMY_CLOCK_CYCLES_READ_QUAD << 4) | (DUMMY_CLOCK_CYCLES_READ_QUAD << 12)));
    sCommand.Instruction = WRITE_VOL_CFG_REG_CMD;

    if (HAL_QSPI_Command(&hqspi, &sCommand, HAL_QPSI_TIMEOUT_DEFAULT_VALUE)
            != HAL_OK) {
        return HAL_ERROR;
    }
    if (HAL_QSPI_Transmit(&hqspi, (uint8_t*)(&reg), HAL_QPSI_TIMEOUT_DEFAULT_VALUE) != HAL_OK) {
        return HAL_ERROR;
    }
    return HAL_OK;
}
This code is of course borrowed from one of the BSP/demo code packages (I forget exactly which one), and one line I didn't understand, and simply left the way it was, is:
MODIFY_REG(reg, 0xF0F0, ((DUMMY_CLOCK_CYCLES_READ_QUAD << 4) | (DUMMY_CLOCK_CYCLES_READ_QUAD << 12)));
But looking in the datasheet, the volatile configuration register is only 8 bits wide, so I probably don't need or want the '(DUMMY_CLOCK_CYCLES_READ_QUAD << 12)'. If I remove it, the artifacts go away, though this is far from a definitive fix, especially given how they go away for any number of other changes. I'm not sure if I'm circling the root problem here or just finding more red herrings.
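For an 8-bit register, the update only needs to touch the upper nibble. The sketch below assumes the dummy-cycle count lives in bits 7:4 of the volatile configuration register (worth confirming against the MT25QL datasheet), and vcr_set_dummy_cycles is a hypothetical name. Note also that with NbData = 1 only one byte of reg is transmitted, so on a little-endian Cortex-M the bits set by the << 12 term would presumably never reach the flash, which would make that change just another layout-shifting edit.

```c
#include <stdint.h>

/* Return a copy of the 8-bit volatile configuration register value with the
 * dummy-cycle field (assumed to be bits 7:4) replaced and bits 3:0 preserved. */
static inline uint8_t vcr_set_dummy_cycles(uint8_t vcr, uint8_t cycles)
{
    return (uint8_t)((vcr & 0x0Fu) | ((cycles & 0x0Fu) << 4));
}
```

For example, a register value of 0xFB with 8 dummy cycles becomes 0x8B: the low nibble is left alone and only the upper nibble changes.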
For completeness, I should mention that my Dummy Cycles are set to 8, which I believe is correct based on the datasheet (or perhaps even a little conservative).
Signal integrity-wise things seem ok at 20MHz (Prescaler = 2), but I get artifacts. At 80MHz (Prescaler = 0), no artifacts, but signals look not great, though I'm not sure if this isn't just my scope setup. At prescaler = 3, and 10 (two arbitrary values that I picked) the artifacts also don't appear. So I don't know what to think.
I haven't figured out how to put one button in internal flash and leave the rest external, though I'll see if I can figure that out tomorrow.
2025-01-29 01:05 PM
2025-01-30 01:59 AM
I think you'll be better off troubleshooting QSPI issues elsewhere in the forum.
I can say that my experience is that dummy cycles do matter quite a lot, but you will usually lose the first pixels, not the last ones. In your case the buttons are quite uniform in colour, so that might be the case, of course, although I would assume the flash would just continue delivering sequential bits, which would still be the same ones. But according to the datasheet I found, you should be fine with 1 dummy bit at 39 MHz, and it does not specify anything lower. At 80 MHz you should be at 5 or 6. It might be the issue, but again, I'm no QSPI expert :)
Why are you running at 20 instead of 80?
You can put single images in the internal flash in the designer as shown in the screenshot.
2025-01-30 01:49 PM
Sounds good on the qspi issues - I'll take it there once I confirm that's what I'm dealing with.
Why am I running at 20 MHz instead of 80? That's a great question, without a great answer. In part because I set it that way a long time ago not necessarily understanding that's what I was setting it to and...now I'm leaving it there because at the moment if I change to anything else, the artifacts go away...including when I run the qspi faster...except I know they're not truly gone, they are just waiting to show up later...which is what makes this whole thing so gosh-darn impossible to figure out.
Thanks so much for that screen shot, that was really helpful.
I put a copy of one of the buttons into internal flash, and you are right, no artifacts on that button, while they appear on the other two coming from external qspi flash. So...qspi flash it is! Except...
I thought "well how about I run comparison of the bitmap in external flash with the one on internal flash":
void compareBitmaps()
{
    // Load bitmaps from internal flash and QSPI flash
    const uint8_t* internalBitmap = touchgfx::Bitmap(BITMAP_BUTTON_140X100_ACTIVE_INTERNAL_ID).getData();
    const uint8_t* qspiBitmap = touchgfx::Bitmap(BITMAP_BUTTON_140X100_ACTIVE_ID).getData();

    // Get bitmap sizes (assuming both are the same size)
    uint32_t size = touchgfx::Bitmap(BITMAP_BUTTON_140X100_ACTIVE_ID).getWidth() *
                    touchgfx::Bitmap(BITMAP_BUTTON_140X100_ACTIVE_ID).getHeight() *
                    4; // Bitmaps seem to be stored in ARGB8888 format

    // Compare memory contents
    for (uint32_t i = 0; i < size; i++)
    {
        if (internalBitmap[i] != qspiBitmap[i])
        {
            TX_String("Mismatch in Memory!!!\r\n"); // <== Set a BREAKPOINT HERE!!!!
        }
    }
}
And would you believe I've been running it every tick for over an hour without it ever halting at the breakpoint? I'm also doing my normal display routines and there are artifacts all over the buttons as usual. What the....?
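One caveat with this test: the loop reads the QSPI data with the CPU, whereas the earlier reply pointed at DMA2D and the flash itself, so a clean CPU-side compare may not exercise the same read path as rendering. If a mismatch ever does show up, it helps to know where. The sketch below is a plain C variant that returns the offset of the first differing byte and maps it to a pixel position; all names are hypothetical, and it assumes a row-major layout with a fixed number of bytes per pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first byte where a and b differ, or n if they are identical. */
static size_t first_mismatch(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return n;
}

/* Row of the pixel that contains the given byte offset (row-major layout). */
static uint32_t offset_to_row(size_t offset, uint32_t width, uint32_t bytes_per_pixel)
{
    return (uint32_t)((offset / bytes_per_pixel) / width);
}

/* Column of the pixel that contains the given byte offset (row-major layout). */
static uint32_t offset_to_col(size_t offset, uint32_t width, uint32_t bytes_per_pixel)
{
    return (uint32_t)((offset / bytes_per_pixel) % width);
}
```

On the target, first_mismatch() could replace the inner loop of compareBitmaps(), with the returned offset printed or inspected at the breakpoint.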
I wonder if it's too late to pursue a career in theatrical lighting and set design?