ST Reps: Need help understanding if this is normal or misconfigured or hardware issue...

Jazman · 2021-03-24

Hey all -

I really need one of the ST guys to weigh in on this question... I have been struggling with graphic performance on things that I THOUGHT should be simple and easy for a high power F746 core and TouchGFX, but I'm starting to wonder if my expectations weren't realistic.

Reference: I first got serious about the expected power of the platform with the F469 Discovery board and the TouchGFX demo app where they had the set-date wheels spin to a random date. The animation was smooth and the scroll wheels were pretty cool. If a simple demo app could do that with a high-resolution 480x800 pixel display, what could I make it do?!? Unfortunately, the internals of TouchGFX are hidden from us, so I really can't easily tell if I have unrealistic expectations or if I have some other issue with configuration or hardware. I don't really have the time to reverse engineer the assembly code, as I'm using TouchGFX because it's supposed to SAVE me development time.

So, I've implemented a few features (well, a number of them, but a few are giving me consternation). The two features are these:

1. An alpha fade-in of a logo using the scalable image widget with bilinear interpolation.

2. Scroll wheels to set date and time... 5 wheels total on one screen.

The question I really need answered by the ST personnel here is this: Does my data look reasonable for a correctly configured system, or should my performance really be higher, meaning I have a problem to find? I've already spent days on this and I don't want to waste more if my system is performing as it should.

1. I set up a startup screen using a scalable image with bilinear interpolation (gave best visual quality). Alpha fade in over 1000ms... hold for 3000ms, then fade out over 100ms. Using TouchGFXGPIO.cpp and the RENDER_TIME setting for an I/O pin, the render time is 165ms per frame. If I disable Chrom-ART, that goes to 170ms per frame. This results in an unusable feature.

2. To be "cool", I am using the animateToItem function with a 60-frame (ideally 1s) setting. This results in a render time of about 170ms per frame. Also undesirable.

HARDWARE DETAILS and further testing:

Custom board

LCD is 480x800 pixels RGB565 interface using LTDC, 27.4MHz dot clock

STM32F746 running at 200MHz

Frame buffer in external SDRAM running at 100MHz (HCLK/2)

STM32CubeIDE v1.6.0

STM32Cube v1.16.1

TouchGFX v4.15.0

Chrom-ART (DMA2D) enabled
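
As a rough sanity check on the hardware list above, the display refresh alone claims a share of the SDRAM bandwidth. The sketch below uses only the resolution, pixel format, and dot clock quoted above. The function names are illustrative, and the frame period is only a lower bound because the porch and sync timings are not given in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the hardware list in this post.
constexpr std::uint32_t kWidth = 480;
constexpr std::uint32_t kHeight = 800;
constexpr std::uint32_t kBytesPerPixel = 2;  // RGB565

// Size of one full framebuffer in bytes.
std::uint32_t frameBufferBytes() {
    return kWidth * kHeight * kBytesPerPixel;
}

// Read bandwidth (bytes/s) the LTDC needs while it scans active pixels.
double ltdcBytesPerSecond(double dotClockHz) {
    assert(dotClockHz > 0.0);
    return dotClockHz * kBytesPerPixel;
}

// Lower bound on the frame period in ms: active pixels only,
// porches and sync widths (unknown here) would make it longer.
double minFramePeriodMs(double dotClockHz) {
    assert(dotClockHz > 0.0);
    return 1000.0 * kWidth * kHeight / dotClockHz;
}
```

With a 27.4MHz dot clock this gives a 768,000-byte framebuffer, about 54.8 MB/s of LTDC reads during active lines, and a frame period of at least about 14ms, so the refresh rate cannot exceed roughly 71Hz.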

1. Fading in an image by changing alpha with the scalable image widget. Native image is 400x106, scaled to 700x185 using bilinear interpolation. Render time per frame is 165ms with Chrom-ART and 170ms with it disabled. It appears from probing the SDRAM that render time might be 400-500ns/pixel if accessed randomly 1 pixel at a time.

As a test, I used a fixed image widget and the render time is 4ms per frame (well within the VSYNC). If this scaling was the only limitation, I would have written it off and moved on. But because of example #2, which is an actual feature I'd like to use, I'm in a quandary.

2. I'm using 5 scroll wheel widgets on the screen at once. Three of them are 74x355 pixels and two are 180x355. If I use the animateToItem function on entering the screen, the render time is around 170ms per frame, which results in very jerky animation. Again, if that was the only symptom, I would have probably just moved on.

HOWEVER, if I use my finger to scroll the smaller 74x355 wheel, it takes roughly 40ms to render the frame, and it takes roughly 100ms to render the frame for the 180x355 wheel. All tests were with Chrom-ART on. I didn't turn it off for this test. It appears to take roughly 500ns per pixel most of the time, but some are shorter and some are longer. This results in a sluggish and non-responsive UI.
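
To put the measurements above on a common scale, the render times can be divided by the widget areas. This is only a back-of-the-envelope sketch: `nsPerPixel` is an illustrative helper, and it assumes the whole widget area is redrawn every frame, which may not match what TouchGFX actually invalidates.

```cpp
#include <cassert>

// Average render cost per output pixel, in nanoseconds,
// assuming the full widthPx x heightPx area is redrawn in renderMs.
double nsPerPixel(double renderMs, int widthPx, int heightPx) {
    assert(renderMs >= 0.0 && widthPx > 0 && heightPx > 0);
    const double pixels = static_cast<double>(widthPx) * heightPx;
    return renderMs * 1.0e6 / pixels;
}

// Figures quoted in the post:
//   scaled logo, 700x185 at 165 ms -> nsPerPixel(165.0, 700, 185)
//   small wheel,  74x355 at  40 ms -> nsPerPixel(40.0, 74, 355)
//   large wheel, 180x355 at 100 ms -> nsPerPixel(100.0, 180, 355)
```

These work out to roughly 1.3µs, 1.5µs and 1.6µs per output pixel. That is about three times the 400-500ns seen on the bus for a single access, which might point to several SDRAM accesses per output pixel, though that is a guess rather than something measured here.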

So, the real question is this... does it look like I have some sort of issue, or do those times make sense given what I'm trying to do? I've probed the SDRAM bus and it definitely looks like lockDMAToFrontPorch is false, as it should be for best performance. I can see a lot of SDRAM activity in the dead time of the LCD refresh compared to a static screen that is only being refreshed.

Thank you in advance for your prompt response!

Keith

Alexandre RENOUX · 2021-03-24

Hello Jazman,

Did you try to check on an ST board whether your UI performs well? Like the F469-DISCO you mentioned in your message?

This would be the first thing to do to ensure that it's not a direct TouchGFX issue.

If you cannot test, maybe you can attach your UI (only the UI, no hardware-related code, just the simulator) and we will try to check.

At first glance, what you are doing should not impact performance as heavily as you experienced.

If it works on an ST board (which is always the first thing to test in case of a performance issue) then your hardware is probably wrongly configured.

/Alexandre

Michael K · 2021-03-24

I too was disappointed with the performance until I enabled optimizations, specifically Optimize for Speed. In CubeIDE, right-click on your project in the sidebar, then Properties > C/C++ Build > Settings > Tool Settings.

MM..1 · 2021-03-25

Your point 1 is normal: scalable images are hard work for the MCU and are not hardware accelerated. Chrom-ART isn't an Nvidia core...

The scalable image is meant for changing size in real time, without animation or alpha. When you need to animate, generate a resized dynamic image once, store it, and use that for the alpha animation. But then you need space for it, in SDRAM for example.

Point 2 may come from a bad CubeMX implementation in some versions. I have similar issues, and in projects regenerated in CubeIDE or CubeMX for newer versions I now replace the target/generated files with the ones from TouchGFX 4.13, because the newer files, especially the DMA2D ones, seem to be bad.
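
The idea of scaling once and then fading a fixed-size copy can be sketched as below. This is a minimal illustration on plain RGB565 buffers, not the TouchGFX API: `lerp565` and `prescale` are hypothetical names, and in a real project the destination would more likely be a TouchGFX dynamic bitmap placed in SDRAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pixel = std::uint16_t;  // RGB565

// Blend two RGB565 pixels per channel; t is the weight of b, 0..256.
static Pixel lerp565(Pixel a, Pixel b, std::uint32_t t) {
    const std::uint32_t r = (((a >> 11) & 0x1F) * (256 - t) + ((b >> 11) & 0x1F) * t) >> 8;
    const std::uint32_t g = (((a >> 5) & 0x3F) * (256 - t) + ((b >> 5) & 0x3F) * t) >> 8;
    const std::uint32_t bl = ((a & 0x1F) * (256 - t) + (b & 0x1F) * t) >> 8;
    return static_cast<Pixel>((r << 11) | (g << 5) | bl);
}

// Scale src (sw x sh) to dw x dh with bilinear interpolation.
// Intended to run once, so the per-frame work is only a fixed-size alpha blend.
std::vector<Pixel> prescale(const std::vector<Pixel>& src, int sw, int sh, int dw, int dh) {
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    assert(src.size() == static_cast<std::size_t>(sw) * sh);
    std::vector<Pixel> dst(static_cast<std::size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y) {
        // Source row coordinate in 24.8 fixed point; corners map to corners.
        const std::uint32_t fy =
            (dh > 1) ? static_cast<std::uint32_t>(y) * (sh - 1) * 256u / (dh - 1) : 0u;
        const int y0 = static_cast<int>(fy >> 8);
        const int y1 = (y0 + 1 < sh) ? y0 + 1 : y0;  // clamp at the bottom edge
        for (int x = 0; x < dw; ++x) {
            const std::uint32_t fx =
                (dw > 1) ? static_cast<std::uint32_t>(x) * (sw - 1) * 256u / (dw - 1) : 0u;
            const int x0 = static_cast<int>(fx >> 8);
            const int x1 = (x0 + 1 < sw) ? x0 + 1 : x0;  // clamp at the right edge
            const Pixel top = lerp565(src[y0 * sw + x0], src[y0 * sw + x1], fx & 0xFF);
            const Pixel bot = lerp565(src[y1 * sw + x0], src[y1 * sw + x1], fx & 0xFF);
            dst[static_cast<std::size_t>(y) * dw + x] = lerp565(top, bot, fy & 0xFF);
        }
    }
    return dst;
}
```

For the logo in this thread, a 700x185 copy in RGB565 would need 259,000 bytes of RAM, which is the space cost mentioned above.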

Jazman · 2021-03-26

Intermediate answer: @Michael K - Thank you very much for that feedback. Honestly, compiler optimization didn't really occur to me at this point in the project. I don't usually switch on compiler optimization until later in the project as you usually lose realistic debug capability. Given the fact that we're being delivered pre-compiled libraries for TouchGFX, I am quite shocked that it made a difference. There must be a lot more interaction between the libraries and the user code than I expected. For the "animateto" on the 5 wheels, it amounted to a 30% improvement. Still not enough to be usable, but a significant improvement. For the scrollwheel performance for a user, there was a 55% improvement on the smaller wheel which did bring it to about the same as the framerate and usably responsive. The larger wheel had a 65% improvement. Still awkward to use, but I'm not done investigating yet.

For the fade-in/fade-out of the logo screen with a scaled image, there was no improvement. I switched the animation to a fixed image the same size as the scaled animation with an alpha fade, and it had a time of 80ms with optimization off and 78ms with it on. So there was negligible improvement there. However, that is one of those things with very little interaction with user code.

I haven't gone back to the DISCO board yet. However, I remember previously there wasn't an example app that worked with the TouchGFX where you could actually use the designer (that board has a DSI interface which isn't natively supported). Without that, it's a useless experiment since there is no way to know I configured the board correctly.

Keith

More as I figure more out. THANK YOU MICHAEL!

Michael K · 2021-03-26

Glad it was helpful!

Off the top of my head, I think the pixel clock for the F769/F469 Disco is higher than that (42MHz? 48MHz?). I was getting screen tearing issues that were solved by lowering the clock to 32MHz, but that also slowed down my animations by a good bit. Luckily that's when I discovered the optimizations, and the animations sped back up to acceptable levels.

By the way, the F469 disco indeed has a TouchGFX application template that works with the designer. That screen also has a 40ish MHz LCD if I remember correctly so you might get some extra performance if that's the bottleneck.

Jazman · 2021-04-05

Going to wrap this up for the time being. After a significant amount of trial and error, it looks like the main cause of the render time has to do with the image format selected. The best performance seems to be with ARGB8888 being used with my RGB565 screen. I had been using L8_ARGB8888 but that slows things down significantly.

Now, why this is I don't know. The Chrom-ART does have hardware conversion for L8 format input and RGB565 output. I've tried different Chrom-ART settings in the CubeIDE but they don't seem to have any effect. In theory, this would be correct since TouchGFX SHOULD be changing the mode of Chrom-ART depending on what it's doing, but I'm not sure it is. If I get the time, I will go back and see if I can breakpoint at certain areas and check whether the register settings match.

But I managed to get some performance that's close enough to what I need for now. It's not final since I can't really use raw ARGB8888 formats for image storage in flash due to size, but I have some ideas on how to manage that which I need to work out. I need to work that out for other reasons as well but that's a different thread.
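
The flash cost behind that trade-off is easy to estimate. The sketch below compares the raw pixel data for the two formats mentioned above. The helper names are illustrative, and it ignores any header, padding, or alignment the TouchGFX image converter may add, so treat the numbers as approximate.

```cpp
#include <cassert>
#include <cstddef>

// ARGB8888: 4 bytes per pixel.
std::size_t argb8888Bytes(int w, int h) {
    assert(w > 0 && h > 0);
    return static_cast<std::size_t>(w) * h * 4;
}

// L8 with an ARGB8888 palette: 1 index byte per pixel
// plus 4 bytes for each palette entry (at most 256 entries).
std::size_t l8Argb8888Bytes(int w, int h, int paletteEntries = 256) {
    assert(w > 0 && h > 0);
    assert(paletteEntries > 0 && paletteEntries <= 256);
    return static_cast<std::size_t>(w) * h + static_cast<std::size_t>(paletteEntries) * 4;
}
```

For the 400x106 logo this gives 169,600 bytes as ARGB8888 against 43,424 bytes as L8 with a full 256-entry palette, so the faster format costs close to four times the flash.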

Michael: Those boards use a DSI LCD and my board is using a regular RGB+sync interface. The dot clock is 27.5MHz with porch timings recommended by the manufacturer. I tried to increase the clock rate and porch lengths that are still within the datasheet for my LCD, and the scope timings look OK, but the LCD isn't happy. I was hoping to create more "dead" time for the graphics engine to work but had no luck. Maybe something to revisit if I need to...

*EDIT* Forgot to add, also discovered something related to double buffering with framebuffers in external RAM: If your external RAM has banks (most do), you should probably put the 2 framebuffers in different banks. I got a 10% improvement in render time by specifying the framebuffer addresses in the CubeIDE to be different banks.
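
A small address check can make the bank placement explicit. Everything concrete in this sketch is an assumption for illustration, not a value from my board: an 8MB SDRAM at 0xC0000000 with four 2MB internal banks, and a memory controller that maps the bank-select bits to the top of the SDRAM address range (check the reference manual and the SDRAM datasheet for the real mapping). The likely reason it helps is that each bank can keep its own row open, so reads from one buffer and writes to the other disturb each other less.

```cpp
#include <cassert>
#include <cstdint>

struct SdramLayout {
    std::uint32_t base;       // start of the SDRAM in the MCU address map
    std::uint32_t bankBytes;  // size of one internal bank
    std::uint32_t bankCount;  // number of internal banks
};

// Internal bank holding addr, assuming bank bits are the top address bits.
std::uint32_t bankOf(const SdramLayout& m, std::uint32_t addr) {
    assert(addr >= m.base && addr - m.base < m.bankBytes * m.bankCount);
    return (addr - m.base) / m.bankBytes;
}

// True if the buffer [addr, addr + bytes) lies entirely inside one bank.
bool fitsInOneBank(const SdramLayout& m, std::uint32_t addr, std::uint32_t bytes) {
    assert(bytes > 0);
    return bankOf(m, addr) == bankOf(m, addr + bytes - 1);
}

constexpr std::uint32_t kFrameBytes = 480u * 800u * 2u;  // 768,000 bytes, RGB565

// Hypothetical layout and framebuffer addresses (assumptions, see above).
const SdramLayout kExample = {0xC0000000u, 0x00200000u, 4u};
constexpr std::uint32_t kFrameBuffer0 = 0xC0000000u;  // bank 0
constexpr std::uint32_t kFrameBuffer1 = 0xC0200000u;  // bank 1
```

Under these assumptions, placing the second buffer directly after the first (at 0xC00BB800) would leave both in bank 0, while starting it at 0xC0200000 moves it to bank 1.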

Keith