Long rendering time when moving two images

Marco.R · ‎2019-11-20

Hi @Martin KJELDSEN

On a screen I want to move two large images (one has transparency). I've implemented it as in your bird game example (with the moveTo(x, y) method).

I have now noticed that the animation is very slow and the rendering time is very long (>30 ms). I don't know why this takes so long. I measured it with the RENDER_TIME GPIO and also in software. I can see that the TouchGFX library is active for more than 30 ms.

Do you have any idea where the issue comes from? The implementation is below.

Thank you

Best regards

Marco

#include <gui/containers/container_moving_waves.hpp>
#include <BitmapDatabase.hpp>
#include <touchgfx/Color.hpp>
 
container_moving_waves::container_moving_waves(): m_waveForeground(), 
                                                  m_waveBackground(), 
                                                  m_animationState(AnimationState::AnimationRunning),
                                                  m_tickCounter(0U),
                                                  m_tickinterval(1U) {}
 
void container_moving_waves::initialize()
{
    container_moving_wavesBase::initialize();
    initializeLayer(m_waveBackground, BITMAP_WAVE_BACKGROUND_ID, 255U, 1U, -1);
    initializeLayer(m_waveForeground, BITMAP_WAVE_FOREGROUND_ID, 204U, 1U, 1);
 
    m_boxWave.setPosition(0, 217, 800, 33);
    m_boxWave.setColor(touchgfx::Color::getColorFrom24BitRGB(112, 148, 197));
 
    add(m_boxWave);
}
 
void container_moving_waves::startAnimation() {
    m_animationState = AnimationState::AnimationRunning;
}
 
void container_moving_waves::stopAnimation() {
    m_animationState = AnimationState::NoAnimation;
}
 
void container_moving_waves::handleTickEvent() {
    m_tickCounter++;
 
    if ((m_tickCounter % m_tickinterval) != 0U) {
        return;
    }
 
    if (m_animationState == AnimationState::AnimationRunning) {
        moveLayer(m_waveForeground, m_tickCounter);
        moveLayer(m_waveBackground, m_tickCounter);
    }
}
 
void container_moving_waves::initializeLayer(Layer& layer, const BitmapId bmp, uint8_t alpha, const uint32_t animationUpdateSpeed, const int32_t animationWidth)
{
    layer.image0.setBitmap(Bitmap(bmp));
    layer.image1.setBitmap(Bitmap(bmp));
 
    layer.image0.setXY(0, 0U);
    if (animationWidth < 0) {
        layer.image1.setXY(layer.image0.getRect().right(), 0U);
    }
    else {
        layer.image1.setXY(layer.image0.getRect().x - layer.image1.getWidth(), 0U);
    }
 
    layer.image0.setAlpha(alpha);
    layer.image1.setAlpha(alpha);
 
    add(layer.image0);
    add(layer.image1);
 
    layer.animationUpdateSpeed = animationUpdateSpeed;
    layer.animationWidth = animationWidth;
}
 
void container_moving_waves::moveLayer(Layer& layer, const uint32_t tickCount) {
    if ((tickCount % layer.animationUpdateSpeed) == 0U) {
        layer.image0.moveTo(layer.image0.getX() + layer.animationWidth, layer.image0.getY());
        layer.image1.moveTo(layer.image1.getX() + layer.animationWidth, layer.image1.getY());
 
        if (layer.animationWidth < 0) {
            //when moving left
            if (layer.image0.getRect().right() < 0) {
                layer.image0.moveTo(layer.image1.getRect().right(), layer.image0.getY());
            }
 
            if (layer.image1.getRect().right() < 0) {
                layer.image1.moveTo(layer.image0.getRect().right(), layer.image1.getY());
            }
        }
        else {
            //when moving right
            if (layer.image0.getRect().x > layer.image0.getWidth()) {
                layer.image0.moveTo(layer.image1.getRect().x - layer.image0.getWidth(), layer.image0.getY());
            }
 
            if (layer.image1.getRect().x > layer.image1.getWidth()) {
                layer.image1.moveTo(layer.image0.getRect().x - layer.image1.getWidth(), layer.image1.getY());
            }
        }
    }
}

Marco.R · ‎2019-11-21

Hi @Martin KJELDSEN

I found the reason why it takes so long. Because of the large images I use, I stored the images as L8_ARGB8888. When I change back to RGB565/ARGB8888, the method setupDataCopy is called as expected. The rendering time falls from 73 ms to 42 ms, which is still high but much better. The CPU load falls from >90% to 4%.

Is there a workaround to use L8_ARGB8888 and DMA2D together?

As written above, the rendering time is still high, but I think it makes sense. I tried to calculate the theoretical rendering time:

Read from QSPI (108MHz):

Img1/Img2 (800x250px / 4Byte/px) -> 14.8ms each

Write to SDRAM (16bit data bus / 108MHz):

Background (Box / 800x480px) -> 3.6ms

Img1/Img2 -> 1.9ms each

In total it should take about 37ms. Is my calculation correct?
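The estimate above can be reproduced with a short calculation. This is only a rough sketch under stated assumptions: the QSPI flash is read in quad mode at single data rate (4 bits per clock), the frame buffer in SDRAM is RGB565 (2 bytes per pixel), and command overhead, wait states and bus contention are ignored. The names `transferMs`, `RenderEstimate` and `estimateWaveScreen` are hypothetical helpers, not part of TouchGFX.

```cpp
#include <cassert>

// Milliseconds needed to move `bytes` over a bus that transfers
// `bitsPerClock` bits on every cycle of a `clockHz` clock.
// Command/address phases, wait states and bus contention are ignored.
double transferMs(double bytes, double clockHz, double bitsPerClock)
{
    assert(clockHz > 0.0 && bitsPerClock > 0.0);
    return bytes * 8.0 / (clockHz * bitsPerClock) * 1000.0;
}

// Hypothetical helper type for the result, not part of TouchGFX.
struct RenderEstimate
{
    double imageReadMs;        // one wave image read from QSPI flash
    double backgroundWriteMs;  // full-screen box written to SDRAM
    double imageWriteMs;       // one wave image written to SDRAM
    double totalMs;            // two reads + background + two writes
};

RenderEstimate estimateWaveScreen()
{
    const double clockHz = 108e6;                          // from the post
    const double imageBytesInFlash = 800.0 * 250.0 * 4.0;  // 4 bytes/px, from the post
    const double imageBytesInFb = 800.0 * 250.0 * 2.0;     // assumption: RGB565 frame buffer
    const double backgroundBytes = 800.0 * 480.0 * 2.0;    // assumption: RGB565 frame buffer

    RenderEstimate e;
    e.imageReadMs = transferMs(imageBytesInFlash, clockHz, 4.0);   // assumption: quad SPI, SDR
    e.backgroundWriteMs = transferMs(backgroundBytes, clockHz, 16.0);
    e.imageWriteMs = transferMs(imageBytesInFb, clockHz, 16.0);
    e.totalMs = 2.0 * e.imageReadMs + e.backgroundWriteMs + 2.0 * e.imageWriteMs;
    return e;
}
```

Under these assumptions `estimateWaveScreen()` gives roughly 14.8 ms per image read, 3.6 ms for the background and 1.9 ms per image write, about 36.9 ms in total, which agrees with the figures in the post. The flash reads account for about 80% of the total.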

In your experience, is a rendering time of about 40 ms to be expected? Or should it be faster?

Do you have any advice on how I can optimize the rendering time for large images? Should I cache the images in SDRAM instead of reading them from flash?

Thank you

Marco

scottSD · ‎2019-11-25

Marco.R,

These are good questions. I have been trying to figure out an implementation of L8_ARGB8888 myself. I asked a question about it, but have not had any responses.

I am still learning about it, but for the STM32F746G-Discovery kit (which I am currently using), I found a call to the function Martin mentioned (HAL_DMA2D_BlendingStart_IT()) in the file TouchGFX\target\STM32F7DMA.cpp. The function making this call is setupDataCopy(), which appears to set up the DMA copy based on which BlitOp operation is being used.

However, there doesn't appear to be an operation for indexed color (BlitOp.hpp):

enum BlitOperations
{
    BLIT_OP_COPY = 1 << 0, ///< Copy the source to the destination
    BLIT_OP_FILL = 1 << 1, ///< Fill the destination with color
    BLIT_OP_COPY_WITH_ALPHA = 1 << 2, ///< Copy the source to the destination using the given alpha
    BLIT_OP_FILL_WITH_ALPHA = 1 << 3, ///< Fill the destination with color using the given alpha
    BLIT_OP_COPY_WITH_TRANSPARENT_PIXELS = 1 << 4, ///< Deprecated, ignored. (Copy the source to the destination, but not the transparent pixels)
    BLIT_OP_COPY_ARGB8888 = 1 << 5, ///< Copy the source to the destination, performing per-pixel alpha blending
    BLIT_OP_COPY_ARGB8888_WITH_ALPHA = 1 << 6, ///< Copy the source to the destination, performing per-pixel alpha blending and blending the result with an image-wide alpha
    BLIT_OP_COPY_A4 = 1 << 7, ///< Copy 4-bit source text to destination, performing per-pixel alpha blending
    BLIT_OP_COPY_A8 = 1 << 8 ///< Copy 8-bit source text to destination, performing per-pixel alpha blending
};

BlitOp.hpp also contains the BlitOp struct. That struct does contain a pointer to the CLUT (pClut), so I am not sure why there is no operation for indexed color.

struct BlitOp
{
    uint32_t        operation;  ///< The operation to perform @see BlitOperations
    const uint16_t* pSrc;          ///< Pointer to the source (pixels or indexes)
    const uint8_t*  pClut;         ///< Pointer to the source CLUT entries
    uint16_t*       pDst;          ///< Pointer to the destination
    uint16_t        nSteps;        ///< The number of pixels in a line
    uint16_t        nLoops;        ///< The number of lines
    uint16_t        srcLoopStride; ///< The number of bytes to stride the source after every loop
    uint16_t        dstLoopStride; ///< The number of bytes to stride the destination after every loop
    colortype       color;         ///< Color to fill
    uint8_t         alpha;         ///< The alpha to use
    uint8_t         srcFormat;     ///< The source format @see BitmapFormat
    uint8_t         dstFormat;     ///< The destination format @see BitmapFormat
};

And as far as I know, the Chrom-ART accelerator (DMA2D) supports a CLUT.
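For readers unfamiliar with indexed color, here is a generic sketch of what a CLUT lookup does and why L8 images are so much smaller. It is an illustration only: it assumes a plain array of 8-bit indexes and a palette of up to 256 ARGB8888 entries, and it does not reproduce the layout TouchGFX actually uses for L8_ARGB8888 bitmaps. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand 8-bit palette indexes into 32-bit ARGB8888 pixels by looking
// each index up in the color look-up table (CLUT).
std::vector<std::uint32_t> expandL8(const std::vector<std::uint8_t>& indexes,
                                    const std::vector<std::uint32_t>& clut)
{
    std::vector<std::uint32_t> pixels;
    pixels.reserve(indexes.size());
    for (std::uint8_t index : indexes) {
        assert(index < clut.size());  // every index must refer to a palette entry
        pixels.push_back(clut[index]);
    }
    return pixels;
}

// Storage for an indexed image: one byte per pixel plus a full
// 256-entry palette of 4-byte colors (assumption: no extra header).
std::size_t l8Bytes(std::size_t width, std::size_t height)
{
    return width * height + 256u * 4u;
}

// Storage for the same image with 4 bytes per pixel.
std::size_t argb8888Bytes(std::size_t width, std::size_t height)
{
    return width * height * 4u;
}
```

For an 800x250 image this gives 201,024 bytes in L8 form against 800,000 bytes as ARGB8888, so about a quarter of the data has to be read from flash. Doing the lookup in hardware rather than in a CPU loop like `expandL8` is what an indexed-color blit operation would provide.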

Martin KJELDSEN · ‎2019-11-25

Hi guys,

I'll try to get to your L8 questions tomorrow. The standard HAL does not support L8, correct.

/Martin

Marco.R · ‎2019-12-04

Hi @Martin KJELDSEN

Just for information: I updated the HAL according to your answer in @scottSD's thread (here) and the render time has improved. Rendering is now about 20% faster than before (about 30 ms), much less memory is used, and the CPU load stays below 10%. If there are other ways to improve the render time, I would appreciate any hints. But I assume that is the limit for my configuration (see my last post with the calculation). Is that correct?

Thanks a lot for your help.

Marco

Martin KJELDSEN · ‎2019-12-04

That's great to hear, Marco. I _think_ I'd need to know more concretely about your application to help optimize. It may be better to simply measure the different read/write times with an oscilloscope to be more accurate. 60 ms is a lot if you're aiming for 60 Hz, but you also need to know the limitations of your platform in terms of achieving acceptable performance.
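To put the render times from this thread next to a 60 Hz target, the following sketch converts a render time into an effective frame rate. It assumes a simplified model in which a new frame can only be shown on a display refresh boundary and the rendering of consecutive frames does not overlap; real behavior depends on the platform and buffering scheme. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Number of display refresh periods one frame occupies when a new
// frame can only be presented on a refresh boundary.
int refreshPeriodsPerFrame(double renderMs, double refreshHz)
{
    assert(renderMs > 0.0 && refreshHz > 0.0);
    const double periodMs = 1000.0 / refreshHz;  // about 16.7 ms at 60 Hz
    return static_cast<int>(std::ceil(renderMs / periodMs));
}

// Resulting frame rate under the same simplified model.
double effectiveFps(double renderMs, double refreshHz)
{
    return refreshHz / refreshPeriodsPerFrame(renderMs, refreshHz);
}
```

In this model a 30 ms render time occupies two refresh periods at 60 Hz, so the animation would update at 30 frames per second; 42 ms would give 20 and 73 ms would give 12. Reaching the full 60 Hz would require rendering in under about 16.7 ms.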