STM32N6570-DK VENC_RTSP_Server Example Performance

EmbeddedFun
STM32CubeIDE 1.19.0
STM32Cube_FW_N6_V1.2.0
RTSP Client: FFPLAY
 
I've been evaluating the VENC_RTSP_Server example for a little while as a starting point for a potential video streaming application. Unfortunately, the example does not produce a stable stream. The stream was unstable prior to FW v1.2.0 as well, but a new check and message were added in FW v1.2.0 (MSG: "Video Overflow - Skip Frame") that seem to point to the bottleneck being somewhere on the encoding side and not the network side. I'll detail the issues I found here. Some of them are code quality problems that may or may not affect performance; others do affect it. I'm also certainly open to the possibility that my analysis is wrong in some cases, as I haven't had much time working with this part and the STM32Cube tools.
 
File: Appli/Src/main.c
 
Issue #1: 
A) The PLL2 configuration in SystemClock_Config() is invalid according to STM32CubeMX, or at least STM32CubeMX will not allow it. However, HAL_RCC_OscConfig() does not return an error for this configuration. If it is in fact invalid, an error result should be returned.
 
B) Best I can tell, there isn't anything in this example that uses PLL2 or any IC derived from it. I suspect this is leftover from something this example was based on. The example still runs as expected when setting RCC_OscInitStruct.PLL2.PLLState = RCC_PLL_NONE.
 
Issue #2: 
The Ethernet clock is very low (80 MHz). Prior to FW v1.2.0, I wasn't sure where the bottleneck was, so I looked for anything that could improve performance and found this. Unfortunately, raising it from 80 MHz to 200 MHz didn't seem to make any difference in the stream quality.
 
void SystemClock_Config(void)
{
  ... 
  // Issue 1
  RCC_OscInitStruct.PLL2.PLLState = RCC_PLL_ON;
  RCC_OscInitStruct.PLL2.PLLSource = RCC_PLLSOURCE_HSI; // HSI = 64 MHZ
  // Broken: /32 results in a 2 MHz input to the PLL, 
  // which has a 5 MHz minimum according to STM32CubeMX.
//  RCC_OscInitStruct.PLL2.PLLM = 32;  
  RCC_OscInitStruct.PLL2.PLLM = 8;
  RCC_OscInitStruct.PLL2.PLLN = 125;
  RCC_OscInitStruct.PLL2.PLLFractional = 0;
  RCC_OscInitStruct.PLL2.PLLP1 = 1;
  RCC_OscInitStruct.PLL2.PLLP2 = 1;    // PLL2 = 64 / 8 * 125 / 1 / 1 = 1000 MHz
  
  ... 
  // Issue 2
  PeriphClkInitStruct.PeriphClockSelection = RCC_PERIPHCLK_ETH1;
  PeriphClkInitStruct.Eth1ClockSelection = RCC_ETH1CLKSOURCE_IC12;
  PeriphClkInitStruct.ICSelection[RCC_IC12].ClockSelection = RCC_ICCLKSOURCE_PLL1; // PLL1 = 1200 MHz
//  PeriphClkInitStruct.ICSelection[RCC_IC12].ClockDivider = 15; // /15 = 80 MHz -- Why so slow? 
  PeriphClkInitStruct.ICSelection[RCC_IC12].ClockDivider = 6; // /6 = 200 MHz (MAX);
  ...
} 
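
The arithmetic in the comments above can be checked with a small host-side sketch. The function names are hypothetical and not part of the HAL, the 5 MHz minimum is the limit STM32CubeMX reported to me rather than a value I have confirmed in the reference manual, and the fractional part of the PLL is ignored because PLLFractional is 0 here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit: minimum PLL reference input as reported by STM32CubeMX. */
#define PLL_REF_MIN_HZ 5000000u

/* Reference frequency seen by the PLL after the PLLM divider. */
uint32_t pll_ref_hz(uint32_t src_hz, uint32_t m)
{
    assert(m != 0u);
    return src_hz / m;
}

/* Nonzero when the reference frequency meets the assumed minimum. */
int pll_ref_ok(uint32_t src_hz, uint32_t m)
{
    return pll_ref_hz(src_hz, m) >= PLL_REF_MIN_HZ;
}

/* Integer-mode PLL output: src / M * N / P1 / P2. */
uint64_t pll_out_hz(uint32_t src_hz, uint32_t m, uint32_t n,
                    uint32_t p1, uint32_t p2)
{
    assert(p1 != 0u && p2 != 0u);
    return (uint64_t)pll_ref_hz(src_hz, m) * n / p1 / p2;
}

/* Output of an IC divider fed by a PLL. */
uint32_t ic_out_hz(uint64_t pll_hz, uint32_t divider)
{
    assert(divider != 0u);
    return (uint32_t)(pll_hz / divider);
}
```

With HSI at 64 MHz, PLLM = 32 gives a 2 MHz reference (below the assumed minimum), while PLLM = 8 gives 8 MHz and a 1000 MHz output. For IC12 on a 1200 MHz PLL1, a divider of 15 gives 80 MHz and a divider of 6 gives 200 MHz.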
 
File: Appli/Src/venc_app.c
 
Issue #3:
The frame rate definition "#define FRAMERATE 20" is set and used in this file, but not consistently within this file or other files in this example, like Appli/NetXDuo/App/app_rtsp_over_rtp.c, which uses DEMO_VIDEO_FRAME_PER_SECOND. 
 
void venc_thread_func(ULONG arg)
{
  ...
  // Magic number: Clearly frame rate, but doesn't use the FRAMERATE define
  IMX335_SetFramerate(Camera_CompObj, 30); 
  ...
}

int encoder_prepare(uint32_t width, uint32_t height)
{
  ...
  /* 30 fps frame rate */
  cfg.frameRateDenom = 1;

  // Great, it uses the #define, but definitely doesn't match the 30 FPS set for the camera
  // OR the comment directly above...
  cfg.frameRateNum = FRAMERATE;
  ...
  ratectrl_cfg.qpHdr = 25;
  ratectrl_cfg.bitPerSecond = 1000000;
  ratectrl_cfg.pictureRc = 0;

  // Magic Number: Sure seems like frame rate based on context. 
  // Inconsistent with FRAMERATE.
  // Should be a configurable number of seconds multiplied by the frame rate.
  ratectrl_cfg.gopLen = 30;      
  ratectrl_cfg.intraQpDelta = 0;
  ratectrl_cfg.fixedIntraQp = 0;
  ratectrl_cfg.hrd = 0;
  ret = H264EncSetRateCtrl(encoder, &ratectrl_cfg);
  ...
}

static int encode_frame(void)
{
  ...
  venc_output_frame_t frame_buffer = {0};
  int ret = H264ENC_FRAME_READY;

  // Magic Number: Sure seems like frame rate based on context. Inconsistent with FRAMERATE
  if (!(frame_nb % 30))           
  {
    /* if frame is the first : set as intra coded */
    encIn.codingType = H264ENC_INTRA_FRAME;
  }
  ...
}
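
One way to remove the magic numbers is sketched below, assuming a single project-wide define drives everything. GOP_SECONDS, GOP_LENGTH, and is_intra_frame() are hypothetical names I made up for illustration; DEMO_VIDEO_FRAME_PER_SECOND is the macro the RTSP file already checks for.

```c
#include <assert.h>
#include <stdint.h>

/* Single source of truth; can be overridden from the build settings. */
#ifndef DEMO_VIDEO_FRAME_PER_SECOND
#define DEMO_VIDEO_FRAME_PER_SECOND 30
#endif

#define FRAMERATE   DEMO_VIDEO_FRAME_PER_SECOND
#define GOP_SECONDS 1                          /* hypothetical: seconds between intra frames */
#define GOP_LENGTH  (FRAMERATE * GOP_SECONDS)  /* frames per group of pictures */

static_assert(GOP_LENGTH > 0, "GOP length must be positive");

/* Nonzero when frame_nb should be intra coded (first frame of each GOP). */
int is_intra_frame(uint32_t frame_nb)
{
    return (frame_nb % (uint32_t)GOP_LENGTH) == 0u;
}
```

The camera call, cfg.frameRateNum, ratectrl_cfg.gopLen, and the modulo test in encode_frame() could then all use FRAMERATE, GOP_LENGTH, and is_intra_frame(frame_nb) instead of literal 30s.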
 
Issue #4:
Comments and code do not align in MX_DCMIPP_ClockConfig(). The comment claims PLL2 is used, when it is actually PLL1. The commented values are also incorrect given the use of PLL1. Also, what is IC18 being set up for?
 
HAL_StatusTypeDef MX_DCMIPP_ClockConfig(DCMIPP_HandleTypeDef *hdcmipp)
{
  RCC_PeriphCLKInitTypeDef PeriphClkInit = {0};

  /* Configure DCMIPP ck_ker_dcmipp to ic17 with PLL2 (1000MHz) / 3 = 333MHz */ // <-- PLL1 = 1200 MHz
  PeriphClkInit.PeriphClockSelection |= RCC_PERIPHCLK_DCMIPP;
  PeriphClkInit.DcmippClockSelection = RCC_DCMIPPCLKSOURCE_IC17;
  PeriphClkInit.ICSelection[RCC_IC17].ClockSelection = RCC_ICCLKSOURCE_PLL1;
  PeriphClkInit.ICSelection[RCC_IC17].ClockDivider = 4; // <-- 1200 / 4 = 300 MHz
  if (HAL_RCCEx_PeriphCLKConfig(&PeriphClkInit) != HAL_OK)
  {
    return HAL_ERROR;
  }
  LL_RCC_IC18_SetSource(LL_RCC_ICCLKSOURCE_PLL1);
  LL_RCC_IC18_SetDivider(60);   /* 800/40=20Mhz */ // <-- Wrong: 1200 / 60 = 20 MHz
  LL_RCC_IC18_Enable();

  return HAL_OK;
}
 
Issue #5:
The xDivFactors of the downsizing configuration in MX_DCMIPP_Init() appear to be off by 1. From the reference manual Table 354:
 - The optimal is computed as xRATIO = Floor (8192 * xRatioFloatingPoint), maximum value 65535 (x = H or V).
 - The optimal in function of xRATIO is xDIV = Floor((1024 * 8192 - 1) / xRATIO) (x = H or V).
 
IMX335 Resolution: 2592x1944
HRatioFloatingPoint = 2592/960 = 2.7
VRatioFloatingPoint = 1944/720 = 2.7
xRATIO = Floor(8192*xRatioFloatingPoint) = Floor(22118.4) = 22118 
xDiv = Floor((1024*8192-1)/xRATIO) = Floor(8388607/22118) = Floor(379.2661) = 379
 
Since this value is very close to what the RM claims should be used, it makes me wonder if this is an error or an undocumented conscious choice. Please confirm.
 
HAL_StatusTypeDef MX_DCMIPP_Init(DCMIPP_HandleTypeDef *hdcmipp)
{
  ...
  /* Configure the downsize */
  DonwsizeConf.HRatio      = 22118;
  DonwsizeConf.VRatio      = 22118;
  DonwsizeConf.HSize       = 960;
  DonwsizeConf.VSize       = 720;
  DonwsizeConf.HDivFactor  = 380;       // 379
  DonwsizeConf.VDivFactor  = 380;       // 379
  ...
}
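
For anyone who wants to reproduce the numbers, here is a minimal sketch of the two formulas quoted from the reference manual, using integer arithmetic so the floor operations are implicit. The function names are hypothetical; only the formulas and the 65535 cap come from the RM text quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* xRATIO = floor(8192 * in_size / out_size), capped at 65535. */
uint32_t downsize_ratio(uint32_t in_size, uint32_t out_size)
{
    assert(out_size > 0u);
    uint64_t ratio = (8192ull * in_size) / out_size;
    return (ratio > 65535u) ? 65535u : (uint32_t)ratio;
}

/* xDIV = floor((1024 * 8192 - 1) / xRATIO). */
uint32_t downsize_div(uint32_t ratio)
{
    assert(ratio > 0u);
    return (1024u * 8192u - 1u) / ratio;
}
```

For 2592x1944 down to 960x720, both axes give a ratio of 22118 and a division factor of 379, which matches the hand calculation and differs from the 380 in the example.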
 
Issue #6: 
HAL_StatusTypeDef MX_LTDC_ClockConfig(LTDC_HandleTypeDef *hltdc) is unused and appears to be leftover from another example that this was created from. No effect on application when removed.
 
File: Appli/NetXDuo/App/app_rtsp_over_rtp.c
 
Issue #7:
Related to Issue #3. This file implements another way of setting the frame rate of the stream. It defaults to 30 here, but FRAMERATE was set to 20 back in venc_app.c.
 
/* Define video & audio play fps. !Note: this macro shall be the same as the real FPS to guarantee video playing normally */
#ifndef DEMO_VIDEO_FRAME_PER_SECOND
#define DEMO_VIDEO_FRAME_PER_SECOND       30
#endif /* DEMO_VIDEO_FRAME_PER_SECOND */
 
I'm on board with setting a default when one is not defined. However, the README.md says nothing about setting or changing the frame rate. Even if it were documented, changing the macro would only take effect in some places as the code currently stands. Everywhere the frame rate is used, it should be set from a single source. A preprocessor define, as is intended with DEMO_VIDEO_FRAME_PER_SECOND, would be ideal.
 
Issue #8:
The Demo Timer is created using DEMO_PLAY_TIMER_INTERVAL for frame packet throttling. However, DEMO_PLAY_TIMER_INTERVAL is completely decoupled from the frame rate. Another macro, DEMO_RTP_VIDEO_PLAY_INTERVAL, is calculated using DEMO_VIDEO_FRAME_PER_SECOND. While I don't like this timer-based implementation for sending frames when it could and should be paced by the camera and encoder FPS, it should at least be based on the currently configured FPS of the system.
 
    /* Create the global timeout timer.  */
    status = tx_timer_create(&demo_timer, "Demo Timer", demo_timer_entry, 0,
                             (DEMO_PLAY_TIMER_INTERVAL * NX_IP_PERIODIC_RATE / 1000), // use DEMO_RTP_VIDEO_PLAY_INTERVAL instead
                             (DEMO_PLAY_TIMER_INTERVAL * NX_IP_PERIODIC_RATE / 1000), // use DEMO_RTP_VIDEO_PLAY_INTERVAL instead
                             TX_AUTO_ACTIVATE);
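
As a rough illustration of tying the timer period to the frame rate, the sketch below converts an FPS value into timer ticks. The function name is hypothetical, ticks_per_second stands in for NX_IP_PERIODIC_RATE, and the 100 ticks per second used in the note afterwards is an assumption, not a value I checked in this project.

```c
#include <assert.h>
#include <stdint.h>

/* Timer period in ticks for one frame, never less than one tick. */
uint32_t frame_interval_ticks(uint32_t fps, uint32_t ticks_per_second)
{
    assert(fps > 0u);
    uint32_t ticks = ticks_per_second / fps;   /* integer division rounds down */
    return (ticks == 0u) ? 1u : ticks;
}
```

If the tick rate really is 100 per second, 20 FPS maps exactly to 5 ticks, but 30 FPS rounds down to 3 ticks (30 ms), so the timer would fire slightly faster than the frame period. That granularity is another reason to pace sending from the encoder rather than from a timer.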

 

 
Despite making the frame rate consistent at 30 FPS and applying a few of the other fixes above, I get the output below on the terminal when I run the application in debug mode. A few "Video Overflow - Skip Frame" messages appear right away. If the camera is stationary and pointed at something at least 8 inches away, the messages are significantly reduced or stop completely for a few seconds. As soon as the camera moves, the messages are plentiful. The fact that it gets worse with movement seems to point toward some stage taking too long to process a frame, perhaps the ISP (bad pixel removal, auto-exposure, white balance, or something else). If I reduce the FPS to 20, I no longer get the overflow messages, but I still get missed-packet errors on the client. The difference is that the errors were on P-frames at 30 FPS but on I-frames at 20 FPS.
 
<Terminal Output>
Nx_RTP_RTSP_Server application started..
evision ST-AE v1.0.3
evision AWB v1.0.3
STM32 IpAddress: 10.20.30.100
RTSP server started!
RTSP request received: DESCRIBE.
Setup Video (track 0)..
RTSP request received: PLAY.
Video Overflow - Skip Frame
Video Overflow - Skip Frame
Video Overflow - Skip Frame
==== RTCP Receive report ====
Video Session: 1
Video Overflow - Skip Frame
==== RTCP Receive report ====
Video Session: 1
==== RTCP Receive report ====
Video Session: 1
Video Overflow - Skip Frame
==== RTCP Receive report ====
 
<Snip from FFPLAY terminal at 30 FPS>
[h264 @ 000001d865abe640] error while decoding MB 1 44, bytestream -8
[h264 @ 000001d865abe640] concealing 108 DC, 108 AC, 108 MV errors in P frame
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[h264 @ 000001d865831900] error while decoding MB 27 44, bytestream -6
[h264 @ 000001d865831900] concealing 82 DC, 82 AC, 82 MV errors in P frame
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[h264 @ 000001d865831ec0] left block unavailable for requested intra mode
[h264 @ 000001d865831ec0] error while decoding MB 0 42, bytestream 2268
[h264 @ 000001d865831ec0] concealing 229 DC, 229 AC, 229 MV errors in P frame
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[h264 @ 000001d865832800] error while decoding MB 8 44, bytestream -12
[h264 @ 000001d865832800] concealing 101 DC, 101 AC, 101 MV errors in P frame
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[h264 @ 000001d865832bc0] left block unavailable for requested intra mode
[h264 @ 000001d865832bc0] error while decoding MB 0 43, bytestream 2676
[h264 @ 000001d865832bc0] concealing 169 DC, 169 AC, 169 MV errors in P frame
[rtsp @ 000001d85f57a3c0] max delay reached. need to consume packet
[rtsp @ 000001d85f57a3c0] RTP: missed 1 packets
[h264 @ 000001d866431f00] error while decoding MB 58 44, bytestream -26
[h264 @ 000001d866431f00] concealing 51 DC, 51 AC, 51 MV errors in P frame
[h264 @ 000001d865abdfc0] Invalid NAL unit 0, skipping.  0B
 
<Snip from FFPLAY terminal at 20 FPS>
[h264 @ 000001f94cdad040] concealing 222 DC, 222 AC, 222 MV errors in I frame
[rtsp @ 000001f94644a900] max delay reached. need to consume packet
[rtsp @ 000001f94644a900] RTP: missed 1 packets
[h264 @ 000001f94d0b5b80] error while decoding MB 14 42, bytestream -13
[h264 @ 000001f94d0b5b80] concealing 215 DC, 215 AC, 215 MV errors in I frame
[rtsp @ 000001f94644a900] max delay reached. need to consume packet
[rtsp @ 000001f94644a900] RTP: missed 1 packets
[h264 @ 000001f9464a4300] error while decoding MB 28 42, bytestream -8
 
Other Notes:
 
I'm sure some astute reader will likely comment that I should build and test in release mode. Well, I did. The performance is significantly worse. I currently have no explanation as to why that is. It just is. 
 
This particular example is using the hardware handshake (aka slice) encoding mode rather than full frame encoding mode. It would be nice to have another example or the option in this one to switch between modes. 
 
The ST documentation for the STM32N6 purports that this part is capable of 1080p at 20 FPS. Some sources even tout 30 FPS. However, the provided example is failing to achieve 720p at 20 FPS. I would appreciate some support and collaboration from ST ( @DanielS ;) ) to resolve these issues to achieve the claimed performance levels. 

 
