Help needed for tuning 64Mb HYPERRAM (S27KL0642DPBHI020) on STM32H735IGT6

unsigned_char_array
Senior III

I need help with configuring HYPERRAM for my STM32 on a custom PCB. The HYPERRAM pins are about 32mm away from the MCU pins.
The HYPERRAM works, but I get strange flickering on the display even with static data in RAM, which I think comes from memory errors in the external RAM. My memory passes some basic memory tests, but I think things go wrong when the LTDC peripheral reads from the memory at high speed. I could be wrong about this, but I need to tune the memory configuration anyway, and I want to rule out this possibility.

There are so many settings and contradictory advice that I'm overwhelmed.

Turning on D-cache for the HYPERRAM doesn't help.

If I calibrate the delay block, the memory tests fail. If I don't calibrate it, they pass. If I disable it, they also fail.
I've implemented calibration according to the following forum post: https://community.st.com/t5/stm32-mcus-products/stm32h7-octospi-mode-hyperbus-hyperram-access-and-delay-block/m-p/143244/highlight/true#M27659

I can read the registers from the memory chip (ID0, ID1, CR0, CR1) and the values match the default values from the datasheet. But I'm not able to successfully write to a register. I want to set the output driver impedance to 46 Ohm in CR0 (instead of the default 34 Ohm) since our traces have an impedance of 50 Ohm. I don't know if this is going to improve the situation, but it would be good to know how I can set this value.
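
For reference, the CR0 value with a different drive strength can be built with a small helper. This is only a minimal sketch: the helper name `cr0_set_drive_strength` is hypothetical, and the field position (bits 14..12) and the encoding (3 = 46 Ohm) are assumptions that should be checked against the HyperRAM datasheet. The base value 0x8F2F is the one used in the code further down.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 drive-strength field, assumed to sit in bits 14..12 (check the datasheet). */
#define CR0_DRIVE_STRENGTH_SHIFT 12u
#define CR0_DRIVE_STRENGTH_MASK  (0x7u << CR0_DRIVE_STRENGTH_SHIFT)

/* Assumed encoding: 3 selects 46 Ohm (0 is the 34 Ohm default). */
#define CR0_DRIVE_46_OHM 0x3u

/* Replace only the drive-strength field of a CR0 value; all other bits are kept. */
static inline uint16_t cr0_set_drive_strength(uint16_t cr0, uint16_t code)
{
    assert(code <= 0x7u);
    cr0 = (uint16_t)(cr0 & ~CR0_DRIVE_STRENGTH_MASK);
    return (uint16_t)(cr0 | (uint16_t)(code << CR0_DRIVE_STRENGTH_SHIFT));
}
```

Starting from 0x8F2F, `cr0_set_drive_strength(0x8F2F, CR0_DRIVE_46_OHM)` gives 0xBF2F, the same value as `0x8f2f | (3<<12)`. Masking first also makes the helper safe when the field is not zero to begin with.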

RAM is connected to OSPI2. Pins are configured with highest drive strength since I want to run the memory at 200MHz eventually. I'm now running it lower to make testing with a logic analyzer easier.

For now I clock the OSPI2 peripheral at 332 MHz (333 MHz is the maximum for the peripheral).
This allows me to use a prescaler of 2 to get 166 MHz or a prescaler of 4 to get 83 MHz. These are the frequencies I want to get working first (with delay block calibration). You need a prescaler of at least 2 (register value of 1) to enable DHQC, so I won't be able to run the OSPI at 200 MHz with DHQC.
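
The clock arithmetic can be written down as a quick sanity check. This is a minimal sketch with hypothetical helper names; it assumes the division factor ranges from 1 to 256 and that the register field stores the factor minus 1, as the "prescaler of 2 (register value of 1)" remark implies.

```c
#include <assert.h>
#include <stdint.h>

/* Output clock in kHz for a given OSPI kernel clock and division factor. */
static inline uint32_t ospi_output_khz(uint32_t kernel_khz, uint32_t divider)
{
    assert(divider >= 1u && divider <= 256u); /* assumed valid range */
    return kernel_khz / divider;
}

/* Register field value for a division factor (assumed: field = divider - 1). */
static inline uint32_t ospi_prescaler_field(uint32_t divider)
{
    assert(divider >= 1u && divider <= 256u);
    return divider - 1u;
}
```

With a 332 MHz kernel clock, `ospi_output_khz(332000, 2)` is 166000 and `ospi_output_khz(332000, 4)` is 83000, which are the two target frequencies.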

Here is my configuration:

This configuration works, but not if I calibrate the delay block or try to write to CR0 (these blocks are disabled with #if 0).

/* OCTOSPI2 init function */
void MX_OCTOSPI2_Init(void)
{

  /* USER CODE BEGIN OCTOSPI2_Init 0 */
	HAL_Delay(1); // 1-2ms delay after power on
	HAL_GPIO_WritePin(MCU_HYPERBUS_NRESET_GPIO_Port, MCU_HYPERBUS_NRESET_Pin, 1); //de-assert reset
	HAL_Delay(1); // 1-2ms delay after reset
  /* USER CODE END OCTOSPI2_Init 0 */

  OSPIM_CfgTypeDef sOspiManagerCfg = {0};
  OSPI_HyperbusCfgTypeDef sHyperBusCfg = {0};

  /* USER CODE BEGIN OCTOSPI2_Init 1 */

  // calibrate delay block:
#if 0
  //same config as below
  hospi2.Instance = OCTOSPI2;
  hospi2.Init.FifoThreshold = 1;
  hospi2.Init.DualQuad = HAL_OSPI_DUALQUAD_DISABLE;
  hospi2.Init.MemoryType = HAL_OSPI_MEMTYPE_HYPERBUS;
  hospi2.Init.DeviceSize = 23;
  hospi2.Init.ChipSelectHighTime = 1;
  hospi2.Init.FreeRunningClock = HAL_OSPI_FREERUNCLK_DISABLE;
  hospi2.Init.ClockMode = HAL_OSPI_CLOCK_MODE_0;
  hospi2.Init.WrapSize = HAL_OSPI_WRAP_NOT_SUPPORTED;
  hospi2.Init.ClockPrescaler = 4;
  hospi2.Init.SampleShifting = HAL_OSPI_SAMPLE_SHIFTING_NONE;
  hospi2.Init.DelayHoldQuarterCycle = HAL_OSPI_DHQC_ENABLE;
  hospi2.Init.ChipSelectBoundary = 0;
  hospi2.Init.DelayBlockBypass = HAL_OSPI_DELAY_BLOCK_USED;
  hospi2.Init.MaxTran = 0;
  hospi2.Init.Refresh = 83;

  // override clock setting:
  hospi2.Init.FreeRunningClock = HAL_OSPI_FREERUNCLK_ENABLE;
  hospi2.Init.Refresh = 0;

  if (HAL_OSPI_Init(&hospi2) != HAL_OK)
  {
	  Error_Handler() ;
  }

  if (DelayBlock_Enable(DLYB_OCTOSPI2) != HAL_OK)
  {
	  Error_Handler();
  }

  HAL_OSPI_DeInit(&hospi2);
#endif


  /* USER CODE END OCTOSPI2_Init 1 */
  hospi2.Instance = OCTOSPI2;
  hospi2.Init.FifoThreshold = 1;
  hospi2.Init.DualQuad = HAL_OSPI_DUALQUAD_DISABLE;
  hospi2.Init.MemoryType = HAL_OSPI_MEMTYPE_HYPERBUS;
  hospi2.Init.DeviceSize = 23;
  hospi2.Init.ChipSelectHighTime = 1;
  hospi2.Init.FreeRunningClock = HAL_OSPI_FREERUNCLK_DISABLE;
  hospi2.Init.ClockMode = HAL_OSPI_CLOCK_MODE_0;
  hospi2.Init.WrapSize = HAL_OSPI_WRAP_NOT_SUPPORTED;
  hospi2.Init.ClockPrescaler = 4;
  hospi2.Init.SampleShifting = HAL_OSPI_SAMPLE_SHIFTING_NONE;
  hospi2.Init.DelayHoldQuarterCycle = HAL_OSPI_DHQC_ENABLE;
  hospi2.Init.ChipSelectBoundary = 0;
  hospi2.Init.DelayBlockBypass = HAL_OSPI_DELAY_BLOCK_USED;
  hospi2.Init.MaxTran = 0;
  hospi2.Init.Refresh = 83;
  if (HAL_OSPI_Init(&hospi2) != HAL_OK)
  {
    Error_Handler();
  }
  sOspiManagerCfg.ClkPort = 2;
  sOspiManagerCfg.DQSPort = 2;
  sOspiManagerCfg.NCSPort = 2;
  sOspiManagerCfg.IOLowPort = HAL_OSPIM_IOPORT_2_LOW;
  sOspiManagerCfg.IOHighPort = HAL_OSPIM_IOPORT_2_HIGH;
  if (HAL_OSPIM_Config(&hospi2, &sOspiManagerCfg, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  sHyperBusCfg.RWRecoveryTime = 7;
  sHyperBusCfg.AccessTime = 7;
  sHyperBusCfg.WriteZeroLatency = HAL_OSPI_LATENCY_ON_WRITE;
  sHyperBusCfg.LatencyMode = HAL_OSPI_FIXED_LATENCY;
  if (HAL_OSPI_HyperbusCfg(&hospi2, &sHyperBusCfg, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN OCTOSPI2_Init 2 */

  OSPI_HyperbusCmdTypeDef sCommand = {0};
  OSPI_MemoryMappedTypeDef sMemMappedCfg = {0};
//
  volatile HAL_StatusTypeDef status;


  volatile uint16_t ID0=0;
  volatile uint16_t ID1=0;
  volatile uint16_t CR0=0;
  volatile uint16_t CR1=0;


  sCommand.AddressSpace = HAL_OSPI_REGISTER_ADDRESS_SPACE;
  sCommand.AddressSize  = HAL_OSPI_ADDRESS_32_BITS; // HAL_OSPI_ADDRESS_32_BITS
  sCommand.DQSMode      = HAL_OSPI_DQS_ENABLE;

#if 0
 ///write to CR0
  volatile uint16_t CR0_new = 0x8f2f | (3<<12);
  sCommand.Address      = 0x800*2; //CR0
  sCommand.NbData       = 2;
  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  status = HAL_OSPI_Transmit(&hospi2, (uint8_t*)&CR0_new, HAL_OSPI_TIMEOUT_DEFAULT_VALUE);
#endif

  // read registers
  sCommand.Address      = 0*2; //ID0
  sCommand.NbData       = 2;
  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  status = HAL_OSPI_Receive(&hospi2, (uint8_t*)&ID0, HAL_OSPI_TIMEOUT_DEFAULT_VALUE);

  sCommand.Address      = 1*2; //ID1
  sCommand.NbData       = 2;
  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  status = HAL_OSPI_Receive(&hospi2, (uint8_t*)&ID1, HAL_OSPI_TIMEOUT_DEFAULT_VALUE);


  sCommand.Address      = 0x800*2; //CR0
  sCommand.NbData       = 2;
  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  status = HAL_OSPI_Receive(&hospi2, (uint8_t*)&CR0, HAL_OSPI_TIMEOUT_DEFAULT_VALUE);

  sCommand.Address      = 0x801*2; //CR1
  sCommand.NbData       = 2;
  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }
  status = HAL_OSPI_Receive(&hospi2, (uint8_t*)&CR1, HAL_OSPI_TIMEOUT_DEFAULT_VALUE);



  sCommand.AddressSpace = HAL_OSPI_MEMORY_ADDRESS_SPACE;
  sCommand.AddressSize  = HAL_OSPI_ADDRESS_24_BITS;
  sCommand.DQSMode      = HAL_OSPI_DQS_ENABLE;
  sCommand.Address      = 0;
  sCommand.NbData       = 1;

  if (HAL_OSPI_HyperbusCmd(&hospi2, &sCommand, HAL_OSPI_TIMEOUT_DEFAULT_VALUE) != HAL_OK)
  {
    Error_Handler();
  }


  sMemMappedCfg.TimeOutActivation = HAL_OSPI_TIMEOUT_COUNTER_DISABLE;

  if (HAL_OSPI_MemoryMapped(&hospi2, &sMemMappedCfg) != HAL_OK)
  {
    Error_Handler();
  }


  //memory tests
  externalRamValid =  memoryTests16BitTestAll((void*)EXTERNAL_RAM_START_ADDRESS, EXTERNAL_RAM_SIZE_BYTES, hospi2.Init.DeviceSize-1, 9, &memoryTestResult);

  if (!externalRamValid)
  {
	  Error_Handler();
  }

  /* USER CODE END OCTOSPI2_Init 2 */

}

Kudo posts if you have the same problem and kudo replies if the solution works.
Click "Accept as Solution" if a reply solved your problem. If no solution was posted please answer with your own.
10 REPLIES 10
unsigned_char_array
Senior III

(I still haven't got the DLYB_OSPI_NOR_FastTuning/OSPI_PSRAM_MemoryMapped algorithm working, so I'll park that for now. If anyone knows how to get it working, let me know.)

Here is the summary of what I needed to do to get everything to work for the s27kl0642dpbhi020 HyperRAM with the STM32H735IG with a 1024x600 display:

  • According to DS13312 Rev 4, page 194, the max OSPI output clock frequency is 100 MHz.
  • The max OSPI peripheral clock frequency is 333 MHz (according to the STM32CubeMX clock config), and you need a divider of at least 2 to get delay hold quarter cycle (DHQC) working (RM0468 Rev 3, page 917), so I run the peripheral at 200 MHz with divider 2, which gives a 100 MHz output clock.
  • The HyperRAM is by default configured to have an output drive strength of 34 ohm, but it can be set to 46 ohm, this can be configured in CR0. I've set this to 46 ohm since my traces have an impedance of 50 ohm.
  • The HyperRAM is by default configured to have an Initial latency of 7 clock cycles, but below 104MHz it can be set to 4 clock cycles in CR0. I've set this lower to increase performance.
  • In order to write to a configuration register of the RAM you need to set sHyperBusCfg.WriteZeroLatency = HAL_OSPI_NO_LATENCY_ON_WRITE
  • In order to write to a configuration register of the RAM you need to set sCommand.DQSMode = HAL_OSPI_DQS_DISABLE
  • The register address from the HyperRAM datasheet has to be multiplied by 2, and NbData needs to be 2 (bytes).
  • To get higher performance you need to enable D-cache and also enable the MPU and enable cache for external RAM (you can find an example of this configuration in TouchGFX examples for stm32h735g-dk)
  • If you enable D-cache you need to clean it before the DMA2D accesses the framebuffer. You need to modify TouchGFXHAL::flushFrameBuffer to do this. Use the _by_Addr variants to clean only the framebuffer in use and not the complete cache. In my case either SCB_CleanDCache_by_Addr or SCB_InvalidateDCache_by_Addr works. I'm not sure which one is needed or whether I need SCB_CleanInvalidateDCache_by_Addr. I no longer see artifacts with this enabled.
  • You can read CR1 to check the refresh interval. A higher value means you don't have to refresh as often, which can slightly increase performance.
  • I tuned the LTDC porches. I reduced horizontal back and front porches and reduced vertical back porch and increased vertical front porch. This way lockDMAToFrontPorch can be used and it will have more time before the LTDC starts accessing the memory. I'm not using lockDMAToFrontPorch at the moment.
  • I calibrate the delay block, based on the example DLYB_OSPI_PSRAM_ExhaustiveTuning. This simply reinitializes the delay block with each unit value and runs memory tests to see if it works. It then applies the middle value between the two extremes that work. I realize this may not work if you need more than 1 delay block (sel > 1), as the delay is not proportional to unit or to the product of unit and sel. I did not get other calibration methods working (I get values that are way too high). I checked the typical delay values in the datasheet to verify that the settings make sense: DS13312 Rev 4, page 196.
  • If you are using Ethernet do not set all of the SRAM to shared. This will slow down the performance of the memory and lead to slow rendering. Ethernet DMA requires its descriptors to be in a part of SRAM that is marked as shared by the MPU.
  • RM0468 Rev 3, page 943 says a refresh for reads occurs every refresh+4 cycles, so I set the refresh value to 100-4 or 400-4 for a 1 us or 4 us refresh period, respectively.
  • Forcibly reset the OSPI peripheral in OCTOSPI2_MspInit using __HAL_RCC_OSPI2_FORCE_RESET() and __HAL_RCC_OSPI2_RELEASE_RESET(). Otherwise register values will not be correct after deinit + re-init.
  • Do memory performance tests. I got about 101MByte/s write speed and 158MByte/s read speed and 47MByte/s for read-modify-write. With incorrect configurations you can get less than half of that. It might still work, but you get less performance. You will never get 100% of the theoretical bandwidth of 200MByte/s because of various overheads.
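
Two of the points above reduce to small calculations: the refresh value (refresh period in clock cycles minus 4) and picking the middle of the working delay-unit range. The following is a minimal host-side sketch in C, not the code from the ST example. The function names `ospi_refresh_value` and `pick_delay_unit` are hypothetical, and the pass/fail array is assumed to come from running the memory tests once per unit value. Instead of the lowest and highest working values, it takes the middle of the longest contiguous passing run, which is the same thing when the working values form a single window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Refresh field for a refresh period, given that a refresh occurs every
 * (value + 4) clock cycles. clk_mhz is the OSPI output clock in MHz. */
static inline uint32_t ospi_refresh_value(uint32_t clk_mhz, uint32_t period_us)
{
    uint32_t cycles = clk_mhz * period_us; /* MHz * us = clock cycles */
    assert(cycles > 4u);
    return cycles - 4u;
}

/* pass[i] is true if the memory tests succeeded with delay unit value i.
 * Returns the middle index of the longest contiguous passing run,
 * or -1 if no value passed. */
static inline int pick_delay_unit(const bool *pass, int count)
{
    int best_start = -1;
    int best_len = 0;
    int run_start = -1;

    /* Iterate one step past the end so a run that reaches the last
     * element is closed like any other run. */
    for (int i = 0; i <= count; i++) {
        bool ok = (i < count) && pass[i];
        if (ok) {
            if (run_start < 0) {
                run_start = i;
            }
        } else if (run_start >= 0) {
            int len = i - run_start;
            if (len > best_len) {
                best_len = len;
                best_start = run_start;
            }
            run_start = -1;
        }
    }

    if (best_len == 0) {
        return -1;
    }
    return best_start + (best_len - 1) / 2;
}
```

With a 100 MHz output clock, `ospi_refresh_value(100, 1)` gives 96 and `ospi_refresh_value(100, 4)` gives 396, matching the 100-4 and 400-4 values above. Using the longest run rather than the outer extremes avoids landing on a failing value when there is a gap inside the working range.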

 
