
Can I put read/write variables in DTCM, when DCache enabled?

leon64
Associate II

With DCache enabled, how can I update a variable in DTCM (I think via DMA + MDMA) and, after processing, get the results to a peripheral such as the DAC (again via MDMA)?

Do I need to clear cache?

Will I still have the speed advantage of caching?

I am struggling to get data into and out of DTCM when DCache is enabled.

For data going into DTCM, my experience is that the data is not updated, and for data coming out of DTCM there is no updated data to export (because of caching?).

I get data from my ADC via HAL_ADC_Start_DMA and have to call HAL_MDMA_Start_IT on every completed conversion. That seems to work. Is it the best way to go about it?

When I want to get data out of DTCM I use SCB_CleanInvalidateDCache_by_Addr and then update the DAC buffer. That seems to work but I do not trust the validity of the data.

Can anyone point me in the right direction?

17 REPLIES
FBL
ST Employee

Hello @leon64,

Could you please specify which STM32 product you are using?

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

leon64
Associate II

Thank you,

I am working with a Nucleo-144 board, STM32H723ZG.

FBL
ST Employee

Hello again @leon64​,

Thank you for your questions and your contribution to ST community.

You need to flush the cache after the data is updated by DMA and before the variable is read by MDMA, as explained in AN4839.

You may refer to PM0253 to configure the DTCM registers, explained in section 4.9.1, or configure an MPU region covering DTCM with r/w permissions; see section 4.6.8, Updating an MPU region.
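
Cache maintenance by address works on whole cache lines (32 bytes on the Cortex-M7), so the range passed to the maintenance call is usually rounded out to line boundaries. The following is a minimal host-side sketch of that rounding; `cache_range_t` and `cache_align_range` are hypothetical helper names, not part of CMSIS or the HAL.

```c
#include <stdint.h>
#include <stddef.h>

/* Cortex-M7 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper type: an address range rounded out to whole cache lines. */
typedef struct {
    uintptr_t start;   /* rounded down to a line boundary */
    size_t    length;  /* multiple of DCACHE_LINE_SIZE */
} cache_range_t;

/* Round [addr, addr + len) out to the cache lines that contain it. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    cache_range_t r;
    uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    uintptr_t end  = (addr + len + mask) & ~mask;  /* round end up   */

    r.start  = addr & ~mask;                       /* round start down */
    r.length = (size_t)(end - r.start);
    return r;
}
```

On the target, the resulting range would typically be handed to a CMSIS call such as SCB_InvalidateDCache_by_Addr (after DMA has written a cacheable buffer, before the CPU reads it) or SCB_CleanDCache_by_Addr (after the CPU has written a buffer, before DMA reads it); the exact pointer type of the first argument depends on the CMSIS version. Because invalidation discards whole lines, it is safer to align DMA buffers to 32 bytes and size them in multiples of 32 bytes, so that no unrelated variable shares a line with the buffer.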


leon64
Associate II

Thank you @F.Belaid

As mentioned in another post, I studied the ST examples, in particular the MDMA_LinkedList example. As I understand it, it defines the complete DTCM region as r/w. I also read that I can define only 1 MPU region on the STM32H723. I have a difficult time understanding the concept of the background region, and I would like to configure part of DTCM as r/w and another part as r/o. Can you tell me how to configure this, say, to split DTCM into 2 parts? My code outline is below.

// copy code from flash to ITCM
 
// copy data from flash to DTCM
 
// copy interrupt vector table
 
MPU_Config();
 
// enable ICache
 
// enable DCache
..........
static void MPU_Config(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct;
 
  /* Disable the MPU */
  HAL_MPU_Disable();
 
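  /* Region 0: DTCM at 0x20000000, 128 KB, strongly ordered (TEX=0, C=0, B=0),
     full access, no execute. SubRegionDisable = 0x87 disables sub-regions
     0-2 and 7 (16 KB each), so these attributes apply only to
     0x2000C000-0x2001BFFF. */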
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.BaseAddress = 0x20000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_128KB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.SubRegionDisable = 0x87;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
 
  HAL_MPU_ConfigRegion(&MPU_InitStruct);
 
  /* Enable the MPU */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}

Then, each time, to move data from the ADC buffer (circular DMA into the D2 domain) to DTCM:

// adc_values in D2 for DMA from ADC
// dtcm_data in D1 DTCM
 
void MDMA_TransferCompleteCallback(MDMA_HandleTypeDef *hmdma)
{
  /* Restart the MDMA copy adc_values -> dtcm_data: one block of 8 bytes */
  HAL_MDMA_Start_IT(&hmdma_mdma_channel40_dma1_stream0_tc_0, (uint32_t)adc_values, (uint32_t)dtcm_data, 8, 1);
}

And to get data from DTCM to the DAC, where dac_values is sent to the DAC by DMA (circular buffer):

........
SCB_CleanInvalidateDCache_by_Addr((uint32_t)(&dac_values[dacIndex]), 8);
 
.....
 
dac_values[dacIndex] = NewValue;

Is this about right?
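
One point worth checking in the snippet above is the order of the store and the cache maintenance. The following toy model is only a sketch: it assumes dac_values sits in a write-back cacheable region, and `toy_line_t`, `cpu_write`, `cache_clean` and `dma_read` are hypothetical names that imitate a single cache line on a host machine, not CMSIS or HAL code. It shows what a DMA controller would read from memory in each order.

```c
#include <stdint.h>

/* Toy model of one write-back cache line holding a single 16-bit sample. */
typedef struct {
    uint16_t memory;  /* what a DMA controller reads from SRAM */
    uint16_t cached;  /* what the CPU sees through the D-cache */
    int      dirty;   /* 1 while the cached copy is newer than memory */
} toy_line_t;

/* CPU store: with write-back caching it lands in the cache only. */
static void cpu_write(toy_line_t *l, uint16_t v)
{
    l->cached = v;
    l->dirty  = 1;
}

/* Clean: copy a dirty line back to memory. */
static void cache_clean(toy_line_t *l)
{
    if (l->dirty) {
        l->memory = l->cached;
        l->dirty  = 0;
    }
}

/* DMA read: bypasses the cache and sees memory only. */
static uint16_t dma_read(const toy_line_t *l)
{
    return l->memory;
}

/* Maintenance first, store second (the order in the snippet above). */
static uint16_t clean_then_write(uint16_t old_value, uint16_t new_value)
{
    toy_line_t l = { old_value, old_value, 0 };
    cache_clean(&l);
    cpu_write(&l, new_value);
    return dma_read(&l);  /* memory still holds old_value */
}

/* Store first, clean second. */
static uint16_t write_then_clean(uint16_t old_value, uint16_t new_value)
{
    toy_line_t l = { old_value, old_value, 0 };
    cpu_write(&l, new_value);
    cache_clean(&l);
    return dma_read(&l);  /* memory now holds new_value */
}
```

In this model, only the store-then-clean order guarantees that the DMA side sees the new value; with the other order the update reaches memory only when the line happens to be evicted. If the buffer is placed in a region that the MPU marks as non-cacheable, no maintenance call is needed for it at all.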

leon64
Associate II

Hello @F.Belaid,

Please be patient with me because I really do not understand things completely. I came up with this code; it seems to work, but I do not think it is the best way to go and I am also not sure it works correctly. So here are some more questions.

If I am able to split DTCM (by configuring the MPU) into a part that is cached and one that is not, I feel I would NOT have to clear the cache.

When my code is running in ITCM (D1 domain) I am able to directly access data in DTCM, D1 and I think D2 (though this will cost extra clock cycles). Is this correct?

If the above is true, why would I then need MDMA to copy data?

And if my ADC runs at, say, 500 kHz and the STM32H723 at 500 MHz, I should have about 1000 clock cycles per sample for calculations, correct? In practice it looks like I can only do a couple of hundred multiply/add instructions.
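
The cycle budget in the previous paragraph can be written out as a small sketch. The function names and the per-sample overhead figure are hypothetical; the overhead stands for interrupt entry/exit, callbacks and memory wait states, whatever they turn out to be in a given build.

```c
#include <stdint.h>

/* CPU cycles available between two consecutive ADC samples. */
static uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_hz)
{
    return cpu_hz / sample_hz;
}

/* Cycles left for calculations after a fixed per-sample overhead
   (hypothetical figure). Negative means the budget is already exceeded. */
static int32_t cycles_left(uint32_t cpu_hz, uint32_t sample_hz,
                           uint32_t overhead_cycles)
{
    return (int32_t)cycles_per_sample(cpu_hz, sample_hz) - (int32_t)overhead_cycles;
}
```

With the numbers from the post, cycles_per_sample(500000000, 500000) gives 1000 cycles. How much of that is left depends on the real overhead, which can be measured on the target, for example with the Cortex-M7 DWT cycle counter (DWT->CYCCNT in CMSIS).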

Truly, I am just wondering whether I want too much or am just not using the correct approach.

Please help me get at least the MPU configuration correct.
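
As a starting point for checking the MPU configuration posted earlier, the SubRegionDisable value can be decoded on a host machine. This is a minimal sketch using the values from that listing (base 0x20000000, size 128 KB, SubRegionDisable 0x87); the helper names are hypothetical and not part of the HAL.

```c
#include <stdint.h>

/* An ARMv7-M MPU region of 256 bytes or more is split into 8 equal
   sub-regions; bit n of the SRD field set to 1 disables sub-region n. */
#define MPU_SUBREGION_COUNT 8u

/* Returns 1 if sub-region n (0..7) is enabled for the given SRD mask. */
static int subregion_enabled(uint8_t srd, unsigned n)
{
    return ((srd >> n) & 1u) == 0u;
}

/* First address of sub-region n. */
static uint32_t subregion_base(uint32_t region_base, uint32_t region_size,
                               unsigned n)
{
    return region_base + n * (region_size / MPU_SUBREGION_COUNT);
}

/* Last address of sub-region n. */
static uint32_t subregion_last(uint32_t region_base, uint32_t region_size,
                               unsigned n)
{
    return subregion_base(region_base, region_size, n)
           + region_size / MPU_SUBREGION_COUNT - 1u;
}
```

For 0x87 (binary 1000 0111), sub-regions 0, 1, 2 and 7 are disabled, so the region attributes apply only to sub-regions 3 to 6, that is 0x2000C000 to 0x2001BFFF. Accesses to the disabled sub-regions are not governed by this region; with HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT), privileged code falls back to the default memory map there unless another region covers them.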

leon64
Associate II

@F.Belaid

I hope I did not offend you in any way. I have been at this for almost a year, trying to get things right. The code I put up in a previous post is the best of what I have tried, and I would very much like to have the questions above sorted out.

Anyway, thanks for your replies.

Does it even cache TCM? Or should it?

The point of TCM is that it is close enough to the core that it is as fast as possible.

Use the cache for slower memories on AHB buses, or with higher latency, or wider lines.

DMA doesn't go through the cache either, only the CPU does, so you just create coherency issues.

leon64
Associate II

@Community member​ 

Hi, thank you for getting involved!

On this chip, the STM32H723, one can enable caching of ITCM and DTCM. It is not a must, but it gives you a tremendous performance boost! So yes, I do want caching enabled, and if I could split DTCM into a part that is cached and one that is not, I would have the best performance I can imagine.

FBL
ST Employee

Hi @leon64​,

Sorry for my late response. I suggest application note AN4891, which may help you. May I ask how you measure the performance? As mentioned in the application note, the best-performing configuration is indeed D1_ITCM - D1_DTCM. However, the interconnect has higher latency, which may negatively impact performance. To mitigate this impact, it may be necessary to optimize the path and choose a different memory configuration, like the one mentioned in section 4.1.1, Effects of data and instructions locations on performance.
