Cortex peripheral register access slow?

lc
Associate II
Posted on November 02, 2011 at 12:27

Hello,

I am new to Cortex and am upgrading an 8051 project to Cortex for improved performance. One of the things I'm testing now is SPI: not via DMA, but simply writing to SPI->DR, waiting until the byte is out, and then sending the next byte. It works but is very slow. On the oscilloscope I see that writing to the SPI DR takes about 380ns. The CPU is an F100RB at 24MHz, with the peripheral clocks at max (I hope). This is very slow, knowing that clocking out the byte itself only takes about 630ns.

Checking that the SPI is not busy and then writing the next byte to SPI DR takes 900ns!!! So far this Cortex is much slower than my 8051, which is very frustrating. Is it not possible to access the peripheral registers in a faster way? I know there are faster techniques (like using DMA), but many projects have a simple function like spi_write(), and in that case it's so slow. This topic is not about SPI in particular, but about why peripheral access is so slow and whether there are tricks to make it fast. Cortex being fast is a hoax so far... hopefully I'm doing something wrong here...

5 REPLIES
Posted on November 02, 2011 at 13:46

The rapidity of the SPI output depends on the shift rate. If the writing overhead is too high, try 16-bit transfers.

The writes to the APB, as I recall, should be 4 cycles, and will depend on how you have the dividers set up. You should be able to set everything to DIV1 at 24 MHz.

Now, you haven't provided any code, so it's hard to know exactly what/how you're measuring this. I would also anticipate that reads/writes to the GPIOs will take 4 bus cycles.

Also, you don't specify what 8051 we're talking about. The original Intel implementations were pretty slow, taking 12 clock cycles for each machine cycle (i.e. about 1us).

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
lc
Associate II
Posted on November 02, 2011 at 14:31

Hi Clive,

You wrote: "The rapidity of the SPI output depends on the shift rate." Of course, but that's not the problem. The problem is not the time to shift out, but the time to simply write to the SPI data register.

Writing to the GPIO (via BSRR for example) is very fast, about 100ns. That seems to take only 2-3 cycles, probably because of bit-band. But writing to the SPI data register is very slow.

The 8051 was from SiLabs and executes 70% of instructions in 1 cycle. It takes more cycles for accessing RAM beyond the first 256 bytes (I think 4 cycles). In that respect Cortex is faster, but I didn't expect it to be so slow for accessing the peripheral registers.

The Code:

1 SPI_GPIO->BRR = SPI_CS; // CS low

2 SPI1->DR = data; // this takes a long 380ns

3 while (!(SPI1->SR & 0x0002)); // wait until TXE (transmit empty)

4 while (SPI1->SR & 0x0080); // wait until not busy

5 SPI1->DR = data; // start next byte (just for testing)

6 while (!(SPI1->SR & 0x0002)); // wait until TXE (transmit empty)

7 while (SPI1->SR & 0x0080); // wait until not busy

8 SPI_GPIO->BSRR = SPI_CS; // CS high

The time for line 2 can be checked on the scope, trigger on falling CS and see when the clock begins. The time for executing line 2 alone is 380ns. This equals about 9 clock cycles.

Line 3+4+5 together take 630ns, time to clock out data of course NOT included. Time to clock out data depends on spi clock freq.

In this example I could do a 16-bit transfer, but that's not the point. This is a TEST to check why things are so slow.

To make a long story short, why does line 2 take 380ns? Is something configured wrong, or is this normal with Cortex?

Posted on November 02, 2011 at 15:27

The peripherals are two buses away from the core, and some of the propagation is masked by the write buffer. I'm not aware of any peripheral register that is special cased, or otherwise delayed, compared to any other interaction on the same bus.

You'd need to provide more details of your test setup (code, configuration, clocks, etc.) and test methodology (what/how). Are you, for instance, actually measuring the latency of writing the DR to the point where the data is emitted? The DR is moved to the shifter, which immediately flags it as empty, and it's going to be ~8/16 SPI cycles later before the next value in DR appears: a latency of ~666 ns, assuming 8-bit SPI @ 12 MHz.
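Because DR is flagged empty as soon as its contents move to the shifter, a multi-byte write can refill DR while the previous byte is still shifting and only wait on the busy flag once at the end. The following is a minimal sketch of that idea, not code from this thread. The `spi_regs_t` struct is a hypothetical two-register stand-in (it does not match the real STM32 register layout), and the flag values 0x0002 (TXE) and 0x0080 (BSY) are taken from the code posted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal register view for illustration only.
 * On target, use the vendor header's SPI1 definition instead. */
typedef struct {
    volatile uint16_t SR;   /* status register */
    volatile uint16_t DR;   /* data register */
} spi_regs_t;

#define SPI_FLAG_TXE 0x0002u   /* transmit buffer empty */
#define SPI_FLAG_BSY 0x0080u   /* shifter busy */

/* Write a buffer, refilling DR as soon as it is empty so the shifter
 * does not sit idle between bytes. BSY is polled only once, after the
 * last byte, e.g. before raising chip select. */
void spi_write_buf(spi_regs_t *spi, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        while (!(spi->SR & SPI_FLAG_TXE)) { /* wait for room in DR */ }
        spi->DR = buf[i];
    }
    while (!(spi->SR & SPI_FLAG_TXE)) { /* last byte moved to shifter */ }
    while (spi->SR & SPI_FLAG_BSY) { /* last byte fully shifted out */ }
}
```

Whether this hides the register-write overhead entirely depends on the SPI clock and on the code the compiler generates for the loop, so it is worth checking on the scope.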

If I were measuring the write speed of DR, I'd probably do so by snapshotting the value of the core's cycle counter on either side of the write to DR. I'd also do it in assembler, because I've got no idea what compiler you're using, or what kind of code it's generating for "SPI->DR = foo;"
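A rough C sketch of the cycle-counter approach is shown below, assuming the part implements the Cortex-M3 DWT cycle counter (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004). The function names are made up, and the counter is passed in as a pointer purely so the routine can also be exercised off-target. The result includes the cost of the two counter reads, so a back-to-back baseline measurement should be subtracted, and the write buffer may hide part of the bus time.

```c
#include <stdint.h>

/* Cortex-M3 core debug register addresses. */
#define DEMCR_ADDR      0xE000EDFCu
#define DWT_CTRL_ADDR   0xE0001000u
#define DWT_CYCCNT_ADDR 0xE0001004u

#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Enable the cycle counter. Call once on the target before measuring. */
void cyccnt_enable(void)
{
    REG32(DEMCR_ADDR)     |= (1u << 24);  /* TRCENA: enable the DWT block */
    REG32(DWT_CYCCNT_ADDR) = 0u;          /* reset the counter */
    REG32(DWT_CTRL_ADDR)  |= 1u;          /* CYCCNTENA: start counting */
}

/* Snapshot the counter on either side of a single 16-bit register write.
 * Unsigned subtraction gives the right answer across a counter wrap. */
uint32_t cycles_for_write(volatile uint32_t *cyccnt,
                          volatile uint16_t *reg, uint16_t value)
{
    uint32_t before = *cyccnt;
    *reg = value;
    uint32_t after = *cyccnt;
    return after - before;
}
```

On target, the call would look something like cycles_for_write(&REG32(DWT_CYCCNT_ADDR), &spi_dr, data), where spi_dr stands for the SPI data register. Inspecting the generated assembly is still advisable, since the compiler decides how the reads and the write are scheduled.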

A 25 MHz SiLabs implementation is likely to totally spank a 24 MHz M3 in terms of accessing peripherals. The SiLabs part runs 20-25 times faster than the original 8051 designs, and the peripherals are tightly coupled. Where the M3 will spank the SiLabs part is 32-bit math/registers working on single cycle 32-bit internal SRAM, and more efficient addressing, and linear memory.

lc
Associate II
Posted on November 02, 2011 at 15:57

Thanks for your reply!

For some reason there is a button "Show Quoted Messages", did you click on that?

I think I explained what I measured and how I measured it, don't know what I could explain more. I did not include configuration code. I can do that later today.

You asked: "Are you for instance actually measuring the latency of writing the DR, to the point where the data is emitted?"

Yes, but to the point where it STARTS to emit data (start of first SPI clock). Like I said before, of course I did NOT include the time for shifting out data.

Checking the program counter would be a good way, but I have no idea how to do that.

Your info about SiLabs explains what I actually see. It was not clear to me when reading all the Cortex specs and advertisement that accessing peripherals would be so slow.

However, we still have a gap of 5 cycles, you say it should be 4 cycles, while I measure ~9.

I will check the clock setup once more, maybe a third person can confirm whether it should be 4 cycles indeed?

PS. I use the Keil compiler.

Thanks!
Posted on November 02, 2011 at 16:11

Sorry, didn't see the code. The "Show Quoted" gets in the way sometimes depending on how the message was created, or if it thinks there is some separation. It's the stupid forum software.

I'd have to look at what code is getting generated.
