2022-10-12 09:36 AM
I've implemented a basic performance test to assess SDRAM speed and came across a weird issue. The test is trivial: a sequential write followed by a sequential read of a 16 MB block. With writes I get a solid 1-1.2 SDRAM clock cycles per operation. A read, however, takes 6 to 8(!) cycles, which is twice the CAS+1 that I would expect without any read pipelining. Switching RBURST (read FIFO) on and off has little effect. It definitely does something, because incrementing and decrementing reads are affected differently: the former are almost unaffected, but the latter get worse by ~10% with RBURST on. This "something" is far from what one would expect from a feature that is supposed to anticipate and pipeline reads. The manual says that the FMC should do this even with single AXI requests. This is definitely not what's happening here, or at least it's not happening properly.
Enabling DCache and remapping SDRAM to a cacheable location helps but doesn't solve the problem completely. I still get 2-2.3 SDRAM cycles per operation. This number can't even be explained by extra CAS+1 latency between consecutive linefills, row activation latency and extra CPU cycles for non-load instructions.
I've already spent days playing around with timings and FMC settings and invoking all the arcane magic I know, to no avail.
All measurements were performed by sampling CYCCNT right before and after the read and write loops. The loops themselves are trivial, consisting of only 3 instructions each. All operations are done with 32-bit words, and the SDRAM chip is 32-bit as well. I've also validated the method by running the same code on AXI SRAM, which yielded the expected result: the number of CPU cycles per iteration was exactly the same as the number of instructions, suggesting that the loads/stores themselves were done in 1 CPU cycle.
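For reference, here is a rough sketch of how two CYCCNT samples can be turned into the "SDRAM cycles per operation" figures quoted above. It is not the exact code used in the test. The function name and the cpu_hz / sdram_hz parameters are made up for illustration, and the actual clock frequencies are not stated in this post, so they have to be supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a pair of DWT CYCCNT samples into
 * SDRAM clock cycles per 32-bit access.
 * cpu_hz and sdram_hz are assumptions passed in by the caller. */
static double sdram_cycles_per_op(uint32_t cyc_start, uint32_t cyc_end,
                                  uint32_t n_ops,
                                  uint32_t cpu_hz, uint32_t sdram_hz)
{
    assert(n_ops > 0 && cpu_hz > 0);

    /* CYCCNT is a free-running 32-bit counter; unsigned subtraction
     * gives the right delta as long as it wrapped at most once. */
    uint32_t cpu_cycles = cyc_end - cyc_start;

    /* CPU cycles spent per loop iteration (one load or store each). */
    double cpu_per_op = (double)cpu_cycles / (double)n_ops;

    /* Rescale from CPU clock ticks to SDRAM clock ticks. */
    return cpu_per_op * (double)sdram_hz / (double)cpu_hz;
}
```

On the target, cyc_start and cyc_end would be the CYCCNT values read immediately before and after the loop, and n_ops the number of 32-bit words transferred. Note that the result includes the CPU cycles spent on the compare and branch, so it is an upper bound on the cost of the memory access itself.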
Write loop assembly (produced by GCC):
800cd84: f844 3b04 str.w r3, [r4], #4
800cd88: f1b4 4fc2 cmp.w r4, #1627389952 ; 0x61000000
800cd8c: d1fa bne.n 800cd84
Read loop assembly:
800cc18: f850 3b04 ldr.w r3, [r0], #4
800cc1c: 4281 cmp r1, r0
800cc1e: d8fb bhi.n 800cc18
System configuration: