Performance Penalty - On accessing an Array in a structure vs a simple array

hariprasad · ‎2015-10-12

Posted on October 12, 2015 at 18:59

Hi

I want to know whether there is a penalty for accessing an array inside a structure compared with accessing a simple array. Here is the snippet I was using when I ran into the problem:

unsigned char array_1[10240];

typedef struct
{
    int header;
    unsigned char array_2[10240];
} simple_queue;

typedef struct
{
    int top;
    simple_queue simple_arr[3];
} simple_stack;

simple_stack struct_var1;

.... some file open ....

f_write(&fp, &struct_var1.simple_arr[0].array_2[0], 10240);  /* element index was omitted in the post; 0 assumed */
f_write(&fp, &array_1[0], 10240);

....

This file operation works correctly, but the performance differs between the two calls. In my benchmark test the first f_write took 50 ms to complete, while the second f_write took only 2 ms. The only difference between the two calls is that one uses an array in a structure and the other uses a simple array. Please explain this behavior. #memory-organization #stm32f429

waclawek.jan · ‎2015-10-12

Posted on October 12, 2015 at 19:29

> So all that differs while comparing the two f_write are , one uses array in a structure and other uses a simple array..

... and the state of the file metadata (e.g. FAT table), and the state of the medium where you are writing (e.g. cache in a SD card full from previous write)...

The two arrays may be aligned differently in the MCU memory, but it's unlikely that would make such a difference.

JW

jpeacock · ‎2015-10-12

Posted on October 12, 2015 at 19:37

You are measuring the sum of two variables in your benchmark, array access and file write time. Use a generated test pattern instead of file I/O to reduce your benchmark to a single variable.

Your time difference could easily be from the file I/O. If it's flash you may be seeing a sector erase taking place.

Jack Peacock
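A minimal sketch of the single-variable benchmark suggested above: a generated test pattern is written to and read back from each buffer, with no file I/O involved. The struct layout is taken from the original post; `REPEATS`, `fill_pattern`, `checksum` and `time_buffer` are hypothetical names, and `clock()` is only a stand-in for whatever timer the target provides (on a bare-metal build it may not be implemented at all).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define BUF_SIZE 10240u
#define REPEATS  1000u   /* hypothetical repeat count, chosen to get a measurable time */

/* Same layout as in the original post. */
typedef struct {
    int header;
    unsigned char array_2[BUF_SIZE];
} simple_queue;

typedef struct {
    int top;
    simple_queue simple_arr[3];
} simple_stack;

static unsigned char array_1[BUF_SIZE];
static simple_stack struct_var1;

/* Fill a buffer with a generated test pattern (no file I/O). */
static void fill_pattern(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(i & 0xFFu);
    }
}

/* Read every byte through a volatile pointer so the reads are not optimized away. */
static unsigned long checksum(const volatile unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Time REPEATS fill+checksum passes over one buffer, in clock() ticks. */
static clock_t time_buffer(unsigned char *buf, size_t len, unsigned long *sum_out)
{
    assert(buf != NULL && sum_out != NULL);

    clock_t start = clock();
    unsigned long sum = 0;
    for (unsigned r = 0; r < REPEATS; r++) {
        fill_pattern(buf, len);
        sum += checksum(buf, len);
    }
    *sum_out = sum;
    return clock() - start;
}
```

Calling `time_buffer(array_1, BUF_SIZE, &sum)` and `time_buffer(struct_var1.simple_arr[0].array_2, BUF_SIZE, &sum)` and comparing the two results shows whether the array's location matters on its own. If the two times match, the difference seen with f_write probably comes from the file system or the storage medium.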

hariprasad · ‎2015-10-12

Posted on October 13, 2015 at 08:55

I did some debugging today with the f_write removed, using a simple memcpy() operation instead:

unsigned char array_1[10240];
unsigned char array_2[10240];

typedef struct
{
    unsigned char array_2[10240];
} simple_queue;

typedef struct
{
    simple_queue simple_arr[3];
} simple_stack;

simple_stack struct_var1;

....

memcpy(array_2, array_1, 10240);
memcpy(&struct_var1.simple_arr[0].array_2[0], array_1, 10240);  /* element index was omitted in the post; 0 assumed */

....

The first memcpy() took 90 µs, while the second memcpy() took 310 µs.

When I changed unsigned char to int, each memcpy() took 40 µs. So I assume this is related to compiler optimization. Can someone please explain how compiler optimization can lead to this behaviour?

waclawek.jan · ‎2015-10-13

Posted on October 13, 2015 at 09:04

Is this a Cortex-M0/M0+ device? Those don't allow unaligned accesses at all, and the compiler might be aware of the alignment of the source/target.

Also, are both arrays in the same memory?

JW
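Both questions, alignment and memory placement, can be checked directly from the addresses. The following is a small sketch using the struct layout from the first post; `misalignment` and `array_2_offset` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the first post in this thread. */
typedef struct {
    int header;
    unsigned char array_2[10240];
} simple_queue;

typedef struct {
    int top;
    simple_queue simple_arr[3];
} simple_stack;

/* How many bytes p lies past the previous multiple of `boundary`
 * (0 means p is aligned to that boundary). */
static size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary != 0);
    return (size_t)((uintptr_t)p % boundary);
}

/* Byte offset of simple_arr[index].array_2 from the start of a simple_stack. */
static size_t array_2_offset(size_t index)
{
    return offsetof(simple_stack, simple_arr)
         + index * sizeof(simple_queue)
         + offsetof(simple_queue, array_2);
}
```

Printing `(void *)array_1` and `(void *)struct_var1.simple_arr[0].array_2` together with `misalignment(ptr, 4)` for each shows the alignment of both buffers, and the addresses themselves show which memory region the linker placed each one in.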

stm322399 · ‎2015-10-13

Posted on October 13, 2015 at 09:31

You got it! This is something that happens with memcpy. As f_write also uses memcpy, you can observe the same penalties.

The memcpy routine is provided by the C library and, depending on your library provider, it might use optimisations based on the source and target alignment. In other words, when possible, bytes are moved 4 or even 8 at a time for most loads and stores. Otherwise bytes are moved one by one. Of course the two cases end up with very different performance.

How is it connected to your data? Alignment. Try the following to understand how it works:

memcpy(dst, src + 0, count);
memcpy(dst, src + 1, count);
memcpy(dst, src + 2, count);
memcpy(dst, src + 3, count);
memcpy(dst, src + 4, count);

Normally there should be a difference.

If the +0 and +4 cases show a significant difference, continue the test up to +8.
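The experiment above can be wrapped in a small harness, sketched here under a few assumptions: `clock()` stands in for the target's timer (on an MCU a hardware timer or cycle counter would be the usual substitute), and `REPEATS`, `time_copy` and `run_offsets` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define COUNT    10240u
#define MAX_OFF  8u
#define REPEATS  1000u   /* hypothetical; raise it if the times read as 0 */

static unsigned char src[COUNT + MAX_OFF];
static unsigned char dst[COUNT];
static volatile unsigned char sink;  /* keeps the copies observable */

/* Time REPEATS copies of COUNT bytes starting at src + offset. */
static clock_t time_copy(size_t offset)
{
    assert(offset <= MAX_OFF);

    clock_t start = clock();
    for (unsigned r = 0; r < REPEATS; r++) {
        memcpy(dst, src + offset, COUNT);
        sink = dst[COUNT - 1];  /* use the result so the copy is not dropped */
    }
    return clock() - start;
}

/* Run the src+0 .. src+8 experiment and store one timing per offset. */
static void run_offsets(clock_t ticks[MAX_OFF + 1])
{
    /* Recognizable pattern, so the copies can also be checked for correctness. */
    for (size_t i = 0; i < sizeof src; i++) {
        src[i] = (unsigned char)i;
    }
    for (size_t off = 0; off <= MAX_OFF; off++) {
        ticks[off] = time_copy(off);
    }
}
```

After `run_offsets(ticks)`, comparing `ticks[0]`, `ticks[1]` and `ticks[4]` shows whether the library's memcpy is sensitive to 4-byte alignment, and `ticks[8]` covers the 8-byte case.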

waclawek.jan · ‎2015-10-13

Posted on October 13, 2015 at 19:46

Laurent,

> you got it !

Memory access alone would explain a difference of (310-40) µs, not a difference of (50-2) ms.

JW

stm322399 · ‎2015-10-13

Posted on October 13, 2015 at 20:17

Oh, you are certainly right. Who knows what happens between f_write and memcpy that would explain a (50-2) ms difference!