Performance penalty when accessing an array in a structure vs a simple array

hariprasad
Associate III
Posted on October 12, 2015 at 18:59

Hi

I want to know whether there is a penalty for accessing an array inside a structure compared with accessing a simple array. Here is the snippet I was using when I ran into the problem:

unsigned char array_1[10240];

typedef struct
{
    int header;
    unsigned char array_2[10240];
} simple_queue;

typedef struct
{
    int top;
    simple_queue simple_arr[3];
} simple_stack;

simple_stack struct_var1;

....some file open ....
/* The element index was missing in the original snippet; 0 is assumed here. */
f_write(&fp, &struct_var1.simple_arr[0].array_2[0], 10240);
f_write(&fp, &array_1[0], 10240);
....

This file operation works correctly, but the two calls perform very differently. In my benchmark, the first f_write took 50 ms to complete, while the second f_write took only 2 ms. The only difference between the two calls is that one uses an array inside a structure and the other uses a simple array. Please explain this behavior. #memory-organization #stm32f429
7 REPLIES
waclawek.jan
Super User
Posted on October 12, 2015 at 19:29

> So all that differs while comparing the two f_write are , one uses array in a structure and other uses a simple array..

... and the state of the file metadata (e.g. FAT table), and the state of the medium where you are writing (e.g. cache in a SD card full from previous write)...

The two arrays may be aligned differently in the mcu memory, but it's unlikely that would make such a difference.

JW

jpeacock
Associate III
Posted on October 12, 2015 at 19:37

You are measuring the sum of two variables in your benchmark, array access and file write time.  Use a generated test pattern instead of file I/O to reduce your benchmark to a single variable. 

Your time difference could easily be from the file I/O.  If it's flash you may be seeing a sector erase taking place.

  Jack Peacock
hariprasad
Associate III
Posted on October 13, 2015 at 08:55

I did some debugging today with the f_write calls removed, using a simple memcpy() operation instead:

unsigned char array_1[10240];
unsigned char array_2[10240];

typedef struct
{
    unsigned char array_2[10240];
} simple_queue;

typedef struct
{
    simple_queue simple_arr[3];
} simple_stack;

simple_stack struct_var1;

... ....
memcpy(array_2, array_1, 10240);
/* The element index was missing in the original snippet; 0 is assumed here. */
memcpy(&struct_var1.simple_arr[0].array_2[0], array_1, 10240);
....

I could see that the first memcpy() took 90 microseconds, while the second memcpy() took 310 microseconds.

When I changed unsigned char to int, each memcpy() took 40 microseconds. So I assume this is related to compiler optimization. Can someone please explain how compiler optimization can lead to this behaviour?

waclawek.jan
Super User
Posted on October 13, 2015 at 09:04

Is this a Cortex-M0/M0+ device? Those don't allow unaligned accesses at all, and the compiler might be aware of the alignment of the source/target.

Also, are both arrays in the same memory?
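
One way to answer both questions is to print where the linker actually placed the two buffers. The following is only a minimal sketch: it reuses the type and variable names from the memcpy() test above, the index 0 and the helper names misalignment() and report_alignment() are assumptions made for illustration, and on a bare-metal target printf would have to be retargeted or replaced by a look at the debugger or the map file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as in the memcpy() test earlier in the thread. */
typedef struct { unsigned char array_2[10240]; } simple_queue;
typedef struct { simple_queue simple_arr[3]; } simple_stack;

static unsigned char array_1[10240];
static simple_stack struct_var1;

/* Hypothetical helper: address modulo 4, where 0 means word aligned. */
static unsigned misalignment(const void *p)
{
    assert(p != NULL);
    return (unsigned)((uintptr_t)p & 3u);
}

/* Hypothetical helper: print each buffer's address and its offset from a
 * 4-byte boundary. The addresses can then be compared with the memory map. */
static void report_alignment(void)
{
    printf("array_1:               %p (mod 4 = %u)\n",
           (void *)array_1, misalignment(array_1));
    printf("simple_arr[0].array_2: %p (mod 4 = %u)\n",
           (void *)struct_var1.simple_arr[0].array_2,
           misalignment(struct_var1.simple_arr[0].array_2));
}
```

Calling report_alignment() once at start-up is enough. A non-zero "mod 4" value for one buffer but not the other would point to alignment, and addresses in different regions of the memory map would point to the second question.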

JW
stm322399
Senior
Posted on October 13, 2015 at 09:31

You got it! This is something that happens with memcpy. As f_write also uses memcpy, you can observe the same penalties.

The memcpy routine is provided by the C library and, depending on your library provider, it might use optimisations based on source and target alignment. In other words, when possible, bytes are moved 4 or even 8 at a time for most loads and stores. Otherwise bytes are moved one by one. Of course the two cases end up with very different performance.
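
To make the idea concrete, here is a rough cost model of such a copy routine. It is a simplified sketch and not the code of any real C library: the function name sketch_transfers() is made up for illustration, it only counts load/store pairs rather than copying anything, and real implementations are more elaborate and vary between vendors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cost model: number of load/store pairs a naive
 * alignment-aware copy would need for n bytes. Nothing is copied. */
static size_t sketch_transfers(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    assert(dst != NULL && src != NULL);

    if (((d | s) & 3u) == 0) {
        /* Both pointers are word aligned: 4 bytes per transfer,
         * plus one transfer for each of the 0..3 trailing bytes. */
        return n / 4 + n % 4;
    }
    /* Otherwise fall back to one byte per transfer. */
    return n;
}
```

Under this model a 10240-byte copy needs 2560 transfers when both pointers are word aligned and 10240 when either is not, a factor of four. That is only suggestive, but it is in the same range as the 90 versus 310 microsecond measurements.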

How is this connected to your data? Alignment. Try the following to understand how it works:

memcpy (dst, src+0, count)

memcpy (dst, src+1, count)

memcpy (dst, src+2, count)

memcpy (dst, src+3, count)

memcpy (dst, src+4, count)

Normally there should be a difference.

When the +0 and +4 cases exhibit a significant difference, try running up to +8.
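
The experiment above could be wrapped in a small helper like the one below. This is a sketch under assumptions: the buffer names, the REPEAT count and the function time_src_offsets() are invented for illustration, and the standard clock() is used only for portability. On the MCU a hardware timer or cycle counter would give far better resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define COUNT      10240u   /* bytes per copy, as in the original test */
#define MAX_OFFSET 8u       /* highest source offset to try */
#define REPEAT     1000u    /* copies per offset (assumed value) */

static unsigned char src_buf[COUNT + MAX_OFFSET];
static unsigned char dst_buf[COUNT];

/* Fills ticks[0..MAX_OFFSET] with the clock() ticks spent on REPEAT
 * copies of COUNT bytes from src_buf + offset into dst_buf. */
static void time_src_offsets(clock_t ticks[MAX_OFFSET + 1])
{
    assert(ticks != NULL);

    for (size_t off = 0; off <= MAX_OFFSET; off++) {
        volatile unsigned char sink = 0;
        clock_t start = clock();

        for (unsigned r = 0; r < REPEAT; r++) {
            memcpy(dst_buf, src_buf + off, COUNT);
            /* Read the result so the copy is less likely to be optimized away. */
            sink ^= dst_buf[r % COUNT];
        }
        ticks[off] = clock() - start;
    }
}
```

Comparing ticks[0] and ticks[4] with the entries in between should show whether the library's memcpy is sensitive to source alignment. The same loop with an offset added to dst_buf instead would test the destination side.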

waclawek.jan
Super User
Posted on October 13, 2015 at 19:46

Laurent,

> you got it !

Memory access alone would explain a (310-40)us difference, not the (50-2)ms difference.

JW

stm322399
Senior
Posted on October 13, 2015 at 20:17

Oh, you are certainly right. Who knows what lies between f_write and memcpy that would explain a 50 ms versus 2 ms difference!