2023-01-14 08:03 AM
I want to write simple code at the level of c = a + b that always gives the same result, while making the most of what the hardware can do.
On your architecture, only four 32-bit vector types execute directly:
#include <stdint.h>
typedef int8_t   vi8_4  __attribute__ ((vector_size(4), aligned(4)));  // 4 x signed 8-bit lanes
typedef int16_t  vi16_2 __attribute__ ((vector_size(4), aligned(4)));  // 2 x signed 16-bit lanes
typedef uint8_t  vu8_4  __attribute__ ((vector_size(4), aligned(4)));  // 4 x unsigned 8-bit lanes
typedef uint16_t vu16_2 __attribute__ ((vector_size(4), aligned(4)));  // 2 x unsigned 16-bit lanes
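As a sketch of the c = a + b level of code with these types, each addition below is a single lane-wise vector add. The helper names add_u8x4 and friends are hypothetical. Whether each one compiles to a single SIMD instruction depends on the target and compiler flags, so check the disassembly (for example on godbolt). Unsigned lanes wrap modulo 2^8 or 2^16; signed lanes are kept out of overflow here, since GCC does not guarantee wrapping for signed vector elements.

```c
#include <stdint.h>

/* Same 32-bit vector types as above. */
typedef int8_t   vi8_4  __attribute__ ((vector_size(4), aligned(4)));
typedef int16_t  vi16_2 __attribute__ ((vector_size(4), aligned(4)));
typedef uint8_t  vu8_4  __attribute__ ((vector_size(4), aligned(4)));
typedef uint16_t vu16_2 __attribute__ ((vector_size(4), aligned(4)));

/* Hypothetical helpers: lane-wise c = a + b on one 32-bit word. */
static vu8_4  add_u8x4 (vu8_4  a, vu8_4  b) { return a + b; }  /* wraps mod 2^8  */
static vu16_2 add_u16x2(vu16_2 a, vu16_2 b) { return a + b; }  /* wraps mod 2^16 */
static vi8_4  add_i8x4 (vi8_4  a, vi8_4  b) { return a + b; }  /* avoid overflow */
static vi16_2 add_i16x2(vi16_2 a, vi16_2 b) { return a + b; }  /* avoid overflow */
```

For example, adding {250, 1, 2, 3} and {10, 2, 3, 4} as vu8_4 gives {4, 3, 5, 7}, because the first lane wraps around 256.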
For longer data, declare vector_size(4*n); the compiler then splits the operation into 32-bit pieces itself, so you don't need to write the loop by hand.
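A minimal sketch of the vector_size(4*n) idea, assuming n = 4 (16-byte vectors) purely for illustration; the type and function names are hypothetical. The source is still a plain c = a + b with no explicit loop. If the target has no 16-byte SIMD register, GCC's generic vector lowering breaks the addition into narrower or scalar operations on its own.

```c
#include <stdint.h>

/* n = 4: 16-byte vectors built from the same element types. */
typedef uint8_t vu8_16 __attribute__ ((vector_size(16), aligned(16)));  /* 16 x u8 */
typedef int16_t vi16_8 __attribute__ ((vector_size(16), aligned(16)));  /* 8 x s16 */

/* Still plain c = a + b; the compiler decides how to split it for the target. */
static vu8_16 add_u8x16(vu8_16 a, vu8_16 b) { return a + b; }
static vi16_8 add_i16x8(vi16_8 a, vi16_8 b) { return a + b; }
```

Changing n only changes the typedef; the function bodies stay the same.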
On Cortex-A8 and above, GCC maps these vectors onto the NEON SIMD extension. The generated code is nice and compact.
You can see the result here: https://godbolt.org/z/asbKj4rns