I'm reading from float lookup tables with 1025 entries each, where the extra last entry is there for looping the data. The input to the function is a 32-bit phase value, which is split into a 10-bit table index (the upper bits) and a 22-bit remainder used for interpolation.
This is the function signature:
float readinterpolated(const uint32_t x, const float *datapt)
and the generated assembly code is:
lsrs r3, r0, #22 @ coarse, x,
ubfx r2, r0, #0, #22 @ D.12138, x,,
add r1, r1, r3, lsl #2 @ tmp127, datapt, coarse,
vmov s15, r2 @ int @ D.12138, D.12138
vldr.32 s0, [r1] @ D.12139, *_8
vcvt.f32.s32 s15, s15, #22 @ fine, D.12138,
vldr.32 s14, [r1, #4] @ *_12, *_12
vsub.f32 s14, s14, s0 @ D.12139, *_12, D.12139
vfma.f32 s0, s15, s14 @, fine, D.12139
bx lr @
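For reference, here is a minimal sketch of C that would be consistent with the assembly above. Only the signature appears in this post, so the body is an assumption reconstructed from the compiler annotations (`coarse`, `fine`); the macro names `TABLE_BITS`, `FRAC_BITS` and `FRAC_MASK` are hypothetical. Whether a given compiler emits exactly the same instructions depends on its version and flags.

```c
#include <stdint.h>

#define TABLE_BITS 10u                       /* 1024 segments, 1025 table entries */
#define FRAC_BITS  (32u - TABLE_BITS)        /* 22 bits left for interpolation    */
#define FRAC_MASK  ((1u << FRAC_BITS) - 1u)

/* Assumed body: linear interpolation between two adjacent table entries. */
static inline float readinterpolated(const uint32_t x, const float *datapt)
{
    /* Upper 10 bits select the table segment (0..1023). */
    const uint32_t coarse = x >> FRAC_BITS;

    /* Lower 22 bits become a fraction in [0, 1); the scale is exactly 2^-22. */
    const float fine = (float)(x & FRAC_MASK) * (1.0f / (float)(1u << FRAC_BITS));

    /* coarse + 1 is at most 1024, which is why the table has 1025 entries. */
    const float a = datapt[coarse];
    const float b = datapt[coarse + 1u];

    /* a + fine * (b - a): one subtract plus one multiply-add. */
    return a + fine * (b - a);
}
```

With a table filled so that `table[i] == i`, a phase of `0x00200000` (index 0, fraction 0.5) gives 0.5, and `0xFFE00000` (index 1023, fraction 0.5) gives 1023.5, reading entries 1023 and 1024.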
Is this the fastest way of doing it? I'm using it in a DSP application to generate waveforms, and I need the best performance possible: the application is already running near full capacity, and not all of its features have been added yet.