
How to improve performance of linear interpolated table read

Question asked by J.Tobbe on Mar 20, 2017
Latest reply on Mar 22, 2017 by waclawek.jan



I'm reading float lookup tables with 1025 entries each, where the last entry is there for looping the data. The input to the function is a 32-bit phase value, which is split into a 10-bit table index and a 22-bit remainder used for interpolation.
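For readers following along, the phase split and table layout described above can be sketched as follows. The helper names `split_phase` and `fill_looped_table` are hypothetical and not part of my code, and the sketch assumes the extra last entry simply repeats entry 0 so that reading `table[coarse + 1]` stays in bounds at the top index.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 1024u                     /* 2^10 entries reachable by the index */
#define FRAC_BITS  22u                       /* low bits used for interpolation */
#define FRAC_MASK  ((1u << FRAC_BITS) - 1u)  /* 0x003FFFFF */

/* Hypothetical helper: split a 32-bit phase into index and remainder. */
static void split_phase(uint32_t phase, uint32_t *coarse, uint32_t *rest)
{
    assert(coarse && rest);
    *coarse = phase >> FRAC_BITS;   /* top 10 bits: 0..1023 */
    *rest   = phase & FRAC_MASK;    /* low 22 bits: 0..4194303 */
}

/* Hypothetical helper: copy one period into a 1025-entry table and
   repeat entry 0 at the end (assumed meaning of "for looping"). */
static void fill_looped_table(float table[TABLE_SIZE + 1u],
                              const float period[TABLE_SIZE])
{
    for (uint32_t i = 0; i < TABLE_SIZE; ++i)
        table[i] = period[i];
    table[TABLE_SIZE] = period[0];  /* guard entry for index 1023 + 1 */
}
```

With this layout the largest index, 1023, interpolates between the last sample of the period and the repeated first sample, so no wrap-around test is needed in the read function.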


This is the function

float readinterpolated(const uint32_t x, const float *datapt)
{
     uint32_t coarse=x>>22;                 /* top 10 bits: table index */
     float fine=(x%4194304)/4194304.0f;     /* low 22 bits scaled to [0, 1) */
     return datapt[coarse]+(datapt[coarse+1]-datapt[coarse])*fine;
}


and the generated assembly code is

    lsrs    r3, r0, #22    @ coarse, x,
    ubfx    r2, r0, #0, #22    @ D.12138, x,,
    add    r1, r1, r3, lsl #2    @ tmp127, datapt, coarse,
    vmov    s15, r2    @ int    @ D.12138, D.12138
    vldr.32    s0, [r1]    @ D.12139, *_8
    vcvt.f32.s32    s15, s15, #22    @ fine, D.12138,
    vldr.32    s14, [r1, #4]    @ *_12, *_12
    vsub.f32    s14, s14, s0    @ D.12139, *_12, D.12139
    vfma.f32    s0, s15, s14    @, fine, D.12139
    bx    lr    @
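
To make the listing easier to read, here is a sketch of the same arithmetic written out step by step in C, with each line annotated with the instruction it appears to correspond to. The name `readinterpolated_explicit` is hypothetical. The mapping assumes `ubfx` extracts the low 22 bits and that `vcvt.f32.s32` with `#22` converts a fixed-point value with 22 fraction bits, which is a scale by 2^-22; whether the compiler fuses the final multiply-add for this C version depends on its settings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rewrite that mirrors the listing instruction by instruction. */
static float readinterpolated_explicit(uint32_t x, const float *datapt)
{
    assert(datapt);
    uint32_t coarse = x >> 22;                       /* lsrs: top 10 bits */
    uint32_t rest   = x & 0x003FFFFFu;               /* ubfx: low 22 bits */
    const float *p  = datapt + coarse;               /* add ..., lsl #2 */
    float fine = (float)rest * (1.0f / 4194304.0f);  /* vmov + vcvt #22 */
    float a = p[0];                                  /* vldr.32 s0 */
    float d = p[1] - a;                              /* vldr.32 s14, vsub.f32 */
    return a + fine * d;                             /* vfma.f32 */
}
```

Since the remainder is below 2^24 and the scale factor is a power of two, the conversion and scaling are exact, so this form should give the same `fine` value as the modulo and division in the original function.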


Is this the fastest way of doing it? I'm using it in a DSP application to generate waveforms, and I need the best possible performance: the application is already running near full capacity, and not all features have been added yet.