I redid the benchmark from scratch in a new project, and now I measure the expected throughput for both ADD and ORR. I still don't understand what happened, but I now consider the issue solved. Sorry for the disturbance and thank you ...
Well, I checked the generated assembly and it looks exactly as I expect. In both cases 32-bit instructions were used, so I don't expect instruction alignment to differ between the two cases. Everything is in registers in this benchmark, so there i...