Tried pure arithmetic (mul+add) to reconstruct centroids without any memory access.
Result: 7 ALU ops are slower than 4 constant reads on Apple8. Constant memory wins over arithmetic here.
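
For reference, a minimal CPU-side sketch of the two strategies being compared, assuming a 4-entry codebook indexed by 2-bit codes. The centroid values, the function names, and the choice of a cubic in Newton form are all hypothetical and are not taken from the actual kernel. The op count here is not meant to match the 7 ALU ops measured, and the sketch says nothing about GPU timing. It only shows that a mul+add chain can reproduce the same values as a table read.

```python
import math

# Hypothetical 4-entry codebook (2-bit indices); values are illustrative only.
CENTROIDS = (-1.51, -0.453, 0.453, 1.51)


def lookup(index):
    """Table strategy: one read from a small constant table."""
    return CENTROIDS[index]


def newton_coefficients(c):
    """Forward-difference coefficients of the cubic through (0, c0)..(3, c3)."""
    c0, c1, c2, c3 = c
    d1 = c1 - c0
    d2 = c2 - 2.0 * c1 + c0
    d3 = c3 - 3.0 * c2 + 3.0 * c1 - c0
    return c0, d1, d2 / 2.0, d3 / 6.0


# Computed once on the host; in a kernel these would be literal constants.
A0, A1, A2, A3 = newton_coefficients(CENTROIDS)


def reconstruct(index):
    """Arithmetic strategy: nested mul+add only, no table read."""
    x = float(index)
    return A0 + x * (A1 + (x - 1.0) * (A2 + (x - 2.0) * A3))


# Both strategies should agree up to floating-point rounding.
for i in range(4):
    assert math.isclose(lookup(i), reconstruct(i), abs_tol=1e-9)
```

Both paths return the same centroid for every index, so the choice between them is purely a question of which is cheaper on the target GPU.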