Top Experiments
| Title | Result | Confidence | Repro | Metrics |
|---|---|---|---|---|
| turbo4 K Q pre-rotation bug fix<br>turbo4 K produces garbage because Q pre-rotation guard only checks TURBO3_0, not TURBO4_0 | success | 1/5 | | |
| TurboQuant vs rotated q4_0/q8_0 (upstream PR #21038)<br>turbo3 at 3.5 bpv competes with upstream rotated q4_0 at 4.5 bpv | success | 1/5 | | |
| Dequant optimization: non-vec FA kernel (FAILED)<br>Forcing non-vectorized FA kernel (nl=2) improves single-token decode | failure | 1/5 | | |
| Native VEC decode (scalar dequant in attention kernel)<br>Reading turbo3 directly in VEC attention kernel (scalar dequant, no fp16 buffer) saves 5x bandwidth by avoiding fp16 materialization | negative | 1/5 | | |
| Multi-sequence (n_seq > 1) dequant fix<br>turbo dequant-to-fp16 kernels ignore stream dimension ne[3], causing catastrophic PPL with n_seq > 1 | success | 1/5 |