Dejan AI Triton kernel — MSE-only 2-bit (RTX 4090)

success
0.14
1/5
Parameters
framework: triton
implementation: fused_kernel
bits: 2
qjl: false
approach: mse_only
Hypothesis

The MSE-only approach (no QJL) at 2-bit produces output that is character-identical to fp16.
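
To illustrate what an MSE-only 2-bit quantizer does, here is a minimal sketch in plain Python. It is not the Triton kernel from the referenced post: the function names (`fit_codebook`, `nearest`, `quantize`, `dequantize`), the Gaussian test data, and the use of Lloyd's algorithm to pick the four levels are all assumptions made for illustration. There is no QJL residual step; reconstruction is a plain codebook lookup.

```python
import random

def nearest(v, levels):
    # Index of the codebook level closest to v (squared-error criterion).
    return min(range(len(levels)), key=lambda k: (v - levels[k]) ** 2)

def fit_codebook(samples, bits=2, iters=25):
    # Lloyd's algorithm (1-D k-means): choose 2**bits levels that
    # reduce the mean squared reconstruction error on the samples.
    n_levels = 2 ** bits
    ordered = sorted(samples)
    # Initialise the levels at evenly spaced quantiles of the data.
    levels = [ordered[int((k + 0.5) / n_levels * len(ordered))]
              for k in range(n_levels)]
    for _ in range(iters):
        sums = [0.0] * n_levels
        counts = [0] * n_levels
        for v in samples:
            k = nearest(v, levels)
            sums[k] += v
            counts[k] += 1
        # Move each level to the mean of the samples assigned to it.
        levels = [sums[k] / counts[k] if counts[k] else levels[k]
                  for k in range(n_levels)]
    return sorted(levels)

def quantize(values, levels):
    # Each value becomes a code in 0..3, i.e. 2 bits of storage.
    return [nearest(v, levels) for v in values]

def dequantize(codes, levels):
    # MSE-only reconstruction: look up the level, no residual correction.
    return [levels[c] for c in codes]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(4096)]  # hypothetical test data
levels = fit_codebook(x)
codes = quantize(x, levels)
x_hat = dequantize(codes, levels)
mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
```

A real kernel would pack four codes per byte and fuse the lookup into the attention computation; the sketch only shows the quantize and dequantize arithmetic.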

Reference

https://dejan.ai/blog/turboquant-triton-kernel

Tags
Subject
Model: Gemma-3-4B
Baseline Comparison
quality: character-identical to fp16 at 2-bit
Instances (1 reproduction)
apple-silicon-baselines dejanseo RTX 4090

Independently validates the MSE-only approach (no QJL residual correction) as superior: 2-bit output is character-identical to fp16 on Gemma 3 4B. We dropped QJL for the same reason. Code reviewed.

character_identical_to_fp16 true
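
A check like `character_identical_to_fp16` can be sketched as a plain string comparison between the fp16 generation and the 2-bit generation. The helper names below are hypothetical, and the sketch assumes both outputs come from the same prompt with deterministic (greedy) decoding, which the record does not state.

```python
def first_divergence(reference, candidate):
    # Index of the first differing character, or None if the strings match.
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return i
    if len(reference) != len(candidate):
        # One string is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None

def character_identical(reference, candidate):
    # True only when every character and the total length agree.
    return first_divergence(reference, candidate) is None
```

Reporting the first divergence index rather than only a boolean makes a failed reproduction easier to diagnose.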