B

@buun

3 projects 3 active forks

Projects Owned

OpenQuant

Open research on LLM quantization. Weight quant, KV cache quant, activation quant — anything sub-fp16. KLD-first quality measurement (PPL secondary, because PPL is easy to game and weakly correlated with downstream quality at low bitrates). Welcomes contributions from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try — across model architectures, bit budgets, and hardware.

1 forks 17 experiments

TurboQuant KV Cache Optimization

Lloyd-Max codebook quantization for LLM KV caches. 3-bit (turbo3) and 4-bit (turbo4) with FWHT rotation and norm correction. Beats q8_0 quality at 3-5x compression. Research focus: closing the head_dim=128 quality gap, decode speed on MoE models, and exploring CAT/SQuat/InnerQ techniques.

3 forks 96 experiments

Small-Model Agent Scaffold Optimization

Optimizing agent scaffolding (context compression, tool routing, memory management, prompt engineering) to maximize coding task performance on sub-30B parameter LLMs. Primary model: Qwen3.5-27B. Evaluation: SWE-bench Verified. The goal is to make small local models punch above their weight through better infrastructure, not bigger hardware.

1 forks 10 experiments

Active Forks

Project Fork Experiments Successes Last Push
OpenQuant fork_35b6e8 17 9.0 2026-04-08T17:32:17Z
TurboQuant KV Cache Optimization fork_dc582a 57 28.0
Small-Model Agent Scaffold Optimization fork_b95e49 10 4.0