kv-cache-quantization

Here are 11 public repositories matching this topic...

Zefan-Cai / Awesome-LLM-KV-Cache

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

kv-cache llm kv-cache-quantization kv-cache-compression

Updated Jun 17, 2026

jaylfc / taOS

Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-hosted multi-framework group chat, a full web desktop + app store, and auto-clustering across the consumer hardware you already have (Orange/Raspberry Pi, Mac mini, gaming PC).

raspberry-pi privacy offline-first distributed-computing self-hosted orange-pi ai-agents data-sovereignty ai-platform local-first agent-framework apple-silicon llm vllm local-llm llm-inference kv-cache-quantization rockchip-npu turboquant

Updated Jun 22, 2026
Python

shadowpa0327 / Palu

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection

mla kv-cache-quantization deepseek kv-cache-compression

Updated Feb 20, 2025
Python

suraj-ranganath / kv-quant-longhorizon

Empirical study of KV-cache quantization in self-forcing video generation

quantization kv-cache-quantization autoregressive-video-generation

Updated Mar 14, 2026
Python

Henvezz95 / VAR-Compressor

W4A4 and INT8 KV-cache quantization for Infinity VAR models. Optimized for high-fidelity generative AI deployment on edge GPUs (e.g. NVIDIA Jetson).

computer-vision pytorch gpu-acceleration quantization model-compression nvidia-jetson inference-optimization edge-ai on-device-ml weight-quantization post-training-quantization autoregressive-models generative-ai kv-cache-quantization activation-quantization visual-autoregressive-model svdquant infinity-var

Updated Jun 11, 2026
Python

IonDen / mlx-quant-fidelity

Measure MLX quantization quality loss — KL divergence, perplexity, top-token agreement for KV cache and weights

python machine-learning metal diagnostics quantization mlx kl-divergence perplexity model-quantization kv-cache apple-silicon llm llm-inference llm-eval kv-cache-quantization mlx-lm quantization-quality

Updated Jun 18, 2026
Python

gvillines-hub / nd-kv-quant

Evaluation harness and norm-direction method for KV cache compression. Cross-model worst-case quality metrics.

inference llama quantization mistral tranformer kv-cache llm qwen kv-cache-quantization

Updated May 16, 2026
Python

NeuraLiying / TurboQuant

Reproduction of TurboQuant.

transformers quantization kv-cache large-language-models llm-inference kv-cache-quantization polarquant

Updated Jun 19, 2026
Python

eplt / mlx-kvarn

MLX-native port of KVarN — variance-normalized KV-cache quantization for Apple Silicon. 3.3× compression at 71% FP16 speed, matches FP16 on GSM8K within ~2%.

mac metal transformer quantization mlx kv-cache apple-silicon llm llm-inference kv-cache-quantization apple-ml

Updated Jun 12, 2026
Python

thupalo / llama-on-dgx-spark

Deploy Nemotron 3 Nano 30B with 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization)

cuda inference aarch64 mamba mixture-of-experts blackwell long-context llama-cpp local-llm gguf kv-cache-quantization nemotron nvidia-dgx-spark 1m-context-window

Updated Mar 22, 2026
Shell

ersiner / kvqbench

Neutral tensor-level KV-cache quantization benchmark: 11 methods as first-class peers, per-metric leaderboards, matched-budget Pareto, multi-seed CI, and per-method fidelity tags.

benchmark machine-learning pytorch quantization inference-optimization kv-cache llm llm-inference kv-cache-quantization

Updated Jun 8, 2026
Python
