Add staged (tiled_reduce_staged) multi-vector MaxSim kernel POC by suri-kumkaran · Pull Request #1200 · microsoft/DiskANN

suri-kumkaran · 2026-06-24T19:24:19Z

Introduce an experimental, generic cache-tiled reduction driver, tiled_reduce_staged, that computes multi-vector MaxSim/Chamfer for any element type and quantization by swapping four pluggable stages (StagedKernel, Postprocess, Reducer, StagedConvert) instead of forking the tiled loop nest per datatype.
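To illustrate what "swapping stages instead of forking the loop nest" could look like, here is a minimal sketch. It assumes the usual MaxSim definition (for each query vector, take the maximum similarity over all document vectors, then sum over query vectors). The traits `PairKernel` and `ToScore`, the function `maxsim_tiled`, and the tile parameter are hypothetical names for this example; they are not the PR's actual `StagedKernel`, `Postprocess`, `Reducer`, or `StagedConvert` definitions, and the real driver's sign convention and tiling strategy may differ.

```rust
/// Hypothetical kernel stage: produces a raw accumulator for one (query, doc) pair.
trait PairKernel {
    type Elem;
    type Acc: Copy;
    fn dot(&self, q: &[Self::Elem], d: &[Self::Elem]) -> Self::Acc;
}

/// Hypothetical post-processing stage: turns an accumulator into an f32 score.
trait ToScore<Acc> {
    fn score(&self, acc: Acc) -> f32;
}

/// f32 instantiation: the accumulator is already the score.
struct F32Dot;
impl PairKernel for F32Dot {
    type Elem = f32;
    type Acc = f32;
    fn dot(&self, q: &[f32], d: &[f32]) -> f32 {
        q.iter().zip(d).map(|(a, b)| a * b).sum()
    }
}
struct Identity;
impl ToScore<f32> for Identity {
    fn score(&self, acc: f32) -> f32 {
        acc
    }
}

/// i8 instantiation: integer accumulator, rescaled afterwards.
struct I8Dot;
impl PairKernel for I8Dot {
    type Elem = i8;
    type Acc = i32;
    fn dot(&self, q: &[i8], d: &[i8]) -> i32 {
        q.iter().zip(d).map(|(&a, &b)| a as i32 * b as i32).sum()
    }
}
struct Scale(f32);
impl ToScore<i32> for Scale {
    fn score(&self, acc: i32) -> f32 {
        acc as f32 * self.0
    }
}

/// One tiled loop nest shared by every element type. The max/sum reduction
/// is hard-coded here for brevity; it could be a third pluggable stage.
/// With no document vectors the result is negative infinity.
fn maxsim_tiled<K, P>(
    kernel: &K,
    post: &P,
    queries: &[Vec<K::Elem>],
    docs: &[Vec<K::Elem>],
    tile: usize,
) -> f32
where
    K: PairKernel,
    P: ToScore<K::Acc>,
{
    assert!(tile > 0, "tile size must be positive");
    // Running maximum per query vector, carried across document tiles.
    let mut best = vec![f32::NEG_INFINITY; queries.len()];
    for d_tile in docs.chunks(tile) {
        for (qt, q_tile) in queries.chunks(tile).enumerate() {
            for (qi, q) in q_tile.iter().enumerate() {
                let slot = &mut best[qt * tile + qi];
                for d in d_tile {
                    let s = post.score(kernel.dot(q, d));
                    if s > *slot {
                        *slot = s;
                    }
                }
            }
        }
    }
    best.iter().sum()
}
```

In this sketch, adding a datatype means writing a new `PairKernel` and `ToScore` pair; `maxsim_tiled` itself is untouched.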

Validated with two instantiations:

- f32: bit-identical to and on par with the hand-fused V3 kernel (selectable for A/B as MaxSimIsa::X86_64_V3_Staged).
- 4-bit MinMax-quantized i8: a new datatype added by swapping Stage A + the Acc type + Stage B only; 1.5-4.1x over the per-pair SIMD reference.
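One way to see why a quantized datatype needs a different accumulator type and a different post-processing step, but not a different loop nest, is the generic min-max scalar quantization identity. The sketch below assumes each vector is stored as 4-bit codes `c` with a per-vector `min` and `scale`, so that `x ≈ min + scale * c`. This is the textbook scheme, not necessarily the PR's exact storage layout, and all names (`MinMax4`, `quantize`, `corrected_dot`) are hypothetical.

```rust
/// Hypothetical 4-bit min-max quantized vector: x[i] ≈ min + scale * codes[i].
/// Codes are kept one per byte here for clarity (no nibble packing).
struct MinMax4 {
    codes: Vec<u8>,
    min: f32,
    scale: f32,
}

fn quantize(x: &[f32]) -> MinMax4 {
    let min = x.iter().copied().fold(f32::INFINITY, f32::min);
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // 4 bits give 16 levels, i.e. 15 steps between min and max.
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let codes = x
        .iter()
        .map(|&v| (((v - min) / scale).round() as u8).min(15))
        .collect();
    MinMax4 { codes, min, scale }
}

fn dequantize(q: &MinMax4) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + q.scale * c as f32).collect()
}

/// Integer-only inner loop: this is the part a SIMD kernel would accelerate.
fn int_dot(a: &MinMax4, b: &MinMax4) -> u32 {
    a.codes
        .iter()
        .zip(&b.codes)
        .map(|(&x, &y)| x as u32 * y as u32)
        .sum()
}

fn code_sum(a: &MinMax4) -> u32 {
    a.codes.iter().map(|&c| c as u32).sum()
}

/// Post-processing: expand sum((ma + sa*ca) * (mb + sb*cb)) into four terms
/// so that only int_dot depends on both vectors element by element.
fn corrected_dot(a: &MinMax4, b: &MinMax4) -> f32 {
    let n = a.codes.len() as f32;
    n * a.min * b.min
        + a.min * b.scale * code_sum(b) as f32
        + b.min * a.scale * code_sum(a) as f32
        + a.scale * b.scale * int_dot(a, b) as f32
}
```

The per-vector code sums can be precomputed once, which leaves the integer dot product as the only per-pair work before the affine correction.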

The driver owns all scratch, allocated from a caller-supplied ScopedAllocator; zero-allocation steady state is provided by a single-owner resettable bump arena (ResettableArena). A crisp design overview lives in the staged module README (diskann-quantization/.../kernels/staged/README.md).
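The zero-allocation steady state can be pictured with a minimal bump arena. This is only a sketch of the general technique: the type `BumpArena`, its methods, and the `grows` counter are hypothetical and do not reflect the actual `ResettableArena` or `ScopedAllocator` APIs, which are not shown here.

```rust
use std::ops::Range;

/// Hypothetical single-owner bump arena over f32 scratch space.
struct BumpArena {
    buf: Vec<f32>,
    used: usize,
    /// Counts how often the backing buffer had to grow (for illustration).
    grows: usize,
}

impl BumpArena {
    fn new() -> Self {
        BumpArena { buf: Vec::new(), used: 0, grows: 0 }
    }

    /// Reserve `n` elements and return their index range.
    /// Grows the backing buffer only if the current one is too small.
    fn alloc(&mut self, n: usize) -> Range<usize> {
        let start = self.used;
        let end = start + n;
        if end > self.buf.len() {
            self.buf.resize(end, 0.0);
            self.grows += 1;
        }
        self.used = end;
        start..end
    }

    /// Borrow a previously reserved range as a mutable slice.
    fn get_mut(&mut self, r: Range<usize>) -> &mut [f32] {
        &mut self.buf[r]
    }

    /// Rewind the cursor; memory is kept and old contents are not cleared.
    fn reset(&mut self) {
        self.used = 0;
    }
}
```

After the first call has sized the buffer, repeating the same sequence of `alloc` calls following a `reset` reuses the existing memory, so `grows` stops increasing.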

Also adds benchmark examples (multi-vector-{staged,quant,3way}.json) and the quantized multi-vector benchmark backend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov-commenter · 2026-06-24T19:39:51Z

Codecov Report

❌ Patch coverage is 92.34592% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.82%. Comparing base (ab967c1) to head (90fc533).

Files with missing lines	Patch %	Lines
...ion/src/multi_vector/distance/kernels/staged/i8.rs	92.35%	35 Missing ⚠️
...ion/src/multi_vector/distance/kernels/staged/v3.rs	91.97%	15 Missing ⚠️
...-quantization/src/multi_vector/distance/factory.rs	84.72%	11 Missing ⚠️
.../src/multi_vector/distance/kernels/staged/arena.rs	85.10%	7 Missing ⚠️
diskann-benchmark/src/inputs/multi_vector.rs	0.00%	2 Missing ⚠️
...kann-quantization/src/multi_vector/distance/isa.rs	0.00%	2 Missing ⚠️
...src/multi_vector/distance/kernels/staged/driver.rs	98.41%	2 Missing ⚠️
...on/src/multi_vector/distance/kernels/staged/mod.rs	96.92%	2 Missing ⚠️
...src/multi_vector/distance/kernels/staged/maxsim.rs	97.87%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1200      +/-   ##
==========================================
+ Coverage   89.79%   89.82%   +0.02%     
==========================================
  Files         488      494       +6     
  Lines       93182    94188    +1006     
==========================================
+ Hits        83676    84604     +928     
- Misses       9506     9584      +78

Flag	Coverage Δ
miri	`89.82% <92.34%> (+0.02%)`	⬆️
unittests	`89.48% <92.34%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
diskann-benchmark/src/multi_vector/mod.rs	`100.00% <ø> (ø)`
...n-quantization/src/multi_vector/distance/kernel.rs	`100.00% <ø> (ø)`
...ntization/src/multi_vector/distance/kernels/mod.rs	`100.00% <ø> (ø)`
...src/multi_vector/distance/kernels/staged/maxsim.rs	`97.87% <97.87%> (ø)`
diskann-benchmark/src/inputs/multi_vector.rs	`19.04% <0.00%> (-0.63%)`	⬇️
...kann-quantization/src/multi_vector/distance/isa.rs	`0.00% <0.00%> (ø)`
...src/multi_vector/distance/kernels/staged/driver.rs	`98.41% <98.41%> (ø)`
...on/src/multi_vector/distance/kernels/staged/mod.rs	`96.92% <96.92%> (ø)`
.../src/multi_vector/distance/kernels/staged/arena.rs	`85.10% <85.10%> (ø)`
...-quantization/src/multi_vector/distance/factory.rs	`80.26% <84.72%> (+1.05%)`	⬆️
... and 2 more

... and 1 file with indirect coverage changes

suri-kumkaran wants to merge 1 commit into main from users/suryangupta/staged_maxsim_kernel
