Add staged (tiled_reduce_staged) multi-vector MaxSim kernel POC #1200

Draft
suri-kumkaran wants to merge 1 commit into main from users/suryangupta/staged_maxsim_kernel

@suri-kumkaran

Contributor

Introduce an experimental, generic cache-tiled reduction driver, tiled_reduce_staged, that computes multi-vector MaxSim/Chamfer for any element type and quantization by swapping four pluggable stages (StagedKernel, Postprocess, Reducer, StagedConvert) instead of forking the tiled loop nest per datatype.
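To make the stage split concrete, here is a minimal sketch of a tiled MaxSim driver parameterized over pluggable stages. Only the trait names StagedKernel, Postprocess, and Reducer come from the description above; every signature, the `tiled_reduce_sketch` function, and the `F32Dot`, `Identity`, and `MaxReducer` types are hypothetical and are not taken from the PR. StagedConvert is left out because the description does not spell out its role.

```rust
// Hypothetical sketch: the trait names mirror the PR description, but all
// signatures and helper types below are assumptions made for illustration.

/// Per-pair kernel: produces a raw accumulator for one (query, doc) vector pair.
trait StagedKernel {
    type Elem;
    type Acc: Copy;
    fn dot(&self, q: &[Self::Elem], d: &[Self::Elem]) -> Self::Acc;
}

/// Turns a raw accumulator into an f32 similarity.
trait Postprocess<Acc> {
    fn finish(&self, acc: Acc) -> f32;
}

/// Folds per-pair similarities into one value per query vector.
trait Reducer {
    fn identity(&self) -> f32;
    fn combine(&self, running: f32, sim: f32) -> f32;
}

struct F32Dot;
impl StagedKernel for F32Dot {
    type Elem = f32;
    type Acc = f32;
    fn dot(&self, q: &[f32], d: &[f32]) -> f32 {
        q.iter().zip(d).map(|(a, b)| a * b).sum()
    }
}

struct Identity;
impl Postprocess<f32> for Identity {
    fn finish(&self, acc: f32) -> f32 {
        acc
    }
}

struct MaxReducer;
impl Reducer for MaxReducer {
    fn identity(&self) -> f32 {
        f32::NEG_INFINITY
    }
    fn combine(&self, running: f32, sim: f32) -> f32 {
        running.max(sim)
    }
}

/// Walks the doc vectors in tiles of `tile` vectors so that each tile is
/// reused by every query vector before moving on. `queries` and `docs` are
/// row-major with `dim` elements per vector.
fn tiled_reduce_sketch<K, P, R>(
    kernel: &K,
    post: &P,
    red: &R,
    queries: &[K::Elem],
    docs: &[K::Elem],
    dim: usize,
    tile: usize,
) -> f32
where
    K: StagedKernel,
    P: Postprocess<K::Acc>,
    R: Reducer,
{
    assert!(dim > 0 && tile > 0);
    let nq = queries.len() / dim;
    let mut best = vec![red.identity(); nq];
    for doc_tile in docs.chunks(tile * dim) {
        for (qi, q) in queries.chunks_exact(dim).enumerate() {
            for d in doc_tile.chunks_exact(dim) {
                let sim = post.finish(kernel.dot(q, d));
                best[qi] = red.combine(best[qi], sim);
            }
        }
    }
    // MaxSim: sum over query vectors of the best similarity found.
    best.iter().sum()
}

fn main() {
    // Two query vectors and three doc vectors, dim = 2.
    let queries = [1.0f32, 0.0, 0.0, 1.0];
    let docs = [1.0f32, 0.0, 0.5, 0.5, 0.0, 2.0];
    // Per-query maxima are 1.0 and 2.0, so the sum is 3.0.
    let score = tiled_reduce_sketch(&F32Dot, &Identity, &MaxReducer, &queries, &docs, 2, 2);
    println!("MaxSim = {score}");
}
```

In this sketch the loop nest lives in one place, and supporting a new element type only means supplying different stage implementations.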

Validated with two instantiations:

  • f32: output is bit-identical to the hand-fused V3 kernel and performance is on par with it (selectable for A/B comparison as MaxSimIsa::X86_64_V3_Staged).
  • 4-bit MinMax-quantized i8: a new datatype added by swapping only Stage A, the Acc type, and Stage B; 1.5-4.1x faster than the per-pair SIMD reference.
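As an illustration of why a quantized datatype might need only a different per-pair kernel, accumulator type, and postprocess step, the sketch below assumes a per-vector affine MinMax scheme (value ≈ min + scale × code, with 4-bit codes stored in i8). This scheme, the `MinMaxVec` type, and the functions `quantize`, `int_dot`, and `rescale` are assumptions for illustration and are not taken from the PR. The integer dot product stands in for the kernel stage with an i32 accumulator, and the affine correction stands in for the postprocess stage.

```rust
// Hypothetical sketch: the quantization scheme and all names here are
// assumptions, not the PR's actual i8 implementation.

struct MinMaxVec {
    codes: Vec<i8>, // 4-bit codes in 0..=15, one per element
    min: f32,
    scale: f32,
}

/// Quantize so that v[i] is approximately min + scale * codes[i].
fn quantize(v: &[f32]) -> MinMaxVec {
    let lo = v.iter().copied().fold(f32::INFINITY, f32::min);
    let hi = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if hi > lo { (hi - lo) / 15.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - lo) / scale).round() as i8).collect();
    MinMaxVec { codes, min: lo, scale }
}

/// Kernel stand-in: integer-only accumulation into an i32 accumulator.
fn int_dot(a: &[i8], b: &[i8]) -> i32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

/// Postprocess stand-in: expand
/// sum((a.min + a.scale * qa) * (b.min + b.scale * qb))
/// into the raw integer dot product plus affine correction terms.
fn rescale(raw: i32, a: &MinMaxVec, b: &MinMaxVec) -> f32 {
    let n = a.codes.len() as f32;
    let sum_a: i32 = a.codes.iter().map(|&c| c as i32).sum();
    let sum_b: i32 = b.codes.iter().map(|&c| c as i32).sum();
    a.scale * b.scale * raw as f32
        + a.min * b.scale * sum_b as f32
        + b.min * a.scale * sum_a as f32
        + n * a.min * b.min
}

fn main() {
    // Values chosen so that quantization is exact (scale = 1 for both).
    let a = [0.0f32, 15.0, 3.0, 7.0];
    let b = [1.0f32, 16.0, 4.0, 2.0];
    let exact: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
    let (qa, qb) = (quantize(&a), quantize(&b));
    let approx = rescale(int_dot(&qa.codes, &qb.codes), &qa, &qb);
    println!("exact = {exact}, quantized = {approx}");
}
```

The per-vector code sums are recomputed on every call here for brevity; an implementation would more plausibly precompute them once per vector.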

The driver owns all scratch memory, which it allocates from a caller-supplied ScopedAllocator. A single-owner resettable bump arena (ResettableArena) provides a zero-allocation steady state. A concise design overview lives in the staged module README (diskann-quantization/.../kernels/staged/README.md).
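The idea of a resettable bump arena can be sketched as follows. This is a simplified illustration, not the PR's ResettableArena: the `BumpArena` type and its methods are hypothetical, it hands out only `f32` scratch, and because `alloc` borrows the arena mutably, only one allocation can be live at a time.

```rust
// Hypothetical sketch of a resettable bump arena; not the PR's ResettableArena.

struct BumpArena {
    buf: Vec<f32>, // backing storage, allocated once up front
    used: usize,   // bump pointer: number of elements handed out so far
}

impl BumpArena {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: vec![0.0; cap], used: 0 }
    }

    /// Hand out the next `n` elements, or None if the arena is exhausted.
    /// Never grows the backing buffer, so it never touches the heap.
    fn alloc(&mut self, n: usize) -> Option<&mut [f32]> {
        let start = self.used;
        let end = start.checked_add(n)?;
        if end > self.buf.len() {
            return None;
        }
        self.used = end;
        Some(&mut self.buf[start..end])
    }

    /// Make the whole buffer available again without freeing it.
    fn reset(&mut self) {
        self.used = 0;
    }
}

fn main() {
    let mut arena = BumpArena::with_capacity(8);
    for call in 0..3 {
        // Each call takes its scratch from the arena, then resets it,
        // so repeated calls reuse the same memory.
        let first_len = arena.alloc(4).map(|s| s.len());
        let second_len = arena.alloc(4).map(|s| s.len());
        let overflow = arena.alloc(1).is_none();
        println!("call {call}: {first_len:?} {second_len:?} exhausted={overflow}");
        arena.reset();
    }
}
```

After the initial `with_capacity`, the loop performs no heap allocation, which is the property the description calls a zero-allocation steady state.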

Also adds benchmark examples (multi-vector-{staged,quant,3way}.json) and the quantized multi-vector benchmark backend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 92.34592% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.82%. Comparing base (ab967c1) to head (90fc533).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ion/src/multi_vector/distance/kernels/staged/i8.rs | 92.35% | 35 Missing ⚠️ |
| ...ion/src/multi_vector/distance/kernels/staged/v3.rs | 91.97% | 15 Missing ⚠️ |
| ...-quantization/src/multi_vector/distance/factory.rs | 84.72% | 11 Missing ⚠️ |
| .../src/multi_vector/distance/kernels/staged/arena.rs | 85.10% | 7 Missing ⚠️ |
| diskann-benchmark/src/inputs/multi_vector.rs | 0.00% | 2 Missing ⚠️ |
| ...kann-quantization/src/multi_vector/distance/isa.rs | 0.00% | 2 Missing ⚠️ |
| ...src/multi_vector/distance/kernels/staged/driver.rs | 98.41% | 2 Missing ⚠️ |
| ...on/src/multi_vector/distance/kernels/staged/mod.rs | 96.92% | 2 Missing ⚠️ |
| ...src/multi_vector/distance/kernels/staged/maxsim.rs | 97.87% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1200      +/-   ##
==========================================
+ Coverage   89.79%   89.82%   +0.02%     
==========================================
  Files         488      494       +6     
  Lines       93182    94188    +1006     
==========================================
+ Hits        83676    84604     +928     
- Misses       9506     9584      +78     
| Flag | Coverage Δ | |
|---|---|---|
| miri | 89.82% <92.34%> (+0.02%) | ⬆️ |
| unittests | 89.48% <92.34%> (+0.02%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| diskann-benchmark/src/multi_vector/mod.rs | 100.00% <ø> (ø) | |
| ...n-quantization/src/multi_vector/distance/kernel.rs | 100.00% <ø> (ø) | |
| ...ntization/src/multi_vector/distance/kernels/mod.rs | 100.00% <ø> (ø) | |
| ...src/multi_vector/distance/kernels/staged/maxsim.rs | 97.87% <97.87%> (ø) | |
| diskann-benchmark/src/inputs/multi_vector.rs | 19.04% <0.00%> (-0.63%) | ⬇️ |
| ...kann-quantization/src/multi_vector/distance/isa.rs | 0.00% <0.00%> (ø) | |
| ...src/multi_vector/distance/kernels/staged/driver.rs | 98.41% <98.41%> (ø) | |
| ...on/src/multi_vector/distance/kernels/staged/mod.rs | 96.92% <96.92%> (ø) | |
| .../src/multi_vector/distance/kernels/staged/arena.rs | 85.10% <85.10%> (ø) | |
| ...-quantization/src/multi_vector/distance/factory.rs | 80.26% <84.72%> (+1.05%) | ⬆️ |

... and 2 more

... and 1 file with indirect coverage changes
