feat: Add inference support for MiniT2I model. by KenForever1 · Pull Request #1683 · leejet/stable-diffusion.cpp

KenForever1 · 2026-06-19T11:02:08Z

Summary

Add inference support for MiniT2I in stable-diffusion.cpp.

This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.

Changes

Add MiniT2I model type detection and loading path.
Add MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
Add MiniT2I conditioner path using google/flan-t5-large.
Add MiniT2I sampling path with conditional/unconditional forward and CFG update.
Add backend support needed by MiniT2I graph execution.
Cache MiniT2I positional embeddings, text RoPE, and vision/joint RoPE in runner-level backend buffers.
Remove unused t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
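
The conditional/unconditional forward and CFG update listed above can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance formula; the PR does not spell out the exact expression it uses, and `apply_cfg` is a hypothetical helper name, not a function from stable-diffusion.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): combine the two model outputs of one
// denoise step using the standard classifier-free guidance formula
//   out = uncond + cfg_scale * (cond - uncond)
// `cond` is the prediction for the prompt, `uncond` the prediction for the
// empty/negative prompt. cfg_scale == 1 returns `cond` unchanged.
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    assert(cond.size() == uncond.size());
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return out;
}
```

With `--cfg-scale 6`, as in the test commands, each element is pushed six times the conditional/unconditional difference away from the unconditional prediction.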

Commits

b9493fa Add MiniT2I inference support
8de8f95 Optimize MiniT2I position cache
dfb6ca2 Remove unused MiniT2I conditioning branch

Models Used

MiniT2I diffusion model:

Model: MiniT2I/minit2i-b-16
Weight: transformer/diffusion_pytorch_model.safetensors

Text encoder:

Model: google/flan-t5-large
Weight: model.safetensors

Test Commands

Mac Metal test:

cd stable-diffusion.cpp

./build/bin/sd-cli \
  --backend metal \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /private/tmp/minit2i_metal.png \
  --threads 8

CUDA with diffusion flash attention:

cd stable-diffusion.cpp

./build-cuda/bin/sd-cli \
  --backend cuda \
  --diffusion-fa \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /tmp/minit2i_cuda_diffusion_fa.png \
  --threads 8

Validation Notes

MiniT2I generation succeeds on CUDA and Metal.
Position/RoPE cache optimization preserves model batch semantics.
Removing the unused conditioning branch produced identical output in local validation.
CUDA --diffusion-fa works with MiniT2I and significantly reduces the diffusion forward-pass time.

Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
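
The caching idea can be sketched as follows, under simplifying assumptions: a host-side `std::vector` stands in for the backend buffer, the table is a plain 1D RoPE cos/sin table, and `RopeCache` is a hypothetical name. The actual runner in this PR caches positional embeddings plus text and vision/joint RoPE tensors, and its layout may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the PR's code): build a RoPE cos/sin table once and
// reuse it while the shape stays the same, instead of rebuilding it for every
// denoise step.
struct RopeCache {
    int seq_len  = -1;         // shape the cached table was built for
    int head_dim = -1;
    int builds   = 0;          // number of rebuilds, for illustration only
    std::vector<float> table;  // layout: [seq_len][head_dim / 2][cos, sin]

    const std::vector<float>& get(int n, int d) {
        assert(n > 0 && d > 0 && d % 2 == 0);
        if (n == seq_len && d == head_dim) {
            return table;  // cache hit: values do not depend on the step
        }
        const float theta = 10000.0f;  // assumed RoPE base
        const int half = d / 2;
        table.assign(static_cast<std::size_t>(n) * half * 2, 0.0f);
        for (int p = 0; p < n; ++p) {
            for (int i = 0; i < half; ++i) {
                const float freq  = std::pow(theta, -2.0f * i / d);
                const float angle = p * freq;
                const std::size_t idx = (static_cast<std::size_t>(p) * half + i) * 2;
                table[idx]     = std::cos(angle);
                table[idx + 1] = std::sin(angle);
            }
        }
        seq_len  = n;
        head_dim = d;
        ++builds;
        return table;
    }
};
```

In a 100-step run at a fixed resolution and prompt length, `get` would rebuild the table once and return the cached copy for the remaining steps.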

Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
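
One way to run this kind of before/after comparison is to hash the raw output buffer from each build and compare the digests. The sketch below assumes a 64-bit FNV-1a hash over the float bytes. The PR does not say which hash was used, and `hash_output` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over a byte range (assumed hash choice, not stated in the PR).
std::uint64_t fnv1a64(const void* data, std::size_t n_bytes) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (std::size_t i = 0; i < n_bytes; ++i) {
        h ^= p[i];
        h *= 0x100000001b3ULL;                // FNV prime
    }
    return h;
}

// Hypothetical helper: hash the exact bit pattern of an output tensor, so two
// runs match only if every float is bit-identical.
std::uint64_t hash_output(const std::vector<float>& pixels) {
    return fnv1a64(pixels.data(), pixels.size() * sizeof(float));
}
```

Comparing bit patterns rather than values within a tolerance is deliberately strict: it only makes sense when both runs use the same backend, seed, and `--rng` setting.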

KenForever1 added 3 commits June 18, 2026 15:49


KenForever1 wants to merge 3 commits into leejet:master from KenForever1:master
