
feat: Add inference support for MiniT2I model. #1683

Open: KenForever1 wants to merge 3 commits into leejet:master from KenForever1:master

@KenForever1


Summary

Add inference support for MiniT2I in stable-diffusion.cpp.

This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.

Changes

  • Add MiniT2I model type detection and loading path.
  • Add MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
  • Add MiniT2I conditioner path using google/flan-t5-large.
  • Add MiniT2I sampling path with conditional/unconditional forward and CFG update.
  • Add backend support needed by MiniT2I graph execution.
  • Cache MiniT2I positional embeddings, text RoPE, and vision/joint RoPE in runner-level backend buffers.
  • Remove unused t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
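The conditional/unconditional forward and CFG update listed above can be illustrated with the standard classifier-free guidance formula. This is a minimal sketch, not the code from this PR: `cfg_combine` is a hypothetical helper, and plain Python lists stand in for the backend tensors the runner would actually use.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise.

    `cond` and `uncond` stand in for the model outputs of the conditional
    and unconditional forward passes (hypothetical flat lists here).
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0, 3.0]
uncond = [0.5, 1.0, 1.5]
# Same guidance scale as --cfg-scale 6 in the test commands.
guided = cfg_combine(cond, uncond, scale=6.0)
```

With `scale=1.0` the result equals the conditional output, and with `scale=0.0` it equals the unconditional one, which is a quick sanity check for any CFG implementation.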

Commits

  • b9493fa Add MiniT2I inference support
  • 8de8f95 Optimize MiniT2I position cache
  • dfb6ca2 Remove unused MiniT2I conditioning branch

Models Used

MiniT2I diffusion model:

  • Model: MiniT2I/minit2i-b-16
  • Weight: transformer/diffusion_pytorch_model.safetensors

Text encoder:

  • Model: google/flan-t5-large
  • Weight: model.safetensors

Test Commands

Mac Metal test:

cd stable-diffusion.cpp

./build/bin/sd-cli \
  --backend metal \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /private/tmp/minit2i_metal.png \
  --threads 8

CUDA with diffusion flash attention:

cd stable-diffusion.cpp

./build-cuda/bin/sd-cli \
  --backend cuda \
  --diffusion-fa \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /tmp/minit2i_cuda_diffusion_fa.png \
  --threads 8
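For readers unfamiliar with `--sampling-method euler`, the sketch below shows a generic k-diffusion-style Euler step. It is an illustration only: the PR does not show the MiniT2I sampling code, so the function name, the list-based tensors, and the assumption that the MiniT2I path follows this exact form are all hypothetical.

```python
def euler_step(x, denoised, sigma, sigma_next):
    """One Euler step from noise level `sigma` to `sigma_next`.

    `x` is the current noisy sample and `denoised` the model's estimate of
    the clean sample (both hypothetical flat lists here).
    """
    # Derivative dx/dsigma estimated from the current denoised prediction.
    d = [(xi - di) / sigma for xi, di in zip(x, denoised)]
    dt = sigma_next - sigma
    return [xi + di * dt for xi, di in zip(x, d)]


x = [1.0, 2.0]
denoised = [0.0, 0.0]
halfway = euler_step(x, denoised, sigma=1.0, sigma_next=0.5)
final = euler_step(x, denoised, sigma=1.0, sigma_next=0.0)
```

Stepping all the way to `sigma_next=0.0` lands exactly on the denoised estimate, which is why more steps (such as the 100 used above) mainly refine the prediction along the way.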

Validation Notes

  • MiniT2I generation succeeds on CUDA and Metal.
  • Position/RoPE cache optimization preserves model batch semantics.
  • Removing the unused conditioning branch produced identical output in local validation.
  • CUDA --diffusion-fa works with MiniT2I and significantly reduces the diffusion forward-pass time.

Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
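The idea of computing step-invariant RoPE tables once and reusing them across denoise steps can be sketched as follows. This is a hedged illustration rather than the runner's implementation: `rope_table`, `run_steps`, the `theta` default, and the use of `functools.lru_cache` in place of a runner-level backend buffer are all assumptions made for the example.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def rope_table(seq_len, head_dim, theta=10000.0):
    """Return (cos, sin) tables, each seq_len x (head_dim // 2), as nested tuples.

    The tables depend only on shape parameters, not on the denoise step,
    so they are safe to cache.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    half = head_dim // 2
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos = tuple(tuple(math.cos(p * f) for f in inv_freq) for p in range(seq_len))
    sin = tuple(tuple(math.sin(p * f) for f in inv_freq) for p in range(seq_len))
    return cos, sin


def run_steps(steps, seq_len, head_dim):
    """Hypothetical denoise loop: every step asks for the same tables."""
    tables = None
    for _ in range(steps):
        tables = rope_table(seq_len, head_dim)  # computed on the first call only
        # ... a real runner would build and execute the denoise graph here ...
    return tables
```

Calling `run_steps(100, 16, 8)` computes the tables once and serves the remaining 99 requests from the cache; `rope_table.cache_info()` reports the hit and miss counts.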
Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
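A before/after hash comparison of this kind could look like the sketch below. The PR does not say what was hashed or which algorithm was used, so hashing the output image files with SHA-256 is an assumption, and `sha256_of_file` and `outputs_match` are hypothetical helper names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(path_before, path_after):
    """True if the two output files are byte-for-byte identical."""
    return sha256_of_file(path_before) == sha256_of_file(path_after)
```

For the comparison to be meaningful, both runs need the same prompt, seed, step count, and sampler, as in the test commands above.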