feat: Add inference support for MiniT2I model. #1683
Open
KenForever1 wants to merge 3 commits into
Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
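The caching idea in this commit can be illustrated with a minimal, standalone sketch. It does not reproduce the PR's code: `RopeCache`, `RopeTable`, `RopeKey`, and `count_builds_over_steps` are hypothetical names, the table lives in host memory rather than a backend buffer, and the rotary table layout is an assumption. It only shows the pattern of building a step-invariant table once and reusing it on every denoise step.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical cache key: every parameter that changes the table must be in it.
using RopeKey = std::tuple<int, int, float>;  // (seq_len, head_dim, theta)

// Hypothetical table layout: interleaved [cos, sin] per (position, frequency).
struct RopeTable {
    std::vector<float> cos_sin;
};

class RopeCache {
public:
    // Returns the cached table, building it only on the first request per key.
    const RopeTable& get(int seq_len, int head_dim, float theta) {
        RopeKey key{seq_len, head_dim, theta};
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // step-invariant: reuse without recomputing
        }
        ++builds_;
        return cache_.emplace(key, build(seq_len, head_dim, theta)).first->second;
    }

    int builds() const { return builds_; }

private:
    static RopeTable build(int seq_len, int head_dim, float theta) {
        const int half = head_dim / 2;
        RopeTable t;
        t.cos_sin.resize(static_cast<std::size_t>(seq_len) * half * 2);
        for (int pos = 0; pos < seq_len; ++pos) {
            for (int i = 0; i < half; ++i) {
                // Standard rotary frequency: theta^(-2i / head_dim).
                float freq  = std::pow(theta, -2.0f * i / head_dim);
                float angle = pos * freq;
                std::size_t idx = (static_cast<std::size_t>(pos) * half + i) * 2;
                t.cos_sin[idx]     = std::cos(angle);
                t.cos_sin[idx + 1] = std::sin(angle);
            }
        }
        return t;
    }

    std::map<RopeKey, RopeTable> cache_;
    int builds_ = 0;
};

// Simulates a denoise loop: each step asks for the same table.
int count_builds_over_steps(int steps) {
    RopeCache cache;
    for (int s = 0; s < steps; ++s) {
        (void)cache.get(256, 64, 10000.0f);
    }
    return cache.builds();
}
```

In this sketch, `count_builds_over_steps(20)` builds the table once and serves the remaining 19 requests from the cache. A real runner would also have to decide when to invalidate the entry, for example when the resolution or text length changes.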
Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
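The kind of before/after hash check mentioned here could look like the following sketch. The PR does not say which hash or tooling was used, so this assumes 64-bit FNV-1a over the raw bytes of the output buffer; `hash_output` and `outputs_identical` are hypothetical helpers, not functions from stable-diffusion.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over the raw bytes of a float buffer (assumed hash choice).
std::uint64_t hash_output(const std::vector<float>& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(data.data());
    const std::size_t n = data.size() * sizeof(float);
    for (std::size_t i = 0; i < n; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// True when both runs produced buffers of equal size with equal hashes.
bool outputs_identical(const std::vector<float>& before,
                       const std::vector<float>& after) {
    return before.size() == after.size() &&
           hash_output(before) == hash_output(after);
}
```

Hashing the raw bytes is stricter than a tolerance-based comparison: any bit-level difference in the output changes the hash, so it suits a cleanup that is expected to leave the computation untouched. It only makes sense when the seed, sampler settings, and backend are held fixed between the two runs.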
Summary
Add inference support for MiniT2I in stable-diffusion.cpp.
This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.
Changes
- MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
- Text conditioning with google/flan-t5-large.
- Removal of the t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
Commits
- b9493fa Add MiniT2I inference support
- 8de8f95 Optimize MiniT2I position cache
- dfb6ca2 Remove unused MiniT2I conditioning branch
Models Used
MiniT2I diffusion model:
- MiniT2I/minit2i-b-16
- transformer/diffusion_pytorch_model.safetensors
Text encoder:
- google/flan-t5-large
- model.safetensors
Test Commands
Mac Metal test:
CUDA with diffusion flash attention:
Validation Notes
--diffusion-fa works with MiniT2I and reduces stable diffusion forward time significantly.