Author of "Reference-to-Video Generation" on arXiv
Zijian Zhou as author
Saber is a zero-shot framework for reference-to-video (R2V) generation that is trained only on video-text pairs. It preserves subject identity without explicit R2V training datasets and outperforms models trained on costly triplet data.
“Author of "Reference-to-Video Generation" on arXiv”
Mixture of States (MoS) introduces a token-wise router for multimodal diffusion models that achieves state-of-the-art text-to-image generation and editing with 3B-5B parameters, matching models 4× larger while requiring minimal computational overhead.
“Author of "Mixture of States: Routing Token-Level Dynamics for Multimodal Generation" on arXiv”