🎭 OmniAvatar - Lipsynced Avatar Video Generation

Generate lipsynced avatar videos from a reference image and an audio file. Based on Wan2.1 with OmniAvatar enhancements for audio-driven avatar animation. Note: this Gradio Space demo uses Wan2.1 1.3B, not the 14B variant. Generating a 4-second video (like those in the examples) takes about 4 minutes, so we recommend duplicating this Space.

Reference Avatar Image

Speech Audio File

Video Description

Seed used

Generated Avatar Video

Example Inputs

Reference Avatar Image	Speech Audio File	Video Description

📝 Notes

The reference image should be a clear frontal view of the person
Audio should be clear speech without background music
Generation may take several minutes depending on video length
For best results, use high-quality input images and audio
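
To get a rough idea of the wait before submitting a clip, the timing figure quoted on this page (about 4 minutes for a 4-second video) can be turned into a small estimate. The sketch below is not part of the demo. It assumes that generation time scales linearly with clip length and that the output video is as long as the input audio, neither of which the page states. The helper names `wav_duration_seconds`, `estimate_generation_minutes`, and `make_silent_wav` are hypothetical, and only WAV input is handled because it uses Python's standard `wave` module.

```python
import io
import wave

# Assumption: generation time scales linearly with clip length, using the
# page's own figure of roughly 4 minutes for a 4-second video.
MINUTES_PER_VIDEO_SECOND = 4 / 4


def wav_duration_seconds(wav_file) -> float:
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_generation_minutes(audio_seconds: float) -> float:
    """Rough wait-time estimate under the linear-scaling assumption above.

    Also assumes the generated video is as long as the input audio.
    """
    return audio_seconds * MINUTES_PER_VIDEO_SECOND


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer
```

For example, `estimate_generation_minutes(wav_duration_seconds("speech.wav"))` would give an approximate wait in minutes for a local file named `speech.wav` (a placeholder path). Actual times on the Space may differ with hardware and queue load.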