🎭 OmniAvatar - Lipsynced Avatar Video Generation

Generate lipsynced avatar videos from a reference image and an audio file. The model is based on Wan2.1 with OmniAvatar enhancements for audio-driven avatar animation. Note: this Gradio Space demo uses Wan2.1 1.3B, not Wan 14B. Generating a 4-second video (like those in the examples) takes about 4 minutes, so we recommend duplicating this Space.

Example Inputs
Reference Avatar Image, Speech Audio File, Video Description

📝 Notes

  • The reference image should be a clear frontal view of the person
  • Audio should be clear speech without background music
  • Generation may take several minutes depending on video length
  • For best results, use high-quality input images and audio
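
As a rough planning aid for the timing note above, the following Python sketch estimates generation time from the length of a speech file. It is not part of the demo. It assumes the input is an uncompressed WAV file, that the generated video is as long as the audio, and that generation time scales linearly at the rate quoted in the description (about 4 minutes for a 4-second video). The function names are hypothetical, and actual timings will vary with hardware and settings.

```python
import wave

# Rate taken from the demo description: about 4 minutes per 4 s of video.
# Assumption: generation time scales linearly with clip length.
MINUTES_PER_VIDEO_SECOND = 4 / 4


def audio_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_generation_minutes(duration_seconds):
    """Rough estimate only (hypothetical helper, linear-scaling assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds * MINUTES_PER_VIDEO_SECOND
```

For example, `estimate_generation_minutes(audio_duration_seconds("speech.wav"))` gives a ballpark figure before submitting a job, where `speech.wav` is a placeholder file name.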