daVinci-MagiHuman
Open-source AI generating lip-synced talking videos from a single photo and audio/text.
daVinci-MagiHuman is an open-source, 15B-parameter AI model developed by Sand.ai and GAIR Lab at Shanghai Jiao Tong University. It generates high-quality, lip-synced talking videos from a single portrait image and a script or audio file. Unlike traditional methods that chain separate text-to-speech and video pipelines, daVinci-MagiHuman uses a unified single-stream Transformer that denoises video and audio tokens jointly. It is released under the Apache 2.0 license, so users can inspect the weights, run inference locally, and use the model commercially. It is optimized for speed and can generate short clips in seconds on data-center GPUs such as the NVIDIA H100.
- Creating AI-powered marketing avatars from static portraits
- Developing multilingual educational content with synchronized lip motion
- Generating low-latency digital humans for interactive applications
- Prototyping realistic talking head animations for social media
To use daVinci-MagiHuman, upload a clear, front-facing portrait photo and provide a script or audio file. Select your desired output resolution (e.g., 256p, 720p, or 1080p) and start the generation process. Once the AI completes the job, you can download your talking video. For local deployment, users can download the model checkpoints from Hugging Face and follow the provided CLI instructions.
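For local deployment, the steps above could be prepared with a small Python helper along the following lines. This is only a sketch under stated assumptions: `build_job`, `download_checkpoints`, the accepted image suffixes, and the rule that exactly one of script or audio is supplied are all hypothetical and not part of the project's tooling. The repository id is deliberately left as a parameter, because the listing does not give it. The only third-party call is `snapshot_download` from the `huggingface_hub` package.

```python
from pathlib import Path
from typing import Optional

# Resolutions named in the description above; the real tool may accept others.
RESOLUTIONS = ("256p", "720p", "1080p")
# Assumption: common portrait formats. The project may support a different set.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_job(portrait: str, resolution: str,
              script: Optional[str] = None,
              audio: Optional[str] = None) -> dict:
    """Validate the inputs for one generation job (hypothetical helper)."""
    if Path(portrait).suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"unsupported portrait format: {portrait}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    # Assumption: a job is driven by either a script or an audio file, not both.
    if (script is None) == (audio is None):
        raise ValueError("provide exactly one of script or audio")
    return {"portrait": portrait, "resolution": resolution,
            "script": script, "audio": audio}


def download_checkpoints(repo_id: str, target_dir: str) -> str:
    """Fetch a model repository from Hugging Face into target_dir.

    repo_id must be taken from the official model page; none is assumed here.
    """
    # Imported lazily so build_job works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=target_dir)


if __name__ == "__main__":
    job = build_job("portrait.png", "720p", script="Hello and welcome.")
    print(job)
```

The validated job dictionary is not something the model's CLI consumes directly; the actual inference command and its flags should be taken from the project's own instructions.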
