AsianFin — JD Yanshi, a leading AI innovation lab under JD.com, has officially launched its cutting-edge voice synthesis model, LiveTTS, along with its Universal Digital Human Model 2.0. Together, the releases deliver improved voice quality and more precise lip-syncing for digital humans, aiming to make communication between humans and machines feel more natural.
The new LiveTTS model supports zero-shot voice cloning, enabling the creation of realistic voice replicas without prior training data, and allows premium voices to be fine-tuned for specific needs. The model also improves synchronization between a digital human's voice and its lip movements, a crucial upgrade for applications in live streaming, customer service, marketing, and outbound calling.
In the SeedTTS test-hard evaluation, LiveTTS outperformed models from other leading voice synthesis providers, reducing Character Error Rate (CER) by 0.2 to 5.12 percentage points. At the upper end, that translates to roughly 512 fewer character errors per 10,000 characters, making it one of the most accurate models on the market.
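For context, here is a minimal sketch of how CER is typically computed (as character-level edit distance divided by reference length) and how a percentage-point reduction scales to error counts. The sample strings are illustrative only and are not drawn from the SeedTTS benchmark.

    def cer(reference: str, hypothesis: str) -> float:
        """CER = (substitutions + deletions + insertions) / reference length,
        computed here via Levenshtein edit distance over characters."""
        m, n = len(reference), len(hypothesis)
        # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i
        for j in range(n + 1):
            dp[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + cost) # substitution
        return dp[m][n] / m

    print(cer("hello world", "hella world"))  # 1 substitution / 11 chars ≈ 0.0909

    # A 5.12-percentage-point CER reduction, scaled to 10,000 characters:
    reduction = 0.0512
    print(f"{reduction * 10_000:.0f} fewer character errors per 10,000 characters")  # 512

This is why a seemingly small percentage gap matters at scale: every 0.01 percentage point of CER corresponds to one additional error per 10,000 characters synthesized.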
Explore more exclusive insights at nextfin.ai.