🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
-
Updated
Aug 16, 2024 - Python
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
End-to-End Speech Processing Toolkit
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
so-vits-svc fork with realtime support, improved interface and more features.
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Foundational model for human-like, expressive TTS
Speech To Speech: an effort for an open-sourced and modular GPT4-o
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Converts text to speech in realtime
Lingvo
WaveNet vocoder
DeepMind's Tacotron-2 Tensorflow implementation
Add a description, image, and links to the speech-synthesis topic page so that developers can more easily learn about it.
To associate your repository with the speech-synthesis topic, visit your repo's landing page and select "manage topics."