Creating Custom Voices with Piper
Introduction to Piper
Piper is a fast, local text-to-speech engine optimized for low-end hardware such as the Raspberry Pi. It uses VITS, an end-to-end speech synthesis model, and exports voices in the ONNX format so they can be run with ONNX Runtime.
Although Piper is widely used with screen readers such as NVDA, its potential extends far beyond them. Its performance keeps improving, making it practical for everyday tasks such as web browsing, reading email, and following social media.
Creating Your Dataset
For TTS, your dataset should include:
- Audio files: mono .wav files at 16 kHz or 22.05 kHz, with 16-bit samples
- Text transcripts: Formatted according to LJSpeech conventions
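Before uploading, it helps to confirm that every recording meets these requirements. The sketch below uses only Python's standard wave module. It assumes your clips sit together in one folder and treats 16000 Hz and 22050 Hz (22.05 kHz) as the accepted rates; check_wav and check_dataset are hypothetical helper names, not part of Piper.

```python
import wave
from pathlib import Path

# Sample rates accepted by this check (16 kHz and 22.05 kHz).
ALLOWED_RATES = {16000, 22050}


def check_wav(path):
    """Return a list of problems for one file; an empty list means it passed."""
    try:
        with wave.open(str(path), "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave reads only integer PCM WAV files; empty or other formats fail here.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if rate not in ALLOWED_RATES:
        problems.append(f"unexpected sample rate {rate} Hz")
    return problems


def check_dataset(wav_dir):
    """Map each non-conforming .wav file name in wav_dir to its problems."""
    report = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report
```

Calling check_dataset on your audio folder returns an empty dictionary when every file passes; otherwise it lists what to fix for each file, such as converting stereo recordings to mono.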
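Each line of the transcript file (metadata.csv in the LJSpeech layout) holds an audio file's name without the .wav extension, a pipe character, and the text spoken in that file: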
audio1|This is the first sentence.
audio2|This is the second sentence.
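A quick consistency check catches the most common transcript mistakes: malformed lines and IDs with no matching audio file. This sketch assumes the transcript is UTF-8 and that each ID corresponds to a file named ID.wav in a single audio folder; read_metadata and find_missing_audio are hypothetical helpers. It takes the last pipe-separated column as the text, so it also tolerates LJSpeech's original three-column layout.

```python
from pathlib import Path


def read_metadata(csv_path):
    """Parse 'id|text' lines into a list of (id, text) tuples."""
    entries = []
    with open(csv_path, encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            parts = line.split("|")
            utt_id, text = parts[0].strip(), parts[-1].strip()
            if len(parts) < 2 or not utt_id or not text:
                raise ValueError(f"line {line_no}: expected 'id|text', got {line!r}")
            entries.append((utt_id, text))
    return entries


def find_missing_audio(entries, wav_dir):
    """Return the IDs whose <id>.wav file does not exist in wav_dir."""
    wav_dir = Path(wav_dir)
    return [utt_id for utt_id, _ in entries if not (wav_dir / f"{utt_id}.wav").is_file()]
```

For the two example lines above, read_metadata yields the pairs ("audio1", "This is the first sentence.") and ("audio2", "This is the second sentence."), and find_missing_audio reports any ID whose recording has not been added yet.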
For best results, make clear recordings with minimal background noise. Studio quality is ideal, but a quiet room and a decent microphone can suffice. Aim for at least five minutes of audio to start, though more data generally yields better results.
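To see how close you are to the five-minute starting point, you can total the length of your clips. This minimal sketch assumes uncompressed PCM .wav files in one folder; total_audio_minutes and has_enough_audio are hypothetical names. Duration is simply the frame count divided by the sample rate.

```python
import wave
from pathlib import Path


def total_audio_minutes(wav_dir):
    """Sum the durations of all .wav files in wav_dir, in minutes."""
    total_seconds = 0.0
    for path in Path(wav_dir).glob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # duration in seconds = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60


def has_enough_audio(wav_dir, minimum_minutes=5.0):
    """True if the folder holds at least minimum_minutes of audio."""
    return total_audio_minutes(wav_dir) >= minimum_minutes
```

The total includes any leading and trailing silence in each clip, so the amount of actual speech may be somewhat lower than reported.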
Training Your Model
This guide trains Piper models in Google Colab, a cloud-based Jupyter notebook environment that provides the GPUs needed to train in a reasonable amount of time. The process involves:
- Uploading your dataset to Google Drive
- Setting up the training environment in Colab
- Configuring and initiating the training process
- Exporting the trained model for use
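The Colab notebooks automate these steps. For reference, Piper's own training documentation (TRAINING.md in the Piper repository) describes an equivalent command-line flow built on the piper_train Python package: preprocess the LJSpeech-style dataset, train (typically fine-tuning from a pre-trained checkpoint), then export to ONNX. The sketch below only assembles those commands and prints them. The flags are taken from that document and may change between Piper versions, and every path is a placeholder, so verify them against the version you install.

```python
import shlex


def preprocess_cmd(dataset_dir, training_dir, language="en-us", sample_rate=22050):
    """Convert an LJSpeech-style dataset into Piper's training format."""
    return [
        "python3", "-m", "piper_train.preprocess",
        "--language", language,
        "--input-dir", str(dataset_dir),
        "--output-dir", str(training_dir),
        "--dataset-format", "ljspeech",
        "--single-speaker",
        "--sample-rate", str(sample_rate),
    ]


def train_cmd(training_dir, base_checkpoint, validation_split=0.05,
              batch_size=32, max_epochs=10000):
    """Fine-tune from an existing checkpoint on a single GPU."""
    return [
        "python3", "-m", "piper_train",
        "--dataset-dir", str(training_dir),
        "--accelerator", "gpu",
        "--devices", "1",
        "--batch-size", str(batch_size),
        "--validation-split", str(validation_split),
        "--max_epochs", str(max_epochs),
        "--resume_from_checkpoint", str(base_checkpoint),
        "--checkpoint-epochs", "1",
        "--precision", "32",
    ]


def export_cmd(checkpoint, onnx_path):
    """Export a trained checkpoint to an ONNX model file."""
    return ["python3", "-m", "piper_train.export_onnx", str(checkpoint), str(onnx_path)]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own.
    for cmd in (
        preprocess_cmd("path/to/dataset", "path/to/training"),
        train_cmd("path/to/training", "path/to/pretrained.ckpt"),
        export_cmd("path/to/trained.ckpt", "my_voice.onnx"),
    ):
        print(shlex.join(cmd))
```

After exporting, the same documentation copies the config.json written during preprocessing next to the model as my_voice.onnx.json; Piper expects each .onnx model to have its .onnx.json config beside it.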
Use these Colab notebooks for training and exporting your model:
When configuring training, consider increasing the validation split to 0.05, depending on the size of your dataset. For English voices, choose US English, which is recommended for the best results with the available pre-trained models.
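The validation split is the fraction of utterances held out from training and used only to check how well the model generalizes. The sketch below is a generic illustration of that idea, not Piper's own splitting code; split_dataset and its fixed seed are assumptions. It shows why the right value depends on dataset size: a 0.05 split holds out 5 of 100 clips but only 1 of 20.

```python
import random


def split_dataset(entries, validation_split=0.05, seed=1234):
    """Shuffle entries reproducibly and hold out a fraction for validation.

    Returns (train, validation). Generic illustration, not Piper's code.
    """
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    n_val = int(len(shuffled) * validation_split)  # rounds down
    return shuffled[n_val:], shuffled[:n_val]
```

With a very small dataset, even a few held-out clips are a noticeable share of your training material, which is the trade-off the split setting controls.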
Testing and Using Your Model
After training, you can test your model using the provided Colab notebooks. These allow you to synthesize speech from text input, giving you a chance to evaluate and refine your model.
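You can also try a finished voice outside Colab with Piper's command-line program, which reads text on standard input and writes a WAV file. This sketch assumes the piper command is on your PATH (for example after pip install piper-tts) and that the model's config sits next to it as the model file name plus .json; piper_command and synthesize are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def piper_command(model_path, output_wav):
    """Build the piper CLI call that writes synthesized speech to output_wav."""
    return ["piper", "--model", str(model_path), "--output_file", str(output_wav)]


def synthesize(text, model_path, output_wav):
    """Speak text with a local Piper voice; raise if the config is missing."""
    model_path = Path(model_path)
    config_path = model_path.with_name(model_path.name + ".json")
    if not config_path.is_file():
        raise FileNotFoundError(f"voice config not found: {config_path}")
    # piper reads the text to speak from standard input.
    subprocess.run(piper_command(model_path, output_wav),
                   input=text.encode("utf-8"), check=True)
```

For example, synthesize("This is the first sentence.", "my_voice.onnx", "test.wav") writes test.wav, which you can play back to judge pronunciation and audio quality.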
Once you are satisfied with your model, you can use it with compatible software such as the NVDA screen reader (with the appropriate add-on) or any other application that supports Piper voices.
To install a new voice in NVDA:
- Go to the NVDA settings
- Locate the Piper category
- Find the button to install voices from a local archive
- Choose the voice archive you want and press Enter
Join the Piper Community
Creating custom voices with Piper is an exciting journey into the world of AI-driven speech synthesis. Whether you're looking to create voices for accessibility purposes, creative projects, or just out of curiosity, Piper offers a powerful and flexible platform.
We encourage you to experiment, share your experiences, and contribute to the growing Piper community. Your innovations could help shape the future of accessible and personalized text-to-speech technology!
For more detailed instructions and updates, please refer to the original guide.