Custom AI TTS Voices for Twitch
Nowadays I feel like 'artificial intelligence' is appended to nearly anything. It was only a matter of time before Twitch was littered with the phrase.
I've built this website up around Twitch Channel Points, but that doesn't fully satiate my mind. In December of 2020 I took a dive into AI/ML TTS, and I think it's ready.
Custom voice for Brittballs
I was watching Brittballs (www.twitch.tv/brittballs) during her uncapped subathon (which at the time of writing is still live, and has been going on for 600+ hours). I was doing some chatting, as any Twitch user should, and somehow the topic of a custom AI TTS voice came up, which, honestly, was lucky timing. Prior to this interaction, I was just wrapping up streamlining the process of custom voice creation.
I've talked to many Twitch streamers about having their own TTS voice, but the main hurdle is obtaining audio of the streamer, and not just any audio. Clean audio. I can do the hard work of sifting through the streamer's VODs and transcribing the audio myself, but it's like trying to hammer a nail into concrete with a spoon. It will work, eventually, maybe. Manual transcription is not even worth the effort either. All that hard work usually turns into a bad voice because of the dirty audio. The goal is to have the audio clean, and also transcribed.
One perk of approaching streamers for their own custom TTS voice is their microphone. There's a reason why people call their stream room their studio. It's a miniature radio station most of the time, and Brittballs has the cream of the crop, the crème de la crème of microphones: the Shure SM7B with a Cloudlifter.
Anyways, back to me chatting in Brittballs' stream. It really was just lucky timing with it all. Britt showed interest in having her own TTS voice, she has a phenomenal microphone to record the audio with, and she also had time. Many, many hours of time on her hands, so I told Britt what I've been working on behind the scenes. She was not hesitant to record the audio, so I sent over the program to record it.
The program I've made is kinda simple. It makes recording the required lines nice and easy. There is not really a magic number for how many lines of audio one has to record, but the more the merrier. As of right now there are ~700 lines in the total list, and while you don't have to record them all, the more you record the better your custom voice will end up sounding. Britt recorded all 700+ lines in an hour and a half on stream. I will say it again, you don't have to record all 700, but her voice turned out really great.
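For the curious, here is a minimal sketch of the kind of bookkeeping a recording helper like this could do: pair each recorded clip with the line that was read, and report which lines are still missing. This is not the actual program. The clip naming (`line_0001.wav`), the pipe-delimited manifest, and the `build_manifest` function are all assumptions made up for illustration.

```python
import csv
from pathlib import Path


def build_manifest(prompts, clip_dir, manifest_path):
    """Pair each recorded clip with its prompt text.

    Assumes (hypothetically) that the clip for prompt number N is saved
    as line_NNNN.wav inside clip_dir. Returns the prompts with no clip yet.
    """
    clip_dir = Path(clip_dir)
    missing = []
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        for number, text in enumerate(prompts, start=1):
            clip = clip_dir / f"line_{number:04d}.wav"
            if clip.exists():
                # One row per recorded line: audio file name, then transcript.
                writer.writerow([clip.name, text])
            else:
                # Skipped lines are fine, they just don't go into the dataset.
                missing.append(text)
    return missing
```

Because the speaker reads from a known list, every clip comes with its transcript for free, which is exactly the "clean and transcribed" audio that is so painful to get out of VODs.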
Britt sent over the voice lines.
Britt’s AI voice
The morning Britt sent over the voice lines I had class, so I woke up early and started training the AI TTS model before I left for class. When I got home the room was hot and toasty from my GPU doing a bunch of work, but the voice model was done.
I added the model to my server, and let Brittballs know it was completed and ready for use. I tried to get a clip that has Britt talking shortly after the TTS went through.
How to get your own AI TTS Voice
Currently the only way to get your own TTS voice with my system is to DM me on Twitter. My DMs should be open. If not, email me at [email protected] and let me know you are looking for a custom AI TTS voice. As of right now, the AI TTS service I offer is only for Twitch Channel Points, but I plan on adding Bits and Subscriptions later.
A rundown of how it works
- You DM or email me asking for a custom TTS voice
- I send over the program needed to record the audio
- You record the audio (should take about one and a half hours)
- I take that data, and put it into the machine learning algorithm
- I get the voice model ready for use, and I let you know once it's good to go
- If you've never used my system before, check out the Getting Started page, otherwise all you will need to do is change the Voice ID on the TTS reward.
What you will need is:
It's $25/month for everything. That includes data processing, model training, model hosting and rendering of the TTS audio. There is no cap on usage. All you gotta do is record some audio, and add a browser source to OBS. :)