A review of text-to-speech (TTS) models.
Kling TTS was my favourite. $0.007 per generation. I used it for a short video, with the UK old man 3 voice.
ElevenLabs TTS Turbo v2.5. $0.05 per 1000 characters. Used Daniel (male, authoritative, news, commanding). Examples of other voices: https://help.scenario.com/articles/2103421335-elevenlabs-text-to-speech-models-the-essentials#4-2-voice-selection
Inworld TTS-1.5 Max. Used Craig – most similar to the above two voices.
VibeVoice, 0.5B parameters. Generates long speech snippets fast using Microsoft’s TTS. $0.02 per minute. I generated a 2-minute audio clip and one sentence was not spoken coherently. So quality can be low, which is not entirely surprising for a 0.5-billion-parameter model.
xAI Text to Speech. I currently use this for my audio news digests, which run approx 10 mins each. $0.0042 per 1000 characters. Max 15,000 characters. Examples of costs: 10 cents for 7 mins, 12 cents for 11 mins.
Note: xAI TTS is moving out of beta. Pricing is updated to $15 per 1M characters, i.e. $0.015 per 1000 characters (previously $4.20 per 1M, i.e. $0.0042 per 1000).
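To compare the per-character prices quoted above, here is a rough sketch of the arithmetic. The prices come from these notes. The 10,000-character script length is only an assumed figure for illustration, and estimate_cost is a made-up helper name, not part of any provider's API.

```python
# Prices in USD per 1,000 characters, as quoted in the notes above.
PRICE_PER_1K_CHARS = {
    "ElevenLabs TTS Turbo v2.5": 0.05,
    "xAI TTS (beta pricing)": 0.0042,
    "xAI TTS (post-beta pricing)": 0.015,  # $15 per 1M characters
}


def estimate_cost(num_chars: int, price_per_1k: float) -> float:
    """Estimated cost in USD for synthesising num_chars characters."""
    return num_chars / 1000 * price_per_1k


# Assumed script length for illustration only, not a measured value.
script_chars = 10_000

for model, price in PRICE_PER_1K_CHARS.items():
    print(f"{model}: ${estimate_cost(script_chars, price):.3f}")
```

Kling (per generation) and VibeVoice (per minute) are priced in different units, so they are left out of this comparison.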
MiniMax Speech 2.8 has been a favourite in the past. Turbo is great and they also have HD.