Sad Girl Robot. Image by 0fjd125gk87 (Pixabay)

By Jeremy Bursey

Jeremy Bursey writes short stories, essays, poems, novels, and screenplays. He appreciates feedback for anything he offers to the public. He also takes too many pictures of cats and the ocean.

April 29, 2022

AI Human Voices: A Blakify Review

Products I Use Series: Part 1

Note: This article is the first of many I’ll be writing in 2022 that cover the tools and apps I use to improve my writing and publishing workflow. To stay up to date with the latest news, please be sure to sign up for my mailing list or subscribe to my push notification system.

Update (6/26/22): Just a point of correction: Blakify uses voice libraries from not only Google, but also Amazon, Microsoft, and IBM. Any time I refer to Google voices, I actually mean all of them.

Update (3/14/24): Even though services like Blakify (renamed UTRRR in late 2022) exist under different brand names, Blakify/UTRRR itself has recently closed its doors because it couldn’t keep up with costs, so it no longer works. If you choose to read this article, read it for the information about text-to-speech software and replace Blakify’s name with that of any comparable service, like Odio, which is the one I’m using now to do exactly what Blakify did. Also note that I’m removing two links from this article that have since gone dead.

The Evolution of Convenience

AI technology is the hot new thing in the 21st century, along with social media, fad diets, and global pandemics. But do we really need so much AI invading our way of life, and can we trust it as far as we can throw it?

That last part’s a trick question, by the way. We can’t hold AI, so we can’t throw it. But you knew that. You’re a free thinker.

Now, regardless of what our instincts may tell us, in some cases, if not many, we can confidently say that, yes, we really do need so much AI invading our lives. If we’re honest, we’ll agree that it saves us time auto-accomplishing tasks that we’d normally have to do ourselves.

Take speech, for example. If I wanted to provide you with an audio version of this article, under 20th century rules (and maybe even early 2000s rules), I’d have to record myself reading this article into a microphone, check it for glitches, scratches, or hiccups, then edit out the mistakes, possibly re-record the ultra screw-ups, and somehow convert the speech into a digitized audio file or process the one I’d recorded out of the box. The entire cycle could take an hour or longer. If I wanted a professional recording, I’d need the better part of a day.

All for just this one article.

That’s a lot of work and a lot of time for potentially minimal payoff. After all, how do I know you want to read this article? And how do I know you want to listen to the audio version?

Now, I realize you’re reading it now, so the answer to my first question is obvious. But are you listening to it? Maybe, maybe not.

Girl with headphones. Image by Yuri Manei (Pexels)

Truth is, I don’t know your preference, but I still want to give you the option to choose. So, I’d want to provide you with an audio version of this text that’s as rich in quality as a desert is rich in sand. But I also know my voice, and I know my processing abilities. While my voice is clear some of the time, I can’t help but mumble at other times. And, as hard as I would try to deliver a solid recording experience on my own, I just don’t have the vocal skill or the teeth to pull it off.

So, if I really wanted to give you quality output, I’d hire a voice actor to read it for me.

Now, that would save me time, which, if you recall, is the point of having AI technology. If I can hire a voice actor to do the work for me, then I can reclaim my productivity and let the voice actor worry about the nuances of voice production. But the trade-off is that I’d have to pay real money for it, and voice actors, especially the professional ones, don’t come cheap. Is it still worth it for me, then, to outsource the voice work when I can’t even be sure you’ll listen to it?

There’s got to be a better way.

Well, for the professional voice actor, there is no better way. But for me, I could resort to AI technology to close the time and budget gaps that form as a result of my desire to bring you quality.

Guy with microphone. Image by Pexels (Pixabay)

Google Has Voices, Lots of Voices

If you’re at all familiar with Google, then you’ll know that, within its ever-expanding wings of global information dominance, it features a department focused on AI voices. With over 700 voice types to choose from, spanning various ages, genders, languages, and accents, as well as adjustable speeds and inflections, Google has an entrenched position in the market for quality audio narrations for just the price of a token. To get results, you simply copy and paste your text into a plain-looking input box, hit “convert” or “play” or whatever the latest, hottest lingo may be, and listen as your brilliant words automagically go from text to speech. It’s quite revolutionary.

Well, it’s also a technology that’s been around for a decade or more, but like any technology, its life began a bit on the rough side.

If you recall the old text-to-speech readers on a PDF, you may remember them sounding exactly like a robot that sounds out its words by the letter. “Hy-my-fry-end. How-ur-yoo?” This was the behavior during a time when digital video files were still pixelated and about the size of your thumb.

But like all improvements to technology, AI human voices got better. The program-sounding robot eventually became self-aware and figured out how to not only sound more human, “Um, how are you, friend?” but also sound like a human of any age, gender, region, or attitude, “Hey, how’s it going, my awesome friend, dude?”

At some point, as we should assume if we know anything about progress or innovation, the tech industry would want to capitalize on this emerging voice technology. After all, the AI tech industry knows that what we really want is not a gimmick but for someone else to do our work for us.

Enter Blakify.

Image: Blakify Homepage

No, Really, Enter Blakify

As of now, certain reading apps will provide native support for text-to-speech audio and offer a small selection of fake voice actors to listen to. It gets the job done, but sometimes the quality of the voice sounds like it was recorded in 2008, and listeners today would much rather hear the luscious tones of a fake voice actor from 2022. In the case of native reading apps with built-in voice support, the user will have to take what’s offered.

But for content creators who want to convert their written words into audio without hiring a professional or doing it themselves, and post that audio file into a market or storefront that they can control, a sophisticated tool that draws from the entire pool of available Google voices would be highly useful. It would certainly save a lot of time.

Blakify is an app that can do exactly that.

Image: Blakify User Dashboard

By uploading a text document or pasting the text directly into an input box, the user can queue a text-to-speech conversion that requires minimal editing while still offering maximum quality output. Depending on the chosen voice and the selected word, the robotic twang may still surface in the pronunciation, especially if that voice comes from an older library. But as new voice technologies like neural voices emerge, more are capable of synthesizing normal speech, and the gulf between real and simulated voices shrinks to almost imperceptible levels.

By this metric, the need for professional voice actors to narrate your text also shrinks to imperceptible levels.

Where this benefits the blog, e-book, or related author like me is that the barriers to entry into the audiobook orbit vanish. If the writer can afford a monthly fee to maintain the service, then they can dip back into the well of supporting voices any time they want and convert their latest article or story into an audio format that can increase their audience reach.

Imagine a world where a single tool can double an article’s impact just by turning the written word into the spoken word. All through the low-friction world of AI voice technology.

The question you may be asking now is, how well does it work?

Well, if you’re listening to this article, then you already hear the results.

Listening for words. Image by Andrea Piacquadio (Pexels)

Using Blakify to Maximize Text-to-Speech Effectiveness

So, how does Blakify work exactly? Well, as I said earlier, you just import a text file into the system or copy one into the input box. But if you want variety in your delivery, such as having more than one speaker deliver your lines, then you’ll need to adjust your import strategy.

This is what I recommend:

Don’t bother with the import button. Although it’s better now than when I’d first bought the app, it’s still limited in its effectiveness. I recommend copying and pasting your text directly into the box. This is for two reasons.

  1. You have more control over the text layout, meaning you can provide better voice direction when you strip out document formatting or unnecessary sections. By copying and pasting in smaller chunks, you can also track mispronounced words more easily.
  2. By pasting text in chunks, you can actually assign different blocks of text to different AI voice actors. So, for example, if you’d rather have Christopher read headers and Guy read body paragraphs (my use case), you can do that. Blakify gives you the option to add new blocks into the same recording so that you can keep better control of your actors, their speed and inflections, and ultimately their delivery of your content to your listeners.
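If you like to plan your blocks before you start pasting, here is a minimal sketch of the header/body split described above. It is only an offline prep aid, not anything Blakify provides. It assumes you mark headers in your draft with a leading "# " (my own convention for this example), and the names `Block`, `plan_blocks`, and `HEADER_MARK` are all hypothetical.

```python
from dataclasses import dataclass

# Assumptions for illustration: the voice names come from the example above,
# and the "# " header marker is a made-up drafting convention.
HEADER_VOICE = "Christopher"
BODY_VOICE = "Guy"
HEADER_MARK = "# "


@dataclass
class Block:
    voice: str  # which AI voice actor should read this block
    text: str   # the text to paste into that block's input box


def plan_blocks(draft: str) -> list[Block]:
    """Split a draft on blank lines and assign each paragraph a voice."""
    blocks: list[Block] = []
    for paragraph in draft.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if paragraph.startswith(HEADER_MARK):
            voice, text = HEADER_VOICE, paragraph[len(HEADER_MARK):].strip()
        else:
            voice, text = BODY_VOICE, paragraph
        # Keep consecutive paragraphs for the same voice in one block,
        # so there are fewer boxes to create and paste into.
        if blocks and blocks[-1].voice == voice:
            blocks[-1].text += "\n\n" + text
        else:
            blocks.append(Block(voice, text))
    return blocks


if __name__ == "__main__":
    draft = "# Enter Blakify\n\nBlakify is an app.\n\nIt saves time."
    for block in plan_blocks(draft):
        print(f"[{block.voice}] {block.text!r}")
```

Each resulting block maps to one input box: pick the listed voice, paste the text, and move on to the next.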
Image: Blakify Text-to-Speech Dashboard

Now, once you’ve decided on a system for import, you’ll need to ensure the best experience for export. So, consider these steps when prepping your audio file.

  1. You need to figure out which voices you want to use. First, visit the “Voices” panel and sample those that best match your preferred gender, nationality, and engine (standard or neural). Once you find the characters you like, mark them as “favorites.” When it comes time to select your voice in the text-to-speech window, you’ll have to choose from your favorites.
  2. Remember that you have an output limit of 30,000 characters per recording. This equates to roughly 5,000 words, which should be enough for most audio needs. Unless you’re writing a feature article for a national magazine that performs all publishing duties for you, it’s unlikely you’ll ever need that much space. Still, if you know your text will run longer than that, look for a reasonable breakpoint. Then, when you output the audio, make sure to label the files “part 1,” “part 2,” etc.
  3. In the event that you do have an article longer than 5,000 words, you’ll want to get your copy of Audacity ready. And if you don’t yet have Audacity, now is a great time to download it. It’s still free, last I checked. You’ll need it to merge your sound files together.
  4. Before you compile your text, make sure the voice actors pronounce your words correctly. You’ll find that they get it right 99% of the time, but every so often one will mispronounce a word. In this case, you’ll need to sound it out. For example, in my article about AI writing, the AI voice could not pronounce “how-tos” correctly, so I had to write it as “how-toos” or “how-to’s” for it to better deliver the word. (Note to Listeners: I’m leaving the original spelling intact to prove my point.)
  5. Remember that the length of a pause depends on the space between the words. Words that follow periods (or full stops) take the robot a split-second longer to say than those that follow commas or other words in the same sentence. But those that begin on a new line take the longest. If you need a good-sized pause, put the next word on a new line.
  6. If you need your character to change his speed or inflection of voice, then you’ll want to generate a new box and paste the affected text inside, but remember that a new box means you’ll create a pause between words, and this could negatively affect the quality of your output. Use this strategy between ideas only, not in the middle of a sentence or even the same paragraph, if possible.
  7. If you need to add music to your text, perhaps for a podcast or dramatic reading, then you can do so in post by visiting the “My Music” tab and uploading your preferred music alongside the text recording. Because I haven’t actually tried out this feature yet, I cannot comment on the quality, but the option is there, nevertheless.
  8. Once you’re sure you’ve got your sound presentation exactly as you want it, hit “Convert-to-Speech.” Your converted audio file will be ready for download at the bottom of the page.
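Two of the steps above, respelling stubborn words and breaking long text at the 30,000-character limit, are easy to do before you ever open the app. What follows is a rough sketch, not a Blakify feature. The function names are my own, it treats blank lines as the "reasonable breakpoints," and it assumes Python's character count matches the app's, which I haven't verified.

```python
import re

MAX_CHARS = 30_000  # per-recording limit mentioned in the steps above

# Respellings for words the voice gets wrong; this entry mirrors the
# "how-tos" example above. Add your own as you find them.
RESPELLINGS = {"how-tos": "how-toos"}


def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Swap in phonetic spellings, matching whole words only."""
    for word, spoken in respellings.items():
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        text = re.sub(pattern, lambda _match, s=spoken: s, text)
    return text


def split_into_parts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Pack whole paragraphs into parts no longer than `limit` characters."""
    parts: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if len(paragraph) > limit:
            raise ValueError("One paragraph exceeds the limit; split it by hand.")
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            # The next paragraph would not fit, so close this part here.
            parts.append(current)
            current = paragraph
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    article = "My how-tos are long.\n\nVery long."
    ready = apply_respellings(article, RESPELLINGS)
    for number, part in enumerate(split_into_parts(ready), start=1):
        print(f"part {number}: {len(part)} characters")
```

Because the split only happens between paragraphs, each break lands where the voice would pause anyway, which should make the seams less noticeable once the files are merged.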

And that’s all there is to it.

Image: Blakify Converted Audio List

Is It Worth the Price?

So, the question that inevitably appears at the end of any product review asks whether it’s worth the price. Well, I can’t answer that because I don’t know your budget. But as I’d like to remind you:

  • Professional voice actors charge by the processed hour and can cost you anywhere from hundreds to thousands of dollars for a finished product. Is your article, e-book, or video worth that cost? Will you ever see that money again?
  • Paying the voice artist for a single project means paying a one-time cost for that project. If that’s all you need, then maybe it’ll pay for itself in a few years. It may also be worth the investment to get a superior-quality product.
  • Professional voice actors are not easily duplicated, and they can “act” your words by using the power of sensibility. Show me a robot that has sensibility.
  • Text-to-speech programs use canned voices. Even though they’ve evolved into sophisticated speaking machines in the last few decades, certainly a giant leap past the old Speak & Spell, they’re still canned.
  • For text-to-speech programs that require a subscription fee, you’ll always be paying. Maybe it’s a few dollars a month, but there’s always a new month.
  • Blakify has three payment plans: “Lite” at $29.99 a month, “Pro” at $59.99 a month, and “Elite” at $99.99 a month. All three plans give users the same full commercial license and unlimited storage of sound files, as well as access to standard and neural voices. The difference comes down to how many characters the user can generate. But with the lowest tier, “Lite,” allowing a generation of up to a million characters a month, it’s pretty unlikely you’ll ever run out of room. For context, this article has fewer than 15,000 characters. I would have to write 67 articles of the same length in a single month just to use it up. The “Pro” plan gives users five times the number of characters, and the “Elite” plan offers unlimited characters. So, depending on your output, any of these plans might be worth it.
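For anyone who wants to check the plan math against their own output, here is a quick sketch of the arithmetic. The allowances come from the plan description above; the 15,000 figure is just the upper bound for this article, so swap in your own typical length.

```python
import math

ARTICLE_CHARS = 15_000      # upper bound for this article, per the text above
LITE_CHARS = 1_000_000      # "Lite" monthly character allowance
PRO_CHARS = 5 * LITE_CHARS  # "Pro" offers five times as many characters


def articles_to_exhaust(allowance: int, article_chars: int = ARTICLE_CHARS) -> int:
    """Smallest number of same-length articles that uses up the allowance."""
    return math.ceil(allowance / article_chars)


print(articles_to_exhaust(LITE_CHARS))  # 67
print(articles_to_exhaust(PRO_CHARS))   # 334
```

So 1,000,000 divided by 15,000 comes to about 66.7, which rounds up to the 67 articles mentioned above.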

It really comes down to how much you like the output. Again, if you’re listening to this article, then you already know whether you like it.

If you’d like to see the app in action, I also recorded a video for my YouTube channel showing off its capabilities. Check it out below.

Cover Image: 0fjd125gk87 (Pixabay)

Additional Note: I bought Blakify (Elite plan) through a lifetime deal on AppSumo for just $69. Before you buy one of the monthly plans, check to see if the AppSumo deal is still active. You’ll save a ton if you do. But remember that AppSumo deals eventually go away, so if you want it, don’t procrastinate. It won’t be there forever. I don’t even know if it’ll be there for long.

Update (3/14/24): The deal is long gone. Look for a comparable deal on AppSumo.
