When there’s no I in my AI

One of the things I do is create eLearning courses. I’m a hybrid: a techie who can teach, but who can also translate that material into a recorded or eLearning format and still make it interesting. A high-tech instructional designer.

Many instructional designers like to use AI text-to-speech generators to craft their narrations. There are plenty to choose from, such as Amazon Polly, Speechelo, and others. Amazon Polly, the AWS offering, works pretty well, and gives you the ability to modify the AI speech to add pauses, inflections, and emphasis where you need them. On the other end of the spectrum is Speechelo, which is easy to use and gets things right most of the time, but has little in the way of hardcore customization.
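For the curious, that kind of tweaking is usually done with SSML markup wrapped around the script. Here is a minimal sketch in Python of what that can look like. The `build_ssml` helper and its parameters are made up for illustration; `<speak>`, `<break>`, and `<emphasis>` are standard SSML tags, but which tags a given voice or engine actually honors varies, so check the service's documentation before relying on any of them.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=400, emphasize=()):
    """Wrap plain sentences in SSML, with a pause between each one.

    Hypothetical helper for illustration only; not part of any TTS SDK.
    """
    parts = []
    for sentence in sentences:
        # Escape &, <, and > so the script text cannot break the markup.
        text = escape(sentence)
        for word in emphasize:
            word_esc = escape(word)
            text = text.replace(
                word_esc, f'<emphasis level="strong">{word_esc}</emphasis>'
            )
        parts.append(text)
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


ssml = build_ssml(
    ["Welcome to the course.", "Let's get started."],
    pause_ms=500,
    emphasize=["Welcome"],
)
print(ssml)
```

The resulting string is what you would hand to a service that accepts SSML input instead of plain text.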

Instructional designers like these services because they can stuff a script into them and get out passable speech. And if they have to update or modify the course, they don’t have to track down the same voice talent and hope they can replicate the tone that was used last time. If you can’t find the original voice, you have to do it all over.

The product that I support has had a speech engine for over a decade, and I’ve created courses on how to work with the grammar and phonetics to make it sound more lifelike. The better the grammar engine, the better it works. It knows how to inflect “to,” “too,” and “two” based on context. It also knows “Read a book” from “I read it.” So I’m well versed in working between the text and AI voice worlds.

That said, I myself prefer to narrate my work. First, because I include demo video and I want to match the voice with the interactive bits. But second, because it’s easier on the ears. I craft my courses as if I’m the one who’ll have to view them. Which I do anyway. I consider having to endure an eLearning course, especially if it’s a horrible compulsory HR course done with a crappy AI speech engine, to be one of the most soul-sucking experiences there is.

Until I had to convert someone else’s work into AI speech.

I was given a task to help out another team and compile scripts. No idea who’s behind it, no idea who wrote them, other than that I can see two different styles and levels of attention to detail. Maybe it’s one guy who got a case of IDon’tGiveAShitItis on some of them.

The slides (why I have them is a mystery) have the speech that’s needed in the notes, and it’s a jumble of bad grammar, poor word choice, non sequiturs, dangling participles, wrong words (it’s “We’ll” or “We will,” NOT “Will”), and incomplete sentences. Whoever it was also added the code for pauses where they wanted them, something you really don’t need with AI speech.

Add to all that the fact that it needs to be written in one voice (pick a person and stick with it) and in conversational English. AI can’t, at least for now, deal with bullet points. You have to write in whole sentences, and then connect them together. This is because it’s not AI, per se. They put all the grammar rules and context-based inflection into the engine. But it won’t fix your broken thoughts and bad grammar. It won’t even recognize that something is wrong, tell you so, or refuse to read it.
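Since the engine won't flag any of this for you, a crude pre-check on the script can at least catch the obvious stuff before you burn time generating audio. This is only a sketch under my own assumptions: `check_script` and its two rules (bullet markers, and lines with no end punctuation) are invented for illustration, and it will not catch wrong words or sentences that simply make no sense. You still have to listen.

```python
import re

# Lines that start with -, *, a bullet character, or a list number like "1." or "2)".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
# Lines that end in ., !, or ?, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r"[.!?][\"')]*$")


def check_script(text):
    """Return (line_number, problem) pairs for lines a TTS engine will read badly.

    Hypothetical helper; the rules are deliberately simple.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are fine
        if BULLET.match(line):
            problems.append((number, "bullet point"))
        elif not SENTENCE_END.search(stripped):
            problems.append((number, "no end punctuation"))
    return problems


notes = """Welcome to the course.
- Open the settings page
Will now look at the reports
Click Save when you are done."""

report = check_script(notes)
print(report)
```

Anything it reports is a line to rewrite as a whole sentence before it goes anywhere near the speech generator.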

Some of the text is astonishingly bad. I’ll clean up the word choice, fix the sentences, get it to a place where it can be read without sounding like Robby the Robot with a burnt bearing, and generate the speech. I’m using Speechelo because it’s what we use, and its preview feature is a piece of shit. It reads a sentence or two, then tells you to generate it. So I generate it, then I have to play it. Every one. And there are hundreds. That’s when I find the bad word choices and wrong word use. I’ve had to rewrite whole paragraphs because they made no sense when read, let alone when the computer box reads them.

Look away from the text and listen. Fix it and regenerate, or save it.

Lather, rinse, repeat.

Well, in any event, it keeps me off the streets and gives me something to complain about.

But it’s an example of the horrible written word that’s found these days in everything from emails to posts to web articles.