AwesomeTTS Anki addin with Azure Neural voice is surprisingly good

The new AwesomeTTS addin for Anki 2.1 produced surprisingly good quality speech when used with the Microsoft Azure ‘Nanami’ Female Neural voice.

You can clearly hear the intonation and it seems to pick up the correct reading depending on the context. Single words sound clipped, which is probably to be expected from the way the neural network has been trained on continuous speech and single words have to be ‘cut-out’ from that.

It only occasionally struggled, as with 市場: it always pronounced this as しじょう (stock/foreign market), when I would sometimes have expected いちば (shopping marketplace), which suggests that it was trained mainly on news sources.

You don’t need to become a patron of AwesomeTTS, but you will need to register for a free Microsoft Azure cloud account at https://azure.microsoft.com. At some point, Microsoft’s registration process will ask you for a credit card number. You will only get charged if you go over the 500,000 free characters per month. You can also set up alerts to notify you if you do get charged.

On Azure, you will need to generate an API key, which you paste into the settings for the AwesomeTTS addin. You can generate audio files and fields for up to twenty Anki cards in one go. Then you have to pause for a few seconds before making another such request.
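If you ever script this yourself rather than using the add-on's dialog, the "twenty cards, then pause" rhythm could be sketched roughly as below. This is only an illustration, not how AwesomeTTS works internally: `batches`, `process_in_batches` and the `synthesize` callback are hypothetical names, and the 3-second pause is just my guess at "a few seconds".

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_batches(
    cards: Sequence[str],
    synthesize: Callable[[str], object],  # hypothetical: whatever turns text into audio
    batch_size: int = 20,                 # "up to twenty cards in one go"
    pause_seconds: float = 3.0,           # assumed value for "a few seconds"
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Call `synthesize` on every card, pausing between batches.

    Returns the number of cards processed.
    """
    done = 0
    for index, batch in enumerate(batches(cards, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # wait before starting the next batch
        for text in batch:
            synthesize(text)
            done += 1
    return done
```

With 45 cards this gives groups of 20, 20 and 5, with a pause before the second and third group.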


Could you post a sample MP3? I’ve messed around with speech generation on Google and AWS, but I haven’t tried Azure yet, so I’m curious what it sounds like.

Sure, here is the audio from one of my Anki cards 郵便物を区分する:

Please feel free to post some Japanese text here.


That’s really good. I never bothered with TTS as it was always robotic. I took a look at some of the VoiceDroid voices in Japan, which were a little better but still robotic (and expensive!).

I’ve actually raised a PR against that add-on to enable the pitch variation. The service also has more speed levels than the plugin exposes, but updating that wasn’t so easy (it breaks presets).

If you’re willing to edit the file, see here: https://github.com/AwesomeTTS/awesometts-anki-addon/pull/131/files

If you are willing to delete your presets, then you can replace Azure.py with the below. I like setting “speed” to a small negative value like -5/-8, which is only slightly slower than the default settings. Plus I have a few presets with tiny pitch differences, as I like to have different voices in my decks!
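For anyone curious what speed and pitch look like on the Azure side: the service takes SSML, where a `prosody` element carries `rate` and `pitch`. Here is a rough sketch of building such a request body. `build_ssml` is a hypothetical helper of mine, not code from the add-on or the PR, and I'm assuming the speed and pitch numbers are sent as relative percentages and that the voice is named `ja-JP-NanamiNeural`.

```python
from xml.sax.saxutils import escape, quoteattr


def _percent(value: int) -> str:
    """Format an integer as a signed percentage, e.g. -5 -> '-5%'."""
    return f"{value:+d}%"


def build_ssml(
    text: str,
    voice: str = "ja-JP-NanamiNeural",  # assumed Azure name for the Nanami voice
    lang: str = "ja-JP",
    rate_percent: int = -5,             # small negative value = slightly slower
    pitch_percent: int = 0,             # tiny offsets give "different" voices
) -> str:
    """Return an SSML document wrapping `text` in a prosody element."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate="{_percent(rate_percent)}" pitch="{_percent(pitch_percent)}">'
        f'{escape(text)}'  # escape &, < and > so the XML stays well-formed
        '</prosody></voice></speak>'
    )


if __name__ == "__main__":
    print(build_ssml("郵便物を区分する", rate_percent=-5, pitch_percent=2))
```

Two presets that differ only in `pitch_percent` (say 0 and 2) would then sound like two slightly different speakers.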

Rename the attached file from .apkg to .py and use it to replace the one in add-ons/TTS AddonNumber/awesometts/service

azure.apkg (38.8 KB)

Some examples of playing with pitch:
