Japanese pronunciation editing


Postby Qvidnon » Wed Aug 01, 2007 10:17 am

Hello.
I've just begun using Japanese voices and, all in all, I'm quite impressed. However, I need to know how to edit the pronunciation of words for specific voices to do a little 'tuning'.

My example is おだいじに (odaijini), which is pronounced correctly by Kyoko (o-die'-gee-nee) but not by Miyu or Show (o-da-ee'-gee-nee). There are other things I want to change too; for example, the pause for a Unicode space seems longer than the pause for an ASCII one.

Could someone point me in the right direction? Is there documentation for the phonemes in these languages?

Kindest regards,
Stu
Qvidnon
 
Posts: 7
Joined: Thu Jul 26, 2007 5:37 pm
Location: Burlington, Ontario, Canada

Postby D.Leikin » Thu Aug 02, 2007 8:21 am

Hello Stu,

If the Japanese voices ship with an engine-level dictionary editor, it should probably be found in

C:\Program Files\NeoSpeech\<voice_name>\lib\

If you locate an executable file with a name like UserDic*.exe in this folder, you should probably be able to run it to edit pronunciations.

Presumably, the user dictionary file (i.e., the file that stores pronunciation edits) is located in

C:\Program Files\NeoSpeech\<voice_name>\data-common\userdict\

I wish I could be more specific on this point, but I just don't have Japanese voices on my system.
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Postby Qvidnon » Thu Aug 02, 2007 10:09 am

The file UserDicJpn.exe is where you said it would be. The only problem is that all the labels, buttons and text boxes are in Kanji! I hope my professor won't mind doing a little translation for me!

Regards, Stu
Qvid Non Solutions
www.qvidnon.ca

Postby D.Leikin » Thu Aug 02, 2007 11:21 am

Stu, should you be able to locate and double-click the file

C:\Program Files\NeoSpeech\<voice_name>\lib\UserDic*_e.chm

your professor wouldn’t mind for sure, since the suffix ‘_e’ indicates the help file is written in English.

Postby Qvidnon » Thu Aug 02, 2007 11:30 am

There is a UserDicJpn_k.chm which is in Katakana and a UserDicJpn_j.chm which is in Kanji, but no _e file unfortunately. However, having both these files will certainly help her with any technical specifics of the buttons.

Thanks for the lead.

Regards, Stu
Qvid Non Solutions
www.qvidnon.ca

Postby Qvidnon » Thu Aug 02, 2007 11:34 am

Spoke too soon. The UserDicEng_e.chm file from one of the English voices is also available, and it has what I need!
I assume the program works the same way, since it comes from the same vendor!

Thanks again for all your help.

Regards, Stu
Qvid Non Solutions
www.qvidnon.ca

Postby Qvidnon » Sat Aug 04, 2007 11:17 am

My sensei replied with the translations of the buttons. I have placed a labelled image of the Japanese dialog box on my server. It is at http://www.qvidnon.ca/TextAloud-Japanese-Editor.png

Best of luck to all.

Regards, Stu
Qvid Non Solutions
www.qvidnon.ca

Pronunciation

Postby Josh Scholar » Wed Aug 22, 2007 9:28 pm

As far as I can tell (and I'm just a beginning student) Show and Miyu seem to pronounce the Kanji correctly, but when they're reading pure hiragana they don't glide the vowels together when reading だい, so it always sounds a bit wrong.

Once again, I've only taken a couple of classes so I don't know if it's always wrong to pronounce "だい" as "だ、い" instead of 大, but that's what they do.

Another problem with all of the Japanese voices I've used is that they turn each whitespace character into a long pause, so if you try to read an all-hiragana conversation, you'll have a problem, because people DO put spaces between words when they're not using Kanji.

It would be useful to have an option to drop the spaces (including those wide Japanese spaces) before dumping the text to the reader.

In fact, I altered a version of the speakit plugin for Firefox so that it deletes the whitespace before sending the text to the speech engine.

(Firefox plugins are zip files of JavaScript source code, so you can change them with some effort.) I also altered the RikaiChan popup Japanese English dictionary to speak each word it looks up.

By the way, if NextUp wants its plugin to have that feature, the JavaScript to delete the whitespace from a string is:

aString = aString.replace(/\s+/g,'');
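
In case it helps, here is a minimal sketch of that one-liner wrapped in a function. The name stripAllWhitespace and the sample string are made up for illustration. It assumes an ECMAScript 5 or later engine, where \s also matches the wide Japanese space (U+3000).

```javascript
// Hypothetical helper: remove every run of whitespace from a string.
// Assumption: an ES5+ engine, where \s matches ASCII spaces, tabs,
// newlines and the full-width ideographic space U+3000.
function stripAllWhitespace(aString) {
  return aString.replace(/\s+/g, '');
}

// Hiragana written with one full-width space and one ASCII space.
var sample = 'お\u3000だいじ に';
console.log(stripAllWhitespace(sample)); // おだいじに
```

Note that this drops the spaces in English text too, so it is only suitable for text that is entirely Japanese.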

Josh Scholar
Josh Scholar
 
Posts: 6
Joined: Wed Aug 22, 2007 8:49 pm

Oops, wrong

Postby Josh Scholar » Wed Aug 22, 2007 9:39 pm

After a little more playing around, I see that Miyu and Show only put a pause between 「だ」 and 「い」 when there's an 「お」 in front of the 「だ」. Other hiragana ending in that vowel don't cause it. Maybe there's some rule I don't know about.

Though, as I said before, there is no pause if it is written in kanji. (Is 「を大事に」 right? That's what Rikaichan gave me as the word.)

One more oops.

Postby Josh Scholar » Wed Aug 22, 2007 9:44 pm

The Rikaichan dictionary didn't make the mistake of giving me 「を大事に」 instead of 「お大事に」.

I made that mistake when I copied the text without the "お" and tried to add it later.

How to make Japanese voices work better without breaking oth

Postby Josh Scholar » Tue Sep 11, 2007 4:41 am

As I said before, there's a problem with reading Japanese web sites: normal Japanese doesn't have spaces, so the various readers put in a long pause for each whitespace character. Sometimes, though, people write phonetically and do leave spaces between words, and then the reading is awful.

Here is a JavaScript filter, using a regular expression, that removes whitespace that comes after Japanese characters but nowhere else:

text = text.replace(/([\u3001-\u30fe\u3220-\u33fe\u4e00-\u9fa5\uf929-\ufa2d\uff61-\uff9f])\s+/g, "$1");

The one possible problem is that Japanese and Chinese may share the same Unicode code points, since they display some of the same characters (I don't know). So I don't know whether this fix would have any effect on Chinese readers. It probably would not cause any problems; I think Chinese text doesn't usually have whitespace either.
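
For what it is worth, here is a small sketch of the filter above wrapped in a function and run on mixed Japanese and English text. The names AFTER_JAPANESE and stripSpacesAfterJapanese and the sample string are made up for illustration; the character ranges are copied unchanged from the expression above. One caveat: the range \u4e00-\u9fa5 lies in the CJK Unified Ideographs block, which Unicode shares between Japanese kanji and Chinese characters, so this filter would also remove whitespace that follows Chinese text.

```javascript
// Character class copied from the filter above:
//   U+3001-U+30FE  CJK punctuation, hiragana, katakana
//   U+3220-U+33FE  enclosed CJK letters and compatibility forms
//   U+4E00-U+9FA5  CJK unified ideographs (kanji; shared with Chinese)
//   U+F929-U+FA2D  CJK compatibility ideographs
//   U+FF61-U+FF9F  half-width punctuation and katakana
var AFTER_JAPANESE = /([\u3001-\u30fe\u3220-\u33fe\u4e00-\u9fa5\uf929-\ufa2d\uff61-\uff9f])\s+/g;

// Hypothetical helper: drop whitespace only when it directly follows
// one of the characters above, keeping the character itself ($1).
function stripSpacesAfterJapanese(text) {
  return text.replace(AFTER_JAPANESE, '$1');
}

var sample = 'おだいじ に hello world';
console.log(stripSpacesAfterJapanese(sample)); // おだいじにhello world
```

The space between "hello" and "world" survives because it follows a Latin letter, while the space between に and "hello" is removed because it follows a Japanese character.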

