Thursday 12 December 2013

Sphinx4 experience Round 2: Improve accuracy

After running the pocketsphinx transcriber a few times, I realized the speech-to-text accuracy is utter crap. So now I have to find a way to make it understand me better.

As a first example, I ran the following command:

pocketsphinx-0.8/src/programs/pocketsphinx_continuous -infile sound3.wav  -hmm hub4wsj_sc_8k -dict pocketsphinx-0.8/model/lm/en_US/cmu07a.dic -lm pocketsphinx-0.8/model/lm/en_US/wsj0vp.5000.DMP  2>/dev/null

The sound3.wav file is an 8 kHz sample where I speak the words "take note, this information is confidential". Now, I do have a non-native English accent (being Mexican myself), but I cannot understand how Sphinx comes up with the following:

000000000: her

000000001: her to him crude high in the hot

Running it several times, I get the same results, so I guess Sphinx has no idea what I am saying. In theory, SphinxTrain should be used to train Sphinx to recognize my voice. However, according to the documentation, what I actually need is to adapt the existing acoustic model to my voice:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt
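
Before going down the adaptation route, one thing worth ruling out is a plain format mismatch. As far as I know, pocketsphinx_continuous takes the sampling rate from its -samprate option (16000 by default) rather than from the file, and the command above does not set it. The sketch below just reads the header of the WAV file to confirm what it really contains. It assumes a canonical 44-byte PCM WAV header and a little-endian machine; the function names are made up for this example.

```sh
#!/bin/sh
# Read fields straight from a WAV header using od.
# Assumptions: canonical 44-byte PCM header, little-endian host.

# Print the unsigned integer stored in $3 bytes at byte offset $2 of file $1.
wav_field() {
    od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' '
}

wav_channels()    { wav_field "$1" 22 2; }   # number of channels
wav_sample_rate() { wav_field "$1" 24 4; }   # samples per second
wav_bits()        { wav_field "$1" 34 2; }   # bits per sample

# Usage (sound3.wav is the recording used in the command above):
#   wav_sample_rate sound3.wav
#   wav_channels sound3.wav
#   wav_bits sound3.wav
```

If the file really is 8 kHz, mono, 16-bit, then adding -samprate 8000 to the pocketsphinx_continuous command is probably worth a try before anything else. I have not verified that this alone fixes the output.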

EDIT 1:
Ok, so after reading a bit more I got to the following page:
http://www.jaivox.com/pocketsphinx.html

It compares pocketsphinx with sphinx4, and it seems that sphinx4's accuracy is much better. It occurred to me to use the sphinx4batch program they provide:

http://www.jaivox.com/sites/default/files/downloads/pocket.zip

and see how it performs with my own sound. To do this I unzipped the archive into the same folder as the sphinx4 folder, which I had previously downloaded from:

http://downloads.sourceforge.net/project/cmusphinx/sphinx4/1.0%20beta6/sphinx4-1.0beta6-src.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcmusphinx%2Ffiles%2Fsphinx4%2F1.0%2520beta6%2F&ts=1386876518&use_mirror=colocrossing

And then I ran it with the command:

java -cp "lib/*:." sphinx4batch .

Then I modified the sphinx4batch.java file and compiled it with:

javac -cp "lib/*:." sphinx4batch.java


After modifying the sphinx4batch.java file to process my own sound1.wav, I got the following (bad) results:

Original: Take note, this information is confidential
Recognized: and are


I wasn't very optimistic, given that I had learnt that sphinx4batch used a specific dictionary for their example.

I tried to manually add the words "take", "note", "information" and "confidential" to the dictionary, but that didn't have any effect.
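
A quick way to double-check which words the dictionary actually contains is sketched below. It assumes the usual CMU dictionary layout of one entry per line, with the word first and its phonemes after whitespace; missing_words is a hypothetical helper name, not something shipped with Sphinx.

```sh
#!/bin/sh
# Print every word that has no entry in a CMU-style pronunciation dictionary.
# Assumed entry layout: "<word> <phoneme> <phoneme> ...", one entry per line.
# Usage: missing_words <dictionary file> <word> [<word> ...]
missing_words() {
    dict=$1
    shift
    for w in "$@"; do
        # Case-insensitive match on the first field of a line.
        grep -qi "^${w}[[:space:]]" "$dict" || echo "$w"
    done
}

# Example with the pocketsphinx dictionary used earlier in this post:
#   missing_words pocketsphinx-0.8/model/lm/en_US/cmu07a.dic \
#       take note information confidential
```

As far as I understand, the dictionary only tells the decoder how a word is pronounced. Which words can come out at all is decided by the language model or grammar, so if nothing is reported missing here, that would be the next place to look.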

:(
