Thursday, 12 December 2013

Sphinx4 experience Round 2: Improve accuracy

After running the pocketsphinx transcriber a few times, I realized the speech-to-text accuracy is utter crap. So now I have to find a way to make it understand me better.

As a first example, I ran the following command:

pocketsphinx-0.8/src/programs/pocketsphinx_continuous -infile sound3.wav  -hmm hub4wsj_sc_8k -dict pocketsphinx-0.8/model/lm/en_US/cmu07a.dic -lm pocketsphinx-0.8/model/lm/en_US/wsj0vp.5000.DMP  2>/dev/null

The sound3.wav file is an 8 kHz sample where I speak the words "take note, this information is confidential". Now, I do have a non-native English accent (being Mexican myself), but I cannot understand how Sphinx comes up with the following:

000000000: her

000000001: her to him crude high in the hot

Running it several times, I get the same results, so... I guess Sphinx has no idea what I am saying. Now, in theory the sphinxtrain program should be used to train Sphinx to recognize my voice... however, according to the documentation, what I actually need is to adapt the available voice samples to my voice:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt
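
Before adapting anything, it may be worth ruling out a sample-rate mismatch. As far as I know, pocketsphinx_continuous assumes 16 kHz input unless told otherwise through its -samprate option, while sound3.wav is an 8 kHz file. The following is only a sketch using Python's standard wave module to read what the file header actually says; the function names and the 16000 default are my own assumptions, not something taken from the Sphinx tools:

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def samprate_mismatch(path, decoder_rate_hz=16000):
    """True if the file's sample rate differs from what the decoder expects.

    decoder_rate_hz=16000 is an assumed default; pass 8000 if the decoder
    is started with -samprate 8000.
    """
    rate, _channels, _bits = describe_wav(path)
    return rate != decoder_rate_hz


# Example use (sound3.wav is the recording from the command above):
#   print(describe_wav("sound3.wav"))
#   print(samprate_mismatch("sound3.wav"))
```

If the second call reports a mismatch, either resampling the recording or passing the matching -samprate value would be the first thing to try before blaming the accent.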

EDIT 1:
Ok, so after reading a bit more I got to the following page:
http://www.jaivox.com/pocketsphinx.html

That page compares pocketsphinx with sphinx4, and it seems that sphinx4's accuracy is much better. It occurred to me to try the sphinx4batch program provided there:

http://www.jaivox.com/sites/default/files/downloads/pocket.zip

And see how it performs with my own sound. To do this, I unzipped the archive into the same folder as the sphinx4 folder, which I had previously downloaded from:

http://downloads.sourceforge.net/project/cmusphinx/sphinx4/1.0%20beta6/sphinx4-1.0beta6-src.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcmusphinx%2Ffiles%2Fsphinx4%2F1.0%2520beta6%2F&ts=1386876518&use_mirror=colocrossing

And then I ran it with the command:

java -cp "lib/*:." sphinx4batch .

Then I modified the sphinx4batch.java file and compiled it with:

javac -cp "lib/*:." sphinx4batch.java


After modifying the sphinx4batch.java file to process my own sound, sound1.wav, I got the following (bad) results:

Original: Take note, this information is confidential
Recognized: and are
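
To put a number on "bad", one common measure is the word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually spoken. Here is a minimal Python sketch; the function names and the simple tokenization are my own choices, not part of any Sphinx tool:

```python
import re


def tokenize(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("Take note, this information is confidential", "and are"))
# prints 1.0
```

A WER of 1.0 means not a single spoken word was recovered. The value can exceed 1.0 when the recognizer outputs more words than were spoken.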


I wasn't very optimistic, given that I had learnt that sphinx4batch uses a dictionary specific to their example.

I tried to manually add the words "take", "note", "information" and "confidential" to the dictionary, but that didn't have any effect.
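
One possible explanation (an assumption on my part, not something I verified against this demo) is that the dictionary only supplies pronunciations, and the language model or grammar must also contain a word before the decoder will ever propose it. At least the dictionary half can be checked. The sketch below assumes the usual CMU dictionary layout, one entry per line with the word first and its phonemes after it, and alternate pronunciations written like "note(2)"; the function names are hypothetical:

```python
def load_dictionary_words(path):
    """Collect the words listed in a CMU-style pronunciation dictionary."""
    words = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            parts = line.split()
            if not parts or parts[0].startswith(";;;"):
                continue  # skip blank lines and comment lines
            word = parts[0].lower()
            if word.endswith(")") and "(" in word:
                word = word[: word.index("(")]  # "note(2)" -> "note"
            words.add(word)
    return words


def missing_words(phrase, dictionary_words):
    """Return the words of the phrase that the dictionary does not contain."""
    cleaned = phrase.lower().replace(",", " ").replace(".", " ")
    return [w for w in cleaned.split() if w not in dictionary_words]


# Example use (the dictionary path is whatever the demo is configured with):
#   words = load_dictionary_words("path/to/dictionary.dic")
#   print(missing_words("take note, this information is confidential", words))
```

An empty list would mean the dictionary is fine and the next place to look is the language model or grammar.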

:(

Sphinx4 experience: Transcribing a wav file

This post documents the steps I had to perform in order to set up Sphinx4 to transcribe a wav file in English (and later maybe in Spanish).

Basically, I don't know what I am doing. It shouldn't be so complicated to set up a voice recognition system in 2013. Let's see how this goes (I might get bored before being able to achieve my goals).

I chose to use the Sphinx4 platform, which is available at: http://cmusphinx.sourceforge.net/


First, I downloaded the sphinxbase and sphinxtrain... files. From the few things I read, these should be used to create a voice file that will make Sphinx understand my voice.

Additionally, the VoxForge site ( http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/ ) supposedly has some voice models already available for Sphinx4. I downloaded them and ran the "build.sh" script. As is, the script doesn't do anything; it seems you need to modify it by adding a call to the "downloads" function, which appears to download a bunch of wav files.

EDIT 1:

While compiling some of the Sphinx4 stuff (mainly sphinxbase) I found the following blog post:
http://nshmyrev.blogspot.mx/2010/09/voicemail-transcription-with.html which seems to achieve something similar to what I want using pocketsphinx... so now I am downloading pocketsphinx too and following those instructions. Let's hope that it works.


EDIT 2:

So, while trying to build sphinxbase, I stumbled into a problem (the make script was doing nothing), so I googled a bit more. I found the page http://www.cs.columbia.edu/~ecooper/CS4706/ps-mac.html which says to use ./autogen.sh ... I am doing that now :).

After running ./autogen.sh I ran make. This time I got the following error:
autom4te: m4sugar/m4sugar.m4: no such file or directory

From this page http://stackoverflow.com/questions/6033989/aclocal-autoconf-reports-missing-m4sugar-m4-on-mac-os-x   it seems that we need to run:

sudo ln -s  /Developer/usr/share/autoconf /usr/share

So that autom4te can find the m4sugar macro files it needs.

*sigh*... after that, I get the following error:

libtool: Version mismatch error.  This is libtool 2.2.10, but the
libtool: definition of this LT_INIT comes from libtool 2.2.6b.
libtool: You should recreate aclocal.m4 with macros from libtool 2.2.10
libtool: and run autoconf again.

Let's see how to troubleshoot this...

Ok, I ran sphinxbase's ./autogen.sh again, and then make. It seems that now it is doing something. I assume that autogen.sh has to be run again after doing the ln -s (shown above), so that the generated files match the installed libtool.


Right, so the "make" command seems kind of stuck. Since I was doing all this on OS X, I decided to try it on a Linux instance. I repeated all the steps on Linux, and sphinxbase compiled without issues... So I will continue with this approach.


I then ran pocketsphinx's ./autogen.sh, which finished without problems, and after that make succeeded too.

EDIT 3:

Ok, so I could run pocketsphinx_continuous, but I got the error:

ERROR: "pocketsphinx.c", line 625: No search module is selected, did you forget to specify a language model or grammar?


I searched a bit more and found the following page:
http://mariangemarcano.blogspot.mx/2012/09/speech-recognition-with-pocketsphinx.html

There they say to run the pocketsphinx_continuous program in the following way:

pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k 
-dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic 
-lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP


I followed those instructions, adding the -infile wav/sound1.wav parameter (with a sound I had previously recorded), and the program ran! However, the transcription was completely wrong, haha. I guess now I need to find out how to improve its quality.
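
The invocation boils down to the input file plus three model options: -hmm (acoustic model), -dict (pronunciation dictionary) and -lm (language model). Leaving out the language model is what triggers the "No search module is selected" error. Below is a small Python wrapper, as a sketch only. The option names are the ones used in this post; the helper names are hypothetical, and the output format (a nine-digit utterance number, a colon, then the text) is an assumption based on what the program printed for me:

```python
import re
import subprocess

# Assumed shape of a result line, e.g. "000000001: some recognized words"
HYPOTHESIS_LINE = re.compile(r"^(\d{9}):\s*(.*)$")


def build_command(wav_path, hmm_dir, dict_path, lm_path,
                  binary="pocketsphinx_continuous"):
    """Build the argument list for transcribing one wav file."""
    return [binary, "-infile", wav_path,
            "-hmm", hmm_dir, "-dict", dict_path, "-lm", lm_path]


def parse_hypotheses(stdout_text):
    """Extract (utterance_number, text) pairs from the program's stdout."""
    results = []
    for line in stdout_text.splitlines():
        match = HYPOTHESIS_LINE.match(line.strip())
        if match:
            results.append((int(match.group(1)), match.group(2)))
    return results


def transcribe(wav_path, hmm_dir, dict_path, lm_path):
    """Run the decoder, discard its log output on stderr, parse the results."""
    completed = subprocess.run(
        build_command(wav_path, hmm_dir, dict_path, lm_path),
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True, check=True)
    return parse_hypotheses(completed.stdout)
```

Keeping the command builder separate from the parser makes it easy to try other models or recordings without retyping the long paths.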

This will be continued in another post.