Thursday 12 December 2013

Sphinx4 experience: Transcribing a wav file

This post documents the steps I had to take to set up Sphinx4 to transcribe a wav file in English (and maybe later in Spanish).

Basically, I don't know what I am doing. It shouldn't be this complicated to set up a voice recognition system in 2013. Let's see how this goes (I might get bored before I achieve my goals).

I chose the Sphinx4 platform, which is available at: http://cmusphinx.sourceforge.net/


First, I downloaded the sphinxbase and sphinxtrain... files. From the little I have read, these are used to train a voice (acoustic) model that will make Sphinx understand my voice.

The VoxForge site ( http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/ ) also supposedly has some voice models already available for Sphinx4. I downloaded them and ran the "build.sh" script. The script doesn't do anything as is; it seems you need to modify it by adding a call to the "downloads" function, which appears to download a bunch of wav files.

EDIT 1:

While compiling some of the Sphinx4 pieces (mainly sphinxbase), I found the following blog post:
http://nshmyrev.blogspot.mx/2010/09/voicemail-transcription-with.html which seems to achieve something similar to what I want using pocketsphinx, so now I am downloading pocketsphinx too and following those instructions. Let's hope it works.


EDIT 2:

While trying to build sphinxbase, I ran into a problem (make was doing nothing), so I googled a bit more. I found the page http://www.cs.columbia.edu/~ecooper/CS4706/ps-mac.html, which says to run ./autogen.sh first... I am doing that now :).

After running ./autogen.sh, I ran make. This time I got the following error:
autom4te: m4sugar/m4sugar.m4: no such file or directory

According to this page, http://stackoverflow.com/questions/6033989/aclocal-autoconf-reports-missing-m4sugar-m4-on-mac-os-x, it seems we need to run:

sudo ln -s /Developer/usr/share/autoconf /usr/share

so that autoconf (autom4te) can find the m4sugar macro files it needs.

*sigh*... after that, I got the following error:

libtool: Version mismatch error.  This is libtool 2.2.10, but the
libtool: definition of this LT_INIT comes from libtool 2.2.6b.
libtool: You should recreate aclocal.m4 with macros from libtool 2.2.10
libtool: and run autoconf again.

Let's see how to troubleshoot this...

OK, I ran sphinxbase's ./autogen.sh again, and then make. Now it seems to be doing something. I assume autogen.sh has to be run after the ln -s shown above, presumably because it regenerates aclocal.m4, which is what the libtool message asked for.
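To make the order that finally worked repeatable (symlink first, then ./autogen.sh and make, sphinxbase before pocketsphinx), it can be written down as a small Python driver. This is only a sketch under my own assumptions: the checkouts are named `sphinxbase` and `pocketsphinx` and sit side by side in the current directory, and the m4sugar file lives under /usr/share/autoconf as created by the ln -s above. By default it just prints the commands (a dry run); the `--run` flag is my own invention for this script.

```python
import os
import subprocess
import sys

# Assumed location of the file autom4te complained about; on OS X it only
# exists after the "sudo ln -s /Developer/usr/share/autoconf /usr/share" step.
M4SUGAR = "/usr/share/autoconf/m4sugar/m4sugar.m4"

# Hypothetical checkout layout: both projects side by side in the current
# directory. sphinxbase comes first because pocketsphinx builds on it.
PROJECTS = ["sphinxbase", "pocketsphinx"]


def build_plan(projects=PROJECTS):
    """Return (directory, command) pairs in the order they should run."""
    plan = []
    for project in projects:
        # autogen.sh presumably regenerates configure and aclocal.m4 with the
        # installed autotools/libtool, which cleared the version mismatch.
        plan.append((project, ["./autogen.sh"]))
        plan.append((project, ["make"]))
    return plan


def run(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    if not os.path.exists(M4SUGAR):
        print(f"warning: {M4SUGAR} not found; autogen.sh may fail", file=sys.stderr)
    for directory, command in plan:
        print(f"[{directory}] {' '.join(command)}")
        if not dry_run:
            # check=True stops at the first failing step instead of carrying on.
            subprocess.run(command, cwd=directory, check=True)


if __name__ == "__main__":
    run(build_plan(), dry_run="--run" not in sys.argv)
```

Running it without arguments only lists the four steps, which is a cheap way to double-check the order before actually building.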


Right, so the make command seems to be stuck. Since I was doing all this on OS X, I decided to try a Linux instance instead. I repeated all the steps on Linux, and sphinxbase compiled without issues, so I will continue with this approach.


I then ran pocketsphinx's ./autogen.sh, which completed without problems, and make also succeeded.

EDIT 3:

OK, so I could run pocketsphinx_continuous, but I got the following error:

ERROR: "pocketsphinx.c", line 625: No search module is selected, did you forget to specify a language model or grammar?


I searched a bit more and found the following page:
http://mariangemarcano.blogspot.mx/2012/09/speech-recognition-with-pocketsphinx.html

It says to run the pocketsphinx_continuous program like this (Windows paths):

pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k ^
-dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic ^
-lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP
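The "No search module is selected" error goes away once a language model is passed with -lm, as in the command above. A small Python wrapper that always assembles -hmm, -dict and -lm, plus -infile for reading a wav file, makes it harder to forget one of them. The model paths are hypothetical placeholders relative to a pocketsphinx source checkout (mirroring the Windows paths above), and I am assuming pocketsphinx_continuous is on the PATH.

```python
import os
import shlex
import subprocess

# Hypothetical model locations, mirroring the Windows paths above but relative
# to a pocketsphinx source checkout; adjust to wherever your models live.
MODEL_DIR = os.path.join("pocketsphinx", "model")
HMM = os.path.join(MODEL_DIR, "hmm", "en_US", "hub4wsj_sc_8k")
DICT = os.path.join(MODEL_DIR, "lm", "en_US", "cmu07a.dic")
LM = os.path.join(MODEL_DIR, "lm", "en_US", "wsj0vp.5000.DMP")


def continuous_args(infile, hmm=HMM, dictionary=DICT, lm=LM):
    """Build the pocketsphinx_continuous argument list for one wav file.

    Without -lm (or a grammar) pocketsphinx stops with
    "No search module is selected", so refuse to build such a command.
    """
    if not lm:
        raise ValueError("a language model (-lm) is required")
    return [
        "pocketsphinx_continuous",
        "-hmm", hmm,
        "-dict", dictionary,
        "-lm", lm,
        "-infile", infile,
    ]


def transcribe(infile):
    """Run the recognizer on one file and return whatever it prints to stdout."""
    args = continuous_args(infile)
    print("running:", shlex.join(args))
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Usage would be something like `transcribe(os.path.join("wav", "sound1.wav"))`. Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces.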


I followed those instructions, adding the -infile wav/sound1.wav parameter (pointing to a sound I had recorded earlier), and the program ran! However, the transcription was completely wrong, haha. I guess now I need to find out how to improve its quality.
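Before tuning anything, it may be worth checking the recording itself. My understanding (an assumption, not something from the pages above) is that pocketsphinx_continuous expects 16-bit mono PCM at 16 kHz by default, so a stereo or 44.1 kHz recording could explain garbage output. A quick check with Python's standard wave module:

```python
import wave

# Assumed defaults for pocketsphinx_continuous (16 kHz, 16-bit, mono);
# change EXPECTED_RATE if you pass a different -samprate.
EXPECTED_RATE = 16000
EXPECTED_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1


def wav_problems(path, rate=EXPECTED_RATE):
    """Return a list of human-readable mismatches (empty if the file looks OK)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != EXPECTED_WIDTH:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
        if w.getframerate() != rate:
            problems.append(f"{w.getframerate()} Hz, expected {rate} Hz")
    return problems
```

Calling `wav_problems("wav/sound1.wav")` returns an empty list if the file matches. I stuck to the standard library so there is nothing extra to install; if the file turns out to be stereo or 44.1 kHz, converting it with a tool such as sox would be the next thing to try.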

This will be continued in another post.



1 comment:

Nicolas B said...

FINALLY! I was messing with pocketsphinx a few months ago, and I still refer to it as the worst open source project I've dealt with.

I offered to help the authors with the documentation, but frankly, I doubt I'll ever hear back from them.