This post documents the steps I had to perform in order to set up Sphinx4 to transcribe a wav file in English (and later maybe in Spanish).
Basically, I don't know what I am doing. It shouldn't be so complicated to set up a voice recognition system in 2013. Let's see how this goes (I might get bored before I achieve my goals).
I chose to use the Sphinx4 platform, which is available at: http://cmusphinx.sourceforge.net/
First, I downloaded the sphinxbase and sphinxtrain... files. From the few things I read, these should be used to create a voice file which will make Sphinx understand my voice.
Additionally, the voxforge site ( http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/ ) supposedly has some voice models already available for Sphinx4. I downloaded them and ran the "build.sh" script. This script doesn't do anything "as is"; it seems that you need to modify it by adding a call to the "downloads" function, which seems to download a bunch of wav files.
EDIT 1:
While compiling some of the Sphinx4 stuff (sphinxbase mainly) I found the following blog post:
http://nshmyrev.blogspot.mx/2010/09/voicemail-transcription-with.html which seems to achieve something similar to what I want using pocketsphinx... so now I am downloading pocketsphinx too and following those instructions. Let's hope that it works.
EDIT 2:
So, while trying to build sphinxbase, I ran into a problem (make was doing nothing), so I googled a bit more. I found the page http://www.cs.columbia.edu/~ecooper/CS4706/ps-mac.html which says to run ./autogen.sh first... I am doing that now :).
After running ./autogen.sh I ran make. This time I got the following error:
autom4te: m4sugar/m4sugar.m4: no such file or directory
From this page http://stackoverflow.com/questions/6033989/aclocal-autoconf-reports-missing-m4sugar-m4-on-mac-os-x it seems that we need to run:
sudo ln -s /Developer/usr/share/autoconf /usr/share
This way autom4te can find the autoconf macro files (m4sugar) it is looking for.
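Before creating the symlink, it may be worth checking where the m4sugar files actually live on your machine. The following is a minimal sketch, not something taken from the pages above: find_m4sugar is a hypothetical helper name, and the candidate directories are simply the two paths mentioned in this post.

```shell
# Hypothetical helper: print the first directory that contains
# m4sugar/m4sugar.m4 (the file autom4te complained about).
find_m4sugar() {
    for dir in "$@"; do
        if [ -f "$dir/m4sugar/m4sugar.m4" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # None of the candidate directories had the file.
    return 1
}

# Candidate locations taken from this post; adjust them for your system.
find_m4sugar /usr/share/autoconf /Developer/usr/share/autoconf \
    || echo "m4sugar.m4 not found in the candidate directories"
```

If it prints /Developer/usr/share/autoconf, the ln -s above is probably what you want; if it prints /usr/share/autoconf, the link is presumably not needed.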
*sigh*... after that, I got the following error:
libtool: Version mismatch error. This is libtool 2.2.10, but the
libtool: definition of this LT_INIT comes from libtool 2.2.6b.
libtool: You should recreate aclocal.m4 with macros from libtool 2.2.10
libtool: and run autoconf again.
Let's see how to troubleshoot this...
OK, I ran sphinxbase's ./autogen.sh again, and then make. It seems that now it is doing something. I assume that autogen.sh has to be run again after doing the ln -s (shown above).
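As far as I understand it, the message means that the libtool being run (2.2.10) does not match the libtool macros that were baked into aclocal.m4 (2.2.6b), and re-running autogen.sh regenerates that file. Here is a small sketch that pulls both versions out of the error text so they can be compared at a glance. libtool_versions is a made-up helper name, and the sed patterns only assume the wording of the error shown above.

```shell
# Hypothetical helper: read libtool's "Version mismatch" text on stdin and
# print the running version and the version aclocal.m4 was generated from.
libtool_versions() {
    sed -n \
        -e 's/.*This is libtool \([^,]*\),.*/running: \1/p' \
        -e 's/.*comes from libtool \(.*\)\.$/aclocal.m4: \1/p'
}

# Example using the first two lines of the error from this post.
printf '%s\n' \
    'libtool: Version mismatch error. This is libtool 2.2.10, but the' \
    'libtool: definition of this LT_INIT comes from libtool 2.2.6b.' \
    | libtool_versions
```

For the error above this prints "running: 2.2.10" and "aclocal.m4: 2.2.6b", which is the pair that has to agree before make will go through.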
Right, so the "make" command seems kind of stuck. Since I was doing all this on OS X, I decided to try it on a Linux instance. I repeated all the steps on Linux, and sphinxbase compiled without issues... So I will continue with this approach.
I then ran pocketsphinx's ./autogen.sh, which finished without problems, and after that make succeeded.
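Putting the Linux steps together, the build order was sphinxbase first and pocketsphinx second, each with ./autogen.sh followed by make. The sketch below only prints those commands rather than running them. build_steps is a hypothetical helper, and the directory names sphinxbase and pocketsphinx are assumptions about where the sources were unpacked.

```shell
# Hypothetical helper: print the build commands for each source directory,
# in the order given. Assumes directory names without spaces.
build_steps() {
    for dir in "$@"; do
        printf '(cd %s && ./autogen.sh && make)\n' "$dir"
    done
}

# sphinxbase goes first, then pocketsphinx, as in this post.
build_steps sphinxbase pocketsphinx
```

Piping the output to sh would actually run the builds, one directory after the other.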
EDIT 3:
OK, so I could run pocketsphinx_continuous, but I got the following error:
ERROR: "pocketsphinx.c", line 625: No search module is selected, did you forget to specify a language model or grammar?
I searched a bit more and found the following page:
http://mariangemarcano.blogspot.mx/2012/09/speech-recognition-with-pocketsphinx.html
It says to run the pocketsphinx_continuous program in the following way:
pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k -dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic -lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP
I followed those instructions, adding the -infile wav/sound1.wav parameter (with a sound I had previously recorded), and the program ran! However, the transcription was completely wrong, haha. I guess now I need to find out how to improve its quality.
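For reference, the same invocation on Linux with the extra -infile flag can be sketched as follows. The flags and model file names are the ones used above. transcribe_cmd is a hypothetical helper, and /usr/local/share/pocketsphinx/model is only a guess at where the models might live, so point it at wherever your hmm and lm folders are. The binary is assumed to be called pocketsphinx_continuous (without .exe) on Linux.

```shell
# Hypothetical helper: print the pocketsphinx_continuous command line for a
# given model directory and wav file, using the models named in this post.
transcribe_cmd() {
    model_dir=$1
    wav_file=$2
    printf 'pocketsphinx_continuous -hmm %s -dict %s -lm %s -infile %s\n' \
        "$model_dir/hmm/en_US/hub4wsj_sc_8k" \
        "$model_dir/lm/en_US/cmu07a.dic" \
        "$model_dir/lm/en_US/wsj0vp.5000.DMP" \
        "$wav_file"
}

# The model directory here is an assumption; the wav path is from this post.
transcribe_cmd /usr/local/share/pocketsphinx/model wav/sound1.wav
```

The helper only prints the command, so it can be inspected (and the paths corrected) before it is pasted into a terminal.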
This will be continued in another post
1 comment:
FINALLY! I was messing with pocketsphinx a few months ago, and I still refer to it as the worst open source project I've dealt with.
I've offered to help the authors with the documentation, but frankly, I doubt I'll ever hear back from them.