A few days ago a user made a comment on my ColdFusion 8/CAPTCHA blog post. He reminded us (and it is a good reminder) that CAPTCHA has some serious accessibility issues. This got me thinking about converting the CAPTCHA image into spoken letters. I've seen a few sites that do this and, frankly, whether it helped with CAPTCHA or not I thought it would be cool to see if ColdFusion could generate speech. I did some digging and the primary library that folks seem to use in the Java world is FreeTTS (TTS is short for text to speech). There are probably many other alternatives out there, but that's the one I went with.
I began by downloading the compiled code for FreeTTS and confirmed the example application ran from the command line. I then began to dig into the docs a bit. I then began to cry a little bit as I realized that "documentation" was probably too strong a word for what I found at the project. The API is fully documented. Examples do exist. But I couldn't find anything close to what I'd consider to be good documentation. (Full disclosure time. I will admit to not always providing great docs with my own projects!) Specifically, it wasn't difficult to get code that would say something. I had that running within 15 minutes. What took forever was getting the audio saved to a file. The code that follows works, but please note that it could probably be done better.
Once you've downloaded the FreeTTS code, extract it to your file system. All you really need are the JAR files from the lib folder. I loaded all the JARs using the wonderful, super-awesome, JavaLoader from Mark Mandel. I really hope dynamic class loading comes to ColdFusion 9 because it's so darn useful. Here is how I used it to suck in all the JARs from the lib folder:
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfset jars = []>
<cfdirectory name="jarList" directory="#jardir#">
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>

<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>
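One thing to watch in the loop above: cfdirectory returns every entry in the folder, not just JARs. Here is a minimal sketch of a slightly more defensive version. It filters on *.jar and keeps the loader in the server scope so it is only created once, which is the usual advice for JavaLoader to avoid class loader memory leaks. The key name server.freettsLoader and the lock name are placeholders of mine, not anything JavaLoader or FreeTTS requires.

```cfml
<!--- Only pick up .jar files from the FreeTTS lib folder --->
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfdirectory action="list" name="jarList" directory="#jardir#" filter="*.jar">

<cfset jars = []>
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & jarList.name)>
</cfloop>

<!--- Create the loader once and reuse it (hypothetical key and lock names) --->
<cfif not structKeyExists(server, "freettsLoader")>
    <cflock name="freettsLoaderInit" type="exclusive" timeout="10">
        <cfif not structKeyExists(server, "freettsLoader")>
            <cfset server.freettsLoader = createObject("component", "javaloader.JavaLoader").init(jars)>
        </cfif>
    </cflock>
</cfif>

<cfset loader = server.freettsLoader>
```

The rest of the code in this post only needs the loader variable, so either version should work.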
Now for the fun part. FreeTTS works by creating a voice and having the voice speak. So at a basic level, this code alone will work to create the speech.
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech. It was a real pain in the ass. Really.")>
On my system this resulted in the words being spoken out of my laptop speakers. Did this surprise me? Heck yes. Did I scream like a little girl? I'm not telling. So as I said, this was relatively simple. Getting it to save to the file system, though, was a royal pain in the rear. Sure, the code isn't that much different; it just took me forever to figure it out. The basic idea is to tell FreeTTS to use another audio player. FreeTTS has a 'player' called SingleFileAudioPlayer. As you can guess, this player writes its audio to a single file instead of sending it to the speakers. In this version of the code, I set up the player and pass it to the voice. When run, it generates this wav file:
http://www.coldfusionjedi.com/images/test1.wav
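For reference, here is a minimal sketch of the file-saving variant on its own. It assumes loader is the JavaLoader instance created earlier, and it uses expandPath to build an absolute base path next to the current template (the name test is arbitrary). As far as I can tell, SingleFileAudioPlayer appends the extension itself and does not write the file until close() is called.

```cfml
<!--- Assumes "loader" is the JavaLoader instance created earlier --->

<!--- Base name only; the player adds ".wav". An absolute path avoids surprises. --->
<cfset basePath = expandPath("./test")>

<cfset wavType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE", "wav")>
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init(basePath, wavType)>

<cfset vm = loader.create("com.sun.speech.freetts.VoiceManager").getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<!--- Route the voice's output to the file player instead of the speakers --->
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech.")>

<!--- Closing the player is what flushes the audio to disk --->
<cfset sfAudio.close()>
```

If the file does not show up where you expect, check how the path resolves on your OS. A path like "/test" is resolved against the file system root, not the web root.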
I then switched the text to be something close to a CAPTCHA. The result was a bit too quick to understand. Looking at the API, I saw that there was a WPM (words per minute) setting. You would think this would simply reduce the number of words spoken per minute. Instead it stretched out the speech itself. So instead of hearing "Something ..... something ....", it was more like "Sooooommmmmeeeething." I played with it a bit and got it to be a bit slower, but it's not perfect. Here is the final template I ended up with:
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfset jars = []>
<cfdirectory name="jarList" directory="#jardir#">
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>

<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>

<cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE","wav")>
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init("/Library/WebServer/Documents/tts/test",audioFileFormatType)>

<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<cfset lex = loader.create("com.sun.speech.freetts.en.us.CMULexicon").init()>
<cfset voice.setLexicon(lex)>
<cfset voice.setRate(110)>
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<cfset sfAudio.close()>

<p>
done
</p>
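If the characters still run together, another option is to change the text rather than the rate. The sketch below is a hypothetical helper (spellOut is a made-up name, not part of FreeTTS) that puts a period and a space after every character, on the assumption that the synthesizer pauses at sentence breaks. I have not verified how long FreeTTS actually pauses, so treat it as something to experiment with.

```cfml
<!--- Hypothetical helper: "A92LK" becomes "A. 9. 2. L. K." --->
<cffunction name="spellOut" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <cfargument name="separator" type="string" required="false" default=". ">

    <cfset var result = "">
    <cfset var i = 0>
    <cfset var ch = "">

    <cfloop from="1" to="#len(arguments.text)#" index="i">
        <cfset ch = mid(arguments.text, i, 1)>
        <!--- Skip whitespace so existing spaces do not become extra separators --->
        <cfif len(trim(ch))>
            <cfset result = result & ch & arguments.separator>
        </cfif>
    </cfloop>

    <cfreturn trim(result)>
</cffunction>

<cfoutput>#spellOut("A92LK")#</cfoutput>
```

With that in place, the speak call in the template could become voice.speak(spellOut(captchaText)), where captchaText is whatever string your CAPTCHA generated.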
FreeTTS comes with more voices, and if I spent more time on it I could probably make it a bit nicer, but for now I'll stop and let folks comment. In the next blog entry, I'll show this in use with CAPTCHA.
As a reminder, in order for the template to work, you will need both JavaLoader and FreeTTS copied to your file system.
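To see which voices your FreeTTS install actually offers, a quick sketch like this should list them. It assumes loader is the JavaLoader instance from the template above and relies on VoiceManager.getVoices() and Voice.getName() from the FreeTTS API.

```cfml
<!--- Assumes "loader" is the JavaLoader instance from the template above --->
<cfset vm = loader.create("com.sun.speech.freetts.VoiceManager").getInstance()>

<!--- getVoices() returns a Java array of Voice objects --->
<cfset voices = vm.getVoices()>

<cfoutput>
<ul>
<cfloop from="1" to="#arrayLen(voices)#" index="i">
    <li>#voices[i].getName()#</li>
</cfloop>
</ul>
</cfoutput>
```

Any name in that list can be passed to vm.getVoice() in place of "kevin16".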


Comment 1 written by Scott P on 29 May 2009, at 12:22 AM
I played around with taking a Twitter RSS feed and piping it through say to speak tweets as they roll in.
Comment 2 written by Andy Sandefer on 29 May 2009, at 12:35 AM
Comment 3 written by Erik-Jan on 29 May 2009, at 2:30 AM
I tried to copy your example, but on my system (Linux Ubuntu with Apache and CF8) it won't work. It loads the Java classes just fine, but the page hangs on the voice.allocate() line. It's just 'waiting' there. Could this be because I am using Linux? Any ideas?
Comment 4 written by Brian Swartzfager on 29 May 2009, at 6:49 AM
Comment 5 written by Raymond Camden on 29 May 2009, at 7:39 AM
Seriously - not sure. What I began with was the freetts.jar demo program. On the web site, they walk you through calling that at the command line. Can you try that and see if it works?
Comment 6 written by Raymond Camden on 29 May 2009, at 7:41 AM
Comment 7 written by Erik-Jan on 29 May 2009, at 9:52 AM
I keep getting errors, even when I run the examples on the FreeTTS website. I already tried the most 'simplified' version of the code, but with no luck... Can't run the FreeTTS.jar demo either. I will have to look into it this weekend.
Erik-Jan
Comment 8 written by Garrett Johnson on 29 May 2009, at 10:33 AM
If you ever get bored this is something kinda similar that can be fun to play with. http://www.jfugue.org/
@Erik, maybe a codec thing?
Comment 9 written by Alan McCollough on 29 May 2009, at 11:55 AM
With their TTS, it seems to work to have multiple spaces in a phrase. Also, a period seems to make it pause a bit. Not sure if FreeTTS uses similar logic, but you might get pauses with "A. B. C." or "A. B. C" instead of "A B C".
Comment 10 written by Alan McCollough on 29 May 2009, at 11:57 AM
Comment 11 written by Ben on 29 May 2009, at 3:21 PM
Comment 12 written by Ernst van der Linden on 29 May 2009, at 4:06 PM
Comment 13 written by Raymond Camden on 30 May 2009, at 8:31 PM
Comment 14 written by Sam on 16 November 2009, at 9:50 AM
To avoid leaks you need to use server scope variables. More info is available at this link:
http://www.compoundtheory.com/?ID=212&action=d...
I'm still having memory leak issues currently, but working to resolve them.
Comment 15 written by Raymond Camden on 16 November 2009, at 9:54 AM
Comment 16 written by Leigh on 16 November 2009, at 10:34 AM
Try calling deallocate() at the very end. It is not in the API example (of course). But I noticed it in one of the demo examples. It closes files and releases resources, which seems to help the memory issue:
...
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<cfset voice.deallocate()>
<cfset sfAudio.close()>
-Leigh
Comment 17 written by Sam on 17 November 2009, at 9:33 AM
Unfortunately, <cfset voice.deallocate()> does not work; it throws an error.
Sam.
Comment 18 written by Leigh on 17 November 2009, at 9:44 AM
Comment 19 written by Sam on 22 December 2009, at 4:30 AM
Comment 20 written by Derek on 2 August 2010, at 7:55 AM
Thanks in advance.
Comment 21 written by Raymond Camden on 2 August 2010, at 8:06 AM
Comment 22 written by Derek on 2 August 2010, at 9:03 AM
This task is a side note to something I'm working on, so I'm not sure I'll get through it, but if I do I will post something here, for what it's worth.
Thanks again for sharing all your work/thoughts/experience etc.
Comment 23 written by Raymond Camden on 2 August 2010, at 9:07 AM
Comment 24 written by Derek on 2 August 2010, at 9:22 AM
:-)
Thanks again.
Comment 25 written by Derek on 2 August 2010, at 9:41 AM
Comment 26 written by Raymond Camden on 2 August 2010, at 9:43 AM
Comment 27 written by Derek on 2 August 2010, at 10:45 AM
But if I see the "done" text, should I also hear the text, or is the WAV file somewhere on the server?
Thanks for your patience and your time.
Comment 28 written by Raymond Camden on 2 August 2010, at 10:48 AM
Comment 29 written by Derek on 2 August 2010, at 10:54 AM
:-(
Changed:
.init("/Library/WebServer/Documents/tts/test",audioFileFormatType)
to:
.init("/aud",audioFileFormatType)
but I do not see a file created. I do not see anything recorded in the CF Administrator either.
Comment 30 written by Raymond Camden on 2 August 2010, at 10:57 AM
Comment 31 written by Derek on 2 August 2010, at 11:05 AM
I'll keep hacking away. Thanks again for your time - I'll post something should I get lucky!
:-)
Comment 32 written by Leigh on 2 August 2010, at 11:11 AM
It may be sending the file somewhere you are not expecting. For example, I used this on Windows:
.init("/dev",audioFileFormatType)>
And the file ended up as c:\dev.wav
-Leigh
Comment 33 written by Derek on 2 August 2010, at 11:33 AM
That turns out to be a relative path (to the OS not the web site or CF). So to get the file where I wanted it, I was able to change "\aud" to "D:\website\aud\captcha" with success.
To round out my work, I'll substitute my application and session scope variables accordingly. The filename "captcha" will be replaced with the user's jsession ID and deleted when the form is posted - that way I do not have any collisions.
Thanks for the help, guys. This is looking good.
:-)
Comment 34 written by Leigh on 2 August 2010, at 12:17 PM
Yes, I think that is what Raymond was suggesting before, i.e., substitute an absolute path (for whatever OS you are using). But I am glad you got things sorted out.
-Leigh