EvalDictator
The EvalDictator team consists of many senior people from CMU, MERL, NIH, Sun and ex-Dragon.
Our overall goal is to encourage a new generation of speech recognition researchers and entrepreneurs by releasing state-of-the-art open source speech technology, and by making massive amounts of speech data freely available. Currently, speech recognition technology is only available from a handful of very large companies. Licensing and customizing this technology is very expensive and involves onerous and lengthy negotiations. We believe this is a barrier to innovation in this market.
To achieve these ends, we want to popularize speech recognition technology by building open source applications. These tools will be written in Java and will run on every major platform, including Windows, Mac OS X and Linux. Our target is computer users who wish to enter text in their native language and prefer speech to the keyboard. These users may be professionals who require hands-free text entry. For example, many doctors prefer to enter reports via dictation. Another target is users who find it difficult to type text in their native language. For example, in many countries users must memorize many complicated key combinations to enter basic text. Large countries (e.g. China, Japan) and countless smaller ethnic groups have this problem.
We also want to collect a huge corpus of speech and language data. This data will be used for speech research, and to further improve the EvalDictator technology. EvalDictator differs from standard desktop dictation products such as NaturallySpeaking and ViaVoice because speaker adaptation and the creation of custom pronunciation dictionaries are performed as an Internet service. The speech and language data used to build custom models for individual users will be stored and used to improve the general models for everyone. EvalDictator applications will automatically update themselves as new models become available. The result is that the accuracy of EvalDictator applications will improve as more people use them.
We strongly believe that technology should be application driven. In other words, we want to solve real problems using speech recognition applications, and only extend the core technology as required by those applications. The domain of speech recognition is far too big for us to address all at once, so we want to focus on the tasks that will make the technology popular and successful.
Finally, we are seeking support and funding. Collecting and storing massive amounts of audio data will require a serious infrastructure. We will need machines, storage, backup and a high-bandwidth connection to the Internet. We will also need people to maintain the system. In addition, if we can afford to work on our software full time, we can make much more rapid progress. Therefore, we are seeking customers who require a custom solution that is not available from the existing vendors.
For now, we are building general tools while looking for vertical domains. We want to release focused applications that solve a problem for some set of users. We hope to build our "Eval Empire" one happy user group at a time.
EvalDictator source code is free and open source with an "Apache" style license. The code may be used in proprietary products, even if the products are not open source. The source code will be publicly available on SourceForge.net. EvalDictator will contain no patented IP.
The acoustic and language models are also free. However, they need not be built from publicly available speech data or with publicly available tools. In this way, companies and researchers can donate "eval" models to the project while keeping the data and tools proprietary.
Currently, there is a complete working prototype code-named "HumbleBeginnings".
HumbleBeginnings is written in 100% Java, and consists of two parts: the Dictator and the Collector.
The Dictator is a desktop application that performs dictation via automatic speech recognition. It is based on the Sphinx4 recognizer from Carnegie Mellon. Sphinx4 is a state-of-the-art large vocabulary recognizer. Its accuracy is comparable to that of other state-of-the-art recognizers, and its size and speed are sufficient for a modern PC. It currently supports English, Arabic and Mandarin using the Wall Street Journal and GALE models.
The Dictator UI is very simple. It is merely a text pane, where the user can insert text by speaking. The user simply selects the insert position with the mouse, clicks and holds while talking, and the text is inserted. The complete text is automatically placed on the clipboard. The user then goes to his/her favorite application (e.g. Word, Excel, OpenOffice), pastes the text and completes editing.
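The insert-at-caret behavior described above can be illustrated with a minimal sketch. This is not the Dictator's actual code: the class and method names are hypothetical, and a plain StringBuilder stands in for the text pane so the example runs without a GUI. Copying the result to the clipboard is left out.

```java
/**
 * Hypothetical sketch of a dictation buffer: recognized text is
 * inserted at the caret position chosen by the user.
 */
final class DictationBufferSketch {
    private final StringBuilder text = new StringBuilder();

    /** Inserts recognized text at the caret and returns the caret position after the insertion. */
    int insertAt(int caret, String recognized) {
        // Clamp the caret so a stale position cannot fall outside the buffer.
        int pos = Math.max(0, Math.min(caret, text.length()));
        text.insert(pos, recognized);
        return pos + recognized.length();
    }

    /** Returns the complete text, i.e. what would be placed on the clipboard. */
    String fullText() {
        return text.toString();
    }
}
```

Each utterance would result in one call to insertAt, and fullText gives the string that the application would then hand to the clipboard.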
The Collector is a server-based web application that creates custom dictionaries, language models and acoustic models. Users upload a collection of their own documents to the server. They are then asked to read a selection of their own text out loud. The Collector then automatically creates a version of Dictator customized to their voice, language and vocabulary. Users may offer to donate this data to the project in order to improve future versions; otherwise it is deleted.
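One step that a custom dictionary implies is finding the words in a user's documents that the base dictionary does not cover. The sketch below shows one simple way this could be done. It is an assumption for illustration only, not the Collector's actual method: the class name, the method name and the whitespace-and-punctuation tokenizer are all hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: collect the user's words that are missing from a base dictionary. */
final class VocabularySketch {

    /** Returns, in sorted order, every word in the documents that the base dictionary lacks. */
    static Set<String> outOfVocabulary(List<String> documents, Set<String> baseDictionary) {
        Set<String> missing = new TreeSet<>();
        for (String doc : documents) {
            // Split on anything that is not a letter or an apostrophe.
            // A real tokenizer would be language-specific (e.g. for Mandarin).
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
                if (!token.isEmpty() && !baseDictionary.contains(token)) {
                    missing.add(token);
                }
            }
        }
        return missing;
    }
}
```

The words returned here are the ones that would need pronunciations added before the recognizer could produce them.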
HumbleBeginnings is very early in its development. Dictator has very poor performance: it is big, slow and inaccurate, and it has no command and control, no editing, no adaptation, and no vocabulary editor. The Collector is in development and not yet ready to process data. The team is discussing what techniques and features need to be implemented.
(NOTE: The links in this section point to local files created by javadoc. If they are broken, please follow the instructions on Creating Javadocs to create these links.)
EvalDictator has been built and tested on the Solaris™ Operating Environment, Mac OS X, Linux and Win32 operating systems. Running, building, and testing EvalDictator requires additional software. Before you start, you will need the following software available on your machine.
ant. You will only need ant if you wish to build EvalDictator from the source distribution and prefer it to scons.
scons. You will need it if you wish to build EvalDictator from the source distribution and prefer it to ant. scons requires you to also install Python, available at python.org, if you do not already have it.
A tarball is built nightly, and it is available at cmusphinx.org. Alternatively, you can download the JAR file.
If you want to be able to get the latest updates, you should retrieve the code from the svn repository on SourceForge, where the EvalDictator code is hosted as open source. Please follow the instructions below to retrieve it.
% svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/tools
If you plan to build with scons, you will also need to download the scons folder:
% svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/scons
Set the JAVA_HOME and PATH environment variables as described in the troubleshooting section.
If you are using ant, also set ANT_HOME according to the Ant site's instructions.
To build EvalDictator, at the command prompt change to the dictator build directory (usually, a simple "cd tools/dictator/build" will do). Then type the following:
ant
This executes the Apache Ant command to build the EvalDictator classes under the bld directory, the jar files under the lib directory, and the demo jar files under the bin directory.
To delete all the output from the build to give you a fresh start:
ant clean
To build with scons, at the command prompt change to the scons build directory (usually, a simple "cd scons" will do). Then type the following:
In Linux, Mac OS X, etc:
scons.py
In Windows:
scons.bat
This executes the scons command to build the EvalDictator classes under the ../../scons_build/classes directory, the jar files under the ../../scons_build/jars directory, and the demo jar files under the ../../scons_ship directory.
To delete all the output from the build to give you a fresh start:
scons -c
To build the javadocs, go to the dictator build directory (tools/dictator/build), and type:
ant javadoc
This will build javadocs from public classes, displaying only the public methods and fields. In general, this is all the information you will need. If you need more details, such as private or protected classes, you can generate the corresponding javadoc by doing, for example:
ant -Daccess=private javadoc
In csh or tcsh:
setenv JAVA_HOME /lab/speech/java/jdk1.5.0_06
In sh or bash:
export JAVA_HOME='/lab/speech/java/jdk1.5.0_06'
In bash on Windows (e.g. Cygwin):
export JAVA_HOME='c:/Progra~1/Java/jdk1.5.0_06'
PATH variable. If you are not familiar with this environment variable, PATH contains the set of folders that the operating system uses to find executables. You have to add the path containing javac to your PATH variable. In Windows, you can do this by following the instructions to change the variable PATH.
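To check whether javac is actually reachable through PATH, a small lookup like the following can help. It is a minimal sketch, not part of EvalDictator: the class and method names are hypothetical, and it only mimics the basic folder-by-folder search described above.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch: search the folders listed in a PATH value for an executable. */
final class PathLookupSketch {

    /** Returns the first matching file found in the PATH folders, if any. */
    static Optional<File> findOnPath(String pathValue, String executable) {
        if (pathValue == null) {
            return Optional.empty();
        }
        // PATH entries are separated by ';' on Windows and ':' elsewhere.
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            // Windows executables carry an .exe suffix, so check both spellings.
            for (String name : new String[] {executable, executable + ".exe"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }
}
```

Calling findOnPath(System.getenv("PATH"), "javac") returns an empty Optional when no folder on PATH contains javac, which is the situation this troubleshooting entry addresses.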
d:\work\sphinx4>ant clean
/c: Can't open /c: No such file or directory
HOME, your userid is "johndoe", and you want to set this variable to "/home/johndoe"
HOME
c:\cygwin\home\johndoe
PATH
PATH will be in the "System variables" box.
;c:\python24\bin to the end of the string