An interesting take on PC speech recognition from Tom's Hardware Guide (11/29/2002) describes utilizing Analog Devices' SoundMAX Cadenza and Andrea's Superbeam Array Microphone to produce a quality speech recognition product. They tested both Microsoft's Windows XP built-in speech engine and Dragon Naturally Speaking. Definitely worth reading.
VoiceXML and SALT
9/23/2003: Multimodal access to the World-Wide-Web, with speech recognition at the center, is heating up and is the focus of two technologies: VoiceXML and SALT. Web sites built using either of these technologies can be navigated by phone. Here are some links of interest (there is more info here):
The VoiceXML Forum on 2/25/2002 announced their support of the World Wide Web Consortium's (W3C) Multimodal Activity - a newly-formed working group which will look into standards and software to access web applications and services by voice, keyboard, key pad, mobile phones and devices.
As reported on 8/13/02 in the Cover Pages, the SALT Forum announced the contribution of the SALT Spec to the W3C.
The "Voice Browser" Activity -- Voice enabling the Web! describes current W3C work to allow people to access the Web using spoken commands, key pads, listening to synthesized and prerecorded speech, and music. Recent items:
The W3C has received (11/30/2001) the XHTML + Voice submission from IBM, Motorola, and Opera Software. To quote the W3C comment, "This submission describes a means to modularize VoiceXML 2.0 and outlines how these modules can be combined with XHTML for multimodal interaction, where users can interact with web pages using a combination of visual and aural interaction." (Also see this presentation).
The SALT forum companies (launched 10/15/2001) are designing a markup language called SALT (Speech Application Language Tags) to enable speech recognition and TTS in web pages. SALT is XML based and will extend HTML, XHTML, and XML. The 1.0 specification was released July 15, 2002 and is available here.
The CMU OpenSALT project makes available a SALT 1.0 compliant open-source browser based on Mozilla utilizing Sphinx recognition and Festival synthesis.
Intel has a SALT Call Control Technology Preview online. The preview walks a developer through building and executing experimental SALT-enabled web pages with telephony features.
noHands is a two part example usage of SALT and ASP.net.
The XML Cover Pages has a thorough status and history of the SALT forum and standard.
SpeechTek 2003, the 9th annual SpeechTek International Exposition and Educational Conference, will be held September 29-October 2, 2003 in New York City.
VoiceWorld Europe 2004, May 5-6, 2004 London, claims to be Europe's voice business event.
Company Web Sites - Recognition Engines
PC Dictation - Packaged Software
Dragon Naturally Speaking 6 is the Dragon dictation engine now offered by Scansoft, the company that purchased the Dragon technology from Lernout and Hauspie.
IBM ViaVoice - web site for PC-based dictation application. Runs on Windows, Macintosh, and Linux.
RealSpeak is a TTS engine offered by Scansoft, which acquired it from Lernout and Hauspie. See the acquisition press release here.
See Resellers below for a list of local distributors and value-added resellers of dictation packages.
PC Recognition Engines
Microsoft's SAPI provides a freely downloadable SDK with COM, ActiveX, VB, and VC++ interfaces for speech recognition and speech synthesis under Windows. This free download includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).
NeuVoice has a noise-robust small-footprint speech recognition engine ported to Win32/CE, Symbian, and some DSPs. They also have a Voice Dialler release for the Nokia 9210/9290 smart phone.
The AT&T Advanced Speech Products Group offers WATSON, with SAPI-compatible speech recognition and speech synthesis, as well as speaker verification technologies.
SRI International's STAR Lab has a staff of more than 20 working on continuous speech recognition. Their SR system is called DECIPHER. Two products are available, EduSpeak for multimedia desktop applications and DynaSpeak for mobile devices. SRI has transferred technology to Nuance.
Apple's PlainTalk incorporates speech recognition and synthesis into the Mac OS.
The DDLinux page covers speech engines that have been ported to Linux and even lists a handful of open source engines.
A variety of PC applications employ dictation engines provided by IBM or Dragon (see above).
Telephony and Call Center Engines
Looking for a telephony speech engine? Here are the demo pages from several major vendors: SpeechWorks, Nuance, Locus, and Vocalis.
SpeechWorks (previously ALTech) develops speech recognition engine technology for over-the-telephone customer service and transactions.
Nuance offers a continuous speech recognizer for large vocabulary applications, including stock quotes.
Fonix provides FAAST embedded TTS and IVR speech recognition products for Windows CE. The Fonix TTS technology is excellent and underlies their iSpeak text reader and TimeTalk retail products.
Locus Speech Corporation in Canada has recently announced a speaker-independent, flexible vocabulary telephony speech engine in English and French.
Vocalis focuses on speech for computer telephony in Europe. Started in 1993 by a management buy-out of Logica's Speech and Natural Language Group, they are now a public company based in the UK.
Philips Speech Processing offers leading edge speech recognition, including natural dialogue systems for telephony, consumer products and the IT industry.
BaBel Technologies SA in Belgium sells speech recognition and speech synthesis engines for English, French, German and Dutch.
Embedded Speech Recognition/Device ICs
Natural Speech Communication (NSC) offers speech recognition technology on DSP-based PCI boards and rack-mounted boxes. NSC targets the telephony market. NSC packs an impressive number of simultaneous speech recognition channels onto a single board.
VoxTec offers large vocabulary, speaker independent phrase recognition on a PDA. The Phraselator embodies this technology and is in use by the US Military in Afghanistan.
Rubidium Ltd.'s Dialog Engine supports speech recognition, TTS, and dialog management. It is available as a System On a Chip or a software only solution.
ART, Advanced Recognition Technologies, Inc. develops handwriting and speech recognition solutions that are appropriate for embedding in consumer devices such as cell phones, PDAs and toys.
Sensory Inc., previously Sensory Circuits, sells speech recognition hardware chips ideal for communications devices and consumer electronics. Technologies available include speech recognition, speech synthesis, voice recording and playback, and music synthesis. They have good tips on using speech recognition in product design.
Fluent Speech Technologies is a spin-off of the well-regarded Center for Spoken Language Understanding at OGI. They have a wide range of products with strengths in embedded recognizers for device control and lip-synching speech synthesis using 3D animated agents.
Images sells a hobbyist kit with speech recognition chip for use as a control interface to other circuits.
Voice Signal has its sights set on low-cost speech recognition chips for use in consumer electronics. Their ELVIS (Embedded Large Vocabulary Interface System) will be available in Q4 2001. They offer a suite of capabilities including speech recognition, noise cancellation, speaker verification and word spotting.
voice INTER connect markets a speech synthesis system and a speech recognizer suitable for embedded speech recognition on DSPs or ASICs. They also offer speech software development for speech data recording.
PC and Web-Based Software Development Tools
SpeechStudio Inc. provides SpeechStudio Suite, developer friendly tools and components supporting Microsoft's SAPI. Tools are for grammar construction and building and running unattended regression tests for apps using voice input and TTS. Supports Visual C++ and Visual Basic. SpeechStudio's TAPI Control connects SAPI speech recognition to telephone devices. Consulting available.
Speech Solutions is a maker of ActiveX controls and Custom applications using speech recognition.
United Research offers speech applications as well as components and utilities for developers. Latest products are Dictation 2002 (consumer) and Wave-to-Text and Text-to-Wave Server side ActiveX SDKs (developers).
Digital Dreams offers speech plug-ins for multimedia authoring tools such as Macromedia Director for the Macintosh and PC.
Voicenet has architected SpeechWare as a foundation for cross-engine voice UI development tools; they also appear to be working on a way for a thin client to accept speech recognition commands that are digitized and sent over the Internet to control a host computer.
Sun Microsystems' SpeechActs web site explores uses of speech; it is newly revised with the Java Speech API.
Chant offers SpeechKit 3, components that simplify Microsoft's SAPI and IBM's SMAPI into higher-level interfaces for use in C/C++, C++Builder, Delphi, Java, JavaScript, JBuilder, Visual Basic, Visual C++, and VBScript.
TMA Associates reports on recent (8/2002) SALT and VoiceXML developments, with an accent on SALT.
Vocalocity provides Voice Gateway 2.0, containing Vocalocity's VoiceXML 2.0 and SALT 1.0 compliant interpreter.
Voice Components LLC delivers Voice Essentials 1 & 2, components for building VoiceXML 1.0 and 2.0 Java applications. Application developers can use Voice Essentials in Java IDEs from Borland (JBuilder), WebGain, Sun Microsystems (Forte), or IBM (Visual Age).
Cambridge VoiceTech builds Cambridge Voice Studio, a desktop environment for TTS-enabled VoiceXML applications. The Cambridge Voice Gateway is a complete VoiceXML implementation.
Audium has Audium 2, a Java-based software platform for development, deployment, and maintenance of VoiceXML applications.
Voxeo has an advanced infrastructure of hardware and software Voice Centers that bridge the Internet and Public Switched Telephone Network. VoiceXML development (provided by Audium) is leveraged to speed application deployment and provide portability.
Verascape produces infrastructure products that enable you to converse with Web sites over any kind of phone. These products support the development and hosting of VoiceXML applications linking today's converging Internet and telecommunications industries.
Kirusa develops and licenses multimodal wireless platforms that enable wireless carriers and service providers to offer applications with integrated voice and visual interfaces.
Holly Australia develops and hosts speech applications for businesses with a mobile workforce. Examples: voice activated dialler, Email centre - reads and replies to emails and voice mails, etc. Holly provides a VoiceXML development environment.
Telisma claims to offer "the one and only telcograde voiceXML interpretter."
The IBM WebSphere Voice Server provides a development environment for building Voice-enabled (e.g., wireline and wireless) applications utilizing VoiceXML.
VoiceGenie Technologies Inc. (caution: if you browse there, you can't come back.) has a VoiceXML telephony platform on Unix as well as PC-based VoiceXML products. Browse the internet from your phone.
Voice Web Solutions has products that let you convert (or author) text documents into VoiceXML so they can be listened to (e.g., on the phone) and navigated.
Ubicall Communications supports VoiceXML as one component of voice-activated telephony applications.
Company Web Sites - Resellers and Integrators
General Purpose Dictation
VoiceRecognition.com.au is a Dragon Systems reseller in Australia. They provide microphones, headsets, and other hardware as well as technical consulting.
1st Voice is a Dragon Systems certified reseller in Palo Alto, CA providing sales and training in DragonDictate products for Windows and Macintosh.
21st Century Eloquence, a Florida-based reseller, fields an attractive web site that covers industry news and an array of products including accessories like microphones, recorders and special add-on vocabularies.
ABN, Inc. serves as a system consultant and network integrator for the Northeast, specializing in law firm automation and business applications. They integrate and resell IBM, Kurzweil, Dragon and even Wildcard (previously Kolvox) products.
AllVoice Computing has deployed Dragon-based and proprietary speech solutions to over 1000 sites in the UK.
Alma Information Systems in Houston, TX is a Kolvox reseller for LawTalk and OfficeTalk software.
AM Technology offers WhyType®, a practical speech recognition family of products. The Hands-Free model presents a voice interface to common computer tasks. There are also models for medical and legal transcription.
The Computing Out Loud site compiles advice, macros, and essential tips for users of dictation software. Written by a power user with RSI.
InSync Software is a Dragon Systems Premier VAR in Canada; they have special microphones for applications like tape dictation and court stenography.
Lab Resources resells Dragon products in Wisconsin and Illinois.
M-CBS operates out of Oregon with dictation products for the legal and medical communities.
New World Creations offers a comparison of dictation and other speech recognition products organized by price point and application, as well as a downloading resource for product upgrades.
SayICan sells the major dictation packages and offers tips for users; other products include a mobile recorder for Dragon and machine translation software. Hosted by a power user of dictation software.
The Speech Recognition Company in the UK resells Dragon and IBM and offers training and consultancy to corporations.
SpeechStudio provides LexionBuilder, a tool for incorporating custom lexicons into Microsoft's SAPI. Any application using SAPI for dictation, such as Word from Microsoft Office XP, utilizes the new lexicon.
Voice Recognition Systems (Lexington, KY) resells Dragon dictation software adapted for various applications and is launching an add-on product that permits use of a tape recorder instead of microphone.
Looking for entry-level dictation software? Try here.
Medical Focus
AccuDatasys.com provides speech enabled SQL databases and medical EMR systems.
Narratek carries voice control and dictation products used in medical transcription.
Computerized Business Systems based in Charlotte sells the complete line of IBM VoiceType and MedSpeak Radiology dictation products and has several downloadable demos.
VoiceAutomated is a software distributor and integrator with custom templates designed for medical applications including General Medicine, General Surgery, Urgent Care and Podiatry.
Voice Activated Systems Technologies (VAST) in California has designed medical modules for Psychiatry, Urology, Ophthalmology, Podiatry and Endocrinology. The modules are based on Dragon Naturally Speaking and offer dictation templates for progress notes, referral letters, prescriptions and reports.
Part of the KAMT site is dedicated to the transcription of medical records into digital form, and discusses dictation software.
WorkFlow Solutions is a value added technology integrator specializing in medical practice and hospital automation applications. They build medical vocabularies to integrate with IBM and Dragon speech recognition products as well as medical records databases.
KorTeam develops speech applications for the medical industry, especially healthcare documentation, using IBM and Dragon engines.
Voice Automated develops speech recognition applications and language models for medical field specialties.
Portset Systems On-Line provides speech enabled products for disability groups; internet browser, reading machine, talking newspaper, etc.
A.D.A. Solutions by WorkLink located in Berkeley, California across from the Center for Independent Living offers speech recognition, computers, and assistive devices based on Dragon software.
Freedom of Speech is a full service provider of Assistive Computer Technology (ACT) to people with disabilities.
E-Triloquist has a PC-based communication aid for the speech impaired. It serves as an electronic voice and is available free to ALS patients or anyone who needs it for medical reasons. Otherwise, a donation to an ALS foundation is suggested.
VoiceCode Programming is an Open Source initiative started by the National Research Council of Canada, to develop tools for programming by voice. Their admirable goal is to enable programmers with RSI to continue programming.
Synapse is a reseller offering solutions for the disabled including voice recognition software that permits hands-free control of computers.
Speak2Write is a federally funded study exploring the use of speech recognition to improve writing skills among students with disabilities.
MathTalk by Metroplex uses DragonDictate to enable voice entry of mathematical notation; future products will also enable voice-entry of VisualBasic and AutoCad.
Workforce Automation
Ficomp Systems is a reseller and integrator of continuous speech recognition, having developed a successful voice activated price reporting system for a major Chicago exchange floor.
VoiceAutomated can develop custom language model templates for Dragon and IBM VoiceType for a specific application.
Voxware acquired Verbex Voice Systems (2/99), one of the first providers of continuous speech recognition technology. The merged company has a strong industrial automation focus. Prior applications include work with the UPS, L'Eggs, Canada Post, and Medical Labs.
SyVox, Inc. provides speech solutions for locally mobile workers.
Xybernaut manufactures voice-activated wearable computers for vertical applications.
The Vocera wireless Communications system is hands-free and voice-activated. The system consists of a central server and a 2 oz. communications badge worn by each team member.
Vocollect provides "verbal computing solutions" for workforce automation such as order picking.
Company Web Sites - Consultants
Companies focused on speech technology consulting. Many product and service companies also offer consulting.
Speech Cybernetics provides consulting and custom applications of speech recognition technology both in the telephony and desktop markets. They offer WebVoice, a browser based on Internet Explorer, that offers a powerful alternative to SALT.
Larson Technical Services offers consulting in Voice-enabling applications (among other things). Dr. James Larson is the author of the book VoiceXML: Introduction to Developing Speech Applications and chairs the World Wide Web Consortium's Voice Browser Working Group.
Voice Applications Incorporated offers a full range of speech technology consulting for VoiceXML and conventional speech applications.
Canadian MCK Consulting provides training and consulting for IBM ViaVoice and the Olympus DS-2000 professional digital voice recorder. McK is the IBM Advanced Business Partner certified in speech. They consult worldwide.
Speech Technology and Applied Research (STAR) is a seasoned R&D team, experts in English Language speech variation, Speech modification and interpretation, and Speech-based operations.
ejTalk focuses on advanced spoken dialog development and research. They have a trademarked Conversation Manager technology that allows them to build synthetic conversations quickly.
Applications
Personal Assistants, Auto-Attendants, Voice-Dialers, and More
(also see the Speech Engine Providers above)
Sprex, Inc. provides products and services for over-the-{phone/internet} applications using speech recognition and synthesis. Products include ANSR (access and control any product or service by voice), AudioCat (concatenative synthesis system for public announcements, customizable).
VoiceMethods, LLC sells the Health Care Translator, which utilizes speech recognition and TTS to conveniently translate a physician's questions and statements into a foreign (non-English) language. The SR and TTS technology that underlie this product are also available from VoiceMethods. The language dictionaries for this technology are from Ectaco, which is the parent company.
UbiCall Communications sells Voxplorer Receptionist, an auto-attendant utilizing speech recognition. They also market the software platform that Voxplorer is based on. Supports VoiceXML.
Wildfire Communications is trying to create a telephone-based secretary that responds to voice commands.
VoiceFlash Networks (formerly Registry Magic) sells a conversational auto-attendant (Virtual Operator) and a voice activated dialing assistant (Virtual Dialer). They are also involved in Bluetooth and a point of sale network.
Athena is a new software package from Genie Telecom that turns your PC into a virtual personal assistant that can answer calls and read email over the telephone. It is now available and can be ordered from the site.
BrightArrow Technologies has developed a speech-enabled PhoneAssistant for home or office to handle two key tasks: Call Screening and Voice Dialing.
nameConnector is an auto-attendant service created by Parlance Corporation to quickly connect callers with the right person or department.
Conversa has developed Conversa Messenger, a voice-operated messaging and information assistant, and Conversa Web, a voice-operated browser.
Sound Advantage markets SANDi, a voice activated office assistant for all office communications and unified messaging.
Wizzard Software's Interactive Voice Assistant lets you voice control aspects of your PC and responds in a conversational manner using TTS.
Sort It, Inc. utilizes speech recognition to perform incoming mail sorting, mail tracking, and mail locating systems.
Parliant sells Tell A Phone, which connects to your existing phone line and your computer. Provides voice dialing: speak a name into any phone on the line and the phone number is dialed. Logs calls. Uses TTS to announce callers on computer speakers.
Parliant has licensed exclusive rights to Digital Acoustic's Tell A Phone technology.
Educational Software
ELVIRA is a VoiceXML development environment available for download.
Metroplex Voice Technologies announces MathTalk, a math calculation program using macros and the Dragon Dictate engine.
Sprex, Inc. has Teachionary which utilizes SR and TTS to teach basic vocabulary of foreign languages including Hebrew, Farsi, Japanese, Quebec French, Tamil, and many more.
Idioma Ltd. has developed a proprietary speech recognition engine specialized for "personal tutors" that teach speech pronunciation and language skills. They have tutors for speech therapy, accent reduction, even basic reading for children.
The MBRDICO Project is a talking dictionary for American, Arabic, British English, Dutch, French, and Spanish. It is a collaboration of three universities. Downloads are available.
Peripherals
Microphones, Headsets and Portable Recorders
Andrea Electronics makes a line of noise-canceling microphones that are ideal for use in speech recognition applications.
Shure Brothers offers high-performance and SoundBlaster-certified microphones.
The VoiceIt organizer is described here -- no speech recognition in this handheld recorder, but it is a really fun site.
SCCS, Inc. has a focused passion: they evaluate and sell microphones expressly for speech recognition applications.
Telex produces computer microphones, noise canceling headsets, a special dictation microphone for speaking from a distance, and other audio input devices designed with speech recognition in mind.
Labtec offers a variety of microphones and headsets for PC-based speech recognition.
Norcom Electronics makes a line of analog minicassette recorders that are packaged with Dragon dictation software or IBM ViaVoice.
VXI Corp sells the PARROTT, a noise-cancelling headset for speech recognition.
Communitech is a stocking supplier of hands-free and wireless headsets from major manufacturers for use in call center applications. Subsidiary companies can help set up a turnkey call center including both project management and equipment.
Telset Technologies is an Australian distributor of Andrea brand speech recognition microphones and headsets.
Computer-Telephony Integration Vendors
One Voice Technologies' 4th Generation Voice technology lets users make calls, send and receive e-mail, SMS, pages and instant messages, and control devices from any phone.
Digital Sound Corporation has network-based voice-mail, Voice Forms, and the Passport graphical telephony builder tool.
TELE DATA Consultants are custom software developers of telephone interfaces; they have a nice overview of IVR applications in general.
ITI Software has C-code libraries for computer-telephony integration, with "over 30,000 ports in use".
Katalina Technologies markets VoiceGuide, a graphical design tool for IVR, fax-back and voice-mail applications.
T-NETIX is a call-processing vendor offering SpeakEZ voice verification technology.
Artisoft has a sharp Web site; the main product is the TeleVantage telephony toolkit with hooks to speech recognition engines.
Verascape produces infrastructure products to enable phone users to converse with web sites.
Audiopoint utilizes its Audiopoint 3.0 platform to develop and host large-scale commercial voice applications.
HeyAnita's software platform delivers voice access, navigation, and management of internet data and information.
VoiceGenie Technologies Inc. has a VoiceXML telephony platform on Unix as well as PC-based VoiceXML products. Browse the internet from your phone.
SandCherry Networks' SoftServer product includes Speech Recognition, a VoiceXML Browser, voice authentication along with Unified Messaging functions.
Icescape offers ICE³, a contact center product enabling agents to interact with customers, prospects, staff, and suppliers through telephone calls, e-mail, web chat, and voice messages.
Interactive Intelligence offers sophisticated unified messaging products that include speech recognition.
Novavox AG is a developer of Unified Communications solutions (SmartPhone and SmartPhone Pro) for small and medium size companies.
Parity Software is one of the leading computer-telephony tool vendors (offering a scripting environment, ActiveX tools and GUI-based wizards) and they have recently come out with an ActiveX that integrates with SAPI-compliant speech recognition engines.
Human Language Systems offers executive workshops, consulting and technical training for continuous speech recognition systems including automated call centers. Offices in Boulder, CO and France.
Brite Voice Systems provides a range of CTI products including voice dialing for cellular telephony.
Brooktrout manufactures computer telephony hardware and has grown larger recently with the acquisition of Rhetorex.
Aspect Telecommunications provides a full suite of hardware, software and services for call centers. They see speech recognition as an ideal way to extend IVR services to parts of the country where touchtone is not yet available.
HTK develops custom IVR applications and sells an advanced IVR platform product. They recently incorporated British Telecom's recognition and synthesis engines (STAP and Laureate) for telephony and non-telephony environments.
Brooktrout Fax & Voice API offers IVR and computer telephony solutions with integrated speech recognition software modules.
A&G Graphics Interface has SDK toolkits available for creating telephony-based speech recognition applications.
Periphonics Speech Processing Platform (OSCAR) from Nortel Networks brings together Nortel's speech recognition engine, TTS engine, a voice verification service, and an applications framework for developers who want to build their own speech-enabled telephony services.
Sunny Beach has a TAPI/SAPI-compliant voice telephony tool for Windows 95/98/NT for building automated phone attendants with back-end database access. Voice recognition and text-to-speech are supported.
Resources for Speech Scientists & Engine Developers
Appen provides speech technology products and services. Appen provides high-level expertise in speech science and technology.
Speech/Acoustic WWW information list is an impressive collection of links to speech research around the world. The Japanese links appear quite comprehensive.
Wavemakers ClearStream(TM) voice optimization software detects and extracts voice from virtually any audio signal (i.e., separates voice from background and transient noise).
nellymoser, Inc. creates communication technology products that provide genuine advantages in improving audio quality and efficiency. They have six types of software processing components: [speech] compression, modification, identification, synchronization, spatialization and conversion.
Speed of Sound does custom speech collection and transcriptions.
The Institute for Signal and Information Processing at Mississippi State has created (4/15/02) a public domain speech recognition system that can be easily modified for different kinds of speech research.
Hidden Markov Model Toolkit (HTK) is freely available from Cambridge University. Microsoft acquired HTK as part of Entropic and gave the rights back to Cambridge.
Index - Speech has no fewer than 50 links to research labs around the world -- each with a beautiful thumbnail graphic (maintained by UCSC).
Haskins Laboratories offers almost 100 links to speech-related pages on the Web! It also has a good listing of upcoming conferences around the world.
Brainhat has a platform for building natural language-based systems. Their NLP core has interfaced to speech engines and VoiceXML as well as other environments.
The Natural Language Software Registry has a great deal of information on various natural language projects, including COSIMA, a continuous speech recognizer for German.
IBM's speech research group has its own web site. Also, IBM's AlphaWorks will let you experience and provide feedback on their recently posted (2/2002) low-bit-rate voice coder/decoder (CODEC) RECOVC software that significantly compresses speech, but keeps the recognition rates intact. An older AlphaWorks speech-related technology (3/2000) is Speech for Java, an API to IBM's ViaVoice from the Java programming language.
Lawrence Livermore National Labs is licensing a novel microphone technology that uses cheap electromagnetic radar sensors to sense the actual movements of the mouth and throat, thereby providing an additional stream of data so that speech engines can improve accuracy (circa 1996).
NCT Group develops ways to electronically manipulate sound and signal waves to reduce noise, improve signal-to-noise ratios and enhance sound quality.
Defense Group Inc. (DGI) is a developer of special methods for improving background noise resistance in speech recognition and is open to working with developers of speech recognition engines.
Text to Speech and Speech Synthesis
FreeTTS 1.1 (1/31/2002) is a free speech synthesizer (including source) written entirely in Java, by Sun Microsystems' Speech Group and others.
TextAloud is one of three "Talking Software" products from NextUp Technologies.
More speech synthesis companies: Willow Pond has sample recordings of their engine.
French TTS provider Elan is doing research on both hardware and software products. Elan offers the Elan speech synthesis (TTS) engine, available for seven languages for platforms ranging from Microsoft's new Pocket PC to computer-telephony hosted on Windows and UNIX (Sun, Linux, SCO) to multimedia desktop. Online and downloadable demos.
Eloquent Technology has a Windows demo of their text-to-speech software available on request.
Lucent Technologies Bell Labs has a new online demo of their text-to-speech system in seven languages: English, Spanish, French, Italian, Canadian French, Mandarin and German.
Fonix has FAAST TTS technology for developers as well as two retail products: iSpeak and TimeTalk.
ScanSoft provides speech compression and text-to-speech synthesis in multiple languages. See their TTS3000 product.
BaBel Technologies SA in Belgium sells speech recognition and speech synthesis engines for English, French, German and Dutch.
SoftVoice offers a text-to-speech engine for the Windows environment that is available for license.
Black Ice sells a speech synthesis SDK with Active X control.
ECTL - Electronic Communal Temporal Lobe is a moderated mailing list for researchers with interests in computer speech interfaces. To subscribe, send your name, institute, department, daytime phone, and email address.
Prosody mailing list; send one-line message "subscribe prosody Your Name"
foNETiks is a moderated monthly newsletter for phoneticians and speech scientists; send one-line message "join fonetiks your_first_name your_second_name"
PC-TELEPHONY mailing list explores the integration of computers and telephones; send one-line message "SUBSCRIBE PC-TELEPHONY"
TOUCHTON mailing list was created to promote discussion of the specification and maintenance of 'touch-tone' or voice response technology, especially campus automated registration; send one-line message "SUBSCRIBE TOUCHTON Your Name" -- If you would like to receive the contents of the list as a periodic "digest" then send the command SET TOUCHTON DIGEST after receiving your subscription confirmation.
Speech Technology Magazine is published monthly. They also publish a free weekly electronic newsletter: e-Blast!, focused on the speech recognition industry.
UC Alert Month in Review (UC for Unified Communications) is published monthly and often has speech recognition news under the heading "Voice Portals, Speech Recognition & TTS".
This site was originated by Russ Wilcox as a service to those interested in the business side (as opposed to the technical underpinnings) of speech recognition. Content, suggestions, and corrections are heartily welcomed! All rights reserved. Copyright (c) 1995-2003 by Sam Quiring.
DISCLAIMER: All selections and descriptions are my own.