Text-to-Speech / Speech Synthesis

Speech synthesis is the artificial production of human speech and is used extensively to aid people with a wide variety of disabilities. Text-to-speech is the feature, found in a range of software and hardware, that converts text into synthesised speech.

Applications of Speech Synthesis

Speech Difficulties (AAC)

Many people overcome the barriers associated with a severe speech difficulty using text-to-speech software, usually accessed through a dedicated Voice Output Communication Aid (VOCA). This site does not currently cover augmentative and alternative communication (AAC) but if you'd like to know more I can recommend the following websites:

No Functional Vision / Blindness

People with no functional vision can often benefit from the enabling features of a computer with a screenreader installed. The screenreader uses text-to-speech technology to provide a totally aural (and sometimes tactile) method of accessing the standard Microsoft Windows interface. In addition to standard applications such as email, the web and word processing, speech synthesis is also used to access paper documents through OCR software.

Partial Vision

Supportive speech synthesis is often combined with specialist magnification programs to create efficient and comfortable access to a computer through a balance of the auditory and visual modalities, i.e. using both sight and hearing. Speech can be accessed 'on-demand' or as a constant supplement to the visual information. Like full screenreader users, many partially sighted people use OCR software with a flatbed scanner to have paper journals and other documents read aloud.

Reading Difficulties such as Dyslexia

A great many people who are unable to access text in its written form are quite confident in understanding it when it is read aloud. Speech synthesis helps people with a range of reading difficulties, including pre-literate young children, people for whom English is not their first language and people with dyslexia.

Text-to-speech is found in 'talking' word-processors such as Clicker 4, Writing With Symbols and TextEase. Some software will go beyond the working document - for example TextHelp Read & Write also reads aloud PDF documents and web pages.

The Voices

Most people are familiar with the robotic computer voices of the 1980s and 90s. More recently the quality of voices has been improving so that they sound more like a real person's voice and are easier to understand.

The companies that make software incorporating text-to-speech usually use synthesised voices made by other specialist companies.

For example, Clicker 5 uses Elan Sayso Speech and TextHelp Read & Write uses the Nuance RealSpeak Solo voices. AT&T make the popular Natural Voices, which are used in a range of software with text-to-speech such as ReadPlease Plus and TextAloud.

I have recently discovered a new set of voices from Edinburgh-based CereProc. They produce a wide range of voices for both the Windows and Mac platforms and cover a range of accents including Scottish, West Midlands and Southern English. You can listen to demonstrations of all their voices online (see below) and download them from their online store. The Scottish Heather voice is particularly noteworthy and I am using it myself to proofread important documents.


All three companies mentioned above have online demonstrations of their voices, as does a new manufacturer I found when researching this article:

Microsoft Anna

Microsoft Anna is the new voice included with Windows Vista.

Anna replaces the robotic Microsoft Sam with a smoother, more realistic voice (although, somewhat ironically, she struggles to pronounce her own name). She is included free with Windows Vista and should work in any program that uses SAPI speech, including Clicker 4 or 5, Writing With Symbols and free text-to-speech programs such as Windows Narrator, ReadPlease and NaturalReader. Unfortunately Microsoft has no plans to make Anna available to users of Windows XP or any operating system other than Vista.

Older Voices

Some of the older, robotic voices are available for free. They are usually included with cheaper or free software that incorporates text-to-speech. Microsoft provides downloads of its voices from its website:

What is SAPI?

The "Speech Application Programming Interface" (SAPI) is Microsoft's common platform linking voice manufacturers with the programmers who make software that includes the text-to-speech feature.

For the end user, this means that, in theory, you can use any SAPI-compliant voice with any SAPI-compliant program.

For example, as mentioned above, Clicker 5 is shipped with the Elan Sayso voices. Because these voices are SAPI 5 compliant, they should work with any software that supports SAPI 5 voices, which includes pretty much all software with text-to-speech features. Notice I wrote that they should work: most people have found that multiple voices do not always sit comfortably alongside each other on a single computer.

The SAPI platform also means that any new voices you purchase or download become immediately available to any SAPI-compliant text-to-speech software you have installed.
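For the more technically minded, here is a rough sketch of what "SAPI-compliant" looks like from a programmer's side: the program asks Windows which SAPI 5 voices are installed and then speaks with whichever one matches a name. It assumes a Windows computer with the third-party pywin32 package installed. The function names choose_voice, list_sapi5_voices and speak_with are my own illustrative choices and are not part of SAPI.

```python
def choose_voice(installed, wanted):
    """Return the first installed voice description that contains
    `wanted` (case-insensitive), or None if nothing matches."""
    wanted = wanted.lower()
    for description in installed:
        if wanted in description.lower():
            return description
    return None


def list_sapi5_voices():
    """Return the descriptions of the installed SAPI 5 voices.
    Assumption: Windows with the pywin32 package available."""
    import win32com.client  # imported here so the rest works without it

    speaker = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = speaker.GetVoices()
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]


def speak_with(text, wanted):
    """Speak `text` using the first installed voice matching `wanted`,
    falling back to the default voice if there is no match.
    Assumption: Windows with the pywin32 package available."""
    import win32com.client

    speaker = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = speaker.GetVoices()
    # Map each voice description to its SAPI token so it can be selected.
    by_name = {}
    for i in range(tokens.Count):
        token = tokens.Item(i)
        by_name[token.GetDescription()] = token
    chosen = choose_voice(list(by_name), wanted)
    if chosen is not None:
        speaker.Voice = by_name[chosen]
    speaker.Speak(text)
```

On a Vista machine, calling speak_with("Hello", "Anna") would be expected to use Microsoft Anna if she is installed. Because any newly installed SAPI 5 voice appears in the same list, the program itself should not need to change.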

Sharing voices between programs does work sometimes. The Hal (or Supernova) and JAWS screenreaders support the SAPI 5 voices. They also allow different voices to be used for different elements of the program. This means that you can use the fast screenreader voice for navigating menus and dialogs while using a much more friendly voice for reading documents and emails.
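The idea of one voice per element can be pictured as a simple lookup table. The sketch below is only an illustration: the element names and voice names are made up, and it does not reflect how Hal, Supernova or JAWS actually store their settings.

```python
# Hypothetical voice names, standing in for whatever SAPI 5 voices
# happen to be installed on the computer.
FAST_VOICE = "Fast Robotic Voice"
FRIENDLY_VOICE = "Friendly Natural Voice"

# Which voice to use for each kind of content (illustrative names only).
VOICE_FOR_ELEMENT = {
    "menu": FAST_VOICE,
    "dialog": FAST_VOICE,
    "document": FRIENDLY_VOICE,
    "email": FRIENDLY_VOICE,
}


def voice_for(element):
    """Return the voice for an element, defaulting to the fast voice
    for anything that has not been given its own setting."""
    return VOICE_FOR_ELEMENT.get(element, FAST_VOICE)
```

Defaulting to the fast voice is a design choice made for this sketch, on the assumption that most unlisted elements are navigation items rather than long passages of text.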

SAPI 5 is the successor to SAPI 4 and allows more realistic voices to be constructed. More information, albeit rather technical, can be found at External Link.

Computer Programs with Text-to-speech

Free Programs

Paid-for Programs