Microsoft Text-to-Speech in Python (pyTTS)
How to speak text to a user, with proper pronunciation and event handling
by Peter Parente
Abstract
The pyTTS module wraps the Microsoft Speech API (SAPI) for use in Python. It relies on the
win32com library for obtaining and communicating with the SAPI COM interfaces. The pyTTS
module currently provides support for the text-to-speech services provided by SAPI.
In this tutorial, I will present examples for performing a variety of text-to-speech tasks
including speaking simple text, changing voice parameters, speaking to or from a WAV file,
correcting pronunciation, and handling speech events.
Prerequisites
You will need to download and install the following components to use the pyTTS module
with your Python installation:
Simple Speech
Producing simple speech is very easy. The following example demonstrates how to speak
a string of words.
from pyTTS import pyTTS

tts = pyTTS()
tts.Speak('This is the sound of my voice.')
Not much else can be said about this example. It speaks for itself. (Hyuck hyuck.)
Properties
Various voice properties can be modified including rate of speech, speech volume, and
voice. The next example shows how these properties can be changed.
from pyTTS import pyTTS
tts = pyTTS()

#set the speech rate between -10 (slowest) and 10 (fastest), 0 is the default
#use the function call directly
tts.SetRate(4)

#set the speech volume percentage (0-100%)
#this time use a managed property
tts.Volume = 40

#get a list of all the available voice actors
print tts.GetVoiceNames()

#explicitly set a voice
tts.SetVoiceByName('MSMary')

#speak the text
tts.Speak('This is the sound of my voice.')
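The example above gives the valid ranges for rate (-10 to 10) and volume (0 to 100). If those values come from user input, it may be worth clamping them before handing them to pyTTS. The following is a minimal sketch; the helper names safe_rate and safe_volume are my own and are not part of pyTTS, and I have not checked how pyTTS itself reacts to out-of-range values.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))

def safe_rate(rate):
    # hypothetical helper: speech rate runs from -10 (slowest) to 10 (fastest)
    return clamp(int(rate), -10, 10)

def safe_volume(volume):
    # hypothetical helper: volume is a percentage from 0 to 100
    return clamp(int(volume), 0, 100)

print(safe_rate(25))      # 10
print(safe_volume(40))    # 40
```

The results could then be passed along, for example as tts.SetRate(safe_rate(user_rate)).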
Properties that affect the speech stream are also exposed by pyTTS. Flags can be passed
to the speech function that change how the text string is interpreted, how speech is handled,
and when speech will begin. The following example demonstrates some of these flags.
from pyTTS import pyTTS
from pyTTS import tts_async, tts_purge_before_speak
from time import sleep

tts = pyTTS()

#the tts_async flag causes the program to continue immediately
#after starting speech; the speak function will not block
tts.Speak('The rain in Spain falls mainly on the plain.', tts_async)

#wait for one second
sleep(1)

#now begin speaking the next stream, purging the remainder of the first stream
tts.Speak('This is the sound of my melodious voice!', tts_purge_before_speak)
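SAPI itself takes its speak flags as a single bitmask, so a wrapper that accepts several flag arguments presumably combines them before making the call. The sketch below shows one way that combination could work. The numeric values are those of SAPI's SpeechVoiceSpeakFlags enumeration; the helper name combine_flags, and the idea that pyTTS maps its constants to these values and merges them this way, are assumptions on my part.

```python
from functools import reduce
from operator import or_

# Values from SAPI's SpeechVoiceSpeakFlags enumeration.
# The names mirror the pyTTS constants used in this tutorial; that pyTTS
# assigns them exactly these values is an assumption.
tts_async = 1
tts_purge_before_speak = 2
tts_is_xml = 8

def combine_flags(*flags):
    """Hypothetical helper: OR any number of flags into one bitmask."""
    return reduce(or_, flags, 0)

mask = combine_flags(tts_is_xml, tts_async)
print(mask)                                  # 9
print(bool(mask & tts_async))                # True
print(bool(mask & tts_purge_before_speak))   # False
```

Because each flag occupies its own bit, the order in which the flags are passed does not matter.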
WAV Files
The pyTTS class can also speak to a file in WAV format on disk. The file can be played back
later by the pyTTS class or any other audio playback software. The following example shows
how to write and play back WAV files.
from pyTTS import pyTTS

tts = pyTTS()

#write to a wave file
tts.SpeakToWave('spain.wav', 'The rain in Spain falls mainly on the plain.')

#speak from the wave file
tts.SpeakFromWave('spain.wav')
It is important to note that when any of the XML tags described below are inserted into the
text spoken to a WAV file, they are applied when the WAV file is played back by pyTTS.
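Since the output is an ordinary WAV file, other tools can read it too. As a rough illustration, the sketch below uses Python's standard wave module to report the channel count, sample rate, and duration of a file. It assumes the file is uncompressed PCM, which is all the wave module handles. The helper name wav_info is my own, and the demo writes a second of silence as a stand-in for a file produced by SpeakToWave.

```python
import os
import tempfile
import wave

def wav_info(path):
    """Return (channels, sample rate in Hz, duration in seconds) of a PCM WAV file."""
    reader = wave.open(path, 'rb')
    try:
        channels = reader.getnchannels()
        rate = reader.getframerate()
        frames = reader.getnframes()
    finally:
        reader.close()
    return channels, rate, frames / float(rate)

# Stand-in for a file written by SpeakToWave: one second of 16-bit mono silence.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
writer = wave.open(demo_path, 'wb')
writer.setnchannels(1)
writer.setsampwidth(2)
writer.setframerate(22050)
writer.writeframes(b'\x00\x00' * 22050)
writer.close()

print(wav_info(demo_path))   # (1, 22050, 1.0)
```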
Pronunciation
The Microsoft speech engine has difficulty pronouncing some words, especially when the
pronunciation depends on the semantic context. There are two ways of correcting pronunciation:
- Misspelling. Simply misspell the word that is mispronounced until it sounds right when
spoken by pyTTS.
- Phonetic spelling. Insert an XML pronunciation tag into the text that spells a word phonetically.
Both of these approaches are shown in the example below.
from pyTTS import pyTTS
from pyTTS import tts_is_xml

tts = pyTTS()

#MSSam mispronounces the word 'sonified'
tts.Speak('Sonified.')

#we can fix it by misspelling
tts.Speak('Sahnified.')

#or we can use XML to tell the speech engine to use the provided phonetics
tts.Speak('<pron sym="s aa n ih f ay d" />.', tts_is_xml)
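Typing pronunciation tags by hand gets tedious and makes it easy to leave a quote unbalanced. A small helper can build the tag instead. This is only a sketch: pron_tag is a hypothetical function of my own, not something pyTTS provides, and it does not check that the phoneme symbols are valid.

```python
from xml.sax.saxutils import quoteattr

def pron_tag(phonemes):
    """Hypothetical helper: wrap a phoneme string in an empty <pron> tag."""
    # quoteattr adds the surrounding quotes and escapes special characters
    return '<pron sym=%s />' % quoteattr(phonemes)

text = 'The data was ' + pron_tag('s aa n ih f ay d') + '.'
print(text)
# The data was <pron sym="s aa n ih f ay d" />.
```

The resulting string would then be spoken with the tts_is_xml flag, just like the hand-written tag in the example above.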
The pyTTS module includes a class that makes pronunciation correction easier. The class
stores pairs of mispronounced words and their pronunciation corrections to disk. An example of
how to use the class is shown below.
from pyTTS import pyTTS, tts_is_xml, Pronounce

tts = pyTTS()

#create an instance of the pronunciation corrector
p = Pronounce()

#add an entry for the phonetic pronunciation of the abbreviation Alt
p.AddPhonetic('Alt', 'ao l t 1')

#add an entry for the purposeful misspelling of the abbreviation Control
p.AddMisspelled('Ctrl', 'Control')

#now quickly correct a sentence using pronunciations in the dictionary
text = p.Correct('The alt key is fun, but the Ctrl key is cooler!.')

#print the text to see what it looks like and then speak it
print text
tts.Speak(text, tts_is_xml)

#the pronunciation dictionary can be saved to disk too
p.Save('my.dict')
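To give a feel for what a corrector like this does to the text, here is a rough, self-contained stand-in. It is not the pyTTS Pronounce class and makes no claim to match its implementation. The class name PronunciationTable and its methods are hypothetical, and matching whole words without regard to case is my assumption about sensible behavior.

```python
import re

class PronunciationTable(object):
    """Hypothetical stand-in for a pronunciation corrector (not pyTTS.Pronounce)."""

    def __init__(self):
        self.entries = {}  # lowercase word -> replacement text

    def add_phonetic(self, word, phonemes):
        # store the replacement as an XML pronunciation tag
        self.entries[word.lower()] = '<pron sym="%s" />' % phonemes

    def add_misspelled(self, word, respelling):
        # store the replacement as plain text
        self.entries[word.lower()] = respelling

    def correct(self, text):
        if not self.entries:
            return text
        # match whole words only, ignoring case; try longer words first
        words = sorted(self.entries, key=len, reverse=True)
        pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, words)),
                             re.IGNORECASE)
        return pattern.sub(lambda m: self.entries[m.group(0).lower()], text)

table = PronunciationTable()
table.add_phonetic('Alt', 'ao l t 1')
table.add_misspelled('Ctrl', 'Control')
print(table.correct('The alt key is fun, but the Ctrl key is cooler!'))
# The <pron sym="ao l t 1" /> key is fun, but the Control key is cooler!
```

Because phonetic entries expand to XML tags, corrected text of this kind has to be spoken with the tts_is_xml flag.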
The pyTTS installation also includes a module that has the start of a wxPython GUI that
makes creating pronunciation dictionaries easy. The GUI panel can be integrated into other
apps to support pronunciation creation and use. A simple application that uses this GUI can
be seen by running the PronuncationEditor.py in your Python/libs/site-packages folder.
Events
The Microsoft speech engine also supports speech event callbacks. The pyTTS module allows you to
register callback functions that will be notified when speech events occur. Some of these
events include end of sentence signals, end of stream signals, and bookmark signals. The
available events are shown in the list below.
This final example demonstrates how to register callback functions for some simple signals.
It requires wxPython or some other library with an event processing loop in order to run
properly.
from pyTTS import *
import wx
import time

#create a wxPython frame with a single button in it
class myFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, -1, 'My Frame', size = wx.Size(120, 80))

        #create the button
        id = wx.NewId()
        wx.Button(self, id, 'Press me')

        #create the TTS object and assign the callback functions
        self.tts = pyTTS()
        self.tts.OnBookmark = self.OnBookmark
        self.tts.OnWord = self.OnWordSentence
        self.tts.OnSentence = self.OnWordSentence

        wx.EVT_BUTTON(self, id, self.OnSpeak)

    #when the button is pressed, speak the current time with a bookmark in it
    def OnSpeak(self, event):
        self.tts.Speak('The current time is <bookmark mark="begin time" />'+
                       time.asctime()+'. End of line.', tts_is_xml, tts_async)

    #print when the bookmark is encountered
    def OnBookmark(self, event):
        print event.Name, event.Bookmark

    #print whenever a word or sentence boundary is encountered
    def OnWordSentence(self, event):
        print event.Name, event.CharacterPosition

if __name__ == '__main__':
    app = wx.PySimpleApp(0)
    frame = myFrame()
    app.SetTopWindow(frame)
    frame.Show()
    app.MainLoop()
Bookmark events could be particularly useful for building interactive speech interfaces.
With bookmarks, the code can detect when a key word in a sentence is being spoken, and respond
to user input appropriately. For instance, the speech engine could read a list of menu items
with bookmarks before their text. The program could then monitor what item is being read when
a user presses a key, and execute that menu action.
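The menu idea can be sketched without a speech engine at all. The code below builds the XML text for a spoken menu, with a numbered bookmark before each item, and keeps track of which item was reached most recently. The names build_menu_xml and MenuTracker are hypothetical. In a real program, the OnBookmark callback would pass the bookmark text to on_bookmark; that event.Bookmark carries the mark text is my assumption, based on the example above.

```python
from xml.sax.saxutils import escape

def build_menu_xml(items):
    """Hypothetical helper: put a numbered bookmark before each menu item."""
    parts = []
    for index, label in enumerate(items):
        # escape the label so characters such as & do not break the XML
        parts.append('<bookmark mark="%d" />%s.' % (index, escape(label)))
    return ' '.join(parts)

class MenuTracker(object):
    """Remembers which menu item is currently being read."""

    def __init__(self, items):
        self.items = list(items)
        self.current = None  # nothing has been read yet

    def on_bookmark(self, mark):
        # called with the bookmark's mark text when the engine reaches it
        self.current = int(mark)

    def on_key(self):
        # return the item being read when the user pressed a key
        if self.current is None:
            return None
        return self.items[self.current]

items = ['Open', 'Save & close', 'Quit']
print(build_menu_xml(items))
# <bookmark mark="0" />Open. <bookmark mark="1" />Save &amp; close. <bookmark mark="2" />Quit.

tracker = MenuTracker(items)
tracker.on_bookmark('1')   # simulate the engine reaching the second bookmark
print(tracker.on_key())    # Save & close
```

The XML string would be spoken with the tts_is_xml and tts_async flags so that the program stays free to respond to key presses while the menu is being read.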