Pyttsx3 Library in Python: A Powerful Text-to-Speech Conversion Tool

Pyttsx3 is a Python library for Text-to-Speech (TTS) conversion. It is cross-platform, works offline, and provides a simple way to generate speech using the voices installed on the system, which typically include male and female voices in a range of languages. Rather than synthesizing audio itself, Pyttsx3 drives the speech engine available on each platform: SAPI5 on Microsoft Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux.

Pyttsx3 allows developers to create a wide range of applications that can generate speech, including interactive voice assistants, automated customer service systems, and audiobook readers. The library provides a range of features, including:

  • A simple, intuitive API for generating speech from text
  • Support for a range of languages and voices
  • Customizable voice parameters, such as speed and volume
  • Support for interrupting speech that is in progress
  • Support for synchronous and asynchronous speech generation
  • Compatibility with Python 3 (older releases also supported Python 2)

Step-by-Step Guide: Installing and Utilizing Pyttsx3 Library in Python for Text-to-Speech Conversion

To install Pyttsx3, you can use the pip package manager by running the following command in your terminal or command prompt:

pip install pyttsx3                          

This will download and install the latest version of Pyttsx3 and its dependencies.

Once you have installed Pyttsx3, you can use it to generate speech from text using the following steps:

  1. Import the pyttsx3 library:

    import pyttsx3

  2. Create a new TTS engine using the init() function:

    engine = pyttsx3.init()

  3. Set the properties of the voice, such as the speed and voice type, using the setProperty() method:

    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('rate', 150)

    Note that the voices variable contains a list of the voices installed on your system, and you can use the id property of an entry to select the voice you want to use. Index 1 only works if at least two voices are installed.

  4. Queue the text to be spoken using the say() method:

    engine.say("Hello, world!")

  5. Speak the queued text and wait for the speech to complete using the runAndWait() method:

    engine.runAndWait()


Here is the complete code:

import pyttsx3

engine = pyttsx3.init()

voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
engine.setProperty('rate', 150)

engine.say("Hello, world!")
engine.runAndWait()                          

This code will generate speech saying "Hello, world!" in the voice and at the speed that you have specified. You can change the text and voice properties to generate speech in different languages and with different voices.
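
The example above assumes that a second voice exists at index 1, which is not guaranteed on every system. A minimal sketch of a more defensive setup follows. The helper name apply_voice_settings and its fallback behaviour are assumptions made for illustration and are not part of pyttsx3. It falls back to the first voice when the requested index is missing and clamps the volume to the 0.0 to 1.0 range that pyttsx3 uses.

```python
def apply_voice_settings(engine, voice_index=0, rate=150, volume=1.0):
    """Configure voice, rate and volume on a pyttsx3-style engine.

    Returns the voice index actually used (None if no voices are installed).
    This is a hypothetical helper, not a pyttsx3 function.
    """
    voices = engine.getProperty('voices')
    used_index = None
    if voices:
        # Fall back to the first voice when the requested index does not exist.
        used_index = voice_index if 0 <= voice_index < len(voices) else 0
        engine.setProperty('voice', voices[used_index].id)
    # Rate is expressed in words per minute.
    engine.setProperty('rate', int(rate))
    # Volume is a float between 0.0 (silent) and 1.0 (full volume).
    engine.setProperty('volume', max(0.0, min(1.0, float(volume))))
    return used_index


def demo():
    # Imported here so the helper above can be reused without an audio backend.
    import pyttsx3

    engine = pyttsx3.init()
    apply_voice_settings(engine, voice_index=1, rate=150, volume=0.8)
    engine.say("Hello, world!")
    engine.runAndWait()
```

Calling demo() should speak the greeting with the second voice if one is installed, and with the first voice otherwise.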

Python Tkinter and Pyttsx3: Building an Advanced Text-to-Speech Converter App Tutorial

This app is an advanced text-to-speech (TTS) converter that allows users to convert any written text into spoken words.

The user interface of the app is built using the Tkinter module in Python. The app has a text box where users can enter any text they want to convert to speech. Once the user enters the text, they can either convert and play the text as speech, or save it as an audio file.

The app allows users to change the speed and voice of the speech output. Users can select either a male or a female voice with the radio buttons and adjust the speed of the speech output with the slider provided in the user interface.

The app uses the pyttsx3 library to generate the speech output from the entered text. The app initializes the pyttsx3 library and sets the default voice and speed. Once the user inputs the text and selects the desired voice and speed, the app uses the pyttsx3 library to generate the speech output and play it through the speakers or save it as an audio file.

Here's a breakdown of what the code is doing:

This section imports the necessary libraries for the application. Tkinter is the standard GUI (graphical user interface) library for Python, and ScrolledText is a widget that provides a scrolling text area. Pyttsx3 is the text-to-speech conversion library used in this application, and threading is used to run speech playback without blocking the interface.

import tkinter as tk
from tkinter import ttk, filedialog, messagebox
from tkinter.scrolledtext import ScrolledText
import pyttsx3 
import threading

This section defines a function named convertAndPlay(). The function uses the Pyttsx3 library to convert the text entered in the text box to speech and plays it. The voices and speed are set according to the user's choice using the voice_var and speed_scale variables.

def convertAndPlay():
    voices = bot.getProperty('voices')
    # Select the voice chosen with the radio buttons
    bot.setProperty('voice', voices[voice_var.get()].id)
    # Set the speed (words per minute) from the slider
    bot.setProperty('rate', speed_scale.get())
    # '1.0' is the first character of a Tkinter text widget
    text = text_box.get('1.0', tk.END)
    if text.strip():
        bot.say(text)
        bot.runAndWait()
    else:
        messagebox.showwarning('Warning', 'First enter some text to convert into audio')

This section defines a function named saveAudio(). This function saves the text entered in the text box as an audio file. The voice and speed are set according to the user's choice using the voice_var and speed_scale variables.

def saveAudio():
    bot = pyttsx3.init()
    voices = bot.getProperty('voices')
    # Select the voice chosen with the radio buttons
    bot.setProperty('voice', voices[voice_var.get()].id)
    # Set the speed (words per minute) from the slider
    bot.setProperty('rate', speed_scale.get())
    # '1.0' is the first character of a Tkinter text widget
    text = text_box.get('1.0', tk.END)
    if text.strip():
        filename = filedialog.asksaveasfilename(defaultextension=".wav")
        # An empty filename means the user cancelled the dialog
        if filename:
            bot.save_to_file(text, filename)
            bot.runAndWait()
            messagebox.showinfo('Successful', 'Audio is saved')
    else:
        messagebox.showwarning('Warning', 'First enter some text to convert into audio')
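
Both functions repeat the same empty-input check. One possible way to share it is sketched below. read_text is a hypothetical helper and is not part of the original app. It uses the index "1.0" (line 1, character 0) and "end-1c", which leaves out the trailing newline that a Tkinter text widget always appends.

```python
def read_text(widget):
    """Return the widget's text without surrounding whitespace, or None if it is empty.

    `widget` is any object with a Tkinter-style get(start, end) method,
    such as the ScrolledText used in this app.
    """
    # "end-1c" means "one character before the end", i.e. without Tk's final newline.
    text = widget.get("1.0", "end-1c").strip()
    return text or None
```

With such a helper, convertAndPlay() and saveAudio() could each start with text = read_text(text_box) and show the warning when the result is None.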

This section defines a function named stopEngine(). This function is not yet fully implemented and is not connected to any button; it is intended to stop the speech engine if it is currently speaking.

# Still working on this function...
def stopEngine():
    if bot.isBusy():
        print(bot.isBusy())
        bot.stop()

This section creates the main window of the application. It sets the window title, size, and background color. It also makes the window non-resizable.

root = tk.Tk()
root.title("Text To Speech Convertor App")
root.geometry("600x280")
root.configure(bg='#999')
root.resizable(0,0)

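This section initializes the Pyttsx3 library and creates an instance of the Engine class, which will be used to convert text to speech. It also creates voice_var, a Tkinter IntVar that stores the index of the voice selected with the radio buttons.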

bot = pyttsx3.init()

voice_var = tk.IntVar()

This section creates a ScrolledText widget using the ScrolledText class from the tkinter.scrolledtext module. This widget is placed in the main window (root) at position (x=5, y=5) with a height of 270 pixels and a width of 390 pixels. The widget is also configured with a font size of 11, a border width of 2, a relief style of GROOVE, and undo functionality.

text_box = ScrolledText(root,font=("Sitka Small",11),bd=2,relief=tk.GROOVE,wrap=tk.WORD,undo=True)
text_box.place(x=5,y=5,height=270,width=390)

This section creates a Frame widget using the Frame class from tkinter. This widget is also placed in the main window (root) at position (x=395, y=5) with a height of 270 pixels and a width of 200 pixels. This frame will contain the controls for changing the voice and speed of the speech engine.

frame = tk.Frame(root,bd=2,relief=tk.SUNKEN)
frame.place(x=395,y=5,height=270,width=200)

frame2 = ttk.Labelframe(frame,text='Change Speed')
frame2.grid(row=0,column=0,pady=5,padx=4)

The ttk.Labelframe created above (frame2) is placed inside the frame and labeled "Change Speed". A Scale widget is created inside this label frame, which allows the user to adjust the speed of the speech engine. The scale ranges from 100 to 300 words per minute, with a default value of 200.

speed_scale = tk.Scale(frame2,from_=100,to=300,orient=tk.HORIZONTAL,length=170,bg='#ffffff')
speed_scale.set(200) # 200 is default
speed_scale.grid(row=2,columnspan=1,ipadx=5,ipady=5)

This section creates another LabelFrame widget using ttk.Labelframe, labeled "Change Voice". This widget is placed below the "Change Speed" label frame inside the frame widget. This label frame will contain radio buttons that allow the user to select a voice for the speech engine.

frame3 = ttk.Labelframe(frame,text='Change Voice')
frame3.grid(row=1,column=0,pady=5)

The code creates two radio buttons for the user to select the voice type. The tk.Radiobutton widget creates a radio button that can be selected by the user. The text option sets the text that appears next to the radio button, and the variable option ties the radio button to the voice_var variable, which stores the selected value (either 0 or 1). The value option sets the value of the radio button to 0 or 1, which is used as an index into the engine's list of voices. The "Male" and "Female" labels assume that the first two installed voices are male and female respectively, which depends on the system. The grid method is used to position the radio buttons within frame3.

R1 = tk.Radiobutton(frame3, text="Male", variable=voice_var, value=0)
R1.grid(row=0,column=0,ipadx=7,ipady=5,padx=5)

R2 = tk.Radiobutton(frame3, text="Female", variable=voice_var, value=1)
R2.grid(row=0,column=1,ipadx=7,ipady=5,padx=5)

The code creates a new frame (frame4) within frame using the tk.Frame widget. This frame will be used to hold the four buttons that allow the user to interact with the program.

frame4 = tk.Frame(frame,bd=2,relief=tk.SUNKEN)
frame4.grid(row=2,column=0,pady=10)

The code creates a button (btn_1) that says "Convert & Play" and has a width of 15 characters. The command option is used to set a function to be called when the button is clicked. The lambda function is used to create an anonymous function that calls the convertAndPlay function in a new thread, so that the window stays responsive while the text is being spoken. The daemon option is set to True to ensure that the thread terminates when the program exits. The grid method is used to position the button within frame4.

btn_1 = ttk.Button(frame4,text='Convert & Play',width=15,
                   command=lambda: threading.Thread(target=convertAndPlay, daemon=True).start())
btn_1.grid(row=0,column=0,ipady=5,padx=4,pady=5)
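
Starting a new thread on every click means that two quick clicks can run convertAndPlay() at the same time on the same engine, which may fail because the engine is already speaking. One possible alternative, sketched here under that assumption, is a single background worker that processes speech jobs one after another. The class name SpeechWorker and the speak callable are hypothetical and are not part of the original app or of pyttsx3.

```python
import queue
import threading


class SpeechWorker:
    """Runs speech jobs one at a time on a single background thread."""

    def __init__(self, speak):
        # speak is a callable taking (text, voice_index, rate)
        self._speak = speak
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, text, voice_index, rate):
        """Queue a job; returns immediately so the GUI stays responsive."""
        self._jobs.put((text, voice_index, rate))

    def wait(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                self._speak(*job)
            except Exception as exc:
                # Keep the worker alive if a single job fails.
                print('Speech job failed:', exc)
            finally:
                self._jobs.task_done()
```

In the app, the speak callable would set the 'voice' and 'rate' properties on bot and then call bot.say(text) and bot.runAndWait(). The button command would read text_box, voice_var and speed_scale on the main thread and pass the values to worker.submit(), so that no Tkinter widget is accessed from the background thread.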

The code creates a button (btn_2) that says "Save as Audio" and has a width of 15 characters. The command option is used to set the saveAudio function to be called when the button is clicked. The grid method is used to position the button within frame4.

btn_2 = ttk.Button(frame4,text='Save as Audio',width=15,command=saveAudio)
btn_2.grid(row=1,column=0,ipady=5,padx=4,pady=5)

The code creates a button (btn_3) that says "Clear" and has a width of 10 characters. The command option is used to set an anonymous lambda function to be called when the button is clicked. This function deletes all the text in the text_box. The grid method is used to position the button within frame4.

btn_3 = ttk.Button(frame4,text='Clear',width=10,command=lambda:text_box.delete('1.0',tk.END))
btn_3.grid(row=0,column=1,ipady=5,padx=4,pady=5)

The code creates a button (btn_4) that says "Exit" and has a width of 10 characters. The command option is used to set the root.quit function to be called when the button is clicked. This function stops the main event loop, after which the script reaches its end and the program terminates. The grid method is used to position the button within frame4.

btn_4 = ttk.Button(frame4,text='Exit',width=10,command=root.quit)
btn_4.grid(row=1,column=1,ipady=5,padx=4,pady=5)

root.mainloop() is a method that runs the main event loop of the Tkinter window. It listens for events such as button clicks, keypresses, and mouse movements and responds to them accordingly. This method ensures that the window remains open and responsive to user interactions until the user decides to close it. In other words, it keeps the GUI application running and updating its contents.

root.mainloop()

Overall, this app is a simple and user-friendly text-to-speech converter that allows users to quickly and easily generate speech output from any written text.

Building Your Own Python Alexa Assistant: A Step-by-Step Tutorial with SpeechRecognition, pyttsx3, pywhatkit, and pyjokes

This code implements a simple virtual assistant that can perform a few tasks based on the user's spoken commands. When the program is run, it initializes several modules for speech recognition, text-to-speech conversion, playing a song on YouTube, getting the current time, retrieving information from Wikipedia, and getting a random joke.

The program then defines a function that listens for the user's spoken command using the microphone, recognizes it using the Google Speech Recognition API, and returns the recognized command. Another function processes the command by checking for certain keywords (e.g., 'play', 'time', 'who is', etc.) and performs the appropriate task (e.g., playing a song on YouTube, getting the current time, retrieving information from Wikipedia, etc.). If the command is not recognized, the program asks the user to repeat the command.

Finally, the program enters an infinite loop that continuously listens for the user's spoken commands and executes them accordingly. The loop continues until the user says 'stop', at which point the program exits.

This section imports the necessary modules for the program to run. It imports the following:

  1. speech_recognition module: to recognize speech from the microphone
  2. pyttsx3 module: to convert text to speech
  3. pywhatkit module: to play a song on YouTube
  4. datetime module: to get the current time
  5. wikipedia module: to get information from Wikipedia
  6. pyjokes module: to get a random joke

import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia
import pyjokes

This section initializes some variables:

  1. listener: an instance of the speech_recognition module's Recognizer class that will recognize speech captured from the microphone
  2. engine: the text-to-speech engine returned by the pyttsx3 module's init() function
  3. voices: a list of available voices that the engine can use
  4. engine.setProperty('voice', voices[1].id): sets the engine's voice to the second available voice (often a female voice, depending on the voices installed on the system)

listener = sr.Recognizer()
engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

This section defines a function called talk() that takes in a text parameter and converts it to speech using the pyttsx3 engine.

def talk(text):
    engine.say(text)
    engine.runAndWait()

This section defines a function called take_command() that listens to the microphone for speech and recognizes it using the Google Speech Recognition API provided by the speech_recognition module. If the speech cannot be understood (sr.UnknownValueError) or the API cannot be reached (sr.RequestError), the function catches the exception and sets the command variable to an empty string. The function then returns the recognized command in lowercase (or an empty string if there was an error).

def take_command():
    try:
        with sr.Microphone() as source:
            print('listening...')
            voice = listener.listen(source)
            command = listener.recognize_google(voice)
            command = command.lower()
            print(command)
    except sr.UnknownValueError:
        print("Sorry, I didn't understand that.")
        command = ""
    except sr.RequestError:
        print('Sorry, my speech service is down.')
        command = ""
    return command

This section defines a function called run_alexa() that calls the take_command() function to get the user's spoken command. It then processes the command by checking if it contains certain keywords (e.g., 'play', 'time', 'who is', etc.) and executes the appropriate action using the pywhatkit, datetime, wikipedia, and pyjokes modules. If the command is not recognized, the function asks the user to repeat the command.

def run_alexa():
    command = take_command()
    if 'play' in command:
        song = command.replace('play', '')
        talk('playing ' + song)
        pywhatkit.playonyt(song)
    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')
        talk('Current time is ' + time)
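    elif 'who is' in command:
        person = command.replace('who is', '')
        try:
            info = wikipedia.summary(person, 1)
        except wikipedia.exceptions.WikipediaException:
            # e.g. no matching page or an ambiguous name
            info = 'Sorry, I could not find that on Wikipedia.'
        talk(info)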
    elif 'date' in command:
        talk('sorry, I have a headache')
    elif 'are you single' in command:
        talk('I am in a relationship with wifi')
    elif 'joke' in command:
        talk(pyjokes.get_joke())
    elif 'stop' in command:
        talk('Goodbye!')
        # raise SystemExit rather than calling exit(), which is meant for the interactive shell
        raise SystemExit
    else:
        talk('Please say the command again.')
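
The keyword matching in run_alexa() can also be separated from the actions, which makes it possible to check the matching logic without a microphone or speakers. The following is a minimal sketch of that idea. The function name parse_command and the intent labels are assumptions made for illustration. Unlike the original, which removes the keyword with replace(), this variant keeps only the text that follows the keyword.

```python
# Keywords are checked in this order, mirroring the if/elif chain in run_alexa().
KEYWORDS = [
    ('play', 'play'),
    ('time', 'time'),
    ('who is', 'who_is'),
    ('date', 'date'),
    ('are you single', 'single'),
    ('joke', 'joke'),
    ('stop', 'stop'),
]


def parse_command(command):
    """Return (intent, argument) for a lower-cased command string."""
    command = command.strip()
    for keyword, intent in KEYWORDS:
        if keyword in command:
            # Everything after the first occurrence of the keyword is the argument.
            argument = command.split(keyword, 1)[1].strip()
            return intent, argument
    return 'unknown', ''
```

For example, parse_command('play despacito') returns ('play', 'despacito'), and an empty command returns ('unknown', ''). run_alexa() could then call the matching action for each intent.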

This section first calls the talk() function to greet the user with a spoken message. It then enters an infinite loop that repeatedly calls the run_alexa() function to listen for the user's commands and execute them accordingly. The loop continues until the user says 'stop', in which case the program exits.

talk('Hello, how can I help you?')
while True:
    run_alexa()

Overall, this code implements a simple virtual assistant that can perform a limited number of tasks based on the user's spoken commands.

Python PDF to Audio Conversion: A Step-by-Step Tutorial with PYTTSX3 and PYPDF2 Libraries

This step-by-step tutorial shows how to use the PyPDF2 library to read a PDF file and extract the text from a specific page, and then how to convert the extracted text into speech with the pyttsx3 library. Converting PDF documents to audio can improve accessibility and offers another way of consuming textual information.

Here's what each section of the code does:

These lines import the pyttsx3 library, which provides the text-to-speech functionality, and the PyPDF2 library, which allows Python to read and manipulate PDF files.

import pyttsx3
import PyPDF2 

This line opens the PDF file 'python.PDF' and creates a file object named path (despite its name, the variable holds the open file, not a path string). The 'rb' argument specifies that the file should be opened in binary read mode, which is necessary for reading PDF files.

# path of the PDF file
path = open('python.PDF', 'rb')

This line creates a PdfReader object named pdfReader from the path file object. This object provides a way to read the contents of the PDF file. (Older PyPDF2 releases called this class PdfFileReader; that name, along with getPage() and extractText(), was removed in PyPDF2 3.0.)

# creating a pdf reader object
pdfReader = PyPDF2.PdfReader(path)

This line gets the 13th page of the PDF file (page indexes in PyPDF2 start at 0, not 1) and assigns it to the variable from_page.

# the page which you want to start to read
from_page = pdfReader.pages[12]

This line extracts the text content from the from_page object and assigns it to the variable text.

# extracting the text from the pdf file
text = from_page.extract_text()

This line initializes the pyttsx3 library and creates a speak object that can be used to generate speech from text.

speak = pyttsx3.init()

This line uses the say method of the speak object to generate speech from the text variable.

speak.say(text)

This line tells the speak object to generate speech and wait until it is finished before moving on to the next line of code. This ensures that the entire text is spoken before the program exits.

speak.runAndWait()
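
The walkthrough above reads a single page. A possible extension that reads a range of pages is sketched below, assuming a PyPDF2 release that provides PdfReader, pages and extract_text(). The function names extract_pages_text and read_pdf_aloud are hypothetical and chosen only for this example.

```python
def extract_pages_text(reader, first_page, last_page):
    """Join the text of pages first_page..last_page (0-based, inclusive).

    `reader` is a PdfReader-style object exposing a `pages` sequence whose
    items have an extract_text() method.
    """
    first_page = max(first_page, 0)
    # Do not read past the end of the document.
    last_page = min(last_page, len(reader.pages) - 1)
    parts = []
    for index in range(first_page, last_page + 1):
        # Pages without extractable text are treated as empty.
        page_text = reader.pages[index].extract_text() or ""
        parts.append(page_text.strip())
    # Skip empty pages so they do not produce blank lines.
    return "\n".join(part for part in parts if part)


def read_pdf_aloud(pdf_path, first_page, last_page):
    # Imported here so extract_pages_text() can be used without these packages.
    import pyttsx3
    from PyPDF2 import PdfReader

    # The with statement closes the file once the text has been extracted.
    with open(pdf_path, 'rb') as pdf_file:
        text = extract_pages_text(PdfReader(pdf_file), first_page, last_page)
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
```

For example, read_pdf_aloud('python.PDF', 12, 14) would read the 13th to 15th pages aloud, provided the document has that many pages.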

Overall, this code is useful when you would rather listen to a long PDF document than read it.
