This repository contains a Python-based desktop voice assistant that leverages speech recognition, text-to-speech, and various web APIs to perform tasks on your computer. The project integrates a graphical user interface (GUI) built with Tkinter, enabling users to interact with the assistant through voice commands and see real-time feedback on the screen.
- Introduction
- Features
- Architecture and Workflow
- Prerequisites
- Installation
- Usage
- Commands and Functionalities
- Troubleshooting
- Contributing
- License
The Desktop Voice Assistant is designed to provide a hands-free interface for everyday computer tasks. It listens to your commands, processes them using APIs like WolframAlpha and Wikipedia, and responds via voice output using pyttsx3. It can also run searches in a web browser, play media files, launch applications, and carry out system operations such as searching for files and shutting down the PC.
- Voice Interaction:
  Utilizes the speech_recognition library to capture and convert voice input into text.
- Text-to-Speech:
  Uses pyttsx3 (configured for Windows' SAPI5) to provide spoken responses.
- Web and API Integration:
  - Searches Google, YouTube, and Wikipedia.
  - Retrieves information using the WolframAlpha API for computational queries.
- Application and File Management:
  - Opens various installed applications (e.g., Visual Studio Code, Microsoft Office).
  - Searches for and opens files on the computer.
- Media Playback:
  Plays random music or video files from specified directories.
- Graphical User Interface (GUI):
  Built with Tkinter to display conversation logs, command lists, and status messages.
- Multi-threading:
  Utilizes Python's threading module to ensure the GUI remains responsive while processing voice commands.
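The multi-threading point can be illustrated with a minimal sketch. It assumes the slow work runs in a worker thread that hands its result back through a queue, which the Tkinter main loop polls. The names handle_command, run_in_background, poll_results, and main are hypothetical and are not taken from VoiceAI.py.

```python
import queue
import threading


def handle_command(text):
    """Stand-in for slow work such as speech recognition or an API call."""
    return f"You said: {text}"


def run_in_background(text, results):
    """Process one command in a worker thread and queue the reply."""
    def worker():
        results.put(handle_command(text))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(root, results, label_var):
    """Runs on the GUI thread: copy finished replies into the label."""
    try:
        while True:
            label_var.set(results.get_nowait())
    except queue.Empty:
        pass
    # Check again in 100 ms without blocking the Tkinter event loop.
    root.after(100, poll_results, root, results, label_var)


def main():
    """Small demo window; call main() on a machine with a display."""
    import tkinter as tk

    root = tk.Tk()
    status = tk.StringVar(value="Idle")
    results = queue.Queue()
    tk.Label(root, textvariable=status).pack()
    tk.Button(
        root,
        text="Start Speaking!",
        command=lambda: run_in_background("demo command", results),
    ).pack()
    poll_results(root, results, status)
    root.mainloop()
```

Only the GUI thread touches Tkinter objects here; the worker communicates through the queue, which is one common way to keep the window responsive.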
- Initialization:
  - The assistant initializes the Tkinter GUI, sets up the text variables to display user and assistant messages, and configures the voice engine.
  - It also loads the list of available commands into a separate window for reference.
- Voice Input and Processing:
  - When activated, the assistant listens for a command using a microphone.
  - The audio input is processed by the speech_recognition library and converted to text.
- Command Parsing and Execution:
  - The text command is parsed to determine the required action (e.g., searching Google, opening an application, retrieving information from Wikipedia).
  - Based on the command, appropriate functions are triggered, such as opening a URL in a web browser, launching an application via OS commands, or performing calculations using WolframAlpha.
- Output and Feedback:
  - The assistant speaks the response using the text-to-speech engine.
  - Simultaneously, messages are updated on the GUI so the user can see what the assistant is doing.
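As a rough sketch of the command parsing step, the function below maps recognized text to an action name and an argument. The action names, the two lookup tables, and the WolframAlpha fallback label are assumptions made for illustration; the actual parsing in VoiceAI.py may be organized differently.

```python
# Phrases that must match exactly (hypothetical action names).
EXACT_COMMANDS = {
    "the time": "tell_time",
    "play music": "play_music",
    "play video": "play_video",
    "open a file": "open_file",
    "exit": "exit",
    "shutdown": "shutdown",
}

# Phrases followed by a keyword; checked in order.
PREFIX_COMMANDS = [
    ("search google ", "search_google"),
    ("search youtube ", "search_youtube"),
    ("wikipedia ", "wikipedia"),
    ("google map ", "google_map"),
    ("open ", "open_target"),
]


def parse_command(text):
    """Return (action, argument) for a recognized phrase."""
    # Normalize case and collapse repeated whitespace.
    command = " ".join(text.lower().split())
    # Exact phrases go first so "open a file" is not read as "open <target>".
    if command in EXACT_COMMANDS:
        return EXACT_COMMANDS[command], ""
    for prefix, action in PREFIX_COMMANDS:
        if command.startswith(prefix):
            return action, command[len(prefix):]
    # Anything else is treated as a general query.
    return "ask_wolframalpha", command
```

A dispatcher could then look up the returned action name in a dictionary of handler functions and pass the argument along.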
- Python 3.x: Ensure Python is installed on your system.
- Required Python Libraries:
  - pyttsx3
  - SpeechRecognition
  - wikipedia
  - wolframalpha
- System Requirements:
  The assistant is built for Windows, utilizing SAPI5 for text-to-speech and expecting Windows-specific file paths and application shortcuts.
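A quick way to confirm the libraries above are importable is sketched below. The helper name missing_packages and the REQUIRED table are hypothetical. Note that the SpeechRecognition package is imported as speech_recognition, and that its microphone support additionally relies on PyAudio, which the list above does not mention.

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED = {
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
    "wikipedia": "wikipedia",
    "wolframalpha": "wolframalpha",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def report():
    """Print a short summary; call report() from a Python prompt."""
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```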
- Clone the Repository:
    git clone <repository-url>
    cd <repository-directory>
- Install Dependencies:
  Install the required packages using pip:
    pip install pyttsx3 SpeechRecognition wikipedia wolframalpha
  If you face issues with any dependencies, consult the package documentation for installation instructions specific to your environment.
- Running the Assistant:
  Open a command prompt in the project directory and run:
    python VoiceAI.py
- Interacting with the Assistant:
  - The GUI window will display a welcome message and a list of available commands.
  - Click Start! to initialize the assistant.
  - Use Start Speaking! to begin issuing voice commands.
  - You can view the command list by clicking the Command List button.
The assistant supports a variety of voice commands, including but not limited to:
- Web Searches:
  - "search google <keyword>"
  - "search youtube <keyword>"
  - "wikipedia <keyword>"
- Opening Websites:
  - "open google", "open youtube", "open facebook", etc.
- Mapping:
  - "google map <location>" to search a location on Google Maps.
- Application Launching:
  - "open code" to launch Visual Studio Code.
  - "open word", "open excel", "open powerpoint", etc.
- Media Playback:
  - "play music" to play a random song.
  - "play video" to play a random video from specified directories.
- System Operations:
  - "the time" to query the current time.
  - "open a file" to search and open a file.
  - "exit" or "shutdown" for system operations.
- General Conversation:
  The assistant also responds to greetings and casual conversation, and uses the WolframAlpha API to process complex queries.
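To show how the search and mapping commands could reach the browser, here is a minimal sketch using only the standard library. The URL templates follow the public search URL formats of each site, but the table and the function names build_url and open_search are assumptions; the project may construct its URLs differently.

```python
import webbrowser
from urllib.parse import quote_plus

# Assumed URL templates; "{}" receives the URL-encoded keyword.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={}",
    "youtube": "https://www.youtube.com/results?search_query={}",
    "map": "https://www.google.com/maps/search/?api=1&query={}",
}


def build_url(site, keyword):
    """Return the search URL for a site key and a spoken keyword."""
    # quote_plus encodes spaces as "+" and escapes unsafe characters.
    return SEARCH_URLS[site].format(quote_plus(keyword))


def open_search(site, keyword):
    """Open the search in the default browser; returns True on success."""
    return webbrowser.open(build_url(site, keyword))
```

For example, build_url("google", "python threading") yields https://www.google.com/search?q=python+threading.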
- Voice Recognition Issues:
  Ensure your microphone is properly configured and that you have a stable internet connection for API calls.
- Application Launch Errors:
  Verify the file paths in the code match the installed locations of your applications.
- Dependency Problems:
  Reinstall or update any Python libraries if you encounter errors during runtime.
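When diagnosing voice recognition issues, it can help to separate microphone problems from network problems. The sketch below assumes the assistant uses the SpeechRecognition library's Google Web Speech recognizer (recognize_google); the source does not say which recognizer is used, and the function name listen_once and the messages are hypothetical.

```python
def listen_once(timeout=5):
    """Capture one phrase and return (text, problem); one of them is None."""
    # Imported lazily so this file loads even if the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            # Sample background noise briefly to calibrate the energy threshold.
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            audio = recognizer.listen(source, timeout=timeout)
        return recognizer.recognize_google(audio), None
    except sr.WaitTimeoutError:
        return None, "No speech detected: check the microphone and input level."
    except sr.UnknownValueError:
        return None, "Speech was not understood: try speaking more clearly."
    except sr.RequestError as error:
        return None, f"Recognition service unreachable: {error}"
    except (OSError, AttributeError) as error:
        # Typically a missing input device or a missing PyAudio installation.
        return None, f"Microphone unavailable: {error}"
```

Calling sr.Microphone.list_microphone_names() from a Python prompt is another quick check that an input device is visible to the library.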
Contributions are welcome! If you have ideas for new features, bug fixes, or improvements, please open an issue or submit a pull request. Make sure to follow the coding style used in the project.
This project is licensed under the MIT License. See the LICENSE file for details.