Desktop Voice Assistant

This repository contains a Python-based desktop voice assistant that leverages speech recognition, text-to-speech, and various web APIs to perform tasks on your computer. The project integrates a graphical user interface (GUI) built with Tkinter, enabling users to interact with the assistant through voice commands and see real-time feedback on the screen.

Introduction

The Desktop Voice Assistant is designed to provide a hands-free interface for everyday computer tasks. It listens to your commands, processes them using APIs like WolframAlpha and Wikipedia, and responds with spoken output via pyttsx3. It can also open searches in a web browser, play media files, launch applications, and perform system operations such as searching for files and shutting down the PC.

Features

Voice Interaction:
Utilizes the speech_recognition library to capture and convert voice input into text.
Text-to-Speech:
Uses pyttsx3 (configured for Windows' SAPI5) to provide spoken responses.
Web and API Integration:
- Searches Google, YouTube, and Wikipedia.
- Retrieves information using WolframAlpha API for computational queries.
Application and File Management:
- Opens various installed applications (e.g., Visual Studio Code, Microsoft Office).
- Searches for and opens files on the computer.
Media Playback:
Plays random music or video files from specified directories.
Graphical User Interface (GUI):
Built with Tkinter to display conversation logs, command lists, and status messages.
Multi-threading:
Utilizes Python's threading module to ensure the GUI remains responsive while processing voice commands.
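
One common way to keep a Tkinter window responsive is to run the blocking work on a worker thread and pass results back through a queue. The sketch below is a minimal illustration of that pattern, not the code in VoiceAI.py; `run_in_background` and `drain` are hypothetical helper names.

```python
import queue
import threading


def run_in_background(task, results):
    """Run task() on a daemon thread and put its return value on results."""
    def worker():
        results.put(task())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def drain(results):
    """Return every message currently waiting in the queue, without blocking."""
    messages = []
    while True:
        try:
            messages.append(results.get_nowait())
        except queue.Empty:
            return messages
```

In a Tkinter app, the main thread would typically call `drain` from a callback that reschedules itself with `root.after(...)`, because widgets should only be updated from the thread running the main loop.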

Architecture and Workflow

Initialization:
- The assistant initializes the Tkinter GUI, sets up the text variables to display user and assistant messages, and configures the voice engine.
- It also loads the list of available commands into a separate window for reference.
Voice Input and Processing:
- When activated, the assistant listens for a command using a microphone.
- The audio input is processed by the speech_recognition library and converted to text.
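
The listening step could look roughly like the sketch below. It assumes Google's free web recognizer (`recognize_google`), which this README does not confirm as the backend in use; `listen_once` and `normalize_command` are hypothetical names.

```python
def normalize_command(text):
    """Lowercase the transcript and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def listen_once(timeout=5):
    """Record one phrase and return it as normalized text, or None on failure."""
    # Imported here so the module still loads on machines without audio libraries.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:  # sr.Microphone needs PyAudio installed
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            audio = recognizer.listen(source, timeout=timeout)
        # Assumption: the online Google recognizer; other backends exist.
        return normalize_command(recognizer.recognize_google(audio))
    except (sr.WaitTimeoutError, sr.UnknownValueError, sr.RequestError):
        # No speech before the timeout, unintelligible audio, or a network/API error.
        return None
```

Returning `None` on failure lets the caller decide whether to prompt the user to repeat the command.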
Command Parsing and Execution:
- The text command is parsed to determine the required action (e.g., searching Google, opening an application, retrieving information from Wikipedia).
- Based on the command, appropriate functions are triggered, such as opening a URL in a web browser, launching an application via OS commands, or performing calculations using WolframAlpha.
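
A simple way to implement this kind of parsing is prefix matching against a table of known commands. The following is a sketch under that assumption; the action names and the `parse_command` function are hypothetical and may not match how VoiceAI.py is organized.

```python
# Checked in order, so more specific prefixes come before the generic "open ".
PREFIXES = [
    ("search google ", "google_search"),
    ("search youtube ", "youtube_search"),
    ("wikipedia ", "wikipedia"),
    ("google map ", "google_map"),
    ("open ", "open"),
]


def parse_command(text):
    """Split a spoken command into an (action, argument) pair."""
    command = " ".join(text.lower().split())
    for prefix, action in PREFIXES:
        if command.startswith(prefix):
            return action, command[len(prefix):]
    # Anything unrecognized is passed on whole, e.g. to a general query handler.
    return "fallback", command
```

For example, `parse_command("open code")` yields `("open", "code")`, which a dispatcher could then map to launching an application.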
Output and Feedback:
- The assistant speaks the response using the text-to-speech engine.
- Simultaneously, messages are updated on the GUI so the user can see what the assistant is doing.
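
The output step can be sketched as a small function that updates the display and then speaks. This is an illustration only; `make_speaker` and `respond` are hypothetical names, and the display callback is assumed to be something like the `set` method of a Tkinter `StringVar`.

```python
def make_speaker():
    """Create a speak(text) function backed by pyttsx3 on Windows (SAPI5)."""
    # Imported here so the rest of the module loads without pyttsx3 installed.
    import pyttsx3

    engine = pyttsx3.init("sapi5")

    def speak(text):
        engine.say(text)
        engine.runAndWait()  # blocks until the phrase has been spoken

    return speak


def respond(text, speak, show):
    """Show the message first, then speak it, since speaking blocks."""
    show(text)
    speak(text)
```

Passing `speak` and `show` as arguments keeps `respond` independent of the GUI and the speech engine, which also makes it easy to exercise without audio hardware.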

Prerequisites

Python 3.x: Ensure Python is installed on your system.
Required Python Libraries:
- pyttsx3
- SpeechRecognition
- wikipedia
- wolframalpha
System Requirements:
The assistant is built for Windows, utilizing SAPI5 for text-to-speech and expecting Windows-specific file paths and application shortcuts.

Installation

Clone the Repository:

git clone <repository-url>
cd <repository-directory>

Install Dependencies:

Install the required packages using pip:
```
pip install pyttsx3 SpeechRecognition wikipedia wolframalpha
```
If you face issues with any dependencies, consult the package documentation for installation instructions specific to your environment.
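
To confirm the packages installed correctly, a quick check like the one below may help. It is a small sketch, not part of the project; note that the SpeechRecognition package is imported as `speech_recognition`, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names for the packages listed above.
REQUIRED_MODULES = ["pyttsx3", "speech_recognition", "wikipedia", "wolframalpha"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every required module can be found.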

Usage

Running the Assistant:

Open a command prompt in the project directory and run:
```
python VoiceAI.py
```
Interacting with the Assistant:
- The GUI window will display a welcome message and a list of available commands.
- Click Start! to initialize the assistant.
- Use Start Speaking! to begin issuing voice commands.
- You can view the command list by clicking the Command List button.

Commands and Functionalities

The assistant supports a variety of voice commands, including but not limited to:

Web Searches:
- "search google <keyword>"
- "search youtube <keyword>"
- "wikipedia <keyword>"
Opening Websites:
- "open google", "open youtube", "open facebook", etc.
Mapping:
- "google map <location>" to search a location on Google Maps.
Application Launching:
- "open code" to launch Visual Studio Code.
- "open word", "open excel", "open powerpoint", etc.
Media Playback:
- "play music" to play a random song.
- "play video" to play a random video from specified directories.
System Operations:
- "the time" to query the current time.
- "open a file" to search and open a file.
- "exit" to close the assistant, or "shutdown" to shut down the PC.
General Conversation:
The assistant also responds to greetings and casual conversation, and uses the WolframAlpha API to process complex queries.
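
As an illustration of how the search commands above might map to browser actions, the sketch below builds a URL from the keyword and opens it with the standard `webbrowser` module. The URL templates and the names `SEARCH_URLS`, `build_search_url`, and `open_search` are assumptions made for this example, not taken from VoiceAI.py.

```python
import webbrowser
from urllib.parse import quote_plus

# Assumed URL templates; the keyword is inserted after URL-encoding.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={}",
    "youtube": "https://www.youtube.com/results?search_query={}",
    "google map": "https://www.google.com/maps/search/?api=1&query={}",
}


def build_search_url(site, keyword):
    """Return the search URL for a site, with the keyword safely encoded."""
    return SEARCH_URLS[site].format(quote_plus(keyword))


def open_search(site, keyword):
    """Open the search in the default browser; returns True on success."""
    return webbrowser.open(build_search_url(site, keyword))
```

Encoding the keyword with `quote_plus` matters because spoken phrases usually contain spaces and may contain characters such as `&` that would otherwise break the query string.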

Troubleshooting

Voice Recognition Issues:
Ensure your microphone is connected and properly configured, and that you have a stable internet connection, since the WolframAlpha and Wikipedia lookups are API calls made over the network.
Application Launch Errors:
Verify the file paths in the code match the installed locations of your applications.
Dependency Problems:
Reinstall or update any Python libraries if you encounter errors during runtime.
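
For application launch errors in particular, a short check like the following can show which configured paths do not exist on your machine. It is a sketch only; `missing_paths` is a hypothetical helper, and the dictionary you pass in should hold the paths actually used in your copy of the code.

```python
import os


def missing_paths(app_paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, path in app_paths.items() if not os.path.exists(path)]
```

Any name returned by `missing_paths` points to an entry whose path needs to be updated to match where that application is installed.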

Contributing

Contributions are welcome! If you have ideas for new features, bug fixes, or improvements, please open an issue or submit a pull request. Make sure to follow the coding style used in the project.

License

This project is licensed under the MIT License. See the LICENSE file for details.
