Welcome to the Text Classification AI repository! This project provides an AI-powered system for analyzing and categorizing textual data into predefined categories. Follow this guide to configure, install, and use the system.
- Flexible categorization based on customizable configurations.
- Works with any dataset containing text data.
- User-defined categories for classification.
- Supports one-hot or categorical output encoding.
git clone https://github.com/161tochtli/text-classification-ai.git
cd text-classification-ai
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

The system relies on a YAML configuration file to define input, output, and processing parameters. Modify the provided task_configuration.yaml to match your use case.
# Path to the input file
input_file: data/input_file.csv
# Path to the output file
output_file: data/output_file.csv
# Parameter columns
param_cols:
- comentario
# Fixed parameters
fixed_params:
  idioma: Español
  categories:
    - Positivo
    - Negativo
# Reprocess flag
reprocess: false
# Encoding
one_hot_encoding: false
# Model name
model: gpt-4o-mini
# Batch size
batch_size: 32

- input_file: Path to the file containing the text data to classify.
- output_file: Path where the classified data will be saved.
- param_cols: Columns in the input file to process (e.g., comentario).
- fixed_params: Fixed parameters and categories for classification.
- one_hot_encoding: Set to true for one-hot encoded output; false for categorical labels.
- model: The model to use (e.g., gpt-4o-mini).
- batch_size: Number of rows to process per batch.
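
To make the one_hot_encoding option concrete, here is a minimal sketch of how the two output shapes might differ. The helper `encode_labels` is hypothetical and not part of this repository, and the column names (`classification` for categorical output, one column per category for one-hot output) are assumptions; the real output columns may be named differently.

```python
def encode_labels(labels, categories, one_hot):
    """Return one output row (a dict) per label.

    Hypothetical helper: the column names are assumptions for illustration.
    """
    rows = []
    for label in labels:
        if one_hot:
            # One column per category: 1 for the assigned category, 0 otherwise.
            rows.append({category: int(label == category) for category in categories})
        else:
            # A single column holding the category name.
            rows.append({"classification": label})
    return rows


categories = ["Positivo", "Negativo"]
labels = ["Positivo", "Negativo", "Positivo"]

categorical_rows = encode_labels(labels, categories, one_hot=False)
one_hot_rows = encode_labels(labels, categories, one_hot=True)

print(categorical_rows[0])
print(one_hot_rows[0])
```

Categorical output is easier to read at a glance, while one-hot output tends to be more convenient when the results feed directly into numeric analysis.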
To classify text, use the following command:
python data_classification.py --task sample_task

- --task: Name of the task to run. It must match the name of a folder inside the tasks directory that contains the task configuration file and prompt files.
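
As an illustration of how a task name could map to its files, the sketch below builds the expected paths from the --task value and reports which ones are missing. This is not the actual code of data_classification.py; `resolve_task_files` and `parse_args` are hypothetical helpers, and the file names are taken from the directory layout described in this README.

```python
import argparse
from pathlib import Path


def resolve_task_files(task_name, tasks_dir="tasks"):
    """Build the expected paths for a task folder (hypothetical helper)."""
    task_dir = Path(tasks_dir) / task_name
    return {
        "config": task_dir / "task_configuration.yaml",
        "user_prompt": task_dir / "user_prompt.txt",
        "system_prompt": task_dir / "system_prompt.txt",
    }


def parse_args(argv=None):
    """Parse the --task option; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Classify text for a given task.")
    parser.add_argument("--task", required=True, help="Folder name inside tasks/")
    return parser.parse_args(argv)


args = parse_args(["--task", "sample_task"])
task_files = resolve_task_files(args.task)

# List any expected file that does not exist relative to the current directory.
missing = [name for name, path in task_files.items() if not path.exists()]
print(task_files["config"])
print("missing:", missing)
```

A check like this before running a task can catch a misspelled task name or an incomplete task folder early.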
The output file (defined in the output_file parameter) will contain the input data along with the classification results. Example:
| comentario | classification |
|---|---|
| "This product is amazing!" | Positive |
| "I didn't like the service." | Negative |
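
Once the output file exists, a quick way to inspect it is to count how many rows fall into each category. The sketch below uses only the standard library and an in-memory stand-in for data/output_file.csv built from the two sample rows above; it assumes the label column is named `classification`, as in the example table, and `count_labels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Stand-in for the contents of data/output_file.csv (the two sample rows above).
sample_output = io.StringIO(
    'comentario,classification\n'
    '"This product is amazing!",Positive\n'
    '"I didn\'t like the service.",Negative\n'
)


def count_labels(csv_file, label_column="classification"):
    """Count rows per label in an open CSV file (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


counts = count_labels(sample_output)
print(counts)
```

To run it against real results, replace the in-memory stand-in with `open("data/output_file.csv", newline="", encoding="utf-8")`.
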
.secrets/                        # Directory for sensitive credentials (e.g., API keys)
.venv/                           # Virtual environment (optional)
data/                            # Directory for input and output files
    input_file.csv               # Input dataset
    output_file.csv              # Output results
tasks/                           # Directory for tasks and configurations
    sample_task/                 # Example task
        task_configuration.yaml  # Task configuration file
        user_prompt.txt          # User prompt template
        system_prompt.txt        # System prompt template
.config                          # Global configuration file; set the working directory here
requirements.txt                 # Python dependencies
For questions or issues, please create a GitHub issue or contact me.