GitHub - 161tochtli/text-classification-ai: An AI-powered system for automating the classification of text data into predefined categories.

Welcome to the Text Classification AI repository! This project provides an AI-powered system for analyzing text and sorting it into predefined categories. Follow this guide to install, configure, and use the system.

Features

Flexible categorization based on customizable configurations.
Works with any dataset containing text data.
User-defined categories for classification.
Supports one-hot or categorical output encoding.

Installation

1. Clone the Repository

git clone https://github.com/161tochtli/text-classification-ai.git
cd text-classification-ai

2. Create a Virtual Environment (Optional but Recommended)

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

Configuration

The system relies on a YAML configuration file to define input, output, and processing parameters. Modify the provided task_configuration.yaml (located in tasks/sample_task/) to match your use case.

Example Configuration

# Path to the input file
input_file: data/input_file.csv

# Path to the output file
output_file: data/output_file.csv

# Parameter columns
param_cols:
  - comentario

# Fixed parameters
fixed_params:
  idioma: Español
  categories:
    - Positivo
    - Negativo

# Reprocess flag
reprocess: false

# Encoding
one_hot_encoding: false

# Model name
model: gpt-4o-mini

# Batch size
batch_size: 32
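
A quick sanity check of the configuration can catch typos before a run. The sketch below is not part of the repository: it assumes the YAML has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load), and the expected types and the validate_config helper are assumptions inferred from the example configuration above.

```python
# Keys come from the example configuration; the expected types are assumptions.
EXPECTED_TYPES = {
    "input_file": str,
    "output_file": str,
    "param_cols": list,
    "fixed_params": dict,
    "reprocess": bool,
    "one_hot_encoding": bool,
    "model": str,
    "batch_size": int,
}


def validate_config(config):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject True/False where an int is expected.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(
                f"{key} should be {expected.__name__}, got {type(value).__name__}"
            )

    batch_size = config.get("batch_size")
    if isinstance(batch_size, int) and not isinstance(batch_size, bool) and batch_size < 1:
        problems.append("batch_size must be at least 1")

    fixed_params = config.get("fixed_params")
    if isinstance(fixed_params, dict) and not fixed_params.get("categories"):
        problems.append("fixed_params.categories must list at least one category")
    return problems


# The example configuration, written as the dictionary a YAML parser would produce.
example = {
    "input_file": "data/input_file.csv",
    "output_file": "data/output_file.csv",
    "param_cols": ["comentario"],
    "fixed_params": {"idioma": "Español", "categories": ["Positivo", "Negativo"]},
    "reprocess": False,
    "one_hot_encoding": False,
    "model": "gpt-4o-mini",
    "batch_size": 32,
}

print(validate_config(example))  # []
```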

What to Customize

input_file: Path to the file containing text data to classify.
output_file: Path where the classified data will be saved.
param_cols: Columns in the input file to process (e.g., comentario).
fixed_params: Define fixed parameters and categories for classification.
one_hot_encoding: Set to true for one-hot encoding output; false for categorical labels.
model: Specify the model to use (e.g., gpt-4o-mini).
batch_size: Number of rows to process per batch.
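
To make batch_size concrete, here is a minimal sketch of how rows might be grouped into batches. The iter_batches helper is hypothetical and only illustrates the idea; the repository's script may batch rows differently.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of rows, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 70 rows with batch_size 32 give two full batches and one partial batch.
comments = [f"comment {i}" for i in range(70)]
sizes = [len(batch) for batch in iter_batches(comments, batch_size=32)]
print(sizes)  # [32, 32, 6]
```

The last batch is smaller whenever the number of rows is not a multiple of batch_size.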

Usage

Run the Script

To classify text, use the following command:

python data_classification.py --task sample_task

Options

--task: Name of the task to run. It must match the name of a folder inside the tasks directory that contains the task configuration file and the prompt files.
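
The mapping from a task name to its files can be sketched as follows. This is an illustration, not the code in data_classification.py: the task_paths and missing_files helpers are hypothetical, and treating --task as a required argument is an assumption. The file names are the ones listed for sample_task in this README.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser with a single --task option (marked required here as an assumption)."""
    parser = argparse.ArgumentParser(description="Classify text for a named task.")
    parser.add_argument("--task", required=True, help="Folder name inside tasks/")
    return parser


def task_paths(task_name, tasks_root=Path("tasks")):
    """Return the files a task folder is expected to contain."""
    task_dir = Path(tasks_root) / task_name
    return {
        "config": task_dir / "task_configuration.yaml",
        "user_prompt": task_dir / "user_prompt.txt",
        "system_prompt": task_dir / "system_prompt.txt",
    }


def missing_files(paths):
    """Names of the expected files that do not exist on disk."""
    return [name for name, path in paths.items() if not path.is_file()]


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--task", "sample_task"])
paths = task_paths(args.task)
print(paths["config"].as_posix())  # tasks/sample_task/task_configuration.yaml
```

Calling missing_files(paths) before a run gives a clear error early if a prompt file or the configuration is absent from the task folder.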

Output

The output file (set by the output_file parameter) contains the input data along with the classification results. Example:

comentario	classification
"This product is amazing!"	Positive
"I didn't like the service."	Negative

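The difference between the two settings of one_hot_encoding can be illustrated with a small sketch. The layout shown (one 0/1 column per category) is an assumption about what one-hot output looks like; the README does not document the exact columns the tool writes, and the one_hot helper is hypothetical.

```python
def one_hot(label, categories):
    """Map a single categorical label to a dict with one 0/1 entry per category."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return {category: int(category == label) for category in categories}


# Categories taken from the example configuration.
categories = ["Positivo", "Negativo"]

# Categorical output keeps the label itself; one-hot spreads it over columns.
print(one_hot("Negativo", categories))  # {'Positivo': 0, 'Negativo': 1}
```
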
Project Structure

.secrets/               # Directory for sensitive credentials (e.g., API keys)
.venv/                  # Virtual environment (optional)
data/                   # Directory for input and output files
  input_file.csv        # Input dataset
  output_file.csv       # Output results

tasks/                  # Directory for tasks and configurations
  sample_task/          # Example task configuration
    task_configuration.yaml  # Task configuration file
    user_prompt.txt     # File containing the user prompt template
    system_prompt.txt   # File containing the system prompt template

.config                 # Global configuration file; set the working directory here

requirements.txt        # Python dependencies

Support

For questions or issues, please create a GitHub issue or contact me.
