etlSUS

Docs are also available in Portuguese (PT)

An opinionated ETL (Extract, Transform, Load) pipeline designed to process Brazil's public healthcare data (DataSUS) from raw CSV files into analysis-ready datasets.

Overview

The Problem

Brazil's SUS (Sistema Único de Saúde) publishes extensive public health data, but the raw files require domain-specific preprocessing before analysis: removing unnecessary columns, handling missing values, and optimizing data types. Scripting these transformations by hand for each dataset is time-consuming and error-prone.
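As a rough illustration of the preprocessing described above, here is a minimal standard-library sketch that drops an unneeded column, maps blanks and an "ignored" code to `None`, and converts numeric text fields to integers. The column names, the `"99"` missing code, and the semicolon delimiter are assumptions chosen for illustration. They are not etlSUS's actual rules, which are documented in its dataset dictionaries.

```python
import csv
import io

# Hypothetical cleaning rules for illustration only; etlSUS's real rules may differ.
DROP_COLUMNS = {"CONTADOR"}           # assumed to carry no analytical value
MISSING_CODES = {"IDADEMAE": {"99"}}  # assumed "ignored" code for mother's age
INT_COLUMNS = {"PESO", "IDADEMAE"}    # assumed integer-valued fields


def clean_rows(csv_text, delimiter=";"):
    """Yield cleaned dicts from raw CSV text (semicolon-delimited by assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        cleaned = {}
        for col, value in row.items():
            if col in DROP_COLUMNS:
                continue  # remove columns with no analytical value
            value = (value or "").strip()
            if value == "" or value in MISSING_CODES.get(col, set()):
                cleaned[col] = None  # normalize blanks and "ignored" codes
            elif col in INT_COLUMNS:
                cleaned[col] = int(value)  # convert numeric text to int
            else:
                cleaned[col] = value  # keep codes (e.g. municipality IDs) as text
        yield cleaned


raw = "CONTADOR;PESO;IDADEMAE;CODMUNRES\n1;3250;27;355030\n2;;99;330455\n"
rows = list(clean_rows(raw))
print(rows)
```

Code-like fields such as municipality identifiers stay as strings so that leading zeros or fixed widths are not lost. A DataFrame-based implementation would typically go further and choose smaller numeric or categorical dtypes.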

The Solution

etlSUS automates the entire process. Specify the dataset, and the library downloads, transforms, and loads the data into a database and/or merges all files into a single output.
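The download, transform, and load/merge flow can be pictured as a chain of stages, where each stage consumes the previous stage's output. The sketch below is a toy model of that ordering, not etlSUS's internals. The `Stage` class, `run_pipeline` function, and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Stage:
    """One named step of a hypothetical extract-transform-load flow."""
    name: str
    run: Callable[[list], list]


def run_pipeline(stages: Iterable[Stage], items: list, log: List[str]) -> list:
    """Feed each stage's output into the next, recording how many items each produced."""
    for stage in stages:
        items = stage.run(items)
        log.append(f"{stage.name}: {len(items)} item(s)")
    return items


# Hypothetical stages mirroring the raw/ -> processed/ -> merged-file layout.
stages = [
    Stage("download", lambda years: [f"raw/SINASC_{y}.csv" for y in years]),
    Stage("transform", lambda paths: [
        p.replace("raw/", "processed/").replace(".csv", ".parquet") for p in paths
    ]),
    Stage("merge", lambda paths: ["dataset.parquet.gzip"] if paths else []),
]
log: List[str] = []
result = run_pipeline(stages, [2021, 2022], log)
```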

🚀 Quick Start

1. Installation

poetry add git+https://github.com/GOPAD-Datasus/etlSUS.git

2. Run the Pipeline

from etlsus import pipeline


if __name__ == '__main__':
    pipeline(
        dataset='SINASC',  # Choose between 'SINASC' or 'SIM'
        data_dir='path/to/data/dir',
    )
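To process both supported datasets, one option is to call `pipeline` once per dataset. The README does not say whether two datasets can safely share one `data_dir`. Because each run creates its own `raw/`, `processed/`, and optional merged file, this sketch cautiously gives each dataset its own subdirectory. `run_all` and its injectable `runner` parameter are hypothetical helpers, not part of etlSUS.

```python
from pathlib import Path

DATASETS = ("SINASC", "SIM")  # the two datasets listed in this README


def run_all(base_dir, runner=None):
    """Run the pipeline once per dataset, each in its own data directory.

    `runner` defaults to etlsus.pipeline. It can be replaced with a stub so the
    loop can be exercised without downloading anything.
    """
    if runner is None:
        from etlsus import pipeline as runner  # imported lazily
    base = Path(base_dir)
    dirs = []
    for name in DATASETS:
        target = base / name.lower()  # e.g. <base_dir>/sinasc, <base_dir>/sim
        runner(dataset=name, data_dir=str(target))
        dirs.append(target)
    return dirs
```

In a script, call `run_all("path/to/data/dir")` inside the same `if __name__ == '__main__':` guard used in the Quick Start.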

📌 Features

- Simple Interface: Select your dataset (SINASC or SIM) and specify the base directory
- Automated Processing: Handles download, transformation, and loading automatically
- Optimized Transformations: Removes irrelevant columns and values while preserving analytical value
  - SIM Dictionary (EN) (PT)
  - SINASC Dictionary (EN) (PT)
- Multiple Output Formats:
  - Direct export to relational databases
  - Merged single file for multi-year analysis
  - Multiple files
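The README does not specify which relational databases are supported or how rows are written. As a stand-in, the following sketch uses Python's built-in `sqlite3` to show the general idea of loading cleaned, multi-year rows into a single table. The `load_rows` helper, the table name `sinasc`, and the `year` column are hypothetical.

```python
import sqlite3


def load_rows(conn, table, rows):
    """Create `table` from the first row's keys and bulk-insert all rows.

    Table and column names are interpolated into SQL, so they must come from a
    trusted schema; values are passed through placeholders.
    """
    if not rows:
        return 0
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
rows = [
    {"year": 2021, "PESO": 3250, "CODMUNRES": "355030"},
    {"year": 2022, "PESO": None, "CODMUNRES": "330455"},
]
inserted = load_rows(conn, "sinasc", rows)
count = conn.execute('SELECT COUNT(*) FROM "sinasc"').fetchone()[0]
```

Missing values stored as `None` become SQL `NULL`, so they stay distinguishable from real measurements in queries.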

📁 Project Structure

After running the pipeline, your data directory will be organized as follows:

# Using data_dir = "./data"

./data
├── raw/                  # Downloaded CSV files
├── processed/            # Cleaned and transformed files
└── dataset.parquet.gzip  # (Optional) Merged file
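To check what a run produced, you can inspect the directory layout above with `pathlib`. This is a minimal sketch that assumes only the `raw/`, `processed/`, and `dataset.parquet.gzip` names shown in the tree. The `summarize_layout` helper and the example file name are hypothetical.

```python
import tempfile
from pathlib import Path


def _files(directory):
    """Sorted file names in `directory`, or an empty list if it does not exist."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def summarize_layout(data_dir):
    """Report the raw files, processed files, and whether the merged file exists."""
    base = Path(data_dir)
    return {
        "raw": _files(base / "raw"),
        "processed": _files(base / "processed"),
        "merged": (base / "dataset.parquet.gzip").is_file(),
    }


# Demo on a throwaway directory containing one hypothetical raw file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "raw").mkdir()
    (Path(tmp) / "raw" / "example.csv").write_text("x\n")
    summary = summarize_layout(tmp)
print(summary)
```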

Limitation

File outputs are written in Parquet format only.
