QuPrep — Quantum Data Preparation

The missing preprocessing layer between classical datasets and quantum computing frameworks.


QuPrep converts classical datasets into quantum-circuit-ready format. It is not a quantum computing framework, simulator, or training tool — it is the preprocessing step that feeds into Qiskit, PennyLane, Cirq, TKET, and any other quantum workflow.

CSV / DataFrame / NumPy / images / text / graphs  →  QuPrep  →  circuit-ready output

What QuPrep does

  • Ingest tabular data, time series, images, text, and graphs — all in the same pipeline API
  • Clean, normalize, and reduce dimensionality to fit your hardware qubit budget
  • Encode data into circuits using 13 encoding methods (Angle, Amplitude, IQP, ZZFeatureMap, GraphState, and more)
  • Recommend, compare, and auto-select the best encoding for your dataset and task
  • Export circuits to 8 frameworks: OpenQASM 3.0, Qiskit, PennyLane, Cirq, TKET, Braket, Q#, IQM
  • Formulate combinatorial optimization problems as QUBO / Ising models; export as QAOA circuit templates for your quantum framework

QuPrep does not train models, simulate circuits, run on quantum hardware, or optimize variational parameters.
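
To make the QUBO item in the list above concrete, here is a minimal sketch of the standard Max-Cut formulation in plain Python. It does not call QuPrep: `maxcut_qubo`, `qubo_energy`, and `brute_force_minimum` are hypothetical helper names, and the exhaustive search only stands in for a real solver on very small graphs.

```python
from itertools import product


def maxcut_qubo(n_nodes, edges):
    """Upper-triangular QUBO matrix Q; minimizing x^T Q x maximizes the cut.

    Each edge (i, j) contributes -(x_i + x_j - 2*x_i*x_j), which is -1 when
    the edge is cut and 0 otherwise.
    """
    Q = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        a, b = min(i, j), max(i, j)
        Q[a][a] -= 1.0
        Q[b][b] -= 1.0
        Q[a][b] += 2.0
    return Q


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary assignment x (upper triangle only)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force_minimum(Q):
    """Try every binary assignment; only feasible for a handful of nodes."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))


# Three-node star graph: node 0 is connected to nodes 1 and 2.
edges = [(0, 1), (0, 2)]
Q = maxcut_qubo(3, edges)
best = brute_force_minimum(Q)
cut_size = -qubo_energy(Q, best)
print(best, cut_size)
```

The negated minimum energy equals the number of cut edges, so both edges of the star graph are cut in the optimum.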


Installation

pip install quprep

With optional extras:

# Framework exporters
pip install quprep[qiskit]     # Qiskit QuantumCircuit
pip install quprep[pennylane]  # PennyLane QNode
pip install quprep[cirq]       # Cirq Circuit
pip install quprep[tket]       # TKET/pytket Circuit
pip install quprep[braket]     # Amazon Braket Circuit
pip install quprep[qsharp]     # Q# / Azure Quantum
pip install quprep[iqm]        # IQM native format
pip install quprep[frameworks] # all framework exporters at once

# Data modalities
pip install quprep[image]      # image ingestion (Pillow)
pip install quprep[text]       # text embeddings (sentence-transformers, ~2 GB)
pip install quprep[modalities] # image + text at once

# Other
pip install quprep[umap]       # UMAP dimensionality reduction
pip install quprep[viz]        # matplotlib circuit diagrams
pip install quprep[all]        # everything

Requirements: Python ≥ 3.10. Core dependencies: numpy, scipy, pandas, scikit-learn.
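
If you are unsure which optional extras are present in an environment, a quick check with the standard library can help. The sketch below is not part of QuPrep; the mapping from extra name to importable module is an assumption based on the usual top-level module names of those packages, and `available_extras` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module it is expected to install.
EXTRA_MODULES = {
    "qiskit": "qiskit",
    "pennylane": "pennylane",
    "cirq": "cirq",
    "tket": "pytket",
    "braket": "braket",
    "image": "PIL",
    "text": "sentence_transformers",
    "umap": "umap",
    "viz": "matplotlib",
}


def available_extras(mapping=EXTRA_MODULES):
    """Return {extra: True/False} depending on whether its module is importable."""
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


status = available_extras()
for extra, ok in sorted(status.items()):
    print(f"{extra:<10} {'installed' if ok else 'missing'}")
```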


Quickstart

One-liner

import quprep as qd

result = qd.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)
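
For intuition about what an angle-encoded circuit contains, the following sketch builds OpenQASM 3.0 text by hand: one Ry rotation per feature, with each feature scaled linearly to [0, π]. This illustrates the technique only and is not QuPrep's actual output; `angle_encode_qasm` and its `lo` / `hi` parameters are hypothetical, and the scaling range is an assumption.

```python
import math


def angle_encode_qasm(sample, lo, hi):
    """Return OpenQASM 3.0 text with one Ry rotation per feature.

    lo / hi give the per-feature minimum and maximum used for scaling.
    """
    lines = [
        "OPENQASM 3.0;",
        'include "stdgates.inc";',
        f"qubit[{len(sample)}] q;",
    ]
    for i, (x, a, b) in enumerate(zip(sample, lo, hi)):
        span = b - a
        scaled = 0.0 if span == 0 else (x - a) / span  # map feature to [0, 1]
        theta = math.pi * scaled                       # then to [0, pi]
        lines.append(f"ry({theta:.6f}) q[{i}];")
    return "\n".join(lines)


qasm = angle_encode_qasm([0.5, 2.0, 10.0], lo=[0.0, 0.0, 0.0], hi=[1.0, 4.0, 10.0])
print(qasm)
```

Three features yield three qubits and a single layer of rotations, which matches the n = d qubit count and O(1) depth listed for angle encoding in the encodings table.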

Full pipeline

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(),
    reducer=qd.PCAReducer(n_components=8),
    encoder=qd.IQPEncoder(reps=2),
    exporter=qd.PennyLaneExporter(),   # pip install quprep[pennylane]
)
result = pipeline.fit_transform("data.csv")
qnode = result.circuit   # callable qml.QNode
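
As a rough picture of what a cleaning stage does before reduction and encoding, the sketch below fills missing values with the column mean. The README does not state which strategy `qd.Imputer()` uses, so mean imputation is an assumption here, and `impute_column_means` is a hypothetical stand-alone helper rather than QuPrep code.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column_means(rows):
    """Replace missing entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [row[c] for row in rows if not _is_missing(row[c])]
        # Fall back to 0.0 when a column has no observed values at all.
        means.append(sum(present) / len(present) if present else 0.0)
    return [
        [means[c] if _is_missing(value) else value for c, value in enumerate(row)]
        for row in rows
    ]


raw = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
clean = impute_column_means(raw)
print(clean)
```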

Data modalities — time series, images, text, graphs

import quprep as qd

# Time series — sliding window then encode
from quprep.ingest.time_series_ingester import TimeSeriesIngester
from quprep.clean.window_transformer import WindowTransformer

result = qd.Pipeline(
    preprocessor=WindowTransformer(window_size=5, step=1),
    encoder=qd.AngleEncoder(),
).fit_transform(TimeSeriesIngester(time_column="date").load("sensor.csv"))

# Images — pip install quprep[image]
from quprep.ingest.image_ingester import ImageIngester
result = qd.prepare("images/", encoding="angle", ingester=ImageIngester(size=(8, 8), grayscale=True))

# Text — TF-IDF (no deps) or sentence-transformers (pip install quprep[text])
from quprep.ingest.text_ingester import TextIngester
texts = ["quantum computing is powerful", "machine learning meets QML", ...]
result = qd.prepare(texts, encoding="angle", ingester=TextIngester(method="tfidf", max_features=16))

# Graphs — lossless graph state encoding
from quprep.ingest.graph_ingester import GraphIngester
from quprep.encode.graph_state import GraphStateEncoder
import numpy as np
graph_list = [np.array([[0,1,1],[1,0,0],[1,0,0]], dtype=float), ...]  # adjacency matrices
result = qd.Pipeline(encoder=GraphStateEncoder()).fit_transform(
    GraphIngester(features="adjacency").load(graph_list)
)
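
The sliding-window step in the time series example can be pictured with a few lines of plain Python. This sketch assumes that `window_size` is the number of consecutive values per sample and `step` is the offset between window starts; the README does not confirm that `WindowTransformer` behaves this way, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(series, window_size, step=1):
    """Split a sequence into overlapping windows of fixed length."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(series) - window_size
    return [series[s:s + window_size] for s in range(0, last_start + 1, step)]


readings = [1, 2, 3, 4, 5, 6, 7]
windows = sliding_windows(readings, window_size=5, step=1)
for w in windows:
    print(w)
```

Each window then acts as one sample with `window_size` features, so an n = d encoding such as angle encoding would need five qubits per window here.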

More features

| Feature | Docs |
|---|---|
| Encoding recommendation: ranked by dataset profile and task | guide |
| Qubit budget suggestion: NISQ-safe ceiling with reasoning | API |
| Side-by-side encoder comparison: depth, gates, NISQ safety | API |
| Data drift detection: warn when new data leaves training distribution | API |
| Pipeline save / load: serialize fitted pipelines, no re-fitting | API |
| Schema validation & cost estimation: gate count before encoding | guide |
| QUBO / Ising formulation: Max-Cut, TSP, Knapsack, QAOA circuits, D-Wave export | guide |
| Plugin system: register custom encoders and exporters | guide |
| Circuit visualization: ASCII (no deps) or matplotlib | API |
| Batch QASM export: save all samples to disk as individual files | API |
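
To illustrate the idea behind data drift detection, the sketch below compares the per-feature mean of a new batch against statistics recorded at fit time and flags features that moved by more than a chosen number of standard deviations. This is one simple heuristic and an assumption, not QuPrep's documented method; `fit_reference`, `drifted_features`, and the default `threshold` are hypothetical.

```python
import statistics


def fit_reference(train_rows):
    """Record (mean, population std) for every feature column."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*train_rows)]


def drifted_features(reference, new_rows, threshold=3.0):
    """Return indices of features whose batch mean moved more than `threshold` stds."""
    flagged = []
    for idx, ((mean, std), col) in enumerate(zip(reference, zip(*new_rows))):
        shift = abs(statistics.fmean(col) - mean)
        if std > 0:
            score = shift / std
        else:
            # A constant training feature: any change at all counts as drift.
            score = float("inf") if shift > 0 else 0.0
        if score > threshold:
            flagged.append(idx)
    return flagged


train = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]
reference = fit_reference(train)
new_batch = [[1.0, 20.0], [1.5, 21.0]]
print(drifted_features(reference, new_batch))
```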

Supported encodings

| Encoding | Qubits | Depth | NISQ-safe | Best for |
|---|---|---|---|---|
| Angle (Ry/Rx/Rz) | n = d | O(1) | ✅ Excellent | Most QML tasks |
| Amplitude | n = ⌈log₂ d⌉ | O(2ⁿ) | ❌ Poor | Qubit-limited scenarios |
| Basis | n = d | O(1) | ✅ Excellent | Binary features / QAOA |
| Entangled Angle | n = d | O(d · layers) | ✅ Good | Feature correlations |
| IQP | n = d | O(d² · reps) | ⚠️ Medium | Kernel methods |
| Re-uploading | n = d | O(d · layers) | ✅ Good | High-expressivity QNNs |
| Hamiltonian | n = d | O(d · steps) | ⚠️ Medium | Physics simulation / VQE |
| ZZ Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Quantum kernel methods |
| Pauli Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Configurable kernel methods |
| Random Fourier | n = n_components | O(1) | ✅ Excellent | RBF kernel approximation |
| Tensor Product | n = ⌈d/2⌉ | O(1) | ✅ Excellent | Qubit-efficient encoding |
| QAOA Problem | n = d | O(p) | ✅ Good | QAOA warm-start, problem-inspired maps |
| Graph State | n = nodes | O(edges) | ✅ Good | Graph-structured data (lossless) |

Here d is the number of features per sample and n is the number of qubits.
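
The qubit counts in the table can be turned into a small calculator for checking a feature count against a hardware budget. The function below simply restates the table's formulas; `qubits_required` and the lowercase encoding keys are hypothetical names, not QuPrep identifiers, and the one-qubit floor for amplitude encoding with d = 1 is an added assumption.

```python
# Encodings that the table lists with n = d qubits.
ONE_QUBIT_PER_FEATURE = {
    "angle", "basis", "entangled_angle", "iqp", "reuploading",
    "hamiltonian", "zz_feature_map", "pauli_feature_map", "qaoa_problem",
}


def qubits_required(encoding, d, n_components=None, n_nodes=None):
    """Qubits needed to encode one sample with d features, per the table above."""
    if d < 1:
        raise ValueError("d must be at least 1")
    if encoding in ONE_QUBIT_PER_FEATURE:
        return d
    if encoding == "amplitude":
        # ceil(log2(d)) via integer arithmetic; at least one qubit.
        return max(1, (d - 1).bit_length())
    if encoding == "tensor_product":
        return (d + 1) // 2  # ceil(d / 2)
    if encoding == "random_fourier":
        if n_components is None:
            raise ValueError("random_fourier needs n_components")
        return n_components
    if encoding == "graph_state":
        if n_nodes is None:
            raise ValueError("graph_state needs n_nodes")
        return n_nodes
    raise ValueError(f"unknown encoding: {encoding}")


for name in ("angle", "amplitude", "tensor_product"):
    print(name, qubits_required(name, d=16))
```

For 16 features this gives 16 qubits for angle encoding, 4 for amplitude encoding, and 8 for the tensor product encoding, which shows why the table recommends amplitude encoding for qubit-limited scenarios despite its depth.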

Supported export frameworks

| Framework | Install | Output |
|---|---|---|
| OpenQASM 3.0 | (included) | str |
| Qiskit | pip install quprep[qiskit] | QuantumCircuit |
| PennyLane | pip install quprep[pennylane] | qml.QNode |
| Cirq | pip install quprep[cirq] | cirq.Circuit |
| TKET | pip install quprep[tket] | pytket.Circuit |
| Amazon Braket | pip install quprep[braket] | braket.Circuit |
| Q# | pip install quprep[qsharp] | Q# operation string |
| IQM | pip install quprep[iqm] | IQM circuit JSON |

Documentation

Full documentation at docs.quprep.org


Examples

| # | Topic | Launch |
|---|---|---|
| 01 | Quickstart: prepare() one-liner | Colab, Binder |
| 02 | Full pipeline: clean → encode → export → save/load | Colab, Binder |
| 03 | All encoders compared | Colab, Binder |
| 04 | Framework export: QASM, Qiskit, PennyLane, Cirq, TKET, Braket, Q#, IQM | Colab, Binder |
| 05 | Encoding recommendation | Colab, Binder |
| 06 | Circuit visualization: ASCII + matplotlib | Colab, Binder |
| 07 | QUBO / Ising: Max-Cut, Knapsack, solvers, D-Wave export, QAOA | Colab, Binder |
| 08 | Validation, schema & cost | Colab, Binder |
| 09 | Data drift detection | Colab, Binder |
| 10 | Qubit suggestion: suggest_qubits, task hints, NISQ ceiling | Colab, Binder |
| 11 | Plugin system: register custom encoders and exporters | Colab, Binder |
| 12 | Data modalities: time series, image, text, graph | Colab, Binder |

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.


License

Apache 2.0 — see LICENSE.


Citation

If you use QuPrep in your research, please cite:

@software{quprep2026,
  author    = {Perera, Hasarindu},
  title     = {QuPrep: Quantum Data Preparation},
  year      = {2026},
  publisher = {Zenodo},
  version   = {0.7.0},
  doi       = {10.5281/zenodo.19286258},
  url       = {https://doi.org/10.5281/zenodo.19286258},
  license   = {Apache-2.0},
}

