Releases: quprep/quprep
v0.8.0
Added
Data connectors
- `HuggingFaceIngester` — load any HuggingFace dataset with automatic modality detection (`modality="auto"` sniffs image/text/tabular/graph feature schemas); supports tabular, image, text, and graph modalities; `pip install quprep[huggingface]`
- `OpenMLIngester` — load OpenML datasets by integer ID (`load(61)`) or name (`load("iris")`); version pinning; no account required for public datasets; `pip install quprep[openml]`
- `KaggleIngester` — download Kaggle datasets (`load("owner/name")`) and competition data (`load_competition("slug")`); uses `~/.kaggle/kaggle.json` or env vars; `pip install quprep[kaggle]`
CLI tools
- `quprep inspect <file>` — profile a dataset without encoding: shape, feature types, missing values, sparsity, per-feature statistics, and an optional encoding recommendation
- `quprep benchmark <file>` — run all (or selected) encoders on a dataset and report gate count, circuit depth, two-qubit gates, encode time, and NISQ safety in a side-by-side table; `--include`, `--exclude`, `--task`, `--samples`, `--output` flags
Reproducibility
- `fingerprint_pipeline(pipeline)` → `FingerprintResult` — deterministic SHA-256 hash of the full pipeline configuration (stage classes + parameters + dependency versions); stable across runs for the same config; serialisable to dict/JSON for paper methods sections and experiment logs
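The idea behind a deterministic config fingerprint can be sketched with the stdlib alone (the helper name `fingerprint_config` and the config layout below are illustrative, not quprep API): canonical JSON serialisation plus SHA-256 yields a digest that is stable across runs and key orderings.

```python
import hashlib
import json

def fingerprint_config(config: dict) -> str:
    """Deterministic SHA-256 digest of a nested config dict.

    sort_keys=True canonicalises key order, so equal configs always
    hash to the same 64-character hex string.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

config = {
    "stages": ["Scaler", "PCAReducer", "AngleEncoder"],
    "params": {"n_components": 4},
    "versions": {"numpy": "1.26.4"},
}
digest = fingerprint_config(config)
```

Because the serialisation is canonical, logging the digest in a methods section is enough to assert two experiments used identical preprocessing.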
New extras
- `quprep[huggingface]` — `datasets>=2.0`
- `quprep[openml]` — `openml>=0.14`
- `quprep[kaggle]` — `kaggle>=1.6`
v0.7.0
Added
Data modalities
- `TimeSeriesIngester` — timestamped CSV ingestion; stores the time column in `dataset.metadata["time_index"]`; excludes time from the feature matrix
- `WindowTransformer` — sliding window over `(n_timesteps, n_features)` → `(n_windows, window × n_features)`; lag-named features; use via `Pipeline(preprocessor=WindowTransformer(...))`
- `ImageIngester` — loads images from flat or subdirectory-labeled directory structures; grayscale/RGB, configurable resize and normalization; `pip install quprep[image]`
- `TextIngester` — TF-IDF (no extra deps) or sentence-transformer dense embeddings (`pip install quprep[text]`); accepts CSV, `.txt` file lists, or Python string lists; `target_column` support
- `GraphIngester` — two ingestion paths: lossy (Laplacian eigenvalues + degree features, configurable `n_features`, padding) and lossless (`features="adjacency"` — flattened upper triangle of the adjacency matrix); `networkx.Graph` support
- `GraphStateEncoder` — lossless graph state circuit: H⊗n followed by CZ gates per adjacency edge; pairs with `GraphIngester(features="adjacency")` for a clean pipeline path; QASM export
- Sparse data support — `NumpyIngester` and `Pipeline` automatically convert `scipy.sparse` matrices to dense
- Multi-label / multi-output — `NumpyIngester(y=labels)`, `CSVIngester(target_columns=[...])`; `Dataset.labels` propagated through all pipeline stages
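The `(n_timesteps, n_features)` → `(n_windows, window × n_features)` shape contract of the window transform can be illustrated with a small NumPy sketch (an independent reimplementation, not quprep internals):

```python
import numpy as np

def sliding_window(X: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Flatten each length-`window` slice of a (n_timesteps, n_features)
    array into one row, giving (n_windows, window * n_features)."""
    n_timesteps, _ = X.shape
    starts = range(0, n_timesteps - window + 1, stride)
    return np.stack([X[s:s + window].reshape(-1) for s in starts])

X = np.arange(12, dtype=float).reshape(6, 2)  # 6 timesteps, 2 features
W = sliding_window(X, window=3)               # 4 windows of 6 lagged features
```

With 6 timesteps and a window of 3, the windows cover rows `[0:3]`, `[1:4]`, `[2:5]`, and `[3:6]`.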
New extras
- `quprep[image]` — `Pillow` for `ImageIngester`
- `quprep[text]` — `sentence-transformers` (+ PyTorch, ~1–2 GB) for neural text embeddings; TF-IDF works with the base install
- `quprep[umap]` — `umap-learn` for `UMAPReducer`; previously referenced in error messages, but the extra did not exist
- `quprep[modalities]` — `Pillow` + `sentence-transformers` convenience group
API improvements
- `prepare(..., ingester=)` — new parameter accepts any modality ingester; enables one-liner use with `TimeSeriesIngester`, `ImageIngester`, `TextIngester`, and `GraphIngester`
- `Pipeline(preprocessor=[t1, t2, ...])` — the `preprocessor` slot now accepts a list of transformers; the audit log labels them `preprocessor[0]`, `preprocessor[1]`, etc.
Changed
- `recommend()` — added the 5 encoders missing since v0.6.0: `ZZFeatureMap`, `PauliFeatureMap`, `RandomFourier`, `TensorProduct`, and `QAOAProblem`, all with dataset-aware scoring; previously the recommender silently covered only 7 of the 12 encoders
- `quprep.qubo.__all__` — `solve_brute`, `solve_sa`, and `SolveResult` removed from `__all__`; the canonical import is now `from quprep.qubo.solver import solve_brute, solve_sa`; the backward-compatible import is kept (no breaking change)
- `AmplitudeEncoder` — emits a `QuPrepWarning` when input is zero-padded to the next power of two
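The padding behaviour that now triggers a warning amounts to the following (an illustrative sketch of the idea, not the `AmplitudeEncoder` source): amplitude encoding needs 2^n amplitudes, so shorter vectors are zero-padded and then L2-normalised.

```python
import warnings

import numpy as np

def pad_and_normalise(x: np.ndarray) -> np.ndarray:
    """Zero-pad to the next power of two, then L2-normalise so the
    result is a valid quantum state (squared amplitudes sum to 1)."""
    n = 1 << (len(x) - 1).bit_length()  # next power of two >= len(x)
    if n != len(x):
        warnings.warn(f"input zero-padded from {len(x)} to {n} amplitudes")
        x = np.pad(x, (0, n - len(x)))
    return x / np.linalg.norm(x)

state = pad_and_normalise(np.array([3.0, 4.0, 0.0]))  # padded to length 4
```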
Fixed
- `GraphStateEncoder` had no documented pipeline path; `GraphIngester(features="adjacency")` added as the canonical lossless route
- `prepare()` was missing the `ingester=` parameter, which made modality ingesters unusable via the one-liner API
- `quprep.qubo` symbols (`max_cut`, `to_qubo`, `qaoa_circuit`, etc.) were not accessible via the `qd.*` namespace
v0.6.0
Added
- `ZZFeatureMapEncoder` — Havlíček-style ZZ feature map (Qiskit-compatible convention)
- `PauliFeatureMapEncoder` — generalized Pauli feature map with configurable Pauli strings
- `RandomFourierEncoder` — quantum-inspired RBF kernel approximation via random Fourier features
- `TensorProductEncoder` — full Bloch sphere encoding (Ry + Rz per qubit, 2 features per qubit)
- `BraketExporter` — Amazon Braket SDK circuit output (`quprep[braket]`)
- `QSharpExporter` — Microsoft Q# program output (`quprep[qsharp]`)
- `IQMExporter` — IQM native circuit JSON output (`quprep[iqm]`)
- Plugin system — `register_encoder`/`register_exporter` decorators plus `unregister_*`, `list_*`, `get_encoder_class`, and `get_exporter_class`; all wired into `prepare()` and exported from the top-level `qd.*` namespace
- `quprep convert --framework braket|qsharp|iqm` — new framework choices in the CLI `convert` subcommand
- `QAOAProblemEncoder` — QAOA-inspired feature map; features as cost Hamiltonian parameters; `p` layers, `linear`/`full` connectivity, configurable `gamma`/`beta`
- `quprep convert --encoding zz_feature_map|pauli_feature_map|random_fourier|tensor_product|qaoa_problem` — new encoding choices in the CLI
- Binder + Colab launch badges on all 11 example notebooks and the README
- HuggingFace Spaces demo badge in README
Fixed
- `prepare()` — closure variable capture bug: when both `encoding` and `framework` are custom plugins, the second `plugin_cls` assignment overwrote the first lambda's captured variable, causing the encoder lambda to instantiate the exporter class. Fixed with default-argument binding (`lambda cls=plugin_cls: cls()`)
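This is the classic Python late-binding pitfall: a lambda closes over the *variable*, not its value at definition time. A minimal standalone reproduction (with hypothetical stand-in classes, not quprep's plugin types) shows both the bug and the default-argument fix:

```python
class EncoderPlugin:
    pass

class ExporterPlugin:
    pass

plugin_cls = EncoderPlugin
broken = lambda: plugin_cls()   # closes over the variable, not the value
plugin_cls = ExporterPlugin     # later reassignment retargets the lambda

wrong = broken()                # instantiates ExporterPlugin — the bug

# Fix: freeze the current value as a default argument at definition time.
plugin_cls = EncoderPlugin
fixed = lambda cls=plugin_cls: cls()
plugin_cls = ExporterPlugin
right = fixed()                 # still EncoderPlugin
```

Default arguments are evaluated once, when the lambda is created, which is exactly the binding behaviour the fix relies on.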
v0.5.0
Added
Encoding comparison (quprep.compare)
compare_encodings(source, *, include, exclude, task, qubits)— analytical side-by-side cost comparison of all (or selected) encoders; no circuits generatedComparisonResult—.rows(list ofCostEstimate),.best(prefer="nisq"|"depth"|"gates"|"qubits"),.to_dict(),__str__()ASCII table with starred recommendation whentask=is passedquprep compare <file> [--task] [--qubits] [--include] [--exclude]CLI subcommand- Exported:
qd.compare_encodings,qd.ComparisonResult
Smarter encoding recommendation (`quprep.core.recommender`)
- `entangled_angle` added to the recommendation engine (it was an encoder but previously invisible to `recommend()`)
- 4 new dataset profile signals: `missing_rate`, `sparsity`, `has_negatives`, `feature_collinear` (mean pairwise Pearson correlation)
- 9 new dataset-aware scoring rules: amplitude penalised for large sample counts and high missing rates; basis boosted for sparse data and penalised for negative values; IQP/entangled_angle boosted for correlated features; IQP penalised for wide data; reupload penalised for tiny datasets and boosted for large ones
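The four profile signals are straightforward to compute with NumPy. A sketch, assuming `feature_collinear` means the mean absolute off-diagonal Pearson correlation (the exact quprep definition may differ):

```python
import numpy as np

def profile_signals(X: np.ndarray) -> dict:
    """Dataset profile signals: missing rate, sparsity, sign, collinearity."""
    filled = np.nan_to_num(X)  # NaN -> 0 for sparsity / correlation
    corr = np.corrcoef(filled, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return {
        "missing_rate": float(np.isnan(X).mean()),
        "sparsity": float((filled == 0).mean()),
        "has_negatives": bool(np.nanmin(X) < 0),
        "feature_collinear": float(np.abs(off_diag).mean()),
    }

# One NaN out of 8 cells; feature 2 is exactly 2x feature 1.
X = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, 6.0], [4.0, 8.0]])
signals = profile_signals(X)
```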
Auto qubit count suggestion (`quprep.core.qubit_suggestion`)
- `suggest_qubits(source, *, task, max_qubits)` → `QubitSuggestion` — recommends a qubit budget based on dataset size and target task
- `QubitSuggestion` — `.n_qubits`, `.n_features`, `.nisq_safe`, `.encoding_hint`, `.reasoning`, `.warning` (set when reduction is needed)
- `quprep suggest <file> [--task] [--max-qubits]` CLI subcommand
- Exported: `qd.suggest_qubits`, `qd.QubitSuggestion`
Pipeline serialization (`quprep.core.pipeline`)
- `Pipeline.save(path)` — pickles the fitted pipeline; creates parent directories automatically
- `Pipeline.load(path)` — classmethod; restores a fitted pipeline ready for `transform()` without re-fitting; raises `TypeError` for non-`Pipeline` files
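The save/load contract described above (auto-created parent directories, a type check on load) is easy to sketch with the stdlib; `save_obj`/`load_obj` are illustrative names, not the quprep API:

```python
import pickle
from pathlib import Path

def save_obj(obj, path) -> None:
    """Pickle obj to path, creating missing parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(obj))

def load_obj(path, expected_type):
    """Unpickle and reject files that hold the wrong type."""
    obj = pickle.loads(Path(path).read_bytes())
    if not isinstance(obj, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, "
                        f"got {type(obj).__name__}")
    return obj
```

The type check is what turns a confusing downstream `AttributeError` into an immediate, explicit `TypeError`.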
Batch export (`quprep.export.qasm_export`, `quprep.__init__`)
- `QASMExporter.save_batch(encoded_list, directory, stem)` — saves each sample as `{stem}_{i:04d}.qasm`; creates the output directory automatically; returns a list of `Path` objects
- `qd.batch_export(source, directory, *, encoding, stem)` — top-level one-liner: runs `prepare()` then `save_batch()`
- `quprep convert <file> --save-dir <dir> [--stem <stem>]` — CLI flag added to the `convert` subcommand
Data drift detection (`quprep.core.drift`)
- `DriftDetector(mean_threshold=3.0, std_threshold=2.0, warn=True)` — detects statistical drift between training and new data
- `fit(dataset)` — records per-feature mean and std from training data (NaN-safe)
- `check(dataset)` → `DriftReport` — flags features whose mean shifts by more than the threshold in σ or whose std ratio exceeds bounds; issues a `QuPrepWarning` when drift is found
- `DriftReport` — `.overall_drift`, `.drifted_features`, `.n_features_drifted`, `.feature_stats` (per-feature train/new mean, std, σ-shift, std_ratio)
- `Pipeline(drift_detector=DriftDetector())` — the detector is fitted post-reduction and checked on every `transform()` call
- `PipelineResult.drift_report` — `DriftReport | None`; preserved through `save()`/`load()`
- Exported: `qd.DriftDetector`, `qd.DriftReport`
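The threshold logic behind this kind of detector can be sketched with NumPy (an illustrative reimplementation using the default thresholds above, not the quprep source):

```python
import numpy as np

def drifted_features(train, new, mean_threshold=3.0, std_threshold=2.0):
    """Flag features whose mean moved more than mean_threshold train-sigmas,
    or whose std ratio left [1/std_threshold, std_threshold]."""
    mu = np.nanmean(train, axis=0)
    sigma = np.nanstd(train, axis=0)
    sigma = np.where(sigma == 0, 1e-12, sigma)  # guard constant features
    sigma_shift = np.abs(np.nanmean(new, axis=0) - mu) / sigma
    std_ratio = np.nanstd(new, axis=0) / sigma
    return ((sigma_shift > mean_threshold)
            | (std_ratio > std_threshold)
            | (std_ratio < 1.0 / std_threshold))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
new = train.copy()
new[:, 1] += 10.0  # inject a ~10-sigma mean shift into feature 1
flags = drifted_features(train, new)
```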
Changed
- `Pipeline.__init__` — new `drift_detector` parameter (default `None`; backwards compatible)
- `PipelineResult.__init__` — new `drift_report` attribute (default `None`; backwards compatible)
- `Pipeline.get_params()`/`set_params()` — `drift_detector` included in the parameter dict
v0.4.0
Added
Validation & schema (`quprep.validation`)
- `QuPrepWarning` — custom warning class; all pipeline warnings use this category so they can be filtered precisely
- `validate_dataset(dataset)` — structural checks at pipeline entry: shape, dtype, NaN detection with a fractional-coverage warning
- `warn_qubit_mismatch(n_features, n_qubits, encoding)` — warns when features exceed the qubit budget
- `DataSchema`/`FeatureSpec`/`SchemaViolationError` — declare expected feature names, types, and value ranges; attach via `Pipeline(schema=...)` to enforce at entry; all violations are collected and reported together
- `DataSchema.infer(dataset)` — auto-builds a schema from a reference dataset
- `DataSchema.to_json()`/`from_json()`/`to_dict()`/`from_dict()` — full serialisation round-trip; terse output (omits `None` fields and `nullable=False`)
Cost estimation (`quprep.validation.cost`)
- `CostEstimate` — dataclass with `encoding`, `n_features`, `n_qubits`, `gate_count`, `circuit_depth`, `two_qubit_gates`, `nisq_safe`, and `warning` fields
- `estimate_cost(encoder, n_features)` — formula-accurate gate counts for all 7 encoders; NISQ-safe flag (depth < 200, CNOTs < 50)
Pipeline & PipelineResult (`quprep.core.pipeline`)
- `PipelineResult.cost` — `CostEstimate | None`; populated at fit time whenever an encoder is configured; shown in `repr()`
- `PipelineResult.audit_log` — `list[dict] | None`; one entry per preprocessing stage with `{stage, n_samples_in, n_features_in, n_samples_out, n_features_out}`
- `PipelineResult.summary()` — prints the audit log as an aligned table plus a cost breakdown
- `Pipeline.fit(source, y=None)`/`.transform(source)` — full sklearn-compatible split; `transform()` raises `RuntimeError` before `fit()`
- `Pipeline.get_params()`/`.set_params(**params)` — hyperparameter-search ready
- `Pipeline(schema=...)` — validates the dataset at entry before any stage runs
- `Pipeline.summary()`/`__str__` — human-readable snapshot: configured stages, fitted status, resolved normalizer, schema feature count, last cost estimate
Sklearn-compatible fit/transform on all stateful stages
- Every stage now has separate `fit(dataset)` and `transform(dataset)` methods; `fit_transform` delegates; `NotFittedError` is raised on `transform()` before `fit()`
- Stages: `Scaler`, `Imputer`, `OutlierHandler`, `CategoricalEncoder`, `FeatureSelector`, `PCAReducer`, `LDAReducer`, `HardwareAwareReducer`, `SpectralReducer`, `TSNEReducer`, `UMAPReducer`
- `CategoricalEncoder` aligns one-hot columns between train and test sets at transform time
- `Dataset.copy()` — deep copy for safe fit/transform stage splitting
`import quprep as qd` — top-level namespace alias
- All public classes exported directly: all 7 encoders, all cleaners (`Imputer`, `OutlierHandler`, `CategoricalEncoder`, `FeatureSelector`), `Scaler`, all reducers, `QASMExporter`, `PipelineResult`, and all validation classes
- No sub-imports needed: `qd.AngleEncoder()`, `qd.PCAReducer()`, `qd.DataSchema(...)`, etc.
`quprep validate` CLI
- `quprep validate dataset.csv` — shape, column names, per-column NaN report (count + %), value ranges
- `quprep validate dataset.csv --schema schema.json` — validates against a JSON schema (array of `{name, dtype, min_value?, max_value?, nullable?}`); exits 1 on violation
- `quprep validate dataset.csv --infer-schema output.json` — infers a schema from the CSV and writes it to a file; use `"-"` to print to stdout
Type stubs (.pyi files)
- Stubs added for: `Dataset`, `Pipeline`/`PipelineResult`, `Scaler`, `BaseEncoder`/`EncodedResult`, and the full `validation` public API
Zenodo DOI
- Placeholder badge and `doi` field added to the README and BibTeX citation
- No custom GitHub Actions workflow needed — Zenodo's native GitHub integration archives each release automatically
v0.3.0
Added
QUBO / Ising conversion (`quprep.qubo`)
- `to_qubo(cost_matrix, constraints, penalty)` — converts any square cost matrix to upper-triangular QUBO form; supports equality and inequality constraints via Lagrangian penalty
- `QUBOResult` — holds the Q matrix, offset, variable map, and `n_original`; `.to_ising()`, `.evaluate(x)`, `.to_dwave()`, `.to_dict()`/`.from_dict()` methods
- `IsingResult` — holds h, J, offset; `.to_qubo()` round-trip conversion
- `qubo_to_ising(qubo)` — QUBO → Ising transformation (s = 2x − 1); energy-consistent for all binary inputs
- `ising_to_qubo(ising)` — Ising → QUBO inverse transformation; completes the bidirectional round-trip
- `equality_penalty(A, b, penalty)` — encodes Ax = b as a QUBO penalty matrix
- `inequality_penalty(A, b, penalty)` — encodes Ax ≤ b via binary slack variables; augments Q from (n, n) to (n+K, n+K)
- `add_qubo(q1, q2, weight)` — combines two same-size QUBOs; useful for multi-objective problems
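The s = 2x − 1 substitution behind the QUBO → Ising transformation is worth spelling out. This standalone sketch (not the quprep source) derives J, h, and the constant offset from a symmetrised Q, and its energy-consistency claim can be checked by brute force over all binary inputs:

```python
import numpy as np
from itertools import product

def qubo_to_ising_sketch(Q: np.ndarray):
    """Map E(x) = x^T Q x (x in {0,1}^n) to the Ising form
    E(s) = s^T J s + h . s + offset, with s = 2x - 1 in {-1,+1}^n."""
    Q = (Q + Q.T) / 2        # symmetrise off-diagonal terms
    J = Q / 4
    np.fill_diagonal(J, 0.0) # s_i^2 = 1, so diagonals fold into the offset
    h = Q.sum(axis=1) / 2
    offset = Q.sum() / 4 + np.trace(Q) / 4
    return J, h, offset

Q = np.array([[1.0, -2.0], [0.0, 3.0]])
J, h, offset = qubo_to_ising_sketch(Q)

# Energy-consistent for every binary input:
for bits in product([0, 1], repeat=2):
    x = np.array(bits, dtype=float)
    s = 2 * x - 1
    assert np.isclose(x @ Q @ x, s @ J @ s + h @ s + offset)
```

Expanding x_i x_j = (1 + s_i + s_j + s_i s_j)/4 and using s_i² = 1 gives exactly these coefficients; the scalar identity x^T Q x = x^T ((Q + Qᵀ)/2) x is why symmetrising first is harmless.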
Problem library (`quprep.qubo.problems`) — 7 NP-hard combinatorial problems
- `max_cut(adjacency)` — Max-Cut graph partitioning
- `knapsack(weights, values, capacity, penalty)` — 0/1 Knapsack
- `tsp(distance_matrix, penalty)` — Travelling Salesman Problem (n² binary variables)
- `portfolio(returns, covariance, budget, risk_penalty, budget_penalty)` — Markowitz portfolio optimization
- `graph_color(adjacency, n_colors, penalty)` — Graph Colouring (n×K binary variables)
- `scheduling(processing_times, n_machines, penalty)` — Job scheduling / load balancing
- `number_partition(values, penalty)` — Number Partitioning
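As an example of how such a formulation works, here is a from-scratch Max-Cut QUBO for a weighted adjacency matrix (an illustrative convention using a symmetric Q; quprep's `max_cut` may emit upper-triangular form instead). Minimising x^T Q x maximises the cut weight:

```python
import numpy as np
from itertools import product

def max_cut_qubo_sketch(adj: np.ndarray) -> np.ndarray:
    """Each edge (i, j) contributes -w_ij * (x_i + x_j - 2 x_i x_j),
    which equals -w_ij exactly when the edge is cut."""
    Q = adj.astype(float).copy()           # off-diagonals: +2 w_ij x_i x_j
    np.fill_diagonal(Q, -adj.sum(axis=1))  # diagonal: -w_ij x_i - w_ij x_j
    return Q

adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # unweighted triangle
Q = max_cut_qubo_sketch(adj)
energies = {bits: float(np.array(bits) @ Q @ np.array(bits))
            for bits in product([0, 1], repeat=3)}
best_cut = -min(energies.values())  # any 1-vs-2 split of a triangle cuts 2 edges
```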
Solvers (`quprep.qubo.solver`)
- `solve_brute(qubo, max_n=20)` — exact exhaustive solver; evaluates all 2^n states; practical up to n = 20
- `solve_sa(qubo, n_steps, T_start, T_end, seed, restarts)` — simulated-annealing heuristic; O(n) incremental energy update with geometric cooling; scales to n ≈ 500+
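A minimal exhaustive solver in the spirit of `solve_brute` (an independent sketch, not the quprep implementation) makes the 2^n cost explicit:

```python
import numpy as np
from itertools import product

def brute_force_qubo(Q: np.ndarray, max_n: int = 20):
    """Minimise x^T Q x over all 2^n binary vectors; exact but exponential."""
    n = Q.shape[0]
    if n > max_n:
        raise ValueError(f"2^{n} states is impractical; limit is n={max_n}")
    best_x, best_e = None, float("inf")
    for bits in product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        e = float(x @ Q @ x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

x_opt, e_opt = brute_force_qubo(np.array([[-1.0, 2.0], [0.0, -1.0]]))
```

The hard `max_n` cap is the same safety valve described above: beyond roughly 20 variables, exhaustive search gives way to simulated annealing.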
QAOA circuit generator (`quprep.qubo.qaoa`)
- `qaoa_circuit(qubo, p, gamma, beta)` — generates a p-layer QAOA ansatz as OpenQASM 3.0; converts QUBO → Ising internally; compatible with Qiskit, Cirq, and any QASM backend
Visualization (`quprep.qubo.visualize`) — requires `pip install quprep[viz]`
- `draw_qubo(qubo, title, cmap, ax)` — heatmap of the Q matrix with a symmetric colour scale; annotates cells for n ≤ 10
- `draw_ising(ising, title, ax)` — circular graph layout; node colour = h_i bias; edge colour/width = J_ij coupling strength
CLI (`quprep qubo`)
- `quprep qubo maxcut --adjacency ... [--solve]`
- `quprep qubo knapsack --weights ... --values ... --capacity ... [--solve]`
- `quprep qubo tsp --distances ... [--solve]`
- `quprep qubo schedule --times ... --machines ... [--solve]`
- `quprep qubo partition --values ... [--solve]`
- `quprep qubo portfolio --returns ... --covariance ... --budget ... [--solve]`
- `quprep qubo graphcolor --adjacency ... --colors ... [--solve]`
- `quprep qubo qaoa <problem> ... [--p N] [--gamma ...] [--beta ...] [--output file]`
- `quprep qubo export <problem> ... [--format json|npy] [--output file]`
- `--solve` auto-switches from exact to simulated annealing for n > 20
v0.2.0
QuPrep v0.2.0
Install
```shell
pip install quprep             # core
pip install quprep[qiskit]     # Qiskit export
pip install quprep[pennylane]  # PennyLane export
pip install quprep[cirq]       # Cirq export
pip install quprep[tket]       # TKET export
pip install quprep[viz]        # matplotlib circuit diagrams
pip install quprep[all]        # everything
```
Added
Reduce
- `PCAReducer` — wraps sklearn PCA; supports integer or variance-fraction `n_components`; `explained_variance_ratio_` property after fit
- `LDAReducer` — wraps sklearn LDA; maximises class separability; labels passed at init or fit time
- `SpectralReducer` — row-wise FFT, keeps the first n frequency magnitudes; outputs always ≥ 0
- `TSNEReducer` — wraps sklearn TSNE with `random_state=42` for reproducibility
- `UMAPReducer` — wraps umap-learn (optional: `pip install umap-learn`); raises `ImportError` with an install hint if absent
- `HardwareAwareReducer` — auto-reduces to a backend's qubit budget via PCA; accepts a backend name (e.g. `'ibm_brisbane'`) or an integer qubit count
Encode
- `EntangledAngleEncoder` — rotation layer + CNOT entangling layer, repeated `layers` times; supports `linear`, `circular`, and `full` entanglement topologies
- `IQPEncoder` — Havlíček et al. 2019 feature map with pairwise ZZ interactions; `reps` parameter
- `ReUploadEncoder` — Pérez-Salinas et al. 2020 data re-uploading; `layers` and `rotation` parameters
- `HamiltonianEncoder` — Trotterized single-qubit Z Hamiltonian evolution; `evolution_time` and `trotter_steps` parameters
Export
- `PennyLaneExporter` — returns a callable `qml.QNode`; supports all encodings; `interface` and `device` parameters (`pip install quprep[pennylane]`)
- `CirqExporter` — returns a `cirq.Circuit`; supports angle, basis, IQP, re-upload, and Hamiltonian encodings (`pip install quprep[cirq]`)
- `TKETExporter` — returns a `pytket.Circuit`; angles auto-converted to pytket half-turns (`pip install quprep[tket]`)
- `draw_ascii(encoded)` — no-dependency ASCII circuit diagram for any `EncodedResult`; returns a printable string
- `draw_matplotlib(encoded, filename=None)` — matplotlib circuit diagram; returns a `Figure` or saves to PNG/PDF/SVG (`pip install quprep[viz]`)
Recommend
- `recommend(source, task, qubits)` — scores all encodings against the dataset profile and task; returns an `EncodingRecommendation` with ranked alternatives
- `EncodingRecommendation.apply()` — directly applies the recommendation to the data and returns a `PipelineResult`
CLI
- `quprep recommend <file> [--task classification|regression|qaoa|kernel|simulation] [--qubits N]`
- `quprep convert` now supports `--framework pennylane|cirq|tket`
Changed
- `QASMExporter` now supports entangled angle, IQP, re-upload, and Hamiltonian encodings
- `Pipeline` auto-normalizes IQP/re-upload → `minmax_pm_pi` and Hamiltonian → `zscore`
- `prepare()` accepts `encoding='iqp'`, `'reupload'`, and `'hamiltonian'` with matching kwargs
Fixed
- `HamiltonianEncoder` via `prepare()` and `Pipeline` was broken — `_encoding_key()` returned `"zscore"` instead of `"hamiltonian"`, causing `auto_normalizer()` to raise `ValueError`
Documentation — quprep.readthedocs.io
QuPrep v0.1.0 — Initial Release
QuPrep v0.1.0
QuPrep is a focused Python library for preparing classical datasets for quantum computing.
It covers the full pipeline from raw data to circuit-ready output: ingest → clean → normalize → encode → export.
Install
```shell
pip install quprep           # core, no framework deps
pip install quprep[qiskit]   # with Qiskit export
```
What's included
Three encoders
| Encoder | Qubits | Depth | Use case |
|---|---|---|---|
| `AngleEncoder` (Ry/Rx/Rz) | n = d | O(1) | Most QML tasks |
| `AmplitudeEncoder` | ⌈log₂ d⌉ | O(2ⁿ) | Qubit-limited |
| `BasisEncoder` | n = d | O(1) | Binary / QAOA |
Two exporters
- `QASMExporter` — OpenQASM 3.0 strings, no optional dependencies
- `QiskitExporter` — Qiskit `QuantumCircuit` objects (angle, basis; amplitude via `StatePreparation`)
Full cleaning stage
- `Imputer` — mean, median, mode, KNN, MICE, drop
- `OutlierHandler` — IQR, Z-score, Isolation Forest (clip or remove)
- `CategoricalEncoder` — one-hot, label, ordinal
- `FeatureSelector` — correlation, mutual information, variance
Automatic normalization
- Pipeline selects the mathematically correct scaler per encoder automatically (e.g. minmax → [0, π] for Ry, L2 norm for amplitude)
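For instance, the minmax → [0, π] step for Ry angles amounts to the following (an illustrative sketch of the idea, not quprep's `Scaler` implementation):

```python
import numpy as np

def minmax_to_angles(X: np.ndarray, high: float = np.pi) -> np.ndarray:
    """Scale each feature column into [0, high] so every value is a
    valid Ry rotation angle."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant columns
    return (X - lo) / span * high

X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
angles = minmax_to_angles(X)  # each column mapped onto [0, pi]
```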
One-liner API
```python
import quprep
result = quprep.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)
```
CLI
```shell
quprep convert data.csv --encoding angle --framework qasm
```
Examples — four worked examples as `.py` scripts and `.ipynb` notebooks in `examples/`.
Documentation — quprep.readthedocs.io
What's next (v0.2.0)
- IQP, Data Re-upload, and Hamiltonian encoders
- PennyLane, Cirq, and TKET exporters
- PCA, LDA, and Spectral dimensionality reducers
- Encoding recommendation engine (`quprep recommend`)