Releases: quprep/quprep

v0.8.0

14 Apr 13:01
fa277da

Added

Data connectors

  • HuggingFaceIngester — load any HuggingFace dataset with automatic modality detection (modality="auto" sniffs Image/text/tabular/graph feature schemas); supports tabular, image, text, and graph modalities; pip install quprep[huggingface]
  • OpenMLIngester — load OpenML datasets by integer ID (load(61)) or name (load("iris")); version pinning; no account required for public datasets; pip install quprep[openml]
  • KaggleIngester — download Kaggle datasets (load("owner/name")) and competition data (load_competition("slug")); uses ~/.kaggle/kaggle.json or env vars; pip install quprep[kaggle]

CLI tools

  • quprep inspect <file> — profile a dataset without encoding: shape, feature types, missing values, sparsity, per-feature statistics, and an optional encoding recommendation
  • quprep benchmark <file> — run all (or selected) encoders on a dataset and report gate count, circuit depth, two-qubit gates, encode time, and NISQ safety in a side-by-side table; --include, --exclude, --task, --samples, --output flags

Reproducibility

  • fingerprint_pipeline(pipeline) → FingerprintResult — deterministic SHA-256 hash of the full pipeline configuration (stage classes + parameters + dependency versions); stable across runs for the same config; serialisable to dict/JSON for paper methods sections and experiment logs
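
As a rough illustration of how a deterministic configuration hash can be built, the sketch below serialises a pipeline description to canonical JSON and hashes it with SHA-256. The function name fingerprint_config and the (class name, params) input layout are assumptions made for this example; they are not QuPrep's actual implementation or data model.

```python
import hashlib
import json


def fingerprint_config(stages, versions):
    """Return a SHA-256 hex digest of a pipeline description.

    `stages` is a list of (class_name, params_dict) pairs and `versions`
    maps package names to version strings. Both layouts are hypothetical,
    chosen only for this sketch.
    """
    payload = {
        "stages": [{"class": name, "params": params} for name, params in stages],
        "versions": versions,
    }
    # sort_keys plus fixed separators give byte-for-byte stable JSON,
    # so dict insertion order cannot change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


same_a = fingerprint_config(
    [("Scaler", {"method": "minmax", "clip": True})], {"numpy": "1.26.4"}
)
same_b = fingerprint_config(
    [("Scaler", {"clip": True, "method": "minmax"})],  # same params, other key order
    {"numpy": "1.26.4"},
)
different = fingerprint_config(
    [("Scaler", {"method": "zscore", "clip": True})], {"numpy": "1.26.4"}
)
print(same_a == same_b, same_a == different)
```

Including dependency versions in the payload means an upgraded library yields a new fingerprint even when the stage parameters are unchanged.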

New extras

  • quprep[huggingface]: datasets>=2.0
  • quprep[openml]: openml>=0.14
  • quprep[kaggle]: kaggle>=1.6

v0.7.0

08 Apr 20:11
1e91955

Added

Data modalities

  • TimeSeriesIngester — timestamped CSV ingestion; stores time column in dataset.metadata["time_index"]; excludes time from feature matrix
  • WindowTransformer — sliding window over (n_timesteps, n_features) → (n_windows, window × n_features); lag-named features; use via Pipeline(preprocessor=WindowTransformer(...))
  • ImageIngester — loads images from flat or subdirectory-labeled directory structures; grayscale/RGB, configurable resize and normalization; pip install quprep[image]
  • TextIngester — TF-IDF (no extra deps) or sentence-transformer dense embeddings (pip install quprep[text]); accepts CSV, .txt file lists, or Python string lists; target_column support
  • GraphIngester — two ingestion paths: lossy (Laplacian eigenvalues + degree features, configurable n_features, padding) and lossless (features="adjacency" — flattened upper triangle of adjacency matrix); networkx.Graph support
  • GraphStateEncoder — lossless graph state circuit: H⊗n followed by CZ gates per adjacency edge; pairs with GraphIngester(features="adjacency") for a clean pipeline path; QASM export
  • Sparse data support — NumpyIngester and Pipeline automatically convert scipy.sparse matrices to dense
  • Multi-label / multi-output — NumpyIngester(y=labels), CSVIngester(target_columns=[...]), Dataset.labels propagated through all pipeline stages
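
The sliding-window reshaping described for WindowTransformer can be sketched in plain Python. The stride of 1, the oldest-first flattening order, and the `name_lagK` column naming are assumptions for illustration; the release notes only state the input and output shapes and that features are lag-named.

```python
def sliding_windows(rows, window, names=None):
    """Turn (n_timesteps, n_features) rows into (n_windows, window * n_features).

    Assumes stride 1 and flattens each window from oldest to newest step.
    """
    n_timesteps = len(rows)
    if not 1 <= window <= n_timesteps:
        raise ValueError("window must be between 1 and n_timesteps")
    n_features = len(rows[0])
    names = names or [f"x{j}" for j in range(n_features)]

    windows = []
    for start in range(n_timesteps - window + 1):
        flat = []
        for step in rows[start:start + window]:
            flat.extend(step)
        windows.append(flat)

    # The oldest step in a window gets the largest lag (hypothetical naming).
    columns = [f"{name}_lag{window - 1 - k}" for k in range(window) for name in names]
    return windows, columns


series = [[1, 10], [2, 20], [3, 30], [4, 40]]
windows, columns = sliding_windows(series, window=2)
print(len(windows), len(windows[0]), columns)
```

With 4 timesteps and a window of 2 this yields 3 windows of 4 values each, matching n_windows = n_timesteps − window + 1 under the stride-1 assumption.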

New extras

  • quprep[image]: Pillow for ImageIngester
  • quprep[text]: sentence-transformers (+ PyTorch ~1–2 GB) for neural text embeddings; TF-IDF works with base install
  • quprep[umap]: umap-learn for UMAPReducer; previously referenced in error messages but the extra did not exist
  • quprep[modalities]: Pillow + sentence-transformers convenience group

API improvements

  • prepare(..., ingester=) — new parameter accepts any modality ingester; enables one-liner use with TimeSeriesIngester, ImageIngester, TextIngester, GraphIngester
  • Pipeline(preprocessor=[t1, t2, ...]): the preprocessor slot now accepts a list of transformers; audit log labels them preprocessor[0], preprocessor[1], etc.

Changed

  • recommend() — added 5 missing v0.6.0 encoders: ZZFeatureMap, PauliFeatureMap, RandomFourier, TensorProduct, QAOAProblem; all with dataset-aware scoring; previously silently covered only 7 of 12 encoders
  • quprep.qubo.__all__: solve_brute, solve_sa, SolveResult removed from __all__; canonical import is now from quprep.qubo.solver import solve_brute, solve_sa; backward-compat import kept (no breaking change)
  • AmplitudeEncoder — emits QuPrepWarning when input is zero-padded to the next power of two
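
To show why the zero-padding warning matters, here is a minimal sketch of padding a feature vector to the next power of two, which is the length an amplitude-encoded state over n qubits requires. The helper name pad_to_power_of_two is hypothetical, and the standard UserWarning stands in for QuPrepWarning.

```python
import warnings


def pad_to_power_of_two(features):
    """Zero-pad `features` to length 2**n_qubits and return (padded, n_qubits)."""
    d = len(features)
    if d == 0:
        raise ValueError("need at least one feature")
    # (d - 1).bit_length() is the smallest n with 2**n >= d, without float log2.
    n_qubits = (d - 1).bit_length()
    target = 2 ** n_qubits
    if target != d:
        # Stand-in for the library's QuPrepWarning (assumption for this sketch).
        warnings.warn(f"input of length {d} zero-padded to {target}", UserWarning)
    return list(features) + [0.0] * (target - d), n_qubits


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    padded, n_qubits = pad_to_power_of_two([1.0, 2.0, 3.0])
print(padded, n_qubits, len(caught))
```

The padded entries become zero-amplitude basis states, so downstream measurements can land on indices that never corresponded to a real feature.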

Fixed

  • GraphStateEncoder had no documented pipeline path; GraphIngester(features="adjacency") added as the canonical lossless route
  • prepare() missing ingester= parameter made modality ingesters unusable via the one-liner API
  • quprep.qubo symbols (max_cut, to_qubo, qaoa_circuit, etc.) were not accessible via qd.* namespace
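
The lossless graph route noted above (flattened upper triangle of the adjacency matrix, then H on every qubit and a CZ per edge) can be sketched as follows. The helper names are hypothetical and the emitted text is a plain OpenQASM 3.0 string built by hand; QuPrep's own GraphStateEncoder output may differ in layout.

```python
def upper_triangle(adjacency):
    """Flatten the strict upper triangle of a square adjacency matrix, row by row."""
    n = len(adjacency)
    return [adjacency[i][j] for i in range(n) for j in range(i + 1, n)]


def graph_state_qasm(n_nodes, flat_edges):
    """Build a graph state circuit: H on each qubit, then CZ for every edge."""
    lines = ["OPENQASM 3.0;", 'include "stdgates.inc";', f"qubit[{n_nodes}] q;"]
    lines += [f"h q[{i}];" for i in range(n_nodes)]
    # Recreate the (i, j) pairs in the same order used by upper_triangle.
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    for (i, j), present in zip(pairs, flat_edges):
        if present:
            lines.append(f"cz q[{i}], q[{j}];")
    return "\n".join(lines)


# Path graph 0 - 1 - 2
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
flat = upper_triangle(path)
qasm = graph_state_qasm(3, flat)
print(qasm)
```

Because the flattened triangle keeps every edge, the circuit can be rebuilt exactly from the feature vector, which is what makes this path lossless.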

v0.6.0

02 Apr 18:48
50fb6f6

Added

  • ZZFeatureMapEncoder — Havlíček-style ZZ feature map (Qiskit-compatible convention)
  • PauliFeatureMapEncoder — generalized Pauli feature map with configurable Pauli strings
  • RandomFourierEncoder — quantum-inspired RBF kernel approximation via random Fourier features
  • TensorProductEncoder — full Bloch sphere encoding (Ry + Rz per qubit, 2 features per qubit)
  • BraketExporter — Amazon Braket SDK circuit output (quprep[braket])
  • QSharpExporter — Microsoft Q# program output (quprep[qsharp])
  • IQMExporter — IQM native circuit JSON output (quprep[iqm])
  • Plugin system — register_encoder / register_exporter decorators + unregister_*, list_*, get_encoder_class, get_exporter_class; all wired into prepare() and exported from top-level qd.*
  • quprep convert --framework braket|qsharp|iqm — new framework choices in CLI convert subcommand
  • QAOAProblemEncoder — QAOA-inspired feature map; features as cost Hamiltonian parameters; p layers, linear/full connectivity, configurable gamma/beta
  • quprep convert --encoding zz_feature_map|pauli_feature_map|random_fourier|tensor_product|qaoa_problem — new encoding choices in CLI
  • Binder + Colab launch badges on all 11 example notebooks and README
  • HuggingFace Spaces demo badge in README

Fixed

  • prepare() — closure variable capture bug: when both encoding and framework are custom plugins, the second plugin_cls assignment overwrote the first lambda's captured variable, causing the encoder lambda to instantiate the exporter class. Fixed with default-argument binding (lambda cls=plugin_cls: cls())
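
The bug class described here is Python's late-binding closure behaviour. The sketch below reproduces it with two placeholder classes (FakeEncoder and FakeExporter are invented for this example) and shows the default-argument fix; it mirrors the pattern, not the actual prepare() source.

```python
class FakeEncoder:
    """Placeholder standing in for a custom encoder plugin."""


class FakeExporter:
    """Placeholder standing in for a custom exporter plugin."""


def build_factories_buggy():
    factories = {}
    for role, plugin_cls in (("encoder", FakeEncoder), ("exporter", FakeExporter)):
        # The lambda looks up plugin_cls when it is CALLED, not when it is
        # created, so both factories see the last value assigned.
        factories[role] = lambda: plugin_cls()
    return factories


def build_factories_fixed():
    factories = {}
    for role, plugin_cls in (("encoder", FakeEncoder), ("exporter", FakeExporter)):
        # A default argument is evaluated at definition time, freezing the
        # current class into each lambda.
        factories[role] = lambda cls=plugin_cls: cls()
    return factories


buggy = build_factories_buggy()
fixed = build_factories_fixed()
print(type(buggy["encoder"]()).__name__)  # FakeExporter: the bug
print(type(fixed["encoder"]()).__name__)  # FakeEncoder: the fix
```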

v0.5.0

01 Apr 00:36
7235b08

Added

Encoding comparison (quprep.compare)

  • compare_encodings(source, *, include, exclude, task, qubits) — analytical side-by-side cost comparison of all (or selected) encoders; no circuits generated
  • ComparisonResult.rows (list of CostEstimate), .best(prefer="nisq"|"depth"|"gates"|"qubits"), .to_dict(), __str__() ASCII table with starred recommendation when task= is passed
  • quprep compare <file> [--task] [--qubits] [--include] [--exclude] CLI subcommand
  • Exported: qd.compare_encodings, qd.ComparisonResult

Smarter encoding recommendation (quprep.core.recommender)

  • entangled_angle added to recommendation engine (was an encoder but previously invisible to recommend())
  • 4 new dataset profile signals: missing_rate, sparsity, has_negatives, feature_collinear (mean pairwise Pearson correlation)
  • 9 new dataset-aware scoring rules: amplitude penalised for large sample counts and high missing rate; basis boosted for sparse data, penalised for negative values; IQP/entangled_angle boosted for correlated features; IQP penalised for wide data; reupload penalised for tiny datasets, boosted for large ones

Auto qubit count suggestion (quprep.core.qubit_suggestion)

  • suggest_qubits(source, *, task, max_qubits) → QubitSuggestion — recommends a qubit budget based on dataset size and target task
  • QubitSuggestion.n_qubits, .n_features, .nisq_safe, .encoding_hint, .reasoning, .warning (set when reduction is needed)
  • quprep suggest <file> [--task] [--max-qubits] CLI subcommand
  • Exported: qd.suggest_qubits, qd.QubitSuggestion

Pipeline serialization (quprep.core.pipeline)

  • Pipeline.save(path) — pickles the fitted pipeline; creates parent directories automatically
  • Pipeline.load(path) — classmethod; restores a fitted pipeline ready for transform() without re-fitting; raises TypeError for non-Pipeline files

Batch export (quprep.export.qasm_export, quprep.__init__)

  • QASMExporter.save_batch(encoded_list, directory, stem) — saves each sample as {stem}_{i:04d}.qasm; creates output directory automatically; returns list of Path objects
  • qd.batch_export(source, directory, *, encoding, stem) — top-level one-liner: runs prepare() then save_batch()
  • quprep convert <file> --save-dir <dir> [--stem <stem>] — CLI flag added to convert subcommand

Data drift detection (quprep.core.drift)

  • DriftDetector(mean_threshold=3.0, std_threshold=2.0, warn=True) — detects statistical drift between training and new data
  • fit(dataset) — records per-feature mean and std from training data (NaN-safe)
  • check(dataset) → DriftReport — flags features where mean shifts > threshold σ or std ratio exceeds bounds; issues QuPrepWarning when drift found
  • DriftReport.overall_drift, .drifted_features, .n_features_drifted, .feature_stats (per-feature train/new mean, std, σ-shift, std_ratio)
  • Pipeline(drift_detector=DriftDetector()) — detector fitted post-reduction, checked on every transform() call
  • PipelineResult.drift_report: DriftReport | None; preserved through save()/load()
  • Exported: qd.DriftDetector, qd.DriftReport
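
A minimal sketch of the drift check described above, using only the standard library. It assumes that "std ratio exceeds bounds" means the new/train ratio falls outside [1/std_threshold, std_threshold] and that population standard deviation is used; both are guesses for illustration, and the function names are hypothetical.

```python
import math
import statistics


def feature_stats(rows):
    """Per-feature (mean, std), skipping NaN entries."""
    stats = []
    for column in zip(*rows):
        values = [v for v in column if not math.isnan(v)]
        stats.append((statistics.fmean(values), statistics.pstdev(values)))
    return stats


def check_drift(train_rows, new_rows, mean_threshold=3.0, std_threshold=2.0):
    """Return indices of features whose mean or spread moved too far."""
    drifted = []
    pairs = zip(feature_stats(train_rows), feature_stats(new_rows))
    for idx, ((m0, s0), (m1, s1)) in enumerate(pairs):
        if s0 == 0.0:
            # Constant training feature: any change counts as infinite drift.
            sigma_shift = 0.0 if m1 == m0 else math.inf
            std_ratio = 1.0 if s1 == 0.0 else math.inf
        else:
            sigma_shift = abs(m1 - m0) / s0
            std_ratio = s1 / s0
        ratio_out = std_ratio > std_threshold or std_ratio < 1.0 / std_threshold
        if sigma_shift > mean_threshold or ratio_out:
            drifted.append(idx)
    return drifted


train = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0], [3.0, 8.0], [4.0, float("nan")]]
new = [[10.0, 5.0], [11.0, 6.0], [12.0, 7.0], [13.0, 8.0], [14.0, 6.5]]
drifted = check_drift(train, new)
print(drifted)
```

Feature 0 shifts by about 7 training standard deviations and is flagged; feature 1 keeps a similar mean and spread and is not.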

Changed

  • Pipeline.__init__ — new drift_detector parameter (default None; backwards compatible)
  • PipelineResult.__init__ — new drift_report attribute (default None; backwards compatible)
  • Pipeline.get_params() / set_params(): drift_detector included in parameter dict

v0.4.0

28 Mar 13:19

Added

Validation & schema (quprep.validation)

  • QuPrepWarning — custom warning class; all pipeline warnings use this category so they can be filtered precisely
  • validate_dataset(dataset) — structural checks at pipeline entry: shape, dtype, NaN detection with fractional coverage warning
  • warn_qubit_mismatch(n_features, n_qubits, encoding) — warns when features exceed qubit budget
  • DataSchema / FeatureSpec / SchemaViolationError — declare expected feature names, types, and value ranges; attach via Pipeline(schema=...) to enforce at entry; all violations collected and reported together
  • DataSchema.infer(dataset) — auto-builds schema from a reference dataset
  • DataSchema.to_json() / from_json() / to_dict() / from_dict() — full serialisation round-trip; terse output (omits None fields and nullable=False)

Cost estimation (quprep.validation.cost)

  • CostEstimate — dataclass: encoding, n_features, n_qubits, gate_count, circuit_depth, two_qubit_gates, nisq_safe, warning
  • estimate_cost(encoder, n_features) — formula-accurate gate counts for all 7 encoders; NISQ-safe flag (depth < 200, CNOTs < 50)

Pipeline & PipelineResult (quprep.core.pipeline)

  • PipelineResult.cost: CostEstimate | None; populated at fit time whenever an encoder is configured; shown in repr()
  • PipelineResult.audit_log: list[dict] | None; one entry per preprocessing stage with {stage, n_samples_in, n_features_in, n_samples_out, n_features_out}
  • PipelineResult.summary() — prints audit log as an aligned table and cost breakdown
  • Pipeline.fit(source, y=None) / .transform(source) — full sklearn-compatible split; transform() raises RuntimeError before fit()
  • Pipeline.get_params() / .set_params(**params) — hyperparameter search ready
  • Pipeline(schema=...) — validates dataset at entry before any stage runs
  • Pipeline.summary() / __str__ — human-readable snapshot: configured stages, fitted status, resolved normalizer, schema feature count, last cost estimate

Sklearn-compatible fit/transform on all stateful stages

  • Every stage now has separate fit(dataset) and transform(dataset) methods; fit_transform delegates; NotFittedError raised on transform() before fit()
  • Stages: Scaler, Imputer, OutlierHandler, CategoricalEncoder, FeatureSelector, PCAReducer, LDAReducer, HardwareAwareReducer, SpectralReducer, TSNEReducer, UMAPReducer
  • CategoricalEncoder aligns one-hot columns between train and test sets at transform time
  • Dataset.copy() — deep copy for safe fit/transform stage splitting

import quprep as qd — top-level namespace alias

  • All public classes exported directly: all 7 encoders, all cleaners (Imputer, OutlierHandler, CategoricalEncoder, FeatureSelector), Scaler, all reducers, QASMExporter, PipelineResult, all validation classes
  • No sub-imports needed: qd.AngleEncoder(), qd.PCAReducer(), qd.DataSchema(...), etc.

quprep validate CLI

  • quprep validate dataset.csv — shape, column names, NaN report per column (count + %), value ranges
  • quprep validate dataset.csv --schema schema.json — validates against a JSON schema (array of {name, dtype, min_value?, max_value?, nullable?}); exits 1 on violation
  • quprep validate dataset.csv --infer-schema output.json — infers schema from the CSV and writes it to a file; use "-" to print to stdout

Type stubs (.pyi files)

  • Stubs added for: Dataset, Pipeline / PipelineResult, Scaler, BaseEncoder / EncodedResult, and the full validation public API

Zenodo DOI

  • Placeholder badge and doi field added to README and BibTeX citation
  • No custom GitHub Actions workflow needed — Zenodo's native GitHub integration archives each Release automatically

v0.3.0

21 Mar 16:51

Added

QUBO / Ising conversion (quprep.qubo)

  • to_qubo(cost_matrix, constraints, penalty) — converts any square cost matrix to upper-triangular QUBO form; supports equality and inequality constraints via Lagrangian penalty
  • QUBOResult — holds Q matrix, offset, variable map, n_original; .to_ising(), .evaluate(x), .to_dwave(), .to_dict() / .from_dict() methods
  • IsingResult — holds h, J, offset; .to_qubo() round-trip conversion
  • qubo_to_ising(qubo) — QUBO → Ising transformation (s = 2x − 1); energy-consistent for all binary inputs
  • ising_to_qubo(ising) — Ising → QUBO inverse transformation; completes the bidirectional round-trip
  • equality_penalty(A, b, penalty) — encodes Ax = b as a QUBO penalty matrix
  • inequality_penalty(A, b, penalty) — encodes Ax ≤ b via binary slack variables; augments Q from (n,n) to (n+K,n+K)
  • add_qubo(q1, q2, weight) — combines two same-size QUBOs; useful for multi-objective problems
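
The QUBO → Ising substitution s = 2x − 1 can be worked through directly. The sketch below assumes an upper-triangular Q with energy E(x) = Σ_{i≤j} Q_ij x_i x_j and uses its own helper names (to_ising, ising_energy); the return layout is an assumption and is not the library's IsingResult.

```python
from itertools import product


def qubo_energy(Q, x):
    """E(x) = sum over i <= j of Q[i][j] * x[i] * x[j], with x in {0, 1}."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def to_ising(Q):
    """Substitute x = (1 + s) / 2 and collect terms into h, J and an offset."""
    n = len(Q)
    h = [0.0] * n
    J = {}
    offset = 0.0
    for i in range(n):
        # Diagonal: Q_ii * x_i = Q_ii / 2 + (Q_ii / 2) * s_i
        h[i] += Q[i][i] / 2
        offset += Q[i][i] / 2
        for j in range(i + 1, n):
            if Q[i][j] != 0:
                # Off-diagonal: Q_ij * x_i * x_j = (Q_ij / 4) * (1 + s_i + s_j + s_i s_j)
                quarter = Q[i][j] / 4
                J[(i, j)] = quarter
                h[i] += quarter
                h[j] += quarter
                offset += quarter
    return h, J, offset


def ising_energy(h, J, offset, s):
    linear = sum(h[i] * s[i] for i in range(len(h)))
    pairwise = sum(c * s[i] * s[j] for (i, j), c in J.items())
    return offset + linear + pairwise


Q = [[1, -2, 0], [0, 3, 4], [0, 0, -5]]
h, J, offset = to_ising(Q)
max_gap = max(
    abs(qubo_energy(Q, x) - ising_energy(h, J, offset, [2 * b - 1 for b in x]))
    for x in product((0, 1), repeat=3)
)
print(max_gap)
```

The final loop checks energy consistency over every binary input, which is the property the release notes claim for the conversion.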

Problem library (quprep.qubo.problems) — 7 NP-hard combinatorial problems

  • max_cut(adjacency) — Max-Cut graph partitioning
  • knapsack(weights, values, capacity, penalty) — 0/1 Knapsack
  • tsp(distance_matrix, penalty) — Travelling Salesman Problem (n² binary variables)
  • portfolio(returns, covariance, budget, risk_penalty, budget_penalty) — Markowitz portfolio optimization
  • graph_color(adjacency, n_colors, penalty) — Graph Colouring (n×K binary variables)
  • scheduling(processing_times, n_machines, penalty) — Job scheduling / load balancing
  • number_partition(values, penalty) — Number Partitioning
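
As one concrete instance from this list, Max-Cut has a well-known QUBO form: Q_ii = −(weighted degree of i) and Q_ij = 2·w_ij for i < j, so that the QUBO energy equals minus the cut weight. The sketch below uses that textbook formulation with hypothetical helper names; whether max_cut() uses the same sign and scaling conventions is not stated in these notes.

```python
from itertools import product


def max_cut_qubo(adjacency):
    """Upper-triangular Q whose energy is the negative cut weight."""
    n = len(adjacency)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = adjacency[i][j]
            Q[i][j] = 2.0 * w   # penalises putting both endpoints on side 1
            Q[i][i] -= w        # rewards each endpoint placed on side 1
            Q[j][j] -= w
    return Q


def qubo_energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def cut_value(adjacency, x):
    n = len(adjacency)
    return sum(
        adjacency[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j]
    )


# 4-cycle: 0 - 1 - 2 - 3 - 0
square = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
Q = max_cut_qubo(square)
best = min(product((0, 1), repeat=4), key=lambda x: qubo_energy(Q, x))
print(best, cut_value(square, best))
```

Minimising the QUBO therefore maximises the cut, which is why the same matrix can be handed to any minimising solver.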

Solvers (quprep.qubo.solver)

  • solve_brute(qubo, max_n=20) — exact exhaustive solver; evaluates all 2^n states; practical up to n=20
  • solve_sa(qubo, n_steps, T_start, T_end, seed, restarts) — simulated annealing heuristic; O(n) incremental energy update with geometric cooling; scales to n ~ 500+
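
The two solver strategies can be sketched side by side. The code below is an independent illustration: brute_force enumerates all 2^n states, and anneal uses an O(n) single-flip energy delta with geometric cooling. Function names, default values, and the omission of restarts are choices made for this example and do not reflect the library's defaults.

```python
import math
import random
from itertools import product


def energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force(Q, max_n=20):
    """Exact minimum by enumerating every binary assignment."""
    n = len(Q)
    if n > max_n:
        raise ValueError(f"{n} variables exceeds max_n={max_n}")
    return min((energy(Q, x), list(x)) for x in product((0, 1), repeat=n))


def flip_delta(Q, x, k):
    """Energy change from flipping bit k, in O(n) instead of O(n^2)."""
    local = Q[k][k]
    for j in range(len(Q)):
        if j < k:
            local += Q[j][k] * x[j]
        elif j > k:
            local += Q[k][j] * x[j]
    return (1 - 2 * x[k]) * local


def anneal(Q, n_steps=2000, T_start=2.0, T_end=0.01, seed=0):
    """Simulated annealing with geometric cooling from T_start to T_end."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    current = energy(Q, x)
    best_energy, best_x = current, x[:]
    cooling = (T_end / T_start) ** (1.0 / max(1, n_steps - 1))
    T = T_start
    for _ in range(n_steps):
        k = rng.randrange(n)
        delta = flip_delta(Q, x, k)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            x[k] ^= 1
            current += delta
            if current < best_energy:
                best_energy, best_x = current, x[:]
        T *= cooling
    return best_energy, best_x


Q = [[-1, 2, 0], [0, -1, 2], [0, 0, -1]]
exact_energy, exact_x = brute_force(Q)
sa_energy, sa_x = anneal(Q)
print(exact_energy, exact_x, sa_energy, sa_x)
```

The incremental delta is what lets annealing scale: each proposed flip costs O(n) rather than a full O(n²) energy recomputation.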

QAOA circuit generator (quprep.qubo.qaoa)

  • qaoa_circuit(qubo, p, gamma, beta) — generates a p-layer QAOA ansatz as OpenQASM 3.0; converts QUBO → Ising internally; compatible with Qiskit, Cirq, and any QASM backend

Visualization (quprep.qubo.visualize) — requires pip install quprep[viz]

  • draw_qubo(qubo, title, cmap, ax) — heatmap of Q matrix with symmetric colour scale; annotates cells for n ≤ 10
  • draw_ising(ising, title, ax) — circular graph layout; node colour = h_i bias; edge colour/width = J_ij coupling strength

CLI (quprep qubo)

  • quprep qubo maxcut --adjacency ... [--solve]
  • quprep qubo knapsack --weights ... --values ... --capacity ... [--solve]
  • quprep qubo tsp --distances ... [--solve]
  • quprep qubo schedule --times ... --machines ... [--solve]
  • quprep qubo partition --values ... [--solve]
  • quprep qubo portfolio --returns ... --covariance ... --budget ... [--solve]
  • quprep qubo graphcolor --adjacency ... --colors ... [--solve]
  • quprep qubo qaoa <problem> ... [--p N] [--gamma ...] [--beta ...] [--output file]
  • quprep qubo export <problem> ... [--format json|npy] [--output file]
  • --solve auto-switches from exact to simulated annealing for n > 20

v0.2.0

20 Mar 23:14

QuPrep v0.2.0

Install

pip install quprep                        # core
pip install quprep[qiskit]               # Qiskit export
pip install quprep[pennylane]            # PennyLane export
pip install quprep[cirq]                 # Cirq export
pip install quprep[tket]                 # TKET export
pip install quprep[viz]                  # matplotlib circuit diagrams
pip install quprep[all]                  # everything

Added

Reduce

  • PCAReducer — wraps sklearn PCA; supports integer or variance-fraction n_components; explained_variance_ratio_ property after fit
  • LDAReducer — wraps sklearn LDA; maximises class separability; labels passed at init or fit time
  • SpectralReducer — row-wise FFT, keeps first n frequency magnitudes; outputs always ≥ 0
  • TSNEReducer — wraps sklearn TSNE with random_state=42 for reproducibility
  • UMAPReducer — wraps umap-learn (optional: pip install umap-learn); raises ImportError with install hint if absent
  • HardwareAwareReducer — auto-reduces to a backend's qubit budget via PCA; accepts backend name (e.g. 'ibm_brisbane') or integer qubit count

Encode

  • EntangledAngleEncoder — rotation layer + CNOT entangling layer, repeated layers times; supports linear, circular, and full entanglement topologies
  • IQPEncoder — Havlíček et al. 2019 feature map with pairwise ZZ interactions; reps parameter
  • ReUploadEncoder — Pérez-Salinas et al. 2020 data re-uploading; layers and rotation parameters
  • HamiltonianEncoder — Trotterized single-qubit Z Hamiltonian evolution; evolution_time and trotter_steps parameters

Export

  • PennyLaneExporter — returns a callable qml.QNode; supports all encodings; interface and device parameters (pip install quprep[pennylane])
  • CirqExporter — returns a cirq.Circuit; supports angle, basis, IQP, re-upload, Hamiltonian encodings (pip install quprep[cirq])
  • TKETExporter — returns a pytket.Circuit; angles auto-converted to pytket half-turns (pip install quprep[tket])
  • draw_ascii(encoded) — no-dependency ASCII circuit diagram for any EncodedResult; returns a printable string
  • draw_matplotlib(encoded, filename=None) — matplotlib circuit diagram; returns a Figure or saves to PNG/PDF/SVG (pip install quprep[viz])

Recommend

  • recommend(source, task, qubits) — scores all encodings against dataset profile and task; returns EncodingRecommendation with ranked alternatives
  • EncodingRecommendation.apply() — directly applies the recommendation to data and returns a PipelineResult

CLI

  • quprep recommend <file> [--task classification|regression|qaoa|kernel|simulation] [--qubits N]
  • quprep convert now supports --framework pennylane|cirq|tket

Changed

  • QASMExporter now supports entangled angle, IQP, re-upload, and Hamiltonian encodings
  • Pipeline auto-normalizes IQP/re-upload → minmax_pm_pi, Hamiltonian → zscore
  • prepare() accepts encoding='iqp', 'reupload', 'hamiltonian' with matching kwargs

Fixed

  • HamiltonianEncoder via prepare() and Pipeline was broken — _encoding_key() returned "zscore" instead of "hamiltonian", causing auto_normalizer() to raise ValueError

Documentation: quprep.readthedocs.io

QuPrep v0.1.0 — Initial Release

20 Mar 18:14

QuPrep v0.1.0

QuPrep is a focused Python library for preparing classical datasets for quantum computing.
It covers the full pipeline from raw data to circuit-ready output: ingest → clean → normalize → encode → export.

Install

pip install quprep          # core, no framework deps
pip install quprep[qiskit]  # with Qiskit export

What's included

Three encoders

Encoder                   Qubits      Depth   Use case
AngleEncoder (Ry/Rx/Rz)   n = d       O(1)    Most QML tasks
AmplitudeEncoder          ⌈log₂ d⌉    O(2ⁿ)   Qubit-limited
BasisEncoder              n = d       O(1)    Binary / QAOA

Two exporters

  • QASMExporter — OpenQASM 3.0 strings, no optional dependencies
  • QiskitExporter — Qiskit QuantumCircuit objects (angle, basis, amplitude via StatePreparation)

Full cleaning stage

  • Imputer — mean, median, mode, KNN, MICE, drop
  • OutlierHandler — IQR, Z-score, Isolation Forest (clip or remove)
  • CategoricalEncoder — one-hot, label, ordinal
  • FeatureSelector — correlation, mutual information, variance

Automatic normalization

  • Pipeline selects the mathematically correct scaler per encoder automatically (e.g. minmax → [0, π] for Ry, L2 norm for amplitude)
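
The two scalings named in the example can be written out in a few lines. This is a sketch of the underlying arithmetic only; the helper names are hypothetical, and the handling of a constant column (mapped to 0) is an assumption rather than documented library behaviour.

```python
import math


def minmax_to_pi(column):
    """Rescale one feature column linearly onto [0, pi] for rotation angles."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to rescale (assumed convention).
        return [0.0 for _ in column]
    return [math.pi * (v - lo) / (hi - lo) for v in column]


def l2_normalize(row):
    """Scale one sample to unit Euclidean norm, as amplitude encoding requires."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero sample")
    return [v / norm for v in row]


angles = minmax_to_pi([2.0, 4.0, 6.0])
amplitudes = l2_normalize([3.0, 4.0])
print(angles, amplitudes)
```

Min-max scaling works per feature column so each rotation spans the useful angle range, while L2 normalization works per sample because the squared amplitudes of a quantum state must sum to 1.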

One-liner API

import quprep

result = quprep.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)

CLI

quprep convert data.csv --encoding angle --framework qasm

Examples — four worked examples as .py scripts and .ipynb notebooks in examples/.

Documentation: quprep.readthedocs.io


What's next (v0.2.0)

  • IQP, Data Re-upload, and Hamiltonian encoders
  • PennyLane, Cirq, and TKET exporters
  • PCA, LDA, and Spectral dimensionality reducers
  • Encoding recommendation engine (quprep recommend)