Releases: quprep/quprep
v0.8.0
Added
Data connectors
- `HuggingFaceIngester` — load any HuggingFace dataset with automatic modality detection (`modality="auto"` sniffs image/text/tabular/graph feature schemas); supports tabular, image, text, and graph modalities; `pip install quprep[huggingface]`
- `OpenMLIngester` — load OpenML datasets by integer ID (`load(61)`) or name (`load("iris")`); version pinning; no account required for public datasets; `pip install quprep[openml]`
- `KaggleIngester` — download Kaggle datasets (`load("owner/name")`) and competition data (`load_competition("slug")`); uses `~/.kaggle/kaggle.json` or env vars; `pip install quprep[kaggle]`
CLI tools
- `quprep inspect <file>` — profile a dataset without encoding: shape, feature types, missing values, sparsity, per-feature statistics, and an optional encoding recommendation
- `quprep benchmark <file>` — run all (or selected) encoders on a dataset and report gate count, circuit depth, two-qubit gates, encode time, and NISQ safety in a side-by-side table; `--include`, `--exclude`, `--task`, `--samples`, `--output` flags
Reproducibility
- `fingerprint_pipeline(pipeline)` → `FingerprintResult` — deterministic SHA-256 hash of the full pipeline configuration (stage classes + parameters + dependency versions); stable across runs for the same config; serialisable to dict/JSON for paper methods sections and experiment logs
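The idea behind a deterministic config fingerprint can be sketched with the stdlib alone (the helper name `fingerprint_config` and the config layout below are illustrative, not quprep API): canonical JSON serialisation plus SHA-256 yields a digest that is stable across runs and key orderings.

```python
import hashlib
import json

def fingerprint_config(config: dict) -> str:
    """Deterministic SHA-256 digest of a nested config dict.

    sort_keys=True canonicalises key order, so equal configs always
    hash to the same 64-character hex string.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

config = {
    "stages": ["Scaler", "PCAReducer", "AngleEncoder"],
    "params": {"n_components": 4},
    "versions": {"numpy": "1.26.4"},
}
digest = fingerprint_config(config)
```

Because the serialisation is canonical, logging the digest in a methods section is enough to assert two experiments used identical preprocessing.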
New extras
- `quprep[huggingface]` — `datasets>=2.0`
- `quprep[openml]` — `openml>=0.14`
- `quprep[kaggle]` — `kaggle>=1.6`
v0.7.0
Added
Data modalities
- `TimeSeriesIngester` — timestamped CSV ingestion; stores the time column in `dataset.metadata["time_index"]`; excludes time from the feature matrix
- `WindowTransformer` — sliding window over `(n_timesteps, n_features)` → `(n_windows, window × n_features)`; lag-named features; use via `Pipeline(preprocessor=WindowTransformer(...))`
- `ImageIngester` — loads images from flat or subdirectory-labeled directory structures; grayscale/RGB, configurable resize and normalization; `pip install quprep[image]`
- `TextIngester` — TF-IDF (no extra deps) or sentence-transformer dense embeddings (`pip install quprep[text]`); accepts CSV, `.txt` file lists, or Python string lists; `target_column` support
- `GraphIngester` — two ingestion paths: lossy (Laplacian eigenvalues + degree features, configurable `n_features`, padding) and lossless (`features="adjacency"` — flattened upper triangle of the adjacency matrix); `networkx.Graph` support
- `GraphStateEncoder` — lossless graph state circuit: H⊗n followed by CZ gates per adjacency edge; pairs with `GraphIngester(features="adjacency")` for a clean pipeline path; QASM export
- Sparse data support — `NumpyIngester` and `Pipeline` automatically convert `scipy.sparse` matrices to dense
- Multi-label / multi-output — `NumpyIngester(y=labels)`, `CSVIngester(target_columns=[...])`; `Dataset.labels` propagated through all pipeline stages
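The `(n_timesteps, n_features)` → `(n_windows, window × n_features)` shape contract of the window transform can be illustrated with a small NumPy sketch (an independent reimplementation, not quprep internals):

```python
import numpy as np

def sliding_window(X: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Flatten each length-`window` slice of a (n_timesteps, n_features)
    array into one row, giving (n_windows, window * n_features)."""
    n_timesteps, _ = X.shape
    starts = range(0, n_timesteps - window + 1, stride)
    return np.stack([X[s:s + window].reshape(-1) for s in starts])

X = np.arange(12, dtype=float).reshape(6, 2)  # 6 timesteps, 2 features
W = sliding_window(X, window=3)               # 4 windows of 6 lagged features
```

With 6 timesteps and a window of 3, the windows cover rows `[0:3]`, `[1:4]`, `[2:5]`, and `[3:6]`.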
New extras
- `quprep[image]` — `Pillow` for `ImageIngester`
- `quprep[text]` — `sentence-transformers` (+ PyTorch, ~1–2 GB) for neural text embeddings; TF-IDF works with the base install
- `quprep[umap]` — `umap-learn` for `UMAPReducer`; previously referenced in error messages, but the extra did not exist
- `quprep[modalities]` — `Pillow` + `sentence-transformers` convenience group
API improvements
- `prepare(..., ingester=)` — new parameter accepts any modality ingester; enables one-liner use with `TimeSeriesIngester`, `ImageIngester`, `TextIngester`, and `GraphIngester`
- `Pipeline(preprocessor=[t1, t2, ...])` — the `preprocessor` slot now accepts a list of transformers; the audit log labels them `preprocessor[0]`, `preprocessor[1]`, etc.
Changed
- `recommend()` — added the 5 encoders missing since v0.6.0: `ZZFeatureMap`, `PauliFeatureMap`, `RandomFourier`, `TensorProduct`, and `QAOAProblem`, all with dataset-aware scoring; previously the recommender silently covered only 7 of the 12 encoders
- `quprep.qubo.__all__` — `solve_brute`, `solve_sa`, and `SolveResult` removed from `__all__`; the canonical import is now `from quprep.qubo.solver import solve_brute, solve_sa`; the backward-compatible import is kept (no breaking change)
- `AmplitudeEncoder` — emits a `QuPrepWarning` when input is zero-padded to the next power of two
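The padding behaviour that now triggers a warning amounts to the following (an illustrative sketch of the idea, not the `AmplitudeEncoder` source): amplitude encoding needs 2^n amplitudes, so shorter vectors are zero-padded and then L2-normalised.

```python
import warnings

import numpy as np

def pad_and_normalise(x: np.ndarray) -> np.ndarray:
    """Zero-pad to the next power of two, then L2-normalise so the
    result is a valid quantum state (squared amplitudes sum to 1)."""
    n = 1 << (len(x) - 1).bit_length()  # next power of two >= len(x)
    if n != len(x):
        warnings.warn(f"input zero-padded from {len(x)} to {n} amplitudes")
        x = np.pad(x, (0, n - len(x)))
    return x / np.linalg.norm(x)

state = pad_and_normalise(np.array([3.0, 4.0, 0.0]))  # padded to length 4
```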
Fixed
- `GraphStateEncoder` had no documented pipeline path; `GraphIngester(features="adjacency")` added as the canonical lossless route
- `prepare()` was missing the `ingester=` parameter, which made modality ingesters unusable via the one-liner API
- `quprep.qubo` symbols (`max_cut`, `to_qubo`, `qaoa_circuit`, etc.) were not accessible via the `qd.*` namespace
v0.6.0
Added
- `ZZFeatureMapEncoder` — Havlíček-style ZZ feature map (Qiskit-compatible convention)
- `PauliFeatureMapEncoder` — generalized Pauli feature map with configurable Pauli strings
- `RandomFourierEncoder` — quantum-inspired RBF kernel approximation via random Fourier features
- `TensorProductEncoder` — full Bloch sphere encoding (Ry + Rz per qubit, 2 features per qubit)
- `BraketExporter` — Amazon Braket SDK circuit output (`quprep[braket]`)
- `QSharpExporter` — Microsoft Q# program output (`quprep[qsharp]`)
- `IQMExporter` — IQM native circuit JSON output (`quprep[iqm]`)
- Plugin system — `register_encoder`/`register_exporter` decorators plus `unregister_*`, `list_*`, `get_encoder_class`, and `get_exporter_class`; all wired into `prepare()` and exported from the top-level `qd.*` namespace
- `quprep convert --framework braket|qsharp|iqm` — new framework choices in the CLI `convert` subcommand
- `QAOAProblemEncoder` — QAOA-inspired feature map; features as cost Hamiltonian parameters; `p` layers, `linear`/`full` connectivity, configurable `gamma`/`beta`
- `quprep convert --encoding zz_feature_map|pauli_feature_map|random_fourier|tensor_product|qaoa_problem` — new encoding choices in the CLI
- Binder + Colab launch badges on all 11 example notebooks and the README
- HuggingFace Spaces demo badge in README
Fixed
- `prepare()` — closure variable capture bug: when both `encoding` and `framework` are custom plugins, the second `plugin_cls` assignment overwrote the first lambda's captured variable, causing the encoder lambda to instantiate the exporter class. Fixed with default-argument binding (`lambda cls=plugin_cls: cls()`)
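This is the classic Python late-binding pitfall: a lambda closes over the *variable*, not its value at definition time. A minimal standalone reproduction (with hypothetical stand-in classes, not quprep's plugin types) shows both the bug and the default-argument fix:

```python
class EncoderPlugin:
    pass

class ExporterPlugin:
    pass

plugin_cls = EncoderPlugin
broken = lambda: plugin_cls()   # closes over the variable, not the value
plugin_cls = ExporterPlugin     # later reassignment retargets the lambda

wrong = broken()                # instantiates ExporterPlugin — the bug

# Fix: freeze the current value as a default argument at definition time.
plugin_cls = EncoderPlugin
fixed = lambda cls=plugin_cls: cls()
plugin_cls = ExporterPlugin
right = fixed()                 # still EncoderPlugin
```

Default arguments are evaluated once, when the lambda is created, which is exactly the binding behaviour the fix relies on.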
v0.5.0
Added
Encoding comparison (quprep.compare)
compare_encodings(source, *, include, exclude, task, qubits)— analytical side-by-side cost comparison of all (or selected) encoders; no circuits generatedComparisonResult—.rows(list ofCostEstimate),.best(prefer="nisq"|"depth"|"gates"|"qubits"),.to_dict(),__str__()ASCII table with starred recommendation whentask=is passedquprep compare <file> [--task] [--qubits] [--include] [--exclude]CLI subcommand- Exported:
qd.compare_encodings,qd.ComparisonResult
Smarter encoding recommendation (`quprep.core.recommender`)
- `entangled_angle` added to the recommendation engine (it was an encoder but previously invisible to `recommend()`)
- 4 new dataset profile signals: `missing_rate`, `sparsity`, `has_negatives`, `feature_collinear` (mean pairwise Pearson correlation)
- 9 new dataset-aware scoring rules: amplitude penalised for large sample counts and high missing rates; basis boosted for sparse data and penalised for negative values; IQP/entangled_angle boosted for correlated features; IQP penalised for wide data; reupload penalised for tiny datasets and boosted for large ones
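The four profile signals are straightforward to compute with NumPy. A sketch, assuming `feature_collinear` means the mean absolute off-diagonal Pearson correlation (the exact quprep definition may differ):

```python
import numpy as np

def profile_signals(X: np.ndarray) -> dict:
    """Dataset profile signals: missing rate, sparsity, sign, collinearity."""
    filled = np.nan_to_num(X)  # NaN -> 0 for sparsity / correlation
    corr = np.corrcoef(filled, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return {
        "missing_rate": float(np.isnan(X).mean()),
        "sparsity": float((filled == 0).mean()),
        "has_negatives": bool(np.nanmin(X) < 0),
        "feature_collinear": float(np.abs(off_diag).mean()),
    }

# One NaN out of 8 cells; feature 2 is exactly 2x feature 1.
X = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, 6.0], [4.0, 8.0]])
signals = profile_signals(X)
```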
Auto qubit count suggestion (`quprep.core.qubit_suggestion`)
- `suggest_qubits(source, *, task, max_qubits)` → `QubitSuggestion` — recommends a qubit budget based on dataset size and target task
- `QubitSuggestion` — `.n_qubits`, `.n_features`, `.nisq_safe`, `.encoding_hint`, `.reasoning`, `.warning` (set when reduction is needed)
- `quprep suggest <file> [--task] [--max-qubits]` CLI subcommand
- Exported: `qd.suggest_qubits`, `qd.QubitSuggestion`
Pipeline serialization (`quprep.core.pipeline`)
- `Pipeline.save(path)` — pickles the fitted pipeline; creates parent directories automatically
- `Pipeline.load(path)` — classmethod; restores a fitted pipeline ready for `transform()` without re-fitting; raises `TypeError` for non-`Pipeline` files
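The save/load contract described above (auto-created parent directories, a type check on load) is easy to sketch with the stdlib; `save_obj`/`load_obj` are illustrative names, not the quprep API:

```python
import pickle
from pathlib import Path

def save_obj(obj, path) -> None:
    """Pickle obj to path, creating missing parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(obj))

def load_obj(path, expected_type):
    """Unpickle and reject files that hold the wrong type."""
    obj = pickle.loads(Path(path).read_bytes())
    if not isinstance(obj, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, "
                        f"got {type(obj).__name__}")
    return obj
```

The type check is what turns a confusing downstream `AttributeError` into an immediate, explicit `TypeError`.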
Batch export (`quprep.export.qasm_export`, `quprep.__init__`)
- `QASMExporter.save_batch(encoded_list, directory, stem)` — saves each sample as `{stem}_{i:04d}.qasm`; creates the output directory automatically; returns a list of `Path` objects
- `qd.batch_export(source, directory, *, encoding, stem)` — top-level one-liner: runs `prepare()` then `save_batch()`
- `quprep convert <file> --save-dir <dir> [--stem <stem>]` — CLI flag added to the `convert` subcommand
Data drift detection (`quprep.core.drift`)
- `DriftDetector(mean_threshold=3.0, std_threshold=2.0, warn=True)` — detects statistical drift between training and new data
- `fit(dataset)` — records per-feature mean and std from training data (NaN-safe)
- `check(dataset)` → `DriftReport` — flags features whose mean shifts by more than the threshold in σ or whose std ratio exceeds bounds; issues a `QuPrepWarning` when drift is found
- `DriftReport` — `.overall_drift`, `.drifted_features`, `.n_features_drifted`, `.feature_stats` (per-feature train/new mean, std, σ-shift, std_ratio)
- `Pipeline(drift_detector=DriftDetector())` — the detector is fitted post-reduction and checked on every `transform()` call
- `PipelineResult.drift_report` — `DriftReport | None`; preserved through `save()`/`load()`
- Exported: `qd.DriftDetector`, `qd.DriftReport`
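The threshold logic behind this kind of detector can be sketched with NumPy (an illustrative reimplementation using the default thresholds above, not the quprep source):

```python
import numpy as np

def drifted_features(train, new, mean_threshold=3.0, std_threshold=2.0):
    """Flag features whose mean moved more than mean_threshold train-sigmas,
    or whose std ratio left [1/std_threshold, std_threshold]."""
    mu = np.nanmean(train, axis=0)
    sigma = np.nanstd(train, axis=0)
    sigma = np.where(sigma == 0, 1e-12, sigma)  # guard constant features
    sigma_shift = np.abs(np.nanmean(new, axis=0) - mu) / sigma
    std_ratio = np.nanstd(new, axis=0) / sigma
    return ((sigma_shift > mean_threshold)
            | (std_ratio > std_threshold)
            | (std_ratio < 1.0 / std_threshold))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
new = train.copy()
new[:, 1] += 10.0  # inject a ~10-sigma mean shift into feature 1
flags = drifted_features(train, new)
```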
Changed
- `Pipeline.__init__` — new `drift_detector` parameter (default `None`; backwards compatible)
- `PipelineResult.__init__` — new `drift_report` attribute (default `None`; backwards compatible)
- `Pipeline.get_params()`/`set_params()` — `drift_detector` included in the parameter dict
v0.4.0
Added
Validation & schema (`quprep.validation`)
- `QuPrepWarning` — custom warning class; all pipeline warnings use this category so they can be filtered precisely
- `validate_dataset(dataset)` — structural checks at pipeline entry: shape, dtype, NaN detection with a fractional-coverage warning
- `warn_qubit_mismatch(n_features, n_qubits, encoding)` — warns when features exceed the qubit budget
- `DataSchema`/`FeatureSpec`/`SchemaViolationError` — declare expected feature names, types, and value ranges; attach via `Pipeline(schema=...)` to enforce at entry; all violations are collected and reported together
- `DataSchema.infer(dataset)` — auto-builds a schema from a reference dataset
- `DataSchema.to_json()`/`from_json()`/`to_dict()`/`from_dict()` — full serialisation round-trip; terse output (omits `None` fields and `nullable=False`)
Cost estimation (`quprep.validation.cost`)
- `CostEstimate` — dataclass with `encoding`, `n_features`, `n_qubits`, `gate_count`, `circuit_depth`, `two_qubit_gates`, `nisq_safe`, and `warning` fields
- `estimate_cost(encoder, n_features)` — formula-accurate gate counts for all 7 encoders; NISQ-safe flag (depth < 200, CNOTs < 50)
Pipeline & PipelineResult (`quprep.core.pipeline`)
- `PipelineResult.cost` — `CostEstimate | None`; populated at fit time whenever an encoder is configured; shown in `repr()`
- `PipelineResult.audit_log` — `list[dict] | None`; one entry per preprocessing stage with `{stage, n_samples_in, n_features_in, n_samples_out, n_features_out}`
- `PipelineResult.summary()` — prints the audit log as an aligned table plus a cost breakdown
- `Pipeline.fit(source, y=None)`/`.transform(source)` — full sklearn-compatible split; `transform()` raises `RuntimeError` before `fit()`
- `Pipeline.get_params()`/`.set_params(**params)` — hyperparameter-search ready
- `Pipeline(schema=...)` — validates the dataset at entry before any stage runs
- `Pipeline.summary()`/`__str__` — human-readable snapshot: configured stages, fitted status, resolved normalizer, schema feature count, last cost estimate
Sklearn-compatible fit/transform on all stateful stages
- Every stage now has separate `fit(dataset)` and `transform(dataset)` methods; `fit_transform` delegates; `NotFittedError` is raised on `transform()` before `fit()`
- Stages: `Scaler`, `Imputer`, `OutlierHandler`, `CategoricalEncoder`, `FeatureSelector`, `PCAReducer`, `LDAReducer`, `HardwareAwareReducer`, `SpectralReducer`, `TSNEReducer`, `UMAPReducer`
- `CategoricalEncoder` aligns one-hot columns between train and test sets at transform time
- `Dataset.copy()` — deep copy for safe fit/transform stage splitting
`import quprep as qd` — top-level namespace alias
- All public classes exported directly: all 7 encoders, all cleaners (`Imputer`, `OutlierHandler`, `CategoricalEncoder`, `FeatureSelector`), `Scaler`, all reducers, `QASMExporter`, `PipelineResult`, and all validation classes
- No sub-imports needed: `qd.AngleEncoder()`, `qd.PCAReducer()`, `qd.DataSchema(...)`, etc.
`quprep validate` CLI
- `quprep validate dataset.csv` — shape, column names, per-column NaN report (count + %), value ranges
- `quprep validate dataset.csv --schema schema.json` — validates against a JSON schema (array of `{name, dtype, min_value?, max_value?, nullable?}`); exits 1 on violation
- `quprep validate dataset.csv --infer-schema output.json` — infers a schema from the CSV and writes it to a file; use `"-"` to print to stdout
Type stubs (.pyi files)
- Stubs added for: `Dataset`, `Pipeline`/`PipelineResult`, `Scaler`, `BaseEncoder`/`EncodedResult`, and the full `validation` public API
Zenodo DOI
- Placeholder badge and `doi` field added to the README and BibTeX citation
- No custom GitHub Actions workflow needed — Zenodo's native GitHub integration archives each release automatically
v0.3.0
Added
QUBO / Ising conversion (`quprep.qubo`)
- `to_qubo(cost_matrix, constraints, penalty)` — converts any square cost matrix to upper-triangular QUBO form; supports equality and inequality constraints via Lagrangian penalty
- `QUBOResult` — holds the Q matrix, offset, variable map, and `n_original`; `.to_ising()`, `.evaluate(x)`, `.to_dwave()`, `.to_dict()`/`.from_dict()` methods
- `IsingResult` — holds h, J, offset; `.to_qubo()` round-trip conversion
- `qubo_to_ising(qubo)` — QUBO → Ising transformation (s = 2x − 1); energy-consistent for all binary inputs
- `ising_to_qubo(ising)` — Ising → QUBO inverse transformation; completes the bidirectional round-trip
- `equality_penalty(A, b, penalty)` — encodes Ax = b as a QUBO penalty matrix
- `inequality_penalty(A, b, penalty)` — encodes Ax ≤ b via binary slack variables; augments Q from (n, n) to (n+K, n+K)
- `add_qubo(q1, q2, weight)` — combines two same-size QUBOs; useful for multi-objective problems
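The s = 2x − 1 substitution behind the QUBO → Ising transformation is worth spelling out. This standalone sketch (not the quprep source) derives J, h, and the constant offset from a symmetrised Q, and its energy-consistency claim can be checked by brute force over all binary inputs:

```python
import numpy as np
from itertools import product

def qubo_to_ising_sketch(Q: np.ndarray):
    """Map E(x) = x^T Q x (x in {0,1}^n) to the Ising form
    E(s) = s^T J s + h . s + offset, with s = 2x - 1 in {-1,+1}^n."""
    Q = (Q + Q.T) / 2        # symmetrise off-diagonal terms
    J = Q / 4
    np.fill_diagonal(J, 0.0) # s_i^2 = 1, so diagonals fold into the offset
    h = Q.sum(axis=1) / 2
    offset = Q.sum() / 4 + np.trace(Q) / 4
    return J, h, offset

Q = np.array([[1.0, -2.0], [0.0, 3.0]])
J, h, offset = qubo_to_ising_sketch(Q)

# Energy-consistent for every binary input:
for bits in product([0, 1], repeat=2):
    x = np.array(bits, dtype=float)
    s = 2 * x - 1
    assert np.isclose(x @ Q @ x, s @ J @ s + h @ s + offset)
```

Expanding x_i x_j = (1 + s_i + s_j + s_i s_j)/4 and using s_i² = 1 gives exactly these coefficients; the scalar identity x^T Q x = x^T ((Q + Qᵀ)/2) x is why symmetrising first is harmless.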
Problem library (`quprep.qubo.problems`) — 7 NP-hard combinatorial problems
- `max_cut(adjacency)` — Max-Cut graph partitioning
- `knapsack(weights, values, capacity, penalty)` — 0/1 Knapsack
- `tsp(distance_matrix, penalty)` — Travelling Salesman Problem (n² binary variables)
- `portfolio(returns, covariance, budget, risk_penalty, budget_penalty)` — Markowitz portfolio optimization
- `graph_color(adjacency, n_colors, penalty)` — Graph Colouring (n×K binary variables)
- `scheduling(processing_times, n_machines, penalty)` — Job scheduling / load balancing
- `number_partition(values, penalty)` — Number Partitioning
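As an example of how such a formulation works, here is a from-scratch Max-Cut QUBO for a weighted adjacency matrix (an illustrative convention using a symmetric Q; quprep's `max_cut` may emit upper-triangular form instead). Minimising x^T Q x maximises the cut weight:

```python
import numpy as np
from itertools import product

def max_cut_qubo_sketch(adj: np.ndarray) -> np.ndarray:
    """Each edge (i, j) contributes -w_ij * (x_i + x_j - 2 x_i x_j),
    which equals -w_ij exactly when the edge is cut."""
    Q = adj.astype(float).copy()           # off-diagonals: +2 w_ij x_i x_j
    np.fill_diagonal(Q, -adj.sum(axis=1))  # diagonal: -w_ij x_i - w_ij x_j
    return Q

adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # unweighted triangle
Q = max_cut_qubo_sketch(adj)
energies = {bits: float(np.array(bits) @ Q @ np.array(bits))
            for bits in product([0, 1], repeat=3)}
best_cut = -min(energies.values())  # any 1-vs-2 split of a triangle cuts 2 edges
```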
Solvers (`quprep.qubo.solver`)
- `solve_brute(qubo, max_n=20)` — exact exhaustive solver; evaluates all 2^n states; practical up to n = 20
- `solve_sa(qubo, n_steps, T_start, T_end, seed, restarts)` — simulated-annealing heuristic; O(n) incremental energy update with geometric cooling; scales to n ≈ 500+
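A minimal exhaustive solver in the spirit of `solve_brute` (an independent sketch, not the quprep implementation) makes the 2^n cost explicit:

```python
import numpy as np
from itertools import product

def brute_force_qubo(Q: np.ndarray, max_n: int = 20):
    """Minimise x^T Q x over all 2^n binary vectors; exact but exponential."""
    n = Q.shape[0]
    if n > max_n:
        raise ValueError(f"2^{n} states is impractical; limit is n={max_n}")
    best_x, best_e = None, float("inf")
    for bits in product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        e = float(x @ Q @ x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

x_opt, e_opt = brute_force_qubo(np.array([[-1.0, 2.0], [0.0, -1.0]]))
```

The hard `max_n` cap is the same safety valve described above: beyond roughly 20 variables, exhaustive search gives way to simulated annealing.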
QAOA circuit generator (`quprep.qubo.qaoa`)
- `qaoa_circuit(qubo, p, gamma, beta)` — generates a p-layer QAOA ansatz as OpenQASM 3.0; converts QUBO → Ising internally; compatible with Qiskit, Cirq, and any QASM backend
Visualization (`quprep.qubo.visualize`) — requires `pip install quprep[viz]`
- `draw_qubo(qubo, title, cmap, ax)` — heatmap of the Q matrix with a symmetric colour scale; annotates cells for n ≤ 10
- `draw_ising(ising, title, ax)` — circular graph layout; node colour = h_i bias; edge colour/width = J_ij coupling strength
CLI (`quprep qubo`)
- `quprep qubo maxcut --adjacency ... [--solve]`
- `quprep qubo knapsack --weights ... --values ... --capacity ... [--solve]`
- `quprep qubo tsp --distances ... [--solve]`
- `quprep qubo schedule --times ... --machines ... [--solve]`
- `quprep qubo partition --values ... [--solve]`
- `quprep qubo portfolio --returns ... --covariance ... --budget ... [--solve]`
- `quprep qubo graphcolor --adjacency ... --colors ... [--solve]`
- `quprep qubo qaoa <problem> ... [--p N] [--gamma ...] [--beta ...] [--output file]`
- `quprep qubo export <problem> ... [--format json|npy] [--output file]`
- `--solve` auto-switches from exact to simulated annealing for n > 20
v0.2.0
QuPrep v0.2.0
Install
```shell
pip install quprep             # core
pip install quprep[qiskit]     # Qiskit export
pip install quprep[pennylane]  # PennyLane export
pip install quprep[cirq]       # Cirq export
pip install quprep[tket]       # TKET export
pip install quprep[viz]        # matplotlib circuit diagrams
pip install quprep[all]        # everything
```
Added
Reduce
- `PCAReducer` — wraps sklearn PCA; supports integer or variance-fraction `n_components`; `explained_variance_ratio_` property after fit
- `LDAReducer` — wraps sklearn LDA; maximises class separability; labels passed at init or fit time
- `SpectralReducer` — row-wise FFT, keeps the first n frequency magnitudes; outputs always ≥ 0
- `TSNEReducer` — wraps sklearn TSNE with `random_state=42` for reproducibility
- `UMAPReducer` — wraps umap-learn (optional: `pip install umap-learn`); raises `ImportError` with an install hint if absent
- `HardwareAwareReducer` — auto-reduces to a backend's qubit budget via PCA; accepts a backend name (e.g. `'ibm_brisbane'`) or an integer qubit count
Encode
- `EntangledAngleEncoder` — rotation layer + CNOT entangling layer, repeated `layers` times; supports `linear`, `circular`, and `full` entanglement topologies
- `IQPEncoder` — Havlíček et al. 2019 feature map with pairwise ZZ interactions; `reps` parameter
- `ReUploadEncoder` — Pérez-Salinas et al. 2020 data re-uploading; `layers` and `rotation` parameters
- `HamiltonianEncoder` — Trotterized single-qubit Z Hamiltonian evolution; `evolution_time` and `trotter_steps` parameters
Export
- `PennyLaneExporter` — returns a callable `qml.QNode`; supports all encodings; `interface` and `device` parameters (`pip install quprep[pennylane]`)
- `CirqExporter` — returns a `cirq.Circuit`; supports angle, basis, IQP, re-upload, and Hamiltonian encodings (`pip install quprep[cirq]`)
- `TKETExporter` — returns a `pytket.Circuit`; angles auto-converted to pytket half-turns (`pip install quprep[tket]`)
- `draw_ascii(encoded)` — no-dependency ASCII circuit diagram for any `EncodedResult`; returns a printable string
- `draw_matplotlib(encoded, filename=None)` — matplotlib circuit diagram; returns a `Figure` or saves to PNG/PDF/SVG (`pip install quprep[viz]`)
Recommend
- `recommend(source, task, qubits)` — scores all encodings against the dataset profile and task; returns an `EncodingRecommendation` with ranked alternatives
- `EncodingRecommendation.apply()` — directly applies the recommendation to the data and returns a `PipelineResult`
CLI
- `quprep recommend <file> [--task classification|regression|qaoa|kernel|simulation] [--qubits N]`
- `quprep convert` now supports `--framework pennylane|cirq|tket`
Changed
- `QASMExporter` now supports entangled angle, IQP, re-upload, and Hamiltonian encodings
- `Pipeline` auto-normalizes IQP/re-upload → `minmax_pm_pi` and Hamiltonian → `zscore`
- `prepare()` accepts `encoding='iqp'`, `'reupload'`, and `'hamiltonian'` with matching kwargs
Fixed
- `HamiltonianEncoder` via `prepare()` and `Pipeline` was broken — `_encoding_key()` returned `"zscore"` instead of `"hamiltonian"`, causing `auto_normalizer()` to raise `ValueError`
Documentation — quprep.readthedocs.io
QuPrep v0.1.0 — Initial Release
QuPrep v0.1.0
QuPrep is a focused Python library for preparing classical datasets for quantum computing.
It covers the full pipeline from raw data to circuit-ready output: ingest → clean → normalize → encode → export.
Install
```shell
pip install quprep           # core, no framework deps
pip install quprep[qiskit]   # with Qiskit export
```
What's included
Three encoders
| Encoder | Qubits | Depth | Use case |
|---|---|---|---|
| `AngleEncoder` (Ry/Rx/Rz) | n = d | O(1) | Most QML tasks |
| `AmplitudeEncoder` | ⌈log₂ d⌉ | O(2ⁿ) | Qubit-limited |
| `BasisEncoder` | n = d | O(1) | Binary / QAOA |
Two exporters
- `QASMExporter` — OpenQASM 3.0 strings, no optional dependencies
- `QiskitExporter` — Qiskit `QuantumCircuit` objects (angle, basis; amplitude via `StatePreparation`)
Full cleaning stage
- `Imputer` — mean, median, mode, KNN, MICE, drop
- `OutlierHandler` — IQR, Z-score, Isolation Forest (clip or remove)
- `CategoricalEncoder` — one-hot, label, ordinal
- `FeatureSelector` — correlation, mutual information, variance
Automatic normalization
- Pipeline selects the mathematically correct scaler per encoder automatically (e.g. minmax → [0, π] for Ry, L2 norm for amplitude)
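For instance, the minmax → [0, π] step for Ry angles amounts to the following (an illustrative sketch of the idea, not quprep's `Scaler` implementation):

```python
import numpy as np

def minmax_to_angles(X: np.ndarray, high: float = np.pi) -> np.ndarray:
    """Scale each feature column into [0, high] so every value is a
    valid Ry rotation angle."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant columns
    return (X - lo) / span * high

X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
angles = minmax_to_angles(X)  # each column mapped onto [0, pi]
```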
One-liner API
```python
import quprep
result = quprep.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)
```
CLI
```shell
quprep convert data.csv --encoding angle --framework qasm
```
Examples — four worked examples as `.py` scripts and `.ipynb` notebooks in `examples/`.
Documentation — quprep.readthedocs.io
What's next (v0.2.0)
- IQP, Data Re-upload, and Hamiltonian encoders
- PennyLane, Cirq, and TKET exporters
- PCA, LDA, and Spectral dimensionality reducers
- Encoding recommendation engine (`quprep recommend`)