Dataset: Xenium Human Lung Preview — Non-diseased FFPE
Source & Download: 10x Genomics — Xenium Human Lung Preview (standard)
Licensed under CC BY 4.0
This repository implements a transparent and flexible pipeline for processing Xenium spatial transcriptomics data from raw output to spatial visualization.
Unlike workflows that depend on Zarr or aggregated formats, this approach uses step-by-step raw data processing for reproducibility and educational clarity.
- Reproducibility & clarity
  You manually stream and filter transcripts (e.g., Q ≥ 20) and build the cell × gene matrix yourself, so every step is explicit and auditable.
- Robustness to changes
  The Xenium output format may change over time; the pipeline tolerates schema differences (e.g., variations in column names such as `x_location` vs `x_centroid`).
- Scalability & memory efficiency
  Two-pass, batched reading of the Parquet files with Arrow avoids loading everything into memory at once, so large datasets can be processed smoothly.
- Customizable & extensible
  Users can easily adjust quality thresholds or extend the pipeline to other spatial platforms (e.g., CosMx, MERFISH).
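The schema tolerance mentioned above can be illustrated with a small helper that tries a list of known column aliases in order. This is a minimal sketch under stated assumptions, not the repository's implementation: the names `resolve_column` and `spatial_coords` are hypothetical, and while `x_location` and `x_centroid` come from this README, the matching `y_*` aliases are assumed.

```python
import pandas as pd

# Hypothetical alias lists, tried in order. Only x_location / x_centroid are
# named in this README; the y_* aliases mirror them by assumption.
X_ALIASES = ["x_location", "x_centroid"]
Y_ALIASES = ["y_location", "y_centroid"]


def resolve_column(df: pd.DataFrame, aliases: list[str]) -> str:
    """Return the first alias present in df.columns, or raise a clear error."""
    for name in aliases:
        if name in df.columns:
            return name
    raise KeyError(f"none of {aliases} found; available: {list(df.columns)}")


def spatial_coords(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame with standardized 'x' and 'y' columns, whatever the source names."""
    x_col = resolve_column(df, X_ALIASES)
    y_col = resolve_column(df, Y_ALIASES)
    return df[[x_col, y_col]].rename(columns={x_col: "x", y_col: "y"})


# Both naming schemes yield the same standardized coordinates.
old = pd.DataFrame({"x_location": [1.0, 2.0], "y_location": [3.0, 4.0]})
new = pd.DataFrame({"x_centroid": [1.0, 2.0], "y_centroid": [3.0, 4.0]})
print(spatial_coords(old).equals(spatial_coords(new)))  # True
```

Failing loudly with the list of available columns makes a schema change easy to diagnose, instead of surfacing later as a silent `NaN` or a confusing downstream error.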
- `scripts/unpack_all.py`: extracts Xenium data from the zipped output bundles.
- `notebooks/preview_quickstart.ipynb`: walk-through notebook to:
  - Load raw `cells.parquet` and `transcripts.parquet`, plus image and metrics files
  - Filter transcripts and build a sparse count matrix
  - Construct and QC an `AnnData` object
  - Normalize, cluster, and visualize spatial patterns
  - Compute neighborhood enrichment and save results
- Clone this repository:
  `git clone https://github.com/jrs-orellana/xenium2anndata-analysis-workflow`
- Download the Xenium dataset ZIP from the link above and place it in `data/`.
- Run `python scripts/unpack_all.py` to extract the dataset.
- Install dependencies (see below).
- Open and run the notebooks to process the data:
  - `01_xenium_raw2anndata.ipynb`: conversion from raw Parquet → AnnData
  - `02_xenium_downstream.ipynb`: QC, clustering, marker detection, spatial plots
- Explore the results in `results/figures/` and the processed `.h5ad` file.
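The repository's `scripts/unpack_all.py` is not reproduced here. As a hypothetical illustration of the same task, the sketch below extracts every `*.zip` in `data/` into a folder named after the archive using only the standard library; the output layout, the `overwrite` flag, and the path-traversal check are assumptions of this sketch, not documented behavior of the script.

```python
import zipfile
from pathlib import Path


def unpack_all(data_dir="data", overwrite=False):
    """Extract every *.zip in data_dir into data_dir/<archive name>; return new folders."""
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        target = data_dir / archive.stem
        if target.exists() and not overwrite:
            continue  # skip archives that were already unpacked
        with zipfile.ZipFile(archive) as zf:
            # Refuse members that would escape the target folder ("zip slip").
            root = target.resolve()
            for member in zf.namelist():
                dest = (target / member).resolve()
                if dest != root and root not in dest.parents:
                    raise ValueError(f"unsafe path in {archive.name}: {member}")
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Skipping already-extracted archives makes the step cheap to re-run when the notebooks are executed from scratch.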
Main packages required (see full requirements.txt for exact versions):
- scanpy — single-cell analysis and visualization
- squidpy — spatial transcriptomics analysis
- spatialdata-io — Xenium/Visium data import
- pyarrow — efficient batch processing of Parquet files
- numpy and pandas — data handling
- matplotlib and seaborn — visualization
Install via:
`pip install -r requirements.txt`

Example figures from the workflow:
- Total Counts, Genes per Cell, Counts vs Genes
- Post-QC Density, PCA Scree, UMAP (Leiden)
- UMAP: n Genes by Cell, UMAP: Total Counts, Spatial Leiden
- Cell-Type Scores, Compartments (High)
- Marker Dotplot, Marker Heatmap, Neighborhood Enrichment
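Neighborhood enrichment asks whether cells of two clusters are spatial neighbors more often than expected by chance. squidpy, listed above, provides this as `squidpy.gr.nhood_enrichment`; the sketch below is only a simplified, hypothetical re-implementation of the idea and does not reproduce squidpy's code. It builds a k-nearest-neighbour graph with `scipy.spatial.cKDTree`, counts edges between each pair of cluster labels, and converts the observed counts into z-scores against random label permutations. It assumes integer cluster labels `0..n-1` (e.g. Leiden clusters cast to integers) and distinct cell coordinates.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbor_pairs(coords, k=6):
    """Return (i, j) index arrays linking each cell to its k nearest neighbours."""
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    # Column 0 is the cell itself (assumes no duplicate coordinates).
    i = np.repeat(np.arange(len(coords)), k)
    j = idx[:, 1:].ravel()
    return i, j


def edge_counts(labels, i, j, n_clusters):
    """Count neighbour edges between every (source, target) pair of cluster labels."""
    counts = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    np.add.at(counts, (labels[i], labels[j]), 1)
    return counts


def nhood_enrichment(coords, labels, k=6, n_perms=200, seed=0):
    """Z-scores of observed cluster-pair edge counts versus label permutations."""
    labels = np.asarray(labels)
    n_clusters = int(labels.max()) + 1
    i, j = neighbor_pairs(np.asarray(coords, dtype=float), k=k)
    observed = edge_counts(labels, i, j, n_clusters)
    rng = np.random.default_rng(seed)
    # The graph stays fixed; only the labels are shuffled in each permutation.
    perms = np.stack([
        edge_counts(rng.permutation(labels), i, j, n_clusters) for _ in range(n_perms)
    ])
    mean, std = perms.mean(axis=0), perms.std(axis=0)
    z = (observed - mean) / np.where(std > 0, std, 1.0)  # avoid division by zero
    return z, observed
```

Positive z-scores mark cluster pairs that co-occur as neighbors more than chance; negative ones mark pairs that avoid each other.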
If you use this repository or adapt parts of the workflow, please cite it as:
APA style:
Orellana-Montes, J. (2025). *xenium2anndata-analysis-workflow: Transparent pipeline for Xenium spatial transcriptomics* [Computer software]. GitHub. https://github.com/jrs-orellana/xenium2anndata-analysis-workflow
BibTeX:
@misc{xenium2anndata2025,
author = {Julio Orellana-Montes},
title = {xenium2anndata-analysis-workflow: Transparent pipeline for Xenium spatial transcriptomics},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jrs-orellana/xenium2anndata-analysis-workflow}}
}

Dataset: Xenium Human Lung Preview — Non-diseased FFPE, 10x Genomics (licensed under CC BY 4.0).
Please cite the dataset according to the 10x Genomics citation guidelines.
- Name: `xenium2anndata-analysis-workflow`
- Purpose: detailed, manual parsing and processing of raw Xenium data
- Strengths: transparency, flexibility, and reproducibility over convenience
This project is released under the MIT License. See LICENSE for details.
Dataset belongs to 10x Genomics and is licensed under CC BY 4.0.
Author: Julio Orellana-Montes
For questions, suggestions, or collaborations: open an issue or pull request on GitHub,
or contact me at [email protected]