xenium2anndata-analysis-workflow

Dataset: Xenium Human Lung Preview — Non-diseased FFPE
Source & Download: 10x Genomics — Xenium Human Lung Preview (standard)
Licensed under CC BY 4.0

Glossary

Overview
Why this detailed pipeline matters
Repository Contents
Getting Started
Figures
Attribution

Overview

This repository implements a transparent and flexible pipeline for processing Xenium spatial transcriptomics data from raw output to spatial visualization.

Unlike workflows that depend on Zarr or pre-aggregated formats, this approach processes the raw Parquet output step by step, favoring reproducibility and educational clarity over convenience.

Why this detailed pipeline matters

- Reproducibility & clarity: you stream and filter transcripts yourself (e.g., quality value Q ≥ 20) and build the cell × gene matrix, so every step is visible and auditable.
- Robustness to changes: the Xenium output format may change between releases; the pipeline tolerates schema differences such as varying column names (e.g., x_location vs x_centroid).
- Scalability & memory efficiency: two-pass Arrow/Parquet batching reads the transcripts in chunks instead of loading them all at once, so large datasets fit in limited memory.
- Customizable & extensible: quality thresholds are easy to adjust, and the pipeline can be extended to other spatial platforms (e.g., CosMx, MERFISH).
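
A minimal sketch of the two-pass idea described above, not the notebook's actual code. It assumes each transcript carries a cell identifier, a gene name, and a quality value; the column names `cell_id`, `feature_name`, and `qv`, the `UNASSIGNED` label, and all function names are assumptions for illustration. The counting core accepts any re-iterable source of column batches so it can be tried on toy data, and `parquet_batches` shows how the same batches could be produced with `pyarrow`.

```python
from collections import Counter


def resolve_column(available, candidates):
    """Return the first candidate column name present in `available`."""
    for name in candidates:
        if name in available:
            return name
    raise KeyError(f"none of {candidates} found in {sorted(available)}")


def parquet_batches(path, columns, batch_size=1_000_000):
    """Yield dict-of-lists batches from a Parquet file without loading it whole."""
    import pyarrow.parquet as pq  # imported here so the core below has no hard dependency

    for batch in pq.ParquetFile(path).iter_batches(batch_size=batch_size, columns=columns):
        yield batch.to_pydict()


def two_pass_counts(make_batches, qv_min=20.0, unassigned="UNASSIGNED"):
    """Build cell x gene counts in two streaming passes.

    `make_batches` is a zero-argument callable that returns a fresh iterator of
    batches (dicts with 'cell_id', 'feature_name' and 'qv' lists).
    """
    def kept(batch):
        # Keep transcripts that pass the quality threshold and belong to a cell.
        for cell, gene, qv in zip(batch["cell_id"], batch["feature_name"], batch["qv"]):
            if qv >= qv_min and cell != unassigned:
                yield cell, gene

    # Pass 1: discover the row (cell) and column (gene) vocabularies.
    cells, genes = set(), set()
    for batch in make_batches():
        for cell, gene in kept(batch):
            cells.add(cell)
            genes.add(gene)
    cell_index = {c: i for i, c in enumerate(sorted(cells))}
    gene_index = {g: j for j, g in enumerate(sorted(genes))}

    # Pass 2: accumulate counts keyed by (row, column).
    counts = Counter()
    for batch in make_batches():
        for cell, gene in kept(batch):
            counts[(cell_index[cell], gene_index[gene])] += 1
    return list(cell_index), list(gene_index), counts


# Toy data standing in for transcripts.parquet.
toy = [
    {"cell_id": ["a", "b", "UNASSIGNED"], "feature_name": ["EPCAM", "EPCAM", "SFTPC"], "qv": [30.0, 10.0, 40.0]},
    {"cell_id": ["a", "b"], "feature_name": ["SFTPC", "SFTPC"], "qv": [25.0, 20.0]},
]
cells, genes, counts = two_pass_counts(lambda: iter(toy))
```

On a real file, `make_batches` could be `lambda: parquet_batches(path, ["cell_id", "feature_name", "qv"])`, and the `counts` mapping can be turned into a SciPy sparse matrix with `scipy.sparse.coo_matrix((values, (rows, cols)), shape=(len(cells), len(genes))).tocsr()`. Only the vocabularies and the non-zero counts stay in memory, never the full transcript table.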

Repository Contents

scripts/unpack_all.py — Extracts Xenium data from zipped output bundles.
notebooks/preview_quickstart.ipynb — Walk-through notebook to:
- Load raw cells.parquet and transcripts.parquet, plus image and metrics files
- Filter and build a sparse count matrix
- Construct and QC an AnnData object
- Normalize, cluster, and visualize spatial patterns
- Compute neighborhood enrichment and save results
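
The downstream steps listed above could look roughly like the following sketch. It assumes an AnnData object with raw counts in `adata.X` and cell centroids in `adata.obsm["spatial"]`; the function name, the output path, and every threshold are placeholders chosen for illustration, not the values used in the notebooks.

```python
def run_downstream(adata, out_path="results/xenium_processed.h5ad",
                   min_counts=10, min_cells=5, n_pcs=30):
    """Sketch of QC, normalization, clustering and neighborhood enrichment."""
    # Heavy dependencies are imported inside the function so the definition
    # itself can be loaded without the full analysis stack installed.
    import scanpy as sc
    import squidpy as sq

    # QC: per-cell totals and detected genes, then drop sparse cells and genes.
    # percent_top=None avoids the default top-N summaries, which can exceed
    # the number of genes in a targeted panel.
    sc.pp.calculate_qc_metrics(adata, percent_top=None, inplace=True)
    sc.pp.filter_cells(adata, min_counts=min_counts)
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Keep the raw counts in a layer, then normalize and log-transform.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)

    # Low-dimensional embedding and Leiden clustering.
    sc.pp.pca(adata, n_comps=n_pcs)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, key_added="leiden")

    # Spatial graph built from centroids, then cluster-by-cluster enrichment.
    sq.gr.spatial_neighbors(adata, coord_type="generic")
    sq.gr.nhood_enrichment(adata, cluster_key="leiden")

    adata.write_h5ad(out_path)
    return adata
```

After a run, `sq.pl.nhood_enrichment(adata, cluster_key="leiden")` and `sq.pl.spatial_scatter(adata, shape=None, color="leiden")` are one way to plot the enrichment matrix and the clusters in tissue coordinates.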

Getting Started

Clone this repository

git clone https://github.com/jrs-orellana/xenium2anndata-analysis-workflow

Download the Xenium dataset ZIP from the link above and place it in data/.
Run python scripts/unpack_all.py to extract the dataset.
Install dependencies (see below).
Open and run the notebooks to process the data:
- 01_xenium_raw2anndata.ipynb — conversion from raw Parquet → AnnData
- 02_xenium_downstream.ipynb — QC, clustering, marker detection, spatial plots
Explore results in results/figures/ and the processed .h5ad file.
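
For orientation, the unpacking step can be approximated with the standard library alone. This is a sketch of what a script like scripts/unpack_all.py might do, not its actual contents; the function signature, the one-folder-per-archive layout, and the skip-if-present behavior are assumptions.

```python
import zipfile
from pathlib import Path


def unpack_all(data_dir="data", overwrite=False):
    """Extract every .zip in `data_dir` into a sibling folder named after the archive."""
    extracted = []
    for archive in sorted(Path(data_dir).glob("*.zip")):
        target = archive.with_suffix("")  # data/foo.zip -> data/foo/
        if target.exists() and not overwrite:
            continue  # already unpacked; skipping keeps reruns cheap
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling `unpack_all("data")` returns the list of folders it created, so a second call on the same directory returns an empty list.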

Dependencies

Main packages required (see full requirements.txt for exact versions):

scanpy — single-cell analysis and visualization
squidpy — spatial transcriptomics analysis
spatialdata-io — Xenium/Visium data import
pyarrow — efficient batch processing of Parquet files
numpy and pandas — data handling
matplotlib and seaborn — visualization

Install via:

pip install -r requirements.txt
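
After installing, a quick check that the main packages resolved can save a failed notebook run. The sketch below uses only the standard library; the package list mirrors the names above, and the function name is hypothetical.

```python
from importlib import metadata

REQUIRED = ["scanpy", "squidpy", "spatialdata-io", "pyarrow",
            "numpy", "pandas", "matplotlib", "seaborn"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'MISSING'}")
```

Any package reported as MISSING should be compared against requirements.txt before opening the notebooks.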

Figures

Quality Control

Panels: Total Counts, Genes per Cell, Counts vs Genes

Spatial Density & Dimensionality Reduction

Panels: Post-QC Density, PCA Scree, UMAP (Leiden)

Gene Count Gradients & Global Clustering

Panels: UMAP colored by genes per cell, UMAP colored by total counts, Spatial Leiden

Cell-Type Inference & Compartments

Panels: Cell-Type Scores, Compartments (High)

Markers, Enrichment & Sizes

Panels: Marker Dotplot, Marker Heatmap, Neighborhood Enrichment

Citation

If you use this repository or adapt parts of the workflow, please cite it as:

APA style:

Orellana-Montes, J. (2025). xenium2anndata-analysis-workflow: Transparent pipeline for Xenium spatial transcriptomics. GitHub. Available at: https://github.com/jrs-orellana/xenium2anndata-analysis-workflow

BibTeX:

@misc{xenium2anndata2025,
  author       = {Julio Orellana-Montes},
  title        = {xenium2anndata-analysis-workflow: Transparent pipeline for Xenium spatial transcriptomics},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/jrs-orellana/xenium2anndata-analysis-workflow}}
}

Attribution

Dataset: Xenium Human Lung Preview — Non-diseased FFPE, 10x Genomics (licensed under CC BY 4.0).
Please cite per 10x Genomics citation guidelines.

Repo Summary

Name: xenium2anndata-analysis-workflow
Purpose: Detailed, manual parsing and processing of Xenium raw data
Strengths: Transparency, flexibility, reproducibility over convenience

License

This project is released under the MIT License. See LICENSE for details.

Dataset belongs to 10x Genomics and is licensed under CC BY 4.0.

Contact

Author: Julio Orellana-Montes
For questions, suggestions, or collaborations: open an issue or pull request on GitHub,
or contact me at [email protected]
