MetaX Cookbook

This guidebook is for the MetaX GUI version. If you are using the CLI, we recommend reading the documentation for instructions on how to use each MetaX module from the command line.

Overview

MetaX is a novel tool for linking peptide sequences to taxonomic and functional information in metaproteomics. We introduce the Operational Taxon-Function (OTF) concept to explore microbial roles and interactions ("who is doing what and how") within ecosystems.

MetaX also features statistical modules and plotting tools for analyzing peptides, taxa, functions, proteins, and taxon-function contributions across groups.

[Figure: abstract]

Project Page

Visit GitHub to get more information:

https://github.com/byemaxx/MetaX

Getting Started

[Figures: main window; Tools menu]


Exploring Data with MetaX

See the Preparing Your Data section to build the database and annotate peptides to OTFs before starting.

Module 1. OTF Analyzer

After obtaining the Operational Taxa-Functions (OTF) Table using the Peptide Annotator, you can perform downstream analysis with the OTF Analyzer.

1. Data Preparation

OTFs (Operational Taxa-Functions) Table: Obtained from the Peptide Annotator module.

Meta Table: The first column contains the sample names, and each remaining column defines one grouping of the samples. If no meta table is provided, meta info is generated automatically with two groupings: (1) all samples in the same group; (2) each sample as a separate group.

Example Meta Table:

samples Individuals Treatment Sweetener
sample_1 V1 Treatment XYL
sample_2 V1 Treatment XYL
sample_3 V1 Treatment XYL
sample_4 V1 Control PBS
sample_5 V1 Control PBS
sample_6 V1 Control PBS
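As a rough illustration of the layout described above, the sketch below parses a meta table and builds the automatic fallback when no table is given. It assumes the file is tab-separated, and the fallback column names `all_samples` and `each_sample` are hypothetical; MetaX may name and store these differently.

```python
import csv
import io


def read_meta_table(text):
    """Parse a tab-separated meta table into {sample: {group_column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    sample_col = reader.fieldnames[0]  # the first column holds the sample names
    meta = {}
    for row in reader:
        row = dict(row)
        sample = row.pop(sample_col)
        meta[sample] = row
    return meta


def default_meta(samples):
    """Fallback without a meta table: one shared group and one group per sample.

    The column names used here are hypothetical.
    """
    return {s: {"all_samples": "all", "each_sample": s} for s in samples}


example = (
    "samples\tIndividuals\tTreatment\tSweetener\n"
    "sample_1\tV1\tTreatment\tXYL\n"
    "sample_4\tV1\tControl\tPBS\n"
)
meta = read_meta_table(example)
fallback = default_meta(["sample_1", "sample_4"])
```

Each meta column (here `Individuals`, `Treatment`, `Sweetener`) is then available as a grouping for the later statistics and plots.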

You can load example data by clicking the button.

[Figure: load example]

Then, click Go to start the analysis.

2. Data Overview

The Data Overview provides basic information about your data, such as the number of taxa and functions and their proportions.

[Figures: data overview; data overview of functions; data overview filter]

3. Set TaxaFunc

[Figure: set multi table]

Data Selection

[Figure: FUNC_prop]

Sum Proteins Intensity

If the original table contains a Protein column, click Generate Protein Intensity Table to sum peptide intensities to the protein level.
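The idea behind summing peptides to proteins can be sketched as follows. This is a minimal illustration, not the MetaX implementation: it assumes each peptide row names exactly one protein in its `Protein` column, and the `Sequence` column and sample names are made up for the example. How MetaX treats peptides shared between several proteins is not covered here.

```python
from collections import defaultdict


def sum_peptides_to_proteins(rows, samples):
    """Sum peptide intensities per protein for every sample column."""
    totals = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for row in rows:
        protein = row["Protein"]
        for sample in samples:
            totals[protein][sample] += float(row[sample])
    return dict(totals)


# Hypothetical peptide table with two samples.
peptides = [
    {"Sequence": "AAK", "Protein": "MGYG000000001_00696", "sample_1": 10.0, "sample_2": 5.0},
    {"Sequence": "CCR", "Protein": "MGYG000000001_00696", "sample_1": 2.5, "sample_2": 0.0},
    {"Sequence": "DDK", "Protein": "MGYG000000001_02838", "sample_1": 1.0, "sample_2": 4.0},
]
protein_table = sum_peptides_to_proteins(peptides, ["sample_1", "sample_2"])
```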

Data preprocessing

There are several methods for detecting and handling outliers.

In all methods, you can choose one meta column for outlier detection and another meta column for handling outliers.

You can choose outlier imputation by each group or by all samples.

If you normalize the data with Z-Score, Mean centring, or Pareto Scaling, the result can contain negative values, so a minimum offset is applied to the data again afterwards to avoid them.
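To see why an offset is needed, the sketch below applies the three normalizations to one row of intensities and then shifts the result so that its smallest value is zero. The formulas are the textbook definitions; the exact offset MetaX applies is not described here, so `shift_to_non_negative` is an assumption for illustration only.

```python
import math
import statistics


def mean_center(values):
    """Subtract the mean from every value."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def z_score(values):
    """Mean-center, then divide by the sample standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in mean_center(values)]


def pareto_scale(values):
    """Mean-center, then divide by the square root of the standard deviation."""
    root_sd = math.sqrt(statistics.stdev(values))
    return [v / root_sd for v in mean_center(values)]


def shift_to_non_negative(values):
    """Assumed offset: subtract the minimum when it is negative."""
    lowest = min(values)
    return [v - lowest for v in values] if lowest < 0 else list(values)


intensities = [2.0, 4.0, 6.0]
scaled = z_score(intensities)            # contains a negative value
shifted = shift_to_non_negative(scaled)  # smallest value becomes zero
```

All three methods center the data on zero, so roughly half of the values become negative before the offset is applied.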

Then, click Go to create a TaxaFunc object for analysis.

[Figure: TaxaFunc ready]

Then you can check the tables in the Table Review section and export them.

[Figures: table review; table review in an opened window]

4. Basic Stats

PCA, Correlation and Box Plot

[Figure: Basic Stats, PCA]

You can select meta groups or samples (default: all) to plot PCA, Correlation, and Box Plot for Taxa, Function, Taxa-Func, Peptide, and Protein tables.

[Figures: PCA; 3D PCA; correlation; box plot]

Heatmap and Bar Plot

[Figures: add to list; add top list; add a list; original heatmap; bar plot; bar plot settings]

Peptide Query

[Figure: peptide query]

5. Cross Test

T-TEST

[Figure: t-test]

ANOVA-TEST

[Figure: ANOVA test]

Significant Taxa-Func

Plot Cross Heatmap

[Figures: t-test results; cross heatmap settings; cross heatmap; t-test heatmap]

Group-Control TEST

Set one group as "Control", then compare all other groups to the control.

Bingo! You have found a hidden function of MetaX. Click Help -> About -> Like three times to unlock the function that compares all groups to the control.

DESeq2

(Ultra-Up and Ultra-Down: |log2FC| > Max log2FC)
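The rule above can be read as a simple classification of each feature by its log2 fold change. The sketch below is one possible reading, not the MetaX code: only the Max log2FC threshold comes from the text, while the parameter names, the default values, the minimum log2FC cutoff, and the adjusted p-value cutoff are assumptions.

```python
def classify_change(log2fc, padj, min_log2fc=1.0, max_log2fc=10.0, alpha=0.05):
    """Label a feature by direction and size of its log2 fold change.

    min_log2fc, alpha, and all default values are hypothetical; only the
    "|log2FC| > Max log2FC" rule for the Ultra labels comes from the text.
    """
    if padj >= alpha or abs(log2fc) < min_log2fc:
        return "Not significant"
    direction = "Up" if log2fc > 0 else "Down"
    if abs(log2fc) > max_log2fc:
        return "Ultra-" + direction
    return direction


labels = [
    classify_change(12.0, 0.001),
    classify_change(-11.0, 0.01),
    classify_change(2.0, 0.01),
    classify_change(2.0, 0.5),
]
```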

Tukey Test

[Figures: Tukey test; taxa-func linked only; Tukey plot]

6. Expression Analysis

Co-Expression Networks & Heatmap

[Figures: co-expression network; bar plot stack switch; bar plot to line plot]

Taxa-Func Network

[Figure: Taxa-Func network]

8. Restore Last TaxaFunc Object

Preparing Your Data

Module 2. Database Builder

Note: Results from the MetaLab v2.3 MaxQuant workflow do not require database building. However, we do not recommend using these results as input to MetaX, as many peptides may be discarded.

Option 1: Build Database Using MGnify Data

Ensure you download the correct database type corresponding to your data.

[Figure: Database Builder]

Option 2: Build Database Using Own Data

  1. Annotation Table: A tab-separated (TSV) table. The first column is the protein name, formed by joining the genome name and the protein with "_" (e.g., "Genome1_protein1"); the other columns contain annotation information.
  2. Taxa Table: A tab-separated (TSV) table. The first column is the genome name (e.g., "Genome1") and the second column is the taxa.

[Figure: Database Builder, own data]

Example Annotation Table:

Query Preferred_name EC KEGG_ko
MGYG000000001_00696 mfd - ko:K03723
MGYG000000001_02838 hxlR - -
MGYG000000001_01674 ispG 1.17.7.1,1.17.7.3 ko:K03526
MGYG000000001_02710 glsA 3.5.1.2 ko:K01425
MGYG000000001_01356 mutS2 - ko:K07456
MGYG000000001_02630 - - -
MGYG000000001_02418 ackA 2.7.2.1 ko:K00925
MGYG000000001_00728 atpA 3.6.3.14 ko:K02111
MGYG000000001_00695 pth 3.1.1.29 ko:K01056
MGYG000000001_02907 - - ko:K03086
MGYG000000001_02592 rplC - ko:K02906
MGYG000000001_00137 - - ko:K03480,ko:K03488
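Before building a database from your own data, it can help to check that every protein ID follows the "Genome_protein" pattern. The sketch below parses a tab-separated annotation table and splits each ID at its first "_". Splitting at the first underscore and treating "-" as a missing annotation are assumptions that fit the example above; they are not confirmed MetaX behaviour.

```python
import csv
import io


def parse_annotation_table(text):
    """Return {protein_id: {"genome": ..., annotation columns...}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    id_col = reader.fieldnames[0]  # first column: genome and protein joined by "_"
    records = {}
    for row in reader:
        protein_id = row[id_col]
        # Assumption: the genome name is everything before the first "_".
        genome, _, protein = protein_id.partition("_")
        if not genome or not protein:
            raise ValueError(f"ID is not of the form Genome_protein: {protein_id!r}")
        annotations = {
            col: (None if value == "-" else value)  # assumption: "-" means missing
            for col, value in row.items()
            if col != id_col
        }
        records[protein_id] = {"genome": genome, **annotations}
    return records


table = (
    "Query\tPreferred_name\tEC\tKEGG_ko\n"
    "MGYG000000001_00696\tmfd\t-\tko:K03723\n"
    "MGYG000000001_01674\tispG\t1.17.7.1,1.17.7.3\tko:K03526\n"
)
records = parse_annotation_table(table)
```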

Example Taxa Table:

Genome Lineage
MGYG000000001 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_GCA-900066495;s_GCA-900066495 sp902362365
MGYG000000002 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales;f_Lachnospiraceae;g_Blautia_A;s_Blautia_A faecis
MGYG000000003 d_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;f_Rikenellaceae;g_Alistipes;s_Alistipes shahii
MGYG000000004 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Oscillospirales;f_Ruminococcaceae;g_Anaerotruncus;s_Anaerotruncus colihominis
MGYG000000005 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_Terrisporobacter;s_Terrisporobacter glycolicus_A
MGYG000000006 d_Bacteria;p_Firmicutes;c_Bacilli;o_Staphylococcales;f_Staphylococcaceae;g_Staphylococcus;s_Staphylococcus xylosus
MGYG000000007 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus intestinalis
MGYG000000008 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus johnsonii
MGYG000000009 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Ligilactobacillus;s_Ligilactobacillus murinus
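The lineage strings in the taxa table can be split into ranks as sketched below. This is only an illustration of the format shown above: the rank names in `RANKS` are the usual meanings of the one-letter prefixes, and the function accepts both a single and a double underscore after the prefix, since lineage files differ on this.

```python
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Split 'd_Bacteria;p_Firmicutes_A;...' into {rank: name}."""
    taxa = {}
    for part in lineage.split(";"):
        prefix, _, name = part.partition("_")  # split at the first underscore only
        if prefix not in RANKS:
            raise ValueError(f"Unknown rank prefix in {part!r}")
        taxa[RANKS[prefix]] = name.lstrip("_")  # tolerate 'd__Bacteria' as well
    return taxa


lineage = (
    "d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales;"
    "f_Lachnospiraceae;g_Blautia_A;s_Blautia_A faecis"
)
taxa = parse_lineage(lineage)
```

Because only the first underscore is used as the separator, names that contain underscores themselves, such as "Firmicutes_A", are kept intact.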

Module 3. Database Updater

The Database Updater lets you update the database built by the Database Builder or add more annotations to it. This step is optional.

[Figure: Database Updater]

Option 1: Built-in Mode

We recommend some extended databases, such as dbCAN_seq.

Option 2: TSV Table

Extend the database by adding annotations from a new database to the database table. Ensure that the column separator is a tab and that the first column is the protein name, with the other columns containing function annotations.

Example:

Protein ID            COG          KEGG         ...
MGYG000000001_02630   Function 1   Function 1   ...
MGYG000000001_01475   Function 2   Function 1   ...
MGYG000000001_01539   Function 3   Function 1   ...
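Conceptually, the TSV option adds new annotation columns to proteins that already exist in the database. The sketch below shows that idea on plain dictionaries. It is not the MetaX updater: the function name, the in-memory representation, and the handling of unmatched IDs are all assumptions made for illustration.

```python
def extend_annotations(database, new_rows, id_col="Protein ID"):
    """Add the columns of new_rows to matching proteins in database.

    Returns the extended copy and the list of protein IDs that were not found.
    """
    updated = {pid: dict(ann) for pid, ann in database.items()}
    unmatched = []
    for row in new_rows:
        pid = row[id_col]
        if pid not in updated:
            unmatched.append(pid)
            continue
        for col, value in row.items():
            if col != id_col:
                updated[pid][col] = value
    return updated, unmatched


# Hypothetical existing database and extension table.
database = {"MGYG000000001_02630": {"KEGG_ko": "-"}}
new_rows = [
    {"Protein ID": "MGYG000000001_02630", "COG": "Function 1"},
    {"Protein ID": "MGYG000000001_99999", "COG": "Function 2"},
]
updated, unmatched = extend_annotations(database, new_rows)
```

Reporting unmatched IDs makes it easier to spot an extension table whose first column does not use the same protein names as the database.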

Module 4. Peptide Annotator

1. Results from MAG Workflow

Use this option for peptide results from workflows that search against metagenome-assembled genomes (MAGs) as the reference protein database, such as DIA-NN, MetaLab-MAG, MetaLab-DIA, and other workflows that use MAG databases like MGnify or custom MAG databases.

[Figure: peptide2taxafunc]

Required:

2. Results from MaxQuant Workflow

These peptide results come from the MetaLab 2.3 MaxQuant workflow.

[Figures: peptide2taxafunc tab 2, parts 1 and 2]


Developer Tools

[Figure: show console]

Enjoy MetaX

If you have any issues or suggestions, please open a new issue on GitHub.