MetaX Cookbook
This guidebook is for the MetaX GUI version. If you are using the CLI, we recommend reading the documentation for instructions on how to use each MetaX module from the command line.
Overview
MetaX is a novel tool for linking peptide sequences with taxonomic and functional information in Metaproteomics. We introduce the Operational Taxon-Function (OTF) concept to explore microbial roles and interactions ("who is doing what and how") within ecosystems.
MetaX also features statistical modules and plotting tools for analyzing peptides, taxa, functions, proteins, and taxon-function contributions across groups.
Project Page
Visit GitHub to get more information:
https://github.com/byemaxx/MetaX
Getting Started
- The main window of MetaX
- Click 'Tools Menu' to switch between modules
Exploring Data with MetaX
See the Preparing Your Data section to build the database and annotate peptides to OTFs before starting.
Module 1. OTF Analyzer
After obtaining the Operational Taxa-Functions (OTF) Table using the Peptide Annotator, you can perform downstream analysis with the OTF Analyzer.
1. Data Preparation
OTFs (Operational Taxa-Functions) Table: Obtained from the Peptide Annotator module.
Meta Table: The first column is sample names, and the other columns represent different groups. If no meta table is provided, meta info will be generated automatically: (1) all samples are in the same group; (2) each sample is a separate group.
Example Meta Table:
| samples | Individuals | Treatment | Sweetener |
|---|---|---|---|
| sample_1 | V1 | Treatment | XYL |
| sample_2 | V1 | Treatment | XYL |
| sample_3 | V1 | Treatment | XYL |
| sample_4 | V1 | Control | PBS |
| sample_5 | V1 | Control | PBS |
| sample_6 | V1 | Control | PBS |
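A meta table like the one above can also be produced with a short script. The following is a minimal sketch only: it assumes a tab-separated file is acceptable, and the fallback column values (`all_samples`, one group per sample) merely illustrate the two automatic groupings described above; they are not MetaX's actual names.

```python
import csv
import io

HEADER = ["samples", "Individuals", "Treatment", "Sweetener"]
ROWS = [
    ("sample_1", "V1", "Treatment", "XYL"),
    ("sample_2", "V1", "Treatment", "XYL"),
    ("sample_3", "V1", "Treatment", "XYL"),
    ("sample_4", "V1", "Control", "PBS"),
    ("sample_5", "V1", "Control", "PBS"),
    ("sample_6", "V1", "Control", "PBS"),
]


def write_meta_table(handle, header, rows):
    """Write a meta table with sample names in the first column (tab-separated is an assumption)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def default_meta(sample_names):
    """Illustrative fallback: one column with a single shared group, one with a group per sample."""
    return [(name, "all_samples", name) for name in sample_names]


buffer = io.StringIO()
write_meta_table(buffer, HEADER, ROWS)
print(buffer.getvalue())
print(default_meta(["sample_1", "sample_2"]))
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.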
You can load example data by clicking the button.
Then, click Go to start the analysis.
- Advanced Settings

- Peptide Column Name: Specifies the column in the OTF table that contains peptide information.
- Protein Column Name: Specifies the column in the OTF table that contains protein information (only required if protein summation is performed in downstream analysis).
- Sample Column Prefix: Identifies the prefix of sample columns to determine intensity columns in the OTF table.
- Any Data Mode: Allows analysis of any table using MetaX, not limited to OTF tables (only partial tool functionality is available).
- Customized Table Item Column Name: Specifies the column containing item names in any data mode. If left empty, the first column will be selected by default.
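As a rough illustration of how a sample column prefix and an item column could be resolved from a table header, consider the sketch below. The prefix `Intensity_` is taken from the example peptide table in this guide; the function names and the `LCA_level` column are hypothetical and are not part of MetaX.

```python
def intensity_columns(header, prefix="Intensity_"):
    """Return the columns whose names start with the sample prefix."""
    return [column for column in header if column.startswith(prefix)]


def item_column(header, name=""):
    """Return the requested item column, or the first column when no name is given."""
    if not name:
        return header[0]
    if name not in header:
        raise KeyError(f"column {name!r} not found in table")
    return name


# Hypothetical header for illustration.
header = ["Sequence", "Proteins", "Intensity_V1_01", "Intensity_V1_02", "LCA_level"]
print(intensity_columns(header))
print(item_column(header))
```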
2. Data Overview
The Data Overview provides basic information about your data, such as the number of taxa, functions, and proportions.
- Set the threshold for linked peptides and the differences between them to plot figures.
- Select different functions to plot the proportion distribution.
- Filter out samples for downstream analysis.
3. Set TaxaFunc
Data Selection
- Function: Select a function for downstream analysis (None in the list means no function is selected, focusing only on peptides and taxa).
- Function Filter Threshold: A function is considered the representative function of a peptide if it has the highest proportion within the peptide's protein group and that proportion reaches the threshold. The default threshold is 1.00 (100%).
- Taxa Level: Select a taxa level for downstream analysis (Life in the list means no filtering by taxa, and the following analysis focuses on functions).
- Peptide Number Threshold: Only keep taxa, functions, or OTFs that have at least the specified number of peptides.
- Split Function: Split annotations that contain multiple functions. For example:

| KO | Intensity |
|---|---|
| ko:K00625,ko:K13788 | 10 |

is split to

| KO | Intensity |
|---|---|
| ko:K00625 | 10 |
| ko:K13788 | 10 |

  If Share Intensity is checked, the intensity above will be split equally, giving 5 to each KO.
- Remove unknown taxa: Checked by default. When enabled, peptides that are not annotated to the selected taxonomic level will be removed. When unchecked, such peptides will be retained and labeled as unknown, for example:
  d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__UMGS363;s_
  to
  d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__UMGS363;s_unknown
- Create Taxa and Func only from OTFs:
  - Without selection (checkbox not checked):
    - Taxa table: Peptides are filtered based solely on taxa levels, without considering any functional categories.
    - Function table: Peptides are filtered solely by functional categories and thresholds, regardless of their taxa levels.
    - Taxa-Function (OTFs) table: Peptides are filtered by both taxa levels and functional categories simultaneously.
  - With selection (checkbox checked):
    - Taxa table: Peptides are filtered by both taxa levels and functional categories simultaneously.
    - Function table: Peptides are filtered by both taxa levels and functional categories simultaneously.
    - Taxa-Function (OTFs) table: Peptides are filtered by both taxa levels and functional categories simultaneously.
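The Split Function and Share Intensity options described above can be sketched as follows. This is an illustration under the assumption that multi-function annotations are comma-separated, as in the KO example; `split_function` is a hypothetical helper, not a MetaX function.

```python
def split_function(rows, share_intensity=False, separator=","):
    """Split multi-function annotations into one row per function.

    rows: iterable of (annotation, intensity) pairs.
    share_intensity: divide the intensity equally among the split functions
    instead of copying the full value to each of them.
    """
    result = []
    for annotation, intensity in rows:
        # Fall back to the original label if nothing remains after splitting.
        parts = [part.strip() for part in annotation.split(separator) if part.strip()] or [annotation]
        value = intensity / len(parts) if share_intensity else intensity
        result.extend((part, value) for part in parts)
    return result


rows = [("ko:K00625,ko:K13788", 10)]
print(split_function(rows))                        # each KO keeps 10
print(split_function(rows, share_intensity=True))  # each KO receives 5.0
```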
Sum Proteins Intensity
Click Generate Protein Intensity Table to sum peptides to proteins if the Protein column is in the original table.
- Occam's Razor, Anti-Razor and Rank: Methods available for assigning shared peptides to proteins.
  - Razor:
    - Build a minimal set of proteins to cover all peptides.
    - For each peptide, choose the protein with the most peptides (if multiple proteins have the same number of peptides, the intensity is shared among them).
  - Anti-Razor:
    - All proteins share the intensity of each peptide.
  - Rank:
    - Build the rank of proteins.
    - Choose the protein with a higher rank for the shared peptide.
    - Methods to Build Protein Rank:
      - unique_counts: Use the counts of proteins inferred by unique peptides.
      - all_count: Use the counts of all proteins.
      - unique_intensity: Use the intensity of proteins inferred by unique peptides.
      - shared_intensity: Use the intensity divided by the number of shared peptides for each protein.
- Minimum peptide number per protein: Filters out proteins that contain fewer peptides than the specified threshold.
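To make the Razor idea more concrete, here is a simplified sketch. It assumes a greedy approximation for the minimal protein set and an equal split on ties; the function names and toy data are hypothetical, and MetaX's own implementation may differ in both steps.

```python
from collections import defaultdict


def greedy_cover(peptide_to_proteins):
    """Greedily pick proteins until every peptide is covered (approximate minimal set)."""
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)
    uncovered = set(peptide_to_proteins)
    chosen = {}
    while uncovered and protein_to_peptides:
        # sorted() keeps tie-breaking deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda protein: len(protein_to_peptides[protein] & uncovered))
        gained = protein_to_peptides[best] & uncovered
        if not gained:  # remaining peptides have no candidate protein
            break
        chosen[best] = protein_to_peptides[best]
        uncovered -= gained
    return chosen


def razor_intensity(peptide_to_proteins, peptide_intensity):
    """Give each peptide to the covering protein with the most peptides; share on ties."""
    cover = greedy_cover(peptide_to_proteins)
    totals = defaultdict(float)
    for peptide, proteins in peptide_to_proteins.items():
        candidates = [protein for protein in proteins if protein in cover]
        if not candidates:
            continue
        top = max(len(cover[protein]) for protein in candidates)
        winners = [protein for protein in candidates if len(cover[protein]) == top]
        for protein in winners:
            totals[protein] += peptide_intensity[peptide] / len(winners)
    return dict(totals)


# Toy data: pep3 is shared, and protein A has more peptides than B.
groups = {"pep1": ["A"], "pep2": ["A"], "pep3": ["A", "B"], "pep4": ["B"]}
intensity = {"pep1": 1.0, "pep2": 2.0, "pep3": 3.0, "pep4": 4.0}
print(razor_intensity(groups, intensity))
```

In the toy data, the shared peptide `pep3` goes entirely to protein `A` because `A` is supported by three peptides and `B` by two.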
Data preprocessing
- Quantitative Method:
  - Sum: Sum the peptide intensities directly to Taxa, Functions or OTFs intensity.
  - DirectLFQ: Use DirectLFQ to normalize peptides and then estimate intensity using intensity traces.
- Outlier handling: There are several methods for detecting and handling outliers. Two steps will be applied:
  - Outlier Detection: Users can select a method to mark outlier values as NaN. Rows that contain only NaN values and 0 will then be removed. The remaining NaN values will be handled in the next step.
  - Outlier Handling: Users can choose a method to fill the remaining NaN values.
- Outlier Detection:
  - IQR: In a group, if the value is greater than Q3+1.5*IQR or less than Q1-1.5*IQR, the value will be marked as NaN.
  - Missing-Value: Only values that are already missing (NaN) in the data are marked as NaN.
  - Half-Zero: Applies to grouped data.
    - If more than half of the values in a group are zero, all non-zero values are replaced with NaN.
    - If fewer than half of the values are zero, all zero values are replaced with NaN.
    - If the number of zero and non-zero values is equal, all values in the group are replaced with NaN.
  - Zero-Dominant: Applies to grouped data.
    - If more than half of the values in a group are zero, all non-zero values are replaced with NaN.
    - Otherwise, the group remains unchanged.
  - Zero-Inflated Poisson: This method is based on the Zero-Inflated Poisson (ZIP) model, which is used when the data contains more zeros than expected under a standard Poisson model. The ZIP model is fitted to the data and then used to predict the data values. If the predicted value is less than 0.01, the data point is marked as an outlier (NaN).
  - Negative Binomial: This method is based on the Negative Binomial model, which is used when the variance of the data is greater than the mean. As with the ZIP method, the model is fitted to the data and then used to predict the data values. If the predicted value is less than 0.01, the data point is marked as an outlier (NaN).
  - Z-Score: The Z-score measures how far a data point is from the mean in terms of standard deviations. Outliers are often identified as points with Z-scores greater than 2.5 or less than -2.5.
  - Mahalanobis Distance: Mahalanobis distance measures the distance between a point and a distribution, considering the correlation among variables. Outliers can be identified as points with a Mahalanobis distance that exceeds a certain threshold.
In all methods, you can choose one meta column for outlier detection and another meta column for handling outliers.
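The rule-based detection methods above (IQR, Half-Zero, Zero-Dominant) can be sketched for the values of one item within one group. This is an illustration only: the quartile convention (`method="inclusive"`) is an assumption, and the function names are hypothetical rather than MetaX's own.

```python
import math
import statistics

NAN = float("nan")


def iqr_mark(values):
    """Mark values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as NaN."""
    if len(values) < 2:  # quartiles are not defined for a single value here
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [NAN if (v < low or v > high) else v for v in values]


def half_zero(values):
    """Half-Zero rule: the minority side (zero or non-zero) becomes NaN; a tie clears all."""
    zeros = sum(1 for v in values if v == 0)
    non_zeros = len(values) - zeros
    if zeros > non_zeros:
        return [v if v == 0 else NAN for v in values]
    if zeros < non_zeros:
        return [NAN if v == 0 else v for v in values]
    return [NAN] * len(values)


def zero_dominant(values):
    """Zero-Dominant rule: only act when more than half of the values are zero."""
    zeros = sum(1 for v in values if v == 0)
    if zeros > len(values) / 2:
        return [v if v == 0 else NAN for v in values]
    return list(values)


print(iqr_mark([10, 11, 12, 13, 100]))  # the last value is marked as NaN
print(half_zero([0, 0, 5]))             # the single non-zero value is marked
print(zero_dominant([0, 5]))            # unchanged, zeros are not a majority
```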
- Outliers Imputation:
  - Drop: Remove peptides that contain any NaN values.
  - Original: Keep the remaining NaN values as-is.
  - Mean: Outliers will be imputed by the mean.
  - Median: Outliers will be imputed by the median.
  - KNN: Outliers will be imputed by KNN (K=5). The K-Nearest Neighbors algorithm uses the mean or median of the nearest neighbours to fill in missing values.
  - Regression: Outliers will be imputed by using IterativeImputer with regression method. This method uses round-robin linear regression, modelling each feature with missing values as a function of other features.
  - Multiple: Outliers will be imputed by using IterativeImputer with multiple imputations method. It uses the IterativeImputer with a specified number (K=5) of the nearest features.
You can choose outlier imputation by each group or by all samples.
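A minimal sketch of Mean and Median imputation, applied either per group or across all samples, is shown below. It assumes imputation runs along one item's values across samples; the function names and sample labels are hypothetical.

```python
import math
import statistics


def impute(values, method="mean"):
    """Fill NaN entries with the mean or median of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:  # nothing to estimate from, leave the values untouched
        return list(values)
    fill = statistics.mean(observed) if method == "mean" else statistics.median(observed)
    return [fill if math.isnan(v) else v for v in values]


def impute_by_group(row, groups, method="mean"):
    """row: sample -> value; groups: sample -> group label. Impute inside each group."""
    samples_by_group = {}
    for sample, group in groups.items():
        samples_by_group.setdefault(group, []).append(sample)
    result = dict(row)
    for samples in samples_by_group.values():
        filled = impute([row[sample] for sample in samples], method)
        result.update(zip(samples, filled))
    return result


nan = float("nan")
row = {"s1": 2.0, "s2": nan, "s3": 4.0, "s4": 10.0, "s5": nan}
groups = {"s1": "XYL", "s2": "XYL", "s3": "XYL", "s4": "PBS", "s5": "PBS"}
print(impute_by_group(row, groups))   # s2 -> 3.0, s5 -> 10.0
print(impute(list(row.values())))     # all samples pooled together
```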
- Remove Batch Effect:
  - Here, you can choose a group as the batch effect and then use reCombat to handle it.
- Data Transformation:
  - Log2, Log10, square root, cube root, and Box-Cox transformations.
- Data Normalization:
  - Trace Shifting: Reframing the normalization problem with intensity traces (inspired by DirectLFQ).
    - Note: If both trace shifting and transformation are applied, normalization will be done before transformation.
  - Standard Scaling (Z-Score), Min-Max Scaling, Pareto Scaling, Mean centring, and normalization by percentage.
  If you use Z-Score, Mean centring, or Pareto Scaling for data normalization, a minimum offset will be added to the data afterwards to avoid negative values.
- Drag the item's name to change the order of data preprocessing.
Then, click Go to create a TaxaFunc object for analysis.
Then you can check the tables in the Table Review section and export them.
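The scaling options listed under Data Normalization, and the offset that keeps values non-negative, can be illustrated on a single vector of intensities. This sketch uses the textbook definitions with the sample standard deviation and assumes the offset simply subtracts the minimum; the exact axis, estimator, and offset used by MetaX are not specified in this guide.

```python
import math
import statistics


def mean_centre(values):
    mean = statistics.mean(values)
    return [v - mean for v in values]


def z_score(values):
    """Standard scaling: centre, then divide by the standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in mean_centre(values)] if sd else mean_centre(values)


def pareto(values):
    """Pareto scaling: centre, then divide by the square root of the standard deviation."""
    sd = statistics.stdev(values)
    return [v / math.sqrt(sd) for v in mean_centre(values)] if sd else mean_centre(values)


def min_max(values):
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values] if high > low else [0.0] * len(values)


def shift_non_negative(values):
    """Assumed offset: subtract the minimum when it is negative."""
    lowest = min(values)
    return [v - lowest for v in values] if lowest < 0 else list(values)


data = [2.0, 4.0, 6.0]
print(z_score(data))                      # [-1.0, 0.0, 1.0]
print(shift_non_negative(z_score(data)))  # [0.0, 1.0, 2.0]
print(min_max(data))                      # [0.0, 0.5, 1.0]
```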
4. Basic Stats
PCA, Correlation and Box Plot
You can select meta groups or samples (default: all) to plot PCA, Correlation, and Box Plot for Taxa, Function, Taxa-Func, Peptide, and Protein tables.
- Setting and modifying the plot
  - Show or hide labels in the figure by checking Show Labels.
  - Select Sub Meta to plot with two meta columns.
  - Change settings in the PLOT PARAMETER tab.
- Select specific groups with a condition. For example: select PBS, BAS, and other groups only in Individual V1.
- Select specific samples to analyze.
- Number stats
  - Plot the counts for each table by groups or by samples.
- Taxa Specific
  - Alpha/Beta Diversity
  - Sunburst
  - TreeMap
  - Sankey
Heatmap and Bar Plot
- Select items (Taxa, Function, Taxa-Func, and Peptide) to plot:
  - Add All Taxa, or select the ones you are interested in.
  - Add items to Top List: Select the top items to plot using a statistical method.
  - Clicking filter with threshold filters by the adjusted p-value of ANOVA and T-TEST, and by the adjusted p-value and Log2FC of DESeq2 results (configured on the corresponding page).
  - Add a list for plotting: make sure each row contains one item.
- Setting:
  - Change the settings to fit your data.
  - Rename Samples: Add group info to each sample name.
  - Rename Taxa: Only keep the last taxonomic level to shorten the name.
  - Plot Mean: Calculate the mean of each group before plotting.
  - Sub Meta: Select a second meta, then combine the two metas by mean for the Heatmap and 3D bar plot.
  - View all color maps by right-clicking Theme.
- Plot:
  - Adjust the figure to fit the window to get the best picture.
- Bar Plot:
  - Interactive functions
  - Change to a line plot
- 3D Bar Plot
  - Plot a 3D bar by selecting a sub meta.

Peptide Query
- Query all information linked to a peptide
5. Cross Test
T-TEST
- Select two groups for T-test analysis on Taxa, Function, Taxa-Func, Peptide, and Protein tables.
ANOVA-TEST
- Select some groups or all groups to run ANOVA on Taxa, Function, Taxa-Func, and Peptide tables.
Significant Taxa-Func
- Significant comparison helps identify cases where taxa show no significant differences between two groups, while their related functions are significantly different, and vice versa.
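The comparison described above can be pictured as a filter over taxon-function links. The sketch below assumes adjusted p-values are already available for each taxon and each function and uses a hypothetical cutoff of 0.05; the names and data are illustrative, not MetaX output.

```python
def taxa_stable_function_changed(taxa_padj, func_padj, links, alpha=0.05):
    """Return (taxon, function) links where the taxon is not significant but the function is."""
    return sorted(
        (taxon, function)
        for taxon, function in links
        if taxa_padj[taxon] >= alpha and func_padj[function] < alpha
    )


def taxa_changed_function_stable(taxa_padj, func_padj, links, alpha=0.05):
    """The reverse case: the taxon is significant while the linked function is not."""
    return sorted(
        (taxon, function)
        for taxon, function in links
        if taxa_padj[taxon] < alpha and func_padj[function] >= alpha
    )


# Hypothetical adjusted p-values and links.
taxa_padj = {"g_Blautia_A": 0.40, "g_Alistipes": 0.01}
func_padj = {"ko:K00925": 0.002, "ko:K02111": 0.30}
links = [("g_Blautia_A", "ko:K00925"), ("g_Blautia_A", "ko:K02111"), ("g_Alistipes", "ko:K02111")]

print(taxa_stable_function_changed(taxa_padj, func_padj, links))
print(taxa_changed_function_stable(taxa_padj, func_padj, links))
```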

Plot Cross Heatmap
- The results of the T-test and ANOVA test will appear in a new window.
- Plot Heatmap for results
  - Choose a table to plot a heatmap of the top differences or export the top table.
- Taxa-Func cross heatmap:
  - Orange cells mean that the corresponding function (X-axis) and taxon (Y-axis) are significantly different between groups.
- Func(Taxa) Heatmap:
  - The colour shows the intensity of the significant Func(Taxa) between groups.
- Significant Taxa-Func Heatmap:
  - The colored tiles represent taxa that were not significantly different between groups while their related functions were.
Group-Control TEST
- Dunnett's Test: Set a group as "Control", then compare all groups to Control.
  - Comparing in Each Condition: Select a meta such as individual, then compare groups to control in each individual.
- DESeq2 Test: Bingo! You noticed the hidden function of MetaX. Click Help -> About -> Like three times to unlock the function that compares all groups to control.
- Result of Dunnett's Test:
  - The T-statistic value is shown in the heatmap.
DESeq2
- Select two groups to calculate fold change with PyDESeq2.
- Select p-adjust and log2FC thresholds to plot (Ultra-Up(Down): |log2FC| > Max log2FC).
- Volcano:
- Sankey:
  - The last node level is the functions linked to each Taxon (when plotting Taxa-Func).
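The thresholds above can be read as a simple classification rule for each item. The sketch below is an assumption-laden illustration: the label names other than Ultra-Up and Ultra-Down, the default cutoffs, and the function name are hypothetical.

```python
import math


def classify(log2fc, padj, padj_cutoff=0.05, min_log2fc=1.0, max_log2fc=10.0):
    """Label one DESeq2 result row from its adjusted p-value and log2 fold change."""
    if padj is None or math.isnan(padj) or padj >= padj_cutoff:
        return "NS"  # not significant
    if abs(log2fc) < min_log2fc:
        return "NS"  # significant, but the change is too small
    direction = "Up" if log2fc > 0 else "Down"
    # Ultra-Up(Down): |log2FC| > Max log2FC
    return f"Ultra-{direction}" if abs(log2fc) > max_log2fc else direction


# Hypothetical results: (log2FC, adjusted p-value)
for log2fc, padj in [(2.3, 0.001), (-12.0, 0.0004), (0.4, 0.01), (3.0, 0.2)]:
    print(log2fc, padj, classify(log2fc, padj))
```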
Tukey Test
- Select a function: Test the significant groups in this function.
- Select a Taxon: Test the significant groups in this taxon.
- Select both function and taxon: Test the significant groups in this function and this taxon.
- Show Linked Taxa Only: Only shows the taxa linked with the current function in the taxa combo box.
- Show Linked Func Only: Only shows the functions linked with the current taxon in the function combo box.
Do not forget to click Reset Function Taxa List to restore all items after filtering.
- Tukey result plot:
  - The dots and lines show the difference in the mean value of the Tukey test.
6. Expression Analysis
Co-Expression Networks & Heatmap
- Select groups or samples to calculate correlations and plot the network.
- Select a table, then set the correlation method and threshold.
- Add some items to the focus list (optional).
- Network Plot
  - The red dots are focus items.
  - The depth of color and the width of edges represent the correlation value.
  - The size of a dot indicates the number of connections.
- Expression correlation

Expression Trends
- Add items to the list window to plot the clusters with similar trends of intensity.
- Clusters plot (clustered by k-means)
  - The coloured line is the average.
- Select a specific cluster to plot interactive lines or get the table.
  - The dashed red line is the average.
7. Taxa-Func Link
Taxa-Func Link Plot
- Check all taxa in one function (or all functions in one taxon).
  - Select a function, and click the button Show Linked Taxa Only.
  - Linked Number: The number shows how many taxa are linked to this function.
  - The number at the start of each Taxa entry shows how many peptides are in this Taxa-Func.
  - Filter items of the Taxa and Func list.
- Plot Heatmap or Bar
  - Select some groups (default: all) to get the intensity of each taxon of this function.
  - Plot peptides in one Function of a Taxon.
  - Switch the bar plot between stacked and unstacked.
  - Change the bar plot to lines.
Taxa-Func Network
- Select some groups or samples (default: all).
- Add some taxa, functions, or taxa-func items to focus the view (optional).
- Plot List Only: Show only the items in the list and the items linked to them.
- Without Links: Only show the items in the focus list.
- Network plot
  - The yellow dots are taxa and the grey dots are functions; the size of a dot represents the intensity.
  - The red dots are the taxa we focused on.
  - The green dots are the functions we focused on.
  - More parameters can be set in Dev->Settings->Others (e.g. Nodes Shape, color, Line Style).
8. Restore Last TaxaFunc Object
- Once you create TaxaFunc, the TaxaFunc Object is saved automatically, and you can restore it next time.
- You can also export the current MetaX object to a file and reload it later.

Preparing Your Data
Module 2. Database Builder
Note: The results from MetaLab v2.3 MaxQuant workflow do not require database building. However, we do not recommend using these results as input to MetaX, as many peptides may be discarded.
- Build the database for the first time using the Database Builder.
Option 1: Build Database Using MGnify Data
Ensure you download the correct database type corresponding to your data.
Option 2: Build Database Using Own Data
- Annotation Table: A TSV table (tab-separated), with the first column as protein name joined with Genome by "_", e.g., "Genome1_protein1", and other columns containing annotation information.
- Taxa Table: A TSV table (tab-separated), with the first column as Genome name, e.g., "Genome1", and the second column as taxa.
Example Annotation Table:
| Query | Preferred_name | EC | KEGG_ko |
|---|---|---|---|
| MGYG000000001_00696 | mfd | - | ko:K03723 |
| MGYG000000001_02838 | hxlR | - | - |
| MGYG000000001_01674 | ispG | 1.17.7.1,1.17.7.3 | ko:K03526 |
| MGYG000000001_02710 | glsA | 3.5.1.2 | ko:K01425 |
| MGYG000000001_01356 | mutS2 | - | ko:K07456 |
| MGYG000000001_02630 | - | - | - |
| MGYG000000001_02418 | ackA | 2.7.2.1 | ko:K00925 |
| MGYG000000001_00728 | atpA | 3.6.3.14 | ko:K02111 |
| MGYG000000001_00695 | pth | 3.1.1.29 | ko:K01056 |
| MGYG000000001_02907 | - | - | ko:K03086 |
| MGYG000000001_02592 | rplC | - | ko:K02906 |
| MGYG000000001_00137 | - | - | ko:K03480,ko:K03488 |
Example Taxa Table:
| Genome | Lineage |
|---|---|
| MGYG000000001 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_GCA-900066495;s_GCA-900066495 sp902362365 |
| MGYG000000002 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales;f_Lachnospiraceae;g_Blautia_A;s_Blautia_A faecis |
| MGYG000000003 | d_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;f_Rikenellaceae;g_Alistipes;s_Alistipes shahii |
| MGYG000000004 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Oscillospirales;f_Ruminococcaceae;g_Anaerotruncus;s_Anaerotruncus colihominis |
| MGYG000000005 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_Terrisporobacter;s_Terrisporobacter glycolicus_A |
| MGYG000000006 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Staphylococcales;f_Staphylococcaceae;g_Staphylococcus;s_Staphylococcus xylosus |
| MGYG000000007 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus intestinalis |
| MGYG000000008 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus johnsonii |
| MGYG000000009 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Ligilactobacillus;s_Ligilactobacillus murinus |
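The link between the two tables is the genome name embedded in each protein name. A minimal sketch of that lookup is shown below; it assumes the protein part after the last "_" contains no underscore itself, and the helper names are hypothetical.

```python
def genome_of(protein_id):
    """Return the genome part of 'Genome_protein' by splitting at the last underscore."""
    genome, separator, protein = protein_id.rpartition("_")
    if not separator or not genome or not protein:
        raise ValueError(f"expected '<genome>_<protein>', got {protein_id!r}")
    return genome


def lineage_of(protein_id, taxa_table):
    """Look up the lineage of a protein through its genome; None if the genome is unknown."""
    return taxa_table.get(genome_of(protein_id))


# Two rows from the example taxa table above.
taxa_table = {
    "MGYG000000001": "d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;"
                     "f_Peptostreptococcaceae;g_GCA-900066495;s_GCA-900066495 sp902362365",
    "MGYG000000003": "d_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;"
                     "f_Rikenellaceae;g_Alistipes;s_Alistipes shahii",
}

print(genome_of("MGYG000000001_00696"))
print(lineage_of("MGYG000000001_00696", taxa_table))
```

Splitting at the last underscore keeps genome names that themselves contain underscores intact.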
Module 3. Database Updater
The Database Updater allows updating the database built by the Database Builder or adding more annotations. This step is optional.
- Update the built database and extend annotations.
Option 1: Built-in Mode
We recommend some extended databases, such as dbCAN_seq.
Option 2: TSV Table
Extend the database by adding a new database to the database table. Ensure the column separator is a tab and the first column is the Protein name, with other columns containing function annotations.
Example:
| Protein ID | COG | KEGG | ... |
|---|---|---|---|
| MGYG000000001_02630 | Function 1 | Function 1 | ... |
| MGYG000000001_01475 | Function 2 | Function 1 | ... |
| MGYG000000001_01539 | Function 3 | Function 1 | ... |
Module 4. Peptide Annotator
1. Results from MAG Workflow
These peptide results use metagenome-assembled genomes (MAGs) as the reference database for protein searches, such as DIA-NN, MetaLab-MAG, MetaLab-DIA, and other workflows that use MAG databases like MGnify or custom MAG databases.
- Annotate the peptide to the Operational Taxa-Functions (OTF) Table before analysis using the Peptide Annotator.
Required:
- Database: The database created by the Database Builder.
- Peptide Table:
  - Option 1: From a search engine that uses metagenome-assembled genomes (MAGs) as the database (e.g., final_peptides.tsv in MetaLab-MAG, xxx_report.pr_matrix.tsv in DIA-NN results).
  - Option 2: Manually create a table with one column for the peptide sequence and another column for the protein group (e.g., MGYG000003683_00301; MGYG000001490_01143) from the MGnify or your own database. The remaining columns should contain the intensity values for each sample.

Example:

| Sequence | Proteins | Intensity_V1_01 | Intensity_V1_02 | Intensity_V1_03 | Intensity_V1_04 |
|---|---|---|---|---|---|
| (Acetyl)KGGVEPQSETVWR | MGYG000002716_01681;MGYG000000195_00452;MGYG000001616_00519;MGYG000002926_00231;... | 714650 | 0 | 0 | 0 |
| (Acetyl)KVIPELNGK | MGYG000003589_01892;MGYG000001560_01812;MGYG000001789_00244;... | 0 | 0 | 0 | 0 |
| (Acetyl)LAELGAKAVTLSGPDGYIYDPDGITTK | MGYG000001199_02893 | 0 | 0 | 0 | 0 |
| (Acetyl)LLTGLPDAYGR | MGYG000001757_01206;MGYG000004547_02135;MGYG000001283_00124 | 0 | 307519 | 0 | 0 |
| (Acetyl)MDFTLDKK | MGYG000000076_01275;MGYG000003694_00879;MGYG000000312_02425;MGYG000000271_02102 | 306231 | 0 | 0 | 1214497 |

- Output Save Path: The location to save the result table.
- LCA Threshold: Find the LCA with the proportion threshold for each peptide. The default is 1.00 (100%).
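One way to picture an LCA with a proportion threshold is to walk up from the deepest rank until a single lineage prefix is shared by at least the given proportion of the peptide's candidate lineages. The sketch below follows that reading; how MetaX counts candidates (per protein or per genome) is not stated in this guide, so the function and the shortened lineages are illustrative assumptions.

```python
from collections import Counter


def lca(lineages, threshold=1.0, separator=";"):
    """Return the deepest lineage prefix supported by >= threshold of the lineages."""
    if not lineages:
        return ""
    ranks = [lineage.split(separator) for lineage in lineages]
    deepest = max(len(parts) for parts in ranks)
    for level in range(deepest, 0, -1):
        prefixes = Counter(tuple(parts[:level]) for parts in ranks if len(parts) >= level)
        if not prefixes:
            continue
        prefix, count = prefixes.most_common(1)[0]
        if count / len(ranks) >= threshold:
            return separator.join(prefix)
    return ""


# Shortened lineages for illustration (family, genus, species only).
lineages = [
    "f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus intestinalis",
    "f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus johnsonii",
    "f_Lactobacillaceae;g_Ligilactobacillus;s_Ligilactobacillus murinus",
]
print(lca(lineages, threshold=1.0))  # f_Lactobacillaceae
print(lca(lineages, threshold=0.6))  # f_Lactobacillaceae;g_Lactobacillus
```

With the default threshold of 1.00, all candidates must agree, so the result stops at the family. Lowering the threshold to 0.6 lets the two Lactobacillus entries decide the genus.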
2. Results from MaxQuant Workflow
These peptide results come from the MetaLab 2.3 MaxQuant workflow.
- Select the MetaLab result folder, which contains the maxquant_search folder.
- The Peptide Annotator will automatically find the peptides_report.txt, BuiltIn.pepTaxa.csv, and functions.tsv in the maxquant_search folder. Alternatively, you can select the files manually.
- Select OTFs Save To to set the location to save the result table.
Developer Tools
- Export Log
  - You can export the log file for debugging or reporting an issue.
- Show or Hide the Console
- Settings
  - Check Auto Check Update to enable or disable update checks on launch.
  - Choose whether to update from the stable version or beta version in Settings.
  - Other Options Settings

Enjoy MetaX
If you have any issues or suggestions, please open a new issue on GitHub.

