Commit bec0a77

Update detailsInput.rst
1 parent d720499 commit bec0a77

1 file changed

Lines changed: 8 additions & 15 deletions

docs/detailsInput.rst

@@ -30,24 +30,23 @@ The p-value threshold for D\ :sub: `max` is per default set to 0.01. In our benc
 
 Flag -k: dbSNP database (dbSNPs_sorted.txt.gz)
 ===============================================
-To identify TFs which are more often affected than expected by chance in the given input SNP set, SNEEP can perform a statistical assessment to compare the result against proper random controls. To do so, the pipeline randomly samples SNPs from the `dbSNP database <??>`_ and rerun the analysis on these SNPs.
+To identify TFs which are more often affected by the given data than one would expected on random data, SNEEP can perform a statistical assessment to compare the result against proper random controls. To do so, the pipeline randomly samples SNPs from the `dbSNP database <https://www.ncbi.nlm.nih.gov/snp/>`_ and rerun the analysis on these SNPs.
 In order to sample the SNPs in a fast and efficient manner, we provide a file (in our `Zenodo repository <https://zenodo.org/record/4892591>`_ containing the SNPs of the dbSNP database. The file is a slightly modified version of the `public available one <ttps://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/>`_ (file GCF_000001405.38). In detail, we
 
 - removed all SNPs overlapping with a protein-coding region (annotation of the `human genome (GRCh38), version 36 (Ensembl 102) <https://www.gencodegenes.org/human/release_36.html>`_), (TODO: remove this sentence when zenodo dir is updated!)
 - removed all information not important for SNEEP,
 - removed mutations longer than 1 bp,
 - and sorted SNPs according to their MAF distribution in ascending order.
 
-
 Flag -r and -g: Epigenetic interactions
 ===============================================
-We provide three files (in our `Zenodo repository <??>`_) containing epigenetic interactions associated to target genes:
+We provide three files (in our `Zenodo repository <https://zenodo.org/record/4892591>`_) containing epigenetic interactions associated to target genes:
 
 - interactionsREMs.txt provides regulatory elements (REMs) linked to their target genes. The data was derived with the STITCHIT algorithm, which is a peak-calling free approach to identify gene-specific REMs by analyzing epigenetic signal of diverse human cell types with regard to gene expression of a certain gene. For more information, you can also have a look at our public `EpiRegio database <https://epiregio.de>`_ holding all REMs stored in the interactionsREMs.txt file.
 - interactionsREM_PRO.txt: Additional to the REMs the promoters (+/- 500 bp around TSS) of the genes are included as regions linked to their target genes.
 - interactionsREMs_PRO_HiC.txt: This file further includes enhancer-gene links predicted with the ABC algorithm on human heart data from a `published paper from Anene-Nzelu *et al.* <https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.120.046040?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed>`_.
 
-It is also possible to use your own epigenetic interactions file or extend on of ours with for instance cell type specific data. Please stick to our tab-separated format:
+It is also possible to use your own epigenetic interactions file (for instance generated with STARE's gABC score computation) or extend one of ours with for instance cell type specific data. Please stick to our tab-separated format:
 
 - chr of the linked region
 - start of the linked region (0-based)
@@ -57,15 +56,9 @@ It is also possible to use your own epigenetic interactions file or extend on of
 - 7 tab-separated dots (or additional information which you wish to keep -> displayed in the result.txt file but not in the summary pdf).
 
 Further, a file which provides a mapping between ensemblID to gene name must be given. This file comes along with our GitHub repository.
-
-Flag -s: Estimated scale parameters for the TFs used
-=====================================================
 
-Our modified Laplace distribution is dependent on two parameters: n, which is two times the length of the TF model and b, which needs to be estimated.
-For the TF set we provide within our GitHub repository, we also estimated the scale parameter listed in XX.
-In case a customized TF motif set is used, one also needs to estimate the scale parameter for each TF. Therefore we provide a script XXX (TODO: provide more details here).
 
-Flag -a: Store Dmax values for all considered shifts
+Flag -a: Store D\ :sub: `max` values for all considered shifts
 =====================================================
 If this flag is set, for all shifts that exceed the TF binding affinity p-value threshold the resulting D-max value and the corresponding p-value is stored in <outputDir>/AllDiffBindAffinity.txt
 
@@ -74,7 +67,7 @@ Flag -f: Include open chromatin data
 
 To consider only the SNPs which overlap with cell type specific open chromatin data, a peak file in bed-format can be specified with this flag.
 
-Flag -m: Get all Dmax values
+Flag -m: Get all D\ :sub: `max` values
 ===============================
 
 If this flag is set all absolute maximal differential TF binding scores are printed (to the console) even if they do not exceed the specified p-value threshold. This flag is useful for estimating the scale parameter
@@ -86,12 +79,12 @@ In order to only consider the TFs which are expressed in your analysed cell type
 Flag -j: Number of sampled background SNP sets
 =================================================
 
-With this flag the number of background rounds can be specified, default 0.
+With this flag the number of background rounds can be specified. Default: -j 0.
 
 Flag -l: Reproducible results for random background analysis
 ==============================================================
-In order to reproduce the result of the random background analysis we recommend to specific a seed variable. Default is 1.
+In order to reproduce the result of the random background analysis we recommend to specific a seed variable. Default: -l 1.
 
 Flag -q: TF count
 =====================
-This flags allows to exclude TFs from the baclground sampling which do not exceed a TF count (default 0).
+This flags allows to exclude TFs from the baclground sampling which do not exceed a TF count. Default: -q 0
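
The last hunk documents a number of background rounds (-j) and a seed (-l) for the random background analysis. The following is a minimal sketch of why a fixed seed makes such sampling repeatable. It is not SNEEP's implementation: the helper name `sample_background_sets`, the toy SNP identifiers, and the use of Python's `random` module are all assumptions made purely for illustration.

```python
import random
from typing import List, Sequence


def sample_background_sets(
    snp_pool: Sequence[str],
    set_size: int,
    rounds: int,
    seed: int = 1,
) -> List[List[str]]:
    """Draw `rounds` random SNP sets of `set_size` SNPs each.

    Hypothetical helper for illustration; not part of SNEEP.
    """
    # A dedicated generator with a fixed seed makes every draw repeatable.
    rng = random.Random(seed)
    pool = list(snp_pool)
    return [rng.sample(pool, set_size) for _ in range(rounds)]


# Toy pool of made-up identifiers, not real dbSNP entries.
pool = [f"rs{i}" for i in range(1, 1001)]

first = sample_background_sets(pool, set_size=5, rounds=3, seed=1)
second = sample_background_sets(pool, set_size=5, rounds=3, seed=1)

# Same seed and same inputs give identical background sets.
print(first == second)
```

With zero rounds the sketch returns an empty list, which mirrors the documented default of running no background analysis unless a round count is given.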
