<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>VCFtools</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/github-light.css">
<script src="javascripts/scale.fix.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1 class="header">VCFtools</h1>
<p class="header">A set of tools written in Perl and C++ for working with VCF files.</p>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="examples.html">Documentation</a></li>
<li class="download"><a class="buttons" href="https://github.com/vcftools/vcftools/zipball/master">Download ZIP</a></li>
<li class="download"><a class="buttons" href="https://github.com/vcftools/vcftools/tarball/master">Download TAR</a></li>
<li><a class="buttons github" href="https://github.com/vcftools/vcftools">View On GitHub</a></li>
</ul>
</header>
<section>
<h2>The bcftools/htslib VCF commands</h2>
<p> <a href="https://github.com/samtools/htslib">HTSlib</a> is a C library for
high-throughput sequencing data formats. It is designed for speed and works with both VCF and
<a href="http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2">BCFv2</a>.
</p>
<h2>Download and installation</h2>
<p>
The library is hosted on GitHub and can be downloaded and compiled in the usual way.
The <span class="cmd">clone</span> command is run only once; the <span class="cmd">pull</span>
command is run whenever the latest snapshot from GitHub is needed.
Please see the <a href="https://github.com/samtools/bcftools">bcftools GitHub
page</a> for the up-to-date version of the clone command. The software is under heavy
development and the option <span class="cmd">--branch</span> may be required.
</p>
<p class="codebox">
git clone [<i>--branch=name</i>] git://github.com/samtools/htslib.git htslib
<br> git clone git://github.com/samtools/bcftools.git bcftools
<br> cd htslib; git pull; cd ..
<br> cd bcftools; git pull; cd ..
<br>
<br> # Compile
<br> cd bcftools; make; make test
<br>
<br> # Run
<br> ./bcftools stats file.vcf.gz
<br>
</p>
<h2>The tools</h2>
<ul>
<li> <a href="#annotate">bcftools annotate</a>
<li> <a href="#call">bcftools call</a>
<li> <a href="#filter">bcftools filter</a>
<li> <a href="#gtcheck">bcftools gtcheck</a>
<li> <a href="#isec">bcftools isec</a>
<li> <a href="#merge">bcftools merge</a>
<li> <a href="#norm">bcftools norm</a>
<li> <a href="#query">bcftools query</a>
<li> <a href="#stats">bcftools stats</a>
<li> <a href="#subset">bcftools subset</a>
<li> <a href="#view">bcftools view</a>
</ul>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
<script type="text/javascript">
jQuery(document).ready(function() {
jQuery(".usageText").hide();
jQuery(".usageToggle").click(function() { jQuery(this).next(".usageText").slideToggle(100); });
});
</script>
<h3><a name="annotate" class="Q">bcftools annotate</a></h3>
<p> Adds or removes annotations and supports user-written plugins.
</p>
<p> Fast alternative to <A href="perl_module.html#vcf-annotate">vcf-annotate</A>.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Annotate and edit VCF/BCF files.
Usage: bcftools annotate [options] <in.vcf.gz>
Options:
-a, --annotations <file> VCF file or tabix-indexed file with annotations: CHR\tPOS[\tVALUE]+
-c, --columns <list> list of columns in the annotation file, e.g. CHROM,POS,REF,ALT,-,INFO/TAG. See man page for details
-h, --header-lines <file> lines which should be appended to the VCF header
-l, --list-plugins list available plugins. See BCFTOOLS_PLUGINS environment variable and man page for details
-O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
-p, --plugins <name|...> comma-separated list of dynamically loaded user-defined plugins. See man page for details
-r, --regions <reg|file> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-R, --remove <list> list of annotations to remove (e.g. ID,INFO/DP,FORMAT/DP,FILTER). See man page for details
</pre></div>
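<p> As a rough sketch of how the options above combine: the file names <span class="cmd">annots.tab.gz</span>, <span class="cmd">annots.hdr</span> and <span class="cmd">in.vcf.gz</span> are hypothetical, and the exact behaviour may differ between versions, so please check the man page before relying on it.
</p>
<p class="codebox">
# Add INFO/TAG from a tabix-indexed annotation file; annots.hdr holds the matching header line
<br> bcftools annotate -a annots.tab.gz -c CHROM,POS,REF,ALT,-,INFO/TAG -h annots.hdr in.vcf.gz > annotated.vcf
<br>
<br> # Remove the ID values and the INFO/DP annotation
<br> bcftools annotate -R ID,INFO/DP in.vcf.gz > stripped.vcf
</p>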
<h3><a name="call" class="Q">bcftools call</a></h3>
<p> Formerly known as <span class="cmd">bcftools view</span>, this is the successor of the popular caller from the <span class="cmd">samtools</span> package with extended capabilities.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: SNP/indel variant calling from VCF/BCF. To be used in conjunction with samtools mpileup.
This command replaces the former "bcftools view" caller. Some of the original
functionality has been temporarily lost in the process of transition to htslib,
but will be added back on popular demand. The original calling model can be
invoked with the -c option.
Usage: bcftools call [options] <in.vcf.gz>
File format options:
-O, --output-type <b|u|z|v> output type: 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
-r, --regions <reg|file> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-s, --samples <list|:file> sample list, PED file or a file with optional second column for ploidy (0, 1 or 2) [all samples]
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
Input/output options:
-A, --keep-alts keep all possible alternate alleles at variant sites
-M, --keep-masked-ref keep sites with masked reference allele (REF=N)
-S, --skip <snps|indels> skip indels/snps
-v, --variants-only output variant sites only
Consensus/variant calling options:
-c, --consensus-caller the original calling method (conflicts with -m)
-C, --constrain <str> one of: alleles, trio (see manual)
-m, --multiallelic-caller alternative model for multiallelic and rare-variant calling (conflicts with -c)
-n, --novel-rate <float>,[...] likelihood of novel mutation for constrained trio calling, see man page for details [1e-8,1e-9,1e-9]
-p, --pval-threshold <float> variant if P(ref|D)<FLOAT with -c [0.5] or another allele accepted if P(chi^2)>=1-FLOAT with -m [1e-2]
-X, --chromosome-X haploid output for male samples (requires PED file with -s)
-Y, --chromosome-Y haploid output for males and skips females (requires PED file with -s)
</pre></div>
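<p> A typical pipeline might look like the sketch below. The <span class="cmd">samtools mpileup</span> flags (<span class="cmd">-u</span>, <span class="cmd">-g</span>, <span class="cmd">-f</span>) and the use of <span class="cmd">-</span> to read from standard input are not documented on this page; they are assumptions that depend on the installed samtools and bcftools versions. File names are placeholders.
</p>
<p class="codebox">
# Pile up the reads, then call with the multiallelic caller and keep variant sites only
<br> samtools mpileup -ugf ref.fa aln.bam | bcftools call -m -v -O z - > calls.vcf.gz
</p>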
<h3><a name="filter" class="Q">bcftools filter</a></h3>
<p> Powerful fixed-threshold filtering, accepts boolean and arithmetic expressions.
<br>
See also <A href="#view">bcftools view</A> below.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Apply fixed-threshold filters.
Usage: bcftools filter [options] <in.vcf.gz>
Options:
-e, --exclude <expr> exclude sites for which the expression is true (e.g. '%TYPE="snp" && %QUAL>=10 && (DP4[2]+DP4[3] > 2)')
-g, --SnpGap <int> filter SNPs within <int> base pairs of an indel
-G, --IndelGap <int> filter clusters of indels separated by <int> or fewer base pairs allowing only one to pass
-i, --include <expr> include only sites for which the expression is true
-m, --mode <+|x> "+": do not replace but add to existing FILTER; "x": reset filters at sites which pass
-O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
-r, --regions <reg|file> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-s, --soft-filter <string> annotate FILTER column with <string> or unique filter name ("Filter%d") made up by the program ("+")
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
Filter expressions may contain:
- arithmetic operators: +,*,-,/
- logical operators: && (same as &), || (same as |)
- comparison operators: == (same as =), >, >=, <=, <, !=
- parentheses: (, )
- array subscripts (e.g. AC[0]>=10)
- double quotes for string values (e.g. %FILTER="PASS")
- 1 (or 0) for testing the presence (or absence) of a flag (e.g. FlagA=1 && FlagB=0)
- TAG or INFO/TAG for INFO values (e.g. DP<800 or INFO/DP<800)
- %QUAL, %FILTER, etc. for column names (note: currently only some columns are supported)
- %TYPE for variant type, such as %TYPE="indel"|"snp"|"mnp"|"other"
- %FUNC(TAG) where FUNC is one of MAX, MIN, AVG and TAG is one of the FORMAT fields (e.g. %MIN(DV)>5)
</pre></div>
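<p> The following sketch shows a hard filter and a soft filter built from the expression syntax above. The file names and the filter name <span class="cmd">HighDP</span> are made up for illustration, and the thresholds are arbitrary.
</p>
<p class="codebox">
# Hard filter: keep only sites with QUAL of at least 10
<br> bcftools filter -i '%QUAL>=10' -O z in.vcf.gz > filtered.vcf.gz
<br>
<br> # Soft filter: keep all sites, but mark those with INFO/DP above 800 as HighDP in the FILTER column
<br> bcftools filter -s HighDP -e 'DP>800' -O z in.vcf.gz > tagged.vcf.gz
</p>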
<h3><a name="gtcheck" class="Q">bcftools gtcheck</a></h3>
<p> A tool for detecting sample swaps and contamination.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Check sample identity. With no -g BCF given, multi-sample cross-check is performed.
Usage: bcftools gtcheck [options] [-g <genotypes.vcf.gz>] <query.vcf.gz>
Options:
-a, --all-sites output comparison for all sites
-g, --genotypes <file> genotypes to compare against
-G, --GTs-only <int> use GTs, ignore PLs, using <int> for unseen genotypes [99]
-H, --homs-only homozygous genotypes only (useful for low coverage data)
-p, --plot <prefix> plot
-r, --regions <file|reg> restrict to list of regions or regions listed in a file, see man page for details
-s, --query-sample <string> query sample (by default the first sample is checked)
-S, --target-sample <string> target sample in the -g file (used only for plotting)
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
</pre></div>
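<p> The two modes described above could be invoked roughly as follows. The sample name <span class="cmd">sampleA</span> and the file names are placeholders.
</p>
<p class="codebox">
# Check sampleA from query.vcf.gz against the genotypes in genotypes.vcf.gz
<br> bcftools gtcheck -g genotypes.vcf.gz -s sampleA query.vcf.gz
<br>
<br> # No -g file given: cross-check all samples in query.vcf.gz against each other
<br> bcftools gtcheck query.vcf.gz
</p>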
<h3><a name="isec" class="Q">bcftools isec</a></h3>
<p> Fast alternative to <A href="perl_module.html#vcf-isec">vcf-isec</A>.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Create intersections, unions and complements of VCF files.
Usage: bcftools isec [options] <A.vcf.gz> <B.vcf.gz> [...]
Options:
-c, --collapse <string> treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
-C, --complement output positions present only in the first file but missing in the others
-f, --apply-filters <list> require at least one of the listed FILTER strings (e.g. "PASS,.")
-n, --nfiles [+-=]<int> output positions present in this many (=), this many or more (+), or this many or fewer (-) files
-O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
-p, --prefix <dir> if given, subset each of the input files accordingly, see also -w
-r, --regions <file|reg> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-t, --targets <file|reg> similar to -r but streams rather than index-jumps, see man page for details
-w, --write <list> list of files to write with -p given as 1-based indexes. By default, all files are written
Examples:
# Create intersection and complements of two sets saving the output in dir/*
bcftools isec A.vcf.gz B.vcf.gz -p dir
# Extract and write records from A shared by both A and B using exact allele match
bcftools isec A.vcf.gz B.vcf.gz -p dir -n =2 -w 1
# Extract records private to A or B comparing by position only
bcftools isec A.vcf.gz B.vcf.gz -p dir -n -1 -c all
</pre></div>
<h3><a name="merge" class="Q">bcftools merge</a></h3>
<p> Fast alternative to <A href="perl_module.html#vcf-merge">vcf-merge</A> with extended capabilities and correct handling of Number=A,G,R INFO fields.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Merge multiple VCF or BCF files to create one multi-sample file combining compatible records
into one according to the -m option.
Usage: bcftools merge [options] <A.vcf.gz> <B.vcf.gz> [...]
Options:
--use-header <file> use the provided header
--print-header print only the merged header and exit
-f, --apply-filters <list> require at least one of the listed FILTER strings (e.g. "PASS,.")
-i, --info-rules <tag:method,..> rules for merging INFO fields (method is one of sum,avg,min,max,join) or "-" to turn off the default [DP:sum,DP4:sum]
-m, --merge <string> merge sites with differing alleles for <snps|indels|both|all|none>, see man page for details [both]
-O, --output-type <b|u|z|v> 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
-r, --regions <reg|file> merge in the given regions only
</pre></div>
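<p> A minimal sketch of common invocations, using only the options listed above. The input names follow the usage line and the output name is a placeholder.
</p>
<p class="codebox">
# Merge two files into one multi-sample compressed VCF
<br> bcftools merge -O z A.vcf.gz B.vcf.gz > merged.vcf.gz
<br>
<br> # Print only the header the merge would produce, e.g. to edit it and pass it back with --use-header
<br> bcftools merge --print-header A.vcf.gz B.vcf.gz > merged.hdr
<br>
<br> # Do not combine records with differing alleles into multiallelic sites
<br> bcftools merge -m none -O z A.vcf.gz B.vcf.gz > merged.vcf.gz
</p>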
<h3><a name="norm" class="Q">bcftools norm</a></h3>
<p> Left-align and normalize indels to the shortest possible representation.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Left-align and normalize indels.
Usage: bcftools norm [options] -f <ref.fa> <in.vcf.gz>
Options:
-D, --remove-duplicates remove duplicate lines of the same type. [Todo: merge genotypes, don't just throw away.]
-f, --fasta-ref <file> reference sequence
-O, --output-type <type> 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
-r, --regions <file|reg> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-w, --win <int,int> alignment window and buffer window [50,1000]
</pre></div>
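<p> For example, assuming a reference <span class="cmd">ref.fa</span> that matches the one used for calling (file names are placeholders):
</p>
<p class="codebox">
# Left-align and normalize indels, writing compressed VCF
<br> bcftools norm -f ref.fa -O z in.vcf.gz > in.norm.vcf.gz
</p>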
<h3><a name="query" class="Q">bcftools query</a></h3>
<p> Fast alternative to <A href="perl_module.html#vcf-query">vcf-query</A>.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Extracts fields from VCF/BCF file and prints them in user-defined format
Usage: bcftools query [options] <A.vcf.gz> [<B.vcf.gz> [...]]
Options:
-a, --annots <list> alias for -f '%CHROM\t%POS\t%MASK\t%REF\t%ALT\t%TYPE\t' + tab-separated <list> of tags
-c, --collapse <string> collapse lines with duplicate positions for <snps|indels|both|all|some|none>, see man page [none]
-f, --format <string> learn by example, see below
-H, --print-header print header
-l, --list-samples print the list of samples and exit
-r, --regions <reg|file> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
-s, --samples <list|:file> comma-separated list of samples to include or one name per line in a file
-v, --vcf-list <file> process multiple VCFs listed in the file
Expressions:
%CHROM The CHROM column (similarly also other columns, such as POS, ID, QUAL, etc.)
%INFO/TAG Any tag in the INFO column
%TYPE Variant type (REF, SNP, MNP, INDEL, OTHER)
%MASK Indicates presence of the site in other files (with multiple files)
%TAG{INT} Curly brackets to subscript vectors (0-based)
[] The brackets loop over all samples
%GT Genotype (e.g. 0/1)
%TGT Translated genotype (e.g. C/A)
%LINE Prints the whole line
%SAMPLE Sample name
Examples:
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz
</pre></div>
<h3><a name="stats" class="Q">bcftools stats</a></h3>
<p> Formerly known as <span class="cmd">vcfcheck</span>. Extract stats from a VCF/BCF file or compare two VCF/BCF files. The resulting text file can be plotted using
<span class="cmd">plot-vcfstats</span>.
</p>
<p class="codebox">
bcftools stats file.vcf.gz > file.vchk
<br> plot-vcfstats file.vchk -p plots/</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: Parses VCF or BCF and produces stats which can be plotted using plot-vcfstats.
When two files are given, the program generates separate stats for intersection
and the complements.
Usage: bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]
Options:
-1, --1st-allele-only include only 1st allele at multiallelic sites
-c, --collapse <string> treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
-d, --depth <int,int,int> depth distribution: min,max,bin size [0,500,1]
--debug produce verbose per-site and per-sample output
-e, --exons <file.gz> tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed)
-f, --apply-filters <list> require at least one of the listed FILTER strings (e.g. "PASS,.")
-F, --fasta-ref <file> faidx indexed reference sequence file to determine INDEL context
-i, --split-by-ID collect stats for sites with ID separately (known vs novel)
-r, --regions <reg|file> restrict to comma-separated list of regions or regions listed in a file, see man page for details
-s, --samples <list|:file> produce sample stats, "-" to include all samples
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
-u, --user-tstv <TAG[:min:max:n]> collect Ts/Tv stats for any tag using the given binning [0:1,100]
</pre></div>
<h3><a name="view" class="Q">bcftools view</a></h3>
<p> This versatile tool can be used for subsetting by sample, position and even flexible fixed-threshold filtering.
</p>
<div class="usageBox"><span class="usageToggle">(Read more)</span><pre class="usageText">
About: VCF/BCF conversion, view, subset and filter VCF/BCF files.
Usage: bcftools view [options] <in.vcf.gz> [region1 [...]]
Output options:
-G, --drop-genotypes drop individual genotype information (after subsetting if -s option set)
-h/H, --header-only/--no-header print the header only/suppress the header in VCF output
-l, --compression-level [0-9] compression level: 0 uncompressed, 1 best speed, 9 best compression [-1]
-o, --output-file <file> output file name [stdout]
-O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
-r, --regions <reg|file> restrict to comma-separated list of regions or regions in a file, see man page for details
-t, --targets <reg|file> similar to -r but streams rather than index-jumps, see man page for details
Subset options:
-a, --trim-alt-alleles trim alternate alleles not seen in the subset
-I, --no-update do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN)
-s, --samples STR/FILE list of samples (FILE or comma separated list STR) [null]
Filter options:
-c/C, --min-ac/--max-ac <int>[:<type>] minimum/maximum count for non-reference (nref), 1st alternate (alt1) or minor (minor) alleles [nref]
-f, --apply-filters <list> require at least one of the listed FILTER strings (e.g. "PASS,.")
-i/e, --include/--exclude <expr> select/exclude sites for which the expression is true (see below for details)
-k/n, --known/--novel select known/novel sites only (ID is not/is '.')
-m/M, --min-alleles/--max-alleles <int> minimum/maximum number of alleles listed in REF and ALT (e.g. -m2 -M2 for biallelic sites)
-p/P, --phased/--exclude-phased select/exclude sites where all samples are phased/not all samples are phased
-q/Q, --min-af/--max-af <float>[:<type>] minimum/maximum frequency for non-reference (nref), 1st alternate (alt1) or minor (minor) alleles [nref]
-u/U, --uncalled/--exclude-uncalled select/exclude sites without a called genotype
-v/V, --types/--exclude-types <list> select/exclude comma-separated list of variant types: snps,indels,mnps,other [null]
-x/X, --private/--exclude-private select/exclude sites where the non-reference alleles are exclusive (private) to the subset samples
Filter expressions may contain:
- arithmetic operators: +,*,-,/
- logical operators: && (same as &), || (same as |)
- comparison operators: == (same as =), >, >=, <=, <, !=
- parentheses: (, )
- array subscripts (e.g. AC[0]>=10)
- double quotes for string values (e.g. %FILTER="PASS")
- 1 (or 0) for testing the presence (or absence) of a flag (e.g. FlagA=1 && FlagB=0)
- TAG or INFO/TAG for INFO values (e.g. DP<800 or INFO/DP<800)
- %QUAL, %FILTER, etc. for column names (note: currently only some columns are supported)
- %TYPE for variant type, such as %TYPE="indel"|"snp"|"mnp"|"other"
- %FUNC(TAG) where FUNC is one of MAX, MIN, AVG and TAG is one of the FORMAT fields (e.g. %MIN(DV)>5)
</pre></div>
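<p> The sketch below combines a few of the options above. The sample names and file names are hypothetical.
</p>
<p class="codebox">
# Subset to two samples and keep biallelic SNPs, writing compressed VCF
<br> bcftools view -s sampleA,sampleB -m2 -M2 -v snps -O z -o subset.vcf.gz in.vcf.gz
<br>
<br> # Convert compressed VCF to compressed BCF
<br> bcftools view -O b -o in.bcf in.vcf.gz
<br>
<br> # Keep only sites whose FILTER column is PASS and drop the genotype columns
<br> bcftools view -f PASS -G -O z -o sites.vcf.gz in.vcf.gz
</p>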
</section>
<footer>
<p><small>Hosted on <a href="https://pages.github.com">GitHub Pages</a></small></p>
<p>Copyright 2015 © VCFtools</p>
</footer>
</div>
<!--[if !IE]><script>fixScale(document);</script><![endif]-->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-272183-4");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>