-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathdocumentation.html
More file actions
149 lines (134 loc) · 12 KB
/
documentation.html
File metadata and controls
149 lines (134 loc) · 12 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>VCFtools</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/github-light.css">
<script src="javascripts/scale.fix.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1 class="header">VCFtools</h1>
<p class="header">A set of tools written in Perl and C++ for working with VCF files.</p>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="examples.html">Documentation</a></li>
<li class="download"><a class="buttons" href="https://github.com/vcftools/vcftools/zipball/master">Download ZIP</a></li>
<li class="download"><a class="buttons" href="https://github.com/vcftools/vcftools/tarball/master">Download TAR</a></li>
<li><a class="buttons github" href="https://github.com/vcftools/vcftools">View On GitHub</a></li>
</ul>
</header>
<section>
<h2>The C++ executable module examples</h2>
<p>This page provides usage examples for the executable module. Extended documentation <strong>for all of the options</strong> can be found on the <a href="man_latest.html">manual page</a>.
</p>
<ul>
<li><a href="#run">Running the program</a></li>
<li><a href="#file">Getting basic file statistics</a></li>
<li><a href="#filter">Applying a filter</a></li>
<li><a href="#newvcf">Writing to a new VCF file</a></li>
<li><a href="#streamvcf">Writing out to screen</a></li>
<li><a href="#convertbcf">Converting a VCF file to BCF</a></li>
<li><a href="#comparevcf">Comparing two VCF files</a></li>
<li><a href="#freq">Getting allele frequency</a></li>
<li><a href="#depth">Getting sequencing depth information</a></li>
<li><a href="#ld">Getting linkage disequilibrium statistics</a></li>
<li><a href="#fst">Getting Fst population statistics</a></li>
<li><a href="#plink">Converting VCF files to PLINK format</a></li>
</ul>
<h2><a name="run" class="Q">Run the program</a></h2>
<p>By default the executable can be found in the <strong>bin/</strong> subdirectory. To run the program, type:</p>
<p class="codebox">./vcftools</p>
<p>The program will return information regarding the version number.</p>
<h2><a name="file" class="Q">Get basic file statistics</a></h2>
<p>The executable can be run with only an input VCF file without any other options, and will return basic information regarding the contents of the file.
To specify an input file you must use the one of the input options ( <strong>--vcf</strong>, <strong>--gzvcf</strong>, or <strong>--bcf </strong> ) depending on the type of file.
For example, for a VCF file called <strong>input_data.vcf</strong> the following command could be run:</p>
<p class="codebox">./vcftools --vcf input_data.vcf</p>
<p>It will return information about the file such as the <strong>number of variants</strong> and the <strong>number of individuals</strong> in the file.</p>
<p>Beginning with vcftools v0.1.12, the program can also take input in from standard input (<strong>stdin</strong>). To do this, use any of the normal file type input options followed by the dash <strong>-</strong> character.</p>
<p class="codebox">zcat input_data.vcf.gz | ./vcftools --vcf -</p>
<h2><a name="filter" class="Q">Applying a filter</a></h2>
<p>You can use VCFtools to filter out variants or individuals based on the values within the file.
For example, to filter the sites within a file based upon their location in genome, use the options
<strong>--chr</strong>, <strong>--from-bp</strong>, and <strong>--to-bp</strong> to specify the region.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000</p>
<p>After running this line, the program will return the amount of sites in the file that are included in the chromosomal region chr1:1000000-2000000.
This option can be modified to work with any desired region.</p>
<h2><a name="newvcf" class="Q">Writing to a new VCF file</a></h2>
<p>VCFtools can perform analyses on the variants that pass through the filters or simply <strong>write those variants out to a new file</strong>.
This function is helpful for creating subsets of VCF files or just removing unwanted variants from VCF files. To write out the variants that pass through filters use the <strong>--recode</strong> option.
In addition, use <strong>--recode-INFO-all</strong> to include all data from the INFO fields in the output. By default INFO fields are not written because many filters will alter the variants in a file, rendering
the INFO values incorrect.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000 --recode --recode-INFO-all</p>
<p>In this example, VCFtools will create a new VCF file containing only variants within the specified chromosomal region while keeping all INFO fields included in the original file.</p>
<p>Any files written out by VCFtools will be in the current working directory and have the prefix <i>./out.SUFFIX</i> by default.
To change the path, specify the new path using the option <strong>--out</strong> followed by the desired path. The program will add a suffix to that path based on the chosen output function.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000 --recode --out subset</p>
<h2><a name="streamvcf" class="Q">Writing out to screen</a></h2>
<p>Beginning with VCFtools v0.1.12, the program can also write out to screen instead of having the program write to a specified path. Using the options
<strong>--stdout</strong> or <strong>-c</strong> will redirect all output to standard out. The output can then be piped into other programs or written out to a specified file name.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000 --recode --stdout | more</p>
<p>The above example will output the resulting file to screen one line at a time for <strong>quick inspection</strong> of the results.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000 --recode -c > /home/usr/data/subset.vcf</p>
<p>The above example will redirect the output and <strong>write</strong> it to the specified file name.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000 --recode -c | bgzip -c > /home/usr/data/subset.vcf.gz</p>
<p>The above example will redirect the output into <strong>bgzip</strong> (assuming it is installed) for <strong>compression</strong>, and then bgzip will write the file to the specified destination.</p>
<h2><a name="convertbcf" class="Q">Converting a VCF file to BCF</a></h2>
<p>Beginning with VCFftools v0.1.11, the program has the ability to read and write <strong>BCF files</strong>. This means that the program can also convert files between the two formats.
This is accomplished in a similar way as the above example, instead using the <strong>--recode-bcf</strong> option. All output BCF files are automatically compressed using BGZF.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --recode-bcf --recode-INFO-all --out converted_output</p>
<h2><a name="comparevcf" class="Q">Comparing two files</a></h2>
<p>Using VCFtools, two VCF files can be compared to determine which sites and individuals are shared between them. The first file is declared using the input file options just like any other output function. The second file must be
specified using <strong>--diff</strong>, <strong>--gzdiff</strong>, or <strong>--diff-bcf</strong>. There are also advanced options to determine additional discordance between the two files.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --diff other_data.vcf --out compare</p>
<h2><a name="freq" class="Q">Getting allele frequency</a></h2>
<p>To determine the frequency of each allele over all individuals in a VCF file, the <strong>--freq</strong> argument is used.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --freq --out output</p>
<p>The output file will be written to <i>output.frq</i>.</p>
<h2><a name="depth" class="Q">Getting sequencing depth information</a></h2>
<p>Another useful output function <strong>summarizes sequencing depth</strong> for each individual or for each site. Just like the allele frequency example above, this output function follows the same basic model.
</p>
<p class="codebox">./vcftools --vcf input_data.vcf --depth -c > depth_summary.txt</p>
<p>With VCFtools, you can use many combinations of filters and an output function. For example, to write out site-wise sequence depths only at sites that have no missing data,
include the <strong>--max-missing</strong> argument.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --site-depth --max-missing 1.0 --out site_depth_summary</p>
<h2><a name="ld" class="Q">Getting linkage disequilibrium statistics</a></h2>
<p>Linkage disequilibrium between sites can be determined as well. This is accomplished using the <strong>--hap-r2</strong>, <strong>--geno-r2</strong>, or <strong>--geno-chisq</strong> arguments.
Since the program must do pairwise site comparisons, this analysis can be time consuming, so it is recommended to filter the sites first or use one of the other options (<strong>--ld-window</strong>, <strong>--ld-window-bp</strong> or <strong>--min-r2</strong>) to reduce the number of comparisons.
In this example, the VCFtools will only compare sites within 50,000 base pairs of one another.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --hap-r2 --ld-window-bp 50000 --out ld_window_50000</p>
<h2><a name="fst" class="Q">Getting Fst population statistics</a></h2>
<p>VCFtools can also calculate <strong>Fst statistics</strong> between individuals of different populations. It is an estimate calculated in accordance to Weir and Cockerham’s 1984 paper. The user must supply text files that contain <strong>lists of individuals</strong> (one per line) that are members of each population. The function will work with multiple populations if multiple <strong>--weir-fst-pop</strong> arguments are used.
The following example shows how to calculate a per-site Fst calculation with two populations. Other arguments can be used in conjunction with this function, such as <strong>--fst-window-size</strong> and <strong>--fst-window-step</strong>.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --weir-fst-pop population_1.txt --weir-fst-pop population_2.txt --out pop1_vs_pop2</p>
<h2><a name="plink" class="Q">Converting VCF files to PLINK format</a></h2>
<p>VCFtools can convert VCF files into formats convenient for use in other programs. One such example is the ability to convert into PLINK format.
The following function will output the variants in <strong>.ped</strong> and <strong>.map</strong> files.</p>
<p class="codebox">./vcftools --vcf input_data.vcf --plink --chr 1 --out output_in_plink</p>
</section>
<footer>
<p><small>Hosted on <a href="https://pages.github.com">GitHub Pages</a></small></p>
<p>Copyright 2015 © VCFtools</p>
</footer>
</div>
<!--[if !IE]><script>fixScale(document);</script><![endif]-->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-272183-4");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>