Tutorial for a Yale training session: TCGA RNA-seq data download and analyses, all on your laptop.
See the slides used during the workshop here.
Go to the TCGA data hub
- Navigate the portal and add the files you need to the basket
- Download the metadata and the manifest from the basket
- Download the files with gdc-client
- Convert the metadata from JSON to CSV with the online tool json-to-csv
- Metadata description here
- Choose and rename fields in a spreadsheet or an R script.
.. Note: To run the R script, you can install RStudio.
- Convert the downloaded .gz files to an FPKM matrix in the Unix shell/terminal
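# Run in the download folder, which holds one subdirectory (named by file ID) per sample.
for f in */*.gz; do
    id=$(dirname "$f"); echo "$id" > "$id.tmp"    # first line: the file ID (column name)
    gunzip -c "$f" | cut -f2 >> "$id.tmp"         # column 2: FPKM values
done
# Row names: gene IDs from column 1 of the last file (assumes all files share the same gene order).
echo 'featureId' > tmp.index
gunzip -c "$f" | cut -f1 >> tmp.index
paste tmp.index *.tmp > ../geneId_fileId_FPKM.txt
rm tmp.index *.tmp
.. Note: To use a Linux shell, run Terminal on a Mac (OS X); on a Windows PC, install and run babun.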
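The `paste` step above joins files line by line, so the matrix is only correct if every file lists the same gene IDs in the same order. A minimal check, written as a hypothetical shell function `check_same_features` (not part of the original tutorial), compares a checksum of column 1 across all `*/*.gz` files:

```shell
# Hypothetical helper (not from the tutorial): verify that every */*.gz file
# under DIR lists the same feature IDs (column 1) in the same order.
# paste joins files line by line, so rows only align if this holds.
check_same_features() {
    dir=${1:-.}
    ref=""
    for f in "$dir"/*/*.gz; do
        # An unmatched glob stays literal, so the file will not exist.
        [ -e "$f" ] || { echo "no .gz files under $dir" >&2; return 1; }
        # Checksum of the ordered list of feature IDs in this file.
        sum=$(gunzip -c "$f" | cut -f1 | cksum)
        if [ -z "$ref" ]; then
            ref=$sum
        elif [ "$sum" != "$ref" ]; then
            echo "feature order differs in $f" >&2
            return 1
        fi
    done
    echo "all files share the same feature order"
}
```

Run it as `check_same_features .` from the download folder before building the matrix; it returns status 1 and names the first file whose gene order differs.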
- Description of the TCGA barcode
- Description of the pipeline
- Download the GENCODE gene annotation file
- Map the FPKM matrix to gene symbols and sample barcodes with preprocess_count_matrix.R.
Use the script to:
- Filter the genes and convert FPKM values to log scale
- Identify genes coexpressed with your gene of interest
- Identify genes differentially expressed between paired normal and tumor samples
- Make a PCA plot
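The tutorial performs the gene filtering and log transform in preprocess_count_matrix.R. For a quick look without R, the same idea can be sketched in awk. This sketch assumes the log scale is log2(FPKM + 1) and uses a hypothetical filter (keep genes whose mean FPKM across samples is at least `min_mean`, default 1); neither choice is taken from the script itself.

```shell
# Hypothetical sketch (not the tutorial's R code): filter a tab-separated FPKM
# matrix (header row, feature ID in column 1) and output log2(FPKM + 1).
# Assumed filter rule: keep genes whose mean FPKM is >= min_mean (default 1).
# Usage: log2_filter MATRIX [min_mean]
log2_filter() {
    awk -v min="${2:-1}" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 { print; next }                       # keep the header row
        {
            sum = 0
            for (i = 2; i <= NF; i++) sum += $i
            if (NF < 2 || sum / (NF - 1) < min) next  # drop low-expression genes
            out = $1
            for (i = 2; i <= NF; i++)
                out = out OFS sprintf("%.4f", log($i + 1) / log(2))
            print out
        }' "$1"
}
```

For example, `log2_filter ../geneId_fileId_FPKM.txt 1 > log2_FPKM.txt` (the output file name is arbitrary). A mean-based cutoff is only one option; the R script may use a different rule.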
- Gene
- Cohort summary
- Cohort data and workflow
- Cohort analysis
- Convert the downloaded .txt files (FPKM values in column 3) to an FPKM matrix in the Unix shell/terminal
# cd to the folder that holds one subdirectory per sample, each containing a .txt file.
for f in */*.txt; do
    id=$(dirname "$f"); echo "$id" > "$id.tmp"   # first line: sample directory (future column name)
    cut -f3 "$f" >> "$id.tmp"                    # column 3: future cell values
done
echo 'featureId' > tmp.index
cut -f1 "$f" >> tmp.index                        # column 1 of the last file: future row names
paste tmp.index *.tmp > ../featureId_fileId_FPKM.txt
rm tmp.index *.tmp
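The tutorial maps gene IDs to gene symbols with preprocess_count_matrix.R, using the GENCODE annotation file. As a shell-only alternative, the two hypothetical functions below (`gtf_gene_names` and `add_gene_names`, not part of the tutorial) read the `gene_id` and `gene_name` attributes of `gene` records in a GENCODE GTF and add a `geneName` column to the matrix. This is a sketch: it assumes the IDs in the matrix carry the same version suffixes (e.g. `.5`) as the GTF, and writes `NA` when an ID is not found.

```shell
# Hypothetical helper: print "gene_id<TAB>gene_name" for every "gene" record
# in a GENCODE GTF (plain text or .gz). Comment lines (##) are skipped
# because their third tab-separated field is not "gene".
gtf_gene_names() {
    case $1 in
        *.gz) gunzip -c "$1" ;;
        *) cat "$1" ;;
    esac | awk '
        BEGIN { FS = "\t"; OFS = "\t" }
        $3 == "gene" {
            id = ""; name = ""
            n = split($9, attrs, "; *")      # attributes are separated by "; "
            for (i = 1; i <= n; i++) {
                split(attrs[i], kv, " ")     # key "value"
                gsub(/"/, "", kv[2])
                if (kv[1] == "gene_id") id = kv[2]
                else if (kv[1] == "gene_name") name = kv[2]
            }
            if (id != "") print id, name
        }'
}

# Hypothetical helper: insert a geneName column after column 1 of the matrix.
# Usage: add_gene_names ID_TO_NAME.tsv MATRIX
add_gene_names() {
    awk '
        BEGIN { FS = OFS = "\t" }
        NR == FNR { name[$1] = $2; next }          # first file: id -> name lookup
        FNR == 1 { $1 = $1 OFS "geneName"; print; next }
        { $1 = $1 OFS (($1 in name) ? name[$1] : "NA"); print }
    ' "$1" "$2"
}
```

For example, `gtf_gene_names gencode.gtf.gz > gene_names.tsv` followed by `add_gene_names gene_names.tsv ../featureId_fileId_FPKM.txt > FPKM_with_names.txt`; the GTF file name here is a placeholder for the GENCODE file you downloaded.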